Hello everyone, my name is Qin Jianxiang, and I am a grassroots inventor. I have applied for more than 20 invention patents. When friends hear this number, some think it is amazing, but they do not see how these inventions relate to daily life. In fact, I have about 100 patentable ideas in my head, but applying for patents costs too much time and money, so I only file for the innovations with high commercial value.
To take the mystery out of patents, I decided to write a popular science article. As it happens, on February 23 I received a notice of acceptance from the State Patent Office for a patent titled "Key Protection Method, Device, Equipment and Storage Medium Based on HMAC Algorithm". This patent is related to the shocking case in which the mobile phone numbers of 1.1 billion Taobao customers were stolen. The vulnerability I describe in the patent is common across industries, many victims are being attacked without knowing it, and the protection scheme I propose is very general.
I would also like to invite readers to repost this article so that the industry can benefit as soon as possible. If the patent is granted, I will license it free of charge to public sectors such as national defense, government affairs, scientific research, and education. Readers who help spread this article can also apply for a free license (commercial use allowed).
It was a dark and windy night, and the dim room was filled with the smell of mutton soup. A young man who looked like a college student sat in front of a computer. Hundreds of millions of mobile phone numbers scrolled eerily across the fingerprint-smeared screen. Thousands of kilometers away, in a city that never sleeps, Taobao shoppers received WeChat friend requests from strangers one after another, and were then pulled into WeChat groups offering shopping coupons. The monetization loop was closed.
What mysterious force brought the users of China's two Internet giants together in such a strange way? How did the mobile phone numbers complete this incredible journey through time and space? Why did a promising young man take such a risk? Please watch this episode of "Approaching Science"... ah, no, wrong, my hand slipped. Please watch this episode of "Amazing Little Inventions", haha.
1. The door lock is lost, and the tiger eats the lamb
When it comes to obtaining hundreds of millions of mobile phone numbers, the first method that comes to mind is crawling. Many people assume a crawler is a high-tech tool that only Baidu and Google have. It is true that Baidu and Google must index tens of billions of web pages across the entire Internet while keeping the index reasonably fresh. That is indeed technically very hard, and the R&D costs are counted in hundreds of millions.
However, a crawler that grabs the mobile phone numbers of Taobao users does not need to be nearly so sophisticated, because Taobao has an official mobile client and provides an interface (i.e. an API) for it. Scraping an API is far easier than scraping a web page, for two reasons:
The first is simplicity. Web pages contain not only data but also layout information, so extracting data from them requires fairly complex rules and algorithms.
The second is stability. Web pages may be redesigned every month, and each time the extraction rules and algorithms have to be rewritten. APIs, on the other hand, are very stable: they are usually defined once and then left unchanged for several years.
Therefore, by going through the API, the mobile phone numbers of Taobao users can be obtained directly without writing complex rules, and a crawler developed once can be used for a long time.
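The contrast between the two kinds of extraction can be sketched in Python. The payloads below are invented for illustration (a hypothetical product record, not Taobao data), and the function names are placeholders: the API response needs one parse and a key lookup, while the web page needs a rule tied to its markup.

```python
import json
import re

# Hypothetical payloads, made up for this illustration.
API_RESPONSE = '{"item": {"title": "Mutton soup", "price": "18.00"}}'
WEB_PAGE = """
<html><body>
  <div class="item"><h1 class="title">Mutton soup</h1>
  <span class="price">18.00</span></div>
</body></html>
"""

def price_from_api(body: str) -> str:
    # Structured data: one parse, then a plain key lookup.
    return json.loads(body)["item"]["price"]

def price_from_page(html: str) -> str:
    # Layout-dependent rule: it stops matching as soon as the tag or class changes.
    match = re.search(r'<span class="price">([^<]+)</span>', html)
    if match is None:
        raise ValueError("page layout changed, extraction rule needs rewriting")
    return match.group(1)
```

If the page is redesigned so that the price moves to a different tag, `price_from_page` fails and has to be rewritten, while `price_from_api` keeps working as long as the API contract is unchanged.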
In the eyes of hackers, Taobao's more than one billion users are countless fat sheep in a sheepfold, and the next step is to find a way to break into the fold. High-value user information ought to have a very high security level. Taobao uses HMAC, a mainstream international technology, to tell whether the caller of an interface is the official client or an unauthorized third-party crawler. The H in HMAC stands for hash. Hashing is a cornerstone technology widely used in banking and finance around the world, and blockchain, today's most popular technology, also relies on hashing to ensure the credibility of data.
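A minimal sketch of such a check, using Python's standard `hmac` module. The secret, the request string, and the choice of SHA-256 are assumptions for illustration; the article does not say which hash function or message format Taobao uses.

```python
import hashlib
import hmac

# Hypothetical secret shared by the official client and the server.
APP_SECRET = b"demo-secret-not-a-real-key"

def sign(secret: bytes, message: bytes) -> str:
    # The client attaches this signature to every request.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def is_official_client(secret: bytes, message: bytes, signature: str) -> bool:
    # The server recomputes the signature and compares in constant time.
    expected = sign(secret, message)
    return hmac.compare_digest(expected, signature)

request = b"api=user.profile&t=1646092800"   # made-up request content
good = sign(APP_SECRET, request)             # what the official client would send
forged = sign(b"guessed-key", request)       # what a caller without the secret can produce
```

Here `is_official_client(APP_SECRET, request, good)` is true and the same call with `forged` is false. The check only holds while the secret stays secret: anyone who can read it can compute `good` themselves.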
Does that make Taobao's user database sound as safe as the safe in a bank vault in a spy blockbuster? Unfortunately, Taobao put the key used by the HMAC algorithm in an HTTP cookie in plaintext (technical details are discussed later), which is like leaving the key of the safe in its door. The door was wide open, and the tigers walked into the flock. Working alone, a person with a bachelor's degree stole the mobile phone numbers and user names of 1.18 billion users through Taobao's official interface.
As the verse goes:
The thatched gate now stands open for you,
The flower path has never been swept.
A few black eagles chase the purple swallows,
A pack of tigers devours the lambs.
2. Heaven's net is wide: reach out your hand and you will be caught
In June 2021, China Judgments Online published a verdict titled "Criminal Judgment of the First Instance of Lu Mou and Li Mou Infringing Citizens' Personal Information", numbered (2021) Yu 1403 Xing Chu No. 78, and the truth emerged. Details can be found in this report from Tencent: "Exceeds 1.1 billion! Your Taobao information may have been leaked"
Based on this judgment, let us briefly review the main points:
Timeline
- The perpetrator started the crime in September 2019
- Taobao risk control personnel discovered it in July 2020 and reported the case in August, and the suspect was arrested in the same month
- First-instance judgment in June 2021.
Losses
- The criminals lost their freedom and Taobao lost its goodwill.
- The two criminals were each sentenced to more than three years in prison, the illegal gains were recovered, and fines of several hundred thousand yuan were imposed.
Number of stolen user records
- Taobao reported 35 million (only the data its risk control personnel observed between July 6 and July 13, 2020)
- The judicial authorities seized and identified 1.18 billion records, and Taobao confirmed their authenticity through sampling and verification
- The suspect argued that only tens of millions were scraped and that the remaining 1.1 billion were downloaded elsewhere, but the court did not accept this.
Technical means of the crime
- The data was scraped through the official interface of mobile Taobao. The mtop referred to in the judgment is the mobile taobao open platform, an interface platform dedicated to mobile Taobao
- Unlike Taobao Open Platform (TOP), the interface platform of mobile Taobao is private to Taobao, and only sister companies and a very few partners can apply for access rights. Taobao reported that the suspects illegally bypassed the risk control mechanism, which shows that they had no legitimately obtained access rights. Presumably they used Taobao's "key left in the door" to pose as Taobao's official client and scrape the data
Legal risks of developing crawlers
There are a few jokes in our Internet industry: "get good at crawlers and you will eat your fill of prison meals"; "Python crawlers: from getting started to going to jail".
The purpose of this popular science article is to raise everyone's security awareness and legal awareness, and to do some legal education while sharing security protection technology.
The high-risk areas where crawlers break the law are:
- Scraping large amounts of personal private data and selling it for profit. The charge is infringing on citizens' personal information, which is a criminal offense. Illegally obtaining or selling 50 pieces of personal private information can already lead to a sentence. In this case, the number of records involved exceeded the threshold by more than 10 times, which triggered the provisions for particularly serious circumstances, so the sentences also exceeded 3 years.
- The crawler sends requests too frequently and crashes the other party's server. The charge is sabotaging a computer information system, which is a criminal offense. One hour of downtime for a system serving 10,000 people already counts as serious. For example, in one case both the CTO and the programmer were sentenced: a company scraped a city's residence permit system, the server crashed, several government systems were interrupted, and the CTO and the programmer were sentenced to 3 years and 1 year respectively.
- Scraping a competitor's data for commercial purposes. This constitutes unfair competition and is a civil tort. For example, in a case reported by the China Court Network, Dazhong Dianping v. Baidu, the court of first instance ordered Baidu to pay 3.23 million yuan
Friends with programming skills must learn more about the law, stay true to the original intention of using technology to benefit society, and not put blind faith in the "safe harbor principle". Do not reach out your hand; if you do, you will be caught.
3. Where is the ant nest? A thousand-mile embankment collapses
As early as 2015, I reported the vulnerability of the plaintext HMAC key in the cookie to Taobao through the WooYun vulnerability platform, but Taobao held that it was not a key and rated the hazard level as low. I call it an app secret; Taobao calls it just a token. That is only a difference in wording. Even if you named it noise, it would not change the fact that in the program it plays 100% the role of the HMAC secret.
Putting the plaintext key in the communication message is equivalent to giving the attacker permanent, valid access, so the interface can be used by attackers to scrape data. Taobao's technical team believed this was essentially no different from Taobao scraping other parties' data. For the detailed exchange, please refer to what I posted on September 29, 2015.
At that time, this incident also drew attention on Ali's intranet, but for some reason Taobao insisted on keeping this clearly unsafe and unprofessional solution. Even the theft of 1.1 billion consumers' private data failed to get Taobao's attention. To this day (March 1, 2022), Taobao is still using the solution in which I found a security vulnerability 6 years ago. The evidence is as follows:
Dear Officer, the lift is running with huge safety risk.
No, there is no lift, this is only elevator.
Hiding the illness and refusing treatment eventually led to the collapse of the embankment. What a pity.
4. It is not too late to mend the fold after the sheep are lost
Friends who provide Internet services must review their security systems carefully and regularly. The more layers of protection, the better. That a company of Taobao's caliber suffered such a major security incident shows that security is not easy to get right.
I hope friends at Ali can help forward this article to the relevant technical team at Ali, so that the door that has been open for 7 years is closed as soon as possible. It is not too late.
Of course, the plaintext key is not the only reason the mobile phone numbers of 1.1 billion consumers were so easy to steal. The API gateway also lacked good rate limiting, frequency limiting, and privacy filtering. These are quite difficult to do well. I have been investing in an API security gateway for the past few years, serving both central enterprises and small startups. I have run into many technical challenges, applied for many patents, and tasted all the ups and downs.
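As a rough illustration of the two gateway functions named above, here is a minimal Python sketch of a per-client sliding-window limiter and a phone-number mask. The class name, the limits, and the masking rule (keep the first 3 and last 4 digits of an 11-digit number) are assumptions for illustration, not a description of any real gateway.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` calls per client within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.calls = defaultdict(deque)  # client id -> timestamps of recent calls

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self.calls[client_id]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True

def mask_phone(number: str) -> str:
    """Privacy filter: hide the middle 4 digits of an 11-digit number."""
    if len(number) != 11 or not number.isdigit():
        return number
    return number[:3] + "****" + number[-4:]
```

With `SlidingWindowLimiter(limit=2, window=1.0)`, a third call from the same client inside one second is refused, while other clients are unaffected. A production gateway would also need shared state across servers and protection against callers that rotate identities, which this sketch leaves out.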
Tech-savvy readers may think it is safe to hide the key in the JS code, encode and encrypt it several times, and then obfuscate the code. In fact, it is not. Because of the nature of the browser runtime and JavaScript syntax, JavaScript on a web page is interpreted at run time and its functions can be rewritten and overridden, so JavaScript cannot hide a real secret. With just one line of code, I found Taobao's key within seconds. With the key in hand, I then discovered that it was actually placed in the cookie in plaintext. As you can see in the two screenshots above, I had the key printed to the console or shown in an alert.
The Android client, developed in Kotlin, may have similar risks. I have not tested it yet, and I expect it to be harder to crack than the web page.
Signature verification mechanisms based on MD5 or the SHA family of hash functions are widely used on the Internet, as can be seen from the signature algorithm documents of the major open platforms. For most web versions, the key can be cracked in seconds with a single line of code, no matter how deeply it is hidden.
With this in mind, in 2015 I invented a method that truly prevents theft of HMAC keys on the web, and I have used it successfully in all of my company's products. In 2022 I found that the industry still had no similar technology, so I applied for a patent. The core of the patent is protecting the key, and the secret lies in the following points:
Claim 1: Use a Decayed Key
Hash algorithms are all digest algorithms: information is lost ("attenuated") on the way from input to output, and the process cannot be reversed. For example, when you download software of several hundred megabytes, the hash (also called a checksum) used to verify the integrity of the downloaded file is only 16 to 32 bytes, which shows how large the attenuation factor is.
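The fixed output size is easy to check with Python's standard `hashlib`. The 10 MiB buffer below is just a stand-in for a large download.

```python
import hashlib

small = b"hi"
large = b"\x00" * (10 * 1024 * 1024)  # 10 MiB stand-in for a large download

def digest_length(algorithm: str, data: bytes) -> int:
    # Length in bytes of the digest, regardless of how much data went in.
    return len(hashlib.new(algorithm, data).digest())

for algorithm in ("md5", "sha256"):
    for data in (small, large):
        print(algorithm, len(data), "bytes in ->", digest_length(algorithm, data), "bytes out")
```

MD5 always yields 16 bytes and SHA-256 always yields 32 bytes, whether the input is 2 bytes or 10 MiB, which matches the 16 to 32 byte range mentioned above.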
We can perform part of the attenuation operation on the plaintext key in advance, put the attenuated key in the JavaScript code of the web page, and implement a variant hash function in that JavaScript so that, using the attenuated key, it produces a result compatible with the standard hash function. This way, an attacker cannot steal the real plaintext key from the web page, because you cannot steal something that is not there.
If the attacker obtains the attenuated key, the standard hash function cannot compute the correct signature with it, so the HMAC security mechanism is not cracked.
This is claim 1 of this patent (the patent law stipulates that claim 1 is the cornerstone, and claims 2, 3, and 4 are all based on claim 1).
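The patent's actual construction is not disclosed in this article. As one possible reading, and purely as an assumption, the idea resembles a well-known property of HMAC: the key can be absorbed into two hash states ahead of time, and the signature can then be finished from those states without the key. The sketch below uses SHA-256 and hypothetical names (`attenuate`, `variant_sign`); in Python the states live in memory objects, whereas a web page would need its own hash implementation that starts from a serialized state.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes

def attenuate(key: bytes):
    """Absorb the key into two SHA-256 states and return only the states."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()   # standard HMAC rule for long keys
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner = hashlib.sha256(bytes(b ^ 0x36 for b in key))  # key XOR ipad
    outer = hashlib.sha256(bytes(b ^ 0x5C for b in key))  # key XOR opad
    return inner, outer

def variant_sign(state, message: bytes) -> str:
    """Finish the HMAC computation starting from the precomputed states."""
    inner, outer = state
    i = inner.copy()
    i.update(message)
    o = outer.copy()
    o.update(i.digest())
    return o.hexdigest()

secret = b"demo-secret-not-a-real-key"      # hypothetical key
state = attenuate(secret)                   # done ahead of time, away from the client
message = b"api=user.profile&t=1646092800"  # made-up request content
```

`variant_sign(state, message)` equals `hmac.new(secret, message, hashlib.sha256).hexdigest()`, so the server can keep verifying with the standard function. Note that whoever holds the states can still sign by running the same variant code, which is presumably why the remaining claims protect that code and rotate it.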
This is a popular science article, so I try to keep it as simple and easy to understand as possible and will not go into the technical details. I wrote the proof in the patent's technical disclosure document. If you are interested in reading it, let me know in the comments and I will post it later.
Claim 2: Protected variant hash function
Code obfuscation greatly reduces the readability of the code and greatly increases the time an attacker needs to read the source of the variant hash function.
In fact, even without obfuscation, the source code of even the most elegantly written md5 and sha256 functions is very complicated. I suspect that, among the technicians who have been sentenced for crawling, only a handful could read and understand the md5 source code.
In addition, the obfuscated hash function code can be compiled into a WebAssembly binary, so the attacker can only read it by decompiling it, which increases the difficulty by an order of magnitude.
Claim 3: Regular automatic replacement
Write an automated program that runs regularly and changes both the key and the algorithm of the variant hash function. The replacement interval is far shorter than the time an attacker needs to crack them, so even if they crack one version, it is already useless.
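The article does not describe how the rotation is implemented. A minimal sketch of the key half only, assuming keys are derived per time window from a master secret that never leaves the server (the names, the one-hour interval, and the derivation are all hypothetical; rotating the algorithm itself is not shown):

```python
import hashlib
import hmac

MASTER_SECRET = b"server-only-master-secret"  # hypothetical, stays on the server
ROTATION_SECONDS = 3600                       # hypothetical rotation interval

def key_for_window(window: int) -> bytes:
    # Each time window gets its own key, derived from the master secret.
    return hmac.new(MASTER_SECRET, str(window).encode(), hashlib.sha256).digest()

def sign(message: bytes, now: float) -> str:
    key = key_for_window(int(now // ROTATION_SECONDS))
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str, now: float) -> bool:
    current = int(now // ROTATION_SECONDS)
    # Accept the current and the previous window so clients are not cut off mid-rotation.
    for window in (current, current - 1):
        expected = hmac.new(key_for_window(window), message, hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, signature):
            return True
    return False
```

A signature made in one window is accepted in that window and the next, then rejected, so a key extracted by an attacker stops working after at most two intervals.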
5. Lay out the maze and invite them into the urn
Provided that crawler or attack behavior can be detected, fighting back head-on with offensive and defensive technology right away is not necessarily the best choice. All warfare is based on deception, and the industry has several ways of luring the attacker into a trap:
Honeypot
After detecting an attack, do not return error codes or warning messages. Instead, deliberately leave an opening and feed the attacker fake data of little commercial value, so that they linger there and stop improving their techniques to attack the truly sensitive places.
Poison pill
If a public interface is being scraped, then, to keep service for normal users unaffected, identify the illegal crawler and bury poison pills in the data: return adulterated data to the crawler, so that its operator is blamed by bosses or black-market buyers for poor data quality and the business becomes unsustainable.
Fattening the pig
If the other party monetizes through an illegal plug-in, do not alert them when it is first detected. Wait until they have built up a certain number of paying customers, then block them all in one stroke.
Fishing
If the other party monetizes by selling the scraped data, do not block the scraping channel right away. Collect evidence first, and resort to the law once the standard for filing a criminal case is met.
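The honeypot and poison pill ideas share one mechanism: a detected crawler gets a normal-looking response whose content is quietly degraded. A minimal sketch, assuming crawler detection already exists (it is passed in as a flag) and using made-up records and hypothetical function names:

```python
import random

REAL_RECORDS = [
    {"id": 1, "price": 18.0},
    {"id": 2, "price": 25.5},
    {"id": 3, "price": 9.9},
]

def poison(records, seed: int, ratio: float = 0.5):
    """Return a copy in which roughly `ratio` of the prices are silently altered."""
    rng = random.Random(seed)
    out = []
    for record in records:
        copy = dict(record)  # never modify the real data
        if rng.random() < ratio:
            copy["price"] = round(copy["price"] * rng.uniform(0.5, 1.5), 2)
        out.append(copy)
    return out

def respond(client_is_crawler: bool, seed: int = 0):
    # Same status and shape either way, so the crawler sees no error or warning.
    body = poison(REAL_RECORDS, seed, ratio=1.0) if client_is_crawler else REAL_RECORDS
    return {"status": 200, "data": body}
```

Normal users receive `REAL_RECORDS` unchanged. A flagged crawler receives records with the same ids and fields but distorted values, which is hard to notice from the response alone.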
If you like this article, in addition to forwarding it, you can also scan the QR code below to follow our WeChat public account: Code Dog Says Code
My colleagues and I write articles there, all popular science in the fields of information security, software technology, and smart hardware. They are easy to understand yet technically substantive. Everyone is welcome to join the discussion.