Here is what happened.
That morning I finished developing a new feature without any trouble. My plan was a nice lunch at noon, a round of tests in the afternoon, and then some quiet slacking off until the end of a pleasant day.
Just as I was feeling smug about my perfect plan, a message flashed in the WeChat group: someone had @-mentioned me.
What the...!!!
At a time like this! Someone @-mentioned me!!!
I was not pleased. Being @-mentioned means being called out by name, and that is never good news.
I opened my phone and, sure enough, something had gone wrong. It also looked both urgent and serious.
Let me briefly describe what happened.
We (cough, cough, please drop the "we"; it was you, so don't try to shift the blame) had developed a feature that collects hardware metrics from a host. One of those metrics is the IPv6 address, and that is where the problem appeared. Addresses starting with fe80 should not be reported (they are link-local addresses, similar in spirit to 192.168 addresses in IPv4: local-network addresses that should be skipped). Instead of a proper address, though, what came back was a jumble of digits.
This was the situation at the time. The customer had submitted an upgrade order and, being wary of problems, upgraded two thousand machines first, planning to upgrade the remaining twenty thousand that night.
Then, close to noon, they found that about a hundred of the two thousand machines were reporting wrong addresses.
If that was too long-winded, here is the short version.
The customer found a problem with the program at noon, but because the whole production environment was due to be upgraded that night, it had to be dealt with in the afternoon. The night upgrade could not be delayed. This was a make-or-break moment.
What was going on?
One thoroughly dumbfounded code monkey!
Nothing for it; this is how we earn our living. So I tidied up quickly, skipped my afternoon nap, put my little blue backpack on, got on the bus, and rushed straight to the customer site.
At that point I had no idea what was wrong. We had never run into this problem before, so I quickly ran through the code logic in my head to see whether there was anyone I could pin the blame on. Ah, QA! They must not have tested it! It shipped with a bug!
And so I asked.
I was told clearly that testing had not found any problems.
This was embarrassing. No problems had been found in the company environment, and none in the pit environment either (note: the pit environment is a set of machines provided by the customer that is as close to production as possible). In other words, the problem seemed to be unique to the production environment.
This was just...
What on earth was it?
There was nothing else for it: go and look on site.
After the lengthy and complicated entry procedures for the customer's campus, I finally reached the production machine room. The customer picked out two problematic machines and said, "Go ahead and debug!" Then he left me to it and went back to his own work.
Anyone who has worked on production machines knows that you must not do anything the customer has not allowed, and you should not even think about deleting the database and running away, then taking a boat to Singapore, Vietnam, or Cambodia.
That goes double once you have been given root permission. Customers become even more cautious: even restarting the program requires their consent, and dangerous commands are out of the question.
Debugging under those conditions is slow going.
But I could not back down now. Whether yours truly kept his reputation depended on this.
First, I stressed to the customer that this was a once-in-a-century kind of problem and certainly would not be easy to solve. Even if I found the root cause that afternoon, I would still have to modify the code and test it before any upgrade that night.
With expectations set, the customer understood that the problem was thorny and prepared accordingly. They said: find the cause first; if it really is hard to fix, we will postpone tonight's upgrade, as long as you get it done before Friday.
So I could go ahead boldly.
Let me explain how our program obtains the IPv6 address.
Our program is written in C. To support multiple platforms, we brought in a third-party library, sigar. The sigar library is built specifically for collecting machine hardware metrics, and it has an implementation for each operating system platform.
On Linux, the logic for obtaining the IPv6 address is to read the file `/proc/net/if_inet6`. The first field of each line in this file is the IPv6 address.
Simply put, getting the IPv6 address means reading a particular string out of a file. It sounds simple. What could go wrong with this logic?
In sigar, this logic lives in the `sigar_net_interface_ipv6_config_get` function in `src/os/linux/linux_sigar.c`. Of course, we had made some adjustments, and the final version looked like this:
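```c
/* Each line: addr idx prefix scope flags ifname */
while (fscanf(fp, "%32s %02x %02x %02x %02x %16s\n",
              addr, &idx, &prefix, &scope, &flags, ifname) != EOF)
{
    /* Take the first address on this interface that does not start with "fe80" */
    if (strEQ(name, ifname) && addr != strstr(addr, "fe80")) {
        status = SIGAR_OK;
        break;
    }
}
```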
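The `addr != strstr(addr, "fe80")` test in the loop is true whenever the address string does not begin with `fe80`. A minimal sketch of a more explicit check follows. The helper name `is_link_local_hex` is hypothetical and not part of sigar, and the sketch assumes `addr_hex` is the colon-free hexadecimal string from the first column. It tests the whole link-local range fe80::/10 rather than only the literal prefix `fe80`.

```c
#include <stdio.h>

/* Hypothetical helper, not part of sigar.
 * addr_hex: IPv6 address as hex digits without colons, as in the
 * first column of the file. Returns 1 for link-local (fe80::/10). */
int is_link_local_hex(const char *addr_hex)
{
    unsigned int first16 = 0;

    /* Read the first four hex digits (the top 16 bits of the address). */
    if (sscanf(addr_hex, "%4x", &first16) != 1) {
        return 0;
    }
    /* fe80::/10 fixes the top 10 bits: 1111 1110 10 */
    return (first16 & 0xffc0u) == 0xfe80u;
}
```

In the loop, the condition could then read `strEQ(name, ifname) && !is_link_local_hex(addr)`.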
Now let's talk about the `/proc/net/if_inet6` file. It stores the interfaces and their unicast addresses. Each line has the following format:
addr (32 hexadecimal digits) | if_index (generally 2 hexadecimal digits) | prefix (generally 2 hexadecimal digits) | scope (generally 2 hexadecimal digits) | flags (generally 2 hexadecimal digits) | ifname (up to 16 characters) |
---|---|---|---|---|---|
IPv6 address | Interface index | Prefix length | Address scope | Flags | NIC name |
Seen this way, the parsing code above looks fine.
But wait, what on earth is this?
I caught a whiff of something wrong. I could not yet say exactly what, but I had a feeling the problem was here. The second field is `if_index`. Why does it have three digits?
A three-digit value read with `%02x`: that will cause trouble.
But why would there be three digits in the first place?
I had no idea. Nothing for it but to call on Baidu!
However, this is what Baidu gave me:
As everyone knows, 80% of the technical articles in this great land of ours have been taken over by a certain website whose initials are C, S, D and N. Whatever you search for, links to that same site come up, and the amount of duplicated, low-value content is astounding.
Since Baidu could not help me, I had to turn to Google.
But!
That was all I got. What could I do?
It suddenly occurred to me that a high school classmate of mine worked in overseas bidding and had since started his own business. Why not freeload off him for a bit?
Well... it was still a hassle to set up, but at least it worked.
After a tour through Google, most results only explained what the `if_inet6` file means; none of them told me what `if_index` is supposed to look like.
Then I found the definition of `if_index` in `if.h`:

```c
struct if_nameindex
{
    unsigned int if_index;  /* 1, 2, ... */
    char *if_name;          /* null terminated name: "eth0", ... */
};
```

It is an `unsigned int`, so in theory it can run to 8 hexadecimal digits. Yikes.
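To make the arithmetic concrete: two hexadecimal digits only cover values up to 0xff (255), so any interface index of 256 or more needs at least three. Here is a small sketch, with a hypothetical helper name `hex_width`, that counts the digits `%x` would print for a given value:

```c
#include <stdio.h>

/* Hypothetical helper: number of hex digits needed to print value. */
int hex_width(unsigned int value)
{
    /* Two hex digits per byte, plus the terminating NUL. */
    char buf[2 * sizeof(unsigned int) + 1];

    /* snprintf returns the number of characters written (excluding NUL). */
    return snprintf(buf, sizeof buf, "%x", value);
}
```

With this, `hex_width(255)` is 2, `hex_width(256)` is 3, and on a platform with a 32-bit `unsigned int` the largest possible result is 8.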
Never mind. To verify my conjecture that the problem really was caused by a three-digit `if_index`, I asked my QA colleagues to find a machine with a three-digit index in `if_inet6` and test on it.
Then came the happy part: modifying the code. Never mind how many digits each field has. Hard-coded widths just invite another bug the next time something changes, so it is better to fix it once and for all:
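```c
/* All six fields are now read as whitespace-delimited strings,
 * so idx, prefix, scope and flags are passed as char buffers. */
while (fscanf(fp, "%s %s %s %s %s %s\n",
              addr, idx, prefix, scope, flags, ifname) != EOF)
{
    if (strEQ(name, ifname) && addr != strstr(addr, "fe80")) {
        status = SIGAR_OK;
        break;
    }
}
```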
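Reading every column with an unbounded `%s` removes the width assumption, but it also removes any limit on how much is written into each buffer. As an alternative sketch (not the code that was shipped), the numeric columns can be read with `%x` and no width, the strings kept bounded, and the return value compared with the expected field count. The struct and function names here are hypothetical, and the sample address used to exercise it would be made up.

```c
#include <stdio.h>

/* Hypothetical record for one line of the file. */
struct if6_entry {
    char addr[33];          /* 32 hex digits + NUL */
    unsigned int idx;
    unsigned int prefix;
    unsigned int scope;
    unsigned int flags;
    char ifname[17];        /* matches the %16s bound + NUL */
};

/* Parse one line; returns 1 on success, 0 if the line is malformed. */
int parse_if6_line(const char *line, struct if6_entry *e)
{
    /* %x without a width consumes all consecutive hex digits,
     * so a three-digit (or longer) index stays in one field. */
    int n = sscanf(line, "%32s %x %x %x %x %16s",
                   e->addr, &e->idx, &e->prefix,
                   &e->scope, &e->flags, e->ifname);
    return n == 6;
}
```

Checking for exactly six converted fields, rather than `!= EOF`, means a short or malformed line is rejected instead of being silently accepted.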
The underlying issue is how `fscanf` handles fixed-width conversions: if a field has three digits but the format only consumes two, the third digit is not discarded. It stays in the input and is squeezed into the next field.
For example:
Take the second screenshot above, where `if_index` is `321`. With the original code, `32` is read as `if_index`, the remaining `1` as `prefix`, `40` as `scope`, `20` as `flags`, and `80` as `ifname`. `veth1ffa1a8`, which is really the NIC name, is then taken to be the IPv6 address of the next entry, so everything after it is misaligned.
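The shift can be reproduced without touching `/proc`. The sketch below applies the original fixed-width format to a single line and returns how many fields were converted. The function name is hypothetical, and any address used to try it out would be made up; the remaining columns can follow the example just described.

```c
#include <stdio.h>

/* Hypothetical demo: apply the original fixed-width format to one line.
 * Returns the number of fields sscanf converted. */
int parse_fixed_width(const char *line, char addr[33],
                      unsigned int *idx, unsigned int *prefix,
                      unsigned int *scope, unsigned int *flags,
                      char ifname[17])
{
    /* %02x reads at most two hex digits; leftover digits stay in the
     * input and are picked up by the next conversion. */
    return sscanf(line, "%32s %02x %02x %02x %02x %16s",
                  addr, idx, prefix, scope, flags, ifname);
}
```

For a line ending in `321 40 20 80 veth1ffa1a8`, this gives `idx` 0x32, `prefix` 0x1, `scope` 0x40, `flags` 0x20 and `ifname` "80", with all six conversions reported as successful. That is why a `!= EOF` check never notices the damage.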
Well... the next step was to change the code, release a new patch, and have the customer update and test it.
The road was bumpy, but the outcome was good.
And with that, the story comes to a close. The only pity is that it cost me my nap.