Cryptography series: Merkle-Damgård structure and length extension attack



The Merkle-Damgård structure, or MD structure for short, is a way of building a collision-resistant hash algorithm out of a collision-resistant compression function. It is the basis of widely used hash algorithms such as MD5, SHA-1 and SHA-2. In this article I will explain the MD structure and the length extension attack against it.

MD structure

The MD structure was described by Ralph Merkle in his PhD thesis in 1979. Ralph Merkle and Ivan Damgård independently proved that the structure is sound: if the compression function is collision resistant, so is the resulting hash function. That is why it is called the Merkle-Damgård structure.

Next, we look at how the MD structure works.

The MD structure first pads the input message so that its length becomes an integer multiple of a fixed block size (such as 512 or 1024 bits). This is necessary because the compression function only accepts fixed-size input and cannot process messages of arbitrary length.

Generally speaking, the padding consists of constant data, such as zeros, that fills up the last message block.

For example, if the message is "HashInput" and the block size is 8 bytes (64 bits), the message is split into two blocks and the second block is filled with zeros, giving "HashInpu t0000000".

But this is often not enough. Messages that differ only in trailing zeros, such as "HashInput" and "HashInput00", would be padded to exactly the same blocks and would therefore get the same hash value.

To avoid this, the first bit of the padding must differ from the rest. Since the constant padding usually consists of zeros, the first padding bit is set to "1".

That gives "HashInpu t1000000".

The padding can be strengthened further by also encoding the length of the message, for example in an extra block.

However, an additional block is often wasteful. A more space-saving approach is to store the message length inside the zero padding of the last block whenever there is enough room.
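The padding rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming the byte-oriented layout that SHA-1 and SHA-256 use (a 0x80 byte, zero bytes, then the message length in bits as an 8-byte big-endian integer); the function name `md_pad` is made up for this example.

```python
import struct

def md_pad(message: bytes, block_size: int = 64) -> bytes:
    """Return message plus padding: 0x80, zero bytes, 8-byte bit length.

    Assumed layout (the one SHA-1 and SHA-256 use with 64-byte blocks).
    """
    bit_length = len(message) * 8
    padded = message + b"\x80"          # the forced "1" bit, then seven 0 bits
    # Zero-fill until exactly 8 bytes are left in the current block.
    zeros = (block_size - 8 - len(padded)) % block_size
    padded += b"\x00" * zeros
    # The length goes into the zero area of the last block when it fits;
    # otherwise the modulo above has already spilled into an extra block.
    return padded + struct.pack(">Q", bit_length)

padded = md_pad(b"HashInput")
print(len(padded), padded.hex())
```

A 9-byte message fits into a single 64-byte block, while a 56-byte message no longer has room for the 8-byte length after the 0x80 byte, so it grows to two blocks.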

Once the message has been padded, it can be compressed. Let's take a look at the flow of MD:

The message is divided into blocks. The compression function f is applied to the initialization vector (IV) and the first block, then f is applied to that result and the second block, and so on. The output of the last step is the final hash value.
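The chaining described above can be sketched as follows. This is a toy, not a real hash algorithm: the compression function `f` is an assumption built from truncated SHA-256 purely for illustration, the block size is the 8 bytes used in the padding example, and the all-zero IV is arbitrary.

```python
import hashlib

BLOCK = 8   # toy block size in bytes
STATE = 8   # toy chaining-value size in bytes

def f(state: bytes, block: bytes) -> bytes:
    """Toy compression function (illustrative assumption, not a standard)."""
    return hashlib.sha256(state + block).digest()[:STATE]

def pad(message: bytes) -> bytes:
    """0x80, zeros up to a block boundary, then the bit length in an extra block."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    return padded + (len(message) * 8).to_bytes(BLOCK, "big")

def md_hash(message: bytes, iv: bytes = b"\x00" * STATE) -> bytes:
    state = iv
    data = pad(message)
    # Feed each block, together with the previous result, into f.
    for i in range(0, len(data), BLOCK):
        state = f(state, data[i:i + BLOCK])
    return state  # the last chaining value is the hash

print(md_hash(b"HashInput").hex())
```

Note that the returned hash is exactly the internal state after the last block, which is the property the length extension attack relies on.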

Length extension attack

In cryptography, a length extension attack means that an attacker who knows hash(message1) and the length of message1 can compute hash(message1‖message2) for a message2 of the attacker's choosing, where ‖ denotes concatenation. The attacker does not need to know the content of message1.

The MD structure described in the previous section splits the message into blocks, and the value computed from one block is fed into the computation of the next. This makes length extension attacks easy: the final hash is simply the internal state after the last block, so anyone who has it can continue hashing from there. The only prerequisite is knowing the length of the original message.

Let's take an example. Suppose we have the following request:

Original Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo
Original Signature: 6d5f807e23db210bc254a28be2d6759a0f5f5d99

The request above asks the server to deliver ten waffles of type eggo to user 1 at the given location. It carries a signature so that the server can check that the message has not been tampered with. The signature is a MAC, computed with a key that the attacker does not know.

Suppose a malicious attacker wants to change the value of waffle from eggo to liege.

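Since the attack can only append data, the attacker adds a second waffle parameter at the end, and the new data will look like this:

Desired New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo&waffle=liege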

Normally, in order to sign the new message, the attacker would need to know the key and use it to generate a new MAC. With a length extension attack, however, the known hash (the signature given above) can be loaded as the internal state of the hash function, and hashing can continue from the point where the original request ended, as long as the length of the original request is known.

The original message was padded before it was hashed, so the forged message must also reproduce that padding. The attacker's data is appended after the reconstructed padding:

New Data: count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo\x80\x00\x00

So we can get the new MAC value:

New Signature: 0e41270260895979317fff3898ab85668953aaa2
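The attack can be reproduced with a short script. The sketch below is an illustration under stated assumptions: the MAC is taken to be SHA-1(secret ‖ data), the secret `hypothetical-key` is made up, and the attacker is assumed to know (or guess) the secret's length. Python's `hashlib` does not allow setting the internal state, so the SHA-1 compression function is written out by hand; the helper names are invented for this example.

```python
import hashlib
import struct

def _rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def sha1_compress(state, block):
    """One SHA-1 compression step: 5-word state + 64-byte block -> new state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        a, b, c, d, e = ((_rotl(a, 5) + f + e + k + w[t]) & 0xFFFFFFFF,
                         a, _rotl(b, 30), c, d)
    return tuple((s + v) & 0xFFFFFFFF for s, v in zip(state, (a, b, c, d, e)))

def md_padding(msg_len):
    """SHA-1 padding for a message of msg_len bytes."""
    zeros = (55 - msg_len) % 64
    return b"\x80" + b"\x00" * zeros + struct.pack(">Q", msg_len * 8)

def extend(digest_hex, known_len, suffix):
    """Forge (extra_bytes, new_digest) from a digest and the hashed length only."""
    state = struct.unpack(">5I", bytes.fromhex(digest_hex))
    glue = md_padding(known_len)            # padding the server added originally
    total = known_len + len(glue)           # bytes already absorbed into state
    tail = suffix + md_padding(total + len(suffix))
    for i in range(0, len(tail), 64):
        state = sha1_compress(state, tail[i:i + 64])
    return glue + suffix, struct.pack(">5I", *state).hex()

# Demo: the "server" computes SHA-1(secret || data); the secret is hypothetical.
secret = b"hypothetical-key"
data = b"count=10&lat=37.351&user_id=1&long=-119.827&waffle=eggo"
signature = hashlib.sha1(secret + data).hexdigest()

# The attacker uses data, signature and len(secret), but never secret itself.
extra, forged = extend(signature, len(secret) + len(data), b"&waffle=liege")
assert forged == hashlib.sha1(secret + data + extra).hexdigest()
print("forged signature verifies:", forged)
```

The forged request is `data + extra`, that is, the original data, the reconstructed padding and the appended parameter. In practice the attacker usually does not know the key length and simply tries a range of plausible values.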

Wide pipe

In order to avoid length extension attacks, we can make some modifications to the MD structure.

First look at the Wide Pipe structure:

The wide pipe process is basically the same as MD. The difference is that the internal state (the chaining value passed from one block to the next) is twice as long as the final output.

This is why there are two initial vectors, IV1 and IV2, in the above figure. If the length of the final result is n, then the length of the intermediate result is 2n, and the 2n-bit state has to be reduced to n bits in the final step.

SHA-512/224 and SHA-512/256 take this form: they keep a 512-bit internal state and simply truncate it to 224 or 256 bits at the end, discarding the rest.
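A minimal wide pipe sketch, assuming a toy compression function built from truncated SHA-256, an 8-byte block and an all-zero state in place of IV1 and IV2 (all of these are choices made for illustration, not part of any standard):

```python
import hashlib

BLOCK = 8        # toy block size in bytes
N = 8            # output size n in bytes
WIDE = 2 * N     # the internal chaining value is 2n bytes

def f(state: bytes, block: bytes) -> bytes:
    """Toy compression function (illustrative assumption) with a 2n-byte output."""
    return hashlib.sha256(state + block).digest()[:WIDE]

def pad(message: bytes) -> bytes:
    """0x80, zeros up to a block boundary, then the bit length in an extra block."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK)
    return padded + (len(message) * 8).to_bytes(BLOCK, "big")

def wide_pipe_hash(message: bytes) -> bytes:
    state = b"\x00" * WIDE                  # stands in for IV1 || IV2
    data = pad(message)
    for i in range(0, len(data), BLOCK):
        state = f(state, data[i:i + BLOCK])
    return state[:N]                        # final step: discard half the state

digest = wide_pipe_hash(b"HashInput")
print(len(digest), digest.hex())
```

Because the digest exposes only half of the final state, an attacker can no longer load it as the chaining value and continue hashing, which is what the length extension attack requires.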

Fast wide pipe

There is also a faster variant of wide pipe called fast wide pipe:

Unlike wide pipe, its main idea is to forward half of the previous chaining value past the compression function and XOR it with the output of the compression function.
