Introduction
Argon2 is a key derivation function. It was selected as the winner of the Password Hashing Competition (PHC) in July 2015. It was designed by Alex Biryukov, Daniel Dinu and Dmitry Khovratovich of the University of Luxembourg. The reference implementation of Argon2 is released under Creative Commons CC0 (i.e. public domain) or the Apache License 2.0, and it provides three related variants: Argon2d, Argon2i and Argon2id.

This article discusses the principles and use of Argon2.

Key derivation function
In cryptography, a key derivation function (KDF) is a cryptographic algorithm that uses a pseudo-random function to derive one or more keys from a secret value (such as a master key, a password, or a passphrase). A KDF can be used to stretch a key into a longer key, or to obtain a key in a required format, for example to convert the result of a Diffie-Hellman key exchange into a symmetric key for AES.
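
As a rough illustration of key derivation, the sketch below stretches a password into a 32-byte key, the size AES-256 expects. It uses PBKDF2 rather than Argon2, only because PBKDF2 ships with Python's standard `hashlib`. The function name `derive_key`, the salt and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib

def derive_key(password: bytes, salt: bytes, length: int = 32) -> bytes:
    """Derive `length` bytes of key material with PBKDF2-HMAC-SHA256."""
    # The iteration count is an illustrative value; tune it for real use.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=length)

key = derive_key(b"correct horse battery staple", b"example-salt-16b")
print(len(key))  # 32
```

The same password and salt always yield the same key, while a different salt yields an unrelated key.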

Password Hashing Competition
Although cryptography is the study of keeping information secret, an encryption algorithm is better the more open it is. Only when an algorithm is public can its quality be examined, and only after thorough analysis can it be adopted and spread in the industry.

The most famous cryptographic algorithm competition is surely the one NIST organized to select the AES standard, which was finalized in 2001. The purpose of the competition was to find a new encryption algorithm to replace the aging DES algorithm. Many excellent algorithms were submitted, including CAST-256, CRYPTON, DEAL, DFC, E2, FROG, HPC, LOKI97, MAGENTA, MARS, RC6, Rijndael, SAFER+, Serpent, and Twofish. In the end, Rijndael was selected as the AES algorithm.

The PHC is an algorithm competition of the same kind. Unlike the competition held by NIST, it was an unofficial competition organized by cryptographers. It was launched by Jean-Philippe Aumasson in the fall of 2012.

In the first quarter of 2013, a call for submissions was issued, and by the deadline of March 31, 2014, a total of 24 submissions had been received. In December 2014, a shortlist of nine finalists was announced. In July 2015, Argon2 was announced as the winner.

Argon2 algorithm
The design of Argon2 is simple. It aims for the highest memory fill rate and effective use of multiple computing units, while still providing defense against tradeoff attacks (by exploiting the cache and memory organization of modern processors).

There are three variants of Argon2: Argon2i, Argon2d and Argon2id. Argon2d is faster and uses data-dependent memory access, which makes it highly resistant to GPU cracking attacks and suitable for applications where side-channel timing attacks are not a threat (such as cryptocurrencies).

Argon2i uses data-independent memory access, which is preferred for password hashing and password-based key derivation. It is slower because it makes more passes over memory to protect against tradeoff attacks.

Argon2id is a hybrid of Argon2i and Argon2d. It uses a combination of data-dependent and data-independent memory access, which can simultaneously resist side-channel timing attacks and GPU cracking attacks.

Argon2 input parameters
Argon2 has two types of input parameters, primary inputs and secondary inputs.

The primary inputs are the message P and the nonce S, which represent the password and the salt, respectively.

The length of P is 0 to 2^32-1 bytes, and the length of S is 8 to 2^32-1 bytes (16 bytes are recommended for password hashing).

They are called primary inputs because these two parameters must always be supplied.

The remaining parameters are called secondary inputs, and they include:

The degree of parallelism p determines how many independent computation chains (lanes) can run at the same time; its value is from 1 to 2^24-1.
Tag length τ, from 4 to 2^32-1 bytes.
The memory size m, in kibibytes, from 8p to 2^32-1.
The number of iterations t, used to tune the running time; its value is from 1 to 2^32-1.
The version number v, one byte, takes the value 0x13.
The secret value K (an optional key) has a length of 0 to 2^32-1 bytes.
The additional data X has a length of 0 to 2^32-1 bytes.
The type y of Argon2: 0 stands for Argon2d, 1 for Argon2i, and 2 for Argon2id.
These inputs can be represented by the following code:

Inputs:

  password (P):       Bytes (0..2^32-1)    Password (or message) to be hashed
  salt (S):           Bytes (8..2^32-1)    Salt (16 bytes recommended for password hashing)
  parallelism (p):    Number (1..2^24-1)   Degree of parallelism (i.e. number of threads)
  tagLength (T):      Number (4..2^32-1)   Desired number of returned bytes
  memorySizeKB (m):   Number (8p..2^32-1)  Amount of memory (in kibibytes) to use
  iterations (t):     Number (1..2^32-1)   Number of iterations to perform
  version (v):        Number (0x13)        The current version is 0x13 (19 decimal)
  key (K):            Bytes (0..2^32-1)    Optional key (Errata: PDF says 0..32 bytes, RFC says 0..2^32 bytes)
  associatedData (X): Bytes (0..2^32-1)    Optional arbitrary extra data
  hashType (y):       Number (0=Argon2d, 1=Argon2i, 2=Argon2id)

Output:

  tag:                Bytes (tagLength)   The resulting generated bytes, tagLength bytes long
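
The ranges listed above can be checked before any hashing starts. The following is a minimal sketch of such a check. The class name `Argon2Params`, its field names and its default values are assumptions made for illustration and do not come from any particular library.

```python
from dataclasses import dataclass

MAX_32 = 2**32 - 1  # upper bound shared by most Argon2 inputs
MAX_24 = 2**24 - 1  # upper bound on the degree of parallelism

@dataclass(frozen=True)
class Argon2Params:
    # Hypothetical container; the defaults are illustrative, not recommendations.
    password: bytes
    salt: bytes
    parallelism: int = 1
    tag_length: int = 32
    memory_kib: int = 65536
    iterations: int = 3
    version: int = 0x13
    key: bytes = b""
    associated_data: bytes = b""
    hash_type: int = 2  # 0 = Argon2d, 1 = Argon2i, 2 = Argon2id

    def validate(self) -> None:
        """Raise ValueError if any input falls outside the documented range."""
        checks = {
            "password length": len(self.password) <= MAX_32,
            "salt length": 8 <= len(self.salt) <= MAX_32,
            "parallelism": 1 <= self.parallelism <= MAX_24,
            "tag length": 4 <= self.tag_length <= MAX_32,
            "memory size": 8 * self.parallelism <= self.memory_kib <= MAX_32,
            "iterations": 1 <= self.iterations <= MAX_32,
            "version": self.version == 0x13,
            "key length": len(self.key) <= MAX_32,
            "associated data length": len(self.associated_data) <= MAX_32,
            "hash type": self.hash_type in (0, 1, 2),
        }
        for name, ok in checks.items():
            if not ok:
                raise ValueError(f"{name} out of range")

Argon2Params(b"password", b"0123456789abcdef").validate()  # passes silently
```

Note that the lower bound on memory depends on the parallelism: with p = 4, at least 32 KiB must be requested.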

Processing flow
Let's take a look at the non-parallel Argon2 algorithm flow:

Non-parallel Argon2 is the simplest.

In the above figure, G represents a compression function, which takes two 1024-byte inputs and produces one 1024-byte output.

i is the index of the current step, and φ(i) is the index of the second input block, which is taken from the memory that has already been filled.

Because Argon2 is a memory-hard algorithm, constructing the initial memory is a very important step. Next, let's look at how the initial memory space is constructed.

First, we need to construct H0, which is a 64-byte block value. Through H0, we can construct more blocks. The formula for calculating H0 is as follows:

H0 = H(p,τ,m,t,v,y,⟨P⟩,P,⟨S⟩,S,⟨K⟩,K,⟨X⟩,X)

Here H is the Blake2b hash function applied to the input parameters described earlier, and ⟨P⟩ denotes the length of P (likewise for S, K and X). H0 is 64 bytes long.

The pseudocode for generating H0 is:

Generate initial 64-byte block H0.

All the input parameters are concatenated and input as a source of additional entropy.
Errata: RFC says H0 is 64-bits; PDF says H0 is 64-bytes.
Errata: RFC says the Hash is H^, the PDF says it's ℋ (but doesn't document what ℋ is). It's actually Blake2b.
Variable length items are prepended with their length as 32-bit little-endian integers.

buffer ← parallelism ∥ tagLength ∥ memorySizeKB ∥ iterations ∥ version ∥ hashType

     ∥ Length(password)       ∥ Password
     ∥ Length(salt)           ∥ salt
     ∥ Length(key)            ∥ key
     ∥ Length(associatedData) ∥ associatedData

H0 ← Blake2b(buffer, 64) //default hash size of Blake2b is 64-bytes
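
The H0 computation can be sketched with the Blake2b implementation in Python's standard `hashlib`. The function names `le32` and `initial_hash` are illustrative. The sketch assumes that the six fixed-size parameters (including the version) are each encoded as 32-bit little-endian integers, as RFC 9106 specifies; the pseudocode above states that rule only for the length prefixes.

```python
import hashlib
import struct

def le32(n: int) -> bytes:
    """Encode n as a 32-bit little-endian integer."""
    return struct.pack("<I", n)

def initial_hash(password: bytes, salt: bytes, parallelism: int,
                 tag_length: int, memory_kib: int, iterations: int,
                 hash_type: int, key: bytes = b"",
                 associated_data: bytes = b"", version: int = 0x13) -> bytes:
    """Return the 64-byte initial block H0."""
    buffer = b"".join([
        # Fixed-size parameters (assumed 32-bit little-endian, per RFC 9106).
        le32(parallelism), le32(tag_length), le32(memory_kib),
        le32(iterations), le32(version), le32(hash_type),
        # Variable-length items, each prefixed with its length.
        le32(len(password)), password,
        le32(len(salt)), salt,
        le32(len(key)), key,
        le32(len(associated_data)), associated_data,
    ])
    return hashlib.blake2b(buffer, digest_size=64).digest()

h0 = initial_hash(b"password", b"0123456789abcdef", parallelism=1,
                  tag_length=32, memory_kib=64, iterations=3, hash_type=2)
print(len(h0))  # 64
```

Because every parameter is hashed into H0, changing any one of them (for example the variant in `hash_type`) produces a completely different starting state.
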
Given the degree of parallelism p, the memory is organized as a matrix B[i][j] of 1024-byte blocks with p rows (lanes).

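The blocks of matrix B are calculated as follows:

$$B[i][0] = H'(H_0 \,\|\, 0 \,\|\, i), \quad 0 \le i < p$$

$$B[i][1] = H'(H_0 \,\|\, 1 \,\|\, i), \quad 0 \le i < p$$

$$B[i][j] = G\left(B[i][j-1],\, B[i'][j']\right), \quad 0 \le i < p,\; j \ge 2$$

Here H′ is a variable-length hash function built on H, and the reference indices i′ and j′ depend on the Argon2 variant.

The pseudocode for H′ (named Hash below) is as follows: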

Function Hash(message, digestSize)
Inputs:

  message:         Bytes (0..2^32-1)    Message to be hashed
  digestSize:      Integer (1..2^32)    Desired number of bytes to be returned

Output:

  digest:          Bytes (digestSize)   The resulting generated bytes, digestSize bytes long

Hash is a variable-length hash function, built using Blake2b, capable of generating
digests up to 2^32 bytes.

If the requested digestSize is 64-bytes or lower, then we use Blake2b directly
if (digestSize <= 64) then

  return Blake2b(digestSize ∥ message, digestSize) //concatenate 32-bit little endian digestSize with the message bytes

For desired hashes over 64-bytes (e.g. 1024 bytes for Argon2 blocks),
we use Blake2b to generate twice the number of needed 64-byte blocks,
and then only use 32-bytes from each block

Calculate the number of whole blocks (knowing we're only going to use 32-bytes from each,
and that the final block contributes up to 64 bytes on its own)
r ← Ceil(digestSize/32)-2;

Generate r whole blocks.
Initial block is generated from message
V[1] ← Blake2b(digestSize ∥ message, 64);
Subsequent blocks are generated from previous blocks
for i ← 2 to r do

  V[i] ← Blake2b(V[i-1], 64)

Generate the final (possibly partial) block
partialBytesNeeded ← digestSize - 32*r;
V[r+1] ← Blake2b(V[r], partialBytesNeeded)

Concatenate the first 32-bytes of each block V[i]
(except the possibly partial last block, which we take the whole thing)
Let A[i] represent the lower 32-bytes of block V[i]
return A[1] ∥ A[2] ∥ ... ∥ A[r] ∥ V[r+1]
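
The variable-length hash can be sketched in Python on top of `hashlib.blake2b`. The function name `variable_length_hash` is illustrative. The sketch computes the number of whole blocks as r = Ceil(digestSize/32) - 2, which is the value RFC 9106 specifies, so that the final block supplies between 33 and 64 bytes.

```python
import hashlib
import struct

def variable_length_hash(message: bytes, digest_size: int) -> bytes:
    """Variable-length hash H' built on Blake2b (a sketch following RFC 9106)."""
    # The requested size is prepended as a 32-bit little-endian integer.
    prefixed = struct.pack("<I", digest_size) + message
    if digest_size <= 64:
        return hashlib.blake2b(prefixed, digest_size=digest_size).digest()

    # Number of 64-byte blocks from which only the first 32 bytes are kept.
    r = (digest_size + 31) // 32 - 2
    v = hashlib.blake2b(prefixed, digest_size=64).digest()
    parts = [v[:32]]
    for _ in range(2, r + 1):
        v = hashlib.blake2b(v, digest_size=64).digest()
        parts.append(v[:32])

    # The final block is used in full and may be shorter than 64 bytes.
    remaining = digest_size - 32 * r
    parts.append(hashlib.blake2b(v, digest_size=remaining).digest())
    return b"".join(parts)

block = variable_length_hash(b"example", 1024)
print(len(block))  # 1024
```

For a 1024-byte Argon2 block this gives r = 30 half-used blocks (960 bytes) followed by one full 64-byte block.
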
If there is more than one iteration, that is, t > 1, the blocks for each subsequent pass are calculated like this:

$$B^{t}[i][0] = G\left(B^{t-1}[i][q-1],\, B[i'][j']\right) \oplus B^{t-1}[i][0]$$

$$B^{t}[i][j] = G\left(B^{t}[i][j-1],\, B[i'][j']\right) \oplus B^{t-1}[i][j]$$

Here q is the number of columns in each row, and the superscript is the pass number.
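
The two update rules can be sketched in Python as a structural outline only. The parameters `g` and `get_block_indexes` are caller-supplied stand-ins for the compression function G and the reference-index selection, neither of which is defined in this article. The names `xor_blocks` and `later_pass` are likewise illustrative, and the loop visits rows one after another without the synchronization between lanes that a real implementation needs.

```python
from typing import Callable, List, Tuple

Block = bytes
Matrix = List[List[Block]]

def xor_blocks(a: Block, b: Block) -> Block:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def later_pass(matrix: Matrix,
               g: Callable[[Block, Block], Block],
               get_block_indexes: Callable[[int, int], Tuple[int, int]]) -> None:
    """Apply one pass with t > 1 in place: new = G(previous, reference) XOR old."""
    q = len(matrix[0])  # number of columns
    for i, lane in enumerate(matrix):
        for j in range(q):
            ref_i, ref_j = get_block_indexes(i, j)  # stand-in index selection
            # Column 0 chains from the last column, which still holds pass t-1.
            previous = lane[q - 1] if j == 0 else lane[j - 1]
            lane[j] = xor_blocks(g(previous, matrix[ref_i][ref_j]), lane[j])

# Toy run: 1-byte blocks, XOR as a stand-in for G, block [0][0] as reference.
toy = [[b"\x01", b"\x02", b"\x04"]]
later_pass(toy, xor_blocks, lambda i, j: (0, 0))
print(toy)  # [[b'\x04', b'\x02', b'\x02']]
```

The XOR with the old block is what distinguishes later passes from the first pass, where each block is simply overwritten with the output of G.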

After the last pass T (T being the total number of iterations), the final block is the XOR of the last column (index q-1) of every row:

$$B_{\text{final}} = B^{T}[0][q-1] \oplus B^{T}[1][q-1] \oplus \cdots \oplus B^{T}[p-1][q-1]$$

Finally, the output tag is obtained:

$$\mathrm{Tag} \leftarrow H'\left(B_{\text{final}}\right)$$

This logic can also be expressed in code:

Calculate number of 1 KiB blocks by rounding down memorySizeKB to the nearest multiple of 4*parallelism kibibytes
blockCount ← Floor(memorySizeKB, 4*parallelism)

Allocate two-dimensional array of 1 KiB blocks (parallelism rows x columnCount columns)
columnCount ← blockCount / parallelism; //In the RFC, columnCount is referred to as q

Compute the first and second block (i.e. column zero and one ) of each lane (i.e. row)
for i ← 0 to parallelism-1 do //for each row

  Bi[0] ← Hash(H0 ∥ 0 ∥ i, 1024) //Generate a 1024-byte digest
  Bi[1] ← Hash(H0 ∥ 1 ∥ i, 1024) //Generate a 1024-byte digest

Compute remaining columns of each lane
for i ← 0 to parallelism-1 do //for each row

  for j ← 2 to columnCount-1 do //for each subsequent column
     //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
     i′, j′ ← GetBlockIndexes(i, j)  //the GetBlockIndexes function is not defined
     Bi[j] = G(Bi[j-1], Bi′[j′]) //the G hash function is not defined

Further passes when iterations > 1
for nIteration ← 2 to iterations do

  for i ← 0 to parallelism-1 do //for each row
    for j ← 0 to columnCount-1 do //for each subsequent column
       //i' and j' indexes depend if it's Argon2i, Argon2d, or Argon2id (See section 3.4)
       i′, j′ ← GetBlockIndexes(i, j)
       if j == 0 then 
         Bi[0] = Bi[0] xor G(Bi[columnCount-1], Bi′[j′])
       else
         Bi[j] = Bi[j] xor G(Bi[j-1], Bi′[j′])

Compute final block C as the XOR of the last column of each row
C ← B0[columnCount-1]
for i ← 1 to parallelism-1 do

  C ← C xor Bi[columnCount-1]

Compute output tag
return Hash(C, tagLength)
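
Two of the simpler steps in the pseudocode, the memory layout and the final XOR over the last column, can be sketched in Python. The function names `block_layout` and `final_block` are illustrative assumptions, and the blocks in the usage lines are shortened to one byte for readability.

```python
from functools import reduce
from typing import List, Tuple

def block_layout(memory_kib: int, parallelism: int) -> Tuple[int, int]:
    """Return (block_count, column_count) for the memory matrix."""
    # Round down to the nearest multiple of 4 * parallelism.
    block_count = (memory_kib // (4 * parallelism)) * (4 * parallelism)
    column_count = block_count // parallelism
    return block_count, column_count

def final_block(matrix: List[List[bytes]]) -> bytes:
    """XOR together the last block of every row."""
    last_column = [lane[-1] for lane in matrix]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), last_column)

print(block_layout(100, 3))                                # (96, 32)
print(final_block([[b"\x00", b"\x0f"], [b"\x00", b"\xf0"]]))  # b'\xff'
```

With 100 KiB requested and 3 lanes, 4 KiB go unused because the block count must be a multiple of 12.
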
This article is also published at http://www.flydean.com/40-argon2/

flydean