Thoroughly understand Cookie, Session, Token, JWT

What is authentication

In plain terms, authentication verifies the identity of the current user: it proves that "you are who you say you are". For example, when you clock in and out of work with a fingerprint scanner, the check-in succeeds only when your fingerprint matches the one enrolled in the system.

Authentication on the Internet

Username and password login
Login link sent by email
Verification code sent to a mobile phone number
For the last two, whoever can receive the email or the verification code is treated as the owner of the account by default

What is authorization

Authorization is the user granting a third-party application access to some of the user's resources
- When you install a mobile app, the app asks whether to grant it permissions (access to the photo album, location, and so on)
- When you open a WeChat Mini Program and log in, the Mini Program asks whether to grant it permissions (personal information such as nickname, avatar, region, and gender)
Authorization can be implemented with cookies, sessions, tokens, or OAuth

What are credentials

Authentication and authorization both depend on a medium (a credential) that marks the identity of the visitor.

During the Warring States period, Shang Yang reformed the law and introduced the zhaoshentie, an identity tablet issued by the government. It was a smooth, finely polished bamboo board engraved with the holder's portrait and place of origin. Every person in the state had to carry one; anyone without it was treated as an undocumented resident or a spy.
In real life, everyone has their own resident ID card, a legal document that proves the holder's identity. With it we can apply for a mobile phone card, a bank card, or a personal loan, or buy travel tickets. The ID card is a credential for authentication.
In Internet applications, a typical website (such as Juejin) has two modes: visitor mode and logged-in mode. In visitor mode you can browse the articles on the site normally, but once you want to like, bookmark, or share an article you need to log in or register an account. When the user logs in successfully, the server issues a token to the user's browser. This token represents your identity: the browser sends it with every request, which unlocks the features that are not available in visitor mode.

What is a cookie

HTTP is a stateless protocol: it keeps no memory of earlier transactions, and the server saves no session information once an exchange between client and server has finished. Each request is completely independent, so the server cannot confirm the identity of the current visitor and cannot tell whether this request and the previous one came from the same person. To track a session (to know who is visiting), the server and the browser must actively maintain some state that tells the server whether two requests come from the same browser. This state is maintained with a cookie or a session.

Cookies are stored on the client: a cookie is a small piece of data that the server sends to the user's browser, which saves it locally and sends it back with later requests to the same server.
Cookies are not cross-domain: each cookie is bound to a single domain name and cannot be used under other domain names. A first-level domain and its second-level domains can share cookies, which is controlled by the cookie's domain attribute.

Important attributes of cookie

What is a session

A session is another mechanism for recording the state of the conversation between server and client
A session is usually implemented on top of cookies: the session itself is stored on the server, and the sessionId is stored in a cookie on the client

Session authentication process

When the user sends the first request to the server, the server creates a Session based on the information the user submitted
With the response, the server returns the SessionID, the unique identifier of this Session, to the browser
The browser stores the SessionID it received in a cookie, and the cookie records which domain name the SessionID belongs to
When the user visits the server a second time, the browser automatically checks whether there is cookie information for this domain name. If there is, it sends the cookie to the server automatically. The server reads the SessionID from the cookie and looks up the corresponding Session. If none is found, the user is not logged in or the login has expired; if a Session is found, the user is logged in and the requested operation can proceed.

As this process shows, the SessionID is the bridge between the cookie and the session, and most systems verify a user's login state on this principle.
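
The flow above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the in-memory `SESSIONS` dictionary, the cookie name `sessionId`, and the helper names `login` and `current_user` are all assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

# Hypothetical in-memory store standing in for the server's session storage.
SESSIONS: Dict[str, dict] = {}


def login(username: str) -> str:
    """Create a Session and return the Set-Cookie value sent with the response."""
    session_id = secrets.token_urlsafe(32)  # unpredictable SessionID
    SESSIONS[session_id] = {"user": username}
    cookie = SimpleCookie()
    cookie["sessionId"] = session_id
    cookie["sessionId"]["path"] = "/"
    cookie["sessionId"]["httponly"] = True
    return cookie["sessionId"].OutputString()


def current_user(cookie_header: str) -> Optional[str]:
    """Find the Session named by the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionId")
    session = SESSIONS.get(morsel.value) if morsel is not None else None
    return session["user"] if session else None


set_cookie = login("alice")
# On later requests the browser sends back only the name=value pair.
cookie_header = set_cookie.split(";")[0]
print(current_user(cookie_header))       # alice
print(current_user("sessionId=forged"))  # None
```

The server keeps the user data; the browser only ever holds the random identifier.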

The difference between Cookie and Session

Security: a session is safer than a cookie, because the session is stored on the server while the cookie is stored on the client.
Value types: a cookie only supports string data, so other types must be converted to strings first; a session can store any data type.
Validity period: a cookie can be set to persist for a long time (the "remember me" login feature relies on this), whereas a session is generally short-lived: it ends when the client is closed (the default) or when the session times out.
Storage size: a single cookie can hold no more than 4 KB, while a session can hold far more; with many concurrent visitors, however, sessions consume a lot of server resources.

What is a token

Access Token

The resource credential required to call a resource interface (API).
A simple token consists of: uid (the user's unique identifier), time (the timestamp of the current time), and sign (a signature: the preceding parts of the token hashed into a fixed-length hexadecimal string).
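
A minimal sketch of such a token follows. It assumes the signature is a keyed hash (HMAC-SHA256) over `uid.time` with a server-side secret, since a plain unkeyed hash could be recomputed by anyone; the secret, the `.` separator, and the function names are illustrative choices, and the uid is assumed not to contain a dot.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"demo-secret"  # assumption: a secret known only to the server


def make_token(uid: str, now: Optional[int] = None) -> str:
    """Build 'uid.time.sign', where sign is a hex HMAC over 'uid.time'."""
    ts = int(time.time()) if now is None else now
    payload = f"{uid}.{ts}"
    sign = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sign}"


def check_token(token: str, max_age: int = 3600, now: Optional[int] = None) -> Optional[str]:
    """Return the uid if the signature matches and the token is fresh enough."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    uid, ts, sign = parts
    expected = hmac.new(SECRET, f"{uid}.{ts}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sign.encode(), expected.encode()):
        return None  # tampered or forged
    current = int(time.time()) if now is None else now
    if current - int(ts) > max_age:
        return None  # too old
    return uid


token = make_token("42", now=1000)
print(check_token(token, now=1500))               # 42
print(check_token("43" + token[2:], now=1500))    # None: uid changed, signature no longer matches
```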

Features:

The server is stateless and scalable
Works well on mobile devices
Secure
Supports calls across programs

The authentication process of token:

The client requests login with a username and password
The server receives the request and verifies the username and password
After successful verification, the server issues a token and sends it to the client
The client stores the token it receives, for example in a cookie or in localStorage
The client must send the token issued by the server every time it requests a resource from the server
The server receives the request and verifies the token it carries. If verification succeeds, it returns the requested data to the client

Every request must carry the token, and the token should be placed in an HTTP header.

Token-based user authentication is stateless on the server side: the server does not need to store token data.
It trades the computation time of parsing the token for the storage space of a session, which reduces the load on the server and avoids frequent database queries.

The token is managed entirely by the application, so it is not bound by the same-origin restrictions that apply to cookies.

Refresh Token

Another kind of token: the refresh token

A refresh token is a token dedicated to renewing the access token. Without a refresh token the access token can still be renewed, but every renewal requires the user to enter their username and password, which is tedious. With a refresh token, the client renews the access token directly, with no extra action by the user.

The validity period of the access token is relatively short. When the access token expires, the refresh token can be used to obtain a new one. If the refresh token has also expired, the user has to log in again.

The refresh token and its expiration time are stored in the server's database. They are checked only when a new access token is requested, so they do not affect the response time of business interfaces and, unlike sessions, do not have to be kept in memory to serve a large number of requests.
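
The renewal logic described above might look like the following sketch. The lifetimes, the in-memory `REFRESH_TOKENS` dictionary (standing in for the database), and the function names are assumptions for illustration; a real access token would also be signed.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

ACCESS_TTL = 15 * 60            # assumed lifetimes, in seconds
REFRESH_TTL = 7 * 24 * 60 * 60

# Stands in for the server-side database: refresh token -> (uid, expiry time).
REFRESH_TOKENS: Dict[str, Tuple[str, float]] = {}


def new_access_token(uid: str, now: float) -> dict:
    # A real access token would be signed; a plain dict keeps the sketch short.
    return {"uid": uid, "exp": now + ACCESS_TTL}


def login(uid: str, now: Optional[float] = None) -> Tuple[dict, str]:
    """Issue a short-lived access token plus a long-lived refresh token."""
    now = time.time() if now is None else now
    refresh_token = secrets.token_urlsafe(32)
    REFRESH_TOKENS[refresh_token] = (uid, now + REFRESH_TTL)
    return new_access_token(uid, now), refresh_token


def refresh(refresh_token: str, now: Optional[float] = None) -> Optional[dict]:
    """Return a new access token, or None if the user must log in again."""
    now = time.time() if now is None else now
    record = REFRESH_TOKENS.get(refresh_token)
    if record is None or record[1] <= now:
        REFRESH_TOKENS.pop(refresh_token, None)  # drop expired entries
        return None
    return new_access_token(record[0], now)
```

The refresh token store is consulted only inside `refresh`, so ordinary API requests never touch it.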

The difference between Token and Session

A session is a mechanism for recording the state of the conversation between server and client; it makes the server stateful, able to record session information. A token is a credential required to call a resource interface (API); it lets the server stay stateless, storing no session information.

Session and Token are not mutually exclusive. As an authentication mechanism, a token is often considered more secure than a session, because every request carries a signature, which helps guard against eavesdropping and replay attacks, whereas a session must rely on the link layer to secure communication. If you need a stateful session, you can still add a session to keep some state on the server side.

Session authentication, put simply, stores the user's information in the session; because the SessionID is unpredictable, this is considered secure enough for now. A token, if it means an OAuth token or a similar mechanism, provides both authentication and authorization: authentication is for the user, authorization is for the app. Its purpose is to give a particular app the right to access a particular user's information. Such a token is unique: it cannot be transferred to another app or to another user.

A session provides only simple authentication: whoever holds the SessionID is assumed to have all the rights of that user. The SessionID must be kept strictly confidential; it should live only on the site itself and should not be shared with other websites or third-party apps.

In short: if your user data may need to be shared with a third party, or third parties are allowed to call your API, use tokens. If it is only ever your own website and your own app, either approach works.

What is JWT

JSON Web Token (JWT) is currently the most popular cross-domain authentication solution.
It is an authentication and authorization mechanism.
JWT is an open, JSON-based standard (RFC 7519) for transferring claims between web application environments. JWT claims are generally used to pass the authenticated user's identity between an identity provider and a service provider so that resources can be obtained from a resource server, for example for user login.
A JWT can be signed with an HMAC algorithm or an RSA public/private key pair. Because of the digital signature, the transmitted information can be trusted.

Generate JWT

jwt.io/
www.jsonwebtoken.io/
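
Besides online tools, a JWT can be produced with a few lines of standard-library Python. The sketch below assumes the HS256 (HMAC-SHA256) algorithm; the helper names, the example claims, and the secret are made up for illustration, and a maintained JWT library is the better choice in real projects.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HS256."""
    header = {"alg": "HS256", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(segments).encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(segments + [b64url(signature)])


token = sign_jwt({"sub": "alice", "exp": 1700000000}, b"demo-secret")
print(token)
```

The result has three dot-separated segments: the encoded header, the encoded payload, and the signature over the first two.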

The principle of JWT

JWT authentication process:

The user logs in with a username and password; after the server authenticates successfully, it returns a JWT to the client
The client saves the token locally (usually in localStorage, though a cookie can also be used)
When the user wants to access a protected route or resource, the client adds the JWT to the Authorization field of the request header using the Bearer scheme, like this:
```
Authorization: Bearer <token>
```
The protected route on the server checks the JWT in the Authorization request header and, if it is valid, allows the user's action
Because a JWT is self-contained (it carries some session information itself), it reduces the need to query the database
Because JWT does not rely on cookies, the API can be served from any domain without the single-domain restriction of cookies (for browser clients, the server still has to permit the cross-origin request through CORS)
Because the user's state is no longer stored in the server's memory, this is a stateless authentication mechanism
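
The server-side check in this flow can be sketched as follows, assuming HS256-signed tokens and an `exp` claim. The `issue` and `authenticate` helpers, the secret, and the plain `headers` dictionary are hypothetical stand-ins for whatever the web framework provides.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # assumption: HS256 key known only to the server


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue(claims: dict) -> str:
    """Issue an HS256 JWT (done once, at login)."""
    header = b64url_encode(b'{"alg":"HS256","typ":"JWT"}')
    payload = b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(mac)}"


def authenticate(headers: dict, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the Authorization header carries a valid JWT."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        return None
    header, payload, signature = token.split(".")
    # Always verify with HS256; never trust the "alg" field sent by the client.
    expected = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    try:
        if not hmac.compare_digest(b64url_decode(signature), expected):
            return None
        claims = json.loads(b64url_decode(payload))
    except ValueError:  # malformed base64 or JSON
        return None
    now = time.time() if now is None else now
    if claims.get("exp", 0) < now:
        return None  # expired
    return claims


token = issue({"sub": "alice", "exp": 2000})
print(authenticate({"Authorization": f"Bearer {token}"}, now=1000))
```

No session store or database lookup is involved: the signature and the `exp` claim are enough to accept or reject the request.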

How to use JWT

The client receives the JWT returned by the server, which can be stored in Cookie or localStorage.

Method one

When a user wants to access a protected route or resource, the JWT can be sent automatically in a cookie, but that does not work across domains. A better approach is to put it in the Authorization field of the HTTP request header, using the Bearer scheme.

    GET /calendar/v1/events
    Host: api.example.com
    Authorization: Bearer <token>

The user's state is not stored in the server's memory, so this is a stateless authentication mechanism
The protected route on the server checks the JWT in the Authorization request header and, if it is valid, allows the user's action
Because a JWT is self-contained, it reduces the need to query the database
These properties let us rely entirely on JWT's statelessness to provide data API services, or even to build a download streaming service
Because JWT does not rely on cookies, the API can be served from any domain without the single-domain restriction of cookies (for browser clients, the server still has to permit the cross-origin request through CORS)

Method two

When cross-domain, you can put JWT in the data body of the POST request.

Method three

Transmission via URL

    http://www.example.com/user?token=xxx

Use JWT in the project

Project repository: https://github.com/yjdjiayou/jwt-demo

The difference between Token and JWT

Similarities

Both are tokens for accessing resources
Both can record user information
Both make the server stateless
With both, the client can access protected resources on the server only after authentication succeeds

Differences

Token: when the server validates the token sent by the client, it still has to query the database for the user's information and then check whether the token is valid.
JWT: the payload is signed together with the header and stored on the client. The server only needs its key to verify the signature (verification is part of JWT itself), so database queries are unnecessary or at least reduced, because the JWT carries the user information and the signature.

Common front-end and back-end authentication methods

Session-Cookie
Token verification (including JWT, SSO)
OAuth2.0 (open authorization)

Common encryption algorithms

Hash algorithm

A hash algorithm, also called a hash function, creates a small digital "fingerprint" from any kind of data. It scrambles and mixes the input to produce a fixed-length value called the hash value.

Hash algorithms are mainly used to check the integrity of data: the sender transmits the original message together with its hash value, and the recipient applies the same hash function to confirm that the data arrived unchanged.

Hash algorithms usually have the following characteristics:

Fast to compute: the hash value can be calculated quickly from the original data
Hard to reverse: it is practically impossible to derive the original data from the hash value
Input sensitivity: a small change in the original data produces a very different hash value
Collision resistance: it is hard to find two different inputs with the same hash value. The number of atoms in the universe is estimated at between 10 to the 60th and 10 to the 80th power, so 2 to the 256th power leaves enough room for all possibilities. With a good algorithm, the probability of a collision is very low:
- 2 to the 128th power is 340282366920938463463374607431768211456, on the order of 10 to the 38th power (a 39-digit number)
- 2 to the 160th power is about 1.4615016373309029182036848327163 × 10 to the 48th power
- 2 to the 256th power is about 1.1579208923731619542357098500869 × 10 to the 77th power
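
These properties are easy to observe with Python's `hashlib`. The snippet uses SHA-256 as an example algorithm and two arbitrary inputs chosen for illustration; it also recomputes the sizes of the output spaces listed above.

```python
import hashlib

a = hashlib.sha256(b"hello world").hexdigest()
b = hashlib.sha256(b"hello world!").hexdigest()

# Fixed-length output: 64 hex characters = 256 bits, whatever the input size.
print(len(a) * 4)

# Input sensitivity: one extra character changes most of the hex digits.
differing = sum(x != y for x, y in zip(a, b))
print(f"{differing} of {len(a)} hex digits differ")

# Size of the output space for 128-, 160- and 256-bit hashes, as digit counts.
digits = tuple(len(str(2 ** bits)) for bits in (128, 160, 256))
print(digits)  # (39, 49, 78)
```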

Note:

The above does not guarantee that the data has not been maliciously tampered with: an attacker can alter both the original data and the hash value. To protect against tampering, combine the hash value with an RSA public/private key scheme.
Hash algorithms are mainly used to catch errors in transmission. Early computers appended an eighth parity bit to every seven data bits (a 12.5% overhead, which is inefficient). With a hash algorithm, a 128-bit or 256-bit hash value is generated for a piece of data or a file, and if verification fails the data is retransmitted.

Common questions

Questions to consider when using cookies

Because a cookie is stored on the client, it is easy for the client to tamper with, so its validity must be verified before use
Do not store sensitive data such as user passwords or account balances
Use httpOnly to improve security to some extent
Keep cookies as small as possible; a cookie cannot hold more than 4 KB
Set the correct domain and path to reduce the amount of data transmitted
Cookies cannot cross domains
Browsers limit the number of cookies: historically about 20 per website and about 300 in total, although the exact limits vary by browser
Mobile clients do not handle cookies well, and sessions are built on cookies, so tokens are commonly used on mobile
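
As a rough illustration of the points about domain, path, httpOnly, and size, the standard-library `http.cookies` module can build the `Set-Cookie` value. The cookie name, value, domain, and path below are placeholders invented for the example.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionId"] = "abc123"        # hypothetical name and value
morsel = cookie["sessionId"]
morsel["domain"] = "example.com"      # lets subdomains of example.com share the cookie
morsel["path"] = "/account"           # only sent for URLs under /account
morsel["max-age"] = 3600              # lifetime in seconds
morsel["httponly"] = True             # not readable from page scripts

header_value = morsel.OutputString()
print("Set-Cookie:", header_value)

# Stay well below the roughly 4 KB that a single cookie can hold.
size = len(header_value.encode("utf-8"))
print(size, "bytes")
```

A narrow path and a small value keep the cookie off requests that do not need it, which is the point of the "reduce data transmission" advice.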

Issues to consider when using session

Sessions are stored on the server. When many users are online at the same time, the sessions occupy a lot of memory, and expired sessions have to be cleaned up regularly on the server side.
When the website is deployed as a cluster, sessions have to be shared among multiple web servers. A session is created by a single server, but the server that handles a later request is not necessarily the one that created it, so that server cannot read the login credentials and other information previously placed in the session.
When multiple applications share a session, they also run into cross-domain problems on top of the above, because the applications may be deployed on different hosts and each one has to handle cross-domain cookies.
The sessionId is stored in a cookie. What if the browser disables cookies or does not support them? The usual fallback is URL rewriting: the sessionId is appended to the URL as a parameter, so a session does not strictly require cookies
Mobile clients do not handle cookies well, and sessions are built on cookies, so tokens are commonly used on mobile

Issues to consider when using tokens

If storing tokens in a database makes lookups too slow, you can keep them in memory instead; Redis, for example, is well suited to token lookups.
The token is managed entirely by the application, so it is not bound by the same-origin restrictions that apply to cookies
Tokens sent in a request header can avoid CSRF attacks (because no cookie is involved)
Mobile clients do not handle cookies well, and sessions are built on cookies, so tokens are commonly used on mobile

Questions to consider when using JWT

Because JWT does not rely on cookies, the API can be served from any domain without the single-domain restriction of cookies (for browser clients, the server still has to permit the cross-origin request through CORS)
A JWT is not encrypted by default, but it can be: after the original token is generated, it can be encrypted again with a key.
Do not write secret data into a JWT that is not encrypted.
JWT can be used not only for authentication but also for exchanging information. Used well, it reduces the number of times the server has to query the database.
The biggest advantage of JWT is that the server no longer needs to store sessions, so the authentication service can be scaled out easily. This is also its biggest drawback: because the server stores no session state, it cannot revoke a token or change its permissions while the token is in use. In other words, once a JWT is issued it remains valid until it expires, unless the server deploys additional logic.
The JWT itself carries the authentication information, so once it leaks, anyone can obtain all the permissions of the token. To limit theft, the validity period of a JWT should be kept short. For more sensitive permissions, the user should be authenticated again at the time of use.
JWT suits one-time command authentication: issue a JWT with a very short validity period, so that even if it is exposed the risk is small. Because each operation generates a new JWT, there is no need to store it, which makes the scheme truly stateless.
To limit theft, a JWT should not be transmitted in clear text over HTTP; use HTTPS.
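
The point that an unencrypted JWT must not carry secrets can be seen directly: the payload segment is only Base64url-encoded, so anyone holding the token can read it without the key. The token below is a stand-in assembled for the demonstration, with placeholder header and signature segments and made-up claims.

```python
import base64
import json


def read_claims_without_key(token: str) -> dict:
    """Decode the payload segment of a JWT; no secret is needed for this."""
    payload_segment = token.split(".")[1]
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Stand-in token: "e30" is the Base64url form of "{}", and the signature is never checked here.
payload = base64.urlsafe_b64encode(
    json.dumps({"sub": "alice", "role": "admin"}).encode()
).rstrip(b"=").decode("ascii")
token = f"e30.{payload}.signature-not-checked"

claims = read_claims_without_key(token)
print(claims)
```

The signature only proves who issued the claims; it does nothing to hide them.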

Issues to consider when using encryption algorithms

Never store passwords in clear text
Always hash passwords. Never store passwords with Base64 or any other encoding; that is the same as storing them in plain text. Encoding and encryption are two-way processes, but a password is confidential and should be known only to its owner, so the process must be one-way. That is what hashing is for: there is decoding for encoding and decryption for encryption, but there is no such thing as "unhashing".
Never use weak or broken hash algorithms such as MD5 or SHA1; use only strong password-hashing algorithms (for example bcrypt, scrypt, Argon2, or PBKDF2).
Never display or send a password in clear text, even to its owner. If you need a "forgot password" feature, randomly generate a new one-time (this is very important) password and send that to the user.
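
A minimal sketch of one-way password storage, using PBKDF2-HMAC-SHA256 from the standard library as one example of a password-hashing algorithm. The iteration count and salt size are illustrative assumptions, not recommendations from the article, and the function names are made up.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # illustrative work factor; tune for your hardware


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

Nothing stored here can be decoded back into the password; verification works only by hashing the candidate again.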

Session sharing scheme under distributed architecture

Session replication

Whenever a session on any server changes (is added, deleted, or modified), that node serializes the entire content of the session and broadcasts it to all other nodes, whether or not they need the session, so that sessions stay synchronized

Advantages: fault-tolerant; sessions on all servers are kept up to date in real time.
Disadvantages: it puts pressure on the network. If there are many sessions, it can cause network congestion and slow the servers down.

Sticky session/IP binding strategy

Using Nginx's ip_hash mechanism, all requests from one IP address are routed to the same server, so the user is bound to that server. On the user's first request, the load balancer forwards it to server A; if sticky sessions are enabled, every subsequent request from that user is also forwarded to server A. The user and server A are in effect "stuck" together, which is the sticky session mechanism.

Advantages: simple; the session itself needs no special handling.
Disadvantages: no fault tolerance. If the server currently in use fails and the user is moved to a second server, the user's session information is lost.
Applicable scenarios: a failure has little impact on customers; server failure is a low-probability event.
Implementation: taking Nginx as an example, configure the ip_hash directive in the upstream block to get sticky sessions.
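
The idea behind IP-based stickiness can be illustrated in a few lines. This is a simplified sketch and not Nginx's actual hash function; the server names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from typing import Sequence

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical backend pool


def pick_server(client_ip: str, servers: Sequence[str] = SERVERS) -> str:
    """Map a client IP to one backend; the same IP always maps to the same one."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


first = pick_server("203.0.113.7")
second = pick_server("203.0.113.7")
print(first, second)  # the same server both times

# If the pool changes (for example a server fails), an IP may be mapped elsewhere,
# which is where the session loss described above comes from.
print(pick_server("203.0.113.7", ["server-a", "server-b"]))
```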

Session sharing (commonly used)

Use a distributed cache such as Memcached or Redis to hold sessions; the Memcached or Redis deployment should itself be a cluster

Storing sessions in Redis makes the architecture more complex and adds one more round trip to Redis, but the benefits are considerable:

Sessions are shared;
It scales horizontally (add Redis servers);
Sessions are not lost when an application server restarts (but pay attention to how sessions in Redis are refreshed and expired);
Sessions can be shared not only across servers but also across platforms (such as web and app)

Session persistence

Store sessions in a database so that they persist

Advantages: if a server fails, sessions are not lost
Disadvantages: on a high-traffic website, storing sessions in the database puts heavy pressure on the database, and maintaining the database adds overhead.

Does the session really disappear as soon as the browser is closed?

No.

The server keeps a session unless the program tells it to delete the session, which programs generally do when the user logs out. The browser, however, never notifies the server that it is about to close, so the server has no way of knowing that the browser has been closed. The illusion arises because most session mechanisms save the session id in a session cookie, which disappears when the browser closes, so the original session cannot be found when the browser connects to the server again.

If the cookie set by the server is saved to disk, or the HTTP request headers sent by the browser are rewritten by some means so that the original session id reaches the server, the original session can still be found after the browser is reopened.

Precisely because closing the browser does not delete the session, the server has to set an expiration time for sessions. When the time since the client last used the session exceeds this expiration time, the server assumes the client has stopped its activity and deletes the session to save storage space.
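
The idle-timeout rule in the last paragraph could be implemented roughly as below. The 30-minute timeout, the `SessionStore` class, and its method names are assumptions for the sketch; real frameworks provide their own equivalents.

```python
import time
from typing import Any, Dict, Optional, Tuple

SESSION_TIMEOUT = 30 * 60  # assumed idle timeout, in seconds


class SessionStore:
    """Keeps sessions until they have been idle for longer than the timeout."""

    def __init__(self, timeout: float = SESSION_TIMEOUT) -> None:
        self.timeout = timeout
        self._sessions: Dict[str, Tuple[Any, float]] = {}  # id -> (data, last access)

    def put(self, session_id: str, data: Any, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._sessions[session_id] = (data, now)

    def get(self, session_id: str, now: Optional[float] = None) -> Optional[Any]:
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, last_access = entry
        if now - last_access > self.timeout:
            del self._sessions[session_id]  # idle too long: treat the client as gone
            return None
        self._sessions[session_id] = (data, now)  # each use resets the idle clock
        return data


store = SessionStore()
store.put("abc", {"user": "alice"}, now=0)
print(store.get("abc", now=600))    # still valid; the idle clock restarts at 600
print(store.get("abc", now=3000))   # None: more than 30 minutes since the last use
```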

Author: Autumn without falling leaves juejin.cn/post/6844904034181070861