This series reviews and organizes front-end fundamentals, laying a foundation for working with increasingly complex frameworks and making framework internals easier to understand.

Today we review URIs and URLs.

What is a URI

A Uniform Resource Identifier (URI) lets users interact with resources on the network through a specific protocol. RFC 2396 explains each word of the name "Uniform Resource Identifier" as follows.

  • Uniform: Specifies a unified syntax so that many different types of resources can be processed without having to determine the resource type from context.
  • Resource: Anything that can be identified. A resource can be a single object or a collection of multiple objects.
  • Identifier: An object that refers to something identifiable.

In general, a URI is the location identifier of a resource, expressed with a certain protocol scheme. The protocol scheme is the name of the protocol type used to access the resource. HTTP is one such scheme. There are also dozens of other standard URI schemes, such as ftp, file, and telnet. Protocol schemes are administered and published by the Internet Assigned Numbers Authority (IANA). A URI uses a string to identify an Internet resource. The most commonly used kind of URI is the URL, which indicates the location of an Internet resource.

A URI consists of 5 components:

URI = scheme:[//authority]path[?query][#fragment]

In the URI syntax:

  • scheme is the protocol scheme name. Scheme names such as http and https are not case-sensitive, and the scheme is followed by a colon ":". Schemes such as javascript: and data: can also be used to specify script code or inline data.
  • path is a hierarchical path that specifies where a specific resource is located on the server.
  • query is the query string. For the resource at the specified path, the query string can pass in arbitrary query parameters.
  • fragment is the fragment identifier, which usually marks a sub-resource within the retrieved resource. It is optional.
  • authority can consist of the following 3 parts:
authority = [userinfo@]host[:port]

In authority, userinfo is login information, usually a user name and password, which serves as an authentication credential when obtaining resources from the server. userinfo is optional. host is the server address, which must be specified when using an absolute URI. The address can be a DNS-resolvable domain name such as example.com, an IPv4 address such as 192.168.1.1, or an IPv6 address enclosed in square brackets such as [0:0:0:0:0:0:0:1]. port is the network port number used to connect to the server. It is optional; if it is not specified, the default port number for the scheme is used automatically.
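
As a rough illustration of the components described above, the following sketch uses the standard URL class available in browsers and Node.js to split a URI into its parts. The URI itself (user name, password, host, port, and path) is made up for this example and does not refer to a real service.

```javascript
// Hypothetical URI used only for illustration.
const uri = new URL('https://user:secret@example.com:8080/docs/index.html?lang=en&page=2#intro');

const parts = {
  scheme: uri.protocol,   // includes the trailing colon
  username: uri.username, // first half of userinfo
  password: uri.password, // second half of userinfo
  host: uri.hostname,
  port: uri.port,
  path: uri.pathname,
  query: uri.search,      // includes the leading "?"
  fragment: uri.hash,     // includes the leading "#"
};
console.log(parts);

// Scheme and host are case-insensitive, so the parser normalizes them to lowercase.
const upper = new URL('HTTP://EXAMPLE.COM/Index.html');
console.log(upper.protocol, upper.hostname, upper.pathname); // http: example.com /Index.html

// When the port is omitted, the scheme's default applies and the port property is empty.
const noPort = new URL('http://example.com/');
console.log(noPort.port === ''); // true
```

Note that the URL class reports the scheme with its colon and the query and fragment with their leading delimiters, so the values differ slightly from the bare component names in the grammar.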

What is a URL

A Uniform Resource Locator (URL) is a kind of URI. It is like a house number on the network, identifying the "address" of an Internet resource. For example, <http://www.example.com> means using the HTTP protocol to obtain the home page resource from the host named www.example.com.

The URL syntax is consistent with the URI syntax; URLs are a subset of URIs.

A Uniform Resource Name (URN) is also a well-formed URI. It refers to a resource without specifying its location or existence. Because this concept rarely comes up in day-to-day front-end work, it is mentioned here only for awareness; interested readers can look up the details themselves.

What is the relationship between URI and URL

Figure: The relationship between URI and URL


A common diagram of the relationship shows that a URI can be classified as a URL, a URN, or something that has both locator and name properties. A URN acts like a person's name, and a URL acts like a person's address. In other words: URNs identify something, URLs provide a way to find it.

In plain terms, URI is an abstract definition: whatever method is used to represent it, anything that can identify a resource is called a URI. Two methods were originally envisaged: 1. URL, which locates a resource by address; 2. URN, which identifies a resource by name.

For example, suppose you go to a village to find a specific person (the URI). If you use an address, such as "the owner of the first room of a certain house in a certain village", that is a URL. If you use an ID number plus a name, that is a URN.
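
To make the name versus address distinction a little more concrete, here is a small sketch that parses one URL and one URN with the standard URL class. Both strings are illustrative assumptions: the book path is hypothetical and the ISBN is only a sample value.

```javascript
// Both strings are valid URIs; the values are illustrative only.
const locator = new URL('https://example.com/books/42'); // URL: says where to find the resource
const name = new URL('urn:isbn:0451450523');             // URN: names the resource only

console.log(locator.protocol, locator.hostname); // https: example.com
console.log(name.protocol, name.pathname);       // urn: isbn:0451450523
console.log(name.hostname === '');               // true, a URN carries no server address
```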

Browser URI encoding

URI encoding uses percent-encoding. Each byte of a character that needs to be encoded is written as two hexadecimal digits with the escape character "%" placed in front, and the result replaces the original character at the same position.
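
The rule can be sketched by hand as follows. This is a minimal illustration, not how browsers implement it; the helper name percentEncodeChar is an assumption made up for this example, and it relies on the standard TextEncoder to obtain the UTF-8 bytes.

```javascript
// Percent-encode a single character by hand: one "%XX" group per UTF-8 byte.
// percentEncodeChar is a hypothetical helper name used only in this sketch.
function percentEncodeChar(ch) {
  const bytes = new TextEncoder().encode(ch); // UTF-8 byte sequence of the character
  return Array.from(bytes, (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

console.log(percentEncodeChar(' ')); // %20
console.log(percentEncodeChar('?')); // %3F
console.log(percentEncodeChar('中')); // %E4%B8%AD
```

For characters that the built-in functions do encode, the output matches: encodeURIComponent('中') also returns %E4%B8%AD.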

Only unreserved characters and reserved characters are allowed in a URI. The unreserved characters comprise the English letters (a-z, A-Z), the digits (0-9), and the 4 special characters -, _, ., and ~, for a total of 66 characters. Unreserved characters do not need percent-encoding. Reserved characters are those with special meaning. RFC 3986 specifies 18 reserved characters:

!*'();:@&=+$,/?#[]

In a URI, reserved characters have special meanings, such as "?" for the query and "#" for the fragment identifier. If you want a reserved character to be treated as an ordinary character rather than carry its special meaning, you need to URL-encode it. The commonly used encoding functions are encodeURI and encodeURIComponent.
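
The following sketch shows what can go wrong when a value containing reserved characters is placed in a query string without encoding. The search URL and the value are hypothetical and chosen only for illustration.

```javascript
// A hypothetical value that happens to contain the reserved characters "&" and "#".
const value = 'a&b#c';

// Without encoding, "&" starts a new parameter and "#" starts the fragment.
const raw = new URL('https://example.com/search?q=' + value);
console.log(raw.searchParams.get('q')); // a
console.log(raw.hash);                  // #c

// With encoding, the whole value survives as a single query parameter.
const safe = new URL('https://example.com/search?q=' + encodeURIComponent(value));
console.log(safe.searchParams.get('q')); // a&b#c
console.log(safe.hash === '');           // true
```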

encodeURI and encodeURIComponent

encodeURI() and encodeURIComponent() are both URL encoding functions in JavaScript.

The difference is that:

  • encodeURI is intended for a complete URI and follows the URI syntax (RFC 3986). It does not encode ASCII letters and digits, nor these 20 ASCII punctuation marks: - _ . ! ~ * ' ( ) ; / ? : @ & = + $ , #. In other words, of the 66 unreserved characters and 18 reserved characters, everything except the 2 unsafe reserved characters "[" and "]" is left unencoded by encodeURI, a set of 82 characters. For non-ASCII characters, encodeURI converts the character to its UTF-8 byte sequence, then percent-encodes each byte by placing the escape character (%) in front of it, and puts the result at the corresponding position in the URI.

    UTF-8: UTF-8 has no byte-order (endianness) requirements, stores ASCII characters in a single byte to save memory, is backward compatible with ASCII, and tolerates errors well. A plain ASCII string is also a valid UTF-8 string, so existing ASCII text does not need to be converted. Software designed for traditional extended ASCII character sets can usually be used with UTF-8 with little or no modification.
  • encodeURIComponent assumes that its argument is a part of a URI (such as the protocol, host name, path, or query string). It therefore escapes every character except letters, digits, and ( ) . ! ~ * ' - _. For example, applying encodeURIComponent to "name=val&key=" yields "name%3Dval%26key%3D". A URL component usually needs to be encoded with encodeURIComponent, such as:

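    // Encode only the value, so "&" and "=" inside it are not read as delimiters.
    const query = 'name=' + encodeURIComponent('val&key=');
    // query === 'name=val%26key%3D'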
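
The character counts mentioned in the list above can be checked with a short sketch. The helper unchangedBy is a made-up name for this example; it simply tries every printable ASCII character and keeps those that an encoder returns unchanged.

```javascript
// Collect the printable ASCII characters (0x20 to 0x7E) that an encoder leaves unchanged.
// unchangedBy is a hypothetical helper written for this illustration.
function unchangedBy(encoder) {
  const kept = [];
  for (let code = 0x20; code <= 0x7e; code++) {
    const ch = String.fromCharCode(code);
    if (encoder(ch) === ch) kept.push(ch);
  }
  return kept;
}

const keptByEncodeURI = unchangedBy(encodeURI);
const keptByComponent = unchangedBy(encodeURIComponent);

console.log(keptByEncodeURI.length); // 82
console.log(keptByComponent.length); // 71

// Characters that encodeURI leaves alone but encodeURIComponent escapes.
const onlyEncodeURI = keptByEncodeURI.filter((ch) => !keptByComponent.includes(ch));
console.log(onlyEncodeURI.join(' ')); // # $ & + , / : ; = ? @
```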

To compare the two: encodeURI is used to encode a complete URI, while encodeURIComponent is used to encode a single component or fragment of a URI. As the encoding examples above show, encodeURIComponent encodes a wider range of characters than encodeURI.
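
A side-by-side sketch of the two functions on the same input may help. The URLs here are hypothetical, and the redirect scenario is just one typical situation where a whole URL has to be embedded as a component of another URL.

```javascript
// Hypothetical URL containing a space plus several reserved characters.
const full = 'https://example.com/a b?x=1&y=2#top';

// encodeURI keeps the URI structure intact and only escapes the space.
console.log(encodeURI(full));          // https://example.com/a%20b?x=1&y=2#top

// encodeURIComponent escapes the delimiters too, so the result is safe inside another URI.
console.log(encodeURIComponent(full)); // https%3A%2F%2Fexample.com%2Fa%20b%3Fx%3D1%26y%3D2%23top

// Typical use: pass a complete URL as the value of a query parameter.
const redirect = 'https://example.com/login?next=' + encodeURIComponent(full);
console.log(new URL(redirect).searchParams.get('next') === full); // true

// decodeURIComponent reverses the encoding.
console.log(decodeURIComponent(encodeURIComponent(full)) === full); // true
```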

Summary

That covers URIs, URLs, and the related encoding methods. In daily front-end development, terms such as URL come up often, and the related encoding and decoding methods are used frequently. I hope this review helps you revisit and deepen this knowledge.



前端荣耀