Learn URL related operation functions in PHP

硬核项目经理
中文

In the daily business development process, we often have the need to process URL links, so the functions we learn today are actually some functions that everyone often uses. In the previous work process, I actually only had a vague concept of these functions. I knew, but when I really need to use it, I still have to look at the document to determine which function I really want to use. Therefore, today we will treat it as a review exercise, mainly to distinguish and figure out the true use of each function.

Encoding operation function

First look at the functions related to URL encoding. Some browsers will automatically URL-encode the URL after we copy and paste a URL, that is, there are many percent signs in the form. In PHP, there are naturally corresponding codec functions.

$url = "https://www.zyblog.net?opt=dev&mail=zyblog@net.net&comments=aaa bbb ccc %dfg &==*() cdg&value=“中文也有呀,还有中文符号!!”";

echo $url, PHP_EOL;
// https://www.zyblog.net?opt=dev&mail=zyblog@net.net&comments=aaa bbb ccc %dfg &==*() cdg&value=“中文也有呀,还有中文符号!!”

$enurl = urlencode($url);
echo $enurl, PHP_EOL;
// https%3A%2F%2Fwww.zyblog.net%3Fopt%3Ddev%26mail%3Dzyblog%40net.net%26comments%3Daaa+bbb+ccc+%25dfg+%26%3D%3D%2A%28%29+cdg%26value%3D%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

echo urldecode($enurl), PHP_EOL;
// https://www.zyblog.net?opt=dev&mail=zyblog@net.net&comments=aaa bbb ccc %dfg &==*() cdg&value=“中文也有呀,还有中文符号!!”

These two functions are estimated to be the most used functions. urlencode() is used for URL encoding operation. As you can see, the link we prepared has been encoded into content containing various percent signs. Especially for Chinese characters, if it is a Chinese parameter in a link like GET, the content after encoding will make the link very long. urldecode() is the function of the corresponding decoding function, which can decode the encoded link back to the original state.

$rawenurl = rawurlencode($enurl);
echo $rawenurl, PHP_EOL;
// https%253A%252F%252Fwww.zyblog.net%253Fopt%253Ddev%2526mail%253Dzyblog%2540net.net%2526comments%253Daaa%2Bbbb%2Bccc%2B%2525dfg%2B%2526%253D%253D%252A%2528%2529%2Bcdg%2526value%253D%25E2%2580%259C%25E4%25B8%25AD%25E6%2596%2587%25E4%25B9%259F%25E6%259C%2589%25E5%2591%2580%25EF%25BC%258C%25E8%25BF%2598%25E6%259C%2589%25E4%25B8%25AD%25E6%2596%2587%25E7%25AC%25A6%25E5%258F%25B7%25EF%25BC%2581%25EF%25BC%2581%25E2%2580%259D

echo rawurldecode($rawenurl), PHP_EOL;
// https%3A%2F%2Fwww.zyblog.net%3Fopt%3Ddev%26mail%3Dzyblog%40net.net%26comments%3Daaa+bbb+ccc+%25dfg+%26%3D%3D%2A%28%29+cdg%26value%3D%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

echo rawurlencode($url), PHP_EOL;
// https%3A%2F%2Fwww.zyblog.net%3Fopt%3Ddev%26mail%3Dzyblog%40net.net%26comments%3Daaa%20bbb%20ccc%20%25dfg%20%26%3D%3D%2A%28%29%20cdg%26value%3D%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

echo rawurldecode($enurl), PHP_EOL;
// https://www.zyblog.net?opt=dev&mail=zyblog@net.net&comments=aaa+bbb+ccc+%dfg+&==*()+cdg&value=“中文也有呀,还有中文符号!!”

Next we saw rawurlencode() and rawurldecode(). Many friends will not be confused about the difference between them and ordinary urlencode() and urldecode(). In fact, their difference is mainly reflected in some special characters, such as spaces. In urlencode(), spaces are encoded as + signs, and in urlrawencode(), spaces are %20. It can be seen in our third test code.

The first two pieces of test code are for operations on the \$enurl that has been coded before. The third piece of test code is the encoding of the original $url. These two functions are functions that implement the RFC3986 specification. And urlencode() retains some special cases like the conversion of spaces into + signs due to historical reasons.

Finally, let's look at two very simple Base64-related codec functions.

$base64url = base64_encode($enurl);
echo $base64url, PHP_EOL;
// aHR0cHMlM0ElMkYlMkZ3d3cuenlibG9nLm5ldCUzRm9wdCUzRGRldiUyNm1haWwlM0R6eWJsb2clNDBuZXQubmV0JTI2Y29tbWVudHMlM0RhYWErYmJiK2NjYyslMjVkZmcrJTI2JTNEJTNEJTJBJTI4JTI5K2NkZyUyNnZhbHVlJTNEJUUyJTgwJTlDJUU0JUI4JUFEJUU2JTk2JTg3JUU0JUI5JTlGJUU2JTlDJTg5JUU1JTkxJTgwJUVGJUJDJThDJUU4JUJGJTk4JUU2JTlDJTg5JUU0JUI4JUFEJUU2JTk2JTg3JUU3JUFDJUE2JUU1JThGJUI3JUVGJUJDJTgxJUVGJUJDJTgxJUUyJTgwJTlE

echo base64_decode($base64url), PHP_EOL;
// https%3A%2F%2Fwww.zyblog.net%3Fopt%3Ddev%26mail%3Dzyblog%40net.net%26comments%3Daaa+bbb+ccc+%25dfg+%26%3D%3D%2A%28%29+cdg%26value%3D%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

In fact, the biggest use of Base64 is not reflected in the encoding of such ordinary strings, but in the role of binary strings in the transmission after encoding. This classmate who must have used it naturally knows it well. Mainly for interface development, if we use Base64 to encode data, one is that there is no encryption effect, and the other is that it is possible to increase the length of the data, so unless there are special needs, there is really no too much in ordinary transmission. It is necessary to encode the data in Base64.

URL parsing operations

In addition to encoding and decoding characters in URL links, parsing link parameters is also a function we often use. for example:

$urls = parse_url($url);
var_dump($urls);
// array(3) {
//     ["scheme"]=>
//     string(5) "https"
//     ["host"]=>
//     string(14) "www.zyblog.net"
//     ["query"]=>
//     string(119) "opt=dev&mail=zyblog@net.net&comments=aaa bbb ccc %dfg &==*() cdg&value=“中文也有呀,还有中文符号!!”"
//   }

Through the parse_url() function, we can disassemble the various parts of the link.

$parseTestUrl = 'http://username:password@hostname/path?arg=value#anchor';

print_r(parse_url($parseTestUrl));
// Array
// (
//     [scheme] => http
//     [host] => hostname
//     [user] => username
//     [pass] => password
//     [path] => /path
//     [query] => arg=value
//     [fragment] => anchor
// )

The above test link is more standardized, and we can also see that parse_url() can disassemble the protocol, address, user name, password, path, query statement, and snippets. These are also the normative standards that constitute a URL link. We can also specify what we need.

echo parse_url($parseTestUrl, PHP_URL_PATH); // /path

Add the second parameter like this, you can get only the part of the content we need. Of course, for the entire URL link, what we are most concerned about is actually the content of the query part. Can they be disassembled again? Get all query data results just like $_GET.

$querys = [];
parse_str($urls['query'], $querys);
var_dump($querys);
// array(4) {
//     ["opt"]=>
//     string(3) "dev"
//     ["mail"]=>
//     string(14) "zyblog@net.net"
//     ["comments"]=>
//     string(15) "aaa bbb ccc �g "
//     ["value"]=>
//     string(48) "“中文也有呀,还有中文符号!!”"
//   }

The parse_str() function is the function to parse this kind of URL link query statement. It should be noted that the second parameter of this function is optional. If a variable is not used to receive the parsed result of this function, then all parsed results will be directly converted into variable form. It may be a bit dizzy to say, just look at the code.

parse_str($urls['query']);
echo $value, PHP_EOL; // “中文也有呀,还有中文符号!!”

See it now. In order to prevent the occurrence of variable pollution problems, it is best to have a second parameter to store the results of the analysis in the place we specify. Finally, let's take a look at how to combine the array into a URL query statement.

echo http_build_query($querys), PHP_EOL;
// opt=dev&mail=zyblog%40net.net&comments=aaa+bbb+ccc+%DFg+&value=%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

echo http_build_query($querys, null, '$||$', PHP_QUERY_RFC3986), PHP_EOL;
// opt=dev$||$mail=zyblog%40net.net$||$comments=aaa%20bbb%20ccc%20%DFg%20$||$value=%E2%80%9C%E4%B8%AD%E6%96%87%E4%B9%9F%E6%9C%89%E5%91%80%EF%BC%8C%E8%BF%98%E6%9C%89%E4%B8%AD%E6%96%87%E7%AC%A6%E5%8F%B7%EF%BC%81%EF%BC%81%E2%80%9D

http_build_query() In fact, as long as the students who have done the development of docking external interfaces will not be unfamiliar. Because it is so convenient. However, it should be noted that this function will rawurlencode() encode the data itself. In addition, it has several optional parameters. For example, we modified the connection symbol in the second test code, and replaced the original & symbol with our custom symbol to splice the URL query statement.

Parse the response header and meta information of the file or remote address

For remote file requests, the response header information is also very important. In fact, there are also functions for directly obtaining response headers in URL-related components.

$url = 'https://www.sina.com.cn';

print_r(get_headers($url));
// Array
// (
//     [0] => HTTP/1.1 200 OK
//     [1] => Server: nginx
//     [2] => Date: Mon, 25 Jan 2021 02:08:35 GMT
//     [3] => Content-Type: text/html
//     [4] => Content-Length: 530418
//     [5] => Connection: close
//     [6] => Vary: Accept-Encoding
//     [7] => ETag: "600e278a-7c65e"V=5965C31
//     [8] => X-Powered-By: shci_v1.13
//     [9] => Expires: Mon, 25 Jan 2021 02:09:12 GMT
//     [10] => Cache-Control: max-age=60
//     [11] => X-Via-SSL: ssl.22.sinag1.qxg.lb.sinanode.com
//     [12] => Edge-Copy-Time: 1611540513080
//     [13] => Age: 24
//     [14] => Via: https/1.1 cmcc.guangzhou.union.82 (ApacheTrafficServer/6.2.1 [cRs f ]), https/1.1 cmcc.jiangxi.union.175 (ApacheTrafficServer/6.2.1 [cRs f ])
//     [15] => X-Via-Edge: 1611540515462770a166fee55a97524d289c7
//     [16] => X-Cache: HIT.175
//     [17] => X-Via-CDN: f=edge,s=cmcc.jiangxi.union.166.nb.sinaedge.com,c=111.22.10.119;f=edge,s=cmcc.jiangxi.union.168.nb.sinaedge.com,c=117.169.85.166;f=Edge,s=cmcc.jiangxi.union.175,c=117.169.85.168
// )

print_r(get_headers($url, 1));
// Array
// (
//     [0] => HTTP/1.1 200 OK
//     [Server] => nginx
//     [Date] => Mon, 25 Jan 2021 02:08:35 GMT
//     [Content-Type] => text/html
//     [Content-Length] => 530418
//     [Connection] => close
//     [Vary] => Accept-Encoding
//     [ETag] => "600e278a-7c65e"V=5965C31
//     [X-Powered-By] => shci_v1.13
//     [Expires] => Mon, 25 Jan 2021 02:09:12 GMT
//     [Cache-Control] => max-age=60
//     [X-Via-SSL] => ssl.22.sinag1.qxg.lb.sinanode.com
//     [Edge-Copy-Time] => 1611540513080
//     [Age] => 24
//     [Via] => https/1.1 cmcc.guangzhou.union.82 (ApacheTrafficServer/6.2.1 [cRs f ]), https/1.1 cmcc.jiangxi.union.175 (ApacheTrafficServer/6.2.1 [cRs f ])
//     [X-Via-Edge] => 1611540515593770a166fee55a97568f1a9d6
//     [X-Cache] => HIT.175
//     [X-Via-CDN] => f=edge,s=cmcc.jiangxi.union.165.nb.sinaedge.com,c=111.22.10.119;f=edge,s=cmcc.jiangxi.union.175.nb.sinaedge.com,c=117.169.85.165;f=Edge,s=cmcc.jiangxi.union.175,c=117.169.85.175
// )

The response header information returned by the target address server can be directly obtained through the get_headers() function. Its second parameter can return data in the form of key-value subscripts. In addition to the response header, we can also get the content in all meta tags of the website.

var_dump(get_meta_tags($url));
// array(11) {
//     ["keywords"]=>
//     string(65) "新浪,新浪网,SINA,sina,sina.com.cn,新浪首页,门户,资讯"
//     ["description"]=>
//     string(331) "新浪网为全球用户24小时提供全面及时的中文资讯,内容覆盖国内外突发新闻事件、体坛赛事、娱乐时尚、产业资讯、实用信息等,设有新闻、体育、娱乐、财经、科技、房产、汽车等30多个内容频道,同时开设博客、视频、论坛等自由互动交流空间。"
//     ["referrer"]=>
//     string(6) "always"
//     ["stencil"]=>
//     string(10) "PGLS000022"
//     ["publishid"]=>
//     string(8) "30,131,1"
//     ["verify-v1"]=>
//     string(44) "6HtwmypggdgP1NLw7NOuQBI2TW8+CfkYCoyeB8IDbn8="
//     ["application-name"]=>
//     string(12) "新浪首页"
//     ["msapplication-tileimage"]=>
//     string(42) "//i1.sinaimg.cn/dy/deco/2013/0312/logo.png"
//     ["msapplication-tilecolor"]=>
//     string(7) "#ffbf27"
//     ["baidu_ssp_verify"]=>
//     string(32) "c0e9f36397049594fb9ac93a6316c65b"
//     ["sudameta"]=>
//     string(20) "dataid:wpcomos:96318"
//   }

This function is not only useful for remotely linked websites, but also can directly view the content of all meta tags in a local static file. We only need to change the remote link of the parameter to the path of the local file. You can do it yourself try it.

Summarize

Today's content is relatively simple, mainly these functions are often used in daily work. However, the usage of some parameters may not be clear to many friends, such as the second parameter of the parse_str() function. So as I said at the beginning, this article is a review and consolidation, and it also plays a role in deepening understanding. After in-depth study, you can master it even more with practical application.

Test code:

https://github.com/zhangyue0503/dev-blog/blob/master/php/2021/01/source/9. Learn URL related operation functions in PHP. php

Reference documents:

https://www.php.net/manual/zh/book.url.php

阅读 785
87 声望
16 粉丝
0 条评论
87 声望
16 粉丝
文章目录
宣传栏