
Hello everyone, this is Liang Xu.

wget is a free utility for downloading files from the Internet. It works by fetching data over the network and saving it to a local file or writing it to your terminal.

This is essentially what browsers such as Firefox or Chrome do internally: they fetch data over the network, then render it instead of saving it.

This article introduces 8 uses of the wget command; I hope you find them helpful.

1. Use the wget command to download files

You can use the wget command to download the file at a given link. By default, the downloaded file is saved under the same name in the current working directory.

$ wget http://www.lxlinux.net
--2021-09-20 17:23:47-- http://www.lxlinux.net/
Resolving www.lxlinux.net... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.lxlinux.net|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1256 (1.2K) [text/html]
Saving to: 'index.html'

If you don't want to save the downloaded file locally, but just want to display it on standard output (stdout), you can use the --output-document option followed by a dash (-):

$ wget http://www.lxlinux.net --output-document - | head -n4
<!doctype html>
<html>
<head>
   <title>Example Domain</title>

If you want to rename the downloaded file, you can use the --output-document option (or, more simply, -O):

$ wget http://www.lxlinux.net --output-document newfile.html

2. Resume an interrupted download

If the file you want to download is very large, network problems may keep it from finishing in one go. If you had to start over from scratch every time, you could be waiting forever.

In this case, you can use the --continue option (or -c) to resume the transfer: if the download is interrupted for any reason, this option lets you pick up where you left off instead of re-downloading everything.

$ wget --continue https://www.lxlinux.net/linux-distro.iso
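On an especially flaky connection, --continue combines well with wget's retry options. A minimal sketch, wrapped in a shell function so the network-dependent command is easy to reuse; the ISO URL is just this article's example:

```shell
# resume_download: download a file with resume and automatic retries.
# --continue (-c) picks up a partial file where it left off;
# --tries=0 retries indefinitely; --retry-connrefused also retries
# when the server refuses the connection.
resume_download() {
    wget --continue --tries=0 --retry-connrefused "$1"
}

# usage (requires network access):
# resume_download https://www.lxlinux.net/linux-distro.iso
```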

3. Download a series of files

If you are downloading many small files rather than one large one, wget can handle that easily too.

A bit of bash syntax helps here. Generally these file names follow a pattern, such as file_1.txt, file_2.txt, file_3.txt, and so on, in which case you can use this command:

$ wget http://www.lxlinux.net/file_{1..4}.txt
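Note that {1..4} is bash brace expansion, not a wget feature: the shell expands the pattern into four separate URLs before wget ever runs, as you can see with echo. When the names follow no pattern, wget's --input-file (-i) option reads URLs from a file instead; urls.txt below is a hypothetical file with one URL per line:

```shell
# The shell, not wget, turns the brace pattern into four URLs:
echo http://www.lxlinux.net/file_{1..4}.txt

# For arbitrary lists of files, read URLs from a file instead
# (urls.txt is hypothetical; the guard makes this a no-op if it is absent):
if [ -f urls.txt ]; then
    wget --input-file=urls.txt
fi
```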

4. Mirror the entire site

If you want to download the entire site of a certain website, including its directory structure, then you need to use the --mirror option.

This option is equivalent to --recursive --level inf --timestamping --no-remove-listing, meaning recursion with no depth limit, so it downloads all the content on the specified domain.

If you are using wget to archive a site, the options --no-cookies --page-requisites --convert-links help ensure every page saved is fresh and complete.
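Putting this section's options together, a sketch of a full archiving command, again wrapped in a function since it needs network access; the target is this article's example domain:

```shell
# mirror_site: archive an entire site for offline browsing.
# --mirror implies --recursive --level=inf --timestamping --no-remove-listing;
# --page-requisites also fetches the images/CSS/JS each page needs;
# --convert-links rewrites links so the local copy works offline;
# --no-cookies avoids carrying stale session state between runs.
mirror_site() {
    wget --mirror --no-cookies --page-requisites --convert-links "$1"
}

# usage (requires network access):
# mirror_site http://www.lxlinux.net
```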

5. Modify HTTP request headers

If you have studied network communication, you know that an HTTP message has several parts, and the HTTP headers come at the very beginning.

When you browse the web, your browser sends HTTP request headers to the server. What exactly does it send? You can use the --debug option to view the headers wget sends with each request:

$ wget --debug www.lxlinux.net
---request begin---
GET / HTTP/1.1
User-Agent: Wget/1.19.5 (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: www.lxlinux.net
Connection: Keep-Alive

---request end---

You can modify the request headers with the --header option. Why would you do that? There are many use cases; for example, testing sometimes requires simulating the requests of a specific browser.

For example, if you want to simulate a request from the Edge browser, you can do this:

$ wget --debug --header="User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.59" http://www.lxlinux.net

In addition, you can also pretend to be a specific mobile device (such as an iPhone):

$ wget --debug \
    --header="User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1" \
    http://www.lxlinux.net

6. View response headers

Just as your browser sends headers with each request, the server includes headers in each response. And once again, you can use the --debug option to view them:

$ wget --debug www.lxlinux.net
[...]
---response begin---
HTTP/1.1 200 OK
Accept-Ranges: bytes
Age: 188102
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Etag: "3147526947"
Server: ECS (sab/574F)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

---response end---
200 OK
Registered socket 3 for persistent reuse.
URI content encoding = 'UTF-8'
Length: 1256 (1.2K) [text/html]
Saving to: 'index.html'
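--debug prints far more than just the headers. If the response headers are all you want, wget's --server-response (-S) option prints them by themselves, and --spider checks the URL without saving anything. A sketch, again as a reusable function since it needs network access:

```shell
# show_headers: print only the server's response headers for a URL.
# --server-response (-S) prints the headers wget receives;
# --spider requests the page without downloading the body.
show_headers() {
    wget --server-response --spider "$1"
}

# usage (requires network access):
# show_headers http://www.lxlinux.net
```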

7. Handle 301 responses

Anyone familiar with network protocols knows that a 200 response code means everything went as expected, while a 301 response means the resource has moved permanently to a different URL.

In this case, downloading the file depends on wget's redirect handling, which is controlled by the --max-redirect option.

If you don't want wget to follow redirects at all, set --max-redirect to 0:

$ wget --max-redirect 0 http://www.lxlinux.net
--2021-09-21 11:01:35-- http://www.lxlinux.net/
Resolving www.lxlinux.net... 192.0.43.8, 2001:500:88:200::8
Connecting to www.lxlinux.net|192.0.43.8|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://www.www.lxlinux.net/ [following]
0 redirections exceeded.

Alternatively, you can set it to some other number to control how many redirects wget will follow.

8. Expand short links

Sometimes we need to convert a long link into a short one, for example when a text box limits how many characters you can enter; a short link saves a great deal of space.

Besides third-party platforms, we can use the wget command itself to expand a short link back into the original long one. The --max-redirect option comes in handy here as well:

$ wget --max-redirect 0 "https://bit.ly/2yDyS4T"
--2021-09-21 11:32:04-- https://bit.ly/2yDyS4T
Resolving bit.ly... 67.199.248.10, 67.199.248.11
Connecting to bit.ly|67.199.248.10|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.lxlinux.net/ [following]
0 redirections exceeded.

In the Location field, on the second-to-last line of the output, you can see the short link's true destination.
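To pull out just the destination, you can filter wget's output for the Location header. Note that wget writes its progress messages to stderr, so a live run needs 2>&1 before the pipe. The demo below runs the filter on a saved copy of the output shown above, so it works offline:

```shell
# Extract the Location header from wget output.
# A live (network-dependent) run would look like:
#   wget --max-redirect 0 "https://bit.ly/2yDyS4T" 2>&1 | grep '^Location:'
# Here we filter a saved copy of the output shown above:
printf '%s\n' \
  'HTTP request sent, awaiting response... 301 Moved Permanently' \
  'Location: http://www.lxlinux.net/ [following]' |
  grep '^Location:' | awk '{print $2}'
# prints: http://www.lxlinux.net/
```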

Finally, many friends have recently asked me for a Linux learning roadmap, so based on my own experience I spent a month of late nights in my spare time compiling an e-book. Whether you are preparing for interviews or improving yourself, I believe it will help you!

It's free for everyone; all I ask is a thumbs up!

e-book | Linux development learning roadmap

I also hope some of you will join me in making this e-book even better!

Did you gain something? Then please like, comment, and share, so that more people can read this article.
