Preface
Recently, I wrote a crawler and used it to collect 10,000 pictures related to Mars exploration from NASA's website.
Hmm, no big deal, no big deal.
After I finished, I was a little excited, so I wrote this article. It covers the following:
- Why should I crawl NASA pictures
- How do I crawl NASA pictures (super detailed)
- What did I get (high-definition large image)
- What secret did I find (Super Madden)
Why should I crawl NASA pictures
I'm over 35 and shivering at the thought of being laid off.
Every day I wonder what I would do if I lost my job someday. One idea is to run a self-media channel and explain things in plain language every day: historical mysteries, the mysteries of the universe, or NASA.
NASA runs a variety of space exploration missions and publishes related articles, interviews, pictures, and videos. It is a rare resource library.
How do I crawl NASA pictures (super detailed)
NASA's website is publicly accessible, and the address is
https://www.nasa.gov/
After opening it, the homepage looks like this, and you can see all kinds of content. There is also a search box in the upper right corner, where we enter Mars.
After a moment, the results for Mars are displayed, including an item named Mars Exploration.
Opening it took me to a new page, where I found Images and reached the target page for our crawl:
https://www.nasa.gov/mission_pages/mars/images/index.html
Scroll down the page and you will see a big button labeled MORE IMAGES. Click it and you will find:
The page content is not delivered with the page itself; it is rendered asynchronously after an API request.
Press F12 to open the browser developer tools, repeat the steps above, and watch the request information. You will see the following:
This URL looks very important. Let's look at the request address first:
https://www.nasa.gov/api/2/ubernode/_search?size=24&from=24&sort=promo-date-time%3Adesc&q=((ubernode-type%3Aimage)%20AND%20(topics%3A3152))&_source_include=promo-date-time%2Cmaster-image%2Cnid%2Ctitle%2Ctopics%2Cmissions%2Ccollections%2Cother-tags%2Cubernode-type%2Cprimary-tag%2Csecondary-tag%2Ccardfeed-title%2Ctype%2Ccollection-asset-link%2Clink-or-attachment%2Cpr-leader-sentence%2Cimage-feature-caption%2Cattachments%2Curi
Pay attention to these parameters:
size=24&from=24
Clearly, size is the number of pictures requested each time. Testing shows that from is the starting offset of the query, so changing it returns other pages of results.
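To make the paging concrete, here is a minimal Dart sketch of how from can be stepped while size stays fixed. The helper name pageQuery is hypothetical and not part of NASA's API; it only builds the two query parameters discussed above.

```dart
// Hypothetical helper: builds the paging part of the query string.
// `size` is the page size and `from` is the zero-based start offset.
String pageQuery(int page, {int size = 24}) {
  final from = page * size;
  return 'size=$size&from=$from';
}

void main() {
  // Print the paging parameters for the first three pages.
  for (var page = 0; page < 3; page++) {
    print(pageQuery(page));
  }
}
```

Page 1 yields size=24&from=24, which matches the request captured in the browser.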
Let's take a look at its return information:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 659,
"max_score": null,
"hits": [{
"_index": "nasa-public",
"_type": "ubernode",
"_id": "450040",
"_score": null,
"_source": {
"image-feature-caption": "Mars 2020 rover underwent an eye exam after several cameras were installed on the rover. ",
"topics": ["3140", "3152"],
"nid": "450040",
"title": "NASA 'Optometrists' Verify Mars 2020 Rover's 20/20 Vision",
"type": "ubernode",
"uri": "/image-feature/jpl/nasa-optometrists-verify-mars-2020-rovers-2020-vision",
"collections": ["4525", "5246"],
"link-or-attachment": "link",
"missions": ["6336"],
"primary-tag": "6336",
"cardfeed-title": "NASA 'Optometrists' Verify Mars 2020 Rover's 20/20 Vision",
"promo-date-time": "2019-08-05T17:49:00-04:00",
"secondary-tag": "3140",
"master-image": {
"fid": "603128",
"alt": "Engineers test cameras on the top of the Mars 2020 rover’s mast and front chassis. ",
"width": "1600",
"id": "603128",
"title": "Engineers test cameras on the top of the Mars 2020 rover’s mast and front chassis. ",
"uri": "public://thumbnails/image/pia23314-16.jpg",
"height": "900"
},
"ubernode-type": "image"
},
"sort": [1565041740000]
}, {
"_index": "nasa-public",
"_type": "ubernode",
"_id": "433172",
"_score": null,
"_source": {
"image-feature-caption": "NASA still hasn't heard from the Opportunity rover, but at least we can see it again.",
"topics": ["3152"],
"nid": "433172",
"title": "Opportunity Emerges in a Dusty Picture",
"type": "ubernode",
"uri": "/image-feature/opportunity-emerges-in-a-dusty-picture",
"collections": ["7628"],
"link-or-attachment": "link",
"missions": ["3639"],
"primary-tag": "3152",
"cardfeed-title": "Opportunity Emerges in a Dusty Picture",
"promo-date-time": "2018-09-26T12:39:00-04:00",
"secondary-tag": "7628",
"master-image": {
"fid": "584263",
"alt": "NASA's Opportunity rover appears as a blip in the center of this square",
"width": "1400",
"id": "584263",
"title": "NASA's Opportunity rover appears as a blip in the center of this square",
"uri": "public://thumbnails/image/pia22549-16.jpg",
"height": "788"
},
"ubernode-type": "image"
},
"sort": [1537979940000]
}]
}
}
The JSON above is too long, so I deleted the repeated entries. In fact, the hits array has as many entries as there are pictures displayed on the page. It can basically be concluded that the information on the page comes from this array.
Further comparison shows that the master-image field holds the information we need, including the picture address, picture size, and picture title.
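As a rough illustration of reading those fields, the sketch below decodes a trimmed copy of the first hit shown above and pulls out the master-image values. The function name extractImages and the sample constant are made up for this example; only the field names come from the response.

```dart
import 'dart:convert';

// A trimmed copy of the first hit from the response shown above.
const sample = r'''
{"hits": {"hits": [{"_id": "450040", "_source": {"master-image": {
  "fid": "603128", "width": "1600", "height": "900",
  "uri": "public://thumbnails/image/pia23314-16.jpg"}}}]}}
''';

// Hypothetical helper: returns address and size for every hit in the body.
List<Map<String, String>> extractImages(String body) {
  final decoded = jsonDecode(body) as Map<String, dynamic>;
  final hits = decoded['hits']['hits'] as List<dynamic>;
  return [
    for (final hit in hits)
      {
        'uri': hit['_source']['master-image']['uri'].toString(),
        'width': hit['_source']['master-image']['width'].toString(),
        'height': hit['_source']['master-image']['height'].toString(),
      }
  ];
}

void main() {
  for (final image in extractImages(sample)) {
    print('${image['uri']} (${image['width']}x${image['height']})');
  }
}
```

Note that width and height arrive as strings in the response, so they would need int.parse before any arithmetic.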
Here is the code. It works in three steps: assemble the request URL, fetch the content, and download the pictures.
I used the Dart language; feel free to use whichever language you like.
import 'dart:convert';
import 'package:dio/dio.dart';
main() async {
// Each page holds a fixed 24 items; only the starting offset changes
for (int from = 0; from < 24 * 100; from = from + 24) {
await getPage(from);
}
}
// Fetch the information for one page and download its pictures
Future<void> getPage(int from) async {
String url = 'https://www.nasa.gov/api/2/ubernode/_search?size=24&from=' +
from.toString() +
'&sort=promo-date-time%3Adesc&q=((ubernode-type%3Aimage)%20AND%20(topics%3A3152))&_source_include=promo-date-time%2Cmaster-image%2Cnid%2Ctitle%2Ctopics%2Cmissions%2Ccollections%2Cother-tags%2Cubernode-type%2Cprimary-tag%2Csecondary-tag%2Ccardfeed-title%2Ctype%2Ccollection-asset-link%2Clink-or-attachment%2Cpr-leader-sentence%2Cimage-feature-caption%2Cattachments%2Curi';
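// Fetch the content
var res = await Dio().get(url);
var map = jsonDecode(res.toString());
// Use a for-in loop so each download is awaited in turn;
// forEach with an async callback would not wait for the downloads.
for (final element in map['hits']['hits'] as List<dynamic>) {
Uri fileUri = Uri.parse(getUri(element));
String savePath = getSavePath(element);
await Dio().downloadUri(fileUri, savePath);
// '已下载' means 'downloaded'
print('已下载: ' + savePath);
}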
}
// Build the download address of the picture
String getUri(dynamic element) {
String uri = element['_source']['master-image']['uri'].toString();
uri = uri.replaceAll('public://',
'https://www.nasa.gov/sites/default/files/styles/full_width_feature/public/');
return uri;
}
// Process the information and return the path where the picture is saved
String getSavePath(dynamic element) {
String id = element['_id'];
String fid = element['_source']['master-image']['fid'].toString();
String title = element['_source']['master-image']['title'].toString();
String uri = element['_source']['master-image']['uri'].toString();
String savePath =
id + '_' + fid + '_' + title.trim() + '.' + uri.split('.').last;
savePath = savePath.replaceAll('/', '');
savePath = savePath.replaceAll('\\', '');
savePath = savePath.replaceAll('"', '');
savePath = 'images/' + savePath;
return savePath;
}
The code above is quite simple, and experienced readers should understand it at a glance.
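The loop in main always requests 100 pages. As one possible refinement, and assuming that hits.total in the response (659 in the sample shown earlier) is the number of matching pictures, the offsets could be derived from that total instead. The helper pageOffsets below is hypothetical and only shows the arithmetic.

```dart
// Hypothetical helper: lists every `from` offset needed to cover `total` hits.
List<int> pageOffsets(int total, {int size = 24}) {
  return [for (var from = 0; from < total; from += size) from];
}

void main() {
  // 659 is the hits.total value from the sample response.
  final offsets = pageOffsets(659);
  print(offsets.length); // prints 28
  print(offsets.last); // prints 648
}
```

With this, the crawler would stop after the last page that still contains results, rather than at a fixed page count.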
Let's run it. Each line of output starts with 已下载, which means "downloaded":
已下载: images/470436_643588_This is the third color image taken by NASA’s Ingenuity helicopter.jpg
已下载: images/470435_643587_This is the second color image taken by NASA’s Ingenuity helicopter.jpg
已下载: images/468546_639327_This is the first high-resolution, color image to be sent back by the Hazard Cameras (Hazcams).jpg
已下载: images/452007_605784_Danielson Crater on Mars.jpg
已下载: images/458478_615132_Gullies on Mars.jpg
已下载: images/469416_641582_A field of sand dunes occupies this frosty 5-kilometer diameter crater in the high-latitudes of the northern plains of Mars..jpeg
已下载: images/458075_614251_Mars 2020 With Sample Tubes (Artist's Concept).jpg
已下载: images/470381_643473_CME.jpg
已下载: images/458813_615896_Mars.jpg
已下载: images/467026_635309_Illustration of NASA’s Perseverance rover begins its descent through the Martian atmosphere.jpg
已下载: images/470438_643591_This black and white image was taken by NASA’s Ingenuity helicopter during its third flight on April 25, 2021.jpg
已下载: images/465488_631398_Cliffs in Ancient Ice on Mars.jpg
已下载: images/463659_626874_Avalanche on Mars.jpg
已下载: images/470251_643164_This image from NASA’s Perseverance rover shows the agency’s Ingenuity Mars Helicopter right after it successfully completed a high-speed spin-up test..jpeg
已下载: images/468636_639726_Mars' Jezero Crater.jpg
What did I get
These pictures
And these
Both the pictures and their captions are there. I guess that is enough to look at for a month.
What secret did I find
This picture is my favorite. One is so clear and the other is so muddy. Why is that? The Martian crack generator?
Well, the real secret is:
NASA has no anti-crawling measures. If you don't believe it, try it...