
QueryList fails when scraping Amazon; scraping other sites works fine

Problem description

Fatal error: Uncaught GuzzleHttp\Exception\ConnectException: cURL error 7: Failed to connect to www.amazon.com port 443: Timed out (see http://curl.haxx.se/libcurl/c... in E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php:185 Stack trace: #0 E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(149): GuzzleHttp\Handler\CurlFactory::createRejection(Object(GuzzleHttp\Handler\EasyHandle), Array) #1 E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php(102): GuzzleHttp\Handler\CurlFactory::finishError(Object(GuzzleHttp\Handler\CurlHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory)) #2 E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\CurlHandler.php(43): GuzzleHttp\Handler\CurlFactory::finish(Object(GuzzleHttp\Handler\CurlHandler), Object(GuzzleHttp\Handler\EasyHandle), Object(GuzzleHttp\Handler\CurlFactory)) #3 E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\Proxy.php(28): GuzzleH in E:\test18\tp5\php_rurl\vendor\guzzlehttp\guzzle\src\Handler\CurlFactory.php on line 185

Environment background and what I have tried

Developed in plain PHP.

Relevant code

// Please paste your code below as text (do not use a screenshot)
<?php
require 'vendor/autoload.php';

use QL\QueryList;

$data = QueryList::get('https://www.amazon.com/s?k=yo...')
    ->find('h1>div>div>div>div>span')
    ->texts();
print_r($data->all());

What result do you expect? What error message do you actually see?

Find the cause and fix it.

1 Answer

A failure to connect on port 443 can have many causes. First, try `echo file_get_contents("https://www.amazon.com/s?k=yoga&ref=nb_sb_noss");` — does that output the HTML source?
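If `file_get_contents` also hangs, the raw curl extension can help narrow down where cURL error 7 actually comes from — a DNS failure versus the TCP connection itself. A minimal diagnostic sketch (the URL is taken from the question; the timeout values are arbitrary):

```php
<?php
// Probe the connection directly with the curl extension to see
// which phase of the request fails.
$ch = curl_init('https://www.amazon.com/s?k=yoga');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CONNECTTIMEOUT => 10,   // fail fast on the TCP handshake
    CURLOPT_TIMEOUT        => 30,   // overall time limit
    CURLOPT_NOBODY         => true, // HEAD-style probe, no body needed
]);
curl_exec($ch);

// curl_getinfo() reports per-phase timings: a connect_time stuck at 0
// while namelookup_time succeeded means DNS worked but the TCP
// connection never completed.
$info = curl_getinfo($ch);
printf("errno=%d error=%s\n", curl_errno($ch), curl_error($ch));
printf("dns=%.3fs connect=%.3fs http=%d\n",
    $info['namelookup_time'], $info['connect_time'], $info['http_code']);
curl_close($ch);
```

A connection that never gets past the TCP handshake usually points at a firewall or the need for a proxy rather than anything QueryList does; in that case Guzzle's `proxy` request option is the relevant knob.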

I suggest using some headers to disguise where the request is coming from. First, prepare a helper method:

    /**
     * Return a header array that identifies the client as a crawler bot
     * @return array
     */
    private function getRobotHeader()
    {
        return [
            'headers' => [
                'Referer'    => 'http://www.baidu.com',
                'User-Agent' => 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
            ]
        ];
    }

Then pass it as the third argument of your get() call, like this:
    QueryList::get($url, ['timeout' => 30], $this->getRobotHeader())

or

    QueryList::get($url, [], $this->getRobotHeader())
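Putting the suggestions together, a minimal standalone sketch (the proxy address is a placeholder assumption; QueryList forwards the third argument to Guzzle as request options, so the timeout and headers can be merged into one array):

```php
<?php
require 'vendor/autoload.php';

use QL\QueryList;

// Hypothetical combined example: the bot headers from above plus a
// timeout, merged into a single Guzzle request-options array that is
// passed as get()'s third argument.
$options = [
    'timeout' => 30,
    'headers' => [
        'Referer'    => 'http://www.baidu.com',
        'User-Agent' => 'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
    ],
    // If the host is only reachable through a proxy, uncomment and
    // adjust (placeholder address):
    // 'proxy' => 'http://127.0.0.1:1080',
];

$data = QueryList::get('https://www.amazon.com/s?k=yoga', [], $options)
    ->find('h1>div>div>div>div>span')
    ->texts();
print_r($data->all());
```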
