PHP cURL: garbled Chinese text when fetching web page content

kurisu_

Fetching the page works fine, but there seems to be a character-encoding problem:

<?php

//header( "Content-type:text/html;Charset=utf-8" );
$urls = [
    'http://jobs.51job.com/'
];

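// Note: Content-Type here is a request header; it does not change
// the encoding of the page the server sends back
$array = [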
//    'user-agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36;'
//    'accept-language:zh-CN,zh;q=0.8,zh-TW;q=0.6;
    'Content-Type:text/html; charset=utf-8'
];


var_dump($urls);
foreach ($urls as $url) {

    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // boolean; the redirect limit is CURLOPT_MAXREDIRS
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_TIMEOUT => 30,
        CURLOPT_ENCODING => 'gzip,deflate',
        CURLOPT_HTTPHEADER => $array
    ]);

    $output = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);

    var_dump($info);
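    // mb_convert_encoding() returns the converted string; it does not modify $output in place
    $output = mb_convert_encoding($output, 'utf-8', 'GBK,UTF-8,ASCII');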
    echo $output;

//    file_put_contents('str.txt' , $output,FILE_APPEND);
}
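Rather than hard-coding a source encoding, one option is to use the charset the page itself declares, first in the HTTP `Content-Type` response header and then in a `<meta>` tag. The following is a minimal sketch under that assumption; `detect_charset`, `to_utf8` and `fetch_utf8` are hypothetical helper names, not part of PHP or any library, and the code assumes the declared charset is one mbstring recognizes.

```php
<?php
// Hypothetical helpers (not a library API): find the charset a page
// declares and convert the body to UTF-8 before echoing or saving it.

function detect_charset(?string $contentType, string $html): ?string
{
    // 1. Charset from the HTTP header, e.g. "text/html; charset=gbk"
    if ($contentType !== null
        && preg_match('/charset=["\']?([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. Charset from a <meta> tag near the top of the document
    if (preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', substr($html, 0, 4096), $m)) {
        return strtoupper($m[1]);
    }
    return null; // nothing declared
}

function to_utf8(string $html, ?string $contentType): string
{
    // Assumption: treat the page as UTF-8 when it declares nothing
    $charset = detect_charset($contentType, $html) ?? 'UTF-8';
    // GBK is a superset of GB2312, so decoding as GBK also covers GB2312 pages
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    if ($charset === 'UTF-8') {
        return $html;
    }
    // The converted string is returned; $html itself is not modified
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

function fetch_utf8(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_TIMEOUT => 30,
        CURLOPT_ENCODING => '', // accept any compression curl supports
    ]);
    $body = curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    $error = curl_error($ch);
    curl_close($ch);
    if ($body === false) {
        throw new RuntimeException("Request failed for $url: $error");
    }
    return to_utf8($body, is_string($contentType) ? $contentType : null);
}
```

Usage would look like `echo fetch_utf8('http://jobs.51job.com/');`. The header is checked before the `<meta>` tag because an explicit HTTP charset normally takes precedence over one declared in the document.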

Also, a side question: when I fetch pages from Lagou, the response only ever shows "页面加载中..." ("page loading..."):

<br><html><head><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="renderer" content="webkit"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><script type="text/javascript" src="https://www.lagou.com/utrack/trackMid.js?version=1.0.0.3&t=1503291026"></script><body><input type="hidden" id="KEY" value="rsagIwk3yl2hnrkI98FuQACf9eerWodYa0dPJ"/><script type="text/javascript">kfGNYOsx();</script>页面加载中...<script type="text/javascript" src="https://www.lagou.com/upload/oss.js"></script></body></html>
2 answers

51job pages are encoded in gb2312, so just convert the content after fetching it:


mb_convert_encoding($contents,'utf-8','gb2312');

iconv('gbk','utf-8//IGNORE', $content);
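Both one-liners above only help if their return value is kept; neither function changes its argument. A small offline sketch, assuming the page bytes are GBK (the two bytes pairs below are "中文" in GBK/GB2312):

```php
<?php
// "中文" encoded as GBK bytes, as a GBK/GB2312 page would send it
$gbk = "\xD6\xD0\xCE\xC4";

// Wrong: the result is discarded, so $gbk still holds GBK bytes
mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// Right: keep the converted string
$utf8 = mb_convert_encoding($gbk, 'UTF-8', 'GBK');

// iconv equivalent; //IGNORE drops bytes that cannot be converted
$utf8Iconv = iconv('GBK', 'UTF-8//IGNORE', $gbk);

echo $utf8, PHP_EOL;
```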
