A Simple Guide to Using HttpClient 4.x

I have been using HttpClient 4 to fetch pages by URL for some time. So how is HttpClient used? Without further ado, here is the code.

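import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HTTPUtils {

    // Socket (read) timeout and connect timeout, both 5000 ms.
    private static final RequestConfig requestConfig = RequestConfig.custom()
            .setSocketTimeout(5000).setConnectTimeout(5000).build();

    /**
     * Fetches the page at the given URL and returns its body as a string.
     *
     * @param url the URL of the page to fetch
     * @return the page content, decoded as GB18030 unless the response declares a charset
     * @throws IOException if the request fails or times out
     */
    public static String getHTML(String url) throws IOException {
        HttpGet request = new HttpGet(url);
        request.setConfig(requestConfig);
        // try-with-resources closes the client and the response even if an exception is thrown.
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(request)) {
            HttpEntity entity = response.getEntity();
            // GB18030 is only the fallback; a charset declared in Content-Type takes precedence.
            return EntityUtils.toString(entity, "GB18030");
        }
    }
}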
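The key point of this code is the definition of requestConfig. Without timeouts, a batch job that fetches a large number of pages can hang indefinitely while waiting on an unresponsive server. This is a serious problem because a stuck job needs manual intervention, so timeouts are set to bound the wait. When fetching an HTML page you also need to specify the page encoding; otherwise EntityUtils falls back to ISO_8859_1 when the response does not declare a charset.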
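To see why the fallback encoding matters, here is a minimal sketch that uses only the JDK and no HttpClient classes. The class CharsetFallbackDemo and its decode method are hypothetical names made up for this illustration. The example assumes a page whose bytes are GB18030-encoded and that arrives without a declared charset, and it compares decoding those bytes as GB18030 with decoding them as ISO_8859_1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class (not part of HttpClient): shows the effect of the fallback charset.
final class CharsetFallbackDemo {

    // Decodes raw response bytes with the given charset, roughly what happens
    // when a response body is turned into a String.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        Charset gb18030 = Charset.forName("GB18030");
        // "\u4e2d\u6587" is the two-character word for "Chinese", written with
        // Unicode escapes so the source file compiles under any source encoding.
        byte[] body = "\u4e2d\u6587".getBytes(gb18030);

        // Decoding with the charset the page was written in restores the text.
        System.out.println(decode(body, gb18030));
        // Decoding as ISO_8859_1 maps every byte to its own character, giving mojibake.
        System.out.println(decode(body, StandardCharsets.ISO_8859_1));
    }
}
```

The two characters occupy four bytes in GB18030, so the ISO_8859_1 result is four unrelated Latin characters. That is the kind of garbled output you would get from a Chinese page if no suitable fallback charset were passed.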

iceworldvip