Scraping the Shenwan Indices

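Earlier I used a crawler from a Zhihu article to scrape exchange stock data.
Here's the link: https://zhuanlan.zhihu.com/p/...
Now I want to group stocks by sector, so I'm going to scrape Shenwan's 28 sector indices.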

I'm a complete beginner at web stuff, so I'll just try following the method from the article above:

This is the request I was looking for. Clicking its Initiator in the browser's developer tools led to this code:

$.ajax({
               type: "POST",//send via POST
               dataType:"json",//data format: JSON
               url:'handler.aspx',//target address
               data:"tablename=swzs&key=L1&p="+(pageindx+1)+"&where=  L1 in('801010','801020','801030','801040','801050','801080','801110','801120','801130','801140','801150','801160','801170','801180','801200','801210','801230','801710','801720','801730','801740','801750','801760','801770','801780','801790','801880','801890')&orderby="+orderby+"&fieldlist=L1,L2,L3,L4,L5,L6,L7,L8,L11&pagecount=28&timed="+ new Date().getTime(),
   
               beforeSend:function(){$("#divload").show();$("#Pagination").hide();},//before sending the data
               complete:function(){$("#divload").hide();$("#Pagination").show();   },//after the data has been received
               success:function(json) {
       
                        $("#productTable tr:gt(0)").remove();
                        var productData = json.root;
                        if(productData!="")
                        {
                        $.each(productData, function(i, n) {
                            var trs = "";
                            /*
                            if( where.indexOf("000001")>-1)
                           { trs += "<tr><td class='t_c'   ><a href=\"#\">" + n.L1 + "</a></td><td class='t_c'>" + n.L2 + "</td><td  class='t_b' >" + n.L3 + "</td><td>" + n.L4 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L5)/1000000)   + "</td><td class='t_b'>" + n.L6 + "</td><td class='t_b'>" + n.L7 + "</td><td class='t_b'>" + n.L8 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L11)/1000000)  + "</td></tr>";}
                           else
                           { trs += "<tr><td class='t_c'   ><a href=\"idx0210.aspx?swindexcode="+ n.L1  +"\">" + n.L1 + "</a></td><td class='t_c'>" + n.L2 + "</td><td  class='t_b' >" + n.L3 + "</td><td>" + n.L4 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L5)/1000000)   + "</td><td class='t_b'>" + n.L6 + "</td><td class='t_b'>" + n.L7 + "</td><td class='t_b'>" + n.L8 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L11)/1000000)  + "</td></tr>";} 
                            */
                            trs += "<tr><td class='t_c'   ><a href=\"idx0210.aspx?swindexcode="+ n.L1  +"\">" + n.L1 + "</a></td><td class='t_c'>" + n.L2 + "</td><td  class='t_b' >" + n.L3 + "</td><td>" + n.L4 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L5)/1000000)   + "</td><td class='t_b'>" + n.L6 + "</td><td class='t_b'>" + n.L7 + "</td><td class='t_b'>" + n.L8 + "</td><td class='t_b'>" +changeTwoDecimal_f( parseFloat(n.L11)/1000000)  + "</td></tr>";
                            tbody += trs;
                        });
                 
                 }
                 else
                 {
                      tbody="<tr><td  > 查询无数据</td></tr>";//"no data found"
                 }
                        $("#productTable").append(tbody);

                        $("#productTable tr:gt(0):odd").attr("class", "odd");
                        $("#productTable tr:gt(0):even").attr("class", "enen");
                        
                        
                        $("#productTable tr:gt(0)").hover(function(){
                            $(this).addClass('mouseover');
                        },function(){
                            $(this).removeClass('mouseover');
                        });
                }});

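At first I wasn't sure what to do with this, but the snippet contains `type`, `url`, and matching comments, which made me think about how a URL is put together.
The `data` property should correspond to the URL's parameters.
The `data` string ends with `new Date().getTime()`, a millisecond timestamp. I also noticed the page had no opening price for today, so I used the market close before the holiday: 2020-04-30 15:00 Beijing time, which is 1588230000000 in milliseconds.
Putting the request together: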

url = "http://www.swsindex.com/handler.aspx?tablename=swzs&key=L1&p=1&where=  L1 in('801010','801020','801030','801040','801050','801080','801110','801120','801130','801140','801150','801160','801170','801180','801200','801210','801230','801710','801720','801730','801740','801750','801760','801770','801780','801790','801880','801890')&orderby=&fieldlist=L1,L2,L3,L4,L5,L6,L7,L8,L11&pagecount=28&timed=1588230000000"

It works!
From here, scrape the data using the method from the article above.
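The URL above was assembled by hand. As a rough sketch, the same assembly could be done in Python with only the standard library. The names `INDEX_CODES`, `to_millis`, and `build_url` are made up for illustration, and the sketch assumes the server accepts a percent-encoded query string (which is what `urlencode` produces) the same way it accepts the raw one.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE_URL = "http://www.swsindex.com/handler.aspx"

# The 28 index codes, copied from the `where` clause in the page's JavaScript.
INDEX_CODES = [
    "801010", "801020", "801030", "801040", "801050", "801080", "801110",
    "801120", "801130", "801140", "801150", "801160", "801170", "801180",
    "801200", "801210", "801230", "801710", "801720", "801730", "801740",
    "801750", "801760", "801770", "801780", "801790", "801880", "801890",
]


def to_millis(year, month, day, hour, minute=0):
    """Convert a Beijing-time (UTC+8) moment to a millisecond timestamp,
    the same unit that JavaScript's Date.getTime() returns."""
    beijing = timezone(timedelta(hours=8))
    moment = datetime(year, month, day, hour, minute, tzinfo=beijing)
    return int(moment.timestamp() * 1000)


def build_url(page, timed):
    """Build the query URL (hypothetical helper, mirrors the hand-written URL)."""
    quoted_codes = ",".join("'%s'" % code for code in INDEX_CODES)
    params = {
        "tablename": "swzs",
        "key": "L1",
        "p": page,
        "where": "  L1 in(%s)" % quoted_codes,
        "orderby": "",
        "fieldlist": "L1,L2,L3,L4,L5,L6,L7,L8,L11",
        "pagecount": len(INDEX_CODES),
        "timed": timed,
    }
    # urlencode percent-encodes quotes, commas, and parentheses.
    return BASE_URL + "?" + urlencode(params)


timed = to_millis(2020, 4, 30, 15)  # market close before the holiday
url = build_url(page=1, timed=timed)
print(timed)
```

The resulting `url` can then be passed to whichever HTTP client is in use, for example `requests.get(url)`. Keeping the codes in a list also makes it easier to query a subset of the indices later.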

------Update------
I ran into a few problems while scraping, so I worked through the regular expressions and am writing them down here.

First, extract all the rows as described above, dropping the leading 'root'

    import re
    data = re.compile(r"'root':\[(.*?)\]", re.S).findall(r.text)

Second, split the rows apart from one another

datas = data[0].split('},{')

Third, remove the leading '{' from each row (replacing it with an empty string) and extract the value of each column

for row in datas:
    stock = row.replace('{', "").split(",")
    stocks = re.compile(r":'(.*?)'", re.S).findall("".join(stock))
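Putting the three steps together, a minimal end-to-end sketch might look like the following. `sample_text` is a made-up string that only imitates the shape the regular expressions expect (single-quoted keys and values inside `'root':[...]`). Its values are placeholders, not real market data, and `parse_rows` is a hypothetical helper name.

```python
import re

# Hypothetical sample imitating the response shape implied by the regexes;
# the values are placeholders, not real index data.
sample_text = (
    "{'root':[{'L1':'801010','L2':'name-a','L3':'1.0'},"
    "{'L1':'801020','L2':'name-b','L3':'2.0'}]}"
)


def parse_rows(text):
    """Return one list of column values per row found under 'root'."""
    # Step 1: grab everything between the brackets that follow 'root'.
    body = re.compile(r"'root':\[(.*?)\]", re.S).findall(text)
    if not body:
        return []
    rows = []
    # Step 2: split the rows apart.
    for row in body[0].split("},{"):
        # Step 3: drop the leading '{', then pull out every quoted value.
        fields = row.replace("{", "").split(",")
        values = re.compile(r":'(.*?)'", re.S).findall("".join(fields))
        rows.append(values)
    return rows


rows = parse_rows(sample_text)
print(rows)
# → [['801010', 'name-a', '1.0'], ['801020', 'name-b', '2.0']]
```

One thing to watch for: because each row is split on `,` and re-joined without it, a value that itself contains a comma would come out with the comma removed.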

done
