BeautifulSoup: .find(text=True) cannot find the text inside a table

.find(text=True) has no effect on some of the text in the table. Here is my code:

import urllib
import urllib2
import cookielib
import re
import csv
import codecs
from bs4 import BeautifulSoup

listmain = 'http://gdemba.gicp.net:84/ListMain.asp'
header = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(listmain,headers=header)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find(id='Table11')
f = open('table.csv', 'w')
csv_writer = csv.writer(f)
td = re.compile('td')

client = ""
tag = ""
tel = ""
catalogue = ""
region = ""
client_type = ""
email = ""
creater = ""
department = ""
action = ""

for row in table.find_all("tr"):
    cells = row.find_all("td")
    if len(cells) == 10:
        client = cells[0].find(text=True)
        tag = cells[1].find(text=True)
        tel = cells[2].find(text=True)
        catalogue = cells[3].find(text=True)
        region = cells[4].find(text=True)
        client_type = cells[5].find(text=True)
        email = cells[6].find(text=True)
        creater = cells[7].find(text=True)
        department = cells[8].find(text=True)
        action = cells[9].find(text=True)

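        # Write the row only when all 10 cells were found, so that header or
        # malformed rows do not repeat the previous row's values.
        csv_writer.writerow([x.encode('utf-8') for x in [client, tag, tel, catalogue, region, client_type, email, creater, department, action]])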

f.close()

One of the <tr> rows to process looks like this:

<tr class="ListTableRow" id="Row0" onclick="javascript:setRowFocus(this,false,0);FirstDataFormat('0000008688')" ondblclick="viewcoinfo('interunit','0000008688','{A31618B2-90CC-456F-A2E7-4C5B0D577E25}')">
<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>
<td id="0000008688sign" nowrap=""> 福田</td>
<td nowrap=""> 0755-66666666</td>
<td nowrap=""> 手机配件</td>
<td nowrap=""> 深圳市</td>
<td nowrap=""> 普通客户</td>
<td nowrap=""> <span class="BlueText" onclick="javascript:EmailTo('0000008688','123456@qq.com')" onmouseout="javascript:this.style.textDecoration=''" onmouseover="javascript:this.style.textDecoration='underline'">123456@qq.com</span></td>
<td nowrap=""> 信息资源部</td>
<td nowrap=""> 信息资源部</td>
<td height="16" nowrap="" style="width: 78px"> </td>
</tr>

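However, the text inside the two <td> cells for the client name and the Email cannot be extracted (a screenshot followed here in the original post).

What is the cause? Is it related to the <span> tags?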

2 answers

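You don't need cells[0].find(text=True); just use cells[0].text. find(text=True) returns only the first text node of the cell, and in the client-name and Email cells that first node is just the whitespace in front of the <span>, while .text joins all the text inside the cell.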
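
A minimal sketch of that difference, assuming Python 3, the bs4 package and the built-in "html.parser" (none of which the question states). The two cells are simplified copies of the sample row, and the variable names are only illustrative:

```python
from bs4 import BeautifulSoup

# Two cells copied (simplified) from the sample <tr> in the question.
html = (
    '<table><tr>'
    '<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>'
    '<td nowrap=""> 福田</td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")
name_cell, tag_cell = soup.find_all("td")

# find(text=True) stops at the FIRST text node. In name_cell that node is
# the whitespace before the <span>, not the client name.
# (Beautiful Soup 4.4.0 and later call this argument `string`.)
first_node = name_cell.find(text=True)

# get_text() concatenates every text node in the cell; strip=True trims
# the surrounding whitespace.
full_text = name_cell.get_text(strip=True)

# In tag_cell the first text node already contains the value, which is
# why find(text=True) appeared to work for most columns.
tag_first_node = tag_cell.find(text=True)

print(repr(first_node))
print(full_text)
print(tag_first_node.strip())
```

This would explain why only the columns that contain a <span> came back empty in the question.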

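Because "深圳营业部" is not inside the <span> but directly inside the <td>, you can simply call td.get_text() (td here just stands for that tag).
Based on your source, this is all you need: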

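# Select every <td> inside a <tr> whose class is "ListTableRow".
results = soup.select('tr[class="ListTableRow"] td')
for tag in results:
    # get_text() returns all the text inside the cell, including text in nested tags.
    print(tag.get_text())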

Something along those lines.
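
To keep the cells grouped by row for the CSV export, the same idea can be applied per <tr>. This is only a sketch, assuming Python 3, bs4 and the built-in "html.parser". The HTML is a shortened, hypothetical version of the sample row (three cells instead of ten), and an in-memory buffer stands in for table.csv:

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened stand-in for the page: one data row with three of the ten cells.
html = """
<table id="Table11">
<tr class="ListTableRow" id="Row0">
<td nowrap=""> <span id="spanshare0000008688"></span>深圳营业部</td>
<td nowrap=""> 福田</td>
<td nowrap=""> <span class="BlueText">123456@qq.com</span></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("tr.ListTableRow"):
    # get_text(strip=True) collects all text in the cell, whether it sits
    # directly in the <td> or inside a nested <span>.
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

In Python 3 the csv module writes text directly, so the per-field .encode('utf-8') from the question is not needed; the encoding is chosen when the output file is opened.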
