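I'm using XPath in a Python scraper to pull image URLs out of a page, but nothing gets extracted. How should I change it?
Here is the page markup I need to extract from: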
<script type="text/x-handlebars-template" id="descTemplate">
<p>
<img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i1/678878759/TB24N1cenAKL1JjSZFCXXXFspXa_!!678878759.png" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i2/678878759/TB2YMdtclxRMKJjy0FdXXaifFXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i4/678878759/TB2NCcvcbsTMeJjy1zbXXchlVXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i1/678878759/TB2soQ.dEl7MKJjSZFDXXaOEpXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i3/678878759/TB2z6gDcgoQMeJjy0FpXXcTxpXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i2/678878759/TB2Kst_elcHL1JjSZJiXXcKcpXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i1/678878759/TB21M4tclxRMKJjy0FdXXaifFXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i1/678878759/TB20qx8eoEIL1JjSZFFXXc5kVXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i3/678878759/TB2nVADcgMPMeJjy1XbXXcwxVXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i3/678878759/TB2_6KcenAKL1JjSZFCXXXFspXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i1/678878759/TB2bICced.LL1JjSZFEXXcVmXXa_!!678878759.jpg" align="absmiddle" width="750"><img style="max-width:750.0px;" src="https://img.alicdn.com/imgextra/i2/678878759/TB2IayieoQIL1JjSZFhXXaDZFXa_!!678878759.jpg" align="absmiddle" width="750"> </p>
</script>
Here is the code I wrote:
describe_image_urls_list = selector.xpath('//*[@id="descTemplate"]/p/img/@src').extract()
if len(describe_image_urls_list) == 0:
    describe_image_urls_list = selector.xpath('//*[@id="main-con"]/div[2]/div/div[2]/p[2]/img/@src').extract()
    if len(describe_image_urls_list) == 0:
        describe_image_urls_list = selector.xpath('//*[@id="main-con"]/div[2]/div/div[2]/h1/img/@src').extract()
item["describe_url"] = describe_image_urls_list
But no matter what I try, nothing is extracted. Could someone take a look?
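I solved it with the BeautifulSoup library. An HTML parser treats everything inside a script tag as raw text rather than as child elements, so the img tags never appear in the tree and the XPath /p/img finds nothing. The fix is to take the script tag out of the way: grab its text content and parse that text again as HTML.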
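
For reference, here is a minimal sketch of that two-step idea using only the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The names ScriptTextCollector, ImgSrcCollector and extract_template_image_urls are made up for this example and are not part of any library. It assumes the page HTML is already available as a string.

```python
from html.parser import HTMLParser


class ScriptTextCollector(HTMLParser):
    """Collect the raw text found inside <script id=script_id>."""

    def __init__(self, script_id):
        super().__init__()
        self.script_id = script_id
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.script_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        # Script content arrives here as plain text, never as <img> start tags.
        if self.inside:
            self.chunks.append(data)


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def extract_template_image_urls(page_html, script_id="descTemplate"):
    # Step 1: pull the template text out of the script tag.
    outer = ScriptTextCollector(script_id)
    outer.feed(page_html)
    outer.close()

    # Step 2: parse that text again as ordinary HTML and read the img tags.
    inner = ImgSrcCollector()
    inner.feed("".join(outer.chunks))
    inner.close()
    return inner.urls
```

Calling extract_template_image_urls(page_html) should return the list of src values, which could then be assigned to item["describe_url"]. With the Scrapy selector from the question, the same approach would presumably be to read the text with selector.xpath('//script[@id="descTemplate"]/text()') and pass that string to a new Selector(text=...) before running //img/@src on it.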