Python scraping: how do I traverse markup like this and build the full path for each file?

<li\><span class\="file"\>favicon.ico</span\></li\>  
<li class\="expandable"\>  
    <div class\="hitarea expandable-hitarea"\></div\>  
    <span class\="folder"\>banner-ads</span\>  
    <ul style\="display: block;"\>  
        <li\><span class\="img"\>ad01.png</span\></li\>  
        <li\><span class\="img"\>ad02.png</span\></li\>  
        <li\><span class\="img"\>ad03.png</span\></li\>  
        <li\><span class\="img"\>ad04.png</span\></li\>  
        <li\><span class\="img"\>ad06.jpg</span\></li\>  
    </ul\>  
</li\>  
<li class\="expandable"\>  
    <div class\="hitarea expandable-hitarea"\></div\>  
    <span class\="folder"\>logos</span\>  
    <ul style\="display: block;"\>  
        <li\><span class\="img"\>logo-light.png</span\></li\>  
        <li\><span class\="img"\>logo.png</span\></li\>  
    </ul\>  
</li\>  
<li class\="expandable"\>  
    <div class\="hitarea expandable-hitarea"\></div\>  
    <span class\="folder"\>news</span\>  
    <ul style\="display: block;"\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>category</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>category1.png</span\></li\>  
                <li\><span class\="img"\>category2.png</span\></li\>  
                <li\><span class\="img"\>category3.png</span\></li\>  
                <li\><span class\="img"\>category4.png</span\></li\>  
                <li\><span class\="img"\>category5.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>fashion</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>image1.png</span\></li\>  
                <li\><span class\="img"\>image2.png</span\></li\>  
                <li\><span class\="img"\>image3.png</span\></li\>  
                <li\><span class\="img"\>image4.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>food</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>food01.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>health</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>image1.png</span\></li\>  
                <li\><span class\="img"\>image2.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>lifestyle</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>image1.jpg</span\></li\>  
                <li\><span class\="img"\>image2.png</span\></li\>  
                <li\><span class\="img"\>image3.png</span\></li\>  
                <li\><span class\="img"\>image4.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>news-details</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>large-image.jpg</span\></li\>  
                <li\><span class\="img"\>left-image.jpg</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>sports</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>sports02.png</span\></li\>  
                <li\><span class\="img"\>sports03.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>tech</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>image5.png</span\></li\>  
                <li\><span class\="img"\>tech02.png</span\></li\>  
                <li\><span class\="img"\>tech1.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>travel</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>image1.png</span\></li\>  
                <li\><span class\="img"\>image2.png</span\></li\>  
                <li\><span class\="img"\>image3.png</span\></li\>  
            </ul\>  
        </li\>  
        <li class\="expandable"\>  
            <div class\="hitarea expandable-hitarea"\></div\>  
            <span class\="folder"\>video</span\>  
            <ul style\="display: block;"\>  
                <li\><span class\="img"\>video1.jpg</span\></li\>  
                <li\><span class\="img"\>video2.jpg</span\></li\>  
                <li\><span class\="img"\>video3.jpg</span\></li\>  
                <li\><span class\="img"\>video4.jpg</span\></li\>  
            </ul\>  
        </li\>  
        <li\><span class\="img"\>author.png</span\></li\>  
        <li\><span class\="img"\>user1.png</span\></li\>  
        <li\><span class\="img"\>user2.png</span\></li\>  
    </ul\>  
</li\>  
<li\><span class\="img"\>controls.png</span\></li\>  


The final output should look like this:
favicon.ico
banner-ads/ad01.png
news/category/category1.png
news/category/category2.png

1 answer

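Strip the backslashes and what remains is well-formed XML, so you can solve this by parsing it as XML and walking it recursively. XPath has no particularly convenient way to recurse, so traversing the XML tree by hand is the simpler route.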
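
A minimal sketch of that approach, using only the standard library's xml.etree.ElementTree. The function name collect_paths and the wrapping <ul> root are my own assumptions, not something from the question: the fragment has several top-level <li> elements, so it needs a single root before the XML parser will accept it. The sketch also assumes every folder is marked with class="folder" and keeps its children in a direct <ul> child, as in the posted markup.

```python
import xml.etree.ElementTree as ET


def collect_paths(markup):
    """Return the slash-joined path of every file in the escaped <li> tree."""
    # Remove the backslash escapes, then wrap the fragment in one root element
    # (assumption: the fragment itself has no single enclosing element).
    cleaned = markup.replace("\\", "")
    root = ET.fromstring("<ul>" + cleaned + "</ul>")
    paths = []

    def walk(ul, prefix):
        # Only look at direct <li> children so nesting levels stay separate.
        for li in ul.findall("li"):
            span = li.find("span")
            if span is None or span.text is None:
                continue
            name = span.text.strip()
            if span.get("class") == "folder":
                child = li.find("ul")
                if child is not None:
                    # Recurse with the folder name appended to the prefix.
                    walk(child, prefix + name + "/")
            else:
                paths.append(prefix + name)

    walk(root, "")
    return paths


if __name__ == "__main__":
    sample = r'''<li\><span class\="file"\>favicon.ico</span\></li\>
<li class\="expandable"\>
    <div class\="hitarea expandable-hitarea"\></div\>
    <span class\="folder"\>logos</span\>
    <ul style\="display: block;"\>
        <li\><span class\="img"\>logo.png</span\></li\>
    </ul\>
</li\>'''
    for path in collect_paths(sample):
        print(path)
```

If the real page is not well-formed once the backslashes are gone (unclosed tags, bare & characters), ET.fromstring will raise ParseError, and an HTML parser would be the safer choice.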
