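1. I have a list of Toutiao (今日头条) image URLs. Toutiao serves images through a CDN, so URLs with the same id return the same content and only the CDN host differs. How can I use a regex to detect this special kind of duplicate and keep just one URL per id? I have worked on this for a long time without solving it.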
['http:\\/\\/p3.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0', 'http:\\/\\/pb1.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7800060aed2ccfa0cc","width":640,"url_list":[{"url":"http:\\/\\/p3.pstatp.com\\/origin\\/1b7800060aed2ccfa0cc"},{"url":"http:\\/\\/pb9.pstatp.com\\/origin\\/1b7800060aed2ccfa0cc"},{"url":"http:\\/\\/pb1.pstatp.com\\/origin\\/1b7800060aed2ccfa0cc"}],"uri":"origin\\/1b7800060aed2ccfa0cc","height":917},{"url":"http:\\/\\/p3.pstatp.com\\/origin\\/1b7d0003099985f45ee3', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7d0003099985f45ee3', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7d0003099985f45ee3', 'http:\\/\\/pb1.pstatp.com\\/origin\\/1b7d0003099985f45ee3', 'http:\\/\\/p1.pstatp.com\\/origin\\/1b7c000309f203688954', 'http:\\/\\/p1.pstatp.com\\/origin\\/1b7c000309f203688954', 'http:\\/\\/pb3.pstatp.com\\/origin\\/1b7c000309f203688954', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7c000309f203688954', 'http:\\/\\/p1.pstatp.com\\/origin\\/1b7800060af42554fb15', 'http:\\/\\/p1.pstatp.com\\/origin\\/1b7800060af42554fb15', 'http:\\/\\/pb3.pstatp.com\\/origin\\/1b7800060af42554fb15', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7800060af42554fb15', 
'http:\\/\\/p3.pstatp.com\\/origin\\/1b7c000309fad41441ae","width":640,"url_list":[{"url":"http:\\/\\/p3.pstatp.com\\/origin\\/1b7c000309fad41441ae"},{"url":"http:\\/\\/pb9.pstatp.com\\/origin\\/1b7c000309fad41441ae"},{"url":"http:\\/\\/pb1.pstatp.com\\/origin\\/1b7c000309fad41441ae"}],"uri":"origin\\/1b7c000309fad41441ae","height":917},{"url":"http:\\/\\/p1.pstatp.com\\/origin\\/1b7d000309a67b996cfd","width":640,"url_list":[{"url":"http:\\/\\/p1.pstatp.com\\/origin\\/1b7d000309a67b996cfd"},{"url":"http:\\/\\/pb3.pstatp.com\\/origin\\/1b7d000309a67b996cfd"},{"url":"http:\\/\\/pb9.pstatp.com\\/origin\\/1b7d000309a67b996cfd"}],"uri":"origin\\/1b7d000309a67b996cfd","height":917},{"url":"http:\\/\\/p3.pstatp.com\\/origin\\/1b7c00030a00854feda6', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7c00030a00854feda6', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7c00030a00854feda6', 'http:\\/\\/pb1.pstatp.com\\/origin\\/1b7c00030a00854feda6', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7d000309aa72ca8132', 'http:\\/\\/p3.pstatp.com\\/origin\\/1b7d000309aa72ca8132', 'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7d000309aa72ca8132', 'http:\\/\\/pb1.pstatp.com\\/origin\\/1b7d000309aa72ca8132']
set(['http:\/\/p9.pstatp.com\/origin\/1b7b000317e8e6eae3e0']
The time complexity may be a bit high, but it can be optimized.
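One possible way to do this in a single pass is sketched below. It assumes, based only on the sample data above, that the id is the lowercase hex string after `origin\/` and that some list items contain several URLs embedded in leftover JSON. The names `URL_PATTERN`, `ID_PATTERN`, and `dedupe_by_id` are made up for illustration.

```python
import re

# Assumption: a URL looks like http:\/\/<host>\/origin\/<hex id>,
# where each slash may be preceded by JSON-style backslash escapes.
URL_PATTERN = re.compile(r'http:\\*/\\*/[\w.]+\\*/origin\\*/[0-9a-f]+')
# Captures only the id part, ignoring the CDN host.
ID_PATTERN = re.compile(r'origin\\*/([0-9a-f]+)')


def dedupe_by_id(items):
    """Return one URL per image id, keeping the first one seen."""
    first_url_by_id = {}
    for item in items:
        # An item may hold several URLs, so pull out all of them.
        for url in URL_PATTERN.findall(item):
            image_id = ID_PATTERN.search(url).group(1)
            # setdefault keeps the existing URL if the id was already seen.
            first_url_by_id.setdefault(image_id, url)
    return list(first_url_by_id.values())


if __name__ == "__main__":
    sample = [
        'http:\\/\\/p3.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0',
        'http:\\/\\/pb9.pstatp.com\\/origin\\/1b7b000317e8e6eae3e0',
        'http:\\/\\/p1.pstatp.com\\/origin\\/1b7c000309f203688954',
    ]
    for url in dedupe_by_id(sample):
        print(url)
```

Because the dictionary is keyed by id, each URL is looked at once, so the cost grows linearly with the input instead of comparing every pair of URLs.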