python爬虫json解析的问题

用python抓取数据的时候,从网页上提取了一条这样的 json 字符串 str

{\"count\":8,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7084773fb2\"}],\"uri\":\"origin\\/470700000c7084773fb2\",\"height\":1590},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47050001b69355a0bf1b\"}],\"uri\":\"origin\\/47050001b69355a0bf1b\",\"height\":1557},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470300020761150d671a\"}],\"uri\":\"origin\\/470300020761150d671a\",\"height\":1552},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/47000002200f2a0a9020\"}],\"uri\":\"origin\\/47000002200f2a0a9020\",\"height\":1575},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470000022011d5569ccb\"}],\"uri\":\"origin\\/470000022011d5569ccb\",\"height\":1588},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/4700000220127db96444\"}],\"uri\":\"origin\\/4700000220127db96444\",\"height\":1561},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"}],\"uri\":\"origin\\/46ff000532e33a9fa35a\",\"height\":1563},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470700000c7b871a5fae\"}],\"uri\":\"origin\\/470700000c7b871a5fae\",\"height\":1575}],\"max_img_width\":1178,\"labels\":[],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \"],\"sub_titles\":[\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\"]}

可以看出已经被转义过了,所以无法用

data = json.loads(str)

转成 python 对象,现在找了两个办法:

1.从网上搜到有 demjson 库

data = json.loads(demjson.decode(str))

2. 对 str 正则替换,但是替换起来好像不是那么好写,有可能换掉了正确的数据。

还有没有其他办法,初学python,求教。

阅读 3.9k
5 个回答
import re
import json

data = your string
pattern = "\\\""
data = re.sub(pattern, r'"', data)
pattern = "\\\\/"
data = re.sub(pattern, r"/", data)
data_json = json.loads(data)
print data_json
新手上路,请多包涵

所有的替换成空试试

亲测有效

import re
import json

jsonStr = '{\"count\":8,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7084773fb2\"}],\"uri\":\"origin\\/470700000c7084773fb2\",\"height\":1590},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47050001b69355a0bf1b\"}],\"uri\":\"origin\\/47050001b69355a0bf1b\",\"height\":1557},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470300020761150d671a\"}],\"uri\":\"origin\\/470300020761150d671a\",\"height\":1552},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/47000002200f2a0a9020\"}],\"uri\":\"origin\\/47000002200f2a0a9020\",\"height\":1575},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470000022011d5569ccb\"}],\"uri\":\"origin\\/470000022011d5569ccb\",\"height\":1588},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/4700000220127db96444\"}],\"uri\":\"origin\\/4700000220127db96444\",\"height\":1561},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"}],\"uri\":\"origin\\/46ff000532e33a9fa35a\",\"height\":1563},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470700000c7b871a5fae\"}],\"uri\":\"origin\\/470700000c7b871a5fae\",\"height\":1575}],\"max_img_width\":1178,\"labels\":[],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \"],\"sub_titles\":[\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\"]}'

jsonSt2 = re.sub(r'\\{1,2}', '',jsonStr)

jsonObj = json.loads(jsonStr)
print(jsonObj)
print(jsonObj.get('count'))

Quite simple:

>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"

>>> data = json.loads('{\"count\":8,\"sub_images\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470700000c7084773fb2\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7084773fb2\"}],\"uri\":\"origin\\/470700000c7084773fb2\",\"height\":1590},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/47050001b69355a0bf1b\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47050001b69355a0bf1b\"}],\"uri\":\"origin\\/47050001b69355a0bf1b\",\"height\":1557},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470300020761150d671a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470300020761150d671a\"}],\"uri\":\"origin\\/470300020761150d671a\",\"height\":1552},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/47000002200f2a0a9020\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/47000002200f2a0a9020\"}],\"uri\":\"origin\\/47000002200f2a0a9020\",\"height\":1575},{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p1.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470000022011d5569ccb\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/470000022011d5569ccb\"}],\"uri\":\"origin\\/470000022011d5569ccb\",\"height\":1588},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/4700000220127db96444\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/4700000220127db96444\"}],\"uri\":\"origin\\/4700000220127db96444\",\"height\":1561},{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p3.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb9.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/46ff000532e33a9fa35a\"}],\"uri\":\"origin\\/46ff000532e33a9fa35a\",\"height\":1563},{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\",\"width\":1178,\"url_list\":[{\"url\":\"http:\\/\\/p9.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb1.pstatp.com\\/origin\\/470700000c7b871a5fae\"},{\"url\":\"http:\\/\\/pb3.pstatp.com\\/origin\\/470700000c7b871a5fae\"}],\"uri\":\"origin\\/470700000c7b871a5fae\",\"height\":1575}],\"max_img_width\":1178,\"labels\":[],\"sub_abstracts\":[\" \",\" \",\" \",\" \",\" \",\" \",\" \",\" \"],\"sub_titles\":[\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\",\"\\u6e05\\u65b0\\u81ea\\u7136\\uff0c\\u7f8e\\u4e3d\\u65e0\\u53cc\"]}'.decode('string_escape'))
>>> 
>>> data["count"]
8
>>> 
新手上路,请多包涵

这个是今日头条的街拍美图把 直接把‘’全部换成空(‘’)就可以了 然后就可以json.loads

撰写回答
你尚未登录,登录后可以
  • 和开发者交流问题的细节
  • 关注并接收问题和回答的更新提醒
  • 参与内容的编辑和改进,让解决方法与时俱进
推荐问题