
I have a large GeoJSON file of this form:

  {'features': [{'properties': {'MARKET': 'Albany',
    'geometry': {'coordinates': [[[-74.264948, 42.419877, 0],
       [-74.262041, 42.425856, 0],
       [-74.261175, 42.427631, 0],
       [-74.260384, 42.429253, 0]]],
     'type': 'Polygon'}}},
  {'properties': {'MARKET': 'Albany',
    'geometry': {'coordinates': [[[-73.929627, 42.078788, 0],
       [-73.929114, 42.081658, 0]]],
     'type': 'Polygon'}}},
  {'properties': {'MARKET': 'Albuquerque',
    'geometry': {'coordinates': [[[-74.769198, 43.114089, 0],
       [-74.76786, 43.114496, 0],
       [-74.766474, 43.114656, 0]]],
     'type': 'Polygon'}}}],
 'type': 'FeatureCollection'}

After reading the JSON:

import json
with open('x.json') as f:
    data = json.load(f)

I read the values into a list and then into a DataFrame:

from itertools import chain
import pandas as pd

# to get the set of all markets
mkt = set(f['properties']['MARKET'] for f in data['features'])

# to create a list of (market, coordinate list) pairs;
# in the sample above, geometry is nested under properties
markets = [(market, list(chain.from_iterable(f['properties']['geometry']['coordinates']))) for f in data['features'] for market in mkt if f['properties']['MARKET'] == market]

df = pd.DataFrame(markets, columns=['a','b'])

The first few rows of df are:

       a       b
0   Albany  [[-74.264948, 42.419877, 0], [-74.262041, 42.4...
1   Albany  [[-73.929627, 42.078788, 0], [-73.929114, 42.0...
2   Albany  [[-74.769198, 43.114089, 0], [-74.76786, 43.11...

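Then, to unnest the lists in column b, I used pandas concat:

df1 = pd.concat([df.iloc[:,0:1], df['b'].apply(pd.Series)], axis=1)

But this creates 8070 columns, most of them filled with NaN. Is there a way to group all the latitudes and longitudes by market (column a)? What I need is a DataFrame of about a million rows with two coordinate columns.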
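For context, the wide result follows from how `apply(pd.Series)` behaves: each list is spread across columns, so the frame becomes as wide as the longest list and every shorter list is padded with NaN. The 8070 columns presumably correspond to the polygon with the most points. A minimal sketch with made-up lists standing in for column b:

```python
import pandas as pd

# Made-up lists of different lengths, standing in for column b.
s = pd.Series([[1, 2, 3], [4]])

# Each list is expanded into its own row of a DataFrame.
wide = s.apply(pd.Series)

print(wide.shape)               # width equals the length of the longest list
print(wide.isna().sum().sum())  # number of NaN padding cells
```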

The desired output is:

mkt         lat         long
Albany      42.419877   -74.264948
Albany      42.078788   -73.929627
..
Albuquerque  35.105361   -106.640342

Note that the zero in each list element (e.g. [-74.769198, 43.114089, 0]) should be ignored.

Originally posted by skrubber; translation licensed under CC BY-SA 4.0.

python python-3.x pandas geojson

2 Answers

Community wiki

Posted on
2022-11-17

✓ Accepted

Something like this?

import pandas as pd

# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(geojson["features"])

coords = 'properties.geometry.coordinates'

# keep only (long, lat) from the outer ring, then stack to one row per point
df2 = (df[coords].apply(lambda r: [(i[0],i[1]) for i in r[0]])
           .apply(pd.Series).stack()
           .reset_index(level=1).rename(columns={0:coords,"level_1":"point"})
           .join(df.drop(columns=coords), how='left')).reset_index(level=0)

# GeoJSON positions are ordered [longitude, latitude]
df2[['long','lat']] = df2[coords].apply(pd.Series)

df2

Output:

    index  point properties.geometry.coordinates properties.MARKET  \
0      0      0         (-74.264948, 42.419877)            Albany
1      0      1         (-74.262041, 42.425856)            Albany
2      0      2         (-74.261175, 42.427631)            Albany
3      0      3         (-74.260384, 42.429253)            Albany
4      1      0         (-73.929627, 42.078788)            Albany
5      1      1         (-73.929114, 42.081658)            Albany
6      2      0         (-74.769198, 43.114089)       Albuquerque
7      2      1          (-74.76786, 43.114496)       Albuquerque
8      2      2         (-74.766474, 43.114656)       Albuquerque

  properties.geometry.type       long        lat
0                  Polygon -74.264948  42.419877
1                  Polygon -74.262041  42.425856
2                  Polygon -74.261175  42.427631
3                  Polygon -74.260384  42.429253
4                  Polygon -73.929627  42.078788
5                  Polygon -73.929114  42.081658
6                  Polygon -74.769198  43.114089
7                  Polygon -74.767860  43.114496
8                  Polygon -74.766474  43.114656

where geojson is:

geojson = {'features': [{'properties': {'MARKET': 'Albany',
    'geometry': {'coordinates': [[[-74.264948, 42.419877, 0],
       [-74.262041, 42.425856, 0],
       [-74.261175, 42.427631, 0],
       [-74.260384, 42.429253, 0]]],
     'type': 'Polygon'}}},
  {'properties': {'MARKET': 'Albany',
    'geometry': {'coordinates': [[[-73.929627, 42.078788, 0],
       [-73.929114, 42.081658, 0]]],
     'type': 'Polygon'}}},
  {'properties': {'MARKET': 'Albuquerque',
    'geometry': {'coordinates': [[[-74.769198, 43.114089, 0],
       [-74.76786, 43.114496, 0],
       [-74.766474, 43.114656, 0]]],
     'type': 'Polygon'}}}],
 'type': 'FeatureCollection'}
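
If only the market, latitude and longitude are needed, a plain list comprehension may be a simpler route, since it avoids building the wide intermediate frame. The following is a sketch under assumptions, not a tested solution for the full file: it assumes geometry is nested under properties as in the sample, that every geometry is a Polygon, and that positions are ordered [longitude, latitude, optional z]. Unlike the code above, it keeps the points of every ring, not just the first. The helper name `features_to_points` and the shortened sample are made up for illustration.

```python
import pandas as pd

# Shortened sample with the same nesting: geometry sits under "properties".
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"properties": {"MARKET": "Albany", "geometry": {
            "type": "Polygon",
            "coordinates": [[[-74.264948, 42.419877, 0],
                             [-74.262041, 42.425856, 0]]]}}},
        {"properties": {"MARKET": "Albuquerque", "geometry": {
            "type": "Polygon",
            "coordinates": [[[-74.769198, 43.114089, 0]]]}}},
    ],
}

def features_to_points(geojson):
    """Return one row per polygon vertex with columns mkt, lat, long."""
    rows = [
        (feat["properties"]["MARKET"], lat, lon)
        for feat in geojson["features"]
        for ring in feat["properties"]["geometry"]["coordinates"]
        for lon, lat, *_ in ring  # *_ discards the trailing 0 (z value)
    ]
    return pd.DataFrame(rows, columns=["mkt", "lat", "long"])

points = features_to_points(geojson)
print(points)
```

Because the rows are built once in plain Python and passed to a single DataFrame constructor, the result is already in the long three-column shape, so no stacking or NaN handling is needed.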

Originally posted by Anton vBR; translation licensed under CC BY-SA 3.0.

Community wiki

Posted on
2022-11-17

@Anton_vBR gave a great answer!

However, the "geopandas" library is also worth considering as an alternative:

import geopandas

df = geopandas.read_file("yourfile.geojson")

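Here df will be a geopandas.GeoDataFrame, which lets you work with the GeoJSON much like an ordinary pandas DataFrame (it recurses through the nested structure internally).

Originally posted by Kamal Barshevich; translation licensed under CC BY-SA 4.0.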
