
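I'm trying to build a pandas DataFrame from the results of a very basic Elasticsearch query. I'm getting the data I need, but I can't work out how to slice the results into a proper DataFrame. I really only care about the timestamp and path of each result. I've tried a few different es.search patterns.

Code: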

from datetime import datetime
from elasticsearch import Elasticsearch
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
es = Elasticsearch(host="192.168.121.252")
res = es.search(index="_all", doc_type='logs', body={"query": {"match_all": {}}}, size=2, fields=('path','@timestamp'))

This returns a dict with four top-level keys: [u'hits', u'_shards', u'took', u'timed_out']. My results are inside hits.

res['hits']['hits']
Out[47]:
[{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
   u'path': u'app2.log'}},
 {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
   u'path': u'app1.log'}}]

The only thing I care about is getting the timestamp and path for each hit.

res['hits']['hits'][0]['fields']
Out[48]:
{u'@timestamp': u'2014-08-07T12:36:00.086Z',
 u'path': u'app2.log'}

I can't for the life of me figure out how to get the results into a pandas DataFrame. For the 2 results returned, I would expect a DataFrame like this:

    timestamp                   path
0  2014-08-07T12:36:00.086Z    app2.log
1  2014-08-07T12:36:00.200Z    app1.log

Originally posted by Justin S; translation licensed under CC BY-SA 4.0

python pandas elasticsearch


2 Answers


Community wiki

Posted on 2023-01-09

✓ Accepted

There is a handy tool called pd.DataFrame.from_dict that you can use in a situation like this:

 In [34]:

Data = [{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
       u'path': u'app2.log'}},
     {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
       u'path': u'app1.log'}}]
In [35]:

df = pd.concat(map(pd.DataFrame.from_dict, Data), axis=1)['fields'].T
In [36]:

print(df.reset_index(drop=True))
                 @timestamp      path
0  2014-08-07T12:36:00.086Z  app2.log
1  2014-08-07T12:36:00.200Z  app1.log

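Broken down into four steps:

1. Read each item in the list (each one is a dictionary) into its own DataFrame.

2. Combine those per-item DataFrames into one big DataFrame with concat, placing them side by side (axis=1). Since step 1 has to run for every item, map can do it for us.

3. Select the columns labelled 'fields'.

4. Transpose the DataFrame (rotate it 90 degrees), and call reset_index if you want the default integer index.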
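The four steps above can be sketched one at a time, which may make the one-liner easier to follow. This is only an illustration: the names hits, frames, wide, fields, and direct are made up for this sketch, the sample data is copied from the question, and the last line shows a shorter list-comprehension route that is a different technique from the from_dict approach in this answer.

```python
import pandas as pd

# Sample hits copied from the question's output (u'' prefixes dropped).
hits = [
    {"_id": "a1XHMhdHQB2uV7oq6dUldg", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.086Z", "path": "app2.log"}},
    {"_id": "TcBvro_1QMqF4ORC-XlAPQ", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.200Z", "path": "app1.log"}},
]

# Step 1: one small DataFrame per hit. The nested 'fields' dict supplies
# the row labels ('@timestamp', 'path'); scalar values are broadcast.
frames = [pd.DataFrame.from_dict(hit) for hit in hits]

# Step 2: place the per-hit frames side by side.
wide = pd.concat(frames, axis=1)

# Step 3: the label 'fields' now occurs once per hit, so this selects
# one column for every hit.
fields = wide["fields"]

# Step 4: transpose so each hit becomes a row, then reset to a 0..n-1 index.
df = fields.T.reset_index(drop=True)

# Alternative (not the method above): build the frame directly from
# the inner 'fields' dicts.
direct = pd.DataFrame([hit["fields"] for hit in hits])

print(df)
```

Both df and direct end up with the columns @timestamp and path and one row per hit; the direct form skips the intermediate wide frame entirely.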


Originally posted by CT Zhu; translation licensed under CC BY-SA 3.0

Community wiki

Posted on 2023-01-09

Alternatively, you can use pandas' json_normalize function:

from pandas import json_normalize
# On pandas versions before 1.0, import it from pandas.io.json instead:
# from pandas.io.json import json_normalize
df = json_normalize(res['hits']['hits'])

Then select the columns you need from the resulting DataFrame by name.
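As a rough sketch of that filtering step, assuming pandas 1.0 or later and hits shaped like the ones in the question: json_normalize flattens nested keys into dotted column names, so the two values of interest should appear as fields.@timestamp and fields.path. The hits variable, the renamed column labels, and the optional to_datetime conversion are illustrative choices for this example, not part of the answer above.

```python
import pandas as pd

# Sample hits copied from the question's output (u'' prefixes dropped).
hits = [
    {"_id": "a1XHMhdHQB2uV7oq6dUldg", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.086Z", "path": "app2.log"}},
    {"_id": "TcBvro_1QMqF4ORC-XlAPQ", "_index": "logstash-2014.08.07",
     "_score": 1.0, "_type": "logs",
     "fields": {"@timestamp": "2014-08-07T12:36:00.200Z", "path": "app1.log"}},
]

# Flatten every hit; nested keys become dotted column names.
flat = pd.json_normalize(hits)

# Keep only the two columns of interest and give them shorter labels.
df = flat[["fields.@timestamp", "fields.path"]].rename(
    columns={"fields.@timestamp": "timestamp", "fields.path": "path"}
)

# Optional: parse the ISO 8601 strings into timezone-aware timestamps.
df["timestamp"] = pd.to_datetime(df["timestamp"])

print(df)
```

With a live query, hits would be replaced by res['hits']['hits'].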

Originally posted by Brown nightingale; translation licensed under CC BY-SA 4.0
