Python BigQuery API - get table schema


I am trying to fetch the schema from a BigQuery table. Given sample code like this:

from google.cloud import bigquery

# Authenticate with a service account key file
client = bigquery.Client.from_service_account_json('service_account.json')

def test_extract_schema(client):
    project = 'bigquery-public-data'
    dataset_id = 'samples'
    table_id = 'shakespeare'

    dataset_ref = client.dataset(dataset_id, project=project)
    table_ref = dataset_ref.table(table_id)
    table = client.get_table(table_ref)  # API Request

    # View table properties
    print(table.schema)

if __name__ == '__main__':
    test_extract_schema(client)

This returns output such as:

[SchemaField('word', 'STRING', 'REQUIRED', 'A single unique word (where whitespace is the delimiter) extracted from a corpus.', ()), SchemaField('word_count', 'INTEGER', 'REQUIRED', 'The number of times this word appears in this corpus.', ()), SchemaField('corpus', 'STRING', 'REQUIRED', 'The work from which this word was extracted.', ()), SchemaField('corpus_date', 'INTEGER', 'REQUIRED', 'The year in which this corpus was published.', ())]

whereas I am trying to capture only the schema, in a format like

'word' 'STRING','word_count' 'INTEGER'

Is there a way to get this with an API call or some other method?

Original post by Sandeep Singh; translation licensed under CC BY-SA 4.0

2 answers

You can always take the table.schema variable and iterate over it, since table.schema is a list of SchemaField values:

result = ["{0} {1}".format(field.name, field.field_type) for field in table.schema]

Result for the same dataset and table:

 ['word STRING', 'word_count INTEGER', 'corpus STRING', 'corpus_date INTEGER']
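If the quoted format from the question is what you need, the same iteration can be wrapped in a small helper. This is only a minimal sketch: it assumes each field exposes `name` and `field_type` as in the list comprehension above. `FakeField` is a hypothetical stand-in so the snippet runs without BigQuery credentials, and `format_schema` is a name made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for bigquery.SchemaField; only the two attributes
# used in the answer above (name, field_type) are modelled.
FakeField = namedtuple("FakeField", ["name", "field_type"])


def format_schema(fields):
    """Join each field as 'name' 'TYPE', separated by commas."""
    return ",".join("'{0}' '{1}'".format(f.name, f.field_type) for f in fields)


# With a real table this would be format_schema(table.schema).
sample = [FakeField("word", "STRING"), FakeField("word_count", "INTEGER")]
print(format_schema(sample))
```

Because the helper only reads two attributes, it should work unchanged when passed `table.schema` from the question's code.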

Original post by Mangu; translation licensed under CC BY-SA 4.0

Another approach, once you have the client and table instances, is to do something like this:

import io

# schema_to_json writes the schema as JSON to a file path or file-like object
f = io.StringIO("")
client.schema_to_json(table.schema, f)
print(f.getvalue())
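The JSON text produced this way can then be reduced to just names and types. The following is a rough sketch, not a confirmed description of the library's output: the sample string and its keys (`name`, `type`, `mode`) are assumptions about what the JSON looks like, and `names_and_types` is a hypothetical helper, so check them against your own output.

```python
import json

# Assumed sample of the JSON written by schema_to_json: an array with one
# object per column. The keys used here are an assumption.
schema_json = """
[
  {"name": "word", "type": "STRING", "mode": "REQUIRED"},
  {"name": "word_count", "type": "INTEGER", "mode": "REQUIRED"}
]
"""


def names_and_types(text):
    """Parse schema JSON text into a list of 'name TYPE' strings."""
    return ["{0} {1}".format(col["name"], col["type"]) for col in json.loads(text)]


print(names_and_types(schema_json))
```

With a real table, you would pass `f.getvalue()` instead of the sample string.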

Original post by Jose B; translation licensed under CC BY-SA 4.0