I'm new to this, so please bear with me.

I have already downloaded the data with wget:

!wget http://nlp.stanford.edu/data/glove.6B.zip
 - ‘glove.6B.zip’ saved [862182613/862182613]

It is saved as a zip archive, and I want to use the glove.6B.300d.txt file inside it. This is what I am trying to do:

import io
import numpy as np

embeddings_index = {}
with io.open('glove.6B.300d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

Of course, I get this error:

IOError                                   Traceback (most recent call last)
<ipython-input-47-d07cafc85c1c> in <module>()
      1 embeddings_index = {}
----> 2 with io.open('glove.6B.300d.txt', encoding='utf8') as f:
      3     for line in f:
      4         values = line.split()
      5         word = values[0]

IOError: [Errno 2] No such file or directory: 'glove.6B.300d.txt'

How can I unzip the archive on Google Colab and use the file in the code above?

Originally posted by beginner; translated under the CC BY-SA 4.0 license

python google-colaboratory word-embedding

2 Answers

Community wiki

Posted on 2023-01-10

✓ Accepted

It's simple; an older SO post covers this.

import zipfile

# path_to_zip_file and directory_to_extract_to are placeholders to fill in.
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)
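
Since the question only needs glove.6B.300d.txt, extracting that single member avoids unpacking the other files in the archive. The following is a minimal sketch and not part of the original answer; the helper name `extract_member` and its `dest_dir` default are made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_member(zip_path, member, dest_dir="."):
    """Extract one file from a zip archive and return its path on disk.

    `extract_member` is a hypothetical helper name, not a library function.
    """
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # ZipFile.extract raises KeyError if `member` is not in the archive.
        extracted = zip_ref.extract(member, path=dest_dir)
    return Path(extracted)
```

On Colab, assuming the archive sits in the current working directory, `extract_member('glove.6B.zip', 'glove.6B.300d.txt')` should leave the file where the `io.open` call from the question looks for it.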

Originally posted by Sidharth Shah; translated under the CC BY-SA 3.0 license

Community wiki

Posted on 2023-01-10

Here is another way to do it.

1. Download the zip file

!wget http://nlp.stanford.edu/data/glove.6B.zip

After the download, the zip file is saved in the /content directory on Google Colab.

2. Unzip it

!unzip glove*.zip

3. Find the exact path where the embedding vectors were extracted

!ls
!pwd
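
The same check can be done from Python if you want the paths as values rather than as shell output. This is only a sketch; `find_glove_files` is a hypothetical helper, and the `glove.*.txt` pattern assumes the archive was unpacked into the directory you pass in.

```python
from pathlib import Path


def find_glove_files(directory="."):
    """Return absolute paths of glove.*.txt files in `directory`, sorted.

    `find_glove_files` is a hypothetical helper, not part of any library.
    """
    return sorted(p.resolve() for p in Path(directory).glob("glove.*.txt"))
```

Calling `find_glove_files('/content')` after step 2 should list the extracted text files, one per embedding dimension.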

4. Index the vectors

import numpy as np

print('Indexing word vectors.')

embeddings_index = {}
# Each line holds a word followed by the components of its vector.
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

print('Found %s word vectors.' % len(embeddings_index))
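
If disk space is tight, the indexing loop can also read the text file straight out of the archive, which makes the unzip step optional. The sketch below is an assumption-laden alternative rather than the answer's method: `iter_glove_vectors` is a hypothetical name, and it yields plain Python floats so that it depends only on the standard library.

```python
import io
import zipfile


def iter_glove_vectors(zip_path, member):
    """Yield (word, values) pairs from a GloVe text file stored inside a zip.

    `values` is a list of Python floats. The function name is hypothetical.
    """
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # ZipFile.open returns a binary stream; wrap it to decode UTF-8 text.
        with zip_ref.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                parts = line.split()
                if not parts:
                    continue  # skip blank lines
                yield parts[0], [float(x) for x in parts[1:]]
```

To get the same dictionary as in step 4, convert each list with `np.asarray(values, dtype='float32')` while iterating over `iter_glove_vectors('glove.6B.zip', 'glove.6B.100d.txt')`.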

5. Fuse with Google Drive

!pip install --upgrade pip
!pip install -U -q pydrive
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null

!apt-get -y install -qq google-drive-ocamlfuse fuse

from google.colab import auth
auth.authenticate_user()
# Generate creds for the Drive FUSE library.
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

!mkdir -p drive
!google-drive-ocamlfuse drive

6. Save the indexed vectors to Google Drive for reuse

import pickle
with open('drive/path/to/your/file/location', 'wb') as f:
    pickle.dump({'embeddings_index': embeddings_index}, f)
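
Saving is only useful if a later session can read the file back. The following sketch pairs the dump with a matching load; `save_index` and `load_index` are hypothetical helper names, and the path you pass is assumed to point somewhere under the mounted drive directory.

```python
import pickle


def save_index(embeddings_index, path):
    """Pickle the word -> vector dictionary to `path`."""
    with open(path, "wb") as f:
        pickle.dump({"embeddings_index": embeddings_index}, f)


def load_index(path):
    """Load the dictionary previously written by save_index."""
    with open(path, "rb") as f:
        return pickle.load(f)["embeddings_index"]
```

In a new session, after fusing the drive again, `embeddings_index = load_index(...)` with the same path should restore the dictionary without re-parsing the text file. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.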

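If you have already downloaded the zip file to your local system, just unzip it, upload the file with the dimension you need to Google Drive, fuse gdrive, provide the appropriate path, and then use it, index it, and so on.

Alternatively, if the file is already on your local system, you can upload it through code in Colab: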

from google.colab import files
files.upload()

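Select the file and use it as described in step 3.

That is how you can use GloVe word embeddings in Google Colaboratory. Hope it helps.

Originally posted by Akson; translated under the CC BY-SA 4.0 license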
