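I am working on an image classification Kaggle competition and downloaded some training images from Kaggle.com. I am then using transfer learning with ResNet50 in Keras 2.0, with TensorFlow as the backend (and Python 3), to process these images.
However, 258 of the 1281 training images have "Possibly corrupt EXIF data" and are being ignored when loaded into the ResNet model, most likely due to a Pillow issue.
The output messages are as follows: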
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 524288 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 393216 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 33554432 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 25165824 bytes but only got 0. Skipping tag 4
"Skipping tag %s" % (size, len(data), tag))
/home/shi/anaconda3/lib/python3.6/site-packages/PIL/TiffImagePlugin.py:692: UserWarning: Possibly corrupt EXIF data. Expecting to read 131072 bytes but only got 0. Skipping tag 3
"Skipping tag %s" % (size, len(data), tag))
(more to come ...)
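From the output messages I only know that such images exist, not which ones they are...
My question is: how can I identify these 258 images so that I can manually remove them from the dataset?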
Originally posted by user3768495; translation licensed under CC BY-SA 4.0
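The simplest approach that comes to mind is to modify your code to process one image at a time, then loop over every image and check which ones generate the warning.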
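
The one-image-at-a-time idea could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution: the names `load_with_pillow` and `find_files_with_warnings` and the `train` directory are hypothetical, and it assumes a Pillow version that provides `Image.getexif()` (6.0 or later). Which call actually triggers the EXIF warning may differ between Pillow versions, so the loader both decodes the image and reads its EXIF data. Each file is loaded inside `warnings.catch_warnings(record=True)` so that any warning can be tied back to the file that caused it.

```python
import os
import warnings


def load_with_pillow(path):
    """Hypothetical default loader: decode the file and parse its EXIF block."""
    # Imported lazily so the scanning logic below does not itself require Pillow.
    from PIL import Image

    with Image.open(path) as img:
        img.load()     # force decoding of the pixel data
        img.getexif()  # parse EXIF explicitly (assumes Pillow >= 6.0)


def find_files_with_warnings(image_dir, loader=load_with_pillow):
    """Return {path: [warning messages]} for every file that warns or fails."""
    flagged = {}
    for root, _dirs, files in os.walk(image_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with warnings.catch_warnings(record=True) as caught:
                # "always" stops Python from suppressing repeats of the
                # same warning raised from the same source line.
                warnings.simplefilter("always")
                try:
                    loader(path)
                except Exception as exc:
                    # Files that cannot be opened at all are reported too.
                    flagged[path] = ["unreadable: %r" % (exc,)]
                    continue
            messages = [str(w.message) for w in caught]
            if messages:
                flagged[path] = messages
    return flagged


if __name__ == "__main__":
    # "train" is a placeholder for wherever the images were downloaded.
    for path, messages in find_files_with_warnings("train").items():
        print(path)
        for message in messages:
            print("   ", message)
```

The printed paths are the files to inspect or remove by hand. Passing a different `loader` (for example one that wraps the same Keras loading call used in training) would check the exact code path that produced the original warnings.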