Introduction
In scientific computing we often need to load data from external sources. This article introduces genfromtxt, a very useful function in NumPy. Its work can be decomposed into two steps. The first step reads the file and splits each line into a sequence of strings. The second step converts each string into the specified data type.
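The two steps can be sketched in plain Python. This is only an illustration of the idea, not NumPy's actual implementation, and the helper name two_step_parse is hypothetical:

```python
from io import StringIO


def two_step_parse(fileobj, delimiter=","):
    """Illustrative only: mimic the two steps on an open text file."""
    # Step 1: turn every non-empty line into a list of strings
    rows = [line.strip().split(delimiter) for line in fileobj if line.strip()]
    # Step 2: convert each string to the target type (float here)
    return [[float(token) for token in row] for row in rows]


result = two_step_parse(StringIO("1,2,3\n4,5,6"))
print(result)
```

genfromtxt does the same kind of work, with extra handling for missing values, comments and per-column types.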
Introduction to genfromtxt
First look at the definition of genfromtxt:
numpy.genfromtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, skip_header=0, skip_footer=0, converters=None, missing_values=None, filling_values=None, usecols=None, names=None, excludelist=None, deletechars=" !#$%&'()*+, -./:;<=>?@[\]^{|}~", replace_space='_', autostrip=False, case_sensitive=True, defaultfmt='f%i', unpack=None, usemask=False, loose=True, invalid_raise=True, max_rows=None, encoding='bytes')
genfromtxt accepts many parameters, but only fname is required; all the others are optional.
fname can take several forms: a file, a str, a pathlib.Path, a list of str, or a generator.
If it is a single str, it is treated as the name of a local or remote file. If it is a list of str, each str is treated as one line of data. If the file is remote, it is automatically downloaded to the current directory.
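As a small sketch of the list-of-str form, each element of the list below plays the role of one line of a file. The variable names and sample values are made up for illustration:

```python
import numpy as np

# Each string is treated as one line of data
lines = ["1,2,3", "4,5,6"]
arr = np.genfromtxt(lines, delimiter=",")
print(arr.shape)
```

This form is handy for quick experiments, because no file or StringIO object is needed.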
genfromtxt can also read compressed files directly: a file whose name ends in .gz or .bz2 is recognized as gzip or bz2 compressed and is decompressed automatically.
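The following sketch writes a small gzip file into a temporary directory and reads it back. The file name data.csv.gz and its contents are assumptions chosen for the example:

```python
import gzip
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; the .gz extension marks it as gzip compressed
    path = os.path.join(tmp, "data.csv.gz")
    with gzip.open(path, "wt") as fh:
        fh.write("1,2,3\n4,5,6\n")
    # No manual decompression is needed before reading
    arr = np.genfromtxt(path, delimiter=",")

print(arr.shape)
```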
Next we look at the common applications of genfromtxt:
Before use, you usually need to import two libraries:
from io import StringIO
import numpy as np
StringIO creates an in-memory file-like object from a string, which can be used as input to genfromtxt.
We first define a StringIO containing different types:
s = StringIO(u"1,1.3,abcde")
This StringIO contains an int, a float and a str, separated by commas (,).
Let's look at the simplest use of genfromtxt:
In [65]: data = np.genfromtxt(s)
In [66]: data
Out[66]: array(nan)
Because the default is delimiter=None, fields are split on whitespace. The line contains no whitespace, so it is treated as a single field, which cannot be converted to float, and the result is nan.
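For comparison, here is a sketch with the same three values separated by spaces instead of commas, so that the default delimiter=None can split them. The input string is an assumption made for this example:

```python
from io import StringIO

import numpy as np

# Same values as before, but whitespace-separated
s = StringIO("1 1.3 abcde")
data = np.genfromtxt(s)
# Three fields are found; the last one is not a number and becomes nan
print(data.shape)
```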
Below we add a comma separator:
In [67]: _ = s.seek(0)
In [68]: data = np.genfromtxt(s,delimiter=",")
In [69]: data
Out[69]: array([1. , 1.3, nan])
This time each field is parsed, but because the last string cannot be converted to float, we get nan for it.
Note the first line, s.seek(0): it resets the StringIO pointer to the beginning of the data, which is needed before every new read of the same object.
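The need for seek(0) comes from StringIO itself, not from genfromtxt. A minimal sketch using only the standard library:

```python
from io import StringIO

s = StringIO("1,1.3,abcde")
first = s.read()   # consumes the whole content
second = s.read()  # the pointer is at the end, so nothing is left
s.seek(0)          # move the pointer back to the start
third = s.read()   # the full content is available again
print(repr(first), repr(second), repr(third))
```

Any reader that consumes the object, genfromtxt included, leaves the pointer at the end in the same way.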
So how can we convert the last string as well? We need to specify the dtype manually. Passing dtype=float alone does not help, because float is already the default:
In [74]: _ = s.seek(0)
In [75]: data = np.genfromtxt(s,dtype=float,delimiter=",")
In [76]: data
Out[76]: array([1. , 1.3, nan])
Above, every column was converted to float. We can also specify a type for each column separately:
In [77]: _ = s.seek(0)
In [78]: data = np.genfromtxt(s,dtype=[int,float,'S5'],delimiter=",")
In [79]: data
Out[79]: array((1, 1.3, b'abcde'), dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', 'S5')])
Here we use int, float and a 5-byte string ('S5') to convert the three fields, and the correct result is obtained.
In addition to the types, we can also specify the field names. In the example above we did not give any names, so the defaults f0, f1 and f2 were used. Here is an example that specifies names:
In [213]: _ = s.seek(0)
In [214]: data = np.genfromtxt(s, dtype="i8,f8,S5",names=['myint','myfloat','mystring'], delimiter=",")
In [215]: data
Out[215]:
array((1, 1.3, b'abcde'),
dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
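Once the fields have names, each one can be read by name. The sketch below repeats the example with one assumption: the string field uses 'U5' (unicode) instead of 'S5', so that it comes back as text rather than bytes:

```python
from io import StringIO

import numpy as np

s = StringIO("1,1.3,abcde")
data = np.genfromtxt(
    s,
    dtype="i8,f8,U5",  # assumption: U5 instead of S5 to get str, not bytes
    names=["myint", "myfloat", "mystring"],
    delimiter=",",
)
print(data.dtype.names)
# Index the structured result by field name
print(data["myint"], data["myfloat"], data["mystring"])
```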
Besides using a character as the separator, you can also pass a sequence of integers to delimiter, giving the width of each fixed-width field:
In [216]: s = StringIO(u"11.3abcde")
In [217]: data = np.genfromtxt(s, dtype=None, names=['intvar','fltvar','strvar'],
...: delimiter=[1,3,5])
In [218]: data
Out[218]:
array((1, 1.3, b'abcde'),
dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('strvar', 'S5')])
Here the widths 1, 3 and 5 split s into the fields 1, 1.3 and abcde.
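Fixed-width splitting also works across several lines. The sketch below follows the fixed-width example in the NumPy user guide, with widths 4, 3 and 2:

```python
from io import StringIO

import numpy as np

# Every line is cut into fields of 4, 3 and 2 characters
data = "123456789\n   4  7 9\n   4567 9"
arr = np.genfromtxt(StringIO(data), delimiter=(4, 3, 2))
print(arr)
```

The first line is cut into 1234, 567 and 89. Padding spaces inside a field are ignored during conversion.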
Multidimensional Arrays
If the data contains several lines, genfromtxt produces a two-dimensional array with one row per line:
>>> data = u"1, 2, 3\n4, 5, 6"
>>> np.genfromtxt(StringIO(data), delimiter=",")
array([[ 1., 2., 3.],
[ 4., 5., 6.]])
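The effect of line breaks on the result shape can be checked directly. A small sketch, using made-up inputs:

```python
from io import StringIO

import numpy as np

one_line = np.genfromtxt(StringIO("1, 2, 3"), delimiter=",")
two_lines = np.genfromtxt(StringIO("1, 2, 3\n4, 5, 6"), delimiter=",")

# A single line gives a 1-D array; several lines give a 2-D array
print(one_line.shape, two_lines.shape)
```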
autostrip
Use autostrip to remove the whitespace on both sides of each value:
>>> data = u"1, abc , 2\n 3, xxx, 4"
>>> # Without autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5")
array([['1', ' abc ', ' 2'],
['3', ' xxx', ' 4']], dtype='<U5')
>>> # With autostrip
>>> np.genfromtxt(StringIO(data), delimiter=",", dtype="|U5", autostrip=True)
array([['1', 'abc', '2'],
['3', 'xxx', '4']], dtype='<U5')
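Leftover spaces matter as soon as the values are compared. The sketch below reuses the same data and counts how many entries in the second column equal "abc"; the variable names are made up for illustration:

```python
from io import StringIO

import numpy as np

data = "1, abc , 2\n 3, xxx, 4"
raw = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5")
clean = np.genfromtxt(StringIO(data), delimiter=",", dtype="U5", autostrip=True)

# ' abc ' (with spaces) does not equal 'abc', so the raw data has no match
raw_hits = int((raw[:, 1] == "abc").sum())
clean_hits = int((clean[:, 1] == "abc").sum())
print(raw_hits, clean_hits)
```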
comments
The default comments marker is #. Everything from a # to the end of the line is treated as a comment and ignored:
>>> data = u"""#
... # Skip me !
... # Skip me too !
... 1, 2
... 3, 4
... 5, 6 #This is the third line of the data
... 7, 8
... # And here comes the last line
... 9, 0
... """
>>> np.genfromtxt(StringIO(data), comments="#", delimiter=",")
array([[1., 2.],
[3., 4.],
[5., 6.],
[7., 8.],
[9., 0.]])
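The marker can be changed when a file uses a different convention. A sketch assuming % as the comment character, with made-up data:

```python
from io import StringIO

import numpy as np

data = "% header comment\n1, 2\n3, 4 % trailing comment\n"
arr = np.genfromtxt(StringIO(data), comments="%", delimiter=",")
# Both the full-line comment and the trailing comment are dropped
print(arr)
```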
Skip rows and select columns
You can use skip_header and skip_footer to skip a given number of lines at the beginning and at the end of the file:
>>> data = u"\n".join(str(i) for i in range(10))
>>> np.genfromtxt(StringIO(data),)
array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
>>> np.genfromtxt(StringIO(data),
... skip_header=3, skip_footer=5)
array([ 3., 4.])
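A typical use is dropping a title line and a summary line around the numeric data. The sketch below uses a made-up table with one header line and one footer line:

```python
from io import StringIO

import numpy as np

# Hypothetical data: a header line, two data lines, and a summary line
data = "x,y\n1,2\n3,4\ntotal,6"
arr = np.genfromtxt(StringIO(data), delimiter=",", skip_header=1, skip_footer=1)
print(arr)
```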
You can use usecols to select specific columns; negative indices count from the last column:
>>> data = u"1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data), usecols=(0, -1))
array([[ 1., 3.],
[ 4., 6.]])
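The columns come back in the order in which they are listed in usecols. A short sketch on the same data, asking for the last column first:

```python
from io import StringIO

import numpy as np

data = "1 2 3\n4 5 6"
# Request column 2 first, then column 0
arr = np.genfromtxt(StringIO(data), usecols=(2, 0))
print(arr)
```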
If the columns have names, you can also select them by name with usecols:
>>> data = u"1 2 3\n4 5 6"
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a", "c"))
array([(1.0, 3.0), (4.0, 6.0)],
dtype=[('a', '<f8'), ('c', '<f8')])
>>> np.genfromtxt(StringIO(data),
... names="a, b, c", usecols=("a, c"))
array([(1.0, 3.0), (4.0, 6.0)],
dtype=[('a', '<f8'), ('c', '<f8')])
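The result of a named selection is a structured array, so each selected column can then be read by its name. A minimal sketch on the same data:

```python
from io import StringIO

import numpy as np

data = "1 2 3\n4 5 6"
arr = np.genfromtxt(StringIO(data), names="a, b, c", usecols=("a", "c"))

# Only the selected fields are present in the result
print(arr.dtype.names)
col_a = arr["a"]
col_c = arr["c"]
print(col_a, col_c)
```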
This article has been included in http://www.flydean.com/06-python-numpy-genfromtxt/