Introduction

In an ordinary NumPy array, every element has the same type. In a structured array, each element is made up of several values, possibly of different types, stored together like the members of a C struct.

In this article we take a detailed look at structured arrays in NumPy.

Fields in a structured array

Because a structured array holds objects of different types, each named component of an element is called a field.

Each field has three parts: a name (a string), a data type (any valid dtype), and an optional title.

Here is an example of building a dtype from a list of fields:

In [165]: np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
Out[165]: dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

We can use this dtype to build a new array:

In [166]: x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
     ...:     dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
     ...:

In [167]: x
Out[167]:
array([('Rex', 9, 81.), ('Fido', 3, 27.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])

x is a one-dimensional array whose elements each contain three fields, name, age and weight, each with its own data type.

A single element (a row) can be accessed by index:

In [168]: x[1]
Out[168]: ('Fido', 3, 27.)

A whole field (a column) can be accessed by name:

In [170]: x['name']
Out[170]: array(['Rex', 'Fido'], dtype='<U10')

You can also assign one value to every entry of a field at once:

In [171]: x['age']
Out[171]: array([9, 3], dtype=int32)

In [172]: x['age'] = 10

In [173]: x
Out[173]:
array([('Rex', 10, 81.), ('Fido', 10, 27.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')])
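The three operations above (reading an element, reading a field, broadcasting a value into a field) can be reproduced in one self-contained script. This is a minimal sketch; the names pet_dtype, record and names are illustrative and not part of the original example.

```python
import numpy as np

# Same three-field dtype as in the example above (pet_dtype is an illustrative name)
pet_dtype = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
x = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)], dtype=pet_dtype)

record = x[1]      # one element: a structured scalar of type np.void
names = x['name']  # one field across all elements
x['age'] = 10      # broadcast a scalar into every 'age' entry

print(type(record).__name__, record['name'])  # void Fido
print(names.tolist(), x['age'].tolist())      # ['Rex', 'Fido'] [10, 10]
```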

Structured data type

The example above gives a basic idea of structured data types. A structured data type is a collection of fields, each with its own name and type.

Create structured data types

Structured data types are built from basic types and can be created in several ways:

Create from a list of tuples

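Each tuple has the form (fieldname, datatype, shape), where shape is optional. fieldname is the name of the field (or a (title, name) tuple, see Field Titles below), datatype is anything that can be converted to a dtype, and shape is a tuple that makes the field a subarray of that shape.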

In [174]: np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])
Out[174]: dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4', (2, 2))])
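The subarray field z can be inspected directly on the dtype. A short sketch (dt is just a local name) showing the field names, the subarray shape and element type, and how the total itemsize adds up:

```python
import numpy as np

dt = np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))])

print(dt.names)       # ('x', 'y', 'z')
print(dt['z'].shape)  # (2, 2): z is a 2x2 subarray field
print(dt['z'].base)   # float32: the element type of the subarray
print(dt.itemsize)    # 24 = 4 (x) + 4 (y) + 2*2*4 (z)
```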

If fieldname is an empty string, NumPy gives the field a default name of the form f#, where # is the integer index of the field:

In [177]: np.dtype([('x', 'f4'), ('', 'i4'), ('z', 'i8')])
Out[177]: dtype([('x', '<f4'), ('f1', '<i4'), ('z', '<i8')])
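When every name is left empty, the defaults simply follow the field positions. A one-line check, as a minimal sketch:

```python
import numpy as np

# All fields unnamed: NumPy falls back to f0, f1, f2 by position
dt = np.dtype([('', 'i4'), ('', 'f8'), ('', 'u1')])
print(dt.names)  # ('f0', 'f1', 'f2')
```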

Create from a comma-separated string

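A structured dtype can also be created from a string of comma-separated type specifications. The fields get the default names f0, f1, f2 and so on, and a shape written in front of a type turns that field into a subarray: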

In [178]: np.dtype('i8, f4, S3')
Out[178]: dtype([('f0', '<i8'), ('f1', '<f4'), ('f2', 'S3')])

In [179]: np.dtype('3int8, float32, (2, 3)float64')
Out[179]: dtype([('f0', 'i1', (3,)), ('f1', '<f4'), ('f2', '<f8', (2, 3))])
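The comma-separated form is shorthand for the list-of-tuples form with default names. A small sketch (short, explicit and sub are illustrative names) that compares the two and checks the subarray shapes:

```python
import numpy as np

short = np.dtype('i8, f4, S3')
explicit = np.dtype([('f0', 'i8'), ('f1', 'f4'), ('f2', 'S3')])
print(short == explicit)  # True: same names, types and layout

sub = np.dtype('3int8, float32, (2, 3)float64')
print(sub['f0'].shape, sub['f2'].shape)  # (3,) (2, 3)
print(sub.itemsize)                      # 55 = 3*1 + 4 + 6*8
```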

Create from a dictionary

The dictionary form looks like this: {'names': ..., 'formats': ..., 'offsets': ..., 'titles': ..., 'itemsize': ...}

names and formats are required lists of the same length, giving each field's name and data type. offsets, titles and itemsize are optional.

offsets gives the byte offset of each field, titles gives a title for each field, and itemsize sets the size in bytes of the whole dtype, which must be large enough to hold all the fields.

In [180]: np.dtype({'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})
Out[180]: dtype([('col1', '<i4'), ('col2', '<f4')])

In [181]: np.dtype({'names': ['col1', 'col2'],
     ...:           'formats': ['i4', 'f4'],
     ...:           'offsets': [0, 4],
     ...:           'itemsize': 12})
     ...:
Out[181]: dtype({'names':['col1','col2'], 'formats':['<i4','<f4'], 'offsets':[0,4], 'itemsize':12})
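Because itemsize (12) is larger than the 8 bytes the two fields need, every element carries 4 bytes of padding. A sketch that checks this and also uses the optional titles key; the title strings are made up for illustration:

```python
import numpy as np

dt = np.dtype({'names': ['col1', 'col2'],
               'formats': ['i4', 'f4'],
               'offsets': [0, 4],
               'itemsize': 12})
arr = np.zeros(2, dtype=dt)
print(dt.fields['col2'][1], dt.itemsize, arr.nbytes)  # 4 12 24

# 'titles' attaches a title to each field (hypothetical titles)
titled = np.dtype({'names': ['col1', 'col2'],
                   'formats': ['i4', 'f4'],
                   'titles': ['First column', 'Second column']})
print(titled.fields['col1'])  # (dtype('int32'), 0, 'First column')
```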

Manipulate structured data types

The field names and layout of a structured dtype are available through its names and fields attributes. fields maps each name to a (dtype, offset) tuple, with the title appended when one is set:

>>> d = np.dtype([('x', 'i8'), ('y', 'f4')])
>>> d.names
('x', 'y')
>>> d.fields
mappingproxy({'x': (dtype('int64'), 0), 'y': (dtype('float32'), 8)})
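names gives the field order and fields gives each field's type and byte offset, so together they describe the full layout. A minimal sketch that walks the fields in order (layout is an illustrative name):

```python
import numpy as np

d = np.dtype([('x', 'i8'), ('y', 'f4')])
layout = []
for name in d.names:
    # d.fields[name] is (dtype, offset) or (dtype, offset, title)
    field_dtype, offset = d.fields[name][:2]
    layout.append((name, str(field_dtype), offset))
    print(f"{name}: {field_dtype} at byte {offset}")
# x: int64 at byte 0
# y: float32 at byte 8
```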

Offsets and Alignment

By default, the fields of a structured dtype are packed: each field starts at the byte right after the previous one, with no padding for alignment.

The following helper prints the offset of every field and the total itemsize:

>>> def print_offsets(d):
...     print("offsets:", [d.fields[name][1] for name in d.names])
...     print("itemsize:", d.itemsize)
>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2'))
offsets: [0, 1, 2, 6, 7, 15]
itemsize: 17
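In the packed layout each offset is just the running total of the sizes of the fields before it, and the itemsize is the sum of all field sizes. A small worked check of that arithmetic against NumPy:

```python
import numpy as np
from itertools import accumulate

d = np.dtype('u1, u1, i4, u1, i8, u2')
sizes = [d.fields[n][0].itemsize for n in d.names]  # [1, 1, 4, 1, 8, 2]

# Packed layout: each offset is the sum of the sizes of the fields before it
expected = [0] + list(accumulate(sizes))[:-1]       # [0, 1, 2, 6, 7, 15]
actual = [d.fields[n][1] for n in d.names]
print(expected == actual, sum(sizes) == d.itemsize)  # True True
```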

If align=True is passed when creating the dtype, NumPy inserts padding so that the layout matches what a C compiler typically produces for the equivalent C struct.

An aligned layout can be faster to access in some cases, at the cost of a larger itemsize, and it is needed when the data must match a C struct. Here is the same dtype with alignment:

>>> print_offsets(np.dtype('u1, u1, i4, u1, i8, u2', align=True))
offsets: [0, 1, 4, 8, 16, 24]
itemsize: 32
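The aligned offsets follow the usual C rule: round each field's offset up to a multiple of that field's alignment, then round the total size up to a multiple of the largest alignment. A sketch of that rule; aligned_layout is a hypothetical helper written for illustration, compared here against NumPy's own align=True result:

```python
import numpy as np

def aligned_layout(formats):
    """Hypothetical helper: mimic C-struct padding for a list of simple formats."""
    offset, offsets, max_align = 0, [], 1
    for fmt in formats:
        dt = np.dtype(fmt)
        align = dt.alignment
        offset = -(-offset // align) * align  # round up to a multiple of align
        offsets.append(offset)
        offset += dt.itemsize
        max_align = max(max_align, align)
    itemsize = -(-offset // max_align) * max_align  # pad the end of the struct
    return offsets, itemsize

formats = ['u1', 'u1', 'i4', 'u1', 'i8', 'u2']
offsets, itemsize = aligned_layout(formats)
ref = np.dtype(', '.join(formats), align=True)
print(offsets, itemsize)
print(offsets == [ref.fields[n][1] for n in ref.names], itemsize == ref.itemsize)
```

On common 64-bit platforms this gives offsets [0, 1, 4, 8, 16, 24] and itemsize 32, matching the output above.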

Field Titles

In addition to its name, each field can have a title, which serves as an alternative name for the field.

There are two ways to specify a title. The first is to use a (title, name) tuple in place of the field name:

In [182]: np.dtype([(('my title', 'name'), 'f4')])
Out[182]: dtype([(('my title', 'name'), '<f4')])

The second is to use a dictionary that maps each field name to a (datatype, offset, title) tuple:

In [183]: np.dtype({'name': ('i4', 0, 'my title')})
Out[183]: dtype([(('my title', 'name'), '<i4')])

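Now look at fields. The title appears as an extra key next to the name, and both map to the same (dtype, offset, title) tuple:

In [186]: d = np.dtype([(('my title', 'name'), 'f4')])

In [187]: d.fields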
Out[187]:
mappingproxy({'my title': (dtype('float32'), 0, 'my title'),
              'name': (dtype('float32'), 0, 'my title')})
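Since the title is registered as a key in fields, an array can be indexed by either the name or the title. A minimal sketch (arr is an illustrative name):

```python
import numpy as np

d = np.dtype([(('my title', 'name'), 'f4')])
arr = np.zeros(3, dtype=d)
arr['name'] = 1.5       # write through the field name
print(arr['my title'])  # read through the title: [1.5 1.5 1.5]
print(d.names)          # ('name',): titles are not listed in names
```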

Structured array

Once a structured array has been created from a structured dtype, we can assign to it and index it.

Assignment

A structured element can be assigned from a tuple that contains one value per field:

>>> x = np.array([(1, 2, 3), (4, 5, 6)], dtype='i8, f4, f8')
>>> x[1] = (7, 8, 9)
>>> x
array([(1, 2., 3.), (7, 8., 9.)],
     dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '<f8')])

You can also assign a scalar, which is copied into every field, or an unstructured array, whose values are assigned element by element:

>>> x = np.zeros(2, dtype='i8, f4, ?, S1')
>>> x[:] = 3
>>> x
array([(3, 3., True, b'3'), (3, 3., True, b'3')],
      dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])
>>> x[:] = np.arange(2)
>>> x
array([(0, 0., False, b'0'), (1, 1., True, b'1')],
      dtype=[('f0', '<i8'), ('f1', '<f4'), ('f2', '?'), ('f3', 'S1')])
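Tuple assignment and scalar assignment can be mixed on different parts of the same array. A short sketch with an illustrative two-field dtype (the field names id and score are made up):

```python
import numpy as np

x = np.zeros(3, dtype=[('id', 'i8'), ('score', 'f4')])
x[0] = (1, 0.5)    # tuple: one value per field, in field order
x[1:] = 7          # scalar: copied into every field of x[1] and x[2]
print(x.tolist())  # [(1, 0.5), (7, 7.0), (7, 7.0)]
```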

A structured array can also be assigned to an unstructured array, but only if the structured array has exactly one field. With two fields, the assignment fails:

>>> twofield = np.zeros(2, dtype=[('A', 'i4'), ('B', 'i4')])
>>> onefield = np.zeros(2, dtype=[('A', 'i4')])
>>> nostruct = np.zeros(2, dtype='i4')
>>> nostruct[:] = twofield
Traceback (most recent call last):
...
TypeError: Cannot cast array data from dtype([('A', '<i4'), ('B', '<i4')]) to dtype('int32') according to the rule 'unsafe'
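When the source has several fields, one workaround is to pick a field explicitly, or to flatten all fields into an extra axis with numpy.lib.recfunctions.structured_to_unstructured (available in NumPy 1.16 and later). A sketch using those two approaches; the sample values are made up:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

twofield = np.array([(1, 2), (3, 4)], dtype=[('A', 'i4'), ('B', 'i4')])
nostruct = np.zeros(2, dtype='i4')

nostruct[:] = twofield['A']                       # copy one field explicitly
flat = rfn.structured_to_unstructured(twofield)  # one column per field
print(nostruct)    # [1 3]
print(flat.shape)  # (2, 2)
```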

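Structured arrays can also be assigned to each other, as long as they have the same number of fields. Fields are matched by position, not by name, and each value is converted to the destination field's type: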

>>> a = np.zeros(3, dtype=[('a', 'i8'), ('b', 'f4'), ('c', 'S3')])
>>> b = np.ones(3, dtype=[('x', 'f4'), ('y', 'S3'), ('z', 'O')])
>>> b[:] = a
>>> b
array([(0., b'0.0', b''), (0., b'0.0', b''), (0., b'0.0', b'')],
      dtype=[('x', '<f4'), ('y', 'S3'), ('z', 'O')])
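Assignment between structured arrays matches fields by position (NumPy 1.14 and later), so field names do not matter. A sketch that makes this visible by giving the destination the same names in reversed order (src and dst are illustrative names):

```python
import numpy as np

src = np.array([(1, 2.0)], dtype=[('p', 'i4'), ('q', 'f8')])
dst = np.zeros(1, dtype=[('q', 'f8'), ('p', 'i4')])  # same names, reversed order

dst[:] = src  # field 0 <- field 0, field 1 <- field 1; names are ignored
print(dst['q'], dst['p'])  # [1.] [2]
```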

Accessing a structured array

As mentioned before, a single field can be accessed and modified by indexing the array with the field name:

>>> x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
>>> x['foo']
array([1, 3])
>>> x['foo'] = 10
>>> x
array([(10, 2.), (10, 4.)],
      dtype=[('foo', '<i8'), ('bar', '<f4')])

Indexing with a single field name returns a view of the original array. The two share memory, so modifying the view also modifies the original data.
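A short sketch of the view behaviour, and of calling .copy() when an independent array is wanted (foo and foo_copy are illustrative names):

```python
import numpy as np

x = np.array([(1, 2), (3, 4)], dtype=[('foo', 'i8'), ('bar', 'f4')])
foo = x['foo']                   # a view, not a copy
foo[0] = 99
print(x['foo'])                  # [99  3]
print(np.shares_memory(foo, x))  # True

foo_copy = x['foo'].copy()       # an independent copy
foo_copy[1] = -1
print(x['foo'])                  # still [99  3]
```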

A field can also hold a multi-dimensional subarray:

In [188]: np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
Out[188]:
array([[(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]),
        (0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])],
       [(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]),
        (0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])]],
      dtype=[('a', '<i4'), ('b', '<f8', (3, 3))])

This creates a 2×2 array. Each element has an int32 field a and a field b that holds a 3×3 float64 matrix.

When a subarray field is accessed, its shape is appended to the shape of the array:

>>> x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
>>> x['a'].shape
(2, 2)
>>> x['b'].shape
(2, 2, 3, 3)
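Because x['b'] has shape (2, 2, 3, 3), indexing its first two axes selects the 3×3 matrix of one element. The same matrix can be reached through the element first. A sketch (m is an illustrative name):

```python
import numpy as np

x = np.zeros((2, 2), dtype=[('a', np.int32), ('b', np.float64, (3, 3))])
x['b'][0, 1] = np.eye(3)  # write the 3x3 matrix of element (0, 1)
m = x[0, 1]['b']          # the same matrix, reached through the element
print(m.shape)                        # (3, 3)
print(x['b'][1, 1].sum(), m.trace())  # 0.0 3.0
```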

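Besides single fields, we can index with a list of field names to access several fields at once. In NumPy 1.16 and later the result is a view whose dtype keeps the original offsets and itemsize: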

>>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
>>> a[['a', 'c']]
array([(0, 0.), (0, 0.), (0, 0.)],
     dtype={'names':['a','c'], 'formats':['<i4','<f4'], 'offsets':[0,8], 'itemsize':12})

Assigning to a multi-field index writes to the original array:

>>> a[['a', 'c']] = (2, 3)
>>> a
array([(2, 0, 3.), (2, 0, 3.), (2, 0, 3.)],
      dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<f4')])

Because fields are assigned by position, two fields can be swapped in one line:

>>> a[['a', 'c']] = a[['c', 'a']]
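A sketch, assuming NumPy 1.16 or later, that shows the padded layout of a multi-field view, a packed copy made with numpy.lib.recfunctions.repack_fields, and the effect of the swap on some sample values:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1, 0, 2.0), (3, 0, 4.0)],
             dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])

sub = a[['a', 'c']]
print(sub.dtype.itemsize)                     # 12: the view keeps the original layout
print(rfn.repack_fields(sub).dtype.itemsize)  # 8: packed copy without the gap

a[['a', 'c']] = a[['c', 'a']]  # swap, assigned field by position
print(a['a'], a['c'])          # [2 4] [1. 3.]
```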

Record Arrays

Fields of a structured array are normally accessed by indexing with the field name, which can be cumbersome. For convenience, NumPy provides numpy.recarray, a subclass of ndarray that also lets fields be accessed as attributes.

Let's look at a few examples:

>>> recordarr = np.rec.array([(1, 2., 'Hello'), (2, 3., "World")],
...                    dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
>>> recordarr.bar
array([ 2.,  3.], dtype=float32)
>>> recordarr[1:2]
rec.array([(2, 3., b'World')],
      dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])
>>> recordarr[1:2].foo
array([2], dtype=int32)
>>> recordarr.foo[1:2]
array([2], dtype=int32)
>>> recordarr[1].baz
b'World'
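Attribute access returns a view of the field, so writes through it change the recarray. NumPy also provides np.rec.fromarrays, which builds a recarray from one array per field. A sketch of both (r and r2 are illustrative names):

```python
import numpy as np

r = np.rec.array([(1, 2.0, b'Hello'), (2, 3.0, b'World')],
                 dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
r.bar[0] = 5.0    # attribute access returns a view of the field
print(r['bar'])   # [5. 3.]

# np.rec.fromarrays builds a recarray from one array per field
r2 = np.rec.fromarrays([np.array([1, 2]), np.array([2.0, 3.0])], names='foo,bar')
print(r2.foo, r2.bar)  # [1 2] [2. 3.]
```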

np.rec.array returns a numpy.recarray, whose repr starts with rec.array. Besides creating one with np.rec.array, you can also convert an existing structured array with view:

In [190]: arr = np.array([(1, 2., 'Hello'), (2, 3., "World")],
     ...:                dtype=[('foo', 'i4'), ('bar', 'f4'), ('baz', 'S10')])
     ...:

In [191]: arr
Out[191]:
array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])

In [192]: arr.view(dtype=np.dtype((np.record, arr.dtype)),
     ...:          type=np.recarray)
     ...:
Out[192]:
rec.array([(1, 2., b'Hello'), (2, 3., b'World')],
          dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])

In a recarray, the dtype is automatically converted to a numpy.record dtype:

In [200]: recordarr.dtype
Out[200]: dtype((numpy.record, [('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')]))

To convert it back to a plain np.ndarray, use view again:

In [202]: recordarr.view(recordarr.dtype.fields or recordarr.dtype, np.ndarray)
Out[202]:
array([(1, 2., b'Hello'), (2, 3., b'World')],
      dtype=[('foo', '<i4'), ('bar', '<f4'), ('baz', 'S10')])
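Viewing an ndarray as np.recarray converts the dtype to a record dtype automatically, so the dtype argument can be left out. A sketch of the round trip, with a check that no data is copied along the way (rec and back are illustrative names):

```python
import numpy as np

arr = np.array([(1, 2.0), (2, 3.0)], dtype=[('foo', 'i4'), ('bar', 'f4')])

rec = arr.view(np.recarray)  # dtype becomes a numpy.record dtype
back = rec.view(rec.dtype.fields or rec.dtype, np.ndarray)

print(type(rec).__name__, rec.dtype.type.__name__)    # recarray record
print(type(back).__name__, back.dtype.type.__name__)  # ndarray void
print(np.shares_memory(arr, back))                    # True
```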

When a field of a recarray is accessed by index or by attribute, the result is a numpy.recarray if the field itself has a structured type, and a plain numpy.ndarray otherwise:

>>> recordarr = np.rec.array([('Hello', (1, 2)), ("World", (3, 4))],
...                 dtype=[('foo', 'S6'),('bar', [('A', int), ('B', int)])])
>>> type(recordarr.foo)
<class 'numpy.ndarray'>
>>> type(recordarr.bar)
<class 'numpy.recarray'>
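Since the nested field bar comes back as a recarray, attribute access can be chained into it, and its plain inner fields come back as ndarrays. A minimal sketch (inner and a_values are illustrative names):

```python
import numpy as np

r = np.rec.array([('Hello', (1, 2)), ('World', (3, 4))],
                 dtype=[('foo', 'S6'), ('bar', [('A', int), ('B', int)])])
inner = r.bar       # structured field -> recarray
a_values = r.bar.A  # plain field of the nested struct -> ndarray
print(type(inner).__name__, type(a_values).__name__)  # recarray ndarray
print(a_values)     # [1 3]
```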

This article is also published at http://www.flydean.com/05-python-structured-arrays/




flydean

Personal website: www.flydean.com