Background

A data structure is a collection of data elements that have structural relationships with one another. Within a data structure, the data are linked through a particular organization that makes them convenient for a computer to store and use. Broadly, data structures fall into two categories, linear and non-linear, each suited to different application scenarios.

  • Linear structure:

The linear structure is the most commonly used kind of data structure. It is characterized by a one-to-one linear relationship between data elements. It has two storage forms: sequential storage and chain (linked) storage. A linear list stored sequentially is called a sequential list, and its elements occupy contiguous storage.

(Linear structure)

A linear list that uses chain storage is called a linked list. Its elements are not necessarily contiguous in storage; each node stores a data element together with the address of the adjacent element.

Common linear structures are: arrays, queues, linked lists, and stacks.

  • Non-linear structure:

All data structures other than linear ones are non-linear structures, in which a single element can correspond to multiple other elements. Common examples are two-dimensional arrays, multi-dimensional arrays, generalized lists, trees, and graphs.

(Common non-linear structure)

Sparse Array

Of all data structures, the array is the most basic and the most commonly used. Arrays represent the relationships of data in one-dimensional or multi-dimensional space very intuitively, which is close to how the data actually looks, so most programmers treat the array as their first choice. In some application scenarios, however, storing data in a plain array runs into problems, and the structure has to be optimized on top of the array, which gives rise to new data structures such as the sparse array.

Taking the game of Gomoku as an example, how should we store the moves on the board?

(Using a two-dimensional array to store the Gomoku board)

If a two-dimensional array is used to store the moves, most of the board data we get back is meaningless 0s, the meaningful values are not adjacent to one another, and a lot of space is wasted. For Gomoku the problem may not be obvious, but if the board is large enough, the wasted space will affect how the software functions. This is where the sparse array (SparseArray) becomes valuable.

A sparse array compresses the contents of the array and stores them in a smaller, more compact two-dimensional array. In essence, a sparse array trades time for space.

The specific processing method is:

  1. Record how many rows and columns the original array has in total.
  2. Ignore the elements that all share the same value, and record only the positions and values of the elements that differ.
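The two steps above can be sketched in Java as follows. This is only a minimal illustration under stated assumptions: the class name `SparseBoard` and its methods are hypothetical, the board is assumed to be rectangular, and 0 is assumed to be the "empty" value that gets ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not taken from any library. */
final class SparseBoard {

    /**
     * Compresses a mostly-zero board.
     * Row 0 of the result holds {rows, cols, nonZeroCount};
     * every later row holds {row, col, value} for one non-zero cell.
     */
    static int[][] compress(int[][] board) {
        int rows = board.length;
        int cols = rows == 0 ? 0 : board[0].length;
        List<int[]> entries = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (board[r][c] != 0) {
                    entries.add(new int[] {r, c, board[r][c]});
                }
            }
        }
        int[][] sparse = new int[entries.size() + 1][];
        sparse[0] = new int[] {rows, cols, entries.size()};
        for (int i = 0; i < entries.size(); i++) {
            sparse[i + 1] = entries.get(i);
        }
        return sparse;
    }

    /** Rebuilds the full board from the compressed form. */
    static int[][] restore(int[][] sparse) {
        int[][] board = new int[sparse[0][0]][sparse[0][1]];
        for (int i = 1; i < sparse.length; i++) {
            board[sparse[i][0]][sparse[i][1]] = sparse[i][2];
        }
        return board;
    }

    public static void main(String[] args) {
        int[][] board = new int[15][15];
        board[7][7] = 1; // assumed encoding: 1 = black stone
        board[7][8] = 2; // assumed encoding: 2 = white stone
        int[][] sparse = compress(board);
        System.out.println("Compressed rows: " + sparse.length);
    }
}
```

For a 15 x 15 board holding two stones, the compressed form needs 3 rows of 3 integers (9 values) instead of 225, at the cost of a scan when compressing or restoring.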

Implementation of sparse array

Saving storage space is clearly an advantage of sparse arrays, but is their read performance much worse than that of two-dimensional arrays?

To answer this question, we can first look at how SparseArray is implemented in Android. Internally, SparseArray stores its data in two arrays, one for the keys and one for the values, as the source code shows:

    private int[] mKeys;
    private Object[] mValues;

SparseArray also uses binary search both when storing and when reading data:

    public void put(int key, E value) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
        ...
    }

    public E get(int key, E valueIfKeyNotFound) {
        int i = ContainerHelpers.binarySearch(mKeys, mSize, key);
        ...
    }

When put adds data, it uses binary search to compare the key of the element being added with the keys already stored, and inserts the element at the position that keeps the keys in ascending order. As a result, the elements in a SparseArray are always sorted from the smallest key to the largest. When data is read, binary search is used again to locate the element, so a lookup takes a number of comparisons that grows only logarithmically with the number of keys. Therefore, as long as the number of keys (think of it as the number of pieces on the board once the blanks are removed) is not large, the read performance of the sparse array is acceptable.
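To make the sorted-keys idea concrete, here is a simplified Java sketch. It is not Android's actual SparseArray code: the class name `TinySparseArray`, the initial capacity of 4, and the doubling growth policy are all assumptions made for illustration, and it uses the standard `java.util.Arrays.binarySearch` rather than Android's internal helper.

```java
import java.util.Arrays;

/** Simplified, hypothetical sketch of an int-keyed map backed by two parallel arrays. */
final class TinySparseArray<E> {
    private int[] keys = new int[4];          // always sorted ascending in [0, size)
    private Object[] values = new Object[4];  // values[i] belongs to keys[i]
    private int size = 0;

    void put(int key, E value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {               // key already present: overwrite its value
            values[i] = value;
            return;
        }
        int insertAt = -(i + 1);    // binarySearch encodes the insertion point this way
        if (size == keys.length) {  // grow both arrays when full
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail one slot to the right to keep the keys sorted.
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        keys[insertAt] = key;
        values[insertAt] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    E get(int key, E valueIfKeyNotFound) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? (E) values[i] : valueIfKeyNotFound;
    }

    int size() {
        return size;
    }
}
```

In this sketch a lookup costs O(log n) comparisons, while an insert may additionally shift up to n elements, which is why the approach suits collections with a modest number of keys.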

Typical application scenarios

Every developer knows that the easiest way to make a system faster is to add memory. Large caches can be used to speed up a program, which is the so-called "space for time" trade. In some environments, however, the memory available to a program is limited.

On mobile devices, memory is a scarce resource. For example, the iPhone 7 has 2 GB of memory, and even the more recent iPhone 13 has only 4 GB. That is why the "time for space" technique of sparse arrays was first widely adopted in mobile development.

Besides mobile devices, the browser is another runtime environment where memory is scarce. Although nothing is formally specified, it is commonly understood in the industry that browsers limit the memory of a single thread; in 64-bit Chrome, for example, a single tab is not allowed to use more than 4 GB. More than ten years ago, when single-page applications were still immature, this limit was not a problem: what everyone cared about then was back-end processing performance, and the front end was just a way of presenting static web pages.

With the rapid development of front-end engineering, the various front-end scaffolding tools have matured, the WebComponent standard has been put on the agenda, and enterprises have begun moving from C/S to B/S applications. Front-end developers now face the challenge of processing complex business data within a single page. Memory usage has to be considered from the initial design of a front-end program through the whole development process, and kept as low as possible so that the page does not crash. Take the front-end spreadsheet as an example: we usually need to offer users a million cells (100 columns x 10,000 rows), yet only a few hundred of them may actually hold data. To reduce the memory occupied by the data model, our final solution was to change the table's data storage from a regular array to a sparse array. Memory usage drops to a fraction of what the regular array needs, which keeps the browser's memory from being overwhelmed.

(Sparse matrix storage strategy)

Not just "time for space"

Compared with traditional chain storage or array storage, sparse matrix storage builds a data dictionary keyed by index. For loosely populated table data, the sparse matrix stores only the non-empty cells and allocates no extra memory for empty ones.
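A dictionary keyed by cell index could look roughly like the Java sketch below. It is an assumption-laden illustration rather than the article's actual spreadsheet model: the class `SparseSheet`, the `row * columnCount + col` key scheme, and the use of an empty string for blank cells are all hypothetical choices.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse cell store: only non-empty cells occupy memory. */
final class SparseSheet {
    private final int columnCount;
    private final Map<Long, String> cells = new HashMap<>();

    SparseSheet(int columnCount) {
        this.columnCount = columnCount;
    }

    /** Maps a (row, col) position to a single dictionary key. */
    private long keyOf(int row, int col) {
        return (long) row * columnCount + col;
    }

    /** Stores a value; writing null or an empty string frees the cell again. */
    void set(int row, int col, String value) {
        if (value == null || value.isEmpty()) {
            cells.remove(keyOf(row, col));
        } else {
            cells.put(keyOf(row, col), value);
        }
    }

    /** Returns the cell value, or an empty string for a cell that was never set. */
    String get(int row, int col) {
        return cells.getOrDefault(keyOf(row, col), "");
    }

    /** Number of cells that actually consume memory. */
    int storedCellCount() {
        return cells.size();
    }
}
```

With this layout, a sheet of 100 columns x 10,000 rows that holds a few hundred values keeps only a few hundred dictionary entries, and empty cells cost nothing.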

Besides reducing memory usage, this storage strategy also makes it easier to split the data into fragments. You can select any region of the data layer at any time and serialize or deserialize it without having to handle the other data in the same structure.

Thanks to this property, we can replace or restore a node at any level of the storage structure at any time, and handle rollback and restoration of table data efficiently by swapping references. This is also the technical foundation that lets the spreadsheet support online collaboration.
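One way to picture rollback by swapping references is the Java sketch below. It is a guess at the general idea, not the article's implementation: the class `UndoableSheet`, the choice of a row as the replaceable node, and the copy-on-write policy are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse sheet whose rows are replaced, never mutated, so undo is a reference swap. */
final class UndoableSheet {

    /** Remembers which row node was in place before an edit (null if the row did not exist). */
    private static final class Change {
        final int row;
        final Map<Integer, String> previousRow;

        Change(int row, Map<Integer, String> previousRow) {
            this.row = row;
            this.previousRow = previousRow;
        }
    }

    private final Map<Integer, Map<Integer, String>> rows = new HashMap<>();
    private final Deque<Change> history = new ArrayDeque<>();

    void set(int row, int col, String value) {
        Map<Integer, String> oldRow = rows.get(row);
        history.push(new Change(row, oldRow));
        // Copy-on-write: build a new row node and leave the old one untouched.
        Map<Integer, String> newRow = new HashMap<>();
        if (oldRow != null) {
            newRow.putAll(oldRow);
        }
        newRow.put(col, value);
        rows.put(row, newRow);
    }

    /** Rolls back the most recent edit by putting the old row reference back. */
    boolean undo() {
        if (history.isEmpty()) {
            return false;
        }
        Change change = history.pop();
        if (change.previousRow == null) {
            rows.remove(change.row);
        } else {
            rows.put(change.row, change.previousRow);
        }
        return true;
    }

    /** Returns the cell value, or null for an empty cell. */
    String get(int row, int col) {
        Map<Integer, String> r = rows.get(row);
        return r == null ? null : r.get(col);
    }
}
```

Because only the touched row is copied and the rest of the structure is shared, restoring an earlier state costs a single map update regardless of how large the sheet is.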

Summary

This section introduces the basic knowledge, technical implementation and application scenarios of sparse arrays. Taking the front-end spreadsheet as an example, it demonstrates the advantages of this technology in saving memory space and implementing rollback recovery.

We will keep bringing you more in-depth and interesting content in follow-up articles~

If you found this useful, give it a like before you go~

When reprinting, please credit the source: Grape City official website. Grape City provides developers with professional development tools, solutions and services, and empowers developers.

Grape City Technical Team

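Founded in 1980, Grape City is a professional provider of software development technology and low-code platforms. With the mission of "empowering developers", Grape City is committed to innovating development models, improving development efficiency, and advancing the software industry through a range of software development tools and services, helping to accelerate the building of "Digital China".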