Data Structure and Algorithm Analysis (3): Linear Tables

Before reading the articles in this series, I recommend first reading the article that precedes this one, "Miscellaneous Senses (1)", which discusses how to learn and learning strategies. I believe it will help with studying the data structure series.

Preface

While writing this article, I remembered my first data structures course at university. I was determined to follow the course closely, but I overlooked a few things. University courses build on one another: learning data structures assumes familiarity with a high-level language and a certain amount of coding experience. With that background, the course is not a big obstacle. At the time, however, I was not very familiar with C, the only language I had learned, and I had written little code, so I went through the course in a haze and never really understood it. Later, as the amount of code I wrote grew rapidly, data structures became much easier for me. So my suggestion is to get a general understanding of a high-level language before reading this series.

A word about my writing style. I don't like writing notes that simply copy the core concepts out of a book. I prefer to connect these concepts in my own way and add my own understanding.

"Linear Table" in Daily Life

Queuing is a common scene in daily life. Except for the people at the front and the back of the line, each person has exactly one neighbor in front and exactly one behind. If we regard each person as a node, the relationship between nodes is one-to-one. This leads us to the linear table: in data structures, a structure whose data elements have a one-to-one relationship is called a linear table. Apart from the first and last elements, the elements of a linear table are connected end to end.

As discussed before, discussing a data structure generally means discussing its logical structure, its storage structure, and the operations defined on that structure.
Let's start with the logical structure of the linear table. We have in fact already described it above; its logical characteristics are:

There is exactly one "first" node, the start node: the head of the line.
There is exactly one "last" node, the terminal node: the end of the line.
Every node except the first has exactly one predecessor: everyone in line except the first person has exactly one person directly in front.
Every node except the last has exactly one successor: everyone in line except the last person has exactly one person directly behind.
You see, once we compare these abstract data structures with the corresponding scenes in real life, aren't they much easier to understand?

Revisiting the three elements of a data structure

Let's recall the three elements of a data structure, introduced with the basic concepts in "Data Structure and Algorithm Analysis (1)":

The logical structure of the data
The storage structure of the data
The operations on the data

The logical structure of data describes the relationships between data elements and is independent of the computer. The linear table discussed above has a linear, "one-to-one" logical structure, where "one-to-one" refers to the logical characteristics just listed. In terms of logical structure, data structures can be roughly divided into linear and non-linear structures, and they can also be classified in more detail.

It seems that when we talk about data structures, we are mainly talking about their logical structure.

Next, let's discuss the storage structure of data, also called the physical structure. It is the way the data and its logical structure are represented in the computer; in a concrete implementation it is described with the data types of a programming language.

Let's think about what needs to be stored when we store data. Storing data means saving not only the values of the data themselves but also the relationships between them, so that all the information relevant to the problem is preserved intact. We store the relationships so that, starting from one element, we can find the elements connected to it. Second, the purpose of storing data in a computer is to process it; if the data is in the machine but cannot be found when we need it, storing it is pointless. "Finding" here means being able to locate the data whenever it is needed. Therefore, the storage structure of a data structure should be designed according to two principles:

Store the values, and store the relationships
What is stored must be retrievable
As we all know, a program is loaded into memory when it runs, and in most cases its runtime data lives in memory. So how should we think about memory? My approach is to abstract it: memory can be understood as one large array, with the array index playing the role of the memory address. Some high-level languages (such as Java and C#) request a large block of memory from the operating system up front, while others (C, C++) request memory from the operating system as it is needed.
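A toy sketch of that picture: an `int[]` stands in for memory and an index stands in for an address. The class `ToyMemory` and its methods (`allocate`, `store`, `load`) are hypothetical names made up for illustration, and the bump allocator is a deliberate oversimplification of what a real runtime or `malloc` does.

```java
/**
 * A toy model of memory: one big int array whose indices act as addresses.
 * ToyMemory, allocate, store and load are hypothetical names for illustration only.
 */
class ToyMemory {
    private final int[] cells;   // the "physical memory"
    private int nextFree = 0;    // next unused address (a simple bump allocator)

    ToyMemory(int capacity) {
        cells = new int[capacity];
    }

    /** Reserves `count` consecutive cells and returns the address of the first one. */
    int allocate(int count) {
        if (nextFree + count > cells.length) {
            throw new IllegalStateException("out of memory");
        }
        int address = nextFree;
        nextFree += count;
        return address;
    }

    /** Writes a value at an address. */
    void store(int address, int value) {
        cells[address] = value;
    }

    /** Reads the value at an address. */
    int load(int address) {
        return cells[address];
    }
}
```

For example, allocating three cells on a fresh `ToyMemory` returns address 0; storing 10, 20, 30 at addresses 0, 1, 2 and then loading address 1 returns 20. Data allocated this way sits at consecutive addresses, which is the sequential storage structure described next.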

So we can think of the memory the operating system gives us as a sequential storage structure. Linked, indexed, and hashed storage structures, among others, are built on top of it.
Sequential storage structure: data elements are stored one after another, adjacent in physical memory. If the logical structure of the data is also sequential (linear), the logical structure and the storage structure coincide completely. Elements stored contiguously are easy to locate in memory.
Linked storage structure: this is very much like a train. Each car is a data element, and each car also holds a pointer (the coupling between cars). Elements are not necessarily stored contiguously in memory; a pointer field is added to each element, and through it we can find the actual location of the logically connected element.
Indexed storage structure: while storing the node information, an additional index table is built. Each entry in the index table generally has the form (key, address).
Hashed storage structure: the node's key is used as the argument of a function F, and the node's storage address is computed directly: node address = F(key). This is sometimes also called a hash algorithm.
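The other three storage structures can be sketched on top of a plain array in the same toy spirit. This is illustrative only: the class `StorageSketches` and its methods are hypothetical; a linked node is assumed to occupy two cells `[value, address of next]` with -1 meaning "no next"; the index table is assumed to be sorted by key so it can be binary-searched; and the hash function F(key) = key mod capacity, with linear probing for collisions, is one simple choice among many (the text above does not specify F or how collisions are handled).

```java
import java.util.Arrays;

/**
 * Toy sketches of linked, indexed and hashed storage on top of a plain array.
 * All names here are hypothetical and chosen for illustration.
 */
class StorageSketches {

    /** Linked storage: each node occupies two cells, [value, addressOfNext]; -1 means "no next". */
    static int[] walkLinked(int[] memory, int headAddress) {
        int count = 0;
        for (int a = headAddress; a != -1; a = memory[a + 1]) {
            count++;                               // first pass: count the nodes
        }
        int[] values = new int[count];
        int i = 0;
        for (int a = headAddress; a != -1; a = memory[a + 1]) {
            values[i++] = memory[a];               // second pass: collect values in logical order
        }
        return values;
    }

    /** Indexed storage: an index table of (key, address) pairs sorted by key; returns the address or -1. */
    static int lookupIndex(int[] keys, int[] addresses, int key) {
        int pos = Arrays.binarySearch(keys, key);
        return pos >= 0 ? addresses[pos] : -1;
    }

    /** Hashed storage: address = F(key) = key mod capacity; collisions resolved by linear probing (an assumption). */
    static int hashInsert(Integer[] table, int key) {
        int address = Math.floorMod(key, table.length);
        for (int probes = 0; probes < table.length; probes++) {
            if (table[address] == null || table[address] == key) {
                table[address] = key;
                return address;
            }
            address = (address + 1) % table.length;  // occupied: try the next slot
        }
        throw new IllegalStateException("table full");
    }
}
```

With a 7-slot table, key 10 goes to address 10 mod 7 = 3; key 17 also hashes to 3, finds it occupied, and lands at address 4.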

Storage structure of the linear table: the sequential list

Placing the elements of a linear table one after another in a contiguous block of storage gives what is called a sequential list (Sequential List). Contiguous storage brings arrays to mind: array elements occupy one contiguous block, so a one-dimensional array can hold the nodes of a linear table. As for the principle of "store the values and the relationships": a sequential list does not store the logical relationships between nodes explicitly, but because the nodes are stored at adjacent physical addresses, physical adjacency expresses logical adjacency.
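Because logically adjacent nodes sit at physically adjacent addresses, the address of any element can be computed directly from the address of the first one. Writing LOC(a_i) for the address of the element with (0-based) index i, and assuming each element occupies c storage units:

```latex
\mathrm{LOC}(a_i) = \mathrm{LOC}(a_0) + i \cdot c, \qquad 0 \le i \le n - 1
```

For example, if a_0 is at address 1000 and each element is a 4-byte int, then a_5 is at 1000 + 5 × 4 = 1020. The cost of this calculation does not depend on i, which is why accessing a sequential list by index takes O(1) time.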

In C, a linear table is usually represented with a struct, and its operations, such as initialization, insertion, deletion and traversal, are implemented as functions. In object-oriented languages we usually use a class instead. If you are familiar with Java, its ArrayList is a well-implemented linear table backed by an array, so reading its source code is a good way to deepen our understanding and see how others designed it. Operations on a sequential list usually include:

Initialization: give the linear table some default parameters. Some high-level languages ship with ready-made data structures, which we can refer to while learning.
Length: get the number of elements in the linear table
Get: get the value of the element at a given position
Locate: find the position of a given value
Insert: insert a given value at a specified position
Delete: delete the value at a specified position, or delete a given value

Let's call the sequential list SequenceList and implement it as a class:

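import java.util.Arrays;

/**
 * @author XingKe
 * @date 2021-07-24 15:04
 */
public class SequenceList {
    private int[] elementData;
    /**
     * Number of elements actually stored (returned to callers by size())
     */
    private int size;
    /**
     * Default array capacity
     */
    private static final int DEFAULT_SIZE = 15;

    /**
     * We should not settle for a data structure that merely works;
     * we should also design good interfaces and methods that are pleasant to use.
     * Java cannot directly create a generic array (new T[n] is not allowed), so int is used here.
     * @param initSize initial capacity, must be positive
     */
    public SequenceList(int initSize) {
        if (initSize <= 0) {
            // Refuse to initialize with a non-positive capacity
            throw new IllegalArgumentException("initSize must be positive: " + initSize);
        }
        elementData = new int[initSize];
    }

    /**
     * A no-argument constructor as well, so callers are not forced to choose an initial size
     */
    public SequenceList() {
        elementData = new int[DEFAULT_SIZE];
    }

    /**
     * Appends the element after the last element currently stored
     * @param data
     */
    public void add(int data) {
        // If the array is full there is no room for another element, so grow it first.
        // An array's length is fixed at creation, so "growing" means creating a larger
        // array and copying the old elements over. Doubling is a trade-off between
        // wasted space and how often we have to copy.
        ensureCapacity(size + 1);
        elementData[size++] = data;
    }

    /**
     * Returns the element at the given position
     * @param index
     */
    public int get(int index) {
        rangeCheck(index);
        return elementData[index];
    }

    /**
     * Returns the position of the given element (the first match), or -1 if it is absent
     * @param data
     * @return
     */
    public int indexOf(int data) {
        // Only scan the occupied part of the array, not the unused capacity
        for (int i = 0; i < size; i++) {
            if (data == elementData[i]) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Inserts an element at the given position (0 <= index <= size)
     * @param index
     * @param data
     */
    public void add(int index, int data) {
        // Several methods check positions, so the checks live in shared helpers.
        // Inserting at index == size is allowed (it appends), hence the insertion check.
        rangeCheckForAdd(index);
        // Grow the array if there is not enough room
        ensureCapacity(size + 1);
        // Shift the elements at index .. size - 1 one position to the right
        for (int i = size - 1; i >= index; i--) {
            elementData[i + 1] = elementData[i];
        }
        elementData[index] = data;
        ++size;
    }

    /**
     * Checks whether the array can hold minCapacity elements; grows it if not
     * @param minCapacity
     */
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > elementData.length) {
            int newCapacity = Math.max(elementData.length * 2, minCapacity);
            // Copy the existing elements into the new, larger array
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
    }

    /**
     * Valid positions for reading or removing: 0 .. size - 1
     */
    private void rangeCheck(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
    }

    /**
     * Valid positions for inserting: 0 .. size
     */
    private void rangeCheckForAdd(int index) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
    }

    /**
     * Removes the element at the given position and returns it
     * @return
     */
    public int remove(int index) {
        // Check the position first
        rangeCheck(index);
        int data = elementData[index];
        // Shift the elements at index + 1 .. size - 1 one position to the left
        for (int i = index; i < size - 1; i++) {
            elementData[i] = elementData[i + 1];
        }
        // One fewer element after removal
        size--;
        return data;
    }

    /**
     * There are two ways to get the length: traverse the table and count every time,
     * or maintain the size field, incrementing it on add and decrementing it on remove.
     * The second is better (O(1) instead of O(n)), so size() simply returns the field.
     * @return
     */
    public int size() {
        return size;
    }
}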

We won't separately analyze insertion and deletion at the end of the table; we only analyze insertion in the middle, because inserting at the end is simply a special case of it. The problem size here is the number of nodes (elements), n; after an insertion, the table has n+1 nodes.
The running time of the insertion algorithm is dominated by the loop statement that shifts nodes backward. That statement executes n - k times, where k is the insertion position, so the number of node moves depends not only on the length of the table but also on where we insert. Without loss of generality, assume that insertion is equally likely at each of the n+1 positions 0 ≤ k ≤ n:

Insertion position k	0	1	2	...	n
Number of moves	n	n-1	n-2	...	0
Probability	1/(n+1)	1/(n+1)	1/(n+1)	...	1/(n+1)

Best case: k = n, i.e. the element is appended at the end, like joining the back of the line. Nothing moves, and the time complexity is O(1).
Worst case: k = 0. The node is moved n times, which means every node in the table moves.
Average case: inserting at position k into a table of length n takes n - k moves. Weighting each count by its probability 1/(n+1) and summing gives (1/(n+1)) × (n + (n-1) + ... + 0) = (1/(n+1)) × n(n+1)/2 = n/2.
So an insertion into a sequential list moves half of the elements on average, and the average time complexity of the algorithm is O(n).
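The move counts in the table can be checked by brute force. The sketch below (`InsertCost` is a hypothetical helper, not part of `SequenceList`) performs the same backward shift as `SequenceList.add(index, data)` on a plain array, counts the moves, and averages them over all n+1 equally likely positions.

```java
/**
 * Counts how many elements shift right when inserting into a sequential list
 * of length n at position k, by actually performing the shifts on an array.
 * InsertCost and its methods are hypothetical helpers for this illustration.
 */
class InsertCost {

    /** Inserts value at position k of data[0..n-1] (data must have room for n + 1) and returns the number of moves. */
    static int insertAndCount(int[] data, int n, int k, int value) {
        int moves = 0;
        for (int i = n - 1; i >= k; i--) {   // shift data[k..n-1] one slot to the right
            data[i + 1] = data[i];
            moves++;
        }
        data[k] = value;
        return moves;
    }

    /** Average number of moves over all n + 1 equally likely insertion positions. */
    static double averageMoves(int n) {
        long total = 0;
        for (int k = 0; k <= n; k++) {
            int[] data = new int[n + 1];     // a fresh table of length n for every trial
            total += insertAndCount(data, n, k, -1);
        }
        return (double) total / (n + 1);
    }
}
```

For n = 10 the total is 10 + 9 + ... + 0 = 55 moves over 11 positions, so `averageMoves(10)` returns 5.0, i.e. n/2.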

Deletion from a linear table also has to move nodes, to fill the gap the deletion leaves, so that the remaining elements stay physically adjacent and keep reflecting their logical adjacency. If the deleted element happens to be the last one, we only remove the terminal node and nothing moves. Otherwise, to delete position i (0 ≤ i ≤ n-1), the nodes at positions i+1, i+2, ..., n-1 move to positions i, i+1, ..., n-2: a total of n-i-1 elements move forward.
Let the number of nodes be n. As with insertion, the deletion algorithm spends most of its time in the loop statement that shifts nodes forward. If the deletion position is k, that statement runs n-k-1 times, so the number of moves depends both on the length of the table and on the deletion position:

Deletion position k	0	1	2	...	n-1
Number of moves	n-1	n-2	n-3	...	0
Probability	1/n	1/n	1/n	...	1/n

Best case: k = n-1. The shifting statement never runs, and the time complexity is O(1).
Worst case: k = 0. All the remaining n-1 nodes must move.
Average case: again without loss of generality, assume each deletion position from 0 to n-1 is equally likely. Multiplying each move count by its probability and summing gives the average number of moves: (1/n) × (n-1)n/2 = (n-1)/2. So a deletion from a sequential list also moves about half of the elements on average, and the average time complexity of the algorithm is O(n). In contrast, accessing an array element by index does not depend on the position at all: its time complexity is O(1).
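The same brute-force check works for deletion. `DeleteCost` is a hypothetical helper that performs the forward shift described above on a plain array and averages the move count over the n equally likely positions.

```java
/**
 * Counts how many elements shift left when deleting position k from a
 * sequential list of length n. DeleteCost is a hypothetical helper name.
 */
class DeleteCost {

    /** Removes position k from data[0..n-1] by shifting later elements left; returns the number of moves. */
    static int removeAndCount(int[] data, int n, int k) {
        int moves = 0;
        for (int i = k; i < n - 1; i++) {    // data[k+1..n-1] each move one slot to the left
            data[i] = data[i + 1];
            moves++;
        }
        return moves;
    }

    /** Average number of moves over all n equally likely deletion positions. */
    static double averageMoves(int n) {
        long total = 0;
        for (int k = 0; k < n; k++) {
            total += removeAndCount(new int[n], n, k);
        }
        return (double) total / n;
    }
}
```

For n = 10 the total is 9 + 8 + ... + 0 = 45 moves over 10 positions, so `averageMoves(10)` returns 4.5, i.e. (n-1)/2.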

Storage structure of the linear table: the linked list

The linked storage structure (or linked structure) is common in real life; the best-known example is a train.

The carriages are not connected directly; they are joined by the couplings between them. If you and a friend travel by train together and happen to have adjacent seats, you can simply talk to your friend directly, because you are next to each other. But if your friend is in another car and you know which car that is, you can still go and find them. In a sense, this is a linked structure.

If, when looking for someone on the train, you can only go from a front car to a rear car and never from a rear car to a front car, we call it a one-way train; in data structure terms, this is a singly linked list.

If you can still only go from a front car to a rear car, but the last car connects back to the first car, the corresponding storage structure is called a circular linked list.

If from any car you can reach both the car in front of it and the car behind it, the corresponding storage structure is called a doubly linked list.

From this discussion we can derive the design of a linked-list node. Just as each car carries passengers and also holds the link to the next car (the coupling between cars), each node of a linked list stores its own information plus the location of its neighboring node(s).

Singly linked list

Based on the discussion above, a singly linked list can only be traversed from front to back. Each node stores not only its own information but also the storage address of the next node, so we can roughly sketch the node structure as follows:

public class SingleLinkedListNode {
    // The node's own data
    private int data;
    // Reference to the next node
    private SingleLinkedListNode next;

    public int getData() {
        return data;
    }
    public void setData(int data) {
        this.data = data;
    }
    public SingleLinkedListNode getNext() {
        return next;
    }
    public void setNext(SingleLinkedListNode next) {
        this.next = next;
    }
}

In fact, we can also make this node an inner class of the list class itself, which is easier to read and use. There are two basic ways to build a singly linked list:

Tail insertion
Tail insertion is the simpler and more intuitive of the two. Using the train analogy again, think of the locomotive as the head node. Call a new carriage a1 and couple it behind the locomotive; call another carriage a2 and couple it behind a1. Every new node is attached after the current last node, which is why this is called tail insertion.
Head insertion
If tail insertion is like joining a queue, where a newcomer goes to the back, head insertion is the opposite: it is like cutting in at the front. At the start there is only one node, which we call a1; when another node a2 arrives, we put it in front of a1.
Having discussed the storage structure and how to build the list, we now turn to the operations on a singly linked list. This is the approach we will take for every data structure later: first introduce the structure, then its storage structure, and finally the operations based on it. Here we only discuss a few core operations of the singly linked list:
Initialization
Search, of two kinds: search by value, and search by position (similar to indexing into an array)
Insertion
Deletion
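Both construction methods fit in a few lines. This is a sketch under simplifying assumptions: `BuildDemo`, `buildByTail`, `buildByHead` and `show` are hypothetical names, and the nested `Node` is a minimal stand-in for the `SingleLinkedListNode` above (no separate dummy head node is used).

```java
/**
 * Builds a singly linked list from an array by tail insertion and by head insertion.
 * BuildDemo and its methods are hypothetical names for illustration.
 */
class BuildDemo {
    static final class Node {
        final int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    /** Tail insertion: keep a reference to the last node and attach each new node after it. */
    static Node buildByTail(int[] values) {
        Node head = null, tail = null;
        for (int v : values) {
            Node node = new Node(v, null);
            if (head == null) {
                head = node;          // the first node is also the last
            } else {
                tail.next = node;     // hang the new car behind the current last car
            }
            tail = node;
        }
        return head;
    }

    /** Head insertion: each new node is placed in front of the current first node. */
    static Node buildByHead(int[] values) {
        Node head = null;
        for (int v : values) {
            head = new Node(v, head); // "cut in line" at the front
        }
        return head;
    }

    /** Renders the list as "a -> b -> c" for inspection. */
    static String show(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append(n.data);
        }
        return sb.toString();
    }
}
```

Building from {1, 2, 3} gives `1 -> 2 -> 3` with tail insertion and `3 -> 2 -> 1` with head insertion: head insertion reverses the input order. Keeping a tail reference also makes each tail insertion O(1), whereas the `SingleLinkedList.add` below walks to the end on every call.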

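/**
 * @author XingKe
 * @date 2021-07-31 10:39
 * Here we build the singly linked list with tail insertion
 */
public class SingleLinkedList {
  private Node head;
  /**
   * Number of nodes
   */
  private int size;

  private class Node{
      private int data;
      private Node next;
      public Node(int data, Node next) {
          this.data = data;
          this.next = next;
      }
      public Node(int data) {
          this(data,null);
      }

      @Override
      public String toString() {
          return "Node{" +
                  "data=" + data +
                  '}';
      }
  }

  public SingleLinkedList(int data) {
     head = new Node(data, null);
     size++;
  }

  /**
   * Append directly at the end (tail insertion)
   */
  public void add(int data){
      Node newNode = new Node(data);
      if (head == null){
          // The list became empty after removals: the new node is the head
          head = newNode;
      } else {
          // Walk to the current last node and hang the new node behind it
          Node lastNode = head;
          while (lastNode.next != null){
              lastNode = lastNode.next;
          }
          lastNode.next = newNode;
      }
      size++;
  }

  /**
   * Check whether a position is valid; several operations need this, so we extract it into a method
   * @param index
   */
  public void checkPosition(int index){
      if (index < 0 || index >= size){
          throw new IndexOutOfBoundsException("Index out of bounds: " + index);
      }
  }

  /**
   * Insertion after the node at position index: point the new node's next at that node's
   * old successor, then point that node's next at the new node.
   * The new element therefore ends up at position index + 1.
   * @param data
   * @param index
   */
  public void add(int data, int index){
      // node() also checks the position
      Node indexNode = node(index);
      Node insertNode = new Node(data, indexNode.next);
      indexNode.next = insertNode;
      size++;
  }

  /**
   * Search by value: returns the position of the first node holding data, or -1 if there is none
   * @param data
   * @return
   */
  public int indexOf(int data){
      Node indexNode = head;
      int index = 0;
      while (indexNode != null){
          if (indexNode.data == data){
              return index;
          }
          indexNode = indexNode.next;
          index++;
      }
      return -1;
  }

  /**
   * Search by position: returns the data stored at index
   * @param index
   * @return
   */
  public int get(int index){
      return node(index).data;
  }

  /**
   * Walk from the head node to the node at index
   */
  private Node node(int index){
      checkPosition(index);
      Node indexNode = head;
      for (int i = 0; i < index; i++) {
          indexNode = indexNode.next;
      }
      return indexNode;
  }

  /**
   * Removing the head: its successor becomes the new head.
   * Otherwise: link the node before the position directly to the node after it
   * (when removing the tail, "the node after it" is simply null).
   * @param index
   */
  public void remove(int index){
      checkPosition(index);
      if (index == 0){
          head = head.next;
      } else {
          // Get the node just before the one being removed, then skip over it
          Node prevNode = node(index - 1);
          prevNode.next = prevNode.next.next;
      }
      size--;
  }
}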

Next, let's analyze the complexity of search, deletion, and insertion, all of which depend on the position involved. Take search as an example: if the element we are looking for happens to be at the head node, only one comparison is needed.

Position i	0	1	2	...	n-1
Number of comparisons	1	2	3	...	n
Probability	1/n	1/n	1/n	...	1/n

Multiplying the number of comparisons at each position by its probability and summing gives the average number of comparisons: (n(n+1)/2) × (1/n) = (n+1)/2, so the time complexity of search is O(n).
Deletion and insertion spend most of their time on the same kind of search, so their time complexity is also O(n).
A circular linked list differs from a singly linked list in one respect: its last node points back to the head node instead of to nothing.
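A minimal sketch of a circular linked list, assuming a tail reference is kept so the ring can be re-closed on every append. `CircularList` and its methods are hypothetical names. Note that a traversal must stop when it returns to the head, since no `next` pointer in a non-empty ring is ever null.

```java
/**
 * A minimal circular singly linked list: the last node's next points back to the head.
 * CircularList and its methods are hypothetical names for illustration.
 */
class CircularList {
    private static final class Node {
        final int data;
        Node next;
        Node(int data) { this.data = data; }
    }

    private Node head;
    private Node tail;

    /** Appends at the end and re-closes the ring. */
    void add(int data) {
        Node node = new Node(data);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
        tail.next = head;   // the defining difference from a singly linked list
    }

    /** Visits `steps` nodes starting at the head, wrapping around past the tail. */
    String walk(int steps) {
        StringBuilder sb = new StringBuilder();
        Node n = head;
        for (int i = 0; i < steps && n != null; i++) {
            sb.append(n.data).append(' ');
            n = n.next;
        }
        return sb.toString().trim();
    }

    /** Goes around the ring exactly once: stop when we are back at the head, not at null. */
    int size() {
        if (head == null) {
            return 0;
        }
        int count = 1;
        for (Node n = head.next; n != head; n = n.next) {
            count++;
        }
        return count;
    }
}
```

After adding 1, 2, 3, `walk(5)` yields `1 2 3 1 2` (wrapping past the tail back to the head) and `size()` returns 3.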

Doubly linked list

In a sequential list we can always easily find an element's predecessor and successor by index, but in a singly linked list we can only find the successor. To operate on a node's direct predecessor, we must search again from the start node, because a singly linked node records only the position of its successor and carries no information about its predecessor. Can we add a field recording the predecessor to the node structure? Of course. A list designed this way is called a doubly linked list.

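/**
 * @author XingKe
 * @date 2021-07-31 12:36
 */
public class DoubleLinkedList {
    // Points to the first node
    private Node head;
    // Points to the last node
    private Node last;

    private int size;

    private class Node{
        private Node before;
        private int data;
        private Node next;
        public Node(int data, Node next, Node before) {
            this.data = data;
            this.next = next;
            this.before = before;
        }
        public Node(int data) {
            this(data, null, null);
        }

        @Override
        public String toString() {
            return "Node{" +
                    "data=" + data +
                    '}';
        }
    }

    public DoubleLinkedList(int data) {
        Node node = new Node(data);
        head = node;
        last = node;
        size = 1;
    }

    /**
     * Append at the end: thanks to last, no traversal is needed
     */
    public void add(int data){
        Node newNode = new Node(data, null, last);
        last.next = newNode;
        last = newNode;
        size++;
    }

    /**
     * Insert so that the new element ends up at position index (0 <= index <= size)
     */
    public void add(int data, int index){
        if (index == size){
            add(data);
            return;
        }
        // Locate the node currently at index (node() checks the position);
        // the new node goes in front of it
        Node indexNode = node(index);
        Node prevNode = indexNode.before;
        Node newNode = new Node(data, indexNode, prevNode);
        indexNode.before = newNode;
        if (prevNode == null){
            head = newNode;
        } else {
            prevNode.next = newNode;
        }
        size++;
    }

    public void checkPosition(int index){
        if (index < 0 || index >= size){
            throw new IndexOutOfBoundsException("Index out of bounds: " + index);
        }
    }
    // With predecessor links we no longer have to walk from head to tail; we can decide by index.
    // If index < size / 2 the node is near the front, so walk forward from head;
    // otherwise it is near the back, so walk backward from last.
    // Either way we visit at most about half of the nodes.
    private Node node(int index) {
        checkPosition(index);
        if (index < size / 2){
            Node indexNode = head;
            for (int i = 0; i < index; i++) {
                indexNode = indexNode.next;
            }
            return indexNode;
        }else{
            Node indexNode = last;
            for (int i = size - 1; i > index; i--) {
                indexNode = indexNode.before;
            }
            return indexNode;
        }
    }
}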
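The payoff of the predecessor link shows up in deletion: given a reference to a node, a doubly linked list can unlink it in O(1) by re-linking its two neighbours, whereas a singly linked list would first have to search from the head for the predecessor. The sketch below is self-contained and separate from the `DoubleLinkedList` class above; `DoublyUnlinkDemo` and its members are hypothetical names.

```java
/**
 * Given a reference to a node, a doubly linked list can unlink it in O(1)
 * without searching for its predecessor. DoublyUnlinkDemo is a hypothetical name.
 */
class DoublyUnlinkDemo {
    static final class Node {
        final int data;
        Node before, next;
        Node(int data) { this.data = data; }
    }

    Node head, last;
    int size;

    /** Appends at the end and returns the new node so callers can unlink it later. */
    Node addLast(int data) {
        Node node = new Node(data);
        node.before = last;
        if (last == null) {
            head = node;
        } else {
            last.next = node;
        }
        last = node;
        size++;
        return node;
    }

    /** Removes the given node by re-linking its two neighbours; no traversal needed. */
    void unlink(Node node) {
        if (node.before == null) {
            head = node.next;                 // removing the first node
        } else {
            node.before.next = node.next;
        }
        if (node.next == null) {
            last = node.before;               // removing the last node
        } else {
            node.next.before = node.before;
        }
        node.before = null;                   // detach the removed node from the list
        node.next = null;
        size--;
    }

    /** Reads the list front to back and back to front, e.g. "1 3 | 3 1", to check both directions. */
    String bothWays() {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.data).append(' ');
        }
        sb.append('|');
        for (Node n = last; n != null; n = n.before) {
            sb.append(' ').append(n.data);
        }
        return sb.toString();
    }
}
```

Adding 1, 2, 3 and then unlinking the node holding 2 leaves `1 3 | 3 1`: the list still reads correctly in both directions.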

Deletion and insertion also spend most of their time locating the element, so let's analyze the time complexity of search:

Position i	0	1	2	...	n-3	n-2	n-1
Number of steps	0	1	2	...	2	1	0
Probability	1/n	1/n	1/n	...	1/n	1/n	1/n

Compared with the singly linked list, a search in the doubly linked list covers at most half of the list even in the worst case; the closer the index is to the middle, the more steps the search takes.
The time complexity of the search is O(n). Some readers may ask: isn't that the same as a singly linked list? Note that big O describes an asymptotic upper bound and deliberately drops constant factors, so n/2 and n/4 are both O(n); in that sense the bound is not tight enough to show the difference. The difference lies in the constant: averaging the steps above gives roughly n/4 for the doubly linked list, versus roughly n/2 for a singly linked list that always starts from the head. So the doubly linked list searches about twice as fast, even though both have the same asymptotic complexity.
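The n/2 versus n/4 comparison can be checked by counting steps. This sketch assumes every position 0..n-1 is equally likely and counts link-following steps (not comparisons): a singly linked list always walks forward from the head, while the doubly linked list uses the same `index < size / 2` rule as `DoubleLinkedList.node()` above. `SearchSteps` is a hypothetical helper name.

```java
/**
 * Average number of link-following steps needed to reach a position,
 * assuming every position 0..n-1 is equally likely.
 * SearchSteps is a hypothetical helper name.
 */
class SearchSteps {

    /** Singly linked list: always walk forward from the head, i steps to reach position i. */
    static double singlyAverage(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return (double) total / n;
    }

    /** Doubly linked list: walk from the head if i < n / 2, otherwise backward from the last node. */
    static double doublyAverage(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (i < n / 2) ? i : (n - 1 - i);
        }
        return (double) total / n;
    }
}
```

For n = 1000, `singlyAverage` returns 499.5 and `doublyAverage` returns 249.5: about n/2 versus n/4 steps, even though both are O(n).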

When designing a data structure, we should be flexible and think about how to strengthen the original design. The doubly linked list in the books I referred to keeps only a head node, but following Java's LinkedList (the JDK's official doubly linked list), I added size and last here to make addressing easier. So when designing a data structure, you can also refer to the class library of the high-level language you use.
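For comparison, the JDK's own `java.util.LinkedList` can be exercised directly. In OpenJDK it keeps references to both its first and last nodes plus a size field, so adding at either end is O(1), and positional access walks from whichever end is nearer. The wrapper class `JdkLinkedListDemo` is a hypothetical name; the `LinkedList` calls are the standard API.

```java
import java.util.LinkedList;

/** A quick look at java.util.LinkedList. JdkLinkedListDemo is a hypothetical wrapper name. */
class JdkLinkedListDemo {
    static String demo() {
        LinkedList<Integer> list = new LinkedList<>();
        list.addLast(2);        // O(1): the list keeps a reference to its last node
        list.addLast(3);
        list.addFirst(1);       // O(1): it also keeps a reference to its first node
        list.add(1, 9);         // positional insert: 9 goes in at index 1
        return list + " first=" + list.getFirst() + " last=" + list.getLast() + " size=" + list.size();
    }
}
```

`demo()` returns `[1, 9, 2, 3] first=1 last=3 size=4`.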

Conclusion

This article is part of the "Data Structure and Algorithm Analysis" series, a reworking of data structures and algorithm analysis. The material originally spanned nearly a whole chapter; I folded it into one article and spent almost two weekends writing it. In the future I will try to work in smaller increments and not let so much content pile up.

北冥有只鱼
