数据流中的中位数

[TOC]

题目描述

如何得到一个数据流中的中位数？如果从数据流中读出奇数个数值，那么中位数就是所有数值排序之后位于中间的数值。如果从数据流中读出偶数个数值，那么中位数就是所有数值排序之后中间两个数的平均值。我们使用Insert()方法读取数据流，使用GetMedian()方法获取当前读取数据的中位数。

public class Solution {

    public void Insert(Integer num) {
    
    }

    public Double GetMedian() {
        
    }


}

思路一：使用partition函数找出无序数组的中位数

从数据流中读出数据，首先需要一个容器来保存，如果采用数组作为容器，并且保存的时候直接放在数组末尾，不排序，读完之后，再通过partition函数从一个无序数组中寻找第n/2大的数（即中位数），这里就可以借鉴“数组中出现次数超过一半的数字”那道题了

import java.util.ArrayList;
import static java.util.Collections.swap;
public class Solution {

    ArrayList<Integer> list = new ArrayList<>();
    public void Insert(Integer num) {
        list.add(num);
    }


    public int partition(ArrayList<Integer> list,int low,int high){

        int base = list.get(high);
        int i = low - 1;
        //注意这里for循环j的起始位置为low
        for(int j = low; j < high; j++){
            if(list.get(j) <= base){
                i++;
                swap(list,i,j);
            }
        }
        swap(list,i+1,high);
        return i+1;
    }

    public Double GetMedian() {
        double result = 0;
        int index = partition(list, 0, list.size()-1);
        int middle = 0;
        if((list.size() % 2) == 0) {
            middle = list.size() >> 1;
        }else {
            middle = (int) Math.floor(list.size() >> 1);
        }
        while (index != middle){
            if(index > middle){
                index = partition(list, 0, index - 1);
            }else {
                index = partition(list, index + 1, list.size()-1);
            }
        }
        if(list.size() % 2 == 0 && index != 0 ){
            result = (list.get(index) + list.get(index - 1)) / 2.0;
        }else {
            result = list.get(index);
        }
        return result;

    }


}

思路二：有序的插入到数组

在向数组中插入新的数据时让数组保持有序（可以采用插入排序）每插入一个数，可能需要移动o(n)个数，对于插入n个数，时间复杂度为o(n^2)

然后在排好序的数组中找出中位数，时间复杂度为o(1)

思路三：有序的插入到链表

用链表作为保存数据流中的数据的容器，同样的，每插入一个数，需要o(n)的时间复杂度才能找到合适的插入位置，使用链表和使用数组的时间复杂度一样

思路四：使用平衡二叉树AVL树保存数据

插入到AVL树中的时间复杂度为o(logn)，获得中位数的时间复杂度为o(1)

思路五：使用最大堆和最小堆分别求出两部分的最大值和最小值

具体思路

在堆中插入一个数据的时间效率为o(logn)，得到堆顶的数据的时间复杂度为o(1)

图片描述

数据平均分配到两个堆中，两个堆中的数目只差不超过1
保证最大堆的数据都小于最小堆的数据

依次将数据流中的数据依次放入最大堆和最小堆，当将要放入最小堆的数据小于最大堆的某些数，那么先将这个数插入最大堆中，然后取出最大堆中的最大值放入最小堆中。同理，如果将要放入最大堆的数据大于最小堆的某些数，那么就先将这个数放入最小堆中，然后取出最小堆的最小值放入最大堆

图片描述

回顾堆的性质

在使用堆之前我们先来回顾一下堆的性质

1、什么是堆：堆其实就是一颗完全二叉树，除了最底层，其他层的节点都被元素填满，并且最底层尽可能从左到右填入

2、堆有哪些性质：堆一般用一维数组实现，并且该一维数组是堆的层次遍历对应的序列

1）堆的子节点和父节点之间的位置关系：对于节点i,它的的左右子节点在一维数组中的索引为：2i+1和2i+2

它的父节点的索引为floor((i-1)/2)

2)堆的深度：同二叉树的深度一样为θ（logn）

3)堆的叶节点的下标分别为：floor(n/2)，floor(n/2)+1,floor(n/2)+2,floor(n/2)+3,floor(n/2)+4...... n-1

接下来就是回顾堆排序了，堆排序是在简单选择排序的基础上演化而来的

回顾简单选择排序

所以我们先回顾一下简单选择排序

简单选择排序的思想：每次从剩下的数中挑选一个最小的数放在已经排好序的末尾，这个最小的数需要找到它的下标，然后和arr[i]进行交换

图片描述

简单选择排序的实现

 public void simpleSort(int[] arr){
        for(int i = 0; i < arr.length - 1; i++){
            int minIndex = i;
            for(int j = i+1; j < arr.length; j++){
                if(arr[j] < arr[minIndex]){
                    minIndex = j;
                }
            }
            swap(arr, i, minIndex);
        }
    }
    public void swap(int[] arr,int i,int j){
        int temp = 0;
        temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

从代码中我们可以看出，在每一轮比较中（从第一个for循环看出一共需要比较arr.length-1轮）通过两两比较寻找剩下的序列中最小的数的下标，然后将arr[i]与arr[minIndex]交换，然后下一轮又从i+1开始继续寻找剩下的序列中最小数的下标，这个过程中有的数实际上互相比较了很多次，所以这是一个非常损耗性能的事情

堆排序如何实现

假如给了我们一个数组arr={4,8,1,0,3,5},我们如何通过堆排序来对该数组进行排序呢（由大到小）

步骤如下：

1）将该数组构建成一个最大堆

2）取出最大堆的根节点，维护新堆的最大堆性质，构建一个新的最大堆（堆的大小减一）

3）重复步骤2，直到取完堆中的所有节点

图片描述

最后总结一下堆排序的伪代码

图片描述

大顶堆排序的具体实现(内部由大到小)：最后将数组顺序排列

import org.omg.CORBA.RepositoryIdHelper;

import java.util.Arrays;

public class HeapSort {
    public static void main(String[] args){

        HeapSort heapSort = new HeapSort();
        int[] arr = {5, 2, 7, 1, 9, 3};
        //[1, 2, 3, 5, 7, 9]
        heapSort.heapSort(arr);
        System.out.println(Arrays.toString(arr));
    }


    private int heapSize = 0;

    public void heapSort(int[] arr){
        //构建一个最大堆
        buildMaxHeap(arr);
        for(int i = arr.length-1; i > 0; i--){
            swap(arr,i,0);
            heapSize--;
            maxHeapify(arr, 0);
        }
    }
    public void buildMaxHeap(int[] arr){
        heapSize = arr.length;
        //检查除了叶子节点以外的其他节点是否满足最大堆的性质
        for(int i = (arr.length >> 1) - 1; i >= 0; i--){
            maxHeapify(arr, i);
        }

    }
    public void maxHeapify(int[] arr,int i){
        int leftIndex = 2 * i + 1;
        int rightIndex = 2 * i + 2;
        int largeIndex = i;
        if(leftIndex <heapSize && arr[leftIndex] > arr[i]){
            largeIndex = leftIndex;
        }
        if(rightIndex < heapSize && arr[rightIndex] > arr[largeIndex]){
            largeIndex = rightIndex;
        }
        if(largeIndex != i){
            swap(arr, largeIndex, i);
            maxHeapify(arr,largeIndex);
        }

    }

    public void swap(int[] arr,int i,int j){
        int temp = 0;
        temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }
}

小顶堆的实现（*内部实现由小到大*）：最后将数组降序排列

import sun.net.www.HeaderParser;

import java.util.Arrays;

public class SmallHeap {

    public static void main(String[] args){

        SmallHeap smallHeap = new SmallHeap();
        int[] arr = {5, 2, 7, 1, 9, 3};
        ///[9, 7, 5, 3, 2, 1]
        smallHeap.smallHeapSort(arr);
        System.out.println(Arrays.toString(arr));
    }


    int heapSize = 0;
    public void smallHeapSort(int[] arr){
        buildSmallHeap(arr);
        for(int i = arr.length - 1; i > 0; i--){
            swap(arr, i, 0);
            heapSize--;
            minHeapify(arr, 0);
        }
    }


    public void swap(int[] arr,int i,int j){
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    public void buildSmallHeap(int[] arr){
        heapSize = arr.length;
        for(int i =( arr.length >> 1) - 1; i >= 0 ; i--){
            minHeapify(arr, i);

        }
    }

    public void minHeapify(int[] arr,int i){
        int leftIndex = 2 * i + 1;
        int rightIndex = 2 * i + 2;
        int minIndex = i;
        if (leftIndex < heapSize && arr[leftIndex] < arr[i]) {
            minIndex = leftIndex;
        }
        if (rightIndex < heapSize && arr[rightIndex] < arr[minIndex]) {
            minIndex = rightIndex;
        }
        if (minIndex != i) {
            swap(arr,i,minIndex);
            minHeapify(arr,minIndex);
        }
    }
}

堆的应用——优先队列

最大优先队列（最大堆）

1）insert(S,x):把x插入集合S中

2）maximum(S):返回S的最大值

3）extractMax(S)：remove 并且返回S的最大值

4）increaseKey(S,i,k):将S[i]增加到K,

应用

作业调度，选出最高优先级的作业执行，随时insert一个新的作业到队列

最小优先队列（最小堆）

1）insert(S,x):把x插入集合S中

2）minimum(S):返回S的最小值

3）extractMin(S)：remove 并且返回S的最小值

4）decreaseKey(S,x,k):将S[i]减小到K,

最大优先队列和最小优先队列的实现

图片描述

本题的实现：通过最大堆和最小堆分别构建最大优先队列和最小优先队列

import java.util.ArrayList;


import static java.util.Collections.swap;
public class Solution {

    class MaxHeap{
        ArrayList<Integer> maxHeapList = new ArrayList<Integer>();
        private int maxHeapSize ;

        public MaxHeap() {
            this.maxHeapSize = maxHeapList.size();
        }
    }
    class MinHeap{
        ArrayList<Integer> minHeapList = new ArrayList<Integer>();

        private int minHeapSize ;

        public MinHeap() {
            this.minHeapSize = minHeapList.size();
        }
    }




    private int count = 0;

    public void Insert(Integer num,MaxHeap maxHeap,MinHeap minHeap){
        if(count == 0){
            maxHeap.maxHeapList.add(0,num);
            maxHeap.maxHeapSize++;
        }else {
            if(count % 2 == 0){
                if(num > minHeap.minHeapList.get(0)){
                    int temp = extractiMin(minHeap);
                    insertMinHeap(minHeap, num);
                    inserMaxHeap(maxHeap, temp);
                }else {
                    inserMaxHeap(maxHeap,num);
                }
            }else {

                if(num < maxHeap.maxHeapList.get(0)){
                    int temp = extractMax(maxHeap);
                    inserMaxHeap(maxHeap,num);
                    insertMinHeap(minHeap,temp);
                }else {
                    insertMinHeap(minHeap,num);

                }
            }


        }

        count++;
    }

    MaxHeap maxHeap = new MaxHeap();
    MinHeap minHeap = new MinHeap();

    public void Insert(Integer num) {
        Insert(num,maxHeap,minHeap);
    }

    public int extractiMin(MinHeap minHeap){

        if(minHeap.minHeapSize < 0) return 0;
        int min = minHeap.minHeapList.get(0);
        minHeap.minHeapList.remove(0);
        //注意处理heapsize的大小
        minHeap.minHeapSize--;

        minHeapify(minHeap, 0);
        return min;
    }

    //这里维护最小堆的性质和最大堆不同，比较的是和父节点的大小，因为这里不断的插入新的节点到末尾，应该是和前面的父节点比较
    public void minHeapify(MinHeap minHeap,int i){
        int minIndex = i;
        int parentIndex = (int) Math.floor((i - 1) / 2.0);
        if( parentIndex >= 0 && minHeap.minHeapList.get(i) < minHeap.minHeapList.get(parentIndex) ){
            minIndex = parentIndex;
            swap(minHeap.minHeapList, i, minIndex);
            minHeapify(minHeap,minIndex);
        }

    }

    public void insertMinHeap(MinHeap minHeap,int num){
        minHeap.minHeapSize++;
        minHeap.minHeapList.add(minHeap.minHeapSize - 1,Integer.MAX_VALUE);
        decreaseKey(minHeap.minHeapList, minHeap.minHeapSize - 1, num);
    }

    public void decreaseKey(ArrayList<Integer> list,int i,int key){
        if(list.get(i) < key)return;
        list.remove(i);
        list.add(i, key);
        minHeapify(minHeap,i);
    }
    public void inserMaxHeap(MaxHeap maxHeap,int num){

        maxHeap.maxHeapSize++;
        if(maxHeap.maxHeapSize > 0) {
            maxHeap.maxHeapList.add(maxHeap.maxHeapSize - 1, Integer.MIN_VALUE);
        }else {
            maxHeap.maxHeapList.add(0,Integer.MAX_VALUE);
        }
        increaseKey(maxHeap, maxHeap.maxHeapSize - 1, num);
    }

    public void increaseKey(MaxHeap maxHeap,int i,int key){
        if(maxHeap.maxHeapList.get(i) > key)return;
        maxHeap.maxHeapList.remove(i);
        maxHeap.maxHeapList.add(i, key);
        while (i > 0 && maxHeap.maxHeapList.get((int) Math.floor((i-1)/2.0)) < maxHeap.maxHeapList.get(i)){
            swap(maxHeap.maxHeapList,i,(int) Math.floor((i-1)/2.0));
            i = (int) Math.floor((i - 1) / 2.0);
        }
    }

    public int extractMax(MaxHeap maxHeap){

        if(maxHeap.maxHeapSize < 0)return 0;
        int max = maxHeap.maxHeapList.get(0);
        maxHeap.maxHeapList.remove(0);
        maxHeap.maxHeapSize--;
        maxHeapify(maxHeap, 0);
        return max;
    }

    public void maxHeapify(MaxHeap maxHeap,int i){
        int leftIndex = 2 * i +1;
        int rightIndex = 2 * i +2;
        int largeIndex = i;
        if(leftIndex < maxHeap.maxHeapSize && maxHeap.maxHeapList.get(leftIndex) > maxHeap.maxHeapList.get(i)){
            largeIndex = leftIndex;
        }
        if(rightIndex < maxHeap.maxHeapSize && maxHeap.maxHeapList.get(rightIndex) > maxHeap.maxHeapList.get(largeIndex)){
            largeIndex = rightIndex;
        }
        if(largeIndex != i){
            swap(maxHeap.maxHeapList,i,largeIndex);
            maxHeapify(maxHeap,largeIndex);
        }
    }

    public Double GetMedian() {
        if(count % 2 == 0){
            return (maxHeap.maxHeapList.get(0) + minHeap.minHeapList.get(0)) / 2.0;
        }else {
            return (double)maxHeap.maxHeapList.get(0);
        }

    }



}

该题的实现：基于现有的类库：PriorityQueue实现最大堆和最小堆

import java.util.Comparator;
import java.util.PriorityQueue;
public class Solution {

    private int count = 0;
    //默认升序排列，也就是默认为最小堆
    private PriorityQueue<Integer> minHeap = new PriorityQueue<>();
    //通过重写Comparator接口的compare方法，构建一个逆序排列的最大堆
    private PriorityQueue<Integer> maxHeap = new PriorityQueue<Integer>( new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            return o2 - o1;
        }
    });

    public void Insert(Integer num) {
        if (count %2 == 0) {
            //最大堆插入一个元素之后，内部会自动重新构建一个最大堆maxheapify
            maxHeap.offer(num);
            //remove并返回第一个元素
            int temp = maxHeap.poll();
            //插入到最小堆，内部同样进行minheapify
            minHeap.offer(temp);
        } else {
            minHeap.offer(num);
            int temp = minHeap.poll();
            maxHeap.offer(temp);
        }
        count++;
    }

    public Double GetMedian() {
        if (count %2 == 0) {
            return new Double((minHeap.peek() + maxHeap.peek())) / 2;
        } else {
            return new Double(minHeap.peek());
        }
    }

}

参考

《剑指offer纪念版》

《算法导论第三版》

数据流中的中位数

题目描述

思路一：使用partition函数找出无序数组的中位数

思路二：有序的插入到数组

思路三：有序的插入到链表

思路四：使用平衡二叉树AVL树保存数据

思路五：使用最大堆和最小堆分别求出两部分的最大值和最小值

具体思路

回顾堆的性质

回顾简单选择排序

堆排序如何实现

堆的应用——优先队列

最大优先队列（最大堆）

最小优先队列（最小堆）

本题的实现：通过最大堆和最小堆分别构建最大优先队列和最小优先队列

该题的实现：基于现有的类库：PriorityQueue实现最大堆和最小堆

参考

flyingcr

引用和评论

动态规划