ARTS打卡活动——第三周

1.Algorithm 做一个leetcode的算法题

146. LRU Cache
Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and put.

get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

The cache is initialized with a positive capacity.

Follow up:
Could you do both operations in O(1) time complexity?

Example:

LRUCache cache = new LRUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

解答：这道题并不难，主要是考察数据结构的使用。因为LRU的规则，需要频繁从线性结构中，移动某个元素到头部，所以只能选择链表结构；又因为需要在常数时间内查询key是否在缓存中，单单链表是无法满足，所以需要加上哈希表来快速查询。也就是说，这道题需要的数据结构包括：双向链表+哈希表。
我的第一个版本采用的是STL的list和unordered_map，代码如下：

class LRUCache {
public:
    LRUCache(int capacity) {
        m_cache.clear();
        m_hash.clear();
        m_capacity = capacity;
        m_size = 0;
    }
    
    int get(int key) {
        return _get(key, -1);
    }
    
    void put(int key, int value) {
        int ret = _get(key, value);
        if (ret > 0) {
            //exist and refresh, do nothing
            return;
        }
        else {
            //push_front
            m_cache.push_front(pair<int, int>(key, value));
            m_hash.insert(pair<int, list<pair<int, int>>::iterator>(key, m_cache.begin()));
            if (m_size >= m_capacity) {
                //pop the old value
                list<pair<int, int>>::iterator last = --m_cache.end();
                m_hash.erase((*last).first);
                m_cache.pop_back();
            }
            else {
                m_size++;
            }
        }
    }
private:
    int m_capacity;
    int m_size;
    map<int, list<pair<int, int>>::iterator> m_hash;
    list<pair<int, int>> m_cache;

    int _get(int key, int newVal) {
        map<int, list<pair<int, int>>::iterator>::iterator iter;
        iter = m_hash.find(key);
        if (iter == m_hash.end()) {
            return -1;
        }
        else {
            list<pair<int, int>>::iterator pos = iter->second;
            int val = (*pos).second;
            if (newVal > 0) {
                val = newVal;
            }
            m_cache.erase(pos);
            m_cache.push_front(pair<int, int>(key, val));
            m_hash.erase(key);
            m_hash.insert(pair<int, list<pair<int, int>>::iterator>(key, m_cache.begin()));
            return val;
        }
    }
};

采用STL的list，虽然免去了双向链表的具体实现，但是使用起来很不顺畅，原因在于：

在缓存满的时候加入新元素，需要剔除最后一个元素，所以链表的元素需要记录key和对应的val（val用于get的返回，key主要是用于移除哈希表对应的数据）
哈希表的val是链表的iterator，定义起来很繁琐（当然可以用typedef来简化）
用STL的list实现，内部实现增添了不少复杂性，执行时间会比纯粹的双向链表要高

因此，最后我还是自己封装了链表的实现，具体代码如下：

struct Node {
    int key;
    int val;
    Node *prev;
    Node *next;
};

class LRUCache {
public:
    LRUCache(int capacity) {
        m_capacity = capacity;
        m_size = 0;
        m_head = new Node{0, 0};
        m_tail = new Node{0, 0};
        //m_head和m_tail为哨兵
        m_head->next = m_tail;
        m_tail->prev = m_head;
    }
    
    int get(int key) {
        return _touch(key, -1);
    }
    
    void put(int key, int value) {
        if (_touch(key, value) < 0) {
            Node *p = new Node{key, value};
            _push_front(p);
            m_map[key] = p;
            if (m_size < m_capacity) {
                m_size++;
            }
            else {
                m_map.erase(_pop_back());
            }           
        }
    }
private:
    unordered_map<int, Node *> m_map;
    Node *m_head, *m_tail;
    int m_capacity;
    int m_size;

    int _touch(int key, int val) {
        unordered_map<int, Node *>::iterator iter = m_map.find(key);
        if (iter == m_map.end()) {
            return -1;
        }
        Node *p = iter->second;
        if (val > 0)
            p->val = val;
        _move_to_head(p);
        return p->val;
    }

    void _move_to_head(Node *p) {
        p->prev->next = p->next;
        p->next->prev = p->prev;
        _push_front(p);
    }

    int _pop_back() {
        Node *last = m_tail->prev;
        int key = last->key;
        last->prev->next = m_tail;
        m_tail->prev = last->prev;
        last = NULL;
        return key;
    }

    void _push_front(Node *p) {
        p->prev = m_head;
        p->next = m_head->next;
        m_head->next->prev = p;
        m_head->next = p;
    }
};

在链表的实现中，我添加了两个哨兵（m_head, m_tail头尾节点指针），在添加节点和移除节点时，不需要作判空处理，省了不少代码。最终这种实现，无论耗时还是空间使用，都比第一种要优。

2.Review 阅读并点评至少一篇英文技术文章

这周阅读的文章是Prateek Gogia写的Redis: What and Why?，通篇讲的是redis是什么，以及相比传统的数据库有哪些优势。
首先，redis是一种内存数据库，支持字符串、哈希表、有序集、集合等数据结构，因为数据存储在内存，所以处理速度比传统数据库要快很多；
其次，为什么要使用redis，原因有以下几点：

redis是用C语言写的，因此处理速度很快；
redis是NoSql；
目前很多技术巨头都使用redis，比如GitHub,Weibo, Pinterest, Snapchat, Craigslist, Digg, StackOverflow, Flickr；
采用redis缓存，可以省去直接访问云端数据库的成本；
redis对开发者很友好，支持很多语言，比如（JavaScript, Java, Go, C, C++, C#, Python, Objective-C, PHP以及其他热门语言）；
redis是开源的，而且到目前为止很稳定。

3.Tip 学习至少一个技术技巧

哨兵的使用：在写算法逻辑时候，常常因为场景的限制，需要对一些边界条件作特殊的判断，比如指针的判空、上下左右边界的界限、数组最后元素的判定等等，这个时候引用哨兵这个概念，可以避免这种边界的判断，使用一点空间来简化代码逻辑。以下列几种情况来说明哨兵是如何使用的：

二维矩阵的哨兵，很多场景需要用到二维矩阵，比如迷宫，假设迷宫的大小为n*m，那么走迷宫的时候，需要判断上下左右的边界，比如x坐标的边界判断是x > 0 && x < n，y坐标类似，每一次处理都需要做4次判断，很繁琐。针对这个情况，我一般会在二维矩阵的周边添加哨兵，也就是我会申请(n+2)*(m+2)的矩阵，对于坐标(x,y) (x == 0 || x == n + 1 || y == 0 || y == m + 1)，设置对应的值为障碍，是无法通过的，这样逻辑处理的简单很多了，直接遍历1 <= x <= n 和 1 <= y <= m就可以了；
操作双向链表的头尾指针，往往都需要先判断是否为空，如果我们初始化链表的时候，就给链表分配头尾两个节点（这两个节点在链表的生命周期中都不会变动），这样以后操作链表的节点时，就不需要作判空操作了；
在无序的数组中查找元素，一般都是从头到尾遍历一次，找到了跳出循环，每次循环需要执行两次比较操作，一个是判断下标i是否小于n，一个是判断对应的数据是否等于要找的key。设置哨兵的做法就是，让数组最后一个元素为要找的key，这样循环过程中就可以减少对下标i是否小于n的判断，因为最终一定会找到。

4.Share 分享一篇有观点和思考的技术文章

这周分享的文章是使用 TensorFlow 做文本情感分析，里面详细的描述如何用TensorFlow来分析文本的情感，有步骤有数据，挺好的一篇文章。

ARTS打卡活动——第三周