基于跳表结构的KV键值对简易存储引擎

本文最后更新于380 天前，其中的信息可能已经过时，如有错误请发送邮件到yuan140457@gmail.com

基于跳表实现的轻量键值对数据库。在随机读写情况下，该引擎每秒在多个并发线程下的累计执行成功次数都可以保持在30000左右。

1. 技术栈

基于跳表结构使用面向对象与泛型编程思想，对节点类与跳表类进行抽象与封装
使用互斥锁，防止并发插入数据时发生冲突
使用MakeFile的多文件编译方式

2. 对外接口

insertElement （插入数据）
deleteElement （删除数据）
searchElement （查询数据）
showSkipList （展示已存数据）
dumpFile （数据落盘）
loadFile （加载数据）
getSize （返回数据规模）

3. 跳表的原理

跳表是在一个原始链表中添加了多级索引，通过每一级索引来进行二分搜索的数据结构，其架构如下：

在上述跳表中，假如查询key=10的记录，则可以从第二级索引开始快速定位：

遍历第二级索引，从1开始，发现7<10<13，7就是该层要找的索引，通过它跳到下一级索引
遍历第一级索引，从7开始，发现9<10<13，9就是该层要找的索引，通过它跳到下一级索引
遍历原始链表，从9开始，发现10=10，10就是该层要找的最终索引

相比于直接遍历原始链表，多级索引的存在使跳表查询效率更快，总结：

跳表的优点：可以实现高效的插入、删除和查询，时间复杂度为O(logn).

跳表的缺点：需要存储多级索引，增加了额外的存储空间

跳表的用途：经典的以空间换时间的数据结构，常用于非关系数据库的存储引擎中

4. 跳表节点的实现

对于每个节点来说都需要保存它下一个节点的指针，所以我们的节点类可以定义如下：

/* 元素节点的类模板 */
template <class K, class V>
class Node {

public:
    /*默认构造 构造 与 析构函数*/
    Node() {}
    Node(K key, V value, int level);
    ~Node();

    /*获取键值对 K V */
    K getKey() const;
    V getValue() const;

    /*设置值 V */
    void setValue(const V &value);
public:
    Node<K,V> **forward;

    int node_level;
private:
    K key;
    V value;
};

5. 跳表的插入与删除

5.1 插入操作

跳表的插入负责构建整个跳表，是跳表中最难的需求。跳表的插入原理如下：

在上图中，要插入一个节点6，则要在多个层都插入这个节点6，这个多个层要可能随机，保证每一层都比上一层的节点少一半。

template <typename K, typename V>
bool SkipList<K, V>::insertElement(K key, V value) {
    // 上锁
    mtx.lock();

    // 
    Node<K,V> *current = this->head;

    // 定义一个 update 数组用来存放 current->forward[i]操作之后的元素
    Node<K, V> *update[this->maxLevel + 1];
    memset(update, 0, sizeof(Node<K,V> *) * this->maxLevel + 1);

    // 跳表由高到低进行遍历找到需要插入的位置
    for (int i = this->currentLevel; i >= 0; i--) {
        // 当当前层级的下一个节点不为空且该key小于要插入的key就往右遍历
        while (current->forward[i] != nullptr && current->forward[i]->getKey() < key) {
            current = current->forward[i];
        }
        // 否则就将该节点放入需要进行更新的数组 update 中
        update[i] = current;
    }

    // 当到达 level 0 层级时，将指针指向右节点，也就是希望插入键的节点
    current = current->forward[0];

    // 如果当前节点的键等于搜索到的键，那么证明已经存在该节点了
    if (current != nullptr && current->getKey() == key) {
        current->setValue(value);
        mtx.unlock();
        return true;
    }

    // 如果 current 为 NULL，则表示我们已到最底层
    // 如果 current 的键不等于 key，这意味着我们必须在 update[0] 和当前节点之间插入节点
    if (current == nullptr || current->getKey() != key) {
        // 先为新节点分配随机的 level/层数
        int randomLevel = getRandomLevel();

        // 如果随机层数大于跳表当前最大的层数，那么就用头节点来初始化update数组
        /*  
            假设当前跳表的最高层级 currentLevel 为 2，而生成的随机层级 randomLevel 为 4。
            在这种情况下，需要扩展 update 数组和更新 currentLevel。
              进入 for 循环，i 的初始值为 currentLevel+1，即 3。
              在第一次迭代中，i 的值为 3，执行 update[3] = heade，即将 update[3] 赋值为头点。
              在第二次迭代中，i 的值为 4，执行 update[4] = heade，即将 update[4] 赋值为头点。
            for 循环结束后，update 数组扩展为 [空, 空, 空, 头节点, 头节点]。
            然后，将 currentLevel 更新为 randomLevel，即将 currentLevel 更新为 4。
        */
        if (randomLevel > this->currentLevel) {
            for (int i = this->currentLevel + 1; i <= randomLevel + 1; i++) {
                update[i] = this->head;
            }
            this->currentLevel = randomLevel;
        }

        // 为新节点分配空间
        Node<K,V>* newNode = creatNode(key,value,randomLevel);

        // 插入该节点        
        for (int i = 0; i <= randomLevel; i++) {
            newNode->forward[i] = update[i]->forward[i];
            update[i]->forward[i] = newNode;
        }
        std::cout << "Successfully inserted key:" << key << ", value:" << value << std::endl;
        this->elementCount++;
    }
    mtx.unlock();
    return true;
}

5.2 删除操作

template <typename K, typename V>
bool SkipList<K, V>::removeElement(K key) {
    mtx.lock();

    Node<K,V> *current = this->head;
    Node<K,V> *update[this->currentLevel + 1];
    memset(update, 0, sizeof(Node<K,V> *) * (this->currentLevel + 1));

    // 从高到低进行查找
    for (int i = this->currentLevel; i >= 0; i--) {
        while (current->forward[i] != nullptr && current->forward[i]->getKey() < key) {
            current = current->forward[i];
        }
        update[i] = current;
    }

    current = current->forward[0];
    if (current != nullptr && current->getKey() == key) {
        // 从 level 0 开始，在每一层删除当前的节点
        for (int i = 0; i < this->currentLevel; i++) {
            // 如果下一个节点不是目标节点，则中断循环。
            if (update[i]->forward[i] != current) {
                break;
            }
            // 否则就让需要删除的节点的前一个指向后一个
            update[i]->forward[i] = current->forward[i];
        }

        // 删除没有元素的层级
        while (this->currentLevel > 0 && current->forward[this->currentLevel] == 0) {
            this->currentLevel--;
        }

        std::cout << "Successfully deleted key "<< key << std::endl;
        delete current;
        this->elementCount--;
    }
    mtx.unlock();
    return true;
}

6. 小结

跳表使用空间换时间的设计思路，通过构建多级索引来提高查询的效率，实现了基于链表的“二分查找”。跳表是一种动态数据结构，支持快速地插入、删除、查找操作，时间复杂度都是 O(logn)。

跳表的空间复杂度是 O(n)。不过，跳表的实现非常灵活，可以通过改变索引构建策略，有效平衡执行效率和内存消耗。

项目地址：

https://github.com/XiaChuerwu/SkipList