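How can I quickly diff two arrays of data?

Requirement: a JSON file of 1 GB+ stores a single list. The file changes over time and is always rewritten in full. Once a change to the file is detected, I need to know which records were added, removed, or modified between the old and new versions.
The list data has the following format (the id field is unique; the name field may change):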

[{
    "id": "id1",
    "name": "name1"
}, {
    "id": "id2",
    "name": "name2"
}]

The rough idea is as follows:

let oldList = [
  {"id": "id1", "name": "name1"}, 
  {"id": "id2", "name": "name2"}
];

let newList = [
  {"id": "id1", "name": "name4"}, 
  {"id": "id3", "name": "name3"}
];

// getDiffData is the algorithm this approach needs to implement
let {addList, removeList, changeList} = getDiffData(oldList, newList, 'id');

console.log(addList);
// [{"id": "id3", "name": "name3"}]

console.log(removeList);
// [{"id": "id2", "name": "name2"}]

console.log(changeList);
// [{"id": "id1", "name": "name4"}] or [{"id": "id1", "name": "name1"}]

Problem: the files being compared are large (several million records), so the comparison should take as little time as possible.

2 answers
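function getDiffData(oldList, newList, key = 'id') {
  // Index the old list by key. A prototype-less object avoids false hits
  // on inherited names such as "constructor".
  const oldData = oldList.reduce((data, item) => {
    data[item[key]] = item;
    return data;
  }, Object.create(null));
  const addList = [];
  const changeList = [];

  for (const item of newList) {
    if (!oldData[item[key]]) {
      // Not present in the old list: added.
      addList.push(item);
    } else {
      // Present in both lists: changed only if name differs.
      if (oldData[item[key]].name !== item.name) {
        changeList.push(item);
      }
      // Mark as seen so it is not reported as removed.
      oldData[item[key]] = true;
    }
  }

  // Old items that were never marked as seen are missing from the new list.
  const removeList = oldList.filter(item => oldData[item[key]] !== true);

  return {
    addList,
    changeList,
    removeList,
  };
}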

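Since there is a unique field, you can convert one of the arrays into an object keyed by that field and compare against it. Each lookup then takes roughly constant time instead of a scan of the whole array, which improves efficiency considerably.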
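The same idea can also be written with a Map, which keeps keys in their original type and lets the removed items fall out as whatever is left in the index. This is only a sketch under assumptions: the names `diffByKey` and `isEqual` are hypothetical and not part of the answer above, ids are assumed to be unique within each list (as stated in the question), and the default comparison looks at `name` only.

```javascript
// Hypothetical helper: diffByKey and isEqual are illustrative names only.
function diffByKey(oldList, newList, key = 'id', isEqual = (a, b) => a.name === b.name) {
  // Index the old list once so each lookup is O(1) on average.
  const oldByKey = new Map();
  for (const item of oldList) {
    oldByKey.set(item[key], item);
  }

  const addList = [];
  const changeList = [];
  for (const item of newList) {
    const oldItem = oldByKey.get(item[key]);
    if (oldItem === undefined) {
      // Key not found in the old list: the item was added.
      addList.push(item);
    } else {
      if (!isEqual(oldItem, item)) {
        changeList.push(item);
      }
      // Drop matched keys; whatever remains afterwards was removed.
      // Assumes keys are unique within newList.
      oldByKey.delete(item[key]);
    }
  }

  const removeList = Array.from(oldByKey.values());
  return { addList, removeList, changeList };
}

// Sample data taken from the question.
const sampleOld = [
  { id: 'id1', name: 'name1' },
  { id: 'id2', name: 'name2' },
];
const sampleNew = [
  { id: 'id1', name: 'name4' },
  { id: 'id3', name: 'name3' },
];

const sampleDiff = diffByKey(sampleOld, sampleNew, 'id');
console.log(sampleDiff.addList);    // [ { id: 'id3', name: 'name3' } ]
console.log(sampleDiff.removeList); // [ { id: 'id2', name: 'name2' } ]
console.log(sampleDiff.changeList); // [ { id: 'id1', name: 'name4' } ]
```

Passing a different `isEqual` callback would let the caller decide which fields count as a change, if records carry more than `name`.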

const oldList = Array.from({length: 300000}, (v, id) => ({
  id,
  name: (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1),
}));
const newList = Array.from({length: 350000}, (v, id) => ({
  id,
  name: (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1),
}));

console.time('getDiffData');
const {addList, changeList, removeList} = getDiffData(oldList, newList);
console.timeEnd('getDiffData');
console.log(`addList: ${addList.length}`, `changeList: ${changeList.length}`, `removeList: ${removeList.length}`);

I tested it by comparing two separately generated arrays of 300,000 and 350,000 items; it took only about 40 ms.
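To see where the speedup comes from, the indexed approach can be timed against a naive version that rescans the other array for every item. The sketch below is an illustration rather than a benchmark: `naiveDiff` and `indexedDiff` are hypothetical names, the inputs are kept small so the naive version finishes quickly, and the timings will depend on the machine.

```javascript
// Naive baseline (hypothetical name): scans the other list for every item, O(n * m).
function naiveDiff(oldList, newList) {
  const addList = newList.filter(n => !oldList.some(o => o.id === n.id));
  const removeList = oldList.filter(o => !newList.some(n => n.id === o.id));
  const changeList = newList.filter(n => {
    const o = oldList.find(x => x.id === n.id);
    return o !== undefined && o.name !== n.name;
  });
  return { addList, removeList, changeList };
}

// Indexed version (hypothetical name): build lookup tables once, then compare in one pass.
function indexedDiff(oldList, newList) {
  const oldById = new Map(oldList.map(item => [item.id, item]));
  const newIds = new Set(newList.map(item => item.id));
  const addList = [];
  const changeList = [];
  for (const item of newList) {
    const oldItem = oldById.get(item.id);
    if (oldItem === undefined) {
      addList.push(item);
    } else if (oldItem.name !== item.name) {
      changeList.push(item);
    }
  }
  const removeList = oldList.filter(item => !newIds.has(item.id));
  return { addList, removeList, changeList };
}

// Deterministic test data: old ids 0..1999, new ids 500..2999.
const smallOld = Array.from({ length: 2000 }, (v, i) => ({ id: i, name: `n${i % 7}` }));
const smallNew = Array.from({ length: 2500 }, (v, i) => ({ id: i + 500, name: `n${(i + 500) % 5}` }));

console.time('naiveDiff');
const naiveResult = naiveDiff(smallOld, smallNew);
console.timeEnd('naiveDiff');

console.time('indexedDiff');
const indexedResult = indexedDiff(smallOld, smallNew);
console.timeEnd('indexedDiff');
```

Both functions return the same lists; only the number of comparisons differs, and the gap widens quickly as the arrays grow.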
