Foreword
When a JavaScript program needs to store data on the browser side, the options are:
- Cookies: sent along with every HTTP request, and limited to about 4 KB per cookie.
- localStorage: stores string key-value pairs, usually capped at about 5 MB.
- WebSQL: never part of the HTML5 standard and now deprecated.
- FileSystem & FileWriter API: very poor compatibility; effectively Chrome-only.
- IndexedDB: a NoSQL database with an asynchronous API that supports transactions, stores structured (JSON-like) data, iterates with indexes, and has good browser compatibility.
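As a rough illustration, choosing among these options amounts to a fallback chain. The helper below is a hypothetical sketch (the `features` flags and backend names are made up for illustration; in a browser you would derive them from checks like `typeof indexedDB !== 'undefined'`):

```javascript
// Hypothetical helper: pick the most capable storage backend the
// current environment supports. Not from the article's code.
function pickStorage(features) {
  if (features.indexedDB) return 'indexeddb'       // large, async, indexed
  if (features.localStorage) return 'localstorage' // ~5 MB, synchronous strings
  return 'cookie'                                  // last resort, ~4 KB
}

console.log(pickStorage({ indexedDB: true, localStorage: true })) // 'indexeddb'
```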
Clearly, only IndexedDB is suitable for storing large amounts of data. But using IndexedDB directly raises several questions:
- The IndexedDB API is transaction-based and fairly low-level, so direct use is cumbersome and calls for a simplified wrapper.
- Where are the main performance bottlenecks of IndexedDB?
- When the browser has multiple tabs open, several pages may operate on the same data record at once.
This article explores these questions based on the author's practical experience.
Log storage scenario
Consider a scenario where the client generates a large number of logs and stores them locally. When certain errors occur (or when the server sends an instruction over a long-lived connection), all locally stored logs are pulled and uploaded in a reporting request.
This is a typical CRUD workload for IndexedDB, and here we focus only on the storage part. For the basic IndexedDB concepts — object store (IDBObjectStore), index (IDBIndex), cursor (IDBCursor), and transaction (IDBTransaction) — please refer to IndexedDB on MDN, as space does not permit covering them here.
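The reporting flow above can be sketched as a small async function. `logStore` and `send` are hypothetical stand-ins: in the real page, `logStore.getAll()` would read from IndexedDB and `send` would POST to the report endpoint.

```javascript
// Sketch of the log-reporting flow: pull everything stored locally,
// upload it in one request, then clear what was reported.
async function reportLogs(logStore, send) {
  const logs = await logStore.getAll() // pull every locally stored log
  if (logs.length === 0) return false  // nothing to report
  await send(logs)                     // upload in one request
  await logStore.clear()               // drop the logs that were reported
  return true
}
```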
Creating the database
IndexedDB is transaction-driven. Let's open a database db_test, create a store named log, and index it on time.
class Database {
  constructor(options = {}) {
    if (typeof indexedDB === 'undefined') {
      throw new Error('indexedDB is unsupported!')
    }
    this.name = options.name
    this.db = null
    this.version = options.version || 1
  }
  createDB () {
    return new Promise((resolve, reject) => {
      // For local debugging, delete the database before recreating it
      indexedDB.deleteDatabase(this.name);
      const request = indexedDB.open(this.name, this.version);
      // onupgradeneeded fires when the database is upgraded: either it is
      // being created for the first time, or open() was called with a
      // version number higher than the locally stored one.
      request.onupgradeneeded = () => {
        const db = request.result;
        window.db = db // exposed for the console examples below
        console.log('db onupgradeneeded')
        // Create the store here
        this.createStore(db)
      };
      // Called when the database opens successfully
      request.onsuccess = () => {
        this.db = request.result
        resolve(request.result)
      };
      // Called when opening fails
      request.onerror = function(event) {
        reject(event)
      }
    })
  }
  createStore(db) {
    if (!db.objectStoreNames.contains('log')) {
      // Create the store
      const objectStore = db.createObjectStore('log', {
        keyPath: 'id',
        autoIncrement: true
      });
      // Index on time
      objectStore.createIndex('time', 'time');
    }
  }
}
The calling statement is as follows:
(async function() {
  const database = new Database({ name: 'db_test' })
  await database.createDB()
  console.log(database)
  // Database {name: 'db_test', db: IDBDatabase, version: 1}
  //   db: IDBDatabase
  //     name: "db_test"
  //     objectStoreNames: DOMStringList {0: 'log', length: 1}
  //     version: 1
  //   name: "db_test"
  //   version: 1
})()
Insert, update, and delete operations
To insert a record into log, we open a transaction and perform an add operation on the store within it.
const db = window.db;
const transaction = db.transaction('log', 'readwrite')
const store = transaction.objectStore('log')
const storeRequest = store.add(data);
storeRequest.onsuccess = function(event) {
  // event.target.result is the auto-generated key of the new record
  console.log('add onsuccess, new key ', event.target.result);
};
storeRequest.onerror = function(event) {
  console.error('add onerror', event);
};
Since every read and write needs its own transaction, calling the raw API like this is inevitably cumbersome. Let's simplify it and expose a Promise-based ES6 API.
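One further simplification worth noting (a sketch, not part of the article's code): the onsuccess/onerror pattern repeats for every request, so it can be factored into a generic helper that wraps any IDBRequest-shaped object into a Promise.

```javascript
// Wrap anything exposing the IDBRequest onsuccess/onerror contract
// into a Promise that resolves with the request result.
function promisifyRequest(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = event => resolve(event.target.result)
    request.onerror = event => reject(event.target.error)
  })
}
```

With this in place, a method like `add` would reduce to `return promisifyRequest(store.add(data))`.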
class Database {
  // ... database opening omitted, see above
  // constructor(options = {}) {}
  // createDB() {}
  // createStore() {}
  add (data) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction('log', 'readwrite')
      const store = transaction.objectStore('log')
      const request = store.add(data);
      request.onsuccess = event => resolve(event.target.result);
      request.onerror = event => reject(event);
    })
  }
  put (data) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction('log', 'readwrite')
      const store = transaction.objectStore('log')
      const request = store.put(data);
      request.onsuccess = event => resolve(event.target.result);
      request.onerror = event => reject(event);
    })
  }
  delete (id) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction('log', 'readwrite')
      const store = transaction.objectStore('log')
      const request = store.delete(id)
      request.onsuccess = event => resolve(event.target.result);
      request.onerror = event => reject(event);
    })
  }
}
The calling code is as follows:
(async function() {
  const db = new Database({ name: 'db_test' })
  await db.createDB()
  const row1 = await db.add({time: new Date().getTime(), body: 'log 1' })
  // row1 === 1, the auto-generated primary key of the new record
  await db.add({time: new Date().getTime(), body: 'log 2' })
  await db.put({id: 1, time: new Date().getTime(), body: 'log AAAA' })
  await db.delete(1)
})()
Queries
There are many kinds of queries. Common ORMs provide two flavors: range queries and index queries, with pagination available on range queries. In our IndexedDB wrapper we simplify this down to a single getByIndex.
Queries rely on an IDBCursor cursor and an IDBIndex index.
class Database {
  // ... database opening omitted, see above
  // constructor(options = {}) {}
  // createDB() {}
  // createStore() {}
  // Fetch the first record matching value
  get (value, indexName) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction('log', 'readonly')
      const store = transaction.objectStore('log')
      let request
      // With an index name, look the value up through that index;
      // otherwise treat the value as the primary key
      if (indexName) {
        const index = store.index(indexName);
        request = index.get(value)
      } else {
        request = store.get(value)
      }
      request.onsuccess = evt => resolve(evt.target.result || null)
      request.onerror = evt => reject(evt)
    });
  }
  /**
   * Conditional query with pagination
   *
   * @param {string} keyPath index name
   * @param {IDBKeyRange} keyRange key range to match
   * @param {number} offset number of records to skip
   * @param {number} limit maximum number of records to return
   */
  getByIndex (keyPath, keyRange, offset = 0, limit = 100) {
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction('log', 'readonly')
      const store = transaction.objectStore('log')
      const index = store.index(keyPath)
      const request = index.openCursor(keyRange)
      const result = []
      request.onsuccess = function (evt) {
        const cursor = evt.target.result
        // A positive offset means records still need to be skipped;
        // advance() jumps the cursor and fires onsuccess again
        if (cursor && offset > 0) {
          const skip = offset
          offset = 0
          cursor.advance(skip);
          return
        }
        if (cursor && limit > 0) {
          result.push(cursor.value)
          limit = limit - 1
          cursor.continue()
        } else {
          resolve(result)
        }
      }
      request.onerror = function (evt) {
        console.error('getByIndex onerror', evt)
        reject(evt.target.error)
      }
      transaction.onerror = function(evt) {
        reject(evt.target.error)
      };
    })
  }
}
(async function() {
  const db = new Database({ name: 'db_test' })
  await db.createDB()
  await db.add({time: new Date().getTime(), body: 'log 1' })
  await db.add({time: new Date().getTime(), body: 'log 2' })
  const time = new Date().getTime()
  await db.put({id: 1, time: time, body: 'log AAAA' })
  await db.add({time: new Date().getTime(), body: 'log 3' })
  // Query records whose time is at least `time`
  const test = await db.getByIndex('time', IDBKeyRange.lowerBound(time))
  // A multi-column index query would look like:
  // await db.getByIndex('time, test_id', IDBKeyRange.bound([0, 99], [Date.now(), 2100]))
  console.log(test)
  // 0: {id: 1, time: 1648453268858, body: 'log AAAA'}
  // 1: {time: 1648453268877, body: 'log 3', id: 3}
})()
Of course, there are more query possibilities, such as fetching all records in a store or counting them; these are left for readers to implement themselves.
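As a starting point for those extensions, here is a sketch of a count and a fetch-all method in the same style as the wrapper above. The class name `LogQueries` is hypothetical; both methods assume a `this.db` IDBDatabase with a 'log' store, as in the article.

```javascript
// Sketch of two query extensions: count() returns the number of records
// in the store, getAll() returns every record. Both wrap the request in
// a Promise exactly like the get/getByIndex methods above.
class LogQueries {
  constructor(db) { this.db = db }
  count() {
    return new Promise((resolve, reject) => {
      const store = this.db.transaction('log', 'readonly').objectStore('log')
      const request = store.count()
      request.onsuccess = event => resolve(event.target.result)
      request.onerror = event => reject(event.target.error)
    })
  }
  getAll() {
    return new Promise((resolve, reject) => {
      const store = this.db.transaction('log', 'readonly').objectStore('log')
      const request = store.getAll() // IDBObjectStore.getAll, widely supported
      request.onsuccess = event => resolve(event.target.result)
      request.onerror = event => reject(event.target.error)
    })
  }
}
```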
Optimization
Next, let's separate the Model from the Database and improve the DB-creation code above. Similar to an ORM, the Database maps models, and each Model provides basic create, read, update, and delete methods.
class Database {
  constructor(options = {}) {
    if (typeof indexedDB === 'undefined') {
      throw new Error('indexedDB is unsupported!')
    }
    this.name = options.name
    this.db = null
    this.version = options.version || 1
    // this.upgradeFunction = option.upgradeFunction || function () {}
    this.modelsOptions = options.modelsOptions
    this.models = {}
  }
  createDB () {
    return new Promise((resolve, reject) => {
      // For local debugging, delete the database before recreating it
      indexedDB.deleteDatabase(this.name);
      const request = indexedDB.open(this.name, this.version);
      // onupgradeneeded fires when the database is created for the first
      // time, or when open() specifies a higher version than the local one
      request.onupgradeneeded = () => {
        const db = request.result;
        console.log('db onupgradeneeded')
        // Build one Model per entry in modelsOptions
        Object.keys(this.modelsOptions).forEach(key => {
          this.models[key] = new Model(db, key, this.modelsOptions[key])
        })
      };
      // Opened successfully
      request.onsuccess = () => {
        console.log('db open onsuccess')
        this.db = request.result
        resolve(request.result)
      };
      // Failed to open
      request.onerror = function(event) {
        console.log('db open onerror', event);
        reject(event)
      }
    })
  }
}
class Model {
  constructor(database, tableName, options) {
    this.db = database
    this.tableName = tableName
    if (!this.db.objectStoreNames.contains(tableName)) {
      const objectStore = this.db.createObjectStore(tableName, {
        keyPath: options.keyPath,
        autoIncrement: options.autoIncrement || false
      });
      options.index && Object.keys(options.index).forEach(key => {
        objectStore.createIndex(key, options.index[key]);
      })
    }
  }
  add(data) {
    // ... same as the add method above
  }
  delete(id) {
    // ... omitted
  }
  put(data) {
    // ... omitted
  }
  getByIndex(keyPath, keyRange) {
    // ... omitted
  }
  get(indexName, value) {
    // ... omitted
  }
}
The call is as follows:
(async function() {
  const db = new Database({
    name: 'db_test',
    modelsOptions: {
      log: {
        keyPath: 'id',
        autoIncrement: true,
        rows: {
          id: 'number',
          time: 'number',
          body: 'string',
        },
        index: {
          time: 'time'
        }
      }
    }
  })
  await db.createDB()
  await db.models.log.add({time: new Date().getTime(), body: 'log 1' })
  await db.models.log.add({time: new Date().getTime(), body: 'log 2' })
  await db.models.log.get(null, 1)
  const time = new Date().getTime()
  await db.models.log.put({id: 1, time: time, body: 'log AAAA' })
  await db.models.log.getByIndex('time', IDBKeyRange.only(time))
})()
Of course, this is still a very crude model with shortcomings. For example, ideally developers would not have to touch IDBKeyRange at all when querying: in sequelize style, a condition like time: { $gt: new Date().getTime() } would be mapped internally, with $gt standing in for IDBKeyRange.lowerBound.
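That mapping can be sketched as a small translation function. The operator names ($eq, $between, $gt and so on) and the injected `ranges` factory are assumptions for illustration; in the browser you would pass the global IDBKeyRange as `ranges`.

```javascript
// Sketch: translate a sequelize-style condition object into a key range
// via an injected IDBKeyRange-like factory.
function toKeyRange(condition, ranges) {
  if ('$eq' in condition) return ranges.only(condition.$eq)
  if ('$between' in condition) return ranges.bound(condition.$between[0], condition.$between[1])
  if ('$gt' in condition) return ranges.lowerBound(condition.$gt, true)   // exclusive
  if ('$gte' in condition) return ranges.lowerBound(condition.$gte)      // inclusive
  if ('$lt' in condition) return ranges.upperBound(condition.$lt, true)  // exclusive
  if ('$lte' in condition) return ranges.upperBound(condition.$lte)      // inclusive
  throw new Error('unsupported condition')
}

// In a browser: toKeyRange({ $gt: new Date().getTime() }, IDBKeyRange)
```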
Bulk operations
It is worth noting that IndexedDB's performance is closely tied to the number of transactions committed to it, so batch inserts are recommended wherever possible.
For batch operations, event delegation lets us avoid attaching onsuccess and onerror handlers to every individual request.
class Model {
  // ... constructor omitted, see above
  bulkPut(datas) {
    if (!(datas && datas.length > 0)) {
      return Promise.reject(new Error('no data'))
    }
    return new Promise((resolve, reject) => {
      const transaction = this.db.transaction(this.tableName, 'readwrite')
      const store = transaction.objectStore(this.tableName)
      datas.forEach(data => store.put(data))
      // Event delegation: IndexedDB events bubble request → transaction →
      // database, so one pair of transaction handlers covers every put above
      transaction.oncomplete = function() {
        console.log('bulkPut transaction complete');
        resolve()
      };
      transaction.onabort = function (evt) {
        console.error('bulkPut transaction onabort', evt);
        reject(evt.target.error)
      }
    })
  }
}
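When the data set is very large, bulkPut can be combined with a batching helper so that each transaction stays bounded. A minimal sketch (the batch size of 1000 in the usage comment is an illustrative choice, not a tuned value):

```javascript
// Split an array into batches of at most `size` items, so each batch
// can be written in its own transaction via bulkPut.
function chunk(items, size) {
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Usage sketch: write batches sequentially, one transaction each.
// chunk(logs, 1000).reduce(
//   (p, batch) => p.then(() => model.bulkPut(batch)),
//   Promise.resolve()
// )
```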
Performance exploration
IndexedDB insertion time correlates strongly with the number of transactions committed. Let's set up a controlled experiment:
- Commit 1000 transactions, each inserting 1 record.
- Commit 1 transaction that inserts 1000 records.
The test code is as follows:
// Case 1: 1000 transactions, one record each
const promises = []
for (let index = 0; index < 1000; index++) {
  promises.push(db.models.log.add({time: new Date().getTime(), body: `log ${index}` }))
}
console.time('promises')
Promise.all(promises).then(() => {
  console.timeEnd('promises')
})
// promises: 20837.403076171875 ms

// Case 2: one transaction, 1000 records
const arr = []
for (let index = 0; index < 1000; index++) {
  arr.push({time: new Date().getTime(), body: `log ${index}` })
}
console.time('promises')
await db.models.log.bulkPut(arr)
console.timeEnd('promises')
// promises: 250.491943359375 ms
Reducing transaction commits matters a great deal: when a large number of writes are needed, merge logs in memory as much as possible and then write them in batches.
It is worth noting that body held only a handful of characters in the experiment above. With 5000 characters per record instead, the batch write time only rises from about 250 ms to 300 ms, so payload size has little effect.
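The "merge in memory, write in batches" advice can be sketched as a small buffered logger. `flush` is an injected function (for example `model.bulkPut.bind(model)`); the 500-entry cap and the 200 ms delay are illustrative assumptions, not tuned values.

```javascript
// Sketch: accumulate log entries in memory and flush them in batches,
// either when the buffer fills up or after a short delay.
class BufferedLogger {
  constructor(flush, { maxSize = 500, delay = 200 } = {}) {
    this.flush = flush       // e.g. batch => model.bulkPut(batch)
    this.maxSize = maxSize
    this.delay = delay
    this.buffer = []
    this.timer = null
  }
  log(entry) {
    this.buffer.push(entry)
    if (this.buffer.length >= this.maxSize) return this.drain()
    if (!this.timer) this.timer = setTimeout(() => this.drain(), this.delay)
  }
  drain() {
    clearTimeout(this.timer)
    this.timer = null
    if (this.buffer.length === 0) return Promise.resolve()
    const batch = this.buffer
    this.buffer = []
    return this.flush(batch) // one transaction per batch
  }
}
```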
Let's compare one more case: keep committing transactions of 1000 records each while the store grows from 0 to 5 million existing records. We get the following data:
for (let i = 0; i < 10000; i++) {
  let date = new Date()
  let datas = []
  for (let j = 0; j < 1000; j++) {
    datas.push({ time: new Date().getTime(), body: `log ${j}`})
  }
  await db.models.log.bulkPut(datas)
  datas = []
  // Highlight a few sample points
  if (i === 10 || i === 50 || i === 100 || i === 500
      || i === 1000 || i === 2000 || i === 5000) {
    console.warn(`success for bulkPut ${i}: `, new Date() - date)
  } else {
    console.log(`success for bulkPut ${i}: `, new Date() - date)
  }
}
// success for bulkPut 10: 283
// success for bulkPut 50: 310
// success for bulkPut 100: 302
// success for bulkPut 500: 296
// success for bulkPut 1000: 290
// success for bulkPut 2000: 150
// success for bulkPut 5000: 201
The numbers fluctuate only slightly. The conclusion: with up to 5 million existing records, insertion time does not degrade noticeably. Query time depends on more factors, and measuring it is left to the reader.
When multiple tabs operate on the same data
IndexedDB simply processes one transaction after another, regardless of which tab submitted it. As a result, JavaScript running in several tabs may end up operating on the same record in the database.
Take our db as an example: suppose we make the time index unique when creating the store:
objectStore.createIndex('time', 'time', { unique: true });
Now open 3 tabs at once, each writing a record to the database every 20 ms, and errors become highly likely. The ideal fix for this is the SharedWorker API. SharedWorker is similar to WebWorker, except that a SharedWorker can be shared across multiple browsing contexts. We could open the database inside a SharedWorker and have every tab request data from that worker, instead of each tab holding its own database connection.
Unfortunately, the SharedWorker API is not supported in Safari and has no polyfill. Instead, we can use the BroadcastChannel API, which communicates across tabs: elect a leader, give only the leader write access to the database, and let the other tabs read but not write.
The following is a simplified implementation of the leader-election process, modeled on the broadcast-channel library.
class LeaderElection {
  constructor(name) {
    this.channel = new BroadcastChannel(name)
    // Whether a leader already exists
    this.hasLeader = false
    // Whether this instance is the leader
    this.isLeader = false
    // Random token: when several tabs apply at once with no leader,
    // tokens are compared against maxTokenNumber and the largest wins
    this.tokenNumber = Math.random()
    // The largest token seen so far while there is no leader
    this.maxTokenNumber = 0
    this.channel.onmessage = (evt) => {
      console.log('channel onmessage', evt.data)
      const action = evt.data.action
      switch (action) {
        // Our application was rejected, or someone announced leadership:
        // either way, mark that a leader exists
        case 'applyReject':
          this.hasLeader = true
          break;
        case 'leader':
          // TODO: a second leader could still emerge here
          this.hasLeader = true
          break;
        // The leader died; a new election is needed
        case 'death':
          this.hasLeader = false
          this.maxTokenNumber = 0
          // this.awaitLeadership()
          break;
        // Another tab is applying for leadership
        case 'apply':
          if (this.isLeader) {
            this.postMessage('applyReject')
          } else if (this.hasLeader) {
            // A leader exists; ignore the application
          } else if (evt.data.tokenNumber > this.maxTokenNumber) {
            // No leader yet and our token is smaller: record the larger
            // token so that applyOnce withdraws our own application
            this.maxTokenNumber = evt.data.tokenNumber
          }
          break;
        default:
          break;
      }
    }
  }
  awaitLeadership() {
    return new Promise((resolve) => {
      const intervalApply = () => {
        return this.sleep(4000)
          .then(() => this.applyOnce())
          .then(() => resolve())
          .catch(() => intervalApply())
      }
      this.applyOnce()
        .then(() => resolve())
        .catch(() => intervalApply())
    })
  }
  applyOnce(timeout = 1000) {
    return this.postMessage('apply').then(() => this.sleep(timeout))
      .then(() => {
        if (this.isLeader) {
          return
        }
        if (this.hasLeader === true || this.maxTokenNumber > this.tokenNumber) {
          throw new Error()
        }
        return this.postMessage('apply').then(() => this.sleep(timeout))
      })
      .then(() => {
        if (this.isLeader) {
          return
        }
        if (this.hasLeader === true || this.maxTokenNumber > this.tokenNumber) {
          throw new Error()
        }
        // Nobody objected after two attempts; promote ourselves to leader
        this.beLeader()
      })
  }
  beLeader () {
    this.postMessage('leader')
    this.isLeader = true
    this.hasLeader = true
    window.addEventListener('beforeunload', () => this.die());
    window.addEventListener('unload', () => this.die());
  }
  die () {
    this.isLeader = false
    this.hasLeader = false
    this.postMessage('death')
  }
  postMessage(action) {
    return new Promise((resolve) => {
      this.channel.postMessage({
        action,
        tokenNumber: this.tokenNumber
      })
      resolve()
    })
  }
  sleep(time) {
    return new Promise(res => setTimeout(res, time || 0));
  }
}
The calling code is as follows:
const elector = new LeaderElection('test_channel')
window.elector = elector
elector.awaitLeadership().then(() => {
  document.title = 'leader!'
})
The effect matches the broadcast-channel library: at any moment, only one tab's title changes to "leader!".
Summary
To store a large amount of data offline in the browser, IndexedDB is currently the only practical option. Using it raises several issues:
- The IndexedDB API is transaction-based and low-level; its cumbersome operations call for encapsulation.
- The biggest performance bottleneck of IndexedDB is the number of transactions, so reduce transaction commits wherever possible.
- IndexedDB does not care which tab a transaction comes from; with multiple tabs open, the browser may operate on the same record concurrently. Electing a single leader that is allowed to write avoids this problem.
The code used in this article is on GitHub: https://github.com/everlose/indexeddb-test