头图

The role of the cursor

Now that the web-side business system has been encapsulated layer by layer, the cursor is not used much, but when processing large-scale data, the cursor is still used for fast speed and memory saving, which can greatly improve the query efficiency.

Assuming that there are 100 million records in a data table, all of which need to be exported or processed for business data, it is obviously impossible to read them into memory at one time using OR Mapping, so it is obviously necessary to process them in stages. The common segmentation method is to specify the returned record window through offset+limit (flask-sqlalchemy directly provides the paginate object to implement paging, which should be the encapsulation of offset+limit).
This method is intuitive, but the disadvantages are also obvious: the huge query is re-executed for each segment, and only a small part of the data set is returned, which is a huge waste of database performance.

At this point, the cursor should come into play: the value of the Cursor is that it can only execute a query, cache the result position index, the cursor records the current position, and then each time n records are obtained through fetch, the cursor moves n positions, thus Save a lot of disk IO.

In terms of implementation, Cursor is divided into server-side cursors and client-side cursors. The difference lies in where to cache the result set index. Server-side caching allows multiple connections to share the result set, and the memory configuration of the database is often very high; however, The current database client is usually an application server, and the memory configuration is not low. It is helpful to share the pressure on the server by doing client-side caching. SQLAlchemy's default connections are client-side caches.

SQLAlchemy get cursor

If you search from the Internet, the code you usually find is like this:

 engine = create_engine('sqlite:///file.db')
connection = engine.connect()
try:
    cursor_obj = connection.cursor()
    cursor_obj.execute("select * from table1")
    results_one = cursor_obj.fetchall()
    cursor_obj.close()
finally:
    connection.close()

Or use raw_connection instead of connection, but if we are in a web environment and use a similar flask-sqlalchemy environment, obviously we can't manage database connections ourselves, and all transactions must be managed uniformly, so we need to get the connection from the session:

 db = SQLAlchemy()
    db.init_app(app)

    session = db.session()
    cursor = session.execute('select * from user limit 10').cursor
    result = cursor.fetchall()
    print(result)

Fetch data using cursor

After getting the cursor, you can use the cursor to get the data. Note that the cursor can only be moved sequentially from front to back, and it is impossible to go back and re-fetch, so the action of fetch is to get the data and move the cursor. There are three fetch methods: fetchone(), fetchmany(size) and fetchall(), as the name suggests, to get one record, multiple records and all records respectively.
The result set obtained by fetchmany and fetchall is a nested tuple, the first layer is row, the second layer is column; and fetchone has only one record because it is clear, the result set is a single layer tuple. For example, suppose the user table has four fields
id | account | name | comment

The following code:

 cursor = session.execute('select * from user limit 10').cursor
        result = cursor.fetchmany(2)
        print(result)
        result = cursor.fetchmany(2)
        print(result)
        result = cursor.fetchone()
        print(result)
        result = cursor.fetchall()

The output looks like the following:

 ((1, 'acc1', 'acc1', None), (2, 'acc2', 'acc2', None))
((3, 'acc3', 'acc3', None), (4, 'acc4', 'acc4', None))
(5, 'acc6', 'acc6', None)
((6, 'acc7', 'acc7', None), (7, 'acc8', 'acc8', None), (8, 'acc9', 'acc9', None), (9, 'acc10', 'acc10', None), (10, 'acc11', 'acc11', None))

songofhawk
303 声望24 粉丝