Choosing the Best Write Option for Your Use Case
There are a few options to write data to DolphinDB server using the Python API. This section explains the differences between these options and their optimal use cases.
Use Case | Recommended Write Option | Description |
---|---|---|
Upload variables | Session.upload | Direct upload of variables with flexible type conversion. Applicable to all data types. |
Pass in local data as arguments when executing server functions | run | Uploads data as function arguments. Applicable to all data types. |
Perform database and table operations on server from Python client | Session.table and other methods (such as Session.loadTable , Session.loadText ) | Manipulates server data from Python client. |
Batch write data that can be conveniently converted into DataFrames | TableAppender, TableUpserter, PartitionedTableAppender | Auto type conversion is supported. For details, see the "TableAppender, TableUpserter and PartitionedTableAppender" section below. |
Write stream data | MultithreadedTableWriter | Sends stream data to server in batches with auto type conversion. |
The API client has limited resources while the DolphinDB server has sufficient resources available | Enable asynchronous mode on a session | Shifts the burden of writing data from the client to the server. This option is not usually recommended. |
upload, table and tableInsert
Session.upload
You can upload data to the DolphinDB server using the Session.upload method. This method can be used to upload various DolphinDB data structures, including tables, dictionaries and vectors. When uploading variables with upload(), you must specify the name of the variable in DolphinDB after the upload.
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> s.upload({'data': data})
Python has multiple data type systems, including native Python types, NumPy types, and Pandas types. These do not map perfectly to DolphinDB's type system. As a result, some DolphinDB data types, such as UUID and MINUTE, cannot be directly uploaded from Python. However, DolphinDB Python API version 1.30.22.1 and later provide a __DolphinDB_Type__ attribute, which can be specified when uploading Pandas DataFrames to explicitly cast columns into particular DolphinDB data types.
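For example, the following sketch casts a DataFrame column to the DolphinDB UUID type before uploading (the column name cuuid and the sample value are illustrative):
>>> import dolphindb.settings as keys
>>> data = pd.DataFrame({
...     'cuuid': ["9d457e79-1bed-d6c2-3612-b0d31c1881f6"],
... })
...
>>> # declare the target DolphinDB type of each column before uploading
>>> data.__DolphinDB_Type__ = {'cuuid': keys.DT_UUID}
>>> s.upload({'data': data})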
Note: Variables uploaded through upload must be manually released on the server to prevent the session from consuming too much memory.
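Once an uploaded variable is no longer needed, it can be released by running the server's undef function. For example, to release the data variable uploaded above:
>>> s.run("undef(`data)")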
Session.table
You can pass a local Python object, such as a Pandas DataFrame or dictionary, to the Session.table() method to upload it as a temporary table in DolphinDB. The life span of the temporary table is managed through the Python API. This option is only supported for uploading objects as DolphinDB tables. Under the hood, Session.table() calls Session.upload(), so you can use the __DolphinDB_Type__ attribute to explicitly cast columns to particular DolphinDB data types when uploading.
Unlike upload, Session.table returns a Table instance in Python. Deleting this instance will also delete the temporary DolphinDB table on the server.
Example
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> tb = s.table(data=data)
>>> tb
<dolphindb.table.Table object at 0x7faf42e67a00>
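For example, deleting the instance created above also releases the temporary table on the server:
>>> del tb   # the temporary table on the server is released along with the Python object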
tableInsert
tableInsert is not a Python API method, but a built-in DolphinDB function. You can execute it using the Session.run() method, passing in the data you want to insert as an argument. This will insert the data into the target table on the server.
- This option uses the same data serialization process as writing with upload() or table().
- This is a more streamlined approach when the uploaded data does not require a variable name. For example, when appending data to a table, you don’t have to upload the data to a temporary table on the server and then insert it into the target table by calling tableInsert. Instead, you can simply pass the data as an argument of run and it will be inserted into the target table through tableInsert.
Example
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> s.run("t = table(100:0, [`clong], [LONG])")
>>> s.run("tableInsert{t}", data)
3
If you need to specify additional user permissions, or if the data insertion involves multiple steps, you can encapsulate the script into a function view and pass the data to be inserted as its argument.
For details, see
- DolphinDB User Manual - Partial Application
- DolphinDB User Manual - Function View
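For example, the following sketch wraps the insertion logic in a function view and passes the DataFrame data from the example above as its argument. It assumes an existing DFS table dfs://demo/pt whose schema matches the data, and a connection with sufficient privileges to create function views; the name safeInsert is hypothetical.
>>> s.run("""
... def safeInsert(data) {
...     return tableInsert(loadTable("dfs://demo", "pt"), data)
... }
... addFunctionView(safeInsert)
... """)
>>> s.run("safeInsert", data)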
Comparing Session.upload, Session.table, and run + tableInsert
These three write options essentially follow the same underlying process:
(1) The API first checks the data type and structure of the data to be uploaded.
(2) The data is converted into a DolphinDB type.
(3) The data is uploaded to the server, either as a variable or a function argument.
The differences are shown in the following table:
Write Option | Implementation | Use Case |
---|---|---|
Session.upload | direct upload | upload objects of all data forms |
Session.table | calls Session.upload internally | upload data as DolphinDB tables |
run + tableInsert | uploads data as an argument of run | Execute server functions or function views that accept arguments |
TableAppender, TableUpserter and PartitionedTableAppender
TableAppender
The TableAppender class internally calls the run("tableInsert{tableName}", data) method to append data to a table. However, unlike directly calling the run method, a TableAppender obtains the schema of the target table during its construction. Therefore, it can automatically convert the appended data to the corresponding column types.
Example
>>> s.run("t = table(100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbAppender = ddb.TableAppender(tableName="t", ddbSession=s)
>>> data = pd.DataFrame({
... 'csymbol': ["aaa", "bbb", "aaa"],
... 'cvalue': [1, 2, 3],
... })
...
>>> tbAppender.append(data)
3
TableUpserter
The TableUpserter class also obtains the schema of the target table during its construction. Therefore, it also supports automatic column type conversion. Internally, the TableUpserter is equivalent to the upsert!
server function. The key column(s) must be specified during the construction of the TableUpserter.
Example
>>> s.run("t = keyedTable(`csymbol, 100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbUpserter = ddb.TableUpserter(tableName="t", ddbSession=s, keyColNames=["csymbol"])
>>> data = pd.DataFrame({
... 'csymbol': ["aaa", "bbb", "aaa"],
... 'cvalue': [1, 2, 3],
... })
...
>>> tbUpserter.upsert(data)
For more information, see DolphinDB User Manual - upsert!
PartitionedTableAppender
Unlike TableAppender and TableUpserter, which write data through a single session, the PartitionedTableAppender class accepts a DBConnectionPool as an argument during its construction so that data can be concurrently appended to the target partitioned table using the connection pool.
The PartitionedTableAppender also supports automatic column type conversion.
Example
>>> if s.existsDatabase("dfs://test"):
... s.dropDatabase("dfs://test")
...
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> schema_table = s.table(data="schema_table")
>>> tb = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> pool = ddb.DBConnectionPool("localhost", 8848, 3, "admin", "123456")
>>> ptableAppender = ddb.PartitionedTableAppender(dbPath="dfs://test", tableName="pt", partitionColName="cindex", dbConnectionPool=pool)
>>> data = pd.DataFrame({
... 'cindex': [1, 2, 3, 4, 5],
... 'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
...
>>> ptableAppender.append(data)
5
Comparing TableAppender, TableUpserter, and PartitionedTableAppender
All of these options append pandas DataFrames to the target table with automatic column type conversion, but each has different use cases.
Option | Implementation | Use Case |
---|---|---|
TableAppender | equivalent to run("tableInsert{tableName}", data) | append to all types of tables |
TableUpserter | equivalent to the upsert! server function | update or append to keyed tables, index tables and partitioned tables |
PartitionedTableAppender | accepts a DBConnectionPool as an argument during its construction, and appends data concurrently to the target partitioned table using the connection pool. | append to tables that support concurrent writing, such as partitioned tables |
MultithreadedTableWriter, BatchTableWriter and Asynchronous tableInsert
MultithreadedTableWriter
The MultithreadedTableWriter (MTW) class starts multiple C++ threads in the background for asynchronous data type conversion, data uploading, and writing. For each target table, you must construct an MTW object to write to that table. When the insert method of MTW is called, the MTW object does not immediately perform type conversion, but instead adds the data to a conversion queue. It then adds the converted data to a writing queue. Multiple writing queues are used to write data to the server concurrently.
Example
>>> if s.existsDatabase("dfs://test"):
... s.dropDatabase("dfs://test")
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE])")
>>> schema_table = s.table(data="schema_table")
>>> pt = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> writer = ddb.MultithreadedTableWriter("localhost", 8848, "admin", "123456", dbPath="dfs://test", tableName="pt", threadCount=1)
>>> for i in range(100):
... writer.insert(i, i*1.1)
>>> writer.waitForThreadCompletion()
>>> res = writer.getStatus()
>>> if res.succeed():
... print("Data successfully written.")
...
Data successfully written.
BatchTableWriter
The BatchTableWriter (BTW) class creates one writer thread for each target table. Unlike the MultithreadedTableWriter, the BatchTableWriter performs data type conversion synchronously during data insertion, resulting in lower performance.
Note: The BTW class is no longer maintained.
Asynchronous tableInsert
As mentioned previously, tableInsert is not a Python API method, but a server function. The asynchronous tableInsert option means enabling asynchronous mode in a session and using run to send the write script to the server; the call returns immediately without waiting for the script to finish executing. For details, see Asynchronous Session Submission.
Example
>>> s = ddb.Session(enableASYNC=True)
>>> s.connect("localhost", 8848, "admin", "123456")
>>> s.run("t = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> data = pd.DataFrame({
... 'cindex': [1, 2, 3, 4, 5],
... 'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
...
>>> for i in range(100):
... s.run("tableInsert{t}", data)
...
Comparing MTW, BTW, and Asynchronous tableInsert
The internal workings of these three asynchronous write options differ slightly:
Option | Implementation | Use Case | Pros | Cons |
---|---|---|---|---|
MTW | Uses C++ writer threads in the background. Provides the batchSize and throttle parameters for specifying the granularity of data batches and the maximum waiting time before sending data to server. | Write stream data | Supports auto data type conversion; sends stream data to server in batches to improve network performance | Data type conversion is conducted in single-threaded mode due to the Python Global Interpreter Lock. |
BTW | Uses C++ writer threads in the background. Sends data to the server every 100 ms. | Write stream data | Sends stream data to server in batches to improve network performance | Auto data type conversion is not supported. |
Asynchronous tableInsert | Executes the run method in a session with asynchronous mode enabled | When network bandwidth is limited | Reduces network transmission time and waiting time | Shifts the burden of writing data from the client to the server. May cause server overload. Unsuitable for writing 1 record at a time, which would take a large amount of server resources. |
- The MultithreadedTableWriter uses multiple background threads to split data into batches and upload them, which reduces strain on the network.
- The BatchTableWriter (BTW) is deprecated. The MultithreadedTableWriter (MTW) can be used in all cases where the BTW was previously used.
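As an illustration of the batchSize and throttle parameters mentioned above, the following sketch (reusing the dfs://test/pt table from the earlier example; the parameter values are illustrative) constructs an MTW that flushes a batch either when 100 rows have accumulated or after 1 second, whichever comes first, and distributes writes across 3 threads by the partitioning column:
>>> writer = ddb.MultithreadedTableWriter(
...     "localhost", 8848, "admin", "123456",
...     dbPath="dfs://test", tableName="pt",
...     batchSize=100, throttle=1, threadCount=3, partitionCol="cindex",
... )
>>> for i in range(1000):
...     # cindex values 1, 2, 3 match the VALUE partitions of dfs://test
...     writer.insert(i % 3 + 1, i * 1.1)
...
>>> writer.waitForThreadCompletion()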