Choosing the Best Write Option for Your Use Case

There are a few options to write data to DolphinDB server using the Python API. This section explains the differences between these options and their optimal use cases.

| Use Case | Recommended Write Option | Description |
| --- | --- | --- |
| Upload variables | Session.upload | Direct upload of variables with flexible type conversion. Applicable to all data types. |
| Pass in local data as arguments when executing server functions | run | Uploads data as function arguments. Applicable to all data types. |
| Perform database and table operations on the server from the Python client | Session.table and other methods (such as Session.loadTable, Session.loadText) | Manipulates server data from the Python client. |
| Batch write data that can be conveniently converted into DataFrames | TableAppender, TableUpserter, PartitionedTableAppender | Automatic type conversion is supported. For details, see the "TableAppender, TableUpserter and PartitionedTableAppender" section below. |
| Write stream data | MultithreadedTableWriter | Sends stream data to the server in batches with automatic type conversion. |
| The API client has limited resources while the DolphinDB server has sufficient resources available | Enable asynchronous mode on a session | Shifts the burden of writing data from the client to the server. This option is not usually recommended. |

upload, table and tableInsert

Session.upload

You can upload data to the DolphinDB server using the Session.upload method. This method can be used to upload various DolphinDB data structures, including tables, dictionaries and vectors. When uploading variables with upload(), you must specify the name of the variable in DolphinDB after the upload.

>>> data = pd.DataFrame({
...     'clong': [1, 2, 3],
... })
...
>>> s.upload({'data': data})

Python has multiple data type systems, including native Python types, NumPy types, and Pandas types. These do not map perfectly to DolphinDB's type system. As a result, some DolphinDB data types, such as UUID and MINUTE, cannot be directly uploaded from Python. However, DolphinDB Python API version 1.30.22.1 and later provide a __DolphinDB_Type__ attribute, which can be specified when uploading Pandas DataFrames to explicitly cast columns into particular DolphinDB data types.
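
For example, a DataFrame column holding UUID strings can be tagged so that it is uploaded as the DolphinDB UUID type. The following is a minimal sketch (it assumes API version 1.30.22.1 or later, that dolphindb.settings is imported as keys, and that the session s is already connected):

>>> import dolphindb.settings as keys
>>> data = pd.DataFrame({
...     'cuuid': ["5d212a78-cc48-e3b1-4235-b4d91473ee87"],
... })
...
>>> data.__DolphinDB_Type__ = {'cuuid': keys.DT_UUID}   # cast this column to UUID on upload
>>> s.upload({'data': data})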

Note: Variables uploaded through upload must be manually released on the server to prevent the session from consuming too much memory.
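
For instance, the variable uploaded above can be released with the server's undef function once it is no longer needed (a minimal sketch; the variable name matches the upload above):

>>> s.run("undef(`data)")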

Session.table

You can pass a local Python object, such as a Pandas DataFrame or dictionary, to the Session.table() method to upload it as a temporary table in DolphinDB. The life span of the temporary table is managed through the Python API. This option is only supported for uploading objects as DolphinDB tables. Under the hood, Session.table() calls Session.upload(), so you can use the __DolphinDB_Type__ attribute to explicitly cast columns to particular DolphinDB data types when uploading.

Unlike upload, Session.table returns a Table instance in Python. Deleting this instance will also delete the temporary DolphinDB table on the server.

Example

>>> data = pd.DataFrame({
...     'clong': [1, 2, 3],
... })
...
>>> tb = s.table(data=data)
>>> tb
<dolphindb.table.Table object at 0x7faf42e67a00>
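
Because the temporary table's life span is tied to this Table instance, releasing the Python object also releases the table on the server, as described above. A minimal illustration:

>>> del tb   # the temporary table on the server is deleted along with the Python object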

tableInsert

tableInsert is not a Python API method, but a built-in DolphinDB function. You can execute it using the Session.run() method, passing in the data you want to insert as an argument. This will insert the data into the target table on the server.

  • This option uses the same data serialization process as writing with upload() or table().
  • This is a more streamlined approach when the uploaded data does not require a variable name. For example, when appending data to a table, you don't have to upload the data to a temporary table on the server and then insert it into the target table by calling tableInsert. Instead, you can simply pass the data as an argument of run, and it will be inserted into the target table through tableInsert.

Example

>>> data = pd.DataFrame({
...     'clong': [1, 2, 3],
... })
...
>>> s.run("t = table(100:0, [`clong], [LONG])")
>>> s.run("tableInsert{t}", data)
3

If you need to specify additional user permissions, or the data insertion involves multiple steps, you can encapsulate the script into a function view, and pass the data to insert as its argument.
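
For example, the sketch below defines a function view on the server that loads the target table and inserts the passed-in data, then calls it through run. It is only illustrative: the view name safeInsert and the database/table names are hypothetical, and addFunctionView requires the appropriate privileges.

>>> s.run("""
... def safeInsert(data) {
...     return tableInsert(loadTable("dfs://test", "pt"), data)
... }
... addFunctionView(safeInsert)
... """)
>>> s.run("safeInsert", data)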

For details, see

  • DolphinDB User Manual - Partial Application
  • DolphinDB User Manual - Function View

Comparing Session.upload, Session.table, and run + tableInsert

These three write options essentially follow the same underlying process:

(1) The API first checks the data type and structure of the data to be uploaded.
(2) The data is converted into a DolphinDB type.
(3) The data is uploaded to the server, either as a variable or as a function argument.

The differences are shown in the following table:

| Write Option | Implementation | Use Case |
| --- | --- | --- |
| Session.upload | Direct upload | Upload objects of all data forms |
| Session.table | Calls Session.upload internally | Upload data as DolphinDB tables |
| run + tableInsert | Uploads data as an argument of run | Execute server functions or function views that accept arguments |

TableAppender, TableUpserter and PartitionedTableAppender

TableAppender

The TableAppender class internally calls the run("tableInsert{tableName}", data) method to append data to a table. However, unlike directly calling the run method, a TableAppender obtains the schema of the target table during its construction. Therefore, it can automatically convert the appended data to the corresponding column type.

Example

>>> s.run("t = table(100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbAppender = ddb.TableAppender(tableName="t", ddbSession=s)
>>> data = pd.DataFrame({
...     'csymbol': ["aaa", "bbb", "aaa"],
...     'cvalue': [1, 2, 3],
... })
...
>>> tbAppender.append(data)
3

TableUpserter

The TableUpserter class also obtains the schema of the target table during its construction. Therefore, it also supports automatic column type conversion. Internally, the TableUpserter is equivalent to the upsert! server function. The key column(s) must be specified during the construction of the TableUpserter.

Example

>>> s.run("t = keyedTable(`csymbol, 100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbUpserter = ddb.TableUpserter(tableName="t", ddbSession=s, keyColNames=["csymbol"])
>>> data = pd.DataFrame({
...     'csymbol': ["aaa", "bbb", "aaa"],
...     'cvalue': [1, 2, 3],
... })
...
>>> tbUpserter.upsert(data)

For more information, see DolphinDB User Manual - upsert!

PartitionedTableAppender

Unlike TableAppender and TableUpserter, which write data based on sessions, the PartitionedTableAppender class accepts a DBConnectionPool as an argument during its construction so that data can be concurrently appended to the target partitioned table using the connection pool.

The PartitionedTableAppender also supports automatic column type conversion.

Example

>>> if s.existsDatabase("dfs://test"):
...     s.dropDatabase("dfs://test")
...
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> schema_table = s.table(data="schema_table")
>>> tb = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> pool = ddb.DBConnectionPool("localhost", 8848, 3, "admin", "123456")
>>> ptableAppender = ddb.PartitionedTableAppender(dbPath="dfs://test", tableName="pt", partitionColName="cindex", dbConnectionPool=pool)
>>> data = pd.DataFrame({
...     'cindex': [1, 2, 3, 4, 5],
...     'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
...
>>> ptableAppender.append(data)
5

Comparing TableAppender, TableUpserter, and PartitionedTableAppender

All of these options append pandas DataFrames to the target table with automatic column type conversion, but each has different use cases.

| Option | Implementation | Use Case |
| --- | --- | --- |
| TableAppender | Equivalent to run("tableInsert{tableName}", data) | Append to all types of tables |
| TableUpserter | Equivalent to the upsert! server function | Update or append to keyed tables, indexed tables, and partitioned tables |
| PartitionedTableAppender | Accepts a DBConnectionPool as an argument during its construction, and appends data concurrently to the target partitioned table using the connection pool | Append to tables that support concurrent writing, such as partitioned tables |

MultithreadedTableWriter, BatchTableWriter and Asynchronous tableInsert

MultithreadedTableWriter

The MultithreadedTableWriter (MTW) class starts multiple C++ threads in the background for asynchronous data type conversion, uploading, and writing. A separate MTW object must be constructed for each target table. When the insert method of an MTW is called, the data is not converted immediately; instead, it is added to a conversion queue, from which a background thread converts it and moves it to the writing queues. Multiple writing queues are used to write data to the server concurrently.

Example

>>> if s.existsDatabase("dfs://test"):
...     s.dropDatabase("dfs://test")
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE])")
>>> schema_table = s.table(data="schema_table")
>>> pt = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> writer = ddb.MultithreadedTableWriter("localhost", 8848, "admin", "123456", dbPath="dfs://test", tableName="pt", threadCount=1)
>>> for i in range(100):
...     writer.insert(i, i*1.1)
>>> writer.waitForThreadCompletion()
>>> res = writer.getStatus()
>>> if res.succeed():
...     print("Data successfully written.")
... 
Data successfully written.
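
If the status indicates an error, the rows that were not written can be retrieved and re-submitted. The sketch below is only illustrative and assumes the errorInfo field and the getUnwrittenData and insertUnwrittenData methods available in recent API versions; because a failed MTW terminates its background threads, the unwritten rows are re-submitted through a new writer:

>>> status = writer.getStatus()
>>> if not status.succeed():
...     print(status.errorInfo)                   # failure reason reported by the writer threads
...     unwritten = writer.getUnwrittenData()     # rows that were queued but not yet written
...     # a failed MTW terminates its background threads, so re-submit through a new writer
...     newWriter = ddb.MultithreadedTableWriter("localhost", 8848, "admin", "123456",
...                                              dbPath="dfs://test", tableName="pt", threadCount=1)
...     newWriter.insertUnwrittenData(unwritten)
...     newWriter.waitForThreadCompletion()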

BatchTableWriter

The BatchTableWriter (BTW) class creates one writer thread for each target table. Unlike the MultithreadedTableWriter, the BatchTableWriter performs data type conversion synchronously during data insertion, resulting in lower performance.

Note: The BTW class is no longer maintained.

Asynchronous tableInsert

As mentioned previously, tableInsert is not a Python API method, but a server function. The asynchronous tableInsert option means enabling asynchronous mode in a session and using run to send the write script to the server; run then returns immediately without waiting for the script to finish executing. For details, see Asynchronous Session Submission.

Example

>>> s = ddb.Session(enableASYNC=True)
>>> s.connect("localhost", 8848, "admin", "123456")
>>> s.run("t = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> data = pd.DataFrame({
...     'cindex': [1, 2, 3, 4, 5],
...     'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
... 
>>> for i in range(100):
...     s.run("tableInsert{t}", data)
... 

Comparing MTW, BTW, and Asynchronous tableInsert

The internal workings of these three asynchronous write options differ slightly:

| Option | Implementation | Use Case | Pros | Cons |
| --- | --- | --- | --- | --- |
| MTW | Uses C++ writer threads in the background. Provides the batchSize and throttle parameters for specifying the granularity of data batches and the maximum waiting time before sending data to the server. | Write stream data | Supports automatic data type conversion; sends stream data to the server in batches to improve network performance | Data type conversion is conducted in single-threaded mode due to the Python Global Interpreter Lock |
| BTW | Uses C++ writer threads in the background. Sends data to the server every 100 ms. | Write stream data | Sends stream data to the server in batches to improve network performance | Automatic data type conversion is not supported |
| Asynchronous tableInsert | Executes the run method in a session with asynchronous mode enabled | When network bandwidth is limited | Reduces network transmission time and waiting time | Shifts the burden of writing data from the client to the server; may cause server overload. Unsuitable for writing one record at a time, which would consume a large amount of server resources |
  • The MultithreadedTableWriter uses multiple background threads to split data into batches and upload them, which reduces strain on the network. A short sketch of tuning these batches with the batchSize and throttle parameters follows this list.
  • The BatchTableWriter (BTW) is deprecated. The MultithreadedTableWriter (MTW) can be used in all cases where the BTW was previously used.
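
As noted in the comparison above, batchSize sets the number of rows that accumulate before a batch is sent, and throttle sets the maximum waiting time (in seconds) before pending rows are sent even if the batch is not full. The sketch below is illustrative (the parameter values are arbitrary) and reuses the partitioned table created in the MultithreadedTableWriter example above:

>>> writer = ddb.MultithreadedTableWriter("localhost", 8848, "admin", "123456",
...                                       dbPath="dfs://test", tableName="pt",
...                                       batchSize=1000, throttle=1,
...                                       threadCount=3, partitionCol="cindex")
>>> for i in range(10000):
...     writer.insert(i % 3 + 1, i * 1.1)   # cindex values fall within the VALUE partitions [1, 2, 3]
...
>>> writer.waitForThreadCompletion()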