Choosing the Best Write Option for Your Use Case
There are a few options to write data to DolphinDB server using the Python API. This section explains the differences between these options and their optimal use cases.
Use Case | Recommended Write Option | Description |
---|---|---|
Upload variables | Session.upload | Direct upload of variables with flexible type conversion. Applicable to all data types. |
Pass in local data as arguments when executing server functions | run | Uploads data as function arguments. Applicable to all data types. |
Perform database and table operations on server from Python client | Session.table and other methods (such as Session.loadTable , Session.loadText ) | Manipulates server data from Python client. |
Batch write data that can be conveniently converted into DataFrames | TableAppender, TableUpserter, PartitionedTableAppender | Auto type conversion is supported. For details, see the "TableAppender, TableUpserter and PartitionedTableAppender" section below. |
Write stream data | MultithreadedTableWriter | Sends stream data to server in batches with auto type conversion. |
The API client has limited resources while the DolphinDB server has sufficient resources available | Enable asynchronous mode on a session | Shifts the burden of writing data from the client to the server. This option is not usually recommended. |
upload, table and tableInsert
Session.upload
You can upload data to the DolphinDB server using the Session.upload method. This method can be used to upload various DolphinDB data structures, including tables, dictionaries and vectors. When uploading variables with upload(), you must specify the name of the variable in DolphinDB after the upload.
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> s.upload({'data': data})
Python has multiple data type systems, including native Python types, NumPy types, and Pandas types. These do not map perfectly to DolphinDB's type system. As a result, some DolphinDB data types, such as UUID and MINUTE, cannot be directly uploaded from Python. However, DolphinDB Python API version 1.30.22.1 and later provide a __DolphinDB_Type__ attribute, which can be specified when uploading Pandas DataFrames to explicitly cast columns into particular DolphinDB data types.
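For example, the following sketch casts a DataFrame column to the DolphinDB UUID type before uploading (the column name cuuid and the sample value are illustrative):
>>> import dolphindb.settings as keys
>>> data = pd.DataFrame({
...     'cuuid': ["9d457e79-1bed-d6c2-3612-b0d31c1881f6"],
... })
...
>>> # declare the target DolphinDB type of each column before uploading
>>> data.__DolphinDB_Type__ = {'cuuid': keys.DT_UUID}
>>> s.upload({'data': data})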
Note: Variables uploaded through upload must be manually released on the server to prevent the session from consuming too much memory.
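Once an uploaded variable is no longer needed, it can be released by running the server's undef function. For example, to release the data variable uploaded above:
>>> s.run("undef(`data)")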
Session.table
You can pass a local Python object, such as a Pandas DataFrame or dictionary, to the Session.table() method to upload it as a temporary table in DolphinDB. The life span of the temporary table is managed through the Python API. This option is only supported for uploading objects as DolphinDB tables. Under the hood, Session.table() calls Session.upload(), so you can use the __DolphinDB_Type__ attribute to explicitly cast columns to particular DolphinDB data types when uploading.
Unlike upload, Session.table returns a Table instance in Python. Deleting this instance will also delete the temporary DolphinDB table on the server.
Example
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> tb = s.table(data=data)
>>> tb
<dolphindb.table.Table object at 0x7faf42e67a00>
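For example, deleting the instance created above also releases the temporary table on the server:
>>> del tb   # the temporary table on the server is released along with the Python object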
tableInsert
tableInsert is not a Python API method, but a built-in DolphinDB function. You can execute it using the Session.run() method, passing in the data you want to insert as an argument. This will insert the data into the target table on the server.
- This option uses the same data serialization process as writing with upload() or table().
- This is a more streamlined approach when the uploaded data does not require a variable name. For example, when appending data to a table, you don’t have to upload the data to a temporary table on the server and then insert it into the target table by calling tableInsert. Instead, you can simply pass the data as an argument of run and it will be inserted into the target table through tableInsert.
Example
>>> data = pd.DataFrame({
... 'clong': [1, 2, 3],
... })
...
>>> s.run("t = table(100:0, [`clong], [LONG])")
>>> s.run("tableInsert{t}", data)
3
If you need to specify additional user permissions, or if the data insertion involves multiple steps, you can encapsulate the script into a function view and pass the data to be inserted as its argument.
For details, see
- DolphinDB User Manual - Partial Application
- DolphinDB User Manual - Function View
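For example, the following sketch wraps the insertion logic in a function view and passes the DataFrame data from the example above as its argument. It assumes an existing DFS table dfs://demo/pt whose schema matches the data, and a connection with sufficient privileges to create function views; the name safeInsert is hypothetical.
>>> s.run("""
... def safeInsert(data) {
...     return tableInsert(loadTable("dfs://demo", "pt"), data)
... }
... addFunctionView(safeInsert)
... """)
>>> s.run("safeInsert", data)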
Comparing Session.upload, Session.table, and run + tableInsert
These three write options essentially follow the same underlying process:
(1) The API first checks the data type and structure of the data to be uploaded.
(2) The data is converted into a DolphinDB type.
(3) The data is uploaded to the server, either as a variable or a function argument.
The differences are shown in the following table:
Write Option | Implementation | Use Case |
---|---|---|
Session.upload | direct upload | upload objects of all data forms |
Session.table | calls Session.upload internally | upload data as DolphinDB tables |
run + tableInsert | uploads data as an argument of run | Execute server functions or function views that accept arguments |
TableAppender, TableUpserter and PartitionedTableAppender
TableAppender
The TableAppender class internally calls the run("tableInsert{tableName}", data) method to append data to a table. However, unlike directly calling the run method, a TableAppender obtains the schema of the target table during its construction. Therefore, it can automatically convert the appended data to the corresponding column types.
Example
>>> s.run("t = table(100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbAppender = ddb.TableAppender(tableName="t", ddbSession=s)
>>> data = pd.DataFrame({
... 'csymbol': ["aaa", "bbb", "aaa"],
... 'cvalue': [1, 2, 3],
... })
...
>>> tbAppender.append(data)
3
TableUpserter
The TableUpserter class also obtains the schema of the target table during its construction. Therefore, it also supports automatic column type conversion. Internally, the TableUpserter is equivalent to the upsert!
server function. The key column(s) must be specified during the construction of the TableUpserter.
Example
>>> s.run("t = keyedTable(`csymbol, 100:0, `csymbol`cvalue, [SYMBOL, LONG])")
>>> tbUpserter = ddb.TableUpserter(tableName="t", ddbSession=s, keyColNames=["csymbol"])
>>> data = pd.DataFrame({
... 'csymbol': ["aaa", "bbb", "aaa"],
... 'cvalue': [1, 2, 3],
... })
...
>>> tbUpserter.upsert(data)
For more information, see DolphinDB User Manual - upsert!
PartitionedTableAppender
Unlike TableAppender and TableUpserter, which write data through a single session, the PartitionedTableAppender class accepts a DBConnectionPool as an argument during its construction so that data can be concurrently appended to the target partitioned table using the connection pool.
The PartitionedTableAppender also supports automatic column type conversion.
Example
>>> if s.existsDatabase("dfs://test"):
... s.dropDatabase("dfs://test")
...
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> schema_table = s.table(data="schema_table")
>>> tb = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> pool = ddb.DBConnectionPool("localhost", 8848, 3, "admin", "123456")
>>> ptableAppender = ddb.PartitionedTableAppender(dbPath="dfs://test", tableName="pt", partitionColName="cindex", dbConnectionPool=pool)
>>> data = pd.DataFrame({
... 'cindex': [1, 2, 3, 4, 5],
... 'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
...
>>> ptableAppender.append(data)
5
Comparing TableAppender, TableUpserter, and PartitionedTableAppender
All of these options append pandas DataFrames to the target table with automatic column type conversion, but each has different use cases.
Option | Implementation | Use Case |
---|---|---|
TableAppender | equivalent to run("tableInsert{tableName}", data) | append to all types of tables |
TableUpserter | equivalent to the upsert! server function | update or append to keyed tables, index tables and partitioned tables |
PartitionedTableAppender | accepts a DBConnectionPool as an argument during its construction, and appends data concurrently to the target partitioned table using the connection pool. | append to tables that support concurrent writing, such as partitioned tables |
MultithreadedTableWriter, BatchTableWriter and Asynchronous tableInsert
MultithreadedTableWriter
The MultithreadedTableWriter (MTW) class starts multiple C++ threads in the background for asynchronous data type conversion, data uploading, and writing. For each target table, you must construct an MTW object to write to that table. When the insert method of MTW is called, the MTW object does not immediately perform type conversion, but instead adds the data to a conversion queue. It then adds the converted data to a writing queue. Multiple writing queues are used to write data to the server concurrently.
Example
>>> if s.existsDatabase("dfs://test"):
... s.dropDatabase("dfs://test")
>>> db = s.database(dbPath="dfs://test", partitionType=keys.VALUE, partitions=[1, 2, 3])
>>> s.run("schema_table = table(100:0, `cindex`cvalue, [INT, DOUBLE])")
>>> schema_table = s.table(data="schema_table")
>>> pt = db.createPartitionedTable(table=schema_table, tableName="pt", partitionColumns="cindex")
>>> writer = ddb.MultithreadedTableWriter("localhost", 8848, "admin", "123456", dbPath="dfs://test", tableName="pt", threadCount=1)
>>> for i in range(100):
... writer.insert(i, i*1.1)
>>> writer.waitForThreadCompletion()
>>> res = writer.getStatus()
>>> if res.succeed():
... print("Data successfully written.")
...
Data successfully written.
BatchTableWriter
The BatchTableWriter (BTW) class creates one writer thread for each target table. Unlike the MultithreadedTableWriter, the BatchTableWriter performs data type conversion synchronously during data insertion, resulting in lower performance.
Note: The BTW class is no longer maintained.
Asynchronous tableInsert
As mentioned previously, tableInsert is not a Python API method, but a server function. The asynchronous tableInsert option means enabling asynchronous mode in a session and using run to send the write script to the server; the call returns immediately without waiting for the script to finish executing. For details, see Asynchronous Session Submission.
Example
>>> s = ddb.Session(enableASYNC=True)
>>> s.connect("localhost", 8848, "admin", "123456")
>>> s.run("t = table(100:0, `cindex`cvalue, [INT, DOUBLE]);")
>>> data = pd.DataFrame({
... 'cindex': [1, 2, 3, 4, 5],
... 'cvalue': [1.1, 2.2, 3.3, 4.4, 5.5]
... })
...
>>> for i in range(100):
... s.run("tableInsert{t}", data)
...
Comparing MTW, BTW, and Asynchronous tableInsert
The internal workings of these three asynchronous write options differ slightly:
Option | Implementation | Use Case | Pros | Cons |
---|---|---|---|---|
MTW | Uses C++ writer threads in the background. Provides the batchSize and throttle parameters for specifying the granularity of data batches and the maximum waiting time before sending data to server. | Write stream data | Supports auto data type conversion; sends stream data to server in batches to improve network performance | Data type conversion is conducted in single-threaded mode due to the Python Global Interpreter Lock. |
BTW | Uses C++ writer threads in the background. Sends data to the server every 100 ms. | Write stream data | Sends stream data to server in batches to improve network performance | Auto data type conversion is not supported. |
Asynchronous tableInsert | Executes the run method in a session with asynchronous mode enabled | When network bandwidth is limited | Reduces network transmission time and waiting time | Shifts the burden of writing data from the client to the server. May cause server overload. Unsuitable for writing 1 record at a time, which would take a large amount of server resources. |
- The MultithreadedTableWriter uses multiple background threads to split data into batches and upload them, which reduces strain on the network.
- The BatchTableWriter (BTW) is deprecated. The MultithreadedTableWriter (MTW) can be used in all cases where the BTW was previously used.
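As an illustration of the batchSize and throttle parameters mentioned above, the following sketch (reusing the dfs://test/pt table from the earlier example; the parameter values are illustrative) constructs an MTW that flushes a batch either when 100 rows have accumulated or after 1 second, whichever comes first, and distributes writes across 3 threads by the partitioning column:
>>> writer = ddb.MultithreadedTableWriter(
...     "localhost", 8848, "admin", "123456",
...     dbPath="dfs://test", tableName="pt",
...     batchSize=100, throttle=1, threadCount=3, partitionCol="cindex",
... )
>>> for i in range(1000):
...     # cindex values 1, 2, 3 match the VALUE partitions of dfs://test
...     writer.insert(i % 3 + 1, i * 1.1)
...
>>> writer.waitForThreadCompletion()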