Data Type Conversion

When the API client communicates with the DolphinDB server, you can specify a protocol to format the transmitted data by setting the protocol parameter.

Python has multiple type systems, and they do not map perfectly to the DolphinDB type system. Version 1.30.21.1 of the DolphinDB Python API added a protocol parameter to the session and DBConnectionPool classes. protocol can take the values PROTOCOL_DDB, PROTOCOL_PICKLE (the default), or PROTOCOL_ARROW. For DolphinDB data types that cannot be directly represented in Python (such as UUID, IPADDR, and SECOND), the API supports forced type conversion.

The following examples demonstrate how to specify the formatting protocol when constructing a session:

import dolphindb as ddb
import dolphindb.settings as keys
# use PROTOCOL_DDB
s = ddb.Session(protocol=keys.PROTOCOL_DDB)
# use PROTOCOL_PICKLE
s = ddb.Session(protocol=keys.PROTOCOL_PICKLE)
# use PROTOCOL_ARROW
s = ddb.Session(protocol=keys.PROTOCOL_ARROW)

PROTOCOL_DDB, PROTOCOL_PICKLE, and PROTOCOL_ARROW support different type systems and serialization methods:

  • PROTOCOL_DDB uses DolphinDB's native data serialization and deserialization. It is widely used in DolphinDB's APIs for Python, C++, Java, and more. This protocol supports the greatest variety of data forms and types.
  • PROTOCOL_PICKLE is based on Python’s pickle module with DolphinDB customizations. It is the default value of the protocol parameter.
  • PROTOCOL_ARROW uses the Apache Arrow format to serialize and deserialize large datasets. It is ideal when you need to efficiently transmit data across platforms or languages.

The protocols are suitable for different use cases:

  • PROTOCOL_ARROW is ideal when the Apache Arrow format is used throughout your workflow, from retrieving data from an upstream database to consuming it downstream. Components can then exchange data with no need for repeated serialization and deserialization, which minimizes network bandwidth usage.
  • PROTOCOL_PICKLE is suitable for mass data transmission. When pandas DataFrames are heavily involved, both PROTOCOL_PICKLE and PROTOCOL_DDB are suitable. Because most data types are transmitted faster with PROTOCOL_PICKLE, it is the default protocol.

Communication Process

In brief, communication between the API client and the DolphinDB server has two stages: session establishment and single run requests.

Session Establishment

During session establishment, the API client sends a connection request to DolphinDB. The request specifies several parameters, including the data format protocol to use for the session.

The chosen data format protocol is used throughout the session to serialize and deserialize data. For any data form that the specified protocol does not support, PROTOCOL_DDB is used instead. For example, PROTOCOL_PICKLE only supports the deserialization of matrices and tables, so vectors downloaded from DolphinDB are still deserialized with PROTOCOL_DDB.
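
The fallback rule can be pictured with a small pure-Python sketch. This is only an illustration of the behavior described above, not the API's actual implementation: the function effective_protocol and the SUPPORTED_FORMS table are hypothetical names, and only the PROTOCOL_PICKLE entry is taken from the text. PROTOCOL_ARROW is deliberately left out because its supported data forms are not listed here.

```python
# Toy model of the fallback rule; NOT the dolphindb library's internals.
# Data forms each protocol can deserialize. Only the PROTOCOL_PICKLE entry
# is stated in the text; the table itself is a hypothetical illustration.
SUPPORTED_FORMS = {
    "PROTOCOL_PICKLE": {"matrix", "table"},
}


def effective_protocol(session_protocol: str, data_form: str) -> str:
    """Return the protocol that would deserialize a downloaded object."""
    if session_protocol == "PROTOCOL_DDB":
        # PROTOCOL_DDB is the fallback itself, so nothing changes.
        return "PROTOCOL_DDB"
    # Raises KeyError for protocols this sketch does not model.
    supported = SUPPORTED_FORMS[session_protocol]
    return session_protocol if data_form in supported else "PROTOCOL_DDB"


print(effective_protocol("PROTOCOL_PICKLE", "table"))   # session protocol is kept
print(effective_protocol("PROTOCOL_PICKLE", "vector"))  # falls back to PROTOCOL_DDB
```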

Single Run Requests

Like session establishment, single run requests specify parameters that determine how data is serialized and deserialized. Each run request contains flags indicating the data format protocol and any additional parameters to use. For example, both PROTOCOL_PICKLE and PROTOCOL_DDB accept a pickleTableToList parameter. If pickleTableToList is set to True for a single run, it changes how data is serialized and deserialized for that request only.

Unlike session establishment, the parameters specified in a single run request only apply to that single query.
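
The difference in scope between the two stages can be sketched with a toy model. The ToySession class below is hypothetical and does not talk to a server; it only mimics the idea that the protocol is fixed when the session is created, while extra parameters such as pickleTableToList travel with one request and are not remembered afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for a session; NOT the dolphindb implementation."""

    protocol: str = "PROTOCOL_PICKLE"          # chosen once, at session establishment
    sent: list = field(default_factory=list)   # record of the requests "sent" so far

    def run(self, script: str, **request_params) -> dict:
        # Each request carries the session-wide protocol plus its own flags.
        request = {"script": script, "protocol": self.protocol, **request_params}
        self.sent.append(request)
        return request


s = ToySession(protocol="PROTOCOL_DDB")
first = s.run("select * from t", pickleTableToList=True)  # flag applies to this request
second = s.run("select * from t")                         # flag is not carried over

print(first)
print(second)
```

In this model the second request still uses the session's protocol but no longer carries pickleTableToList, which mirrors the per-request scope described above.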