Arrow Plugin

Apache Arrow defines a columnar memory format, which combines the benefits of columnar data structures with in-memory computing. With the DolphinDB Arrow plugin, you can use the Arrow format to interact with the DolphinDB server through Python API with automatic data type conversion.

Note: Starting from 2.00.11, the plugin name has been changed from "formatArrow" to "Arrow".

The version of Apache Arrow used in this document is 9.0.0.

For the DolphinDB Arrow plugin, please make sure you use DolphinDB server version 2.00.11 or higher.

Install Plugin

(Optional) Manually Compile Plugin - Linux

Configure the Environment

a. Compile the plugin development kit

git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir build
cd build
cmake .. 
make -j

b. After compilation, save the following files to your DolphinDB plugin directory:

FileTarget Directory
arrow/cpp/src/arrow./include
arrow/cpp/build/release/libarrow.so.900./build

Compile Plugin

Before compilation, make sure the directories of libDolphinDB.so (included in the server package) and libarrow.so.900 have been added to the GCC search path. You can set LD_LIBRARY_PATH or directly save the files under the build directory.

cd /path/to/plugins/Arrow
mkdir build
cd build
cmake ..
make

Load Plugin

Alternatively, you can skip the manual compilation steps and simply download the precompile file libarrow.so.900. Save the files under your DolphinDB plugin directory.

To load the plugin, enter the following command in DolphinDB:

loadFormatPlugin("/path/to/plugin/PluginArrow.txt")

Usage Example

a. Enter the following command on the DolphinDB server side to load the plugin:

loadFormatPlugin("path/to/Arrow/PluginArrow.txt")

b. On the Python API side, execute the following script:

import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.session("192.168.1.113", 8848, "admin", "123456", protocol=keys.PROTOCOL_ARROW)
pat = s.run("table(1..10 as a)")

print(pat)
-------------------------------------------
pyarrow.Table
a: int32
----
a: [[1,2,3,4,5,6,7,8,9,10]]

Note: Currently, the DolphinDB server does not support enabling compression when the Arrow protocol is used.

Supported Data Types

DolphinDB to Arrow

When DolphinDB transfers data to the Python application through Python API, the data type mappings between DolphinDB and Arrow are as follows:

DolphinDBArrow
BOOLboolean
CHARint8
SHORTint16
INTint32
LONGint64
DATEdate32
MONTHdate32
TIMEtime32(ms)
MINUTEtime32(s)
SECONDtime32(s)
DATETIMEtimestamp(s)
TIMESTAMPtimestamp(ms)
NANOTIMEtime64(ns)
NANOTIMESTAMPtimestamp(ns)
DATEHOURtimestamp(s)
FLOATfloat32
DOUBLEfloat64
SYMBOLdictionary(int32, utf8)
STRINGutf8
IPADDRutf8
UUIDfixed_size_binary(16)
INT128fixed_size_binary(16)
BLOBlarge_binary
DECIMAL32(X)decimal128(38, X)
DECIMAL64(X)decimal128(38, X)

Array vectors of the types listed above (excluding the Decimal types) are also supported.

Note:

  • Array vectors of the types listed above (excluding the Decimal types) are also supported.
  • When converting Arrow-formatted data sent from the DolphinDB server to a pandas.DataFrame, the DolphinDB NANOTIME type is converted to Arrow time64 type. The NANOTIME value must be a multiple of 1,000, otherwise the error Value xxxxxxx has non-zero nanoseconds is raised.
  • Starting from version 2.00.11, the byte order of downloaded UUID/INT128 data matches the upload order, instead of reversing it.