Arrow

Apache Arrow defines a columnar memory format that combines the benefits of columnar data structures with in-memory computing. With the DolphinDB Arrow plugin, you can use the Arrow format to interact with the DolphinDB server through the Python API, with automatic data type conversion.

Note:

  1. Starting from version 2.00.11, the plugin name has been changed from "formatArrow" to "Arrow".
  2. Since version 2.00.12, the Arrow plugin can be downloaded directly from the plugin repository and loaded with the loadPlugin function. For versions 2.00.11 and earlier, use the loadFormatPlugin function instead; it is called in the same way as loadPlugin but is specifically for loading data format plugins.
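
The version rule in the second note can be sketched as a small Python helper. This is an illustration only: parse_version and loader_for_version are hypothetical names, not part of DolphinDB or its Python API, and the sketch assumes dot-separated numeric version strings such as "2.00.12".

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '2.00.12' into a tuple of ints, e.g. (2, 0, 12)."""
    return tuple(int(part) for part in version.strip().split("."))


def loader_for_version(server_version: str) -> str:
    """Return the name of the DolphinDB function that loads the Arrow plugin.

    Hypothetical helper: it encodes only the rule stated in the note above.
    """
    # Compare major.minor.patch only; any further build components are ignored.
    if parse_version(server_version)[:3] >= (2, 0, 12):
        return "loadPlugin"
    return "loadFormatPlugin"


print(loader_for_version("2.00.11"))
print(loader_for_version("2.00.12"))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2.00.9" as newer than "2.00.12".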

Installation (with installPlugin)

Required server version: DolphinDB 2.00.12 or higher

Supported OS: Windows x86-64 and Linux x86-64.

Installation Steps:

(1) Use listRemotePlugins to check plugin information in the plugin repository.

Note: For plugins not included in the returned list, you can install them from precompiled binaries or compile them from source. These files are available in our GitHub repository; switch to the branch that matches your server version.

login("admin", "123456")
listRemotePlugins("arrow")

(2) Invoke installPlugin for plugin installation.

installPlugin("arrow")

(3) Use loadPlugin to load the plugin before using the plugin methods.

loadPlugin("arrow")

Method References

The Arrow plugin provides no user-callable interfaces.

The interfaces returned by the loadPlugin function are only for internal use within DolphinDB and cannot be called by users through scripts.

Data Type Mappings

The Arrow plugin only supports one-way data transfer from DolphinDB to APIs and does not support receiving Arrow-formatted data from APIs.

Currently, only the Python API can download Arrow-formatted data using the PROTOCOL_ARROW protocol.

DolphinDB to Arrow

The plugin currently only supports serializing and transferring DolphinDB tables as Arrow tables. The data type mappings between DolphinDB and Arrow are as follows:

DolphinDB         Arrow
BOOL              boolean
CHAR              int8
SHORT             int16
INT               int32
LONG              int64
DATE              date32
MONTH             date32
TIME              time32(ms)
MINUTE            time32(s)
SECOND            time32(s)
DATETIME          timestamp(s)
TIMESTAMP         timestamp(ms)
NANOTIME          time64(ns)
NANOTIMESTAMP     timestamp(ns)
DATEHOUR          timestamp(s)
FLOAT             float32
DOUBLE            float64
SYMBOL            dictionary(int32, utf8)
STRING            utf8
IPADDR            utf8
UUID              fixed_size_binary(16)
INT128            fixed_size_binary(16)
BLOB              large_binary
DECIMAL32(X)      decimal128(38, X)
DECIMAL64(X)      decimal128(38, X)

Note:

  • Array vectors of the types listed above (excluding the Decimal types) are also supported.
  • Starting from version 2.00.11, the byte order of downloaded UUID/INT128 data matches the upload order, instead of reversing it.
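
For client-side checks, the mapping table above can be transcribed into a plain Python lookup. The sketch below is a convenience illustration only: DOLPHINDB_TO_ARROW and arrow_type_name are hypothetical names, not part of the plugin or the Python API, and the type names are kept as strings so that the snippet does not depend on pyarrow. Array vectors are left out because the table does not state their Arrow representation.

```python
import re

# DolphinDB type name -> Arrow type name, transcribed from the table above.
DOLPHINDB_TO_ARROW = {
    "BOOL": "boolean",
    "CHAR": "int8",
    "SHORT": "int16",
    "INT": "int32",
    "LONG": "int64",
    "DATE": "date32",
    "MONTH": "date32",
    "TIME": "time32(ms)",
    "MINUTE": "time32(s)",
    "SECOND": "time32(s)",
    "DATETIME": "timestamp(s)",
    "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)",
    "NANOTIMESTAMP": "timestamp(ns)",
    "DATEHOUR": "timestamp(s)",
    "FLOAT": "float32",
    "DOUBLE": "float64",
    "SYMBOL": "dictionary(int32, utf8)",
    "STRING": "utf8",
    "IPADDR": "utf8",
    "UUID": "fixed_size_binary(16)",
    "INT128": "fixed_size_binary(16)",
    "BLOB": "large_binary",
}

# DECIMAL32(X) and DECIMAL64(X) carry a scale X that is kept in the Arrow type.
_DECIMAL = re.compile(r"^DECIMAL(32|64)\((\d+)\)$")


def arrow_type_name(ddb_type: str) -> str:
    """Return the Arrow type name listed in the table for a DolphinDB type name."""
    name = ddb_type.strip().upper()
    match = _DECIMAL.match(name)
    if match:
        return f"decimal128(38, {match.group(2)})"
    if name not in DOLPHINDB_TO_ARROW:
        raise KeyError(f"no Arrow mapping listed for DolphinDB type {ddb_type!r}")
    return DOLPHINDB_TO_ARROW[name]


print(arrow_type_name("SYMBOL"))
print(arrow_type_name("DECIMAL64(4)"))
```

Such a lookup can be used, for example, to compare the schema of a downloaded table against the types expected from the DolphinDB column definitions.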

Usage Example

DolphinDB server

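login("admin", "123456");
loadPlugin("arrow");
// On server versions 2.00.11 and earlier, use loadFormatPlugin instead:
// loadFormatPlugin("path/to/Arrow/PluginArrow.txt")

Python API

import dolphindb as ddb
import dolphindb.settings as keys

# Connect with the Arrow protocol so that query results are downloaded in Arrow format.
s = ddb.session("192.168.1.113", 8848, "admin", "123456", protocol=keys.PROTOCOL_ARROW)
pat = s.run("table(1..10 as a)")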

print(pat)
-------------------------------------------
pyarrow.Table
a: int32
----
a: [[1,2,3,4,5,6,7,8,9,10]]

Note: Currently, the DolphinDB server does not support enabling compression when the Arrow protocol is used.