Arrow

Apache Arrow defines a columnar memory format that combines the benefits of columnar data structures with in-memory computing. With the DolphinDB Arrow plugin, you can use the Arrow format to interact with the DolphinDB server through the Python API, with automatic data type conversion.

Note:

Starting from 2.00.11, the plugin name has been changed from "formatArrow" to "Arrow".
Since version 2.00.12, the Arrow plugin can be directly downloaded from the plugin repository and loaded using the loadPlugin function. For versions 2.00.11 and earlier, the loadFormatPlugin function is required, which is used in the same way as loadPlugin but is specifically for loading data format plugins.
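
The version rule in this note can be sketched as a small client-side helper. This is only an illustration in Python: the function names parse_version and load_statement are hypothetical and not part of any DolphinDB API, and the sketch assumes version strings of the form "2.00.12". It builds the script statement to run on the server; it does not run it.

```python
def parse_version(version: str) -> tuple:
    """Turn a string like '2.00.12' into a comparable tuple (2, 0, 12)."""
    return tuple(int(part) for part in version.split(".")[:3])


def load_statement(server_version: str, plugin_path: str = "") -> str:
    """Return the DolphinDB script statement that loads the Arrow plugin.

    Hypothetical helper: 2.00.12 and later load the plugin by name with
    loadPlugin; older servers use loadFormatPlugin with the path to the
    plugin's .txt description file, which the caller must supply.
    """
    if parse_version(server_version) >= (2, 0, 12):
        return 'loadPlugin("arrow")'
    if not plugin_path:
        raise ValueError("older servers need the path to the plugin .txt file")
    return 'loadFormatPlugin("{}")'.format(plugin_path)


print(load_statement("2.00.12"))
print(load_statement("2.00.10", "path/to/Arrow/PluginArrow.txt"))
```

The path passed for older servers is left to the caller because the plugin directory name differs across versions.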

Installation (with `installPlugin`)

Required server version: DolphinDB 2.00.12 or higher

Supported OS: Windows x86-64 and Linux x86-64.

Installation Steps:

(1) Use listRemotePlugins to check plugin information in the plugin repository.

Note: For plugins not included in the provided list, you can install them from precompiled binaries or compile them from source. These files can be accessed from our GitHub repository by switching to the appropriate version branch.

login("admin", "123456")
listRemotePlugins("arrow")

(2) Invoke installPlugin for plugin installation.

installPlugin("arrow")

(3) Use loadPlugin to load the plugin before using the plugin methods.

loadPlugin("arrow")

Method References

The Arrow plugin provides no user-callable interfaces.

The interfaces returned by the loadPlugin function are only for internal use within DolphinDB and cannot be called by users through scripts.

Data Type Mappings

The Arrow plugin only supports one-way data transfer from DolphinDB to APIs and does not support receiving Arrow-formatted data from APIs.

Currently, only the Python API can download Arrow-formatted data using the PROTOCOL_ARROW protocol.

DolphinDB to Arrow

The plugin currently only supports serializing and transferring DolphinDB tables as Arrow tables. The data type mappings between DolphinDB and Arrow are as follows:


DolphinDB	Arrow
BOOL	boolean
CHAR	int8
SHORT	int16
INT	int32
LONG	int64
DATE	date32
MONTH	date32
TIME	time32(ms)
MINUTE	time32(s)
SECOND	time32(s)
DATETIME	timestamp(s)
TIMESTAMP	timestamp(ms)
NANOTIME	time64(ns)
NANOTIMESTAMP	timestamp(ns)
DATEHOUR	timestamp(s)
FLOAT	float32
DOUBLE	float64
SYMBOL	dictionary(int32, utf8)
STRING	utf8
IPADDR	utf8
UUID	fixed_size_binary(16)
INT128	fixed_size_binary(16)
BLOB	large_binary
DECIMAL32(X)	decimal128(38, X)
DECIMAL64(X)	decimal128(38, X)
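
For checking the schema a client should expect, the table above can be mirrored as a lookup. The following is a minimal sketch: the Arrow type names are plain strings copied from the table, the helper arrow_type_for is hypothetical, and the way DECIMAL type names are written (for example "DECIMAL32(4)") is an assumption made for illustration.

```python
import re

# Mapping copied from the table above: DolphinDB type name -> Arrow type name.
DDB_TO_ARROW = {
    "BOOL": "boolean", "CHAR": "int8", "SHORT": "int16", "INT": "int32",
    "LONG": "int64", "DATE": "date32", "MONTH": "date32",
    "TIME": "time32(ms)", "MINUTE": "time32(s)", "SECOND": "time32(s)",
    "DATETIME": "timestamp(s)", "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)", "NANOTIMESTAMP": "timestamp(ns)",
    "DATEHOUR": "timestamp(s)", "FLOAT": "float32", "DOUBLE": "float64",
    "SYMBOL": "dictionary(int32, utf8)", "STRING": "utf8", "IPADDR": "utf8",
    "UUID": "fixed_size_binary(16)", "INT128": "fixed_size_binary(16)",
    "BLOB": "large_binary",
}

# Assumed spelling of decimal type names, e.g. "DECIMAL32(4)".
_DECIMAL = re.compile(r"^DECIMAL(32|64)\((\d+)\)$")


def arrow_type_for(ddb_type: str) -> str:
    """Look up the Arrow type name for a DolphinDB type name (hypothetical helper)."""
    name = ddb_type.strip().upper()
    match = _DECIMAL.match(name)
    if match:
        # Both DECIMAL32(X) and DECIMAL64(X) map to decimal128 with scale X.
        return "decimal128(38, {})".format(match.group(2))
    return DDB_TO_ARROW[name]  # raises KeyError for unsupported types


print(arrow_type_for("SYMBOL"))
print(arrow_type_for("DECIMAL64(2)"))
```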

Note:

Array vectors of the types listed above (excluding the Decimal types) are also supported.
Starting from version 2.00.11, downloaded UUID/INT128 data keeps the same byte order as when it was uploaded; earlier versions reversed it.
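
Because UUID values arrive as fixed_size_binary(16), a client may want to turn the raw bytes back into UUID objects. The sketch below uses only Python's standard uuid module. The helper uuid_from_arrow and its server_reverses flag are hypothetical, and the assumption that older servers reverse all 16 bytes end to end is not confirmed by this document, so verify it against your own data before relying on it.

```python
import uuid


def uuid_from_arrow(raw: bytes, server_reverses: bool = False) -> uuid.UUID:
    """Build a UUID from one fixed_size_binary(16) value.

    ASSUMPTION: on servers older than 2.00.11 the 16 bytes arrive fully
    reversed; pass server_reverses=True to undo that.
    """
    if len(raw) != 16:
        raise ValueError("expected 16 bytes, got {}".format(len(raw)))
    if server_reverses:
        raw = raw[::-1]
    return uuid.UUID(bytes=raw)


original = uuid.UUID("5d212a78-cc48-e3b1-4235-b4d91473ee87")
# Current behaviour: bytes come back in upload order.
print(uuid_from_arrow(original.bytes))
# Assumed older behaviour: bytes come back reversed and are flipped again.
print(uuid_from_arrow(original.bytes[::-1], server_reverses=True))
```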

Usage Example

DolphinDB server

login("admin", "123456");
loadPlugin("arrow");
// For versions 2.00.11 and earlier, load the plugin from its file instead:
// loadFormatPlugin("path/to/Arrow/PluginArrow.txt")

Python API

import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.session("192.168.1.113", 8848, "admin", "123456", protocol=keys.PROTOCOL_ARROW)
pat = s.run("table(1..10 as a)")

print(pat)
-------------------------------------------
pyarrow.Table
a: int32
----
a: [[1,2,3,4,5,6,7,8,9,10]]

Note: Currently, the DolphinDB server does not support enabling compression when the Arrow protocol is used.
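
One way to guard against the restriction in the note above is to check the options before opening a session. This is a minimal sketch, not part of the Python API: check_session_options and connect_arrow are hypothetical names, and the compress keyword of ddb.session is assumed from the Python API's session constructor. The dolphindb imports sit inside connect_arrow so the check itself runs without the package or a server.

```python
def check_session_options(protocol: str, compress: bool) -> None:
    """Reject the option combination that the note above rules out.

    `protocol` is a plain string here ("arrow", "ddb", ...) so the check
    can run without the dolphindb package; the names are illustrative.
    """
    if protocol == "arrow" and compress:
        raise ValueError("compression is not supported with the Arrow protocol")


def connect_arrow(host, port, user, password, compress=False):
    """Open an Arrow-protocol session (needs the dolphindb package and a server)."""
    check_session_options("arrow", compress)
    import dolphindb as ddb
    import dolphindb.settings as keys
    # ASSUMPTION: `compress` is accepted as a keyword by ddb.session.
    return ddb.session(host, port, user, password,
                       compress=compress, protocol=keys.PROTOCOL_ARROW)


check_session_options("arrow", False)  # passes silently
```

With a reachable server, connect_arrow("192.168.1.113", 8848, "admin", "123456") would return a session equivalent to the one created in the usage example.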