Arrow
Apache Arrow defines a columnar memory format that combines the benefits of columnar data structures with in-memory computing. With the DolphinDB Arrow plugin, you can use the Arrow format to exchange data with the DolphinDB server through the Python API, with automatic data type conversion.
Note: Starting from 2.00.11, the plugin name has been changed from "formatArrow" to "Arrow".
The version of Apache Arrow used in this document is 9.0.0.
For the DolphinDB Arrow plugin, please make sure you use DolphinDB server version 2.00.11 or higher.
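The version requirement above can be checked before relying on the plugin. The following is a minimal sketch that compares dotted version strings; the helper names `parse_version` and `meets_minimum` are hypothetical, and how you obtain the server's version string is left to you (the sketch only assumes it starts with a dotted number such as 2.00.11).

```python
def parse_version(text):
    """Turn a dotted version such as '2.00.11' into a tuple of ints."""
    # Keep only the first whitespace-separated token, in case extra
    # information (for example a build date) follows the version number.
    token = text.strip().split()[0]
    return tuple(int(part) for part in token.split("."))


def meets_minimum(server_version, minimum="2.00.11"):
    """Return True if server_version is at least the minimum version."""
    return parse_version(server_version) >= parse_version(minimum)


print(meets_minimum("2.00.11"))    # True
print(meets_minimum("2.00.9"))     # False
print(meets_minimum("2.00.11.3"))  # True
```

Comparing integer tuples avoids the pitfall of plain string comparison, where "2.00.9" would sort after "2.00.11".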
Install Plugin
(Optional) Manually Compile Plugin - Linux
Configure the Environment
a. Compile the Arrow development kit required by the plugin
git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir build
cd build
cmake ..
make -j
b. After compilation, save the following files to your DolphinDB plugin directory:
File | Target Directory |
---|---|
arrow/cpp/src/arrow | ./include |
arrow/cpp/build/release/libarrow.so.900 | ./build |
Compile Plugin
Before compiling, make sure the directories containing libDolphinDB.so (included in the server package) and libarrow.so.900 are on the GCC library search path. You can either set LD_LIBRARY_PATH or copy both files directly into the build directory.
cd /path/to/plugins/Arrow
mkdir build
cd build
cmake ..
make
Load Plugin
Alternatively, you can skip the manual compilation steps and simply download the precompiled file libarrow.so.900. Save it under your DolphinDB plugin directory.
To load the plugin, enter the following command in DolphinDB:
loadFormatPlugin("/path/to/plugin/PluginArrow.txt")
Usage Example
a. Enter the following command on the DolphinDB server side to load the plugin:
loadFormatPlugin("path/to/Arrow/PluginArrow.txt")
b. On the Python API side, execute the following script:
import dolphindb as ddb
import dolphindb.settings as keys
s = ddb.session("192.168.1.113", 8848, "admin", "123456", protocol=keys.PROTOCOL_ARROW)
pat = s.run("table(1..10 as a)")
print(pat)
-------------------------------------------
pyarrow.Table
a: int32
----
a: [[1,2,3,4,5,6,7,8,9,10]]
Note: Currently, the DolphinDB server does not support enabling compression when the Arrow protocol is used.
Supported Data Types
DolphinDB to Arrow
When DolphinDB transfers data to the Python application through Python API, the data type mappings between DolphinDB and Arrow are as follows:
DolphinDB | Arrow |
---|---|
BOOL | boolean |
CHAR | int8 |
SHORT | int16 |
INT | int32 |
LONG | int64 |
DATE | date32 |
MONTH | date32 |
TIME | time32(ms) |
MINUTE | time32(s) |
SECOND | time32(s) |
DATETIME | timestamp(s) |
TIMESTAMP | timestamp(ms) |
NANOTIME | time64(ns) |
NANOTIMESTAMP | timestamp(ns) |
DATEHOUR | timestamp(s) |
FLOAT | float32 |
DOUBLE | float64 |
SYMBOL | dictionary(int32, utf8) |
STRING | utf8 |
IPADDR | utf8 |
UUID | fixed_size_binary(16) |
INT128 | fixed_size_binary(16) |
BLOB | large_binary |
DECIMAL32(X) | decimal128(38, X) |
DECIMAL64(X) | decimal128(38, X) |
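For client code that needs to predict the Arrow type of a column, the table above can be expressed as a simple lookup. This is only an illustrative sketch: the names `DOLPHINDB_TO_ARROW` and `arrow_type_for` are hypothetical, and the values are plain strings copied from the table, not pyarrow type objects.

```python
import re

# Mapping copied from the table above; values are descriptive labels only.
DOLPHINDB_TO_ARROW = {
    "BOOL": "boolean", "CHAR": "int8", "SHORT": "int16",
    "INT": "int32", "LONG": "int64",
    "DATE": "date32", "MONTH": "date32",
    "TIME": "time32(ms)", "MINUTE": "time32(s)", "SECOND": "time32(s)",
    "DATETIME": "timestamp(s)", "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)", "NANOTIMESTAMP": "timestamp(ns)",
    "DATEHOUR": "timestamp(s)",
    "FLOAT": "float32", "DOUBLE": "float64",
    "SYMBOL": "dictionary(int32, utf8)",
    "STRING": "utf8", "IPADDR": "utf8",
    "UUID": "fixed_size_binary(16)", "INT128": "fixed_size_binary(16)",
    "BLOB": "large_binary",
}


def arrow_type_for(ddb_type):
    """Return the Arrow type label for a DolphinDB type name."""
    name = ddb_type.strip().upper()
    # DECIMAL32(X) and DECIMAL64(X) both map to decimal128(38, X).
    match = re.fullmatch(r"DECIMAL(32|64)\((\d+)\)", name)
    if match:
        return f"decimal128(38, {match.group(2)})"
    # Raises KeyError for types that are not listed in the table.
    return DOLPHINDB_TO_ARROW[name]


print(arrow_type_for("symbol"))        # dictionary(int32, utf8)
print(arrow_type_for("DECIMAL32(4)"))  # decimal128(38, 4)
```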
Note:
- Array vectors of the types listed above (excluding the Decimal types) are also supported.
- When Arrow-formatted data sent from the DolphinDB server is converted to a pandas.DataFrame, note that the DolphinDB NANOTIME type is mapped to the Arrow time64 type. Each NANOTIME value must be a multiple of 1,000; otherwise, the error "Value xxxxxxx has non-zero nanoseconds" is raised.
- Starting from version 2.00.11, the byte order of downloaded UUID/INT128 data matches the upload order instead of being reversed.
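The NANOTIME restriction can be illustrated without a server. The sketch below assumes a NANOTIME value is represented as nanoseconds since midnight and that the target is Python's `datetime.time`, which stores at most microsecond precision; the helper names `nanotime_to_time` and `truncate_to_microseconds` are hypothetical and are not part of the plugin or of pyarrow.

```python
import datetime

NS_PER_DAY = 86_400 * 1_000_000_000


def nanotime_to_time(ns):
    """Convert nanoseconds since midnight to datetime.time.

    Raises ValueError when the value is not a multiple of 1,000,
    because datetime.time cannot hold sub-microsecond digits.
    """
    if not 0 <= ns < NS_PER_DAY:
        raise ValueError(f"Value {ns} is outside a single day")
    if ns % 1000 != 0:
        raise ValueError(f"Value {ns} has non-zero nanoseconds")
    seconds, micro = divmod(ns // 1000, 1_000_000)
    minutes, sec = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return datetime.time(hour, minute, sec, micro)


def truncate_to_microseconds(ns):
    """Drop the last three (sub-microsecond) digits of a nanosecond value."""
    return ns - ns % 1000


raw = 48_610_008_007_006  # 13:30:10.008007006
print(nanotime_to_time(truncate_to_microseconds(raw)))  # 13:30:10.008007
```

Whether to truncate, round, or reject such values depends on the application; truncation is shown here only as one option.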