Feather Plugin

Feather uses the Apache Arrow columnar memory format for data, which is organized for efficient analytic operations. The DolphinDB Feather plugin supports efficient import and export of Feather files with automatic data type conversion.

The DolphinDB Feather plugin has the following versions: release200 and release130. Each plugin version corresponds to a DolphinDB server version. You're looking at the plugin documentation for release200. If you use a different DolphinDB server version, please refer to the corresponding version of the plugin documentation.

Install the Plugin

Compile on Linux

Initialization

(1) Compile the Feather Development Kit.

git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir build
cd build
cmake .. -DARROW_BUILD_STATIC=ON -DARROW_BUILD_SHARED=OFF -DARROW_DEPENDENCY_USE_SHARED=OFF -DARROW_WITH_ZLIB=ON -DARROW_WITH_ZSTD=ON -DARROW_WITH_LZ4=ON
make -j

(2) After compiling, copy the following files to the target directories.

| Files | Target Directory |
| --- | --- |
| arrow/cpp/src/arrow | ./include |
| arrow/cpp/build/release/libarrow.a | ./lib/linux |
| arrow/cpp/build/jemalloc_ep-prefix/src/jemalloc_ep/lib/libjemalloc_pic.a | ./lib/linux |
| arrow/cpp/build/zstd_ep-install/lib64/libzstd.a | ./lib/linux |
| arrow/cpp/build/zlib_ep/src/zlib_ep-install/lib/libz.a | ./lib/linux |
| arrow/cpp/build/lz4_ep-prefix/src/lz4_ep/lib/liblz4.a | ./lib/linux |

Note: If any of the files listed in the "Files" column is missing after compilation, you can compile the following three libraries manually.

  • If libz.a cannot be found, run the following commands:
wget http://www.zlib.net/zlib-1.2.12.tar.gz
tar -zxf zlib-1.2.12.tar.gz
cd zlib-1.2.12
CFLAGS="-fPIC" ./configure
make

Find libz.a in the zlib-1.2.12 directory and copy it to the ./lib/linux directory in the plugins folder.

  • If liblz4.a cannot be found, run the following commands:
wget https://github.com/lz4/lz4/archive/8f61d8eb7c6979769a484cde8df61ff7c4c77765.tar.gz
tar -xzvf 8f61d8eb7c6979769a484cde8df61ff7c4c77765.tar.gz
cd lz4-8f61d8eb7c6979769a484cde8df61ff7c4c77765/
make

Find liblz4.a in the lz4-8f61d8eb7c6979769a484cde8df61ff7c4c77765/lib directory and copy it to the ./lib/linux directory in the plugins folder.

  • If libzstd.a cannot be found, run the following commands:
wget https://github.com/facebook/zstd/releases/download/v1.5.2/zstd-1.5.2.tar.gz
tar -zxvf zstd-1.5.2.tar.gz
cd zstd-1.5.2/
cd build/cmake/
mkdir build
cd build/
cmake ..
make -j

Find libzstd.a in the zstd-1.5.2/build/cmake/build/lib directory and copy it to the ./lib/linux directory in the plugins folder.
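
Before building the project, it can help to confirm that every static library named in step (2) has landed in ./lib/linux. The following is a minimal Python sketch and not part of the plugin: the helper name `missing_static_libs` is hypothetical, and the file list is simply copied from the files listed in step (2).

```python
from pathlib import Path

# Static libraries that step (2) expects to find in ./lib/linux.
REQUIRED_LIBS = [
    "libarrow.a",
    "libjemalloc_pic.a",
    "libzstd.a",
    "libz.a",
    "liblz4.a",
]


def missing_static_libs(lib_dir):
    """Return the required library file names that are absent from lib_dir."""
    lib_dir = Path(lib_dir)
    return [name for name in REQUIRED_LIBS if not (lib_dir / name).is_file()]
```

Calling `missing_static_libs("./lib/linux")` from the plugin directory returns an empty list once all five files are in place; any names it returns point to the libraries that still need to be copied or compiled manually.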

(3) Build the Entire Project

cd /path/to/plugins/feather
mkdir build
cd build
cmake ..
make

Note: Make sure that libDolphinDB.so is on the GCC search path before compiling. You can either add the directory that contains libDolphinDB.so to the library search path LD_LIBRARY_PATH or copy the file to the build directory.

Load the Plugin

loadPlugin("/path/to/plugin/PluginFeather.txt")

Methods

feather::extractSchema

Syntax

feather::extractSchema(filePath)

Parameters

  • filePath: a STRING scalar indicating the Feather file path.

Details

Get the schema of the Feather file and return a table containing the following three columns:

  1. column names
  2. data type of Arrow
  3. data type of DolphinDB

Note: If a cell in the DolphinDB Type column is VOID, the corresponding Arrow data type cannot be converted.

Examples

feather::extractSchema("path/to/data.feather");
feather::extractSchema("path/to/data.compressed.feather");

feather::load

Syntax

feather::load(filePath, [columns])

Parameters

  • filePath: a STRING scalar indicating the Feather file path.
  • columns: a STRING vector indicating the names of the columns to load. It is an optional parameter.

Details

Load a Feather file into a DolphinDB in-memory table. Regarding data type conversion, see Data Type Mappings.

Note:

  • Because DolphinDB uses the minimum value of each integral type to represent NULL, the minimum values of Arrow int8, int16, int32, and int64 cannot be imported into DolphinDB.
  • Infinities and NaNs (not a number) in floating-point columns are converted to NULL values in DolphinDB.
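
To see which values are affected before importing, the two notes above can be sketched as a small check. This is an illustrative Python helper, not plugin code: the name `values_lost_on_import` is hypothetical, and it only restates the rules described in the notes.

```python
import math

# Minimum value of each signed Arrow integer type. DolphinDB reserves the
# minimum of the matching integral type as its NULL marker.
INT_MIN = {
    "int8": -2**7,
    "int16": -2**15,
    "int32": -2**31,
    "int64": -2**63,
}


def values_lost_on_import(arrow_type, values):
    """Return the values that, per the notes above, do not survive import."""
    if arrow_type in INT_MIN:
        # The type's minimum collides with DolphinDB's NULL representation.
        return [v for v in values if v == INT_MIN[arrow_type]]
    if arrow_type in ("float", "double"):
        # Infinities and NaNs are converted to NULL.
        return [v for v in values if math.isnan(v) or math.isinf(v)]
    return []
```

For example, `values_lost_on_import("int8", [-128, -127, 0])` flags only -128, because -128 is the int8 minimum.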

Examples

table = feather::load("path/to/data.feather");
table_part = feather::load("path/to/data.feather", [ "col1_name","col2_name"]);

feather::save

Syntax

feather::save(table, filePath, [compressMethod], [compressionLevel])

Parameters

  • table: the table to be exported.
  • filePath: a STRING scalar indicating the Feather file path.
  • compressMethod: a STRING scalar specifying the compression method. It can be "uncompressed", "lz4", or "zstd" (case insensitive). The default is "lz4". It is an optional parameter.
  • compressionLevel: an integer scalar. It is an optional parameter that is only used when the compression method is "zstd".

Details

Export a DolphinDB table to a Feather file. Regarding data type conversion, see Data Type Mappings.

Examples

feather::save(table, "path/to/save/data.feather");
feather::save(table, "path/to/save/data.feather", "lz4");
feather::save(table, "path/to/save/data.feather", "zstd", 2);
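
The rules for the two optional parameters (case-insensitive method name, "lz4" as the default, a level that only matters for "zstd") can be summarized in a short Python sketch. This is not the plugin's implementation: `normalize_save_options` is a hypothetical helper, and silently dropping the level for non-zstd methods is an assumption made here for illustration; the plugin may handle that case differently.

```python
SUPPORTED_METHODS = ("uncompressed", "lz4", "zstd")


def normalize_save_options(method="lz4", level=None):
    """Normalize (method, level) following the parameter rules listed above."""
    method = method.lower()  # method names are case insensitive
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported compression method: {method!r}")
    if method != "zstd":
        # Assumption: the level is ignored unless the method is "zstd".
        level = None
    return method, level
```

Under these assumptions, `normalize_save_options("ZSTD", 2)` yields `("zstd", 2)`, while `normalize_save_options("LZ4", 5)` yields `("lz4", None)`.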

Data Type Mappings

Import

Data types are mapped as follows when a Feather file is imported into DolphinDB:

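| Arrow | DolphinDB |
| --- | --- |
| bool | BOOL |
| int8 | CHAR |
| uint8 | SHORT |
| int16 | SHORT |
| uint16 | INT |
| int32 | INT |
| uint32 | LONG |
| int64 | LONG |
| uint64 | LONG |
| float | FLOAT |
| double | DOUBLE |
| string | STRING |
| date32 | DATE |
| date64 | TIMESTAMP |
| timestamp(ms) | TIMESTAMP |
| timestamp(ns) | NANOTIMESTAMP |
| time32(s) | SECOND |
| time32(ms) | TIME |
| time64(ns) | NANOTIME |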

The following Arrow types are not supported for conversion: binary, fixed_size_binary, half_float, timestamp(us), time64(us), interval_months, interval_day_time, decimal128, decimal, decimal256, list, struct, sparse_union, dense_union, dictionary, map, extension, fixed_size_list, large_string, large_binary, large_list, interval_month_day_nano, max_id.
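
The import mapping can be expressed as a plain lookup, which is handy for checking a schema before calling feather::load. The sketch below is illustrative Python rather than plugin code: the function names are hypothetical, the dictionary is copied from the import mapping above, and returning "VOID" for unsupported types follows the convention described for feather::extractSchema.

```python
# Arrow -> DolphinDB mapping, copied from the import mapping above.
IMPORT_TYPES = {
    "bool": "BOOL", "int8": "CHAR", "uint8": "SHORT", "int16": "SHORT",
    "uint16": "INT", "int32": "INT", "uint32": "LONG", "int64": "LONG",
    "uint64": "LONG", "float": "FLOAT", "double": "DOUBLE",
    "string": "STRING", "date32": "DATE", "date64": "TIMESTAMP",
    "timestamp(ms)": "TIMESTAMP", "timestamp(ns)": "NANOTIMESTAMP",
    "time32(s)": "SECOND", "time32(ms)": "TIME", "time64(ns)": "NANOTIME",
}


def dolphindb_type(arrow_type):
    """Return the DolphinDB type for an Arrow type, or "VOID" if unsupported."""
    return IMPORT_TYPES.get(arrow_type, "VOID")


def unsupported_columns(schema):
    """Given {column name: Arrow type}, list the columns that map to VOID."""
    return [name for name, t in schema.items() if dolphindb_type(t) == "VOID"]
```

For instance, a schema of `{"id": "int64", "ts": "timestamp(us)"}` reports `ts` as unsupported, since only millisecond and nanosecond timestamps are mapped.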

Export

Data types are mapped as follows when data is exported from DolphinDB to a Feather file:

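| DolphinDB | Arrow |
| --- | --- |
| BOOL | bool |
| CHAR | int8 |
| SHORT | int16 |
| INT | int32 |
| LONG | int64 |
| DATE | date32 |
| TIME | time32(ms) |
| SECOND | time32(s) |
| TIMESTAMP | timestamp(ms) |
| NANOTIME | time64(ns) |
| NANOTIMESTAMP | timestamp(ns) |
| FLOAT | float |
| DOUBLE | double |
| STRING | string |
| SYMBOL | string |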

The following DolphinDB data types are not supported for conversion: MINUTE, MONTH, DATETIME, UUID, FUNCTIONDEF, HANDLE, CODE, DATASOURCE, RESOURCE, ANY, COMPRESS, ANY DICTIONARY, DATEHOUR, IPADDR, INT128, BLOB, COMPLEX, POINT, DURATION.
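
Reading the export mapping together with the import mapping shows which DolphinDB types come back unchanged after a save followed by a load. The Python sketch below composes the two mappings; it is illustrative only, the names are hypothetical, and both dictionaries are copied from the mappings in this section (the import side is limited to the Arrow types that an export can produce).

```python
# DolphinDB -> Arrow, copied from the export mapping.
EXPORT_TYPES = {
    "BOOL": "bool", "CHAR": "int8", "SHORT": "int16", "INT": "int32",
    "LONG": "int64", "DATE": "date32", "TIME": "time32(ms)",
    "SECOND": "time32(s)", "TIMESTAMP": "timestamp(ms)",
    "NANOTIME": "time64(ns)", "NANOTIMESTAMP": "timestamp(ns)",
    "FLOAT": "float", "DOUBLE": "double", "STRING": "string",
    "SYMBOL": "string",
}

# Arrow -> DolphinDB, the subset of the import mapping reachable by export.
IMPORT_BACK = {
    "bool": "BOOL", "int8": "CHAR", "int16": "SHORT", "int32": "INT",
    "int64": "LONG", "date32": "DATE", "time32(ms)": "TIME",
    "time32(s)": "SECOND", "timestamp(ms)": "TIMESTAMP",
    "time64(ns)": "NANOTIME", "timestamp(ns)": "NANOTIMESTAMP",
    "float": "FLOAT", "double": "DOUBLE", "string": "STRING",
}


def round_trip(ddb_type):
    """Return the DolphinDB type after save + load, or None if not exportable."""
    arrow_type = EXPORT_TYPES.get(ddb_type)
    if arrow_type is None:
        return None
    return IMPORT_BACK[arrow_type]


# Exportable types whose DolphinDB type differs after a round trip.
changed = [t for t in EXPORT_TYPES if round_trip(t) != t]
```

According to the two mappings, SYMBOL is the only exportable type that does not round-trip: it is written as Arrow string and read back as STRING.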

Note:

You may encounter some problems when reading Feather files using Python.

Scenario 1: The error "Value XXXXXXXXXXXXX has non-zero nanoseconds" is raised when pyarrow.feather.read_feather() reads a Feather file that contains data of type time64(ns). When the table is converted to a DataFrame, time64(ns) values are converted to the datetime.time type, which does not support nanosecond precision.

Solution: It is recommended to read with function pyarrow.feather.read_table().
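
The root cause in Scenario 1 can be reproduced with the standard library alone: datetime.time stores microseconds at most. The sketch below is not pyarrow's code; `nanos_to_time` is a hypothetical helper that mimics the restriction by rejecting any value with a sub-microsecond remainder.

```python
from datetime import time


def nanos_to_time(nanos):
    """Convert nanoseconds since midnight to datetime.time.

    datetime.time has microsecond resolution, so a value whose last three
    decimal digits are non-zero cannot be represented and is rejected.
    """
    micros, leftover_nanos = divmod(nanos, 1_000)
    if leftover_nanos:
        raise ValueError(f"Value {nanos} has non-zero nanoseconds")
    seconds, micros = divmod(micros, 1_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return time(hours, minutes, seconds, micros)
```

A value such as 3_723_000_004_000 (01:02:03.000004) converts cleanly, whereas 3_723_000_004_001 raises the error because of the trailing nanosecond. Keeping the data in a pyarrow table avoids this conversion altogether.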

Scenario 2: When pyarrow.feather.read_feather() reads a Feather file whose integer columns contain null values, those columns are converted to floating-point types.

Solution: It is recommended to read the Feather file into a pyarrow table and specify types_mapper when converting it to a DataFrame.

```python
import pandas as pd
import pyarrow as pa
from pyarrow import feather
pa_table = feather.read_table("path/to/feather_file")
df = pa_table.to_pandas(types_mapper={pa.int64(): pd.Int64Dtype()}.get)
```