Data Import Methods

DolphinDB offers a comprehensive suite of tools for handling data transfers across different file formats and sources. These tools enable you to load, clean, and transfer data either immediately or on a scheduled basis, streamlining the process of data management and integration.

Built-in Functions

DolphinDB provides built-in functions for importing and exporting text, binary, and JSON files. During import, users can set parameters that control data type matching and other preprocessing steps.

Importing Text Files

| Function | Description |
| --- | --- |
| loadText | Imports a text file as an in-memory table. |
| ploadText | Imports a text file in parallel as a partitioned in-memory table. Because it uses multiple CPU cores, it is faster than loadText and is suited to quickly loading larger files (16 MB or more). |
| loadTextEx | Imports a text file directly into a distributed database or an in-memory database. It loads the file in batches to prevent out-of-memory (OOM) errors and supports data cleaning and preprocessing during import. |
| textChunkDS | Divides a text file into multiple data sources, which can then be loaded with the mr function. |
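
To make the chunk-and-load idea behind textChunkDS and mr concrete, here is a minimal Python sketch. It is not DolphinDB code and does not reflect DolphinDB's actual implementation. It splits a delimited text file into byte ranges that end on line boundaries, parses each range independently (the map step), and concatenates the results (the reduce step). The function names, the two-column sample file, and the chunk size are all hypothetical.

```python
import csv
import io
import os
import tempfile


def split_into_chunks(path, chunk_size):
    """Return (start, end) byte ranges, each ending on a line boundary.

    The header line is excluded. chunk_size is a target size in bytes,
    so a chunk may be slightly larger in order to finish its last line.
    """
    file_size = os.path.getsize(path)
    ranges = []
    with open(path, "rb") as f:
        f.readline()                      # skip the header row
        start = f.tell()
        while start < file_size:
            f.seek(min(start + chunk_size, file_size))
            f.readline()                  # extend to the end of the current line
            end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def load_chunk(path, byte_range):
    """Map step: parse one chunk into a list of (symbol, price) rows."""
    start, end = byte_range
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(end - start).decode("utf-8")
    return [(sym, float(price)) for sym, price in csv.reader(io.StringIO(text))]


def load_in_chunks(path, chunk_size):
    """Reduce step: concatenate the rows parsed from every chunk."""
    rows = []
    for byte_range in split_into_chunks(path, chunk_size):
        rows.extend(load_chunk(path, byte_range))
    return rows


# Build a small sample file so the sketch is self-contained (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("symbol,price\n")
    for i in range(100):
        tmp.write(f"S{i:03d},{i * 0.5}\n")
    sample_path = tmp.name

chunks = split_into_chunks(sample_path, chunk_size=256)
table = load_in_chunks(sample_path, chunk_size=256)
os.remove(sample_path)
```

Because each chunk is parsed on its own, the chunks could be handed to separate workers, and only one chunk needs to be in memory at a time. This is the same reason batch loading helps avoid OOM errors on large files.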

Importing Binary and JSON Files

| File Type | Function | Description |
| --- | --- | --- |
| Binary files | readRecord! | Imports binary files into memory. Strings are not supported. |
| Binary files | loadRecord | Imports binary files with fixed-length rows into memory. Strings are supported. |
| JSON files | fromJson, fromStdJson | Converts a JSON string into a DolphinDB object. |
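
Fixed-length rows are what make a binary file straightforward to load: every record occupies the same number of bytes, so the row count and each field's offset follow from the file size and the schema. The Python sketch below illustrates the idea with the standard struct module. The record layout (a 4-byte integer, an 8-byte double, and a 6-byte padded string), the sample rows, and the helper names are hypothetical. They do not represent DolphinDB's binary format or the signatures of readRecord! and loadRecord.

```python
import struct

# Hypothetical schema: little-endian int32 id, float64 price, 6-byte symbol.
# "<" disables alignment padding, so every record is exactly 4 + 8 + 6 bytes.
RECORD = struct.Struct("<id6s")


def write_records(rows):
    """Pack rows of (id, price, symbol) into one bytes object."""
    return b"".join(
        RECORD.pack(rid, price, symbol.encode("ascii"))
        for rid, price, symbol in rows
    )


def read_records(data):
    """Unpack fixed-length records back into (id, price, symbol) rows."""
    if len(data) % RECORD.size != 0:
        raise ValueError("data length is not a multiple of the record size")
    rows = []
    for rid, price, raw in RECORD.iter_unpack(data):
        # Short strings are padded with NUL bytes to the fixed width.
        rows.append((rid, price, raw.rstrip(b"\x00").decode("ascii")))
    return rows


rows = [(1, 10.5, "AAPL"), (2, 20.25, "MSFT"), (3, 30.0, "IBM")]
blob = write_records(rows)
decoded = read_records(blob)
```

String columns need a declared fixed width to fit this scheme. That is one plausible reason a loader for fixed-length rows can support strings while a simpler reader does not.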

Data Export

| File Type | Function | Description |
| --- | --- | --- |
| Text files | saveText | Saves any variable or table as a text file on disk. |
| Text files | saveTextFile | Saves strings to a file by appending or overwriting. |
| Binary files | writeRecord | Converts a DolphinDB object (e.g., table or tuple) into a binary file and returns the number of rows written to the file. |
| Binary files | saveAsNpy | Saves a DolphinDB vector or matrix as a .npy binary file supported by NumPy. |
| JSON files | toJson, toStdJson | Converts a DolphinDB object into a JSON string. |

Plugins and Tools

DolphinDB also provides a variety of specialized plugins for seamless data integration.

Plugins for Database Migration

Database migration plugins provide ready-to-use interfaces that simplify transferring data between systems. Dedicated plugins are available for specific databases such as MySQL, HBase, kdb+, and MongoDB. In addition, the ODBC plugin serves as a universal connector to other database systems, including SQL Server, Oracle, ClickHouse, and SQLite.

| Plugin | Description |
| --- | --- |
| odbc | Reads data from other data sources via ODBC, including data from MySQL, Oracle, SQL Server, and other databases. |
| mysql | Connects to MySQL and reads data. |
| HBase | Connects to HBase via Thrift to read data. |
| kdb | Connects to the kdb+ database or directly reads kdb+ data files from disk to import data. |
| mongodb | Connects to MongoDB and reads data. |

Plugins for File-Based Data Import

File-based data import plugins excel at handling large volumes of data.

| Plugin | Description |
| --- | --- |
| Arrow | Serializes data in Apache Arrow format. |
| aws | Reads from and writes to AWS S3 network files. |
| feather | Reads from and writes to Apache Feather files. |
| hdfs | Reads from and writes to Hadoop HDFS files. |
| hdf5 | Reads from and writes to HDF5 files. |
| mat | Reads from and writes to MATLAB files. |
| mseed | Reads from and writes to miniSEED files. |
| orc | Reads from and writes to ORC files. |
| parquet | Reads from and writes to Apache Parquet files. |
| zip | Extracts ZIP files. |
| zlib | Compresses and decompresses gz files. |

Plugins for Message Middleware Integration

Message middleware plugins enable real-time data ingestion by establishing connections with various messaging systems. These plugins serve as bridges, allowing seamless streaming of live data directly into DolphinDB's environment.

| Plugin | Description |
| --- | --- |
| zmq | Publishes and subscribes to ZeroMQ messages. |
| mqtt | Publishes and subscribes to MQTT messages. |
| kafka | Publishes and subscribes to Kafka messages. |

Other Data Import Tools

DolphinDB offers the dolphindbwriter plugin, built on DataX, an offline data synchronization tool. It imports data from various sources and can synchronize incremental updates.

Users can also import data through DolphinDB's C++ API, Python API, Java API, and other supported interfaces. For details, refer to the corresponding API documentation.