createExternalTable
As data sources grow increasingly diverse, users need a unified platform to access and query data across multiple databases and file systems. DolphinDB provides the external table feature, which allows external data to be used like local tables. It supports common queries and partial predicate pushdown, improving both usability and query efficiency.
The feature currently supports mainstream data sources such as Oracle, MySQL, SQL Server, S3, and Parquet. To use it, install the corresponding connection plugins (e.g., ODBC, AWS, Parquet).
Create External Tables
Syntax
createExternalTable(tableName, externalType, config, [columnNames], [columnTypes])
Arguments
tableName is a STRING scalar indicating the external table name.
externalType is a STRING scalar indicating the data source type, which can be 'oracle', 's3', 'parquet', 'mysql', 'sqlserver', or 'dolphindb'.
config is a dictionary of connection parameters. The required parameters differ depending on the external data source. Refer to Details.
columnNames (optional) is a STRING vector specifying the column names. If specified, names must be provided for all columns; if not, the external table's original column names are used.
columnTypes (optional) is a vector of the same length as columnNames specifying the column types. If not specified, the system converts the types automatically based on the external table's column types.
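For instance, a minimal sketch of a call that overrides both names and types for a hypothetical three-column Parquet file (the file path and column names are placeholders):
// a minimal sketch; the file path and column names are placeholders
cfg = dict(["fileName"], ["/data/trades.parquet"])
t = createExternalTable("trades", "parquet", cfg, `id`price`qty, [INT, DOUBLE, INT])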
Details
Create an external table and return its corresponding table object. Then you can query and process the external data source with DolphinDB scripts just like a local table.
Specify the data source type via externalType, which can be 'oracle', 's3', 'parquet', 'mysql', 'sqlserver', or 'dolphindb'. The parameters required in config differ depending on the data source:
- For Oracle, MySQL, and SQL Server, set the connectionString key in config, e.g., "Dsn=MyOracleDB;Uid=user;Pwd=pass". Refer to the ODBC plugin documentation.
- For S3:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| filePath | a string indicating the file path in the S3 bucket. Currently only files with the extensions .csv or .csv.gz are supported. |
| region | a string indicating the AWS region name (e.g., "us-east-1") |
| bucket | a string indicating the S3 bucket name |
| accessKeyId | a string indicating the AWS access key ID |
| secretAccessKey | a string indicating the AWS secret access key |
- For Parquet:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| fileName | a string indicating the path of the Parquet file. If the file is stored on HDFS, this is its path on HDFS. |
| nameNode | (optional) a string indicating the IP address of the HDFS NameNode |
| port | (optional) an integer indicating the port number of HDFS |
| username | (optional) a string indicating the username for login |
| kerbTicketCachePath | (optional) a string indicating the path of the Kerberos ticket cache used to connect to HDFS |
| keytabPath | (optional) a string indicating the path of the keytab file used to authenticate the obtained Kerberos tickets |
| principal | (optional) a string indicating the Kerberos principal to be authenticated |
| lifetime | (optional) a string indicating the lifetime of the generated tickets |
- For remote DolphinDB nodes:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| siteAlias | the alias of the remote node, which must be predefined in the configuration. If siteAlias is specified, host and port are not required. |
| host | a string indicating the host name (IP address or domain name) of the remote node. If host is specified, siteAlias is not required. |
| port | an integer indicating the port number of the remote node |
| userId | (optional) a string indicating the username |
| password | (optional) a string indicating the user's password |
| enableSSL | (optional) a Boolean value determining whether to use the SSL protocol for encrypted communication. The default value is false. |
| database | (optional) a string indicating the name of the remote database to be read. If unspecified, tableName refers to a shared table on the remote node. |
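For example, a minimal sketch of a siteAlias-based configuration, assuming the alias node1 has been predefined for the remote node in the local configuration and that st is a shared table on that node:
// assumes the alias node1 is predefined in the local node's configuration
ddb_cfg = dict(["siteAlias", "userId", "password"], ["node1", "admin", "123456"])
t = createExternalTable("st", "dolphindb", ddb_cfg)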
Install and load the corresponding plugin for the external data source:
- Oracle / MySQL / SQL Server: odbc.
- S3: aws.
- Parquet: parquet (and hdfs if the files are stored on HDFS).
Examples
1. Create an external table that maps to the aka_name table in Oracle. Then you can query its data in DolphinDB.
loadPlugin("odbc")
oracle_cfg = dict(["connectionString"], ["Dsn=MyOracleDB"])
t = createExternalTable("aka_name", "oracle", oracle_cfg)
select t.name from t where t.id > 200 limit 50
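The same pattern applies to MySQL and SQL Server; only externalType and the connection string change. A minimal sketch, assuming a hypothetical DSN MyMySQLDB and a table named orders:
loadPlugin("odbc")
// hypothetical MySQL DSN and table name
mysql_cfg = dict(["connectionString"], ["Dsn=MyMySQLDB;Uid=user;Pwd=pass"])
t2 = createExternalTable("orders", "mysql", mysql_cfg)
select count(*) from t2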
2. Create an external table that maps to a source data file in S3. The access key ID and secret access key below are placeholders; substitute your own credentials.
loadPlugin("aws")
aws_cfg = dict(["filePath", "region", "bucket", "accessKeyId", "secretAccessKey"],
    ["demo.csv.gz", "cn-north-1", "tests3func", "<YOUR_ACCESS_KEY_ID>", "<YOUR_SECRET_ACCESS_KEY>"])
t = createExternalTable("test", "s3", aws_cfg, `Open`High`OpenInt, [DOUBLE, DOUBLE, INT])
select count(*) from t where Open > 0.5 and High < 0.6 limit 1
3. Create an external table that maps to the distributed table pt in the database dfs://textDB on a remote DolphinDB node.
ddb_cfg = dict(["host", "port", "userId", "password", "database"], ["192.168.0.130", 8848, "admin", "123456", "dfs://textDB"])
t = createExternalTable("pt", "dolphindb", ddb_cfg)
select min(vol) from t where qty > 1500
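4. Create an external table that maps to a Parquet file. A minimal sketch, assuming a local file /data/demo.parquet; for a file stored on HDFS, also load the hdfs plugin and supply nameNode and port in config.
loadPlugin("parquet")
// the file path is illustrative
parquet_cfg = dict(["fileName"], ["/data/demo.parquet"])
t = createExternalTable("demo", "parquet", parquet_cfg)
select top 10 * from t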