createExternalTable
As data sources grow increasingly diverse, users need a unified platform to access and query data across multiple databases and file systems. DolphinDB provides the external table feature, which allows external data to be used like local tables. It supports common queries and partial predicate pushdown, improving both usability and query efficiency.
The feature currently supports mainstream data sources such as Oracle, MySQL, SQL Server, S3, and Parquet. To use it, install the corresponding connection plugins (e.g., ODBC, AWS, Parquet).
Create External Tables
Syntax
createExternalTable(tableName, externalType, config, [columnNames], [columnTypes])
Arguments
tableName is a STRING scalar indicating the external table name.
externalType is a STRING scalar indicating the data source type, which can be 'oracle', 's3', 'parquet', 'mysql', 'sqlserver', or 'dolphindb'.
config is a dictionary of connection parameters. The required parameters differ depending on the external data source. Refer to Details.
columnNames (optional) is a STRING vector specifying the column names. If specified, names must be provided for all columns; if not, the external table's original column names are used.
columnTypes (optional) is a vector of the same length as columnNames specifying the column types. If not specified, the system converts the types automatically based on the external table's column types.
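For instance, a minimal sketch of a call that overrides both names and types for a hypothetical three-column Parquet file (the file path and column names are placeholders):
// a minimal sketch; the file path and column names are placeholders
cfg = dict(["fileName"], ["/data/trades.parquet"])
t = createExternalTable("trades", "parquet", cfg, `id`price`qty, [INT, DOUBLE, INT])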
Details
Create an external table and return its corresponding table object. Then you can query and process the external data source with DolphinDB scripts just like a local table.
Specify the data source type via externalType, which can be 'oracle', 's3', 'parquet', 'mysql', 'sqlserver', or 'dolphindb'. The parameters required in config differ depending on the data source:
- For Oracle, MySQL, and SQL Server, set the connectionString key in config, e.g., "Dsn=MyOracleDB;Uid=user;Pwd=pass". Refer to the ODBC plugin documentation.
- For S3:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| filePath | a string indicating the file path in the S3 bucket. Currently only files with the extensions .csv or .csv.gz are supported. |
| region | a string indicating the AWS region name (e.g., "us-east-1") |
| bucket | a string indicating the S3 bucket name |
| accessKeyId | a string indicating the AWS access key ID |
| secretAccessKey | a string indicating the AWS secret access key |
- For Parquet:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| fileName | a string indicating the path of the Parquet file. If the file is stored on HDFS, this is its path on HDFS. |
| nameNode | (optional) a string indicating the IP address of the HDFS NameNode |
| port | (optional) an integer indicating the port number of HDFS |
| username | (optional) a string indicating the username for login |
| kerbTicketCachePath | (optional) a string indicating the path of the Kerberos ticket cache used to connect to HDFS |
| keytabPath | (optional) a string indicating the path of the keytab file used to authenticate the obtained Kerberos tickets |
| principal | (optional) a string indicating the Kerberos principal to be authenticated |
| lifetime | (optional) a string indicating the lifetime of the generated tickets |
- For remote DolphinDB nodes:

| config Dictionary Key | config Dictionary Value |
| --- | --- |
| siteAlias | the alias of the remote node, which must be predefined in the configuration. If siteAlias is specified, host and port are not required. |
| host | a string indicating the host name (IP address or domain name) of the remote node. If host is specified, siteAlias is not required. |
| port | an integer indicating the port number of the remote node |
| userId | (optional) a string indicating the username |
| password | (optional) a string indicating the user's password |
| enableSSL | (optional) a Boolean value determining whether to use the SSL protocol for encrypted communication. The default value is false. |
| database | (optional) a string indicating the name of the remote database to be read. If unspecified, tableName refers to a shared table on the remote node. |
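For example, a minimal sketch of a siteAlias-based configuration, assuming the alias node1 has been predefined for the remote node in the local configuration and that st is a shared table on that node:
// assumes the alias node1 is predefined in the local node's configuration
ddb_cfg = dict(["siteAlias", "userId", "password"], ["node1", "admin", "123456"])
t = createExternalTable("st", "dolphindb", ddb_cfg)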
Install and load the corresponding plugin for the external data source:
- Oracle / MySQL / SQL Server: odbc.
- S3: aws.
- Parquet: parquet (and hdfs if the files are stored on HDFS).
Examples
1. Create an external table that maps to the aka_name table in Oracle. Then you can query its data in DolphinDB.
loadPlugin("odbc")
oracle_cfg = dict(["connectionString"], ["Dsn=MyOracleDB"])
t = createExternalTable("aka_name", "oracle", oracle_cfg)
select t.name from t where t.id > 200 limit 50
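The same pattern applies to MySQL and SQL Server; only externalType and the connection string change. A minimal sketch, assuming a hypothetical DSN MyMySQLDB and a table named orders:
loadPlugin("odbc")
// hypothetical MySQL DSN and table name
mysql_cfg = dict(["connectionString"], ["Dsn=MyMySQLDB;Uid=user;Pwd=pass"])
t2 = createExternalTable("orders", "mysql", mysql_cfg)
select count(*) from t2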
2. Create an external table that maps to a source data file in S3. The access key ID and secret access key below are placeholders; substitute your own credentials.
loadPlugin("aws")
aws_cfg = dict(["filePath", "region", "bucket", "accessKeyId", "secretAccessKey"],
    ["demo.csv.gz", "cn-north-1", "tests3func", "<YOUR_ACCESS_KEY_ID>", "<YOUR_SECRET_ACCESS_KEY>"])
t = createExternalTable("test", "s3", aws_cfg, `Open`High`OpenInt, [DOUBLE, DOUBLE, INT])
select count(*) from t where Open > 0.5 and High < 0.6 limit 1
3. Create an external table that maps to the distributed table pt in the database dfs://textDB on a remote DolphinDB node.
ddb_cfg = dict(["host", "port", "userId", "password", "database"], ["192.168.0.130", 8848, "admin", "123456", "dfs://textDB"])
t = createExternalTable("pt", "dolphindb", ddb_cfg)
select min(vol) from t where qty > 1500
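4. Create an external table that maps to a Parquet file. A minimal sketch, assuming a local file /data/demo.parquet; for a file stored on HDFS, also load the hdfs plugin and supply nameNode and port in config.
loadPlugin("parquet")
// the file path is illustrative
parquet_cfg = dict(["fileName"], ["/data/demo.parquet"])
t = createExternalTable("demo", "parquet", parquet_cfg)
select top 10 * from t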