createDimensionTable(dbHandle, table, tableName, [compressMethods], [sortColumns|primaryKey], [keepDuplicates=ALL], [softDelete=false], [indexes])

Alias: createTable


Note: The parameters sortColumns and keepDuplicates take effect only when the database uses the TSDB storage engine (i.e., the engine parameter of function database is set to "TSDB").

dbHandle is a DFS database handle returned by function database.

table is a table object. The table schema will be used to construct the new dimension table.

tableName is a string indicating the name of the dimension table.

compressMethods (optional) is a dictionary indicating which compression methods are used for specified columns. The keys are column names and the values are compression methods ("lz4", "delta", "zstd", or "chimp"). If unspecified, the LZ4 compression method is used.


  • The delta compression method can be used for SHORT, INT, LONG, DECIMAL, or temporal data types.
  • Save strings as the SYMBOL type to enable string compression.
  • The chimp compression method can be used for DOUBLE data whose decimal part does not exceed three digits.

sortColumns (optional) is a STRING scalar/vector that specifies the column(s) used to sort the ingested data within each level file. The sort columns must be of Integral, Temporal, STRING, SYMBOL, or DECIMAL type. Note that sortColumns is not necessarily consistent with the partitioning column.

  • If multiple columns are specified for sortColumns, the last column must be a time column. The preceding columns are used as the sort keys and they cannot be of TIME, TIMESTAMP, NANOTIME, or NANOTIMESTAMP type.
  • If only one column is specified for sortColumns, the column is used as the sort key, and it can be a time column or not. If the sort column is a time column and sortKeyMappingFunction is specified, the sort column specified in a SQL where condition can only be compared with temporal values of the same data type.
  • It is recommended to specify frequently-queried columns for sortColumns and sort them in the descending order of query frequency, which ensures that frequently-used data is readily available during query processing.
  • The number of sort key entries (which are unique combinations of the values of the sort keys) may not exceed 1000 for optimal performance. This limitation prevents excessive memory usage and ensures efficient query processing.
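As a sketch of how sortColumns is typically specified (the database path and column names below are illustrative, not from the examples in this document), a TSDB dimension table with a composite sort key could be created like this:

```
// Illustrative sketch: a TSDB database with a composite sort key.
// sym serves as the sort key; ts, the last sort column, must be a time column.
db = database("dfs://sortDemo", VALUE, 1..10, , "TSDB")
t = table(1:0, `sym`price`ts, [SYMBOL, DOUBLE, TIMESTAMP])
dt = db.createDimensionTable(t, "dt", sortColumns=`sym`ts)
```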

primaryKey (optional) is a STRING scalar/vector that specifies the primary key column(s), uniquely identifying each record in a DFS table of the PKEY database. For records with the same primary key, only the latest one is retained. Note that:

  • primaryKey must include all partitioning columns.
  • The primary key columns must be of Logical, Integral (excluding COMPRESSED), Temporal, STRING, SYMBOL, or DECIMAL type.
  • With more than one primary key column, a composite primary key is maintained. The composite primary key uses a Bloomfilter index by default (see the indexes parameter for details).

keepDuplicates (optional) specifies how to handle records with duplicate values in sortColumns. It can have the following values:

  • ALL: keep all records;
  • LAST: only keep the last record;
  • FIRST: only keep the first record.
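For instance (database path and schema below are illustrative), a table that deduplicates on the sort columns and keeps only the latest record might be created as follows:

```
// Illustrative sketch: keep only the last record per (id, ts) combination.
db = database("dfs://dedupDemo", VALUE, 1..10, , "TSDB")
t = table(1:0, `id`ts`val, [INT, TIMESTAMP, INT])
dt = db.createDimensionTable(t, "dt", sortColumns=`id`ts, keepDuplicates=LAST)
```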

softDelete (optional) determines whether to enable soft delete for TSDB databases. The default value is false. To use it, keepDuplicates must be set to 'LAST'. It is recommended to enable soft delete for databases where the row count is large and delete operations are infrequent.

indexes (optional) is a dictionary with columns as keys and index types as values. Both keys and values are of STRING type. indexes can only be set for tables of PKEY databases.

  • Currently, only "bloomfilter" index type is available. Bloomfilter indexing excels in point queries on high-cardinality columns (e.g., ID card numbers, order numbers, foreign keys from upstreams).
  • It supports indexing on columns of the following data types: BOOL, CHAR, SHORT, INT, LONG, BLOB, STRING, DECIMAL32, DECIMAL64, DECIMAL128.
  • Composite primary keys are automatically indexed with Bloomfilter. Columns not specified in indexes default to ZoneMap indexing.


This function creates an empty dimension (non-partitioned) table in a DFS database, which is used to store small datasets with infrequent updates. During queries, all data in a dimension table is loaded into memory.

The system regularly checks memory usage. When memory usage exceeds the configuration parameter warningMemSize, the system discards the least recently used (LRU) data from memory to clean up the cache. Users can also manually execute the command clearCachedDatabase to clear the cached data.
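To clear the cache manually, for example (assuming the command accepts the DFS path of the database, and using an illustrative path):

```
// Clear cached data of the database at dfs://db1 (illustrative path)
clearCachedDatabase("dfs://db1")
```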

Like partitioned tables, a dimension table can have multiple replicas (determined by the configuration parameter dfsReplicationFactor).

To enable concurrent writes, updates or deletes on a dimension table, set the configuration parameter enableConcurrentDimensionalTableWrite to true.
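In the configuration file, this is a standard key=value entry, e.g.:

```
enableConcurrentDimensionalTableWrite=true
```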


Example 1

db = database("dfs://db1", VALUE, 1 2 3)
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price = 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
t = table(timestamp, sym, qty, price)
dt = db.createDimensionTable(t, "dt").append!(t)

select * from dt;
timestamp sym qty price
09:34:07 C 2200 49.6
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1300 50.76
09:34:26 C 2500 50.32
09:38:12 C 8800 51.29

Example 2

db = database("dfs://demodb", VALUE, 1..10)
t=table(take(1, 86400) as id, 2020.01.01T00:00:00 + 0..86399 as timestamp, rand(1..100, 86400) as val)
dt = db.createDimensionTable(t, "dt", {timestamp:"delta", val:"delta"})

Example 3. Create a dimension table in a TSDB database

db = database("dfs://dbctable_createDimensionTable", VALUE, 1..100, , "TSDB")
t1 = table(1 100 100 300 300 400 500 as id, 1..7 as v)
db.createDimensionTable(t1, "dt", , "id").append!(t1)

Example 4. Create a dimension table in a PKEY database.

db = database(directory="dfs://PKDB", partitionType=VALUE, partitionScheme=1..10, engine="PKEY")
schematb = table(1:0,`id1`id2`val1`val2`date1`time1,[INT,INT,INT,DECIMAL32(2),DATE,TIME])
pkt = createDimensionTable(dbHandle=db, table=schematb, tableName="pkt", primaryKey=`id1`id2, indexes={"val1": "bloomfilter", "val2": "bloomfilter"})

Related functions: createPartitionedTable