createDimensionTable

Syntax

createDimensionTable(dbHandle, table, tableName, [compressMethods], [sortColumns], [keepDuplicates=ALL], [softDelete=false])

Alias: createTable

Arguments

Note: The parameters sortColumns, keepDuplicates, and softDelete take effect only in databases that use the TSDB storage engine (i.e., databases created with engine="TSDB").

dbHandle is a DFS database handle returned by the database function.

table is a table object. The table schema will be used to construct the new dimension table.

tableName is a string indicating the name of the dimension table.
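Only the schema of table is used, so a common pattern is to pass an empty table built with table(capacity:size, colNames, colTypes). The following minimal sketch assumes a hypothetical database dfs://dim_schema_demo and a hypothetical table name products. It creates the dimension table without writing any rows.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_schema_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10)
// Empty table (capacity 1, size 0) that only carries column names and types
schemaTb = table(1:0, `id`name`price, [INT, SYMBOL, DOUBLE])
// The schema of schemaTb defines the columns of the new dimension table
dt = db.createDimensionTable(schemaTb, "products")
// The dimension table stays empty until data is appended
select count(*) from dt
```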

compressMethods (optional) is a dictionary specifying the compression method for each listed column. The keys are column names and the values are compression methods ("lz4", "delta" or "zstd"). If unspecified, the LZ4 compression method is used.

Note:

  • The delta compression method can be used for DECIMAL, SHORT, INT, LONG or temporal data types.

  • Save strings as SYMBOL type to enable compression of strings.
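The following minimal sketch shows compressMethods. It assumes a hypothetical database dfs://dim_compress_demo and a hypothetical table quotes. Delta compression is applied to a temporal column and an INT column, both of which are listed above as supported types. The string column is converted with symbol so that it is stored as SYMBOL and can be compressed.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_compress_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10)
n = 1000
// ts is DATETIME, sym is SYMBOL (converted from STRING), qty is INT
t = table(2024.01.01T09:30:00 + 0..(n-1) as ts, symbol(take(`AAPL`MSFT`IBM, n)) as sym, take(100..199, n) as qty)
// delta for the temporal and integer columns, lz4 for the SYMBOL column
dt = db.createDimensionTable(t, "quotes", {ts:"delta", sym:"lz4", qty:"delta"})
dt.append!(t)
```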

sortColumns (optional) is a STRING scalar/vector specifying the column(s) used to sort the ingested data within each level file. The sort columns must be of integral, temporal, STRING, SYMBOL, or DECIMAL type. Note that sortColumns need not be the same as the partitioning column.

  • If multiple columns are specified for sortColumns, the last column must be a time column. The preceding columns are used as the sort keys and they cannot be of TIME, TIMESTAMP, NANOTIME, or NANOTIMESTAMP type.

  • If only one column is specified for sortColumns, that column is used as the sort key and may or may not be a time column. If it is a time column and sortKeyMappingFunction is specified, the sort column in a SQL where condition can only be compared with temporal values of the same data type.

  • It is recommended to specify frequently queried columns for sortColumns and to order them by descending query frequency. This helps frequently used data to be located quickly during query processing.

  • For optimal performance, the number of sort key entries (i.e., unique combinations of the sort key values) should not exceed 1000. This limit prevents excessive memory usage and keeps query processing efficient.
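The following minimal sketch creates a TSDB dimension table with two sort columns. It assumes a hypothetical database dfs://dim_sort_demo and a hypothetical table prices. Here sym is the sort key, and ts is the last sort column and therefore the time column. Because sym has only three distinct values, the number of sort key entries stays well below 1000.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_sort_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
// sortColumns only takes effect in a TSDB database
db = database(dbPath, VALUE, 1..10, , "TSDB")
t = table(symbol(`A`A`B`B`C`C) as sym, 2024.01.01T09:30:00 + 0..5 as ts, 1.0 + 0..5 as price)
// compressMethods is left empty; sym is the sort key, ts (last) is the time column
dt = db.createDimensionTable(t, "prices", , `sym`ts)
dt.append!(t)
```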

keepDuplicates (optional) specifies how to deal with records with duplicate sortColumns values. It can have the following values:

  • ALL: keep all records;

  • LAST: only keep the last record;

  • FIRST: only keep the first record.
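The following minimal sketch sets keepDuplicates to LAST, passed positionally after sortColumns as in the syntax above. It assumes a hypothetical database dfs://dim_dedup_demo and a hypothetical table dedup. The ids 100 and 300 each occur twice in the input. By the definition of LAST, only the later record for each duplicated id is meant to be kept.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_dedup_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10, , "TSDB")
// ids 100 and 300 are duplicated
t = table(1 100 100 300 300 400 500 as id, 1..7 as v)
// sortColumns = id, keepDuplicates = LAST
dt = db.createDimensionTable(t, "dedup", , `id, LAST)
dt.append!(t)
// Under LAST, the records with v=3 (id 100) and v=5 (id 300) are the ones to keep
select * from dt order by id
```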

softDelete (optional) determines whether to enable soft delete for TSDB databases. The default value is false. To enable it, keepDuplicates must be set to LAST. Soft delete is recommended for databases with a large number of rows and infrequent delete operations.
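The following minimal sketch enables soft delete. It assumes a hypothetical database dfs://dim_softdel_demo and a hypothetical table softdel. keepDuplicates is set to LAST as required, and softDelete=true is passed as the seventh argument. The delete itself is an ordinary SQL delete statement. softDelete changes how the storage engine handles it, not the SQL syntax.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_softdel_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10, , "TSDB")
t = table(1..5 as id, `a`b`c`d`e as name)
// sortColumns = id, keepDuplicates = LAST (required), softDelete = true
dt = db.createDimensionTable(t, "softdel", , `id, LAST, true)
dt.append!(t)
// Regular SQL delete on the dimension table
delete from dt where id = 3
select count(*) from dt
```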

Details

This function creates an empty dimension (non-partitioned) table in a DFS database. Dimension tables are intended for small datasets that are updated infrequently. When a dimension table is queried, all of its data is loaded into memory.

The system periodically checks memory usage. When memory usage exceeds warningMemSize, the system evicts the least recently used (LRU) data from memory to free the cache. Users can also call the command clearCachedDatabase to clear cached data manually.
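The following minimal sketch shows the manual alternative to LRU eviction. It assumes a hypothetical database dfs://dim_cache_demo and a hypothetical table cached. The first query loads the dimension table into memory. clearCachedDatabase is then called with the database path only, and the next query reads the data from disk again.

```dolphindb
// Hypothetical database path, used only for this sketch
dbPath = "dfs://dim_cache_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10)
n = 10000
t = table(take(1..100, n) as id, rand(100.0, n) as val)
dt = db.createDimensionTable(t, "cached")
dt.append!(t)
// Querying the dimension table loads all of its data into memory
select count(*) from loadTable(dbPath, "cached")
// Manually release the cached data of this database
clearCachedDatabase(dbPath)
// A later query loads the data from disk again
select count(*) from loadTable(dbPath, "cached")
```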

Like partitioned tables, a dimension table can have multiple replicas. The number of replicas is determined by the configuration parameter dfsReplicationFactor.

To enable concurrent writes, updates or deletes on a dimension table, set the configuration parameter enableConcurrentDimensionalTableWrite to true.
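The following minimal sketch writes to one dimension table from two background jobs. It assumes that enableConcurrentDimensionalTableWrite=true has already been set in the node configuration. It also uses a hypothetical database dfs://dim_concurrent_demo and a hypothetical helper function writeBatch. If concurrent writes are not enabled, one of the two jobs may fail, and getJobReturn would then raise that job's error.

```dolphindb
// Assumes enableConcurrentDimensionalTableWrite=true in the configuration
dbPath = "dfs://dim_concurrent_demo"
if(existsDatabase(dbPath)){
    dropDatabase(dbPath)
}
db = database(dbPath, VALUE, 1..10)
db.createDimensionTable(table(1:0, `id`val, [INT, DOUBLE]), "dt")

// Hypothetical helper: appends n random rows to the given dimension table
def writeBatch(path, tbName, n){
    loadTable(path, tbName).append!(table(take(1, n) as id, rand(100.0, n) as val))
}
// Two background jobs that write to the same dimension table
j1 = submitJob("dimWrite1", "append batch 1", writeBatch, dbPath, "dt", 1000)
j2 = submitJob("dimWrite2", "append batch 2", writeBatch, dbPath, "dt", 1000)
// Block until both jobs have finished
getJobReturn(j1, true)
getJobReturn(j2, true)
select count(*) from loadTable(dbPath, "dt")
```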

Examples

Example 1. Create a dimension table and append data

if(existsDatabase("dfs://db1")){
    dropDatabase("dfs://db1")
}
db=database("dfs://db1",VALUE,1 2 3)
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
sym = `C`MS`MS`MS`IBM`IBM`C`C`C
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
t = table(timestamp, sym, qty, price);

// create the dimension table from the schema of t, then write the rows of t into it
dt=db.createDimensionTable(t,`dt)
dt.append!(t);
select * from dt;
timestamp sym qty price
09:34:07 C 2200 49.6
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1300 50.76
09:34:26 C 2500 50.32
09:38:12 C 8800 51.29

Example 2. Specify compression methods for columns

db = database("dfs://demodb", VALUE, 1..10)
t=table(take(1, 86400) as id, 2020.01.01T00:00:00 + 0..86399 as timestamp, rand(1..100, 86400) as val)
dt = db.createDimensionTable(t, "dt", {timestamp:"delta", val:"delta"})
dt.append!(t)

Example 3. Create a dimension table in a TSDB database

if(existsDatabase("dfs://dbctable_createDimensionTable")){
    dropDatabase("dfs://dbctable_createDimensionTable")
}
db = database("dfs://dbctable_createDimensionTable", VALUE, 1..100, , "TSDB")
t1 = table(1 100 100 300 300 400 500 as id, 1..7 as v)
db.createDimensionTable(t1, "dt", , "id").append!(t1)
dt=loadTable("dfs://dbctable_createDimensionTable","dt")

Related functions: createPartitionedTable