# keyedTable {#keyedtable}

## Syntax {#syntax}

`keyedTable(keyColumns, X, [X1], [X2], .....)`

or

`keyedTable(keyColumns, capacity:size, colNames, colTypes)`

or

`keyedTable(keyColumns, table)`

## Arguments {#arguments}

**keyColumn** is a string scalar or vector indicating the name\(s\) of the primary key column\(s\). The column type must be INTEGRAL, TEMPORAL or LITERAL.

For the first scenario: **X**, **X1**, **X2** ... can be vectors, matrices or tuples. Each vector, each matrix column and each tuple element must have the same length. **When _Xk_is a tuple:**

-   If the elements of *Xk*are vectors of equal length, each element of the tuple will be treated as a column in the table.
-   If *Xk*contains elements of different types or unequal lengths, it will be treated as a single column in the table \(with the column type set to ANY\), and each element will correspond to the value of that column in each row.

For the second scenario:

-   **capacity** is a positive integer indicating the amount of memory \(in terms of the number of rows\) allocated to the table. When the number of rows exceeds capacity, the system will first allocate memory of 1.2~2 times of capacity, copy the data to the new memory space, and release the original memory. For large tables, these steps may use significant amount of memory.
-   **size** is an integer no less than 0 indicating the initial size \(in terms of the number of rows\) of the table. If *size*=0, create an empty table; If *size*&gt;0, the initialized values are:
    -   false for Boolean type;
    -   0 for numeric, temporal, IPADDR, COMPLEX, and POINT types;
    -   Null value for Literal, INT128 types.

-   **Note:** If *colTypes* is an array vector, *size* must be 0.

-   **colNames** is a STRING vector of column names.
-   **colTypes** is a string vector of data types. The non-key columns can be specified as an array vector type or ANY type.

For the third scenario, **table** is a table. Please note that *keyColumns* in *table* cannot have duplicate values.

## Details {#details}

Create an keyed table, which is a special type of in-memory table with primary key. The primary key can be one column or multiple columns. The keyed table is implemented based on a hash table, storing the combined value of the primary key fields as a key entry, with each key entry corresponding to a record in the table. During queries, by specifying all fields of the primary key, data can be located through the index without performing a full table scan. It is recommended to use [sliceByKey](../s/sliceByKey.md) to improve query performance.

When adding new records to the table, if the primary key of the new record duplicates an existing record, the system updates the record in the table; otherwise, the new record is added to the table.

Keyed tables exhibit better performance for single-record updates and queries, making them an ideal choice for data caching. Keyed tables can also serve as output tables for time series engines for real time updates.

**Note:** This function does not support creating a keyed table containing array vector columns.

The following compares the query optimization techniques for indexed and keyed tables.

For indexed tables:

-   The first column of *keyColumns* must be queried, and filter conditions for this column can only use `=`, `in`, or `and`.
-   Columns other than the first column of *keyColumns* can use range queries through `between`, comparison operators, etc., with higher query efficiency than using the `in` predicate.
-   The number of distinct columns filtered with `in` should not exceed 2.

For keyed tables:

-   All *keyColumns* must be queried. For such queries, key tables show better performance than indexed tables.
-   Filter conditions can only use `=`, `in`, or `and`.
-   The number of distinct columns filtered with `in` should not exceed 2.

Please refer to the optimized SQL query in [indexedTable](../i/indexedTable.md).

## Examples {#examples}

*Example 1. Create a keyed table.*

The first scenario:

```
sym=`A`B`C`D`E
id=5 4 3 2 1
val=52 64 25 48 71
t=keyedTable(`sym`id,sym,id,val)
t;
```

|id|x|val|
|---|---|---|
|A|5|52|
|B|4|64|
|C|3|25|
|D|2|48|
|E|1|71|

The second scenario:

```
t=keyedTable(`sym`id,1:0,`sym`id`val,[SYMBOL,INT,INT])
insert into t values(`A`B`C`D`E,5 4 3 2 1,52 64 25 48 71);
```

The third scenario:

```
tmp=table(sym, id, val)
t=keyedTable(`sym`id, tmp);
```

Create a keyed in-memory partitioned table:

```
sym=`A`B`C`D`E
id=5 4 3 2 1
val=52 64 25 48 71
t=keyedTable(`sym`id,sym,id,val)
db=database("",VALUE,sym)
pt=db.createPartitionedTable(t,`pt,`sym).append!(t);
```

*Example 2. Update a keyed table.*

```
t=keyedTable(`sym,1:0,`sym`datetime`price`qty,[SYMBOL,DATETIME,DOUBLE,DOUBLE])
insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:00 2018.06.08T12:30:00 2018.06.08T12:30:00,50.3 45.6 58.0,5200 4800 7800)
t;
```

|sym|datetime|price|qty|
|---|--------|-----|---|
|APPL|2018.06.08T12:30:00|50.3|5200|
|IBM|2018.06.08T12:30:00|45.6|4800|
|GOOG|2018.06.08T12:30:00|58|7800|

Insert a new row with duplicate primary key value as an existing row. The existing row will be overwritten:

```
insert into t values(`APPL`IBM`GOOG,2018.06.08T12:30:01 2018.06.08T12:30:01 2018.06.08T12:30:01,65.8 45.2 78.6,5800 8700 4600)
t;
```

|sym|datetime|price|qty|
|---|--------|-----|---|
|APPL|2018.06.08T12:30:01|65.8|5800|
|IBM|2018.06.08T12:30:01|45.2|8700|
|GOOG|2018.06.08T12:30:01|78.6|4600|

Insert new rows among which there are duplicate primary key values:

```
insert into t values(`MSFT`MSFT,2018.06.08T12:30:01 2018.06.08T12:30:01,45.7 56.9,3600 4500)
t;
```

|sym|datetime|price|qty|
|---|--------|-----|---|
|APPL|2018.06.08T12:30:01|65.8|5800|
|IBM|2018.06.08T12:30:01|45.2|8700|
|GOOG|2018.06.08T12:30:01|78.6|4600|
|MSFT|2018.06.08T12:30:01|56.9|4500|

The primary key cannot be updated:

```
update t set sym="C_"+sym;
// Error: Can't update a key column.
```

*Example 3. Query on a keyed table.*

In some cases, queries on a keyed table are optimized. In this section we will compare the performance of queries on keyed tables and ordinary in-memory tables.

For the following examples, we first create a keyed table and an ordinary in-memory table with 1 million records each:

```
id=shuffle(1..1000000)
date=take(2012.06.01..2012.06.10, 1000000)
type=rand(9, 1000000)
val=rand(100.0, 1000000)
t=table(id, date, type, val)
kt=keyedTable(`id`date`type, id, date, type, val);
```

Example 3.1

```
timer(100) select * from t where id=500000, date=2012.06.01, type=0;
// Time elapsed: 161.574 ms

timer(100) select * from kt where id=500000, date=2012.06.01, type=0;
// Time elapsed: 1.483 ms

timer(100) sliceByKey(t1, (500000, 2012.06.01, 0))
// Time elapsed: 0.705 ms
```

Example 3.2

```
timer(100) select * from t where id in [1, 500000], date in 2012.06.01..2012.06.05, type=5;
// Time elapsed: 894.241 ms

timer(100) select * from kt where id in [1, 500000], date in 2012.06.01..2012.06.05, type=5;
// Time elapsed: 2.322 ms
```

With more than 2 "in" operators in the filtering conditions, however, a query on a keyed table is not optimized.

Example 3.3

```
timer(100) select * from t where id in [1, 500000], date in 2012.06.01..2012.06.05, type in 1..5;
// Time elapsed: 801.347 ms

timer(100) select * from kt where id in [1, 500000], date in 2012.06.01..2012.06.05, type in 1..5;
// Time elapsed: 834.184 ms
```

If the filtering conditions do not include all key columns, a query on a keyed table is not optimized.

Example 3.4

```
timer(100) select * from t where id=500000, date in 2012.06.01..2012.06.05;
// Time elapsed: 177.113 ms

timer(100) select * from kt where id=500000, date in 2012.06.01..2012.06.05;
// Time elapsed: 163.265 ms
```

Example 4. Use a keyed table with array vectors to record the 5 levels of quotes data.

``` {#codeblock_kqy_fds_cbc}
sym=["a","b","c "] 
time=22:58:52.827 22:58:53.627 22:58:53.827 
volume=array(INT[]).append!([[100,110,120,115,125],[200,230,220,225,230],[320,300,310,315,310]])
price=array(DOUBLE[]).append!([[10.5,10.6,10.7,10.77,10.85],[8.6,8.7,8.76,8.83,8.9],[6.3,6.37,6.42,6.48,6.52]])
t=keyedTable(`sym,sym,time,volume,price)
t;
```

|sym|time|volume|price|
|---|----|------|-----|
|a|22:58:52.827|\[100, 110, 120, 115, 125\]|\[10.5, 10.6, 10.7, 10.77, 10.85\]|
|b|22:58:53.627|\[200, 230, 220, 225, 230\]|\[8.6, 8.7, 8.76, 8.83, 8.9\]|
|c|22:58:53.827|\[320, 300, 310, 315, 310\]|\[6.3, 6.37, 6.42, 6.48, 6.52\]|

``` {#codeblock_lhk_hds_cbc}
// latest quote volume and price
newVolume=array(INT[]).append!([[130,110,110,115,120]])
newPrice= array(DOUBLE[]).append!([[10.55,10.57,10.62,10.68,10.5]])
// update for stock a
update t set volume=newVolume, price=newPrice where sym="a"
t;
```

|sym|time|volume|price|
|---|----|------|-----|
|a|22:58:52.827|\[130, 110, 110, 115, 120\]|\[10.55, 10.57, 10.62, 10.68, 10.5\]|
|b|22:58:53.627|\[200, 230, 220, 225, 230\]|\[8.6, 8.7, 8.76, 8.83, 8.9\]|
|c|22:58:53.827|\[320, 300, 310, 315, 310\]|\[6.3, 6.37, 6.42, 6.48, 6.52\]|

Note that when updating the array vector column, the number of elements in each column must be consistent with the original column. For example, if the vector of new record contains 4 elements, while the original contains 5 elements, an error is raised:

``` {#codeblock_qv1_nds_cbc}
newVolume=array(INT[]).append!([[130,110,110,120]])
newPrice= array(DOUBLE[]).append!([[10.55,10.57,10.62,10.5]])

update t set volume=newVolume, price=newPrice where sym="a"
// error: Failed to update column: volume
```

Related function: [indexedTable](../i/indexedTable.md)

