update

Update records in a table.

Details

The update statement can be used to update in-memory tables or DFS tables (including partitioned and dimension tables) where data isn't being written. The array vector columns in DFS tables can be updated with vectors or array vectors.

For a partitioned table, update will modify the entire partition where the target data is stored. For a dimension table, the entire table will be updated. Therefore, the update operation is only suitable for infrequent updates (e.g., updates occurring in minutes) rather than frequent updates (e.g. updates in milliseconds).

The update takes a multi-version approach and supports transactions. The system will create a new version to store the new data. Before committing the transaction, other SQL statements still can access the data in the old version. If data in multiple partitions needs to be updated, as long as the update on one partitions fails, the system will roll back all modifications on each partition.

Note:
  • The update statement cannot modify the data type of a column.
  • As of version 2.00.12, update with a context by clause supports csort and having keywords.
  • Since version 2.00.10, the partitioning columns of OLAP and TSDB tables (with keepDuplicated set to 'ALL') can be updated.
  • The update operation first loads the partitions where the data is stored, updates these partitions, and writes the partitions back to disk. Therefore, cross-partition updates are restricted in DolphinDB. Aggregate functions, order-sensitive functions, and user-defined functions cannot be used in the set clause for updates across partitions. However, these functions can be called when using update with context by specifying all partitioning columns, either as: (1) the original partitioning columns; (2) the VALUE or RANGE partitioning columns casted using time conversion functions, where the converted precision must match the partitioning scheme. See Example 2.

Syntax

update
    table_name
    set col1=X1, [col2=X2,…]
    [from table_joiner(table_names)]
    [where condition(s)]
    [context by col_name(s)] 
    [csort col_name(s) ASC|DESC]
    [having condition(s)]

Examples

Example 1. Update an in-memory table

sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price);
t1;
timestamp sym qty price
09:34:07 C 2200 49.6
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1300 50.76
09:34:26 C 2500 50.32
09:38:12 C 8800 51.29

Add a column vol whose values are all null:

update t1 set vol=long();
t1;
timestamp sym qty price vol
09:34:07 C 2200 49.6
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1300 50.76
09:34:26 C 2500 50.32
09:38:12 C 8800 51.29
t1.drop!(`vol);

Update the table trades by adding 0.5 to the price column and subtracting 50 from the qty column for records with stock symbol C.

update t1 set price=price+0.5, qty=qty-50 where sym=`C;
t1;
timestamp sym qty price
09:34:07 C 2150 50.1
09:36:42 MS 1900 29.46
09:36:51 MS 2100 29.52
09:36:59 MS 3200 30.02
09:32:47 IBM 6800 174.97
09:35:26 IBM 5400 175.23
09:34:16 C 1250 51.26
09:34:26 C 2450 50.82
09:38:12 C 8750 51.79
update t1 set price=price-0.5, qty=qty+50 where sym=`C;

context by can make group adjustments; whereas contextby can't. The example below first calculates the average price for each stock with context by, then deducts the average price from each record's original price.

update t1 set price=price-avg(price) context by sym;
t1;
timestamp sym qty price
09:34:07 C 2150 -0.8925
09:36:42 MS 1900 -0.206667
09:36:51 MS 2100 -0.146667
09:36:59 MS 3200 0.353333
09:32:47 IBM 6800 -0.13
09:35:26 IBM 5400 0.13
09:34:16 C 1250 0.2675
09:34:26 C 2450 -0.1725
09:38:12 C 8750 0.7975

Updating a table with table joins:

item = table(1..10 as id, 10+rand(100,10) as qty, 1.0+rand(10.0,10) as price)
promotion = table(1..10 as id, rand(0b 1b, 10) as flag, 0.5+rand(0.4,10) as discount);
item;
id qty price
1 23 7.839664
2 44 7.635988
3 76 5.378054
4 91 8.078173
5 11 10.316152
6 58 9.510634
7 90 1.643082
8 68 5.787797
9 52 7.53352
10 62 6.222249
promotion;
id flag discount
1 0 0.650346
2 0 0.697081
3 0 0.774207
4 1 0.819562
5 0 0.710393
6 0 0.728223
7 1 0.602512
8 0 0.71226
9 1 0.606631
10 0 0.765697
update item set price = price*discount from ej(item, promotion, `id) where flag=1;
item;
id qty price
1 23 7.839664
2 44 7.635988
3 76 5.378054
4 91 6.620566
5 11 10.316152
6 58 9.510634
7 90 0.989976
8 68 5.787797
9 52 4.570069
10 62 6.222249

Example 2. Update a DFS table

login(`admin, `123456)
n=1000000
ID=rand(10, n)
x=rand(1.0, n)
t=table(ID, x)
db=database("dfs://rangedb123", RANGE,  0 5 10)
pt=db.createPartitionedTable(t, `pt, `ID)
pt.append!(t)
pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 0.4999

update pt set x=x+1;

pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 1.4999
// Order-sensitive functions are not allowed for cross-partition updates
update pt set x=prev(x) 

// update pt set x = prev(x) => Aggregate or order-sensitive functions are not allowed without context by clause when updating a partitioned table.

update pt set x=prev(x) context by ID
// The update operation can be executed when context by specifies all partitioning columns
select TOP 10* from pt order by ID

/*
ID	x
0	
0	1.4483
0	1.6277
0	1.7735
0	1.4349
*/
Example 3. Use csort and having in context by clause with sample table pt
login("admin", "123456")
dbName = "dfs://time_comparison"
if(existsDatabase(dbName))
	dropDatabase(dbName)
	
db = database(dbName, VALUE, [2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],engine="TSDB")
n = 100
t = table(n:n,[`time,`id,`value],[MONTH,INT,DOUBLE])
t[`time] = take([2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],n)
t[`id] = take(1..10,n)
t[`value] = rand(100.0,n)
pt = db.createPartitionedTable(t, `pt, `time, sortColumns=`time).append!(t)
pnodeRun(flushIotCache)

update pt set value = 1
select * from pt where time = 2019.12M order by time, id
table pt:
time id value
2019.12M 1 1
2019.12M 1 1
2019.12M 2 1
2019.12M 3 1
2019.12M 4 1
2019.12M 5 1
2019.12M 6 1
2019.12M 7 1
2019.12M 8 1
2019.12M 9 1
2019.12M 10 1
2019.12M 10 1
3.1 update pt with csort and having:
update pt set value = 1
update pt set value = cumsum(value) context by time csort id having sum(id) > 60;
select * from pt where time = 2019.12M order by time, id
time id value
2019.12M 1 1
2019.12M 1 2
2019.12M 2 3
2019.12M 3 4
2019.12M 4 5
2019.12M 5 6
2019.12M 6 7
2019.12M 7 8
2019.12M 8 9
2019.12M 9 10
2019.12M 10 11
2019.12M 10 12
3.2 Use having with aggregate and non-aggregate functions:
update pt set value = 1
update pt set value = cumsum(value) context by time csort id having denseRank(id) > 3 and sum(id) > 10000
select * from pt where time = 2019.12M order by time, id
time id value
2019.12M 1 1
2019.12M 1 1
2019.12M 2 1
2019.12M 3 1
2019.12M 4 1
2019.12M 5 1
2019.12M 6 1
2019.12M 7 1
2019.12M 8 1
2019.12M 9 1
2019.12M 10 1
2019.12M 10 1

Example 4. Use the following script to create a DFS table with an array vector column:

dbName = "dfs://updateArrayVector"
db = database(dbName,VALUE,2024.01.01..2024.01.03, engine='TSDB')
a = arrayVector(2 4 6 8 10 12, [1, 1, 1, 2, 1, 3, 1, 1, 1, 2, 1, 3])
date = take(2024.01.03 2024.01.02 2024.01.01, 6)
sym = take(`a`b`c, 6)
t = table(a as a, sym as sym, date as date)
pt=createPartitionedTable(db,t,`pt,partitionColumns=`date, sortColumns=`date)
pt.append!(t)
select * from pt
a sym date
[1, 3] c 2024.01.01
[1, 3] c 2024.01.01
[1, 2] b 2024.01.02
[1, 2] b 2024.01.02
[1, 1] a 2024.01.03
[1, 1] a 2024.01.03

Then update column 'a':

update pt set a = [2,2,3] where date = 2024.01.01
select * from pt where date = 2024.01.01
a sym date
[2, 2, 3] c 2024.01.01
[2, 2, 3] c 2024.01.01

Example 5. Update array vector columns in DFS tables with array vectors.

Update column a of table 'pt' created in the above example. The length of the array vector must match the number of rows in the table.

update pt set a=array(INT[], 0).append!([1 2 3, 4 5]) where date = 2024.01.02
select * from pt where date = 2024.01.02
a sym date
[1, 2, 3] b 2024.01.02
[4, 5] b 2024.01.02