update
Update records in a table.
Details
The update
statement can be used to update in-memory tables or DFS
tables (including partitioned and dimension tables) where data isn't being written.
The array vector columns in DFS tables can be updated with vectors or array
vectors.
For a partitioned table, update
will modify the entire partition
where the target data is stored. For a dimension table, the entire table will be
updated. Therefore, the update operation is only suitable for infrequent updates
(e.g., updates occurring in minutes) rather than frequent updates (e.g. updates in
milliseconds).
The update
takes a multi-version approach and supports transactions.
The system will create a new version to store the new data. Before committing the
transaction, other SQL statements still can access the data in the old version. If
data in multiple partitions needs to be updated, as long as the update on one
partitions fails, the system will roll back all modifications on each partition.
- The
update
statement cannot modify the data type of a column. - As of version 2.00.12,
update
with acontext by
clause supportscsort
andhaving
keywords. - Since version 2.00.10, the partitioning columns of OLAP and TSDB tables (with keepDuplicated set to 'ALL') can be updated.
- The update operation first loads the partitions where the data is stored,
updates these partitions, and writes the partitions back to disk. Therefore,
cross-partition updates are restricted in DolphinDB. Aggregate functions,
order-sensitive functions, and user-defined functions cannot be used in the
set
clause for updates across partitions. However, these functions can be called when usingupdate
withcontext by
specifying all partitioning columns, either as: (1) the original partitioning columns; (2) the VALUE or RANGE partitioning columns casted using time conversion functions, where the converted precision must match the partitioning scheme. See Example 2.
Syntax
update
table_name
set col1=X1, [col2=X2,…]
[from table_joiner(table_names)]
[where condition(s)]
[context by col_name(s)]
[csort col_name(s) ASC|DESC]
[having condition(s)]
Examples
Example 1. Update an in-memory table
sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price= 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price);
t1;
timestamp | sym | qty | price |
---|---|---|---|
09:34:07 | C | 2200 | 49.6 |
09:36:42 | MS | 1900 | 29.46 |
09:36:51 | MS | 2100 | 29.52 |
09:36:59 | MS | 3200 | 30.02 |
09:32:47 | IBM | 6800 | 174.97 |
09:35:26 | IBM | 5400 | 175.23 |
09:34:16 | C | 1300 | 50.76 |
09:34:26 | C | 2500 | 50.32 |
09:38:12 | C | 8800 | 51.29 |
Add a column vol whose values are all NULL:
update t1 set vol=long();
t1;
timestamp | sym | qty | price | vol |
---|---|---|---|---|
09:34:07 | C | 2200 | 49.6 | |
09:36:42 | MS | 1900 | 29.46 | |
09:36:51 | MS | 2100 | 29.52 | |
09:36:59 | MS | 3200 | 30.02 | |
09:32:47 | IBM | 6800 | 174.97 | |
09:35:26 | IBM | 5400 | 175.23 | |
09:34:16 | C | 1300 | 50.76 | |
09:34:26 | C | 2500 | 50.32 | |
09:38:12 | C | 8800 | 51.29 |
t1.drop!(`vol);
Update the table trades by adding 0.5 to the price column and subtracting 50 from the qty column for records with stock symbol C.
update t1 set price=price+0.5, qty=qty-50 where sym=`C;
t1;
timestamp | sym | qty | price |
---|---|---|---|
09:34:07 | C | 2150 | 50.1 |
09:36:42 | MS | 1900 | 29.46 |
09:36:51 | MS | 2100 | 29.52 |
09:36:59 | MS | 3200 | 30.02 |
09:32:47 | IBM | 6800 | 174.97 |
09:35:26 | IBM | 5400 | 175.23 |
09:34:16 | C | 1250 | 51.26 |
09:34:26 | C | 2450 | 50.82 |
09:38:12 | C | 8750 | 51.79 |
update t1 set price=price-0.5, qty=qty+50 where sym=`C;
context by
can make group adjustments; whereas
contextby
can't. The example below first calculates the average
price for each stock with context by
, then deducts the average
price from each record's original price.
update t1 set price=price-avg(price) context by sym;
t1;
timestamp | sym | qty | price |
---|---|---|---|
09:34:07 | C | 2150 | -0.8925 |
09:36:42 | MS | 1900 | -0.206667 |
09:36:51 | MS | 2100 | -0.146667 |
09:36:59 | MS | 3200 | 0.353333 |
09:32:47 | IBM | 6800 | -0.13 |
09:35:26 | IBM | 5400 | 0.13 |
09:34:16 | C | 1250 | 0.2675 |
09:34:26 | C | 2450 | -0.1725 |
09:38:12 | C | 8750 | 0.7975 |
Updating a table with table joins:
item = table(1..10 as id, 10+rand(100,10) as qty, 1.0+rand(10.0,10) as price)
promotion = table(1..10 as id, rand(0b 1b, 10) as flag, 0.5+rand(0.4,10) as discount);
item;
id | qty | price |
---|---|---|
1 | 23 | 7.839664 |
2 | 44 | 7.635988 |
3 | 76 | 5.378054 |
4 | 91 | 8.078173 |
5 | 11 | 10.316152 |
6 | 58 | 9.510634 |
7 | 90 | 1.643082 |
8 | 68 | 5.787797 |
9 | 52 | 7.53352 |
10 | 62 | 6.222249 |
promotion;
id | flag | discount |
---|---|---|
1 | 0 | 0.650346 |
2 | 0 | 0.697081 |
3 | 0 | 0.774207 |
4 | 1 | 0.819562 |
5 | 0 | 0.710393 |
6 | 0 | 0.728223 |
7 | 1 | 0.602512 |
8 | 0 | 0.71226 |
9 | 1 | 0.606631 |
10 | 0 | 0.765697 |
update item set price = price*discount from ej(item, promotion, `id) where flag=1;
item;
id | qty | price |
---|---|---|
1 | 23 | 7.839664 |
2 | 44 | 7.635988 |
3 | 76 | 5.378054 |
4 | 91 | 6.620566 |
5 | 11 | 10.316152 |
6 | 58 | 9.510634 |
7 | 90 | 0.989976 |
8 | 68 | 5.787797 |
9 | 52 | 4.570069 |
10 | 62 | 6.222249 |
Example 2. Update a DFS table
login(`admin, `123456)
n=1000000
ID=rand(10, n)
x=rand(1.0, n)
t=table(ID, x)
db=database("dfs://rangedb123", RANGE, 0 5 10)
pt=db.createPartitionedTable(t, `pt, `ID)
pt.append!(t)
pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 0.4999
update pt set x=x+1;
pt=loadTable("dfs://rangedb123", `pt)
select avg(x) from pt;
// output: 1.4999
// Order-sensitive functions are not allowed for cross-partition updates
update pt set x=prev(x)
// update pt set x = prev(x) => Aggregate or order-sensitive functions are not allowed without context by clause when updating a partitioned table.
update pt set x=prev(x) context by ID
// The update operation can be executed when context by specifies all partitioning columns
select TOP 10* from pt order by ID
/*
ID x
0
0 1.4483
0 1.6277
0 1.7735
0 1.4349
*/
csort
and having
in context
by
clause with sample table
ptlogin("admin", "123456")
dbName = "dfs://time_comparison"
if(existsDatabase(dbName))
dropDatabase(dbName)
db = database(dbName, VALUE, [2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],engine="TSDB")
n = 100
t = table(n:n,[`time,`id,`value],[MONTH,INT,DOUBLE])
t[`time] = take([2019.12M,2020.01M,2020.02M,2020.03M,2021.01M,2021.02M,2021.03M,2021.12M,2022.01M],n)
t[`id] = take(1..10,n)
t[`value] = rand(100.0,n)
pt = db.createPartitionedTable(t, `pt, `time, sortColumns=`time).append!(t)
pnodeRun(flushIotCache)
update pt set value = 1
select * from pt where time = 2019.12M order by time, id
table pt:time | id | value |
---|---|---|
2019.12M | 1 | 1 |
2019.12M | 1 | 1 |
2019.12M | 2 | 1 |
2019.12M | 3 | 1 |
2019.12M | 4 | 1 |
2019.12M | 5 | 1 |
2019.12M | 6 | 1 |
2019.12M | 7 | 1 |
2019.12M | 8 | 1 |
2019.12M | 9 | 1 |
2019.12M | 10 | 1 |
2019.12M | 10 | 1 |
update
pt with csort
and
having
:update pt set value = 1
update pt set value = cumsum(value) context by time csort id having sum(id) > 60;
select * from pt where time = 2019.12M order by time, id
time | id | value |
---|---|---|
2019.12M | 1 | 1 |
2019.12M | 1 | 2 |
2019.12M | 2 | 3 |
2019.12M | 3 | 4 |
2019.12M | 4 | 5 |
2019.12M | 5 | 6 |
2019.12M | 6 | 7 |
2019.12M | 7 | 8 |
2019.12M | 8 | 9 |
2019.12M | 9 | 10 |
2019.12M | 10 | 11 |
2019.12M | 10 | 12 |
having
with aggregate and non-aggregate
functions:update pt set value = 1
update pt set value = cumsum(value) context by time csort id having denseRank(id) > 3 and sum(id) > 10000
select * from pt where time = 2019.12M order by time, id
time | id | value |
2019.12M | 1 | 1 |
2019.12M | 1 | 1 |
2019.12M | 2 | 1 |
2019.12M | 3 | 1 |
2019.12M | 4 | 1 |
2019.12M | 5 | 1 |
2019.12M | 6 | 1 |
2019.12M | 7 | 1 |
2019.12M | 8 | 1 |
2019.12M | 9 | 1 |
2019.12M | 10 | 1 |
2019.12M | 10 | 1 |
Example 4. Use the following script to create a DFS table with an array vector column:
dbName = "dfs://updateArrayVector"
db = database(dbName,VALUE,2024.01.01..2024.01.03, engine='TSDB')
a = arrayVector(2 4 6 8 10 12, [1, 1, 1, 2, 1, 3, 1, 1, 1, 2, 1, 3])
date = take(2024.01.03 2024.01.02 2024.01.01, 6)
sym = take(`a`b`c, 6)
t = table(a as a, sym as sym, date as date)
pt=createPartitionedTable(db,t,`pt,partitionColumns=`date, sortColumns=`date)
pt.append!(t)
select * from pt
a | sym | date |
---|---|---|
[1, 3] | c | 2024.01.01 |
[1, 3] | c | 2024.01.01 |
[1, 2] | b | 2024.01.02 |
[1, 2] | b | 2024.01.02 |
[1, 1] | a | 2024.01.03 |
[1, 1] | a | 2024.01.03 |
Then update column 'a':
update pt set a = [2,2,3] where date = 2024.01.01
select * from pt where date = 2024.01.01
a | sym | date |
---|---|---|
[2, 2, 3] | c | 2024.01.01 |
[2, 2, 3] | c | 2024.01.01 |
Example 5. Update array vector columns in DFS tables with array vectors.
Update column a of table 'pt' created in the above example. The length of the array vector must match the number of rows in the table.
update pt set a=array(INT[], 0).append!([1 2 3, 4 5]) where date = 2024.01.02
select * from pt where date = 2024.01.02
a | sym | date |
---|---|---|
[1, 2, 3] | b | 2024.01.02 |
[4, 5] | b | 2024.01.02 |