registerSnapshotEngine
Syntax
registerSnapshotEngine(dbName, tableName, keyColumnName, [dedupColumnName])
Arguments
dbName is a string indicating the name of a distributed database.
tableName is a string indicating the name of a table.
keyColumnName is a string indicating the name of the column by which records are grouped. The snapshot engine keeps the most recent row for each distinct value in this column.
dedupColumnName [optional] is a string indicating the column by which the data within each group are sorted in descending order; only the first record after sorting is retained. In cluster mode, this parameter must be specified to keep the query results correct and consistent across the cluster (see the example below).
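For instance, a cluster-mode registration that deduplicates by a date column could look like the following sketch (the database, table, and column names are those used in the Examples section at the end of this page):
// for each sym, keep only the record with the latest date
registerSnapshotEngine(dbName="dfs://compoDB", tableName="pt", keyColumnName="sym", dedupColumnName="date")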
Details
Register a snapshot engine for a DFS table. After registering the snapshot engine, we can get the most recent row of each group specified by keyColumnName with the following SQL statement:
select [HINT_SNAPSHOT] * from loadTable(dbName,tableName).
- This engine is currently not supported in clusters with compute nodes.
- A DFS table can only be registered with one snapshot engine.
- After a system restart, the snapshot engine must be re-registered. In cluster mode, the snapshot engine must also be re-registered if any data node restarts: first call unregisterSnapshotEngine to clear the snapshot information on the other nodes, then call registerSnapshotEngine to re-register (see the sketch below).
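For instance, a minimal re-registration sketch after a data node restarts (reusing the database and table from the Examples section; substitute your own names):
// clear the stale snapshot information on the other nodes, then register the engine again
unregisterSnapshotEngine(dbName="dfs://compoDB", tableName="pt")
registerSnapshotEngine(dbName="dfs://compoDB", tableName="pt", keyColumnName="sym")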
Examples
Create a partitioned DFS table:
db1=database("",VALUE,2018.09.01..2018.09.30)
db2=database("",VALUE,`AAPL`MSFT`MS`C)
db=database("dfs://compoDB",COMPO,[db1,db2])
t=table(1:0,`date`sym`val,[DATE,SYMBOL,DOUBLE])
pt=db.createPartitionedTable(t,`pt,`date`sym);
Register a snapshot engine to get the latest record for each symbol from the DFS table pt.
registerSnapshotEngine("dfs://compoDB","pt","sym");
Append data to pt, then query the latest records. Note that this example uses randomly generated data, so your results may differ from those shown below.
def writeData(batch){
    pt=loadTable("dfs://compoDB","pt")
    tmp=table(batch:0,`date`sym`val,[DATE,SYMBOL,DOUBLE])
    dates=take(2018.09.01..2018.09.30,batch)
    syms=take(`AAPL`MSFT`MS`C,batch)
    vals=rand(100.0,batch)
    insert into tmp values(dates,syms,vals)
    pt.append!(tmp)
}
writeData(1000);
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");
date | sym | val
--- | --- | ---
2018.09.29 | AAPL | 24.945753
2018.09.29 | MS | 14.034453
2018.09.30 | C | 3.89175
2018.09.30 | MSFT | 17.720025
writeData(1000);
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");
date | sym | val
--- | --- | ---
2018.09.29 | AAPL | 86.296883
2018.09.29 | MS | 48.17885
2018.09.30 | C | 83.7821
2018.09.30 | MSFT | 44.415456
When the script above is executed in cluster mode, the query may return different results on different executions. To avoid this inconsistency, specify the "date" column as dedupColumnName so that the data in each group are sorted by date in descending order and only the latest record is retained.
unregisterSnapshotEngine(dbName="dfs://compoDB", tableName="pt")
registerSnapshotEngine(dbName="dfs://compoDB", tableName="pt", keyColumnName="sym", dedupColumnName="date");
writeData(1000);
// same query result across the cluster
select [HINT_SNAPSHOT] * from loadTable("dfs://compoDB","pt");