Join DFS Tables
Table joins on DFS tables are not supported outside SQL statements. The reason is that joining DFS tables without any filters such as select, where clauses usually results in unnecessarily large size tables. Table joins on DFS tables only support left join(lj / lsj), equi join(ej), asof join(aj), window join(wj), full join(fj).
Note:
-
When performing asof join or window join on DFS tables, you need to specify at least two joining columns.
-
When a DFS table joins with an in-memory table or a dimension table, the system will copy the in-memory or dimension table to all the nodes where the DFS table is located for joining. If the table is of significant size, it could be very time consuming to transfer the data. To improve performance, the system attempts to filter the in-memory table as much as possible with the where conditions before data transfer. In this case if the right table is too large, the execution could be slow. It is recommended to use a small right table for this case.
Examples
Create a DFS database:
dates=2019.01.01..2019.01.31
syms="A"+string(1..30)
sym_range=cutPoints(syms,3)
db1=database("",VALUE,dates)
db2=database("",RANGE,sym_range)
db=database("dfs://stock",COMPO,[db1,db2])
n=10000
datetimes=2019.01.01T00:00:00..2019.01.31T23:59:59
t=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(1000,n) as qty,rand(500.0,n) as price)
trades=db.createPartitionedTable(t,`trades,`trade_time`sym).append!(t)
n=200
t2=table(take(datetimes,n) as trade_time,take(syms,n) as sym,rand(500.0,n) as bid,rand(500.0,n) as offer)
quotes=db.createPartitionedTable(t2,`quotes,`trade_time`sym).append!(t2)
t3=table(syms as sym,take(0 1,30) as type)
infos=db.createDimensionTable(t3,`infos).append!(t3)
Example 1. Join 2 DFS table trades and quotes:
// output
select * from ej(trades,quotes,`trade_time`sym);
trade_time | sym | qty | price | bid | offer |
---|---|---|---|---|---|
2019.01.01T00:00:00 | A1 | 39 | 7.366735 | 37.933525 | 446.917644 |
2019.01.01T00:00:09 | A10 | 15 | 461.381014 | 405.092702 | 26.659516 |
2019.01.01T00:00:10 | A11 | 987 | 429.981704 | 404.289413 | 347.64917 |
2019.01.01T00:00:11 | A12 | 266 | 60.466206 | 420.426175 | 83.538043 |
2019.01.01T00:00:12 | A13 | 909 | 362.057769 | 324.886047 | 162.502655 |
2019.01.01T00:00:13 | A14 | 264 | 113.964472 | 497.598722 | 103.114702 |
2019.01.01T00:00:14 | A15 | 460 | 347.518325 | 24.584629 | 357.854207 |
2019.01.01T00:00:15 | A16 | 196 | 258.889177 | 49.467399 | 13.974672 |
2019.01.01T00:00:16 | A17 | 198 | 403.564922 | 428.539984 | 208.410852 |
2019.01.01T00:00:17 | A18 | 30 | 288.469046 | 41.905556 | 378.080141 |
... | ... | ... | ... | ... | ... |
Example 2. Join a DFS table and a dimension table.
// output
select * from lj(trades,infos,`sym);
trade_time | sym | qty | price | type |
---|---|---|---|---|
2019.01.01T00:00:00 | A1 | 856 | 359.809918 | 0 |
2019.01.01T00:00:09 | A10 | 368 | 305.801702 | 1 |
2019.01.01T00:00:10 | A11 | 549 | 447.406744 | 0 |
2019.01.01T00:00:11 | A12 | 817 | 115.613373 | 1 |
2019.01.01T00:00:12 | A13 | 321 | 298.317481 | 0 |
2019.01.01T00:00:13 | A14 | 3 | 2.289171 | 1 |
2019.01.01T00:00:14 | A15 | 586 | 91.841629 | 0 |
2019.01.01T00:00:15 | A16 | 745 | 43.256142 | 1 |
2019.01.01T00:00:16 | A17 | 60 | 0.153205 | 0 |
... | ... | ... | ... | ... |
Example 3. Join a DFS table and an in-memory table.
// output
tmp=table("A"+string(1..15) as sym,2019.01.11..2019.01.25 as date);
select * from ej(trades,tmp,`sym);
trade_time | sym | qty | price | date |
---|---|---|---|---|
2019.01.01T00:00:00 | A1 | 856 | 359.809918 | 2019.01.11 |
2019.01.01T00:00:09 | A10 | 368 | 305.801702 | 2019.01.20 |
2019.01.01T00:00:10 | A11 | 549 | 447.406744 | 2019.01.21 |
2019.01.01T00:00:11 | A12 | 817 | 115.613373 | 2019.01.22 |
2019.01.01T00:00:12 | A13 | 321 | 298.317481 | 2019.01.23 |
2019.01.01T00:00:13 | A14 | 3 | 2.289171 | 2019.01.24 |
2019.01.01T00:00:14 | A15 | 586 | 91.841629 | 2019.01.25 |
2019.01.01T00:00:30 | A1 | 390 | 325.407485 | 2019.01.11 |
... | ... | ... | ... | ... |
Please refer to TableJoiners for more information.