asof join

Syntax

aj(leftTable, rightTable, matchingCols, [rightMatchingCols])

Arguments

leftTable and rightTable are the tables to be joined.

matchingCols is a STRING scalar/vector indicating matching columns.

rightMatchingCols (optional) is a STRING scalar/vector indicating all the matching columns in rightTable . This argument must be specified if at least one of the matching columns has different names in leftTable and rightTable. The joining column names in the result will be the joining column names from the left table.

Details

The asof join function is used in non-synchronous join. It is similar to the left join function with the following differences:

  • Assume the last matching column is "time". For a row in the left table with time=t, among the rows in the right table that match all other matching columns, if there is not a record with time=t, select the last row before time=t.
  • If there is only 1 joining column, the asof join function assumes the right table is sorted on the joining column. If there are multiple joining columns, the asof join function assumes the right table is sorted on the last joining column within each group defined by the other joining columns. The right table does not need to be sorted by the other joining columns. If these conditions are not met, unexpected results may be returned. The left table does not need to be sorted.

Note:

  • If the left table of asof join is not a DFS table, its right table cannot be a DFS table either.
  • The data type of the last matching column is usually of temporal types. It can also be of integral types, UUID or IPADDR type.
  • If either the left table or the right table is a partitioned table, the joining columns except the last one must include all of the partitioning columns of the partitioned tables.

Examples

t1 = table(2015.01.01+(0 31 59 90 120) as date, 1.2 7.8 4.6 5.1 9.5 as value)
t2 = table(2015.02.01+(0 15 89 89) as date, 1..4 as qty);
t1;
date value
2015.01.01 1.2
2015.02.01 7.8
2015.03.01 4.6
2015.04.01 5.1
2015.05.01 9.5
t2;
date qty
2015.02.01 1
2015.02.16 2
2015.05.01 3
2015.05.01 4
select * from lsj(t1, t2, `date);
date value qty
2015.01.01 1.2
2015.02.01 7.8 1
2015.03.01 4.6
2015.04.01 5.1
2015.05.01 9.5 3
select * from aj(t1, t2, `date);
date value t2_date qty
2015.01.01 1.2
2015.02.01 7.8 2015.02.01 1
2015.03.01 4.6 2015.02.16 2
2015.04.01 5.1 2015.02.16 2
2015.05.01 9.5 2015.05.01 4
select * from aj(t1, t2, `date) where t1.date>=2015.03.01;
date value t2_date qty
2015.03.01 4.6 2015.02.16 2
2015.04.01 5.1 2015.02.16 2
2015.05.01 9.5 2015.05.01 4

A common usage of asof join is to join on the time field to retrieve the latest information. Suppose we have the following 3 tables, where the data have all been sorted on the column minute .

minute = 09:30m 09:32m 09:33m 09:35m
price = 174.1 175.2 174.8 175.2
t1 = table(minute, price)

minute = 09:30m 09:31m 09:33m 09:34m
price = 29.2 28.9 29.3 30.1
t2 = table(minute, price)

minute =09:30m 09:31m 09:34m 09:36m
price = 51.2 52.4 51.9 52.8
t3 = table(minute, price);

t1;
minute price
09:30m 174.1
09:32m 175.2
09:33m 174.8
09:35m 175.2
t2;
minute price
09:30m 29.2
09:31m 28.9
09:33m 29.3
09:34m 30.1
t3;
minute price
09:30m 51.2
09:31m 52.4
09:34m 51.9
09:36m 52.8
t2 = aj(t2, t3, `minute);
t2;
minute price t3_minute t3_price
09:30m 29.2 09:30m 51.2
09:31m 28.9 09:31m 52.4
09:33m 29.3 09:31m 52.4
09:34m 30.1 09:34m 51.9
aj(t1, t2, `minute);
minute price t2_minute t2_price t3_minute t3_price
09:30m 174.1 09:30m 29.2 09:30m 51.2
09:32m 175.2 09:31m 28.9 09:31m 52.4
09:33m 174.8 09:33m 29.3 09:31m 52.4
09:35m 175.2 09:34m 30.1 09:34m 51.9

Note that there are no matching records for t2 and t3 at 09:32m, so the prices at 09:31m from table t2 and t3, which are the latest, are applied. The price from table t3 at 09:33m and the prices from table t2 and t3 at 09:35m also use the latest prices to replace the missing records. This is extremely useful when we populate data for certain time points. For example, certain information is only updated weekly or monthly. In designing a daily trading strategy, we can use asof join to populate the daily datasets with less frequently updated information.

The last joining column is of type UUID:

t1 = table(2015.01.01 2015.02.01 2015.03.01 2015.04.01 2015.05.01 as date,  uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87", "5d212a78-cc48-e3b1-4235-b4d91473ee89"]) as uid)
t2 = table(2015.01.15 2015.01.20 2015.01.25 2015.03.01 as date,uuid(["5d212a78-cc48-e3b1-4235-b4d91473ee81", "5d212a78-cc48-e3b1-4235-b4d91473ee83", "5d212a78-cc48-e3b1-4235-b4d91473ee85", "5d212a78-cc48-e3b1-4235-b4d91473ee87"]) as uid)
select * from aj(t1, t2, `uid);
date uid t2_date t2_uid
2015.01.01 5d212a78-cc48-e3b1-4235-b4d91473ee81 2015.01.15 5d212a78-cc48-e3b1-4235-b4d91473ee81
2015.02.01 5d212a78-cc48-e3b1-4235-b4d91473ee83 2015.01.20 5d212a78-cc48-e3b1-4235-b4d91473ee83
2015.03.01 5d212a78-cc48-e3b1-4235-b4d91473ee85 2015.01.25 5d212a78-cc48-e3b1-4235-b4d91473ee85
2015.04.01 5d212a78-cc48-e3b1-4235-b4d91473ee87 2015.03.01 5d212a78-cc48-e3b1-4235-b4d91473ee87
2015.05.01 5d212a78-cc48-e3b1-4235-b4d91473ee89 2015.03.01 5d212a78-cc48-e3b1-4235-b4d91473ee87