# createAsofJoinEngine {#createasofjoinengine}


## Syntax {#syntax}

`createAsofJoinEngine(name, leftTable, rightTable, outputTable, metrics, matchingColumn, [timeColumn], [useSystemTime=false], [delayedTime], [garbageSize], [sortByTime], [outputElapsedMicroseconds=false], [snapshotDir], [snapshotIntervalInMsgCount])`

## Details {#details}

Creates an asof join streaming engine. Streams are ingested into the left table and the right table and joined on *matchingColumn* and *timeColumn* \(or the system time\). Each record in the left table is joined with the right table record \(1\) that has the same *matchingColumn* value and \(2\) whose timestamp is the latest among those less than or equal to the timestamp of the left table record. The function returns a table object holding the asof join results.

The asof join engine joins records that have no exact match on the time columns. For each timestamp in the left table, the engine obtains the latest \(i.e., current as of that timestamp\) value from the right table.
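
The matching rule can be checked offline with the built-in `aj` function on ordinary in-memory tables. The following is a minimal sketch and not part of the engine itself; the table names `t` and `q` are hypothetical, and the sample values mirror the first rows of the data used in Examples.

```
// Hypothetical sample data: trades (left) and quotes (right), both sorted by time
t = table(2020.08.27T09:30:00.000+2 8 20 as time, `A`B`A as sym, 20.01 20.04 20.07 as price)
q = table(2020.08.27T09:30:00.000+1 5 6 11 19 as time, `A`B`A`B`A as sym, 20 20.02 20.03 20.05 20.06 as bid)

// Asof join on sym, then on time (the last matching column is the time column).
// By the rule above, each trade should pick up the latest quote of the same sym
// whose time is less than or equal to the trade time:
//   A at .002 -> quote at .001
//   B at .008 -> quote at .005
//   A at .020 -> quote at .019
select time, sym, price, bid from aj(t, q, `sym`time)
```

Unlike `aj`, the engine applies this rule incrementally as records arrive, which is why the trigger conditions in the note below matter.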

Note:

-   The records in the left table and the right table must be sequenced by time.

-   If *delayedTime* is not specified, a join operation is only triggered when the right table receives a record whose timestamp is greater than the timestamp of the latest record in the left table.

-   If *delayedTime* is specified, a join operation is triggered when either of the following conditions is met:

    -   In the left table, the difference between the timestamp of the latest record and the timestamp of the previous unjoined record is greater than *delayedTime*.

    -   The record is still not joined after 2 \* *delayedTime* or 2 seconds, whichever is larger, since its ingestion into the left table.


For more application scenarios, see [Streaming Engines](../../Streaming/streaming_engines.md).

## Arguments {#arguments}

**name** is a string indicating the name of the asof join engine. It is the unique identifier of the engine on a data/compute node. It can contain letters, numbers and underscores and must start with a letter.

**leftTable** and **rightTable** are table objects whose schemas must be the same as those of the stream tables to which the engine subscribes. Since version 2.00.11, array vectors are allowed for *leftTable* and *rightTable*.

**outputTable** is a table to which the engine inserts the calculation results. It can be an in-memory table or a DFS table. Before calling the function, an empty table with the specified column names must be created. Since version 2.00.11, array vectors are allowed for *outputTable*.

The columns of *outputTable* are in the following order:

\(1\) The first column must be a temporal column.

-   if *useSystemTime* = true, the data type must be TIMESTAMP;

-   if *useSystemTime* = false, it has the same data type as *timeColumn*.


\(2\) Next come one or more columns on which the tables are joined, arranged in the same order as specified in *matchingColumn*.

\(3\) These are followed by one or more columns holding the calculation results of *metrics*.
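
As a sketch of this column layout, assume *useSystemTime* = false, a TIMESTAMP time column, a single SYMBOL matching column named sym, and three DOUBLE metrics. The table name `joinOutput` and the column names are hypothetical.

```
// (1) time column: same type as timeColumn (TIMESTAMP here; always TIMESTAMP if useSystemTime=true)
// (2) matching column(s), in the order given in matchingColumn
// (3) one column per metric, e.g. metrics=<[price, bid, ask]>
joinOutput = table(100:0, `time`sym`price`bid`ask, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE])
```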

**metrics** is metacode \(can be a tuple\) specifying the calculation formulas. For more information about metacode, refer to [Metaprogramming](https://test.dolphindb.cn/en/Programming/Metaprogramming/MetacodeWithFunction.html).

-   *metrics* can use one or more expressions, built-in or user-defined functions \(but not aggregate functions\), or a constant scalar/vector. Note that the output column for a constant vector must be in array vector form.

-   *metrics* can be functions with multiple returns and the columns in the output table to hold the return values must be specified. For example, &lt;func\(price\) as \`col1\`col2&gt;.

-   To specify a column that exists in both the left and the right tables, use the format *tableName.colName*.

-   The column names specified in *metrics* are not case-sensitive, so their case does not have to match that of the column names in the input tables.


**matchingColumn** is a STRING scalar/vector/tuple indicating the column\(s\) on which the tables are joined. It supports integral, temporal or literal \(except UUID\) types.

-   When there is only 1 column to match - If the names of the columns to match are the same in both tables, *matchingColumn* should be specified as a STRING scalar; otherwise it's a tuple of two elements. For example, if the column is named "sym" in the left table and "sym1" in the right table, then *matchingColumn* = \[\[\`sym\],\[\`sym1\]\].
-   When there are multiple columns to match - If both tables share the names of all columns to match, *matchingColumn* is a STRING vector; otherwise it's a tuple of two elements. For example, if the columns are named "orderNo" and "sym" in the left table, whereas in the right table they're named "orderNo" and "sym1", then *matchingColumn* = \[\[\`orderNo, \`sym\], \[\`orderNo,\`sym1\]\].
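
The forms above can be written out as follows. This is a minimal sketch: the variable names, the table names \(`tradesDemo`, `quotesDemo`, `joinedDemo`\) and the engine name `ajDemo` are hypothetical and chosen only for illustration.

```
// Same column names in both tables: a STRING scalar (one column) or vector (several columns)
matchOne = `sym
matchMany = `orderNo`sym

// Different names: a tuple of two elements, left-table names first, then right-table names
matchOneRenamed = [[`sym], [`sym1]]
matchManyRenamed = [[`orderNo, `sym], [`orderNo, `sym1]]

// Hypothetical engine joining the left column sym with the right column sym1
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as tradesDemo
share streamTable(1:0, `time`sym1`bid, [TIMESTAMP, SYMBOL, DOUBLE]) as quotesDemo
share table(100:0, `time`sym`price`bid, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE]) as joinedDemo
ajDemo = createAsofJoinEngine(name="ajDemo", leftTable=tradesDemo, rightTable=quotesDemo, outputTable=joinedDemo, metrics=<[price, bid]>, matchingColumn=matchOneRenamed, timeColumn=`time)
```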

**timeColumn** \(optional\) specifies the name of the time column in the left table and the right table. The time columns must have the same data type. When *useSystemTime* = false, it must be specified. If the names of the time column in the left table and the right table are the same, *timeColumn* is a string. Otherwise, it is a vector of 2 strings indicating the time column in each table.

**useSystemTime** \(optional\) indicates whether the left table and the right table are joined on the system time, instead of on *timeColumn*.

-   *useSystemTime* = true: join records based on the system time \(timestamp with millisecond precision\) when they are ingested into the engine.
-   *useSystemTime* = false \(default\): join records based on the specified *timeColumn* from the left table and the right table.

**delayedTime** \(optional\) is a positive integer with the same precision as *timeColumn*, indicating the maximum time to wait before the engine joins an uncalculated record in the left table with a right table record. To specify *delayedTime*, *timeColumn* must be specified. For more information, see Details.
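
A minimal sketch of an engine with *delayedTime* set, assuming TIMESTAMP time columns so that the value is expressed in milliseconds. The table names and the engine name `ajDelayed` are hypothetical.

```
// Hypothetical tables; the time columns are TIMESTAMP, so delayedTime is in milliseconds
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as tradesDelayed
share streamTable(1:0, `time`sym`bid, [TIMESTAMP, SYMBOL, DOUBLE]) as quotesDelayed
share table(100:0, `time`sym`price`bid, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE]) as joinedDelayed

// delayedTime=500: an unjoined left-table record is joined once a later left-table
// record arrives whose timestamp exceeds it by more than 500 ms
ajDelayed = createAsofJoinEngine(name="ajDelayed", leftTable=tradesDelayed, rightTable=quotesDelayed, outputTable=joinedDelayed, metrics=<[price, bid]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false, delayedTime=500)
```

With this setting, the second trigger condition in Details works out to 2 seconds after ingestion, because 2 \* 500 ms is smaller than the 2-second floor.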

**garbageSize** \(optional\) is a positive integer with a default value of 5,000 \(rows\). As subscribed data is ingested into the engine, it keeps occupying memory. Within the left/right table, records are grouped by *matchingColumn* values; when the number of records in a group exceeds *garbageSize*, the system removes from memory the records that have already been calculated.

**sortByTime** \(optional\) is a Boolean value that indicates whether the output data is globally sorted by time. The default value is false, meaning the output data is sorted only within groups. Note that if *sortByTime* is set to true, the parameter *delayedTime* cannot be specified, and the data input to the left and right tables must be globally sorted.

To enable snapshots for the streaming engine, specify the parameters *snapshotDir* and *snapshotIntervalInMsgCount*.

**snapshotDir** \(optional\) is a string indicating the directory where the streaming engine snapshot is saved. The directory must already exist; otherwise an exception is thrown. If *snapshotDir* is specified, the system checks whether a snapshot already exists in the directory when creating a streaming engine. If it exists, the snapshot is loaded to restore the engine state. Multiple streaming engines can share a directory, in which the snapshot files are named after the engines.

The file extension of a snapshot can be:

-   *&lt;engineName&gt;.tmp*: a temporary snapshot
-   *&lt;engineName&gt;.snapshot*: a snapshot that is generated and flushed to disk
-   *&lt;engineName&gt;.old*: if a snapshot with the same name already exists, the previous snapshot is renamed to *&lt;engineName&gt;.old*.

**snapshotIntervalInMsgCount** \(optional\) is a positive integer indicating the number of messages to receive before the next snapshot is saved.

## Examples {#examples}

```
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as trades
share streamTable(1:0, `time`sym`bid`ask, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE]) as quotes
share table(100:0, `time`sym`price`bid`ask`spread, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE, DOUBLE]) as prevailingQuotes

ajEngine=createAsofJoinEngine(name="aj1", leftTable=trades, rightTable=quotes, outputTable=prevailingQuotes, metrics=<[price, bid, ask, abs(price-(bid+ask)/2)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false)
tmp1=table(2020.08.27T09:30:00.000+2 8 20 22 23 24 as time, take(`A`B, 6) as sym, 20.01 20.04 20.07 20.08 20.4 20.5 as price)
tmp2=table(2020.08.27T09:30:00.000+1 5 6 11 19 20 21 as time, take(`A`B, 7) as sym, 20 20.02 20.03 20.05 20.06 20.6 20.4 as bid,  20.01 20.03 20.04 20.06 20.07 20.5 20.6 as ask)
tmp1.sortBy!(`time)
tmp2.sortBy!(`time)

subscribeTable(tableName="trades", actionName="joinLeft", offset=0, handler=appendForJoin{ajEngine, true}, msgAsTable=true)
subscribeTable(tableName="quotes", actionName="joinRight", offset=0, handler=appendForJoin{ajEngine, false}, msgAsTable=true)

trades.append!(tmp1)
quotes.append!(tmp2)

sleep(100)
select time, sym, bid from prevailingQuotes

/* output
time	sym	bid
2020.08.27T09:30:00.002	A	20
2020.08.27T09:30:00.020	A	20.06
2020.08.27T09:30:00.008	B	20.02
*/
```

```
// clean up the environment: unsubscribe first, then drop the engine and the shared tables
unsubscribeTable(tableName="trades", actionName="joinLeft")
unsubscribeTable(tableName="quotes", actionName="joinRight")
dropStreamEngine("aj1")
undef(`trades,SHARED)
undef(`quotes,SHARED)
undef(`prevailingQuotes,SHARED)

//define an asof join engine and set sortByTime=true
share streamTable(1:0, `time`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as trades
share streamTable(1:0, `time`sym`bid`ask, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE]) as quotes
share table(100:0, `time`sym`price`bid`ask`spread, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE, DOUBLE]) as prevailingQuotes
ajEngine=createAsofJoinEngine(name="aj1", leftTable=trades, rightTable=quotes, outputTable=prevailingQuotes, metrics=<[price, bid, ask, abs(price-(bid+ask)/2)]>, matchingColumn=`sym, timeColumn=`time, useSystemTime=false, sortByTime=true)

tmp1=table(2020.08.27T09:30:00.000+2 8 20 22 23 24 as time, take(`A`B, 6) as sym, 20.01 20.04 20.07 20.08 20.4 20.5 as price)
tmp2=table(2020.08.27T09:30:00.000+1 5 6 11 19 20 21 as time, take(`A`B, 7) as sym, 20 20.02 20.03 20.05 20.06 20.6 20.4 as bid,  20.01 20.03 20.04 20.06 20.07 20.5 20.6 as ask)
tmp1.sortBy!(`time)
tmp2.sortBy!(`time)

//only appendForJoin can be used to insert data
subscribeTable(tableName="trades", actionName="joinLeft", offset=0, handler=appendForJoin{ajEngine, true}, msgAsTable=true)
subscribeTable(tableName="quotes", actionName="joinRight", offset=0, handler=appendForJoin{ajEngine, false}, msgAsTable=true)

trades.append!(tmp1)
quotes.append!(tmp2)

sleep(100)

//check the output table
select time, sym, bid from prevailingQuotes

/* output
time                   sym   bid
2020.08.27T09:30:00.002      A       20
2020.08.27T09:30:00.008      B       20.02
2020.08.27T09:30:00.020      A       20.06
*/
```

