# createEquiJoinEngine {#createequijoinengine}


## Syntax {#syntax}

`createEquiJoinEngine(name, leftTable, rightTable, outputTable, metrics, matchingColumn, timeColumn, [garbageSize=5000], [maxDelayedTime], [snapshotDir], [snapshotIntervalInMsgCount])`

Alias: createEqualJoinEngine

## Details {#details}

Create an equi join streaming engine. Streams are ingested into the engine through the left and right tables and joined on *matchingColumn* and *timeColumn*. The function returns a table object that holds the equi join result of the left table and the right table, that is, all records with matching values.

The equi join engine is designed to process keyed streams that have the join columns and the time column as the key. Because it does not cache all historical data, a newly ingested record may fail to find a match in the other table if the matching record has already been removed from the cache. In this respect its behavior differs from a traditional SQL equi join.

For more application scenarios, see [Streaming Engines](../../Streaming/streaming_engines.md).

## Calculation Rules {#calculation-rules}

When data is ingested into one table, the equi join streaming engine searches for records with matching values in the other table. If matches are found, the engine outputs the combined records with additional columns holding the calculation results of *metrics*.
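The matching rule can be illustrated with a minimal sketch that feeds the engine directly through `appendForJoin` instead of through subscriptions. The engine name `ejDemo` and the table names `lt`, `rt`, and `res` are placeholders chosen for this illustration, and the sketch assumes that plain in-memory tables are acceptable as schema templates and as the output table.

```
// Placeholder tables: lt and rt only describe the input schemas, res receives the results
lt = table(1:0, `time`sym`price, [SECOND, SYMBOL, DOUBLE])
rt = table(1:0, `time`sym`val, [SECOND, SYMBOL, DOUBLE])
res = table(100:0, `time`sym`price`val, [SECOND, SYMBOL, DOUBLE, DOUBLE])
ejDemo = createEquiJoinEngine("ejDemo", lt, rt, res, [<price>, <val>], `sym, `time)

// Ingest two records into the left side; the right side is still empty at this point
appendForJoin(ejDemo, true, table([13:30:10, 13:30:11] as time, `AAPL`IBM as sym, [10.0, 20.0] as price))
// Ingest two records into the right side; only the AAPL rows share both sym and time,
// the IBM rows have the same sym but different times
appendForJoin(ejDemo, false, table([13:30:10, 13:30:12] as time, `AAPL`IBM as sym, [5.0, 6.0] as val))

// Inspect what the engine has written to the output table
select * from res
```

Records that differ in either *matchingColumn* or *timeColumn* are not expected to be combined, since the join uses both columns together.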

## Arguments {#arguments}

Some parameters of the equi join engine are the same as those of the asof join engine. For details, see [createAsofJoinEngine](createAsofJoinEngine.md). The parameters that differ are described below:

**name** is a string indicating the name of the equi join streaming engine. It is the unique identifier of the engine on a data/compute node. It can contain letters, numbers and underscores and must start with a letter.

**timeColumn** is a STRING scalar or vector indicating the time columns in the left table and the right table. The time columns in the left and right tables must have the same data type. When the two time columns have the same column name, *timeColumn* is a string scalar; otherwise, *timeColumn* is a vector of two strings.
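As a sketch of the two-string form, the following snippet uses hypothetical schemas whose time columns are named `ltime` and `rtime`. It assumes that the first string refers to the left table and the second to the right table; the engine name `ejTimeCols` and all table names are placeholders.

```
// Hypothetical schemas: the time column is named differently on each side
lt2 = table(1:0, `ltime`sym`price, [SECOND, SYMBOL, DOUBLE])
rt2 = table(1:0, `rtime`sym`val, [SECOND, SYMBOL, DOUBLE])
res2 = table(100:0, `time`sym`price`val, [SECOND, SYMBOL, DOUBLE, DOUBLE])

// Both time columns are of type SECOND, which satisfies the same-type requirement.
// Assumption: the order of the two names is left table first, right table second.
ejTimeCols = createEquiJoinEngine("ejTimeCols", lt2, rt2, res2, [<price>, <val>], `sym, `ltime`rtime)
```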

**garbageSize** \(optional\) is a positive integer with a default value of 5,000 \(rows\). When the number of rows of historical data in memory exceeds *garbageSize*, the system removes historical data that is not needed for the current calculation if either of the following conditions is met:

-   The historical data has already been joined and returned.

-   The historical data has not been joined, but the timestamp difference between it and the newly arriving data in the left/right table exceeds *maxDelayedTime*.


**maxDelayedTime** \(optional\) is a positive integer with a default value of 3 \(seconds\), indicating the maximum time that cached data is kept in the engine. This parameter takes effect only when the conditions described in *garbageSize* are met. Setting *maxDelayedTime* too small is not recommended, because data may be removed before it is joined.
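The two cleanup parameters can be set explicitly when the engine is created. The sketch below passes them as keyword arguments, using the parameter names from the syntax line. The values 20000 and 10 are arbitrary illustrations rather than recommendations, and the engine name `ejTuned` and the table names are placeholders.

```
// Placeholder schema tables and output table
lt3 = table(1:0, `time`sym`price, [SECOND, SYMBOL, DOUBLE])
rt3 = table(1:0, `time`sym`val, [SECOND, SYMBOL, DOUBLE])
res3 = table(100:0, `time`sym`price`val, [SECOND, SYMBOL, DOUBLE, DOUBLE])

// Start cleanup only after more than 20000 cached rows, and keep unjoined records longer
// than the default. Illustrative values only: tune them to the data rate and to the
// expected delay between the two streams.
ejTuned = createEquiJoinEngine(name="ejTuned", leftTable=lt3, rightTable=rt3, outputTable=res3, metrics=[<price>, <val>], matchingColumn=`sym, timeColumn=`time, garbageSize=20000, maxDelayedTime=10)
```

A larger *garbageSize* delays cleanup at the cost of memory, while a larger *maxDelayedTime* gives late records more time to find a match. The note in Example 2 suggests that the unit of *maxDelayedTime* follows the precision of the time column, so verify the unit against the type of your time column before choosing a value.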

## Examples {#examples}

Example 1.

```
share streamTable(1:0, `time`sym`price, [SECOND, SYMBOL, DOUBLE]) as leftTable
share streamTable(1:0, `time`sym`val, [SECOND, SYMBOL, DOUBLE]) as rightTable
share table(100:0, `time`sym`price`val`total, [SECOND, SYMBOL, DOUBLE, DOUBLE, DOUBLE]) as output
ejEngine=createEquiJoinEngine("test1", leftTable, rightTable, output, [<price>, <val>, <price*val>], `sym, `time)
subscribeTable(tableName="leftTable", actionName="joinLeft", offset=0, handler=appendForJoin{ejEngine, true}, msgAsTable=true)
subscribeTable(tableName="rightTable", actionName="joinRight", offset=0, handler=appendForJoin{ejEngine, false}, msgAsTable=true)

tmp1=table(13:30:10+1..20 as time, take(`AAPL, 10) join take(`IBM, 10) as sym, double(1..20) as price)
leftTable.append!(tmp1)
tmp2=table(13:30:10+1..20 as time, take(`AAPL, 10) join take(`IBM, 10) as sym, double(50..31) as val)
rightTable.append!(tmp2)

select count(*) from output
// output: 20
```
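Example 2 reuses the engine name `test1` and the shared table names `leftTable`, `rightTable`, and `output`, so running it in the same session is likely to fail unless the objects from Example 1 are released first. A cleanup sketch, assuming that nothing else subscribes to these tables:

```
// Stop feeding the engine before dropping it
unsubscribeTable(tableName="leftTable", actionName="joinLeft")
unsubscribeTable(tableName="rightTable", actionName="joinRight")

// Release the engine so that the name "test1" can be reused
dropStreamEngine("test1")

// Remove the shared tables created in Example 1
undef(`leftTable, SHARED)
undef(`rightTable, SHARED)
undef(`output, SHARED)
```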

Example 2. The type of *timeColumn* is TIMESTAMP. The default value of *maxDelayedTime* is 3000 ms \(3 s\).

```
share streamTable(5000000:0, `timestamp`sym`price, [TIMESTAMP, SYMBOL, DOUBLE]) as leftTable
share streamTable(5000000:0, `timestamp`sym`val, [TIMESTAMP, SYMBOL, DOUBLE]) as rightTable
share table(5000000:0, `timestamp`sym`price`val`total`diff`ratio, [TIMESTAMP, SYMBOL, DOUBLE, DOUBLE, DOUBLE, DOUBLE, DOUBLE]) as output
ejEngine=createEquiJoinEngine("test1", leftTable, rightTable, output, <[price, val, price+val, price-val, price/val]>, `sym, `timestamp, 5000)
topic1=subscribeTable(tableName="leftTable", actionName="writeLeft", offset=0, handler=appendForJoin{ejEngine, true}, batchSize=10000, throttle=1)
topic2=subscribeTable(tableName="rightTable", actionName="writeRight", offset=0, handler=appendForJoin{ejEngine, false}, batchSize=10000, throttle=1)
def writeLeftTable(mutable tb){
    batch = 1000
    for(i in 1..300){
        tmp = table(batch:batch, `timestamp`sym`price, [TIMESTAMP, SYMBOL, DOUBLE])
        tmp[`timestamp]=take(2012.01.01T00:00:00.000+i, batch)
        tmp[`sym]=shuffle("A"+string(1..batch))
        tmp[`price]=rand(100.0, batch)
        tb.append!(tmp)
    }
}

def writeRightTable(mutable tb){
    batch = 500
    for(i in 1..200){
        tmp = table(batch:batch, `timestamp`sym`val, [TIMESTAMP, SYMBOL, DOUBLE])
        tmp[`timestamp]=take(2012.01.01T00:00:00.000+i, batch)
        tmp[`sym]=shuffle("A"+string(1..batch))
        tmp[`val]=rand(100.0, batch)
        tb.append!(tmp)
    }
}

job1 = submitJob("writeLeft", "", writeLeftTable, leftTable)
job2 = submitJob("writeRight", "", writeRightTable, rightTable)

// Run this query after both jobs have finished and the engine has processed all messages
select count(*) from output
// output: 100000
```

