# SQL Metaprogramming {#sql_metaprogramming}

SQL metaprogramming allows for the dynamic creation and manipulation of SQL statements, enabling queries to be defined, modified, or executed during runtime. DolphinDB includes two ways to dynamically generate SQL: with built-in functions or with macro variables.

**Parent topic:**[Metaprogramming](../../Programming/Metaprogramming/metaprogramming.md)

## Generating with Built-in Functions {#topic_umc_mr1_5cc}

In DolphinDB, SQL statements, particularly SELECT queries, can be broken down into several key components: select query body; table object; grouping fields; sorting fields; filtering conditions; the number of records to return.

To facilitate the dynamic generation of various SQL statements, DolphinDB provides specialized functions: `sql` \(for SELECT statements\), `sqlUpdate` \(for UPDATE statements\), `sqlDelete` \(for DELETE statements\). Each parameter of these functions corresponds to a specific clause in a SQL statement. Alternatively, you can use a combination of `parseExpr` and strings to generate metacode of SQL statements.

DolphinDB provides a suite of functions to facilitate dynamic SQL generation.

-   `sqlCol`: Generates metacode for selecting one or multiple columns with or without calculation. Aliases are supported. For example:

    ``` {#codeblock_g5k_nr1_5cc}
    // for a single column
    sqlCol("col")
    // output: <col>
    
    // for multiple columns
    sqlCol(["col0", "col1", ..., "colN"])
    // output: [<col0>, <col1>, ..., <colN>]
    
    // specify the applied function
    sqlCol("col", func = sum, alias = "newCol")
    // output: <sum(col) as newCol>
    
    // specify aliases
    sqlCol(["col0", "col1"], func=sum, alias=["newCol0", "newCol1"]) 
    // output： [<sum(col0) as newCol0>, <sum(col1) as newCol1>]
    ```

-   `sqlColAlias`: Uses metacode and an optional alias name to define a column. It can be used with `makeCall`/`makeUnifiedCall` to set alias for dynamic functions, i.e., `<func(cols.., args...)>`, or used with `expr`, `unifiedExpr`, and `binaryExpr` for multivariant expressions, such as `<a+b+c>`, `<a1*b1+a2*b2+... +an*bn>`.

    ``` {#codeblock_vyq_4r1_5cc}
    sqlColAlias(sqlCol("col"), "newCol") 
    // output: <col as newCol>
    
    sqlColAlias(makeCall(sum, sqlCol("col")), "newCol") 
    // output: <sum(col) as newCol>
    
    sqlColAlias(makeCall(corr, sqlCol("col0"), sqlCol("col1")), "newCol")
    // output: <corr(col0, col1) as newCol>
    ```

-   `parseExpr`: Converts strings into metacode. For example:

    ``` {#codeblock_ug4_pr1_5cc}
    parseExpr("select * from t") 
    // output: <select * from t>
    
    // generate metacode for the where condition used in the sql function
    parseExpr("time between 09:00:00 and 11:00:00")
    // output: < time between pair(09:00:00, 11:00:00) >
    ```


The following example demonstrates how to use the `sql` function to generate metacode of SQL statement, with a comprehensive breakdown of defining each component.

``` {#codeblock_zsr_qr1_5cc}
// the sample table
securityID = take(`GE,3) join take(`MSFT,3) join take(`F,3)
time=09:00:00 09:30:00 10:00:00 10:28:00 11:00:00 12:00:00 13:00:00 14:00:00 14:29:00
price=31.82 31.69 31.92 63.12 62.58 63.12 12.46 12.59 13.24 
volume=2300 3500 3700 1800 3800 6400 4200 5600 8900
t = table(securityID, time, price, volume);

// target SQL statement
select cumsum(price) as cumPrice from t
where time between 09:00:00 and 11:00:00
context by securityID csort time limit -1
```

Generate a dynamic SQL statement with each clause combined:

``` {#codeblock_br2_rr1_5cc}
selectCode = sqlColAlias(makeUnifiedCall(cumsum, sqlCol("price")), "cumPrice")
fromCode = "t"
whereCondition = parseExpr("time between 09:00:00 and 11:00:00")
contextByCol = sqlCol("securityID")
csortCol = sqlCol("time")
limitCount = -1
sql(select = selectCode, from = fromCode,
where = whereCondition, groupby = contextByCol,
groupFlag = 0, csort = csortCol, limit = limitCount)
```

Output:

``` {#codeblock_dzb_sr1_5cc}
< select cumsum(price) as cumPrice from objByName("t")
where time between pair(09:00:00, 11:00:00)
context by securityID csort time asc limit -1 >
```

The above code summarizes the rules for generating metacode with the `sql` function:

-   The query field must be declared with `sqlCol` or `sqlColAlias`.
-   Metacode for column calculations:
    -   Single-column: the `sqlCol` function specifies the *func* parameter.
    -   Multi-column: `sqlColAlias` function with `makeCall` or `makeUnifiedCall` function.
-   A table object can be a table variable name string, a table variable, or a handle returned by `loadTable`.

In a comparable manner, the `sqlUpdate` and `sqlDelete` functions can be employed to produce metacode for UPDATE and DELETE statements respectively.

``` {#codeblock_ebw_tr1_5cc}
// generate metacode of UPDATE statement
sqlUpdate(t, <price*2 as updatedPrice>)
// generate metacode of DELETE statement
sqlDelete(t, <time = 10:00:00>)
```

## Generating with `<select statement>` Using Macro Variables {#topic_xg3_5r1_5cc}

Since version 2.00.12/3.00.00, the use of macro variables is introduced for metaprogramming, allowing to generate metacode of SQL statements in a more intuitive manner through `<select statement>`.

The declaration of macro variables depends on whether the variable passed into the metacode is defined as a scalar or a vector:

-   For a scalar, use "\_$" \(for single-column macro variables\). For example, a variable is defined as `name="sym"`. The metacode can be `<SELECT _$name FROM t>`, which will be parsed as `SELECT sym FROM t` during execution.
-   For a vector, use "\_$$" \(for multi-column macro variables\). For example, a variable is defined as `names=["sym", "time"]`. The metacode can be `<SELECT _$$name FROM t>`, which will be parsed as `SELECT sym, time FROM t` during execution.

To make macro variable take effect, the following MUST be noted:

-   The variable name must be a string and variables must conform to the [Naming Rules](../Objects/Variable/Variable.md).
-   It can only be applied to columns, column aliases, parameters of functions, expressions, etc. Macro variables applied to functions and expressions are treated as a tuple, where each element corresponds to one column.
-   Currently, only SELECT statement is supported. Macro variables cannot applied to UPDATE and DELETE statements.
-   It can only be used with clauses WHERE, GROUP BY, PIVOT BY, CONTEXT BY, CSORT, and ORDER BY. Clauses such as CASE WHEN, OVER, and nested queries followed by FROM are not supported yet.
-   When used with CSORT and ORDER BY clauses, only single-column macro variable \(with "\_$"\) is supported.

For example, to obtain the same SQL query generated in the above section, you can use the following script with macro variables:

``` {#codeblock_m3w_zr1_5cc}
col = "price"
contextByCol = "securityID"
csortCol = "time"
a = 09:00:00
b = 11:00:00
<select cumsum(_$col) from t where _$csortCol between a and b
  context by _$contextByCol csort _$csortCol limit -1>
```

Another example of how to select the sum of multiple columns in a table. Given the data:

``` {#codeblock_qb5_1s1_5cc}
x1 = 1 3 5 7 11 16 23
x2 = 2 8 11 34 56 54 100
x3 = 8 12 81 223 501 699 521
y = 0.1 4.2 5.6 8.8 22.1 35.6 77.2;
t = table(y, x1, x2, x3)
```

-   Using the `sql` function.

    ``` {#codeblock_hny_bs1_5cc}
    name = [`y,`x1]
    alias = [`y1, `x11]
    sql(select = sqlCol(name, sum, alias), from = t).eval()
    ```

-   Using macro variables.

    ``` {#codeblock_jfs_cs1_5cc}
    name = [`y,`x1]
    alias = [`y1, `x11]
    <select sum:V(_$$name) as _$$alias from t>.eval()
    ```


## Examples {#topic_nln_ls1_5cc}

Examples for single-column macro variables.

\(1\) Used to select a single column.

``` {#codeblock_tc3_gcy_2bc}
t = table(take(2023.01M+1 2 1 1 2,5) as month, take(`Rome`London`London`Rome,5) as city, take(200 500 100 300 300, 5) as sold1, take(2100 1500 1100 3100 2300,5) as sold2)
selectCity="city"
<select _$selectCity from t>.eval()
```

Output:

|city|
|----|
|Rome|
|London|
|London|
|Rome|
|Rome|

\(2\) Used as the argument of a unary function to operate on a single column.

``` {#codeblock_ev1_jcy_2bc}
selectCity="city"
selectSold1="sold1"
alias="avg_sold1"
<select avg(_$selectSold1) as _$alias from t group by _$selectCity>.eval()
```

Output:

|city|avg\_sold1|
|----|----------|
|Rome|266.6667|
|London|300|

Examples for multi-column macro variables.

\(1\) Used to select multiple columns.

``` {#codeblock_fyv_kcy_2bc}
colNames=["city", "sold1"]
<select _$$colNames from t>.eval()
```

Output:

|city|sold1|
|----|-----|
|Rome|200|
|London|500|
|London|100|
|Rome|300|
|Rome|300|

\(2\) Each of the corresponding columns is passed as an argument to the function.

``` {#codeblock_ogp_mcy_2bc}
colNames=["sold1", "sold2"]
alias = "sum_sold"

<select rowSum(_$$colNames) as _$alias from t>.eval()
```

Output:

|sum\_sold|
|---------|
|2,300|
|2,000|
|1,200|
|3,400|
|2,600|

\(3\) Used with higher-order functions.

-   Apply a function to multiple columns by row. The following example calculates the sum for each row of column "sold1" and "sold2". It returns the same result as the example \(2\) above.

    ``` {#codeblock_z3g_4cy_2bc}
    colNames=["sold1", "sold2"]
    alias = "sum_sold"
    
    <select sum:H(_$$colNames) as _$alias from t>.eval()
    ```


-   Apply a function to multiple columns by column. The following example calculates the sum for columns "sold1" and "sold2".

    ``` {#codeblock_esb_qcy_2bc}
    colNames=["sold1", "sold2"]
    alias=["sum_sold1", "sum_sold2"]
    
    <select  sum:V(_$$colNames) as _$$alias from t>.eval()
    ```

    Output:

    |sum\_sold1|sum\_sold2|
    |----------|----------|
    |1,400|10,100|


