SQL Metaprogramming

SQL metaprogramming allows for the dynamic creation and manipulation of SQL statements, enabling queries to be defined, modified, or executed during runtime. DolphinDB includes two ways to dynamically generate SQL: with built-in functions or with macro variables.

Generating with Built-in Functions

In DolphinDB, SQL statements, particularly SELECT queries, can be broken down into several key components: select query body; table object; grouping fields; sorting fields; filtering conditions; the number of records to return.

To facilitate the dynamic generation of various SQL statements, DolphinDB provides specialized functions: sql (for SELECT statements), sqlUpdate (for UPDATE statements), sqlDelete (for DELETE statements). Each parameter of these functions corresponds to a specific clause in a SQL statement. Alternatively, you can use a combination of parseExpr and strings to generate metacode of SQL statements.

DolphinDB provides a suite of functions to facilitate dynamic SQL generation.

  • sqlCol: Generates metacode for selecting one or multiple columns with or without calculation. Aliases are supported. For example:
    // for a single column
    sqlCol("col")
    // output: <col>
    
    // for multiple columns
    sqlCol(["col0", "col1", ..., "colN"])
    // output: [<col0>, <col1>, ..., <colN>]
    
    // specify the applied function
    sqlCol("col", func = sum, alias = "newCol")
    // output: <sum(col) as newCol>
    
    // specify aliases
    sqlCol(["col0", "col1"], func=sum, alias=["newCol0", "newCol1"]) 
    // output: [<sum(col0) as newCol0>, <sum(col1) as newCol1>]
  • sqlColAlias: Uses metacode and an optional alias name to define a column. It can be used with makeCall/makeUnifiedCall to set alias for dynamic functions, i.e., <func(cols.., args...)>, or used with expr, unifiedExpr, and binaryExpr for multivariant expressions, such as <a+b+c>, <a1*b1+a2*b2+... +an*bn>.
    sqlColAlias(sqlCol("col"), "newCol") 
    // output: <col as newCol>
    
    sqlColAlias(makeCall(sum, sqlCol("col")), "newCol") 
    // output: <sum(col) as newCol>
    
    sqlColAlias(makeCall(corr, sqlCol("col0"), sqlCol("col1")), "newCol")
    // output: <corr(col0, col1) as newCol>
  • parseExpr: Converts strings into metacode. For example:
    parseExpr("select * from t") 
    // output: <select * from t>
    
    // generate metacode for the where condition used in the sql function
    parseExpr("time between 09:00:00 and 11:00:00")
    // output: < time between pair(09:00:00, 11:00:00) >
The following example demonstrates how to use the sql function to generate metacode of SQL statement, with a comprehensive breakdown of defining each component.
// the sample table
securityID = take(`GE,3) join take(`MSFT,3) join take(`F,3)
time=09:00:00 09:30:00 10:00:00 10:28:00 11:00:00 12:00:00 13:00:00 14:00:00 14:29:00
price=31.82 31.69 31.92 63.12 62.58 63.12 12.46 12.59 13.24 
volume=2300 3500 3700 1800 3800 6400 4200 5600 8900
t = table(securityID, time, price, volume);

// target SQL statement
select cumsum(price) as cumPrice from t
where time between 09:00:00 and 11:00:00
context by securityID csort time limit -1
Generate a dynamic SQL statement with each clause combined:
selectCode = sqlColAlias(makeUnifiedCall(cumsum, sqlCol("price")), "cumPrice")
fromCode = "t"
whereCondition = parseExpr("time between 09:00:00 and 11:00:00")
contextByCol = sqlCol("securityID")
csortCol = sqlCol("time")
limitCount = -1
sql(select = selectCode, from = fromCode,
where = whereCondition, groupby = contextByCol,
groupFlag = 0, csort = csortCol, limit = limitCount)
Output:
< select cumsum(price) as cumPrice from objByName("t")
where time between pair(09:00:00, 11:00:00)
context by securityID csort time asc limit -1 >
The above code summarizes the rules for generating metacode with the sql function:
  • The query field must be declared with sqlCol or sqlColAlias.
  • Metacode for column calculations:
    • Single-column: the sqlCol function specifies the func parameter.
    • Multi-column: sqlColAlias function with makeCall or makeUnifiedCall function.
  • A table object can be a table variable name string, a table variable, or a handle returned by loadTable.
In a comparable manner, the sqlUpdate and sqlDelete functions can be employed to produce metacode for UPDATE and DELETE statements respectively.
// generate metacode of UPDATE statement
sqlUpdate(t, <price*2 as updatedPrice>)
// generate metacode of DELETE statement
sqlDelete(t, <time = 10:00:00>)

Generating with <select statement> Using Macro Variables

Since verison 2.00.12/3.00.00, the use of macro variables is introduced for metaprogramming, allowing to generate metacode of SQL statements in a more intuitive manner through <select statement>.

The declaration of macro variables depends on whether the variable passed into the metacode is defined as a scalar or a vector:
  • For a scalar, use "_$" (for single-column macro variables). For example, a variable is defined as name="sym". The metacode can be <SELECT _$name FROM t>, which will be parsed as SELECT sym FROM t during execution.
  • For a vector, use "_$$" (for multi-column macro variables). For example, a variable is defined as names=["sym", "time"]. The metacode can be <SELECT _$$name FROM t>, which will be parsed as SELECT sym, time FROM t during execution.
To make macro variable take effect, the following MUST be noted:
  • The variable name must be a string and variables must conform to the Naming Rules.
  • It can only be applied to columns, column aliases, parameters of functions, expressions, etc. Macro variables applied to functions and expressions are treated as a tuple, where each element corresponds to one column.
  • Currently, only SELECT statement is supported. Macro variables cannot applied to UPDATE and DELETE statements.
  • It can only be used with clauses WHERE, GROUP BY, PIVOT BY, CONTEXT BY, CSORT, and ORDER BY. Clauses such as CASE WHEN, OVER, and nested queries followed by FROM are not supported yet.
  • When used with CSORT and ORDER BY clauses, only single-column macro variable (with "_$") is supported.
For example, to obtain the same SQL query generated in the above section, you can use the following script with macro variables:
col = "price"
contextByCol = "securityID"
csortCol = "time"
a = 09:00:00
b = 11:00:00
<select cumsum(_$col) from t where _$csortCol between a and b
  context by _$contextByCol csort _$csortCol limit -1>
Another example of how to select the sum of multiple columns in a table. Given the data:
x1 = 1 3 5 7 11 16 23
x2 = 2 8 11 34 56 54 100
x3 = 8 12 81 223 501 699 521
y = 0.1 4.2 5.6 8.8 22.1 35.6 77.2;
t = table(y, x1, x2, x3)
  • Using the sql function.
    name = [`y,`x1]
    alias = [`y1, `x11]
    sql(select = sqlCol(name, sum, alias), from = t).eval()
  • Using macro variables.
    name = [`y,`x1]
    alias = [`y1, `x11]
    <select sum:V(_$$name) as _$$alias from t>.eval()

Examples

Examples for single-column macro variables.

(1) Used to select a single column.

t = table(take(2023.01M+1 2 1 1 2,5) as month, take(`Rome`London`London`Rome,5) as city, take(200 500 100 300 300, 5) as sold1, take(2100 1500 1100 3100 2300,5) as sold2)
selectCity="city"
<select _$selectCity from t>.eval()

Output:

city
Rome
London
London
Rome
Rome

(2) Used as the argument of a unary function to operate on a single column.

selectCity="city"
selectSold1="sold1"
alias="avg_sold1"
<select avg(_$selectSold1) as _$alias from t group by _$selectCity>.eval()

Output:

city avg_sold1
Rome 266.6667
London 300

Examples for multi-column macro variables.

(1) Used to select multiple columns.

colNames=["city", "sold1"]
<select _$$colNames from t>.eval()

Output:

city sold1
Rome 200
London 500
London 100
Rome 300
Rome 300

(2) Each of the corresponding columns is passed as an argument to the function.

colNames=["sold1", "sold2"]
alias = "sum_sold"

<select rowSum(_$$colNames) as _$alias from t>.eval()

Output:

sum_sold
2,300
2,000
1,200
3,400
2,600

(3) Used with higher-order functions.

  • Apply a function to multiple columns by row. The following example calculates the sum for each row of column "sold1" and "sold2". It returns the same result as the example (2) above.
    colNames=["sold1", "sold2"]
    alias = "sum_sold"
    
    <select sum:H(_$$colNames) as _$alias from t>.eval()
  • Apply a function to multiple columns by column. The following example calculates the sum for columns "sold1" and "sold2".
    colNames=["sold1", "sold2"]
    alias=["sum_sold1", "sum_sold2"]
    
    <select  sum:V(_$$colNames) as _$$alias from t>.eval()
    Output:
    sum_sold1 sum_sold2
    1,400 10,100