SQL Statement#

Swordfish executes SQL scripts using SQL functions to query data. The available SQL keywords and operators are consistent with those in DolphinDB, enabling full use of DolphinDB’s powerful capabilities for complex data analysis and processing within a Python environment.

Additionally, Swordfish offers SQL performance on par with DolphinDB, ensuring efficient query execution and data manipulation when working with Swordfish.

This article demonstrates SQL operations in Swordfish with a few simple examples.

Before demonstrating SQL operations, we first create two tables, t1 and t2, which will be used for subsequent operations.

# import pandas
import pandas as pd
import swordfish.function as F

sym = ["a", "b", "a", "b", "a", "a", "b", "a"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
                        "2024.08.22 14:31:30", "2024.08.22 14:31:30", "2024.08.22 14:31:50",
                        "2024.08.22 14:32:00", "2024.08.22 14:32:15", "2024.08.22 14:32:30"])
vol = [1, 2, 3, 6, 5, 4, 8, 5]
price = [11.5, 12.8, 13.5, 11.6, 13.8, 12.9, 11.8, 13.3]
table1 = sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
                names=["sym", "timestamp", "vol", "price"])

table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.5
b   2024.08.22T14:31:20.000 2   12.8
a   2024.08.22T14:31:30.000 3   13.5
b   2024.08.22T14:31:30.000 6   11.6
a   2024.08.22T14:31:50.000 5   13.8
a   2024.08.22T14:32:00.000 4   12.9
b   2024.08.22T14:32:15.000 8   11.8
a   2024.08.22T14:32:30.000 5   13.3
"""

table2 = sf.table(pd.DataFrame({"sym": ["a", "b", "c"],
                                "timestamp": F.timestamp(["2024.08.22 14:31:30", "2024.08.22 14:31:35","2024.08.22 14:31:40"]),
                                "vol": [5, 6, 9], "price": [12.6, 13.2, 15.9]}))

table2

""" output:
sym timestamp                     vol price
--- ----------------------------- --- -----
a   2024.08.22T14:31:30.000000000 5   12.6
b   2024.08.22T14:31:35.000000000 6   13.2
c   2024.08.22T14:31:40.000000000 9   15.9
"""

SELECT/EXEC#

Access table data using SELECT or EXEC. SELECT produces a table, while EXEC returns a scalar or a vector.

# The SELECT statement returns a table
sf.sql("SELECT vol FROM table1", vars={"table1": table1})

""" output:
vol
---
1
2
3
6
5
4
8
5
"""

# The EXEC statement returns a vector
sf.sql("EXEC vol FROM table1", vars={"table1": table1})

# [1,2,3,6,5,4,8,5]

WHERE#

Using the WHERE statement to filter data based on specified conditions.

sf.sql("SELECT * FROM table1 WHERE VOL>5", vars={"table1": table1})

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
b   2024.08.22T14:31:30.000 6   11.6
b   2024.08.22T14:32:15.000 8   11.8

"""

DISTINCT#

Using the DISTINCT statement to retrieve unique values from a query.

sf.sql("SELECT DISTINCT vol FROM table1 WHERE VOL>5", vars={"table1": table1})

""" output:
vol
---
6
8

"""

GROUP BY#

Using the GROUP BY statement to group data by values in specified columns.

sf.sql("SELECT sym, sum(vol) FROM table1 GROUP BY sym", vars={"table1": table1})

""" output:
sym sum_vol
--- -------
a   18
b   16

"""

CONTEXT BY#

CONTEXT BY is a unique feature in DolphinDB. It is an extension to ANSI SQL for convenient time-series data manipulation. Both CONTEXT BY and GROUP BY conduct grouping. However, with GROUP BY, each group returns a scalar value; with CONTEXT BY, each group returns a vector of the same size as the group’s records. The GROUP BY clause can only be used with aggregate functions, whereas the CONTEXT BY clause can be used with aggregate functions, moving window functions, cumulative functions, etc. The CONTEXT BY clause is often used with the UPDATE statement. The CONTEXT BY clause can also be used together with the HAVING clause. Please refer to the section about HAVING.

sf.sql("SELECT sym, price, vol, wavg(price,vol) AS wvap, sum(vol) AS totalvol FROM table1 CONTEXT BY sym", vars={"table1": table1})

""" output:
sym price vol wvap               totalvol
--- ----- --- ------------------ --------
a   11.5  1   13.283333333333333 18
a   13.5  3   13.283333333333333 18
a   13.8  5   13.283333333333333 18
a   12.9  4   13.283333333333333 18
a   13.3  5   13.283333333333333 18
b   12.8  2   11.849999999999999 16
b   11.6  6   11.849999999999999 16
b   11.8  8   11.849999999999999 16
"""

PIVOT BY#

PIVOT BY is a unique feature in DolphinDB and an extension to ANSI SQL. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. If the PIVOT BY clause includes more than 2 columns, the last column determines the columns in the result and all other columns determine the rows in the result. It produces a table with SELECT statement and a matrix with EXEC statement.

sf.sql("SELECT price FROM table1 PIVOT BY timestamp, sym", vars={"table1": table1})

""" output:
timestamp               a    b
----------------------- ---- ----
2024.08.22T14:31:15.000 11.5
2024.08.22T14:31:20.000      12.8
2024.08.22T14:31:30.000 13.5 11.6
2024.08.22T14:31:50.000 13.8
2024.08.22T14:32:00.000 12.9
2024.08.22T14:32:15.000      11.8
2024.08.22T14:32:30.000 13.3

"""

sf.sql("EXEC price FROM table1 PIVOT BY timestamp, sym", vars={"table1": table1})

""" output:
                        a    b
                        ---- ----
2024.08.22T14:31:15.000|11.5
2024.08.22T14:31:20.000|     12.8
2024.08.22T14:31:30.000|13.5 11.6
2024.08.22T14:31:50.000|13.8
2024.08.22T14:32:00.000|12.9
2024.08.22T14:32:15.000|     11.8
2024.08.22T14:32:30.000|13.3

"""

ORDER BY#

The ORDER BY clause is used to sort the result based on specified column(s).

sf.sql("SELECT sym, vol FROM table1 WHERE VOL<5 ORDER BY sym", vars={"table1": table1})

""" output:
sym vol
--- ---
a   1
a   3
a   4
b   2

"""

TOP/LIMIT#

The TOP/LIMIT clause is used to specify the number of records to return.

# Return the first 2 rows
sf.sql("SELECT TOP 2 sym, vol FROM table1", vars={"table1": table1})

""" output:
sym vol
--- ---
a   1
b   2

"""

# Return the first 2 rows
sf.sql("SELECT vol FROM table1 LIMIT 2", vars={"table1": table1})

""" output:
vol
---
1
2

"""

# Return the last 2 rows

# Return 4 rows starting from the 2nd row.
sf.sql("SELECT sym, vol FROM table1 LIMIT 2, 4", vars={"table1": table1})

""" output:
sym vol
--- ---
a   3
b   6
a   5
a   4

"""

HAVING#

The HAVING clause specifies filtering conditions for groups or group members. It is always used after GROUP BY or CONTEXT BY clause.

sf.sql("SELECT sum(vol) as totalvol from table1 GROUP BY sym HAVING sum(vol)>16", vars={"table1": table1})

""" output:
sym totalvol
--- --------
a   18

"""

interval function#

The interval function is used to fill missing values in the query results.

# Fill missing values with 100
sf.sql("SELECT max(vol) FROM table1 GROUP BY interval(timestamp, 10s, 100)", vars={"table1": table1})

""" output:
interval_timestamp      max_vol
----------------------- -------
2024.08.22T14:31:10.000 1
2024.08.22T14:31:20.000 2
2024.08.22T14:31:30.000 6
2024.08.22T14:31:40.000 100
2024.08.22T14:31:50.000 5
2024.08.22T14:32:00.000 4
2024.08.22T14:32:10.000 8
2024.08.22T14:32:20.000 100
2024.08.22T14:32:30.000 5

"""

UPDATE/DELETE#

Update records in a table using the UPDATE statement, or delete records using the DELETE statement.

sf.sql("UPDATE table2 SET vol=10 WHERE sym=`a", vars={"table2": table2})
table2

""" output:
sym timestamp                     vol price
--- ----------------------------- --- -----
a   2024.08.22T14:31:30.000000000 10   12.6
b   2024.08.22T14:31:35.000000000 6   13.2
c   2024.08.22T14:31:40.000000000 9   15.9
"""

sf.sql("DELETE FROM table2 WHERE sym=`a", vars={"table2": table2})
table2

""" output:
sym timestamp                     vol price
--- ----------------------------- --- -----
b   2024.08.22T14:31:35.000000000 6   13.2
c   2024.08.22T14:31:40.000000000 9   15.9
"""

SQL Functions and Expressions#

In SQL, Swordfish supports various functions and expressions, including aggregate functions, mathematical functions, and conditional expressions. This section focuses on these three categories.

Aggregate Functions#

Aggregate functions include count, max, min, avg, first, last, sum, and others. The following example demonstrates the use of avg.

sf.sql("SELECT avg(vol) FROM table1 GROUP BY sym", vars={"table1": table1})

""" output:
sym avg_vol
--- -----------------
a   3.6
b   5.333333333333333

"""

Mathematics#

Mathematical functions include abs, sin, log, exp, ceil, and others. The following example demonstrates the use of ceil.

sf.sql("SELECT price, ceil(price) FROM table1", vars={"table1": table1})

""" output:
price ceil_price
----- ----------
11.5  12
12.8  13
13.5  14
11.6  12
13.8  14
12.9  13
11.8  12
13.3  14

"""

Conditional Expressions#

Conditional expressions include coalesce, any/all, nullif, exists, and others. The following example demonstrates the use of nullif.

sf.sql("SELECT nullIf(vol, 5) FROM table1", vars={"table1": table1})

""" output:
nullIf_vol
----------
1
2
3
6

4
8

"""

Table joiners#

Table joiners combine data from multiple tables. With the join condition specified, related data from different tables can be obtained for complex queries. This section demonstrates how to perform table joins in Swordfish, with asof join and window join as examples.

asof join (aj)#

The asof join function is used in non-synchronous join. It is an extension to ANSI SQL for convenient time-series data manipulation.

sf.sql("SELECT * FROM aj(table1, table2, 'timestamp')", vars={"table1": table1, "table2": table2})

""" output:
sym timestamp               vol price table2_sym table2_timestamp        table2_vol table2_price
--- ----------------------- --- ----- ---------- ----------------------- ---------- ------------
a   2024.08.22T14:31:15.000 1   11.5
b   2024.08.22T14:31:20.000 2   12.8
a   2024.08.22T14:31:30.000 3   13.5  a          2024.08.22T14:31:30.000 5          12.6
b   2024.08.22T14:31:30.000 6   11.6  a          2024.08.22T14:31:30.000 5          12.6
a   2024.08.22T14:31:50.000 5   13.8  c          2024.08.22T14:31:40.000 9          15.9
a   2024.08.22T14:32:00.000 4   12.9  c          2024.08.22T14:31:40.000 9          15.9
b   2024.08.22T14:32:15.000 8   11.8  c          2024.08.22T14:31:40.000 9          15.9
a   2024.08.22T14:32:30.000 5   13.3  c          2024.08.22T14:31:40.000 9          15.9

window join (wj)#

window join is an extension of asof join, implemented through the wj function.

sf.sql("SELECT * FROM wj(table1, table2, -60s:0s, <max(price)>, 'timestamp')", vars={"table1": table1, "table2": table2})

""" output:
sym timestamp               vol price max_price
--- ----------------------- --- ----- ---------
a   2024.08.22T14:31:15.000 1   11.5
b   2024.08.22T14:31:20.000 2   12.8
a   2024.08.22T14:31:30.000 3   13.5  12.6
b   2024.08.22T14:31:30.000 6   11.6  12.6
a   2024.08.22T14:31:50.000 5   13.8  15.9
a   2024.08.22T14:32:00.000 4   12.9  15.9
b   2024.08.22T14:32:15.000 8   11.8  15.9
a   2024.08.22T14:32:30.000 5   13.3  15.9