SQL Application#
This article describes how to use SQL statements in Python with Swordfish.
Create an In-Memory Table#
You can create a table using the CREATE TABLE statement.
import swordfish as sf
# Create an in-memory table
sf.sql('''
CREATE TABLE trades (
symbol SYMBOL,
price DOUBLE,
volume INT,
timestamp TIMESTAMP
)
''')
The created table object trades is managed within Swordfish.
You can check all variables managed by Swordfish using sf.function.objs().
Insert Data#
Swordfish supports multiple data insertion methods through the INSERT statement, including single-row insertion, batch insertion, and data import from another table.
# Insert a single row
sf.sql("INSERT INTO trades VALUES ('AAPL', 182.5, 1000, 2023.10.01 09:30:00.000)")
# Batch insert from a query result
sf.sql('''
CREATE TABLE temp_data (
id INT,
name STRING,
category STRING
)
''')
source_table = sf.table({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie'], 'category': ['tech', 'finance', 'tech']})
sf.sql('INSERT INTO temp_data SELECT * FROM source_table WHERE category="tech"', vars={'source_table': source_table})
Update and Delete Data#
Use the UPDATE and DELETE statements to perform conditional updates and batch deletions.
sf.sql('''
UPDATE trades
SET price = price * 1.05
WHERE symbol = 'AAPL' AND timestamp < 2023.09.01
''')
sf.sql("DELETE FROM trades")
Access and Query Data#
The SELECT statement enables flexible queries and computations, and it works on both Swordfish tables and Python table variables.
# Query data from a table in Swordfish
sf.sql('SELECT symbol, price FROM trades WHERE volume > 100')
# Query data from pandas.DataFrame object
import pandas as pd
source_table = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie'], 'category': ['tech', 'finance', 'tech']})
sf.sql('SELECT * FROM source_table WHERE category="tech"', vars={'source_table': source_table})
Grouping and Aggregation#
Swordfish provides rich aggregate functions and supports both GROUP BY and the extended CONTEXT BY syntax for grouped computations.
GROUP BY#
GROUP BY groups data by specified columns, and applies aggregate functions (such as sum or avg)
to each group. The result of the GROUP BY for each group is a scalar.
Example: Calculate the average score of each subject in every class.
# Simulate the scores
data = {
"class": ["A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B"],
"student": ["Alice", "Alice", "Alice", "Bob", "Bob", "Bob", "Charlie", "Charlie", "Charlie", "David", "David", "David", "Eve", "Eve", "Eve", "Frank", "Frank", "Frank", "Grace", "Grace", "Grace"],
"subject": ["Math", "Science", "English", "Math", "Science", "English", "Math", "Science", "English", "Math", "Science", "English", "Math", "Science", "English", "Math", "Science", "English", "Math", "Science", "English"],
"score": [85, 92, 78, 88, 91, 85, 90, 87, 79, 84, 95, 80, 89, 90, 76, 83, 85, 88, 92, 79, 81]
}
# Calculate the average score of each subject in every class
groupbyTable = sf.table(data)
sf.sql('''
SELECT
avg(score) AS avg_score
FROM groupbyTable
GROUP BY class, subject
''', vars={'groupbyTable': groupbyTable})
CONTEXT BY#
Unlike GROUP BY, CONTEXT BY supports vectorized computation and can be combined with aggregate, moving window, or cumulative functions.
Example: Calculate the difference between each student’s score and the class average for each subject.
sf.sql('''
SELECT
class,
student,
subject,
score - avg(score) as deltas
FROM groupbyTable
CONTEXT BY class, subject
''', vars={'groupbyTable': groupbyTable})
Example: Calculate each student’s rank by subject within the grade.
sf.sql('''
SELECT
class, student, subject, rank(score)+1 as grade_rank
FROM groupbyTable
CONTEXT BY subject
''', vars={'groupbyTable': groupbyTable})
In summary, GROUP BY returns aggregated row per group,making it suitable for generating grouped statistical reports, while CONTEXT BY retains the original data along with grouped results, which is ideal for time-series or individual-level analysis.
PARTITION BY#
The PARTITION BY clause is also supported by Swordfish. It can achieve similar functionality to CONTEXT BY. However, CONTEXT BY is generally more concise and intuitive.
Example: The above CONTEXT BY examples can also be written using window functions in SQL:
# Calculate the difference between each student’s score and the class average for each subject.
window_query = '''
SELECT
class,
student,
subject,
score - avg(score) OVER (PARTITION BY class, subject) AS deltas
FROM groupbyTable
ORDER BY class, subject
'''
sf.sql(window_query, vars = {'groupbyTable': groupbyTable})
# Calculate each student's rank by subject within the grade
sf.sql('''
SELECT
class, student, subject, rank() OVER (PARTITION BY subject order by score) AS grade_rank
FROM groupbyTable
''', vars={'groupbyTable': groupbyTable})
Data Pivoting#
PIVOT BY is similar to pandas.DataFrame.pivot, but more flexible and efficient. PIVOT BY rearranges one or more columns of a table along two dimensions and can be combined with data transformation or aggregation functions.
Example: Display the average score of each subject by class in a pivoted layout using PIVOT BY:
sf.sql('''
SELECT
avg(score) AS avg_score
FROM groupbyTable
PIVOT BY class, subject
''', vars={'groupbyTable': groupbyTable})
Table Joiners#
Swordfish SQL supports multiple types of table joiners.
# UNION
source_table1 = sf.table({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Charlie'], 'category': ['tech', 'finance', 'tech']})
source_table2 = sf.table({'id': [4, 5, 6], 'name': ['Lily', 'Mary', 'Tom'], 'category': ['tech', 'finance', 'tech']})
sf.sql('''
SELECT *
FROM source_table1
UNION source_table2
''', vars={'source_table1': source_table1,'source_table2': source_table2})
# JOIN
source_table3 = sf.table({'name': ['Alice', 'Mary', 'Tom'],'age': [20, 5, 35]})
sf.sql('''
SELECT *
FROM source_table1 as t1
LEFT JOIN source_table3 as t3
ON t1.name = t3.name
''', vars={'source_table1': source_table1,'source_table3': source_table3})
In addition to LEFT JOIN, Swordfish supports multiple join types for different scenarios.
Join Type |
Description |
|---|---|
INNER JOIN |
Returns rows that satisfy the join condition in both tables, excluding non-matching rows. |
LEFT JOIN / LEFT OUTER |
Returns all rows from the left table and the matching rows from the right table. If there is no match in the right table, NULL is returned. |
RIGHT JOIN / RIGHT OUTER |
Returns all rows from the right table and the matching rows from the left table. If there is no match in the left table, NULL is returned. |
FULL JOIN / FULL OUTER |
Returns all rows from both tables. If there is no match, NULL is returned. |
EQUI JOIN |
A join based on equality conditions, usually using |
CROSS JOIN |
Returns all possible combinations of rows from the two tables, independent of any condition. |
ASOF JOIN |
Used for time-series data; the join condition is based on a time column. |
WINDOW JOIN |
A join performed using window functions, typically involving aggregation or analysis within a window of data. |
PREFIX JOIN |
A join based on matching column name prefixes, suitable when column names are similar but not identical. |
Beyond ANSI SQL syntax, Swordfish also provides a functional API for table joiners.
For example, LEFT JOIN can be performed using the lj function:
import swordfish.function as F
F.lj(source_table1 , source_table3, "name")
Compared with ANSI SQL syntax, the functional API is more concise and intuitive, making it suitable for users familiar with functional programming.