Table#
In the swordfish.data module, the function table is defined to construct a Table object.
The object is a vector constant and provides a set of methods for creating and manipulating tables.
It supports creating instances from various data structures such as dict and pandas.DataFrame.
Creating Tables#
Use the table function to create a table. It accepts either a dictionary or a DataFrame as input.
By default, column types are inferred from the field or DataFrame data types.
You can also explicitly set column types with the types parameter.
Create an empty table (the types parameter must be specified in this case).
import swordfish.function as F
table = sf.table(types={'sym': "STRING", 'val': "INT"}, size=0, capacity=100)
temp = pd.DataFrame({'sym': ["A", "B"], 'val': [3, 7]})
F.append_(table, temp)
table
""" output:
sym val
--- ---
A 3
B 7
"""
Create a table from a dictionary.
sym = ["a", "b", "a", "b", "a", "a", "b", "a"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
"2024.08.22 14:31:30", "2024.08.22 14:31:30", "2024.08.22 14:31:50",
"2024.08.22 14:32:00", "2024.08.22 14:32:15", "2024.08.22 14:32:30"])
vol = [1, 2, 3, 6, 5, 4, 8, 5]
price = [11.5, 12.8, 13.5, 11.6, 13.8, 12.9, 11.8, 13.3]
table1 = sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
names=["sym", "timestamp", "vol", "price"],
types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])
table1
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:31:50.000 5 13.80
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 8 11.80
a 2024.08.22T14:32:30.000 5 13.30
"""
Create a table from a DataFrame.
# Import pandas
import pandas as pd
table2 = sf.table(pd.DataFrame({"sym": ["a", "b", "c"],
"timestamp": [pd.Timestamp("2024.08.22 14:31:30"),
pd.Timestamp("2024.08.22 14:31:35"),
pd.Timestamp("2024.08.22 14:31:40")],
"vol": [5, 6, 9], "price": [12.6, 13.2, 15.9]}))
table2
""" output:
sym timestamp vol price
--- ----------------------------- --- -----
a 2024.08.22T14:31:30.000000000 5 12.6
b 2024.08.22T14:31:35.000000000 6 13.2
c 2024.08.22T14:31:40.000000000 9 15.9
"""
Viewing Table Schema#
Use table.schema to view the schema of a table.
table1.schema()
""" output:
colDefs->name typeString typeInt extra comment
--------- ------------ ------- ----- -------
sym STRING 18
timestamp TIMESTAMP 12
vol INT 4
price DECIMAL32(2) 37 2
chunkPath->
partitionColumnIndex->-1
"""
Accessing Tables#
Use table[rowIndex] to access rows. rowIndex can be a scalar, a vector, or a range (pair). Indexing starts from 0.
If rowIndex is a scalar, the result is a dictionary representing a single row.
If rowIndex is a vector, the result is a table containing the specified rows.
If rowIndex is a range (e.g., 1:3), the upper bound is excluded. For example, 1:3 includes rows 1 and 2. 2:0 includes rows 1 and 0. The result is a table containing the specified rows.
table1[0]
""" output:
sym->a
timestamp->2024.08.22T14:31:15.000
vol->1
price->11.50
"""
table1[1:4]
""" output:
sym timestamp vol price
--- ----------------------- --- -----
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
"""
table1[[0,3]]
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:30.000 6 11.60
"""
Use table[rowIndex, colIndex] to access rows and columns simultaneously, where both rowIndex and colIndex can be scalars, vectors, or ranges.
table1[0,2] # 1
table1[[0,3], [0,2]]
""" output:
sym vol
--- ---
a 1
b 6
"""
table1[0:3, 0:2]
""" output:
sym timestamp
--- -----------------------
a 2024.08.22T14:31:15.000
b 2024.08.22T14:31:20.000
a 2024.08.22T14:31:30.000
"""
Conditional access is supported.
table1[table1["vol"]>5]
""" output:
sym timestamp vol price
--- ----------------------- --- -----
b 2024.08.22T14:31:30.000 6 11.60
b 2024.08.22T14:32:15.000 8 11.80
"""
Iterate over rows with a for loop.
for i in table2:
print(i)
""" output:
sym->a
timestamp->2024.08.22T14:31:30.000000000
vol->5
price->12.6
sym->b
timestamp->2024.08.22T14:31:35.000000000
vol->6
price->13.2
sym->c
timestamp->2024.08.22T14:31:40.000000000
vol->9
price->15.9
"""
Operations on Tables#
Update a table conditionally.
table1["vol", table1["vol"]>6] = 15
table1
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:31:50.000 5 13.80
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 15 11.80
a 2024.08.22T14:32:30.000 5 13.30
"""
Update values or add new columns with assignment statements.
table1[["vol", "id"]] = [[1, 2, 1, 2, 1, 2, 1, 2],[1, 2, 3, 4, 5, 6, 7, 8]]
""" output:
sym timestamp vol price id
--- ----------------------- --- ----- --
a 2024.08.22T14:31:15.000 1 11.50 1
b 2024.08.22T14:31:20.000 2 12.80 2
a 2024.08.22T14:31:30.000 1 13.50 3
b 2024.08.22T14:31:30.000 2 11.60 4
a 2024.08.22T14:31:50.000 1 13.80 5
a 2024.08.22T14:32:00.000 2 12.90 6
b 2024.08.22T14:32:15.000 1 11.80 7
a 2024.08.22T14:32:30.000 2 13.30 8
"""
Update tables with SQL statements (in Swordfish, SQL statements must be executed using the sql function).
sym = ["a", "b", "a", "b", "a", "a", "b", "a"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
"2024.08.22 14:31:30", "2024.08.22 14:31:30", "2024.08.22 14:31:50",
"2024.08.22 14:32:00", "2024.08.22 14:32:15", "2024.08.22 14:32:30"])
vol = [1, 2, 3, 6, 5, 4, 8, 5]
price = [11.5, 12.8, 13.5, 11.6, 13.8, 12.9, 11.8, 13.3]
table = sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
names=["sym", "timestamp", "vol", "price"],
types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])
sf.sql("UPDATE table SET price=13.60 WHERE price=13.30", vars={"table": table})
table
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:31:50.000 5 13.80
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 8 11.80
a 2024.08.22T14:32:30.000 5 13.60
"""
Insert data into a table with
function.append_or theinsert intostatement.
F.append_(table1, pd.DataFrame({"sym": ["b", "c"],
"timestamp": [pd.Timestamp("2024.08.22 14:31:35"),
pd.Timestamp("2024.08.22 14:31:40")],
"vol": [6, 9], "price": [13.2, 15.9]}))
table1
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:31:50.000 5 13.80
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 8 11.80
a 2024.08.22T14:32:30.000 5 13.30
b 2024.08.22T14:31:35.000 6 13.20
c 2024.08.22T14:31:40.000 9 15.90
"""
sf.sql("insert into table1 values('a', 2024.08.22T14:31:44.000, 10, 14.50)", vars={"table1": table1})
table1
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
a 2024.08.22T14:31:30.000 3 13.50
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:31:50.000 5 13.80
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 8 11.80
a 2024.08.22T14:32:30.000 5 13.30
b 2024.08.22T14:31:35.000 6 13.20
c 2024.08.22T14:31:40.000 9 15.90
a 2024.08.22T14:31:44.000 10 14.50
"""
Deleting Data from a Table#
Delete rows with the DELETE statement.
sf.sql("DELETE from table1 WHERE price>13", vars={"table1": table1})
table1
""" output:
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.50
b 2024.08.22T14:31:20.000 2 12.80
b 2024.08.22T14:31:30.000 6 11.60
a 2024.08.22T14:32:00.000 4 12.90
b 2024.08.22T14:32:15.000 8 11.80
"""
Table Joiners#
Table joiners in Swordfish are consistent with DolphinDB. This section demonstrates asof join as an example of performing joins in Swordfish.
# Create table1
sym = ["a", "b", "a", "b"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
"2024.08.22 14:31:30", "2024.08.22 14:31:30"])
vol = [1, 2, 3, 6]
price = [11.5, 12.8, 13.5, 11.6]
table1= sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
names=["sym", "timestamp", "vol", "price"],
types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])
# Create table2
sym1 = ["a", "a", "b", "a", "b"]
timestamp1 = F.timestamp(["2024.08.22 14:30:15", "2024.08.22 14:31:00",
"2024.08.22 14:31:00", "2024.08.22 14:31:20", "2024.08.22 14:31:20"])
vol1 = [4, 8, 5, 6, 9]
price1 = [12.9, 11.8, 13.3, 13.2, 15.9]
table2= sf.table({'sym': sym1, 'timestamp': timestamp1, 'vol': vol1, 'price': price1},
names=["sym", "timestamp", "vol", "price"],
types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])
# table1 asof join table2
sf.sql("SELECT * from aj(table1, table2, 'timestamp')", vars={"table1": table1, "table2": table2})
""" output:
sym timestamp vol price table2_sym table2_timestamp table2_vol table2_price
--- ----------------------- --- ----- ---------- ----------------------- ---------- ------------
a 2024.08.22T14:31:15.000 1 11.50 b 2024.08.22T14:31:00.000 5 13.30
b 2024.08.22T14:31:20.000 2 12.80 b 2024.08.22T14:31:20.000 9 15.90
a 2024.08.22T14:31:30.000 3 13.50 b 2024.08.22T14:31:20.000 9 15.90
b 2024.08.22T14:31:30.000 6 11.60 b 2024.08.22T14:31:20.000 9 15.90
"""