Table#

In the swordfish.data module, the function table is defined to construct a Table object. The object is a vector constant and provides a set of methods for creating and manipulating tables. It supports creating instances from various data structures such as dict and pandas.DataFrame.

Creating Tables#

Use the table function to create a table. It accepts either a dictionary or a DataFrame as input. By default, column types are inferred from the field or DataFrame data types. You can also explicitly set column types with the types parameter.

  • Create an empty table (the types parameter must be specified in this case).

import swordfish.function as F
table = sf.table(types={'sym': "STRING", 'val': "INT"}, size=0, capacity=100)
temp = pd.DataFrame({'sym': ["A", "B"], 'val': [3, 7]})

F.append_(table, temp)
table

""" output:
sym val
--- ---
A   3
B   7

"""
  • Create a table from a dictionary.

sym = ["a", "b", "a", "b", "a", "a", "b", "a"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
                        "2024.08.22 14:31:30", "2024.08.22 14:31:30", "2024.08.22 14:31:50",
                        "2024.08.22 14:32:00", "2024.08.22 14:32:15", "2024.08.22 14:32:30"])
vol = [1, 2, 3, 6, 5, 4, 8, 5]
price = [11.5, 12.8, 13.5, 11.6, 13.8, 12.9, 11.8, 13.3]


table1 = sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
                names=["sym", "timestamp", "vol", "price"],
                types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])
table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:31:50.000 5   13.80
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 8   11.80
a   2024.08.22T14:32:30.000 5   13.30
"""
  • Create a table from a DataFrame.

# Import pandas
import pandas as pd
table2 = sf.table(pd.DataFrame({"sym": ["a", "b", "c"],
                                "timestamp": [pd.Timestamp("2024.08.22 14:31:30"),
                                            pd.Timestamp("2024.08.22 14:31:35"),
                                            pd.Timestamp("2024.08.22 14:31:40")],
                                "vol": [5, 6, 9], "price": [12.6, 13.2, 15.9]}))
table2

""" output:
sym timestamp                     vol price
--- ----------------------------- --- -----
a   2024.08.22T14:31:30.000000000 5   12.6
b   2024.08.22T14:31:35.000000000 6   13.2
c   2024.08.22T14:31:40.000000000 9   15.9
"""

Viewing Table Schema#

Use table.schema to view the schema of a table.

table1.schema()

""" output:

colDefs->name      typeString   typeInt extra comment
--------- ------------ ------- ----- -------
sym       STRING       18
timestamp TIMESTAMP    12
vol       INT          4
price     DECIMAL32(2) 37      2

chunkPath->
partitionColumnIndex->-1

"""

Accessing Tables#

Use table[rowIndex] to access rows. rowIndex can be a scalar, a vector, or a range (pair). Indexing starts from 0.

  • If rowIndex is a scalar, the result is a dictionary representing a single row.

  • If rowIndex is a vector, the result is a table containing the specified rows.

  • If rowIndex is a range (e.g., 1:3), the upper bound is excluded. For example, 1:3 includes rows 1 and 2. 2:0 includes rows 1 and 0. The result is a table containing the specified rows.

table1[0]

""" output:
sym->a
timestamp->2024.08.22T14:31:15.000
vol->1
price->11.50
"""

table1[1:4]

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
"""

table1[[0,3]]

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:30.000 6   11.60
"""

Use table[rowIndex, colIndex] to access rows and columns simultaneously, where both rowIndex and colIndex can be scalars, vectors, or ranges.

table1[0,2]            # 1

table1[[0,3], [0,2]]

""" output:
sym vol
--- ---
a   1
b   6
"""

table1[0:3, 0:2]

""" output:
sym timestamp
--- -----------------------
a   2024.08.22T14:31:15.000
b   2024.08.22T14:31:20.000
a   2024.08.22T14:31:30.000
"""

Conditional access is supported.

table1[table1["vol"]>5]

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
b   2024.08.22T14:31:30.000 6   11.60
b   2024.08.22T14:32:15.000 8   11.80

"""

Iterate over rows with a for loop.

for i in table2:
    print(i)
""" output:
sym->a
timestamp->2024.08.22T14:31:30.000000000
vol->5
price->12.6

sym->b
timestamp->2024.08.22T14:31:35.000000000
vol->6
price->13.2

sym->c
timestamp->2024.08.22T14:31:40.000000000
vol->9
price->15.9
"""

Operations on Tables#

  • Update a table conditionally.

table1["vol", table1["vol"]>6] = 15
table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:31:50.000 5   13.80
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 15  11.80
a   2024.08.22T14:32:30.000 5   13.30

"""
  • Update values or add new columns with assignment statements.

table1[["vol", "id"]] = [[1, 2, 1, 2, 1, 2, 1, 2],[1, 2, 3, 4, 5, 6, 7, 8]]

""" output:
sym timestamp               vol price id
--- ----------------------- --- ----- --
a   2024.08.22T14:31:15.000 1   11.50 1
b   2024.08.22T14:31:20.000 2   12.80 2
a   2024.08.22T14:31:30.000 1   13.50 3
b   2024.08.22T14:31:30.000 2   11.60 4
a   2024.08.22T14:31:50.000 1   13.80 5
a   2024.08.22T14:32:00.000 2   12.90 6
b   2024.08.22T14:32:15.000 1   11.80 7
a   2024.08.22T14:32:30.000 2   13.30 8

"""
  • Update tables with SQL statements (in Swordfish, SQL statements must be executed using the sql function).

sym = ["a", "b", "a", "b", "a", "a", "b", "a"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
                        "2024.08.22 14:31:30", "2024.08.22 14:31:30", "2024.08.22 14:31:50",
                        "2024.08.22 14:32:00", "2024.08.22 14:32:15", "2024.08.22 14:32:30"])
vol = [1, 2, 3, 6, 5, 4, 8, 5]
price = [11.5, 12.8, 13.5, 11.6, 13.8, 12.9, 11.8, 13.3]


table = sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
                names=["sym", "timestamp", "vol", "price"],
                types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])

sf.sql("UPDATE table SET price=13.60 WHERE price=13.30", vars={"table": table})
table

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:31:50.000 5   13.80
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 8   11.80
a   2024.08.22T14:32:30.000 5   13.60
"""
  • Insert data into a table with function.append_ or the insert into statement.

F.append_(table1, pd.DataFrame({"sym": ["b", "c"],
                                "timestamp": [pd.Timestamp("2024.08.22 14:31:35"),
                                            pd.Timestamp("2024.08.22 14:31:40")],
                                "vol": [6, 9], "price": [13.2, 15.9]}))
table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:31:50.000 5   13.80
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 8   11.80
a   2024.08.22T14:32:30.000 5   13.30
b   2024.08.22T14:31:35.000 6   13.20
c   2024.08.22T14:31:40.000 9   15.90

"""

sf.sql("insert into table1 values('a', 2024.08.22T14:31:44.000, 10, 14.50)", vars={"table1": table1})
table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
a   2024.08.22T14:31:30.000 3   13.50
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:31:50.000 5   13.80
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 8   11.80
a   2024.08.22T14:32:30.000 5   13.30
b   2024.08.22T14:31:35.000 6   13.20
c   2024.08.22T14:31:40.000 9   15.90
a   2024.08.22T14:31:44.000 10  14.50
"""

Deleting Data from a Table#

Delete rows with the DELETE statement.

sf.sql("DELETE from table1 WHERE price>13", vars={"table1": table1})

table1

""" output:
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.50
b   2024.08.22T14:31:20.000 2   12.80
b   2024.08.22T14:31:30.000 6   11.60
a   2024.08.22T14:32:00.000 4   12.90
b   2024.08.22T14:32:15.000 8   11.80
"""

Table Joiners#

Table joiners in Swordfish are consistent with DolphinDB. This section demonstrates asof join as an example of performing joins in Swordfish.

# Create table1
sym = ["a", "b", "a", "b"]
timestamp = F.timestamp(["2024.08.22 14:31:15", "2024.08.22 14:31:20",
                        "2024.08.22 14:31:30", "2024.08.22 14:31:30"])
vol = [1, 2, 3, 6]
price = [11.5, 12.8, 13.5, 11.6]


table1= sf.table({'sym': sym, 'timestamp': timestamp, 'vol': vol, 'price': price},
                names=["sym", "timestamp", "vol", "price"],
                types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])

# Create table2
sym1 = ["a", "a", "b", "a", "b"]
timestamp1 = F.timestamp(["2024.08.22 14:30:15", "2024.08.22 14:31:00",
                        "2024.08.22 14:31:00", "2024.08.22 14:31:20", "2024.08.22 14:31:20"])
vol1 = [4, 8, 5, 6, 9]
price1 = [12.9, 11.8, 13.3, 13.2, 15.9]

table2= sf.table({'sym': sym1, 'timestamp': timestamp1, 'vol': vol1, 'price': price1},
                names=["sym", "timestamp", "vol", "price"],
                types=["STRING", "TIMESTAMP", "INT", "DECIMAL32(2)"])

# table1 asof join table2
sf.sql("SELECT * from aj(table1, table2, 'timestamp')", vars={"table1": table1, "table2": table2})

""" output:
sym timestamp               vol price table2_sym table2_timestamp        table2_vol table2_price
--- ----------------------- --- ----- ---------- ----------------------- ---------- ------------
a   2024.08.22T14:31:15.000 1   11.50 b          2024.08.22T14:31:00.000 5          13.30
b   2024.08.22T14:31:20.000 2   12.80 b          2024.08.22T14:31:20.000 9          15.90
a   2024.08.22T14:31:30.000 3   13.50 b          2024.08.22T14:31:20.000 9          15.90
b   2024.08.22T14:31:30.000 6   11.60 b          2024.08.22T14:31:20.000 9          15.90
"""