Getting started#
This document guides you through the installation process of Python Swordfish and introduces its core features.
Installation#
Prerequisites#
Operating system: Linux x86-64 and Windows amd64
Python environment: Anaconda Distribution recommended, which includes Python and common libraries like future, NumPy, and pandas.
Version requirements:
Python: 3.8 to 3.12
Numpy: 1.18 or higher
pandas: 1.0.0 or higher (excluding 1.3.0)
future
packaging
toml
Installing with pip#
To install the latest version of Python Swordfish from PyPI:
pip install pyswordfish
Initialization#
Contact technical support for a valid license before using Swordfish.
The search order for dolphindb.lic (from highest to lowest priority) is as follows:
The path specified by the environment variable
SWORDFISH_PYTHON_LICENSE_PATH.The path specified in the license field of the configuration file swordfish.toml (this file must be created manually). Click here for details.
<home>/dolphindb.lic (default path). The <home> directory is determined in the following order:
Environment variable
SWORDFISH_PYTHON_HOME_PATHThe home configuration in swordfish.toml
The current working directory
Swordfish is automatically initialized upon import. Please ensure the license path is correctly configured before importing.
>>> import swordfish as sf
Tables#
Create Tables#
This example demonstrates how to create a Swordfish Table from a dict of lists:
>>> import swordfish as sf
>>> sym = ["a", "b", "a", "b", "a"]
>>> timestamp = sf.vector([
... "2024.08.22 14:31:15",
... "2024.08.22 14:31:20",
... "2024.08.22 14:31:30",
... "2024.08.22 14:31:30",
... "2024.08.22 14:31:50",
... ], type="TIMESTAMP")
>>> vol = [1, 2, 3, 6, 8]
>>> price = [11.5, 12.8, 13.5, 11.6, 13.8]
>>> t = sf.table({
... 'sym': sym,
... 'timestamp': timestamp,
... 'vol': vol,
... 'price': price,
... })
>>> t
sym timestamp vol price
--- ----------------------- --- ------------------
a 2024.08.22T14:31:15.000 1 11.5
b 2024.08.22T14:31:20.000 2 12.8
a 2024.08.22T14:31:30.000 3 13.5
b 2024.08.22T14:31:30.000 6 11.6
a 2024.08.22T14:31:50.000 8 13.8
You can create a Table by specifying the column names and data types.
>>> sym = ["a", "b", "a", "b", "a"]
>>> timestamp = sf.vector([
... "2024.08.22 14:31:15",
... "2024.08.22 14:31:20",
... "2024.08.22 14:31:30",
... "2024.08.22 14:31:30",
... "2024.08.22 14:31:50",
... ], type="TIMESTAMP")
>>> vol = [1, 2, 3, 6, 8]
>>> price = [11.5, 12.8, 13.5, 11.6, 13.8]
>>> t = sf.table(
... {
... 'sym': sym,
... 'timestamp': timestamp,
... 'vol': vol,
... 'price': price,
... },
... names=['sym', 'timestamp', 'vol', 'price'],
... types=['STRING', 'TIMESTAMP', 'INT', 'DOUBLE'],
... )
You can create a Table from a DataFrame.
>>> import pandas as pd
>>> df = pd.DataFrame({
... "name": ["a", "b", "c", "b", "c"],
... "id": [12, 13, 15, 14, 16],
... })
>>> t = sf.table(df)
>>> t
name id
---- -----
a 12
b 13
c 15
b 14
c 16
Subsequent examples in this document will use this created table for further operations.
Inspect Data#
The head and tail methods of a Table return the first or last few rows respectively.
>>> t.head(2)
sym timestamp vol price
--- ----------------------- --- -----
a 2024.08.22T14:31:15.000 1 11.5
b 2024.08.22T14:31:20.000 2 12.8
The above code retrieves the first two rows from the table ‘t’.
Retrieve the last two rows from the table ‘t’.
>>> t.tail(2)
sym timestamp vol price
--- ----------------------- --- -----
b 2024.08.22T14:31:30.000 6 11.6
a 2024.08.22T14:31:50.000 8 13.8
The count method returns the number of rows in a Table, while summary provides descriptive statistics.
Retrieve the total number of rows in the table ‘t’.
>>> t.count()
Int(5)
Retrieve the summary statistics of the table ‘t’, such as the count, mean, minimum, maximum, and other aggregate information for each column.
>>> t.summary()
name min max nonNullCount count avg std percentile
----- ---- ---- ------------ ----- ----- ----------------- ----------------------------------------------------------
vol 1 8 5 5 4 2.91547594742265 [1.999999523305699,3.000002384471506,5.999997616528495]
price 11.5 13.8 5 5 12.64 1.059716943339125 [11.600000524564352,12.800000238983796,13.500000620757871]
The sortBy_ method sorts the records based on specified column(s).
>>> t.sortBy_("sym")
sym timestamp vol price
--- ----------------------- --- ------------------
a 2024.08.22T14:31:15.000 1 11.5
a 2024.08.22T14:31:30.000 3 13.5
a 2024.08.22T14:31:50.000 8 13.8
b 2024.08.22T14:31:20.000 2 12.8
b 2024.08.22T14:31:30.000 6 11.6
Access Data#
Column Access: Use Table[colLabel] or Table.colLabel to retrieve one or more columns as Vector(s).
>>> t["sym"]
Vector(["a","a","a","b","b"], type=STRING)
>>> t.sym
Vector(["a","a","a","b","b"], type=STRING)
Row Access: Use Table[rowIndex]. When rowIndex is an integer, returns a Dictionary. When rowIndex is a slice or list, returns a Table.
>>> t[0]
Dictionary(
sym->a
timestamp->2024.08.22T14:31:15.000
vol->1
price->11.5
, key_type=STRING, val_type=ANY)
>>> t[1:3]
sym timestamp vol price
--- ----------------------- --- -----
b 2024.08.22T14:31:20.000 2 12.8
a 2024.08.22T14:31:30.000 3 13.5
Join Tables#
To join two Tables, use the join function from the swordfish.function module.
>>> t1 = sf.table({
... 'sym': ["a", "b", "a", "b", "a"],
... 'price': [11.5, 12.8, 13.5, 11.6, 13.8],
... })
>>> t2 = sf.table({
... 'name': ["a", "b", "c", "b", "c"],
... 'val': [12, 13, 15, 14, 16],
... })
>>> t1
sym price
--- ------------------
a 11.5
b 12.8
a 13.5
b 11.599999999999999
a 13.8
>>> t2
name val
---- -----
a 12
b 13
c 15
b 14
c 16
Call the join function from swordfish.function to join t1 and t2.
>>> import swordfish.function as F
>>> F.join(t1, t2)
sym price name val
--- ------------------ ---- -----
a 11.5 a 12
b 12.8 b 13
a 13.5 c 15
b 11.599999999999999 b 14
a 13.8 c 16
SQL Operations#
In Swordfish, SQL statements are executed using the sql function. This section demonstrates basic operations on tables using the sql function.
Selecting the “val” column from t2:
>>> sf.sql("SELECT val FROM t", vars={"t": t2})
val
-----
12
13
15
14
16
Filtering data with a WHERE clause:
>>> sf.sql("SELECT valt FROM t WHERE val>14", vars={"t": t2})
val
-----
15
16
Updating table data using an UPDATE statement:
>>> sf.sql("UPDATE t SET val=10 WHERE val=14", vars={"t": t2})
name val
---- -----
a 12
b 13
c 15
b 10
c 16
Deleting a row from the table using a DELETE statement:
>>> sf.sql("DELETE from t WHERE val=16", vars={"t": t2})
name val
---- -----
a 12
b 13
c 15
b 14
Using GROUP BY to count the number of entries for each “sym” in t1:
>>> sf.sql("SELECT count(sym) as counts FROM t GROUP BY sym", vars={"t": t1})
sym counts
--- ------
a 3
b 2
Joining two tables using LEFT JOIN:
>>> sf.sql("SELECT * FROM ta LEFT JOIN tb on ta.sym = tb.name",
... vars={"ta": t1, "tb": t2})
sym price val
--- ------------------ -----
a 11.5 12
b 12.8 13
b 12.8 14
a 13.5 12
b 11.599999999999999 13
b 11.599999999999999 14
a 13.8 12
Matrices#
Create Matrices#
This example demonstrates how to create a Swordfish Matrix from a list of lists:
>>> m = sf.matrix([[1, 2, 3], [0, 1, 4], [5, 6, 0]])
>>> m
Matrix(
#0 #1 #2
-- -- --
1 0 5
2 1 6
3 4 0
, type=LONG)
Calculate with Matrices#
Import the swordfish.function module for Matrix operations:
>>> import swordfish.function as F
Column-wise Max and Min
>>> F.max(m)
Vector([3,4,6], type=LONG)
>>> F.min(m)
Vector([1,0,0], type=LONG)
Matrix Transposition
>>> F.transpose(m)
Matrix(
#0 #1 #2
-- -- --
1 2 3
0 1 4
5 6 0
, type=LONG)
Inversion
>>> F.inverse(m)
Matrix(
#0 #1 #2
------------------- ------------------- ------------------
-24.000000000000088 20.000000000000074 -5.000000000000021
18.000000000000067 -15.000000000000055 4.000000000000014
5.000000000000018 -4.000000000000015 1.000000000000003
, type=DOUBLE)
File Exports and Imports#
To save a Table as a CSV file:
>>> t = sf.table({
... 'sym': ["a", "b", "a", "b", "a"],
... 'vol': [1, 2, 3, 6, 8],
... 'price': [11.5, 12.8, 13.5, 11.6, 13.8],
... })
>>> import swordfish.function as F
>>> F.saveText(t, "./data.csv")
To load a Table from a CSV file:
>>> F.loadText(t, "./data.csv")
sym vol price
--- --- ------------------
a 1 11.5
b 2 12.8
a 3 13.5
b 6 11.599999999999999
a 8 13.8