Getting started#

This document guides you through the installation process of Python Swordfish and introduces its core features.

Installation#

Prerequisites#

  • Operating system: Linux x86-64 and Windows amd64

  • Python environment: Anaconda Distribution recommended, which includes Python and common libraries like future, NumPy, and pandas.

  • Version requirements:

    • Python: 3.8 to 3.12

    • Numpy: 1.18 or higher

    • pandas: 1.0.0 or higher (excluding 1.3.0)

    • future

    • packaging

    • toml

Installing with pip#

  • To install the latest version of Python Swordfish from PyPI:

pip install pyswordfish

Initialization#

Contact technical support for a valid license before using Swordfish.

The search order for dolphindb.lic (from highest to lowest priority) is as follows:

  1. The path specified by the environment variable SWORDFISH_PYTHON_LICENSE_PATH.

  2. The path specified in the license field of the configuration file swordfish.toml (this file must be created manually). Click here for details.

  3. <home>/dolphindb.lic (default path). The <home> directory is determined in the following order:

    1. Environment variable SWORDFISH_PYTHON_HOME_PATH

    2. The home configuration in swordfish.toml

    3. The current working directory

Swordfish is automatically initialized upon import. Please ensure the license path is correctly configured before importing.

>>> import swordfish as sf

Tables#

Create Tables#

This example demonstrates how to create a Swordfish Table from a dict of lists:

>>> import swordfish as sf
>>> sym = ["a", "b", "a", "b", "a"]
>>> timestamp = sf.vector([
...    "2024.08.22 14:31:15",
...    "2024.08.22 14:31:20",
...    "2024.08.22 14:31:30",
...    "2024.08.22 14:31:30",
...    "2024.08.22 14:31:50",
... ], type="TIMESTAMP")
>>> vol = [1, 2, 3, 6, 8]
>>> price = [11.5, 12.8, 13.5, 11.6, 13.8]
>>> t = sf.table({
...     'sym': sym,
...     'timestamp': timestamp,
...     'vol': vol,
...     'price': price,
... })
>>> t
sym timestamp               vol price
--- ----------------------- --- ------------------
a   2024.08.22T14:31:15.000 1   11.5
b   2024.08.22T14:31:20.000 2   12.8
a   2024.08.22T14:31:30.000 3   13.5
b   2024.08.22T14:31:30.000 6   11.6
a   2024.08.22T14:31:50.000 8   13.8

You can create a Table by specifying the column names and data types.

>>> sym = ["a", "b", "a", "b", "a"]
>>> timestamp = sf.vector([
...     "2024.08.22 14:31:15",
...     "2024.08.22 14:31:20",
...     "2024.08.22 14:31:30",
...     "2024.08.22 14:31:30",
...     "2024.08.22 14:31:50",
... ], type="TIMESTAMP")
>>> vol = [1, 2, 3, 6, 8]
>>> price = [11.5, 12.8, 13.5, 11.6, 13.8]
>>> t = sf.table(
...     {
...         'sym': sym,
...         'timestamp': timestamp,
...         'vol': vol,
...         'price': price,
...     },
...     names=['sym', 'timestamp', 'vol', 'price'],
...     types=['STRING', 'TIMESTAMP', 'INT', 'DOUBLE'],
... )

You can create a Table from a DataFrame.

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "name": ["a", "b", "c", "b", "c"],
...     "id": [12, 13, 15, 14, 16],
... })
>>> t = sf.table(df)
>>> t
name id
---- -----
a    12
b    13
c    15
b    14
c    16

Subsequent examples in this document will use this created table for further operations.

Inspect Data#

The head and tail methods of a Table return the first or last few rows respectively.

>>> t.head(2)
sym timestamp               vol price
--- ----------------------- --- -----
a   2024.08.22T14:31:15.000 1   11.5
b   2024.08.22T14:31:20.000 2   12.8

The above code retrieves the first two rows from the table ‘t’.

Retrieve the last two rows from the table ‘t’.

>>> t.tail(2)
sym timestamp               vol price
--- ----------------------- --- -----
b   2024.08.22T14:31:30.000 6   11.6
a   2024.08.22T14:31:50.000 8   13.8

The count method returns the number of rows in a Table, while summary provides descriptive statistics.

Retrieve the total number of rows in the table ‘t’.

>>> t.count()
Int(5)

Retrieve the summary statistics of the table ‘t’, such as the count, mean, minimum, maximum, and other aggregate information for each column.

>>> t.summary()
name  min  max  nonNullCount count avg   std               percentile
----- ---- ---- ------------ ----- ----- ----------------- ----------------------------------------------------------
vol   1    8    5            5     4     2.91547594742265  [1.999999523305699,3.000002384471506,5.999997616528495]
price 11.5 13.8 5            5     12.64 1.059716943339125 [11.600000524564352,12.800000238983796,13.500000620757871]

The sortBy_ method sorts the records based on specified column(s).

>>> t.sortBy_("sym")
sym timestamp               vol price
--- ----------------------- --- ------------------
a   2024.08.22T14:31:15.000 1   11.5
a   2024.08.22T14:31:30.000 3   13.5
a   2024.08.22T14:31:50.000 8   13.8
b   2024.08.22T14:31:20.000 2   12.8
b   2024.08.22T14:31:30.000 6   11.6

Access Data#

Column Access: Use Table[colLabel] or Table.colLabel to retrieve one or more columns as Vector(s).

>>> t["sym"]
Vector(["a","a","a","b","b"], type=STRING)
>>> t.sym
Vector(["a","a","a","b","b"], type=STRING)

Row Access: Use Table[rowIndex]. When rowIndex is an integer, returns a Dictionary. When rowIndex is a slice or list, returns a Table.

>>> t[0]
Dictionary(
sym->a
timestamp->2024.08.22T14:31:15.000
vol->1
price->11.5
, key_type=STRING, val_type=ANY)

>>> t[1:3]
sym timestamp               vol price
--- ----------------------- --- -----
b   2024.08.22T14:31:20.000 2   12.8
a   2024.08.22T14:31:30.000 3   13.5

Join Tables#

To join two Tables, use the join function from the swordfish.function module.

>>> t1 = sf.table({
...     'sym': ["a", "b", "a", "b", "a"],
...     'price': [11.5, 12.8, 13.5, 11.6, 13.8],
... })
>>> t2 = sf.table({
...    'name': ["a", "b", "c", "b", "c"],
...    'val': [12, 13, 15, 14, 16],
... })
>>> t1
sym price
--- ------------------
a   11.5
b   12.8
a   13.5
b   11.599999999999999
a   13.8

>>> t2
name val
---- -----
a    12
b    13
c    15
b    14
c    16

Call the join function from swordfish.function to join t1 and t2.

>>> import swordfish.function as F
>>> F.join(t1, t2)
sym price              name val
--- ------------------ ---- -----
a   11.5               a    12
b   12.8               b    13
a   13.5               c    15
b   11.599999999999999 b    14
a   13.8               c    16

SQL Operations#

In Swordfish, SQL statements are executed using the sql function. This section demonstrates basic operations on tables using the sql function.

  • Selecting the “val” column from t2:

>>> sf.sql("SELECT val FROM t", vars={"t": t2})
val
-----
12
13
15
14
16
  • Filtering data with a WHERE clause:

>>> sf.sql("SELECT valt FROM t WHERE val>14", vars={"t": t2})
val
-----
15
16
  • Updating table data using an UPDATE statement:

>>> sf.sql("UPDATE t SET val=10 WHERE val=14", vars={"t": t2})
name val
---- -----
a    12
b    13
c    15
b    10
c    16
  • Deleting a row from the table using a DELETE statement:

>>> sf.sql("DELETE from t WHERE val=16", vars={"t": t2})
name val
---- -----
a    12
b    13
c    15
b    14
  • Using GROUP BY to count the number of entries for each “sym” in t1:

>>> sf.sql("SELECT count(sym) as counts FROM t GROUP BY sym", vars={"t": t1})
sym counts
--- ------
a   3
b   2
  • Joining two tables using LEFT JOIN:

>>> sf.sql("SELECT * FROM ta LEFT JOIN tb on ta.sym = tb.name",
...     vars={"ta": t1, "tb": t2})
sym price              val
--- ------------------ -----
a   11.5               12
b   12.8               13
b   12.8               14
a   13.5               12
b   11.599999999999999 13
b   11.599999999999999 14
a   13.8               12

Matrices#

Create Matrices#

This example demonstrates how to create a Swordfish Matrix from a list of lists:

>>> m = sf.matrix([[1, 2, 3], [0, 1, 4], [5, 6, 0]])
>>> m
Matrix(
#0 #1 #2
-- -- --
1  0  5
2  1  6
3  4  0
, type=LONG)

Calculate with Matrices#

Import the swordfish.function module for Matrix operations:

>>> import swordfish.function as F
  • Column-wise Max and Min

>>> F.max(m)
Vector([3,4,6], type=LONG)
>>> F.min(m)
Vector([1,0,0], type=LONG)
  • Matrix Transposition

>>> F.transpose(m)
Matrix(
#0 #1 #2
-- -- --
1  2  3
0  1  4
5  6  0
, type=LONG)
  • Inversion

>>> F.inverse(m)
Matrix(
#0                  #1                  #2
------------------- ------------------- ------------------
-24.000000000000088 20.000000000000074  -5.000000000000021
18.000000000000067  -15.000000000000055 4.000000000000014
5.000000000000018   -4.000000000000015  1.000000000000003
, type=DOUBLE)

File Exports and Imports#

To save a Table as a CSV file:

>>> t = sf.table({
...     'sym': ["a", "b", "a", "b", "a"],
...     'vol': [1, 2, 3, 6, 8],
...     'price': [11.5, 12.8, 13.5, 11.6, 13.8],
... })
>>> import swordfish.function as F
>>> F.saveText(t, "./data.csv")

To load a Table from a CSV file:

>>> F.loadText(t, "./data.csv")
sym vol price
--- --- ------------------
a   1   11.5
b   2   12.8
a   3   13.5
b   6   11.599999999999999
a   8   13.8