Guide to Swordfish API Functions#

The Swordfish.function module encapsulates DolphinDB’s built-in functions. In this chapter, we will introduce the module’s structure, usage patterns, and best practices, and demonstrate how to efficiently use these functions.

All functions in the module inherit from the FunctionDef class under the Scalar hierarchy and implement the __call__ method. They can be invoked uniformly using the function.<function> syntax and are compatible with Python data objects such as numpy.ndarray and pandas.Series/DataFrame.

Functions accept two types of input parameters:

  • If a Constant object is passed, the function operates directly on that object.

  • If a Python object is passed, Swordfish implicitly converts it into an internal Constant object before execution.

This calling method is convenient for rapid prototyping or single-execution scenarios. Regardless of the input type, the function always returns an object of Swordfish type.

Note that while implicit conversion simplifies the call process, it may introduce additional overhead in high-frequency scenarios such as loops or streaming computations. To improve performance, it is recommended to explicitly construct Swordfish data structures using methods like sf.matrix or sf.table (see the “Data Types” chapter) to avoid repeated conversions.

Before using these functions, import the swordfish package:

import swordfish as sf
import swordfish.function as F

Calling Functions in a Library#

To call a function in a function library, specify the library and function name.

The functions in Swordfish can be invoked using the F.<function> syntax. You can pass Python objects directly (e.g., a NumPy matrix), and Swordfish will automatically perform implicit type conversion, transforming them into internal Swordfish objects for computation. For example, to compute the correlation matrix of a NumPy matrix:

import numpy as np
F.corrMatrix(np.matrix([[1,2,3], [4,5,6], [7,8,9]]))

This approach is suitable for single computations or rapid prototyping. However, frequent calls may incur additional overhead due to implicit conversion. For high-frequency scenarios, such as loops or streaming computations, it is recommended to explicitly convert data objects using functions like sf.matrix or sf.table and operate directly on native Swordfish data structures to improve performance:

swordfish_matrix = sf.matrix(np_matrix)  # Explicit conversion
F.corrMatrix(swordfish_matrix)           # Avoid repeated conversion overhead

Code Completion#

To facilitate usage, the Swordfish function library supports convenient code completion features, including function name completion and parameter hints.

  • Function Name Completion

When typing F.<function> to specify a function, the IDE will display a dropdown list for easy selection. The displayed list includes all functions built into the library.

  • Function Parameter Completion

When typing F.<function>(), the IDE will provide hints for the required input parameters. For example, after typing F.wj(), the IDE suggests parameters such as leftTable and rightTable to guide users in passing the correct arguments.

User-defined Functions#

Although Swordfish provides a rich set of built-in functions, complex data analysis scenarios often require users to call user-defined functions to meet specific needs. Swordfish supports converting native Python functions into Swordfish UDF objects via decorators, and it also allows direct use of Python lambda expressions.

Callback Mode: Compatible with Native Python Environment#

In complex data analysis scenarios, users may want to directly call existing Python functions within Swordfish higher-order functions to perform sophisticated computations. Swordfish allows converting a Python function into a Swordfish UDF object using the decorator @F.swordfish_udf().

For example, users can convert a user-defined function calCumVol into a Swordfish function object using the decorator. This allows calCumVol to be passed as a parameter to F.accumulate to compute the cumulative volume. In this case, the function operates in callback mode: Swordfish preserves the Python function object without translating it, so execution still relies on the Python environment. If the function requires third-party Python libraries or other global variables during execution, it is recommended to use callback mode.

# Translate a Python function into a Swordfish function and invoke it via a higher-order function
@F.swordfish_udf()
def caclCumVol(target, preResult):
    x = 1500000
    result = preResult + x
    if result - target> target - preResult:
        return x
    else:
        return result
# Numeric sequence
volume = np.array([288658, 234804, 182714, 371986, 265882, 174778, 153657, 201388])
F.accumulate(caclCumVol, volume)

%timeit F.accumulate(caclCumVol, volume)
# 97.1 μs ± 3.62 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

High-Performance Execution of User-defined Functions#

In addition, users can execute user-defined functions in translation mode by applying the decorator @F.swordfish_udf(mode="translate").

For example, when the mode parameter is set to translate, the user-defined function calcCumVol is converted into a built-in Swordfish function, resulting in nearly twice the execution efficiency. Therefore, for performance-critical scenarios, it is recommended to execute user-defined functions in translation mode.

# Translate Python functions into Swordfish functions and invoke them via higher-order functions
@F.swordfish_udf(mode="translate")
def caclCumVol(target, preResult):
    x = 1500000
    result = preResult + x
    if result - target> target - preResult:
        return x
    else:
        return result

volume = np.array([288658, 234804, 182714, 371986, 265882, 174778, 153657, 201388])

%timeit F.accumulate(caclCumVol, volume)
# 32.5 μs ± 141 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Building Aggregate Functions#

In data analysis scenarios, built-in aggregate functions (such as sum and avg) are often insufficient for complex business needs. For instance, in financial analysis, it is common to calculate the geometric mean (which reflects the compound growth rate). Example data is shown below.

n = 1000
date = pd.date_range('2023-01-01', periods=n, freq='D')
sym = np.random.choice(['AAPL', 'MSFT', 'GOOG'], n)
value = F.rand(100.0, n)
tb = pd.DataFrame({'date': date,'sym': sym, 'value': value})

Swordfish allows users to define user-defined aggregate functions by applying the decorator @F.swordfish_udf(is_aggregate=True). Users only need to specify the computation logic (e.g., taking the exponential of the mean of logarithms), and the function can then be called directly in SQL queries. It can be used together with GROUP BY or CONTEXT BY for grouped statistics.

The following example defines an aggregate function geometricMean to compute the geometric mean.

import math

@F.swordfish_udf(is_aggregation=True)
def geometricMean(x):
    return math.exp(np.avg(np.log(x)))

After defining the function, the final SQL aggregate query is as follows:

sf.sql("select geometricMean(value) as gmean_value from t group by sym",
        vars={'t': tb})

Using Lambda Expressions#

In data processing, it is often necessary to encapsulate temporary logic, such as dynamic computation rules or ad hoc text processing. Swordfish supports inline anonymous functions through Python lambda expressions, allowing them to be embedded directly in higher-order function calls and eliminating the need for redundant definitions.

In the following example, the F.each function is used with a Python lambda expression to compute the sum of two vectors.

F.each(lambda a,b:a+b, F.seq(1,11), F.seq(2,12))

Additionally, Python lambda expressions can also be invoked within Swordfish user-defined functions. For example, in the user-defined function byRowConcat, a lambda expression is called within F.each to concatenate strings row by row for a string vector.

t=sf.table({"str":F.take("aaaa_bbbb", 1000000)})

@F.swordfish_udf(mode="translate")
def byRowConcat(str):
    return F.each(lambda str: F.concat(F.reverse(F.split(str, '_')),"_"), str)

byRowConcat(t["str"])

Applications of Higher-Order Functions#

In complex data analysis scenarios, such as window calculation, aggregation operations, and vectorized operations, it is often necessary to perform the same operation across batches of data. Swordfish supports this batch-processing paradigm through higher-order functions, whose core mechanism takes both function objects and data objects as input pipelines. This approach maintains code simplicity while significantly improving development efficiency. It is important to note that Swordfish higher-order functions currently support only Swordfish build-in functions as parameters and do not yet support Python user-defined functions.

Window Calculation#

Window calculation is a common requirement in time-series analysis. Swordfish provides powerful window functions to support this need. The F.moving function is suitable for rolling calculations in fixed-length window - for example, calculating a 3-period moving average of stock quotes:

t = sf.table({'sym':F.take(['a', 'b'], 100), 'bidPrice':F.rand(100.0,100)})
F.moving(F.avg, t['bidPrice'], 3)

For basic statistical metrics, it is recommended to use the optimized m-functions (such as mavg) instead of the moving function to improve performance.

F.mavg(t['bidPrice'], 3)

The F.window function supports asymmetric window computations, making it ideal for use cases that require user-defined lookback and lookahead intervals. The following example identifies local extrema within a (-4, 4) window:

t = sf.table({'id':F.rand(np.arange(1,101),20)})
F.window(F.min, t['id'], F.pair(-4,4))

Panel Data Processing#

Swordfish provides capabilities for dynamic grouping and multidimensional pivoting. The F.segmentby function supports segmented aggregation based on continuous labels, making it suitable for analyzing stock price fluctuation patterns.

price = sf.vector([100,102,103,101,100,105,107])
change = price - F.prev(price)
F.segmentby(
    F.cumsum,
    F.abs(change),
    F.iif(change>0, 1, F.iif(change<0, -1, 0))  # Dynamically generate price movement indicators
)

The F.pivot function performs two-dimensional data restructuring and supports multidimensional cross calculations. The following example demonstrates a spatiotemporal weighted aggregation of stock trading volumes.

t1 = sf.table({'sym': ['A', 'A', 'B'], 'date': [date(2021,1,1), date(2021,1,2), date(2021,1,3)],
               'price': [10.6, 10.7, 20.6], 'volume': [1000.5, 1000.6, 2000.5]})
F.pivot(F.wavg, [t1['price'], t1['volume']], t1['date'], t1['sym'])  # Spatiotemporal weighted average

Vectorized Operation#

Vectorized processing eliminates explicit loops, significantly improving computational efficiency. The F.each function performs element-wise batch operations - for example, performing data cleansing using column-wise validation.

t = sf.table({'val1':F.join(np.arange(1,101), np.full(10, np.nan)),
              'val2':F.join(np.full(10, np.nan), np.arange(1,101))})
t[F.rowAnd(F.each(F.isValid, t.values()))]  # Filter rows with all valid columns

The F.byRow function enables row-wise matrix operations, such as quickly finding the index of the maximum value in each row.

m = sf.matrix([[2,3,4],[1,2,3],[1,4,5],[5,3,2]])
F.byRow(F.imax, m)  # Compute the index of the maximum value for each row

The F.cross function supports matrix cross computation, enabling efficient covariance matrix generation.

matt = np.matrix([[1,2,3],[4,5,6],[7,8,9]])
F.cross(F.covar, matt)  # Generate a covariance matrix

Additional Higher-Order Functions#

Swordfish provides a wide range of built-in higher-order functions. Users can select and combine the most appropriate ones for their specific use cases to fully leverage Swordfish’s batch computation capabilities. The following table lists commonly used higher-order functions for different scenarios:

Use Case

Functions

Window Calculations

F.rolling, F.moving, F.tmoving, F.window, F.twindow

Vectorized Calculations

F.each, F.eachRight, F.eachLeft, F.eachPre, F.eachPost

Panel Data Processing

F.pivot, F.contextby, F.segmentby

Recursive Calls

F.accumulate, F.reduce

Parallel computing

F.peach, F.ploop, F.pcall

Partial Application#

Partial application generates a new function by fixing some of the arguments of an existing function, which can significantly reduce code redundancy in high-frequency parameter scenarios. In Swordfish, it is recommended to use sf.partial, which allows flexible fixation of target function parameters via keyword arguments. For example, when computing 3-day window indicators in batch, the window length parameter can be fixed in advance:

# Fix window length to 3
mov3 = sf.partial(F.moving, window=3)
ma = mov3(F.avg, [1,2,3,4,5])    # 3-day moving average
max_val = mov3(F.max, [1,2,3,4,5])  # 3-day maximum

This method also supports fixing multiple parameters simultaneously, such as locking both the rolling window and the minimum number of observations:

np.random.seed(42)
price_data = np.cumprod(1 + np.random.normal(0, 0.02, 100))  # Simulated prices
returns_data = np.diff(np.log(price_data))  # Log returns

# Fix the window length and the minimum number of observations.
safe_roller = sf.partial(F.rolling, window=5, min_periods=2)
skewness = safe_roller(F.skew, returns_data)  # Rolling skewness with fault tolerance

For higher-order scenarios where a partially applied function needs to be passed as an argument (e.g., nested in F.each), F.partial can be used together with the placeholder F.VOID, but its syntax requires strict positional matching and has limited applicability. In daily development, sf.partial is preferred for better readability and maintainability.

Metaprogramming#

In dynamic computation scenarios, such as window joins and streaming engines, the definition and execution of computation logic often need to be separated. For example, in a window join, users need to encapsulate aggregation logic (e.g., retrieving the latest quote within a window) as an expression object rather than computing the result immediately, allowing dynamic evaluation in different contexts or time frames.

Swordfish provides a metaprogramming toolkit, centered around the sf.meta_code context manager. Using this tool, users can explicitly define column references and compose function logic to generate metacode. For example, to create an expression that computes the latest quote, simply reference the column and call the function within the metaprogramming context:

with sf.meta_code() as m:
    bid = m.col("bid")      # Reference bid column
    agg_logic = F.last(bid) # Generate metacode <last(bid)>

The resulting agg_logic is a metacode that can be passed directly as a delayed-execution argument. Additionally, Swordfish provides F.sqlCol and F.makeCall for quickly generating simple expressions or metacode for user-defined function call, F.sqlCol("bid", F.last) produces <last(bid)>. These shortcuts are suitable for simple scenarios, while complex logic is better handled using sf.meta_code for chained composition.

Taking a window join on stock quote data as an example, the core requirement is to match the left table with the right table over a time window and dynamically compute the latest quote in the right table’s window. Once defined via metaprogramming, the metacode can be passed to the aggs parameter of F.wj:

# Define a metacode
with sf.meta_code() as m:
    bid = m.col("bid")
    agg_code = F.last(bid)  # <last(bid)>


t1 = sf.table({'sym': ['A', 'A', 'B'],
                'date': [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 3)],
                'price': [10.6, 10.7, 20.6],
                'bid': [10.5, 10.6, 20.5]})
t2 = sf.table({'sym': ['A', 'B', 'C'],
            'date': [date(2021, 1, 2), date(2021, 1, 4), date(2021, 1, 3)],
            'price': [10.6, 20.6, 30.6],
            'bid': [10.3, 10.4, 20.5]})

result = F.wj(
    leftTable=t1,                # daily stock quotes
    rightTable=t2,               # post-market updates
    window=F.pair(0, 1),         # Time window [current row, current row + 1 day]
    aggs=agg_code,               # Pass in a metacode
    matchingCols=["sym", "date"] # Match by stock code and date
)

After execution, each row in the left table is joined with the corresponding data from the right table within the time window, taking the last bid value. For example, a row in the left table dated 2021-01-02 will join with data from the previous day in the right table, and the output field last_bid contains the latest quote within that window.