Unified Stream and Batch Processing of WorldQuant 101 Alphas in DolphinDB

In 2015, the formulas of 101 quantitative trading alphas used by WorldQuant were presented in the paper 101 Formulaic Alphas. To conveniently calculate these 101 alphas in DolphinDB, you can use the functions encapsulated in the module wq101alpha (wq101alpha.dos).

This module has the following advantages over Python:

  • Better Performance: DolphinDB outperforms Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than Python.
  • Unified Stream and Batch Processing: Functions defined in this module can be used for both stream and batch processing.
  • Simplicity: Most of the 101 alphas can be written with DolphinDB built-in functions. No need to develop new functions.

Note: All scripts in this tutorial are compatible with DolphinDB V2.00.8, 1.30.20 or higher.

1. Naming Conventions

  • All function names in module wq101alpha start with "WQAlpha" followed by a number from 1 to 101. For examples, WQAlpha1,WQAlpha2, etc.

  • The following is a list of parameters for alphas defined in the wq101alpha module. Each alpha may use a different set of parameters (see Appendix).

ParametersMeaningContains industry information
tradetimetrading hours×
securityidsecurity code×
openopen price×
closeclose price×
highhigh price×
lowlow price×
voltrading volume×
vwapvolume-weighted average price×
capmarket capitalization
indclassindustry classification
  • Two types of alphas are defined in the wq101alpha module: alphas with and without industry information.
Alpha typeInputOutputAlpha#
with industry informationtabletable48,56,58,59,63,67,69,70,76,79,80,82,87,89,90,91,93,97,100
without industry informationmatrixmatrixall other alphas

2. Examples

This chapter expounds how to calculate alphas with specific examples.

2.1. Environment Setup

Add the module file wq101alpha.dos to [home]/modules.

The [home] directory is specified by the configuration parameter home. (To check the value of home, use getHomeDir())

2.2. Data Preparation

You can simulate daily data with DataSimulation and industry information with IndustryInfo .

Alternatively, if you already have tables of daily data and industry information, you need to perform an equal join to combine these two tables, and make sure that the column names are consistent with the parameters defined in the module.

If you need to change the column names, you can use function prepareData in module prepare101 (add prepare101.dos to [home]/modules). It converts the column names to the defined parameters.

  • rawData is a table containing non-industry information.
  • infoData is a table containing industry information.
  • startTime and endTime determine the start time and end time of data.
  • Other parameters are column names to be converted.

Import the wq101alpha module and load the data you prepared:

use wq101alpha
use prepare101
login('admin', '123456')
rawData = loadTable("dfs://k_day_level", "k_day")
infoData = select * from loadTable("dfs://info", "info_data")
startTime = timestamp(2010.01.01)
endTime = timestamp(2010.01.31)
data = prepareData(rawData=rawData, startTime=startTime, endTime=endTime, securityidName="securityid", tradetimeName="tradetime", openName="open", closeName="close", highName="high", lowName="low", volumeName="vol", vwapName="vwap", infoSecurityidName="securityid", capName="cap", indclassName="indclass", infoData=infoData)

2.3. Calculating Alphas Without Industry Information

In the wq101alpha module, the calculation of alphas without industry information is generally conducted on two dimensions: time-series and cross-section. For these factors, you need to prepare a matrix as the input, and then call function WQAlpha#. Check Appendix for specific parameters.

For example, you can calculate alpha 1 and alpha 2 as follows:

use wq101alpha
input1 = exec close from data where tradetime between startTime : endTime pivot by tradetime, securityid
res1 = WQAlpha1(input1)

input2 = dict(`vol`close`open, panel(data.tradetime, data.securityid, [data.vol, data.close, data.open]))
res2 = WQAlpha2(input2.vol, input2.close, input2.open)

We provide function prepare# and calAlpha# in the prepare101 module to save your time spent on specifying parameters.

Function prepare# prepares the parameters required for each alpha and function calAlpha# encapsulates function prepare# and wqAlpha#.

Take alpha 1 as an example:

def prepare1(data, startTime, endTime){
    p = exec close from data where tradetime between startTime : endTime pivot by tradetime, securityid
    return p
}

def calAlpha1(data, startTime, endTime){
    input = prepare1(data, startTime, endTime)
    return WQAlpha1(input)
}
//call the module
use prepare101

res = calAlpha1(data, startTime, endTime)

Note:

In addition to matrices, parameters of function WQAlpha41, WQAlpha54, and WQAlpha101 can also be vectors. For example, you can calculate alpha 101 using a SQL statement as follows:

use wq101alpha 
res = select tradetime, securityid, `alpha101 as factorname, WQAlpha101(close, open, high, low) as val from data where tradetime between startTime : endTime

2.4. Calculating Alphas with Industry Information

To calculate alphas with industry information, you need to specify a table as the input.

Take alpha 48 as an example:

use wq101alpha

res = WQAlpha48(data)

You can also use function calAlpha# in prepare101 module.

def calAlpha48(data, startTime, endTime){
    input = select * from data where tradetime between startTime : endTime
    return WQAlpha48(input)
}
//call the module
use prepare101

res = calAlpha48(data, startTime, endTime)

Note: The alpha calculation in the paper 101 Formulatic Alphas adopts several industry classifications, such as IndClass, subindustry, IndClass.industry, IndClass.sector. For the sake of convenience, only IndClass is used in this module.

Functions in the wq101alpha module return a matrix or a table. You can save your results to database if needed. Please refer to wq101alphaStorage.

3. Performance Comparison

Our testings show that the wq101alpha module of DolphinDB outperforms Python pandas and Numpy.

Hardware

CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz OS: 64-bit CentOS Linux 7 (Core)

Data

We use the simulated daily data in a year to conduct performance testing (see TestData).

3.1. DolphinDB vs. Python Pandas

We compare the performance of alpha calculation implemented by DolphinDB module wq101alpha and Python pandas

The following is the main script for performance testing of the wq101alpha module (see full script in wq101alphaDDBTime):

times = array(INT, 0)
defs()
for (i in 1:102){
    if (i in passList) times.append!(NULL)
    else{
        print(i)
        alphaName = exec name from defs() where name = "wq101alpha::WQAlpha"+string(i)
        alphaSyntax = exec syntax from defs() where name = "wq101alpha::WQAlpha"+string(i)
        function = alphaName + alphaSyntax
        t1 = time(now())
        res = parseExpr(function[0]).eval()
        t2 = time(now())
        times.append!(t2 - t1)
    }
}

The following is the main script for performance testing of Python pandas (see full script in wq101alphaPyTime):

times = []

nofunc = [48, 56, 58, 59, 63, 67, 69, 70, 76, 79, 80, 82, 87, 89, 90, 91, 93, 97, 100]

for i in range(1, 102):
    if i in nofunc:
        times.append('no function')
        continue
    else:
        factor = getattr(Alphas, "alpha{:03d}".format(i))
    try:
        t1 = time.time()
        res = factor(stock)
        t2 = time.time()
        times.append(t2 - t1)
    except Exception:
        times.append('error')

The execution time of all 101 alphas can be found in PerformanceComparison. Alphas that have not yet been implemented in Python pandas or whose results are questionable are excluded. 69 alphas are available for comparison (in millisecond):

Alpha#DolphinDBPandasPandas/DDBAlpha#DolphinDBPandasPandas/DDB
18668,837800.43389189,487983.38
22292,1179.2440732,37932.59
31402,01814.414184110.13
47589,4401,192.5342972182.25
51206005.4391165,9541,823.67
6211,76584.0544821,91823.39
714872,001486.49451704,85328.55
81121,51313.514635571.62
95071414.27472521,1564.59
101288086.314933371.13
111458986.2502352,47510.53
1215100.695136381.05
132131,7848.385213191,360697.4
141131,98717.595329280.97
151472,57217.49541751781.02
162081,7768.54552162,99713.88
17177174,055983.366015472,081468.06
181042,41723.24611472,61417.78
19724406.11623543,2049.05
202624391.68641912,95615.48
21782,29629.43651812,96816.4
22972,35824.316824781,582330.29
246468610.72743804,76112.53
25965005.21752794,46115.99
2661182,3402,989.18783845,20413.55
272262,57311.388151961,954119.37
28452,15547.9832091,1075.3
2940689,515220.488415280,908532.29
308283210.1585303184,645609.39
32532,14640.58612375,681615.3
33881481.6894169221,0361,307.91
342541,3825.449528767,899236.58
35131249,7481,906.47992034,75823.44
3638892,303237.891019202.2
371481,95313.2

The result shows that wq101alpha in DolphinDB outperforms python pandas. DolphinDB is faster than Python by a median of 15.5x. For 27.5% of the alphas, DolphinDB is more than 100 times faster than python.

3.2. DolphinDB vs. NumPy

Considering NumPy may have better performance than pandas, we choose 11 alphas that are time-consuming in pandas and implement them with NumPy. See partialAlphaNumpyTime for test results of NumPy.

Performance comparison of DolphinDB and NumPy:

Alpha#DDBpandasNumPypandas/DDBNumPy/DDB
18668,837418800.44.9
47589,44054,4171,192.5725.6
51206002185.01.8
714872,00139,472486.5266.7
81121,51326513.52.4
95071415214.33.0
17177174,05593,704983.4529.4
2940689,51547,100220.5116.0
389189,48746,257983.4508.3
5213191,36046,715697.4356.6
832091,1074645.32.2

We can see that while NumPy is faster than pandas, DolphinDB outperforms both. DolphinDB has optimized the implementation of its window functions. In comparison, NumPy is not optimized for window calculations implemented by numpy.lib.stride_tricks.sliding_window_view.

4. Stream Processing

It is complex to implement most alphas in real time, which requires more than one stream engine. DolphinDB provides the streamEngineParser function to automatically form a pipeline of stream engines to carry out the specified metrics calculation. In streamEngineParser, you can directly call functions in module wq101alpha.

See full script in wq101alphaStreamTest for the implementation of real-time alpha calculations.

Take alpha 1 for example:

Define the schemata of input and output tables.

inputSchemaT = table(1:0, ["SecurityID","TradeTime","close"], [SYMBOL,TIMESTAMP,DOUBLE])
resultStream = table(10000:0, ["SecurityID","TradeTime", "factor"], [SYMBOL,TIMESTAMP, DOUBLE])

Call the wq101alpha module and use WQAlpha1 as the metrics for the streamEngineParser function.

use wq101alpha
metrics = <[WQAlpha1(close)]>
streamEngine = streamEngineParser(name="WQAlpha1Parser", metrics=metrics, dummyTable=inputSchemaT, outputTable=resultStream, keyColumn="SecurityID", timeColumn=`tradetime, triggeringPattern='perBatch', triggeringInterval=4000)

Check the status of the stream engines with function getStreamEngineStat().

getStreamEngineStat()

#output
ReactiveStreamEngine->
name            user        status lastErrMsg numGroups ...
-------------   ----------- ------ ---------- --------- ...
WQAlpha1Parser0 admin OK                0         0       
WQAlpha1Parser2 admin OK                0         0       

CrossSectionalEngine->
name            user  status lastErrMsg numRows ...
--------------- ----- ------ ---------- ------- ...
WQAlpha1Parser1 admin OK                0       2          

Append data to stream engines and check the output table resultStream.

streamEngine.append!(data)
//check the result
res = exec factor from resultStream pivot by TradeTime, SecurityID

5. Conclusion

This tutorial introduces how to calculate 101 alphas with DolphinDB built-in functions in the wq101alpha module. This module features efficiency, speed, and simplicity, and achieves unified batch and stream processing.

6. Appendix: Required Parameters for Each Alpha

Alphas without industry information

Alpha#ParametersAlpha#ParametersAlpha#Parameters
1, 9, 10, 19, 24, 29, 34, 46, 49, 51close23high71vwap, vol, open, close, low
2, 14vol, open, close25, 47, 74vwap, vol, close, high72, 77vwap, vol, high, low
3, 6vol, open27, 50, 61, 81vwap, vol73vwap, open, low
4low28, 35, 55, 60, 68, 85vol, high, low, close75, 78vwap, vol, low
5vwap, open, close31, 52vol, close, low83vwap, vol, close, high, low
7, 12, 13, 17, 21, 30, 39, 43, 45vol, close32, 42, 57, 84vwap, close88, 92, 94vol, open, close, high, low
8, 18, 33, 37, 38open, close36, 86vwap, vol, open, close95vol, open, high, low
11, 96vwap, vol, close41vwap, high, low65, 98vwap, vol, open
15, 16, 26, 40, 44vol, high53close, high, low99vol, high, low
20, 54, 101open, close, high, low62, 64vwap, vol, open, high, low
22vol, high, close66vwap, open, high, low

Alphas with industry information

Alpha#ParametersAlpha#Parameters
48close, indclass76, 89vwap, vol, low, indclass
56close, cap80vol, open, high, indclass
58, 59vwap, vol, indclass82vol, open, indclass
63, 79vwap, vol, open, close, indclass90vol, close, indclass
67vwap, vol, high, indclass97vwap, vol, low, indclass
69, 70, 87, 91, 93vwap, vol, close, indclass100vol, close, high, low, indclass

See Also