Table

The DolphinDB Python API provides native methods to create and use DolphinDB databases and tables. This section describes how to create and query tables.
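
The examples in this section assume a connected session s and the following imports; the host, port, and credentials below are placeholders to be replaced with your own:

import dolphindb as ddb
import dolphindb.settings as keys
import numpy as np
import pandas as pd

s = ddb.session()
s.connect("localhost", 8848, "admin", "123456")  # placeholder host/port/credentials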

Constructing Table Object with Session.table

The Table class wraps a handle to a DolphinDB table object, providing an interface for interacting with DolphinDB tables from Python. Table objects are usually constructed with the Session.table or Session.loadTable method. They can also be obtained through Session.loadText, Session.loadTextEx, etc.

Syntax of Session.table:

Session.table(dbPath=None, data=None, tableAliasName=None, inMem=False, partitions=None)
  • dbPath: database path. Leave this parameter unspecified when creating an in-memory table or a stream table.
  • data: table data. It can be a dict, pandas.DataFrame, or a DolphinDB table name.
  • tableAliasName: table alias
  • inMem: whether to load table data into the DolphinDB server memory.
  • partitions: partitions to be loaded into the server memory.

data: Uploading Data as a Temporary Table

When the data parameter is a dictionary or a Pandas DataFrame, the data is uploaded to DolphinDB server as a temporary table. In this case, dbPath, inMem, and partitions do not need to be specified.

Example:

data1 = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})
t1 = s.table(data=data1)
print(t1, t1.tableName())
data2 = {
    'a': ['a', 'b', 'c'],
    'b': [1, 2, 3],
}
t2 = s.table(data=data2)
print(t2, t2.tableName())

Output:

<dolphindb.table.Table object at 0x7fbd5f02bd60> TMP_TBL_3cc57246
<dolphindb.table.Table object at 0x7fbd3205fc70> TMP_TBL_dbae4978

"data1" and "data2" are uploaded to the server as temporary tables "TMP_TBL_3cc57246" and "TMP_TBL_dbae4978", respectively.

data: Obtaining Handle to a DolphinDB Table

When the data parameter is a string indicating a DolphinDB table name, Session.table obtains a handle to the table.

(1) When dbPath and data are both specified, calling Session.table loads the table named by data from the database specified by dbPath.

dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
    s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt")
print(pt, pt.tableName())
print(pt.toDF())

Output:

<dolphindb.table.Table object at 0x7f5036bcd040> pt_TMP_TBL_5229a3cc
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

(2) When only data is specified, Session.table obtains a handle to the in-memory table with the name specified by data.

s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t")
print(t, t.tableName())
print(t.toDF())

Output:

<dolphindb.table.Table object at 0x7f11ffb3c070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

tableAliasName

If tableAliasName is specified, Session.table assigns it as the name of the table handle instead of a randomly generated string.

(1) Specify tableAliasName when uploading local data to server

data1 = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})
t1 = s.table(data=data1, tableAliasName="data1")
print(t1, t1.tableName())
data2 = {
    'a': ['a', 'b', 'c'],
    'b': [1, 2, 3],
}
t2 = s.table(data=data2, tableAliasName="data2")
print(t2, t2.tableName())

Output:

<dolphindb.table.Table object at 0x7f167ecb69d0> data1
<dolphindb.table.Table object at 0x7f1651d0bc40> data2

(2) Specify tableAliasName when loading table from database to obtain its handle

dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
    s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt", tableAliasName="tmp_pt")
print(pt, pt.tableName())
print(pt.toDF())

Output:

<dolphindb.table.Table object at 0x7f3350edc040> tmp_pt
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

(3) tableAliasName does not take effect when obtaining the handle to a DolphinDB in-memory table:

s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t", tableAliasName="test_t2")
print(t, t.tableName())
print(t.toDF())

Output:

<dolphindb.table.Table object at 0x7f9fb55b4070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

tableAliasName specifies an alias when loading a partitioned table or uploading local data as a DolphinDB table, which avoids working with a randomly generated temporary table name. It has no effect when obtaining a handle to an in-memory table by its name.

inMem, partitions

These two parameters only take effect for local databases on disk. For details, see the DolphinDB User Manual - loadTable.
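
For example, for a VALUE-partitioned database on local disk, a sketch like the following loads only the "AMZN" partition into memory (the database path and table name are hypothetical placeholders):

diskPath = "/home/user/valuedb"  # placeholder: an on-disk (non-DFS) database path
pt = s.table(dbPath=diskPath, data="pt", inMem=True, partitions=["AMZN"])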

Session.loadTable

Session.loadTable returns a Table object. It is similar to Session.table. The difference is that Session.loadTable does not upload local data and can only be used to obtain handles to tables on the DolphinDB server.

Session.loadTable(tableName, dbPath=None, partitions=None, memoryMode=False)
  • tableName: the name of an in-memory table or a database table
  • dbPath: database path
  • partitions: partitions to be loaded into the server memory
  • memoryMode: whether to load table data into the server memory

Loading an In-Memory Table

Example:

s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.loadTable("test_t")
print(t, t.tableName())
print(t.toDF())

Output:

<dolphindb.table.Table object at 0x7fd1c90a4c10> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

Loading a Database Table

Example:

dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
    s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createTable(schema_t, "pt", ["csymbol"])
pt = s.loadTable("pt", dbPath=dbPath)
print(pt, pt.tableName())
print(pt.toDF())

Output:

<dolphindb.table.Table object at 0x7fdaf7885eb0> pt_TMP_TBL_0dfdc80a
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []

Life Cycle of Uploaded Tables

The table and loadTable methods both return a local Python variable. When local data is uploaded to the DolphinDB server without an alias, a random name is assigned as the handle to the uploaded table.

The following example uploads local object "data" to the DolphinDB server as a table. "t" represents the local variable of the uploaded table. For the name of the uploaded table on the server, call Table.tableName.

data = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 5, 6],
})
t = s.table(data=data)
print(t.tableName())

Output:

TMP_TBL_e03723c9

The prefix "TMP_TBL_" indicates that it is a handle to a temporary table. Once the Python Table object is destructed, the corresponding temporary table on the server will be destructed as well.

There are three options to release the table variable on the DolphinDB server:

  • Use Session.undef:
s.undef(t.tableName(), "VAR")
  • Assign NULL to the variable on the DolphinDB server:
s.run(f"{t.tableName()}=NULL")
  • Destruct the local variable in Python to remove its reference to the server variable:
del t

Note:

If an alias is specified when obtaining a handle, or the obtained handle already exists before the call, the destruction of the Python object will not affect the corresponding data on server.
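
For example, in the following sketch (reusing the earlier alias example), destructing the Python object leaves the server variable intact because an alias was specified:

data = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
t = s.table(data=data, tableAliasName="data1")
del t                                # destructs the Python object only
print(s.run("select * from data1"))  # the table "data1" still exists on the server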

If the data is used only once on DolphinDB server, it is recommended to include it as a parameter in a function call instead of uploading it as a table variable. A function call does not cache data. After the function call is executed, all variables are released. Additionally, a function call is faster to execute as the network transmission only takes place once.
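
For example, the following minimal sketch passes a vector directly to the built-in aggregate function avg; the data is transmitted once and no table variable remains on the server after the call:

vec = np.array([1.5, 2.5, 7.0])
print(s.run("avg", vec))  # 3.666...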

Operating on Tables

The DolphinDB Python API provides a variety of methods for operating on DolphinDB tables in Python, including querying data with conditions, updating data, and deleting data.

Note: The objects returned by the update and delete methods are not Table objects but TableUpdate and TableDelete objects. Although this is transparent to users, after calling update or delete you must call the execute method to synchronize the changes to the server. For details, see the descriptions of these methods below.

This section uses the example.csv file in examples.

Obtaining Table Properties

Use rows and cols to get the number of table rows and columns.

>>> s.run("t = table(1..5 as a, 2..6 as b)")
>>> t = s.table(data="t")
>>> t.rows
5
>>> t.cols
2

Use colNames to get the column names of the table.

>>> t.colNames
['a', 'b']

schema returns a Pandas DataFrame indicating the schema of the table, with the same columns as the colDefs table in the output of the server function schema().

>>> t.schema
  name typeString  typeInt  extra comment
0    a        INT        4    NaN        
1    b        INT        4    NaN   

select

Similar to the SQL "select" keyword, the select method selects columns from a table. The following examples use the toDF method to convert the table to a Pandas DataFrame object.

(1) List of column names as input

>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).toDF()
      ticker       date
0       AMZN 1997-05-15
1       AMZN 1997-05-16
2       AMZN 1997-05-19
3       AMZN 1997-05-20
4       AMZN 1997-05-21
...

(2) String as input

>>> trade.select("ticker, date, bid").toDF()
      ticker       date        bid
0       AMZN 1997-05-15   23.50000
1       AMZN 1997-05-16   20.50000
2       AMZN 1997-05-19   20.50000
3       AMZN 1997-05-20   19.62500
4       AMZN 1997-05-21   17.12500
...

update

Use update to update a table. The cols parameter indicates the columns to update and the vals parameter indicates the corresponding values.

As shown in the following example, execute() must be called for the updates to be synchronized to the server.

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"])
>>> t2 = trade.where("ticker=`AMZN").where("date=2015.12.16")
>>> t2.toDF()
  TICKER       date      VOL        PRC        BID        ASK
0   AMZN 2015-12-16  3964470  675.77002  675.76001  675.83002
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"]).execute()
>>> t2.toDF()
  TICKER       date     VOL        PRC        BID        ASK
0   AMZN 2015-12-16  999999  675.77002  675.76001  675.83002

delete

Use delete to delete records from a table. execute must be called for the changes to be synchronized to the server.

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01')
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01').execute()
>>> trade.rows
3024

groupby

groupby groups records based on the specified rules. It is equivalent to the DolphinDB server function groupby.

groupby accepts a list of strings or a single string as its argument, indicating the column(s) to group by. If the SQL statement also includes a having condition, the condition must contain an aggregate function; a record is generated for each group that satisfies the having condition.

Note: groupby must be followed by an aggregate function such as count, sum, avg, std, etc.

Prepare the data:

>>> dbPath = "dfs://valuedb"
>>> if s.existsDatabase(dbPath):
...     s.dropDatabase(dbPath)
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=["AMZN","NFLX","NVDA"], dbPath=dbPath)
>>> trade = s.loadTextEx(dbPath=dbPath, partitionColumns=["TICKER"], tableName='trade', remoteFilePath=WORK_DIR+"/example.csv")

Calculate the sum of column "vol" and the sum of column "prc" in each "ticker" group:

>>> trade.select(['sum(vol)','sum(prc)']).groupby(['ticker']).toDF()
  ticker      sum_vol       sum_prc
0   AMZN  33706396492  772503.81377
1   NFLX  14928048887  421568.81674
2   NVDA  46879603806  127139.51092

Use groupby and having in the same query:

>>> trade.select('count(ask)').groupby(['vol']).having('count(ask)>1').toDF()
       vol  count_ask
0   579392          2
1  3683504          2
2  5732076          2
3  6299736          2
4  6438038          2
5  6946976          2
6  8160197          2
7  8924303          2

contextby

contextby is a unique feature in DolphinDB. It is an extension to standard SQL for convenient time-series data manipulation.

contextby is similar to groupby except that for each group, groupby returns a scalar whereas contextby returns a vector of the same size as the number of rows in the group. For details, see the DolphinDB User Manual - context by.

If having follows contextby and is used with only aggregate functions, the result contains all rows of the groups whose aggregate function values satisfy the specified conditions. If having follows contextby and is used with a function that is not an aggregate function, the result contains the rows that satisfy the specified conditions.

>>> trade.contextby('ticker').top(3).toDF()
  TICKER       date      VOL      PRC      BID      ASK
0   AMZN 1997-05-15  6029815  23.5000  23.5000  23.6250
1   AMZN 1997-05-16  1232226  20.7500  20.5000  21.0000
2   AMZN 1997-05-19   512070  20.5000  20.5000  20.6250
3   NFLX 2002-05-23  7507079  16.7500  16.7500  16.8500
4   NFLX 2002-05-24   797783  16.9400  16.9400  16.9500
5   NFLX 2002-05-28   474866  16.2000  16.2000  16.3700
6   NVDA 1999-01-22  5702636  19.6875  19.6250  19.6875
7   NVDA 1999-01-25  1074571  21.7500  21.7500  21.8750
8   NVDA 1999-01-26   719199  20.0625  20.0625  20.1250
>>> trade.select("TICKER, month(date) as month, cumsum(VOL)").contextby("TICKER,month(date)").toDF()
      TICKER      month  cumsum_VOL
0       AMZN 1997-05-01     6029815
1       AMZN 1997-05-01     7262041
2       AMZN 1997-05-01     7774111
3       AMZN 1997-05-01     8230468
4       AMZN 1997-05-01     9807882
...

Use contextby and having in the same query:

>>> trade.contextby('ticker').having("sum(VOL)>40000000000").toDF()
     TICKER       date       VOL       PRC       BID       ASK
0      NVDA 1999-01-22   5702636   19.6875   19.6250   19.6875
1      NVDA 1999-01-25   1074571   21.7500   21.7500   21.8750
2      NVDA 1999-01-26    719199   20.0625   20.0625   20.1250
3      NVDA 1999-01-27    510637   20.0000   19.8750   20.0000
4      NVDA 1999-01-28    476094   19.9375   19.8750   20.0000
...

pivotby

pivotby is a unique feature in DolphinDB. It is an extension to standard SQL for convenient time-series data manipulation. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. For details, see the DolphinDB User Manual - pivot by.

When used with select, pivotby returns a table.

>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = trade.select("VOL").pivotby("TICKER", "date")
>>> t1.toDF()
  TICKER  1997.05.15  1997.05.16  ...  2016.12.28  2016.12.29  2016.12.30
0   AMZN   6029815.0   1232226.0  ...     3301025     3158299     4139451
1   NFLX         NaN         NaN  ...     4388956     3444729     4455012
2   NVDA         NaN         NaN  ...    57384116    54384676    30323259

When used with exec, pivotby returns a DolphinDB matrix.

>>> trade.exec("VOL").pivotby("TICKER", "date").toDF()
[array([[ 6029815.,  1232226.,   512070., ...,  3301025.,  3158299.,
         4139451.],
       [      nan,       nan,       nan, ...,  4388956.,  3444729.,
         4455012.],
       [      nan,       nan,       nan, ..., 57384116., 54384676.,
        30323259.]]), array(['AMZN', 'NFLX', 'NVDA'], dtype=object), array(['1997-05-15T00:00:00.000000000', '1997-05-16T00:00:00.000000000',
       '1997-05-19T00:00:00.000000000', ...,
       '2016-12-28T00:00:00.000000000', '2016-12-29T00:00:00.000000000',
       '2016-12-30T00:00:00.000000000'], dtype='datetime64[ns]')]

sort, csort

sort and csort can be used to sort the data in ascending ("asc") or descending ("desc") order. sort is equivalent to the "order by" SQL clause whereas csort is only used to sort the data in each group after contextby.

Use sort to sort query result:

>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.sort('date').toDF()
      TICKER       date       VOL      PRC        BID        ASK
0       AMZN 1997-05-15   6029815   23.500   23.50000   23.62500
1       AMZN 1997-05-16   1232226   20.750   20.50000   21.00000
2       AMZN 1997-05-19    512070   20.500   20.50000   20.62500
3       AMZN 1997-05-20    456357   19.625   19.62500   19.75000
4       AMZN 1997-05-21   1577414   17.125   17.12500   17.25000
...

Use csort with a contextby clause:

>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.contextby('ticker').csort('date desc').toDF()
      TICKER       date      VOL        PRC        BID        ASK
0       AMZN 2016-12-30  4139451  749.87000  750.02002  750.40002
1       AMZN 2016-12-29  3158299  765.15002  764.66998  765.15997
2       AMZN 2016-12-28  3301025  772.13000  771.92999  772.15997
3       AMZN 2016-12-27  2638725  771.40002  771.40002  771.76001
4       AMZN 2016-12-23  1981616  760.59003  760.33002  760.59003
...

You can also specify the sorting order by passing in arguments:

sort(by, ascending=True)
csort(by, ascending=True)

The ascending parameter indicates whether to sort data in ascending order. The default value is True. You can specify different sorting methods for multiple columns by passing in a list.

>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], True).limit(5).toDF()
   TICKER       date    VOL      PRC      BID     ASK
0    AMZN 1997-12-26  40721  54.2500  53.8750  54.625
1    AMZN 1997-08-12  47939  26.3750  26.3750  26.750
2    AMZN 1997-07-21  48325  26.1875  26.1250  26.250
3    AMZN 1997-08-13  49690  26.3750  26.0000  26.625
4    AMZN 1997-06-02  49764  18.1250  18.1250  18.375
5    NFLX 2002-09-05  20725  12.8500  12.8500  12.950
6    NFLX 2002-11-11  26824   8.4100   8.3000   8.400
7    NFLX 2002-09-04  27319  13.0000  12.8200  13.000
8    NFLX 2002-06-10  35421  16.1910  16.1900  16.300
9    NFLX 2002-09-06  54951  12.8000  12.7900  12.800
10   NVDA 1999-05-10  41250  17.5000  17.5000  17.750
11   NVDA 1999-05-07  52310  17.5000  17.3750  17.625
12   NVDA 1999-05-14  59807  18.0000  17.7500  18.000
13   NVDA 1999-04-01  63997  20.5000  20.1875  20.500
14   NVDA 1999-04-19  65940  19.0000  19.0000  19.125
>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], [True, False]).limit(5).toDF()
   TICKER       date        VOL       PRC     BID       ASK
0    AMZN 2007-04-25  104463043   56.8100   56.80   56.8100
1    AMZN 1999-09-29   80380734   80.7500   80.75   80.8125
2    AMZN 2006-07-26   76996899   26.2600   26.17   26.1800
3    AMZN 2007-04-26   62451660   62.7810   62.77   62.8300
4    AMZN 2005-02-03   60580703   35.7500   35.74   35.7300
5    NFLX 2015-07-16   63461015  115.8100  115.85  115.8600
6    NFLX 2015-08-24   59952448   96.8800   96.85   96.8800
7    NFLX 2016-04-19   55728765   94.3400   94.30   94.3100
8    NFLX 2016-07-19   55685209   85.8400   85.81   85.8300
9    NFLX 2016-01-20   53009419  107.7400  107.73  107.7800
10   NVDA 2011-01-06   87693472   19.3300   19.33   19.3400
11   NVDA 2011-02-17   87117555   25.6800   25.68   25.7000
12   NVDA 2011-01-12   86197484   23.3525   23.34   23.3600
13   NVDA 2011-08-12   80488616   12.8800   12.86   12.8700
14   NVDA 2003-05-09   77604776   21.3700   21.39   21.3700

top, limit

top is used to get the top N records of a table. For details, see the DolphinDB User Manual - top.

>>> trade = s.table("dfs://valuedb", "trade")
>>> trade.top(5).toDF()
  TICKER       date      VOL     PRC     BID     ASK
0   AMZN 1997-05-15  6029815  23.500  23.500  23.625
1   AMZN 1997-05-16  1232226  20.750  20.500  21.000
2   AMZN 1997-05-19   512070  20.500  20.500  20.625
3   AMZN 1997-05-20   456357  19.625  19.625  19.750
4   AMZN 1997-05-21  1577414  17.125  17.125  17.250

The limit clause is similar to the top clause with the following differences:

  • The top clause cannot use negative integers. When used with the context by clause, the limit clause can use a negative integer to select a limited number of records from the end of each group. In all other cases, the limit clause can only use non-negative integers.
  • The limit clause can select a limited number of rows starting from a specified row.

For details, see the DolphinDB User Manual - limit.

>>> trade.select("*").contextby('ticker').limit(-2).toDF()
  TICKER       date       VOL        PRC        BID        ASK
0   AMZN 2016-12-29   3158299  765.15002  764.66998  765.15997
1   AMZN 2016-12-30   4139451  749.87000  750.02002  750.40002
2   NFLX 2016-12-29   3444729  125.33000  125.31000  125.33000
3   NFLX 2016-12-30   4455012  123.80000  123.80000  123.83000
4   NVDA 2016-12-29  54384676  111.43000  111.26000  111.42000
5   NVDA 2016-12-30  30323259  106.74000  106.73000  106.75000
>>> trade.select("*").limit([2, 5]).toDF()
  TICKER       date      VOL     PRC     BID     ASK
0   AMZN 1997-05-19   512070  20.500  20.500  20.625
1   AMZN 1997-05-20   456357  19.625  19.625  19.750
2   AMZN 1997-05-21  1577414  17.125  17.125  17.250
3   AMZN 1997-05-22   983855  16.750  16.625  16.750
4   AMZN 1997-05-23  1330026  18.000  18.000  18.125

merge, merge_asof, merge_window, and merge_cross

Use merge for inner joins, left joins, left-semi joins, and outer joins. Use merge_asof for asof joins, and merge_window for window joins.

merge

Specify join columns with parameter on if join column names are identical in both tables; use parameters left_on and right_on when join column names are different. The optional parameter how indicates the table join type. The default table join mode is inner join.

  • When join column names are identical in both tables:
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
...     'TICKER': ['AMZN', 'AMZN', 'AMZN'], 
...     'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'), 
...     'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,on=["TICKER","date"]).toDF()
  TICKER       date      VOL        PRC        BID        ASK  open
0   AMZN 2015-12-29  5734996  693.96997  693.96997  694.20001   674
1   AMZN 2015-12-30  3519303  689.07001  689.07001  689.09998   685
2   AMZN 2015-12-31  3749860  675.89001  675.85999  675.94000   695
  • Specify parameters left_on and right_on when the join column names are different.
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
...     'TICKER': ['AMZN', 'AMZN', 'AMZN'], 
...     'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'), 
...     'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER as TICKER1, date(date) as date1, open")
>>> trade.merge(t1, left_on=["TICKER","date"], right_on=["TICKER1", "date1"]).toDF()
  TICKER       date      VOL        PRC        BID        ASK  open
0   AMZN 2015-12-29  5734996  693.96997  693.96997  694.20001   674
1   AMZN 2015-12-30  3519303  689.07001  689.07001  689.09998   685
2   AMZN 2015-12-31  3749860  675.89001  675.85999  675.94000   695
  • To conduct a left join, set how="left".
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
...     'TICKER': ['AMZN', 'AMZN', 'AMZN'], 
...     'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'), 
...     'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,how="left", on=["TICKER","date"]).where('TICKER=`AMZN').where('2015.12.23<=date<=2015.12.31').toDF()
  TICKER       date      VOL        PRC        BID        ASK   open
0   AMZN 2015-12-23  2722922  663.70001  663.48999  663.71002    NaN
1   AMZN 2015-12-24  1092980  662.78998  662.56000  662.79999    NaN
2   AMZN 2015-12-28  3783555  675.20001  675.00000  675.21002    NaN
3   AMZN 2015-12-29  5734996  693.96997  693.96997  694.20001  674.0
4   AMZN 2015-12-30  3519303  689.07001  689.07001  689.09998  685.0
5   AMZN 2015-12-31  3749860  675.89001  675.85999  675.94000  695.0
  • To conduct an outer join, set how="outer".
>>> t1 = s.table(data={'TICKER': ['AMZN', 'AMZN', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'open': [674, 685, 942]})
>>> t2 = s.table(data={'TICKER': ['AMZN', 'NFLX', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'close': [690, 936, 951]})
>>> t1.merge(t2, how="outer", on=["TICKER","date"]).toDF()
  TICKER        date   open tmp_TICKER    tmp_date  close
0   AMZN  2015.12.29  674.0       AMZN  2015.12.29  690.0
1   AMZN  2015.12.30  685.0                           NaN
2   NFLX  2015.12.31  942.0       NFLX  2015.12.31  951.0
3                       NaN       NFLX  2015.12.30  936.0

Note: A partitioned table can only be outer joined with a partitioned table, and an in-memory table can only be outer joined with an in-memory table.

merge_asof

The merge_asof method is equivalent to the keyword asof join (aj) in DolphinDB (see the DolphinDB User Manual). asof join is similar to left join with the following differences:

  • For asof join, the last matching column is usually of a temporal type. For a row in the left table with time t, if there is no left-join match in the right table, the row in the right table corresponding to the most recent time before t is used, provided all the other matching columns match; if more than one record in the right table matches, the last one is used.
  • If there is only 1 join column, asof join assumes the right table is sorted on the join column. If there are multiple join columns, asof join assumes the right table is sorted on the last join column within each group defined by the other join columns. The right table does not need to be sorted by the other join columns. If these conditions are not met, the results may be unexpected. The left table does not need to be sorted.

For the examples in this and the next section, we use trades.csv and quotes.csv which have AAPL and FB trades and quotes data on 10/24/2016 taken from the NYSE website.

>>> dbPath = "dfs://tickDB"
>>> if s.existsDatabase(dbPath):
...     s.dropDatabase(dbPath)
... 
>>> s.database(partitionType=keys.VALUE, partitions=["AAPL","FB"], dbPath=dbPath)
>>> trades = s.loadTextEx(dbPath, tableName='trades', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/trades.csv")
>>> quotes = s.loadTextEx(dbPath, tableName='quotes', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/quotes.csv")
>>> trades.top(5).toDF()
                        Time  Exchange  Symbol  Trade_Volume  Trade_Price
0 1970-01-01 08:00:00.022239        75    AAPL           300        27.00
1 1970-01-01 08:00:00.022287        75    AAPL           500        27.25
2 1970-01-01 08:00:00.022317        75    AAPL           335        27.26
3 1970-01-01 08:00:00.022341        75    AAPL           100        27.27
4 1970-01-01 08:00:00.022368        75    AAPL            31        27.40
>>> quotes.where("second(Time)>=09:29:59").top(5).toDF()
                         Time  Exchange  Symbol  Bid_Price  Bid_Size  Offer_Price  Offer_Size
0  1970-01-01 09:30:00.005868        90    AAPL      26.89         1        27.10           6
1  1970-01-01 09:30:00.011058        90    AAPL      26.89        11        27.10           6
2  1970-01-01 09:30:00.031523        90    AAPL      26.89        13        27.10           6
3  1970-01-01 09:30:00.284623        80    AAPL      26.89         8        26.98           8
4  1970-01-01 09:30:00.454066        80    AAPL      26.89         8        26.98           1
>>> trades.merge_asof(quotes,on=["Symbol","Time"]).select(["Symbol","Time","Trade_Volume","Trade_Price","Bid_Price", "Bid_Size","Offer_Price", "Offer_Size"]).top(5).toDF()
  Symbol                        Time          Trade_Volume  Trade_Price  Bid_Price  Bid_Size  \
0   AAPL  1970-01-01 08:00:00.022239                   300        27.00       26.9         1
1   AAPL  1970-01-01 08:00:00.022287                   500        27.25       26.9         1
2   AAPL  1970-01-01 08:00:00.022317                   335        27.26       26.9         1
3   AAPL  1970-01-01 08:00:00.022341                   100        27.27       26.9         1
4   AAPL  1970-01-01 08:00:00.022368                    31        27.40       26.9         1

  Offer_Price   Offer_Size
0       27.49           10
1       27.49           10
2       27.49           10
3       27.49           10
4       27.49           10

merge_window

merge_window is equivalent to the keyword window join (wj) in DolphinDB (see the DolphinDB User Manual). It is a generalization of asof join. With a window defined by the parameters leftBound (w1) and rightBound (w2), for each row in the left table whose last join column equals t, it finds the rows in the right table whose last join column falls between (t+w1) and (t+w2), provided all other join columns match, and then applies aggFunctions to the selected rows of the right table.

The only difference between window join and prevailing window join is that if the right table contains no matching value for t+w1 (the left boundary of the window), prevailing window join fills it with the last value before t+w1 (provided all other join columns match) before applying aggFunctions. To use prevailing window join, set prevailing = True.

>>> trades.merge_window(quotes, -5000000000, 0, aggFunctions=["avg(Bid_Price)","avg(Offer_Price)"], on=["Symbol","Time"]).where("Time>=07:59:59").top(10).toDF()
 Time                          Exchange Symbol  Trade_Volume  Trade_Price  avg_Bid_Price  avg_Offer_Price
0 1970-01-01 08:00:00.022239        75   AAPL           300        27.00          26.90            27.49
1 1970-01-01 08:00:00.022287        75   AAPL           500        27.25          26.90            27.49
2 1970-01-01 08:00:00.022317        75   AAPL           335        27.26          26.90            27.49
3 1970-01-01 08:00:00.022341        75   AAPL           100        27.27          26.90            27.49
4 1970-01-01 08:00:00.022368        75   AAPL            31        27.40          26.90            27.49
5 1970-01-01 08:00:02.668076        68   AAPL          2434        27.42          26.75            27.36
6 1970-01-01 08:02:20.116025        68   AAPL            66        27.00            NaN              NaN
7 1970-01-01 08:06:31.149930        75   AAPL           100        27.25            NaN              NaN
8 1970-01-01 08:06:32.826399        75   AAPL           100        27.25            NaN              NaN
9 1970-01-01 08:06:33.168833        75   AAPL            74        27.25            NaN              NaN

[10 rows x 6 columns]
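
The following sketch applies prevailing window join to the same query by setting prevailing = True (assuming the trades and quotes tables from above):

pwj = trades.merge_window(
    quotes, -5000000000, 0,
    aggFunctions=["avg(Bid_Price)", "avg(Offer_Price)"],
    on=["Symbol", "Time"],
    prevailing=True,
)
print(pwj.where("Time>=07:59:59").top(10).toDF())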

merge_cross

merge_cross is equivalent to the keyword cross join (cj) in DolphinDB (see the DolphinDB User Manual). It returns the Cartesian product of two tables. If the left table has n rows and the right table has m rows, then cross join returns n*m rows.

>>> s.run("""
...     t1 = table(2010 2011 2012 as year);
...     t2 = table(`IBM`C`AAPL as Ticker);
... """)
...
>>> t1 = s.table(data="t1")
>>> t2 = s.table(data="t2")
>>> t1.merge_cross(t2).toDF()
   year Ticker
0  2010    IBM
1  2010      C
2  2010   AAPL
3  2011    IBM
4  2011      C
5  2011   AAPL
6  2012    IBM
7  2012      C
8  2012   AAPL

rename

Use rename to set a new name for a table.

>>> t.tableName()
t
>>> t.rename("xx")
>>> t.tableName()
xx

Note: Renaming a temporary table can potentially cause memory leaks. This happens because renaming a table prevents it from being automatically destroyed, leading to memory occupied by the table not being freed in a timely manner.
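
To avoid this, explicitly release a renamed temporary table when it is no longer needed, using any of the release options described in "Life Cycle of Uploaded Tables", for example:

s.undef("xx", "VAR")  # or: s.run("xx = NULL")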

drop

Use drop to delete columns from a table.

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.colNames
['TICKER', 'date', 'VOL', 'PRC', 'BID', 'ASK']
>>> t1 = trade.drop(['ask', 'bid'])
>>> t1.colNames
['TICKER', 'date', 'VOL', 'PRC']

exec

The select clause always generates a table, even when only one column is selected. To generate a scalar or vector, use the exec clause.

If only one column is selected, exec generates a DolphinDB vector. Downloading the object with toDF() in Python yields an np.ndarray object:

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.exec("ticker").toDF()
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']

If multiple columns are selected, the exec clause generates a DolphinDB table, just like select. Downloading the object with toDF() in Python yields a pd.DataFrame object.

>>> trade.exec(["ticker", "date", "bid"]).toDF()
      ticker       date        bid
0       AMZN 1997-05-15   23.50000
1       AMZN 1997-05-16   20.50000
2       AMZN 1997-05-19   20.50000
3       AMZN 1997-05-20   19.62500
4       AMZN 1997-05-21   17.12500
...

where

The where clause is used to extract only the records that satisfy the specified condition or conditions.

(1) Multiple where Conditions

When multiple filter conditions are specified with the where clause, the conditions are connected by "and" by default.

Note: In Python API versions earlier than 1.30.21.2, to ensure that the priority of operations is interpreted correctly when adding multiple "where" conditions, an additional pair of parentheses should be included for each condition. For instance, the SQL query select * from table where (a = false) and (b = true or c = true) should be represented as t.where("(a = false)").where("(b = true or c = true)").

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000')
>>> t1.toDF()
          date        bid        ask        prc       vol
0   1998-09-01   79.93750   80.25000   79.95313  11321844
1   1998-11-17  148.68750  149.00000  148.50000  10279448
2   1998-11-20  179.62500  179.75000  180.62500  11314228
3   1998-11-23  217.75000  217.81250  218.00000  11559042
4   1998-11-24  214.25000  214.62500  214.50000  13820992
...
>>> t1.rows
765
>>> t1.showSQL()
select date,bid,ask,prc,vol from TMP_TBL_2744917d where (TICKER=`AMZN) and (bid!=NULL) and (ask!=NULL) and (vol>10000000)

(2) String as Input

Multiple conditions can also be passed to the where method as a single comma-separated string.

>>> t1 = trade.select("ticker, date, vol").where("bid!=NULL, ask!=NULL, vol>50000000")
>>> t1.toDF()
  ticker       date        vol
0    AMZN 1999-09-29   80380734
1    AMZN 2000-06-23   52221978
2    AMZN 2001-11-26   51543686
3    AMZN 2002-01-22   57235489
4    AMZN 2005-02-03   60580703
...
>>> t1.rows
41

execute

(1) For the Table class, use execute to retrieve the specified column(s), which is equivalent to using exec. Pass a string or a list of strings to execute to indicate the column(s) to select.

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.execute("TICKER")
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']
>>> trade.execute(["VOL", "PRC"])
           VOL        PRC
0      6029815   23.50000
1      1232226   20.75000
2       512070   20.50000
3       456357   19.62500
4      1577414   17.12500
...        ...        ...
13131  2010120  125.59000
13132  5287520  128.35001
13133  4388956  125.89000
13134  3444729  125.33000
13135  4455012  123.80000

[13136 rows x 2 columns]

(2) After applying update or delete to a table object, call execute to synchronize the operations to the server. In this case, execute submits the update/delete statement to the server and returns a Table object.

executeAs

executeAs saves the query result as a DolphinDB table. The table name is specified by the parameter newTableName.

Note: Tables created by this method are independent of the Python Table objects; their lifespan is tied to the lifespan of the session object.

>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000').executeAs("AMZN")
>>> t1.tableName()
AMZN
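
Because the result is saved on the server under the name "AMZN", a handle to it can be re-obtained later in the same session:

>>> t2 = s.table(data="AMZN")
>>> t2.rows
765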

showSQL

Use showSQL to convert the SQL call to a standard DolphinDB SQL statement.

>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).showSQL()
select ticker,date from TMP_TBL_fb11c541

toDF, toList

toDF and toList both execute the SQL statement cached in Python (which can be obtained with showSQL) and return the execution result. The differences are:

  • toDF has the same behavior as session.run(sql), whereas toList has the same behavior as session.run(sql, pickleTableToList=True).
  • When a session is constructed, if protocol is set to PROTOCOL_PICKLE or PROTOCOL_DDB, toDF returns a pd.DataFrame, and toList returns a list of np.ndarrays where each np.ndarray represents a table column.

For details, see PROTOCOL_DDB and PROTOCOL_PICKLE.
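
The following minimal sketch contrasts the two return forms, assuming a session constructed with PROTOCOL_DDB (host, port, and credentials are placeholders):

import dolphindb as ddb
import dolphindb.settings as keys

s2 = ddb.session(protocol=keys.PROTOCOL_DDB)
s2.connect("localhost", 8848, "admin", "123456")  # placeholder host/port/credentials
t = s2.table(data={'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})
print(t.select("a, b").toDF())    # returns a pd.DataFrame
print(t.select("a, b").toList())  # returns a list of np.ndarrays, one per column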