Table
The DolphinDB Python API provides native methods to create and use DolphinDB databases and tables. This section describes how to create and query tables.
Constructing Table Object with Session.table
The Table class wraps the DolphinDB table object handle, providing an interface to interact with DolphinDB tables from Python. Table objects are usually constructed using the Session.table or Session.loadTable method; they can also be obtained through Session.loadText, Session.loadTextEx, etc.
Syntax of Session.table:
Session.table(dbPath=None, data=None, tableAliasName=None, inMem=False, partitions=None)
- dbPath: database path. Leave this parameter unspecified when creating an in-memory table or a stream table.
- data: table data. It can be a dict, pandas.DataFrame, or a DolphinDB table name.
- tableAliasName: table alias
- inMem: whether to load table data into the DolphinDB server memory.
- partitions: partitions to be loaded into the server memory.
data: Uploading Data as a Temporary Table
When the data parameter is a dictionary or a Pandas DataFrame, the data is uploaded to DolphinDB server as a temporary table. In this case, dbPath, inMem, and partitions do not need to be specified.
Example:
data1 = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t1 = s.table(data=data1)
print(t1, t1.tableName())
data2 = {
'a': ['a', 'b', 'c'],
'b': [1, 2, 3],
}
t2 = s.table(data=data2)
print(t2, t2.tableName())
Output:
<dolphindb.table.Table object at 0x7fbd5f02bd60> TMP_TBL_3cc57246
<dolphindb.table.Table object at 0x7fbd3205fc70> TMP_TBL_dbae4978
"data1" and "data2" are uploaded to the server as temporary tables "TMP_TBL_3cc57246" and "TMP_TBL_dbae4978", respectively.
data: Obtaining Handle to a DolphinDB Table
When the data parameter is a string indicating a DolphinDB table name, Session.table obtains a handle to the table.
(1) When dbPath and data are both specified, Session.table loads the table with the name specified by data from the database specified by dbPath.
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createDimensionTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt")
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7f5036bcd040> pt_TMP_TBL_5229a3cc
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
(2) When only data is specified, Session.table obtains a handle to the in-memory table with the name specified by data.
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7f11ffb3c070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
tableAliasName
If tableAliasName is specified, Session.table uses it as the name of the table handle instead of assigning a random string.
(1) Specify tableAliasName when uploading local data to server
data1 = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t1 = s.table(data=data1, tableAliasName="data1")
print(t1, t1.tableName())
data2 = {
'a': ['a', 'b', 'c'],
'b': [1, 2, 3],
}
t2 = s.table(data=data2, tableAliasName="data2")
print(t2, t2.tableName())
Output:
<dolphindb.table.Table object at 0x7f167ecb69d0> data1
<dolphindb.table.Table object at 0x7f1651d0bc40> data2
(2) Specify tableAliasName when loading table from database to obtain its handle
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createDimensionTable(schema_t, "pt", ["csymbol"])
pt = s.table(dbPath=dbPath, data="pt", tableAliasName="tmp_pt")
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7f3350edc040> tmp_pt
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
(3) tableAliasName does not take effect when obtaining the handle to a DolphinDB in-memory table:
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.table(data="test_t", tableAliasName="test_t2")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7f9fb55b4070> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
tableAliasName can be used to specify an alias when loading a partitioned table or uploading local data as a DolphinDB table, but it does not take effect when obtaining the handle to an existing in-memory table by its name. Specifying an alias helps to avoid working with a randomly generated temporary table name.
inMem, partitions
These two parameters only take effect for local databases on disk. For details, see the DolphinDB User Manual - loadTable.
Session.loadTable
Session.loadTable returns a Table object. It is similar to Session.table. The difference is that Session.loadTable does not upload local data and can only be used to obtain handles to tables on the DolphinDB server.
Session.loadTable(tableName, dbPath=None, partitions=None, memoryMode=False)
- tableName: the name of an in-memory table or a database table
- dbPath: database path
- partitions: partitions to be loaded into the server memory
- memoryMode: whether to load table data into the server memory
Loading an In-Memory Table
Example:
s.run("test_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
t = s.loadTable("test_t")
print(t, t.tableName())
print(t.toDF())
Output:
<dolphindb.table.Table object at 0x7fd1c90a4c10> test_t
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
Loading a Database Table
Example:
dbPath = "dfs://testTable"
if s.existsDatabase(dbPath):
s.dropDatabase(dbPath)
db = s.database(partitionType=keys.VALUE, partitions=[1, 2, 3], dbPath=dbPath, engine="TSDB")
s.run("schema_t = table(100:0, `ctime`csymbol`price`qty, [TIMESTAMP, SYMBOL, DOUBLE, INT])")
schema_t = s.table(data="schema_t")
db.createDimensionTable(schema_t, "pt", ["csymbol"])
pt = s.loadTable("pt", dbPath=dbPath)
print(pt, pt.tableName())
print(pt.toDF())
Output:
<dolphindb.table.Table object at 0x7fdaf7885eb0> pt_TMP_TBL_0dfdc80a
Empty DataFrame
Columns: [ctime, csymbol, price, qty]
Index: []
Life Cycle of Uploaded Tables
The table and loadTable methods both return a Python local variable. When local data is uploaded to the DolphinDB server without specifying an alias, a random name will be assigned as the handle to the uploaded variable.
The following example uploads local object "data" to the DolphinDB server as a table. "t" represents the local variable of the uploaded table. For the name of the uploaded table on the server, call Table.tableName.
data = pd.DataFrame({
'a': [1, 2, 3],
'b': [4, 5, 6],
})
t = s.table(data=data)
print(t.tableName())
Output:
TMP_TBL_e03723c9
The prefix "TMP_TBL_" indicates that it is a handle to a temporary table. Once the Python Table object is destructed, the corresponding temporary table on the server will be destructed as well.
There are 3 options to release the table variable on the DolphinDB server:
- Use undef:
s.undef(t.tableName(), "VAR")
- Assign the NULL value to the variable on the DolphinDB server:
s.run(f"{t.tableName()}=NULL")
- Destruct the local variable in Python to remove its reference to the server variable:
del t
Note:
If an alias is specified when obtaining a handle, or the obtained handle already exists before the call, the destruction of the Python object will not affect the corresponding data on server.
If the data is used only once on DolphinDB server, it is recommended to include it as a parameter in a function call instead of uploading it as a table variable. A function call does not cache data. After the function call is executed, all variables are released. Additionally, a function call is faster to execute as the network transmission only takes place once.
Operating on Tables
The DolphinDB Python API provides a variety of methods for operating on DolphinDB tables from Python, including querying data with conditions, updating data, and deleting data.
Note: The objects returned by the update and delete methods are not Table objects, but TableUpdate and TableDelete objects. Although this step is transparent to users, after calling update and delete, you must call the execute method to synchronize the table updates to the server. For details, see the description of these methods in the following documentation.
This section uses the example.csv file in examples.
Obtaining Table Properties
Use rows and cols to get the number of table rows and columns.
>>> s.run("t = table(1..5 as a, 2..6 as b)")
>>> t = s.table(data="t")
>>> t.rows
5
>>> t.cols
2
Use colNames to get the column names of the table.
>>> t.colNames
['a', 'b']
schema returns a Pandas DataFrame indicating the schema of the table. It has the same columns as the output of the server function schema().colDefs.
>>> t.schema
name typeString typeInt extra comment
0 a INT 4 NaN
1 b INT 4 NaN
select
Similar to the SQL "select" keyword, the select method selects columns from a table. The following examples use the toDF method to convert the table to a Pandas DataFrame object.
(1) List of column names as input
>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).toDF()
ticker date
0 AMZN 1997-05-15
1 AMZN 1997-05-16
2 AMZN 1997-05-19
3 AMZN 1997-05-20
4 AMZN 1997-05-21
...
(2) String as input
>>> trade.select("ticker, date, bid").toDF()
ticker date bid
0 AMZN 1997-05-15 23.50000
1 AMZN 1997-05-16 20.50000
2 AMZN 1997-05-19 20.50000
3 AMZN 1997-05-20 19.62500
4 AMZN 1997-05-21 17.12500
...
update
Use update to update a table. The cols parameter indicates the columns to update and the vals parameter indicates the new values.
As shown in the following examples, execute() must be called for the updates to be synchronized to the server.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"])
>>> t2 = trade.where("ticker=`AMZN").where("date=2015.12.16")
>>> t2.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2015-12-16 3964470 675.77002 675.76001 675.83002
>>> t1 = trade.update(["VOL"],["999999"]).where("TICKER=`AMZN").where(["date=2015.12.16"]).execute()
>>> t2.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2015-12-16 999999 675.77002 675.76001 675.83002
delete
Use delete to delete records from a table. To synchronize the changes to the server, execute must be called.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01')
>>> trade.rows
13136
>>> t = trade.delete().where('date<2013.01.01').execute()
>>> trade.rows
3024
groupby
groupby groups records based on the specified rules. It is equivalent to the DolphinDB server function groupby.
groupby accepts a list of strings or a single string as its argument, indicating the column(s) to group by. If the SQL statement also includes a having condition, the condition must contain an aggregate function; a record is generated for each group satisfying the having condition.
Note: groupby must be followed by an aggregate function such as count, sum, avg, std, etc.
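As a rough local analogy (pandas here, not the DolphinDB API; the dataset and column names are illustrative only), an aggregate over groups plus a having-style filter can be sketched as:

```python
import pandas as pd

# Hypothetical mini dataset, not taken from example.csv.
df = pd.DataFrame({
    "ticker": ["AMZN", "AMZN", "NFLX", "NFLX", "NVDA"],
    "vol":    [100, 200, 300, 400, 500],
    "prc":    [10.0, 11.0, 20.0, 21.0, 30.0],
})

# Analogous to: trade.select(['sum(vol)', 'sum(prc)']).groupby(['ticker'])
sums = df.groupby("ticker", as_index=False).agg(
    sum_vol=("vol", "sum"), sum_prc=("prc", "sum"))

# Analogous to a having condition such as having('sum(vol)>400'):
# pandas expresses "having" as a filter on the aggregated result.
filtered = sums[sums["sum_vol"] > 400]
```

The sum_vol/sum_prc names mirror the auto-generated column names seen in the DolphinDB output below.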
Prepare the data:
>>> dbPath = "dfs://valuedb"
>>> if s.existsDatabase(dbPath):
... s.dropDatabase(dbPath)
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=["AMZN","NFLX","NVDA"], dbPath=dbPath)
>>> trade = s.loadTextEx(dbPath=dbPath, partitionColumns=["TICKER"], tableName='trade', remoteFilePath=WORK_DIR+"/example.csv")
Calculate the sum of column "vol" and the sum of column "prc" in each "ticker" group:
>>> trade.select(['sum(vol)','sum(prc)']).groupby(['ticker']).toDF()
ticker sum_vol sum_prc
0 AMZN 33706396492 772503.81377
1 NFLX 14928048887 421568.81674
2 NVDA 46879603806 127139.51092
Use groupby and having in the same query:
>>> trade.select('count(ask)').groupby(['vol']).having('count(ask)>1').toDF()
vol count_ask
0 579392 2
1 3683504 2
2 5732076 2
3 6299736 2
4 6438038 2
5 6946976 2
6 8160197 2
7 8924303 2
contextby
contextby is a unique feature in DolphinDB. It is an extension to standard SQL for convenient time-series data manipulation.
contextby is similar to groupby except that for each group, groupby returns a scalar whereas contextby returns a vector of the same size as the number of rows in the group. For details, see the DolphinDB User Manual - context by.
If having follows contextby and is used only with aggregate functions, the result contains all rows of the groups whose aggregate values satisfy the specified conditions. If having follows contextby and is used with a non-aggregate function, the result contains the individual rows that satisfy the specified conditions.
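The difference between the two can be sketched locally with pandas (an analogy only, not the DolphinDB API): groupby aggregates to one value per group, while contextby broadcasts the aggregate back to every row, which pandas spells transform.

```python
import pandas as pd

# Hypothetical mini dataset for illustration.
df = pd.DataFrame({
    "ticker": ["AMZN", "AMZN", "NFLX"],
    "vol":    [100, 200, 300],
})

# groupby-style: one scalar per group.
per_group = df.groupby("ticker")["vol"].sum()

# contextby-style: a vector as long as each group, aligned with the rows.
df["sum_vol"] = df.groupby("ticker")["vol"].transform("sum")
```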
>>> trade.contextby('ticker').top(3).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.5000 23.5000 23.6250
1 AMZN 1997-05-16 1232226 20.7500 20.5000 21.0000
2 AMZN 1997-05-19 512070 20.5000 20.5000 20.6250
3 NFLX 2002-05-23 7507079 16.7500 16.7500 16.8500
4 NFLX 2002-05-24 797783 16.9400 16.9400 16.9500
5 NFLX 2002-05-28 474866 16.2000 16.2000 16.3700
6 NVDA 1999-01-22 5702636 19.6875 19.6250 19.6875
7 NVDA 1999-01-25 1074571 21.7500 21.7500 21.8750
8 NVDA 1999-01-26 719199 20.0625 20.0625 20.1250
>>> trade.select("TICKER, month(date) as month, cumsum(VOL)").contextby("TICKER,month(date)").toDF()
TICKER month cumsum_VOL
0 AMZN 1997-05-01 6029815
1 AMZN 1997-05-01 7262041
2 AMZN 1997-05-01 7774111
3 AMZN 1997-05-01 8230468
4 AMZN 1997-05-01 9807882
...
Use contextby and having in the same query:
>>> trade.contextby('ticker').having("sum(VOL)>40000000000").toDF()
TICKER date VOL PRC BID ASK
0 NVDA 1999-01-22 5702636 19.6875 19.6250 19.6875
1 NVDA 1999-01-25 1074571 21.7500 21.7500 21.8750
2 NVDA 1999-01-26 719199 20.0625 20.0625 20.1250
3 NVDA 1999-01-27 510637 20.0000 19.8750 20.0000
4 NVDA 1999-01-28 476094 19.9375 19.8750 20.0000
...
pivotby
pivotby is a unique feature in DolphinDB. It is an extension to standard SQL for convenient time-series data manipulation. It rearranges a column (or multiple columns) of a table (with or without a data transformation function) on two dimensions. For details, see the DolphinDB User Manual - pivot by.
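The rearrangement can be sketched locally with pandas pivot (an analogy only, not the DolphinDB API; the data is hypothetical):

```python
import pandas as pd

# Hypothetical long-format data: one measure ("vol") over two dimensions.
df = pd.DataFrame({
    "ticker": ["AMZN", "AMZN", "NFLX", "NFLX"],
    "date":   ["2016-12-29", "2016-12-30", "2016-12-29", "2016-12-30"],
    "vol":    [10, 20, 30, 40],
})

# Analogous to: trade.select("VOL").pivotby("TICKER", "date")
# One row per ticker, one column per date.
wide = df.pivot(index="ticker", columns="date", values="vol")
```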
When used with select, pivotby returns a table.
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = trade.select("VOL").pivotby("TICKER", "date")
>>> t1.toDF()
TICKER 1997.05.15 1997.05.16 ... 2016.12.28 2016.12.29 2016.12.30
0 AMZN 6029815.0 1232226.0 ... 3301025 3158299 4139451
1 NFLX NaN NaN ... 4388956 3444729 4455012
2 NVDA NaN NaN ... 57384116 54384676 30323259
When used with exec, pivotby returns a DolphinDB matrix.
>>> trade.exec("VOL").pivotby("TICKER", "date").toDF()
[array([[ 6029815., 1232226., 512070., ..., 3301025., 3158299.,
4139451.],
[ nan, nan, nan, ..., 4388956., 3444729.,
4455012.],
[ nan, nan, nan, ..., 57384116., 54384676.,
30323259.]]), array(['AMZN', 'NFLX', 'NVDA'], dtype=object), array(['1997-05-15T00:00:00.000000000', '1997-05-16T00:00:00.000000000',
'1997-05-19T00:00:00.000000000', ...,
'2016-12-28T00:00:00.000000000', '2016-12-29T00:00:00.000000000',
'2016-12-30T00:00:00.000000000'], dtype='datetime64[ns]')]
sort, csort
sort and csort can be used to sort the data in ascending ("asc") or descending ("desc") order. sort is equivalent to the "order by" SQL clause whereas csort is only used to sort the data in each group after contextby.
Use sort to sort query result:
>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.sort('date').toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.500 23.50000 23.62500
1 AMZN 1997-05-16 1232226 20.750 20.50000 21.00000
2 AMZN 1997-05-19 512070 20.500 20.50000 20.62500
3 AMZN 1997-05-20 456357 19.625 19.62500 19.75000
4 AMZN 1997-05-21 1577414 17.125 17.12500 17.25000
...
Use csort with a contextby clause:
>>> trade = s.loadTable("trade", "dfs://valuedb")
>>> trade.contextby('ticker').csort('date desc').toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2016-12-30 4139451 749.87000 750.02002 750.40002
1 AMZN 2016-12-29 3158299 765.15002 764.66998 765.15997
2 AMZN 2016-12-28 3301025 772.13000 771.92999 772.15997
3 AMZN 2016-12-27 2638725 771.40002 771.40002 771.76001
4 AMZN 2016-12-23 1981616 760.59003 760.33002 760.59003
...
You can also specify the sorting order by passing in arguments:
sort(by, ascending=True)
csort(by, ascending=True)
The ascending parameter indicates whether to sort data in ascending order. The default value is True. You can specify different sorting methods for multiple columns by passing in a list.
>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], True).limit(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-12-26 40721 54.2500 53.8750 54.625
1 AMZN 1997-08-12 47939 26.3750 26.3750 26.750
2 AMZN 1997-07-21 48325 26.1875 26.1250 26.250
3 AMZN 1997-08-13 49690 26.3750 26.0000 26.625
4 AMZN 1997-06-02 49764 18.1250 18.1250 18.375
5 NFLX 2002-09-05 20725 12.8500 12.8500 12.950
6 NFLX 2002-11-11 26824 8.4100 8.3000 8.400
7 NFLX 2002-09-04 27319 13.0000 12.8200 13.000
8 NFLX 2002-06-10 35421 16.1910 16.1900 16.300
9 NFLX 2002-09-06 54951 12.8000 12.7900 12.800
10 NVDA 1999-05-10 41250 17.5000 17.5000 17.750
11 NVDA 1999-05-07 52310 17.5000 17.3750 17.625
12 NVDA 1999-05-14 59807 18.0000 17.7500 18.000
13 NVDA 1999-04-01 63997 20.5000 20.1875 20.500
14 NVDA 1999-04-19 65940 19.0000 19.0000 19.125
>>> trade.select("*").contextby('ticker').csort(["TICKER", "VOL"], [True, False]).limit(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2007-04-25 104463043 56.8100 56.80 56.8100
1 AMZN 1999-09-29 80380734 80.7500 80.75 80.8125
2 AMZN 2006-07-26 76996899 26.2600 26.17 26.1800
3 AMZN 2007-04-26 62451660 62.7810 62.77 62.8300
4 AMZN 2005-02-03 60580703 35.7500 35.74 35.7300
5 NFLX 2015-07-16 63461015 115.8100 115.85 115.8600
6 NFLX 2015-08-24 59952448 96.8800 96.85 96.8800
7 NFLX 2016-04-19 55728765 94.3400 94.30 94.3100
8 NFLX 2016-07-19 55685209 85.8400 85.81 85.8300
9 NFLX 2016-01-20 53009419 107.7400 107.73 107.7800
10 NVDA 2011-01-06 87693472 19.3300 19.33 19.3400
11 NVDA 2011-02-17 87117555 25.6800 25.68 25.7000
12 NVDA 2011-01-12 86197484 23.3525 23.34 23.3600
13 NVDA 2011-08-12 80488616 12.8800 12.86 12.8700
14 NVDA 2003-05-09 77604776 21.3700 21.39 21.3700
top, limit
top is used to get the top N number of records in a table. For details, see the DolphinDB User Manual - top.
>>> trade = s.table("dfs://valuedb", "trade")
>>> trade.top(5).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.500 23.500 23.625
1 AMZN 1997-05-16 1232226 20.750 20.500 21.000
2 AMZN 1997-05-19 512070 20.500 20.500 20.625
3 AMZN 1997-05-20 456357 19.625 19.625 19.750
4 AMZN 1997-05-21 1577414 17.125 17.125 17.250
The limit clause is similar to the top clause with the following differences:
- The top clause cannot use negative integers. When used with the context by clause, the limit clause can use a negative integer to select a limited number of records from the end of each group. In all other cases, the limit clause can only use non-negative integers.
- The limit clause can select a limited number of rows starting from a specified row.
For details, see the DolphinDB User Manual - limit.
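A pandas analogy (not the DolphinDB API; the dataset is hypothetical) for the negative-limit-within-groups behavior described above:

```python
import pandas as pd

# Hypothetical mini dataset.
df = pd.DataFrame({
    "ticker": ["AMZN", "AMZN", "AMZN", "NFLX", "NFLX"],
    "vol":    [1, 2, 3, 4, 5],
})

# top(2) / limit(2): the first rows of the whole table.
head2 = df.head(2)

# contextby('ticker').limit(-2): the last 2 rows of each group.
tail2 = df.groupby("ticker").tail(2)
```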
>>> trade.select("*").contextby('ticker').limit(-2).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 2016-12-29 3158299 765.15002 764.66998 765.15997
1 AMZN 2016-12-30 4139451 749.87000 750.02002 750.40002
2 NFLX 2016-12-29 3444729 125.33000 125.31000 125.33000
3 NFLX 2016-12-30 4455012 123.80000 123.80000 123.83000
4 NVDA 2016-12-29 54384676 111.43000 111.26000 111.42000
5 NVDA 2016-12-30 30323259 106.74000 106.73000 106.75000
>>> trade.select("*").limit([2, 5]).toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-19 512070 20.500 20.500 20.625
1 AMZN 1997-05-20 456357 19.625 19.625 19.750
2 AMZN 1997-05-21 1577414 17.125 17.125 17.250
3 AMZN 1997-05-22 983855 16.750 16.625 16.750
4 AMZN 1997-05-23 1330026 18.000 18.000 18.125
merge, merge_asof, merge_window, and merge_cross
Use merge for inner joins, left joins, left-semi joins, and outer joins. Use merge_asof for asof joins, and merge_window for window joins.
merge
Specify join columns with parameter on if join column names are identical in both tables; use parameters left_on and right_on when join column names are different. The optional parameter how indicates the table join type. The default table join mode is inner join.
- When join column names are identical in both tables:
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,on=["TICKER","date"]).toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674
1 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685
2 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695
- Specify parameters left_on and right_on when the join column names are different.
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER as TICKER1, date(date) as date1, open")
>>> trade.merge(t1, left_on=["TICKER","date"], right_on=["TICKER1", "date1"]).toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674
1 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685
2 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695
- To conduct left join, set how = "left".
>>> trade = s.table("dfs://valuedb", "trade")
>>> t1 = s.table(data={
... 'TICKER': ['AMZN', 'AMZN', 'AMZN'],
... 'date': np.array(['2015-12-31', '2015-12-30', '2015-12-29'], dtype='datetime64[D]'),
... 'open': [695, 685, 674],
... })
...
>>> t1 = t1.select("TICKER, date(date) as date, open")
>>> trade.merge(t1,how="left", on=["TICKER","date"]).where('TICKER=`AMZN').where('2015.12.23<=date<=2015.12.31').toDF()
TICKER date VOL PRC BID ASK open
0 AMZN 2015-12-23 2722922 663.70001 663.48999 663.71002 NaN
1 AMZN 2015-12-24 1092980 662.78998 662.56000 662.79999 NaN
2 AMZN 2015-12-28 3783555 675.20001 675.00000 675.21002 NaN
3 AMZN 2015-12-29 5734996 693.96997 693.96997 694.20001 674.0
4 AMZN 2015-12-30 3519303 689.07001 689.07001 689.09998 685.0
5 AMZN 2015-12-31 3749860 675.89001 675.85999 675.94000 695.0
- To conduct outer join, set how = "outer".
>>> t1 = s.table(data={'TICKER': ['AMZN', 'AMZN', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'open': [674, 685, 942]})
>>> t2 = s.table(data={'TICKER': ['AMZN', 'NFLX', 'NFLX'], 'date': ['2015.12.29', '2015.12.30', '2015.12.31'], 'close': [690, 936, 951]})
>>> t1.merge(t2, how="outer", on=["TICKER","date"]).toDF()
TICKER date open tmp_TICKER tmp_date close
0 AMZN 2015.12.29 674.0 AMZN 2015.12.29 690.0
1 AMZN 2015.12.30 685.0 NaN
2 NFLX 2015.12.31 942.0 NFLX 2015.12.31 951.0
3 NaN NFLX 2015.12.30 936.0
Note: A partitioned table can only be outer joined with a partitioned table, and an in-memory table can only be outer joined with an in-memory table.
merge_asof
The merge_asof method is equivalent to the keyword asof join (aj) in DolphinDB (see the DolphinDB User Manual). asof join is similar to left join with the following differences:
- For asof join, the data type of the last matching column is usually temporal. For a row in the left table with time t, if the right table has no exact match as left join would require, the right-table row corresponding to the most recent time before t is used, provided all other matching columns match; if there are multiple such records in the right table, the last one is used.
- If there is only 1 join column, asof join assumes the right table is sorted on the join column. If there are multiple join columns, asof join assumes the right table is sorted on the last join column within each group defined by the other join columns. The right table does not need to be sorted by the other join columns. If these conditions are not met, we may see unexpected results. The left table does not need to be sorted.
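This matching rule can be mirrored locally with pandas.merge_asof (an analogy, not the DolphinDB API; the data is hypothetical): each left row picks the most recent right row at or before its time.

```python
import pandas as pd

# Hypothetical trades and quotes; both sorted on "time", as asof join
# assumes for the right table.
trades = pd.DataFrame({"time": pd.to_datetime(["09:30:01", "09:30:05"]),
                       "price": [27.00, 27.30]})
quotes = pd.DataFrame({"time": pd.to_datetime(["09:30:00", "09:30:04"]),
                       "bid": [26.9, 27.1]})

# Each trade is joined with the latest quote at or before its timestamp.
joined = pd.merge_asof(trades, quotes, on="time")
```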
For the examples in this and the next section, we use trades.csv and quotes.csv which have AAPL and FB trades and quotes data on 10/24/2016 taken from the NYSE website.
>>> dbPath = "dfs://tickDB"
>>> if s.existsDatabase(dbPath):
... s.dropDatabase(dbPath)
...
>>> s.database(partitionType=keys.VALUE, partitions=["AAPL","FB"], dbPath=dbPath)
>>> trades = s.loadTextEx(dbPath, tableName='trades', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/trades.csv")
>>> quotes = s.loadTextEx(dbPath, tableName='quotes', partitionColumns=["Symbol"], remoteFilePath=WORK_DIR+"/quotes.csv")
>>> trades.top(5).toDF()
Time Exchange Symbol Trade_Volume Trade_Price
0 1970-01-01 08:00:00.022239 75 AAPL 300 27.00
1 1970-01-01 08:00:00.022287 75 AAPL 500 27.25
2 1970-01-01 08:00:00.022317 75 AAPL 335 27.26
3 1970-01-01 08:00:00.022341 75 AAPL 100 27.27
4 1970-01-01 08:00:00.022368 75 AAPL 31 27.40
>>> quotes.where("second(Time)>=09:29:59").top(5).toDF()
Time Exchange Symbol Bid_Price Bid_Size Offer_Price Offer_Size
0 1970-01-01 09:30:00.005868 90 AAPL 26.89 1 27.10 6
1 1970-01-01 09:30:00.011058 90 AAPL 26.89 11 27.10 6
2 1970-01-01 09:30:00.031523 90 AAPL 26.89 13 27.10 6
3 1970-01-01 09:30:00.284623 80 AAPL 26.89 8 26.98 8
4 1970-01-01 09:30:00.454066 80 AAPL 26.89 8 26.98 1
>>> trades.merge_asof(quotes,on=["Symbol","Time"]).select(["Symbol","Time","Trade_Volume","Trade_Price","Bid_Price", "Bid_Size","Offer_Price", "Offer_Size"]).top(5).toDF()
Symbol Time Trade_Volume Trade_Price Bid_Price Bid_Size \
0 AAPL 1970-01-01 08:00:00.022239 300 27.00 26.9 1
1 AAPL 1970-01-01 08:00:00.022287 500 27.25 26.9 1
2 AAPL 1970-01-01 08:00:00.022317 335 27.26 26.9 1
3 AAPL 1970-01-01 08:00:00.022341 100 27.27 26.9 1
4 AAPL 1970-01-01 08:00:00.022368 31 27.40 26.9 1
Offer_Price Offer_Size
0 27.49 10
1 27.49 10
2 27.49 10
3 27.49 10
4 27.49 10
merge_window
merge_window is equivalent to the keyword window join (wj) in DolphinDB (see the DolphinDB User Manual). It is a generalization of asof join. With a window defined by parameters leftBound (w1) and rightBound (w2), for each row in the left table whose last join column equals t, find the rows in the right table whose last join column falls between (t+w1) and (t+w2), provided all other join columns match, then apply aggFunctions to the selected rows in the right table.
The only difference between window join and prevailing window join is that if the right table does not contain a matching value for t+w1 (the left boundary of the window), prevailing window join fills it with the last value before t+w1 (provided all other join columns match) before applying aggFunctions. To use prevailing window join, set prevailing = True.
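Window-join semantics can be sketched in a few lines of plain Python (a simplified model with a single join column and one aggregate, not the DolphinDB implementation):

```python
# For each left time t, aggregate the right-table rows whose time lies
# in the window [t + w1, t + w2].
def window_join_avg(left_times, right_rows, w1, w2):
    result = []
    for t in left_times:
        window = [v for (rt, v) in right_rows if t + w1 <= rt <= t + w2]
        result.append(sum(window) / len(window) if window else None)
    return result

# Hypothetical right table as (time, bid) pairs, times in arbitrary units.
right_rows = [(1, 26.0), (3, 27.0), (4, 28.0)]

# Window [-2, 0]: for t=4 it covers times 2..4; for t=10 it is empty,
# which is where prevailing window join would instead reuse the last
# value before the left boundary.
avg_bids = window_join_avg([4, 10], right_rows, w1=-2, w2=0)
```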
>>> trades.merge_window(quotes, -5000000000, 0, aggFunctions=["avg(Bid_Price)","avg(Offer_Price)"], on=["Symbol","Time"]).where("Time>=07:59:59").top(10).toDF()
Time Exchange Symbol Trade_Volume Trade_Price avg_Bid_Price avg_Offer_Price
0 1970-01-01 08:00:00.022239 75 AAPL 300 27.00 26.90 27.49
1 1970-01-01 08:00:00.022287 75 AAPL 500 27.25 26.90 27.49
2 1970-01-01 08:00:00.022317 75 AAPL 335 27.26 26.90 27.49
3 1970-01-01 08:00:00.022341 75 AAPL 100 27.27 26.90 27.49
4 1970-01-01 08:00:00.022368 75 AAPL 31 27.40 26.90 27.49
5 1970-01-01 08:00:02.668076 68 AAPL 2434 27.42 26.75 27.36
6 1970-01-01 08:02:20.116025 68 AAPL 66 27.00 NaN NaN
7 1970-01-01 08:06:31.149930 75 AAPL 100 27.25 NaN NaN
8 1970-01-01 08:06:32.826399 75 AAPL 100 27.25 NaN NaN
9 1970-01-01 08:06:33.168833 75 AAPL 74 27.25 NaN NaN
[10 rows x 6 columns]
merge_cross
merge_cross is equivalent to the keyword cross join (cj) in DolphinDB (see the DolphinDB User Manual). It returns the Cartesian product of two tables. If the left table has n rows and the right table has m rows, then cross join returns n*m rows.
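The n*m row count is easy to verify locally (plain Python, not the DolphinDB API):

```python
import itertools

# Mirrors the example below: 3 years x 3 tickers -> 9 rows.
years = [2010, 2011, 2012]
tickers = ["IBM", "C", "AAPL"]
rows = list(itertools.product(years, tickers))
```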
>>> s.run("""
... t1 = table(2010 2011 2012 as year);
... t2 = table(`IBM`C`AAPL as Ticker);
... """)
...
>>> t1 = s.table(data="t1")
>>> t2 = s.table(data="t2")
>>> t1.merge_cross(t2).toDF()
year Ticker
0 2010 IBM
1 2010 C
2 2010 AAPL
3 2011 IBM
4 2011 C
5 2011 AAPL
6 2012 IBM
7 2012 C
8 2012 AAPL
rename
Use rename to set a new name for a table.
>>> t.tableName()
t
>>> t.rename("xx")
>>> t.tableName()
xx
Note: Renaming a temporary table can potentially cause memory leaks. This happens because renaming a table prevents it from being automatically destroyed, leading to memory occupied by the table not being freed in a timely manner.
drop
Use drop to delete columns from a table.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.colNames
['TICKER', 'date', 'VOL', 'PRC', 'BID', 'ASK']
>>> t1 = trade.drop(['ask', 'bid'])
>>> t1.colNames
['TICKER', 'date', 'VOL', 'PRC']
exec
The select clause always generates a table, even when only one column is selected. To generate a scalar or vector, use the exec clause.
If only one column is selected, exec generates a DolphinDB vector. Download the object with toDF() in Python and you can obtain an np.ndarray object:
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.exec("ticker").toDF()
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']
If multiple columns are selected, the exec clause, like select, generates a DolphinDB table. Download the object with toDF() in Python and you obtain a pd.DataFrame object.
>>> trade.exec(["ticker", "date", "bid"]).toDF()
ticker date bid
0 AMZN 1997-05-15 23.50000
1 AMZN 1997-05-16 20.50000
2 AMZN 1997-05-19 20.50000
3 AMZN 1997-05-20 19.62500
4 AMZN 1997-05-21 17.12500
...
where
The where clause is used to extract only the records that satisfy the specified condition or conditions.
(1) Multiple where Conditions
When the where clause is used, if multiple filter conditions are specified, each condition is connected by "and" by default.
Note: To ensure the correct interpretation of the priority of operations when adding multiple "where" conditions before Python API 1.30.21.2, an additional pair of parentheses should be included for each condition. For instance, the SQL query select * from table where (a = false) and (b = true or c = true) should be represented as t.where("(a = false)").where("(b = true or c = true)").
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000')
>>> t1.toDF()
date bid ask prc vol
0 1998-09-01 79.93750 80.25000 79.95313 11321844
1 1998-11-17 148.68750 149.00000 148.50000 10279448
2 1998-11-20 179.62500 179.75000 180.62500 11314228
3 1998-11-23 217.75000 217.81250 218.00000 11559042
4 1998-11-24 214.25000 214.62500 214.50000 13820992
...
>>> t1.rows
765
>>> t1.showSQL()
select date,bid,ask,prc,vol from TMP_TBL_2744917d where (TICKER=`AMZN) and (bid!=NULL) and (ask!=NULL) and (vol>10000000)
(2) String as Input
We can pass a list of conditions as a single string to the where method.
>>> t1 = trade.select("ticker, date, vol").where("bid!=NULL, ask!=NULL, vol>50000000")
>>> t1.toDF()
ticker date vol
0 AMZN 1999-09-29 80380734
1 AMZN 2000-06-23 52221978
2 AMZN 2001-11-26 51543686
3 AMZN 2002-01-22 57235489
4 AMZN 2005-02-03 60580703
...
>>> t1.rows
41
execute
(1) For the Table class, use execute to select column(s) as a scalar or vector, which is equivalent to using exec. Pass a string or a list of strings to execute to indicate the column(s) to select.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> trade.execute("TICKER")
['AMZN' 'AMZN' 'AMZN' ... 'NFLX' 'NFLX' 'NFLX']
>>> trade.execute(["VOL", "PRC"])
VOL PRC
0 6029815 23.50000
1 1232226 20.75000
2 512070 20.50000
3 456357 19.62500
4 1577414 17.12500
... ... ...
13131 2010120 125.59000
13132 5287520 128.35001
13133 4388956 125.89000
13134 3444729 125.33000
13135 4455012 123.80000
[13136 rows x 2 columns]
(2) After applying update or delete on a table object, call execute to make sure the operations are synchronized to the server. In this case, the update/delete statement is submitted to the server and a table object is returned.
executeAs
executeAs saves query result as a DolphinDB table. The table name is specified by parameter newTableName.
Note: Tables created by this method are independent of the Python script; their lifespan is tied to the lifespan of the session object.
>>> trade = s.loadText(WORK_DIR+"/example.csv")
>>> t1 = trade.select(['date','bid','ask','prc','vol']).where('TICKER=`AMZN').where('bid!=NULL').where('ask!=NULL').where('vol>10000000').executeAs("AMZN")
>>> t1.tableName()
AMZN
showSQL
Use showSQL to convert the SQL call to a standard DolphinDB SQL statement.
>>> trade=s.loadText(WORK_DIR+"/example.csv")
>>> trade.select(["ticker", "date"]).showSQL()
select ticker,date from TMP_TBL_fb11c541
toDF, toList
toDF and toList both execute the SQL statement (can be obtained with showSQL) cached in Python and return the execution result. The differences are:
- toDF has the same behavior as session.run(sql), whereas toList has the same behavior as session.run(sql, pickleTableToList=True).
- When a session is constructed, if protocol is set to PROTOCOL_PICKLE or PROTOCOL_DDB, toDF returns a pd.DataFrame, and toList returns a list of np.ndarrays where each np.ndarray represents a table column.
For details, see PROTOCOL_DDB and PROTOCOL_PICKLE.
