Common Methods of Session
This section introduces the commonly used methods of the session class.
Executing Scripts
To execute a script in a session, use Session.run:
run(script, *args,
clearMemory=None, pickleTableToList=None,
priority=None, parallelism=None,
fetchSize=None, disableDecimal=None)
Basic Parameters
- script: the script or function for execution
- args: arguments to the function for execution
Executing a Script Without Arguments
To execute a simple script without any arguments, pass the script as a string:
>>> s.run("x = 1;")
Executing Functions with Arguments
You can use run to execute built-in DolphinDB functions or user-defined functions on the DolphinDB server. In this case, pass the function name as the first argument and the function parameters as the following arguments.
The following examples use run to execute the DolphinDB built-in function add, which has two parameters, x and y. There are a few ways to execute a function and provide arguments, depending on where the argument values are defined:
(1) All argument values defined on the server
If x and y have already been assigned values on the server through run:
>>> s.run("x = [1,3,5];y = [2,4,6]")
Then you can directly pass the function script to run:
>>> s.run("add(x,y)")
array([3, 7, 11], dtype=int32)
(2) One argument value defined on the server
If only x has been assigned a value on the server through run, and y needs to be defined when executing add:
>>> s.run("x = [1,3,5];")
Then you need to use partial application to fix x when calling run. For more on partial application, see DolphinDB User Manual.
>>> import numpy as np
>>> y = np.array([1,2,3])
>>> result = s.run("add{x,}", y)
>>> result
array([2, 5, 8])
>>> result.dtype
dtype('int64')
(3) All argument values defined in Python
>>> import numpy as np
>>> x = np.array([1.5,2.5,7])
>>> y = np.array([8.5,7.5,3])
>>> result = s.run("add", x, y)
>>> result
array([10., 10., 10.])
>>> result.dtype
dtype('float64')
When using run to execute built-in DolphinDB functions, the arguments can be scalars, lists, dictionaries, NumPy objects, pandas DataFrames and Series, etc.
Note
- The dimensions of NumPy arrays passed as arguments must not exceed 2.
- The indices of pandas DataFrames and Series are not preserved when uploaded to DolphinDB. To retain the index, call the pandas reset_index method first.
- To pass date or time values as arguments, convert them to numpy.datetime64 in Python before uploading to DolphinDB.
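A minimal sketch of both conversions, assuming a connected session s (typestr is the DolphinDB built-in that reports the type of its argument):
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
>>> t = s.run("typestr", df.reset_index())             # reset_index keeps the index as a regular column
>>> d = s.run("typestr", np.datetime64("2020-08-01"))  # convert dates before uploading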
Parameters for Advanced Features
clearMemory
When executing code on a DolphinDB server using the run method, variables are created in the server's memory. Both the Session.run and DBConnectionPool.run methods provide the clearMemory parameter. Specifying clearMemory=True removes all variables created during that run call immediately after it completes. This helps prevent unnecessary variable buildup and reduces memory usage.
>>> s.run("t=1", clearMemory = True)
>>> s.run("t")
Since the "t" variable was removed after the first run
call completed, the second call raises an exception:
RuntimeError: <Exception> in run: Server response: 'Syntax Error: [line #1] Cannot recognize the token t' script: 't'
pickleTableToList
This parameter is effective when the protocol parameter is set to PROTOCOL_DDB or PROTOCOL_PICKLE during session construction. When pickleTableToList=True, if the execution output is a DolphinDB table, it is downloaded to Python as a list of numpy.ndarray objects, where each element represents one column of the table. For more on data formats, see Data Type Conversion.
>>> import dolphindb.settings as keys
>>> s = ddb.Session(protocol=keys.PROTOCOL_DDB)
>>> s.connect("localhost", 8848)
True
>>> s.run("table(1..3 as a)")
a
0 1
1 2
2 3
>>> s.run("table(1..3 as a)", pickleTableToList=True)
[array([1, 2, 3], dtype=int32)]
fetchSize
When querying large tables, configure fetchSize to load the data in blocks. This parameter requires DolphinDB 1.20.5 or higher and Python API 1.30.0.6 or higher.
In this example, we first create a large table by executing the following script in Python:
>>> s = ddb.Session()
>>> s.connect("localhost", 8848, "admin", "123456")
True
>>> script = """
... rows=100000;
... testblock=table(take(1,rows) as id,take(`A,rows) as symbol,take(2020.08.01..2020.10.01,rows) as date, rand(50,rows) as size,rand(50.5,rows) as price);
... """
>>> s.run(script)
Set fetchSize in run to specify the block size; the call returns a BlockReader object. Use its read method to read the data in blocks. Note that fetchSize cannot be smaller than 8192 (records).
>>> script1 = "select * from testblock"
>>> block= s.run(script1, fetchSize = 8192)
>>> total = 0
>>> while block.hasNext():
... tem = block.read()
... total+=len(tem)
...
>>> total
100000
When reading data in blocks this way, if not all blocks have been read, call the skipAll method to abort the read before executing subsequent code. Otherwise, data will be left stuck in the socket buffer, and the deserialization of subsequent data will fail.
>>> block= s.run(script1, fetchSize = 8192)
>>> re = block.read()
>>> block.skipAll()
>>> s.run("1+1;")
2
priority
In DolphinDB, jobs are executed based on priority: the higher the value, the higher the priority. A multi-level queue mechanism implements the job priority system. Specifically, the system maintains 10 queues corresponding to 10 priority levels. The system always allocates thread resources to higher-priority jobs; for jobs of the same priority, threads are allocated in a round-robin manner. Only when a priority queue is empty are the jobs in the next lower-priority queue processed. For details, see DolphinDB tutorial - Job Management.
Starting from DolphinDB Python API 1.30.22.2, the run method of Session and DBConnectionPool provides the priority parameter for specifying job priority. The default value is 4. For example:
>>> s.run("1+1", priority=7)
parallelism
Parallelism indicates the maximum number of threads that simultaneously execute the tasks of a job on a data node. For details, see DolphinDB tutorial - Job Management.
Starting from DolphinDB Python API 1.30.22.2, the run method of Session and DBConnectionPool provides the parallelism parameter with a default value of 2. Starting from 3.0.1.1, the default value has been increased to 64. If a per-user maximum is also set with setMaxJobParallelism(userId, maxParallelism) on the server side, the parallelism of an API job is the minimum of parallelism and maxParallelism.
>>> s.run("1+1", parallelism=16)
disableDecimal
Starting from DolphinDB Python API 3.0.0.2, the run method in the Session and DBConnectionPool classes provides the disableDecimal parameter, which determines whether to convert DECIMAL columns to the DOUBLE type. If True, DOUBLE columns are returned. Using this parameter requires DolphinDB server 2.00.12.3/3.00.1 or higher.
For example:
>>> df = s.run("table(decimal32([1.23, 2.3456], 3) as a)", disableDecimal=True)
>>> df.dtypes
a float64
dtype: object
Related Methods
runFile(filePath)
Runs a local DolphinDB script file on the server. Like run, runFile accepts variable positional and keyword arguments.
Note: filePath must be a local path on the client.
Example
>>> with open("./test.dos", "w+") as f:
... f.write("""
... t = table(1..3 as a);
... t;
... """)
...
47
>>> s.runFile("./test.dos")
a
0 1
1 2
2 3
Uploading Variables
upload(nameObjectDict)
upload uploads Python objects to the DolphinDB server. It accepts a dictionary where the keys specify the variable names and the values are the objects to be uploaded as variables. When the objects are successfully uploaded to the server, upload returns the addresses of the objects in memory. For more on uploading Python objects, see Data Type Conversion.
>>> s.upload({'a': 8, 'b': "abc", 'c': {'a':1, 'b':2}})
[59763200, 60161968, 54696752]
>>> s.run("a")
8
>>> s.run("b")
abc
>>> s.run("c")
{'a': 1, 'b': 2}
Loading Data
This section uses the example.csv file in examples.
table
table(dbPath=None, data=None, tableAliasName=None, inMem=False, partitions=None)
- data: str, dict, or DataFrame. If data is a string, it is the name of a table on the server; if data is a dictionary or a DataFrame, the local data is uploaded to the server as a temporary table.
- dbPath: str. The path to the database where the table to be loaded is located.
- tableAliasName: the alias of the table to be loaded.
- inMem: whether to load data from server disk to memory.
- partitions: the partitions to load.
Note: When data is a string, table in fact encapsulates the DolphinDB built-in function loadTable: it loads the table from the specified database and obtains its handle. For details on inMem and partitions, see DolphinDB User Manual - loadTable.
For more information on table handles and object-oriented SQL queries, see section Object Oriented Operations on DolphinDB Objects.
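A sketch of both usages, assuming a connected session s; the server-table call assumes the "trade" table in "dfs://valuedb" created in the loadTextEx example below:
>>> import pandas as pd
>>> t1 = s.table(data=pd.DataFrame({"a": [1, 2, 3]}))   # upload local data as a temporary table
>>> t2 = s.table(dbPath="dfs://valuedb", data="trade")  # handle to a table on the server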
loadTable
loadTable(tableName, dbPath=None, partitions=None, memoryMode=False)
loadTable is similar to table. The difference is that loadTable can only be used to load tables on the server. It returns the table handle.
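For example, the following sketch obtains a handle to the "trade" partitioned table created in the loadTextEx example below (assuming it already exists):
>>> trade = s.loadTable(tableName="trade", dbPath="dfs://valuedb")
>>> df = trade.toDF()   # the data is transferred only when downloaded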
loadTableBySQL
loadTableBySQL(tableName, dbPath, sql)
loadTableBySQL encapsulates DolphinDB's built-in function loadTableBySQL to load the records that satisfy the filtering conditions of a SQL query into a partitioned in-memory table. It returns a handle to the in-memory table in Python. For more information, see DolphinDB User Manual - loadTableBySQL.
- tableName/dbPath: identify the partitioned table referenced in sql by its name and database path.
- sql: a metacode object representing a SQL query. It can use a where clause to filter partitions or rows and a select statement to select columns, including calculated columns. However, it cannot use the top, group by, order by, context by, and limit clauses.
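For example, a hedged sketch that loads only the rows matching a filter, assuming the "trade" table in "dfs://valuedb" from the loadTextEx example below (2010.01.01 is a DolphinDB date literal):
>>> trade = s.loadTableBySQL(tableName="trade", dbPath="dfs://valuedb",
...                          sql="select * from trade where date > 2010.01.01")
>>> n = trade.rows   # number of records satisfying the filter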
loadText
loadText(remoteFilePath, delimiter=",")
Use loadText to import text files into DolphinDB as an in-memory table. The text files must be located on the same machine as the DolphinDB server. It returns a DolphinDB table object in Python that corresponds to a DolphinDB in-memory table. You can convert the table object in Python to a pandas.DataFrame with toDF.
Note: When loading a text file as an in-memory table with loadText, the table must be smaller than the available memory.
>>> WORK_DIR = "C:/DolphinDB/Data"
>>> trade = s.loadText(WORK_DIR+"/example.csv")
Convert the returned DolphinDB table into a pandas.DataFrame. This is when the data transfer takes place.
>>> trade.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997.05.16 6029815 23.50000 23.50000 23.6250
1 AMZN 1997.05.17 1232226 20.75000 20.50000 21.0000
2 AMZN 1997.05.20 512070 20.50000 20.50000 20.6250
3 AMZN 1997.05.21 456357 19.62500 19.62500 19.7500
4 AMZN 1997.05.22 1577414 17.12500 17.12500 17.2500
5 AMZN 1997.05.23 983855 16.75000 16.62500 16.7500
...
13134 NFLX 2016.12.29 3444729 125.33000 125.31000 125.3300
13135 NFLX 2016.12.30 4455012 123.80000 123.80000 123.8300
The default delimiter of loadText is the comma (","). You can also specify other delimiters. For example, to import a tabular text file with "\t" as the delimiter:
>>> t1 = s.loadText(WORK_DIR+"/t1.tsv", '\t')
Note: loadText / ploadText / loadTextEx all load files that already exist on the server, not local client files.
ploadText
ploadText loads a text file in parallel to generate a partitioned in-memory table. It runs much faster than loadText.
>>> trade = s.ploadText(WORK_DIR+"/example.csv")
>>> trade.rows
13136
loadTextEx
loadTextEx(dbPath, tableName, partitionColumns=None, remoteFilePath=None, delimiter=",", sortColumns=None)
loadTextEx imports text files into a partitioned table in a DFS database. If the partitioned table does not exist, the method automatically creates it and appends the data. If the partitioned table already exists, the data is directly appended to it.
- dbPath: database path
- tableName: partitioned table name
- partitionColumns: partitioning columns
- remoteFilePath: the absolute path to the text file on the DolphinDB server
- delimiter: the table column separator. It is a comma (",") by default.
- sortColumns: a STRING scalar/vector specifying the columns based on which the written data is sorted. Note that sortColumns only applies to the TSDB engine.
This example creates the "trade" partitioned table with loadTextEx and loads the data in example.csv into it.
>>> import dolphindb.settings as keys
>>> if s.existsDatabase("dfs://valuedb"):
... s.dropDatabase("dfs://valuedb")
...
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=["AMZN","NFLX", "NVDA"], dbPath="dfs://valuedb")
>>> trade = s.loadTextEx(dbPath="mydb", tableName='trade',partitionColumns=["TICKER"], remoteFilePath=WORK_DIR + "/example.csv")
>>> trade.toDF()
TICKER date VOL PRC BID ASK
0 AMZN 1997-05-15 6029815 23.500 23.500 23.625
1 AMZN 1997-05-16 1232226 20.750 20.500 21.000
2 AMZN 1997-05-19 512070 20.500 20.500 20.625
3 AMZN 1997-05-20 456357 19.625 19.625 19.750
4 AMZN 1997-05-21 1577414 17.125 17.125 17.250
... ... ... ... ... ... ...
13131 NVDA 2016-12-23 16193331 109.780 109.770 109.790
13132 NVDA 2016-12-27 29857132 117.320 117.310 117.320
13133 NVDA 2016-12-28 57384116 109.250 109.250 109.290
13134 NVDA 2016-12-29 54384676 111.430 111.260 111.420
13135 NVDA 2016-12-30 30323259 106.740 106.730 106.750
[13136 rows x 6 columns]
The number of records returned:
>>> trade.rows
13136
The number of columns returned:
>>> trade.cols
6
The schema of the table returned:
>>> trade.schema
name typeString typeInt comment
0 TICKER SYMBOL 17
1 date DATE 6
2 VOL INT 4
3 PRC DOUBLE 16
4 BID DOUBLE 16
5 ASK DOUBLE 16
Managing Databases and Tables
The DolphinDB Python API encapsulates some commonly used database/table management functions on the server side as methods of the session class.
database
database(dbName=None, partitionType=None, partitions=None, dbPath=None, engine=None, atomic=None, chunkGranularity=None)
If the imported data needs to be saved persistently, or the imported file exceeds the available memory, you can import the data into a DFS database for storage. The following example shows how to create a DFS database. In this example, the database "valuedb" is used. First check whether the database exists; if so, delete it.
>>> if s.existsDatabase("dfs://valuedb"):
... s.dropDatabase("dfs://valuedb")
...
Use database to create a VALUE-partitioned database with stock symbols as the partitioning column. The partitions parameter indicates the partition scheme. First, import dolphindb.settings. Then call database:
>>> import dolphindb.settings as keys
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=['AMZN','NFLX', 'NVDA'], dbPath='dfs://valuedb')
The script above is equivalent to db=database('dfs://valuedb', VALUE, ['AMZN','NFLX','NVDA']) in DolphinDB.
Besides VALUE, DolphinDB also supports partition types including HASH, RANGE, LIST and COMPO. For more information, see DolphinDB User Manual - database.
Once a partitioned database has been created, the partition type cannot be changed. Generally, the partitioning scheme cannot be changed either. However, after creating VALUE or RANGE partitions (including VALUE or RANGE partitions within COMPO partitions), you can use the addValuePartitions and addRangePartitions functions, respectively, to add partitions in a DolphinDB script.
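For instance, a hedged sketch that adds a new value partition to the database created above by running a server-side script ("TSLA" is a hypothetical new partition value):
>>> s.run("addValuePartitions(database('dfs://valuedb'), `TSLA)")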
For details on the database method, see Object Oriented Operations on DolphinDB Objects.
existsDatabase
>>> s.existsDatabase(dbUrl="dfs://testDB")
False
existsDatabase checks whether a database already exists on the DolphinDB server at the specified dbUrl. For more information, see DolphinDB User Manual - existsDatabase.
existsTable
>>> s.existsTable(dbUrl="dfs://valuedb", tableName="trade")
True
existsTable checks whether the specified table already exists in the specified database. For more information, see DolphinDB User Manual - existsTable.
dropDatabase
>>> s.dropDatabase(dbPath="dfs://valuedb")
dropDatabase deletes all physical files of the specified database. For more information, see DolphinDB User Manual - dropDatabase.
dropPartition
>>> s.dropPartition(dbPath="dfs://valuedb", partitionPaths="AMZN", tableName="trade")
dropPartition deletes data from one or multiple partitions of a table or database. If tableName is specified, the given partitions of that table are deleted; if tableName is not specified, the partitions are deleted from all tables in the database that contain them. For more information, see DolphinDB User Manual - dropPartition.
Note: If the chunkGranularity parameter is specified as "DATABASE" when the database is created, tableName can be left empty. Otherwise, tableName must be specified.
dropTable
>>> s.dropTable(dbPath="dfs://valuedb", tableName="trade")
dropTable deletes the table from the specified database. For more information, see DolphinDB User Manual - dropTable.
Other Methods
undef / undefAll
>>> s.undef("t1", "VAR")
>>> s.undefAll()
The undef method releases specified objects in a session; undefAll releases all objects in a session. undef can be used on the following object types: "VAR" (variable), "SHARED" (shared variable), and "DEF" (function definition). The default type is "VAR". "SHARED" refers to variables shared across sessions, such as a shared stream table.
Assume there is a DolphinDB table object t1 in the session. You can release it with undef("t1", "VAR"). However, releasing an object does not necessarily mean its memory is freed immediately on the server, due to DolphinDB's memory management: memory that DolphinDB allocates from the OS is not returned immediately after release, since DolphinDB can reuse it. DolphinDB first requests memory from its internal memory pool; only when that is insufficient does it request more from the OS. The maxMemSize parameter in the dolphindb.cfg configuration file sets a soft memory limit for DolphinDB. For example, maxMemSize=8GB means DolphinDB will try to use at most 8GB. Therefore, to free memory by repeatedly undefining a variable, you may need to lower maxMemSize; otherwise, the memory may not actually be released, and if later operations require more than the system's maximum, DolphinDB may be killed or raise an out-of-memory error.
clearAllCache
>>> s.clearAllCache()
>>> s.clearAllCache(dfs=True)
Under the hood, clearAllCache calls the server built-in function clearAllCache to clear the cache on the server. If dfs is set to True, it clears the cache on all nodes; otherwise, it only clears the cache on the node that the session is currently connected to.
setTimeout
Unlike the keepAliveTime parameter in session construction, setTimeout is a method of the Session class. It specifies the TCP connection timeout period (in seconds) before the peer acknowledges receiving the data packets; after the timeout expires, the TCP connection is closed. It is equivalent to the TCP_USER_TIMEOUT socket option. For more information, see Linux socket options.
>>> ddb.Session.setTimeout(3600)
Note: This method is only available on Linux. The default time is 30 seconds.
Stream Subscription Methods
The DolphinDB Python API provides a stream data subscription interface that allows you to subscribe to stream tables on the server and retrieve their data. For details, see sections Subscription (Basic) and Subscription (Advanced).
Related methods: enableStreaming / subscribe / unsubscribe / getSubscriptionTopics
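As a minimal, hedged sketch of the workflow (the shared stream table trades_stream, the action name, and the connection details are hypothetical; see the sections above for the full parameter lists):
>>> s.enableStreaming()                  # enable streaming for this session
>>> def handler(msg):                    # invoked once per incoming record
...     print(msg)
...
>>> s.subscribe("localhost", 8848, handler, "trades_stream", "demoAction", offset=0)
>>> s.getSubscriptionTopics()            # list the active subscription topics
>>> s.unsubscribe("localhost", 8848, "trades_stream", "demoAction")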