Common Methods of Session

This section introduces the commonly used methods of the session class.

Executing Scripts

To execute a script in a session, use Session.run:

run(script, *args,
    clearMemory=None, pickleTableToList=None,
    priority=None, parallelism=None,
    fetchSize=None, disableDecimal=None)

Basic Parameters

  • script: the script to execute, or the name of the function to call
  • args: the arguments to pass to the function

Executing a Script Without Arguments

To execute a simple script without any arguments, pass the script as a string:

>>> s.run("x = 1;")

Executing a Function with Arguments

You can use run to execute built-in DolphinDB functions or user-defined functions on the DolphinDB server. In this case, pass the function name as the first argument, and the function parameters as the following arguments.

The following examples use run to execute the DolphinDB built-in function add which has two parameters, x and y. There are a few ways to execute a function and provide arguments, depending on where the argument values are defined:

(1) All argument values defined on the server

If x and y have already been assigned values on the server through run:

>>> s.run("x = [1,3,5];y = [2,4,6]")

Then you can directly pass the function script to run:

>>> s.run("add(x,y)")
array([3, 7, 11], dtype=int32)

(2) One argument value defined on the server

If only x has been assigned a value on the server through run, and y needs to be defined when executing add:

>>> s.run("x = [1,3,5];")

Then you need to use partial application to fix x when calling run. For more on partial application, see DolphinDB User Manual.

>>> import numpy as np
>>> y = np.array([1,2,3])
>>> result = s.run("add{x,}", y)
>>> result
array([2, 5, 8])
>>> result.dtype
dtype('int64')

(3) All argument values defined in Python

>>> import numpy as np
>>> x = np.array([1.5,2.5,7])
>>> y = np.array([8.5,7.5,3])
>>> result = s.run("add", x, y)
>>> result
array([10., 10., 10.])
>>> result.dtype
dtype('float64')

When using run to execute built-in DolphinDB functions, the arguments can be scalars, lists, dictionaries, NumPy objects, Pandas DataFrames and Series, etc.

Note

  • The dimensions of NumPy arrays passed as arguments must not exceed 2.
  • The indices of Pandas DataFrames and Series will not be preserved when uploaded to DolphinDB. To retain the index, use the Pandas reset_index method.
  • If you need to pass date or time values as arguments, convert them to numpy.datetime64 in Python before uploading to DolphinDB, as sketched below.
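For example, a minimal sketch of passing a date argument (assuming an established session s; date is a DolphinDB built-in function, and the exact printed representation may vary by NumPy version):

>>> import numpy as np
>>> from datetime import datetime
>>> dt = datetime(2021, 1, 5)     # native Python datetime
>>> ts = np.datetime64(dt)        # convert before passing to run
>>> s.run("date", ts)
numpy.datetime64('2021-01-05')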

Parameters for Advanced Features

clearMemory

When executing code on a DolphinDB server using the run method, variables are created in the server's memory. Both the Session.run and DBConnectionPool.run methods provide the clearMemory parameter. Specifying clearMemory = True removes all variables created in the session immediately after the run call completes. This helps prevent unnecessary variable buildup and reduces memory usage.

>>> s.run("t=1", clearMemory = True)
>>> s.run("t")

Since the "t" variable was removed after the first run call completed, the second call raises an exception:

RuntimeError: <Exception> in run: Server response: 'Syntax Error: [line #1] Cannot recognize the token t' script: 't'

pickleTableToList

This parameter is effective when the protocol parameter is set to PROTOCOL_DDB or PROTOCOL_PICKLE during session construction. When pickleTableToList = True, if the execution output is a DolphinDB table, it is downloaded to Python as a list of numpy.ndarray objects, where each element represents one column of the table. For more on data formats, see Data Type Conversion.

>>> import dolphindb.settings as keys
>>> s = ddb.Session(protocol=keys.PROTOCOL_DDB)
>>> s.connect("localhost", 8848)
True
>>> s.run("table(1..3 as a)")
   a
0  1
1  2
2  3
>>> s.run("table(1..3 as a)", pickleTableToList=True)
[array([1, 2, 3], dtype=int32)]

fetchSize

When querying large tables, configure fetchSize to load the data in blocks. This parameter takes effect only with DolphinDB server 1.20.5 or higher and Python API 1.30.0.6 or higher.

In this example, we first create a table containing a large amount of data by executing the following script in Python:

>>> s = ddb.Session()
>>> s.connect("localhost", 8848, "admin", "123456")
True
>>> script = """
... rows=100000;
... testblock=table(take(1,rows) as id,take(`A,rows) as symbol,take(2020.08.01..2020.10.01,rows) as date, rand(50,rows) as size,rand(50.5,rows) as price);
... """
>>> s.run(script)

Set fetchSize to specify the block size in run, which then returns a BlockReader object. Use its read method to read the data in blocks. Note that fetchSize cannot be smaller than 8192 (records).

>>> script1 = "select * from testblock"
>>> block = s.run(script1, fetchSize = 8192)
>>> total = 0
>>> while block.hasNext():
...     tem = block.read()
...     total += len(tem)
... 
>>> total
100000

When using the above method to read data in blocks, if not all blocks are read, call the skipAll method to abort the reading before executing the subsequent code. Otherwise, data will be stuck in the socket buffer and the deserialization of the subsequent data will fail.

>>> block = s.run(script1, fetchSize = 8192)
>>> re = block.read()
>>> block.skipAll()
>>> s.run("1+1;") 
2

priority

In DolphinDB, jobs are executed based on priority: the greater the value, the higher the priority. A multi-level queue mechanism implements the job priority system. Specifically, the system maintains 10 queues corresponding to 10 priority levels. The system always allocates thread resources to higher-priority jobs; for jobs of the same priority, threads are allocated in a round-robin manner. Only when a priority queue is empty are the jobs in the next lower-priority queue processed. For details, see DolphinDB tutorial - Job Management.

Starting from DolphinDB Python API 1.30.22.2, the run method of Session and DBConnectionPool provides the priority parameter for specifying job priority. The default value is 4. For example:

>>> s.run("1+1", priority=7)

parallelism

Parallelism indicates the maximum number of threads to execute the tasks of a job on a data node at the same time. For details, see DolphinDB tutorial - Job Management.

Starting from DolphinDB Python API 1.30.22.2, the run method of Session and DBConnectionPool provides the parallelism parameter, with a default value of 2.

Starting from version 3.0.1.1, the default value has been increased to 64. If a per-user maximum is also set with setMaxJobParallelism(userId, maxParallelism) on the server side, the actual parallelism of an API job is the minimum of parallelism and maxParallelism.

>>> s.run("1+1", parallelism=16)

disableDecimal

Starting from DolphinDB Python API 3.0.0.2, the run method in the session/Session and DBConnectionPool classes provides the disableDecimal parameter, which determines whether to convert DECIMAL columns to the DOUBLE type. If True, DECIMAL columns are returned as DOUBLE columns. Using this parameter requires DolphinDB server 2.00.12.3/3.00.1 or higher.

For example:

>>> df = s.run("table(decimal32([1.23, 2.3456], 3) as a)", disableDecimal=True)
>>> df.dtypes
a    float64
dtype: object

Uploading Variables

upload(nameObjectDict)

upload uploads Python objects to the DolphinDB server. It accepts a dictionary whose keys specify the variable names and whose values are the objects to be uploaded as variables. When the objects are successfully uploaded to the server, upload returns the addresses of the objects in memory. For more on uploading Python objects, see Data Type Conversion.

>>> s.upload({'a': 8, 'b': "abc", 'c': {'a':1, 'b':2}})
[59763200, 60161968, 54696752]
>>> s.run("a")
8
>>> s.run("b")
abc
>>> s.run("c")
{'a': 1, 'b': 2}
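Uploaded values can also be NumPy arrays or pandas DataFrames. A minimal sketch (typestr is a DolphinDB built-in function; the returned addresses vary per run, so the return value of upload is assigned rather than printed):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> addr = s.upload({'t1': df})    # df becomes the in-memory table t1 on the server
>>> s.run("typestr(t1)")
'IN-MEMORY TABLE'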

Loading Data

This section uses the example.csv file in examples.

table

table(dbPath=None, data=None, tableAliasName=None, inMem=False, partitions=None)
  • data: str, dict or DataFrame. If data is a string, it indicates the name of a table on the server; if data is a dictionary or a DataFrame, the local data is uploaded to the server as a temporary table.
  • dbPath: str. The path to the database where the table to be loaded is located.
  • tableAliasName: the alias of the table to be loaded.
  • inMem: whether to load data from server disk to memory.
  • partitions: the partitions to load.

Note: When data is a string, table in fact encapsulates the DolphinDB built-in function loadTable. It loads the table from the specified database and obtains its handle. For details on inMem and partitions, see DolphinDB User Manual - loadTable.

For more information on table handles and object-oriented SQL queries, see section Object Oriented Operations on DolphinDB Objects.
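A minimal sketch of both usages (assuming an established session s and the partitioned table "trade" in "dfs://valuedb" created later in this section):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3]})
>>> tmp = s.table(data=df)          # upload a local DataFrame as a temporary table
>>> tmp.toDF()
   a
0  1
1  2
2  3
>>> trade = s.table(dbPath="dfs://valuedb", data="trade")   # handle to a server table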

loadTable

loadTable(tableName, dbPath=None, partitions=None, memoryMode=False)

loadTable is similar to table. The difference is that loadTable can only be used to load tables on the server. It returns the table handle.
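For example (assuming the "trade" table in "dfs://valuedb" created later in this section):

>>> trade = s.loadTable(tableName="trade", dbPath="dfs://valuedb")
>>> trade.rows
13136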

loadTableBySQL

loadTableBySQL(tableName, dbPath, sql)

loadTableBySQL encapsulates DolphinDB's built-in function loadTableBySQL to load the records that satisfy the filtering conditions of a SQL query into a partitioned in-memory table. It returns a handle to the in-memory table in Python. For more information, see DolphinDB User Manual - loadTableBySQL.

  • tableName/dbPath: the name and database path of the partitioned table queried in sql.
  • sql: a metacode object representing a SQL query. It can use a where clause to filter partitions or rows, and a select statement to select columns, including calculated columns. However, it cannot use the top, group by, order by, context by, or limit clauses.

Note: In scenarios involving large-scale data queries, this function may result in high memory usage. In such cases, it is recommended to use SQL statements directly for querying instead.
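A minimal sketch (assuming the "trade" table in "dfs://valuedb" created later in this section, with the query passed as a string):

>>> trade = s.loadTableBySQL(tableName="trade", dbPath="dfs://valuedb",
...                          sql="select * from trade where date>2010.01.01")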

loadText

loadText(remoteFilePath, delimiter=",")

Use loadText to import text files into DolphinDB as an in-memory table. The text files and the DolphinDB server must be located on the same machine. It returns a DolphinDB table object in Python that corresponds to a DolphinDB in-memory table. You can convert the table object in Python to a pandas.DataFrame with toDF.

Note: When loading a text file as an in-memory table with loadText, the table size must be smaller than the available memory.

>>> WORK_DIR = "C:/DolphinDB/Data"
>>> trade = s.loadText(WORK_DIR+"/example.csv")

Convert the returned DolphinDB table into a pandas.DataFrame. This is when the data transfer takes place.

>>> trade.toDF()
      TICKER        date       VOL        PRC        BID       ASK
0       AMZN  1997.05.16   6029815   23.50000   23.50000   23.6250
1       AMZN  1997.05.17   1232226   20.75000   20.50000   21.0000
2       AMZN  1997.05.20    512070   20.50000   20.50000   20.6250
3       AMZN  1997.05.21    456357   19.62500   19.62500   19.7500
4       AMZN  1997.05.22   1577414   17.12500   17.12500   17.2500
5       AMZN  1997.05.23    983855   16.75000   16.62500   16.7500
...
13134   NFLX  2016.12.29   3444729  125.33000  125.31000  125.3300
13135   NFLX  2016.12.30   4455012  123.80000  123.80000  123.8300

The default delimiter of loadText is the comma ",". You can also specify other delimiters. For example, to import a tab-delimited text file:

>>> t1 = s.loadText(WORK_DIR+"/t1.tsv", '\t')

Note: loadText / ploadText / loadTextEx all load files that reside on the server, not local files.

ploadText

ploadText loads a text file in parallel to generate a partitioned in-memory table. It runs much faster than loadText.

>>> trade = s.ploadText(WORK_DIR+"/example.csv")
>>> trade.rows
13136

loadTextEx

loadTextEx(dbPath, tableName, partitionColumns=None, remoteFilePath=None, delimiter=",", sortColumns=None)

loadTextEx can be used to import text files into a partitioned table in a DFS database. If the partitioned table does not exist, the method automatically creates it and appends the data to it. If the partitioned table already exists, the data is directly appended to it.

  • dbPath: database path
  • tableName: partitioned table name
  • partitionColumns: partitioning columns
  • remoteFilePath: the absolute path to the text file on the DolphinDB server
  • delimiter: the table column separator. It is a comma (",") by default.
  • sortColumns: a STRING scalar/vector specifying the columns based on which the written data is sorted. Note that sortColumns only applies to the TSDB engine.

This example creates the "trade" partitioned table with loadTextEx and loads the data in example.csv into "trade".

>>> import dolphindb.settings as keys
>>> if s.existsDatabase("dfs://valuedb"):
...     s.dropDatabase("dfs://valuedb")
...
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=["AMZN","NFLX", "NVDA"], dbPath="dfs://valuedb")
>>> trade = s.loadTextEx(dbPath="mydb", tableName='trade', partitionColumns=["TICKER"], remoteFilePath=WORK_DIR + "/example.csv")
>>> trade.toDF()
      TICKER       date       VOL      PRC      BID      ASK
0       AMZN 1997-05-15   6029815   23.500   23.500   23.625
1       AMZN 1997-05-16   1232226   20.750   20.500   21.000
2       AMZN 1997-05-19    512070   20.500   20.500   20.625
3       AMZN 1997-05-20    456357   19.625   19.625   19.750
4       AMZN 1997-05-21   1577414   17.125   17.125   17.250
...      ...        ...       ...      ...      ...      ...
13131   NVDA 2016-12-23  16193331  109.780  109.770  109.790
13132   NVDA 2016-12-27  29857132  117.320  117.310  117.320
13133   NVDA 2016-12-28  57384116  109.250  109.250  109.290
13134   NVDA 2016-12-29  54384676  111.430  111.260  111.420
13135   NVDA 2016-12-30  30323259  106.740  106.730  106.750

[13136 rows x 6 columns]

The number of records returned:

>>> trade.rows
13136

The number of columns returned:

>>> trade.cols
6

The schema of the table returned:

>>> trade.schema
     name typeString  typeInt comment
0  TICKER     SYMBOL       17        
1    date       DATE        6        
2     VOL        INT        4        
3     PRC     DOUBLE       16        
4     BID     DOUBLE       16        
5     ASK     DOUBLE       16

Managing Databases and Tables

The DolphinDB Python API encapsulates some commonly used database/table management functions on the server side as methods of the session class.

database

database(dbName=None, partitionType=None, partitions=None, dbPath=None, engine=None, atomic=None, chunkGranularity=None)

If the imported data needs to be saved persistently, or the file to import exceeds the available memory, you can import the data into a DFS database for storage. The following example shows how to create a DFS database. In this example, the database "valuedb" is used. First, check whether the database exists; if so, delete it.

>>> if s.existsDatabase("dfs://valuedb"):
...     s.dropDatabase("dfs://valuedb")
...

Use database to create a VALUE-partitioned database with stock symbols as the partitioning column. The partitions parameter indicates the partition scheme.

First, import dolphindb.settings. Then call database:

>>> import dolphindb.settings as keys
>>> s.database(dbName='mydb', partitionType=keys.VALUE, partitions=['AMZN','NFLX', 'NVDA'], dbPath='dfs://valuedb')

The script above is equivalent to db=database('dfs://valuedb', VALUE, ['AMZN','NFLX','NVDA']) in DolphinDB.

Besides VALUE, DolphinDB also supports partition types including HASH, RANGE, LIST and COMPO. For more information, see DolphinDB User Manual - database.

Once a partitioned database has been created, its partition type cannot be changed. Generally, the partitioning scheme cannot be changed either. However, after creating VALUE or RANGE partitions (including VALUE or RANGE partitions within a COMPO partition), you can add partitions with the addValuePartitions and addRangePartitions functions in a DolphinDB script, as sketched below.
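For example, a hedged sketch of adding a VALUE partition through the API (assuming the "dfs://valuedb" database created above):

>>> s.run("""
... db = database("dfs://valuedb")
... addValuePartitions(db, ["TSLA"])
... """)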

For details on the database method, see Object Oriented Operations on DolphinDB Objects.

existsDatabase

>>> s.existsDatabase(dbUrl="dfs://testDB")
False

existsDatabase checks whether a database already exists at the specified dbUrl on the DolphinDB server. For more information, see DolphinDB User Manual - existsDatabase.

existsTable

>>> s.existsTable(dbUrl="dfs://valuedb", tableName="trade")
True

existsTable checks if the specified table already exists in the specified database. For more information, see DolphinDB User Manual - existsTable.

dropDatabase

>>> s.dropDatabase(dbPath="dfs://valuedb")

dropDatabase deletes all physical files of the specified database. For more information, see DolphinDB User Manual - dropDatabase.

dropPartition

>>> s.dropPartition(dbPath="dfs://valuedb", partitionPaths="AMZN", tableName="trade")

dropPartition deletes data from one or multiple partitions from a table or a database.

If tableName is specified, dropPartition deletes the specified partition(s) of the given table; if tableName is not specified, it deletes the specified partition(s) of all tables that contain them. For more information, see DolphinDB User Manual - dropPartition.

Note: If the chunkGranularity parameter is specified as "DATABASE" when the database is created, tableName can be left empty. Otherwise, tableName must be specified.
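To drop several partitions at once, partitionPaths also accepts a list. A sketch based on the "valuedb" example above:

>>> s.dropPartition(dbPath="dfs://valuedb", partitionPaths=["AMZN", "NFLX"], tableName="trade")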

dropTable

>>> s.dropTable(dbPath="dfs://valuedb", tableName="trade")

dropTable deletes the table from the specified database. For more information, see DolphinDB User Manual - dropTable.

Other Methods

undef/undefAll

>>> s.undef("t1", "VAR")
>>> s.undefAll()

The method undef releases specified objects in a session, and the method undefAll releases all objects in a session. undef can be used on the following object types: "VAR" (variable), "SHARED" (shared variable) and "DEF" (function definition). The default type is "VAR". "SHARED" refers to variables shared across sessions, such as a shared stream table.

Assume there is a DolphinDB table object t1 in the session. You can release it with undef("t1", "VAR"). However, releasing an object does not necessarily mean its memory is freed on the server immediately. This is a consequence of DolphinDB's memory management: memory that DolphinDB allocates from the OS is not returned immediately after release, since DolphinDB can reuse it. DolphinDB first requests memory from its internal memory pool; only when that is insufficient does it request more from the OS. The maxMemSize parameter in the dolphindb.cfg configuration file sets a soft limit on the memory DolphinDB may use. For example, maxMemSize=8GB means DolphinDB will try to use at most 8GB. So, to actually free memory by repeatedly undefining a variable, you may need to lower maxMemSize. Otherwise, the memory may not actually be released, and if later operations require more than the system's maximum, DolphinDB may be killed by the OS or raise an out-of-memory error.

clearAllCache

>>> s.clearAllCache()
>>> s.clearAllCache(dfs=True)

Under the hood, clearAllCache calls the server's built-in function clearAllCache to clear the cache on the server. If dfs is set to True, it clears the cache on all nodes; otherwise, it only clears the cache on the node that the session is currently connected to.

setTimeout

Unlike the keepAliveTime parameter in session construction, setTimeout is a method of the session class.

setTimeout specifies the TCP connection timeout period (in seconds) before the peer acknowledges receiving the data packets. After the timeout period expires, the TCP connection is closed. It is equivalent to the TCP_USER_TIMEOUT option. For more information, see Linux Socket options.

>>> ddb.Session.setTimeout(3600)

Note: This method is only available on Linux. The default time is 30 seconds.

Stream Subscription Methods

The DolphinDB Python API provides a stream data subscription interface that allows you to subscribe to stream tables on the server and retrieve their data. For details, see sections Subscription (Basic) and Subscription (Advanced).

Related methods: enableStreaming / subscribe / unsubscribe / getSubscriptionTopics
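A minimal subscription sketch (assuming a shared stream table named "trades" on the server; by default the handler receives one message at a time):

>>> s.enableStreaming()
>>> def handler(msg):
...     print(msg)          # process each incoming record
...
>>> s.subscribe("localhost", 8848, handler, "trades", "action1", offset=0)
>>> s.getSubscriptionTopics()
['localhost/8848/trades/action1']
>>> s.unsubscribe("localhost", 8848, "trades", "action1")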