Usage Notes on DolphinDB

1. Introduction

The DolphinDB scripting language (DLang) supports commonly used SQL-92 syntax and keywords and provides a large number of built-in functions. However, it is fundamentally different in design from other scripting languages such as SQL and Python. Developers accustomed to SQL, Python, or other languages may have some misconceptions when first learning DolphinDB. This article summarizes the fundamentals of scripting and O&M in DolphinDB for quick reference.

This chapter compiles frequently asked questions about using DolphinDB, covering database and table management, SQL usage, DLang syntax, cluster O&M, and other practical features. You can locate questions by category and consult the corresponding answers to improve development and O&M efficiency.

This article serves as a quick reference guide for DolphinDB beginners, covering database and table management, SQL usage, DLang syntax, and O&M tasks.

2. FAQ

This chapter compiles frequently asked questions about using DolphinDB, covering database and table management, SQL usage, DLang syntax, cluster O&M, and other practical features. You can locate questions by category and consult the corresponding answers to improve development and O&M efficiency.

2.1 Database and Table Operations

This section covers questions about database and table operations, including renaming databases, considerations for creating databases and tables, retrieving the SQL statements used to create databases and tables, viewing or modifying table schemas, table design strategies (involving small-sized tables, primary keys, auto-increment columns, and unique indexes), and usage notes on the TSDB storage engine.

2.1.1 How Do I Rename a Database?

Once a database has been created successfully, its name cannot be changed . Therefore, when creating a database, choose an appropriate name to avoid the need for renaming.

2.1.2 What Should I Keep in Mind About Database Partitioning?

When creating a DFS database, you must choose an appropriate partitioning scheme. Once the partitioning scheme is determined, it cannot be changed, and all tables in the database will use the same partitioning scheme. Also note the following details:

  1. Partition levels: The minimum is 1, and the maximum is 3.

  2. Range partitioning: Range partitions can only be added forward and cannot expand automatically. You must call addRangePartitions to extend the partition range. As data volume grows, if allowMissingPartitions = true (default setting), data outside the partition range is discarded without any error. If you want an error to be thrown, set this parameter to false.

  3. Value partitioning: Value partitions expand automatically, but you should not initialize too many empty partitions.

2.1.3 How Do I Retrieve the SQL Statements for Creating Databases and Tables?

You can use the ops functions (getDatabaseDDL and getDBTableDDL) to retrieve the SQL statements for creating databases and tables, respectively. For more information, see ops.

use ops
getDatabaseDDL("dfs://stock_lv2_snapshot")
getDBTableDDL("dfs://stock_lv2_snapshot", "snapshot")

2.1.4 How Do I Obtain a Database Handle?

You can use the database function to obtain a handle for the specified database.

db = database("dfs://stock_lv2_snapshot")

2.1.5 How Do I Check Which Databases or Tables Exist in the Cluster?

You can use the following functions to retrieve database and table information:

  • getClusterDFSDatabases: Retrieves all databases in the cluster.

  • getClusterDFSTables: Retrieves all tables in the cluster.

  • getTables: Retrieves all tables in the specified database.

getClusterDFSDatabases()
getClusterDFSTables()
getTables(database("dfs://stock_lv2_snapshot"))

2.1.6 How Do I View a Database or Table Schema?

You can use the schema function to view the schema information of a specified database or table.

db = database("dfs://stock_lv2_snapshot")
db.schema()
tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
tb.schema()
tb.schema().colDefs

2.1.7 How Do I Modify a Table Schema?

  • Rename a Table

    You can use the renameTable function to rename a table in the specified database.

    db = database("dfs://stock_lv2_snapshot")
    tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
    select count(*) from tb
    renameTable(db, "snapshot", "newSnapshot")
    tbn = loadTable("dfs://stock_lv2_snapshot", "newSnapshot")
    select count(*) from tbn
    renameTable(db, "newSnapshot", "snapshot")
  • Modify a Table Comment

    You can use the setTableComment function to set a table comment.

    tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
    setTableComment(tb, "testsnapshot")
    tb.schema().tableComment
  • Modify a Column Comment

    You can use the setColumnComment function to set a column comment.

    tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
    setColumnComment(tb,{SecurityID:"stock symbol",OpenPrice:"open price",ClosePrice:"closing price"})
    schema(tb).colDefs;
  • Add Columns

    You can use the addColumn function or alter..add to add columns. New columns can only be appended after the existing columns; adding a column after a specified column is not supported.

    tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
    addColumn(tb, ["insertTime"], [TIMESTAMP])
    alter table tb add updateTime TIMESTAMP
    tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
    tb.schema().colDefs
  • Drop Columns

    You can use the dropColumns! function or alter..drop to drop columns. Supported only by the OLAP storage engine, and partition columns cannot be dropped.

    tb = loadTable("dfs://k_minute_level", "k_minute")
    tb.schema().engineType
    dropColumns!(tb, "val")
    alter table tb drop vwap
  • Rename a Column

    You can use the rename! function or alter..rename to rename a column. Supported only by the OLAP storage engine, and partition columns cannot be renamed.

    tb = loadTable("dfs://k_minute_level", "k_minute")
    rename!(tb, "open", "openPrice")
    alter table tb rename close to closePrice
    tb = loadTable("dfs://k_minute_level", "k_minute")
    tb.schema().colDefs
  • Change a Column Type

    You can use the replaceColumn! function to change a column type. Supported only by the OLAP storage engine, and partition column types cannot be changed.

    tb = loadTable("dfs://k_minute_level", "k_minute")
    replaceColumn!(tb, "vol", array(LONG, 0, 1))
    tb.schema().colDefs
  • Change Column Order

    You cannot change the column order for DFS tables.

2.1.8 How Do I Create a Table to Store Very Small, Static Data or Configuration Information?

Create a dimension table in a DFS database. A dimension table is an unpartitioned table in a DFS database. During queries, all data in the table is loaded into memory, making it suitable for storing small datasets that are updated infrequently. For more information, see createDimensionTable.

db = database("dfs://stock_lv2_snapshot")
colNames = ["param_key", "param_value", "param_flag", "param_desc", 
"insert_time","update_time"]
colTypes =  [STRING, STRING, STRING, STRING, TIMESTAMP, TIMESTAMP]
tb = table(1:0, colNames, colTypes);
params_cfg =db.createDimensionTable(tb,`params_cfg, 
sortColumns = `param_key`insert_time)
params_cfg.tableInsert(table(["CONN_TIME_OUT"] as param_key, 
["60"] as param_value, ["1"] as param_falg, 
["timeout"] as param_desc , 
[now()] as insert_time, [now()] as update_time))
select * from params_cfg;

2.1.9 Can a Table Have a Primary Key? Can I define an auto-increment column?

In traditional databases, each table has a primary key, and even tables without a natural primary key often use a virtual auto-increment ID column as the primary key. However, DolphinDB does not support creating primary keys or defining auto-increment columns. For scenarios where data from OLTP tables with primary keys is captured via CDC and loaded into DolphinDB for analysis, see Primary Key Storage Engine.

2.1.10 Can a Table Have a Unique Index?

Traditional databases often allow you to create unique indexes on non-primary-key columns that must remain unique, as well as regular indexes on columns used to improve query performance. However, DolphinDB does not support creating unique indexes or regular indexes.

2.1.11 Can TSDB sortColumns Be Chosen from Columns with Uniqueness Constraints?

In the DolphinDB TSDB engine, sortColumns are used not only for sorting but also for deduplication: only one record is retained for data with the same sortKey. Therefore, upstream primary key columns or uniquely indexed columns are not recommended as sortColumns. Because such fields are unique, each row would form its own sortKey. This prevents deduplication from taking effect and also causes index bloat, which can degrade query performance. For considerations on creating databases and tables, see 10 Commonly Overlooked Details in Creating Databases and Tables.

2.2 SQL Usage

This section introduces common SQL operations in DolphinDB, such as querying table data, selecting all columns except specific ones, limiting the number of rows returned with SELECT TOP or SELECT LIMIT, viewing query plans, and handling column names that contain special characters. In addition, this section explains the difference between SELECT and EXEC, and considerations on using GROUP BY and append!.

2.2.1 How Do I Query Table Data?

Before querying a DFS table, you must load the table metadata with loadTable , and then use a SELECT statement to query data. Note that loadTable does not return data directly.

tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
t1 = select * from tb 
t2 = select * from loadTable("dfs://stock_lv2_snapshot", "snapshot")

2.2.2 How Do I Query All Columns Except Specific Ones?

You can achieve this using SQL metaprogramming.

tb = table(rand(10.,100) as col1,rand(10.,100) as col2,rand(10.,100) as col3,
rand(10.,100) as col4,rand(10.,100) as col5)
exceptColNames = `col1`col5
filterColNames = tb.columnNames()[!(tb.columnNames() in exceptColNames)]
res = <select _$$filterColNames from tb>.eval()

2.2.3 How Do I Limit the Number of Rows Returned by a Query?

You can achieve this using the SELECT TOP or SELECT LIMIT clause. When used together with the CONTEXT BY clause, the scalar value in the LIMIT clause can be a negative integer to return the specified number of last records from each group.

tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
select top 1 * from tb
select * from tb limit 1
select top 1:5 * from tb
select * from tb limit 1, 4
select * from tb where TradeDate = 2022.01.04 
context by SecurityID order by TradeTime limit -1;

2.2.4 How Do I View the SQL Execution Plan?

You can use [HINT_EXPLAIN] to view the execution plan of a SELECT statement.

select [HINT_EXPLAIN] * from  loadTable("dfs://stock_lv2_snapshot", "snapshot")

2.2.5 Why Do I Need to Assign the Result of a SELECT Query to a Variable?

When you run a SELECT query directly, the result is transmitted to the client over the network in one batch and rendered locally. With large datasets, this can lead to significant latency and resource consumption.

By contrast, when you assign the result of a SELECT query to a variable, the result is first stored in an in-memory table on the server. When you click the variable in the client to view the data, the system uses paged loading and returns only part of the data, significantly reducing data transfer and query time.

2.2.6 What If a Column Name in a SELECT Statement Contains Special Characters or Starts with a Digit?

In SQL, when referring to a column name that contains special characters or starts with a digit, enclose the column name in double quotes and prefix it with an underscore as an identifier.

tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
// Add special columns.
addColumn(tb, ["exec"], [INT])
addColumn(tb, ["1_col"], [STRING])
tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
select _"1_col" from tb
select _"exec" from tb

2.2.7 What Is the Difference Between SELECT and EXEC?

A SELECT clause always returns a table, even when only a single column is selected. If you need a scalar or a vector, use an EXEC clause.

2.2.8 What Should I Keep in Mind When Using the GROUP BY Clause?

By default, the GROUP BY clause returns the grouping columns. Therefore, if you write "SELECT aggregate function(grouping column) as grouping column", an error is returned: Duplicated column name (see sample code 4 below). The GROUP BY clause must be used with an aggregate function. Otherwise, the system returns "All columns must be of the same length" (see Sample Code 5 below).

tb = loadTable("dfs://stock_lv2_snapshot", "snapshot")
// Sample Code 1: Correct
select sum(TotalVolumeTrade) as tot from tb  
where tradeDate = 2022.01.05 group by SecurityID
// Sample Code 2: Correct
select SecurityID  , sum(TotalVolumeTrade) as tot from tb  
where tradeDate = 2022.01.05 group by SecurityID
// Sample Code 3: Correct
select min(SecurityID) as sid , sum(TotalVolumeTrade) as tot from tb  
where tradeDate = 2022.01.05 group by SecurityID
// Sample Code 4: Incorrect
select min(SecurityID) as SecurityID , sum(TotalVolumeTrade) as tot from tb  
where tradeDate = 2022.01.05 group by SecurityID
// Sample Code 5: Incorrect
select SecurityID as sid , sum(TotalVolumeTrade) as tot from tb  
where tradeDate = 2022.01.05 group by SecurityID

2.2.9 How Do I Filter Array Vectors in a WHERE Condition?

You can filter array vectors in a WHERE condition by using the byRow function.

t = table(fixedLengthArrayVector([1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]) as ids,
[1.0, 2.0, 3.0, 4.0] as value);
select * from t where rowAnd(byRow(eq{[4, 6, 8]}, ids))

2.2.10 What Should I Keep in Mind When Inserting Data via append!?

This function does not check column names or column order between the two tables. It can execute as long as the data types of columns in corresponding positions are consistent. Therefore, when performing an append! operation on a table, verify the column names and column order in both tables to avoid errors.

2.3 DLang Syntax

This chapter covers common DLang syntax and language features, including the differences among single quotes, double quotes, and backticks; conversion between CHAR and SYMBOL; the differences between slashes (/) and backslashes (\) in division; the differences among NULL, empty strings, and NaN; method of viewing variable types; method of undefining variables; conversion between temporal data types; floating-point precision and comparison; and string formatting.

2.3.1 What Are the Differences Among Single Quotes, Double Quotes, and Backticks?

Character Primary Use Key Differences or Notes

Single quotes

Define a string or character.

If single quotes contain a single character, the value is of type CHAR. For example, 'A' is a CHAR, not a STRING.

If single quotes contain multiple characters, the value is of type STRING, with no difference from double quotes.

Double quotes

Define a string.

Double quotes can contain special characters.

Backticks

Define a string.

Backticks cannot contain special characters.

2.3.2 Can Data of Type CHAR Be Automatically Converted to Data of Type SYMBOL?

No. Example of incorrect code:

symVec = [`a, `b, `c, `d]$SYMBOL
c = 'a'
symVec.append!(c) // append!(symVec, c) => Failed to append data

However, data of type STRING can be automatically converted to data of type SYMBOL. The corrected code example is shown below:

symVec.append!(c.string())

2.3.3 What Are the Differences Between Slashes (/) and Backslashes (\) in Division?

If either the dividend or the divisor is a FLOATING type, there is no difference between the two operators.

If both the dividend and the divisor are integers, division with the backslash (\) operator returns a FLOATING result and preserves the fractional part. By contrast, the slash (/) operator truncates the division result and returns an integer.

100 / 2.1 // 47.61904761904762
100 \ 2.1 // 47.61904761904762
100.0 / 3 // 33.333333333333336
100.0 \ 3 // 33.333333333333336
100 \ 3   // 33.333333333333336
100 / 3   // 33

2.3.4 What Are the Differences Among NULL, Empty Strings, and NaN?

Concept Type Representation

NULL

VOID or a typed null value

NULL, int(), 00i, etc.

Empty string

STRING

""

NaN

FLOAT or DOUBLE

Caused by arithmetic overflow, indicating an invalid numeric result.

2.3.5 How Do I View the Data Type of a Variable?

Use the typestr function to view the type of any data object.

symVec = [`a, `b, `c, `d]$SYMBOL
t = table(1 2 3 as id, symVec as symbol, 1 2 3 as value)
d = dict(`a`b`c, 1 2 3)
typestr symVec // FAST SYMBOL VECTOR
typestr t // IN-MEMORY TABLE
typestr d // STRING->INT DICTIONARY

2.3.6 How Do I Undefine a Variable?

Variable type Function used to undefine it

Local variable

undef(`var)

Shared variable

undef(`var, SHARED)

Function definition

undef(`def, DEF)

Stream table

dropStreamTable(`streamTableDemo)

Streaming engine

dropStreamEngine(`streamEngineDemo)

2.3.9 How do I format, align, or pad strings?

Use the stringFormat function.

// Right-aligned (a length of 10, padded with spaces)
stringFormat("%10W", "DolphinDB") // Output: " DolphinDB"
// Left-aligned (a length of 10, padded with zeros)
stringFormat("%010l", 123l) // Output: "0000000123"

2.3.10 How Do I Determine Whether a String Contains a Character or Substring?

Use the strpos or strFind function.

strpos("Hello, DolphinDB!", "Dolphin") // Output: 7

2.3.11 How Do I Extract a Specified Character or Substring from a String?

Use functions such as regexFindStr and substr. For example, extract 20220701 from AP210_ZCE_20220701.txt.

fileName = "AP210_ZCE_20220701.txt"
regexFindStr(fileName, "[0-9]{8}") // Output: '20220701'
substr(fileName, 10, 8) // Output: '20220701'
split(fileName, "_").last().split(".").first() // Output: '20220701'

2.3.12 In a Vector, How Do I Get the Indexes of Elements Greater Than a Specified Value?

v = 1 2 3 4 5 6 7 8 9 10
at(v>3)

2.3.13 Dictionary-Related Questions

How do I convert a dictionary to a table?

Use the transpose function.

d = dict(`sym`val, [`a`b`c, 1 2 3])
transpose(d)

How do I delete a key from a dictionary?

Use the erase! function.

d = dict(`sym`val, [`a`b`c, 1 2 3])
erase!(d, `sym)

2.3.14 Module-Related Questions

How do I upload a module through VSCode?

After you install the DolphinDB VSCode Extension, an upload icon appears in the upper-right corner of VSCode. First, select the module file to upload. Then, click Upload and select DolphinDB: Upload Module . In the dialog box, choose whether to encrypt the module. The detailed steps are shown in the figures below.

Why can't the uploaded module be found when I use it?

Troubleshooting:

  1. Check whether the node to which the module was uploaded is the same node to which the client is connected.

  2. Check whether the DOS file name matches the module name defined in the DOS file.

  3. Check whether the module was uploaded to the Home path.

After re-uploading a module to overwrite the existing one, why is the old module still being used?

Troubleshooting:

  1. Check whether the node to which the new module was uploaded is the same node to which the client is connected.

  2. Modules are cached. Use clearCachedModules to clear the cache, and try again.

  3. Disconnect the current session in VSCode, reconnect, and then upload the module again to reference the new module.

2.4 O&M

This section describes common O&M tasks when using DolphinDB, including graceful shutdown, disk space management, data migration, configuration of commonly used file directories, system troubleshooting, and important operations that must be performed only after consulting DolphinDB technical staff.

2.4.1 Graceful Shutdown

Since a graceful shutdown flushes transactions to disk before the service stops, the shutdown may take an extended period. Therefore, after performing a graceful shutdown by running the stopSingle.sh or stopAllNode.sh script, make sure that the shutdown completes successfully before restarting the service. Otherwise, node ports may remain occupied, causing the restart to fail. See Graceful Shutdown.

2.4.2 Disk Space

We recommend monitoring disk space usage based on O&M requirements and triggering alerts promptly when disk usage reaches the configured threshold. Otherwise, if the disk becomes full, it may cause data writes to fail, result in incomplete writes, freeze the system, or lead to other unpredictable issues, making timely service recovery difficult.

2.4.3 Data Migration

For data migration, we recommend using the backup-related functions to back up data, and the restore-related functions to restore and import data. We do not recommend migrating data by directly copying data files to avoid issues such as data inconsistency or missing metadata.

2.4.4 System Files

Files in the DolphinDB deployment directory and data storage directories (including dfsMetaDir, chunkMetaDir, volumes, TSDBRedoLogDir, persistenceDir, and computeNodeCacheDir) contain important data required for system operation. You must not modify or delete them arbitrarily.

We recommend using a unified directory convention that includes a DolphinDB subdirectory when configuring data storage paths (for example, chunkMetaDir=/ssd/ssd1/DolphinDB/chunkMeta/). This clearly identifies the directory as DolphinDB-related data storage and reduces the risk of accidental operations.

2.4.6 Risky Operations

Some operations may cause serious issues, such as startup failure and data loss. For the scenarios listed below, we recommend consulting DolphinDB technical staff before proceeding and performing the operations under their guidance.

  • Disk operations: Mounting, unmounting, scaling up, scaling in, and replacement.

  • Changing the hostname.

  • Changing the IP address.

  • Changing data storage directories: volumes, dfsMetaDir, chunkMetaDir, redoLogDir, TSDBRedoLogDir.

2.5 Others

This section covers other FAQs when using DolphinDB, such as whether upgrading the server or updating the license requires a cluster restart, how regular users can change their own passwords, how to calculate the disk space used by databases and tables, and how to view the memory usage of different users.

2.5.1 Do I Need to Restart the DolphinDB Cluster After Upgrading the Server or Updating the License?

After upgrading the server, you must restart the DolphinDB cluster. However, license updates can be performed either online or offline.

For online license updates, you can use the updateLicense function or the updateAllLicenses function provided in the ops module. Online license updates are subject to certain restrictions. For more information, see Update License File in High-availability Cluster Deployment.

For offline license updates, replace the existing license file and restart the cluster.

2.5.2 How Do Regular Users Change Their Own Passwords?

Use the changePwd function.

2.5.3 How Do I Check the Disk Space Used by a Database or Table?

Use the getTableDiskUsage function in the ops module.

2.5.4 How Do I View the Memory Usage of Different Users?

Use the getSessionMemoryStat function.

3. Summary

This article serves as a quick reference guide for DolphinDB beginners, covering database and table management, SQL usage, DLang syntax, and O&M tasks.