Delete Data

DolphinDB provides several methods to delete data from both in-memory tables and DFS tables. This page focuses on how to delete table rows.

For other deletion options, see:

Note: Before deleting data, verify that you have the necessary permissions. For more information, refer to User Access Control.

Delete Rows from DFS Tables

SQL DELETE Statement

You can use SQL DELETE to remove specific rows from DFS tables based on conditions.

The syntax is as follows:

DELETE FROM <table> WHERE ...

For example, delete records with date earlier than 2023.10.05 from distributed partitioned table pt.

db = database("dfs://olapdemo",partitionType=VALUE, partitionScheme=2023.10.01..2023.10.10) 
pt = createPartitionedTable(dbHandle=db, table=t, tableName=`pt, partitionColumns=`date) 
pt.tableInsert(t) 

delete from pt where date < 2023.10.05

When deleting data from DFS tables, the system loads data into memory and writes back remaining data using MVCC.The new version of data is temporarily stored in separate files. This allows uninterrupted data access during deletion but temporarily increases disk usage. Disk space is not released until the old version is cleaned up. Therefore, ensure you have sufficient memory and disk space when using DELETE on DFS tables.

Therefore, for optimal performance, it is recommended to include partitioning columns in your WHERE clause, ensuring only relevant partitions matching the WHERE condition are loaded into memory. Otherwise, the system must scan the entire table into memory, resulting in resource exhaustion.

Note: SQL DELETE also supports deleting rows from in-memory table, see Deleting Rows from In-Memory Tables for detailed example.

truncate Function

The truncate function removes all rows from a DFS table while preserving its schema. It is faster than the DELETE statement and the dropPartition function.

For example, remove all rows from pt.

truncate(dbUrl="dfs://olapdemo", tableName="pt")

Delete Rows from In-Memory Tables

SQL DELETE Statement

You can use SQL DELETE to remove specific rows from in-memory tables based on conditions.

For example, delete records with date earlier than 2023.10.05 from in-memory table t.

t = table(2023.10.01 + take(0..9,100) as date, take(['A','B','C','D'],100) as sym, 1..100 as val)
delete from t where date < 2023.10.05

Note: SQL DELETE also supports deleting rows from DFS table, see Deleting Rows from DFS Tables for detailed example.

drop Function

The drop function deletes the first or last n rows from an in-memory table.

For example, delete the first ten rows of table t.

t = table(2023.10.01 + take(0..9,100) as date, take(["A01","B01","C01","D01"],100) as sym, 1..100 as val)
t = drop(t,10)

Note: The drop function is a non-inplace operation, meaning it returns a new result instead of modifying the original table. To apply the deletion to the original table, reassign the output back to it.

erase! Function

The erase! function removes specific rows from an in-memory table.

For example, remove rows from table t where the date is 2023.10.05.

t = table(2023.10.01 + take(0..9,100) as date, take(['A','B','C','D'],100) as sym, 1..100 as val)
erase!(obj=t,filter=<date<2023.10.05>)

clear! Function

The clear! function empties an in-memory table while preserving its schema.

For example, clear table t.

t = table(2023.10.01 + take(0..9,100) as date, take(['A','B','C','D'],100) as sym, 1..100 as val)
clear!(t)

Best Practices for Data Deletion in DFS Tables

There are three ways to remove data from DFS tables: SQL DELETE, truncate, and dropPartition (see Drop Partitions).

When to use DELETE:

  • For tables that need uninterrupted query access during deletion. Deleting data with DELETE won't disrupt the ongoing query, whereas truncate may cause the query to fail.
  • For options to cancel the operation. You can cancel a DELETE job midway, either by cancelJob or disconnecting, but the truncate function cannot be stopped once it starts (unless an error or power outage occurs).

When to use truncate:

  • For quickly removing all data when disk space is limited. DELETE operations use additional disk space. If space is limited, truncate is better for clearing all data quickly to free up storage.
  • For complete removal of large datasets.

For large tables with partial deletions:

  • Combine dropPartition with DELETE to minimize disk usage, or
  • Save data to be retained in memory, use dropPartition, then reinsert the saved data.