Data Rebalancing

Even data distribution across nodes is crucial for cluster performance, stability, and resource efficiency. It is a key consideration in the DolphinDB distributed architecture design.

When scaling the cluster (adding/removing data nodes or disks), data can become unevenly distributed, leading to strained resources on some nodes/disks while others remain underutilized. To address this, DolphinDB provides tools for generating and executing data rebalancing plans, ensuring optimal data distribution.

Data Rebalancing Mechanism

The system rebalances data across the cluster by migrating chunks from high-usage disks to less utilized ones, based on metrics like cluster storage capacity and disk usage.

Note: In this context, “chunks” are data files containing single replicas of table partitions.

When data rebalancing begins, the system follows this workflow:

  1. Validation: Verifies that the user has administrator privileges and that the request comes from the controller node. Also checks parameters and ensures no other rebalancing tasks are in progress.
  2. Data collection: Collects disk statistics to identify high-usage and low-usage disks, as well as chunk statistics.
  3. Plan generation: Generates a rebalancing plan with tasks for each chunk to be migrated, specifying source and destination paths. Chunks are migrated to local disks preferably; only if local disks lack sufficient space are chunks mapped to the lowest-utilized disk on another machine.
  4. Execution: Migrates data task by task. Tasks with non-conflicting source/destination paths are executed in parallel.
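As a rough illustration of steps 2–4, the sketch below (hypothetical Python, not DolphinDB's actual implementation, and ignoring the local-disk preference) greedily maps chunks from disks above the mean utilization onto the least-utilized disk, then batches tasks with non-conflicting source/destination paths so each batch can run in parallel:

```python
# Illustrative rebalancing-planner sketch (hypothetical; not DolphinDB code).

def make_plan(disk_usage, chunks, target=None):
    """disk_usage: {disk: used_bytes}; chunks: {chunk_id: (disk, size)}.
    Returns a list of (chunk_id, source_disk, destination_disk) tasks."""
    usage = dict(disk_usage)
    if target is None:
        target = sum(usage.values()) / len(usage)  # aim for mean utilization
    plan = []
    # Try to move the largest chunks first, off any disk above the target.
    for chunk_id, (src, size) in sorted(chunks.items(), key=lambda kv: -kv[1][1]):
        if usage[src] <= target:
            continue
        dst = min(usage, key=usage.get)            # least-utilized disk
        if dst == src or usage[dst] + size > target:
            continue                               # move would overfill dst
        plan.append((chunk_id, src, dst))
        usage[src] -= size
        usage[dst] += size
    return plan

def parallel_groups(plan):
    """Batch tasks so no two tasks in a batch share a source or destination."""
    groups, busy = [], []
    for task in plan:
        _, src, dst = task
        for group, used in zip(groups, busy):
            if src not in used and dst not in used:
                group.append(task)
                used.update({src, dst})
                break
        else:
            groups.append([task])
            busy.append({src, dst})
    return groups
```

For example, with one disk at 100 GB and two at 10 GB, the planner moves two chunks off the hot disk; because both tasks share the same source, they land in separate batches and run sequentially.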

Functions and Configuration

DolphinDB offers built-in functions and configurations for executing, monitoring and controlling data rebalancing tasks. The functions must be executed on the controller by an administrator.

  • rebalanceChunksWithinDataNode: rebalances data among disks within a data node.
  • rebalanceChunksAmongDataNodes:
    • [Before 2.00.12/3.00.0] Rebalances data across data nodes.
    • [2.00.12/3.00.0 and above] Rebalances data across all disks.
    • Note: When enableChunkGranularityConfig=false (see details in Configuration > enableChunkGranularityConfig), all tables in the same database partition are stored on the same node. If a node is down when this function is called, some tables in a partition may fail to relocate, leaving tables of the same partition distributed across different nodes. Execute restoreDislocatedTablet to move all tables of the same partition back to one node.

Both functions provide the following options:

  • updatedBeforeDays: Limits the rebalancing process to only include chunks that haven’t been updated for a user-specified period of time.
  • exec: Controls whether to execute the rebalancing. Defaults to false, which returns the rebalancing plan without executing it. It is recommended to review the plan before executing it.
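These two options can be pictured with a generic preview-then-execute sketch (hypothetical Python helper, not the DolphinDB API): chunks not updated within the cutoff period enter the plan, and the exec flag decides whether the plan is merely returned or carried out:

```python
# Generic dry-run pattern behind the updatedBeforeDays/exec options
# (hypothetical sketch; not DolphinDB's implementation).
from datetime import datetime, timedelta

def plan_rebalance(chunks, updated_before_days=0, exec=False, now=None):
    """chunks: list of {"id": str, "last_update": datetime}.
    Returns the plan (exec=False) or the migrated chunk ids (exec=True)."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=updated_before_days)
    # Only chunks that haven't been updated since the cutoff are eligible.
    plan = [c["id"] for c in chunks if c["last_update"] <= cutoff]
    if not exec:
        return plan                                # preview: nothing moves
    return [f"migrated:{cid}" for cid in plan]     # stand-in for real migration
```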

Note:

  • Use the parameter dfsRebalanceConcurrency to configure task execution parallelism.
  • Data rebalancing effectiveness may be limited by factors such as varying chunk sizes or disks storing non-DolphinDB data. Multiple rebalancing runs can improve results.
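Because chunks migrate whole, the best achievable distribution is limited by chunk-size granularity. This can be seen with a brute-force check over two disks (illustrative Python, hypothetical helper):

```python
# Why chunk granularity limits balance: chunks move whole, so the best
# achievable split across two disks may still be uneven (illustrative sketch).
from itertools import combinations

def best_split(chunk_sizes):
    """Smallest achievable usage difference when assigning whole chunks
    to two disks."""
    total = sum(chunk_sizes)
    best = total
    for r in range(len(chunk_sizes) + 1):
        for combo in combinations(chunk_sizes, r):
            best = min(best, abs(total - 2 * sum(combo)))
    return best

print(best_split([60, 30]))  # chunks of 60 GB and 30 GB: 30 GB gap remains
print(best_split([50, 50]))  # equal chunks can be split perfectly: 0
```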

Notes

The following are common cases that may occur during data rebalancing:

  • Data migration and rebalancing tasks can be resource-intensive, and chunks that are being written, modified, or deleted may fail to migrate due to locks.
  • Long-running computation jobs may throw exceptions if their cache still points to a chunk's old path after migration.

Therefore, it is recommended to perform data rebalancing operations when there are no write or query tasks being executed.

For the complete procedure of a data rebalancing task, see the Data Rebalancing tutorial.