Tiered Storage

Starting from version 1.30.18/2.00.6, DolphinDB supports tiered storage in cluster mode. With tiered storage, users can relocate older data to another disk (usually a lower-speed one) or transfer it to the cloud (S3). This transferred data (cold data) is typically not queried or computed on frequently, yet it takes up a lot of disk resources. Storing cold data in the cloud or migrating it from high-speed disks (SSDs) to lower-speed disks (HDDs) can therefore reduce resource overhead.

Tiered Storage Architecture

hot data (stored in the storage location specified by volumes) → cold data (stored in the storage location specified by coldVolumes) → expired data (deleted)

  1. Cold Data Storage

    coldVolumes = file://home/mypath/hdd, s3://bucket1/data

    coldVolumes specifies the volumes that store cold data. Multiple directories (local or S3) can be specified, separated by commas. A local path starts with the identifier "file://"; an S3 path has the format "s3://{BucketName}/{s3path}", where "s3path" must be specified.

    • Specify a separate storage path for each data node. This can be done with a macro, in the format coldVolumes=s3://bucket/ddb/<ALIAS>.

    • If an S3 path is specified for coldVolumes, the S3-related parameters must be configured accordingly.

    • It is recommended that the specified S3 path contain no files.

    • Data stored in S3 is read-only.
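
The path rules above can be illustrated with a minimal Python sketch. It is not part of DolphinDB: parse_cold_volumes is a hypothetical helper, and the node alias "node1" is made up for the example. The sketch only mirrors the rules stated here (comma-separated entries, the "file://" and "s3://" prefixes, a mandatory s3path, and <ALIAS> replaced with the node's alias).

```python
def parse_cold_volumes(value: str, alias: str) -> list:
    """Split a coldVolumes value into (kind, path) pairs for one data node.

    Hypothetical helper for illustration only; DolphinDB does its own parsing.
    """
    volumes = []
    for raw in value.split(","):
        # Trim spaces around the comma and expand the <ALIAS> macro.
        path = raw.strip().replace("<ALIAS>", alias)
        if path.startswith("file://"):
            volumes.append(("local", path))
        elif path.startswith("s3://"):
            # An S3 path needs both a bucket name and a non-empty s3path.
            bucket, _, s3path = path[len("s3://"):].partition("/")
            if not bucket or not s3path:
                raise ValueError(f"S3 path needs a bucket and an s3path: {path!r}")
            volumes.append(("s3", path))
        else:
            raise ValueError(f"unsupported coldVolumes entry: {path!r}")
    return volumes


print(parse_cold_volumes("file://home/mypath/hdd, s3://bucket1/data", "node1"))
# [('local', 'file://home/mypath/hdd'), ('s3', 's3://bucket1/data')]
print(parse_cold_volumes("s3://bucket/ddb/<ALIAS>", "node1"))
# [('s3', 's3://bucket/ddb/node1')]
```

An entry such as "s3://bucket1" is rejected by the sketch because it has no s3path.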

  2. S3

    //S3-related configuration

    Refer to https://aws.amazon.com/cn/getting-started/guides/setup-environment/ for more information on AWS S3 setup.

Storage Policy

You can move cold data to cold volumes with the function moveHotDataToColdVolume or setRetentionPolicy. The two migration strategies differ as follows:

| Difference | moveHotDataToColdVolume | setRetentionPolicy |
| --- | --- | --- |
| Trigger mechanism | Execute the command to force migrate the data | Set data retention policy to migrate the data automatically |
| Time interval of the transferred data | specified by parameter checkRange, in the range of [current time - hoursToColdVolumes - checkRange, current time - hoursToColdVolumes) | only data of 10 days, in the range of [current time - retentionHours - 10 days, current time - retentionHours) |
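
Both time intervals have the same half-open shape, [current time - offset - span, current time - offset). The Python sketch below works through that arithmetic. It is an illustration and not DolphinDB code: migration_window and in_window are hypothetical names, the sketch assumes both the offset and the span are expressed in hours, and the values 120 and 240 are chosen only for the example (240 hours corresponds to the 10-day span).

```python
from datetime import datetime, timedelta


def migration_window(now, offset_hours, span_hours):
    """Return (start, end) of the half-open interval [now - offset - span, now - offset)."""
    end = now - timedelta(hours=offset_hours)
    start = end - timedelta(hours=span_hours)
    return start, end


def in_window(ts, window):
    """True if ts falls inside the window; the start is inclusive, the end exclusive."""
    start, end = window
    return start <= ts < end


# Illustrative values: an offset of 120 hours (5 days) and a span of 240 hours (10 days).
now = datetime(2024, 1, 31)
window = migration_window(now, offset_hours=120, span_hours=240)
print(window[0], window[1])
# 2024-01-16 00:00:00 2024-01-26 00:00:00
print(in_window(datetime(2024, 1, 16), window))  # True: the start is inclusive
print(in_window(datetime(2024, 1, 26), window))  # False: the end is exclusive
```

Data older than the start of the window falls outside a single pass, which is why the span matters when a large backlog of historical data exists.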

When migrating a large amount of historical data for the first time, it is recommended to run moveHotDataToColdVolume first and then configure a reasonable auto-transfer policy with the function setRetentionPolicy, so that the system can migrate cold data automatically from then on.