Tiered Storage
Starting from versions 1.30.18 and 2.00.6, DolphinDB supports tiered storage in cluster mode. With tiered storage, users can move older data to a lower-speed local disk or transfer it to the cloud (S3). The transferred data (cold data) is usually queried or computed on infrequently, yet it takes up a lot of disk space. Storing cold data in the cloud, or migrating it from high-speed disks (SSDs) to lower-speed disks (HDDs), therefore reduces resource overhead.
Tiered Storage Architecture
hot data (stored in the storage location specified by volumes) → cold data (stored in the storage location specified by coldVolumes) → expired data (deleted)
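The lifecycle above can be illustrated with a minimal Python sketch. It is not DolphinDB code: the function name `storage_tier` and the two thresholds `hours_to_cold` and `retention_hours` are hypothetical stand-ins for the age limits after which data becomes cold and then expires, and the strict comparison at the boundaries is an assumption.

```python
from datetime import datetime, timedelta

def storage_tier(partition_time, now, hours_to_cold, retention_hours):
    """Classify a partition as 'hot', 'cold', or 'expired' by its age.

    hours_to_cold and retention_hours are illustrative thresholds,
    not values read from a DolphinDB server.
    """
    age = now - partition_time
    if age > timedelta(hours=retention_hours):
        return "expired"  # past the retention period: deleted
    if age > timedelta(hours=hours_to_cold):
        return "cold"     # stored under coldVolumes
    return "hot"          # stored under volumes

now = datetime(2024, 1, 31)
for ts in (datetime(2024, 1, 30), datetime(2024, 1, 20), datetime(2023, 12, 1)):
    print(ts.date(), storage_tier(ts, now, hours_to_cold=120, retention_hours=720))
```

With a cold threshold of 120 hours and a retention period of 720 hours, the three sample partitions fall into the hot, cold, and expired tiers respectively.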
Configuration:
(1) Cold Data Storage
coldVolumes = file://home/mypath/hdd, s3://bucket1/data
coldVolumes specifies the volumes that store cold data. Multiple local or S3 directories can be specified, separated by commas. A local path starts with the identifier "file://"; an S3 path has the format "s3://{BucketName}/{s3path}", where "s3path" must be specified.
Note:
- Specify the storage path for each data node separately. This can be done with a macro in the format coldVolumes=s3://bucket/ddb/<ALIAS>.
- If an S3 path is specified for coldVolumes, the S3-related parameters must be configured accordingly.
- It is recommended that the specified S3 path contain no files.
- Data stored in S3 is read-only.
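The path rules above can be checked before a configuration file is deployed. The following Python sketch is a hypothetical helper, not part of DolphinDB: `parse_cold_volumes` only applies the format rules stated in this section (comma-separated entries, a "file://" or "s3://" prefix, and a mandatory s3path after the bucket name).

```python
def parse_cold_volumes(value):
    """Split a coldVolumes value into (kind, entry) pairs.

    Raises ValueError for an entry that has neither prefix, or for an
    S3 entry that lacks a bucket name or an s3path.
    """
    volumes = []
    for entry in value.split(","):
        entry = entry.strip()
        if entry.startswith("file://"):
            volumes.append(("local", entry))
        elif entry.startswith("s3://"):
            # Everything before the first "/" is the bucket, the rest is s3path.
            bucket, _, s3path = entry[len("s3://"):].partition("/")
            if not bucket or not s3path:
                raise ValueError(f"S3 entry needs a bucket and an s3path: {entry!r}")
            volumes.append(("s3", entry))
        else:
            raise ValueError(f"entry must start with file:// or s3://: {entry!r}")
    return volumes

print(parse_cold_volumes("file://home/mypath/hdd, s3://bucket1/data"))
# [('local', 'file://home/mypath/hdd'), ('s3', 's3://bucket1/data')]
```

An entry such as "s3://bucket1" is rejected because it names a bucket but no s3path.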
(2) S3
//plugin
pluginDir=plugins
preloadModules=plugins::awss3
//S3-related configuration
s3AccessKeyId
s3SecretAccessKey
s3Region
Refer to https://aws.amazon.com/cn/getting-started/guides/setup-environment/ for more information on AWS S3 setup.
Storage Policy
You can move cold data to cold volumes with the function moveHotDataToColdVolume or setRetentionPolicy. The two migration strategies differ as follows:
Difference | moveHotDataToColdVolume | setRetentionPolicy |
---|---|---|
Trigger mechanism | Run the function manually to force the migration | Set a data retention policy so that data is migrated automatically |
Time interval of the transferred data | Specified by parameter checkRange: [current time - hoursToColdVolumes - checkRange, current time - hoursToColdVolumes) | Fixed at 10 days: [current time - hoursToColdVolumes - 10 days, current time - hoursToColdVolumes) |
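The half-open time intervals in the table can be worked through with a short Python sketch. It is plain date arithmetic, not a call into DolphinDB: `migration_window` is a hypothetical helper, checkRange is assumed to be expressed in hours, and both strategies are assumed to measure the window back from current time - hoursToColdVolumes, with the automatic policy using a fixed 240 hours (10 days).

```python
from datetime import datetime, timedelta

def migration_window(now, hours_to_cold, range_hours=240):
    """Return (start, end) of the interval [start, end) eligible for migration.

    range_hours plays the role of checkRange for a forced migration;
    the default of 240 hours models the fixed 10-day automatic window.
    """
    end = now - timedelta(hours=hours_to_cold)
    start = end - timedelta(hours=range_hours)
    return start, end

def in_window(ts, window):
    """True if ts falls in the half-open interval [start, end)."""
    start, end = window
    return start <= ts < end

now = datetime(2024, 1, 31)
forced = migration_window(now, hours_to_cold=120, range_hours=48)
automatic = migration_window(now, hours_to_cold=120)
print(forced)     # 2024-01-24 00:00 up to, not including, 2024-01-26 00:00
print(automatic)  # 2024-01-16 00:00 up to, not including, 2024-01-26 00:00
```

Data timestamped exactly at the end of the interval is excluded, so a partition only becomes eligible once it is strictly older than the threshold.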
When migrating a large amount of historical data for the first time, it is
recommended to run moveHotDataToColdVolume first and then configure a reasonable
auto-transfer policy with the function setRetentionPolicy, so that the system
can migrate cold data automatically from then on.