Tiered Storage
Starting from versions 1.30.18/2.00.6, DolphinDB supports tiered storage in cluster mode. With tiered storage, users can move older data to lower-speed disks or transfer it to the cloud (S3). Such data (cold data) is usually queried or computed infrequently, yet it takes up a lot of disk space. Storing cold data in the cloud, or migrating it from high-speed disks (SSDs) to lower-speed disks (HDDs), can therefore effectively reduce resource overhead.
Tiered Storage Architecture
hot data (stored in the storage location specified by volumes) → cold data (stored in the storage location specified by coldVolumes) → expired data (deleted)
- Cold Data Storage
coldVolumes = file://home/mypath/hdd, s3://bucket1/data
coldVolumes specifies the volumes that store cold data. Multiple directories (local or S3) can be specified, separated by commas. A local path starts with the identifier "file://"; an S3 path has the format "s3://{BucketName}/{s3path}", where "s3path" must be specified.
Note:
- Specify the storage path for each data node separately. The path can be defined with a macro in the format of
coldVolumes=s3://bucket/ddb/<ALIAS>
- If an S3 path is specified for coldVolumes, the S3-related parameters must be configured accordingly.
- It is recommended that the specified S3 path contain no files.
- The data stored in S3 is read-only.
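The path rules above can be illustrated with a small Python sketch. It is not part of DolphinDB: the helper names `expand_alias` and `parse_cold_volumes` are hypothetical, and the sketch only mirrors the rules stated in the text (comma-separated entries, a "file://" or "s3://" identifier, a mandatory s3path, and the `<ALIAS>` macro replaced per data node).

```python
from __future__ import annotations


def expand_alias(path: str, alias: str) -> str:
    """Replace the <ALIAS> macro with a node alias (illustrative only)."""
    return path.replace("<ALIAS>", alias)


def parse_cold_volumes(value: str) -> list[tuple[str, str]]:
    """Split a coldVolumes value and classify each entry as 'file' or 's3'."""
    entries = []
    for raw in value.split(","):
        path = raw.strip()
        if path.startswith("file://"):
            entries.append(("file", path[len("file://"):]))
        elif path.startswith("s3://"):
            bucket, _, s3path = path[len("s3://"):].partition("/")
            # The text requires both a bucket name and a non-empty s3path.
            if not bucket or not s3path:
                raise ValueError(f"S3 path needs a bucket and an s3path: {path!r}")
            entries.append(("s3", f"{bucket}/{s3path}"))
        else:
            raise ValueError(f"unknown volume identifier: {path!r}")
    return entries


if __name__ == "__main__":
    # "node1" is a made-up alias used only for this example.
    per_node = expand_alias("s3://bucket/ddb/<ALIAS>", "node1")
    print(parse_cold_volumes("file://home/mypath/hdd, " + per_node))
```

A check like this can catch a missing s3path or a mistyped identifier before the configuration file is deployed to the nodes.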
- S3
//plugin
pluginDir=plugins
preloadModules=plugins::awss3
//S3-related configuration
s3AccessKeyId
s3SecretAccessKey
s3Region
Refer to https://aws.amazon.com/cn/getting-started/guides/setup-environment/ for more information on AWS S3 setup.
Storage Policy
You can move cold data to the cold volumes with the function moveHotDataToColdVolume or setRetentionPolicy. The two migration strategies differ as follows:
| Difference | moveHotDataToColdVolume | setRetentionPolicy |
|---|---|---|
| Trigger mechanism | Execute the command to force-migrate the data | Set a data retention policy to migrate the data automatically |
| Time interval of the transferred data | Specified by parameter checkRange, in the range of [current time - hoursToColdVolumes - checkRange, current time - hoursToColdVolumes) | Only data of 10 days, in the range of [current time - retentionHours - 10 days, current time - retentionHours) |
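To make the two half-open intervals in the table concrete, here is a minimal Python sketch that computes them from the formulas as written above. It does not call DolphinDB; the function names are hypothetical, and it assumes that checkRange, hoursToColdVolumes, and retentionHours are all expressed in hours.

```python
from datetime import datetime, timedelta

AUTO_SPAN_HOURS = 10 * 24  # the fixed 10-day span from the table


def window(now, offset_hours, span_hours):
    """Return (start, end) for [now - offset - span, now - offset)."""
    end = now - timedelta(hours=offset_hours)
    start = end - timedelta(hours=span_hours)
    return start, end


def manual_window(now, hours_to_cold_volumes, check_range_hours):
    """Interval covered by a forced migration (moveHotDataToColdVolume row)."""
    return window(now, hours_to_cold_volumes, check_range_hours)


def auto_window(now, retention_hours):
    """Interval covered by automatic migration (setRetentionPolicy row)."""
    return window(now, retention_hours, AUTO_SPAN_HOURS)


def in_window(ts, bounds):
    """Half-open membership test: start is included, end is excluded."""
    start, end = bounds
    return start <= ts < end


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    # Illustrative values: a 5-day threshold and a 20-day check range.
    print(manual_window(now, 5 * 24, 20 * 24))
    print(auto_window(now, 5 * 24))
```

Because the automatic interval always spans 10 days, data older than its lower bound is not picked up automatically, which is why a wider forced migration is useful for an initial backlog.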
When migrating a large amount of historical data for the first time, it is recommended to use moveHotDataToColdVolume first and then configure a reasonable auto-transfer policy with function setRetentionPolicy so that later the system can migrate cold data automatically.