Temporal Type Comparison
Temporal types are commonly used in databases, and comparisons between them are frequently required during data processing. However, the rules for comparing vary depending on the scenarios. This tutorial will introduce the following topics:
- Temporal types
- Temporal type conversion
- Temporal type comparison
- Partition pruning with temporal columns
1. Temporal Types
DolphinDB supports the following temporal types:
Category | Data Type | Format | Example |
---|---|---|---|
Date | DATE | YYYY.MM.dd | 2023.01.04 |
MONTH | YYYY.MM.dd (end with “M”) | 2023.01M | |
Time | MINUTE | HH:mm (end with “m”) | 13:30m |
SECOND | HH:mm:ss | 13:30:10 | |
TIME | HH:mm:ss.sss | 13:30:10.008 | |
NANOTIME | HH:mm:ss.nnnnnnnnn | 13:30:10.006007008 | |
Datetime | DATEHOUR | YYYY.MM.ddTHH, YYYY.MM.dd HH |
2023.01.04T21 Can only be obtained by function |
DATETIME | YYYY.MM.ddTHH:mm:ss, YYYY.MM.dd HH:mm:ss | 2023.01.04T13:30:10, 2023.01.04 13:30:10 | |
TIMESTAMP | YYYY.MM.ddTHH:mm:ss.sss, YYYY.MM.dd HH:mm:ss.sss | 2023.01.04T13:30:10.008, 2023.01.04 13:30:10.008 | |
NANOTIMESTAMP | YYYY.MM.ddTHH:mm:ss.sss.nnnnnnnnn, YYYY.MM.dd HH:mm:ss.sss.nnnnnnnnn | 2023.01.04T13:30:10.001002003, 2023.01.04 13:30:10.001002003 |
Temporal values in DolphinDB do not include time zone information. It is up to the
users to interpret the time zones of the temporal objects. Users can use
localtime
, gmtime
, and
convertTZ
functions to handle time zone conversions. Refer to
Time Zone and Conversion for the details.
Retrieve the current date with function today
:
today() // 2024.06.03
Retrieve the current timestamp with function now
:
now() // 2024.06.03T09:29:38.390
now(true) // 2024.06.03T09:31:17.318298137
It returns a timestamp with millisecond precision by default. If you specify
nanoSecond=true
, it will return the result with nanosecond
precision.
2. Temporal Type Conversion
Temporal types can be converted using specialized data type conversion functions or
the function cast
.
Temporal type casting support in DolphinDB:
From \ To | Time | Date | Datetime |
---|---|---|---|
Time | √ | × | × |
Date | × | √ | √ |
Datetime | √ | √ | √ |
(1) Temporal types under the same category are convertible. For example:
DATE and MONTH.
date(2012.01M) // 2012.01.01
month(2012.01.02) // 2012.01M
MINUTE, SECOND, TIME and NANOTIME.
minute(23:30:00) // 23:30m
minute(23:30:00.000) // 23:30m
minute(23:30:00.000000000) // 23:30m
second(23:30m) // 23:30:00
second(23:30:00.001) // 23:30:00
second(23:30:00.000000001) // 23:30:00
time(23:31m) // 23:31:00.000
time(23:30:01) // 23:30:01.000
time(23:30:01.000000001) // 23:30:01.000
nanotime(23:30m) // 23:30:00.000000000
nanotime(23:30:31) // 23:30:31.000000000
nanotime(23:30:31.001) //23:30:31.001000000
DATEHOUR, DATETIME, TIMESTAMP and NANOTIMESTAMP.
datehour(2020.01.01 13:30:01) // 2020.01.01T13
datehour(2020.01.01T13:30:01.001) // 2020.01.01T13
datehour(2020.01.01T13:30:01.001002003) // 2020.01.01T13
datetime(datehour(2020.01.01 13:00:01)) // 2020.01.01T13:00:00
datetime(2020.01.01T13:30:01.001) // 2020.01.01T13:30:01
datetime(2020.01.01T13:30:01.001002003) // 2020.01.01T13:30:01
timestamp(datehour(2020.01.01 13:00:01)) // 2020.01.01T13:00:00.000
timestamp(2020.01.01 13:00:01) // 2020.01.01T13:00:01.000
timestamp(2020.01.01T13:30:01.001002003) // 2020.01.01T13:30:01.001
nanotimestamp(datehour(2020.01.01 13:00:01)) // 2020.01.01T13:00:00.000000000
nanotimestamp(2020.01.01T13:30:01) // 2020.01.01T13:30:01.000000000
nanotimestamp(2020.01.01T13:30:01.001) // 2020.01.01T13:30:01.001000000
(2) Date and datetime data are convertible.
The system will add the information with time set to 0:00 when converting date data to datetime data.
datehour(2023.01.02) // 2023.01.02T00
datetime(2023.01.02) // 2023.01.02T00:00:00
timestamp(2023.01.02) // 2023.01.02T00:00:00.000
nanotimestamp(2023.01.02) // 2023.01.02T00:00:00.000000000
datehour(2023.01M) // 2023.01.01T00
datetime(2023.01M) // 2023.01.01T00:00:00
timestamp(2023.01M) // 2023.01.01T00:00:00.000
nanotimestamp(2023.01M) // 2023.01.01T00:00:00.000000000
The system will discard the information about time when converting datetime data to date data.
date(datehour(2020.01.01 13:00:01)) // 2020.01.01
date(2020.01.01 13:00:01) // 2020.01.01
date(2020.01.01 13:00:01.001) // 2020.01.01
date(2020.01.01 13:00:01.001002003) // 2020.01.01
month(datehour(2020.01.01 13:00:01)) // 2020.01M
month(2020.01.01 13:00:01) // 2020.01M
month(2020.01.01 13:00:01.001) // 2020.01M
month(2020.01.01 13:00:01.001002003) // 2020.01M
(3) Datetime data are convertible to time data.
The system will discard information about time during the conversion.
time(2020.01.01 13:00:01.001002003) // 13:00:01.001
minute(2020.01.01 13:00:01) // 13:00m
However, time data cannot be converted to datetime data.
datetime(13:00:01) // Throw an exception
(4) Time data and date data are unconvertible.
month(13:00:01) // Throw an exception
minute(2020.01.01) // The function minute does not support date data
3. Temporal Type Comparison
Temporal values can be compared using comparison operators (>, <, >=, <=, ==, !=) and logical operators (between, in).
When comparing data of different temporal types, DolphinDB attempts to convert the temporal type with a lower precision to the higher one, following the rules in Chapter 2. If the conversion cannot be done, the system will throw an exception.
Note: Temporal precision from lowest to highest: MONTH < DATE < DATEHOUR < DATETIME < TIMESTAMP < NANOTIMESTAMP.
For example, when executing 2023.01.04T13:30:10.001 > 2023.01.04
,
the system will convert 2023.01.04
to
2023.01.04T00:00:00.000
for comparison and return the
result.
2023.01.04T13:30:10.001 > 2023.01.04 // true
2011.01.01T13:00:00 > 2011.01.02 // false
2023.01.04T13:30:10.001 == 2023.01.04 // false
2023.01.04 == 2023.01.04T00:00:00.000 // true
Note:
- MONTH values can only be compared with MONTH values.
- Datetime values cannot be compared with time values even though the conversion is feasible.
- Logical operator
between
requires both operands to be of the same type.
When the operands are of different types, the system will throw an exception:2023.01.04T13:30:10.001 between 2023.01.04T13:30:10.003:2023.01.04T13:30:10.004 // false
2023.01.04 between 2023.01.04T13:30:10.003:2023.01.04T13:30:10.004 // between(X, Y). Temporal data comparison should have the same data type.
4. Partition Pruning with Temporal Columns
Temporal columns are often used as partitioning columns in distributed databases. When queries filter on the partitioning columns, partition pruning will be triggered to reduce the partitions to be scanned, optimizing query performance.
In practical scenarios, queries are often performed either by directly filtering on the partitioning columns or by applying an explicit type conversion to the partitioning columns before filtering. The pruning rules differ slightly between these scenarios.
These terms are key to understanding the following pruning rules:
- Partitioning scheme type: The data type of the parameter partitionScheme
of function database. In the
following example, the partitioning scheme is
[2022.09.01, 2022.09.02, 2022.09.03]
of DATE type.dbName = "dfs://time_comparison" if(existsDatabase(dbName)) dropDatabase(dbName) db = database(dbName, VALUE, [2022.09.01,2022.09.02, 2022.09.03])
- Partitioning column type: The data type of parameter partitionColumns of
function createPartitionedTable. In the following example, the partitioning
column is column “time” of DATETIME
type.
n = 6 t = table(n:n,[`time,`value],[DATETIME,DOUBLE]) t[`time] = [2022.09.01T00:00:00, 2022.09.01T12:00:00, 2022.09.02T00:00:00, 2022.09.02T12:00:00, 2022.09.03T00:00:00, 2022.09.03T12:00:00] t[`value] = 1..6 pt = db.createPartitionedTable(t, `pt, `time).append!(t)
- Temporal object in the comparison. In this example, the partitioning column
“time” is compared with
2022.09.01
, where2022.09.01
of DATE type is the temporal object in the comparison.select * from pt where time == 2022.09.01 /* time value ----------------------------------- 2022.09.01T00:00:00 1.00000000 2022.09.01T12:00:00 2.00000000 */
4.1 Filtering on the Partitioning Columns
Temporal type comparison rules are the same for distributed queries and in-memory queries. Distributed queries filtering on the temporal partitioning columns can trigger partition pruning. Note that distributed tables cannot be created with the partitioning scheme specified as DATEHOUR or DATETIME types.
In the following example, the partitioning column is of DATETIME type and the data is partitioned by VALUE on a daily basis according to the column.
dbName = "dfs://time_comparison"
if(existsDatabase(dbName))
dropDatabase(dbName)
db = database(dbName, VALUE, [2022.09.01,2022.09.02, 2022.09.03])
n = 6
t = table(n:n,[`time,`value],[DATETIME,DOUBLE])
t[`time] = [2022.09.01T00:00:00, 2022.09.01T12:00:00, 2022.09.02T00:00:00, 2022.09.02T12:00:00, 2022.09.03T00:00:00, 2022.09.03T12:00:00]
t[`value] = 1..6
pt = db.createPartitionedTable(t, `pt, `time).append!(t)
If you want to query data of 2022.09.01
, you can directly
compare the partitioning column with 2022.09.01
. Even if the
object and the partitioning column are of different types, it still can trigger
partition pruning and only the partition 2022.09.01
will be
scanned. We can use sqlDS
to view how the distributed query is
split into subqueries.
sqlDS(<select * from pt where time == 2022.09.01>)
To query data of 2022.09.01T00:00:00.000
, you can also directly
compare the partitioning column with it. Even if the partitioning column and the
timestamp are of different data types, partition pruning will still be triggered
and only the partition 2022.09.01
will be scanned.
sqlDS(<select * from pt where time == 2022.09.01T00:00:00.000>)
4.2 Applying an Explicit Type Conversion Before Filtering
It is common to use filters with explicit type conversion in such form:
convert_func(col) <operator> constant
where:
-
convert_func
is a conversion function (e.g.,date
,month
); col
is the partitioning column;<operator>
is an operator (<, <=, =, ==, >, >=, between, in);constant
is the object in the comparison.
It will trigger partition pruning in the following scenarios.
4.2.1 Using Comparison Operators
When the <operator>
is a comparison operator (<,
<=, =, ==, >, >=), partition pruning can be triggered when the
precision of the data returned by convert_func
is less
than or equal to the precision of constant
and the
partitioning column.
Here’s an example of a distributed table partitioned by date value.
dbName = "dfs://time_comparison"
if(existsDatabase(dbName))
dropDatabase(dbName)
db = database(dbName, VALUE, [2022.09.01,2022.09.30,2022.10.01,2022.10.02,2022.10.31,2022.11.01,2022.11.02,2022.12.31,2023.01.01])
n = 10
t = table(n:n,[`time,`value],[DATE,DOUBLE])
t[`time] = take([2022.09.01,2022.09.30,2022.10.01,2022.10.02,2022.10.31,2022.11.01,2022.11.02,2022.12.31,2023.01.01],n)
t[`value] = rand(100.0,n)
pt = db.createPartitionedTable(t, `pt, `time).append!(t)
MONTH values can only be compared with MONTH values, for which you cannot
directly compare the partitioning column of DATE type with the object of
MONTH type. Instead, use the function month
to convert the
partitioning column to MONTH type for comparison:
select * from pt where month(time) > 2022.10M
/*
time value
------------------------
2022.11.01 15.00570112
2022.11.02 66.54577804
2022.12.31 48.09958597
2023.01.01 50.57664175
*/
In this scenario, both operands are of MONTH type, data returned by the
function month
has lower precision than the partitioning
column, partitions < 2022.11M
are pruned.
To query data of a specific timestamp, you can use function
timestamp
to convert the partitioning column to
TIMESTAMP type:
select * from pt where timestamp(time) = 2022.09.30T00:00:00.000
/*
time value
------------------------
2022.09.30 19.33508650
*/
The data returned by function timestamp
has higher precision
than the partitioning column. Therefore, partition pruning cannot be
triggered in this scenario. Filtering on the partitioning columns directly
is more efficient for such cases.
4.2.2 Using between
When the <operator>
is between
,
partition pruning can be triggered only if the data returned by
convert_func
and constant
are of the same data type.
For example, to query data of consecutive months, you cannot directly compare the partitioning column with the object of MONTH type.
select * from pt where time between month(2022.10M:2022.11M)
// between(X, Y). Temporal data comparison should have the same data type.
In such cases, you can use function month
to convert the
partitioning column to MONTH type for comparison with the MONTH object.
select * from pt where month(time) between month(2022.10M:2022.11M)
/*
time value
------------------------
2022.10.01 24.45175347
2022.10.02 86.05015869
2022.10.31 78.28769609
2022.11.01 15.00570112
2022.11.02 66.54577804
*/
In this scenario, both operands of between
are of MONTH type
with lower precision than the partitioning column of DATE type. Partitions
2022.10.01, 2022.10.02, 2022.10.31, 2022.11.01,
2022.11.02
are scanned and others are pruned.
4.2.3 Using in
When the <operator>
is in
, partition
pruning can be triggered when:
- The data returned by
convert_func
andconstant
are of the same data type; - The number of continuous segments in
constant
is less than 16.
Here is an example of how to define the number of continuous segments:
For the list [2020.01.02, 2020.01.03, 2020.01.04, 2020.01.06,
2020.01.07, 2020.01.12]
, there are 3 continuous segments:
2020.01.02–2020.01.04
,
2020.01.06–2020.01.07
, and
2020.01.12
.
To query data of specified months, you can use the function
month
to convert the partitioning column to MONTH type
for comparison with the object of MONTH type.
select * from pt where month(time) in [2022.09M, 2022.11M]
/*
time value
------------------------
2022.09.01 51.37807030
2022.09.01 50.86722047
2022.09.30 13.91816022
2022.11.01 76.58300183
2022.11.02 74.23354792
*/
In this scenario, both operands are of MONTH type with lower precision than
the partitioning column of DATE type. The number of continuous segments in
[2022.09M, 2022.11M]
is less than 16. Partitions
2022.09.01, 2022.09.30, 2022.11.01, 2022.11.02
are
scanned and others are pruned.
To query data of consecutive timestamps, you can use function
timestamp
to convert the partitioning column to
TIMESTAMP type for comparison with the object of TIMESTAMP type.
select * from pt where timestamp(time) in timestamp(2022.10.31..2022.11.01)
/*
time value
------------------------
2022.10.31 78.28769609
2022.11.01 15.00570112
*/
In this scenario, partition pruning cannot be triggered because the precision
of the data returned by the function timestamp
is higher
than the partitioning column.
Summary
The comparison rules for temporal types can be divided into two types: comparisons that involve partition pruning and those that do not. Typically, comparisons of temporal vectors, in-memory queries, and distributed queries filtering on non-partitioning columns do not involve partition pruning. When querying distributed tables with filter conditions on partitioning columns, comparison of temporal values will trigger partition pruning. In practical scenarios, it is recommended to use the WHERE clause that supports partition pruning to improve the query performance. It is especially effective when querying large tables, significantly reducing query time.