cut
Syntax
cut(X, size|cutPositions)
Details
This function divides X based on the specified size or cutPositions and returns a tuple.
- When X is a scalar, size can only be specified as 1.
- When X is a vector:
  - If size is specified, cut divides X into a list of scalars (size=1) or vectors (size>1) of length size. The last vector may be shorter when the length of X is not a multiple of size.
  - If cutPositions is specified, cut divides X into a list of vectors at the specified positions.
- When X is a matrix (table):
  - If size is specified, cut divides X into several matrices (tables) with size columns (rows).
  - If cutPositions is specified, cut divides X into several matrices (tables) at the specified positions.
Refer to function flatten for the reverse operation.
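The size-based behavior for a vector, and its reversal by flatten, can be modeled outside DolphinDB. The following is a minimal Python sketch, not DolphinDB code: `cut_by_size` and this `flatten` are hypothetical helpers written for illustration, and Python lists stand in for DolphinDB vectors and tuples (so size=1 yields one-element lists rather than scalars).

```python
from itertools import chain

def cut_by_size(xs, size):
    """Model of size-based splitting: consecutive chunks of `size` elements.

    Hypothetical helper for illustration; not part of DolphinDB.
    """
    if size < 1 or size > len(xs):
        raise ValueError("size must be a positive integer no greater than len(xs)")
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def flatten(chunks):
    """Model of the reverse operation: concatenate the chunks in order."""
    return list(chain.from_iterable(chunks))

a = list(range(1, 11))        # mirrors a = 1..10
chunks = cut_by_size(a, 3)    # the last chunk holds the leftover element
restored = flatten(chunks)
print(chunks)                 # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
print(restored == a)          # True
```

The round trip holds for any valid size because the chunks are contiguous and keep the original order.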
DolphinDB's cut differs from pandas.cut. DolphinDB's cut splits data structures and does not support parameters such as right, labels, or retbins that are available in pandas.cut, whereas pandas.cut bins and categorizes numerical data.
| | DolphinDB cut | pandas.cut |
|---|---|---|
| Purpose | Physically split data structures into segments | Logically categorize numerical data into intervals |
| Parameter X / x | Scalar/vector/matrix/table | 1D array |
| Parameter size / bins (int) | Number of elements per segment | Number of intervals to create |
| Parameter cutPositions / bins (sequence of scalars) | Index positions for splitting (0-based) | Boundary values for numerical intervals |
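To make the contrast in the table concrete, here is a small Python sketch that uses only the standard library. It is an illustrative model, not the implementation of either function: `split_by_size` and `bin_labels` are hypothetical names, and the right-closed intervals are an assumption chosen to mirror the default right=True behavior of pandas.cut.

```python
from bisect import bisect_left

def split_by_size(xs, size):
    """Position-based splitting (models the DolphinDB column of the table)."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]

def bin_labels(xs, edges):
    """Value-based binning (models the pandas.cut column of the table).

    Each value is mapped to the right-closed interval (edges[k], edges[k+1]]
    that contains it, or to None when it falls outside every interval.
    """
    labels = []
    for v in xs:
        k = bisect_left(edges, v) - 1
        inside = 0 <= k < len(edges) - 1
        labels.append((edges[k], edges[k + 1]) if inside else None)
    return labels

values = [7, 1, 9, 4, 10, 2]
print(split_by_size(values, 2))        # [[7, 1], [9, 4], [10, 2]]
print(bin_labels(values, [0, 5, 10]))  # [(5, 10), (0, 5), (5, 10), (0, 5), (5, 10), (0, 5)]
```

Splitting depends only on where an element sits and returns the elements themselves in segments. Binning depends only on the element's value and returns one label per element.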
Parameters
X is a scalar/vector/matrix/table.
size is a positive integer that must be no greater than the size of X.
cutPositions is a vector of increasing 0-based index positions, each specifying the starting position of one vector in the result. Elements of X located before the first position are not included in the result.
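The role of cutPositions as a list of segment start positions can be sketched in Python. This is a model for illustration only: `cut_by_positions` is a hypothetical helper, the strict-increase check is an assumption about what "increasing" means, and positions beyond the end of the input are not validated.

```python
def cut_by_positions(xs, positions):
    """Model of cutPositions-based splitting.

    Each position is the 0-based start of one segment; a segment runs up to
    the next position, and the last segment runs to the end of `xs`.
    Hypothetical helper for illustration; not part of DolphinDB.
    """
    # Assumption: "increasing" is read as strictly increasing.
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    ends = list(positions[1:]) + [len(xs)]
    return [xs[start:end] for start, end in zip(positions, ends)]

a = list(range(1, 11))                 # mirrors a = 1..10
print(cut_by_positions(a, [0, 2, 7]))  # [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
print(cut_by_positions(a, [2, 7]))     # [[3, 4, 5, 6, 7], [8, 9, 10]]
```

In the second call the elements at indices 0 and 1 belong to no segment, which is why they are absent from the result.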
Examples
a=1..10;
cut(a,2);
// output: ([1,2],[3,4],[5,6],[7,8],[9,10])
cut(a,3);
// output: ([1,2,3],[4,5,6],[7,8,9],[10])
cut(a,9);
// output: ([1,2,3,4,5,6,7,8,9],[10])
b = cut(a,2);
b;
// output: ([1,2],[3,4],[5,6],[7,8],[9,10])
flatten b;
// output: (1,2,3,4,5,6,7,8,9,10)
cut(a, 0 2 7);
// output: ([1,2],[3,4,5,6,7],[8,9,10])
cut(a, 2 7);
// output: ([3,4,5,6,7],[8,9,10])
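The cut function can be a convenient tool in time-series data analysis. In the example below, we use the cut function to calculate an aggregate measure over each period that starts at one event and ends just before the next. Because income is generated with rand, the values shown are from one sample run.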
incomes=table(2016.07.31 - 10..1 as date, rand(100,10) as income);
incomes;
| date | income |
|---|---|
| 2016.07.21 | 78 |
| 2016.07.22 | 61 |
| 2016.07.23 | 79 |
| 2016.07.24 | 15 |
| 2016.07.25 | 78 |
| 2016.07.26 | 22 |
| 2016.07.27 | 30 |
| 2016.07.28 | 81 |
| 2016.07.29 | 17 |
| 2016.07.30 | 52 |
eventdates = [2016.07.22, 2016.07.25, 2016.07.29];
x = incomes.date.binsrch(eventdates);
x;
// output: [1,4,8]
incomes.date.cut(x);
// output: ([2016.07.22,2016.07.23,2016.07.24],[2016.07.25,2016.07.26,2016.07.27,2016.07.28],[2016.07.29,2016.07.30])
table(eventdates as startDate, each(last,incomes.date.cut(x)) as endDate, each(sum,incomes.income.cut(x)) as incomeSum);
| startDate | endDate | incomeSum |
|---|---|---|
| 2016.07.22 | 2016.07.24 | 155 |
| 2016.07.25 | 2016.07.28 | 211 |
| 2016.07.29 | 2016.07.30 | 69 |
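For readers who want to trace the arithmetic, the same three steps (binary search, split at the event positions, aggregate per segment) can be sketched in Python. This is an illustrative model, not DolphinDB code: the income values are copied from the sample table above instead of being generated randomly, `bisect_left` stands in for binsrch, and every event date is assumed to be present in the date column.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Fixed data copied from the sample tables (the original uses rand, so real runs differ).
dates = [date(2016, 7, 21) + timedelta(days=i) for i in range(10)]
income = [78, 61, 79, 15, 78, 22, 30, 81, 17, 52]
event_dates = [date(2016, 7, 22), date(2016, 7, 25), date(2016, 7, 29)]

# Step 1: binary search for each event date in the sorted date column.
starts = [bisect_left(dates, d) for d in event_dates]

# Step 2: each segment runs from one event to just before the next one;
# the last segment runs to the end of the data.
ends = starts[1:] + [len(dates)]

# Step 3: take the last date and the income sum of every segment.
summary = [
    (dates[s], dates[e - 1], sum(income[s:e]))
    for s, e in zip(starts, ends)
]
for start_date, end_date, income_sum in summary:
    print(start_date, end_date, income_sum)
```

With this data the start positions are 1, 4, and 8, and the income sums are 155, 211, and 69, matching the result table.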
