cut

Syntax

cut(X, size|cutPositions)

Details

This function divides X based on the specified size or cutPositions and returns a tuple.

  • When X is a scalar, size can only be specified as 1.

  • When X is a vector:

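    • if size is specified, it divides X into a tuple of scalars (size=1) or vectors of length size (size>1). If the length of X is not a multiple of size, the last vector holds the remaining elements.

    • if cutPositions is specified, it divides X into a tuple of vectors, each starting at one of the specified positions. Elements before the first position are not included in the result.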

  • When X is a matrix (table):

    • if size is specified, it divides X into several matrices with size columns each, or into several tables with size rows each.

    • if cutPositions is specified, it divides X into several matrices (tables) at the specified positions.

Refer to function flatten for the reverse operation.

Note:
Despite the shared name, DolphinDB's cut differs from pandas.cut. DolphinDB's cut splits a data structure into segments and does not support pandas.cut parameters such as right, labels, or retbins. pandas.cut, by contrast, bins numerical data into categories.
                                                    | DolphinDB cut                                  | pandas.cut
Purpose                                             | Physically split data structures into segments | Logically categorize numerical data into intervals
Parameter X / x                                     | Scalar, vector, matrix, or table               | 1D array
Parameter size / bins (int)                         | Number of elements per segment                 | Number of intervals to create
Parameter cutPositions / bins (sequence of scalars) | Index positions for splitting (0-based)        | Boundary values for numerical intervals
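
The difference between splitting by position and binning by value can be illustrated with a short Python sketch. It uses only the standard library and calls neither DolphinDB nor pandas: the slices stand in for DolphinDB's cut, and bin_by_value is a hypothetical helper that only approximates the default right-closed intervals of pandas.cut.

```python
from bisect import bisect_left

def bin_by_value(xs, edges):
    """Label each value with the index i of the interval (edges[i], edges[i+1]].

    Hypothetical helper: values outside (edges[0], edges[-1]] get None.
    """
    labels = []
    for v in xs:
        i = bisect_left(edges, v) - 1
        labels.append(i if 0 <= i < len(edges) - 1 else None)
    return labels

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Splitting: 0, 2, 7 are read as 0-based index positions.
segments = [data[0:2], data[2:7], data[7:10]]

# Binning: 0, 2, 7 are read as value boundaries, giving (0, 2] and (2, 7].
labels = bin_by_value(data, [0, 2, 7])

print(segments)  # [[1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
print(labels)    # [0, 0, 1, 1, 1, 1, 1, None, None, None]
```

The same three numbers produce three segments covering every element in the first case, and two intervals that leave the values 8 to 10 unlabeled in the second.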

Parameters

X is a scalar/vector/matrix/table.

size is a positive integer that must be no greater than the size of X.

cutPositions is a vector of increasing 0-based positions, each specifying where one segment of the result starts.
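
As a rough model of how the two forms of the second argument act on a vector, the following Python sketch may help. It is an illustration only, not DolphinDB's implementation: cut_vector is a hypothetical name, Python lists stand in for DolphinDB vectors and tuples, and the validation simply mirrors the parameter descriptions above.

```python
def cut_vector(xs, size_or_positions):
    """Model of cut on a vector: an int is treated as size, a sequence as cutPositions."""
    n = len(xs)
    if isinstance(size_or_positions, int):
        size = size_or_positions
        if not 1 <= size <= n:
            raise ValueError("size must be a positive integer no greater than len(xs)")
        # Segments start at every multiple of size.
        starts = list(range(0, n, size))
    else:
        starts = list(size_or_positions)
        if any(a >= b for a, b in zip(starts, starts[1:])):
            raise ValueError("cutPositions must be increasing")
    # Each segment runs from its start to the next start (or to the end of xs).
    stops = starts[1:] + [n]
    return [xs[a:b] for a, b in zip(starts, stops)]

a = list(range(1, 11))
print(cut_vector(a, 3))       # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
print(cut_vector(a, [2, 7]))  # [[3, 4, 5, 6, 7], [8, 9, 10]]
```

In this model the last segment is shorter when size does not divide the length evenly, and elements before the first cut position are left out, which matches the outputs listed in the Examples section.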

Examples

a=1..10;

cut(a,2);
// output: ([1,2],[3,4],[5,6],[7,8],[9,10])

cut(a,3);
// output: ([1,2,3],[4,5,6],[7,8,9],[10])

cut(a,9);
// output: ([1,2,3,4,5,6,7,8,9],[10])

b = cut(a,2);
b;
// output: ([1,2],[3,4],[5,6],[7,8],[9,10])

flatten b;
// output: [1,2,3,4,5,6,7,8,9,10]

cut(a, 0 2 7);
// output: ([1,2],[3,4,5,6,7],[8,9,10])

cut(a, 2 7);
// output: ([3,4,5,6,7],[8,9,10])

The cut function is convenient in time-series analysis. In the example below, cut splits the date and income columns at a set of event dates, so that income can be summed over each period that starts at one event and ends just before the next.

incomes=table(2016.07.31 - 10..1 as date, rand(100,10) as income);
incomes;
// output (income is generated by rand, so values vary between runs):
date       income
2016.07.21 78
2016.07.22 61
2016.07.23 79
2016.07.24 15
2016.07.25 78
2016.07.26 22
2016.07.27 30
2016.07.28 81
2016.07.29 17
2016.07.30 52

eventdates = [2016.07.22, 2016.07.25, 2016.07.29];

x = incomes.date.binsrch(eventdates);
x;
// output: [1,4,8]

incomes.date.cut(x);
// output: ([2016.07.22,2016.07.23,2016.07.24],[2016.07.25,2016.07.26,2016.07.27,2016.07.28],[2016.07.29,2016.07.30])

table(eventdates as startDate, each(last,incomes.date.cut(x)) as endDate, each(sum,incomes.income.cut(x)) as incomeSum);
// output (for the income values shown above):
startDate  endDate    incomeSum
2016.07.22 2016.07.24 155
2016.07.25 2016.07.28 211
2016.07.29 2016.07.30 69