summary

swordfish.function.summary()

summary generates summary statistics for the input data. It returns an in-memory table containing the minimum, maximum, count, mean, standard deviation, and specified percentiles in ascending order.

If X is a table, summary only computes statistics for the numeric columns.
If X is a data source, it can contain only numeric columns; otherwise, an error occurs during computation.
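
The interpolation parameter (see Parameters) controls how a percentile is read off when it falls between two data points. The following is a minimal pure-Python sketch of the five method names as they are conventionally defined (for example in NumPy), not the library's own implementation. The helper `percentile_sketch` is hypothetical, and the tie-breaking rule for “nearest” is an assumption.

```python
import math


def percentile_sketch(values, q, interpolation="linear"):
    """Illustrative q-th percentile (0 <= q <= 100) of a list of numbers.

    Hypothetical helper: it mirrors the conventional meaning of the method
    names and is not guaranteed to match summary() in every edge case.
    """
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")

    # Fractional position of the percentile within the sorted data.
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo

    if interpolation == "linear":
        return data[lo] + (data[hi] - data[lo]) * frac
    if interpolation == "lower":
        return data[lo]
    if interpolation == "higher":
        return data[hi]
    if interpolation == "midpoint":
        return (data[lo] + data[hi]) / 2
    if interpolation == "nearest":
        # Assumption: an exact tie (frac == 0.5) resolves to the higher point.
        return data[lo] if frac < 0.5 else data[hi]
    raise ValueError(f"unknown interpolation method: {interpolation!r}")


sample = [1, 2, 3, 4]
for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(method, percentile_sketch(sample, 25, method))
```

For the sample above, the 25th percentile sits three quarters of the way between 1 and 2, so “linear” gives 1.75, “lower” gives 1, and “higher” gives 2.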

Parameters:

X (Constant) – An in-memory table, DFS table or data source generated from sqlDS. Note that data sources with SQL metacode containing table joins are currently not supported.
interpolation (Constant, optional) –
A string indicating the interpolation method for percentiles, by default DFLT.

It can be “linear” (default), “nearest”, “lower”, “higher”, or “midpoint”.
characteristic (Constant, optional) –
A string scalar or vector indicating the characteristics to compute, by default DFLT.

It can be “avg”, “std”, or both. By default, both characteristics are computed.
percentile (Constant, optional) – A DOUBLE vector of percentiles to compute. Each vector element falls between 0 and 100, by default DFLT.
precision (Constant, optional) – A DOUBLE scalar greater than 0, by default DFLT.
partitionSampling (Constant, optional) –
Can be a positive integer specifying the number of partitions to sample, or a float in the range (0, 1] specifying the sampling ratio, by default DFLT.

If not specified, statistics are computed on all partitions. When specifying partitionSampling, note the following:
- For a partitioned table:
  - At least one partition is always sampled. If the sampling ratio * total partitions < 1, one partition is sampled.
  - The number of sampled partitions is rounded down if the sampling ratio * total partitions is not an integer. For example, with a ratio of 0.26 and 10 total partitions, 2 partitions are sampled.
  - If partitionSampling (integer) > total partitions, all partitions are used.
- partitionSampling has no effect for non-partitioned tables.
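
The partition-sampling rules above can be sketched as a small calculation. This is only an illustration of the documented arithmetic. The helper `partitions_to_sample` is hypothetical and is not part of swordfish, and the real function may validate its input differently.

```python
import math


def partitions_to_sample(partition_sampling, total_partitions):
    """Number of partitions sampled for a partitioned table (illustrative).

    partition_sampling: a positive int (partition count) or a float in (0, 1]
    (sampling ratio), following the parameter description above.
    """
    if total_partitions < 1:
        raise ValueError("total_partitions must be at least 1")
    if isinstance(partition_sampling, bool):
        raise TypeError("partition_sampling must be an int or a float")

    if isinstance(partition_sampling, int):
        if partition_sampling < 1:
            raise ValueError("an integer partition_sampling must be positive")
        # A count larger than the table's partition count uses all partitions.
        return min(partition_sampling, total_partitions)

    if not 0 < partition_sampling <= 1:
        raise ValueError("a float partition_sampling must be in (0, 1]")
    # Round the product down, but always sample at least one partition.
    return max(1, math.floor(partition_sampling * total_partitions))


print(partitions_to_sample(0.26, 10))  # ratio * total = 2.6, rounded down
print(partitions_to_sample(0.05, 10))  # ratio * total < 1, so one partition
print(partitions_to_sample(15, 10))    # count exceeds total, so all partitions
```

The three calls correspond to the three bullets for partitioned tables and yield 2, 1, and 10 partitions respectively.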