summary

swordfish.function.summary()

summary generates summary statistics for the input data. It returns an in-memory table containing the minimum, maximum, count, mean, standard deviation, and specified percentiles in ascending order.

  • If X is a table, summary computes statistics only for the numeric columns.

  • If X is a data source, it must contain only numeric columns; otherwise, an error occurs during computation.
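The statistics listed above can be approximated in plain Python for intuition. The sketch below is hypothetical and is not the swordfish implementation: `summarize` and `percentile` are illustrative names, a table is modeled as a dict mapping column names to lists, the standard deviation is assumed to be the sample (n - 1) version, and the five interpolation modes follow the NumPy definitions, which summary is assumed (not confirmed) to share.

```python
import math
import statistics


def percentile(values, p, interpolation="linear"):
    """Return the p-th percentile (0 <= p <= 100) of a list of numbers."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0  # fractional rank within the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    if interpolation == "linear":
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)
    if interpolation == "lower":
        return xs[lo]
    if interpolation == "higher":
        return xs[hi]
    if interpolation == "midpoint":
        return (xs[lo] + xs[hi]) / 2
    if interpolation == "nearest":
        return xs[round(pos)]  # round() sends exact halves to the even index
    raise ValueError(f"unknown interpolation: {interpolation}")


def is_numeric_column(col):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in col)


def summarize(table, percentiles=(25, 50, 75), interpolation="linear",
              characteristic=("avg", "std")):
    """Hypothetical stand-in for summary on a dict of column name -> list."""
    rows = []
    for name, col in table.items():
        if not is_numeric_column(col):
            continue  # mirror the rule that only numeric columns are summarized
        row = {"name": name, "min": min(col), "max": max(col), "count": len(col)}
        # characteristic may be a single string or a sequence of strings.
        if "avg" in characteristic:
            row["avg"] = statistics.mean(col)
        if "std" in characteristic:
            row["std"] = statistics.stdev(col)  # sample std (n - 1): an assumption
        for p in sorted(percentiles):  # report percentiles in ascending order
            row[f"{p}%"] = percentile(col, p, interpolation)
        rows.append(row)
    return rows
```

For example, `summarize({"sym": ["a", "b", "c", "d"], "price": [10.0, 20.0, 30.0, 40.0]})` skips `sym` and reports 17.5, 25.0 and 32.5 as the 25th, 50th and 75th percentiles of `price` under linear interpolation.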

Parameters:
  • X (Constant) – An in-memory table, a DFS table, or a data source generated by sqlDS. Data sources whose SQL metacode contains table joins are currently not supported.

  • interpolation (Constant, optional) –

    A string indicating the interpolation method for percentiles, by default DFLT.

    Valid values are “linear” (default), “nearest”, “lower”, “higher”, and “midpoint”.

  • characteristic (Constant, optional) –

    A string scalar or vector indicating the characteristics to compute, by default DFLT.

    Valid values are “avg” and “std”; by default, both are computed.

  • percentile (Constant, optional) – A DOUBLE vector of percentiles to compute, each element between 0 and 100, by default DFLT.

  • precision (Constant, optional) – A DOUBLE scalar greater than 0, by default DFLT.

  • partitionSampling (Constant, optional) –

    Either a positive integer specifying the number of partitions to sample, or a float in (0, 1] specifying the sampling ratio, by default DFLT.

    If not specified, statistics are computed on all partitions. When partitionSampling is specified, the rules in the following note apply.

    Note

    • For a partitioned table:

      • At least one partition is always sampled: if sampling ratio × total partitions is less than 1, one partition is sampled.

      • If sampling ratio × total partitions is not an integer, the product is rounded down. For example, with a ratio of 0.26 and 10 partitions in total, 2 partitions are sampled.

      • If an integer partitionSampling exceeds the total number of partitions, all partitions are used.

    • partitionSampling has no effect on non-partitioned tables.
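The sampling rules in the note can be expressed as a small function. This is a hypothetical sketch, not swordfish code: `sampled_partition_count` is an illustrative name, and it only computes how many partitions would be read, not which ones are chosen.

```python
import math


def sampled_partition_count(total_partitions, partition_sampling=None, partitioned=True):
    """Hypothetical helper: how many partitions summary would read."""
    if not partitioned or partition_sampling is None:
        # Sampling is ignored for non-partitioned tables; no sampling means all partitions.
        return total_partitions
    if isinstance(partition_sampling, int) and not isinstance(partition_sampling, bool):
        if partition_sampling <= 0:
            raise ValueError("an integer partitionSampling must be positive")
        # An integer larger than the partition count means every partition is used.
        return min(partition_sampling, total_partitions)
    if not 0 < partition_sampling <= 1:
        raise ValueError("a float partitionSampling must be in (0, 1]")
    # Round the product down (the tiny tolerance guards against binary
    # floating-point error), but always sample at least one partition.
    return max(1, math.floor(partition_sampling * total_partitions + 1e-9))
```

The tolerance is a design choice of this sketch: a product such as ratio × total partitions can land just below an integer in binary floating point, and whether summary guards against this is not documented here.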