pca

swordfish.function.pca()

Conduct principal component analysis for the specified columns of the data source. Return a dictionary with the following keys:

  • components: the matrix of principal component coefficients with size(colNames) rows and k columns.

  • explainedVarianceRatio: a vector of length k with the percentage of the total variance explained by each of the first k principal components.

  • singularValues: a vector of length k with the principal component variances (eigenvalues of the covariance matrix).
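
To make the three keys concrete, here is a minimal pure-Python sketch that computes the same kinds of quantities by power iteration on the sample covariance matrix. It is only an illustration and not how swordfish computes PCA: pca_sketch is a hypothetical helper, the n - 1 divisor and the fixed start vector are assumptions, and eigenvalues are stored under singularValues only because the description above defines that key this way.

```python
import math


def pca_sketch(rows, k):
    """Toy PCA: power iteration with deflation on the sample covariance matrix.

    Hypothetical helper for illustration; not part of swordfish.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # d x d sample covariance matrix (the n - 1 divisor is an assumption).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_variance = sum(cov[i][i] for i in range(d))

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]

    vectors, variances = [], []
    for _ in range(k):
        # Fixed, asymmetric start vector keeps the sketch deterministic.
        v = [float(j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(500):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: variance captured along the unit vector v.
        variance = sum(a * b for a, b in zip(v, matvec(cov, v)))
        vectors.append(v)
        variances.append(variance)
        # Deflation: remove the component just found before the next pass.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    return {
        # d rows and k columns, one column per principal component.
        "components": [[vec[i] for vec in vectors] for i in range(d)],
        "explainedVarianceRatio": [var / total_variance for var in variances],
        "singularValues": variances,
    }


rows = [[-2.0, -1.0], [-2.0, 1.0], [2.0, -1.0], [2.0, 1.0]]
result = pca_sketch(rows, k=2)
```

In this toy data the two columns are uncorrelated and the first one carries 80% of the total variance, so result["explainedVarianceRatio"] comes out close to [0.8, 0.2] and the first column of result["components"] points along the first input column.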

Parameters:
  • X (Constant) – One or more data sources, usually generated by the function sqlDS.

  • colNames (Constant, optional) – A string vector indicating column names. The default value is the names of all columns in X.

  • k (Constant, optional) – A positive integer indicating the number of principal components. The default value is the number of columns in X.

  • normalize (Constant, optional) – A Boolean value indicating whether to normalize each column. The default value is false.

  • maxIter (Constant, optional) – A positive integer indicating the number of iterations when svdSolver="randomized". If it is not specified, maxIter=7 if k<0.1*cols and maxIter=7 otherwise. Here cols means the number of columns in X.

  • svdSolver (Constant, optional) – A string. It can take the value of "full", "randomized" or "auto". svdSolver="full" is suitable for situations where k is close to size(colNames); svdSolver="randomized" is suitable for situations where k is much smaller than size(colNames). The default value is "auto", which means the system automatically determines whether to use "full" or "randomized".

  • randomState (Constant, optional) – An integer indicating the random seed. It only takes effect when svdSolver="randomized". The default value is int(time(now())).