pca
- swordfish.function.pca()
Conduct principal component analysis for the specified columns of the data source. Return a dictionary with the following keys:
components: the matrix of principal component coefficients with size(colNames) rows and k columns.
explainedVarianceRatio: a vector of length k with the percentage of the total variance explained by each of the first k principal components.
singularValues: a vector of length k with the principal component variances (eigenvalues of the covariance matrix).
- Parameters:
X (Constant) – One or more data sources. It is usually generated by the function sqlDS.
colNames (Constant, optional) – A string vector indicating column names. The default value is the names of all columns in the data source X.
k (Constant, optional) – A positive integer indicating the number of principal components. The default value is the number of columns in the data source X.
normalize (Constant, optional) – A Boolean value indicating whether to normalize each column. The default value is false.
maxIter (Constant, optional) – A positive integer indicating the number of iterations when svdSolver="randomized". If it is not specified, maxIter=7 if k<0.1*cols and maxIter=4 otherwise. Here cols means the number of columns in the data source X.
svdSolver (Constant, optional) – A string. It can take the value of "full", "randomized" or "auto". svdSolver="full" is suitable for situations where k is close to size(colNames); svdSolver="randomized" is suitable for situations where k is much smaller than size(colNames). The default value is "auto", which means the system automatically determines whether to use "full" or "randomized".
randomState (Constant, optional) – An integer indicating the random seed. It only takes effect when svdSolver="randomized". The default value is int(time(now())).
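
To make the shape and meaning of the returned dictionary concrete, here is a minimal NumPy sketch of a full-SVD PCA that produces the same three keys as described above. It is not the swordfish implementation: the function name `pca_sketch`, the in-memory array input, and the use of the sample standard deviation for `normalize` are all assumptions made for illustration. Following the description on this page, `singularValues` is taken to hold the principal component variances.

```python
import numpy as np

def pca_sketch(data, k=None, normalize=False):
    """Illustrative PCA via full SVD (hypothetical helper, not swordfish code)."""
    X = np.asarray(data, dtype=float)
    n_rows, n_cols = X.shape
    if k is None:
        k = n_cols                      # default: keep every component
    X = X - X.mean(axis=0)              # center each column
    if normalize:
        X = X / X.std(axis=0, ddof=1)   # assumption: scale by sample std
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    variances = s ** 2 / (n_rows - 1)   # eigenvalues of the covariance matrix
    return {
        "components": Vt[:k].T,         # n_cols rows, k columns
        "explainedVarianceRatio": variances[:k] / variances.sum(),
        "singularValues": variances[:k],
    }

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))        # 200 rows, 4 columns of synthetic data
result = pca_sketch(data, k=2)
print(result["components"].shape)       # (4, 2)
```

With k equal to the number of columns, the entries of `explainedVarianceRatio` sum to 1; with a smaller k they sum to the share of variance retained by the first k components.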