# pca {#pca}

## Syntax {#syntax}

`pca(ds, [colNames], [k], [normalize], [maxIter], [svdSolver], [randomState])`

## Arguments {#arguments}

**ds** is one or more data sources. It is usually generated by function [sqlDS](../s/sqlDS.md).

**colNames** is a string vector indicating column names. The default value is the names of all columns in *ds*.

**k** is a positive integer indicating the number of principal components. The default value is the number of columns in *ds*.

**normalize** is a Boolean value indicating whether to normalize each column. The default value is false.

**maxIter** is a positive integer indicating the number of iterations when *svdSolver*="randomized". If it is not specified, *maxIter*=7 if *k*&lt;0.1\*cols and *maxIter*=4 otherwise. Here cols means the number of columns in *ds*.

**svdSolver** is a string. It can take the value of "full", "randomized" or "auto". *svdSolver*="full" is suitable for situations where *k* is close to size\(*colNames*\); *svdSolver*="randomized" is suitable for situations where *k* is much smaller than size\(*colNames*\). The default value is "auto", which means the system automatically determines whether to use "full" or "randomized".

**randomState** is an integer indicating the random seed. It takes effect only when *svdSolver*="randomized". The default value is `int(time(now()))`.

## Details {#details}

Perform principal component analysis \(PCA\) on the specified columns of the data source. Return a dictionary with the following keys:

-   components: the matrix of principal component coefficients with size\(*colNames*\) rows and *k* columns.

-   explainedVarianceRatio: a vector of length *k* with the fraction of the total variance explained by each of the first *k* principal components.

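-   singularValues: a vector of length *k* with the singular values of the centered data that correspond to the first *k* principal components. The variance explained by a component is proportional to the square of its singular value.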


## Examples {#examples}

```
x = [7,1,1,0,5,2]
y = [0.7, 0.9, 0.01, 0.8, 0.09, 0.23]
t = table(x, y)
ds = sqlDS(<select * from t>);

pca(ds);

/* output:
components->
#0        #1
--------- ---------
-0.999883 0.015306
-0.015306 -0.999883
explainedVarianceRatio->[0.980301,0.019699]
singularValues->[6.110802,0.866243]
*/
```
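The singularValues and explainedVarianceRatio entries in the example output can be cross-checked outside DolphinDB. The following is a minimal pure-Python sketch, not DolphinDB code. It assumes the two columns are centered but not scaled (matching the default *normalize*=false) and uses the closed-form eigenvalues of a 2x2 symmetric matrix. The helper name `pca_2col_summary` is hypothetical and exists only for this illustration.

```python
import math


def pca_2col_summary(x, y):
    """Singular values and explained-variance ratios for two columns.

    Hypothetical helper for illustration only; not part of DolphinDB.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]  # centered first column
    dy = [v - mean_y for v in y]  # centered second column

    # Entries of the 2x2 scatter matrix Xc^T Xc of the centered data.
    a = sum(v * v for v in dx)
    b = sum(u * v for u, v in zip(dx, dy))
    c = sum(v * v for v in dy)

    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mid = (a + c) / 2
    half_gap = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half_gap, max(mid - half_gap, 0.0)]

    # Singular values are square roots of the scatter-matrix eigenvalues;
    # each ratio is that component's share of the total sum of squares.
    singular_values = [math.sqrt(ev) for ev in eigenvalues]
    ratios = [ev / (a + c) for ev in eigenvalues]
    return singular_values, ratios


x = [7, 1, 1, 0, 5, 2]
y = [0.7, 0.9, 0.01, 0.8, 0.09, 0.23]
singular_values, ratios = pca_2col_summary(x, y)
print(singular_values, ratios)
```

Under these assumptions the printed values should agree with the documented singularValues and explainedVarianceRatio up to rounding. The components matrix is left out of the sketch because each principal direction is determined only up to sign.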

