ridgeBasic
Syntax
ridgeBasic(Y, X, [mode=0], [alpha=1.0], [intercept=true], [normalize=false],
[maxIter=1000], [tolerance=0.0001], [solver='svd'], [swColName])
Details
Perform Ridge regression.
Minimize the following objective function:
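In standard notation, and assuming the usual Ridge formulation, the objective can be written as below. The symbol w (the coefficient vector) is notation introduced here for illustration and is not part of the function's signature.

$$
\min_{w} \; \lVert Y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
$$

Here Y and X are the arguments of the same names, and α corresponds to the alpha parameter.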
Parameters
Y is a numeric vector indicating the dependent variable.
X is a numeric vector/tuple/matrix/table indicating the independent variable(s).
- When X is a vector/tuple, it must be of the same length as Y.
- When X is a matrix/table, the number of rows must be the same as the length of Y.
mode (optional) is an integer indicating the contents of the output. It can be:
- 0 (default): a vector of the coefficient estimates.
- 1: a table with coefficient estimates, standard errors, t-statistics, and p-values.
- 2: a dictionary with the following keys: ANOVA, RegressionStat, Coefficient, and Residual.
ANOVA: the analysis of variance table, where n is the number of observations and p is the number of regressors.
| Source of Variance | DF (degree of freedom) | SS (sum of square) | MS (mean of square) | F (F-score) | Significance |
|---|---|---|---|---|---|
| Regression | p | sum of squares regression, SSR | regression mean square, MSR=SSR/p | MSR/MSE | p-value |
| Residual | n-p-1 | sum of squares error, SSE | mean square error, MSE=SSE/(n-p-1) | | |
| Total | n-1 | sum of squares total, SST | | | |
RegressionStat: the regression statistics.
| Item | Description |
|---|---|
| R2 | R-squared |
| AdjustedR2 | The adjusted R-squared corrected based on the degrees of freedom by comparing the sample size to the number of terms in the regression model. |
| StdError | The residual standard error/deviation corrected based on the degrees of freedom. |
| Observations | The sample size. |
Coefficient: the coefficient estimates.
| Item | Description |
|---|---|
| factor | Independent variables |
| beta | Estimated regression coefficients |
| StdError | Standard error of the regression coefficients |
| tstat | t statistic, indicating the significance of the regression coefficients |
| pvalue | p-value of the t statistic |
Residual: the difference between each predicted value and the actual value.
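The ANOVA and RegressionStat entries are linked by simple arithmetic. The following Python sketch illustrates those relationships using the sums of squares reported in Example 2 of this page. The function name `regression_summary` is hypothetical, and the formulas for R2, AdjustedR2, and StdError are assumptions that happen to agree with the Example 2 output; they are not taken from the ridgeBasic implementation.

```python
import math

def regression_summary(ssr, sse, sst, n, p):
    """Derive ANOVA and RegressionStat entries from the sums of squares.

    n is the number of observations, p the number of regressors.
    """
    df_regression = p
    df_residual = n - p - 1
    msr = ssr / df_regression          # regression mean square
    mse = sse / df_residual            # mean square error
    r2 = ssr / sst                     # assumption: consistent with Example 2
    adjusted_r2 = 1 - (1 - r2) * (n - 1) / df_residual
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "R2": r2,
        "AdjustedR2": adjusted_r2,
        "StdError": math.sqrt(mse),    # residual standard error
    }

# Sums of squares copied from the ANOVA table in Example 2 (n=5, p=2).
stats = regression_summary(
    ssr=6.439542296188023,
    sse=1.0085821452899069,
    sst=9.432000000000016,
    n=5,
    p=2,
)
print(stats)
```

With these inputs the computed values agree with the MS, F, R2, AdjustedR2, and StdError figures shown in Example 2.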
alpha (optional) is a floating-point number representing the constant that multiplies the L2-norm penalty term, i.e. the regularization strength. The default value is 1.0.
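To see how the penalty constant shrinks the estimate, consider the simplest case: one regressor and no intercept. The Python sketch below assumes the standard Ridge objective (sum of squared errors plus alpha times the squared coefficient), for which the minimizer has a closed form. The function name `ridge_slope` and the data are made up for illustration.

```python
def ridge_slope(x, y, alpha):
    """Closed-form Ridge estimate for one regressor without intercept.

    Minimizes sum((y_i - w * x_i)^2) + alpha * w^2 over w.
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # exactly y = 2 * x

no_penalty = ridge_slope(x, y, alpha=0.0)     # ordinary least squares slope
with_penalty = ridge_slope(x, y, alpha=14.0)  # slope shrunk toward zero
print(no_penalty, with_penalty)  # 2.0 1.0
```

A larger alpha pulls the coefficient further toward zero; alpha=0 reduces to ordinary least squares.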
intercept (optional) is a Boolean value indicating whether to include the intercept in the regression. The default value is true.
normalize (optional) is a Boolean value. If true, the regressors will be normalized before regression by subtracting the mean and dividing by the L2-norm. If intercept=false, this parameter will be ignored. The default value is false.
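As an illustration of the normalization described above, the following Python sketch centers one regressor column and scales it to unit L2-norm. It assumes the norm is taken over the centered column; the source text does not say whether the norm is computed before or after centering. The function name `normalize_column` is hypothetical.

```python
import math

def normalize_column(column):
    """Subtract the mean, then divide by the L2-norm of the centered values."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(v * v for v in centered))  # assumption: norm after centering
    return [v / norm for v in centered]

normalized = normalize_column([1.0, 2.0, 3.0])
print(normalized)
```

After this transformation the column has zero mean and unit L2-norm, so regressors on different scales are penalized comparably.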
maxIter (optional) is a positive integer indicating the maximum number of iterations. The default value is 1000.
tolerance (optional) is a floating-point number. The iterations stop when the improvement in the objective function value is smaller than tolerance. The default value is 0.0001.
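The interaction between maxIter and tolerance can be sketched with a generic iterative minimizer. The Python code below is not the solver used by ridgeBasic; it is a plain gradient-descent loop on a one-dimensional Ridge objective, with a hypothetical step size of 0.01, shown only to illustrate the stopping rule.

```python
def minimize(objective, update, w0, max_iter=1000, tolerance=1e-4):
    """Iterate until the objective improves by less than tolerance,
    or until max_iter iterations have been performed."""
    w = w0
    previous = objective(w)
    for iteration in range(1, max_iter + 1):
        w = update(w)
        current = objective(w)
        if previous - current < tolerance:
            return w, iteration      # converged early
        previous = current
    return w, max_iter               # iteration budget exhausted

# Objective for x=[1,2,3], y=[2,4,6], alpha=14:
# sum((y - w*x)^2) + 14*w^2 = 28*w^2 - 56*w + 56, minimized at w = 1.
def objective(w):
    return 28.0 * w * w - 56.0 * w + 56.0

def gradient_step(w):
    return w - 0.01 * (56.0 * w - 56.0)  # hypothetical fixed step size

w_hat, iterations = minimize(objective, gradient_step, w0=0.0)
print(w_hat, iterations)
```

A smaller tolerance yields a more precise estimate at the cost of more iterations, while maxIter caps the work when convergence is slow.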
solver (optional) is a string indicating the solver to use in the computation. It can be either 'svd' (default) or 'cholesky'. If ds is a list of data sources, solver must be 'cholesky'.
swColName (optional) is a STRING indicating a column name of ds. The specified column is used as the sample weight. If it is not specified, the sample weight is treated as 1.
Returns
A vector, table, or dictionary, depending on the value of mode.
Examples
Example 1: Set mode=0 to output a coefficient estimate vector.
Y = [225.72, -76.20, 63.09, 139.45, -65.55]
X0 = [2.24, -0.85, 0.40, 1.45, -0.98]
X1 = [0.98, 0.31, 1.76, 0.14, 1.87]
coefficients = ridgeBasic(Y, [X0, X1], mode=0, alpha=0.5, intercept=true)
coefficients
// Output: [7.940468476954727, 88.20426761349431, 9.380634942436586]
Example 2: Set mode=2 to output a dictionary containing ANOVA (analysis of variance), RegressionStat, Coefficient, and Residual.
Y = [1.5, 2.3, 4.7, 3.2, 5.1]
X = matrix([1.1, 2.2, 3.1, 2.8, 4.0], [0.5, 0.8, 1.2, 1.0, 1.5])
result = ridgeBasic(Y, X, mode=2, alpha=0.8, solver='svd')
View analysis of variance.
result[`ANOVA]
| Breakdown | DF | SS | MS | F | Significance |
|---|---|---|---|---|---|
| Regression | 2 | 6.439542296188023 | 3.2197711480940114 | 6.3847474657971865 | 0.1354142446483848 |
| Residual | 2 | 1.0085821452899069 | 0.5042910726449534 | | |
| Total | 4 | 9.432000000000016 | | | |
View regression statistics.
result[`RegressionStat]
| item | statistics |
|---|---|
| R2 | 0.6827334919622574 |
| AdjustedR2 | 0.3654669839245148 |
| StdError | 0.7101345454524469 |
| Observations | 5 |
View regression coefficients.
result[`Coefficient]
| factor | beta | stdError | tstat | pvalue |
|---|---|---|---|---|
| intercept | 0.22075506589464267 | 1.084447930898624 | 0.20356446778566378 | 0.8575265882132079 |
| beta0 | 1.0193512860448009 | 2.662502504483437 | 0.3828545829828503 | 0.7386872800445148 |
| beta1 | 0.44815753894708277 | 7.540425982186433 | 0.05943398158218302 | 0.9580108927931805 |
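The tstat and pvalue columns above can be checked by hand. The Python sketch below assumes that tstat is beta divided by stdError and that pvalue is the two-sided p-value of a Student's t distribution with n-p-1 = 5-2-1 = 2 degrees of freedom. For exactly 2 degrees of freedom the t distribution has a closed-form CDF, which keeps the example free of external libraries. The function names are hypothetical, and this is a consistency check against the table rather than a description of the ridgeBasic implementation.

```python
import math

def t_statistic(beta, std_error):
    """t statistic of a coefficient estimate."""
    return beta / std_error

def two_sided_p_value_df2(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom.

    For df=2 the CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t^2)),
    so 2 * (1 - F(|t|)) simplifies to 1 - |t| / sqrt(2 + t^2).
    """
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)

# (factor, beta, stdError) copied from the Coefficient table above.
rows = [
    ("intercept", 0.22075506589464267, 1.084447930898624),
    ("beta0", 1.0193512860448009, 2.662502504483437),
    ("beta1", 0.44815753894708277, 7.540425982186433),
]

results = {}
for name, beta, std_error in rows:
    t = t_statistic(beta, std_error)
    results[name] = (t, two_sided_p_value_df2(t))
    print(name, results[name])
```

The values computed this way agree with the tstat and pvalue columns of the table to within rounding.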
View residuals.
result[`Residual]
// Output: [-0.06612025001746513, -0.5218539263508712, 0.7814669006299755, -0.32309620576716735, 0.12960348150553003]
