logisticRegression
Syntax
logisticRegression(ds, yColName, xColNames, [intercept=true], [initTheta],
[tolerance=1e-3], [maxIter=500], [regularizationCoeff=1.0],
[numClasses=2])
Details
Fits a logistic regression model of a target variable (yColName) on one or
more predictor variables (xColNames). The resulting model can be passed to
the predict function to classify new observations.
Parameters
ds is the data source used for training. It can be generated with the function sqlDS.
yColName is a string specifying the name of the column in ds that holds the dependent variable (the category).
xColNames is a string scalar/vector specifying the name(s) of the column(s) in ds that hold the independent variables.
intercept (optional) is a boolean scalar specifying whether the regression uses an intercept. The default value is true, which means that a column of 1s is added to the independent variables.
initTheta (optional) is a vector indicating the initial values of the parameters when the iterations begin. The default value is a vector of zeroes with the length of xColNames.size()+intercept.
tolerance (optional) is a numeric scalar. The iterations stop when the difference between the log-likelihood values of two consecutive iterations is smaller than tolerance. The default value is 0.001.
maxIter (optional) is a positive integer indicating the maximum number of iterations. The iterations will stop if the number of iterations reaches maxIter. The default value is 500.
regularizationCoeff (optional) is a positive number indicating the coefficient of the regularization term. The default value is 1.0.
numClasses (optional) is an integer no less than 2, indicating the number of categories for the dependent variable. The default value is 2.
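As an illustration of how the parameters above line up positionally, the following sketch passes every optional argument explicitly. It assumes a table t with columns y, x0 and x1, as built in the Examples section; the specific values chosen for tolerance, maxIter and regularizationCoeff are arbitrary and only for demonstration.

```dolphindb
// Assumption: t is an in-memory table with columns y (INT), x0 and x1 (DOUBLE).
ds = sqlDS(<select * from t>)

// Two predictors plus one intercept term, so initTheta needs 3 elements.
initTheta = take(0.0, 3)

// Arguments in syntax order:
// ds, yColName, xColNames, intercept, initTheta, tolerance, maxIter, regularizationCoeff, numClasses
model = logisticRegression(ds, `y, `x0`x1, true, initTheta, 1e-4, 1000, 0.5, 2)
```

Setting intercept to false in this call would require initTheta to have 2 elements instead of 3, following the default-length rule stated for initTheta.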
Returns
A dictionary with the following keys:
- modelName: A string indicating the name of the model, i.e., "Logistic Regression".
- tolerance: A floating-point number indicating the threshold for stopping the iteration.
- xColNames: A string scalar or vector indicating the independent variables.
- intercept: A boolean value indicating whether the regression includes an intercept term.
- numClasses: A positive integer indicating the number of categories for the dependent variable.
- coefficients: The estimated model parameters for each category of the dependent variable:
  - For binary problems (numClasses = 2), returns a vector indicating the coefficients for the positive class.
  - For multi-class problems (numClasses > 2), returns a matrix where the i-th row indicates the coefficients for the i-th class.
- iterations: The number of iterations for each category of the dependent variable:
  - For binary problems (numClasses = 2), returns an integer indicating the number of iterations for the positive class.
  - For multi-class problems (numClasses > 2), returns a vector where the i-th element indicates the number of iterations for the i-th class.
- logLikelihood: The final log-likelihood value for each category of the dependent variable after iteration:
  - For binary problems (numClasses = 2), returns a floating-point value indicating the log-likelihood for the positive class.
  - For multi-class problems (numClasses > 2), returns a vector where the i-th element indicates the log-likelihood for the i-th class.
- predict: A callable function for making predictions on new data.
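Because the result is a dictionary, individual entries can be read by key. The sketch below assumes model is the dictionary returned by a binary (numClasses = 2) call to logisticRegression, such as the one in the Examples section; the variable names on the left-hand side are illustrative only.

```dolphindb
// Assumption: model was returned by logisticRegression with numClasses = 2.
coef = model[`coefficients]      // estimated parameters for the positive class
nIter = model[`iterations]       // number of iterations performed
ll = model[`logLikelihood]       // final log-likelihood value

print(coef)
```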
Examples
Fit a logistic regression model with simulated data:
t = table(100:0, `y`x0`x1, [INT,DOUBLE,DOUBLE])
y = take(0, 50)
x0 = norm(-1.0, 1.0, 50)
x1 = norm(-1.0, 1.0, 50)
insert into t values (y, x0, x1)
y = take(1, 50)
x0 = norm(1.0, 1.0, 50)
x1 = norm(1.0, 1.0, 50)
insert into t values (y, x0, x1)
model = logisticRegression(sqlDS(<select * from t>), `y, `x0`x1);
/* output (values vary between runs because the data is randomly generated):
modelName->Logistic Regression
tolerance->0.001
xColNames->["x0","x1"]
intercept->1
numClasses->2
coefficients->[1.279258204733693,1.58441731843656,-0.107855546076936]
iterations->[5]
logLikelihood->[20.774272153640865]
predict->logisticRegressionPredict
*/
Use the fitted model to make predictions:
predict(model, t);
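The call above scores the training table itself. As a rough sketch of scoring unseen observations, the snippet below builds a small table of new predictor values and passes it to predict. It assumes that predict only needs the columns listed in xColNames (x0 and x1 here) and does not require the y column; the table name newData is hypothetical.

```dolphindb
// Assumption: model is the fitted model from the example above.
// newData is a hypothetical table holding only the predictor columns.
newData = table(norm(1.0, 1.0, 5) as x0, norm(1.0, 1.0, 5) as x1)

// One prediction per row of newData.
pred = predict(model, newData)
print(pred)
```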
Save the fitted model to disk:
saveModel(model, "C:/DolphinDB/data/logisticModel.txt");
Load a saved model:
loadModel("C:/DolphinDB/data/logisticModel.txt");
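To reuse a saved model, the result of loadModel can be assigned to a variable and passed to predict in the same way as a freshly fitted model. This is a minimal sketch that assumes the file written by saveModel above exists and that table t is still available; the variable name loaded is illustrative.

```dolphindb
// Assumption: the model file was previously written by saveModel at this path.
loaded = loadModel("C:/DolphinDB/data/logisticModel.txt")

// Use the reloaded model exactly like the original one.
predict(loaded, t)
```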
