logisticRegression

Syntax

logisticRegression(ds, yColName, xColNames, [intercept=true], [initTheta], [tolerance=1e-3], [maxIter=500], [regularizationCoeff=1.0])

Arguments

ds is the data source containing the training data. It can be generated with the function sqlDS.

yColName is a string indicating the name of the column that holds the category labels (the dependent variable).

xColNames is a string scalar/vector indicating the names of the columns that hold the independent variables.

intercept is a Boolean scalar indicating whether the regression uses an intercept. The default value is true, which means that a column of 1s is added to the independent variables.

initTheta is a vector indicating the initial values of the parameters when the iterations begin. The default value is a vector of zeros of length xColNames.size()+intercept, that is, one element per independent variable plus one for the intercept when intercept is true.

tolerance is a numeric scalar. The iterations stop when the difference between the values of the log likelihood function in two consecutive iterations is smaller than tolerance. The default value is 0.001.

maxIter is a positive integer indicating the maximum number of iterations. The iterations will stop if the number of iterations reaches maxIter. The default value is 500.

regularizationCoeff is a positive number indicating the coefficient of the regularization term. The default value is 1.0.

intercept, initTheta, tolerance, maxIter and regularizationCoeff are optional.
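
As a minimal sketch of a call that sets every optional argument positionally, following the order in the Syntax section: the table t, the variable names and the argument values below are arbitrary choices for illustration, not recommended settings.

```dolphindb
// Illustrative two-class training data: 50 rows per class.
t = table(100:0, `y`x0`x1, [INT,DOUBLE,DOUBLE])
insert into t values (take(0, 50), norm(-1.0, 1.0, 50), norm(-1.0, 1.0, 50))
insert into t values (take(1, 50), norm(1.0, 1.0, 50), norm(1.0, 1.0, 50))

ds = sqlDS(<select * from t>)

// With intercept=true and two independent variables,
// initTheta needs xColNames.size()+intercept = 3 elements.
initTheta = [0.0, 0.0, 0.0]

// Arguments in order: ds, yColName, xColNames, intercept, initTheta,
// tolerance, maxIter, regularizationCoeff (values are arbitrary).
model = logisticRegression(ds, `y, `x0`x1, true, initTheta, 1e-4, 200, 0.5)
```

If intercept were set to false in this sketch, initTheta would need two elements instead of three.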

Details

Fit a logistic regression model. The result is a dictionary with the following keys: iterations, modelName, coefficients, tolerance, logLikelihood, xColNames and intercept. iterations is the number of iterations performed; modelName is "Logistic Regression"; coefficients is a vector of the parameter estimates; logLikelihood is the final value of the log likelihood function; tolerance, xColNames and intercept hold the values of the corresponding arguments used for the fit.

The fitted model can be used as an input for function predict.

Examples

Fit a logistic regression model with simulated data:

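// Empty table with a label column y and two feature columns
t = table(100:0, `y`x0`x1, [INT,DOUBLE,DOUBLE])
// Class 0: 50 rows with features drawn from a normal distribution with mean -1.0 and standard deviation 1.0
y = take(0, 50)
x0 = norm(-1.0, 1.0, 50)
x1 = norm(-1.0, 1.0, 50)
insert into t values (y, x0, x1)
// Class 1: 50 rows with features drawn from a normal distribution with mean 1.0 and standard deviation 1.0
y = take(1, 50)
x0 = norm(1.0, 1.0, 50)
x1 = norm(1.0, 1.0, 50)
insert into t values (y, x0, x1)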

model = logisticRegression(sqlDS(<select * from t>), `y, `x0`x1);

/* output:
modelName->Logistic Regression
logLikelihood->-23.269132
intercept->true
coefficients->[1.377971,1.914001,-0.305114]
xColNames->[x0,x1]
iterations->7
tolerance->0.001
*/
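
Because the result is a dictionary, individual entries can be read by key. The sketch below assumes that model is the dictionary fitted in the example above and that maxIter was left at its default of 500. Treating an iteration count equal to maxIter as a sign that the tolerance criterion was not met is an interpretation, not documented behavior.

```dolphindb
// Assumes `model` is the dictionary returned by logisticRegression above.
coef = model[`coefficients]      // vector of parameter estimates
ll = model[`logLikelihood]       // final value of the log likelihood function
iters = model[`iterations]       // number of iterations performed

print(coef)
print(ll)

// Assumption: hitting maxIter (default 500) suggests the fit stopped
// on the iteration limit rather than on the tolerance criterion.
if (iters >= 500) {
    print("Reached maxIter; the tolerance criterion may not have been met.")
}
```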

Use the fitted model for prediction:

predict(model, t);

Save the fitted model to disk:

saveModel(model, "C:/DolphinDB/data/logisticModel.txt");

Load a saved model:

loadModel("C:/DolphinDB/data/logisticModel.txt");
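
A loaded model is typically assigned to a variable so that it can be passed to predict. The sketch below assumes that the file was written by the saveModel call above and that the table t from the earlier example is still in scope. The variable name loaded is arbitrary.

```dolphindb
// Assumes the file below was created by the saveModel call above.
loaded = loadModel("C:/DolphinDB/data/logisticModel.txt")

// Use the loaded model exactly like the freshly fitted one.
predict(loaded, t)
```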