glm

Syntax

glm(ds, yColName, xColNames, [family], [link], [tolerance=1e-6], [maxIter=100])

Arguments

ds is the data source used for training. It can be generated with the function sqlDS.

yColName is a string indicating the dependent variable column.

xColNames is a STRING scalar/vector indicating the names of the independent variable columns.

family (optional) is a string indicating the type of distribution. It can be gaussian (default), poisson, gamma, inverseGaussian or binomial.

link (optional) is a string indicating the type of the link function.

Possible values of link and the dependent variable for each family:

family           link                                     default link     dependent variable
---------------  ---------------------------------------  ---------------  --------------------
gaussian         identity, inverse, log                   identity         DOUBLE type
poisson          log, sqrt, identity                      log              non-negative integer
gamma            inverse, identity, log                   inverse          y>=0
inverseGaussian  inverseOfSquare, inverse, identity, log  inverseOfSquare  y>=0
binomial         logit, probit                            logit            y=0,1
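
As an illustration of a non-default family and link from the table above, the sketch below fits a logistic model to simulated 0/1 data. The variable names, sample size and coefficient values are invented for this example.

```dolphindb
// Simulated data; all names and values here are illustrative assumptions.
n = 1000
x1 = rand(4.0, n) - 2                      // predictor, uniform on [-2, 2)
p = 1.0 / (1 + exp(-(0.5 + 1.5 * x1)))     // success probability under a logit link
y = iif(rand(1.0, n) < p, 1, 0)            // dependent variable takes the values 0 or 1
t = table(x1, y)

// binomial family with its default link stated explicitly
model = glm(sqlDS(<select * from t>), `y, `x1, `binomial, `logit)
model[`coefficients]
```

Because the data are random, the estimates vary from run to run.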

tolerance (optional) is a numeric scalar. Iteration stops when the difference between the log-likelihood values of two consecutive iterations is smaller than tolerance. The default value is 0.000001.

maxIter (optional) is a positive integer indicating the maximum number of iterations. The default value is 100.
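
A minimal sketch of overriding both stopping parameters, assuming they are passed positionally in the order given under Syntax. The data and the chosen values 1e-8 and 50 are illustrative only.

```dolphindb
// Simulated data; names and values are illustrative assumptions.
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)

// tighter tolerance (1e-8) and a lower iteration cap (50) than the defaults
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity, 1e-8, 50)
model[`iterations]      // number of iterations actually performed
```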

Details

Fit a generalized linear model. The result is a dictionary with the following keys: coefficients, link, tolerance, family, xColNames, modelName, residualDeviance, iterations and dispersion.

  • coefficients is a table with the coefficient estimate, standard error, t value and p value for each coefficient;
  • modelName is "Generalized Linear Model";
  • iterations is the number of iterations;
  • dispersion is the dispersion coefficient of the model.
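
The entries of the result can be read like those of any dictionary. The sketch below assumes the coefficients table uses the column names beta and pvalue, as in the sample output in the Examples section. The data are simulated for illustration.

```dolphindb
// Simulated data; names and values are illustrative assumptions.
x1 = rand(100.0, 100)
y = 3 + 2 * x1 + norm(0, 5, 100)
t = table(x1, y)
model = glm(sqlDS(<select * from t>), `y, `x1)    // family and link left at their defaults

coef = model[`coefficients]          // table of coefficient estimates
select beta, pvalue from coef        // keep only the estimate and p value columns
model[`dispersion]                   // dispersion coefficient of the fitted model
```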

Examples

Fit a generalized linear model with simulated data:

x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
b0 = 6
b1 = 1
b2 = -2
err = norm(0, 10, 100)
y = b0 + b1 * x1 + b2 * x2 + err
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity);
model;

/* output:
coefficients->

beta     stdError tstat      pvalue
-------- -------- ---------- --------
1.027483 0.032631 31.487543  0
-1.99913 0.03517  -56.842186 0
5.260677 2.513633 2.092858   0.038972

link->identity
tolerance->1.0E-6
family->gaussian
xColNames->["x1","x2"]
modelName->Generalized Linear Model
residualDeviance->8873.158697
iterations->5
dispersion->91.475863
*/

Use the fitted model for prediction:

predict(model, t);
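
One way to check the fit is to compare the predictions with the observed values. The sketch below assumes that predict returns one fitted value per row of the input table. The data and the rmse variable are illustrative.

```dolphindb
// Simulated data; names and values are illustrative assumptions.
x1 = rand(100.0, 100)
x2 = rand(100.0, 100)
y = 6 + x1 - 2 * x2 + norm(0, 10, 100)
t = table(x1, x2, y)
model = glm(sqlDS(<select * from t>), `y, `x1`x2, `gaussian, `identity)

yhat = predict(model, t)                    // fitted values for the training rows
rmse = sqrt(avg(square(t[`y] - yhat)))      // root mean squared error on the training data
rmse
```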

Save the fitted model to disk:

saveModel(model, "C:/DolphinDB/Data/GLMModel.txt");

Load a saved model:

loadModel("C:/DolphinDB/Data/GLMModel.txt");