XGBoost Plugin
The DolphinDB XGBoost plugin has different branches, such as release200 and release130. Each branch corresponds to a DolphinDB server version. Please make sure you are in the correct branch of the plugin documentation.
The DolphinDB XGBoost plugin provides methods for training models and making predictions on DolphinDB tables, and for saving and loading trained models.
Install the Plugin
You can download the precompiled binaries libPluginXgboost.dll or libPluginXgboost.so to install the DolphinDB XGBoost plugin directly. To compile the XGBoost plugin manually, follow the instructions below:
Compile on Linux
- Build a static library
The following steps show how to compile a static library:
(1) Download XGBoost from GitHub:
git clone --recursive https://github.com/dmlc/xgboost
(2) Use CMake to build a static library:
cd xgboost
mkdir build
cd build
cmake .. -DBUILD_STATIC_LIB=ON
make
The build produces three static libraries: xgboost/lib/xgboost.a, xgboost/build/rabit/librabit.a, and xgboost/build/dmlc-core/libdmlc.a.
Create a directory named xgboost_static under the plugin project directory, and copy the header files and the built libraries into it:
cd path_to/DolphinDBPlugin/xgboost
mkdir -p xgboost_static/lib
cp path_to/xgboost/lib/xgboost.a xgboost_static/lib
cp path_to/xgboost/build/rabit/librabit.a xgboost_static/lib
cp path_to/xgboost/build/dmlc-core/libdmlc.a xgboost_static/lib
cp -r path_to/xgboost/include xgboost_static
Note: replace each "path_to" in the above script with the actual parent directory of the DolphinDBPlugin or xgboost repository on your machine.
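Before compiling the plugin, it can help to confirm that the copy step produced the expected layout. The following is a minimal sketch, not part of the plugin: the function name check_xgboost_static is hypothetical, and the file names are taken from the copy commands above.

```shell
# Sketch (hypothetical helper): verify the layout of xgboost_static.
# $1: path to the xgboost_static directory
check_xgboost_static() {
    static_dir="${1:-xgboost_static}"
    missing=0
    # Static libraries copied from the XGBoost build tree.
    for lib in xgboost.a librabit.a libdmlc.a; do
        if [ ! -f "$static_dir/lib/$lib" ]; then
            echo "missing: $static_dir/lib/$lib" >&2
            missing=1
        fi
    done
    # Header files copied with `cp -r`.
    if [ ! -d "$static_dir/include" ]; then
        echo "missing: $static_dir/include" >&2
        missing=1
    fi
    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}
```

After defining the function, running check_xgboost_static xgboost_static from the plugin directory prints each missing item to standard error and returns a non-zero status if anything is absent.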
- Compile XGBoost plugin
Compile with CMakeLists:
mkdir build
cd build
cmake .. -DLIBDOLPHINDB=path_to_libDolphinDB
make
Note: replace "path_to_libDolphinDB" in the above script with the path of the DolphinDB server directory. If libDolphinDB.so is already on the g++ library search path, LIBDOLPHINDB can be omitted.
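The choice between passing LIBDOLPHINDB and omitting it can be sketched as a small helper that prints the configure command. This is an illustrative sketch only: the function name plugin_cmake_command is hypothetical, and it assumes that the directory passed to LIBDOLPHINDB is the one containing libDolphinDB.so.

```shell
# Sketch (hypothetical helper): print the cmake configure command.
# $1 (optional): directory assumed to contain libDolphinDB.so
plugin_cmake_command() {
    server_dir="${1:-}"
    if [ -z "$server_dir" ]; then
        # No directory given: rely on the compiler's library search path.
        echo "cmake .."
    elif [ -f "$server_dir/libDolphinDB.so" ]; then
        echo "cmake .. -DLIBDOLPHINDB=$server_dir"
    else
        echo "libDolphinDB.so not found in $server_dir" >&2
        return 1
    fi
}
```

Calling the function with a directory that does not contain libDolphinDB.so fails early with a message, which is usually easier to diagnose than a linker error at the end of the build.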
Compile on Windows
Note: Currently, the XGBoost plugin for Windows supports only DolphinDB server (JIT). Compiling with the 8.1.0-posix version of MinGW-w64 is recommended.
- Build a static library
(1) Download XGBoost 1.2.0 from GitHub:
git clone -b release_1.2.0 https://github.com/dmlc/xgboost.git
git -C xgboost submodule update --init --recursive
(2) Build dynamic and static libraries with CMake:
cd xgboost
mkdir build
cd build
cmake .. -G "MinGW Makefiles"
make
The libraries are stored in xgboost/lib/xgboost.dll, xgboost/build/rabit/librabit.a, and xgboost/build/dmlc-core/libdmlc.a.
Note that the xgboost_static directory provided by DolphinDB contains the static libraries for Linux. On Windows, first delete the files in that directory, then copy the header files and the libraries built above into it:
cd path_to/DolphinDBPlugin/xgboost
mkdir -p xgboost_static/lib
cp path_to/xgboost/lib/xgboost.dll xgboost_static/lib
cp path_to/xgboost/build/rabit/librabit.a xgboost_static/lib
cp path_to/xgboost/build/dmlc-core/libdmlc.a xgboost_static/lib
cp -r path_to/xgboost/include xgboost_static
Note: replace each "path_to" in the above script with the actual parent directory of the DolphinDBPlugin or xgboost repository on your machine.
- Compile XGBoost Plugin
Compile with CMakeLists:
mkdir build
cd build
cmake .. -G "MinGW Makefiles" -DLIBDOLPHINDB=path_to_libDolphinDB
make
Note: replace "path_to_libDolphinDB" in the above script with the server directory. If libDolphinDB.dll is already on the library search path, LIBDOLPHINDB can be omitted.
After compiling the plugin, copy the following dependencies into the same directory as libPluginXgboost.dll.
Assuming libPluginXgboost.dll is in the build directory:
cp path_to/xgboost/lib/xgboost.dll ./
cp path_to/mingw64/bin/libgomp-1.dll ./
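To confirm that both runtime dependencies ended up next to the plugin, a check along the following lines can be run from a POSIX shell such as MSYS2 or Git Bash. It is a sketch only: the function name missing_runtime_dlls is hypothetical, and the DLL names are the two copied above.

```shell
# Sketch (hypothetical helper): list runtime DLLs not yet next to the plugin.
# $1: directory that contains libPluginXgboost.dll (defaults to the current one)
missing_runtime_dlls() {
    plugin_dir="${1:-.}"
    for dll in xgboost.dll libgomp-1.dll; do
        # Print the name of each DLL that is absent from the directory.
        [ -f "$plugin_dir/$dll" ] || echo "$dll"
    done
}
```

An empty result means both files are present in the directory.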
Interfaces
xgboost::train
Syntax
xgboost::train(Y, X, [params], [numBoostRound=10], [xgbModel])
Parameters
- Y: a vector indicating the dependent variables.
- X: a matrix or table indicating the independent variables.
- params: a dictionary representing the parameters used for XGBoost training. See XGBoost Docs.
- numBoostRound: a positive integer indicating the number of boosting iterations.
- xgbModel: an XGBoost model from which to continue training. You can obtain a model with xgboost::train, or load an existing model with xgboost::loadModel.
Details
Train a model on the given table or matrix. Return the trained model, which can be used for further training or prediction.
xgboost::predict
Syntax
xgboost::predict(model, X, [outputMargin=false], [ntreeLimit=0], [predLeaf=false], [predContribs=false], [training=false])
Parameters
- model: an XGBoost model used for prediction. You can obtain a model with xgboost::train, or load an existing model with xgboost::loadModel.
- X: a matrix or table for prediction.
- outputMargin: a Boolean value indicating whether to output the raw untransformed margin value.
- ntreeLimit: a non-negative integer limiting the number of trees used in prediction. The default value is 0, indicating all trees are used.
- predLeaf: a Boolean value. When this option is on, the output will be a matrix of (nsample, ntrees) with each record indicating the predicted leaf index of each sample in each tree.
- predContribs: a Boolean value. When this option is on, the output will be a matrix of size (nsample, nfeats + 1) with each record indicating the feature contributions (SHAP values) for that prediction. The sum of all feature contributions is equal to the raw untransformed margin value of the prediction.
- training: a Boolean value indicating whether the prediction value is used for training.
Details
Predict with the given matrix or table.
xgboost::saveModel
Syntax
xgboost::saveModel(model, path)
Parameters
- model: an XGBoost model to be saved.
- path: a string indicating where the model is saved.
Details
Save the trained model to disk.
xgboost::loadModel
Syntax
xgboost::loadModel(path)
Parameters
- path: a string indicating where the model is saved.
Details
Load the model from disk.
Examples
loadPlugin("path_to/PluginXgboost.txt")
// Create a table for training
t = table(1..5 as c1, 1..5 * 2 as c2, 1..5 * 3 as c3)
label = 1 2 9 28 65
// Set params
params = {objective: "reg:linear", max_depth: 5, eta: 0.1, min_child_weight: 1, subsample: 0.5, colsample_bytree: 1, num_parallel_tree: 1}
// Train the model
model = xgboost::train(label, t, params, 100)
// Predict with the model
xgboost::predict(model, t)
// Save the model (WORK_DIR is assumed to be a string variable holding an existing directory path)
xgboost::saveModel(model, WORK_DIR + "/xgboost001.model")
// Load the model
model = xgboost::loadModel(WORK_DIR + "/xgboost001.model")
// Continue training on the model
model = xgboost::train(label, t, params, 100, model)