Title: | Tune Random Forest of the 'ranger' Package |
---|---|
Description: | Tuning random forest with one line. The package is mainly based on the packages 'ranger' and 'mlrMBO'. |
Authors: | Philipp Probst [aut, cre], Simon Klau [ctb] |
Maintainer: | Philipp Probst <[email protected]> |
License: | GPL-3 |
Version: | 0.7 |
Built: | 2024-11-18 06:25:59 UTC |
Source: | https://github.com/philipppro/tuneranger |
estimateTimeTuneRanger
estimateTimeTuneRanger( task, iters = 100, num.threads = 1, num.trees = 1000, respect.unordered.factors = "order" )
task |
The mlr task created by makeClassifTask or makeRegrTask. |
iters |
Number of iterations. |
num.threads |
Number of threads. Default is 1. |
num.trees |
Number of trees. |
respect.unordered.factors |
Handling of unordered factor covariates. One of 'ignore', 'order' and 'partition'. 'order' is the default. |
Estimated time for the tuning procedure.
estimateTimeTuneRanger(iris.task)
Restarts the tuning process if an error occurred.
restartTuneRanger(save.file.path = "optpath.RData", task, measure = NULL)
save.file.path |
File name in the current working directory to which interim results were saved by |
task |
The mlr task created by |
measure |
Performance measure that was already used in the original |
A list with elements
recommended.pars |
Recommended hyperparameters. |
results |
A data.frame with all evaluated hyperparameters and performance and time results for each run. |
No model is built.
## Not run:
library(tuneRanger)
library(mlr)

# iris is a bit nonsense here
# A mlr task has to be created in order to use the package
# the already existing iris task is used here
estimateTimeTuneRanger(iris.task)
# temporary file name to save results
path = tempfile()
res = tuneRanger(iris.task, measure = list(multiclass.brier), num.trees = 1000,
  num.threads = 8, iters = 70, save.file.path = path)

# Mean of best 5 % of the results
res

# Restart after failing in one of the iterations:
res = restartTuneRanger(save.file.path = path, iris.task, measure = list(multiclass.brier))
## End(Not run)
Similar to tuneRF in randomForest, but for ranger.
tuneMtryFast( formula = NULL, data = NULL, dependent.variable.name = NULL, mtryStart = floor(sqrt(ncol(data) - 1)), num.treesTry = 50, stepFactor = 2, improve = 0.05, trace = TRUE, plot = TRUE, doBest = FALSE, ... )
formula |
Object of class formula or character describing the model to fit. Interaction terms supported only for numerical variables. |
data |
Training data of class data.frame, matrix, dgCMatrix (Matrix) or gwaa.data (GenABEL). |
dependent.variable.name |
Name of dependent variable, needed if no formula given. For survival forests this is the time variable. |
mtryStart |
starting value of mtry; default is the same as in |
num.treesTry |
number of trees used at the tuning step |
stepFactor |
at each iteration, mtry is inflated (or deflated) by this value |
improve |
the (relative) improvement in OOB error must be by this much for the search to continue |
trace |
whether to print the progress of the search |
plot |
whether to plot the OOB error as function of mtry |
doBest |
whether to run a forest using the optimal mtry found |
... |
options to be given to |
Provides fast tuning for the mtry hyperparameter.
Starting with the default value of mtry, search for the optimal value of mtry (with respect to the out-of-bag error estimate) for ranger.
If doBest=FALSE (default), it returns a matrix whose first column contains the mtry values searched, and the second column the corresponding OOB error.
If doBest=TRUE, it returns the ranger object produced with the optimal mtry.
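The step search described above can be sketched in a few lines of base R. This is a minimal illustration in the spirit of tuneRF, not the package's source code: the names `search_mtry`, `oob_error` and `toy_error` are hypothetical, and `oob_error` stands for any function that maps an mtry value to an out-of-bag error estimate.

```r
# Sketch of a tuneRF-style step search over mtry (illustrative, not the package source).
# oob_error: hypothetical function mapping an mtry value to an OOB error estimate.
search_mtry <- function(oob_error, mtry_start, max_mtry,
                        step_factor = 2, improve = 0.05) {
  stopifnot(step_factor > 1, improve >= 0)
  errors <- numeric(0)
  errors[as.character(mtry_start)] <- oob_error(mtry_start)
  best_error <- errors[[1]]
  for (direction in c("left", "right")) {
    mtry_cur <- mtry_start
    repeat {
      if (best_error <= 0) break  # nothing left to improve
      # Deflate (left) or inflate (right) mtry by the step factor
      mtry_next <- if (direction == "left") {
        max(1, ceiling(mtry_cur / step_factor))
      } else {
        min(max_mtry, floor(mtry_cur * step_factor))
      }
      if (mtry_next == mtry_cur) break  # hit a boundary, no new value to try
      error_next <- oob_error(mtry_next)
      errors[as.character(mtry_next)] <- error_next
      # Relative improvement over the best error seen so far
      gain <- 1 - error_next / best_error
      if (gain <= improve) break  # improvement too small, stop this direction
      best_error <- error_next
      mtry_cur <- mtry_next
    }
  }
  mtry <- as.numeric(names(errors))
  out <- cbind(mtry = mtry, OOBError = unname(errors))
  out[order(mtry), , drop = FALSE]
}

# Toy error curve with its minimum near mtry = 6 (an assumption for illustration)
toy_error <- function(m) (m - 6)^2 / 100 + 0.1
res <- search_mtry(toy_error, mtry_start = 2, max_mtry = 10)
best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
```

With ranger installed, `oob_error` could for instance be `function(m) ranger::ranger(Species ~ ., data = iris, mtry = m, num.trees = 50)$prediction.error`, since ranger reports the OOB prediction error in that element.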
library(tuneRanger)

data(iris)
res <- tuneMtryFast(Species ~ ., data = iris, stepFactor = 1.5)
Automatic tuning of random forests of the ranger package with one line of code.
tuneRanger( task, measure = NULL, iters = 70, iters.warmup = 30, time.budget = NULL, num.threads = NULL, num.trees = 1000, parameters = list(replace = FALSE, respect.unordered.factors = "order"), tune.parameters = c("mtry", "min.node.size", "sample.fraction"), save.file.path = NULL, build.final.model = TRUE, show.info = getOption("mlrMBO.show.info", TRUE) )
task |
The mlr task created by |
measure |
Performance measure to evaluate/optimize. Default is brier score for classification and mse for regression. Can be changed to accuracy, AUC or logarithmic loss by setting it to |
iters |
Number of iterations. Default is 70. |
iters.warmup |
Number of iterations for the warmup. Default is 30. |
time.budget |
Running time budget in seconds. Note that the actual mbo run can take more time since the condition is checked after each iteration. The default NULL means: There is no time budget. |
num.threads |
Number of threads. Default is number of CPUs available. |
num.trees |
Number of trees. |
parameters |
Optional list of fixed named parameters that should be passed to |
tune.parameters |
Optional character vector of parameters that should be tuned. Default is mtry, min.node.size and sample.fraction. Additionally replace and respect.unordered.factors can be included in the tuning process. |
save.file.path |
File to which interim results are saved (e.g. "optpath.RData") in the current working directory.
Default is NULL, which does not save the results. If a file was specified and one iteration fails the algorithm can be
started again with |
build.final.model |
[ |
show.info |
Verbose mlrMBO output on console? Default is |
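To make the default classification measure mentioned under `measure` more concrete, here is a minimal base R sketch of a multiclass Brier score as it is commonly defined: the mean over observations of the squared distance between the predicted class probabilities and the one-hot encoded true class. The function name `multiclass_brier` and the toy data are hypothetical, and the sketch assumes (rather than guarantees) that this matches the definition used by mlr's `multiclass.brier`.

```r
# Illustrative multiclass Brier score (assumed definition, see lead-in).
# prob:  matrix of predicted probabilities, one named column per class
# truth: factor of observed classes
multiclass_brier <- function(prob, truth) {
  stopifnot(is.matrix(prob), is.factor(truth), nrow(prob) == length(truth))
  # One-hot encode the observed classes in the column order of prob
  onehot <- outer(as.character(truth), colnames(prob), "==") * 1
  mean(rowSums((prob - onehot)^2))
}

# Two toy observations with three classes
prob <- matrix(c(0.8, 0.1, 0.1,
                 0.2, 0.5, 0.3),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("a", "b", "c")))
truth <- factor(c("a", "c"), levels = c("a", "b", "c"))
score <- multiclass_brier(prob, truth)
```

Lower values are better: a perfect probabilistic prediction gives 0, and the second observation above contributes most of the score because the true class "c" only received probability 0.3.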
Model-based optimization is used as the tuning strategy, and the three parameters min.node.size, sample.fraction and mtry are tuned at once. Out-of-bag predictions are used for evaluation, which makes it much faster than other packages and tuning strategies that use, for example, 5-fold cross-validation. Classification as well as regression is supported. The measure that should be optimized can be chosen from the list of measures in mlr (see the mlr tutorial).
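The idea behind out-of-bag evaluation can be illustrated without fitting a forest. Each tree is grown on a subsample of the data, so the observations left out of that subsample can be predicted by the tree without having been seen in training. The base R sketch below only counts how often each observation is out-of-bag; the function name `oob_counts` and its arguments are hypothetical and do not reflect how ranger implements sampling internally.

```r
# Illustrative only: count how often each observation is out-of-bag (OOB).
# n:               number of observations
# num_trees:       number of trees (one subsample per tree)
# sample_fraction: fraction of observations drawn for each tree
# replace:         sample with (TRUE) or without (FALSE) replacement
oob_counts <- function(n, num_trees, sample_fraction, replace = FALSE) {
  size <- round(sample_fraction * n)
  counts <- integer(n)
  for (tree in seq_len(num_trees)) {
    in_bag <- sample.int(n, size = size, replace = replace)
    oob <- setdiff(seq_len(n), in_bag)
    counts[oob] <- counts[oob] + 1L
  }
  counts
}

set.seed(1)
counts <- oob_counts(n = 100, num_trees = 500, sample_fraction = 0.7)
# Without replacement, every tree leaves out exactly 30 of the 100 observations
total_oob <- sum(counts)
```

Because every observation ends up out-of-bag for a sizeable share of the trees, a single forest fit already yields a prediction for each observation from trees that never saw it, which is why no separate cross-validation loop is needed.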
A list with elements
recommended.pars |
Recommended hyperparameters. |
results |
A data.frame with all evaluated hyperparameters and performance and time results for each run. |
model |
The final model if |
estimateTimeTuneRanger for time estimation and restartTuneRanger for continuing the algorithm if there was an error.
## Not run:
library(tuneRanger)
library(mlr)

# A mlr task has to be created in order to use the package
data(iris)
iris.task = makeClassifTask(data = iris, target = "Species")

# Estimate runtime
estimateTimeTuneRanger(iris.task)
# Tuning
res = tuneRanger(iris.task, measure = list(multiclass.brier), num.trees = 1000,
  num.threads = 2, iters = 70, save.file.path = NULL)

# Mean of best 5 % of the results
res
# Model with the new tuned hyperparameters
res$model
# Prediction
predict(res$model, newdata = iris[1:10,])
## End(Not run)