Title: | RF Variable Importance for Arbitrary Measures |
---|---|
Description: | Computes the random forest variable importance (VIMP) for the conditional inference random forest (cforest) of the 'party' package. Includes a function (varImp) that computes the VIMP for arbitrary measures from the 'measures' package. For calculating the VIMP regarding the measures accuracy and AUC two extra functions exist (varImpACC and varImpAUC). |
Authors: | Philipp Probst [aut, cre], Silke Janitza [ctb] |
Maintainer: | Philipp Probst <[email protected]> |
License: | GPL-3 |
Version: | 0.4 |
Built: | 2024-11-04 03:18:29 UTC |
Source: | https://github.com/philipppro/varimp |
Computes the variable importance for arbitrary measures from the 'measures' package.
varImp( object, mincriterion = 0, conditional = FALSE, threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional, measure = "multiclass.Brier", ... )
object |
An object as returned by cforest. |
mincriterion |
The value of the test statistic or 1 - p-value that must be exceeded in order to include a split in the computation of the importance. The default mincriterion = 0 guarantees that all splits are included. |
conditional |
A logical determining whether unconditional (FALSE) or conditional (TRUE) computation of the importance is performed. |
threshold |
The threshold value for (1 - p-value) of the association between the variable of interest and a covariate, which must be exceeded in order to include the covariate in the conditioning scheme for the variable of interest (only relevant if conditional = TRUE). A threshold value of zero includes all covariates. |
nperm |
The number of permutations performed. |
OOB |
A logical determining whether the importance is computed from the out-of-bag sample or the learning sample (the latter is not recommended). |
pre1.0_0 |
Prior to party version 1.0-0, the actual data values were permuted according to the original permutation importance suggested by Breiman (2001). Now the assignments to child nodes of splits in the variable of interest are permuted as described by Hapfelmeier et al. (2012), which allows for missing values in the explanatory variables and is more efficient with respect to memory consumption and computing time. This method does not apply to conditional variable importances. |
measure |
The name of the measure of the 'measures' package that should be used for the variable importance calculation. |
... |
Further arguments (such as the positive or negative class) that are needed by the measure. |
Many measures have not been tested for their usefulness as a basis for random forest variable importance. Use at your own risk.
Vector with computed permutation importance for each variable.
# multiclass case
data(iris)
iris.cf = cforest(Species ~ ., data = iris, control = cforest_unbiased(mtry = 2, ntree = 50))
set.seed(123)
vimp = varImp(object = iris.cf, measure = "multiclass.Brier")
vimp
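The general idea of a permutation importance for an arbitrary measure can be sketched in base R. This is only an illustration under simplifying assumptions, not the package's implementation: it uses a single `glm` fit instead of a forest, evaluates on the training data rather than per-tree out-of-bag samples, and treats the measure as a loss, so that importance is the increase in the measure after permuting a variable. The names `perm_importance` and `brier` are hypothetical helpers introduced for this example.

```r
# Hypothetical helper (not part of the varImp package): permutation
# importance of each feature for a user-supplied loss measure.
perm_importance <- function(predict_fun, data, target, measure_fun, nperm = 1) {
  features <- setdiff(names(data), target)
  baseline <- measure_fun(predict_fun(data), data[[target]])
  sapply(features, function(f) {
    diffs <- replicate(nperm, {
      permuted <- data
      permuted[[f]] <- sample(permuted[[f]])  # break the link to the target
      measure_fun(predict_fun(permuted), permuted[[target]]) - baseline
    })
    mean(diffs)  # average over the nperm permutations
  })
}

# Binary Brier score: mean squared difference between the predicted
# probability and the 0/1 outcome (smaller is better).
brier <- function(prob, truth) mean((prob - truth)^2)

# Simulated data: x1 is informative, x2 is pure noise.
set.seed(123)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- as.numeric(d$x1 + rnorm(200, sd = 0.5) > 0)

fit <- glm(y ~ x1 + x2, data = d, family = binomial)
predict_prob <- function(newdata) predict(fit, newdata = newdata, type = "response")

vimp <- perm_importance(predict_prob, d, "y", brier, nperm = 5)
vimp
```

In this sketch the informative variable `x1` should receive a clearly larger value than the noise variable `x2`. Increasing `nperm` averages over more permutations and reduces the random variation of the result.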
Computes the variable importance regarding the accuracy (ACC).
varImpACC( object, mincriterion = 0, conditional = FALSE, threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional )
object |
An object as returned by cforest. |
mincriterion |
The value of the test statistic or 1 - p-value that must be exceeded in order to include a split in the computation of the importance. The default mincriterion = 0 guarantees that all splits are included. |
conditional |
A logical determining whether unconditional (FALSE) or conditional (TRUE) computation of the importance is performed. |
threshold |
The threshold value for (1 - p-value) of the association between the variable of interest and a covariate, which must be exceeded in order to include the covariate in the conditioning scheme for the variable of interest (only relevant if conditional = TRUE). A threshold value of zero includes all covariates. |
nperm |
The number of permutations performed. |
OOB |
A logical determining whether the importance is computed from the out-of-bag sample or the learning sample (the latter is not recommended). |
pre1.0_0 |
Prior to party version 1.0-0, the actual data values were permuted according to the original permutation importance suggested by Breiman (2001). Now the assignments to child nodes of splits in the variable of interest are permuted as described by Hapfelmeier et al. (2012), which allows for missing values in the explanatory variables and is more efficient with respect to memory consumption and computing time. This method does not apply to conditional variable importances. |
Vector with computed permutation importance for each variable.
data(iris)
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
iris.cf = cforest(Species ~ ., data = iris2, control = cforest_unbiased(mtry = 2, ntree = 50))
set.seed(123)
a = varImpACC(object = iris.cf)
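Accuracy is a measure to be maximized, so an accuracy-based importance is naturally read as the drop in accuracy after a variable is permuted. The base R sketch below illustrates that sign convention on simulated data. It is an assumption-laden toy example rather than the code behind varImpACC: `accuracy` and `classify` are hypothetical helpers, and `classify` is a fixed threshold rule standing in for a fitted forest.

```r
# Hypothetical helper: proportion of correctly classified observations.
accuracy <- function(predicted, truth) mean(predicted == truth)

set.seed(123)
truth <- factor(rep(c("FALSE", "TRUE"), each = 50))
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 3))  # informative feature

# Hypothetical fixed decision rule standing in for a trained model.
classify <- function(x) factor(x > 1.5, levels = c(FALSE, TRUE))

acc_original <- accuracy(classify(x), truth)
acc_permuted <- accuracy(classify(sample(x)), truth)  # x shuffled across rows

# Importance as the loss in accuracy caused by the permutation.
acc_original - acc_permuted
```

A value near zero would indicate that permuting the variable barely changes the predictions, while a clearly positive value indicates an informative variable.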
Computes the variable importance regarding the AUC. Ties are not taken into account in the AUC definition, because the version with ties did not provide results as good as the version without ties in the paper of Janitza et al. (2013) (see References section).
varImpAUC( object, mincriterion = 0, conditional = FALSE, threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional )
object |
An object as returned by cforest. |
mincriterion |
The value of the test statistic or 1 - p-value that must be exceeded in order to include a split in the computation of the importance. The default mincriterion = 0 guarantees that all splits are included. |
conditional |
A logical determining whether unconditional (FALSE) or conditional (TRUE) computation of the importance is performed. |
threshold |
The threshold value for (1 - p-value) of the association between the variable of interest and a covariate, which must be exceeded in order to include the covariate in the conditioning scheme for the variable of interest (only relevant if conditional = TRUE). A threshold value of zero includes all covariates. |
nperm |
The number of permutations performed. |
OOB |
A logical determining whether the importance is computed from the out-of-bag sample or the learning sample (the latter is not recommended). |
pre1.0_0 |
Prior to party version 1.0-0, the actual data values were permuted according to the original permutation importance suggested by Breiman (2001). Now the assignments to child nodes of splits in the variable of interest are permuted as described by Hapfelmeier et al. (2012), which allows for missing values in the explanatory variables and is more efficient with respect to memory consumption and computing time. This method does not apply to conditional variable importances. |
To use the original AUC definition or a multiclass AUC, call the varImp function and specify the corresponding measure.
Vector with computed permutation importance for each variable.
Janitza, S., Strobl, C. and Boulesteix, A.-L. (2013). An AUC-based permutation variable importance measure for random forests. BMC Bioinformatics, 14:119. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-119
# binary case (versicolor vs. rest)
data(iris)
iris2 = iris
iris2$Species = factor(iris$Species == "versicolor")
iris.cf = cforest(Species ~ ., data = iris2, control = cforest_unbiased(mtry = 2, ntree = 50))
set.seed(123)
varImpAUC(object = iris.cf)
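The difference between an AUC that ignores ties and one that credits them can be made concrete with a pairwise computation. The sketch below is one plausible reading of the description above, stated as an assumption: a positive/negative pair counts only when the positive observation has the strictly larger score, whereas the conventional definition adds half a count for each tied pair. The function `auc_pairs` and its `count_ties` argument are hypothetical and not part of the package.

```r
# Hypothetical helper: AUC as the share of (positive, negative) pairs in
# which the positive observation has the higher score.
auc_pairs <- function(score, is_positive, count_ties = FALSE) {
  pos <- score[is_positive]
  neg <- score[!is_positive]
  n_greater <- sum(outer(pos, neg, ">"))   # strictly concordant pairs
  n_ties    <- sum(outer(pos, neg, "=="))  # tied pairs
  tie_credit <- if (count_ties) 0.5 * n_ties else 0
  (n_greater + tie_credit) / (length(pos) * length(neg))
}

score       <- c(0.9, 0.6, 0.6, 0.2)
is_positive <- c(TRUE, TRUE, FALSE, FALSE)

auc_no_ties   <- auc_pairs(score, is_positive)                     # 0.75
auc_with_ties <- auc_pairs(score, is_positive, count_ties = TRUE)  # 0.875
c(no_ties = auc_no_ties, with_ties = auc_with_ties)
```

Of the four pairs, three are strictly concordant and one is tied, which gives 3/4 without tie credit and 3.5/4 with it.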
Computes the variable importance for ranger models and for arbitrary measures from the 'measures' package.
varImpRanger(object, data, target, nperm = 1, measure = "multiclass.Brier")
object |
An object as returned by ranger. |
data |
Original data that was used for training the random forest. |
target |
Target variable as used in the trained model. |
nperm |
The number of permutations performed. |
measure |
The name of the measure of the 'measures' package that should be used for the variable importance calculation. |
Vector with computed permutation importance for each variable.
## Not run:
library(ranger)
iris.rg = ranger(Species ~ ., data = iris, keep.inbag = TRUE, probability = TRUE)
vimp.ranger = varImpRanger(object = iris.rg, data = iris, target = "Species")
vimp.ranger
## End(Not run)