Determine the optimal number of groups for a feature.
optimal_ngroups(
  pd,
  lambda,
  max_ngrps = 15,
  search_grid = seq_len(min(length(unique(pd$y)), max_ngrps))
)
pd | Data frame containing the partial dependence effect, such as the `get_pd` output that the example pipes into this function. |
---|---|
lambda | The complexity parameter in the penalized loss function (see the accompanying research paper or R vignette for details on this aspect). |
max_ngrps | Integer specifying the maximum number of groups that each feature's values/levels are allowed to be grouped into. |
search_grid | Integer vector containing the grid of values to evaluate for the number of groups. |
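
The default `search_grid` in the usage line can be traced by hand. The following is a minimal sketch, assuming a toy data frame `pd_toy` (hypothetical values and a hypothetical `x` column, standing in for a real partial dependence effect) that has the `y` column the default expression refers to:

```r
# Toy stand-in for a partial dependence data frame (hypothetical values).
pd_toy <- data.frame(x = 1:6, y = c(0.10, 0.10, 0.12, 0.15, 0.15, 0.20))
max_ngrps <- 15

# Same expression as the default argument: candidate group counts run from 1
# up to the number of distinct effect values, capped at max_ngrps.
search_grid <- seq_len(min(length(unique(pd_toy$y)), max_ngrps))
search_grid
# [1] 1 2 3 4
```

With only four distinct values in `y`, the cap of 15 is never reached, so the grid stops at 4.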
Integer specifying the optimal number of groups. When multiple groupings lead to the lowest loss, the smallest value is returned.
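
The tie-breaking rule can be illustrated in isolation. This sketch uses made-up loss values (`loss` is hypothetical, one entry per candidate in `search_grid`) and is not taken from the package internals, which may compute the result differently:

```r
# Hypothetical loss values, one per candidate number of groups.
search_grid <- 1:5
loss <- c(0.9, 0.4, 0.4, 0.7, 0.8)

# Among the candidates attaining the lowest loss, keep the smallest one.
best <- min(search_grid[loss == min(loss)])
best
# [1] 2
```

Both 2 and 3 groups reach the minimum loss here, and the smaller grouping is the one reported.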
if (FALSE) {
  data('mtpl_be')
  features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
  set.seed(12345)

  # Fit a Poisson GBM for the claim counts on all remaining features.
  gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~', paste(features, collapse = ' + '))),
                      distribution = 'poisson',
                      data = mtpl_be,
                      n.trees = 50,
                      interaction.depth = 3,
                      shrinkage = 0.1)

  # Prediction function: mean response-scale prediction over newdata.
  gbm_fun <- function(object, newdata) {
    mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
  }

  # Compute the partial dependence of 'ageph', then pick the number of groups.
  gbm_fit %>%
    get_pd(var = 'ageph',
           grid = 'ageph' %>% get_grid(data = mtpl_be),
           data = mtpl_be,
           subsample = 10000,
           fun = gbm_fun) %>%
    optimal_ngroups(lambda = 0.00001)
}