Optimal number of groups — optimal_ngroups

Determine the optimal number of groups for a feature.

optimal_ngroups(
  pd,
  lambda,
  max_ngrps = 15,
  search_grid = seq_len(min(length(unique(pd$y)), max_ngrps))
)

Arguments

pd	Data frame containing the partial dependence effect as returned by `get_pd`.
lambda	The complexity parameter in the penalized loss function (see the accompanying research paper or R vignette for details).
max_ngrps	Integer specifying the maximum number of groups into which a feature's values/levels may be grouped.
search_grid	Integer vector containing the grid of values to evaluate for the number of groups.
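
As a rough illustration of the default `search_grid` shown in the usage above, the base R sketch below evaluates the same expression on a made-up data frame. The name `pd_toy` and its values are hypothetical; a real `pd` would come from `get_pd`, and only a `y` column is assumed here because the default expression refers to `pd$y`.

```r
# Hypothetical stand-in for the output of get_pd: only the `y` column
# (the partial dependence effect) matters for the default search grid.
pd_toy <- data.frame(x = 1:6,
                     y = c(0.10, 0.10, 0.12, 0.15, 0.15, 0.20))

max_ngrps <- 15

# Same expression as the default value of `search_grid`:
# one candidate per possible number of groups, capped at max_ngrps.
search_grid <- seq_len(min(length(unique(pd_toy$y)), max_ngrps))
search_grid   # 1 2 3 4: four distinct effect values, so at most 4 groups

# With a tighter cap, the grid stops at the cap instead.
seq_len(min(length(unique(pd_toy$y)), 3))   # 1 2 3
```

The default grid therefore never asks for more groups than there are distinct effect values, and never for more than `max_ngrps`.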

Value

Integer specifying the optimal number of groups. When multiple groupings lead to the lowest loss, the smallest value is returned.
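
The tie-breaking rule can be sketched in base R as follows. The loss values are invented for illustration and `pick_ngroups` is a hypothetical helper, not a function of the package; how the package computes the penalized loss internally is not shown here.

```r
# Hypothetical helper: given candidate numbers of groups and their losses,
# return the smallest candidate that attains the minimum loss.
pick_ngroups <- function(search_grid, loss) {
  stopifnot(length(search_grid) == length(loss))
  min(search_grid[loss == min(loss)])
}

search_grid <- 1:6
loss <- c(0.90, 0.40, 0.25, 0.25, 0.31, 0.38)  # made-up values; 3 and 4 tie

pick_ngroups(search_grid, loss)   # 3
```

Taking the minimum over all tied candidates, rather than relying on the order of the grid, gives the same answer for an unsorted `search_grid`.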

Examples

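if (FALSE) {
library(maidrr)    # provides get_pd, get_grid, optimal_ngroups and the mtpl_be data
library(magrittr)  # provides the %>% pipe

data('mtpl_be')
# Use every column as a feature except the identifier, response, exposure and coordinates
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))

# Fit a small Poisson GBM for the claim counts
set.seed(12345)
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~',
                               paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)

# Prediction function: average response-scale prediction over newdata
gbm_fun <- function(object, newdata) {
  mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
}

# Partial dependence of 'ageph', then the optimal number of groups for it
gbm_fit %>% get_pd(var = 'ageph',
                   grid = 'ageph' %>% get_grid(data = mtpl_be),
                   data = mtpl_be,
                   subsample = 10000,
                   fun = gbm_fun) %>%
            optimal_ngroups(lambda = 0.00001)
}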