Determine the optimal number of groups for a feature.

optimal_ngroups(
  pd,
  lambda,
  max_ngrps = 15,
  search_grid = seq_len(min(length(unique(pd$y)), max_ngrps))
)

Arguments

pd

Data frame containing the partial dependence effect as returned by get_pd.

lambda

Complexity parameter in the penalized loss function (see the accompanying research paper or the R vignette for details).

max_ngrps

Integer specifying the maximum number of groups that each feature's values/levels are allowed to be grouped into.

search_grid

Integer vector containing the grid of values to evaluate for the number of groups.
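
As a rough illustration of how the default search_grid in the usage above evaluates, the sketch below applies the same expression to a toy data frame. The data frame toy_pd and its values are made up for this illustration; a real pd comes from get_pd and may carry other columns.

```r
# Hypothetical partial dependence effect: 6 grid points, 3 distinct y values.
toy_pd <- data.frame(
  x = 1:6,
  y = c(0.1, 0.1, 0.2, 0.2, 0.3, 0.3)
)
max_ngrps <- 15

# Same expression as the default in the usage section: the grid never exceeds
# the number of distinct effect values, nor max_ngrps.
search_grid <- seq_len(min(length(unique(toy_pd$y)), max_ngrps))
print(search_grid)

# With a tighter cap, max_ngrps becomes the binding limit instead.
capped_grid <- seq_len(min(length(unique(toy_pd$y)), 2))
print(capped_grid)
```

Here the number of distinct y values (3) is the binding limit in the first case, and the cap (2) in the second.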

Value

Integer specifying the optimal number of groups. When several values in search_grid attain the lowest loss, the smallest of them is returned.
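
The tie-breaking rule stated above can be sketched in base R. This is a minimal sketch and not necessarily how the package implements it: the loss values below are invented for illustration and the names loss and best are hypothetical.

```r
# Candidate numbers of groups, in increasing order.
search_grid <- 1:5

# Hypothetical penalized loss per candidate (made-up numbers);
# candidates 3 and 4 tie for the lowest loss.
loss <- c(0.40, 0.25, 0.10, 0.10, 0.30)

# which.min() returns the index of the first minimum, so with an increasing
# grid the smallest number of groups wins the tie.
best <- search_grid[which.min(loss)]
print(best)
```

Relying on the first minimum only gives the smallest value when the grid is sorted in increasing order, which the default search_grid is.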

Examples

if (FALSE) {
  data('mtpl_be')
  features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
  set.seed(12345)
  gbm_fit <- gbm::gbm(
    as.formula(paste('nclaims ~', paste(features, collapse = ' + '))),
    distribution = 'poisson',
    data = mtpl_be,
    n.trees = 50,
    interaction.depth = 3,
    shrinkage = 0.1
  )
  gbm_fun <- function(object, newdata) {
    mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
  }
  gbm_fit %>%
    get_pd(
      var = 'ageph',
      grid = 'ageph' %>% get_grid(data = mtpl_be),
      data = mtpl_be,
      subsample = 10000,
      fun = gbm_fun
    ) %>%
    optimal_ngroups(lambda = 0.00001)
}