Groups feature values/levels by binning continuous/ordinal features and clustering nominal features. The grouping is data-driven: values/levels with similar partial dependence effects are placed in the same group.
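
The idea of clustering nominal levels by their partial dependence effect can be sketched in base R. This is only an illustration: the toy data, the column names x and y, and the use of stats::kmeans are assumptions made here, not the algorithm used by group_pd_ckmns.

```r
# Toy partial dependence effects for a nominal feature.
# Column names (x, y) and all values are invented for illustration.
pd_toy <- data.frame(
  x = c("a", "b", "c", "d", "e"),
  y = c(0.10, 0.11, 0.30, 0.31, 0.12)
)

set.seed(1)
# stats::kmeans stands in for the package's clustering step (an assumption).
km <- kmeans(pd_toy$y, centers = 2, nstart = 10)
pd_toy$grp <- km$cluster

# Levels with similar effects share a group label.
split(pd_toy$x, pd_toy$grp)
```

Levels a, b and e have effects close to 0.1 and land in one cluster, while c and d form the other. The cluster labels themselves are arbitrary.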

group_pd(pd, ngroups)

group_pd_ckseg(pd, ngroups)

group_pd_ckmns(pd, ngroups)

Arguments

pd

Data frame containing the partial dependence effect as returned by get_pd.

ngroups

Integer specifying the number of groups.
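
To show how ngroups controls the binning of an ordered feature, here is a minimal base R sketch. It cuts the feature at the largest jumps in the partial dependence effect, which is a deliberately simple stand-in heuristic and not the segmentation method used by group_pd_ckseg. The toy data and the column names x and y are assumptions.

```r
# Toy partial dependence effects for an ordered feature (invented values).
pd_num <- data.frame(
  x = 18:27,
  y = c(0.30, 0.29, 0.28, 0.20, 0.19, 0.18, 0.17, 0.12, 0.11, 0.10)
)
ngroups <- 3

# Absolute change in effect between consecutive feature values.
jumps <- abs(diff(pd_num$y))

# Positions of the (ngroups - 1) largest jumps: a bin ends at each of them.
cut_after <- sort(order(jumps, decreasing = TRUE)[seq_len(ngroups - 1)])

# A new bin starts at the row following each cut position.
pd_num$bin <- 1L + cumsum(seq_along(pd_num$y) %in% (cut_after + 1L))
pd_num
```

With ngroups = 3 the two largest jumps fall after x = 20 and x = 24, so the bins cover 18 to 20, 21 to 24 and 25 to 27.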

Value

The tidy data frame (i.e., a "tibble" object) supplied in pd, with three additional columns: xgrp, ygrp and wgrp. Column xgrp contains the feature groups, ygrp the average partial dependence for the group, and wgrp the sum of observation counts for the group.
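
The meaning of the three added columns can be illustrated on a toy table in base R. The input column names x, y and w, the hand-picked breaks and the unweighted mean for ygrp are all assumptions made for this sketch. The package chooses the groups itself and may compute the average differently.

```r
# Toy partial dependence table: feature value, effect, observation count.
# Column names and values are invented for illustration.
pd_in <- data.frame(
  x = 18:27,
  y = c(0.30, 0.29, 0.28, 0.20, 0.19, 0.18, 0.17, 0.12, 0.11, 0.10),
  w = c(5, 8, 12, 20, 25, 30, 28, 40, 42, 45)
)

# Hypothetical, hand-picked groups: [18,20], (20,24], (24,27].
pd_in$xgrp <- cut(pd_in$x, breaks = c(18, 20, 24, 27), include.lowest = TRUE)

# ygrp: average effect within each group (plain mean, an assumption).
pd_in$ygrp <- ave(pd_in$y, pd_in$xgrp, FUN = mean)

# wgrp: total observation count within each group.
pd_in$wgrp <- ave(pd_in$w, pd_in$xgrp, FUN = sum)
pd_in
```

Every row keeps its original values and additionally carries the label, average effect and total count of the group it belongs to.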

Examples

if (FALSE) {
# Load the example data and select the model features
data('mtpl_be')
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))

# Fit a small Poisson GBM on the claim counts
set.seed(12345)
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~', paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)

# Prediction function: mean predicted response over newdata
gbm_fun <- function(object, newdata) mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))

# Compute the partial dependence of 'ageph' and group it into 5 groups
gbm_fit %>%
  get_pd(var = 'ageph',
         grid = get_grid(var = 'ageph', data = mtpl_be),
         data = mtpl_be,
         subsample = 10000,
         fun = gbm_fun) %>%
  group_pd(ngroups = 5)
}