Groups feature values/levels by binning continuous/ordinal features and clustering nominal features. The grouping is driven by partial dependence effects, so that feature values/levels with similar behavior end up in the same group in a data-driven way.
group_pd(pd, ngroups)
group_pd_ckseg(pd, ngroups)
group_pd_ckmns(pd, ngroups)
pd | Data frame containing the partial dependence effect as returned by |
---|---|
ngroups | Integer specifying the number of groups. |
Tidy data frame (i.e., a "tibble" object) supplied in pd, with three additional columns: xgrp, ygrp and wgrp. Column xgrp contains the feature groups, column ygrp the average partial dependence for the group, and column wgrp the sum of observation counts for the group.
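To make the three added columns concrete, here is a minimal base R sketch on a toy table. It is not the package's implementation: the input column names x, y and w, the hand-picked integer group labels, and the use of an unweighted mean for ygrp are all assumptions made for illustration.

```r
# Toy partial dependence table; the column names x, y and w are assumptions.
pd <- data.frame(
  x = c(18, 25, 35, 50, 70),            # feature values
  y = c(0.20, 0.18, 0.12, 0.10, 0.14),  # partial dependence effect
  w = c(100, 300, 400, 150, 50)         # observation counts
)

# Hand-picked grouping for illustration only; group_pd derives the groups
# from the data, and its labels may look different.
pd$xgrp <- c(1, 1, 2, 2, 3)

# ygrp: average partial dependence within each group (unweighted, an assumption).
pd$ygrp <- ave(pd$y, pd$xgrp, FUN = mean)

# wgrp: sum of observation counts within each group.
pd$wgrp <- ave(pd$w, pd$xgrp, FUN = sum)

print(pd)
```

Every row keeps its original values and gains the group-level summaries, so rows in the same group share identical ygrp and wgrp entries.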
group_pd_ckseg: Grouping via Cksegs.1d.dp.
group_pd_ckmns: Grouping via Ckmeans.1d.dp.
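The idea of clustering levels by their partial dependence can be sketched in base R. This sketch deliberately swaps in Ward hierarchical clustering (hclust and cutree from the stats package) as a dependency-free stand-in for the optimal one-dimensional clustering of Ckmeans.1d.dp, so it is not what these functions actually call. The toy levels and the column names x, y and w are assumptions.

```r
# Toy nominal feature; the levels and the column names x, y, w are assumptions.
pd <- data.frame(
  x = c("diesel", "petrol", "lpg", "electric", "hybrid"),
  y = c(0.150, 0.110, 0.148, 0.080, 0.085),  # partial dependence effect
  w = c(500, 900, 40, 30, 60)                # observation counts
)

ngroups <- 3

# Stand-in for Ckmeans.1d.dp: hierarchical clustering on the 1D effect values.
tree <- hclust(dist(pd$y), method = "ward.D2")
pd$xgrp <- cutree(tree, k = ngroups)

# Levels with a similar effect share a group, regardless of their order in x.
print(split(pd$x, pd$xgrp))
```

With these values, diesel and lpg land in one group, electric and hybrid in another, and petrol stays on its own. Binning a continuous or ordinal feature differs in that each group has to cover a contiguous range of feature values.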
if (FALSE) {
data('mtpl_be')
features <- setdiff(names(mtpl_be), c('id', 'nclaims', 'expo', 'long', 'lat'))
set.seed(12345)
# Fit a Poisson GBM for the claim counts
gbm_fit <- gbm::gbm(as.formula(paste('nclaims ~', paste(features, collapse = ' + '))),
                    distribution = 'poisson',
                    data = mtpl_be,
                    n.trees = 50,
                    interaction.depth = 3,
                    shrinkage = 0.1)
# Prediction function: average predicted response over newdata
gbm_fun <- function(object, newdata) mean(predict(object, newdata, n.trees = object$n.trees, type = 'response'))
# Compute the partial dependence for 'ageph' and group it into 5 groups
gbm_fit %>%
  get_pd(var = 'ageph',
         grid = get_grid(var = 'ageph', data = mtpl_be),
         data = mtpl_be,
         subsample = 10000,
         fun = gbm_fun) %>%
  group_pd(ngroups = 5)
}