Title: Survival Ensembles and Representative Trees for Cancer Prognostication
Speaker: Mousumi Banerjee, PhD
Research Professor of Biostatistics
School of Public Health & Comprehensive Cancer Center,
Director of Biostatistics
Center for Healthcare Outcomes and Policy
The University of Michigan, Ann Arbor, USA.
Abstract: Tree-based methods have become popular for analyzing right-censored survival data where the primary goal is the prognostic stratification of patients. Ensemble techniques such as random forests improve prediction accuracy and address the instability of a single tree by growing an ensemble of trees and aggregating their predictions. However, individual trees are lost in the forest. This talk will first provide an overview of the methodological aspects of tree-based modeling in the censored data setting. Next, we propose a methodology for identifying the most representative trees in a forest for survival data, based on several tree distance metrics. For any two trees, the metrics are chosen to (1) measure similarity of the covariates used to split the trees; (2) reflect similar clustering of patients in the terminal nodes of the trees; and (3) measure similarity in predictions from the two trees. While the third metric focuses on prediction, the first two focus on the architectural similarity between two trees. The most representative trees in the forest are chosen based on the average distance between a tree and all other trees in the forest. An out-of-bag estimate of the error rate is obtained using neighborhoods of representative trees. Simulations and data examples show gains in predictive accuracy when averaging over such neighborhoods. Although our focus is on trees for censored data, the ideas are also applicable to classification and regression trees. We illustrate our methods using data from a thyroid cancer study.
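
Illustration: the selection rule described in the abstract (pick the trees with the smallest average distance to all other trees in the forest) can be sketched in a few lines of Python. This is a minimal sketch, not the speaker's implementation. The abstract does not give formulas for the three metrics, so the definitions below are assumptions chosen only to match its verbal descriptions, and all function names and the toy data are hypothetical.

```python
from itertools import combinations

# NOTE: all three distance definitions are assumptions for illustration;
# the abstract describes the metrics only in words.

def covariate_distance(used_a, used_b, n_covariates):
    """Fraction of covariates used for splitting by exactly one of the two trees."""
    return len(set(used_a) ^ set(used_b)) / n_covariates

def clustering_distance(leaves_a, leaves_b):
    """Fraction of patient pairs that one tree places in the same terminal
    node while the other tree separates them."""
    pairs = list(combinations(range(len(leaves_a)), 2))
    disagreements = sum(
        (leaves_a[i] == leaves_a[j]) != (leaves_b[i] == leaves_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)

def prediction_distance(pred_a, pred_b):
    """Mean squared difference between the two trees' predictions."""
    return sum((a - b) ** 2 for a, b in zip(pred_a, pred_b)) / len(pred_a)

def representative_trees(trees, distance):
    """Score each tree by its average distance to every other tree and
    return the indices of the lowest-scoring trees plus all scores."""
    n = len(trees)
    scores = []
    for i in range(n):
        total = sum(distance(trees[i], trees[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    best = min(scores)
    return [i for i, s in enumerate(scores) if s == best], scores

# Toy forest (hypothetical data): each tree is represented by the terminal-node
# label it assigns to each of four patients.
leaves = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 2, 2],
]
rep, scores = representative_trees(leaves, clustering_distance)
print(rep, scores)
```

In this toy forest the first tree differs from each of the others on one patient pair out of six, while the other two trees differ from each other on two pairs, so the first tree has the smallest average distance. The same `representative_trees` function accepts any of the three distances, provided each tree is represented by the matching summary (split covariates, terminal-node labels, or predictions).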