A Comparative Study on Model Selection Criteria in GEE
| Full Title: | A Comparative Study on Model Selection Criteria in GEE: Monte Carlo Simulation and Application to Maternal Morbidity Data |
| Author: | Rozana Rahman |
| Batch: | 8 |
| Year: | 2009 |
| Supervisor: | Md. Anower Hossain |
Researchers are often interested in analyzing data that arise from longitudinal or clustered designs, in which observations on a given subject are correlated. In analyzing longitudinal data, this dependence must be taken into account to avoid misleading inference. When the outcomes are binary or counts, however, general maximum likelihood based approaches are less tractable. To overcome this difficulty, Liang and Zeger (1986) proposed Generalized Estimating Equations (GEE).
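For orientation, the estimating equations are commonly written in the form below. The notation is the usual textbook one and is an assumption here, not taken from the thesis: K subjects, response vector Y_i with marginal mean mu_i(beta), derivative matrix D_i, diagonal variance-function matrix A_i, scale parameter phi, and working correlation matrix R_i(alpha).

```latex
\[
  U(\beta) \;=\; \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( Y_i - \mu_i(\beta) \bigr) \;=\; 0,
  \qquad
  D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
  \qquad
  V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}.
\]
```

Only the mean and a working covariance are specified, so no full likelihood is available. This is why likelihood-based selection criteria do not carry over directly.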
Model (or variable) selection is an essential part of any statistical analysis. Since a GEE model does not specify a likelihood structure, traditional model selection criteria are not well defined in the GEE approach. Over the last decade, a modified Akaike’s Information Criterion (mAIC), a modified Bayesian Information Criterion (mBIC), and an extended Mallows’s Cp (GCp) have been proposed for GEE by Pan (2001), Dziak and Li (2007), and Cantoni et al. (2007), respectively.
In this study, we first analyze the prospective maternal morbidity data collected by the Bangladesh Institute of Research for Promotion of Essential and Reproductive Health Technologies (BIRPERHT) between November 1992 and December 1993, using the GEE approach. Our main goal is to compare the above-mentioned model selection criteria for GEE and identify the most suitable one. To this end, we conduct an extensive Monte Carlo simulation study to examine the relative performance of these criteria in selecting the best underlying model. We find that GCp performs better than the other criteria when the number of subjects is 200 or more, irrespective of the value of the correlation parameter. The mBIC performs better when the number of subjects is greater than 50 and less than 200, again irrespective of the value of the correlation parameter. When there is moderate to large correlation among the responses of a given subject, mBIC also performs better than the other criteria for fewer than 50 subjects. If the value of the correlation parameter is small, mAIC performs better in selecting the best model when the number of subjects is 50 or fewer.
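The simulation findings above amount to a small decision rule, which can be sketched as follows. This is an illustrative summary only, not code from the thesis: the function name `recommended_criterion` and the qualitative labels "small", "moderate", and "large" are assumptions, because the abstract gives no numeric cut-offs for the correlation parameter. The case of exactly 50 subjects with moderate to large correlation is not covered by the stated findings, so the sketch returns `None` there.

```python
from typing import Optional

# Qualitative correlation labels; hypothetical, since the abstract
# does not state numeric thresholds for the correlation parameter.
CORRELATION_LEVELS = {"small", "moderate", "large"}


def recommended_criterion(n_subjects: int, correlation: str) -> Optional[str]:
    """Return the criterion the abstract reports as performing best.

    Returns None where the stated findings do not cover the case.
    """
    if n_subjects < 1:
        raise ValueError("n_subjects must be a positive integer")
    if correlation not in CORRELATION_LEVELS:
        raise ValueError(f"correlation must be one of {sorted(CORRELATION_LEVELS)}")

    if n_subjects >= 200:
        return "GCp"   # 200 or more subjects, any correlation
    if n_subjects > 50:
        return "mBIC"  # more than 50 and fewer than 200 subjects, any correlation
    if correlation == "small":
        return "mAIC"  # 50 or fewer subjects, small correlation
    if n_subjects < 50:
        return "mBIC"  # fewer than 50 subjects, moderate to large correlation
    return None        # exactly 50 subjects, moderate/large correlation: not stated


if __name__ == "__main__":
    print(recommended_criterion(250, "small"))     # GCp
    print(recommended_criterion(100, "large"))     # mBIC
    print(recommended_criterion(30, "moderate"))   # mBIC
    print(recommended_criterion(50, "small"))      # mAIC
```

Using labels rather than numeric thresholds keeps the sketch within what the abstract actually reports.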
Finally, in light of the simulation results, we apply the GCp criterion to the maternal morbidity data to select the best underlying model. We find that the model for major pregnancy complications with the covariates education of the respondent, gainful employment, whether the index pregnancy was wanted, and food supplement appears to be the best choice among all possible models.
