Regression Models with Missing Covariate Values
| Full Title: | Regression Models with Missing Covariate Values and Application to Maternal Pregnancy Data |
| Author: | Mohammad Shahed Masud |
| Batch: | 1 |
| Year: | 2001 |
| Supervisor: | M. Mushfiqur Rahman |
Regression analysis is used to identify the relationship between a dependent variable and one or more independent variables. In biostatistics and epidemiology in particular, the dependent variable often represents a disease outcome and the independent variables represent risk factors for developing the disease. Linear regression is used when the dependent variable is continuous and approximately Gaussian (normal). Binary outcomes, however, are now common in many areas of biostatistics and epidemiology. When the outcome variable is dichotomous, such as the presence or absence of a disease, the logistic regression model is used. In recent years, logistic regression has played a very important role in identifying risk factors for disease.
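As an illustration of the logistic model, the log-odds of disease are modelled as linear in the risk factor, logit P(Y = 1 | x) = b0 + b1·x, so exp(b1) is the odds ratio per unit increase in x. The sketch below assumes a single covariate with no missing values and fits this model by maximum likelihood using Newton-Raphson in plain Python, reporting standard errors from the inverse information matrix. The function names and toy data are hypothetical and are not taken from the thesis.

```python
import math

def score_and_information(b0, b1, x, y):
    """Score vector and information matrix of the logistic log-likelihood."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))  # fitted P(Y = 1 | x)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit_logistic(x, y, tol=1e-10, max_iter=50):
    """ML estimates and SEs for logit P(Y=1|x) = b0 + b1*x via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(b0, b1, x, y)
        det = i00 * i11 - i01 * i01
        # Newton step = (information matrix)^-1 * score
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information
    _, (i00, i01, i11) = score_and_information(b0, b1, x, y)
    det = i00 * i11 - i01 * i01
    return (b0, b1), (math.sqrt(i11 / det), math.sqrt(i00 / det))

# Hypothetical toy data: x = risk-factor level, y = 1 if disease is present
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
(b0, b1), (se0, se1) = fit_logistic(x, y)
print(f"b0={b0:.3f} (SE {se0:.3f}), b1={b1:.3f} (SE {se1:.3f}), "
      f"odds ratio per unit x={math.exp(b1):.3f}")
```

Newton-Raphson is used here only because it is short to write out; any maximum likelihood routine for the logistic model would give the same estimates, provided the data are not completely separated.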
Missing data occur frequently in biostatistics, epidemiology, experimental design, econometrics and other fields, and standard regression analysis usually cannot be carried out directly when some values are missing. Special methods are therefore needed. Three approaches are considered here. The simplest is complete case analysis, which ignores every observation with a missing value. This is not a good method: if the complete cases are not a random subsample of the original observations, the results may be biased, and discarding observations also causes a loss of efficiency. The second approach is imputation, which fills in values for the missing observations. The third is the model-based procedure, which can overcome the bias of complete case analysis: a model is specified for both the observed and the missing data, and the parameters are then estimated by maximum likelihood. The model-based procedure gives better results than complete case analysis and imputation, but its limitation is that it applies only when the missing-data pattern is monotone. The expectation-maximization (EM) algorithm, by contrast, can be used for any pattern of missing data. It first replaces the missing values by estimated values (more precisely, by their conditional expectations given the observed data and the current parameter estimates) and then estimates the parameters; using these new parameter estimates, it re-estimates the missing values and estimates the parameters again, repeating the cycle until convergence. The EM algorithm can thus be viewed as a kind of iterative imputation, and it ultimately gives the same result as the model-based procedure. In this study we used data collected by BIRPERHT (Bangladesh Institute of Research for Promotion of Essential & Reproductive Health and Technologies).
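The four estimators can be compared on a small example. The following is a minimal pure-Python sketch under stated assumptions: one covariate x with some values missing (marked None), a fully observed continuous outcome y, (x, y) jointly normal, and x missing at random given y. Because y is complete, the pattern is monotone, so the model-based maximum likelihood estimate has a closed form (a factored likelihood: the distribution of y from all rows times the regression of x on y from the complete rows). The em function iterates the replace-then-re-estimate cycle described above. All names and data are hypothetical illustrations, not the thesis's code or the BIRPERHT data, and standard errors are omitted for brevity.

```python
from statistics import fmean

def cov(a, b):
    """ML (divide-by-n) covariance of two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def regression(mx, my, sxx, sxy):
    """Intercept and slope of y on x implied by the joint moments."""
    slope = sxy / sxx
    return my - slope * mx, slope

def observed(x, y):
    """Rows whose covariate x is not missing (None)."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def complete_case(x, y):
    xc, yc = observed(x, y)
    return regression(fmean(xc), fmean(yc), cov(xc, xc), cov(xc, yc))

def mean_imputation(x, y):
    xc, _ = observed(x, y)
    xf = [fmean(xc) if xi is None else xi for xi in x]  # fill with observed mean
    return regression(fmean(xf), fmean(y), cov(xf, xf), cov(xf, y))

def model_based(x, y):
    """Closed-form ML for the monotone pattern (y complete, x partly missing)."""
    xc, yc = observed(x, y)
    b = cov(xc, yc) / cov(yc, yc)          # regression of x on y, complete rows
    a = fmean(xc) - b * fmean(yc)
    resid = cov(xc, xc) - b * cov(xc, yc)  # ML residual variance of x given y
    my, syy = fmean(y), cov(y, y)          # y uses all n rows
    return regression(a + b * my, my, resid + b * b * syy, b * syy)

def em(x, y, tol=1e-10, max_iter=1000):
    n = len(y)
    xc, yc = observed(x, y)
    # Start from the complete-case moments
    mx, my = fmean(xc), fmean(yc)
    sxx, sxy, syy = cov(xc, xc), cov(xc, yc), cov(yc, yc)
    for _ in range(max_iter):
        # E-step: expected x and x^2 for missing rows, given y and current moments
        b = sxy / syy
        a = mx - b * my
        resid = sxx - b * sxy  # Var(x | y)
        ex = [a + b * yi if xi is None else xi for xi, yi in zip(x, y)]
        ex2 = [e * e + (resid if xi is None else 0.0) for e, xi in zip(ex, x)]
        # M-step: update moments from the expected sufficient statistics
        my, syy = fmean(y), cov(y, y)
        new_mx = sum(ex) / n
        new_sxx = sum(ex2) / n - new_mx ** 2
        new_sxy = sum(e * yi for e, yi in zip(ex, y)) / n - new_mx * my
        step = max(abs(new_mx - mx), abs(new_sxx - sxx), abs(new_sxy - sxy))
        mx, sxx, sxy = new_mx, new_sxx, new_sxy
        if step < tol:
            break
    return regression(mx, my, sxx, sxy)

# Hypothetical toy data: three covariate values missing
x = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0, None, 10.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 14.1, 15.8, 18.3, 20.1]
for name, fit in [("complete case", complete_case), ("mean imputation", mean_imputation),
                  ("model-based", model_based), ("EM", em)]:
    b0, b1 = fit(x, y)
    print(f"{name:16s} intercept={b0:.4f} slope={b1:.4f}")
```

The E-step adds the residual variance Var(x | y) to the squared imputed values; without that term the loop reduces to repeated regression imputation, which understates the variance of x. On this toy data the em and model_based fits should agree to within the convergence tolerance, consistent with the statement above that EM ultimately gives the same result as the model-based procedure.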
We first obtain the parameter estimates and standard errors from complete case analysis, the model-based procedure, mean imputation and the EM algorithm for both linear and logistic regression, and then compare the results of the four methods. Our results show that the model-based procedure gives better results than complete case analysis, mean imputation and the EM algorithm. The EM algorithm gives better results than mean imputation when the amount of missing data is large, and it can be used for any missing-data pattern. We therefore recommend the model-based procedure when the missing-data pattern is monotone and the EM algorithm when it is not.
