Listy Biometryczne - Biometrical Letters Vol. 32(1995), No. 2, 61-80


REGRESSION MODELING OF THE POLISH MORTALITY
DATA 1989-1991: MORTALITY OF MEN


Anna Bartkowiak¹ and Witold Kupść²

¹ Institute of Computer Science, University of Wrocław, Przesmyckiego 20,
51-151 Wrocław
² Department of Epidemiology and Prevention of CVD, Institute of Cardiology,
Niemodlińska 33, 04-635 Warszawa


We consider the total mortality rates of the male population as observed in the 49 voivodeships of Poland. We seek a linear regression explaining mortality in men (Y) as a linear function of nine variates, denoted X1,...,X9 and reported for each voivodeship. The squared multiple correlation coefficient for the established regression is about 0.75. However, when we construct the full regression on all the potential predictors and examine the t statistics indicating the importance of the explanatory variables, we do not obtain a unique and unequivocal indication of which variables are good predictors of the Y-variable and which are not. It should be stressed that the importance of variables, as indicated by the t statistics, is valid only in the context of the established regression; variables appearing in the established "full" regression as "nonsignificant" (in the sense of the t statistics) can have considerable predictive power that the full regression does not reveal. We illustrate this by finding alternative subsets of variables. This is done through detailed considerations based on exploratory data analysis (EDA) techniques and by detecting near collinearities among the variables using a method proposed by Hawkins (1973), which makes it possible to deduce from the established relations which variables are exchangeable.
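
The phenomenon described above can be illustrated with a minimal sketch in Python. It uses synthetic data, not the Polish mortality data, and all names in it (`ols_t_statistics`, `smallest_eigenpair`, `x1`, `x2`, `x3`) are hypothetical. Two predictors are constructed to be nearly collinear; the sketch fits the full regression, then an alternative subset in which one predictor replaces the other, and finally inspects the smallest eigenvalue of the predictor correlation matrix. The eigenvalue check is a generic diagnostic in the spirit of principal-component methods and is not claimed to reproduce the procedure of Hawkins (1973).

```python
import numpy as np

def ols_t_statistics(X, y):
    """Fit y = b0 + X b by least squares; return coefficients, t statistics, R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p - 1)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # covariance of the estimates
    t = beta / np.sqrt(np.diag(cov))
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, t, r2

def smallest_eigenpair(X):
    """Smallest eigenvalue and its eigenvector of the predictor correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    w, v = np.linalg.eigh(R)                      # eigenvalues in ascending order
    return w[0], v[:, 0]

# Synthetic data (an assumption for illustration): 49 cases, three predictors.
rng = np.random.default_rng(0)
n = 49
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)               # near-copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

# Full regression on all predictors.
beta_full, t_full, r2_full = ols_t_statistics(X, y)

# Alternative subset: drop x1 and keep its near-copy x2.
beta_sub, t_sub, r2_sub = ols_t_statistics(X[:, [1, 2]], y)

# Near collinearity shows up as a correlation-matrix eigenvalue close to zero.
eigval, eigvec = smallest_eigenpair(X)

print("t statistics, full model:", np.round(t_full[1:], 2))
print("t statistics, subset    :", np.round(t_sub[1:], 2))
print("R^2 full / subset       :", round(r2_full, 3), round(r2_sub, 3))
print("smallest eigenvalue     :", round(eigval, 4))
print("its eigenvector         :", np.round(eigvec, 2))
```

In such a setting the t statistics of `x1` and `x2` in the full model can both be modest, because each variable is judged in the presence of the other, while the subset model attains nearly the same R². The eigenvector belonging to the near-zero eigenvalue has loadings of opposite sign on `x1` and `x2`, which points to these two variables as exchangeable.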