Pigott (2001, p. 362) notes that when a data set has only a few missing observations, the assumption that the data are missing completely at random (MCAR) is more likely to hold, so there is a greater chance that the complete cases represent the population. Everett and Dunn (1991) recommend a complete case analysis when there are few missing values and the data are missing completely at random. Tabachnick and Fidell (2007) point out that if less than 5% of the data are missing completely at random, almost any procedure for handling missing values yields similar results.
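A minimal Python sketch of the complete case (listwise deletion) approach, assuming a hypothetical data layout in which each case is a dict and `None` marks a missing value. The names `proportion_missing_cells`, `proportion_incomplete_cases` and `complete_cases` are illustrative only, and the 5% cut-off is the rule of thumb from Tabachnick and Fidell (2007). Whether that 5% should be applied to cells or to cases is a judgement call, so the sketch reports both.

```python
from typing import Optional

# Hypothetical layout: one dict per case, None marks a missing value.
Case = dict[str, Optional[float]]

# Rule-of-thumb cut-off (Tabachnick and Fidell, 2007); an assumption, not a law.
FEW_MISSING = 0.05


def proportion_missing_cells(cases: list[Case]) -> float:
    """Share of all cells (cases x variables) that are missing."""
    cells = [value for case in cases for value in case.values()]
    return sum(value is None for value in cells) / len(cells)


def proportion_incomplete_cases(cases: list[Case]) -> float:
    """Share of cases with at least one missing value."""
    return sum(any(v is None for v in case.values()) for case in cases) / len(cases)


def complete_cases(cases: list[Case]) -> list[Case]:
    """Listwise deletion: keep only cases with no missing values."""
    return [case for case in cases if all(v is not None for v in case.values())]


data: list[Case] = [
    {"age": 34.0, "score": 12.0},
    {"age": 41.0, "score": None},
    {"age": 29.0, "score": 15.0},
    {"age": None, "score": 9.0},
]

cell_share = proportion_missing_cells(data)
case_share = proportion_incomplete_cases(data)
print(f"missing cells: {cell_share:.0%}, incomplete cases: {case_share:.0%}")
if max(cell_share, case_share) < FEW_MISSING:
    print("few values missing; a complete case analysis may be reasonable if MCAR")
else:
    print("too many values missing for the rule of thumb; consider other approaches")
analysed = complete_cases(data)
print(len(analysed), "of", len(data), "cases kept")
```

Even in this toy example only 25% of cells are missing, yet listwise deletion discards half of the cases. That gap is why it is worth checking the share of incomplete cases before dropping them.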
Examination of the missing data can be performed using group analyses, such as non-parametric Mann-Whitney U tests, which compare the subjects with missing values to those with complete cases. This checks whether the missing data mechanism is related to other variables in the data set (Tabachnick and Fidell, 2007), i.e. whether the data are missing at random (MAR). If there is no relationship between the data values and membership of the 'missingness' group, one might be inclined to treat the missing values as missing completely at random (MCAR). There are other approaches (for an overview see here) which assume the data are missing at random, i.e. the reason for the missingness is associated with some of the other observed variables. Shrive, Stuart, Quan and Ghali (2006) report simulations suggesting that within-subject item means can be used to impute missing data.
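The group comparison described above can be sketched in Python as follows. The sketch uses hypothetical data in which `score` has missing values and `age` is fully observed. It splits the cases by whether `score` is missing and compares their ages with a Mann-Whitney U test. To stay self-contained, it hand-rolls the large-sample normal approximation, using average ranks for ties with no tie or continuity correction. In practice a library routine such as `scipy.stats.mannwhitneyu`, or an exact test for groups this small, would be preferable.

```python
import math
from typing import Optional, Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float]:
    """Return (U for x, z, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, z, p


# Hypothetical data: 'score' is sometimes missing, 'age' is always observed.
cases: list[dict[str, Optional[float]]] = [
    {"age": 52, "score": None}, {"age": 34, "score": 12}, {"age": 61, "score": None},
    {"age": 29, "score": 15}, {"age": 41, "score": 11}, {"age": 58, "score": None},
    {"age": 38, "score": 14}, {"age": 45, "score": 10}, {"age": 30, "score": 13},
]
missing_group = [c["age"] for c in cases if c["score"] is None]
complete_group = [c["age"] for c in cases if c["score"] is not None]

u, z, p = mann_whitney_u(missing_group, complete_group)
print(f"U = {u}, z = {z:.2f}, p = {p:.3f}")
```

In this made-up data every case with a missing score is older than every complete case. U therefore reaches its maximum of 3 × 6 = 18, so missingness is clearly related to age and treating the values as MCAR would be hard to justify.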
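The within-subject item mean approach studied by Shrive et al. (2006) replaces a subject's missing item on a multi-item scale with the mean of that subject's own answered items. Below is a minimal Python sketch. It assumes hypothetical questionnaire responses stored as one list of item scores per subject, with `None` for unanswered items. The helper name `impute_within_subject` is illustrative only, and the sketch simply raises an error for a subject who answered no items.

```python
from statistics import mean
from typing import Optional


def impute_within_subject(items: list[Optional[float]]) -> list[float]:
    """Replace each missing item with the mean of the same subject's answered items."""
    answered = [v for v in items if v is not None]
    if not answered:
        raise ValueError("subject answered no items; nothing to average")
    subject_mean = mean(answered)
    return [subject_mean if v is None else v for v in items]


# Hypothetical 4-item questionnaire, one row per subject.
responses: dict[str, list[Optional[float]]] = {
    "s1": [4.0, None, 3.0, 5.0],
    "s2": [2.0, 2.0, None, None],
    "s3": [1.0, 3.0, 2.0, 4.0],
}
imputed = {sid: impute_within_subject(row) for sid, row in responses.items()}
for sid, row in imputed.items():
    print(sid, row)
```

Unlike filling in with the variable (column) mean, this uses each subject's own level of response. For example, subject s1's missing item becomes 4.0, the mean of that subject's answers 4, 3 and 5.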
Missing values are problematic in multivariate analyses because they reduce the number of cases: any case with incomplete information is automatically dropped. One simplistic approach to this problem is to 'fill in' the missing values using variable means (or medians), which is acceptable if you have only a few missing values (say 5% of a sample; Tabachnick and Fidell, 2007, p. 63, and also here). Peng et al. (2006), however, suggest mean imputation is permissible provided no more than 10-20% of the data are missing, a more liberal threshold.
The material below illustrates how to use macros to replace missing values with variable means in SPSS. It assumes the values are missing completely at random, so the missing values are not likely to differ from those that are recorded. Filling in a variable's mean or median is an example of single imputation, which uses just one variable to 'fill in' its own missing values. Single imputation is methodologically appropriate here because, with small amounts of missing data, it performs almost as well as more sophisticated imputation techniques (Peyre et al., 2011; Shrive et al., 2006; and here). You can also check how sensitive the results are to the choice of imputed values by replacing missing values with subject minima and maxima.
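As a language-neutral illustration of the same idea, here is a Python sketch (not the SPSS macros themselves). It fills a variable's missing values with its observed mean or median, then repeats the fill with the observed minimum and maximum as a crude sensitivity check. The sketch rests on three assumptions:

- The data are hypothetical values for a single variable, with `None` marking missing values.
- 'Minima and maxima' is read as the smallest and largest observed values of that variable.
- The variable's mean after imputation stands in for whatever result is actually of interest.

```python
from statistics import mean, median
from typing import Callable, Optional, Sequence

Summary = Callable[[Sequence[float]], float]


def impute(values: Sequence[Optional[float]], summary: Summary) -> list[float]:
    """Single imputation: replace every None with summary(observed values)."""
    observed = [v for v in values if v is not None]
    fill = summary(observed)
    return [fill if v is None else v for v in values]


# Hypothetical variable with two missing values out of seven.
scores: list[Optional[float]] = [12.0, None, 15.0, 9.0, None, 14.0, 8.0]

# Mean/median give the usual fill; min/max bracket how far the result can move.
fills: dict[str, Summary] = {"mean": mean, "median": median, "min": min, "max": max}
results = {name: mean(impute(scores, f)) for name, f in fills.items()}
for name, value in results.items():
    print(f"fill with {name:>6}: mean of variable = {value:.2f}")
```

The spread between the min and max fills (10.57 to 12.57 here) shows how much the answer could depend on the unknown values. Mean imputation also leaves the variable's mean unchanged but shrinks its variance, since every filled-in value sits exactly at the centre.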