Multiple Correspondence Analysis
When individuals are described by categorical variables, Multiple Correspondence Analysis (MCA) is an appropriate multivariate method for summarizing the pattern of relationships between those variables.
The ENMCA function works as a chain:
- First, an MCA is performed. If there are no missing values in the dataset, the MCA from the FactoMineR package is used; otherwise, a function called missmca(), which performs MCA when missing values are present, is used.
- Then, the results of the MCA are used to perform a cluster analysis.
- Finally, the outputs contain a variable created by the clustering process that indicates the cluster each individual belongs to, a plot where the individuals are coloured according to their cluster, and other numerical results.
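The chain above can be approximated by calling FactoMineR directly. The following is a minimal sketch, not the actual internals of ENMCA: it assumes that the copy of the tea data shipped with FactoMineR is used (its column names may differ from those shown on this page), that clustering is done with FactoMineR's HCPC(), and that three clusters are requested. The missing-value branch is not reproduced.

```r
# Sketch only: approximates the MCA-then-clustering chain with FactoMineR.
# It is an assumption that ENMCA() proceeds in a comparable way internally.
library(FactoMineR)

data(tea)                   # tea survey data shipped with FactoMineR (assumption)
active <- tea[, 1:18]       # the 18 categorical variables

if (anyNA(active)) {
  # The missing-value branch described above is not reproduced in this sketch.
  stop("missing values present: an MCA that handles missing data would be needed")
}

# Step 1: MCA on the categorical variables, without drawing the plots
res.mca <- MCA(active, graph = FALSE)

# Step 2: hierarchical clustering on the MCA coordinates;
# nb.clust = 3 is an arbitrary choice made for this illustration
res.hcpc <- HCPC(res.mca, nb.clust = 3, graph = FALSE)

# Step 3: the original data with an added cluster variable named "clust"
clustered <- res.hcpc$data.clust
table(clustered$clust)      # number of individuals per cluster
```

Setting nb.clust to a fixed value keeps the sketch non-interactive; it replaces the step where the user picks the number of clusters on the tree.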
library(EnQuireR)  # provides ENMCA()
data(tea)
# tea: the data set; columns 1 to 18 are the categorical variables on which the MCA is performed
res.enmca <- ENMCA(tea[, 1:18])
First, a hierarchical tree appears, and the user has to choose the number of clusters:
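To make the tree-cutting step concrete, here is a hedged sketch that builds a hierarchical tree on the MCA coordinates with base R and cuts it at a chosen number of clusters. The Ward linkage, the value k = 3, and the use of FactoMineR's copy of tea are assumptions made for illustration; the tree that ENMCA displays may be built differently.

```r
# Sketch only: a non-interactive stand-in for choosing the number of clusters.
library(FactoMineR)

data(tea)
res.mca <- MCA(tea[, 1:18], graph = FALSE)

# Coordinates of the individuals on the retained MCA dimensions
coords <- res.mca$ind$coord

# Hierarchical tree on Euclidean distances (Ward linkage is an assumption)
tree <- hclust(dist(coords), method = "ward.D2")
plot(tree, labels = FALSE, main = "Hierarchical tree on MCA coordinates")

# Cut the tree at k clusters; k = 3 is chosen arbitrarily here
k <- 3
rect.hclust(tree, k = k)    # outline the k clusters on the tree
clust <- factor(cutree(tree, k = k))
table(clust)
```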
Then the MCA is performed. We get the usual numerical and graphical outputs, such as scatterplots of individuals and variables:
We also get a scatterplot where individuals are coloured according to the cluster they belong to and a bar plot showing the number of individuals in each cluster:
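Plots of this kind can be reproduced by hand. The sketch below is an illustration under assumptions (FactoMineR's MCA() and HCPC(), three clusters, FactoMineR's copy of tea), not the plotting code used by ENMCA.

```r
# Sketch only: individuals coloured by cluster, plus cluster sizes.
library(FactoMineR)

data(tea)
res.mca  <- MCA(tea[, 1:18], graph = FALSE)
res.hcpc <- HCPC(res.mca, nb.clust = 3, graph = FALSE)   # 3 clusters: arbitrary

# First two MCA dimensions for each individual
coords <- res.mca$ind$coord[, 1:2]

# Align the cluster labels with the coordinates by row name
clust <- res.hcpc$data.clust[rownames(coords), "clust"]

# Scatterplot of individuals, one colour per cluster
plot(coords, col = as.integer(clust), pch = 19,
     xlab = "Dim 1", ylab = "Dim 2", main = "Individuals by cluster")
legend("topright", legend = levels(clust),
       col = seq_along(levels(clust)), pch = 19, title = "Cluster")

# Bar plot of the number of individuals in each cluster
barplot(table(clust), xlab = "Cluster", ylab = "Number of individuals")
```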
For the ten variables most strongly linked to the cluster variable, a bar plot is displayed showing how the categories of each variable are distributed across the clusters. Here is an example with the variables "location.of.purchase" and "shape":
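The text does not say how the link between a variable and the cluster variable is measured. One plausible way to rank variables, shown here purely as an assumption, is the p-value of a chi-square test between each variable and the cluster factor. The helper rank_by_link() and the toy data are hypothetical and use base R only.

```r
# Hypothetical helper: rank categorical variables by the strength of their
# association with a cluster factor, using the chi-square test p-value
# (smallest p-value first). The criterion is an assumption.
rank_by_link <- function(data, clust, n = 10) {
  pvals <- vapply(
    data,
    function(x) suppressWarnings(chisq.test(table(x, clust))$p.value),
    numeric(1)
  )
  head(sort(pvals), n)
}

# Toy data invented for illustration
set.seed(1)
clust <- factor(rep(c("1", "2"), each = 50))
toy <- data.frame(
  shape = factor(ifelse(clust == "1", "bag", "loose")),        # tied to the cluster
  sugar = factor(sample(c("yes", "no"), 100, replace = TRUE))  # unrelated to it
)

top <- rank_by_link(toy, clust, n = 10)
names(top)   # variables ordered from most to least linked

# Bar plot of the categories of the most linked variable, split by cluster
barplot(table(toy[[names(top)[1]]], clust), beside = TRUE,
        legend.text = TRUE, xlab = "Cluster", ylab = "Number of individuals")
```

With real results, the data frame of active variables and the cluster variable returned by the analysis would take the place of toy and clust.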