Multiple Correspondence Analysis

When individuals are described by categorical variables, Multiple Correspondence Analysis (MCA) is an appropriate multivariate method for summarizing the pattern of relationships between those variables.

The ENMCA function works as a chain:

  • First, an MCA is performed. If the dataset has no missing values, the MCA function from the FactoMineR package is used; otherwise a function called missmca(), which performs MCA when missing values are present, is used.
  • Then, the results of the MCA are used to perform a cluster analysis.
  • Finally, the outputs contain a variable created by the clustering process that indicates which cluster each individual belongs to, a plot where the individuals are coloured according to their cluster, and other numerical results.
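The chain above can be approximated by calling FactoMineR directly. This is only a sketch under stated assumptions: HCPC() is used here as a stand-in for the clustering step, the number of clusters is fixed at four instead of being chosen interactively, and the data are assumed to contain no missing values. The internals of ENMCA may differ.

```r
# Sketch only: MCA followed by clustering, using FactoMineR directly.
# Assumption: HCPC() stands in for the clustering step of ENMCA.
library(FactoMineR)

data(tea)
active <- tea[, 1:18]   # the categorical variables used for the MCA

# Step 1: MCA (this sketch assumes there are no missing values)
res.mca <- MCA(active, graph = FALSE)

# Step 2: hierarchical clustering on the MCA results.
# nb.clust = 4 fixes the number of clusters instead of asking the user.
res.hcpc <- HCPC(res.mca, nb.clust = 4, graph = FALSE)

# Step 3: the data with the cluster variable appended as column "clust"
clustered <- res.hcpc$data.clust
table(clustered$clust)  # number of individuals in each cluster
```

The column clust plays the role of the cluster variable described in the last step.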

library(EnQuireR)  # assumption: ENMCA() is provided by the EnQuireR package
data(tea)
# tea[, 1:18]: the columns of the tea data set (i.e. the categorical variables) on which to perform the MCA
res.enmca <- ENMCA(tea[, 1:18])

First, a hierarchical tree appears. The user has to choose the number of clusters:

[Figures: hierarchical tree; hierarchical tree, choice of four clusters]
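Choosing a number of clusters amounts to cutting the tree at a given level. A minimal base R sketch of that idea is given here, on hypothetical toy coordinates standing in for the individuals' MCA coordinates; Ward's method is an assumption, not necessarily what the function uses.

```r
# Toy coordinates (hypothetical data) standing in for MCA coordinates
set.seed(1)
coords <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                matrix(rnorm(40, mean = 5), ncol = 2))

# Build the hierarchical tree (Ward's criterion on Euclidean distances)
tree <- hclust(dist(coords), method = "ward.D2")

# Draw the tree and outline the four clusters obtained by cutting it
plot(tree)
rect.hclust(tree, k = 4)

# Cut the tree into four clusters: one cluster label per individual
clusters <- cutree(tree, k = 4)
table(clusters)
```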

Then the MCA is performed. We get the usual numerical and graphical outputs, such as scatterplots of the individuals and of the variables:

[Figures: scatterplot of the individuals; scatterplot of the variables]

We also get a scatterplot where individuals are coloured according to the cluster they belong to, and a bar plot showing the number of individuals in each cluster:

[Figures: scatterplot of the individuals coloured by cluster; number of individuals in each cluster]

For the ten variables most strongly linked to the cluster variable, a bar plot is displayed showing the categories of those variables depending on the clusters. Here is an example with the variables "location.of.purchase" and "shape":

[Figures: bar plot of the variable "location.of.purchase" depending on the cluster variable; bar plot of the variable "shape" depending on the cluster variable]
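One plausible way to rank variables by their link to the cluster variable is a chi-square test of independence between each variable and the cluster. The sketch below uses base R on hypothetical toy data; the text does not say which criterion the function actually uses, so the chi-square ranking is an assumption.

```r
# Hypothetical toy data: a cluster variable and two categorical variables
set.seed(42)
n <- 200
cluster <- factor(sample(1:3, n, replace = TRUE))
# "linked" depends on the cluster, "noise" does not
linked <- factor(ifelse(cluster == 1, "a",
                        sample(c("b", "c"), n, replace = TRUE)))
noise <- factor(sample(c("x", "y"), n, replace = TRUE))
vars <- data.frame(linked, noise)

# Chi-square test of each variable against the cluster variable
p.values <- sapply(vars, function(v) chisq.test(table(v, cluster))$p.value)

# Keep (at most) the ten variables with the smallest p-values
top <- head(names(sort(p.values)), 10)

# Bar plot of the categories of the most linked variable, by cluster
barplot(table(vars[[top[1]]], cluster), beside = TRUE,
        legend.text = TRUE, xlab = "Cluster")
```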