A number of statistical procedures are available to identify homogeneous subtypes for the development of empirical typologies. Important considerations in the selection of a statistical procedure are the size of the data set, the value of classifying all cases, the relative importance of working with smaller rather than larger numbers of subtypes, and the need to confirm or reject subtypes reported in the literature.
Cluster analysis typically focuses on patterns of individual symptom clustering (e.g., syndrome manifestation). Most investigators apply cluster analysis to cases rather than to attributes. One advantage of empirical clustering techniques such as the k-means procedure is that all cases can be classified; in addition, the method tends to favor the identification of a small number of groups (e.g., 2-5) rather than a larger number.
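As a rough illustration of how k-means classifies every case, the following is a minimal sketch of the standard iterative procedure (Lloyd's algorithm) using NumPy. The function name `kmeans`, the simulated symptom scores, and the choice of three groups are assumptions made for this example only; they do not come from any study discussed here.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct cases drawn at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign every case to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned cases;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: 60 cases scored on 4 symptom scales, simulated from 3 profiles.
rng = np.random.default_rng(42)
profiles = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 0, 5, 5]], dtype=float)
X = np.vstack([p + rng.normal(scale=0.5, size=(20, 4)) for p in profiles])

labels, centroids = kmeans(X, k=3)
```

Every row of `X` receives exactly one label, which reflects the property that no case is left unclassified. The result depends on the starting centroids, so in practice the procedure is usually repeated from several random starts and the number of groups is varied.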
In addition to cluster analysis, a variety of alternative procedures are available for representing structure. DelBoca (1994) has argued that nonmetric multidimensional scaling (MDS) can be useful for identifying the major dimensions along which members of a particular heterogeneous group can be ranked. This approach is particularly suited to finding a relatively small number of important dimensions that underlie the similarities or differences among cases or attributes (“objects”). Based on the degree of similarity or dissimilarity between each pair of objects, the procedure arranges the objects in n-dimensional space. The reference axes in the resulting MDS spatial configuration are arbitrary, but multiple regression can be used to fit substantive dimensions in the space.
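The two steps described above, placing objects in a low-dimensional space from pairwise dissimilarities and then fitting a substantive dimension by regression, can be sketched as follows. Note that this sketch substitutes classical (Torgerson) metric scaling for the nonmetric procedure, because it can be written compactly with NumPy alone; nonmetric MDS would use only the rank order of the dissimilarities. The function names, the simulated objects, and the `severity` rating are hypothetical.

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) scaling of a symmetric dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]   # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale             # one row of coordinates per object

def fit_property(coords, ratings):
    """Regress an external rating on the MDS coordinates (property fitting)."""
    A = np.column_stack([np.ones(len(coords)), coords])
    beta, *_ = np.linalg.lstsq(A, ratings, rcond=None)
    fitted = A @ beta
    r2 = 1 - ((ratings - fitted) ** 2).sum() / ((ratings - ratings.mean()) ** 2).sum()
    direction = beta[1:] / np.linalg.norm(beta[1:])  # unit vector of the fitted dimension
    return direction, r2

# Hypothetical data: 12 objects with known 2-D positions; in practice only D is observed.
rng = np.random.default_rng(0)
true_pos = rng.normal(size=(12, 2))
D = np.linalg.norm(true_pos[:, None, :] - true_pos[None, :, :], axis=2)

coords = classical_mds(D, n_dims=2)

# Hypothetical external rating that varies along one direction of the true space.
severity = 2.0 * true_pos[:, 0] - 1.0 * true_pos[:, 1]
direction, r2 = fit_property(coords, severity)
```

The recovered axes are rotated and possibly reflected relative to the original ones, which is why they carry no meaning by themselves. The regression weights in `direction` indicate how the external rating is oriented in the configuration, and `r2` indicates how well the space accounts for that rating.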
Latent class analysis (LCA) is a multivariate statistical technique used to explore the structure and number of unobserved subgroups. LCA assumes that qualitatively meaningful groups (or classes) exist in a population and that symptom frequency can be explained by a small number of mutually exclusive classes, each with a distinct profile of item endorsement probabilities. Another important assumption is local independence: the observed variables are statistically independent conditional on class membership.
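To make the independence assumption concrete, the sketch below computes the probability of a response pattern within each class as a simple product of item probabilities, then uses Bayes' rule to obtain the probability of class membership. The two-class model, the class prevalences, and the endorsement probabilities are hypothetical values chosen for illustration; in a real analysis they would be estimated from data, typically by maximum likelihood.

```python
import numpy as np

# Hypothetical parameters for a 2-class model with 4 binary symptom items.
class_prev = np.array([0.7, 0.3])                 # P(class)
endorse = np.array([[0.1, 0.2, 0.1, 0.3],         # P(item endorsed | class 0)
                    [0.8, 0.9, 0.7, 0.6]])        # P(item endorsed | class 1)

def posterior(pattern, class_prev, endorse):
    """Posterior class probabilities for one binary response pattern."""
    pattern = np.asarray(pattern)
    # Independence within a class: the pattern probability is the product
    # of the item-level probabilities for that class.
    lik = np.prod(endorse ** pattern * (1 - endorse) ** (1 - pattern), axis=1)
    joint = class_prev * lik                      # P(class) * P(pattern | class)
    return joint / joint.sum()                    # Bayes' rule

post_none = posterior([0, 0, 0, 0], class_prev, endorse)  # no items endorsed
post_all = posterior([1, 1, 1, 1], class_prev, endorse)   # all items endorsed
```

With these values, a case endorsing no items is assigned almost entirely to class 0, and a case endorsing all items almost entirely to class 1. Each case still belongs to exactly one class in the model; the posterior only expresses uncertainty about which one.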
Each approach has its strengths and limitations. When there are many different variables, possibly in different categories (exogenous, endogenous, etc.), multidimensional scaling might be the method of choice for more complicated modeling.