dendextend is an R package for creating and comparing visually appealing tree diagrams. dendextend provides utility functions for manipulating dendrogram objects (their color, shape, and content) as well as several advanced methods for comparing trees to one another (both statistically and visually). As such, dendextend offers a flexible framework for enhancing R’s rich ecosystem of packages for performing hierarchical clustering of items.
Hierarchical cluster analysis (HCA) is a widely used classificatory technique in many areas of scientific knowledge. Applications usually yield a dendrogram from an HCA run over a given data set, using a grouping algorithm and a similarity measure. However, even when these parameters are fixed, ties in proximity (i.e., two clusters that are equidistant from a third one) may produce several different dendrograms with different clustering patterns, and hence different classifications. This situation is usually disregarded and conclusions are based on a single result, which raises the question of whether the clusters persist across all the resulting dendrograms. This happens, for example, when HCA is used to group molecular descriptors in order to select the least similar ones in QSAR studies.
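The effect of ties can be illustrated with a small sketch. The code below is not taken from the paper: it assumes complete linkage as the grouping algorithm, and the function name `enumerate_dendrograms` and the three-point data set are hypothetical. It follows every tied merge instead of picking one, so it is only practical for tiny inputs.

```python
from itertools import combinations


def enumerate_dendrograms(points, dist):
    """Collect every hierarchy complete linkage can produce when each
    tie in proximity is broken in every possible way.

    A hierarchy is represented as a frozenset of the non-singleton
    clusters created along the way (hypothetical representation).
    """
    results = set()

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(dist(i, j) for i in a for j in b)

    def recurse(clusters, merged):
        if len(clusters) == 1:
            results.add(frozenset(merged))
            return
        pairs = list(combinations(clusters, 2))
        best = min(linkage(a, b) for a, b in pairs)
        for a, b in pairs:
            if linkage(a, b) == best:  # follow every tied choice
                rest = [c for c in clusters if c != a and c != b]
                new = a | b
                recurse(rest + [new], merged + [new])

    recurse([frozenset([p]) for p in points], [])
    return results


# Three points on a line: B is equidistant from A and C, so the first merge is tied.
coords = {"A": 0, "B": 1, "C": 2}


def line_distance(i, j):
    return abs(coords[i] - coords[j])


trees = enumerate_dendrograms(list(coords), line_distance)
for tree in trees:
    print(sorted(sorted(cluster) for cluster in tree))
```

With this data, one run groups A with B first and the other groups B with C first, so the same algorithm and distance measure yield two different classifications. Integer distances are used so that the equality test for ties is exact; with floating-point distances a tolerance would be needed.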
- Journal of bioinformatics and computational biology
- Published over 4 years ago
Hierarchical clustering is extensively used in the bioinformatics community to analyze biomedical data. These data are often tagged with class labels, e.g. disease subtypes or gene ontology (GO) terms. Heatmaps combined with dendrograms are the common standard for visualizing the results of hierarchical clustering. The heatmap can be enriched by an additional color bar at the side, indicating for each instance in the data set the class to which it belongs. In the ideal case, when the clustering matches the classes perfectly, one would expect instances from the same class to cluster together and the color bar to consist of well-separated color blocks without frequent alternation of colors (classes). But even when instances from the same class cluster perfectly together, the dendrogram might not reflect this important aspect, because its representation is not unique. In this paper, we propose a leaf ordering algorithm for the dendrogram that, while preserving the hierarchical clustering result, tries to group instances from the same class together. It is based on dynamic programming, which can efficiently compute an optimal or nearly optimal order consistent with the structure of the tree.
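The general idea can be sketched as follows. This is not the algorithm from the paper, only a minimal illustration under stated assumptions: the tree is binary and given as nested tuples, the only allowed operation is swapping the two children of an internal node, and the objective is the number of class changes between adjacent leaves. The function name `order_leaves` and the sample labels are hypothetical.

```python
def order_leaves(tree, label):
    """Return (changes, order): a leaf order consistent with the tree that
    minimises the number of class changes between neighbouring leaves.

    tree  : a leaf name (str) or a 2-tuple of subtrees
    label : dict mapping each leaf name to its class
    """

    def solve(node):
        # For each (first class, last class) pair that a subtree can show,
        # keep the cheapest ordering found: {(first, last): (cost, order)}.
        if isinstance(node, str):
            c = label[node]
            return {(c, c): (0, [node])}
        left, right = solve(node[0]), solve(node[1])
        table = {}
        # Try both child orders: left before right, and right before left.
        for first, second in ((left, right), (right, left)):
            for (a_first, a_last), (a_cost, a_order) in first.items():
                for (b_first, b_last), (b_cost, b_order) in second.items():
                    # One extra change if the classes differ at the junction.
                    cost = a_cost + b_cost + (a_last != b_first)
                    key = (a_first, b_last)
                    if key not in table or cost < table[key][0]:
                        table[key] = (cost, a_order + b_order)
        return table

    return min(solve(tree).values(), key=lambda entry: entry[0])


# Hypothetical example: the given order a, b, c, d alternates classes three times.
tree = (("a", "b"), ("c", "d"))
label = {"a": "tumor", "b": "normal", "c": "tumor", "d": "normal"}

cost, order = order_leaves(tree, label)
print(cost, order, [label[leaf] for leaf in order])
```

Because each subtree only reports its best ordering per pair of boundary classes, the table at every node has at most K × K entries for K classes, which is what keeps the search tractable despite the 2^(n-1) possible flip combinations. Storing the full order in every entry is done here for readability; a leaner version would store back-pointers instead.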