Cluster analysis (in marketing)

Cluster analysis is a class of statistical techniques that can be applied to data that exhibit “natural” groupings. Cluster analysis sorts through the raw data and groups them into clusters. A cluster is a group of relatively homogeneous cases or observations. Objects in a cluster are similar to each other. They are also dissimilar to objects outside the cluster, particularly objects in other clusters.

The diagram below illustrates the results of a survey that studied drinkers’ perceptions of spirits (alcohol). Each point represents the results from one respondent. The research indicates there are four clusters in this market.

[Figure: Perceptual map, illustration of clusters]

Another example is the vacation travel market. Recent research has identified three clusters or market segments: 1) the demanders, who want exceptional service and expect to be pampered; 2) the escapists, who want to get away and just relax; and 3) the educationalists, who want to see new things, go to museums, go on a safari, or experience new cultures.

Cluster analysis, like factor analysis and multi-dimensional scaling, is an interdependence technique: it makes no distinction between dependent and independent variables. The entire set of interdependent relationships is examined. It is similar to multi-dimensional scaling in that both examine inter-object similarity by examining the complete set of interdependent relationships. The difference is that multi-dimensional scaling identifies underlying dimensions, while cluster analysis identifies clusters. Cluster analysis is the obverse of factor analysis. Whereas factor analysis reduces the number of variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations or cases by grouping them into a smaller set of clusters.


In marketing, cluster analysis is used for

Basic procedure

  1. Formulate the problem - select the variables to which you wish to apply the clustering technique
  2. Select a distance measure - various ways of computing distance:
    • Squared Euclidean distance - the sum of the squared differences in value for each variable
    • Manhattan distance - the sum of the absolute differences in value for each variable
    • Chebyshev distance - the maximum absolute difference in values for any variable
    • Mahalanobis distance - this measure weights the differences between observations by the inverse of the covariance matrix of the variables, so it takes the correlations between variables into account. This is an important measure since it is unit invariant (can figuratively compare apples to oranges)
  3. Select a clustering procedure (see below)
  4. Decide on the number of clusters
  5. Map and interpret clusters - draw conclusions - illustrative techniques like perceptual maps, icicle plots, and dendrograms are useful
  6. Assess reliability and validity - various methods:
    • repeat the analysis using a different distance measure
    • repeat the analysis using a different clustering technique
    • split the data randomly into two halves and analyze each half separately
    • repeat the analysis several times, deleting one variable each time
    • repeat the analysis several times, using a different order of cases each time
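
The first three distance measures in step 2 can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the two respondents' ratings are hypothetical, not taken from any survey. Mahalanobis distance is left out because it additionally needs a covariance matrix estimated from the data.

```python
def squared_euclidean(a, b):
    """Sum of the squared differences in value for each variable."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of the absolute differences in value for each variable."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Maximum absolute difference in value for any variable."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical ratings by two respondents on two variables.
respondent_a = (1.0, 5.0)
respondent_b = (4.0, 1.0)

print(squared_euclidean(respondent_a, respondent_b))  # 3^2 + 4^2 = 25.0
print(manhattan(respondent_a, respondent_b))          # 3 + 4 = 7.0
print(chebyshev(respondent_a, respondent_b))          # max(3, 4) = 4.0
```

The three measures disagree on how far apart the same two respondents are, which is one reason step 6 suggests repeating the analysis with a different distance measure.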

Clustering procedures

There are several types of clustering methods:

  • Non-Hierarchical clustering (often referred to as k-means clustering)
    • first determine a cluster center, then group all objects that are within a certain distance
    • examples:
      • Sequential Threshold method - first determine a cluster center, then group all objects that are within a predetermined threshold from the center - one cluster is created at a time
      • Parallel Threshold method - simultaneously several cluster centers are determined, then objects that are within a predetermined threshold from the centers are grouped
      • Optimizing Partitioning method - first a non-hierarchical procedure is run, then objects are reassigned so as to optimize an overall criterion.
  • Hierarchical clustering
    • objects are organized into a hierarchical structure as part of the procedure
    • examples:
      • Divisive clustering - start by treating all objects as if they are part of a single large cluster, then divide the cluster into smaller and smaller clusters
      • Agglomerative clustering - start by treating each object as a separate cluster, then group them into bigger and bigger clusters
        • examples:
          • Centroid methods - the distance between two clusters is the distance between their centers, and the two clusters with the closest centers are merged at each step (a centroid is the mean value for all the objects in the cluster)
          • Variance methods - clusters are generated that minimize the within-cluster variance
            • example:
              • Ward’s Procedure - at each step, the two clusters are merged whose combination gives the smallest increase in the total squared Euclidean distance of objects to their cluster means
          • Linkage methods - cluster objects based on the distance between them
            • examples:
              • Single Linkage method - cluster objects based on the minimum distance between them (also called the nearest neighbour rule)
              • Complete Linkage method - cluster objects based on the maximum distance between them (also called the furthest neighbour rule)
              • Average Linkage method - cluster objects based on the average distance between all pairs of objects, where the two members of each pair come from different clusters
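
Two of the procedures above can be sketched in plain Python: the sequential threshold method and agglomerative clustering with the three linkage rules. This is a rough illustration under stated assumptions, not a reference implementation: the function names and sample points are hypothetical, plain Euclidean distance is used, and the sequential threshold sketch simply takes the first unassigned object as the next cluster center.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def sequential_threshold(points, threshold):
    """Create one cluster at a time around a chosen center."""
    unassigned = list(points)
    clusters = []
    while unassigned:
        center = unassigned[0]  # assumption: first remaining object is the center
        clusters.append([p for p in unassigned if dist(p, center) <= threshold])
        unassigned = [p for p in unassigned if dist(p, center) > threshold]
    return clusters


def cluster_distance(c1, c2, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    pair_distances = [dist(p, q) for p in c1 for q in c2]
    if linkage == "single":    # nearest neighbour rule
        return min(pair_distances)
    if linkage == "complete":  # furthest neighbour rule
        return max(pair_distances)
    if linkage == "average":   # mean over all between-cluster pairs
        return sum(pair_distances) / len(pair_distances)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerative(points, n_clusters, linkage="single"):
    """Start with one cluster per object and merge the closest pair each step."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical objects measured on two variables.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

print(sequential_threshold(points, threshold=2.0))
print(agglomerative(points, n_clusters=3, linkage="average"))
```

On well-separated data like this, both procedures recover the same three groups. On noisier data the linkage rules can diverge, and the sequential threshold result depends on the order of the cases.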

Wikimedia Foundation. 2010.
