In this step, we clustered consumers into groups using five variables: purpose of visit, brand of interest, gender, age, and income. All five are categorical or ordinal, which rules out clustering algorithms built on numerical distance metrics such as Lp-norm distance, cosine similarity, or Pearson correlation. We considered two methods to address this problem. The first was Multiple Correspondence Analysis (MCA), which embeds the categorical variables in a numeric space where standard distance metrics apply.
library(FactoMineR)
# MCA with 20 retained dimensions; graph = FALSE suppresses the default plots
data.mca = MCA(data, ncp = 20, graph = FALSE)
# variance explained by each dimension
data.mca$eig
## eigenvalue percentage of variance cumulative percentage of variance
## dim 1 0.3177 6.354 6.354
## dim 2 0.2347 4.693 11.047
## dim 3 0.2234 4.467 15.515
## dim 4 0.2155 4.310 19.824
## dim 5 0.2105 4.210 24.034
## dim 6 0.2081 4.163 28.197
## dim 7 0.2058 4.115 32.312
## dim 8 0.2056 4.113 36.425
## dim 9 0.2038 4.075 40.500
## dim 10 0.2026 4.052 44.551
## dim 11 0.2010 4.020 48.572
## dim 12 0.2004 4.008 52.579
## dim 13 0.1998 3.995 56.575
## dim 14 0.1985 3.970 60.545
## dim 15 0.1971 3.941 64.486
## dim 16 0.1962 3.924 68.410
## dim 17 0.1952 3.904 72.314
## dim 18 0.1950 3.899 76.214
## dim 19 0.1925 3.851 80.064
## dim 20 0.1872 3.744 83.808
## dim 21 0.1838 3.675 87.484
## dim 22 0.1770 3.539 91.023
## dim 23 0.1698 3.395 94.418
## dim 24 0.1557 3.113 97.532
## dim 25 0.1234 2.468 100.000
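The MCA row coordinates could, in principle, feed any numeric clustering method. A minimal sketch of that route, assuming the FactoMineR output above (k-means and k = 11 are purely illustrative choices, not the method used in this report):
# numeric coordinates of each observation on the 20 retained dimensions
coords = data.mca$ind$coord
# illustrative k-means on the MCA coordinates (k = 11 is an assumption)
km = kmeans(coords, centers = 11, nstart = 20)
table(km$cluster)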
However, the eigenvalues decay very slowly: the first dimension explains only 6.35% of the variance, and even 20 dimensions cover only about 84%. Since no low-dimensional numeric representation captures the data well, we abandoned MCA in favor of the second method: the Gower dissimilarity coefficient (Gower, 1971), which handles mixed categorical and ordinal variables directly. The three example sessions below illustrate how it behaves.
options(width = 1200)
data[c(1, 2, 6), ]
## Purpose Brand Gender Age Income
## 2 Shopping for a new vehicle Unknown Male 55 to 64 $45,000 - $54,999
## 4 Get Service info or Schedule Servi Unknown Male 25 to 34 Over $100,000
## 10 Just browsing Unknown Male 25 to 34 $45,000 - $54,999
library(StatMatch)
# Gower dissimilarity between the first and third example rows
gower.dist(data[1, ], data[6, ])
## [,1]
## [1,] 0.4
# Gower dissimilarity between the first and second example rows
gower.dist(data[1, ], data[2, ])
## [,1]
## [1,] 0.6
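Because all five variables here are categorical, the Gower dissimilarity reduces to simple matching: the proportion of variables on which two rows disagree. Rows 1 and 6 differ only in Purpose and Age (2/5 = 0.4), while rows 1 and 2 also differ in Income (3/5 = 0.6). A minimal hand-rolled sketch of that special case (the helper name is ours; use daisy or gower.dist for mixed numeric/categorical data):
# simple-matching version of Gower for all-categorical rows (hypothetical helper)
gower_categorical = function(x, y) {
  mean(mapply(function(a, b) as.character(a) != as.character(b), x, y))
}
gower_categorical(data[1, ], data[6, ])  # 2 of 5 variables differ -> 0.4
gower_categorical(data[1, ], data[2, ])  # 3 of 5 variables differ -> 0.6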
We used the daisy function from the cluster package to compute a dissimilarity matrix containing the pairwise Gower distances among the 31,311 (out of 50,203) data points with a unique SessionID; observations with NA values were removed to handle missing data. With the dissimilarity matrix for all pairs of observations in hand, we examined clustering algorithms such as PAM (Partitioning Around Medoids) and hierarchical clustering, and chose the hierarchical method because its dendrogram is a more persuasive visualization than PAM's output.
library(cluster)
# distance matrix with the Gower dissimilarity function
demo.agr = daisy(data, metric = "gower")
# agglomerative clustering; complete and average linkage were also tried,
# but Ward linkage was the chosen method ("ward" was renamed "ward.D" in R >= 3.1.0)
clust.ward = hclust(demo.agr, method = "ward.D")
plot(clust.ward, labels = FALSE)
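For comparison, PAM runs directly on the same precomputed dissimilarity matrix; a minimal sketch, with k = 11 chosen here only to match the final hierarchical solution:
# PAM on the precomputed Gower dissimilarities
clust.pam = pam(demo.agr, k = 11, diss = TRUE)
table(clust.pam$clustering)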
We used the average Silhouette width as the criterion for choosing the number of clusters. The Silhouette width of an observation measures how close it is to its own cluster and how far it is from the other clusters: a value near 1 means the observation almost certainly belongs to its cluster, whereas a value near -1 means it almost certainly does not. Our objective was to find the number of clusters K that maximizes the average Silhouette width over all data points. As Figure 1 shows, K = 11 is the best choice in terms of both the average Silhouette width and manageable cluster sizes.
# choose the number of clusters by average Silhouette width
avg_widths = rep(0, 20)
for (i in 2:20) {
  clust = cutree(clust.ward, k = i)
  s = silhouette(clust, demo.agr)
  avg_widths[i] = summary(s)$avg.width
}
# Figure 1: average Silhouette width against the number of clusters
plot(2:20, avg_widths[2:20], ylab = "Silhouette Average Width",
     xlab = "Number of clusters", col = "purple")
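With K = 11 settled, the final assignment is a single cutree call; a short sketch of extracting and sanity-checking the clusters:
# cut the dendrogram at the chosen number of clusters
final.clust = cutree(clust.ward, k = 11)
# cluster sizes
table(final.clust)
# average Silhouette width at K = 11
summary(silhouette(final.clust, demo.agr))$avg.width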
| Classifier                      | Failed Task Completion (other) |            | Site Change Suggestions |            |
|                                 | Before PLDA                    | After PLDA | Before PLDA             | After PLDA |
| Multinomial Naïve Bayes         | 44.6%                          | 60.9%      | 56.6%                   | 66.8%      |
| Multinomial Logistic Regression | 50.4%                          | 65.8%      | 58.0%                   | 67.9%      |
[Figure: Topic Analysis Process]
Bellman, Richard. “Dynamic programming and Lagrange multipliers.” Proceedings of the National Academy of Sciences of the United States of America 42.10 (1956): 767.
Gower, John C. “A general coefficient of similarity and some of its properties.” Biometrics (1971): 857-871.