Walking a hierarchical tree
I want to be able to "walk" (iterate) through a hierarchical cluster (see the figure and code below). What I want is:
A function that takes a matrix and a minimum height, say 10 in this example.
splitme <- function(matrix, minH){
##Some code
}
Starting from the top and going down to minH, cut whenever there is a new split. This is the first problem: how do I detect a new split and get its height h?
At this particular h, how many clusters are there? Retrieve the clusters:
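mycl <- cutree(hr, h = x)       # hr is the hclust object, x is the height h found above
count <- length(unique(mycl))   # number of clusters; my original count(mycl) was bad code (count() is not base R)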
Save each of the new matrices in variable(s). This is another hard one: dynamically creating x new matrices. So perhaps a function that takes the clusters, does what needs to be done (comparisons), and returns a variable?
Repeat steps 3 and 4 until minH is reached.
Figure
Code
# Generate data
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)
# Create dendrogram
d <- dist(data)
hc <- as.dendrogram(hclust(d))
# Function to color branches
colbranches <- function(n, col)
{
  a <- attributes(n) # Find the attributes of current node
  # Color edges with requested color
  attr(n, "edgePar") <- c(a$edgePar, list(col=col, lwd=2))
  n # Don't forget to return the node!
}
# Color the first sub-branch of the first branch in red,
# the second sub-branch in orange and the second branch in blue
hc[[1]][[1]] = dendrapply(hc[[1]][[1]], colbranches, "red")
hc[[1]][[2]] = dendrapply(hc[[1]][[2]], colbranches, "orange")
hc[[2]] = dendrapply(hc[[2]], colbranches, "blue")
# Plot
plot(hc)
I think what you essentially need is the cophenetic distance matrix of the dendrogram, as returned by cophenetic(). Its distinct values are the heights of all splitting points, and from there you can easily walk through the tree. I made an attempt below that stores all submatrices in a list called submatrices. It's a nested list: the first level has one element per splitting point, and the second level holds the submatrices obtained at that splitting point. Each stored element is the dist object computed from the rows of one cluster. For example, if you want all submatrices from the 1st splitting point (grey and blue clusters), use submatrices[[1]]. If you want the first submatrix (red cluster) from submatrices[[1]], use submatrices[[1]][[1]]; single brackets, submatrices[[1]][1], would return a list of length one rather than the element itself.
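As a quick check of that claim, here is a small sketch on the question's data. The names hcl, coph_heights and n_clusters are mine, not from the question. It compares the distinct cophenetic distances with the merge heights stored in the hclust object and counts the clusters obtained by cutting at each of them. When no two heights are tied, cutting at the k-th largest height should give k clusters.

```r
# Same data as in the question
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)

hcl <- hclust(dist(data))

# Distinct cophenetic distances, largest (top split) first
coph_heights <- sort(unique(as.vector(cophenetic(hcl))), decreasing = TRUE)

# Compare them with the merge heights stored in the hclust object
same_heights <- all.equal(coph_heights,
                          sort(unique(hcl$height), decreasing = TRUE))
print(same_heights)

# Number of clusters obtained when cutting at each of those heights
n_clusters <- sapply(coph_heights,
                     function(h) length(unique(cutree(hcl, h = h))))
print(head(data.frame(height = coph_heights, clusters = n_clusters)))
```

This also answers the "how many clusters are there at h" step: length(unique(cutree(hcl, h = h))) is the cluster count at height h.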
splitme <- function(data, minH){
  ## Compute the distance matrix and the clustering
  d <- dist(data)
  cl <- hclust(d)
  ## Cophenetic distance matrix: its distinct values are the split heights.
  ## The values are left unrounded so that cutree() cuts exactly at a split.
  cdm <- cophenetic(cl)
  ## Heights of the splitting points (sps), top of the tree first
  sps <- sort(unique(as.vector(cdm)), decreasing = TRUE)
  ## This list stores all the submatrices; element n holds those extracted
  ## at the nth cut (the top splitting point is the 1st, the bottom one the last)
  submatrices <- list()
  ## Iterate/walk the dendrogram
  i <- 2 # Start from 2, as cutting at the 1st value returns the entire dendrogram as one cluster
  while(i <= length(sps) && sps[i] > minH){ # The length check avoids indexing past the last height
    membership <- cutree(cl, h = sps[i]) # Cut the tree at the splitting point
    lst <- list() # Submatrices extracted at this splitting point
    for(j in seq_len(max(membership))){
      member <- which(membership == j) # Rows of the data belonging to cluster j
      dm <- dist(data[member, , drop = FALSE]) # Distance matrix of that cluster
      lst <- append(lst, list(dm)) # Append it to the list for this splitting point
    }
    submatrices <- append(submatrices, list(lst)) # Append lst to the submatrices list
    i <- i + 1
  }
  return(submatrices)
}
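If you would rather get the data rows of each cluster instead of their distance matrices, the same walk can be sketched more compactly with split() and lapply(). This is only a sketch under my own assumptions: walk_splits and pieces are hypothetical names, and returning row sub-matrices rather than dist objects is a deliberate difference from the function above.

```r
# Hypothetical helper (not from the question or the answer above): walk the
# tree from the top and, for every cut height above minH, return the data rows
# of each cluster as its own matrix.
walk_splits <- function(data, minH) {
  cl <- hclust(dist(data))
  # Split heights above minH, top of the tree first
  heights <- sort(unique(cl$height), decreasing = TRUE)
  heights <- heights[heights > minH]
  # Cutting at the top height gives a single cluster, so skip it
  heights <- heights[-1]
  out <- lapply(heights, function(h) {
    membership <- cutree(cl, h = h)
    # One sub-matrix of rows per cluster
    lapply(split(seq_len(nrow(data)), membership),
           function(rows) data[rows, , drop = FALSE])
  })
  names(out) <- round(heights, 2) # label each cut by its (rounded) height
  out
}

# Usage on the data from the question
set.seed(12345)
desc.1 <- c(rnorm(10, 0, 1), rnorm(20, 10, 4))
desc.2 <- c(rnorm(5, 20, .5), rnorm(5, 5, 1.5), rnorm(20, 10, 2))
desc.3 <- c(rnorm(10, 3, .1), rnorm(15, 6, .2), rnorm(5, 5, .3))
data <- cbind(desc.1, desc.2, desc.3)

pieces <- walk_splits(data, minH = 10)
print(length(pieces))          # number of cuts made above minH
print(sapply(pieces, length))  # number of clusters at each cut
print(pieces[[1]][[1]])        # rows of the first cluster at the first cut
```

Because the result is an ordinary nested list, there is no need to create variables dynamically: pieces[[n]][[k]] is the k-th cluster at the n-th cut.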