Title: | Clustering Indices |
---|---|
Description: | Package providing functions for computing a collection of clustering validation or quality criteria and partition comparison indices. |
Authors: | Iago Giné-Vázquez [cre], Bernard Desgraupes [aut] |
Maintainer: | Iago Giné-Vázquez <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3.0 |
Built: | 2024-10-31 20:29:40 UTC |
Source: | https://gitlab.com/iagogv/clustercrit |
bestCriterion returns the position of the best index value according to a specified criterion.
bestCriterion(x, crit)
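x | Numeric vector of clustering quality index values, all computed with the same criterion. |
crit | Character string naming the criterion that was used to compute the values in x. |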
Given a vector of clustering quality index values computed with a given criterion, the function bestCriterion returns the position of the "best" one in the sense of the specified criterion.
Typically, a data set has been clustered several times (using different algorithms or different numbers of clusters) and a clustering index has been calculated each time: bestCriterion tells which value is considered the best according to the given clustering index. For instance, with the Calinski_Harabasz index, the best value is the largest one.
A list of all the supported criteria can be obtained with the getCriteriaNames function. The criterion name (crit argument) is case insensitive and can be abbreviated.
The index in vector x of the best value according to the criterion specified by the crit argument.
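As a rough illustration of what "best" means here: for many criteria the selection reduces to taking the position of the largest or of the smallest value. The helper best_position below is hypothetical and is not part of clusterCrit, and the larger_is_better flag is an assumption standing in for the per-criterion rule that bestCriterion applies. The sketch covers only this plain largest/smallest case, and the index values are made-up numbers.

```r
# Hypothetical helper (not part of clusterCrit): position of the "best"
# value under a simple largest-or-smallest rule.
best_position <- function(values, larger_is_better = TRUE) {
  stopifnot(is.numeric(values), length(values) > 0)
  if (larger_is_better) which.max(values) else which.min(values)
}

# Made-up index values for k = 2, ..., 6 clusters (illustration only)
ch_values <- c(120.4, 310.7, 250.2, 198.9, 170.3)

# Calinski_Harabasz: the best value is the largest one
pos <- best_position(ch_values, larger_is_better = TRUE)
cat("Best value is at position", pos, "\n")
```

The returned position indexes the vector of values, so when the values were computed for k in 2:6, the corresponding number of clusters is (2:6)[pos].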
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
See also: getCriteriaNames, intCriteria.
# Create some spherical data around three distinct centers
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 2, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 4, sd = 0.5), ncol = 2))
vals <- vector()
for (k in 2:6) {
  # Perform the kmeans algorithm
  cl <- kmeans(x, k)
  # Compute the Calinski_Harabasz index
  vals <- c(vals, as.numeric(intCriteria(x, cl$cluster, "Calinski_Harabasz")))
}
# Position of the best value according to the criterion
idx <- bestCriterion(vals, "Calinski_Harabasz")
cat("Best index value is", vals[idx], "\n")
Package: | clusterCrit |
Type: | Package |
Version: | 1.3.0 |
Date: | 2024-10-31 |
License: | GPL (>= 2) |
clusterCrit computes various clustering validation or quality criteria and partition comparison indices. Type library(help="clusterCrit") for more info about the available functions.
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
For more information about the algebraic background of clustering indices and their definitions, see the vignette accompanying this package. To display the vignette, type the following instruction in the R console:
> vignette("clusterCrit")
See also: extCriteria, getCriteriaNames, intCriteria, bestCriterion, concordance.
concordance calculates the concordance matrix between two partitions of the same data.
concordance(part1, part2)
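part1 | Vector giving the first partition (one cluster label per observation). |
part2 | Vector giving the second partition of the same observations. |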
Given two partitions, the function concordance calculates the number of pairs of points classified as belonging or not belonging to the same cluster with respect to partitions part1 and part2.
A 2x2 matrix of the form:
 | P1 | P2 |
---|---|---|
P1 | Nyy | Nyn |
P2 | Nny | Nnn |
where
Nyy is the number of pairs of points belonging to the same cluster in both part1 and part2,
Nyn is the number of pairs of points belonging to the same cluster in part1 but not in part2,
Nny is the number of pairs of points belonging to the same cluster in part2 but not in part1,
Nnn is the number of pairs of points belonging to different clusters in both part1 and part2.
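The four counts described above can be reproduced by hand in base R, which may help when checking the output on a small example. This is a minimal sketch, not the package's implementation: pair_counts is a hypothetical helper, and it assumes that each unordered pair of points is counted exactly once.

```r
# Hypothetical helper (not part of clusterCrit): count pairs of points by
# whether they share a cluster in each of two partitions.
pair_counts <- function(part1, part2) {
  stopifnot(length(part1) == length(part2))
  n <- length(part1)
  # Keep each unordered pair (i < j) once
  upper <- upper.tri(matrix(0, n, n))
  same1 <- outer(part1, part1, "==")[upper]
  same2 <- outer(part2, part2, "==")[upper]
  matrix(c(sum(same1 & same2),  sum(same1 & !same2),    # Nyy, Nyn
           sum(!same1 & same2), sum(!same1 & !same2)),  # Nny, Nnn
         nrow = 2, byrow = TRUE,
         dimnames = list(c("P1", "P2"), c("P1", "P2")))
}

# Tiny example: 4 points, hence choose(4, 2) = 6 pairs
counts <- pair_counts(c(1L, 1L, 2L, 2L), c(1L, 1L, 1L, 2L))
print(counts)
```

The four entries always sum to n(n-1)/2, the total number of pairs, which is a convenient sanity check.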
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
# Generate two artificial partitions
part1 <- sample(1:3, 150, replace = TRUE)
part2 <- sample(1:5, 150, replace = TRUE)
# Compute the table of concordances and discordances
concordance(part1, part2)
extCriteria calculates various external clustering comparison indices.
extCriteria(part1, part2, crit)
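part1 | Vector giving the first partition (one cluster label per observation). |
part2 | Vector giving the second partition of the same observations. |
crit | Character vector of the names of the external indices to compute. |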
The function extCriteria calculates external clustering indices in order to compare two partitions. The list of all the supported criteria can be obtained with the getCriteriaNames function.
The currently available indices are:
"Czekanowski_Dice"
"Folkes_Mallows"
"Hubert"
"Jaccard"
"Kulczynski"
"McNemar"
"Phi"
"Precision"
"Rand"
"Recall"
"Rogers_Tanimoto"
"Russel_Rao"
"Sokal_Sneath1"
"Sokal_Sneath2"
All the names are case insensitive and can be abbreviated. The keyword "all" can also be used as a shortcut to calculate all the external indices.
The partition vectors should not have empty subsets. No attempt is made to verify this.
A list containing the computed criteria, in the same order as in the crit argument.
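External indices of this kind are functions of the pair counts Nyy, Nyn, Nny and Nnn. The sketch below computes two of them by hand under their usual textbook definitions: Rand as (Nyy + Nnn) / (total number of pairs) and Jaccard as Nyy / (Nyy + Nyn + Nny). It is an illustration in base R, not the package's code, and whether extCriteria returns exactly these values is an assumption to check against the vignette.

```r
# Two small partitions of the same 4 points (integer labels)
part1 <- c(1L, 1L, 2L, 2L)
part2 <- c(1L, 1L, 1L, 2L)

# Pairwise "same cluster" indicators, each unordered pair taken once
n <- length(part1)
upper <- upper.tri(matrix(0, n, n))
same1 <- outer(part1, part1, "==")[upper]
same2 <- outer(part2, part2, "==")[upper]

nyy <- sum(same1 & same2)    # together in both partitions
nyn <- sum(same1 & !same2)   # together in part1 only
nny <- sum(!same1 & same2)   # together in part2 only
nnn <- sum(!same1 & !same2)  # separated in both partitions

# Textbook definitions (assumed to match the package's "Rand" and "Jaccard")
rand    <- (nyy + nnn) / (nyy + nyn + nny + nnn)
jaccard <- nyy / (nyy + nyn + nny)
cat("Rand:", rand, " Jaccard:", jaccard, "\n")
```

With the package loaded, the result of extCriteria(part1, part2, c("Rand", "Jaccard")) can be compared against these hand-computed values.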
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
See the bibliography at the end of the vignette.
See also: getCriteriaNames, intCriteria, bestCriterion, concordance.
# Generate two artificial partitions
part1 <- sample(1:3, 150, replace = TRUE)
part2 <- sample(1:5, 150, replace = TRUE)
# Compute all the external indices
extCriteria(part1, part2, "all")
# Compute some of them
extCriteria(part1, part2, c("Rand", "Folkes"))
# The names are case insensitive and can be abbreviated
extCriteria(part1, part2, c("ra", "fo"))
getCriteriaNames returns the available clustering criteria names.
getCriteriaNames(isInternal)
isInternal | Logical: TRUE to get the internal criteria names, FALSE to get the external ones. |
getCriteriaNames returns the names of the available internal or external clustering indices, depending on the isInternal logical argument.
The internal indices can be used in the crit argument of the intCriteria function, and the external indices similarly in the extCriteria function.
A character vector containing the supported criteria names.
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
See the bibliography at the end of the vignette.
See also: intCriteria, extCriteria, bestCriterion.
getCriteriaNames(TRUE)
getCriteriaNames(FALSE)
intCriteria calculates various internal clustering validation or quality criteria.
intCriteria(traj, part, crit)
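traj | Numeric matrix of the data (one row per observation). |
part | Vector giving the partition (one cluster label per observation). |
crit | Character vector of the names of the internal indices to compute. |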
The function intCriteria calculates internal clustering indices. The list of all the supported criteria can be obtained with the getCriteriaNames function.
The currently available indices are:
"Ball_Hall"
"Banfeld_Raftery"
"C_index"
"Calinski_Harabasz"
"Davies_Bouldin"
"Det_Ratio"
"Dunn"
"Gamma"
"G_plus"
"GDI11"
"GDI12"
"GDI13"
"GDI21"
"GDI22"
"GDI23"
"GDI31"
"GDI32"
"GDI33"
"GDI41"
"GDI42"
"GDI43"
"GDI51"
"GDI52"
"GDI53"
"Ksq_DetW"
"Log_Det_Ratio"
"Log_SS_Ratio"
"McClain_Rao"
"PBM"
"Point_Biserial"
"Ray_Turi"
"Ratkowsky_Lance"
"Scott_Symons"
"SD_Scat"
"SD_Dis"
"S_Dbw"
"Silhouette"
"Tau"
"Trace_W"
"Trace_WiB"
"Wemmert_Gancarski"
"Xie_Beni"
All the names are case insensitive and can be abbreviated. The keyword "all" can also be used as a shortcut to calculate all the internal indices.
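One way to picture case-insensitive, abbreviated names is unique prefix matching on lower-cased strings. The sketch below does this in base R with pmatch on a few names copied from the list above. The helper resolve_name is hypothetical, and the matching rule is an assumption for illustration, not a description of how the package resolves names internally.

```r
# A few criterion names copied from the list above
known <- c("C_index", "Calinski_Harabasz", "Davies_Bouldin", "Det_Ratio", "Dunn")

# Hypothetical helper (not part of clusterCrit): resolve a case-insensitive,
# possibly abbreviated name to a full name, or NA if there is no unique match.
resolve_name <- function(abbrev, candidates = known) {
  pos <- pmatch(tolower(abbrev), tolower(candidates))
  candidates[pos]
}

resolved <- resolve_name(c("det", "CAL", "dav"))
print(resolved)

# An ambiguous abbreviation matches several names and yields NA
ambiguous <- resolve_name("d")
print(ambiguous)
```

Under this rule an abbreviation must identify a single name, so very short prefixes shared by several criteria are not usable.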
The GDI (Generalized Dunn Indices) are designated by the following convention: GDImn, where the integers m (1<=m<=5) and n (1<=n<=3) correspond to the between-group and within-group distances respectively. See the vignette for a comprehensive definition of the various distances. GDI alone is a synonym for GDI11 and is the genuine Dunn index.
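The GDImn naming convention can be spelled out programmatically, which is handy when requesting all the generalized Dunn indices at once. A minimal sketch in base R, where the variable names are illustrative only:

```r
# m: between-group distance (1 to 5), n: within-group distance (1 to 3)
m <- 1:5
n <- 1:3

# Build "GDI" followed by m then n, with n varying fastest
gdi_names <- paste0("GDI", rep(m, each = length(n)), rep(n, times = length(m)))
print(gdi_names)
```

The resulting character vector holds the 15 names from GDI11 to GDI53 and can be passed as the crit argument of intCriteria.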
A list containing the computed criteria, in the same order as in the crit argument.
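To give a sense of what one of these criteria measures, the sketch below computes the Calinski-Harabasz index by hand under its usual definition, (BGSS / (K - 1)) / (WGSS / (N - K)), where BGSS and WGSS are the between-group and pooled within-group sums of squares, K is the number of clusters and N the number of observations. The function calinski_harabasz is a hypothetical helper written in base R and not the package's implementation. That intCriteria returns the same value is an assumption to verify against the vignette.

```r
set.seed(42)
# Two well-separated groups of 50 points in the plane
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 2, sd = 0.5), ncol = 2))
cl <- kmeans(x, 2)

# Hypothetical helper: Calinski-Harabasz index from data and cluster labels
calinski_harabasz <- function(data, labels) {
  n <- nrow(data)
  k <- length(unique(labels))
  centre <- colMeans(data)              # barycenter of all the points
  wgss <- 0
  bgss <- 0
  for (g in unique(labels)) {
    rows <- data[labels == g, , drop = FALSE]
    cg <- colMeans(rows)                # barycenter of cluster g
    wgss <- wgss + sum(sweep(rows, 2, cg)^2)
    bgss <- bgss + nrow(rows) * sum((cg - centre)^2)
  }
  (bgss / (k - 1)) / (wgss / (n - k))
}

ch <- calinski_harabasz(x, cl$cluster)
cat("Calinski-Harabasz (by hand):", ch, "\n")
```

With the package loaded, the value can be compared with intCriteria(x, cl$cluster, "Calinski_Harabasz").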
Bernard Desgraupes
[email protected]
University of Paris Ouest - Nanterre
Lab Modal'X (EA 3454)
See the bibliography at the end of the vignette.
See also: getCriteriaNames, extCriteria, bestCriterion.
# Create some data
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 2, sd = 0.5), ncol = 2))
# Perform the kmeans algorithm
cl <- kmeans(x, 3)
# Compute all the internal indices
intCriteria(x, cl$cluster, "all")
# Compute some of them
intCriteria(x, cl$cluster, c("C_index", "Calinski_Harabasz", "Dunn"))
# The names are case insensitive and can be abbreviated
intCriteria(x, cl$cluster, c("det", "cal", "dav"))