This module provides methods for clustering, histograms, and other statistical computations.
Adaptors are provided that allow easy clustering of points and of model configurations stored in IMP::ConfigurationSet objects, among other things.
Author(s): Keren Lasker, Daniel Russel
Version: SVN.r12662
License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Publications:
IMP and how to apply them to biological problems.
PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering(Embedding *embed, double side)
The space is gridded with bins of side length side, and all points that fall in the same grid bin are placed in the same cluster.
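The binning rule can be sketched in plain Python. This is an illustration only, not IMP's implementation: the function name bin_based_clustering and the list-of-tuples input are assumptions made for the example, and the grid is assumed to be anchored at the origin.

```python
import math
from collections import defaultdict


def bin_based_clustering(points, side):
    """Group point indices by the grid cell (bin) that contains each point.

    Hypothetical helper for illustration; the grid origin at 0 is an assumption.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    bins = defaultdict(list)
    for index, point in enumerate(points):
        # Integer cell coordinates: floor(coordinate / side) along each axis.
        cell = tuple(math.floor(c / side) for c in point)
        bins[cell].append(index)
    # Each occupied bin becomes one cluster of point indices.
    return list(bins.values())


points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (1.9, 0.1), (-0.5, 0.5)]
clusters = bin_based_clustering(points, side=1.0)
```

Points 0 and 1 share a bin, as do points 2 and 3, while point 4 falls in a bin of its own because of its negative x coordinate.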
PartitionalClustering* IMP::statistics::create_centrality_clustering(Metric *d, double far, int k)
Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates once there is a set number of connected components. Other termination criteria can be added if someone proposes them.
Only items closer than far are connected.
PartitionalClustering* IMP::statistics::create_centrality_clustering(Embedding *d, double far, int k)
Cluster by repeatedly removing edges that have many shortest paths passing through them. The process terminates once there is a set number of connected components. Other termination criteria can be added if someone proposes them.
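The edge-removal idea can be illustrated with a small pure-Python sketch. It rests on several assumptions that are not taken from IMP: edges are treated as unweighted, the single edge with the highest shortest-path count is removed at each step, k is read as the target number of connected components, and all function names are hypothetical.

```python
import math
from collections import deque


def build_graph(points, far):
    """Connect every pair of points closer than far."""
    adj = {i: set() for i in range(len(points))}
    for i in adj:
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < far:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def edge_betweenness(adj):
    """Shortest-path count per edge (Brandes-style accumulation, unweighted)."""
    score = {}
    for s in adj:
        # Breadth-first search from s, recording path counts and predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        depth = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in depth:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit among predecessors.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = (min(v, w), max(v, w))
                score[edge] = score.get(edge, 0.0) + credit
                delta[v] += credit
    return score


def components(adj):
    """Connected components as sorted lists of node indices."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v] - seen:
                seen.add(w)
                stack.append(w)
        result.append(sorted(comp))
    return sorted(result)


def centrality_clustering(points, far, k):
    """Remove high-betweenness edges until there are k components."""
    adj = build_graph(points, far)
    while len(components(adj)) < k:
        score = edge_betweenness(adj)
        if not score:
            break  # no edges left to remove
        v, w = max(score, key=score.get)
        adj[v].discard(w)
        adj[w].discard(v)
    return components(adj)


# Two triangles joined by a single bridge between points 1 and 3.
points = [(0, 0), (1, 0), (0.5, 0.8), (3, 0), (4, 0), (3.5, 0.8)]
clusters = centrality_clustering(points, far=2.5, k=2)
```

In this example the bridge carries every shortest path between the two triangles, so it is removed first and the two triangles become the two clusters.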
PartitionalClustering* IMP::statistics::create_connectivity_clustering(Metric *metric, double dist)
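Two points, $p_1$ and $p_n$, are in the same cluster if there is a sequence of points $(p_2, \dots, p_{n-1})$ such that $d(p_i, p_{i+1}) < \mathrm{dist}$ for all $i \in \{1, \dots, n-1\}$, where $d$ is the distance given by the metric.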
PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering(Embedding *embed, double dist)
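Two points, $p_1$ and $p_n$, are in the same cluster if there is a sequence of points $(p_2, \dots, p_{n-1})$ such that $\lVert p_i - p_{i+1} \rVert < \mathrm{dist}$ for all $i \in \{1, \dots, n-1\}$.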
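The chain condition amounts to taking connected components of the graph that links every pair of points closer than dist. A minimal sketch using a union-find structure follows; the function name, the Euclidean distance, and the strict comparison are assumptions made for the example and are not taken from IMP.

```python
import math


def connectivity_clustering(points, dist):
    """Cluster indices so that chains of links shorter than dist stay together."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < dist:
                parent[find(i)] = find(j)  # merge the two groups

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


points = [(0.0, 0.0), (0.9, 0.0), (1.8, 0.0), (5.0, 0.0), (5.5, 0.0)]
clusters = connectivity_clustering(points, dist=1.0)
```

Points 0 and 2 are 1.8 apart, farther than dist, yet they end up in the same cluster because point 1 links them.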
PartitionalClustering* IMP::statistics::create_diameter_clustering(Metric *d, double maximum_diameter)
Cluster the elements into clusters with at most the specified diameter.
IMP::statistics::create_lloyds_kmeans(Embedding *embedding, unsigned int k, unsigned int iterations)
Return a k-means clustering of all points contained in the embedding (i.e., those with indices in [0, embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but generally produce a better clustering.
The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.
Examples: basic optimization, kmeans, nup84 cg, nup84 rb, analyze 0
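For readers unfamiliar with Lloyd's algorithm, the following pure-Python sketch shows its two alternating steps. It is not IMP's code: the function name, the tuple-based points, and the choice to seed the centers with the first k points are assumptions made so that the example is deterministic.

```python
import math


def lloyds_kmeans(points, k, iterations):
    """Return (assignments, centers) after running Lloyd's algorithm."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed the centers with the first k points (deterministic).
    centers = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignments, centers


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
assignments, centers = lloyds_kmeans(points, k=2, iterations=10)
```

With these two well-separated groups the assignments settle after the first iteration; further iterations leave them unchanged.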
algebra::VectorKDs IMP::statistics::get_centroids(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute the centroid for each cluster.
Ints IMP::statistics::get_representatives(Embedding *d, PartitionalClustering *pc)
Given a clustering and an embedding, compute a representative element for each cluster.
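The two computations can be sketched as follows. The helper names are hypothetical, and choosing the member closest to the centroid as the representative is an assumption; the text above does not say which rule IMP uses.

```python
import math


def cluster_centroids(points, clusters):
    """Mean position of the members of each cluster."""
    return [
        tuple(sum(axis) / len(cluster) for axis in zip(*(points[i] for i in cluster)))
        for cluster in clusters
    ]


def cluster_representatives(points, clusters):
    """Index of the member closest to each centroid (an assumed selection rule)."""
    centroids = cluster_centroids(points, clusters)
    return [
        min(cluster, key=lambda i: math.dist(points[i], center))
        for cluster, center in zip(clusters, centroids)
    ]


points = [(0, 0), (2, 0), (1, 0.5), (10, 10), (12, 10), (10.5, 10)]
clusters = [[0, 1, 2], [3, 4, 5]]
centroids = cluster_centroids(points, clusters)
representatives = cluster_representatives(points, clusters)
```

Note that a centroid is a position in the embedding space, whereas a representative is the index of an actual element of the cluster.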
void show_histogram(HistogramD h, std::string xscale = "linear", std::string yscale = "linear", Functions curves = Functions())
In Python, you can use matplotlib, if installed, to show the contents of a histogram. At the moment, only 1D and 2D histograms are supported.
Parameters:
[in] h: The histogram to show; the plot is sized to the histogram's bounding box.
[in] xscale: Whether the x-axis scale is "linear" or "log".
[in] yscale: Whether the y-axis scale is "linear" or "log".
[in] curves: A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.
void IMP::statistics::validate_partitional_clustering(PartitionalClustering *pc, unsigned int n)
Check that the clustering is a valid clustering of n elements. If it is not, an exception is thrown, unless the build is a fast build.
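As an illustration of what such a check might involve, the sketch below assumes that "valid" means every index from 0 to n-1 appears in exactly one cluster. Both that reading and the helper name check_partition are assumptions, not a description of IMP's implementation.

```python
def check_partition(clusters, n):
    """Raise ValueError unless clusters partition the indices 0..n-1."""
    seen = [False] * n
    for cluster_index, cluster in enumerate(clusters):
        for item in cluster:
            if not 0 <= item < n:
                raise ValueError(
                    f"cluster {cluster_index} has out-of-range item {item}"
                )
            if seen[item]:
                raise ValueError(f"item {item} appears more than once")
            seen[item] = True
    missing = [i for i, flag in enumerate(seen) if not flag]
    if missing:
        raise ValueError(f"items not assigned to any cluster: {missing}")


check_partition([[0, 1], [2]], 3)  # passes silently

try:
    check_partition([[0, 1], [1]], 3)
    duplicate_detected = False
except ValueError:
    duplicate_detected = True
```

A check like this is linear in the number of elements, which makes it cheap enough to run after every clustering call in a debug build.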