IMP::statistics Namespace Reference

Detailed Description

This module provides methods for clustering, histograms and other statistical computations.

This module provides code to compute clusterings. Adaptors are provided that make it easy to cluster points and model configurations stored in IMP::ConfigurationSet objects, among other things.

Examples:

Author(s): Keren Lasker, Daniel Russel

Version: SVN.r12662

License: LGPL. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

Publications:

Classes

class  ConfigurationSetRMSDMetric
class  ConfigurationSetXYZEmbedding
 Embed a configuration using the XYZ coordinates of a set of particles.
class  Embedding
 Map clustering data to spatial positions.
class  HistogramD
class  Metric
 Compute a distance between two elements to be clustered.
class  ParticleEmbedding
class  PartitionalClustering
 The base class for clusterings of data sets.
class  PartitionalClusteringWithCenter
class  RecursivePartitionalClusteringEmbedding
class  RecursivePartitionalClusteringMetric
class  VectorDEmbedding
 Simply return the coordinates of a VectorD.

Python only

This functionality is only available in Python.

void show_histogram (HistogramD h, std::string xscale="linear", std::string yscale="linear", Functions curves=Functions())

Functions

PartitionalClusteringWithCenter * create_bin_based_clustering (Embedding *embed, double side)
PartitionalClustering * create_centrality_clustering (Metric *d, double far, int k)
PartitionalClustering * create_centrality_clustering (Embedding *d, double far, int k)
PartitionalClustering * create_connectivity_clustering (Metric *metric, double dist)
PartitionalClusteringWithCenter * create_connectivity_clustering (Embedding *embed, double dist)
PartitionalClustering * create_diameter_clustering (Metric *d, double maximum_diameter)
PartitionalClusteringWithCenter * create_lloyds_kmeans (Embedding *embedding, unsigned int k, unsigned int iterations)
algebra::VectorKDs get_centroids (Embedding *d, PartitionalClustering *pc)
Ints get_representatives (Embedding *d, PartitionalClustering *pc)
void validate_partitional_clustering (PartitionalClustering *pc, unsigned int n)

Function Documentation

PartitionalClusteringWithCenter* IMP::statistics::create_bin_based_clustering ( Embedding *  embed,
double  side 
)

The space is divided into a grid of bins with side length side, and all points that fall in the same grid bin are placed in the same cluster.
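The binning rule can be sketched in plain Python. This is not IMP's implementation: the function name bin_based_clustering is hypothetical, points are assumed to be plain coordinate tuples rather than an Embedding, and each point is keyed by the integer cell floor(x / side) in every dimension. The documentation does not say how the cluster centers are chosen, so the sketch returns only the member indices.

```python
import math
from collections import defaultdict


def bin_based_clustering(points, side):
    """Hypothetical helper: group points that share a grid bin of edge `side`.

    Returns a list of clusters, each a list of point indices.
    """
    if side <= 0:
        raise ValueError("side must be positive")
    bins = defaultdict(list)
    for i, p in enumerate(points):
        # The bin key is the integer grid cell that contains the point.
        key = tuple(math.floor(c / side) for c in p)
        bins[key].append(i)
    # Order clusters by their first member so the output is deterministic.
    return sorted(bins.values(), key=lambda members: members[0])
```

For example, with side = 1.0 the points (0.1, 0.2) and (0.9, 0.4) share the cell (0, 0), while (1.5, 0.1) and (-0.2, 0.3) fall into cells of their own.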

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Metric *  d,
double  far,
int  k 
)

Cluster by repeatedly removing edges that have many shortest paths passing through them (edges with high betweenness). The process terminates when there are k connected components. Other termination criteria can be added if someone proposes them.

Only items closer than far are connected.

PartitionalClustering* IMP::statistics::create_centrality_clustering ( Embedding *  d,
double  far,
int  k 
)

Cluster by repeatedly removing edges that have many shortest paths passing through them (edges with high betweenness). The process terminates when there are k connected components. Other termination criteria can be added if someone proposes them.
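The edge-removal scheme described for both overloads resembles Girvan-Newman clustering, which can be sketched as follows. Several details are assumptions rather than documented IMP behavior: points are plain coordinate tuples compared with Euclidean distance, graph edges are unweighted, the single edge with the highest betweenness (computed Brandes-style) is removed at each step, and all names here are hypothetical.

```python
import math
from collections import deque


def _components(n, adj):
    """Return the connected components of the graph as sorted index lists."""
    seen = [False] * n
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, comp = [s], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        comps.append(sorted(comp))
    return comps


def _edge_betweenness(n, adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {(u, v): 0.0 for u in range(n) for v in adj[u] if u < v}
    for s in range(n):
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma, depth = [0] * n, [-1] * n
        preds = [[] for _ in range(n)]
        sigma[s], depth[s] = 1, 0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge with the
        # fraction of shortest paths from s that pass through it.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[(min(v, w), max(v, w))] += credit
                delta[v] += credit
    return eb


def centrality_clustering(points, far, k):
    """Hypothetical helper: split the `far`-graph until it has k components."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < far:  # only close items are connected
                adj[i].add(j)
                adj[j].add(i)
    comps = _components(n, adj)
    while len(comps) < k:
        eb = _edge_betweenness(n, adj)
        if not eb:
            break  # no edges left to remove
        u, v = max(eb, key=eb.get)  # edge carrying the most shortest paths
        adj[u].discard(v)
        adj[v].discard(u)
        comps = _components(n, adj)
    return comps
```

With two triangles joined by a single bridging edge, every path between the triangles uses the bridge, so it has the highest betweenness and is removed first.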

PartitionalClustering* IMP::statistics::create_connectivity_clustering ( Metric *  metric,
double  dist 
)

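Two points $p_i$ and $p_j$ are in the same cluster if there is a sequence of points $\left(p^{ij}_{0}, \dots, p^{ij}_{k}\right)$ with $p^{ij}_{0} = p_i$ and $p^{ij}_{k} = p_j$ such that $\|p^{ij}_{l} - p^{ij}_{l+1}\| < d$ for all $0 \le l < k$, where $d$ is dist and the distance between consecutive points is computed by metric.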

PartitionalClusteringWithCenter* IMP::statistics::create_connectivity_clustering ( Embedding *  embed,
double  dist 
)

Two points $p_i$ and $p_j$ are in the same cluster if there is a sequence of points $\left(p^{ij}_{0}, \dots, p^{ij}_{k}\right)$ with $p^{ij}_{0} = p_i$ and $p^{ij}_{k} = p_j$ such that $\|p^{ij}_{l} - p^{ij}_{l+1}\| < d$ for all $0 \le l < k$, where $d$ is dist.
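The chaining rule amounts to finding connected components of the graph that links every pair of points closer than dist, which a union-find structure handles directly. This is a minimal sketch, assuming plain coordinate tuples and Euclidean distance; connectivity_clustering is a hypothetical name, not the IMP function.

```python
import math


def connectivity_clustering(points, dist):
    """Hypothetical helper: single-linkage clusters at threshold `dist`."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=lambda c: c[0])
```

Because membership is transitive, two points can share a cluster even when they are farther apart than dist: on a line, 0.0 and 1.0 end up together with dist = 0.6 because 0.5 links them.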

PartitionalClustering* IMP::statistics::create_diameter_clustering ( Metric *  d,
double  maximum_diameter 
)

Cluster the elements into clusters whose diameter (the largest distance between any two members) is at most maximum_diameter.
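The documentation does not say which algorithm enforces the diameter bound. One common approach is greedy complete-linkage agglomeration: repeatedly merge the pair of clusters whose union has the smallest diameter, as long as that diameter stays within the bound. The sketch below assumes this technique, Euclidean distance and plain coordinate tuples. It is not necessarily what IMP does, and diameter_clustering is a hypothetical name.

```python
import math
from itertools import combinations


def diameter_clustering(points, maximum_diameter):
    """Hypothetical helper: complete-linkage merging under a diameter cap."""
    clusters = [[i] for i in range(len(points))]

    def union_diameter(a, b):
        # Largest pairwise distance in the union of clusters a and b.
        return max(math.dist(points[i], points[j])
                   for i, j in combinations(a + b, 2))

    while True:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            d = union_diameter(clusters[x], clusters[y])
            if d <= maximum_diameter and (best is None or d < best[0]):
                best = (d, x, y)
        if best is None:
            break  # no merge keeps the diameter within the bound
        _, x, y = best
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters, key=lambda c: c[0])
```

Unlike connectivity clustering on the same points, this never chains: every cluster it returns satisfies the diameter bound, although a greedy merge order does not guarantee the fewest possible clusters.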

PartitionalClusteringWithCenter* IMP::statistics::create_lloyds_kmeans ( Embedding *  embedding,
unsigned int  k,
unsigned int  iterations 
)

Return a k-means clustering of all points contained in the embedding (i.e. indices in [0, embedding->get_number_of_embeddings())). These points are clustered into k clusters. More iterations take longer but produce a better clustering.

The algorithm uses algebra::EuclideanVectorKDMetric for computing distances between embeddings and cluster centers. This can be parameterized if desired.

Examples: basic optimization, kmeans, nup84 cg, nup84 rb, analyze 0
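Lloyd's algorithm alternates an assignment step (each point goes to its nearest center) and an update step (each center moves to the mean of its points). The sketch below mirrors the Euclidean distance mentioned above but makes its own assumptions: centers are initialized from k randomly chosen input points, an empty cluster keeps its previous center, and lloyds_kmeans is a hypothetical name operating on plain coordinate tuples rather than an Embedding.

```python
import math
import random


def _assign(points, centers):
    """Index of the nearest center (Euclidean) for every point."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def lloyds_kmeans(points, k, iterations, seed=0):
    """Hypothetical helper: returns (clusters, centers)."""
    rng = random.Random(seed)
    # Initialize the centers at k randomly chosen input points.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = _assign(points, centers)
    for _ in range(iterations):
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [points[i] for i, a in enumerate(assignment) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
        # Assignment step: reassign each point to its nearest center.
        assignment = _assign(points, centers)
    clusters = [[i for i, a in enumerate(assignment) if a == c] for c in range(k)]
    return clusters, centers
```

Each iteration cannot increase the sum of squared distances from points to their centers, which is why extra iterations tend to help until the assignment stops changing.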

algebra::VectorKDs IMP::statistics::get_centroids ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute the centroid of each cluster.
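A centroid is the coordinate-wise mean of a cluster's embedded points. This minimal sketch assumes plain coordinate tuples and clusters given as lists of point indices; cluster_centroids is a hypothetical name.

```python
def cluster_centroids(points, clusters):
    """Hypothetical helper: coordinate-wise mean of each cluster's points."""
    return [
        [sum(coord) / len(members) for coord in zip(*(points[i] for i in members))]
        for members in clusters
    ]
```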

Ints IMP::statistics::get_representatives ( Embedding *  d,
PartitionalClustering *  pc 
)

Given a clustering and an embedding, compute a representative element for each cluster.
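The documentation does not define how the representative is chosen. A natural choice, assumed here, is the member whose embedded position lies closest to the cluster centroid, returned as a point index (matching the Ints return type). cluster_representatives is a hypothetical name, and Euclidean distance is an assumption.

```python
import math


def cluster_representatives(points, clusters):
    """Hypothetical helper: index of the member nearest each cluster's centroid."""
    reps = []
    for members in clusters:
        center = [sum(c) / len(members) for c in zip(*(points[i] for i in members))]
        reps.append(min(members, key=lambda i: math.dist(points[i], center)))
    return reps
```

Unlike a centroid, a representative is always one of the clustered elements, which is useful when the elements are model configurations that cannot be averaged meaningfully.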

void show_histogram ( HistogramD  h,
std::string  xscale = "linear",
std::string  yscale = "linear",
Functions  curves = Functions() 
)

In Python, you can use matplotlib, if it is installed, to show the contents of a histogram. Currently only 1D and 2D histograms are supported.

Parameters:
[in] h: The histogram to show; the plot is sized to the histogram's bounding box.
[in] xscale: Whether the x axis scale is "linear" or "log".
[in] yscale: Whether the y axis scale is "linear" or "log".
[in] curves: A list of Python functions to plot on the histogram as curves. Each function should take one float and return a float.

void IMP::statistics::validate_partitional_clustering ( PartitionalClustering *  pc,
unsigned int  n 
)

Check that pc is a valid clustering of n elements. If it is not, an exception is thrown, unless the build is a fast build.
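A valid partitional clustering of n elements places every index in [0, n) in exactly one cluster. The check can be sketched as follows, assuming clusters are lists of indices and that empty clusters are also invalid (the documentation does not say). validate_partition is a hypothetical name, and it raises ValueError rather than an IMP exception.

```python
def validate_partition(clusters, n):
    """Hypothetical helper: raise ValueError unless clusters partition range(n)."""
    seen = set()
    for c, members in enumerate(clusters):
        if not members:
            raise ValueError(f"cluster {c} is empty")
        for i in members:
            if not 0 <= i < n:
                raise ValueError(f"index {i} in cluster {c} is out of range")
            if i in seen:
                raise ValueError(f"index {i} appears more than once")
            seen.add(i)
    if len(seen) != n:
        raise ValueError(f"{n - len(seen)} of {n} elements are unassigned")
```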


Generated on Fri Feb 10 2012 23:36:30 for IMP by doxygen 1.7.5.1