Package Scientific :: Package Clustering :: Module AffinityPropagation :: Class DataSet

Class DataSet

object --+
         |
        DataSet

A collection of data items with similarities

Instance Methods
 
__init__(self, items, similarities, symmetric=False, minimal_similarity=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
 
findClusters(self, preferences, max_iterations=500, convergence=50, damping=0.5)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, items, similarities, symmetric=False, minimal_similarity=None)
(Constructor)

 


Parameters:
  • items (sequence) - a sequence of data items
  • similarities - similarity values for item pairs. This parameter can have one of three forms:
    • a list of triples (index1, index2, similarity), where the indices point into the item list and the similarity is a real number.
    • a callable object (typically a function or a bound method) that is called with two items and returns the similarity.
    • an array of shape (N, N), where N is the number of items, containing the similarities. The diagonal elements are not used.
  • symmetric (bool) - if True, the similarity measure is assumed to be symmetric. If False, no such assumption is made and the input data (if a list) must contain both directions for each pair. If the similarity is defined by a function, it will be called twice per pair if symmetric=False and once if symmetric=True. If the similarity is defined by an array, this parameter is not used.
  • minimal_similarity (float) - a cutoff value for the similarities; values smaller than this cutoff are discarded. This is of use for large data sets because both the runtime and the memory consumption increase with the number of similarity values.
Overrides: object.__init__
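
The following is a minimal sketch of how the list and callable forms of the similarities parameter might be prepared. The sample points, the similarity function (negative squared Euclidean distance), and the cutoff value are illustrative assumptions, not part of the module. The constructor calls are only attempted if ScientificPython can be imported.

```python
# Hypothetical data: four points in the plane.
items = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]


def similarity(a, b):
    """Negative squared Euclidean distance (an assumed similarity measure)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


n = len(items)

# List form for symmetric=False: both directions of every pair are listed.
triples_both = [(i, j, similarity(items[i], items[j]))
                for i in range(n) for j in range(n) if i != j]

# List form for symmetric=True: one direction per pair.
triples_one = [(i, j, similarity(items[i], items[j]))
               for i in range(n) for j in range(i + 1, n)]

# Illustration of a cutoff such as minimal_similarity=-10.0:
# values smaller than the cutoff are dropped.
cutoff = -10.0
kept = [t for t in triples_both if t[2] >= cutoff]

try:
    from Scientific.Clustering.AffinityPropagation import DataSet
except ImportError:
    DataSet = None  # ScientificPython is not installed

if DataSet is not None:
    # Similarities given as a list of triples, no symmetry assumed.
    data_from_list = DataSet(items, triples_both, symmetric=False)
    # Similarities given as a callable, evaluated once per pair.
    data_from_function = DataSet(items, similarity, symmetric=True,
                                 minimal_similarity=cutoff)
```

With these sample points, only the four triples linking the two close pairs survive the cutoff, which shows how minimal_similarity can shrink the number of similarity values that have to be stored.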

findClusters(self, preferences, max_iterations=500, convergence=50, damping=0.5)

 
Parameters:
  • preferences (float or sequence of float) - the preference values for the cluster identification. This can be either a single number, or a sequence with one value per item.
  • max_iterations (int) - the number of iterations at which the algorithm is stopped even if there is no convergence.
  • convergence (int) - the number of iterations during which the cluster decomposition must remain stable before it is returned as converged.
  • damping (float) - a number between 0 and 1 that influences how fast affinity and responsibility values can change.
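
As a hedged illustration of these parameters, the sketch below builds both forms of the preferences argument and shows one common way a damping factor is applied in affinity propagation. The similarity values, the use of the median as a shared preference, and the damped_update helper are assumptions made for the example. The update rule the module actually uses is not documented here.

```python
from statistics import median

# Hypothetical similarity values for three items (both directions per pair).
similarities = [-1.0, -1.0, -25.0, -25.0, -16.0, -16.0]
n_items = 3

# Single-number form: one preference shared by all items. The median of the
# similarities is a common starting point, used here as an assumption.
shared_preference = median(similarities)

# Sequence form: one value per item. Item 0 gets a higher preference,
# making it a more likely cluster centre.
per_item_preferences = [shared_preference] * n_items
per_item_preferences[0] = max(similarities)


def damped_update(old, new, damping=0.5):
    """Blend the previous value with the newly computed one.

    Assumed convention: damping=0 takes the new value unchanged, and
    values close to 1 let the stored value change only slowly.
    """
    if not 0.0 <= damping <= 1.0:
        raise ValueError("damping must lie between 0 and 1")
    return damping * old + (1.0 - damping) * new


# Given a DataSet instance named data (not constructed here), a call with
# the documented defaults would look like this:
# data.findClusters(shared_preference, max_iterations=500,
#                   convergence=50, damping=0.5)
```

Under this convention, a larger damping value slows the change per iteration, which may help when the cluster decomposition oscillates instead of remaining stable for the required number of iterations.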