edu.georgetown.gucs.clustering
Class ClusterPrunerWeka

java.lang.Object
  extended by weka.clusterers.AbstractClusterer
      extended by weka.clusterers.RandomizableClusterer
          extended by edu.georgetown.gucs.clustering.ClusterPrunerWeka
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, weka.clusterers.Clusterer, weka.core.CapabilitiesHandler, weka.core.OptionHandler, weka.core.Randomizable, weka.core.RevisionHandler, weka.core.WeightedInstancesHandler

public class ClusterPrunerWeka
extends weka.clusterers.RandomizableClusterer
implements weka.core.WeightedInstancesHandler

A cluster pruner built on WEKA's classes.

Version:
$Revision$, $Date$
Author:
Mark Maloof (maloof@cs.georgetown.edu)
See Also:
Serialized Form

Field Summary
protected  weka.core.DistanceFunction m_DistanceFunction
          the distance function used.
 
Fields inherited from class weka.clusterers.RandomizableClusterer
m_Seed, m_SeedDefault
 
Constructor Summary
ClusterPrunerWeka()
          Default constructor.
 
Method Summary
 void buildClusterer(weka.core.Instances instances)
          Builds this leader clusterer using the specified instances.
 int clusterInstance(weka.core.Instance instance)
          Clusters the specified instance.
 int[] getAssignments()
           
 weka.core.Capabilities getCapabilities()
          Returns default capabilities of the clusterer.
 weka.core.DistanceFunction getDistanceFunction()
          Returns the distance function currently in use.
 int getK()
           
 weka.core.Instances getLeaders()
          Gets the leaders.
 int getNumberOfFollowers()
           
 int getNumClusters()
          Gets the number of clusters
 java.lang.String[] getOptions()
          Gets the current settings of ClusterPrunerWeka
 java.lang.String globalInfo()
          Returns a string describing this clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] args)
          Main method for testing this class.
 int numberOfClusters()
          Returns the number of clusters.
 weka.core.Instances retrieve(weka.core.Instance instance)
          Retrieves the top k instances based on the specified instance or query.
 void setDistanceFunction(weka.core.DistanceFunction df)
          Sets the distance function to use for instance comparison.
 void setK(int k)
           
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setPreserveInstancesOrder(boolean r)
           
 java.lang.String toString()
          Returns a string describing this clusterer.
 
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.clusterers.AbstractClusterer
distributionForInstance, forName, getRevision, makeCopies, makeCopy, runClusterer
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

m_DistanceFunction

protected weka.core.DistanceFunction m_DistanceFunction
the distance function used.

Constructor Detail

ClusterPrunerWeka

public ClusterPrunerWeka()
Default constructor.

Method Detail

buildClusterer

public void buildClusterer(weka.core.Instances instances)
                    throws java.lang.Exception
Builds this leader clusterer using the specified instances.

Specified by:
buildClusterer in interface weka.clusterers.Clusterer
Specified by:
buildClusterer in class weka.clusterers.AbstractClusterer
Parameters:
instances - the instances to cluster
Throws:
java.lang.Exception - if the clusterer has not been generated successfully
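
The page does not say how the leaders are chosen. As a minimal sketch, assuming the textbook cluster-pruning scheme (draw a random sample of instances as leaders, then attach every instance to its nearest leader), the build step might look like the code below. The class LeaderBuildSketch, its methods, the plain double[] points, and the Euclidean distance are illustrative assumptions; none of them is part of ClusterPrunerWeka or WEKA.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-alone sketch; not the ClusterPrunerWeka implementation.
class LeaderBuildSketch {

    /** Euclidean distance between two equal-length points. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Picks numLeaders distinct row indices at random (seeded, so repeatable). */
    static int[] pickLeaders(int numPoints, int numLeaders, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numPoints; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        int[] leaders = new int[numLeaders];
        for (int i = 0; i < numLeaders; i++) {
            leaders[i] = indices.get(i);
        }
        return leaders;
    }

    /** For each point, returns the position (within leaders) of its nearest leader. */
    static int[] assign(double[][] points, int[] leaders) {
        int[] assignments = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            for (int c = 1; c < leaders.length; c++) {
                if (distance(points[p], points[leaders[c]])
                        < distance(points[p], points[leaders[best]])) {
                    best = c;
                }
            }
            assignments[p] = best;
        }
        return assignments;
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        int[] leaders = pickLeaders(points.length, 2, 42L);
        int[] assignments = assign(points, leaders);
        for (int p = 0; p < points.length; p++) {
            System.out.println("point " + p + " -> leader row " + leaders[assignments[p]]);
        }
    }
}
```

In this sketch the seed plays the role that the inherited setSeed option would presumably play in the real class, and the two arrays loosely mirror what getLeaders() and getAssignments() expose.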

clusterInstance

public int clusterInstance(weka.core.Instance instance)
                    throws java.lang.Exception
Clusters the specified instance.

Specified by:
clusterInstance in interface weka.clusterers.Clusterer
Overrides:
clusterInstance in class weka.clusterers.AbstractClusterer
Parameters:
instance - the instance to be assigned to a cluster
Returns:
the index of the cluster to which the instance is assigned
Throws:
java.lang.Exception - if instance could not be clustered

retrieve

public weka.core.Instances retrieve(weka.core.Instance instance)
                             throws java.lang.Exception
Retrieves the top k instances based on the specified instance or query. The first retrieved instance is the leader and the next k-1 retrieved instances are that leader's closest followers. If k is greater than the number of followers, then the method returns the leader and all of its followers.

Parameters:
instance - the specified instance or query
Returns:
the closest leader and its closest followers
Throws:
java.lang.Exception - if something goes wrong
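
The retrieval rule described for this method can be sketched in plain Java. This is a hedged illustration, not the class's implementation: RetrieveSketch, the double[] points, and the Euclidean distance are hypothetical, and ranking followers by their distance to the query (rather than to the leader) is an assumption that the description does not settle.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-alone sketch; not the ClusterPrunerWeka implementation.
class RetrieveSketch {

    /** Euclidean distance between two equal-length points. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the leader nearest to the query, followed by up to k-1 of that
     * leader's followers. followers.get(i) holds the followers of leaders[i].
     */
    static List<double[]> retrieve(double[] query, double[][] leaders,
                                   List<List<double[]>> followers, int k) {
        // Step 1: find the leader closest to the query.
        int best = 0;
        for (int c = 1; c < leaders.length; c++) {
            if (distance(query, leaders[c]) < distance(query, leaders[best])) {
                best = c;
            }
        }
        // Step 2: rank that leader's followers (assumption: by distance to the query).
        List<double[]> ranked = new ArrayList<>(followers.get(best));
        ranked.sort(Comparator.comparingDouble(f -> distance(query, f)));
        // Step 3: leader first, then at most k-1 followers; if k exceeds the
        // number of followers, every follower is returned.
        List<double[]> result = new ArrayList<>();
        result.add(leaders[best]);
        int limit = Math.min(Math.max(k - 1, 0), ranked.size());
        result.addAll(ranked.subList(0, limit));
        return result;
    }

    public static void main(String[] args) {
        double[][] leaders = { {0, 0}, {10, 10} };
        List<List<double[]>> followers = new ArrayList<>();
        List<double[]> first = new ArrayList<>();
        first.add(new double[] {0, 1});
        first.add(new double[] {0, 5});
        followers.add(first);
        List<double[]> second = new ArrayList<>();
        second.add(new double[] {10, 11});
        followers.add(second);
        List<double[]> top = retrieve(new double[] {0, 2}, leaders, followers, 2);
        System.out.println("retrieved " + top.size() + " instances");
    }
}
```

Only the followers of a single leader are examined, which is what makes the lookup cheaper than scanning every instance.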

getCapabilities

public weka.core.Capabilities getCapabilities()
Returns default capabilities of the clusterer.

Specified by:
getCapabilities in interface weka.clusterers.Clusterer
Specified by:
getCapabilities in interface weka.core.CapabilitiesHandler
Overrides:
getCapabilities in class weka.clusterers.AbstractClusterer
Returns:
the capabilities of this clusterer

globalInfo

public java.lang.String globalInfo()
Returns a string describing this clusterer

Returns:
a description of the clusterer suitable for displaying in the Explorer/Experimenter GUI

numberOfClusters

public int numberOfClusters()
                     throws java.lang.Exception
Returns the number of clusters.

Specified by:
numberOfClusters in interface weka.clusterers.Clusterer
Specified by:
numberOfClusters in class weka.clusterers.AbstractClusterer
Returns:
the number of clusters generated for a training dataset.
Throws:
java.lang.Exception - if number of clusters could not be returned successfully

getNumClusters

public int getNumClusters()
Gets the number of clusters

Returns:
the number of clusters

getDistanceFunction

public weka.core.DistanceFunction getDistanceFunction()
Returns the distance function currently in use.

Returns:
the distance function

setDistanceFunction

public void setDistanceFunction(weka.core.DistanceFunction df)
                         throws java.lang.Exception
Sets the distance function to use for instance comparison.

Parameters:
df - the new distance function to use
Throws:
java.lang.Exception - if instances cannot be processed

getNumberOfFollowers

public int getNumberOfFollowers()

setK

public void setK(int k)
          throws java.lang.Exception
Throws:
java.lang.Exception

getK

public int getK()

setPreserveInstancesOrder

public void setPreserveInstancesOrder(boolean r)

getAssignments

public int[] getAssignments()

getLeaders

public weka.core.Instances getLeaders()
Gets the leaders.

Returns:
the leaders

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface weka.core.OptionHandler
Overrides:
listOptions in class weka.clusterers.RandomizableClusterer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Specified by:
setOptions in interface weka.core.OptionHandler
Overrides:
setOptions in class weka.clusterers.RandomizableClusterer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getOptions

public java.lang.String[] getOptions()
Gets the current settings of ClusterPrunerWeka

Specified by:
getOptions in interface weka.core.OptionHandler
Overrides:
getOptions in class weka.clusterers.RandomizableClusterer
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
Returns a string describing this clusterer.

Overrides:
toString in class java.lang.Object
Returns:
a description of the clusterer as a string

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - should contain the following arguments:

-t training file [-N number of clusters]