WEKA Classification Algorithms
A WEKA Plug-in
This project provides implementation for a number of artificial neural network (ANN) and artificial
immune system (AIS) based classification algorithms for the WEKA (Waikato Environment for Knowledge Analysis)
machine learning workbench. The WEKA platform was selected for the implementation of the selected
algorithms because I think its an excellent piece of free software.
The WEKA project is required to run the
algorithms provided in this project, and is included in the download. This is an open source project
(released under the GPL) so the source code is available.
Download: Download the software here
The algorithm implementations are extensible and easily support modification and application
to varied problem domains. Please report any bugs, feature requests or include your
own algorithms by accessing the services on the project website.
Javadoc API documentation for the project is available here
Access this project's website on sourceforge to log bug reports, feature requests, and contact the developer.
Algorithms
The following provides a list of implemented Algorithms:
- Learning Vector Quantization (LVQ)
- As described in "LVQ_PAK: The Learning Vector Quantization Program Package" by Kohonen, Hynninen, Kangas, Laaksonnen and Torkkola (1996)
- and in: Teuvo Kohonen. Self-Organizing Maps. Third ed. Berlin Heidelberg: Springer-Verlag; 2001. (Thomas S Huang; Teuvo Kohonen, and Manfred R. Schroeder. Springer Series in Information Sciences; 30).
- Provides the following implementations: LVQ1, OLVQ1, LVQ2.1, LVQ3, OLVQ3, Multiple-pass LVQ, Hierarchical LVQ
- Self-Organizing Map (SOM)
- As described in "SOM_PAK: The Self-Organizing Map Program Package" by Kohonen, Hynninen, Kangas, and Laaksonnen (1996)
- and in: Teuvo Kohonen. Self-Organizing Maps. Third ed. Berlin Heidelberg: Springer-Verlag; 2001. (Thomas S Huang; Teuvo Kohonen, and Manfred R. Schroeder. Springer Series in Information Sciences; 30).
- Provides the following implementations: SOM, LVQ-SOM, Multiple-pass SOM
- Feed-Forward Artificial Neural Network (FF-ANN)
- Artificial Immune Recognition System (AIRS)
- Clonal Selection Algorithm (CLONALG)
- as described in Leandro N. de Castro and Fernando J. Von Zuben. Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation. 2002 Jun; 6(3):239-251. and Leandro N. de Castro and Fernando J. Von Zuben. The Clonal Selection Algorithm with Engineering Applications, L. Darrell Whitley; David E. Goldberg; Erick Cantú-Paz; Lee Spector; Ian C. Parmee, and Hans-Georg Beyer, Editors. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '00), Workshop on Artificial Immune Systems and Their Applications; Las Vegas, Nevada, USA. Morgan Kaufmann; 2000: 36-37.
- Provides the following implementations: CLONALG and a new CSCA algorithm
- A description of this implementation is provided in a technical report Jason Brownlee. [Technical Report]. Clonal Selection Theory & CLONALG - The Clonal Selection Classification Algorithm (CSCA) Victoria, Australia: Centre for Intelligent Systems and Complex Processes (CISCP), Faculty of Information and Communication Technologies (ICT), Swinburne University of Technology; 2005 Jan; Technical Report ID: 2-01.
- Immunos-81
- as described in Jerome H. Carter. The immune system as a model for classification and pattern recognition. Journal of the American Informatics Association. 2000; 7(1):28-41.
- Provides the following implementations: Immunos-81 and a new Immunos-1 and Immunos-2 algorithms
- A description of this implementation is provided in a technical report Jason Brownlee. [Technical Report]. Immunos-81 - The Misunderstood Artificial Immune System. Victoria, Australia: Centre for Intelligent Systems and Complex Processes (CISCP), Faculty of Information and Communication Technologies (ICT), Swinburne University of Technology; 2005 Feb; Technical Report ID: 3-01.
Learning Vector Quantization (LVQ)
I love the LVQ algorithm. It was the first algorithm I implemented for the WEKA platform. This section contains some
notes regarding the implementation of the LVQ algorithm in WEKA, taken from the initial release of the plug-in (back in 2002-2003).
LVQ Weka Formally here (defunct),
and here (defunct, see internet archive backup)
What is Learning Vector Quantization?
- A competitive learning algorithm said to be a supervised version of the Self-Organizing Map (SOM) algorithm by Kohonen
- Goal of the algorithm is to approximate the distribution of a class using a reduced number of codebook vectors where the algorithm seeks to minimise classification errors
- Codebook vectors become exemplars for a particular class - attempting to represent class boundaries
- The algorithm does not construct a topographical ordering of the dataset (there is no concept of explicit neighbourhood in LVQ as there is in the SOM algorithm)
- Algorithm was proposed by Kohonen in 1986 as an improvement over Labelled Vector Quantization
- The algorithm is associated with the neural network class of learning algorithms, though works significantly differently compared to conventional feed-forward networks like Back Propagation
What are some advantages of the Learning Vector Quantization algorithm?
- The model is trained significantly faster than other neural network techniques like Back Propagation
- It is able to summarise or reduce large datasets to a smaller number of codebook vectors suitable for classification or visualisation
- Able to generalise features in the dataset providing a level of robustness
- Can approximate just about any classification problem as long as the attributes can be compared using a meaningful distance measure
- Not limited in the number of dimensions in the codebook vectors like nearest neighbour techniques
- Normalisation of input data is not required (normalised may improve accuracy if attribute values vary greatly)
- Can handle data with missing values
- The generated model can be updated incrementally
What are some disadvantages of the Learning Vector Quantization algorithm?
- Need to be able to generate useful distance measures for all attributes (Euclidean is usually used for numeric attributes)
- Model accuracy is highly dependent on the initialisation of the model as well as the learning parameters used (learning rate, training iterations, etcetera)
- Accuracy is also dependent on the class distribution in the training dataset, a good distribution of samples is needed to construct useful models
- It is difficult to determine a good number of codebook vectors for a given problem
Algorithm Notes
Supports 7 implementations of the LVQ algorithm
- LVQ1 - A single BMU (best matching unit) is selected and moved closer or further away from each data vector, per iteration.
- OLVQ1 (Optimised LVQ1) - The same as LVQ1, except that each codebook vector has its own learning rate
- LVQ2.1 - Two BMU's are selected and only updated if one belongs to the desired class and one does not, and the distance ratio is within a defined window
- LVQ3 - The same as LVQ2.1 except if both BMU's are of the correct class, they are updated but adjusted using an epsilon value (adjusted learning rate instead of the global learning rate)
- OLVQ3 (Optimised LVQ3) - The same as LVQ3 except each codebook vector has its own learning rate in the same manner as OLVQ1
- Multi-Pass LVQ- This is the recommended usage of the algorithm (Kohonen) where a quick rough pass is made on the model using OLVQ1, then a long fine tuning pass is made on the model with any of LVQ1, LVQ2.1 or LVQ3
- Hierarchal LVQ- This is an idea I had (let me know if it's not original) where an LVQ model is constructed and each codebook vector is treated as a cluster centroid. All codebook vectors are evaluated and a number are selected as candidates for sub-models. Sub models are constructed for all candidate codebook vectors and those sub models that out perform (in terms of classification accuracy) their parent codebook vector are kept as part of the model. During testing a data instance is first mapped onto its BMU, if that BMU has a sub-model (that was not pruned during training), the sub model is used for classification, otherwise the class value in the BMU is used for classification. This algorithm has proven most useful for large datasets where a low-complexity model is used as the base model, and specialised LVQ models are used at each codebook vector to better model the data. Supports any Weka algorithm as the BMU's sub model, not just LVQ.
Supports 2 implementations of the Self-Organizing Map (SOM) algorithm
The Self-Organizing Map (SOM) algorithm is not a classification algorithm, though it can be used for classification tasks. An implementation of the unsupervised SOM algorithm is provided that can apply labels to the map so that it can be used for classification. The SOM implementation also supports supervised learning known as LVQ-SOM where codebook vectors in the neighbourhood of the best matching unit (BMU) that do not match the class of the data instance are pushed further away rather than closer to the data instance. No map visualisation techniques have been provided, but the vectors can be retrieved (debug mode in the GUI, or API call) and displayed using Kohonen's SOM_PAK, in MathLab or your visualisation application of choice.
- SOM - Self-Organizing Map algorithm that supports supervised and unsupervised learning and dynamical labelling or post-training map labelling.
- Multi-Pass SOM- The recommended usage of the SOM algorithm where two passes are performed on the same underlying model. The first pass is a rough ordering pass with large neighbourhood, learning rate and small training time. The second pass is the fine tuning pass that has a longer training time, small initial neighbourhood value and smaller initial learning rate.
Supports 4 Feed-Forward Neural Network algorithms
The default MultilayerPerceptron implementation provided with Weka is good, but did not meet my needs. I have added feed-forward neural network algorithms that provide a clean implementation, a simpler interface and more readable/maintainable source code. I believe that the back propagation implementation is also faster, though requires explicit specification of neurons at each layer.
- Perceptron - Perceptron learning rule, only supports binary data, provides a binary output and uses the Sign transfer function
- Widrow Hoff - Widrow Hoff learning rule, only supports binary data, provides a binary output and uses the Step transfer function
- Back Propagation - Standard implementation with support for up to 3 layers (user defined), momentum, weight decay, 5 transfer functions, etcetera
- Bold Driver Back Propagation- (Vogl's Method) Same as back propagation except the learning rate is dynamically determined
Supports 1 new Filter
- NormaliseMidpointZero - Located under UnsupervisedFilters this filter was implemented to scale data values to between -1.0 and +1.0 for use with the hyperbolic tangent (tanh) transfer function in feed forward neural networks
Algorithm Usage Notes and Heuristics
- Enable algorithm debugging to see how many of the codebook vectors are actually used by the end of training. This is most useful for knowing when to increase or decrease the number of codebook vectors thus increasing model accuracy.
- Less codebook vectors typically need less training, obviously more codebook vectors need more training iterations.
- Normalisation or standardisation of the dataset may improve the accuracy of an LVQ model, especially those datasets with many (> 15) numeric attributes
- It is usually a good idea to have enough training iterations to expose all training data to the model, usually a few times
- Performing feature selection (removal of unnecessary attributes) before running the algorithm will increase speed
- Interestingly, the SOM algorithm can pick up on features that the LVQ cannot pick up as easily, sometimes producing superior models
- LVQ only supports classification (not regression) type problems
- LVQ supports training and test data with missing values
- The LVQ algorithm can be used as a data reduction technique, where the resulting codebook vectors are taken and used in visualisation techniques such as Sammon's Mapping or a PCA Projection
- Multi-Pass LVQ is the recommended usage of the algorithm (Kohonen), if individual algorithms are used by themselves, longer training times and larger learning rates should be used
- Hierarchical LVQ can be used to over-learn the training data, providing superior accuracy for many datasets. It is possible for this algorithm to learning the training data exactly by setting both percentage parameters to 0 and selecting a base model and sub model that perform well
- I had an idea (let me know if its not original) for the concept of "voting" that was added to all LVQ models, where if enabled each codebook vector is able to dynamically select a classification based on the class distribution of data instances allocated to that codebook vector so far. This is used during training, but can also be enabled to use the model as an on-line learner (most powerful!).
- When using the feed forward neural network algorithms it is imperative that the dataset are scaled to the range of the selected transfer function. Example; for the Sigmoid transfer function data should be normalised (0 to 1), For Tanh use the new filter NormaliseMidpointZero (-1 to +1).
Algorithm Parameter Recommendations
Multi-Pass LVQ
- Pass 1
- OLVQ1 recommended for fast approximation
- intended to quickly and roughly approximate the class distribution
- larger learning rates and shorter training times should be used
- learning rate (0.1 to 0.4 - say 0.3)
- training iterations (30 to 50 times the number of codebook vectors)
- Pass 2
- LVQ1 or LVQ3 recommended for fine tuning
- intended to slowly and precisely finetune the codebook vectors to the class distribution in the data
- smaller learning rates and longer training times should be used
- learning rate (< 0.1 - say 0.05)
- training iterations (about 10 times that of the first pass)
General
- The number of codebook vectors should be sufficient to summarise the class distribution in the problem (20 to 50 is suitable for most "toy" classification problems, Kohonen uses thousands in some case studies)
- The more complex the class distribution, the more codebook vectors that will be required to approximate the problem
- Good values for the window width on LVQ2.1 and LVQ3 are between 0.2 and 0.3
- The epsilon value in LVQ3 and OLVQ3 should be small for small window values (0.1 to 0.3)
- For many classification problems the multi-pass approach can perform as well as or outperform any of the other single pass LVQ algorithms in half the training time, and with half as many neurons (see example results), thus is recommended for improved results
- OLVQ1 is faster at approximating a solution and should be used as a rough gage of the algorithms performance for a given problem