Monday, September 29, 2014

Random sample consensus (RANSAC)

Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data which contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981.

A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, though may be subject to noise, and "outliers" which are data that do not fit the model. The outliers can come, e.g., from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data.


Matlab code:

Python code:

Just for fun RANSAC song:

Corner Detector

Detecting corners is often a good first step in computer vision.  If you can match corners from two images you are well on your way to figuring out how they fit together for example.

Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object recognition. Corner detection overlaps with the topic of interest point detection.

Lecture slide decks:



YouTube video:

Matlab code:

Deep Learning

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence.

Website of organization dedicated to all things deep learning:

Multiple tutorials and a wealth of other info:

Peer review paper:

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high- level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.

Deep learning implementations in many languages:

Restricted Boltzmann machine

Learning to use RBM's is on my todo list...I'll update when I get around to it.  RBM's are just one technique for deep learning.

The Restricted Boltzmann Machine (RBM) has become increasingly popular of late after its success in the Netflix prize competition and other competitions. Most of the inventive work behind RBMs was done by Geoffrey Hinton. In particular the training of RBMs using an algorithm called "Contrastive Divergence" (CD). CD is very similar to gradient descent. A good consequence of the CD is its ability to "dream". Of the various machine learning methods out there, the RBM is the only one which has this capacity baked in implicitly.

This is some Matlab code a guy made of a class he was taking.  It is probably not great but if you are working in Matlab it is probably better than starting from scratch:

RBM tutorial:

RBM in scikit-learn:

A Practical Guide to Training Restricted Boltzmann Machines:

Multi-scale Oriented Patches (MOPS)

MOPS is a useful tool for many computer vision and related applications. They key is that it gives you a way to generate features that are (mostly) invariant to rotation and scale to use as input for machine learning/artificial intelligence algorithms.

Technical report from Microsoft Research:

Nice slide deck on MOPS:

Example app from Microsoft Research:

VLFeat open source library computer vision algorithms

The VLFeat open source library implements popular computer vision algorithms specializing in image understanding and local featurexs extraction and matching. Algorithms incldue Fisher Vector, VLAD, SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, SLIC superpixes, quick shift superpixels, large scale SVM training, and many others. It is written in C for efficiency and compatibility, with interfaces in MATLAB for ease of use, and detailed documentation throughout. It supports Windows, Mac OS X, and Linux.


Note: AdaBoost is extremely sensitive to mislabeled samples in your data. For example, if you are trying to classify transactions as either "fraud" or "not fraud" if you have even one mislabeled, then the classifier will over learn that one bad sample and be useless.  There are other versions of boosting algorithms that try to overcome this but if you have data for which you can not be sure of the labels then consider using some other method.

AdaBoost, short for "Adaptive Boosting", is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire who won the prestigious "Gödel Prize" in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers. In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.

While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree growing algorithm such that later trees tend to focus on harder to classify examples.

Very nice AdaBoost slide deck:

Matlab and C++ implementations:

Viola Jones object detection framework

The Viola–Jones object detection framework is the first object detection framework to provide competitive object detection rates in real-time proposed in 2001 by Paul Viola and Michael Jones. Although it can be trained to detect a variety of object classes, it was motivated primarily by the problem of face detection. This algorithm is implemented in OpenCV as cvHaarDetectObjects().

YouTube video explaining Viola Jones face detection:

This is a slide deck explaining Viola Jones face detection:

Haar features are not an AI or ML algorithm themselves, but instead are often a useful tool for transforming data into a format that an AI or ML algorithm can use.

The Wikipedia article only talks about them with respect to object recognition in visible light images.  However, they can be used with images from any spectrum or even any type of data that can be represented as X,Y,Z such as a digital elevation map of terrain.

This pdf has a good explanation of how to use Haar features:

Friday, September 19, 2014

Genetic Algorithms: Cool Name & Damn Simple

Nice GA tutorial.

Genetic algorithms are a mysterious sounding technique in mysterious sounding field--artificial intelligence. This is the problem with naming things appropriately. When the field was labeled artificial intelligence, it meant using mathematics to artificially create the semblance of intelligence, but self-engrandizing researchers and Isaac Asimov redefined it as robots.

The name genetic algorithms does sound complex and has a faintly magical ring to it, but it turns out that they are one of the simplest and most-intuitive concepts you'll encounter in A.I.

Genetic Algorithms: Cool Name & Damn Simple - Irrational Exuberance

Curve fitting with Pyevolve

This is a very nice tutorial for genetic algorithms.  It uses pyevolve but the tutorial part is useful even if you are using a different language/implementation for GA.

A Coder's Musings: Curve fitting with Pyevolve

pygene - simple python genetic algorithms/programming library

I played around with this a bit before I decided on pyevolve instead.  However, pygene might suit your needs better.

pygene - simple python genetic algorithms/programming library

blaa/PyGene · GitHub

Genetic Algorithms tutorial

Great tutorial and introduction to genetic algorithms.  There are java applets that you can play with to see how GA's work.

These pages introduce some fundamentals of genetic algorithms. Pages are intended to be used for learning about genetic algorithms without any previous knowledge from this area. Only some knowledge of computer programming is assumed. You can find here several interactive Java applets demonstrating work of genetic algorithms.

As the area of genetic algorithms is very wide, it is not possible to cover everything in these pages. But you should get some idea, what the genetic algorithms are and what they could be useful for. Do not expect any sophisticated mathematics theories here.

Main page - Introduction to Genetic Algorithms - Tutorial with Interactive Java Applets

Pyevolve genetic algorithm python software

I have used this software to successfully create a genetic algorithm python script that I use to tune parameters on extra tree classifiers and RDF classifiers.  It is pretty easy to use and you can make almost any type of GA with it.  It is open source so you can go in a tinker with it.

Welcome to Pyevolve documentation ! — Pyevolve v0.5 documentation

There is a great pyevolve tutorial here:

A Coder's Musings: Curve fitting with Pyevolve

Random Forest Tutorial

This is slide deck from a lecture.  It is a good introduction to RDF's with advantages and disadvantages compared with other methods.

Neural Networks for Machine Learning

This is an online course from the University of Toronto.  If you can spend about 8 hours a week for 8 weeks you should be thoroughly familiar with ANN's.

Learn about artificial neural networks and how they're being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We'll emphasize both the basic algorithms and the practical tricks needed to get them to work well.

Neural Networks for Machine Learning | Coursera

Basic Neural Network Tutorial : C++ Implementation and Source Code

This tutorial is in two parts, one is the theory of ANN and the other part is a C++ implementation with hints on how to modify it efficiently.  If you are new to neural networks and you are a C++ programmer this is a great place to start.

Basic Neural Network Tutorial – Theory | Taking Initiative

Basic Neural Network Tutorial : C++ Implementation and Source Code | Taking Initiative

Wednesday, September 17, 2014

Theoretical Machine Learning

Introductory lecture notes from a class taught by Professor Rob Schapire who is a leader in the field.  This is a good place to start for a beginner in machine learning.

Decision Forests for Computer Vision and Medical Image Analysis book from Microsoft Research

This is the best resource I have found for understanding and using decision tree based machine learning algorithms.  It is very thorough on both theory and practical use with comparisons with other algorithms such as SVM's and AdaBoost.

There are also a bunch of supplemental materials available for free including a very nice PowerPoint with great explanations.  The supplemental materials include C++ and C# code.

Decision Forests - Microsoft Research

Decision Forests for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning

This technical report from Microsoft Research is an A-Z tutorial on how decision tree machine learning algorithms work.  It includes in depth explanations of random forests, extra tree classifiers, random ferns and other variations for both classification and regression.

It is in report format and compares decision forests to other types of machine learning algorithms such as SVM.  Some simple toy problems give the basics and some real life applications such as body position recognition and medical image are included.

There is also an accompanying PowerPoint with some nice animations. is not available

Alternatives to support vector machines in neuroimaging ensembles of decision trees for classification and information mapping with predictive models

This is a nice tutorial for using random decision forests for classifying medical images.  There is a comparison with some other methods, especially SVM's. is not available

Outside the Closed World: On Using Machine Learning for Network Intrusion Detection

This paper has a good discussion of the use of machine learning for intrusion detection and network security.

In network intrusion detection research, one popular strategy for finding attacks is monitoring a network's activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational "real world" settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection.

IEEE Xplore Full-Text HTML : Outside the Closed World: On Using Machine Learning for Network Intrusion Detection

Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey

This journal article discusses the application of various machine learning methods to malware detection and information security.

This research synthesizes a taxonomy for classifying detection methods of new malicious code by Machine Learning (ML) methods based on static features extracted from executables. The taxonomy is then operationalized to classify research on this topic and pinpoint critical open research issues in light of emerging threats. The article addresses various facets of the detection challenge, including: file representation and feature selection methods, classification algorithms, weighting ensembles, as well as the imbalance problem, active learning, and chronological evaluation. From the survey we conclude that a framework for detecting new malicious code in executable files can be designed to achieve very high accuracy while maintaining low false positives (i.e. misclassifying benign files as malicious). The framework should include training of multiple classifiers on various types of features (mainly OpCode and byte n-grams and Portable Executable Features), applying weighting algorithm on the classification results of the individual classifiers, as well as an active learning mechanism to maintain high detection accuracy. The training of classifiers should also consider the imbalance problem by generating classifiers that will perform accurately in a real-life situation where the percentage of malicious files among all files is estimated to be approximately 10%.

Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey

One-Class Support Vector Machines: Methods and Applications

This tutorial on SVM's is a decent one for a beginner.  It is a slide deck from a presentation.

A Comparison of Methods for Multiclass Support Vector Machines

Support vector machines (SVMs) were originally designed for binary classification. How to effectively extend it for multiclass classification is still an ongoing research issue. Several methods have been proposed where typically we construct a multiclass classifier by combining several binary classifiers. Some authors also proposed methods that consider all classes at once. As it is computationally more expensive to solve multiclass problems, comparisons of these methods using large-scale problems have not been seriously conducted. Especially for methods solving multiclass SVM in one step, a much larger optimization problem is required so up to now experiments are limited to small data sets. In this paper we give decomposition implementations for two such “all-together” methods. We then compare their performance with three methods based on binary classifications: “one-against-all,” “one-against-one,” and directed acyclic graph SVM (DAGSVM). Our experiments indicate that the “one-against-one” and DAG methods are more suitable for practical use than the other methods. Results also show that for large problems methods by considering all data at once in general need fewer support vectors.

Nonlinear regression in environmental sciences by support vector machines combined with evolutionary strategy

A hybrid algorithm combining support vector regression with evolutionary strategy (SVR-ES) is proposed for predictive models in the environmental sciences. SVR-ES uses uncorrelated mutation with p step sizes to find the optimal SVR hyper-parameters. Three environmental forecast datasets used in the WCCI-2006 contest – surface air temperature, precipitation and sulphur dioxide concentration – were tested. We used multiple linear regression (MLR) as benchmark and a variety of machine learning techniques including bootstrap-aggregated ensemble artificial neural network (ANN), SVR-ES, SVR with hyper-parameters given by the Cherkassky–Ma estimate, the M5 regression tree, and random forest (RF). We also tested all techniques using stepwise linear regression (SLR) first to screen out irrelevant predictors. We concluded that SVR-ES is an attractive approach because it tends to outperform the other techniques and can also be implemented in an almost automatic way. The Cherkassky–Ma estimate is a useful approach for minimizing the mean absolute error and saving computational time related to the hyper-parameter search. The ANN and RF are also good options to outperform multiple linear regression (MLR). Finally, the use of SLR for predictor selection can dramatically reduce computational time and often help to enhance accuracy.

Nonlinear regression in environmental sciences by support vector machines combined with evolutionary strategy

An Idiot’s guide to Support vector machines (SVMs)

This tutorial is a slide deck from a presentation but it is a good introduction to SVM's for beginners.

Support Vector Machines (SVMs) organization

This site is a good compilation of everything related to SVM's.  There are links to many academic papers, tutorials, applications and much more.  There are also links to learn about all the mathematics necessary to really understand how SVM's work.  I am impressed by the fact that the site creators are not just cheerleaders for SVM's, they do a good job of stating the advantages and disadvantages of SVM's as well as comparing them to competing machine learning methods.

Support Vector Machines

Texturecam: Autonomous Image Analysis For Astrobiology Survey

This is a paper about a project to include software on robotic rover spacecraft that uses a random forest algorithm to allow the rover to autonomously classify rocks by texture.  This helps the rover to search for signs of life.

LIBSVM -- A Library for Support Vector Machines

This is the library for support vector machines (SVM's).  They have a version for many different programming languages including C++, Python, R, MATLAB, Perl, Ruby, Weka, Common LISP, CLISP, Haskell, OCaml, LabVIEW, and PHP interfaces. C# .NET code and CUDA extension is available.

If you are new to SVM's there is a cool java applet and a javascript toy that will show you how they work.

LIBSVM -- A Library for Support Vector Machines

DTREG SVM - Support Vector Machines

This is a commercial machine learning package that I have not used.  The page in the link contains a very good explanation of how support vector machines (SVM) work.

SVM - Support Vector Machines

inspyred 1.0 genetic algorithm

I have not used this package but it looks well documented.

inspyred is a free, open source framework for creating biologically-inspired computational intelligence algorithms in Python, including evolutionary computation, swarm intelligence, and immunocomputing. Additionally, inspyred provides easy-to-use canonical versions of many bio-inspired algorithms for users who don't need much customization.

inspyred 1.0 : Python Package Index

Genetic Algorithm Library at Code Project

Genetic Algorithm Library is a C++ library for building genetic algorithms.  The home page for the project has an excellent tutorial that not only explains how to use the library but is also a great introduction to genetic algorithms for someone new to the field.

Genetic Algorithm Library - CodeProject

Exploiting tree-based variable importances to selectively identify relevant variables

This paper proposes a novel statistical procedure based on permutation tests for extracting a subset of truly relevant variables from multivariate importance rankings derived from tree-based supervised learning methods. It shows also that the direct extension of the classical approach based on permutation tests for estimating false discovery rates of univariate variable scoring procedures does not extend very well to the case of multivariate tree-based importance measures.

A Data-Driven Mapping of Five ACT-R Modules on the Brain

In this paper we present a new, data-driven mapping of five ACT-R modules on the brain. In the last decade, many studies have been published that evaluated ACT-R models based on their ability to predict fMRI data in certain predefined brain regions. However, these predefined regions were based on a reading of the literature, and might not be optimal. Currently, we used the results of a model-based fMRI analysis of five datasets to define a new brain mapping for the problem state, declarative memory, manual, visual, and aural modules. Both the original and the new mapping were applied to data of an experiment that elicited differential activity in these five modules; the results were compared to model predictions. The new mapping performed slightly better for the problem state, declarative memory, aural, and manual modules, but not for the visual module. In addition, it provides a more principled way of validating ACT-R models. Although the mapping is ACT-R specific, the methodology can be use to map any cognitive architecture or model to the brain.

An Integrated Theory of the Mind

Adaptive control of thought–rational (ACT–R; J. R. Anderson & C. Lebiere, 1998) has evolved into a theory that consists of multiple modules but also explains how these modules are integrated to produce coherent cognition. The perceptual-motor modules, the goal module, and the declarative memory module are presented as examples of specialized systems in ACT–R. These modules are associated with distinct cortical regions. These modules place chunks in buffers where they can be detected by a production system that responds to patterns of information in the buffers. At any point in time, a single production rule is selected to respond to the current pattern. Subsymbolic processes serve to guide the selection of rules to fire as well as the internal operations of some modules. Much of learning involves tuning of these subsymbolic processes. A number of simple and complex empirical examples are described to illustrate how these modules function singly and in concert

Adaptive control of thought–rational ACT-R

Adaptive control of thought–rational ACT-R is a cognitive architecture: a theory for simulating and understanding human cognition. Researchers working on ACT-R strive to understand how people organize knowledge and produce intelligent behavior. As the research continues, ACT-R evolves ever closer into a system which can perform the full range of human cognitive tasks: capturing in great detail the way we perceive, think about, and act on the world.


Large-scale prediction of long disordered regions in proteins using random forests

Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies.

BMC Bioinformatics | Full text | Large-scale prediction of long disordered regions in proteins using random forests

Extremely randomized trees by Pierre Geurts Damien Ernst Louis Wehenkel

This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.