I created a Python function to compute the K-category correlation coefficient. K-category correlation is a measure of classification performance and may be considered a multiclass generalization of the Matthews correlation coefficient.
Python function to compute K-category correlation coefficient at GitHub.
Academic paper:
Comparing two K-category assignments by a K-category correlation coefficient
Abstract
Predicted assignments of biological sequences are often evaluated by Matthews correlation coefficient. However, Matthews correlation coefficient applies only to cases where the assignments belong to two categories, and cases with more than two categories are often artificially forced into two categories by considering what belongs and what does not belong to one of the categories, leading to the loss of information. Here, an extended correlation coefficient that applies to K-categories is proposed, and this measure is shown to be highly applicable for evaluating prediction of RNA secondary structure in cases where some predicted pairs go into the category “unknown” due to lack of reliability in predicted pairs or unpaired residues. Hence, predicting base pairs of RNA secondary structure can be a three-category problem. The measure is further shown to be well in agreement with existing performance measures used for ranking protein secondary structure predictions.
Paper author's server and software is available at http://rk.kvl.dk/
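The statistic itself is short enough to sketch in pure Python. This follows Gorodkin's formula (it reduces to the Matthews correlation coefficient when K = 2); the function name and the zero return for a constant marginal are my choices, not the paper's:

```python
from math import sqrt

def rk_coefficient(y_true, y_pred):
    """K-category correlation coefficient (Gorodkin, 2004).

    Reduces to the Matthews correlation coefficient when K = 2.
    Returns 0.0 when either marginal is constant (undefined case).
    """
    classes = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                  # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correct predictions
    t_counts = {k: 0 for k in classes}               # true-label counts
    p_counts = {k: 0 for k in classes}               # predicted-label counts
    for t in y_true:
        t_counts[t] += 1
    for p in y_pred:
        p_counts[p] += 1
    cov_tp = c * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_tt = s * s - sum(t * t for t in t_counts.values())
    cov_pp = s * s - sum(p * p for p in p_counts.values())
    if cov_tt == 0 or cov_pp == 0:
        return 0.0
    return cov_tp / sqrt(cov_tt * cov_pp)
```

A perfect three-category assignment gives 1.0, and chance-level agreement gives 0.0.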
Showing posts with label python. Show all posts
Sunday, April 2, 2017
Friday, September 4, 2015
A Neural Network in 11 lines of Python
This is an excellent tutorial on neural networks that does a good job of explaining not only how they work but also why. The Python code is very easy to follow.
There is code for a very simple example and a more advanced one.
https://iamtrask.github.io/2015/07/12/basic-python-network/
see also:
http://denson-data-science.blogspot.com/2015/09/neural-network-step-by-step.html
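In the same spirit as the tutorial's simplest network, here is a single-layer sigmoid network trained with the delta rule. The toy dataset (output equals the first input column) and the random seed are mine, not the tutorial's exact code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dataset: the target is just the first input column.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0], [0], [1], [1]], dtype=float)

rng = np.random.default_rng(1)
w = rng.normal(size=(3, 1))              # one layer of weights

for _ in range(10000):
    out = sigmoid(X @ w)                 # forward pass
    delta = (y - out) * out * (1 - out)  # error scaled by sigmoid'(z)
    w += X.T @ delta                     # full-batch gradient step

pred = sigmoid(X @ w)                    # predictions after training
```

After training, the predictions sit close to 0 for the first two rows and close to 1 for the last two.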
Neural Network: A Step by Step Backpropagation Example
This great neural network tutorial goes step-by-step through backpropagation in training a neural network. There is also companion python code.
http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/
see also:
http://denson-data-science.blogspot.com/2015/09/a-neural-network-in-11-lines-of-python.html
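To see what "step by step" means here, a single sigmoid neuron is enough: apply the chain rule one factor at a time, then check the result against a numeric gradient. The numbers below are made up for illustration:

```python
import math

# One sigmoid neuron: out = sigmoid(w*x + b); loss = (out - target)^2 / 2
def forward(w, b, x):
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

w, b, x, target = 0.5, 0.1, 1.0, 1.0
out = forward(w, b, x)

# Backward pass, one chain-rule factor at a time:
dL_dout = out - target        # dL/dout for squared error
dout_dz = out * (1 - out)     # sigmoid derivative
dz_dw = x                     # z = w*x + b, so dz/dw = x
grad_w = dL_dout * dout_dz * dz_dw

# Sanity check against a central finite-difference gradient:
eps = 1e-6
num = ((forward(w + eps, b, x) - target) ** 2 / 2
       - (forward(w - eps, b, x) - target) ** 2 / 2) / (2 * eps)
assert abs(grad_w - num) < 1e-7
```

The analytic and numeric gradients agree, which is exactly the kind of check the tutorial walks through by hand.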
Tuesday, April 7, 2015
nolearn and lasagne tutorial
This short notebook is meant to help you get started with nolearn and lasagne in order to train a neural net and make a submission to the Otto Group Product Classification Challenge.
http://nbviewer.ipython.org/github/ottogroup/kaggle/blob/master/Otto_Group_Competition.ipynb
Wednesday, February 18, 2015
Easily distributing a parallel IPython Notebook on a cluster
I haven't tried this yet, but it is high on my todo list:
http://twiecki.github.io/blog/2014/02/24/ipython-nb-cluster/
Thursday, January 8, 2015
Natural Language Processing with Python
This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
Online book:
http://www.nltk.org/book/ch00.html
NLTK 3.0 documentation:
http://www.nltk.org/
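The book's simplest example, counting word frequencies, fits in a few lines even without NLTK (NLTK's `FreqDist` provides the same idea with more features):

```python
from collections import Counter

def word_freq(text, top=3):
    """Count word frequencies with only the standard library."""
    words = text.lower().split()
    return Counter(words).most_common(top)

word_freq("the quick brown fox jumps over the lazy dog the end")
```

Comparing these counts between two texts is the "compare different writing styles" idea from the excerpt above.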
Python for Data Science
This short primer on Python is designed to provide a rapid "on-ramp" to enable computer programmers who are already familiar with concepts and constructs in other programming languages to learn enough about Python to facilitate the effective use of open-source and proprietary Python-based machine learning and data science tools.
http://nbviewer.ipython.org/github/gumption/Python_for_Data_Science/blob/master/1_Introduction.ipynb
Useful Pandas Features
A tutorial on 10 useful Pandas features:
http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/
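Two staples that lists like this usually cover, boolean indexing and groupby aggregation, look like this on a made-up DataFrame (the data is invented for illustration):

```python
import pandas as pd

# A made-up frame to demo two everyday pandas idioms.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "LA", "LA"],
    "year": [2013, 2014, 2013, 2014],
    "sales": [10, 12, 7, 9],
})

recent = df[df["year"] == 2014]             # boolean indexing
totals = df.groupby("city")["sales"].sum()  # split-apply-combine
```

`recent` keeps the two 2014 rows and `totals` sums sales per city.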
pandas Ecosystem
Increasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. This is encouraging because it means pandas is not only helping users to handle their data tasks but also that it provides a better starting point for developers to build powerful and more focused data tools. The creation of libraries that complement pandas’ functionality also allows pandas development to remain focused around its original requirements.
This is a non-exhaustive list of projects that build on pandas in order to provide tools in the PyData space.
http://pandas.pydata.org/pandas-docs/version/0.15.0/ecosystem.html
Seaborn
Seaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.
https://github.com/mwaskom/seaborn
Examples/tutorial:
http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb
Vincent: A Python to Vega translator
The folks at Trifacta are making it easy to build visualizations on top of D3 with Vega. Vincent makes it easy to build Vega with Python.
https://github.com/wrobstory/vincent
Bokeh
Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients.
http://bokeh.pydata.org/en/latest/tutorial/index.html
ggplot
ggplot is an extremely un-pythonic package for doing exactly what ggplot2 does. The goal of the package is to mimic the ggplot2 API. This makes it super easy for people coming over from R to use, and prevents you from having to re-learn how to plot stuff.
https://github.com/yhat/ggplot
Qgrid
Qgrid is an IPython extension which uses SlickGrid to render pandas DataFrames within an IPython notebook. It's being developed for use in Quantopian's hosted research environment, and this repository holds the latest source code:
https://github.com/quantopian/qgrid
Demo:
http://nbviewer.ipython.org/github/quantopian/qgrid/blob/master/qgrid_demo.ipynb
Monday, September 29, 2014
Random sample consensus (RANSAC)
Random sample consensus (RANSAC) is an iterative method to estimate parameters of a mathematical model from a set of observed data which contains outliers. It is a non-deterministic algorithm in the sense that it produces a reasonable result only with a certain probability, with this probability increasing as more iterations are allowed. The algorithm was first published by Fischler and Bolles at SRI International in 1981.
A basic assumption is that the data consists of "inliers", i.e., data whose distribution can be explained by some set of model parameters, though may be subject to noise, and "outliers" which are data that do not fit the model. The outliers can come, e.g., from extreme values of the noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. RANSAC also assumes that, given a (usually small) set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data.
http://en.wikipedia.org/wiki/RANSAC
Tutorial:
http://vision.ece.ucsb.edu/~zuliani/Research/RANSAC/docs/RANSAC4Dummies.pdf
Matlab code:
http://www.mathworks.com/discovery/ransac.html
Python code:
http://wiki.scipy.org/Cookbook/RANSAC
Just for fun RANSAC song:
https://www.youtube.com/watch?v=1YNjMxxXO-E
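The algorithm is simple enough to sketch in pure Python. Here is a minimal RANSAC line fit (the parameters and toy data are mine): repeatedly fit a line to a random pair of points and keep the model with the most inliers.

```python
import random

def ransac_line(points, n_iters=200, threshold=0.2, seed=0):
    """Fit y = m*x + c by RANSAC: sample minimal subsets (2 points),
    fit a candidate line, and keep the one with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue                      # vertical sample pair; skip
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (m * x + c)) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers

# 8 points on y = 2x + 1 plus two gross outliers:
pts = [(x, 2 * x + 1) for x in range(8)] + [(1.0, 9.0), (5.0, -4.0)]
model, inliers = ransac_line(pts)
```

An ordinary least-squares fit would be dragged off by the two outliers; RANSAC recovers the line exactly and flags the 8 true inliers.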
Corner Detector
Detecting corners is often a good first step in computer vision. If you can match corners between two images, for example, you are well on your way to figuring out how they fit together.
Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image. Corner detection is frequently used in motion detection, image registration, video tracking, image mosaicing, panorama stitching, 3D modelling and object recognition. Corner detection overlaps with the topic of interest point detection.
http://en.wikipedia.org/wiki/Corner_detection
Lecture slide decks:
http://www.cse.psu.edu/~rcollins/CSE486/lecture06.pdf
http://courses.cs.washington.edu/courses/cse577/05sp/notes/harris.pdf
Tutorials:
Python/OpenCV
http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_feature2d/py_features_harris/py_features_harris.html
YouTube video:
https://www.youtube.com/watch?v=vkWdzWeRfC4
Matlab code:
http://www.mathworks.com/matlabcentral/fileexchange/9272-harris-corner-detector
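A minimal Harris corner response can be written with plain NumPy. This sketch uses finite-difference gradients and an unweighted 3x3 window instead of the usual Gaussian weighting, so it illustrates the structure-tensor idea rather than being production code:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris response R = det(M) - k*trace(M)^2 per pixel,
    where M is the 3x3-windowed structure tensor of gradients."""
    Ix = np.zeros_like(img)
    Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0   # central diff d/dx
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0   # central diff d/dy
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box3(a):
        # Sum each gradient product over a 3x3 neighbourhood.
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box3(Ixx), box3(Iyy), box3(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

# A white square on black: the square's corners should score highest.
img = np.zeros((12, 12))
img[3:9, 3:9] = 1.0
R = harris_response(img)
```

Along an edge only one gradient direction is strong, so det(M) stays near zero and R goes negative; at a corner both directions are strong and R peaks, which is exactly the intuition in the slide decks above.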
Restricted Boltzmann machine
Learning to use RBMs is on my todo list; I'll update when I get around to it. RBMs are just one technique for deep learning.
The Restricted Boltzmann Machine (RBM) has become increasingly popular of late after its success in the Netflix prize competition and other competitions. Most of the inventive work behind RBMs was done by Geoffrey Hinton. In particular the training of RBMs using an algorithm called "Contrastive Divergence" (CD). CD is very similar to gradient descent. A good consequence of the CD is its ability to "dream". Of the various machine learning methods out there, the RBM is the only one which has this capacity baked in implicitly.
http://bayesianthink.blogspot.com/2013/05/the-restricted-boltzmann-machine-rbm.html#.VCnWzikijjI
This is some Matlab code someone wrote for a class he was taking. It is probably not polished, but if you are working in Matlab it beats starting from scratch:
https://code.google.com/p/matrbm/
RBM tutorial:
http://deeplearning.net/tutorial/rbm.html#rbm
RBM in scikit-learn:
http://scikit-learn.org/stable/modules/neural_networks.html
A Practical Guide to Training Restricted Boltzmann Machines:
http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
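Since Contrastive Divergence is the heart of RBM training, here is a minimal CD-1 sketch in NumPy. The toy data and hyperparameters are mine, and this is a bare-bones illustration; Hinton's practical guide above covers the details that matter in practice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=2, lr=0.1, epochs=500, seed=0):
    """Train a tiny RBM with one step of Contrastive Divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.1 * rng.normal(size=(n_visible, n_hidden))
    a = np.zeros(n_visible)              # visible biases
    b = np.zeros(n_hidden)               # hidden biases
    for _ in range(epochs):
        v0 = data
        # Positive phase: hidden probabilities and a binary sample.
        h0_p = sigmoid(v0 @ W + b)
        h0 = (rng.random(h0_p.shape) < h0_p).astype(float)
        # Negative phase (the "dreaming"): reconstruct, re-infer hidden.
        v1_p = sigmoid(h0 @ W.T + a)
        h1_p = sigmoid(v1_p @ W + b)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (v0.T @ h0_p - v1_p.T @ h1_p) / len(data)
        a += lr * (v0 - v1_p).mean(axis=0)
        b += lr * (h0_p - h1_p).mean(axis=0)
    return W, a, b

# Two complementary binary patterns as toy training data.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W, a, b = train_rbm(data)
recon = sigmoid(sigmoid(data @ W + b) @ W.T + a)
```

`recon` is the model's reconstruction of the training patterns; after training it should lean toward the input patterns rather than uniform 0.5s.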
Friday, September 19, 2014
Genetic Algorithms: Cool Name & Damn Simple
Nice GA tutorial.
Genetic algorithms are a mysterious-sounding technique in a mysterious-sounding field: artificial intelligence. This is the problem with naming things appropriately. When the field was labeled artificial intelligence, it meant using mathematics to artificially create the semblance of intelligence, but self-aggrandizing researchers and Isaac Asimov redefined it as robots.
The name genetic algorithms does sound complex and has a faintly magical ring to it, but it turns out that they are one of the simplest and most intuitive concepts you'll encounter in A.I.
Genetic Algorithms: Cool Name & Damn Simple - Irrational Exuberance
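To show just how simple, here is a bare-bones GA in pure Python that evolves a target string via selection, one-point crossover, and mutation (the target, population size, and mutation rate are all mine):

```python
import random

TARGET = "hello"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(s):
    """Number of characters in the right place."""
    return sum(a == b for a, b in zip(s, TARGET))

def evolve(pop_size=100, mut_rate=0.1, max_gens=1000, seed=0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in TARGET)
           for _ in range(pop_size)]
    for gen in range(max_gens):
        pop.sort(key=fitness, reverse=True)
        if pop[0] == TARGET:
            return pop[0], gen
        parents = pop[: pop_size // 5]         # truncation selection
        children = [pop[0]]                    # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(len(TARGET))   # one-point crossover
            child = list(p1[:cut] + p2[cut:])
            for i in range(len(child)):        # per-gene mutation
                if rng.random() < mut_rate:
                    child[i] = rng.choice(ALPHABET)
            children.append("".join(child))
        pop = children
    return pop[0], max_gens

best, gens = evolve()
```

That's the whole loop: score, select, recombine, mutate, repeat. Everything else in a GA library is refinement of those four steps.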
Curve fitting with Pyevolve
This is a very nice tutorial for genetic algorithms. It uses pyevolve but the tutorial part is useful even if you are using a different language/implementation for GA.
A Coder's Musings: Curve fitting with Pyevolve
pygene - simple python genetic algorithms/programming library
I played around with this a bit before I decided on pyevolve instead. However, pygene might suit your needs better.
pygene - simple python genetic algorithms/programming library
blaa/PyGene · GitHub