tag:blogger.com,1999:blog-67236795785089085042017-09-07T17:26:41.869-07:00Data Science, Machine Learning & Artificial IntelligenceDenson Smithnoreply@blogger.comBlogger69125tag:blogger.com,1999:blog-6723679578508908504.post-22711910150846607232017-04-02T15:36:00.000-07:002017-04-02T15:44:07.814-07:00Python function to compute K-category correlation coefficientI created a Python function to compute the K-category correlation coefficient. K-category correlation is a measure of classification performance and may be considered a multiclass generalization of the <a href="https://en.wikipedia.org/wiki/Matthews_correlation_coefficient">Matthews correlation coefficient.</a><br /><br /><a href="https://github.com/denson/compute_RkCC">Python function to compute K-category correlation coefficient at Github</a>.<br /><br /><br />Academic paper:<br /><br /><a href="http://www.sciencedirect.com/science/article/pii/S1476927104000799">Comparing two K-category assignments by a K-category correlation coefficient</a><br /><br />Abstract<br /><br />Predicted assignments of biological sequences are often evaluated by Matthews correlation coefficient. However, Matthews correlation coefficient applies only to cases where the assignments belong to two categories, and cases with more than two categories are often artificially forced into two categories by considering what belongs and what does not belong to one of the categories, leading to the loss of information. Here, an extended correlation coefficient that applies to K-categories is proposed, and this measure is shown to be highly applicable for evaluating prediction of RNA secondary structure in cases where some predicted pairs go into the category “unknown” due to lack of reliability in predicted pairs or unpaired residues. Hence, predicting base pairs of RNA secondary structure can be a three-category problem. 
The measure is further shown to be well in agreement with existing performance measures used for ranking protein secondary structure predictions.<br /><br /><br />The paper author's server and software are available at <a href="http://rk.kvl.dk/">http://rk.kvl.dk/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-41697334935903797032017-04-01T07:23:00.000-07:002017-04-01T07:23:50.570-07:00Demo of Blended Model Machine Learning TechniqueI started with <a href="https://github.com/emanuele/kaggle_pbr">Emanuele's code</a> and switched to data generated with scikit-learn's "make_classification" function. <a href="https://github.com/denson/kaggle_pbr">I also added a Jupyter notebook blending demo: https://github.com/denson/kaggle_pbr</a><br /><br /> The general concept is that if we build multiple different models trained on different samples of our training data, we get multiple predictions that are substantially better than chance and that are, ideally, only weakly correlated with each other.<br /><br /> In step 1 we split our training data into stratified folds and build multiple models (in this case RDF-entropy, RDF-gini, ET-entropy, ET-gini and GBT) on each fold. We then use each trained model to predict the part of the training data that was held out of that fold's training set. It is super important that you do not use a given model to predict training data that was used to train that model on that fold. We also predict all the test data with each model. These predictions are a way of transforming the training data and the test data into a different space, with the predicted probabilities as the transformed information. We take a simple average of the predictions of each type of model (e.g. RDF-gini) and that becomes the transformed data for the next step. 
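<br /><br /> The rule that a model must never predict rows it was trained on can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Emanuele's code: the names fold_indices, out_of_fold_predictions, fit_base_rate and predict_base_rate are hypothetical, the folds are plain rather than stratified, and a toy "base rate" model stands in for the real RDF, ET and GBT classifiers.<br /><br />

```python
def fold_indices(n_rows, n_folds):
    """Assign row i to fold i % n_folds (plain folds, NOT stratified)."""
    return [list(range(k, n_rows, n_folds)) for k in range(n_folds)]


def out_of_fold_predictions(fit, predict, X, y, n_folds=5):
    """Return one prediction per training row, each made by a model
    that did not see that row while it was being fitted."""
    oof = [None] * len(X)
    for held_out in fold_indices(len(X), n_folds):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(model, X[i])
    return oof


# Toy stand-in for a real classifier (an assumption for illustration):
# it always predicts the positive-class rate of the rows it was fitted on.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_base_rate(model, row):
    return model


X = [[float(i)] for i in range(10)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
column = out_of_fold_predictions(fit_base_rate, predict_base_rate, X, y)
print(column)
```

Repeating this once per model type gives one column of transformed training data per model.<br /><br /> 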
If we have 5 different models, as in this case, our input data for step 2 will have 5 columns and the same number of rows as the training set and test set respectively.<br /><br /> In step 2 we train a logistic regression on the transformed training data and use it to predict the transformed test data. We take the predicted probabilities from the logistic regression as our final answer.<br /><br /> This method usually improves on a single highly tuned model for "hard" problems, but usually not for "simple" ones. By hard I mean that the decision boundary between classes is highly non-linear. Overlapping classes and non-linear relationships between features contribute to making problems hard.<br /><br /> This academic paper describes the concept:<br /><br /> <a href="http://statistics.berkeley.edu/sites/default/files/tech-reports/367.pdf">Stacked Regressions </a><br /><br /> I found this at Kaggle:<br /><br /> <a href="https://www.kaggle.com/c/bioresponse/discussion/1765">Kaggle competition question</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-85349530500335808212015-09-04T10:46:00.000-07:002015-09-04T10:58:51.317-07:00A Neural Network in 11 lines of PythonThis is an excellent tutorial on neural networks that does a good job of explaining not only <i>how</i> they work but <i>why</i> they work as well. 
The Python code is very easy to follow.<br /><br />There is code for a very simple example and a more advanced one.<br /><br /><a href="https://iamtrask.github.io/2015/07/12/basic-python-network/">https://iamtrask.github.io/2015/07/12/basic-python-network/</a><br /><br />See also:<br /><br /><a href="http://denson-data-science.blogspot.com/2015/09/neural-network-step-by-step.html">http://denson-data-science.blogspot.com/2015/09/neural-network-step-by-step.html</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-24877682068357068072015-09-04T10:39:00.000-07:002015-09-04T10:46:41.263-07:00Neural Network: A Step by Step Backpropagation ExampleThis great neural network tutorial goes step-by-step through backpropagation in training a neural network. There is also companion Python code.<br /><br /><a href="http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/">http://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/comment-page-1/</a><br /><br />See also:<br /><br /><a href="http://denson-data-science.blogspot.com/2015/09/a-neural-network-in-11-lines-of-python.html">http://denson-data-science.blogspot.com/2015/09/a-neural-network-in-11-lines-of-python.html</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-68515806404218083092015-08-25T13:50:00.001-07:002015-08-25T13:50:22.592-07:003 Wrong Ways to Store a Password (And 5 code samples doing it right)Mainly for web development but useful in other contexts as well.<br /><br />http://adambard.com/blog/3-wrong-ways-to-store-a-password/Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-6597949941481426692015-08-25T13:46:00.002-07:002015-08-25T13:46:35.957-07:00Choosing ColormapsFrom the matplotlib documentation:<br /><br /><br /><blockquote 
class="tr_bq">The idea behind choosing a good colormap is to find a good representation in 3D colorspace for your data set. The best colormap for any given data set depends on many things including:<br /><ul><li>Whether representing form or metric data </li></ul><ul><li>Your knowledge of the data set (e.g., is there a critical value from which the other values deviate?)</li></ul><ul><li>If there is an intuitive color scheme for the parameter you are plotting</li></ul><ul><li>If there is a standard in the field the audience may be expecting</li></ul><br />For many applications, a perceptual colormap is the best choice — one in which equal steps in data are perceived as equal steps in the color space. Researchers have found that the human brain perceives changes in the lightness parameter as changes in the data much better than, for example, changes in hue. Therefore, colormaps which have monotonically increasing lightness through the colormap will be better interpreted by the viewer.</blockquote><br /><a href="http://matplotlib.org/users/colormaps.html"> http://matplotlib.org/users/colormaps.html</a><br /><br /><br />More info from IBM:<br /><br /><a href="http://www.research.ibm.com/people/l/lloydt/color/color.HTM">http://www.research.ibm.com/people/l/lloydt/color/color.HTM</a><br /><br />Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-26380361758060636102015-08-25T12:58:00.000-07:002015-08-25T12:58:44.290-07:00FastML: a great resource for machine learningThere are many useful articles at this site. 
It is useful for everyone from novice to advanced:<br /><br /><a href="http://fastml.com/">http://fastml.com/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-35710397216836862342015-08-25T12:56:00.000-07:002015-08-25T12:56:14.918-07:00How to Select the Correct Encryption ApproachThis article is a pretty good start at selecting an encryption method.<br /><br /><a href="http://www.itbusinessedge.com/articles/how-to-select-the-correct-encryption-approach.html?google_editors_picks=true">http://www.itbusinessedge.com/articles/how-to-select-the-correct-encryption-approach.html?google_editors_picks=true</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-58418446315177724342015-08-25T12:54:00.000-07:002015-08-25T12:54:29.221-07:00Shannon EntropyIn information theory, entropy (more specifically, Shannon entropy) is the expected value (average) of the information contained in each message received. 'Messages' don't have to be text; in this context a 'message' is simply any flow of information. The entropy of the message is its amount of uncertainty; it increases when the message is closer to random, and decreases when it is less random. The idea here is that the less likely (i.e. more random) an event is, the more information it provides when it occurs. This seems backwards at first: it seems like messages which have more structure would contain more information, but this is not true. For example, the message 'aaaaaaaaaa' (which appears to be very structured and not random at all [although in fact it could result from a random process]) contains much less information than the message 'alphabet' (which is somewhat structured, but more random) or even the message 'axraefy6h' (which is very random). 
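<br /><br />To make the comparison concrete, here is a minimal Python sketch that estimates the entropy of each of the three example messages from its own character frequencies. The function name shannon_entropy is just illustrative, and treating every character as an independent draw from the message's frequency table is a simplifying assumption, not part of the definition quoted here.<br /><br />

```python
from collections import Counter
from math import log2


def shannon_entropy(message):
    """Empirical Shannon entropy of a string, in bits per character."""
    counts = Counter(message)
    total = len(message)
    # Sum p * log2(1/p) over the observed character frequencies p = n/total.
    return sum((n / total) * log2(total / n) for n in counts.values())


for msg in ("aaaaaaaaaa", "alphabet", "axraefy6h"):
    print(f"{msg!r}: {shannon_entropy(msg):.3f} bits per character")
```

Under that assumption the three messages score 0, 2.75 and roughly 2.95 bits per character, so the ordering matches the description: the repeated letter is lowest and the random-looking string is highest.<br /><br />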
In information theory, 'information' doesn't necessarily mean useful information; it simply describes the amount of randomness of the message, so in the example above the first message has the least information and the last message has the most information, even though in everyday terms we would say that the middle message, 'alphabet', contains more information than a stream of random letters. Therefore, we would say in information theory that the first message has low entropy, the second has higher entropy, and the third has the highest entropy.<br /><br /><a href="https://en.wikipedia.org/wiki/Entropy_(information_theory)">https://en.wikipedia.org/wiki/Entropy_(information_theory)</a><br /><br />Non-technical article:<br /><br /><a href="http://gizmodo.com/if-it-werent-for-this-equation-you-wouldnt-be-here-1719514472?google_editors_picks=true">http://gizmodo.com/if-it-werent-for-this-equation-you-wouldnt-be-here-1719514472?google_editors_picks=true</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-13519860220168908192015-08-25T12:45:00.002-07:002015-08-25T12:50:11.603-07:00Contrast Limited Adaptive Histogram Equalization (CLAHE)CLAHE is a useful tool for preprocessing images (or video) for computer vision/pattern recognition tasks. 
It enhances local contrast, which more or less helps you "see" areas of the image that are in shadow.<br /><br />There are many implementations available, but I like the one in OpenCV:<br /><br /><a href="http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html">http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html</a><br /><br />Note: it is usually better to convert images to HSV colorspace first.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-mNubm2DBWsI/VdzGp7UiccI/AAAAAAAAAcY/EpzCM_dq8Xg/s1600/before.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="178" src="http://4.bp.blogspot.com/-mNubm2DBWsI/VdzGp7UiccI/AAAAAAAAAcY/EpzCM_dq8Xg/s400/before.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Before</td></tr></tbody></table><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="http://4.bp.blogspot.com/-xKNAJaJdjzk/VdzGp5gyI6I/AAAAAAAAAcc/ew2qLjIskOI/s1600/after.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="178" src="http://4.bp.blogspot.com/-xKNAJaJdjzk/VdzGp5gyI6I/AAAAAAAAAcc/ew2qLjIskOI/s400/after.jpg" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">After</td></tr></tbody></table><br />Additional info:<br /><br /><a href="http://fiji.sc/wiki/index.php/Enhance_Local_Contrast_(CLAHE)">http://fiji.sc/wiki/index.php/Enhance_Local_Contrast_(CLAHE)</a>Denson 
Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-28428082303785548932015-07-30T14:07:00.001-07:002015-07-30T14:07:50.325-07:00Get Much Smarter About Machine Learning in 2 MinutesThis is a great presentation by <a href="https://twitter.com/stephaniejyee">Stephanie Yee</a> and <a href="https://twitter.com/tonyhschu/">Tony Chu</a>. It is targeted at people new to the concept/field of machine learning. There are excellent animations that make things very clear.<br /><br /><a href="http://www.r2d3.us/visual-intro-to-machine-learning-part-1/">http://www.r2d3.us/visual-intro-to-machine-learning-part-1/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-14939118163259663092015-04-07T13:12:00.002-07:002015-04-07T13:12:34.937-07:00nolearn and lasagne tutorial<blockquote class="tr_bq">This short notebook is meant to help you getting started with nolearn and lasagne in order to train a neural net and make a submission to the Otto Group Product Classification Challenge.</blockquote><br /><blockquote class="tr_bq"><a href="http://nbviewer.ipython.org/github/ottogroup/kaggle/blob/master/Otto_Group_Competition.ipynb">http://nbviewer.ipython.org/github/ottogroup/kaggle/blob/master/Otto_Group_Competition.ipynb</a></blockquote>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-51855875474820529842015-04-01T13:26:00.000-07:002015-04-01T13:26:14.410-07:00An Intuitive Explanation of Bayes' TheoremThis is a great introduction to Bayes' Theorem and strong evidence that a large majority of medical doctors are not scientists.<br /><br />About 85% of doctors get this problem wrong!<br /><br /><blockquote class="tr_bq">1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 
9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?</blockquote><br /><a href="http://www.yudkowsky.net/rational/bayes">http://www.yudkowsky.net/rational/bayes</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-47222358305807370142015-03-13T23:41:00.000-07:002015-03-13T23:41:15.735-07:00quicksort visualizationThis is a great visualization of the quicksort algorithm:<br /><br /><a href="https://www.youtube.com/watch?v=aXXWXz5rF64">https://www.youtube.com/watch?v=aXXWXz5rF64</a><br /><br /><br /><blockquote class="tr_bq">Quicksort (sometimes called partition-exchange sort) is an efficient sorting algorithm, serving as a systematic method for placing the elements of an array in order. Developed by Tony Hoare in 1960, it is still a very commonly used algorithm for sorting. When implemented well, it can be about two or three times faster than its main competitors, merge sort and heapsort.[1]<br />Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than" relation (formally, a total order) is defined. In efficient implementations it is not a stable sort, meaning that the relative order of equal sort items is not preserved. Quicksort can operate in-place on an array, requiring small additional amounts of memory to perform the sorting.<br />Mathematical analysis of quicksort shows that, on average, the algorithm takes O(n log n) comparisons to sort n items. 
In the worst case, it makes O(n<sup>2</sup>) comparisons, though this behavior is rare.</blockquote><br /><blockquote class="tr_bq"><a href="http://en.wikipedia.org/wiki/Quicksort">http://en.wikipedia.org/wiki/Quicksort</a></blockquote><br /><br /><br />Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-63340192731037375072015-03-13T23:34:00.000-07:002015-03-13T23:41:46.024-07:00The Halting ProblemThis is a great video that describes the <i>halting problem</i>.<br /><br /><br /><a href="https://www.youtube.com/watch?v=92WHN-pAFCs">https://www.youtube.com/watch?v=92WHN-pAFCs</a><br /><br /><br />What is the halting problem, you ask?<br /><blockquote class="tr_bq">In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running or continue to run forever.<br />Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist. A key part of the proof was a mathematical definition of a computer and program, which became known as a Turing machine; the halting problem is undecidable over Turing machines. 
It is one of the first examples of a decision problem.</blockquote><br /><a href="http://en.wikipedia.org/wiki/Halting_problem">http://en.wikipedia.org/wiki/Halting_problem</a><br /><br /><br />Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-91737097315297515602015-02-18T12:34:00.003-08:002015-02-18T12:34:56.549-08:00Easily distributing a parallel IPython Notebook on a clusterI haven't tried this yet, but it is high on my todo list:<br /><br /><a href="http://twiecki.github.io/blog/2014/02/24/ipython-nb-cluster/">http://twiecki.github.io/blog/2014/02/24/ipython-nb-cluster/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-53235598154956983912015-01-08T13:11:00.002-08:002015-01-08T13:11:48.248-08:00Natural Language Processing with Python<blockquote class="tr_bq">This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages like English, Hindi or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing — or NLP for short — in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. 
At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.<br />Online book:<br /><a href="http://www.nltk.org/book/ch00.html">http://www.nltk.org/book/ch00.html</a></blockquote><br /><br /><br />NLTK 3.0 documentation:<br /><br /><a href="http://www.nltk.org/">http://www.nltk.org/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-20480092225319001272015-01-08T13:08:00.003-08:002015-01-08T13:08:56.698-08:00Python for Data Science<blockquote class="tr_bq">This short primer on Python is designed to provide a rapid "on-ramp" to enable computer programmers who are already familiar with concepts and constructs in other programming languages learn enough about Python to facilitate the effective use of open-source and proprietary Python-based machine learning and data science tools.<br /><a href="http://nbviewer.ipython.org/github/gumption/Python_for_Data_Science/blob/master/1_Introduction.ipynb">http://nbviewer.ipython.org/github/gumption/Python_for_Data_Science/blob/master/1_Introduction.ipynb</a></blockquote>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-20278254533781513692015-01-08T13:01:00.004-08:002015-01-08T13:01:25.129-08:00The Great, Big List of LATEX SymbolsLATEX symbols reference:<br /><br /><a href="http://www.rpi.edu/dept/arc/training/latex/LaTeX_symbols.pdf">http://www.rpi.edu/dept/arc/training/latex/LaTeX_symbols.pdf</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-3074069999277592152015-01-08T13:00:00.003-08:002015-01-08T13:00:32.532-08:00Enlightening Symbols: A Short History of Mathematical Notation and Its Hidden Powers<blockquote class="tr_bq">While all of us regularly use basic math symbols such as those for plus, minus, 
and equals, few of us know that many of these symbols weren't available before the sixteenth century. What did mathematicians rely on for their work before then? And how did mathematical notations evolve into what we know today? In Enlightening Symbols, popular math writer Joseph Mazur explains the fascinating history behind the development of our mathematical notation system. He shows how symbols were used initially, how one symbol replaced another over time, and how written math was conveyed before and after symbols became widely adopted.<br />Traversing mathematical history and the foundations of numerals in different cultures, Mazur looks at how historians have disagreed over the origins of the numerical system for the past two centuries. He follows the transfigurations of algebra from a rhetorical style to a symbolic one, demonstrating that most algebra before the sixteenth century was written in prose or in verse employing the written names of numerals. Mazur also investigates the subconscious and psychological effects that mathematical symbols have had on mathematical thought, moods, meaning, communication, and comprehension. 
He considers how these symbols influence us (through similarity, association, identity, resemblance, and repeated imagery), how they lead to new ideas by subconscious associations, how they make connections between experience and the unknown, and how they contribute to the communication of basic mathematics.<br />From words to abbreviations to symbols, this book shows how math evolved to the familiar forms we use today.</blockquote><br /><blockquote class="tr_bq"><a href="http://www.amazon.com/Enlightening-Symbols-History-Mathematical-Notation/dp/0691154635/">http://www.amazon.com/Enlightening-Symbols-History-Mathematical-Notation/dp/0691154635/</a></blockquote><br /><br />Article about the book:<br /><br /><a href="http://www.theguardian.com/science/alexs-adventures-in-numberland/2014/may/21/notation-history-mathematical-symbols-joseph-mazur">http://www.theguardian.com/science/alexs-adventures-in-numberland/2014/may/21/notation-history-mathematical-symbols-joseph-mazur</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-60937721201357012582015-01-08T12:35:00.003-08:002015-01-08T12:57:44.878-08:00Quantifying Uncertainty: Modern Computational Representation of Probability and Applications<div class="tr_bq">This is a link to a pdf file containing a tutorial on modeling uncertainty:</div><br /><br /><a href="http://www.wire.tu-bs.de/forschung/talks/06_Opatija.pdf">http://www.wire.tu-bs.de/forschung/talks/06_Opatija.pdf</a><br /><br /><br /><br /><blockquote>Many descriptions (especially of future events) contain elements, which are uncertain and not precisely known.<br /><ul><li>For example future rainfall, or discharge from a river.</li></ul><ul><li>More generally, action from surrounding environment.</li></ul><ul><li>The system itself may contain only incompletely known parameters, processes or fields (not possible or too costly to measure)</li></ul><ul><li>There may be small, unresolved scales in the model, they act as a kind of background noise.</li></ul><br />All these introduce some uncertainty in the model.<br /><ul><li>Uncertainty may be aleatoric, which means random and not reducible, or epistemic, which means due to incomplete knowledge.</li></ul></blockquote>
Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-84266104696050461682015-01-08T12:29:00.001-08:002015-01-08T12:29:27.888-08:00Useful Pandas Features<br />A tutorial on 10 useful Pandas features:<br /><br /><a href="http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/">http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-30549427599872406402015-01-08T12:25:00.004-08:002015-01-08T12:25:55.803-08:00pandas EcosystemIncreasingly, packages are being built on top of pandas to address specific needs in data preparation, analysis and visualization. This is encouraging because it means pandas is not only helping users to handle their data tasks but also that it provides a better starting point for developers to build powerful and more focused data tools. 
The creation of libraries that complement pandas’ functionality also allows pandas development to remain focused around its original requirements.<br /><br />This is a non-exhaustive list of projects that build on pandas in order to provide tools in the PyData space.<br /><br /><a href="http://pandas.pydata.org/pandas-docs/version/0.15.0/ecosystem.html">http://pandas.pydata.org/pandas-docs/version/0.15.0/ecosystem.html</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-51648145235857304462015-01-08T12:22:00.000-08:002015-01-08T12:26:17.739-08:00SeabornSeaborn is a Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.<br /><br /><a href="https://github.com/mwaskom/seaborn">https://github.com/mwaskom/seaborn</a><br /><br />Examples/tutorial:<br /><br /><a href="http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb">http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/plotting_distributions.ipynb</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0tag:blogger.com,1999:blog-6723679578508908504.post-48221196531318431392015-01-08T12:19:00.003-08:002015-01-08T12:26:28.556-08:00Vincent: A Python to Vega translatorThe folks at Trifacta are making it easy to build visualizations on top of D3 with Vega. Vincent makes it easy to build Vega with Python.<br /><br /><br /><a href="https://github.com/wrobstory/vincent">https://github.com/wrobstory/vincent</a>Denson Smithhttps://plus.google.com/112205246822364946982noreply@blogger.com0