Data Science, Machine Learning & Artificial Intelligence: August 2015

3 Wrong Ways to Store a Password (And 5 code samples doing it right)

Mainly for web development but useful in other contexts as well.

http://adambard.com/blog/3-wrong-ways-to-store-a-password/
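For a rough idea of what "doing it right" can look like, here is a minimal Python sketch that stores a salted, deliberately slow hash instead of the password itself. It is not taken from the linked article; the function names and the iteration count are assumptions made for illustration, and a maintained library such as bcrypt or Argon2 may be a better fit in production.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=ITERATIONS):
    """Re-derive the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected_digest)


# Store the salt and digest, never the password itself.
stored_salt, stored_digest = hash_password("correct horse battery staple")
login_ok = verify_password("correct horse battery staple", stored_salt, stored_digest)
login_bad = verify_password("password123", stored_salt, stored_digest)
```

The salt does not need to be secret; it only has to be unique per password so that identical passwords do not produce identical digests.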

Choosing Colormaps

From the matplotlib documentation:

The idea behind choosing a good colormap is to find a good representation in 3D colorspace for your data set. The best colormap for any given data set depends on many things including:

Whether representing form or metric data

Your knowledge of the data set (e.g., is there a critical value from which the other values deviate?)

If there is an intuitive color scheme for the parameter you are plotting

If there is a standard in the field the audience may be expecting

For many applications, a perceptually uniform colormap is the best choice: one in which equal steps in data are perceived as equal steps in the color space. Researchers have found that the human brain perceives changes in the lightness parameter as changes in the data much better than, for example, changes in hue. Therefore, colormaps which have monotonically increasing lightness through the colormap will be better interpreted by the viewer.

http://matplotlib.org/users/colormaps.html
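The point about monotonically increasing lightness can be checked numerically. The sketch below uses only the Python standard library (not matplotlib) and approximates CIE L* lightness from sRGB values; the two example ramps and the helper names are assumptions made for illustration, not colormaps from the matplotlib documentation.

```python
import colorsys


def srgb_to_lightness(r, g, b):
    """Approximate CIE L* (0 to 100) for an sRGB colour with components in [0, 1]."""
    def linearize(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    # Relative luminance Y from linear RGB.
    y = 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)
    # CIE lightness as a function of Y.
    return 116 * y ** (1 / 3) - 16 if y > 0.008856 else 903.3 * y


def is_monotonic(values):
    """True if the sequence never decreases."""
    return all(a <= b for a, b in zip(values, values[1:]))


steps = [i / 15 for i in range(16)]

# A plain grey ramp: lightness rises steadily from black to white.
gray_ramp = [(t, t, t) for t in steps]

# A rainbow-like sweep through hue at full saturation and value.
hue_sweep = [colorsys.hsv_to_rgb(0.8 * t, 1.0, 1.0) for t in steps]

gray_lightness = [srgb_to_lightness(*c) for c in gray_ramp]
hue_lightness = [srgb_to_lightness(*c) for c in hue_sweep]

print("grey ramp monotonic:", is_monotonic(gray_lightness))
print("hue sweep monotonic:", is_monotonic(hue_lightness))
```

The grey ramp passes the check, while the hue sweep does not: its lightness peaks near yellow and drops again towards blue, which is one reason rainbow-style colormaps can mislead the viewer.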

More info from IBM:

http://www.research.ibm.com/people/l/lloydt/color/color.HTM

FastML: a great resource for machine learning

This site has many useful articles for readers at every level, from novice to advanced:

http://fastml.com/

How to Select the Correct Encryption Approach

This article is a good starting point for selecting an encryption method.

http://www.itbusinessedge.com/articles/how-to-select-the-correct-encryption-approach.html?google_editors_picks=true

Shannon Entropy

In information theory, entropy (more specifically, Shannon entropy) is the expected value (average) of the information contained in each message received. A 'message' does not have to be text; in this context it is simply any flow of information. The entropy of a message is its amount of uncertainty: it increases as the message gets closer to random and decreases as it becomes less random. The idea is that the less likely (i.e. less predictable) an event is, the more information it provides when it occurs.

This seems backwards at first, since it feels as though messages with more structure should contain more information, but this is not true. For example, the message 'aaaaaaaaaa' (which appears very structured and not random at all, although it could in fact result from a random process) contains much less information than the message 'alphabet' (somewhat structured, but more random) or the message 'axraefy6h' (very random). In information theory, 'information' does not necessarily mean useful information; it simply describes how random the message is. So the first message has the least information and the last has the most, even though in everyday terms we would say that 'alphabet' contains more information than a stream of random letters. In information-theoretic terms, the first message has low entropy, the second has higher entropy, and the third has the highest entropy.

https://en.wikipedia.org/wiki/Entropy_(information_theory)
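The three example messages can be compared with a short Python sketch. It estimates entropy per character from each message's own character frequencies, H = sum of p * log2(1/p), which assumes the characters are drawn independently; the function name is an assumption made for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(message):
    """Estimate entropy in bits per character from character frequencies."""
    counts = Counter(message)
    total = len(message)
    # Each symbol with probability p contributes p * log2(1/p) bits.
    return sum((n / total) * log2(total / n) for n in counts.values())


messages = ["aaaaaaaaaa", "alphabet", "axraefy6h"]
entropies = {m: shannon_entropy(m) for m in messages}

for message, bits in entropies.items():
    print(f"{message!r}: {bits:.3f} bits per character")
```

Under this estimate the ordering matches the text: 'aaaaaaaaaa' scores lowest, 'alphabet' is higher, and 'axraefy6h' is highest.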

Non-technical article:

http://gizmodo.com/if-it-werent-for-this-equation-you-wouldnt-be-here-1719514472?google_editors_picks=true

Contrast Limited Adaptive Histogram Equalization (CLAHE)

CLAHE is a useful tool for preprocessing images (or video) for computer vision/pattern recognition tasks. It boosts local contrast, which helps you "see" detail in areas of the image that are in shadow.

There are many implementations available, but I like the one in OpenCV:

http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_imgproc/py_histograms/py_histogram_equalization/py_histogram_equalization.html

Note: for color images, it is usually better to convert them to the HSV colorspace first, so that CLAHE can be applied to the V (brightness) channel alone.

[Figure: example image before and after CLAHE]
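To show what "contrast limited" means, here is a simplified pure-Python sketch of the core step on a single tile: the histogram is clipped, the clipped excess is spread evenly over all grey levels, and the resulting cumulative distribution becomes the lookup table. This is not OpenCV's implementation. Real CLAHE applies this per tile over a grid and interpolates between neighbouring tiles; the function name, the 16 grey levels, and the clip limit are assumptions made for illustration.

```python
def clipped_equalize(pixels, clip_limit, levels=16):
    """Histogram-equalize a flat list of integer grey levels with clipping."""
    # Build the histogram of grey levels.
    hist = [0.0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin at the limit and collect what was cut off.
    excess = 0.0
    for level, count in enumerate(hist):
        if count > clip_limit:
            excess += count - clip_limit
            hist[level] = float(clip_limit)

    # Spread the excess evenly over all bins so the total count is unchanged.
    hist = [count + excess / levels for count in hist]

    # The cumulative histogram, scaled to the output range, is the lookup table.
    lookup = []
    running = 0.0
    for count in hist:
        running += count
        scaled = round((levels - 1) * running / len(pixels))
        lookup.append(min(levels - 1, scaled))

    return [lookup[p] for p in pixels]


# A dark "shadow" tile: values crowded into the bottom of the range.
tile = [1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 4, 5]
equalized = clipped_equalize(tile, clip_limit=3)

print("input range: ", min(tile), "to", max(tile))
print("output range:", min(equalized), "to", max(equalized))
```

The clip limit keeps a single dominant grey level from claiming most of the output range, which is what stops noise in flat regions from being over-amplified.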

Additional info:

http://fiji.sc/wiki/index.php/Enhance_Local_Contrast_(CLAHE)


Tuesday, August 25, 2015
