Gr33n Data: September 2013

30 September 2013

A Quick Intro. to NumPy

For me, NumPy is a Python list on steroids. You can use it to create multidimensional arrays and matrices.

The convention is to import it as follows:

import numpy as np

To create a list of the numbers from 0 to 9, you could use the following command:

x = list(range(10))

To convert that list into a NumPy array, you can write:

x = np.array(range(10))

And to make your life easier, there is a shorthand for the above command:

x = np.arange(10)
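
Like range(), arange() also takes a start, a stop and a step. A minimal sketch (the variable names are just for illustration):

```python
import numpy as np

evens = np.arange(2, 10, 2)      # start, stop (exclusive), step
halves = np.arange(0, 2, 0.5)    # unlike range(), non-integer steps work too

print(evens)     # [2 4 6 8]
print(halves)    # [0.  0.5 1.  1.5]
```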

So far, we have only created one-dimensional arrays. However, arrays can be reshaped. The reshape() method returns a reshaped version of the array without changing the original object. To change the shape of the original object itself, use resize() instead.

y = x.reshape(2,5)

The above command creates a 2-dimensional array with 2 rows and 5 columns. You can create arrays with as many dimensions as you want. See the command below for a 3*4*5 array.

y = np.arange(3*4*5).reshape(3,4,5)
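
To see the difference between reshape() and resize(), here is a small sketch. It assumes a 10-element array so that the 2 by 5 shape fits exactly; the variable names are illustrative only.

```python
import numpy as np

x = np.arange(10)
y = x.reshape(2, 5)   # returns a 2 x 5 array; x itself is untouched
print(x.shape)        # (10,)
print(y.shape)        # (2, 5)

z = np.arange(10)
z.resize(2, 5)        # changes the shape of z itself, in place
print(z.shape)        # (2, 5)
```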

The mathematical operations '+', '-', '/' and '*' are applied elementwise.

x = np.arange(10)

# To multiply each element of x by 10
y = x * 10

# To multiply each element of x by itself
y = x * x
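
Here is a quick sketch of the elementwise operators side by side, using a small array so the results are easy to check by eye. The division result assumes Python 3, where '/' is true division.

```python
import numpy as np

x = np.arange(1, 5)    # array([1, 2, 3, 4])

added = x + 10         # array([11, 12, 13, 14])
scaled = x * 10        # array([10, 20, 30, 40])
squared = x * x        # array([ 1,  4,  9, 16])
halved = x / 2         # array([0.5, 1. , 1.5, 2. ])
```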

For matrix multiplication, though, use np.dot():

# Create a 3 * 5 Matrix
A = np.arange(15).reshape(3,5)

# Create a 5 * 2 Matrix
B = np.arange(10).reshape(5,2)

# Dot product gives you a 3 * 2 Matrix
y = np.dot(A, B)
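
To check the shapes and values, here is the same product worked out in full. The '@' operator is not mentioned above; it is available in Python 3.5 and later and does the same thing for 2-dimensional arrays.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)   # 3 x 5
B = np.arange(10).reshape(5, 2)   # 5 x 2

product = np.dot(A, B)            # 3 x 2
same = A @ B                      # equivalent for 2-D arrays

print(product.shape)   # (3, 2)
print(product)
# [[ 60  70]
#  [160 195]
#  [260 320]]
```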

Just like lists, arrays can be sliced.

For ordinary lists:

A = list(range(10))
A[2:5] # [2, 3, 4]

For NumPy arrays:

B = np.arange(10)
B[2:5] # array([2, 3, 4])

However, with arrays you can set a whole slice to a single value:

B[2:5] = 1337

But you cannot do the same with lists:

A[2:5] = 1337 # TypeError: can only assign an iterable
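
Here are the two behaviours side by side, as a small sketch. For the list, the usual workaround is to put an iterable of the right length on the right-hand side.

```python
import numpy as np

B = np.arange(10)
B[2:5] = 1337               # the single value is copied into every slot of the slice
print(B.tolist())           # [0, 1, 1337, 1337, 1337, 5, 6, 7, 8, 9]

A = list(range(10))
A[2:5] = [1337] * 3         # a list slice needs an iterable on the right
print(A)                    # [0, 1, 1337, 1337, 1337, 5, 6, 7, 8, 9]
```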

For statisticians, the following methods are also available:

x = np.arange(5) + 1
x.mean() # 3.0
x.max() # 5
x.min() # 1
x.std() # 1.414
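
One detail worth knowing: by default std() divides by n (the population standard deviation). If you want the sample standard deviation, which divides by n - 1, pass ddof=1. A minimal sketch:

```python
import numpy as np

x = np.arange(5) + 1            # array([1, 2, 3, 4, 5])

population_std = x.std()        # divides by n
sample_std = x.std(ddof=1)      # divides by n - 1

print(round(population_std, 3))   # 1.414
print(round(sample_std, 3))       # 1.581
```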

You can also access elements of the array using start, stop and a step:

x = np.arange(10)
x[2:7:2] # array([2, 4, 6])

Or access specific elements, say those at indices 1, 5 and 6:

x[[1,5,6]] # array([1, 5, 6])

Similar to reshape() and resize(), ravel() converts a multidimensional array into a one-dimensional array, while transpose() turns rows into columns and vice versa.
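
A short sketch of both methods on a small 2 by 3 array (the names are illustrative):

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

flat = m.ravel()                 # array([0, 1, 2, 3, 4, 5])
flipped = m.transpose()          # 3 x 2; m.T is a shorthand for this

print(flipped)
# [[0 3]
#  [1 4]
#  [2 5]]
```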

If you program in R, you will not miss its way of accessing the elements of an array that meet a certain condition.

x = np.arange(10)
x[x>4] # array([5, 6, 7, 8, 9])
x[x%2 == 1] # array([1, 3, 5, 7, 9])
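
What happens under the hood is that the comparison produces an array of True and False values, which is then used to pick elements. The sketch below also shows two things not covered above: combining conditions with '&' (the parentheses are required) and assigning through a mask.

```python
import numpy as np

x = np.arange(10)

mask = x > 4                               # one True/False per element
big_and_odd = x[(x > 4) & (x % 2 == 1)]    # array([5, 7, 9])

x[x > 4] = 0                               # masks work for assignment too
print(x.tolist())                          # [0, 1, 2, 3, 4, 0, 0, 0, 0, 0]
```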

If you have an array whose elements are either True or False, you can test them all at once:

x = np.array([True, False, True, True])

x.all() # Only True if all elements are True
x.any() # Only True if any elements are True
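
These two methods become more useful when combined with a condition. A small sketch, with illustrative variable names:

```python
import numpy as np

x = np.array([True, False, True, True])
all_true = x.all()     # False, because one element is False
any_true = x.any()     # True

values = np.arange(10)
all_non_negative = (values >= 0).all()   # True
any_above_eight = (values > 8).any()     # True, because of the 9
```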

Finally, there is a repeat() method that repeats each element of the array n times:

x = np.array([1, 2])
x.repeat(3) # array([1, 1, 1, 2, 2, 2])
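
If you want the whole array repeated rather than each element, np.tile() is the function to reach for. It is not covered above, so take this as a side note:

```python
import numpy as np

x = np.array([1, 2])

repeated = x.repeat(3)    # array([1, 1, 1, 2, 2, 2])
tiled = np.tile(x, 3)     # array([1, 2, 1, 2, 1, 2])
```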

That's all for today, folks.
Check the following tutorial for more information.

26 September 2013

Middle East Relationships Infographics

Radio Free Europe/Radio Liberty (RFE/RL) published an infographic summarizing the political relationships between Middle Eastern countries. It came after a similar graph made by the Egyptian blogger The Big Pharaoh (@TheBigPharaoh) and published in the Washington Post.

I'd like to discuss the two infographics from a design point of view here. So, let me start with the one made by RFE/RL.

The main point of the graph is to show the relationships between those countries, i.e. friends and foes. However, as you can see, it is not possible to tell this at first glance. All the dots look the same: black and of equal size. Well, maybe they are inviting us to interact with those dots by clicking on them.

So, there are messages hidden behind the dots, but this is just text. Hmmm, couldn't those same messages be written in an article then, or in a table? What is the use of the graph then?

Why is the United States on one axis but not the other? The same goes for Iraq. Also, relationships are supposed to be symmetrical, yet the chart isn't. You can trace the line from Iran to Israel, but there is no line from Israel to Iran. I know, that would be a sort of redundancy; still, either the graph should be redesigned or it stays confusing as it is.

Here comes The Big Pharaoh's graph then.

This time, the relationships are supposed to be clear at first glance. Different line colours reflect different relationships. In this graph, countries are represented by points, while the relationships between them are represented by lines. In the RFE/RL graph it was the other way round: entities are laid out as a matrix, where dots represent the relationships and the countries are represented by the horizontal and vertical lines.

In his paper The Matrix Theory of Graphics, Jacques Bertin explains that the network representation (e.g. The Big Pharaoh's graph) is more useful for showing the topographical structure of the elements and how each pair of them is connected on a micro level. The matrix representation, on the other hand, is more flexible: its elements can be reordered to show the relationships between them on a macro level. You can cluster your elements first to show groups of allies and foes. Changing the dots' colours or sizes gives you a third dimension to work with. As you can see in the network graph above, the lines are cluttered and a bit hard to follow.
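
To make the reordering idea concrete, here is a rough sketch in NumPy. The entities A to D and their relationships are entirely made up for illustration (+1 for allies, -1 for foes); they are not taken from either infographic. The edge list is the network view, and the matrix is built from it.

```python
import numpy as np

# Hypothetical entities and relationships; not real data.
names = ["A", "B", "C", "D"]
edges = [("A", "C", 1), ("B", "D", 1), ("A", "B", -1), ("C", "D", -1)]

# Build a symmetric matrix from the edge list.
index = {name: i for i, name in enumerate(names)}
M = np.zeros((len(names), len(names)), dtype=int)
for a, b, sign in edges:
    M[index[a], index[b]] = sign
    M[index[b], index[a]] = sign

# Reorder rows and columns so that allies sit next to each other.
order = [index[n] for n in ["A", "C", "B", "D"]]
reordered = M[np.ix_(order, order)]

print(reordered)
# [[ 0  1 -1  0]
#  [ 1  0  0 -1]
#  [-1  0  0  1]
#  [ 0 -1  1  0]]
```

After reordering, the two pairs of allies show up as blocks along the diagonal, which is the kind of macro-level pattern the matrix view is good at.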

Can you sketch a better representation of those relationships then?
