Are you looking for my non-technical blog?

This is now my technical-only blog; my non-technical blog is here.
Showing posts with label Data Science. Show all posts

09 February 2014

Introduction to Data Science

The AskDeveloper group organized this hangout about Data Mining in general and Machine Learning in particular. The video is in Arabic, but the slides are in English.

A casual chat about data mining, data analysis and machine learning. The video is in Arabic.

Click on the image to watch the recording of the episode.

30 September 2013

A Quick Intro to NumPy

For me, NumPy is a Python list on steroids. You can use it to create multidimensional arrays and matrices.

The convention is to import it as follows:

import numpy as np

To create a list of the numbers from 0 to 9, you could use the following command:

x = list(range(10))

To convert that list into a NumPy array, you can write:

x = np.array(range(10))

And to make your life easier, there is a shorthand for the above command:

x = np.arange(10)
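Like range(), np.arange() excludes its stop value and also accepts a start and a step. The sketch below is an illustration rather than part of the original post; the variable names are arbitrary.

```python
import numpy as np

# All three hold the integers 0..9
as_list = list(range(10))
from_list = np.array(as_list)
from_arange = np.arange(10)
print((from_list == from_arange).all())  # True

# arange(start, stop, step): like range(), the stop value is excluded
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# Unlike range(), arange() also accepts a float step
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```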

So far, we have been creating one-dimensional arrays. However, arrays can also be reshaped. When applied to an array, the reshape() method returns a reshaped version of it without changing the original object. To reshape the original object itself, use resize() instead.

y = x.reshape(2,5)

The above command creates a 2-dimensional array with 2 rows and 5 columns. You can create arrays with as many dimensions as you want; the command below creates a 3*4*5 array.

y = np.arange(3*4*5).reshape(3,4,5)
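To check that reshape() leaves the original array's shape untouched, you can inspect the shape attribute before and after. The sketch also passes -1 for one dimension, which asks NumPy to infer that dimension from the array's size; this is a standard NumPy feature, though the post itself does not mention it.

```python
import numpy as np

x = np.arange(10)
y = x.reshape(2, 5)

# reshape() returns a new array object; x keeps its original shape
# (resize() would instead change x's own shape in place)
print(x.shape)  # (10,)
print(y.shape)  # (2, 5)

# One dimension may be given as -1; NumPy infers it from the total size
z = np.arange(3 * 4 * 5).reshape(3, 4, -1)
print(z.shape)  # (3, 4, 5)
print(z.ndim)   # 3
```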

The mathematical operations '+', '-', '/' and '*' are applied elementwise.

x = np.arange(10)

# To multiply each element of x by 10
y = x * 10

# To multiply each element of x by itself
y = x * x
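A small sketch of the elementwise operators, plus floor division ('//') and powers ('**'). It assumes Python 3, where '/' on integer arrays returns floats; under Python 2, which was current when this post was written, '/' on integer arrays performs integer division.

```python
import numpy as np

x = np.arange(5)            # [0 1 2 3 4]

print(x + 10)               # [10 11 12 13 14]
print(x - 1)                # [-1  0  1  2  3]
print(x * x)                # [ 0  1  4  9 16]
print((x / 2).tolist())     # [0.0, 0.5, 1.0, 1.5, 2.0] (true division)
print(x // 2)               # [0 0 1 1 2] (floor division)
print(x ** 2)               # [ 0  1  4  9 16]
```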

To do a matrix multiplication, though, use np.dot():

# Create a 3 * 5 Matrix
A = np.arange(15).reshape(3,5)

# Create a 5 * 2 Matrix
B = np.arange(10).reshape(5,2)

# Dot product gives you a 3 * 2 Matrix
y = np.dot(A, B)
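A quick way to see the difference between elementwise '*' and matrix multiplication is to compare what each does with a 3 * 5 and a 5 * 2 matrix. The sketch builds the same A and B as the matrix example in this post; it also uses the '@' operator, which assumes Python 3.5 or later (it did not exist when this post was written) and gives the same result as np.dot() for 2-D arrays.

```python
import numpy as np

A = np.arange(15).reshape(3, 5)   # 3 x 5
B = np.arange(10).reshape(5, 2)   # 5 x 2

# Matrix product: the inner dimensions (5 and 5) match, so the result is 3 x 2
C = np.dot(A, B)
print(C.shape)              # (3, 2)
print(C)
# [[ 60  70]
#  [160 195]
#  [260 320]]

# On Python 3.5+ the @ operator computes the same product for 2-D arrays
print((A @ B == C).all())   # True

# Elementwise '*' needs compatible shapes, so A * B raises a ValueError here
try:
    A * B
except ValueError as err:
    print("elementwise A * B failed:", err)
```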

Just like lists, arrays can be sliced to get parts of them.

For regular Python lists:

A = list(range(10))
A[2:5] # [2, 3, 4]

For NumPy arrays:

B = np.arange(10)
B[2:5] # array([2, 3, 4])

However, you can assign a single value to several elements of an array at once, as follows:

B[2:5] = 1337

But you cannot do the same with lists:

A[2:5] = 1337 # TypeError: can only assign an iterable
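The TypeError above appears because slice assignment on a list expects an iterable on the right-hand side, so assigning a list of three values works. Another difference worth knowing: a list slice is a copy, whereas a basic NumPy slice is a view of the original data, so changing the slice changes the array. A short sketch (variable names are illustrative):

```python
import numpy as np

A = list(range(10))
B = np.arange(10)

# Lists accept slice assignment from an iterable of values
A[2:5] = [1337] * 3
print(A[:6])          # [0, 1, 1337, 1337, 1337, 5]

# A list slice is a copy: editing it leaves A unchanged
part = A[2:5]
part[0] = -1
print(A[2])           # 1337

# A NumPy slice is a view: editing it changes B as well
view = B[2:5]
view[0] = -1
print(B[:4])          # [ 0  1 -1  3]

# Call .copy() when an independent array is needed
independent = B[2:5].copy()
independent[0] = 99
print(B[2])           # -1
```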

For statisticians, there are also the following methods:

x = np.arange(5) + 1
x.mean() # 3.0
x.max() # 5
x.min() # 1
x.std() # 1.414
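Note that std() computes the population standard deviation by default (ddof=0), which is where 1.414 (the square root of 2) comes from; passing ddof=1 gives the sample standard deviation instead. These methods also take an axis argument for multidimensional arrays. A brief sketch:

```python
import numpy as np

x = np.arange(5) + 1            # [1 2 3 4 5], squared deviations sum to 10

print(x.std())                  # population std: sqrt(10 / 5), about 1.414
print(x.std(ddof=1))            # sample std: sqrt(10 / 4), about 1.581
print(x.sum())                  # 15

m = np.arange(6).reshape(2, 3)  # [[0 1 2]
                                #  [3 4 5]]
print(m.mean(axis=0))           # column means: 1.5, 2.5, 3.5
print(m.mean(axis=1))           # row means: 1.0, 4.0
print(m.max(axis=1))            # row maxima: [2 5]
```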

You can also slice an array using a start, a stop and a step:

x = np.arange(10)
x[2:7:2] # array([2, 4, 6])

Or access specific elements, let's say the elements at indices 1, 5 and 6, by passing a list of indices:

x[[1,5,6]] # array([1, 5, 6])
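As with lists, the step can be negative and negative indices count from the end. Also note that, unlike a plain slice, indexing with a list of positions (NumPy calls this advanced or "fancy" indexing) returns a copy. A small sketch:

```python
import numpy as np

x = np.arange(10)

print(x[::-1])      # reversed: [9 8 7 6 5 4 3 2 1 0]
print(x[-3:])       # last three elements: [7 8 9]
print(x[1::3])      # every third element from index 1: [1 4 7]

# Indexing with a list of positions returns a copy
picked = x[[1, 5, 6]]
picked[0] = 100
print(x[1])         # 1, so x is unchanged
```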

Similar to reshape() and resize(), ravel() converts a multidimensional array into a one-dimensional array, while transpose() turns rows into columns and vice versa.
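A short sketch of ravel() and transpose() on a 2 * 3 array; .T is the attribute shorthand for transpose().

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
# [[0 1 2]
#  [3 4 5]]

flat = m.ravel()
print(flat)            # [0 1 2 3 4 5]

t = m.transpose()      # same as m.T
print(t.shape)         # (3, 2)
print(t)
# [[0 3]
#  [1 4]
#  [2 5]]
```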

If you program in R, you will not miss R's way of selecting the elements of an array that meet a certain condition, because NumPy supports it too:

x = np.arange(10)
x[x>4] # array([5, 6, 7, 8, 9])
x[x%2 == 1] # array([1, 3, 5, 7, 9])
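To combine several conditions, use the elementwise operators & (and), | (or) and ~ (not), with each condition in parentheses; Python's and/or keywords do not work elementwise on arrays and raise an error instead. A Boolean condition can also appear on the left-hand side of an assignment. A sketch:

```python
import numpy as np

x = np.arange(10)

# Parentheses are required: & and | bind more tightly than > and <
print(x[(x > 2) & (x < 6)])    # [3 4 5]
print(x[(x < 2) | (x > 7)])    # [0 1 8 9]
print(x[~(x % 2 == 0)])        # [1 3 5 7 9]

# Replace every element greater than 4 with 0, on a copy of x
y = x.copy()
y[y > 4] = 0
print(y)                       # [0 1 2 3 4 0 0 0 0 0]
```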

If you have an array of Boolean values (True or False):

x = np.array([True, False, True, True])

x.all() # False: True only if all elements are True
x.any() # True: True if at least one element is True
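In practice, all() and any() are most often applied to the Boolean array produced by a comparison, to check a condition over every element at once. The function forms np.all() and np.any() also accept an axis argument. A sketch:

```python
import numpy as np

x = np.arange(10)

print((x >= 0).all())    # True: every element is non-negative
print((x > 5).all())     # False: 0..5 are not greater than 5
print((x > 5).any())     # True: 6..9 are
print((x > 20).any())    # False: no element exceeds 20

# Row by row: is every element in each row True?
m = np.array([[True, False], [True, True]])
print(np.all(m, axis=1))  # [False  True]
```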

Finally, repeat() repeats each element of the array n times:

x = np.array([1, 2])
x.repeat(3) # array([1, 1, 1, 2, 2, 2])
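repeat() also accepts one count per element, and np.tile() is the related function to reach for when you want to repeat the whole array rather than each element:

```python
import numpy as np

x = np.array([1, 2])

print(x.repeat(3))         # [1 1 1 2 2 2]
print(x.repeat([2, 3]))    # [1 1 2 2 2]: 1 twice, 2 three times
print(np.tile(x, 3))       # [1 2 1 2 1 2]: the whole array three times
```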

That's all folks for today.
Check the following tutorial for more information.

08 January 2013

Time-Series Data Classification

Time-series data are everywhere. They are important in stock market analysis, economics, sales forecasting, and the study of natural phenomena such as temperature and wind speed. The growing size of such data, as well as their variable statistical nature, makes predicting and classifying them a challenging problem for data mining algorithms.

I wrote this report as part of my postgraduate degree program in data mining at the University of East Anglia. In it, I focus on time-series data classification, shedding light on the research done in this area.


26 December 2012

LinkedIn's Skills Endorsement is flawed

LinkedIn's Skills Endorsement is flawed. I love the idea, but the way it prompts people to endorse their connections is flawed. It only shows you the top endorsed skill of one of your connections and invites you to endorse it, and what happens (99% of the time) is that you just follow the suggestion rather than selecting the skills your connection really has. In other words, the rich (skills) get richer, and the poor (skills) get poorer.
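The rich-get-richer effect described above can be illustrated with a toy simulation. Everything in it is a hypothetical model, not how LinkedIn actually works: the function name, skill names and starting counts are made up, and each new endorser is assumed to accept the suggested top skill 99% of the time and otherwise pick a skill uniformly at random.

```python
import random


def simulate_endorsements(initial, rounds, follow_prob=0.99, seed=0):
    """Hypothetical toy model: each round, one connection endorses one skill.

    With probability follow_prob they endorse the currently top-endorsed
    skill (the 'suggestion'); otherwise they pick any skill at random.
    """
    rng = random.Random(seed)
    counts = dict(initial)
    skills = list(counts)
    for _ in range(rounds):
        if rng.random() < follow_prob:
            choice = max(skills, key=counts.get)  # follow the suggestion
        else:
            choice = rng.choice(skills)           # pick independently
        counts[choice] += 1
    return counts


# Hypothetical starting point: an old skill with a small head start
start = {"Network Security": 5, "Python": 3, "Machine Learning": 2}
final = simulate_endorsements(start, rounds=1000)
total = sum(final.values())
for skill, n in sorted(final.items(), key=lambda kv: -kv[1]):
    print(f"{skill}: {n} ({n / total:.0%})")
```

In this model the skill with the initial head start ends up with almost all of the new endorsements; lowering follow_prob (say to 0.5) makes the final distribution noticeably less skewed.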

Another problem is that I am making a career shift at the moment, but because most of my connections so far know me either as a Network Security Engineer or as a Blogger, my top endorsements are Juniper, Social Media and Network Security, followed by ones like Firewalls and Twitter. They certainly reflect my expertise, at least my past expertise, but I would love to see some balance there, with more endorsements for skills like Python, Machine Learning, Information Retrieval, Data Science, Statistics and the other skills at the tail of my list. One way is to form more connections in those new fields so that my endorsements reflect my current skills, but, back to the first problem, LinkedIn's suggestion system will keep those skills buried, and it will be harder for them to get promoted the way I want.

I hope they change this suggestion system one day, so that my profile reflects what I really am at the moment.