Learning Python

Another learning adventure is Python! I used to think this language was slow and not suitable for “real projects”, but I decided to take a look at it this summer after reading the article Meeting the EMEA MongoDB Masters – Nicola Iarocci.

It started with Nicola saying: “I’m a passionate Python, JavaScript and C# developer”, and this led me to think… I love MongoDB, I already know C# and JavaScript, so let’s give Python a try! At the same time I was working on a REST API portal at 4ward, based on ServiceStack and C#, while Nicola is creating a great REST framework in Python: Eve.

Steps I followed:

Video on Pluralsight: As always with new “things” I started looking for a Pluralsight course and here it is: Python Fundamentals. Great training course, starting from easy topics, up to a good level of knowledge.


Books: I love reading books, so I picked up several of them to gain more knowledge. Here is the list:

  • Learning Python 5th Edition: this is the bible, 1542 pages. I use it when I want to go deep on a specific topic. Maybe it’s a bit too much.
  • Python Cookbook 3rd Edition: a cookbook with any kind of recipes. Very useful to find solutions for specific problems.
  • Python 3 Object Oriented Programming: this is quite advanced and offers common OOP techniques and design patterns applied to Python

Then there’s a list of books that shows usage of Python in specific areas like:

  • Mining the Social Web Second Edition: great book that shows you how to use IPython to get, analyze and visualize information from social media
  • Python for Data Analysis: this shows that Python can be used for data analysis with specific libraries like numpy, pandas, matplotlib and scipy
  • Learning IPython for Interactive Computing and Data Visualization: this will guide you through the realm of IPython, a highly improved Python console and web notebook that lets you write and execute Python code from your browser

  • IDE: I tried a few options here:
    • Visual Studio: I love it! It’s great, extensible and full of features. You can write Python code with it and run it inside the .NET runtime together with C# or VB.NET code! All this is possible using IronPython
    • PyCharm: from JetBrains, the creators of ReSharper, WebStorm and IntelliJ. A perfect Python IDE that helps programmers with refactoring, autocompletion, hints and a lot of productivity tips.
    • VIM: I know there is VIM… I also tried it in Visual Studio with the great VsVim extension, but it takes a lot of time to learn, and right now I want to spend that time learning other things

After this I can say that I learned Python but… I decided to do another crazy new thing and enrolled in a Python course from Rice University (Houston) through Coursera. Don’t know what Coursera is? That will be a topic for another post. The course is An Introduction to Interactive Programming in Python. After what I have done so far with Python it seems quite easy, but the idea of building some simple games to learn is fun. I’ll go on with it!

Learning Docker using Windows Azure

I decided to start my learning adventure with something useful to create a lot of things on top of it: Docker.

Docker (https://www.docker.io/) “Docker is an open-source project to easily create lightweight, portable, self-sufficient containers from any application. The same container that a developer builds and tests on a laptop can run at scale, in production, on VMs, bare metal, OpenStack clusters, public clouds (Yes Windows Azure is one of them) and more.”

I did it! I installed a virtual machine running Ubuntu 12.04 and followed the tutorial and documentation to create some containers with:

  • MongoDB
  • Redis

I played a bit with Dockerfile in order to easily automate docker container creation.

Docker is a really promising Open Source project that can lead to a new way of deploying and maintaining production software as much as development machines.

Would you like to give it a try using Azure so that you can use a lightning fast internet connection? Let’s do it.

First of all let’s create an Ubuntu VM that we’ll use for testing Docker. Connect to Azure and create a standard Ubuntu 12.04 VM (the same applies to 13.04).

As soon as the VM starts, connect to it using ssh. If you’re using Windows you can use Putty.

Now you can easily follow instructions from Docker site to install it on an Ubuntu distribution: http://docs.docker.io/en/latest/installation/ubuntulinux/

Here is a short command list:

# install the backported kernel
sudo apt-get update
sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring
# reboot
sudo reboot

# Add the Docker repository key to your local keychain
# using apt-key finger you can check the fingerprint matches 36A1 D786 9245 C895 0F96 6E92 D857 6A8B A88D 21E9
sudo sh -c "curl https://get.docker.io/gpg | apt-key add -"
# Add the Docker repository to your apt sources list.
sudo sh -c "echo deb http://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
# Update your sources
sudo apt-get update
# Install, you will see another warning that the package cannot be authenticated. Confirm install.
sudo apt-get install lxc-docker

Docker installation complete!

Now let’s check if everything is working properly

# download the base ‘ubuntu’ container and run bash inside it while setting up an interactive shell
sudo docker run -i -t ubuntu /bin/bash

You’re now running commands inside a Docker container!
# type ‘exit’ to exit

exit

From here I learnt many Docker aspects following official documentation:

  • Using Dockerfile to build images
  • Using Public Repository to get or publish a custom image
  • Configuring images with common services like MongoDB, Redis or ElasticSearch

MongoDB $exists and indexes

These days I’m playing a lot with MongoDB on Azure to store and analyze large sets of data.

Problem:

I had a collection with more than 3M records and I needed to process all documents and mark them with a status. The preexisting schema has no field available for tracking.

First try:

let’s try a quick solution that gets documents, oldest first, without the “tracked” field, using the query below:

db.requestlogentries.find({ "tracked" : { $exists : false } }).limit(50).sort({ "_id" : 1 }).explain();

At the beginning it was lightning fast because the documents retrieved were all without the “tracked” field, but after a few minutes of marking them as “tracked” : “completed”, performance dropped sharply.

Explain result below:

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 50,
        "nscannedObjects" : 801877,
        "nscanned" : 801877,
        "nscannedObjectsAllPlans" : 1603755,
        "nscannedAllPlans" : 1603755,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 113,
        "nChunkSkips" : 0,
        "millis" : 11155,
        "indexBounds" : {
        },
        "server" : "mongotest:27017"
}

The problem is that we are ordering by _id first, but the oldest documents now all contain the “tracked” field, so the BasicCursor has to scan through them before it finds matching documents. As a result, this query takes around 11 seconds to return results!
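To see why the scan count keeps growing as more documents get marked, here is a toy model in Python. It is only a simplified sketch of a scan without a usable index, not how MongoDB’s query planner actually works, and the data (1,000 documents, the oldest 800 already marked) is made up for illustration.

```python
def scan_for_untracked(docs, limit):
    """Walk documents in _id order and return (matches, scanned)."""
    matches, scanned = [], 0
    for doc in sorted(docs, key=lambda d: d["_id"]):
        scanned += 1
        if "tracked" not in doc:
            matches.append(doc)
            if len(matches) == limit:
                break
    return matches, scanned


# Hypothetical data: the oldest 800 documents are already marked.
docs = [
    {"_id": i, "tracked": "completed"} if i < 800 else {"_id": i}
    for i in range(1000)
]
matches, scanned = scan_for_untracked(docs, limit=50)
print(len(matches), scanned)  # 50 850
```

Every marked document at the front of the collection has to be visited before the first unmarked one is found, so each batch costs a little more than the previous one.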

Second try:

let’s slightly change the query and get rid of the sort by _id.
Query below:

db.requestlogentries.find({ "tracked" : { $exists : false } }).limit(50).explain();

And Explain result:

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 50,
        "nscannedObjects" : 640853,
        "nscanned" : 640853,
        "nscannedObjectsAllPlans" : 640853,
        "nscannedAllPlans" : 640853,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 29,
        "nChunkSkips" : 0,
        "millis" : 2822,        
        "indexBounds" : {

        },
        "server" : "mongotest:27017"
}

As you can see, a BasicCursor is now used directly without ordering, and the query takes around 3 seconds. Still slow, but much better than the first solution.
If the average document size is not too big, you can increase the number of documents returned without impact on the server side. Here is the same query with 250 documents returned:

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 250,
        "nscannedObjects" : 682241,
        "nscanned" : 682241,
        "nscannedObjectsAllPlans" : 682241,
        "nscannedAllPlans" : 682241,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 18,
        "nChunkSkips" : 0,
        "millis" : 2103,
        "indexBounds" : {

        },
        "server" : "mongotest:27017"
}

Third try and real solution:

let’s create an index (in the background here, to avoid locking) and see the difference. Just keep in mind that the index will cause a small performance hit on writes to this collection.

db.requestlogentries.ensureIndex( { tracked: 1}, {background: true} )

Query:

db.requestlogentries.find({ "tracked" : { $exists : false } }).limit(100).explain()

Explain:

{
        "cursor" : "BtreeCursor tracked",
        "isMultiKey" : false,
        "n" : 100,
        "nscannedObjects" : 100,
        "nscanned" : 100,
        "nscannedObjectsAllPlans" : 100,
        "nscannedAllPlans" : 100,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "tracked" : [
                        [
                                null,
                                null
                        ]
                ]
        },
        "server" : "mongotest:27017"
}

Clearly this is the way to go. Without an index you can’t survive with a lot of data. This may seem obvious, but many programmers don’t consider these aspects until they run into a performance hit.
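The mark-in-batches loop described in this post could look roughly like this in Python. This is a sketch under assumptions: it expects a collection object with the find and update_many methods of a current PyMongo driver, and the function name mark_next_batch is mine, not something from the original C# code.

```python
def mark_next_batch(collection, batch_size=100):
    """Mark up to batch_size untracked documents; return how many were marked."""
    untracked = {"tracked": {"$exists": False}}
    # Fetch only the _id of each candidate document.
    cursor = collection.find(untracked, {"_id": 1}).limit(batch_size)
    ids = [doc["_id"] for doc in cursor]
    if not ids:
        return 0
    result = collection.update_many(
        {"_id": {"$in": ids}},
        {"$set": {"tracked": "completed"}},
    )
    return result.modified_count


# Possible usage (the database name "mydb" is hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient("mongodb://mongotest:27017")["mydb"]["requestlogentries"]
#   while mark_next_batch(coll):
#       pass
```

With the index on tracked in place, each call should only touch the documents it actually marks.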

P.S.: I use the data retrieved by this query to compute some pre-aggregation collections using C# and some auto-scale Azure Worker Roles, but this is a topic for a future post. Stay tuned!

Learning never ends

If you think that learning ends with the last day of school, you’re crazy. That’s only the beginning of a new journey. You finish school with all the tools needed to start the real learning path, and you’ll never reach the end of it.

I’ve always been an avid learner in my life and I’ll never stop being one. I love studying, reading and creating new things.

I decided that it’s time to set some goals and start to learn something new each week. I love computers so many paths will be in this realm, but learning should be about anything. I like startups, entrepreneurship, running, math, statistics, writing, design, big data and machine learning.

So let’s start this adventure that will lead me outside of my comfort zone. I’ll track my discoveries in this blog.

Italy: the nation that made people dream

What happened to the nation that made people dream?

Yes, I’m talking about Italy! Its colors, its landscapes, its atmosphere, its cities full of history and the passion of its inhabitants. A magical mix that fosters happiness and creativity and that helped create the Made in Italy in the past.

What happened to this Italy? Is it lost forever?

Last year I took part in DEF2012, a wonderful event dedicated to the startup ecosystem, organized by the American Embassy. There was electricity in the air, passion, creativity and the desire to create something new! I really want to thank the American Embassy, and Ambassador Thorne in particular, for this journey into the land of dreams that are now reality. There were startuppers who came straight from America to tell us their stories, share their experiences and make us part of their adventures. It was an event for us, the people who live in “the nation that made people dream”, to show us that perseverance, passion and cooperation are needed to shape our dreams and turn them into reality.

What lesson did I learn? We are the makers of our own destiny!

Nothing in life is handed to us on a silver platter; nothing is owed to us. We have to work, and work hard, to achieve our results. There are no shortcuts.

“Italians are considered the most creative workers in the world”

This was said during DEF2012. We should create, innovate, share ideas and stop wasting time and energy complaining about politicians, the economic crisis, the lack of money, an uncertain future or other negative things.

This is a prayer for everyone, but especially for young people: stop complaining about reality and start dreaming again! Build your dreams, make them real in the “land of happiness” (@maxciociola’s definition of Italy), work hard to create your future, take small steps to create a new Italian ecosystem, and you will see that…

Italy will once again be the nation that makes people dream!

Italy: the country that made people dream

“What is left of the country that made people dream?”

Yes, Italy was that country! For its colors, its landscapes, its atmosphere, its ancient cities, its history and the passion of its inhabitants. A magical mix for happiness and business creativity that shaped the Made in Italy in the past. What happened to this Italy? Is it lost forever?

Last year I attended the DEF2012 event on the startup ecosystem, organized by the US Embassy. Superb event! There was electricity in the air, passion, creativity, desire to make things happen! Many thanks to the US Embassy, and in particular to Ambassador Thorne, for this journey in the land of dreams that are now reality.
Startuppers came from the USA to tell us their stories, share their experiences and make us part of their adventures. It was for us, the people living in “the country that made people dream”, to show that perseverance, passion and cooperation are needed to shape dreams and make them reality. Lesson learned? We are masters of our future. Nothing in life is served on a silver platter; you have to work, and work hard, to reach your goals. There are no shortcuts.

“Italians are considered among the most creative business people in the world”: this was said at DEF2012. We should create, innovate, share ideas and stop wasting time and energy complaining about politicians, the economic crisis, lack of money, lack of future or any other negative thing.

This is a prayer for anyone, but especially for young people: stop complaining about reality and start dreaming again! Build your dreams and make them real in the “land of happiness” (Italy as defined by @maxciociola), work hard to shape your future, take small steps to create a new Italian ecosystem and

Italy will once again become the country that makes people dream!

ServiceStack: a basic paging and sorting implementation

I’ve created a QueryBase class in order to support Paging and Sorting when needed.


public class QueryBase
{
    public string Sort { get; set; }
    public int PageNumber { get; set; }
    public int PageSize { get; set; }
}

If a class supports these features, it’ll simply extend it like this:

public class Cars: QueryBase, IReturn<CarsResponse>
{
}

public class CarsResponse : IHasResponseStatus
{
    public List<Car> Cars { get; set; }
    public ResponseStatus ResponseStatus { get; set; }
}

In order to fill QueryBase from querystring I’ve created a RequestFilterAttribute that can be used when needed:


public class QueryRequestFilterAttribute : Attribute, IHasRequestFilter
{
    #region IHasRequestFilter Members

    public IHasRequestFilter Copy()
    {
        return this;
    }

    public int Priority
    {
        get { return -100; }
    }

    public void RequestFilter(IHttpRequest req, IHttpResponse res, object requestDto)
    {
        var request = requestDto as QueryBase;
        if (request == null) { return; }
        request.PageNumber = req.QueryString["pageNumber"].IsEmpty() ? 1 : int.Parse(req.QueryString["pageNumber"]);
        request.PageSize = req.QueryString["pageSize"].IsEmpty() ? 15 : int.Parse(req.QueryString["pageSize"]);
        request.Sort = req.QueryString["sort"].IsNullOrEmpty() ? "id" : req.QueryString["sort"];
    }

    #endregion
}
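The defaulting logic of the filter above is not specific to ServiceStack. As a language-neutral illustration, here is a minimal Python sketch that reads the same three parameters from a raw query string and applies the same defaults (page 1, 15 items, sort by "id"). The function name read_paging is hypothetical.

```python
from urllib.parse import parse_qs


def read_paging(query_string):
    """Return (page_number, page_size, sort), falling back to the defaults."""
    # parse_qs drops parameters with blank values by default.
    params = parse_qs(query_string)

    def first(name, default):
        values = params.get(name)
        return values[0] if values else default

    page_number = int(first("pageNumber", 1))
    page_size = int(first("pageSize", 15))
    sort = first("sort", "id")
    return page_number, page_size, sort


print(read_paging("pageNumber=3&sort=name"))  # (3, 15, 'name')
print(read_paging(""))                        # (1, 15, 'id')
```

Like int.Parse in the C# version, int() raises an error on non-numeric input, so a real service would probably want to turn that into a validation error.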

Everything is working properly; my only concern at the moment is finding a way to build generic validation logic on this QueryBase class. ServiceStack’s validation feature works with the subclass type without considering the base class.
I posted a question on Stack Overflow to get the best possible solution for this problem.

In the meantime, I modified QueryBaseValidator as follows:


public class QueryBaseValidator<T> : AbstractValidator<T> where T : QueryBase
{
    public QueryBaseValidator()
    {
        // PageSize must be between 1 and 100
        RuleFor(query => query.PageSize).LessThanOrEqualTo(100).GreaterThan(0);
    }
}

An additional validator is created for the Cars subclass:

public class CarsValidator : QueryBaseValidator<Cars>
{
}

In this way everything works, and I now have a basic implementation of generic paging and sorting (and, very soon, basic querying) with ServiceStack.
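For readers who want to see what the paging and sorting parameters do once they reach the data, here is a small Python sketch applied to an in-memory list. It is an illustration only: the sample cars are invented, the page-size rule mirrors the validator above, and the page-number check is an extra assumption of mine.

```python
def paginate(items, sort="id", page_number=1, page_size=15):
    """Sort a list of dicts by one key and return the requested page."""
    if not 0 < page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    if page_number < 1:
        # Not part of the original validator; added for safety.
        raise ValueError("page_number must be 1 or greater")
    ordered = sorted(items, key=lambda item: item[sort])
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]


# Made-up sample data.
cars = [
    {"id": 3, "name": "Panda"},
    {"id": 1, "name": "Giulia"},
    {"id": 2, "name": "Delta"},
]
print(paginate(cars, sort="name", page_number=2, page_size=2))
# [{'id': 3, 'name': 'Panda'}]
```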

Upgrade a standalone MongoDB 2.2 to 2.4 on Linux CentOS in Azure

Here is the procedure I followed to update my MongoDB 2.2 instance on CentOS Linux in Azure:

    1. Connect to your Linux machine through ssh
    2. First of all, a backup of everything is needed. Go to a folder where you have write permissions and run:
      mongodump
    3. Check that a mongo update is available by running:
      yum list updates
    4. You should get an output similar to the following:
      Loaded plugins: security 
      Updated Packages 
      mongo-10gen.x86_64                      2.4.0-mongodb_1                10gen 
      mongo-10gen-server.x86_64               2.4.0-mongodb_1                10gen
    5. Let’s start upgrading with:
      sudo yum update
    6. Now let’s restart the mongod service
      sudo service mongod restart
    7. That’s all! Really!
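The check in steps 3 and 4 can also be scripted. As a rough sketch, this Python function picks the mongo packages out of the text printed by yum list updates. It assumes the three-column layout shown above; the function name is hypothetical.

```python
def pending_mongo_updates(yum_output):
    """Return {package: version} for mongo packages in `yum list updates` output."""
    updates = {}
    for line in yum_output.splitlines():
        fields = line.split()
        # Package rows have three columns: name.arch, version, repository.
        if len(fields) == 3 and fields[0].startswith("mongo"):
            updates[fields[0]] = fields[1]
    return updates


sample = """Loaded plugins: security
Updated Packages
mongo-10gen.x86_64                      2.4.0-mongodb_1                10gen
mongo-10gen-server.x86_64               2.4.0-mongodb_1                10gen"""
print(pending_mongo_updates(sample))
```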

Running Redis 2.6 on a CentOS Linux in Azure – Part 2: Redis Installation

If you followed part 1, now you have a CentOS Linux up and running in Azure and you’re ready to start Redis installation.

Connect to https://manage.windowsazure.com and, from the Virtual Machines section, verify that your Redis test server is running.

Now let’s connect to your server. I will do everything from a macOS terminal, but you can use Putty on Windows as described here: http://www.windowsazure.com/en-us/manage/linux/how-to-guides/ssh-into-linux/

The SSH port has been mapped automatically for you during the provisioning process. Open the details of the machine and select Endpoints from the upper menu.

Take note of the public port to be used for the connection.

From a terminal you can connect using the following command, replacing user, server, port and key filename as needed:
ssh redistest@redistest1.cloudapp.net -p 60659 -i ivanPrivateKey.key

You’re now connected to your Linux machine in Azure!

Now let’s install the Development tools so that we can compile everything, and here we run into a problem.

The normal way to do this is to run the following command from the terminal:
sudo yum group install "Development tools"

If you try, you’ll receive errors related to missing kernel headers…

Let’s fix them:

  1. From the terminal run: sudo vi /etc/yum.conf
  2. Then comment out the kernel exclusion line: #exclude=kernel*
  3. Now try to run sudo yum group install "Development tools" again. You may run into another issue: Error Downloading Packages kernel-headers-2.6.32-279.22.1.el6.x86_64: …
  4. Here the problem is that the running kernel is Linux version 2.6.32-279.14.1.el6.openlogic.x86_64 (use the command dmesg | grep "Linux version" to verify), while the packages yum looks for are more recent (279.22).
  5. Let’s fix it. From the terminal run:
    1. sudo yum install kernel-headers-2.6.32-279.14.1.el6.openlogic.x86_64
    2. sudo yum install kernel-devel-2.6.32-279.14.1.el6.openlogic.x86_64
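The package names in the last step are simply the running kernel release with a prefix. As a small sketch, this Python function derives them from the "Linux version" line, so the same fix can be applied on a machine with a different kernel. The function name is hypothetical.

```python
import re


def matching_kernel_packages(version_line):
    """Build the kernel-headers and kernel-devel package names for a kernel."""
    match = re.search(r"Linux version (\S+)", version_line)
    if match is None:
        raise ValueError("no kernel version found in: " + version_line)
    release = match.group(1)
    return ["kernel-headers-" + release, "kernel-devel-" + release]


line = "Linux version 2.6.32-279.14.1.el6.openlogic.x86_64"
for package in matching_kernel_packages(line):
    print("sudo yum install " + package)
```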

Now try again from the terminal: sudo yum group install "Development tools"

You did it! We are now ready for Redis!

From a browser, connect to http://redis.io/download and copy the link to the latest version. At the time of this post it is: http://redis.googlecode.com/files/redis-2.6.10.tar.gz

Connect to your remote server using SSH as described above. From a terminal you can connect using the following command, replacing user, server, port and key filename as needed:
ssh redistest@redistest1.cloudapp.net -p 60659 -i ivanPrivateKey.key

From terminal run:

# link copied from the download page in the previous step
wget http://redis.googlecode.com/files/redis-2.6.10.tar.gz
tar xvf redis-2.6.10.tar.gz
cd redis-2.6.10
make

Redis is now built and ready to run from the src folder. In production you should run make install and then prepare a redis.conf file starting from the sample located in the redis-2.6.10 folder.

From terminal start redis:
cd src
./redis-server

In order to use it remotely (for testing purpose) we have to open ports from Azure portal

Connect to https://manage.windowsazure.com

Select Virtual Machines on the left and your redis server on the right

Select Endpoints from upper bar and click Add Endpoint in the bottom bar

Press next to Add Endpoint

Configure name and port 6379 and Confirm

Wait for the endpoint configuration to be completed

We are now ready to test everything using a local Redis installation:

  • On Linux/Mac: follow the build steps above to get a local Redis 2.6 installation
  • On Windows: you can download the unofficial Windows Redis binary package on GitHub

You can now use the following command to test your installation:
redis-benchmark -h redistest1.cloudapp.net
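If you don’t have a local Redis build at hand, a quick connectivity check is also possible with nothing but a socket. The sketch below sends a PING using the Redis wire protocol and checks for +PONG. It is a minimal illustration, assuming the default port 6379 opened above and a server without a password; the function names are mine.

```python
import socket


def encode_command(*parts):
    """Encode a command in the Redis protocol's multi-bulk request format."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def ping(host, port=6379, timeout=5.0):
    """Send PING and return True if the server answers +PONG."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(encode_command("PING"))
        return conn.recv(64).startswith(b"+PONG")


print(encode_command("PING"))  # b'*1\r\n$4\r\nPING\r\n'
```

Calling ping("redistest1.cloudapp.net") should return True once the endpoint is configured and redis-server is running.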

Congrats! Your Redis 2.6 is up and running on CentOS Linux in Azure!

Running Redis 2.6 on a CentOS Linux in Azure IaaS – Part 1: Server Provisioning

After several attempts I found a good way to install Redis 2.6 on Azure. I started by following a post from Thomas Conté, but some steps did not work as expected.

In this first part of the blog post we’ll see how to configure a CentOS 6.3 Linux VM in Azure so that it is ready for the Redis installation (Part 2).

Note: I did it from macOS, so don’t worry: Azure is really multi-platform on both the server and the client side.

Before starting you need an Azure subscription with the Virtual Machines & Virtual Networks preview activated (see the step-by-step below).

This section describes how to activate the preview feature needed to use IaaS on Azure. Skip it if not needed.

  • Sign up on a subscription (note: you have to activate preview features in each subscription where you need them)

  • Confirm and wait for process completion
  • You’re now ready to start! Click on Portal

To start the Linux installation process, connect to https://manage.windowsazure.com and follow the instructions below:

  • Click on Virtual Machines on the left side

  • Click on Create a Virtual Machine
  • Click on From Gallery

  • Select CentOS 6.3 and Next

  • In order to create SSH keys follow this link
  • Select DNS name, storage account and Region (or affinity group) and click next

  • In the last screen leave None for Availability Sets (not needed in this sample) and confirm

OK, time for a break! Go grab a coffee; in the meantime your virtual machine will be provisioned and you’ll be ready for part 2.

You are ready for Redis installation – Part 2!

Azure, ServiceStack, API, MongoDB, Big Data and Machine Learning