Taking a look at Parallelism

The other day I needed to improve the performance of an application. I spent some time reviewing the code and found a place where the application needed to loop through several hundred to several thousand records. Within the loop the program reached out to another server to get data, mapped the data to a custom object, and applied some business rules. On my test machine, it was taking 3 to 4 seconds for one loop iteration to complete. Just to process 700 records the application needed to run for over 35 minutes. This is not good. The solution that I settled on was to take advantage of the Parallelism features within the .NET Framework. The code now executes in about 10 minutes on the test machine.

What is Parallelism?

The basic idea behind parallelism is to run multiple tasks at the same time. A slightly larger definition is to partition a task into smaller chunks, execute each chunk on its own thread, and collate the results.
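To make the partition, execute, and collate steps concrete, here is a minimal sketch using the Parallel.For overload that carries a per-worker local value. The class name PartitionAndCollate and the method SumOfSquares are made up for this illustration; they are not part of the application described above.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class PartitionAndCollate
{
    // Sums the squares of the inputs. Each worker keeps a private subtotal
    // for the chunk it processes; the subtotals are merged once at the end.
    public static long SumOfSquares(int[] values)
    {
        long total = 0;

        Parallel.For(
            0, values.Length,
            () => 0L,                                          // a fresh subtotal per worker
            (i, loopState, subtotal) => subtotal + (long)values[i] * values[i],
            subtotal => Interlocked.Add(ref total, subtotal)); // collate step, thread-safe

        return total;
    }
}
```

Calling PartitionAndCollate.SumOfSquares(Enumerable.Range(1, 100).ToArray()) should give the same answer as a plain sequential loop. The per-worker subtotal keeps the threads from fighting over one shared variable on every iteration.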

What Namespace is Parallelism in?

The namespace System.Threading.Tasks provides the types for working with concurrent and asynchronous code.

The class System.Threading.Tasks.Parallel provides support for parallel loops and regions.

When should I consider using Parallelism?

The times that I have found parallelism useful are when I have to perform the same operation multiple times and that operation takes more than a minute to complete. Another case has been where I have a parent/child task. There have also been cases where long-running tasks have benefited from parallelism.

Setting up and running parallel tasks comes at a cost. The code is more complex to write, maintain, and debug. Parallelism also takes up resources on the computer. I need to make sure that there is a large enough benefit (gain in speed) to offset costs (memory and CPU time).

I need to be aware that the tasks run concurrently on several threads, so I should not expect the results to return to me in the same order as the tasks were started.
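When the order of the results matters, one option is to give every iteration its own slot in a results array so that completion order no longer matters. This is only a sketch; OrderedResults and SquareAll are hypothetical names used for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
static class OrderedResults
{
    // Squares every input in parallel. Iteration i writes only to results[i],
    // so the output order matches the input order even though the iterations
    // may finish in any order.
    public static int[] SquareAll(int[] inputs)
    {
        var results = new int[inputs.Length];

        Parallel.For(0, inputs.Length, i =>
        {
            results[i] = inputs[i] * inputs[i];
        });

        return results;
    }
}
```

Because no two iterations touch the same array element, no locking is needed here. Collecting into a shared list instead would need synchronization and would not preserve order.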

Show me an example

The key piece of code for starting tasks.

Parallel.ForEach(numList, num =>
    {
        DoSomeWork(num);
    });

The full code example

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace ParallelismExample
{
    class Program
    {
        static void Main(string[] args)
        {

            List<int> numList = Enumerable.Range(0, 100).ToList();
            Parallel.ForEach(numList, num =>
                {
                    DoSomeWork(num);
                });

            Console.WriteLine("Done");
            Console.ReadLine();
        }

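        // Simulates one unit of slow work and reports which thread ran it.
        static void DoSomeWork(int taskId)
        {
            Thread t = Thread.CurrentThread;

            Console.WriteLine("Task {0} is running on Thread ID {1}", taskId, t.ManagedThreadId);

            // Stand-in for the slow call (for example, a request to another server).
            Thread.Sleep(1000);
        }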
    }
}

Where can I find more information? (Resources)

Threading in C# by Joseph Albahari

BlackWasp’s series on Parallelism

System.Threading.Tasks Namespace – MSDN Website

Parallel Class – MSDN Website

Parallel Programming in the .NET Framework – MSDN Website

Parallel.ForEach Method – MSDN Website

Learning about CouchDB

To continue my learning about NoSQL databases, I decided to try CouchDB on an Ubuntu 12.10 system using MonoDevelop for an IDE.

Using the Synaptic Package Manager, I installed MonoDevelop. This installed version 3.03.2 of MonoDevelop and supporting libraries. I also installed nunit (2.6.0.1205) at the same time.

My next step was to install CouchDB. I did this from a terminal by issuing the following command.

apt-get install couchdb

To check that CouchDB was operational, I performed two quick tests. The first was to open a web browser and go to the URL http://localhost:5984. This returned

{"couchdb":"Welcome","version":"1.2.0"}

My next test was to open a terminal and run the command curl http://localhost:5984/. This returned the same result:

{"couchdb":"Welcome","version":"1.2.0"}

It looks like CouchDB is ready to go.

There needs to be a way to administer and configure this database. The administrative interface is accessed through a web browser. The URL is http://localhost:5984/_utils/. CouchDB calls this interface “Futon”.

More information about Futon can be found at the CouchDB Wiki.

From within Futon, I created my first database named ‘wickfirstcouchdb’. I had to read the screen carefully because there is a restriction on the database name. It must start with a lowercase letter and may contain only lowercase letters, digits, and a few special characters (_, $, (, ), +, - and /).
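The naming restriction can be checked in code before a request is ever sent. The sketch below assumes the rule as I understand it from the CouchDB documentation: the name begins with a lowercase letter, followed by lowercase letters, digits, or any of _ $ ( ) + - /. The class CouchDbNames and the method IsValidDatabaseName are hypothetical names for illustration; they are not part of CouchDB or any client library.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class CouchDbNames
{
    // Assumed rule: first character a-z, then any of a-z, 0-9, _ $ ( ) + - /
    // \z anchors at the very end so a trailing newline is not accepted.
    static readonly Regex ValidName = new Regex(@"^[a-z][a-z0-9_$()+/-]*\z");

    public static bool IsValidDatabaseName(string name)
    {
        return name != null && ValidName.IsMatch(name);
    }
}
```

With this rule, ‘wickfirstcouchdb’ passes, while a name such as ‘WickFirst’ fails because of the uppercase letters.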

In a web browser, I navigated to http://localhost:5984/wickfirstcouchdb. This returned a value of

{"db_name":"wickfirstcouchdb","doc_count":0,"doc_del_count":0,"update_seq":0,"purge_seq":0,"compact_running":false,"disk_size":79,"data_size":0,"instance_start_time":"1362266551217552","disk_format_version":6,"committed_update_seq":0}

So far everything is looking good and I am ready for the next step. I spent some time reading the CouchDB Wiki. I found the HTTP Document API to be helpful in getting my mind around how to work with CouchDB.

In MonoDevelop, I started a new C# console application and named it ‘LearningCouchDB’. In order to work with the data from CouchDB, I needed a third-party library to handle the serialization and deserialization of JSON data. For this project, I am using JSON.NET.

The whole point of this project is to learn about CouchDB. So I will not be using good programming practices.

Here is my code

using System;
using System.Net;
using System.IO;
using System.Text;
using Newtonsoft.Json;

namespace LearningCouchDB
{
	class MainClass
	{
		const string COUCHDBURL = "http://127.0.0.1:5984/";

		public static void Main (string[] args)
		{
			GetMyDatabase("_all_dbs");
			GetMyDatabase("wickfirstcouchdb");
			WriteDocument();
			GetDocument("2752d78cc3de4b87136af3850c031547");
		}

		static void GetMyDatabase(string dbName)
		{
			HttpWebRequest request = (HttpWebRequest)WebRequest.Create(COUCHDBURL + dbName);
			request.Method = "GET";

			using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
			{

				Console.WriteLine("Results Get {0}", dbName);
				using (StreamReader reader = new StreamReader( response.GetResponseStream()))
				{
					Console.WriteLine("\t{0}",reader.ReadToEnd());
				}
			}
		}

		static void WriteDocument()
		{
			SimpleDoc docData = new SimpleDoc();
			docData.SiteName = "CNN";
			docData.URL = "http://www.cnn.com";
			docData.Notes = string.Empty;

			string jsonData = JsonConvert.SerializeObject(docData);

			HttpWebRequest request = (HttpWebRequest)WebRequest.Create(COUCHDBURL + "wickfirstcouchdb");
			request.Method = "POST";

			byte[] bytes = Encoding.UTF8.GetBytes( jsonData );
			request.ContentLength = bytes.Length;
			request.ContentType = "application/json";
			using (Stream dataStream = request.GetRequestStream())
			{
				dataStream.Write(bytes, 0, bytes.Length);
			}

			using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
			{

				Console.WriteLine("Results From Write Document ");
				using (StreamReader reader = new StreamReader( response.GetResponseStream()))
				{
					Console.WriteLine("\t{0}",reader.ReadToEnd());
				}
			}
		}

		static void GetDocument(string id)
		{
			HttpWebRequest request = (HttpWebRequest)WebRequest.Create(COUCHDBURL + "wickfirstcouchdb/" + id);
			request.Method = "GET";

			using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
			{

				Console.WriteLine("Results Get Document {0}", id);
				using (StreamReader reader = new StreamReader( response.GetResponseStream()))
				{
					Console.WriteLine("\t{0}",reader.ReadToEnd());
				}
			}
		}
	}

	public class SimpleDoc
	{

		public string SiteName {
			get;
			set;
		}

		public string URL {
			get;
			set;
		}

		public string Notes {
			get;
			set;
		}

	}
}

 

 


 

Learning about NoSQL

For most of my programming life, when I needed to store data I turned to an old friend, the relational database. Over the years, I have used many different databases like MS Access, MS SQL Server, MySQL, SQLite, Oracle, and PostgreSQL. I have used some type of relational database for so long that it is fairly easy to divide the data into groups, turn the groups into tables, establish keys, create relationships, and throw in a few indexes for good measure. Using a relational database for a brand new project is usually fairly nice, because I can keep the database tables and the models fairly close. As the project matures and requirements change, the database structure and the models drift apart in design. The traditional relational database has been good to me over the years. There have been times when a relational database has not been the right choice for the project. Relational databases can be very expensive to set up, maintain, and secure. The question now has become where do I turn when old reliable is not a good fit.

One option is to use a NoSQL database. My knowledge on this topic is fairly limited. Today is the day that I am going to expand that knowledge. Common characteristics of NoSQL databases include not using a relational model, running well on clusters, being open source, and being schema-less.

When I look under the covers of a relational database, I notice that the data is stored as three parts: key, value, and a relationship. There is a limitation with this model. It does not allow for the storage of structures like a list or nested records. As demand on a relational database increases to handle more traffic or more data, there are two common approaches for dealing with this. One is to increase the size of the machine by adding more hard drives, additional memory, and more CPU. This is commonly known as scaling up. The other option is to join servers together into a cluster, so that all the machines share the work of getting and storing data. This is known as scaling out. For a relational database to operate in a cluster, it usually relies on a common shared disk subsystem.

The schema for a database is the metadata about all the parts of the database. The metadata describes things like table structures, indexes, keys, views, stored procedures, and the like. This allows the relational database engine to enforce structure. One can make a strong argument that the structure of the tables and their relationships is a form of business rules.

NoSQL databases as a general rule don’t use SQL syntax to query their data store. Many of the databases allow for a RESTful interface and/or a query API. NoSQL runs well in a cluster. There are a few options for how the database can be set up within a cluster. The simplest is to run the database on a single server. Sharding is another option. Sharding is splitting the data into different parts and putting each part on its own server. The third option is Master-Slave Replication. Data is written to the Master database and then replicated to the Slaves. Data can be read from either the Master or a Slave. This is great for read-intensive applications. Another option is to use Peer-to-Peer. In this option a write to any of the nodes is replicated to all the other databases in the cluster. This is a good configuration for applications that have a lot of writes or when the application needs to guard against a failure of any single node. The last option is to combine Sharding with Replication. This gives a lot of options for how to set up a database for your application. NoSQL databases are known for being schema-less. This means that each record does not have to have the same structure as any other record in the database.
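To make sharding a little more concrete, here is a minimal sketch of one way a client could decide which server holds a given key: hash the key and take the remainder by the number of shards. This is a simplified illustration, not how any particular NoSQL database does it. The names ShardRouter and ShardFor are hypothetical, and the FNV-1a hash is used here only because it gives the same answer on every run.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
static class ShardRouter
{
    // Maps a record key to a shard number in the range 0 .. shardCount - 1.
    public static int ShardFor(string key, int shardCount)
    {
        if (shardCount <= 0)
            throw new ArgumentOutOfRangeException("shardCount");

        // 32-bit FNV-1a hash over the UTF-8 bytes of the key.
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            unchecked { hash = (hash ^ b) * 16777619; }
        }

        return (int)(hash % (uint)shardCount);
    }
}
```

The same key always lands on the same shard, which is what lets a read find the data a write stored earlier. A weakness of this simple scheme is that changing the number of shards moves most keys to a different server.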

NoSQL databases are categorized by the way they store data. Common categories are key-value, document, and graph. Key-value databases store just two pieces of data for a record: the key (or Id) and the value, which is either a primitive data type or an object. Document databases are built around the concept of a document. Documents are self-describing hierarchical data structures. Graph databases allow entities and the relationships between entities to be stored.

Examples of Key-Value databases

When to use Key-Value databases

  • When needing to store Session data
  • When needing to store User Profiles
  • When needing to store Preferences
  • When needing to store Shopping Cart Data

When not to use Key-Value databases

  • When there is a relationship between data sets
  • When multioperation transactions are required
  • When there is a need to search by the value or data
  • When there is a need to operate upon multiple keys at the same time

Examples of Document databases

When to use Document databases

  • For event logging
  • For Content Management Systems
  • For Analytics
  • For Blogging Platforms
  • For E-Commerce Applications

When not to use Document databases

  • When there is a need for complex transactions spanning different operations
  • When needing to search against the content of the document

Examples of Graph databases

When to use Graph databases

  • When there is connected data
  • When building routing services
  • When building recommendation engines

When not to use Graph databases

  • When you want to update a subset of entities.

A quick look at Cloud OS and related terms

I have seen the words Cloud Operating System (Cloud OS), Web OS, and web desktop in blog posts and news articles recently. With the way the terms are kicked around, one could easily assume that they mean the same thing. This led me on a quest to find out a bit more. After reviewing Wikipedia for the terms, I found that there is a relationship between the terms, along with differences. ‘Cloud (operating system)’ was defined as a “browser-based operating system created by Good OS LLC.” The idea was to provide a way to perform simple tasks without a full operating system. The next term I checked was Web operating system (webOS). The first thing that jumped out at me was the warning at the top of the page cautioning the reader not to confuse Web operating system with Online OS and Web desktop. The article points to webOS as “network services for Internet scale distributed computing.” The article also points out that the term webOS is being used broadly and with many different meanings. The term Online OS is defined as a “web desktop written in JavaScript using Ajax.” “It provides basic services such as a GUI, a virtual file system, access control management and possibilities to develop and deploy applications online.” The last term I looked at was Web desktop (webtop), which Wikipedia defines as “a desktop environment embedded in a web browser or similar client application.”

Advantages of a Cloud OS

  • Mobility – Access your desktop anywhere you have connectivity
  • System Management – Software, drivers and patches can be applied to all users at the same time.
  • Collaborative – easy to share files with other people, devices and services
  • Scale – easy to add additional resources like storage

Disadvantages of a Cloud OS

  • Speed – All the resources are shared among the desktops
  • Connectivity – You have to be connected to use the desktop
  • Bandwidth – Your connection to the Internet is not always 100% reliable or available

Cloud OS / Webtop offerings

Windows Azure Media Services go GA

On the 22nd of January 2013, Microsoft announced general availability (GA) for Windows Azure Media Services.

One workflow is to upload a raw video file, encode it into a web-consumable format, make the file available for wide distribution, and finally consume the file.

Uploading the file is straightforward. This can be done over HTTP(S) using the REST API or one of the SDKs. The file is placed within Windows Azure blob storage. There is functionality to allow for bulk uploading of files and for transferring files between storage accounts.

Windows Azure Media Services supports a number of encoding formats. The specifics can be found within the document named ‘Supported File Types for Media Services.’

For delivery, Windows Azure Media Services has added a feature called ‘dynamic packaging.’ This feature allows one file to be stored and its content streamed in many adaptive protocol formats. Compared to the traditional model of encoding a file into multiple formats before distribution, it is easy to see a cost savings in storing and managing the files.

The last piece is to consume or play the file. Windows Azure Media Services provides a set of client player SDKs. This makes it possible to build rich media applications for Windows 8, iOS, Xbox, Silverlight, Windows Phone, and Android.

Pricing seems reasonable. Encoding services start at $1.99 per GB for the first 5 TB / Month. Microsoft is also offering Encoding Reserved Units. This allows you to purchase parallel processing for media tasks.


 

Videos from Windows Azure Conf 2012

Channel 9 has the videos from the Windows Azure Conference that occurred on the 14th of November. I am so looking forward to watching several of these videos.

 

A different way to view setting options in Windows 8

The other day, I came across a blog post “How To Activate Windows 8 GodMode” on the C# Corner website. The blog shows a way to create a special folder that allows the user to see various setting options in one large list. The trick is to create a new folder on the desktop and then rename it. On my machine, I set the folder up with the name “Advanced Settings.{ED7BA470-8E54-465E-825C-99712043E01C}”. I did not use the name GodMode because the name implies elevated privileges. I think the name “Advanced Settings” gives a better idea of what the folder contains.
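The same folder could also be created from code instead of by hand. The sketch below only builds the folder name from a display name plus the identifier quoted above and creates that directory on the current user's desktop. The class SettingsFolder and its methods are hypothetical names for illustration, and I am assuming that a folder created this way behaves the same as one renamed in Explorer.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
static class SettingsFolder
{
    // Identifier copied from the folder name quoted in the post.
    const string SettingsGuid = "{ED7BA470-8E54-465E-825C-99712043E01C}";

    // Builds "<display name>.<identifier>", the naming pattern the trick relies on.
    public static string BuildFolderName(string displayName)
    {
        return displayName + "." + SettingsGuid;
    }

    // Creates the folder on the current user's desktop and returns its full path.
    public static string CreateOnDesktop(string displayName)
    {
        string desktop = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
        string path = Path.Combine(desktop, BuildFolderName(displayName));
        Directory.CreateDirectory(path);
        return path;
    }
}
```

Calling SettingsFolder.CreateOnDesktop("Advanced Settings") would reproduce the folder name I used on my machine.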

Videos from Build 2012

Channel 9 has posted videos from Build 2012. The following is a list of talks involving Windows Azure.

 

 

Microsoft Announces Enhancements to Windows Azure SQL Database

There was a recent announcement (19 Sept. 2012) from Microsoft for enhancements to Windows Azure SQL Database.

Summary of Enhancements

  • Windows Azure SQL Database can be used in Linked Server and Distributed Queries
  • Recursive Triggers are now supported within Windows Azure SQL Database
  • DBCC SHOW_STATISTICS comes to Windows Azure SQL Database
  • Changes to the Windows Azure SQL Database firewall can be made at the database level and at the server level

My Notes and Thoughts

I am looking forward to using these enhancements in future projects. The support for Linked Server against Windows Azure SQL Database should be useful in gathering data for reporting and analysis. It also seems that linked servers could be very useful in creating hybrid solutions.

As a general rule, I don’t use triggers very often because it is too easy to introduce unintended side effects. I am not sure how often I will need recursive triggers, but it is nice to know the feature is there when it is needed.

DBCC SHOW_STATISTICS is one of the enhancements that I am surprised has taken so long to add. This has been a very useful feature in a normal SQL Server to assist with performance tuning.

Changing of firewall rules at the database level is another enhancement that I am not sure how often I will use but I can see where it would be very useful.

Resources

Announcing Updates to Windows Azure SQL Database

Windows Azure SQL Database Firewall

Healthcare As A Service (HaaS)

At the end of July, Microsoft announced that they are offering customers and partners a HIPAA Business Associate Agreement for Windows Azure. This makes it possible for companies that build software for the healthcare industry to take advantage of Windows Azure. One can reasonably expect that over the next few months a lot more buzz and marketing material will be generated over HaaS.

Porticor.com has a great article “Cloud Encryption and Healthcare ‘as a Service’ solutions.” The article is focused on cloud security for healthcare applications delivered as a service. The section about data ownership really caught my attention. Healthcare data is a tricky subject. As an individual, I want my healthcare information kept private between my doctor and myself. As a programmer who has spent a number of years working for health insurance companies, I know that being able to mine the data allows medical professionals to get a different look at how an individual’s health compares to others with similar conditions and to the general population. The data could allow medical professionals to identify trends much earlier.