<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Statistically incorrect &#187; design</title>
	<atom:link href="http://anomalizer.net/statistically-incorrect/tag/design/feed/" rel="self" type="application/rss+xml" />
	<link>http://anomalizer.net/statistically-incorrect</link>
	<description>Statistically incorrect</description>
	<lastBuildDate>Sun, 18 Dec 2011 17:26:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>In search of a job queue</title>
		<link>http://anomalizer.net/statistically-incorrect/2010/12/in-search-of-a-job-queue/</link>
		<comments>http://anomalizer.net/statistically-incorrect/2010/12/in-search-of-a-job-queue/#comments</comments>
		<pubDate>Mon, 20 Dec 2010 16:40:13 +0000</pubDate>
		<dc:creator>Arvind Jayaprakash</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[job queue]]></category>

		<guid isPermaLink="false">http://anomalizer.net/statistically-incorrect/?p=59</guid>
		<description><![CDATA[Job queues are an essential element in internet application design to execute long running tasks. This stems from the fact that web servers and consequently web applications are best suited for interactive applications. If a particular operation that needs to be carried out fits the description below, then it makes a good case for the [...]]]></description>
			<content:encoded><![CDATA[<p>Job queues are an essential element of internet application design for executing long-running tasks. This stems from the fact that web servers, and consequently web applications, are best suited to interactive work. If an operation fits the description below, it makes a good case for using a job queue:</p>
<ul>
<li>For the most part, these tasks need to be carried out in near-instantaneous fashion. Specifically, execution of the task should commence at the earliest possible time. The expected time of completion, however, depends on the nature of the task.</li>
<li>If the task cannot be executed immediately for some reason, it should be queued up and processed later.</li>
<li>Executing the task is resource intensive in some form or the other.</li>
<li>Once submitted, tasks must not be dropped, to the extent possible.</li>
</ul>
<p>The desirable features of a job queue are as follows:</p>
<ol>
<li>The job queue should have durability.</li>
<li>The queue should support multiple producers &amp; consumers. These queue operations would be performed across hosts.</li>
<li>The queue should drain as quickly as possible. If a new task gets added to the system and there are idle consumers, consumption should commence as early as possible.</li>
</ol>
<p>The first two points are well addressed by an RDBMS. However, an RDBMS struggles with the third point since it has no inherent notification mechanism; aggressive polling is the closest substitute, but it does not scale very well. A message queue is good at supporting the last two points, but maintaining credible state in one is exceptionally hard. Interestingly, the popular job queue solutions out there choose to use either an RDBMS (example: <a href="http://gearman.org/">gearman</a>) or an MQ (example: <a href="http://celeryproject.org/">celery</a>). However, a mix of both seems to be the right answer. I shall briefly describe what it looks like.</p>
<h3>Adding a new task</h3>
<ul>
<li>Add a new element to your data store. This element should represent every aspect of the task, such as the task type, the task details and also task management data such as execution status. A unique id must also be generated by the producer before adding the task to the store. Failure to make this entry is considered a failure to accept the job.</li>
<li>A notification event is sent out over a message queue. The notification contains the task type and task id.</li>
</ul>
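<p>The two steps above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: SQLite stands in for the data store, an in-process <code>queue.Queue</code> stands in for the message queue, and the table layout, the <code>submit_task</code> function and the <code>resize_image</code> task type are all made up for the example.</p>

```python
import json
import queue
import sqlite3
import uuid

# Stand-ins (assumptions): SQLite plays the durable store, queue.Queue plays the MQ.
store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE tasks ("
    " id TEXT PRIMARY KEY,"
    " type TEXT NOT NULL,"
    " details TEXT NOT NULL,"
    " status TEXT NOT NULL)"
)
notifications = queue.Queue()


def submit_task(task_type, details):
    """Record the task durably, then announce it. Returns the task id."""
    task_id = str(uuid.uuid4())  # the id is generated by the producer
    # Step 1: the store entry. If this raises, the job was not accepted.
    with store:
        store.execute(
            "INSERT INTO tasks (id, type, details, status) VALUES (?, ?, ?, ?)",
            (task_id, task_type, json.dumps(details), "pending"),
        )
    # Step 2: the notification carries only the task type and the task id.
    notifications.put({"type": task_type, "id": task_id})
    return task_id


task_id = submit_task("resize_image", {"path": "/tmp/example.png"})
```

<p>Writing to the store first means that a lost notification delays the task but never loses it.</p>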
<h3>Processing a task, the normal case</h3>
<ul>
<li>A pool of consumers is actively waiting for notification of a new task and starts working the moment it gets a notification. The delivery mechanism of the notification can be configured to either exactly one or at least one consumer based on what looks like the right trade-off.</li>
<li>The consumer checks with the data store and updates it to indicate that it has volunteered to perform the task.</li>
<li>When it is done processing the task (either successfully or unsuccessfully), it updates the store with the outcome.</li>
</ul>
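<p>One way to let a consumer volunteer safely, even when the same notification reaches more than one consumer, is a conditional update on the status column. The sketch below assumes an SQLite store with a hypothetical <code>tasks</code> table; <code>try_claim</code> and <code>handle_notification</code> are illustrative names, not part of any existing library.</p>

```python
import sqlite3

store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, type TEXT, status TEXT, owner TEXT)")
store.execute("INSERT INTO tasks VALUES ('t1', 'resize_image', 'pending', NULL)")
store.commit()


def try_claim(task_id, consumer):
    """Atomically move a task from pending to running; True if this consumer won."""
    with store:
        cur = store.execute(
            "UPDATE tasks SET status = 'running', owner = ? "
            "WHERE id = ? AND status = 'pending'",
            (consumer, task_id),
        )
    return cur.rowcount == 1


def handle_notification(message, consumer, work):
    """Claim the task named in the notification, run it, record the outcome."""
    if not try_claim(message["id"], consumer):
        return False  # another consumer volunteered first
    try:
        work(message)
        outcome = "succeeded"
    except Exception:
        outcome = "failed"
    with store:
        store.execute("UPDATE tasks SET status = ? WHERE id = ?", (outcome, message["id"]))
    return True


msg = {"type": "resize_image", "id": "t1"}
first = handle_notification(msg, "consumer-a", lambda m: None)
second = handle_notification(msg, "consumer-b", lambda m: None)  # duplicate delivery
```

<p>With this check in place, at-least-once delivery becomes tolerable: the second consumer simply finds nothing left to claim.</p>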
<h3>Processing a task, the abnormal cases</h3>
<ul>
<li> The notification message could have gotten lost and not reached any consumer for a variety of reasons. It is necessary to <em>sweep</em> the job queue periodically for any unprocessed tasks and trigger their execution. The latencies associated with this are comparable to those of a pure RDBMS-based queue. Specifically, the need to scan by the value of a field (task status), in addition to the normal access pattern based on id, is what makes an RDBMS a convenient choice.</li>
<li>Partially completed and failed tasks may have to be retried, depending upon the semantics of the task at hand. This might require a back-off mechanism, which will effectively need a scheduler. In such situations, the scheduler needs to be kept outside the job queue to achieve a clear separation of responsibilities.</li>
</ul>
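<p>A sweep and a back-off policy could look something like the sketch below. It assumes a hypothetical <code>tasks</code> table with a <code>created_at</code> timestamp in SQLite; the grace period, the exponential back-off formula and all function names are choices made for the example and not something prescribed above.</p>

```python
import sqlite3
import time

store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, type TEXT, status TEXT, created_at REAL)"
)
# An index on status supports the scan-by-status access pattern.
store.execute("CREATE INDEX tasks_by_status ON tasks (status, created_at)")
now = time.time()
store.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        ("t1", "resize_image", "pending", now - 120),  # its notification was lost
        ("t2", "resize_image", "pending", now - 1),    # too recent to worry about
        ("t3", "send_email", "succeeded", now - 300),
    ],
)
store.commit()


def sweep(grace_seconds, renotify):
    """Re-announce tasks that have sat in 'pending' longer than grace_seconds."""
    cutoff = time.time() - grace_seconds
    rows = store.execute(
        "SELECT id, type FROM tasks "
        "WHERE status = 'pending' AND ? >= created_at ORDER BY created_at",
        (cutoff,),
    ).fetchall()
    for task_id, task_type in rows:
        renotify({"type": task_type, "id": task_id})
    return [task_id for task_id, _ in rows]


def backoff_delay(attempt, base=1.0, cap=60.0):
    """Exponential back-off: base, 2*base, 4*base, ... capped at cap seconds."""
    return min(cap, base * (2 ** attempt))


resent = []
swept = sweep(30, resent.append)
```

<p>Here <code>backoff_delay</code> only computes how long to wait; actually waiting and re-submitting is the job of the external scheduler.</p>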
<p>So far, I have not been able to find any open source solution that seems to follow the above approach. If you know of any, do let me know. Otherwise, I will get down to implementing one.</p>
]]></content:encoded>
			<wfw:commentRss>http://anomalizer.net/statistically-incorrect/2010/12/in-search-of-a-job-queue/feed/</wfw:commentRss>
		<slash:comments>137</slash:comments>
		</item>
		<item>
		<title>Why you can&#8217;t always just throw more hardware at it</title>
		<link>http://anomalizer.net/statistically-incorrect/2009/02/throwing-in-more-hardware-is-not-panacea/</link>
		<comments>http://anomalizer.net/statistically-incorrect/2009/02/throwing-in-more-hardware-is-not-panacea/#comments</comments>
		<pubDate>Fri, 06 Feb 2009 14:30:20 +0000</pubDate>
		<dc:creator>Arvind Jayaprakash</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[cs fundamentals]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://anomalizer.net/statistically-incorrect/?p=4</guid>
		<description><![CDATA[A long time ago, people used to worry about the efficiency of the software they wrote. Then came a time when processors just kept getting faster every month, and the pace wouldn&#8217;t slow down even after crossing the 500MHz mark. Somewhere around this time, people started writing exceptionally bloated software and the bloat started to [...]]]></description>
			<content:encoded><![CDATA[<p>A long time ago, people used to worry about the efficiency of the software they wrote. Then came a time when processors just kept getting faster every month, and the pace wouldn&#8217;t slow down even after crossing the 500MHz mark. Somewhere around this time, people started writing exceptionally bloated software and the bloat started to grow at a phenomenal pace. Then came the new catchphrase <q>hardware is cheap, we can throw more hardware at it</q>. And in one magic swoop, all bloatware became perfectly acceptable since the bloat now seemed to be affordable. And this was precisely the point at which most people forgot their CS fundamentals. If you have done a course on CPU scheduling, you would know these metrics:</p>
<ol>
<li>CPU utilisation</li>
<li>Throughput</li>
<li>Turnaround time</li>
<li>Waiting time</li>
<li>Response time</li>
</ol>
<p>I will take up the web application space as an example in the remainder of the discussion since it has a fairly large development community and also because it is littered with the <q>bloatware + hardware is cheap</q> mentality. In web applications, the consumer is usually worried about response times and turnaround times. Let us say there is a solution <em>A</em> that takes a full second for the server to process a single web request and a solution <em>B</em> that takes 50 milliseconds to process a single web request. A very misplaced number that people chase is <em>requests/second</em>, and this is solved using the now infamous <strong>throw more hardware</strong> approach. A focus on throughput works in businesses where your consumers have nowhere else to go and your notion of growing the business is increasing volumes. You don&#8217;t hear of people switching banks because of how fast (or slow) their websites load, and the reason is that the main product offering is a banking service and not a website, i.e. you would worry more about interest rates than website response times. Businesses whose primary offering is the website itself cannot take such liberties.</p>
<h3>Turnaround time</h3>
<p>Turnaround time is the total time taken to service a request. So, if you have a slow-running web page, you can keep adding more hardware to take on more volume (assuming the solution can be scaled out infinitely), but the experience of each individual user is not going to improve. Also, real-world experience suggests that, left to itself, a system starts to slow down as you scale out. A knee-jerk fix is to do things in parallel and use <em>threads</em>. That also usually doesn&#8217;t get you too far, thanks to what a certain <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl had to say</a>. This is where all those classes on algorithms and architecture, and abstinence from bloatware, begin to make some difference.</p>
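<p>To put rough numbers on Amdahl&#8217;s law: if a fraction <em>p</em> of the work can be parallelised across <em>n</em> workers, the overall speedup is 1 / ((1 - p) + p / n). The snippet below is just that formula; the 90% figure is a made-up example, not a measurement.</p>

```python
def amdahl_speedup(parallel_fraction, workers):
    """Overall speedup when only parallel_fraction of the work is spread over workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction):
    """Limit of the speedup as the number of workers grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)


# Hypothetical request: 90% of the work parallelises, 10% is inherently serial.
for workers in (2, 4, 16, 256):
    print(workers, round(amdahl_speedup(0.9, workers), 2))
print("ceiling", round(speedup_ceiling(0.9), 2))
```

<p>With 90% of the work parallelisable, 16 workers give a speedup of only about 6.4, and no number of workers can push it past 10.</p>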
<h3>Response time</h3>
<p>Response time is what is usually called <a href="http://blog.browsermob.com/2009/04/understanding-time-to-first-byte/">time to first byte</a> in the internet world. In trying to solve the turnaround time problem, one of the speedup areas that people work on is minimizing the context switches from user space to kernel space. <a href="http://www.ibm.com/developerworks/library/j-zerocopy/index.html">Zero copy</a> is an example of one such optimization. The most common example, however, happens to be buffered files (or streams if you are from the Java world). Some people (and their software creations) take this to the extreme and try to send out the entire HTTP response in one shot, hoping to minimize the number of system calls needed to get the job done. It turns out that this makes for a worse user experience. Put another way, it is better to start sending something to the user after 200 milliseconds (ms) and finish in the next 4 seconds than to start sending something 2 seconds after the request was issued and finish in the next 500 ms. In fact, this is a harder problem to solve, for two reasons:</p>
<ul>
<li> Left to themselves, most web servers aren&#8217;t eager to push out smaller chunks of data (the easier problem to solve)</li>
<li>Dynamic pages, especially the ones generated by MVC frameworks, do not make the response available to the web server until they have fully constructed the response body. Some of these solutions offer no straightforward way to push out data in parts, while others have explicit mechanisms for achieving this effect.</li>
</ul>
<p>For those of you who are still wondering why something that puts extra load on the server <strong>and</strong> takes longer to finish is considered better by the user, there are two reasons:</p>
<ul>
<li> <strong>Psychological</strong>: Giving the user an early indication of some progress creates some incentive for the user to wait, as opposed to sending no information. Even getting the status bar to say <q><em>receiving from &#8230;</em></q> as opposed to <q><em>sending request to &#8230;</em></q> makes a difference.</li>
<li><strong>Pipeline effect</strong>: An average web page has references to various resources (images, external css files, etc.) that are needed to completely render a page. It turns out that most browsers can initiate the retrieval of those resources before the page loads completely. Pushing out a partial response early on gives the browser a chance to get started on those other resources sooner. So while the additional flushes done on the server side might have slowed down the turnaround time for basic page transmission, the overall turnaround time as seen by the user can still drop with this technique.</li>
</ul>
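<p>As a rough illustration of pushing out the top of a page before the slow work is done, here is a WSGI application written as a generator. Everything in it is hypothetical: the chunk contents are plain-text placeholders for real markup, <code>slow_body</code> stands in for expensive page generation, and the <code>events</code> list exists only to show the order in which things happen. Whether the early chunk really reaches the browser early still depends on the server and any proxies in between not buffering it.</p>

```python
import time

events = []  # records the order in which things happen, for illustration only

# Placeholders: in a real page these would be the opening markup with the
# stylesheet and script references, and the closing markup.
HEAD_CHUNK = b"page head with stylesheet and script references\n"
TAIL_CHUNK = b"page tail\n"


def slow_body():
    """Hypothetical stand-in for expensive page generation."""
    events.append("body work started")
    time.sleep(0.05)
    return b"page body\n"


def app(environ, start_response):
    """WSGI app that yields the page head before doing the slow work."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    events.append("head sent")
    yield HEAD_CHUNK   # the browser can start fetching referenced resources now
    yield slow_body()  # the expensive part runs only after the head is out
    yield TAIL_CHUNK


def fake_start_response(status, headers):
    """Minimal stand-in for what a WSGI server would pass in."""
    events.append("headers: " + status)


chunks = app({}, fake_start_response)
first = next(chunks)
events_after_first = list(events)
rest = list(chunks)
```

<p>The first chunk is handed over before <code>slow_body</code> has even started, which is the behaviour the pipeline effect relies on.</p>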
<h3>Throughput</h3>
<p>Since throughput signifies the total amount of work that gets done in a unit of time, it turns out that throwing more hardware at it can sort of solve this problem. As I had mentioned earlier, if your solution scales infinitely, then the hardware addition technique works. The reasons why things do not scale infinitely are:</p>
<ul>
<li>Some components end up being hard to scale infinitely, such as the top-level load balancer and the pipes that it is connected to</li>
<li>Amdahl&#8217;s law</li>
</ul>
<p>One of the most common fixes, and one that borders on superstition, is to run more threads. In a CPU-bound world, having any more threads of execution than the number of compute units slows things down. In a <a href="NUMA">NUMA</a>-based world, certain workloads can suffer even when the number of threads matches the number of compute units available. However, for workloads that are I/O bound, threads do help as long as the different threads are not contending for the same underlying I/O resource. The one exception is <a href="http://www.cs.jhu.edu/~yairamir/cs418/os8/sld022.htm">rotating storage media</a>, where the amortized performance increases as concurrent requests increase, but only up to a certain point.</p>
<p>In effect, the reasons why throughput stops increasing beyond a certain point, whether you raise the concurrency level of task execution or throw in more hardware, are very real.</p>
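<p>The favourable case for threads, I/O-bound work with no contention for a shared resource, is easy to see in a toy experiment. In the sketch below, <code>fake_io_call</code> is a hypothetical stand-in that merely sleeps, so it shows only that case and says nothing about CPU-bound work or about contended devices.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(_):
    """Hypothetical stand-in for a blocking network or disk call."""
    time.sleep(0.05)
    return 1


def run_serial(n):
    """Issue n calls one after the other; returns (calls done, elapsed seconds)."""
    start = time.perf_counter()
    results = [fake_io_call(i) for i in range(n)]
    return sum(results), time.perf_counter() - start


def run_threaded(n, threads):
    """Issue n calls from a pool of threads; returns (calls done, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fake_io_call, range(n)))
    return sum(results), time.perf_counter() - start


serial_done, serial_time = run_serial(8)
threaded_done, threaded_time = run_threaded(8, threads=8)
print(f"serial: {serial_time:.2f}s, threaded: {threaded_time:.2f}s")
```

<p>The threaded run finishes in roughly the time of a single call because the waits overlap; once the calls start competing for the same disk, connection or lock, that advantage shrinks.</p>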
<h3>Closing remarks</h3>
<p>We are now in an age where people not only believe that hardware is cheap but also believe in cloud computing that promises provisioning of infinite hardware (i.e. more than you can afford). The thing to remember is that you might extend the life of a given solution for quite some time by throwing in hardware (at a diminishing rate of return), but if you are chasing response times, you will have to constantly improve your design as opposed to relying on hardware.</p>
]]></content:encoded>
			<wfw:commentRss>http://anomalizer.net/statistically-incorrect/2009/02/throwing-in-more-hardware-is-not-panacea/feed/</wfw:commentRss>
		<slash:comments>179</slash:comments>
		</item>
	</channel>
</rss>

