Entries in job scheduling (3)

Friday
Nov 6, 2009

Product: Resque - GitHub's Distributed Job Queue

Queuing work for background processing is a time-tested scalability strategy. Queuing also happens to be one of those much-needed tools that is easy enough to forge on your own, which is why we see so many different versions of it. Resque is GitHub's take on a job queue, and they've used it to process millions and millions of jobs so far.

What is Resque?

Resque is a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs, or you can create new classes specifically to do work. Or, you can do both.
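A job is just a class that names a queue and implements a perform class method. Here's a minimal sketch along the lines of the example in the Resque README; the Archive class, the Repository model, and the job arguments are illustrative stand-ins for your own code:

    # A job class: sets the queue it runs on and does its work in
    # a class-level perform method.
    class Archive
      @queue = :file_serve

      def self.perform(repo_id, branch = 'master')
        # Runs later, in a worker process, not in the web request.
        # Repository is an assumed application model.
        repo = Repository.find(repo_id)
        repo.create_archive(branch)
      end
    end

    # Enqueue from anywhere in the app. Resque serializes the class
    # name and arguments as JSON and pushes them onto a Redis list.
    Resque.enqueue(Archive, 42, 'master')

A worker started against that queue (e.g. QUEUE=file_serve rake resque:work) pops jobs off the Redis list and calls Archive.perform with the saved arguments.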

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP, and Kestrel, but found them all wanting in one way or another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.


Tuesday
Jan 13, 2009

Product: Gearman - Open Source Message Queuing System

Update: New Gearman Server & Library in C, MySQL UDFs.

Gearman is an open source message queuing system that makes it easy to do distributed job processing using multiple languages. With Gearman you can farm out work to other machines, dispatch function calls to the machines best suited to do the work, run work in parallel, load balance lots of function calls, call functions between languages, and spread CPU usage around your network. Gearman is used by companies like LiveJournal, Yahoo!, and Digg. Digg, for example, runs 300,000 jobs a day through Gearman without any issues. Most large sites use something similar.

Why would anyone even need a message queuing system? Message queuing is a handy way to move work off your web servers (like image manipulation), to generate thousands of documents in the background, to run in parallel the multiple requests needed to build a web page, or to perform tasks that can comfortably run in the background rather than as part of the main loop for servicing a web request.

There's a gearmand server, plus clients and workers written in Perl, Ruby, Python, or C. Use at least two gearmand server daemons for higher availability. The functions each worker can perform are registered with gearmand, which distributes requests for those functions to the workers that implement them. Gearman uses a very robust, if somewhat higher latency, signal-and-pull architecture.

  • According to dormando, the flow goes like this: the worker connects to all gearmand servers, registers which functions it supports, and asks for jobs. If there are no jobs, it sends the 'pre_sleep' command to all gearmands and sleeps.
  • The client connects to a gearmand and submits a job for a particular function.
  • Gearmand acks the job, finds all sleeping workers registered for that function, and sends them all a 'noop' command to wake them up.
  • The worker wakes up ("urk, I'm awake now") and asks for jobs. If there are jobs, it does the work; if not, it sends 'pre_sleep' to all gearmands again, and the cycle repeats.

Gearman uses an efficient binary protocol, not XML. There's also a line-based text protocol for admin, so you can telnet in and hook into Nagios plugins. The system makes no guarantees: if there's a failure, the client is told about it and is responsible for retries. And the queue isn't persistent; if gearmand is restarted, the queue is gone.
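To make the flow concrete, here's a minimal worker and client sketch using the gearman-ruby gem. The 'reverse' function and the server address are illustrative assumptions, and the exact API may differ between client library versions:

    require 'gearman'

    # Worker: registers the 'reverse' ability with gearmand, then
    # loops asking for jobs (issuing pre_sleep when there are none).
    worker = Gearman::Worker.new(['localhost:4730'])
    worker.add_ability('reverse') do |data, job|
      data.reverse  # the actual work; the return value is the result
    end
    loop { worker.work }

And the client side, typically a separate process:

    require 'gearman'

    # Client: submits a 'reverse' job; gearmand routes it to a
    # worker that registered that function and returns the result.
    client = Gearman::Client.new(['localhost:4730'])
    taskset = Gearman::TaskSet.new(client)
    task = Gearman::Task.new('reverse', 'Hello Gearman')
    task.on_complete { |result| puts result }  # => "namraeG olleH"
    taskset.add_task(task)
    taskset.wait(10)  # give up after 10 seconds

Note the failure semantics described above: if the job fails or gearmand restarts, it's the client's responsibility to notice and resubmit.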

    Related Articles

  • Gearman Wiki
  • Gearman Google Groups
  • Queue everything and delight everyone by Leslie Michael Orchard.
  • USENIX 2007. Starts at slide 83.
  • PEAR and Gearman by Daniel O'Connor.
  • Amazon Architecture


Sunday
May 25, 2008

Product: Condor - Compute Intensive Workload Management

From their website: Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press being detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate the job to a different machine that would otherwise be idle.

Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.
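To make the job queueing mechanism concrete, here's a minimal sketch of a Condor submit description file for a serial job; the executable and file names are illustrative, and the exact options vary by Condor version:

    # Hypothetical submit description file; run: condor_submit analyze.sub
    universe    = vanilla
    executable  = analyze
    arguments   = input.dat
    output      = analyze.out
    error       = analyze.err
    log         = analyze.log
    # No shared file system needed: ask Condor to ship files for us.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    queue

The vanilla universe runs an unmodified binary; the standard universe is the one that enables the transparent checkpointing and migration described above, at the cost of relinking the job against Condor's libraries.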

    Related Articles

  • High Throughput Computing by Miron Livny
  • Condor Presentations
  • (my) Principles of Distributed Computing by Miron Livny
