Entries in job scheduling (3)

Friday
Nov 6, 2009

Product: Resque - GitHub's Distributed Job Queue

Queuing work for background processing is a time-tested scalability strategy. Queuing also happens to be one of those much-needed tools that is easy enough to forge on your own, which is why we see so many different versions of it. Resque is GitHub's take on a job queue, and they've used it to process millions and millions of jobs so far.

What is Resque?

Resque is a Redis-backed library for creating background jobs, placing those jobs on multiple queues, and processing them later. Background jobs can be any Ruby class or module that responds to perform. Your existing classes can easily be converted to background jobs, or you can create new classes specifically to do work. Or, you can do both.
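A job is just a class that names a queue and implements a perform class method. Here's a minimal sketch along the lines of the example in the Resque README; the Archive class, the Repository model, and the job arguments are illustrative stand-ins for your own code:

    # A job class: sets the queue it runs on and does its work in
    # a class-level perform method.
    class Archive
      @queue = :file_serve

      def self.perform(repo_id, branch = 'master')
        # Runs later, in a worker process, not in the web request.
        # Repository is an assumed application model.
        repo = Repository.find(repo_id)
        repo.create_archive(branch)
      end
    end

    # Enqueue from anywhere in the app. Resque serializes the class
    # name and arguments as JSON and pushes them onto a Redis list.
    Resque.enqueue(Archive, 42, 'master')

A worker started against that queue (e.g. QUEUE=file_serve rake resque:work) pops jobs off the Redis list and calls Archive.perform with the saved arguments.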

GitHub tried and considered many other systems: SQS, Starling, ActiveMessaging, BackgroundJob, DelayedJob, beanstalkd, AMQP, and Kestrel, but found them all wanting in one way or another. The latency for SQS was too high. Others didn't make full use of Ruby. Others still had a lot of overhead. Some didn't have enough features. And still others weren't reliable enough.


Tuesday
Jan 13, 2009

Product: Gearman - Open Source Message Queuing System

Update: New Gearman Server & Library in C, MySQL UDFs.

Gearman is an open source message queuing system that makes it easy to do distributed job processing using multiple languages. With Gearman you can farm out work to other machines, dispatch function calls to the machines best suited to do the work, run work in parallel, load balance lots of function calls, call functions between languages, and spread CPU usage around your network. Gearman is used by companies like LiveJournal, Yahoo!, and Digg. Digg, for example, runs 300,000 jobs a day through Gearman without any issues. Most large sites use something similar.

Why would anyone even need a message queuing system? Message queuing is a handy way to move work off your web servers (like image manipulation), to generate thousands of documents in the background, to run in parallel the multiple requests needed to build a web page, or to perform tasks that can comfortably run in the background rather than as part of the main loop for servicing a web request.

There's a gearmand server, plus clients and workers written in Perl, Ruby, Python, or C. Use at least two gearmand server daemons for higher availability. The functions each worker can perform are registered with gearmand, which distributes requests for those functions to the workers that implement them. Gearman uses a very robust, if somewhat higher latency, signal-and-pull architecture.

  • According to dormando, the flow goes like this: the worker connects to all gearmand servers, registers which functions it supports, and asks for jobs. If there are no jobs, it sends the 'pre_sleep' command to all gearmands and sleeps.
  • The client connects to a gearmand and submits a job for a particular function.
  • Gearmand acks the job, finds all sleeping workers registered for that function, and sends them all a 'noop' command to wake them up.
  • The worker wakes up ("urk, I'm awake now") and asks for jobs. If there are jobs, it does the work; if not, it sends 'pre_sleep' to all gearmands again, and the cycle repeats.

Gearman uses an efficient binary protocol, not XML. There's also a line-based text protocol for admin, so you can telnet in and hook into Nagios plugins. The system makes no guarantees: if there's a failure, the client is told about it and is responsible for retries. And the queue isn't persistent; if gearmand is restarted, the queue is gone.
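To make the flow concrete, here's a minimal worker and client sketch using the gearman-ruby gem. The 'reverse' function and the server address are illustrative assumptions, and the exact API may differ between client library versions:

    require 'gearman'

    # Worker: registers the 'reverse' ability with gearmand, then
    # loops asking for jobs (issuing pre_sleep when there are none).
    worker = Gearman::Worker.new(['localhost:4730'])
    worker.add_ability('reverse') do |data, job|
      data.reverse  # the actual work; the return value is the result
    end
    loop { worker.work }

And the client side, typically a separate process:

    require 'gearman'

    # Client: submits a 'reverse' job; gearmand routes it to a
    # worker that registered that function and returns the result.
    client = Gearman::Client.new(['localhost:4730'])
    taskset = Gearman::TaskSet.new(client)
    task = Gearman::Task.new('reverse', 'Hello Gearman')
    task.on_complete { |result| puts result }  # => "namraeG olleH"
    taskset.add_task(task)
    taskset.wait(10)  # give up after 10 seconds

Note the failure semantics described above: if the job fails or gearmand restarts, it's the client's responsibility to notice and resubmit.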

    Related Articles

  • Gearman Wiki
  • Gearman Google Groups
  • Queue everything and delight everyone by Leslie Michael Orchard.
  • USENIX 2007. Starts at slide 83.
  • PEAR and Gearman by Daniel O'Connor.
  • Amazon Architecture


Sunday
May 25, 2008

Product: Condor - Compute Intensive Workload Management

From their website: Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queueing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press being detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate the job to a different machine that would otherwise be idle.

Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.
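To make the job queueing mechanism concrete, here's a minimal sketch of a Condor submit description file for a serial job; the executable and file names are illustrative, and the exact options vary by Condor version:

    # Hypothetical submit description file; run: condor_submit analyze.sub
    universe    = vanilla
    executable  = analyze
    arguments   = input.dat
    output      = analyze.out
    error       = analyze.err
    log         = analyze.log
    # No shared file system needed: ask Condor to ship files for us.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    queue

The vanilla universe runs an unmodified binary; the standard universe is the one that enables the transparent checkpointing and migration described above, at the cost of relinking the job against Condor's libraries.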

    Related Articles

  • High Throughput Computing by Miron Livny
  • Condor Presentations
  • (my) Principles of Distributed Computing by Miron Livny
