Tuesday
Sep 03, 2019

Top Redis Use Cases by Core Data Structure Types

Redis, short for Remote Dictionary Server, is a BSD-licensed, open-source, in-memory key-value data structure store written in C by Salvatore Sanfilippo and first released on May 10, 2009. Depending on how it is configured, Redis can act as a database, a cache, or a message broker. It's important to note that Redis is a NoSQL database system. Unlike SQL (Structured Query Language) driven database systems such as MySQL, PostgreSQL, and Oracle, Redis does not store data in well-defined schemas of tables, rows, and columns. Instead, Redis stores data directly in data structures, which makes it very flexible to use. In this blog, we outline the top Redis use cases by the different core data structure types.
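
To make this concrete, here's a minimal sketch using the redis-py client (my illustration, assuming a Redis server on localhost); each command family maps onto one of the core data structure types:

    # Minimal sketch with redis-py (assumes a local Redis server); each
    # command family below corresponds to a core Redis data structure.
    import redis

    r = redis.Redis(host="localhost", port=6379, decode_responses=True)

    r.set("page:home:hits", 42)                               # string: counters, caching
    r.hset("user:1", mapping={"name": "Ada", "plan": "pro"})  # hash: objects
    r.lpush("jobs", "resize:photo:7")                         # list: queues
    r.sadd("post:9:tags", "redis", "nosql")                   # set: unique membership
    r.zadd("leaderboard", {"ada": 2500, "bob": 1800})         # sorted set: rankings

    print(r.hgetall("user:1"))
    print(r.zrange("leaderboard", 0, -1, desc=True, withscores=True))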

Data Structures in Redis

Click to read more ...

Tuesday
Jan 17, 2017

Wouldn't it be nice if everyone knew a little queuing theory?

After many days of rain, one lane of this two-lane road collapsed into the canyon. It's been out for a month, and it will be many more months before it's fixed. Thanks to Google Maps, way too many drivers now take this once-sleepy local road.

How do you think drivers go through this chokepoint? 


One hundred experience points to you if you answered one at a time.

One at a time! Going through a half-duplex pipe under a first-in, first-out discipline takes forever!

Yes, there is a stop sign. And people default to this mode because it appeals to our innate sense of fairness. What could be fairer than alternating one at a time?

The problem is it's stupid.

While waiting, stewing, growing angrier, I often think that if people just knew a little queuing theory, we could all be on our way a lot faster.

We can't make the pipe full duplex, so that's out. Let's assume there's no priority involved, vehicles are roughly the same size and take roughly the same time to transit the network. Then what do you do?

Why can't people figure out it's faster to drive through in batches? If we went in groups of, say, three, the throughput would be much higher. And when one side's queue depth grows because people are driving to or from work, that side's batch size should increase.
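
As a back-of-the-envelope illustration (my own numbers, purely hypothetical): if every change of direction costs a fixed switchover delay at the stop sign, then larger batches amortize that delay across more cars:

    # Toy model: each car takes `transit` seconds to cross, and every change
    # of direction costs a fixed `switch` seconds of stop-sign hesitation.
    def throughput(batch_size, transit=5.0, switch=4.0, cars_per_side=100):
        """Cars per minute when the sides alternate in groups of batch_size."""
        total_cars = 2 * cars_per_side
        direction_changes = total_cars / batch_size   # one change per batch
        total_time = total_cars * transit + direction_changes * switch
        return total_cars / total_time * 60

    for b in (1, 3, 10):
        print(f"batch size {b:2d}: {throughput(b):4.1f} cars/min")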

Since this condition will last a long time, there's an opportunity to learn, because the same people take this road all the time. So what happens if you try to change the culture by showing people what a batch is, driving right behind someone as they take their turn?

You got it. Honking. There's a simple heuristic at work, a deeply held ethic against line cutting, so people honk, flip you off, and generally make their displeasure known.

It's your classic battle of reason versus norms. The smart thing is the thing we can't do by our very natures. So we all just keep doing the dumb thing.


Monday
May 17, 2010

7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing differently. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each with its own summary section: Lesson 1: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.

By far the most surprising feature of their architecture is in Lesson 6, whose essential idea is:

Click to read more ...

Wednesday
Oct 08, 2008

Strategy: Flickr - Do the Essential Work Up-front and Queue the Rest 

This strategy is stated perfectly by Flickr's Myles Grant: The Flickr engineering team is obsessed with making pages load as quickly as possible. To that end, we're refactoring large amounts of our code to do only the essential work up front, and rely on our queuing system to do the rest. Flickr uses a queuing system to process 11 million tasks a day. Leslie Michael Orchard also does a great job explaining the queuing meme in his excellent post Queue everything and delight everyone. Asynchronous work queues are how you scalably solve problems that are too big to handle in real time. The process:

  • Identify the minimum feedback the client (UI, API) needs to know an operation succeeded. It's enough, for example, to update the client's view when posting a message to a microblogging service. The client probably isn't aware of all the other steps that happen when a message is added and doesn't really care when they happen, as long as the obvious cases happen in an appropriate period of time.
  • Queue all work not on the critical path to a job queueing system so the critical path remains unblocked. Work is then load balanced across a cluster and completed as resources permit. The more sharded your architecture is, the more work can be done in parallel, which minimizes total completion time. This approach makes it much easier to bound response latencies as features scale. (A minimal sketch of the pattern follows.)
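
    Here is that sketch (my illustration, not Flickr's actual code; a real system would use a durable queue such as Gearman rather than an in-process one):

        import queue
        import threading

        jobs = queue.Queue()   # stands in for a durable job queueing system

        def worker():
            while True:
                task, args = jobs.get()
                try:
                    task(*args)   # off the critical path, done as resources permit
                finally:
                    jobs.task_done()

        threading.Thread(target=worker, daemon=True).start()

        def resize_photo(photo_id):   # hypothetical deferred task
            print(f"resized {photo_id}")

        def handle_upload(photo_id):
            print(f"stored {photo_id}")            # essential work: client sees success
            jobs.put((resize_photo, (photo_id,)))  # everything else is queued

        handle_upload("photo:42")
        jobs.join()   # demo only: wait for the background work to finish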

    Queues Give You Lots of New Knobs to Play With

    As features are added, data consumers multiply, so throwing a new task into a sequential process has a good chance of blowing latencies. Queueing gives much more control and flexibility over the performance of a system. With queues, some advanced strategies you have at your disposal are:
  • Horizontal scaling. Add more processing resources to do more work in parallel.
  • Priority order processing. Paying customers can be processed first, for example. Take measures to avoid starvation (a toy sketch follows this list).
  • Aggregation. Work sitting on the same queue for the same user can be aggregated together so it can be processed as a batch.
  • Work canceling. A request later in the queue can cancel work earlier in the queue. These can just be dropped.
  • CPU limiting. When jobs have unbounded CPU time, they destroy the latency of other jobs sitting in the queue. Bounding CPU time per job evens out latency for everyone.
  • Low priority work dropping. Under load, low priority jobs can be dropped. Just make sure you have background sweep processes that catch work that should have been done and redo it.
  • Admission control. Under load, clients can be told when to retry. This is the best form of flow control: end-to-end flow control with the client. We want to push back on work as high up the stack as we can. Stop the client from pushing work to you and you've accomplished something. Blind retries and timeouts put immense pressure on the whole system. These ideas have been employed in embedded real-time systems forever, and now it seems they'll move into web services as well.
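
    As a toy illustration of the priority strategy above (my sketch; the class and its parameters are hypothetical), here is priority-order processing with simple aging so low priority work can't starve:

        import heapq
        import itertools
        import time

        _tie = itertools.count()   # tie-breaker so the heap never compares jobs

        class AgingPriorityQueue:
            """Lower number = more urgent; jobs gain urgency as they wait."""
            def __init__(self, aging_per_sec=1.0):
                self._heap = []
                self._aging = aging_per_sec

            def put(self, priority, job):
                heapq.heappush(self._heap, (priority, time.time(), next(_tie), job))

            def get(self):
                # Linear rescan keeps the sketch simple; effective priority
                # drops as a job waits, so old low priority jobs eventually win.
                now = time.time()
                i = min(range(len(self._heap)),
                        key=lambda k: self._heap[k][0]
                                      - self._aging * (now - self._heap[k][1]))
                _, _, _, job = self._heap.pop(i)
                heapq.heapify(self._heap)
                return job

        q = AgingPriorityQueue()
        q.put(5, "batch report")
        q.put(0, "paying customer request")
        print(q.get())   # the paying customer goes first while the report is young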

    What Can You do with Your Queue?

    The options are endless, but here are some uses I found in the wild:
  • Backfill jobs. Backfill is what Flickr calls asynchronous jobs that alter database tables in preparation for a new feature, fix existing features, or perform other operations that touch a lot of accounts, photos, or groups. For example, a sharded approach means related data is spread across many different shards. Deleting a user account requires visiting each shard to delete that user's data. Each of those deletes would be queued so they could be done in parallel. Now let's say a bug prevented some of the user data from being deleted. After the bug was fixed, the deletes for all the impacted user accounts would have to be scheduled again.
  • Low latency function call router.
  • Scatter/gather calls in parallel (sketched after this list).
  • Defer expensive library calls.
  • Parallelize database queries.
  • Job queue system for a cluster. Efficiently use your entire pool of CPU power.
  • Sending scheduled, mail-merged emails.
  • Creating guest hosts.
  • Put heavy code on the backend instead of the web server.
  • Call a cron script to update topic hits and popular article hits.
  • Clean outdated, useless data from the database.
  • Resize photos.
  • Run daily reports.
  • Update search indexes.
  • Speed up batch jobs by running them in parallel.
  • SpamAssassin spamtraps.
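
    For instance, the scatter/gather item above can be sketched with nothing but the standard library (my illustration; query_shard is a hypothetical stand-in for a real per-shard query):

        from concurrent.futures import ThreadPoolExecutor, as_completed

        def query_shard(shard, user_id):
            # Hypothetical per-shard query; a real one would hit a database.
            return f"rows for user {user_id} from {shard}"

        shards = ["shard-1", "shard-2", "shard-3"]

        # Scatter: one query per shard; gather: collect results as they finish.
        with ThreadPoolExecutor(max_workers=len(shards)) as pool:
            futures = [pool.submit(query_shard, shard, 42) for shard in shards]
            results = [f.result() for f in as_completed(futures)]

        print(results)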

    Queuing Implies an Event Driven State Machine Based Client Architecture

    Moving to queuing has architectural implications. The client and server are no longer connected in a direct request-response way. Instead, the server continually sends events to clients. The client is event driven instead of request-response driven. Internally, clients often simulate the request-response model even though Ajax is asynchronous. It might be better to drop the request-response illusion and just make the client an event driven state machine. An event can come from a request, from asynchronous jobs, or from others performing activities that a client should see. Each client has an event channel that the system puts events on for the client to consume. The client is responsible for making sense of each event in its current context and is capable of handling any event regardless of its original source.
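
    A toy sketch of such a client (my illustration, not from the article): one event channel, handlers keyed by event type, and dispatch that ignores the event's source:

        import queue

        events = queue.Queue()   # the client's event channel

        # Handlers keyed by event type; dispatch ignores where events came from.
        handlers = {
            "message_posted": lambda e: print("render message:", e["text"]),
            "job_finished":   lambda e: print("refresh view for job", e["job_id"]),
        }

        def pump():
            """Drain the channel and dispatch each event (demo only)."""
            while not events.empty():
                event = events.get()
                handler = handlers.get(event["type"])
                if handler:
                    handler(event)

        # Events may come from a request reply, an asynchronous job, or another
        # user's activity; the client handles them all through the same path.
        events.put({"type": "message_posted", "text": "hello"})
        events.put({"type": "job_finished", "job_id": 7})
        pump()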

    Queuing Systems

    If you are in the market for a queuing system take a look at:
  • Gearman - Open Source Message Queuing System.
  • Amazon's SQS. The latencies for this service tend to be high and variable, so it may not be appropriate for all tasks.
  • beanstalkd.
  • Apache ActiveMQ.
  • Spread Queue.
  • RabbitMQ.
  • OpenAMQ.
  • The Schwartz.
  • Starling.
  • Simple MQ.
  • Roll your own.

    Related Articles

  • Flickr Engineers Do It Offline by Myles Grant
  • Queue everything and delight everyone by Leslie Michael Orchard.
  • Gearman - Open Source Message Queuing System
  • GridGain: One Compute Grid, Many Data Grids

    Click to read more ...