Entries in games (13)

Wednesday
Aug 2, 2017

The Next Scalability Hurdle: Massively Multiplayer Mobile AR

 

Many moons ago, in Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud, I said we still had scaling challenges ahead: we had not yet begun to scale, and we still didn't know how to scale at a planetary level.

That was 7 years ago. Now Facebook has 2 billion monthly users. There's no reason to think they can't scale an unimpressive 3.5x to handle the rest of the planet. WhatsApp is at one billion daily users. YouTube is at 1.5 billion monthly users.

So it appears we do know how to service a whole planet full of people (and bots). At least a select few companies with vast resources know how. We are still no closer to your average developer being able to field a planet-scale service. The winner-take-all nature of the Internet seems to fend off decentralization like it's a plague. Maybe efforts like Filecoin will turn the tide.

There's another area where we still have scaling challenges: Massively Multiplayer Mobile AR (Augmented Reality). While AR has threatened to be the future for quite some time, it now looks like that future may be just around the virtual corner.

Apple introducing ARKit, already a hit with developers, means that future will arrive sooner rather than later. One billion iPhone users make it so. Remember when the iPhone was introduced, how the increased data usage melted AT&T's network? This will be worse.

Pokémon Go had a little event recently that shows what incredible stress such systems will put on our infrastructure. No need to repeat the story, iMore has it all:

Pokémon Go Fest: What happened and why
Pokémon Go Fest's big flop shows Niantic needs to think bigger
Pokémon Go Fest Chicago: The fun, the failure, and the legendary
UPDATED: Are AT&T's iPhone Problems Due to Network Configuration Errors?

It's true, Pokémon Go has been well known for its scaling problems, but this was a planned event; shouldn't it have been handled better? No doubt. Still, a concentration of 20,000 players in a single shard, in such a small "kill zone" as a park, is a challenge. Should they have brought in Cells on Wheels, used high-density WiFi, maybe put up microwave links to increase the backhaul? Yep, that seems reasonable. EM spectrum is a terrible thing to waste.

But what happens when Pokémon Go Fest is just what we call Tuesday? When everyone is using mobile AR? Every product in every store, every building, every sign, everything will have some sort of data-driven overlay. There will be no chance to build special infrastructure; the infrastructure itself must be improved to handle the new loads. Hopefully 5G will come to the rescue.

Spectrum isn't the only problem. Compute resources are also a problem. Pokémon Go isn't a particularly data-intensive game. It doesn't require a lot of interaction between users or constant communication with backend servers. What happens when we have multiple games like that all operating at once?

Pokémon Go seems like a poster child for edge computing. The entire shard could have been handled by a portable onsite datacenter with its own local communication infrastructure. An onsite datacenter combines low-latency compute with enough scale to handle the load. My guess is the thundering herd problem that blocked players from connecting to the game would have disappeared. Players would have connected quickly to the local game servers and started playing with little muss or fuss. The same goes for game state.
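To make the edge idea a bit more concrete, here is a minimal sketch of the routing decision: send each player to the closest edge cluster and fall back to the central region when nothing is close enough. The names and latency threshold are made up for illustration, not anything Niantic actually runs.

    // Hypothetical: pick the lowest-latency edge cluster, else fall back to
    // the central region. All names and numbers are invented.
    interface Cluster { name: string; rttMs: number }

    function chooseCluster(edges: Cluster[], central: Cluster, maxEdgeRttMs = 30): Cluster {
      if (edges.length === 0) return central;
      const best = edges.reduce((a, b) => (a.rttMs <= b.rttMs ? a : b));
      return best.rttMs <= maxEdgeRttMs ? best : central;
    }

    // A portable datacenter at the park wins over the faraway cloud region.
    const choice = chooseCluster(
      [{ name: "grant-park-edge", rttMs: 8 }, { name: "us-central-cloud", rttMs: 55 }],
      { name: "us-central-cloud", rttMs: 55 },
    );
    console.log(choice.name); // "grant-park-edge"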

Perhaps in the future we'll have datacenter handoff protocols, just as we have cell network handoff protocols today. And if we really do it right, the big scheduler in the sky that coordinates all these moving parts might consider distributed compute resources, like smartphones, part of the compute fabric.

We have not yet begun to scale Massively Multiplayer Mobile AR. 

 

Monday
Jan 4, 2016

Server-Side Architecture. Front-End Servers and Client-Side Random Load Balancing

Chapter by chapter Sergey Ignatchenko is putting together a wonderful book on the Development and Deployment of Massively Multiplayer Games, though it has much broader applicability than games. Here's a recent chapter from his book.

Enter Front-End Servers

[Enter Juliet]
Hamlet:
Thou art as sweet as the sum of the sum of Romeo and his horse and his black cat! Speak thy mind!
[Exit Juliet]

— a sample program in Shakespeare Programming Language

 

 

Front-End Servers as an Offensive Line

 

Our Classical Deployment Architecture (especially if you do use FSMs) is not bad, and it will work, but there is still quite a bit of room for improvement for most of the games out there. More specifically, we can add another row of servers in front of the Game Servers, as shown in Fig VI.8:
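Fig VI.8 isn't reproduced in this excerpt, but the gist is a layer of connection-handling servers sitting between the clients and the Game Servers. Below is a bare-bones sketch of that relay role: a generic Node TCP relay in which the ports, addresses, and routing are invented, and the book's Front-End Servers do quite a bit more than this.

    // Generic sketch: the front-end server owns client connections and relays
    // traffic to a game server, shielding game servers from connection churn.
    import { createServer, createConnection } from "node:net";

    const GAME_SERVER = { host: "127.0.0.1", port: 9000 }; // hypothetical address

    const frontEnd = createServer((client) => {
      const upstream = createConnection(GAME_SERVER);
      client.pipe(upstream);   // client -> game server
      upstream.pipe(client);   // game server -> client
      const drop = () => { client.destroy(); upstream.destroy(); };
      client.on("error", drop);
      upstream.on("error", drop);
    });

    frontEnd.listen(8000, () => console.log("front-end listening on :8000"));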

Click to read more ...

Wednesday
Feb 13, 2013

7 Sensible and 1 Really Surprising Way EVE Online Scales to Play Huge Games

"Everything in war is simple, but the simplest thing is difficult." -- Carl von Clausewitz

Games are proving grounds for software architecture. They combine scale, high performance, challenging problems, a rabid user base, cost sensitivity, and the need for profit. And when games have in-game currency, like EVE Online has, there's money at play, so you can't just get away with a c'est la vie attitude. Engineering must be applied. 

In Planning for war: how the EVE Online servers deal with a 3,000 person battle, we learn some techniques EVE Online uses to handle large games:

7 Sensible...

Click to read more ...

Wednesday
Oct 17, 2012

World of Warcraft's Lead Designer Rob Pardo on the Role of the Cloud in Games

In a really far-ranging and insightful interview by Steve Peterson, Game Industry Legends: Rob Pardo, which discusses the future of gaming, there was a section on how the cloud might be used in games. I know there are a lot of game developers in my audience, so I thought it might be useful:

Q. If the game is free-to-play but I have to download 10 gigabytes to try it out, that can keep me from trying it. That's part of what cloud gaming is trying to overcome; do you think cloud gaming is going to make some inroads because of those technical issues?

Click to read more ...

Monday
Oct 15, 2012

Simpler, Cheaper, Faster: Playtomic's Move from .NET to Node and Heroku

This is a guest post by Ben Lowry, CEO of Playtomic. Playtomic is a game analytics service integrated into about 8,000 mobile, web and downloadable games played by approximately 20 million people daily.

Here's a good summary quote by Ben Lowry on Hacker News:

Just over 20,000,000 people hit my API yesterday 700,749,252 times, playing the ~8,000 games my analytics platform is integrated in for a bit under 600 years in total play time. That's just yesterday. There are lots of different bottlenecks waiting for people operating at scale. Heroku and NodeJS, for my use case, eventually alleviated a whole bunch of them very cheaply.
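A rough back-of-envelope on those numbers, using only the figures in the quote:

    700,749,252 requests / 86,400 seconds      ≈ 8,100 requests per second on average
    700,749,252 requests / 20,000,000 players  ≈ 35 API calls per player per day
    ~600 years of play / 20,000,000 players    ≈ 16 minutes of play per player per day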

Playtomic began with an almost exclusively Microsoft .NET and Windows architecture, which held up for 3 years before being replaced with a complete rewrite using NodeJS. During its lifetime the entire platform grew from shared space on a single server to a full dedicated server, then spread to a second dedicated server, then the API server was offloaded to a VPS provider with 4 to 6 fairly large VPSs. Eventually the API server settled on 8 dedicated servers at Hivelocity, each a quad core with hyperthreading, 8 GB of RAM, and dual 500 GB disks, running 3 or 4 instances of the API stack.
 
These servers routinely serviced 30,000 to 60,000 concurrent game players and received up to 1500 requests per second, with load balancing done via DNS round robin.
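DNS round robin here just means publishing several A records for the API hostname and letting each client land on one of them. A minimal sketch of the equivalent client-side behavior follows (the hostname and addresses are hypothetical, and this isn't Playtomic's code):

    // Resolve every A record for the API host and pick one at random per request.
    import { resolve4 } from "node:dns/promises";

    async function pickApiServer(host = "api.example.com"): Promise<string> {
      const addresses = await resolve4(host); // e.g. ["203.0.113.10", "203.0.113.11"]
      return addresses[Math.floor(Math.random() * addresses.length)];
    }

    pickApiServer().then((ip) => console.log(`sending this request to ${ip}`));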

In July the entire fleet of servers was replaced with a NodeJS rewrite hosted at Heroku for a significant saving.

Scaling Playtomic with NodeJS

Click to read more ...

Thursday
May 19, 2011

Zynga's Z Cloud - Scale Fast or Fail Fast by Merging Private and Public Clouds

Release early and often. A/B testing. Creating a landing page and buying ads on AdSense. All are ways of providing quick feedback in order to validate an idea. If you are like Zynga, with 250 million active users a month, how do you cost-effectively prove out a game that could flop or could get 90 million users (like CityVille) in an instant?

Zynga handles this problem in an innovative way: it inverts the typical cloud bursting scenario, in which excess traffic flows from a datacenter to a cloud. Instead, a game starts in the cloud and then moves to the datacenter once it has proved popular enough to keep.
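A cartoon version of that placement decision might look like the sketch below; the thresholds and labels are invented for illustration, not Zynga's actual policy.

    // Hypothetical "reverse cloud bursting": a new game launches in the public
    // cloud and only migrates to the private datacenter once it proves itself.
    type Placement = "public cloud" | "private datacenter";

    function placeGame(dailyActiveUsers: number, weeksLive: number): Placement {
      const provenHit = dailyActiveUsers > 5_000_000 && weeksLive >= 4; // made-up bar
      return provenHit ? "private datacenter" : "public cloud";
    }

    console.log(placeGame(200_000, 1));    // "public cloud" (still an experiment)
    console.log(placeGame(20_000_000, 6)); // "private datacenter" (a CityVille-sized hit)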

This process is nicely described by Charles Babcock in Lessons From FarmVille: How Zynga Uses The Cloud, in an interview with Allan Leinwand, CTO of infrastructure engineering at Zynga.

When pared down to its essence, Zynga's strategy goes something like this:

Click to read more ...

Tuesday
Sep 21, 2010

Playfish's Social Gaming Architecture - 50 Million Monthly Users and Growing

Ten million players a day and over fifty million players a month interact socially with friends using Playfish games on social platforms like Facebook, MySpace, and the iPhone. Playfish was an early innovator in the fastest growing segment of the game industry: social gaming, the love child of casual gaming and social networking. Playfish was also an early adopter of the Amazon cloud, running their system entirely on hundreds of cloud servers. Playfish finds itself at the nexus of some hot trends (which may be why EA bought them for $300 million and why they think a $1 billion game is possible): building games on social networks, building applications in the cloud, mobile gaming, leveraging data-driven design to continuously evolve and improve systems, agile development and deployment, and selling virtual goods as a business model.

How can a small company make all this happen? To explain the magic I interviewed Playfish's Jodi Moran, Senior Director of Engineering, and Martin Frost, Chief Architect, first engineer, and operations guy at Playfish. Lots of good stuff, so let's move on to the nitty gritty.

Click to read more ...

Wednesday
Mar 10, 2010

How FarmVille Scales - The Follow-up

Several readers had follow-up questions in response to How FarmVille Scales to Harvest 75 Million Players a Month. Here are Luke's responses to those questions (and a few of mine).

How does social networking make things easier or harder?

The primary interesting aspect of social networking games is how you wind up with a graph of connected users who need to access each other's data on a frequent basis. This makes the overall dataset difficult, if not impossible, to partition.
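A tiny made-up example of why that hurts: however you split players across shards, many friendships end up crossing the split, so friend reads fan out to other shards.

    // Made-up friend graph and a naive hash partition across two shards.
    const friendships: [string, string][] = [
      ["alice", "bob"], ["alice", "carol"], ["bob", "dave"],
      ["carol", "dave"], ["dave", "erin"],
    ];
    const shardOf = (player: string) => player.charCodeAt(0) % 2;

    const crossShard = friendships.filter(([a, b]) => shardOf(a) !== shardOf(b));
    console.log(`${crossShard.length} of ${friendships.length} friendships cross shards`);
    // -> 3 of 5 friendships cross shards, so those reads hit a second shard.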

What are examples of the Facebook calls you try to avoid and how they impact game play?

We can make a call for Facebook friend data to retrieve information about your friends playing the game. Normally, we show a friend ladder at the bottom of the game that shows friend information, including name and Facebook photo.
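The article doesn't spell out the exact calls, but the usual way to avoid repeating them is cache-aside: serve friend data from a local cache and only go back to Facebook on a miss or when the entry is stale. A hypothetical sketch, where the fetch function stands in for whatever Facebook API call the game really makes:

    // Hypothetical cache-aside for friend data; the TTL and types are made up.
    type Friend = { id: string; name: string; photoUrl: string };

    const cache = new Map<string, { friends: Friend[]; fetchedAt: number }>();
    const TTL_MS = 60 * 60 * 1000; // keep friend lists for an hour

    async function getFriends(
      userId: string,
      fetchFromFacebook: (id: string) => Promise<Friend[]>, // stand-in for the real call
    ): Promise<Friend[]> {
      const hit = cache.get(userId);
      if (hit && Date.now() - hit.fetchedAt < TTL_MS) return hit.friends; // no Facebook call
      const friends = await fetchFromFacebook(userId);
      cache.set(userId, { friends, fetchedAt: Date.now() });
      return friends;
    }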

Can you say where your cache is, what form it takes, and how much is cached? Do you have a peering relationship with Facebook, as one might expect at that bandwidth?

Click to read more ...

Monday
Feb 8, 2010

How FarmVille Scales to Harvest 75 Million Players a Month

Several readers had follow-up questions in response to this article. Luke's responses can be found in How FarmVille Scales - The Follow-up.

If real farming were as comforting as it is in Zynga's mega-hit FarmVille, then my family would probably never have left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean, back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time.

How did FarmVille scale a web application to handle 75 million players a month? Fortunately FarmVille's Luke Rajlich has agreed to let us in on a few of their challenges and secrets. Here's what Luke has to say...

Click to read more ...

Saturday
Jan 17, 2009

Scaling in Games & Virtual Worlds  

"Online games and virtual worlds have familiar scaling requirements, but don’t be fooled: everything you know is wrong." Jim Waldo, Sun Microsystems Laboratories * The computational environment for online games or virtual worlds is close to the exact inverse of that found in most markets serviced by the high-tech industry. * The need for a heavyweight client is, in part, an outcome of the evolution of these games. * Latency is the enemy of fun—and therefore the enemy of online games and virtual worlds. * The game server is used both to discourage cheating (by making it much more difficult) and to detect cheating (by seeing patterns of divergence between the game state reported by the client and the game state held by the server). Peer-to-peer technologies might seem a natural fit for the first role of the game server, but this second role means that few if any games or worlds trust their peers enough to avoid the server component. * Using multiple servers is a basic mechanism for scaling the server component of a game to the levels that are being seen in the online world today. * Having multiple servers means that part of building the game is deciding how to partition the load over these servers. The first technique is to exploit the geography of the game or world. The second technique is known as sharding. * While shards allow scale, they do so at the price of player interaction. * The problem is that the culture that has grown up around games and virtual worlds is not one that understands or is overly familiar with the programming techniques that are required to exploit the parallelism inherent in these systems. * It is for these reasons that we started Project Darkstar (http://www.projectdarkstar.com), a research effort attempting to build a server-side infrastructure that will exploit the multithreaded, multicore chips being produced and scaled over a large group of machines, while presenting the programmer with the illusion that he or she is developing in a single-threaded, single-machine environment. *The model is a simple event-based one in which input from the client is received by the server, which then sets off a task in response to that event. * This mechanism for concurrency control does require that all tasks access all of their data through the Darkstar data service. Our current implementation uses the Berkeley Database. we believe that we can keep the penalty for accessing through a data service small by caching data in intelligent ways. We also believe that by using the inherent parallelism in these games, we can increase the overall performance of the game as the number of players increases, even if there is a small penalty for individual data access. * We found that additional machines lowered the capacity of the overall system. We are working on removing the choke points so that adding equipment actually adds capacity.

Click to read more ...