Entries in google (39)

Thursday
Jun272019

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:

Click to read more ...

Wednesday
Apr032019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years from its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Click to read more ...

Wednesday
Aug082018

Case Study: Pokémon GO on Google Cloud Load Balancing

 

There are a lot of cool nuggets in Google's New Book: The Site Reliability Workbook. If you haven't put it on your reading list, here's a tantalizing excerpt from CHAPTER 11 Managing Load by Cooper Bethea, Gráinne Sheerin, Jennifer Mace, and Ruth King with Gary Luo and Gary O’Connor.

 

Niantic launched Pokémon GO in the summer of 2016. It was the first new Pokémon game in years, the first official Pokémon smartphone game, and Niantic’s first project in concert with a major entertainment company. The game was a runaway hit and more popular than anyone expected—that summer you’d regularly see players gathering to duel around landmarks that were Pokémon Gyms in the virtual world.

Pokémon GO’s success greatly exceeded the expectations of the Niantic engineering team. Prior to launch, they load-tested their software stack to process up to 5x their most optimistic traffic estimates. The actual launch requests per second (RPS) rate was nearly 50x that estimate—enough to present a scaling challenge for nearly any software stack. To further complicate the matter, the world of Pokémon GO is highly interactive and globally shared among its users. All players in a given area see the same view of the game world and interact with each other inside that world. This requires that the game produce and distribute near-real-time updates to a state shared by all participants.

Scaling the game to 50x more users required a truly impressive effort from the Niantic engineering team. In addition, many engineers across Google provided their assis‐ tance in scaling the service for a successful launch. Within two days of migrating to GCLB, the Pokemon GO app became the single largest GCLB service, easily on par with the other top 10 GCLB services.

As shown in Figure 11-5, when it launched, Pokémon GO used Google’s regional Network Load Balancer (NLB) to load-balance ingress traffic across a Kubernetes cluster. Each cluster contained pods of Nginx instances, which served as Layer 7 reverse proxies that terminated SSL, buffered HTTP requests, and performed routing and load balancing across pods of application server backends.


Figure 11-5. Pokémon GO (pre-GCLB)

Click to read more ...

Monday
Jul182016

How Does Google do Planet-Scale Engineering for a Planet-Scale Infrastructure?

 

How does Google keep all its services up and running? They almost never seem to fail. If you've ever wondered we get a wonderful peek behind the curtain in a talk given at GCP NEXT 2016 by Melissa Binde, Director, Storage SRE at Google: How Google Does Planet-Scale Engineering for Planet-Scale Infrastructure.

Melissa's talk is short, but it's packed with wisdom and delivered in a no nonsense style that makes you think if your service is down Melissa is definitely the kind of person you want on the case. 

Oh, just what is SRE? It stands for Site Reliability Engineering, but a definition is more elusive. It's like the kind of answers you get when you ask for a definition of the Tao. It's more a process than a thing, as is made clear by Ben Sloss 24x7 VP, Google, who defines SRE as:

what happens when a software engineer is tasked with what used to be called operations.

Let that bounce around your head for awhile.

Above and beyond all else one thing is clear: SREs are the custodian of production. SREs are the custodian of customer experience, for both google.com and GCP.

Some of the highlights of the talk for me:

  • The Destructive Incentives of Pitting Uptime vs Features. SRE is an attempt to solve the natural tension between developers who want to push features and sysadmins that want maintain uptime by not pushing features. 
  • The Error Budget. This is the idea that failure is expected. It's not a bad thing. Users can't tell if a service is up 100% of the time or 99.99%, so you can have errors. This reduces the tension between dev and ops. As long as the error budget is maintained you can push out new features and the ops side won't be blamed.
  • Goal is to restore service immediately. Troubleshooting comes later. This means you need a  lot of logging and tooling to debug after a service has been restored. For some reason this made flash on a line from an earlier article, also based on a talk from a Google SRE: Backups are useless. It’s the restore you care about
  • No Boredom Philosophy of Paging. When a page comes in it should be for an interesting and new problem. You don't want SREs being bored handling repetitive problems. That's what bots are for.

Other interesting topics in the talk are: How is SRE structured organizationally? How are devs hired into a role focussed on production and keep them happy? How do we keep the team valued inside of Google? How do we help our teams communicate better and resolve disagreements with data rather than with assertions or power grabs? 

Let's get on with it with it. Here's how Google does Planet-Scale Engineering for a Planet-Scale Infrastructure...

Click to read more ...

Wednesday
Mar162016

Jeff Dean on Large-Scale Deep Learning at Google


If you can’t understand what’s in information then it’s going to be very difficult to organize it.

 

This quote is from Jeff Dean, currently a Wizard, er, Fellow in Google’s Systems Infrastructure Group. It’s taken from his recent talk: Large-Scale Deep Learning for Intelligent Computer Systems.

Since AlphaGo vs Lee Se-dol, the modern version of John Henry’s fatal race against a steam hammer, has captivated the world, as has the generalized fear of an AI apocalypse, it seems like an excellent time to gloss Jeff’s talk. And if you think AlphaGo is good now, just wait until it reaches beta.

Jeff is referring, of course, to Google’s infamous motto: organize the world’s information and make it universally accessible and useful.

Historically we might associate ‘organizing’ with gathering, cleaning, storing, indexing, reporting, and searching data. All the stuff early Google mastered. With that mission accomplished Google has moved on to the next challenge.

Now organizing means understanding.

Some highlights from the talk for me:

  • Real neural networks are composed of hundreds of millions of parameters. The skill that Google has is in how to build and rapidly train these huge models on large interesting datasets, apply them to real problems, and then quickly deploy the models into production across a wide variery of different platforms (phones, sensors, clouds, etc.).

  • The reason neural networks didn’t take off in the 90s was a lack of computational power and a lack of large interesting data sets. You can see how Google’s natural love of algorithms combined with their vast infrastructure and ever enlarging datasets created a perfect storm for AI at Google.

  • A critical difference between Google and other companies is that when they started the Google Brain project in 2011, they didn’t keep their research in the ivory tower of a separate research arm of the company. The project team worked closely with other teams like Android, Gmail, and photos to actually improve those properties and solve hard problems. That’s rare and a good lesson for every company. Apply research by working with your people.

  • This idea is powerful: They’ve learned they can take a whole bunch of subsystems, some of which may be machine learned, and replace it with a much more general end-to-end machine learning piece. Often when you have lots of complicated subsystems there’s usually a lot of complicated code to stitch them all together. It’s nice if you can replace all that with data and very simple algorithms.

  • Machine learning will only get better, faster. A paraphrased quote from Jeff: The machine learning community moves really really fast. People publish a paper and within a week lots of research groups throughout the world have downloaded the paper, have read it, dissected it, understood it, implemented some extensions to it, and published their own extensions to it on arXiv.org. It’s different than a lot other parts of computer science where people would submit a paper, and six months later a conference would decide to accept it or not, and then it would come out in the conference proceeding three months later. By then it’s a year. Getting that time down from a year to a week is amazing.

  • Techniques can be combined in magical ways. The Translate Team wrote an app using computer vision that recognizes text in a viewfinder. It translates the text and then superimposes the translated text on the image itself. Another example is writing image captions. It combines image recognition with the Sequence-to-Sequence neural network. You can only imagine how all these modular components will be strung together in the future.

  • Models with impressive functionality are small enough run on Smartphones. For technology to disappear intelligence must move to the edge. It can’t be dependent on network umbilical cord connected to a remote cloud brain. Since TensorFlow models can run on a phone, that might just be possible.

  • If you’re not considering how to use deep neural nets to solve your data understanding problems, you almost certainly should be. This line is taken directly from the talk, but it’s truth is abundantly clear after you watch hard problem after hard problem made tractable using deep neural nets.

Jeff always gives great talks and this one is no exception. It’s straightforward, interesting, in-depth, and relatively easy to understand. If you are trying to get a handle on Deep Learning or just want to see what Google is up to, then it's a must see.

There’s not a lot of fluff in the talk. It’s packed. So I’m not sure how much value add this article will give you. So if you want to just watch the video I’ll understand.

As often happens with Google talks there’s this feeling you get that we’ve only been invited into the lobby of Willy Wonka’s Chocolate factory. In front of us is a locked door and we're not invited in. What’s beyond that door must be full of wonders. But even Willy Wonka’s lobby is interesting.

So let’s learn what Jeff has to say about the future…it’s fascinating...

What is Meant by Understanding?

Click to read more ...

Tuesday
Feb232016

Google's Transition from Single Datacenter, to Failover, to a Native Multihomed Architecture

 

Making a system work in one datacenter is hard. Now imagine you move to two datacenters. Now imagine you need to support multiple geographically distributed datacenters. That’s the journey described in another excellent and thought provoking paper from Google: High-Availability at Massive Scale: Building Google’s Data Infrastructure for Ads.

The main idea of the paper is that the typical failover architecture used when moving from a single datacenter to multiple datacenters doesn’t work well in practice. What does work, where work means using fewer resources while providing high availability and consistency, is a natively multihomed architecture:

Our current approach is to build natively multihomed systems. Such systems run hot in multiple datacenters all the time, and adaptively move load between datacenters, with the ability to handle outages of any scale completely transparently. Additionally, planned datacenter outages and maintenance events are completely transparent, causing minimal disruption to the operational systems. In the past, such events required labor-intensive efforts to move operational systems from one datacenter to another

The use of “multihoming” in this context may be confusing because multihoming usually refers to a computer connected to more than one network. At Google scale perhaps it’s just as natural to talk about connecting to multiple datacenters.

Google has built several multi-homed systems to guarantee high availability (4 to 5 nines) and consistency in the presence of datacenter level outages: F1 / Spanner: Relational Database; Photon: Joining Continuous Data StreamsMesa: Data Warehousing. The approach taken by each of these systems is discussed in the paper, as are the many challenges is building a multi-homed system: Synchronous Global State; What to Checkpoint; Repeatable Input; Exactly Once Output.

The huge constraint here is having availability and consistency. This highlights the refreshing and continued emphasis Google puts on making even these complex systems easy for programmers to use:

The simplicity of a multi-homed system is particularly valuable for users. Without multi-homing, failover, recovery, and dealing with inconsistency are all application problems. With multi-homing, these hard problems are solved by the infrastructure, so the application developer gets high availability and consistency for free and can focus instead on building their application.

The biggest surprise in the paper was the idea that a multihomed system can actually take far fewer resources than a failover system:

In a multi-homed system deployed in three datacenters with 20% total catchup capacity, the total resource footprint is 170% of steady state. This is dramatically less than the 300% required in the failover design above

What’s Wrong With Failover?

Click to read more ...

Monday
Dec142015

Does AMP Counter an Existential Threat to Google?

When AMP (Accelerated Mobile Pages) was first announced it was right inline with Google’s long standing project to make the web faster. Nothing seemingly out of the ordinary.

Then I listened to a great interview on This Week in Google with Richard Gingras, Head of News at Google, that made it clear AMP is more than just another forward looking initiative from Google. Much more.

What is AMP? AMP is two things. AMP is a restricted subset of HTML designed to make the web fast on mobile devices. AMP is also a strategy to counter an existential threat to Google: the mobile web is in trouble and if the mobile web is in trouble then Google is in trouble.

In the interview Richard says (approximately):

The alternative [to a strong vibrant community around AMP] is devastating. We don’t want to see a decline in the viability of the mobile web. We don’t want to see poor experiences on the mobile web propel users into proprietary platforms.

This point, or something very like it, is repeated many times during the interview. With ad blocker usage on the rise there’s a palpable sense of urgency to do something. So Google stepped up and took leadership in creating AMP when no one else was doing anything that aligned with the principles of the free and open web.

The irony for Google is that advertising helped break the web. We have fouled our own nest.

Why now? Web pages are routinely between 2MB and 10 MB in size for only 80K worth of content. The blimpification of web pages comes from two general sources: beautification and advertising. Lots of code and media are used to make the experience of content more compelling. Lots of code and media are used in advertising.

The result: web pages have become very very slow. And a slow web is a dead web, especially in the parts of the world without fast or cheap mobile networks, which is much of the world. For many of these people the Internet consists of their social network, not the World Wide Web, and that’s not a good outcome for lots of people, including Google. So AMP wants to make people fall in love with the web again by speeding it up using a simple, cachable, and open format.

Does AMP work? Pinterest found AMP pages load four times faster and use eight times less data than traditional mobile-optimized pages. So, yes.

Is AMP being adopted? Seems like it.  Some of those on board are: WordPress, Nuzzle, LinkedIn, Twitter. Fox News, The WSJ, The NYT, Huffington Post, BuzzFeed, The Washington Post, BBC, The Economist, FT, Vox Media, LINE, Viber, and Tango, comScore, Chartbeat, Google Analytics, Parse.ly, Network18, and many more. Content publishers clearly see value in the survival of the web. Developers like AMP too. There are over 4500 developers on the AMP GitHub project.

When will AMP start? Google will reportedly send traffic to AMP pages in Google Search starting in late February, 2016.

Will Google advantage AMP in search results? Not directly says Google, but since faster sites rank better, AMP will implicitly rank higher compared to heavier weight content. We may have a two tiered web: the fast AMP based web and the slow bloated traditional web. Non AMP pages can still be made fast of course, but all of human history argues against it.

The AMP talk featured a well balanced panel representing a wide variety of interests. Leo Laporte, famous host and founder of TWiT, represents the small content publisher. He views AMP with a generally positive yet skeptical eye. AMP is open source, but it is still controlled by Google, so is the web still the open web? Jeff Jarvis is a journalism professor and a long time innovative thinker on how journalism can stay alive in the modern era. Jeff helped inspire the idea of AMP and sees AMP as a way publishers can distribute content to users on whatever form of media users are consuming. Kevin Marks is as good a representative for the free and open web as you could ask for. Matt Cutts as a very early employee at Google is of course pro Google, but he’s also represents an engineering perspective. Richard Gingras is the driving force behind AMP at Google. He’s also a compelling evangelist for AMP and the need for a true new Web 2.0.

Here’s a gloss of the discussion. I’m not attributing who said what, just the outstanding points that help reveal AMP’s vision for the future of the open web:

Origin Story

Click to read more ...

Tuesday
Oct082013

F1 and Spanner Holistically Compared

This aricle, F1: A Distributed SQL Database That Scales by Srihari Srinivasan, is republished with permission from a blog you really should follow: Systems We Make - Curating Complex Distributed Systems.

With both the F1 and Spanner papers out its now possible to understand their interplay a bit holistically. So lets start by revisiting the key goals of both systems.

Key Goals of F1′s design
  • System must be able to scale up by adding resources
  • Ability to re-shard and rebalance data without application changes
  • ACID consistency for transactions
  • Full SQL support, support for indexes
Spanner’s objectives

Click to read more ...

Monday
Oct222012

Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale

A lot of people seem to passionately dislike the term NewSQL, or pretty much any newly coined term for that matter, but after watching Alex Lloyd, Senior Staff Software Engineer Google, give a great talk on Building Spanner, that’s the term that fits Spanner best.

Spanner wraps the SQL + transaction model of OldSQL around the reworked bones of a globally distributed NoSQL system. That seems NewSQL to me.

As Spanner is a not so distant cousin of BigTable, the NoSQL component should be no surprise. Spanner is charged with spanning millions of machines inside any number of geographically distributed datacenters. What is surprising is how OldSQL has been embraced. In an earlier 2011 talk given by Alex at the HotStorage conference, the reason for embracing OldSQL was the desire to make it easier and faster for programmers to build applications. The main ideas will seem quite familiar:

  • There’s a false dichotomy between little complicated databases and huge, scalable, simple ones. We can have features and scale them too.
  • Complexity is conserved, it goes somewhere, so if it’s not in the database it's pushed to developers.
  • Push complexity down the stack so developers can concentrate on building features, not databases, not infrastructure.
  • Keys for creating a fast-moving app team: ACID transactions; global Serializability; code a 1-step transaction, not 10-step workflows; write queries instead of code loops; joins; no user defined conflict resolution functions; standardized sync; pay as you go, get what you pay for predictable performance.

Spanner did not start out with the goal of becoming a NewSQL star. Spanner started as a BigTable clone, with a distributed file system metaphor. Then Spanner evolved into a global ProtocolBuf container. Eventually Spanner was pushed by internal Google customers to become more relational and application programmer friendly.

Apparently the use of Dremel inside Google had shown developers it was possible to have OLAP with SQL at scale and they wanted that same ease of use and time to market for their OLTP apps. It seems Google has a lot of applications to get out the door and programmers didn’t like dealing with real-world complexities of producing reliable products on top of an eventually consistent system. 

The trick was in figuring out how to make SQL work at truly huge scales. As an indicator of how deep we are still in the empirical phase of programming, that process has taken even Google over five years of development effort. Alex said the real work has actually been in building a complex reliable distributed systems. That’s the hard part to get correct.  

With all the talk about atomic clocks, etc., you might get the impression that there’s magic in the system. That you can make huge cross table, cross datacenter transactions on millions of records with no penalty. That is not true. Spanner is an OLTP system. It uses a two phase commit, so long and large updates will still lock and block, programmers are still on the hook to get in and get out. The idea is these restrictions are worth the programmer productivity and any bottlenecks that do arise can be dealt with on case by case basis. From the talk I get the feeling over time, specialized application domains like pub-sub, will be brought within Spanner's domain. While the transaction side may be conventional, except for all the global repartitioning magic happening transparently under the covers, their timestamp approach to transactions does have a lot of cool capabilities on the read path.

As an illustration of the difficulties of scaling to a large number of replicas per Paxos group, Alex turned to a hydrology metaphor:

You could use a Spanner partition as a strongly ordered pub-sub scheme where you have read-only replicas all over the place of some partition and you are trying to use it to distribute some data in an ordered way to a lot of different datacenters. This creates different challenges. What if you are out of bandwidth to some subset of those datacenters? You don’t want data buffered in the leader too long. If you spill it to disk you don’t want to incur the seek penalty when bandwidth becomes available. It becomes like hydrology. You have all this data going to different places at different times and you want to keep all the flows moving smoothly under changing conditions. Smoothly means fewer server restarts, means better latency tail, means better programming model.

This was perhaps my favorite part of the talk. I just love the image of data flowing like water drops through millions of machines and networks, temporarily pooling in caverns of memory and disk, always splitting, always recombining, always in flux, always making progress, part of a great always flowing data cycle that never loses a drop. Just wonderful.

If you have an opportunity to watch the video I highly recommend that you do, it is really good, there’s very little fluff at all. The section on the use of clocks in distributed transactions is particularly well done. But, in case you are short on time, here’s a gloss of the talk:

Click to read more ...

Monday
Sep242012

Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In

Google recently released a paper on Spanner, their planet enveloping tool for organizing the world’s monetizable information. Reading the Spanner paper I felt it had that chiseled in stone feel that all of Google’s best papers have. An instant classic. Jeff Dean foreshadowed Spanner’s humungousness as early as 2009.  Now Spanner seems fully online, just waiting to handle “millions of machines across hundreds of datacenters and trillions of database rows.” Wow.

The Wise have yet to weigh in on Spanner en masse. I look forward to more insightful commentary. There’s a lot to make sense of. What struck me most in the paper was a deeply buried section essentially describing Google’s motivation for shifting away from NoSQL and to NewSQL. The money quote:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

This reads as ironic given Bigtable helped kickstart the NoSQL/eventual consistency/key-value revolution.

We see most of the criticisms leveled against NoSQL turned out to be problems for Google too. Only Google solved the problems in a typically Googlish way, through the fruitful melding of advanced theory and technology. The result: programmers get the real transactions, schemas, and query languages many crave along with the scalability and high availability they require.

The full quote:

Click to read more ...