Friday
Oct062017

Stuff The Internet Says On Scalability For October 6th, 2017

Hey, it's HighScalability time: 

 

LiDAR sees an enchanted world. (Luminar)

 

If you like this sort of Stuff then please support me on Patreon.

 

  • 14TB: Western Digital Hard Drive; 3B: Yahoo's perfidy; ~80%: companies traded on U.S. stock market 1950-2009 were gone by 2009; 21%: conversion increase with AI-enabled site personalisation; $1 billion: US Air Force jets off the cloud; 1 billion: iOS devices in use; 1000x: new DeepMind WaveNet model produces 20 seconds of higher quality audio in 1 second; 96: vCPUs on new GCE machine type, with 624GB of memory; 

  • Quotable Quotes:
    • fusiongyro: The amount of incipient complexity in programming has been growing, not going down. What's more complex, "hello, world" to the console in Python, or "hello world" in a browser with the best and newest web stack? Mobility and microservices create lots of new edge cases and complexity—do non-programmers seem particularly well-equipped to handle edge cases to you? The problem has never really been the syntax—if it were, non-programmers would have made great strides with Applescript and SQL, and we'd all be building PowerBuilder libraries for a living. The problem is that programming requires a mode of thinking which is difficult. Lots of people, even people who do it daily, who are trained to do it and exercise great care and use great tools, are not great at it. This is not a syntax problem or a lack of decent libraries problem. We have simple programming languages with huge bodies of libraries. What's hard is the actual programming.
    • @troyhunt: 1 person didn’t patch Struts, got Equifax breached, sold shares & created dodgy search site with bad results. Right?
    • @rob_pike: Once in a while I need to build some large system written in C or C++ and am reminded why we made Go. #golang
    • @adam_chal: Me before #strangeloop: I'm not a real programmer unless I know Haskell Me after #strangeloop: I'm not a real programmer unless I knit
    • Julian Squires: I make a petty point about premature optimization; don't go out and rewrite your switch statements as binary searches by hand; maybe do rewrite your jump tables as switch statements, though.
    • @GossiTheDog: Re this - vuln scanners only find the vuln if you point them at a Struts URL. If you just point them at hostname or IP, it won’t find vuln.
    • @stevesi: Yes very much. Not unlike Wells Fargo trying to find a mid-level manager who signed people up for credit cards independent of metrics/execs.
    • @patio11: We would laugh out of the room a CEO who said "The reason that we didn't file our taxes last year was an employee forgot to buy a stamp."
    • @swardley: In general, the reasons for hybrid cloud have nothing to do with economics & everything to do with executives justifying past purchases 
    • @asymco: Changes in Android propagate to users over six years. iOS propagates in about three months.
    • bb611: It isn't luck [re: Incident: France A388 over Greenland on Sep 30th 2017, fan and engine inlet separated]. It is the result of millions of engineering hours spent on the development of highly reliable and resilient passenger aircraft, an emphasis on public identification and dissemination of design weaknesses, errors, and failures, and an unwavering focus by industry regulators on safety.
    • @mipsytipsy: "I would rather have a system that's 75% 'down' but users are fine, than a system 99.99% 'up' but user experience is impacted." #strangeloop
    • psyc: A huge proportion of the ICOs I investigate turn out to be pure facade. It's amazing to me just how quickly this con was honed and formalized, but I guess people have always been good at aping when it comes to get-rich-quick bandwagons. The standard ICO consists solely of: 1) A slick website. 2) A well-produced video. 3) A whitepaper that discusses trivially standard blockchain features and goals. No differentiation necessary. 4) The appearance that prominent or well-credentialed people are working on the "technology". That's all. The "product" is vapor. The real product is another pump & dump vehicle to satisfy the insatiable demand for pump & dump vehicles. This product is sold to the "investors" during the ICO. Said "investors" are even explicitly awarded more coins for shilling the pump everywhere by creating amateurish articles and YouTube videos.
    • nameless912~ As a developer at a company that's trying to shove Lambda down our throats for EVERYTHING...AWS needs to get better at a few key things before Lambda/serverless become viable enough that I'll actually consider integrating them into my services: 1. Permissions are a nightmare. 2. Networking is equally nightmarish. 3. If the future of compute is serverless, then Lambda, Google Cloud Functions, and whatever half-baked monstrosity Azure has cooked up are going to have to get together and define a common runtime for these environments.
    • @erikstmartin: “OS’s are dinosaurs. Let them rest” - @nicksrockwell #velocityconf
    • @bridgetkromhout: Thought experiment: what if all your systems restart at once? How long does it take you to recover? *Can* you? @whereistanya #velocityconf
    • Eric Hammond: Some services, like API Gateway, are far more complicated, difficult to use, and expensive than I expected before trying. Other services, like Amazon Kinesis Streams, are simpler, cheaper, and far more useful than I expected.
    • nameless912: please, please chop off my hands and pull out my eyeballs if cloud computing becomes yet another workflow engine. I thought we killed those off in the 90s.
    • MIT: The proof-of-principle experiment that Neill and Roushan and co have pulled off is to make a chip with nine neighboring loops and show that the superconducting qubits they support can represent 512 numbers simultaneously.
    • @swardley: Equifax: We're a security nightmare! Adobe: Hold my beer Deloitte: Hold my beer Yahoo: Amateurs. Learn from a pro -
    • @somic: seeing more & more indicators these days that devops as a unifying idea is now dead. devops appears now to be ops who can write simple code
    • @slightlylate: So true. I'll trade 10 devs who are high on abstractions and metaprogramming for one who gives a damn about the user.
    • @postwait: I am just a single data point, but I use about 10% of my CS education (CS/MSe/~PhD) daily; about 50% of it monthly. I value it immensely.
    • @mweagle: The Go compiler will likely slow down your first sprint. It will radically improve your marathon performance.
    • There are many more quotes. Click through to the full article to read them. Or not. Up to you.

  • The Coming Software Apocalypse. After all these years it's still strange to see people fall into the "if we only had complete requirements we could finally make reliable systems, what's wrong with these idiots?" tarpit. Requirements are a trap. We went through all of this with waterfall and big design up front. It doesn't work. Requirements are no less complex and undiscoverable than code. Tools are another trap. Tools are code. Tools encode one perspective on a solution space and if there's anything the real world is good at, it's destroying perspective. IMHO, our most likely future is to treat programming as an act of computational creativity. Human programmers will work with AIs to co-create software systems. We'll work together to produce better software than a human can on their own or an AI can produce on its own. We're better together, which is why I'm not afraid AI will replace programmers. Here's an example in music, A.I. Experiments: A.I. Duet, where a computer accompanies a piano player. Here's a better example—Ripples - A piano duet for improvising musician and generative software—where the AI piano player riffs off a human in real-time. You can imagine this is how software will be built in the future. Here's a hint at the productivity gain, though it isn't a complete example, because what I'm talking about doesn't exist yet: @DynamicWebPaige: Blue lines: @Google's old Translate program, 500k lines of stats-focused code. Green: now, 500 lines of @tensorflow. See also, Jeff Dean On Large-Scale Deep Learning At Google and Peter Norvig on Machine Learning Driven Programming: A New Programming For A New World

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...


Monday
Oct022017

Ripple: The Most (Demonstrably) Scalable Blockchain

 

This is a guest post by Mark Travis, Performance Engineer at Ripple.

Ripple’s XRP Ledger is a blockchain-based payment network that transfers funds between any type of currency within a few seconds with average transaction costs of a fraction of a penny. The core of this peer-to-peer network is an open source C++ application called rippled. Ripple’s goal is to supplant the world’s existing legacy payment networks. As such, scalability is a continuous goal. This document describes how the rippled team has integrated performance engineering into its development processes, and how this has contributed to throughput gains of over 1000%.

Performance engineering practices deliver benefits in addition to measurable performance gains. These include the ability to report on the capabilities of the software so that users can feel confident that their needs will be met by the system. Performance engineering informs capacity planning and optimal configuration of environments to support the application. Many performance problems are caught and addressed before customers notice them. As process automation improves, each change to the software can be quickly assessed for improvement or regression. This methodology also makes better use of developer time by helping choose the most effective tasks for improving performance. Any software project serious about supporting global scale should integrate performance engineering into its development cycle.
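The claim that each change can be quickly assessed for improvement or regression can be sketched in a few lines of Python. This is only an illustration, not the rippled team's tooling: the function name `assess_change`, the 5% noise tolerance, and the throughput samples are all assumptions.

```python
from statistics import median


def assess_change(baseline_tps, candidate_tps, tolerance=0.05):
    """Classify a candidate build against a baseline by median throughput.

    baseline_tps / candidate_tps: transactions-per-second samples from
    repeated benchmark runs (the numbers used below are hypothetical).
    tolerance: relative change treated as measurement noise (assumed 5%).
    """
    if not baseline_tps or not candidate_tps:
        raise ValueError("need at least one sample for each build")
    base = median(baseline_tps)
    cand = median(candidate_tps)
    change = (cand - base) / base
    if change > tolerance:
        verdict = "improvement"
    elif change < -tolerance:
        verdict = "regression"
    else:
        verdict = "no significant change"
    return verdict, change


if __name__ == "__main__":
    verdict, change = assess_change([1000, 1020, 990], [1500, 1480, 1510])
    print(f"{verdict}: {change:+.1%}")
```

Comparing medians over several runs, rather than single measurements, is one simple way to keep a noisy benchmark from flagging every commit; a real pipeline would likely need something more statistically careful.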

Performance Engineering Method


Friday
Sep292017

Stuff The Internet Says On Scalability For September 29th, 2017

Hey, it's HighScalability time: 

 

Latency Numbers Every Programmer Should Know plotted over time. Click on and move the slider to see changes. There were a lot more blocks in 1990.

 


 

  • 1040: undergrads enrolled in Stanford's machine learning class; 39: minutes to travel from New York to Shanghai on Elon's rocket ride; $625,000: in stolen electronic-grade polysilicon; 160: terabits of data per second for Microsoft's new Trans-Atlantic Subsea Cable; 8K: people in Microsoft's AI group; 110%: increase in ICS/SCADA attacks from 2016 to 2017; 2 million: advertisers on Instagram; ~70%: savings using new Spot instance checkpointing; 10,000: nuts a year stored by a fox squirrel; $22.1 billion: IaaS market in 2016; 

  • Quotable Quotes:
    • @patio11: Wife: "Hold hands when crossing the street." *2 year old grabs own hands* "OK Mommy." Me: "Oh you're going to be so good at programming."
    • Charlie Demerjian: Intel’s “new” 8th Gen CPUs are a stopgap OEM placation to cover for a failed process, but they do bring some advances. As SemiAccurate sees it, Intel took .023 steps forward with the hardware and their messaging took three steps back.
    • Richard Dawkins: If AI Ran the World, Maybe it Would Be a Better Place
    • @swardley: No-one should be in any doubt that AWS is gunning for entire software stack (all of it) over next decade. Lambda, one code to rule them all.
    • @mstine: "you are as reliable as the weakest component in your stack...and people are a really weak component." @adrianco #CloudNativeLondon
    • @swardley: The vertical depth play will be found wanting as Amazon uses ecosystems to chew up horizontal components and move up the value chain.
    • @swardley: I assume Goldmans is bricking itself that Amazon might come into its industry and with good reason. The fattened slug wouldn't last long.
    • reacweb: I have a baremetal server and 99% of my admin task is apt-get update, apt-get upgrade. I have a diary where I write all the other admin tasks (the most complex one was configuring apache). When I buy a new server, I reread my notes to do some copy/paste. The freedom of a bare server is priceless ;-)
    • tedu: if you’re going to retry automatically, be damn sure the operation either failed or is idempotent. Or next week you can be the lucky author of the blog post about what happens when your billing database reverts to readonly mode, preventing any transactions from being marked paid, sending the payment service into a loop where it charges customers their monthly bill every 10 minutes for ten hours.
    • Jeff Barr: You can now resume workloads on spot instances and fleets. As long as they checkpoint to disk, workloads that aren’t time sensitive just got a whole heck of a lot cheaper for you.
    • Jamie Condiffe: The experiment uses drones to shuttle parcels of up to 4 pounds from a distribution center to vans—at least, when they’re parked at one of four rendezvous points around the city, anyway. The vans have a special landing zone on their roof, which allows the drone to set down and drop off its payload. The driver of the vehicle is then tasked with actually delivering the package to a customer.
    • @codinghorror: Absolutely monstrous http://browserbench.org/Speedometer/  numbers for iPhone 8. That is over 1.4x the iPhone 7
    • @EdSwArchitect: "Logstash is not going to go to 500,000 log entries / second". Kafka does. #StrataData - Streams & Containers talk
    • endymi0n: Managing financial matters on AWS is such a royal PITA, I'm so glad we switched 90% of our stack to Google.
    • shub: Good luck finding anything public about graph processing on a dataset too large to fit on a single machine. I can launch an AWS instance with 128 cores and 4 TB RAM--how many triples is too many for that monster? Tens of billions? Hundreds of billions?
    • @pzfreo: @adrianco #CloudNativeLondon By the time you've decided on your container orch'n system, you could have the whole thing done in #Serverless
    • @danielbryantuk: "The easy win to get started with chaos engineering is to run a game day with what you currently have" @adrianco #CloudNativeLondon
    • @Tulio_de_Souza: CloudNative principle: "pay for what you used last month and not what you guess you will need next year" #cloudnativelondon @adrianco
    • Yasmin Anwar: fox squirrels apparently organize their stashes of nuts by variety, quality and possibly even preference
    • @kopertop: Unfortunately @randybias, the most important thing @awscloud did can't be replicated by *any* Software. Marketing, innovation, and support.
    • Sarty: I guess the argument [for using Filecoin] is that I should trust a single behemoth like Amazon less than I should trust an arbitrary number of nameless, faceless on-the-cheap suppliers on the premise that a nebulous algorithm that I (the average user) don't totally understand will stochastically cause those suppliers to lose their contract if they lose too much data, but that's okay because a different nebulous algorithm I don't totally understand can reconstruct the data as long as most of those nameless, faceless suppliers are on the up-and-up, all on the fly and completely decentralized? Yeah, sure, sign me up. What could possibly go wrong?
    • danudey: The biggest change for us was SSDs coming down in price. Whereas before I might need four read slaves to ensure that at peak load I'm handling all my transactions within X ms, now I can guarantee it on one server. More importantly, in our industry where we're vastly more write-constrained than read-constrained and we're faced with e.g. MySQL not being able to easily spread writes over multiple servers simply, the appeal of something like MongoDB or Cassandra with built-in sharding and rebalancing to spread out both reads and writes sounds very appealing. And again, I can move from a giant complicated, expensive, heat-producing multi-disk raid10 to a pair of SSDs in RAID1 (or better) and easily meet my iops requirements. Without being able to upgrade to SSDs I think we would have been looking into other systems like Cassandra a lot sooner, but right now we can pretty easily throw some money at the problem and it goes away.
    • LibertarianLlama: Perhaps millennia from now we will be building Dyson Spheres around stars to use the energy to mine bitcoin.
    • happymellon: I work with satellite imagery processing which is quite large in its raw data format, and after a decade we are not dealing with petabytes of active data, hundreds of gigs for a full earth coverage. Before that I have held positions in finance, dealing with realtime transaction processing. We did not work in petabytes. If you are working in petabytes you are storing crap in your production database, and 99% of that data is wasted.
    • @karpathy: Kaggle competitions need some kind of complexity/compute penalty. I imagine I must be at least the millionth person who has said this.
    • unclebucknasty: I think I'm one of those graybeards. I see it in so many things tech. It's a pattern, and once you've seen it repeat a half-dozen times and also gain a depth of experience over that time, you can actually recognize when something represents genuine progress vs yet another passing fad. Spoiler alert: those that are most rabidly promoted are often the latter. But, if you try to raise the point in the midst of the latest fad, you generally get shouted down. So, you wait until the less-jaded figure it out...again. It was plainly obvious for NoSQL, just as it now is for SPAs (or at least our current approach). Don't believe me? Wait 5 years.
    • CBobRobison: Outsourcing Attacks are prevented by implementing Proof-of-Spacetime (PoST). With PoST a node is required to put a deposit down based on the amount of storage it's providing. It then has to continuously hash the stored data against public nonces and occasionally upload its solution to the network to prove the data was there the whole time. If it doesn't actually have the data, it doesn't hash correctly, and it fails to provide PoST. As a negative consequence, the node forfeits its deposit.
    • Antonio Garcia-Martinez: Zuckerberg proposes, shockingly, a solution that involves total transparency. Per his video, Facebook pages will now show each and every post, including dark ones (!), that they’ve published in whatever form, either organic or paid. It’s not entirely clear if Zuckerberg intends this for any type of ad or just those from political campaigns, but it’s mindboggling either way. Given how Facebook currently works, it would mean that a visitor to a candidate’s page—the Trump campaign, for instance, once ran 175,000 variations on its ads in a single day—would see an almost endless series of similar content.
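
tedu's warning above about automatic retries can be made concrete with a small Python sketch. It assumes a hypothetical in-memory `PaymentService` that deduplicates charges by an idempotency key, so retrying after a bookkeeping failure cannot bill the customer twice. None of these names come from the quoted post; they are made up for illustration.

```python
class PaymentService:
    """Hypothetical payment backend that remembers idempotency keys."""

    def __init__(self):
        self.charges = {}  # idempotency key -> amount charged

    def charge(self, key, amount):
        # A repeated key returns the recorded charge instead of billing again.
        if key in self.charges:
            return self.charges[key]
        self.charges[key] = amount
        return amount


def charge_with_retry(service, customer_id, period, amount, mark_paid, attempts=3):
    """Charge once per (customer, billing period), retrying bookkeeping failures."""
    key = f"{customer_id}:{period}"  # the same key is reused on every retry
    last_error = None
    for _ in range(attempts):
        try:
            service.charge(key, amount)
            mark_paid(customer_id, period)  # may fail, e.g. read-only database
            return True
        except OSError as err:
            last_error = err
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


if __name__ == "__main__":
    service = PaymentService()
    calls = {"n": 0}

    def flaky_mark_paid(customer_id, period):
        # Simulate a billing database that is read-only for the first two tries.
        calls["n"] += 1
        if calls["n"] < 3:
            raise OSError("database is read-only")

    charge_with_retry(service, "cust-1", "2017-09", 25, flaky_mark_paid)
    print(len(service.charges), sum(service.charges.values()))
```

Without the key check in `charge`, the same loop would bill the customer on every attempt, which is the failure mode the quote describes.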


Wednesday
Sep272017

Aligning Your Team around Microservices When There's No Precise Definition

This is a guest post by Roger Jin, Software Architect at ButterCMS and co-author of Microservices for Startups.

For a profession that stresses the importance of naming things well, we've done ourselves a disservice with microservices. The problem is that there is nothing inherently "micro" about microservices. Some can be small, but size is relative and there's no standard unit of measure across organizations. A "small" service at one company might be one million lines of code, while at another it might be far fewer.

Some argue that microservices aren’t a new thing at all and rather a rebranding of Service Oriented Architectures, while others advocate for viewing microservices as an implementation of SOA similar to how Scrum is an implementation of Agile.

How do you align your team when no precise definitions of microservices exist? The most important thing when talking about microservices on a team is to ensure that you are grounded in a common starting point.

But ambiguous definitions don’t help with this. It would be like trying to put Agile into practice without context for what you are trying to achieve, or an understanding of precise methodologies like Scrum.

Finding common ground 


Tuesday
Sep262017

Sponsored Post: Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, InMemory.Net, Zohocorp

Who's Hiring? 

  • Advertise your job here! 

Fun and Informative Events

  • October 10 Live Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. Join us for a live webinar on Tuesday, October 10 at 11:00 am Pacific Time featuring guest speakers Michele Goetz, Principal Analyst at Forrester Research, and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix®. A positive customer experience is required for successful enterprise digital transformation. Digital businesses depend on speed and efficiency to drive operational decisions. Making faster, accurate, and real-time customer trust decisions removes friction and delivers superior business outcomes. In this session, you’ll learn: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. Register now.

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • What engineering and IT leaders need to know about data science. As data science becomes more mature within an organization, you may be pulled into leading, enabling, and collaborating with data science teams. While there are similarities between data science and software engineering, well intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows. Read the full guide to data science for Engineering and IT leaders.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • InMemory.Net provides a .NET-native in-memory database for analysing large amounts of data. It runs natively on .NET and provides native .NET, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.


Friday
Sep222017

Stuff The Internet Says On Scalability For September 22nd, 2017

Hey, it's HighScalability time: 

 

Ever feel like howling at the universe? (Greg Rakozy)

 


 

  • 10 billion: API calls made every second in Google datacenters; $767,758,000,000: collected by Apple on iPhones sold to the end of June; 20: watts of power consumed by human brain, autonomous vehicles peak at 3000 watts; 59%: drop in leads using AMP; 27%: success rate of AIs guessing passwords; 2.8 kilometers: distance devices running on almost zero power can xmit using backscatter; 96: age at which Lotfi Zadeh, inventor of Fuzzy Logic, passed away; 35%: store time series data in a RDBMS; $1.1 billion: Google's spend on self-driving tech;  $5.1 billion: Slack valuation; 15%: bugs reduced by strong typing; ~1 ft: new smartphone GPS accuracy; 

  • Quotable Quotes:
    • Napoleon: [Sir Hudson Lowe] was a man wanting in education and judgment. He was a stupid man, he knew nothing at all of the world, and like all men who knew nothing of the world, he was suspicious and jealous.
    • Rich Werner: Data center operations, to me, is 362 days of boredom. And then you get these hurricanes coming through, and it’s three days of pulling your hair out.
    • @pacoid: @kenneth0stanley #TheAIConf "We're not interested in complexity for its own sake" -- ref. operational closure in second-order cybernetics
    • Animats: Much as I like Rust, I have to agree. When you have to get it done, use Go. When you want to explore advanced experimental programming constructs, use Rust. The Go guys knew when to stop. Arguably they stopped too early, before generics or parameterized types, and "interface{}" is used too much. But that doesn't seem to be a big overhead item. Rust has the complexity of C++ plus additional complexity from the functional community. Plus lots of fancy template cruft.
    • @mims: People who say data is the new oil are wrong. Non-volatile flash memory is the new oil.
    • @cmeik: As former member of a NoSQL startup, "Safety, reliability [as well as pay up front, save later] doesn't sell" sure sounds familiar.
    • @PaulDJohnston: ... simply because we've had 20 years of "servers" and 10 years of "instances" and now "containers"... they are all the same...
    • Venkatraman Ramakrishnan~ Inventions in one discipline can build on—and spur—basic research in many others, often unwittingly. It’s a virtuous cycle, and scientists take joy in exploiting all of it. Scientists are very promiscuous and the good ones are the most promiscuous.
    • @rightfold: Most of programmers learn early to avoid premature optimization. Next step: teach people about premature distributed computing.
    • @indievc: “Not heroin…Not cocaine….But Venture Capital is the drug flowing through the veins of most Silicon Valley startups”
    • @skamille: Editing is a different profession than writing, but code review and programming are both performed by the same people
    • XNormal: Mainframes had the reputation of being very expensive. But this is misleading. In terms of cost per processing task they were much more efficient than mini and microcomputers.
    • @swardley: "Culture eats strategy for breakfast" is code for "I don't know what the heck I'm talking about but this meme sounds smart"
    • James Glanz: Yet another data center, west of Houston, was so well prepared for the storm — with backup generators, bunks and showers — that employees’ displaced family members took up residence and United States marshals used it as a headquarters until the weather passed.
    • @pacoid: Neuroevolution talk @kenneth0stanley #theaiconf -- "exact gradient is not always the best move"; evolution uses fitness fn, not objective fn
    • @GossiTheDog: Holy crap. CCleaner trojan 1st stage payload is on 700k PCs, with these orgs targeted for 2nd stage (successfully) 
    • Scott Aaronson: In the meantime, the survival of the human race might hinge on people’s ability to understand much smaller numbers than 10^122: for example, a billion, a trillion, and other numbers that characterize the exponential growth of our civilization and the limits that we’re now running up against.
    • Timothy Morgan: Compute and networking could hit the Moore’s Law wall at about the same time, and that is precisely what we expect.
    • ralmidani: As I've said before, universal web components are a pipe dream. Developers disagree on even the most trivial things, like the best way to parse a query string. What makes anyone think those disagreements will magically disappear once web components become a standard?
    • Mallory Locklear: [Virus] Jumps between species have driven most major evolutionary innovations in the viruses. Meanwhile, co-divergence has been less common than was assumed and has mostly caused incremental changes.
    • @SwiftOnSecurity: Linux is like if the creator of git wrote an operating system.
    • A troll ate the rest of the quotes. Luckily, if luck it be, a copy was made and you can read all of them by clicking through to the full post.


Monday
Sep182017

Evolution of data structures in Yandex.Metrica

Yandex.Metrica is the world's second largest web analytics system. Metrica takes in a stream of data representing events that took place on sites or on apps. Our task is to process this data and present it in an analyzable form.


Processing the data in itself is not a problem. The real difficulty lies in trying to determine what form the processed results should be saved in so that they are easy to work with. During the development process, we had to completely change our approach to data storage organization several times. We started with MyISAM tables, then used LSM-trees and eventually came up with a column-oriented database, ClickHouse. In this article I'll explain what led us to settle on this last option.

Yandex.Metrica was launched in 2008 and has now been running for more than nine years. Every time we changed our approach to data storage in the past it was because a particular solution proved inefficient: either there was insufficient performance reserve, or the solution was unreliable, or it used too many computational resources, or it just did not allow us to implement what we needed to.

The old Yandex.Metrica for websites has more than 40 "fixed" report types (for example, the visitor geography report), several in-page analytics tools (like click maps), Webvisor (which lets you study individual user actions in great detail), as well as the separate report constructor.

With the new Metrica and Appmetrica, you can customize every report instead of dealing with "fixed" types. You can add new dimensions (for example, in a search term report you can break data down further by landing page), segment and compare (between, let's say, traffic sources for all visitors vs. visitors from San Francisco), change your set of metrics, etc. The new system, therefore, demands a completely different approach to data storage than what we used earlier.

MyISAM

Click to read more ...

Friday
Sep152017

Stuff The Internet Says On Scalability For September 15th, 2017

Hey, it's HighScalability time: 

 

Earth received Cassini’s final signal at 7:55am ET. Let's bid a fond farewell. After a 13-year tour of duty, job well done!

 

If you like this sort of Stuff then please support me on Patreon.

 

  • 12.9 million: DynamoDB requests per second on Prime Day; 4 billion: transistors on Apple's A11 Bionic chip; 4x: extreme weather events since 1970; 51: qubit device; 50%: Messenger.com converted to Reason; 56.6 million: US cord cutters; 5000: bikes abandoned at Burning Man; 500 million: yearly visitors to Apple stores; 30 min: time to send one HD color image from Mars to Earth; 

  • Quotable Quotes:
    • @randyshoup: Interesting idea of a *Negative* MTTR by @adrianco: notice something is going to fail and proactively fix it before it breaks!
    • @rob_pike: "The Equifax executives who let my data be stolen will probably suffer fewer consequences than I will for an overdue library book." @nytimes
    • @avantgame: on weaponized social media: "We’re in an information war with Russia. It’s time we started acting like it."
    • Jamie Dimon: It's [Bitcoin] worse than tulip bulbs. It won't end well. Someone is going to get killed
    • @manisha72617183: First they tell you that Scrum is not a magic bullet. Then they spend the rest of the time saying how it’s the best thing since sliced bread 🙄
    • yogthos: My team has been using Clojure for 7 years now, and we're very happy with it. It's still a pleasure to work with, and the stability of the language has been really welcome.
    • @GossiTheDog: Another way of looking at Equifax is they did an incredible job of keeping infrastructure that size with that much legacy secure for so long
    • API Evangelist: when it comes to the sheer volume, and regular drumbeat of serverless stories Microsoft is keeping pace. After watching several months of sustained storytelling, it looks like they could even pass up Amazon in the near future.
    • amelius: Well, I hear a lot of people complaining that the results on DuckDuckGo are still worse than on Google, even though both search-engines produce results within a second. And these are people that really want to quit using Google for privacy reasons. I never hear people complaining that a search is slow. So I do think that search-quality is where the competition is happening.
    • @SwiftOnSecurity: How you think multinational hypercorps get hacked: NSA 0days on the black market How multinational hypercorps get hacked: admin/admin
    • m-masa: Snapchat to me is sharing your shaky drunken escapades at 3AM with your friends to let them know you made it home and survived the night. Instagram seems more like an endless observation of copy-and-paste, superficial things and people and places. It's evolved more into a (usually inaccurate) portrayal of status than anything else.
    • Dmitri Zimine: When Serverless replaces micro-services, it is not going to be free lunch either. We are paying by introducing more complexity, now for the benefit of massive cost savings.
    • Kris De Decker: In London, a solar panel produces 65 times less energy on a heavy overcast day in December at 10 am than on a sunny day in June at noon
    • nostrademons: The real interesting work in search is in ranking functions, and this is where nobody comes close to Google. Some of this, as other commenters note, is because Google has more data than anyone else. Some of it is just because there've been more man-hours poured into it. IMHO, it's pretty doubtful that an open-source project could attract that sort of focused knowledge-work (trust me; it's pretty laborious) when Google will pay half a mil per year for skilled information-retrieval Ph.Ds.
    • rkangel: Up to now this is all classic Eve - betrayal by people you trust. The postscript is less nice though: gigx in a moment of anger asked in in-game chat for real life contact details for TheJudge so that he could 'cut off his hands'. This is obviously not OK and CCP banned gigx permanently. This has the side-effect of putting the final nail in the CO2 coffin.
    • Rick Altherr: At one point I did that calculation and I was seeing one hard drive die every five minutes. 
    • EliE: the fundamental reason why ransomware is so successful, and here to stay, is that people simply don’t backup their data.
    • EliE: no matter how many times the bitcoins are moved, ultimately they must be cashed out at exchange points. So we just need to keep tracing movements until we reach a cash-out wallet.
    • @radjanirad: Just a few hours ago, Cassini received the command to turn off the RADAR instrument - for the last time. :( #cassini
    • @postwait: Most monitoring "innovations" have been mostly aesthetic, but their marketing is deafening and drowns out real innovation. #UphillBattle
    • pab: I have two years experience pair programming, and to quote asthasr, I found it an absolute slog.
    • @matthew_d_green: I have an idea. Let's combine all the hard parts of cryptography with all the asshole parts of the finance industry.
    • Pete Saia: It’s important to understand that it isn’t all or nothing. Serverless is in our future, but it isn’t our exclusive future.
    • Errata Security: The 9,000 devices were split almost evenly between Apple and Android. Almost all of the Apple devices randomized their addresses. About a third of the Android devices randomized. (This assumes Android only randomizes the final 3 bytes of the address, and that Apple randomizes all 6 bytes -- my assumption may be wrong).
    • David Rosenthal: Today's eclipse records would be on the Web, not paper or bone. Will astronomers 3200 or even only 580 years from now be able to use them?
    • Peter Zaitsev: To be competitive with non-open-source cloud deployment options, open source databases need to invest in “ease-of-use.” There is no tolerance for complexity in many development teams as we move to “ops-less” deployment models.
    • Jeremy Hsu: the advantage of the flip-flop qubit comes from inducing an electric dipole—separation of positive and negative charges—by pulling the electron a little bit away from the nucleus of the phosphorus atoms (which are themselves embedded in silicon). That electric dipole enables the spin-based silicon qubits to remain entangled together over longer distances and able to influence one another through quantum physics.
    • Cory Doctorow: All these forms of cheating treat the owner of the device as an enemy of the company that made or sold it, to be thwarted, tricked, or forced into conducting their affairs in the best interest of the company’s shareholders. To do this, they run programs and processes that attempt to hide themselves and their nature from their owners, and proxies for their owners (like reviewers and researchers).
    • Jonathan Golden: How do you know, though, when to pull resources away from other growth initiatives to address these edge cases? My rule of thumb was when a problem was occurring at least 50 times a day, it was time to solve it more holistically. At a time when we were growing anywhere from 300%–600% per year — and edge cases were growing at least as fast — that’s when the potential explosion of problems proliferated.
    • A Mind at Play: Well, the good of this command is that if you’re in a loop you can have this command in that loop and every time it goes around the loop it will put a pulse in and you will hear a frequency equal to how long it takes to go around that loop. And then you can put another one in some bigger loop and so on. And so you’ll hear all of this coming on and you’ll hear this “boo boo boo boo boo boo,” and his concept was that you would soon learn to listen to that and know whether when it got hung up in a loop or something else or what it was doing all this time, which he’d never been able to tell before.

    Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

    Click to read more ...

Wednesday
Sep132017

Have you noticed there's a lot more collaboration going on these days? Why?

 

Thanks to zero marginal cost digital production methods, we're seeing content markets—for the first time—develop in conditions free from supply and price constraints.

In the process we've learned something: consumers have an unquenchable thirst for new content; content creators are willing to oblige with an equally prodigious stream of new content; platforms that best control access to the customer are the biggest winners; the reward for content creators varies drastically by medium and platform.

For consumers, life is now a streaming fixed priced buffet of unending variety and diversion.

For producers, the changes have been terrifying. Old modes have crumbled, leaving everyone scrambling to figure out what, if anything, comes next.

To adapt, content creators are learning to exploit capture loops, bundling, and collaboration to extract money from a digital economy that has collectively decided it rarely wants to pay artists directly for their content anymore.

The most highly evolved form of digital content platform strategies can be found in the book market. Why? Because Amazon.

Kindle Unlimited is the Clear Platform Winner

Click to read more ...

Tuesday
Sep122017

Sponsored Post: Close.io, Loupe, Etleap, Aerospike, Stream, Scalyr, VividCortex, Domino Data Lab, MemSQL, InMemory.Net, Zohocorp

Who's Hiring? 

  • Close.io is a ~25 person fully remote team that is profitable and building a product our customers love! We’re hiring Senior Backend Developers to join our team. Our backend tech stack currently includes Python (Flask, Gunicorn, TaskTiger), Elasticsearch, MongoDB, Postgres, and Redis running in Docker/Kubernetes on AWS. Learn more and apply here!

  • Advertise your job here! 

Fun and Informative Events

  • October 10 Live Webinar. Fast & Frictionless - The Decision Engine for Seamless Digital Business. Join us for a live webinar on Tuesday, October 10 at 11:00 am Pacific Time featuring guest speakers Michele Goetz, Principal Analyst at Forrester Research, and Matthias Baumhof, VP Worldwide Engineering at ThreatMetrix®. A positive customer experience is required for successful enterprise digital transformation. Digital businesses depend on speed and efficiency to drive operational decisions. Making faster, accurate, and real-time customer trust decisions removes friction and delivers superior business outcomes. In this session, you’ll learn: How risk-based authentication leveraging digital identities is key to empowering customer transactions; How real-time customer trust decisions can reduce fraud and improve customer satisfaction; How a high performance Hybrid Memory Architecture (HMA) database helps continuously evaluate across a multitude of factors to drive decisioning at the lowest operational cost. Register now.

  • Advertise your event here!

Cool Products and Services

  • .NET developers dealing with Errors in Production: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Managers want to know what’s wrong right away, users don’t want to provide log data, and you spend more time gathering information than you do fixing the problem. To fix all that, Loupe was built specifically as a .NET logging and monitoring solution. Loupe notifies you about any errors and tells you all the information you need to fix them. It tracks performance metrics, identifies which errors cause the greatest impact, and pinpoints the root causes. Learn more and try it free today.

  • Enterprise-Grade Database Architecture. The speed and enormous scale of today’s real-time, mission critical applications has exposed gaps in legacy database technologies. Read Building Enterprise-Grade Database Architecture for Mission-Critical, Real-Time Applications to learn: Challenges of supporting digital business applications or Systems of Engagement; Shortcomings of conventional databases; The emergence of enterprise-grade NoSQL databases; Use cases in financial services, AdTech, e-Commerce, online gaming & betting, payments & fraud, and telco; How Aerospike’s NoSQL database solution provides predictable performance, high availability and low total cost of ownership (TCO)

  • What engineering and IT leaders need to know about data science. As data science becomes more mature within an organization, you may be pulled into leading, enabling, and collaborating with data science teams. While there are similarities between data science and software engineering, well intentioned engineering leaders may make assumptions about data science that lead to avoidable conflict and unproductive workflows. Read the full guide to data science for Engineering and IT leaders.

  • Etleap is a Redshift ETL tool that lets you bring all the data everyone wants into Redshift. It's easy enough for analysts to add and manage data connections on their own, without inundating IT/Engineering with requests for help. It takes just minutes to add new connections such as MySQL, Salesforce, S3, and many others, then you can "set it and forget it." Learn more about Redshift ETL with Etleap.

  • InMemory.Net provides a Dot Net native in-memory database for analysing large amounts of data. It runs natively on .Net, and provides native .Net, COM & ODBC APIs for integration. It also has an easy-to-use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, including apps with 30 million users. With your help we'd like to add a few zeros to that number. Check out the job opening on AngelList.

  • Scalyr is a lightning-fast log management and operational data platform. It's a tool (actually, multiple tools) that your entire team will love. Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL envisions a world of adaptable databases and flexible data workloads - your data anywhere in real time. Today, global enterprises use MemSQL as a real-time data warehouse to cost-effectively ingest data and produce industry-leading time to insight. MemSQL works in any cloud, on-premises, or as a managed service. Start a free 30 day trial here: memsql.com/download/.

  • Advertise your product or service here!

If you are interested in a sponsored post for an event, job, or product, please contact us for more information.

Click to read more ...