advertise
Friday
Jan202017

Stuff The Internet Says On Scalability For January 20th, 2017

Hey, it's HighScalability time:

 

Absolutely. Do we agree that the cerebellum is amazingly beautiful? (@PeppeGanga)

If you like this sort of Stuff then please support me on Patreon.

  • 900 GB: data stolen in Cellebrite hack; 99.24%: users identified by cross-browser fingerprinting; 72%: intend to migrate to a hybrid cloud; 90%: Google & Facebook ad traffic is useless; 5.2 terabytes per second: data from Australian Square Kilometre Array Pathfinder; 10 billion: searches on DuckDuckGo in 2016; $330m: Amazon's loss on Alexa; 

  • Quotable Quotes:
    • @brucel: Breaking: Programmer accused of writing unreadable code refuses to comment.
    • @asymco: Remember Android first? App Annie believes the Apple’s App Store produced about twice as much revenue as Google Play
    • @bridgetkromhout: Describing your old-timer ranting as "greybeard" just makes me want to fight you with sed & awk at twenty paces. Be there tomorrow at dawn.
    • @StevenShorrock: Root Cause Analysis is: * Acceptable for simple systems * Inappropriate for complicated systems * Ludicrous for complex systems
    • @swardley: Five years ago Amazon was worth about half of Walmart, today Walmart is worth about half of Amazon.
    • Eric Raymond: In practice, I found Rust painful to the point of unusability. The learning curve was far worse than I expected; it took me those four days of struggling with inadequate documentation to write 67 lines of wrapper code for the server.
    • @swardley: past history shows many major players won't announce they're getting into the battle until some time after war has ended
    • @benthompson: Apple wasn't billed as phone maker / Amazon wasn't billed as infrastructure provider / FB wasn't billed as portal / Snapchat wasn't billed as TV
    • Jessitron: the biggest consideration in choosing whether to use libraries or services for distribution of effort / modularization is that choice of who decides when it deploys. Who controls which code is in production at a given time.
    • Hi Ben: The disruption of TV will follow a similar path: a different category will provide better live sports, better story-telling, or better escapism. Said category will steal attention, and when TV no longer commands enough attention of enough people, the entire edifice will collapse. Suddenly.
    • @leonidasfromxiv: I also don't understand why people compare Go with Rust. If you need a GC-less programming language: Rust; if you need a board game: Go.
    • Carlo Rovelli: The world isn’t just a mass of colliding atoms; it is also a web of correlations between sets of atoms, a network of reciprocal physical information between physical systems.
    • Chris Dixon: In the beginning, hardware-focused companies make gadgets with ever increasing laundry lists of features. Then a company with strong software expertise (often a new market entrant) comes along that replaces these feature-packed gadgets with full-fledged computers. 
    • Animats: The real question is "what do we do with a lot of CPUs without shared memory?" Such hardware has been built many times - Thinking Machines, Ncube, the PS2's Cell - and has not been too useful for general purpose computing.
    • @taavet: Very unfortunate that incumbents see tech only as a way to cut costs. Versus seeing tech to offer much better products.
    • NelsonMinar: This is what security looks like when your threat model is well funded government agencies.
    • Don Norman: The solution requires a different approach to the design of automation: collaboration. Instead of automating what can be automated, leaving the rest to the driver, we must develop collaborative systems so that the driver is continually involved in giving high-level guidance, thereby always staying active, always being in the loop. 
    • Thomas Frey: It took 50 years for the world to install the first million industrial robots. The next million will take only eight. Will this cause more jobs or few jobs in the future? I'm not convinced we know the answer.
    • @jtauber: "Every shot in Piper is composed of millions of grains of sand, each one of them around 5000 polygons."
    • rackforms: my point is the current situation, basically 2 companies controlling so much traffic, seems, well, bad for small business in this country. I value what they bring to the table and fully understand why they're so popular. But is things keep on this way where does that lead the guys like me? Is this just the way it has to be? Is the dream of the open Internet already dead?
    • @sheeshee: I think I know why it's called "DevOps" - "DevOops" was too obvious... ;)
    • greenspot: The open solution to a faster mobile web would have been so easy: Just penalize large and slow web pages without defining a dedicated mobile specification. That's it. This wasn't done in the past, slow pages outperformed fast ones on the SERPs because of some weird Google voodoo ranking, heck sometimes even desktop sites outperformed responsive ones on smartphones. If they had just tweaked these odd ranking rules in way that speed and size got more impact on the overall ranking there wouldn't have been any reason for AMP—the market would have regulated itself.
    • Juergen Schmidhuber: General purpose quantum computation won’t work (my prediction of 15 years ago is still standing). Related: The universe is deterministic, and the most efficient program that computes its entire history is short and fast, which means there is little room for true randomness, which is very expensive to compute. What looks random must be pseudorandom, like the decimal expansion of Pi, which is computable by a short program. Many physicists disagree, but Einstein was right: no dice. There is no physical evidence to the contrary

  • RethinkDB is shutting down and here's the post-portem. Lessons: the database market is like Mad Max fighting in the Thunderdome; it's better to optimize for useless microbenchmarks than it is to be good; optimism isn't a strategy.

  • Apple isn't alone in using custom hardware to thwart nation state level attackers. Google Infrastructure Security Design Overview. Good overview at Google reveals its servers all contain custom security silicon. Google designs "custom chips, including a hardware security chip that is currently being deployed on both servers and peripherals. These chips allow us to securely identify and authenticate legitimate Google devices at the hardware level."  Google encrypts data before it is written to disk, to make it harder for malicious disk firmware to access data. Google uses automated and manual code review techniques. Google uses automated software and code reviews to detect bugs in software its developers write. Google scans user-installed apps, downloads, browser extensions, and content browsed from the web for suitability on corp clients. Google uses a custom version of the KVMhypervisor. Good discussion on HackerNews, where a lot of the comments are on how Google needs this level sophistication to evade the prying eyes of governments.

  • What happens when you embed machine learning into a DBMS in order to continuously optimise its runtime performance? You get Self-driving database management systems. Humans suck at tuning databases so this is just one more job AIs will eventually toss into the dust bin of history. TensorFlow was integrated inside Peleton training two RNNs on 52 million queries from one month of traffic for a popular site. Does it help?: early results are promising: (1) RNNs accurately predict the expected arrival rate of queries. (2) hardware-accelerated training has a minor impact on the DBMS’s CPU and memory resources, and (3) the system deploys actions without slowing down the application. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Tuesday
Jan172017

Sponsored Post: Contentful, Stream, Loupe, New York Times, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • Contentful is looking for a JavaScript BackEnd Engineer to join our team in their mission of getting new users - professional developers - started on our platform within the shortest time possible. We are a fun and diverse family of over 100 people from 35 nations with offices in Berlin and San Francisco, backed by top VCs (Benchmark, Trinity, Balderton, Point Nine), growing at an amazing pace. We are working on a content management developer platform that enables web and mobile developers to manage, integrate, and deliver digital content to any kind of device or service that can connect to an API. See job description.

  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

Fun and Informative Events

  • Your event here!

Cool Products and Services

  • Build, scale and personalize your news feeds and activity streams with getstream.io. Try the API now in this 5 minute interactive tutorial. Stream is free up to 3 million feed updates so it's easy to get started. Client libraries are available for Node, Ruby, Python, PHP, Go, Java and .NET. Stream is currently also hiring Devops and Python/Go developers in Amsterdam. More than 400 companies rely on Stream for their production feed infrastructure, this includes apps with 30 million users. With your help we'd like to ad a few zeros to that number. Check out the job opening on AngelList.

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...

Tuesday
Jan172017

Wouldn't it be nice if everyone knew a little queuing theory?

After many days of rain one lane of this two lane road collapsed into the canyon. It's been out for a month and it will be many more months before it will be fixed. Thanks to Google maps way too many drivers take this once sleepy local road. 

How do you think drivers go through this chokepoint? 

 

 

One hundred experience points to you if you answered one at a time.

One at a time! Through a half-duplex pipe following a first in first out discipline takes forever!

Yes, there is a stop sign. And people default to this mode because it appeals to our innate sense of fairness. What could be fairer than alternating one at a time?

The problem is it's stupid.

While waiting, stewing, growing angrier, I often think if people just knew a little queueing theory we could all be on our way a lot faster.

We can't make the pipe full duplex, so that's out. Let's assume there's no priority involved, vehicles are roughly the same size and take roughly the same time to transit the network. Then what do you do?

Why can't people figure out its faster to drive through in batches? If we went in groups of say, three, the throughput would be much higher. And when one side's queue depth grows larger because people are driving to or from work that side's batch size should increase. 

Since this condition will last a long time we have a possibility to learn because the same people take this road all the time. So what happens if you try to change the culture by showing people what a batch is by driving right behind someone as they take their turn?

You got it. Honking. There's a simple heuristic, a deeply held ethic against line cutting, so people honk, flip you off, and generally make heir displeasure known.

It's your classic battle of reason versus norms. The smart thing is the thing we can't do by our very natures. So we all just keep doing the dumb thing.

 

Friday
Jan132017

Stuff The Internet Says On Scalability For January 13th, 2017

Hey, it's HighScalability time:

 

So you think you're early to market! The Man Who Invented VR Goggles 50 Years Too Soon

If you like this sort of Stuff then please support me on Patreon.

  • 99.9: Percent PCs cheaper than in 1980; 300x20 miles: California megaflood; 7.5 million: articles published on Medium; 1 million: Amazon paid eBook downloads per day; 121: pages on P vs. NP; 79%: Americans use Facebook; 1,600: SpaceX satellites to fund a city on Mars; 

  • Quotable Quotes:
    • @GossiTheDog: How corporate security works: A) buy a firewall B) add a rule allowing all traffic C) the end How corporate security works:A) buy a firewall B) add a rule allowing all traffic C) the end
    • @caitie: Distributed Systems PSA: your regular reminder that the operational cost of a system should be included & considered when designing a system
    • @jimpjorps: 1998: the internet means you can "telecommute" to a tech job from anywhere on Earth 2017: everyone works in the same one square mile of SF
    • Jessi Hempel: [re: BitTorrent] Perhaps the lesson here is that sometimes technologies are not products. And they’re not companies. They’re just damn good technologies.
    • giltene: My new pet peeve: "how to make X faster: do less of X" recommendations.
    • peterwwillis: It used to be you had to actually break into a system to exfiltrate all its data. Now you just make an HTTP query.
    • Laralyn McWillams: Identify problems but focus on solutions. If you become more about problems than solutions, that negativity infects your work, your team, and how you think about your career.
    • Chris Fox: Apple is 100% a boutique retailer, meaning that a human chooses which books to promote. Without that, there was no organic discovery tool where readers could find your book.
    • vytah: In fact, the 1986 [Chernobyl] disaster happened because the engineers decided to get rid of safeguards and run tests.
    • Eric Elliott: Breaking into a user’s top 5 apps is like getting struck by lightning or winning the lottery. Don’t bank on it.
    • Peter: I say the super-intelligent aliens will be powered by hyper-computation, a technology that makes our concept of computation look like counting on your fingers; and they’ll have not only qualia, but hyper-qualia, experiential phenomenologica whose awesomeness we cannot even speak of.
    • SEJeff: LVS is pretty much the undisputed king for serious business load balancing. I've heard (anecdotally) that Uber uses gorb[1] and google has released seesaw, which are both fancy wrappers ontop of LVS for load balancing.
    • k__: I have the feeling this is haunting my life. Jobs, relationships, everything. When I got something, it didn't feel that hard to get it. When I try to get something it feels impossible.
    • Nelson Elhage: One of my favorite concepts when thinking about instrumenting a system to understand its overall performance and capacity is what I call “time utilization”. By this I mean: If you look at the behavior of a thread over some window of time, what fraction of its time is spent in each “kind” of work that it does?
    • Bart Sano (Google): I can say that we are committed to the choice of these different architectures, including X86 – and that includes AMD – as well as Power and ARM. The principle that we are investing in heavily is that competition breeds innovation, 
    • aaron-lebo: This is a larger issue with developer burnout I suspect. You master one thing and there's someone standing on the corner saying..."well, actually, I've got something better" and there's a very real anxiety in that evaluation process. Does object-oriented programming suck? Are functional languages the future? Do you really want an SPA? Should you replace your C codebase with Rust... or Go? Is Bitcoin worth getting in on? etc etc
    • StorageMojo: [re: Violin’s bankruptcy] The race is not always to the swift, nor riches to the wise. By starting with software, other companies built an early lead, and now have the money and time to optimize hardware for flash.
    • nocarrier: [Why no datacenters in India?] Cost was a smaller factor than politics; the Indian government wanted the private keys for our certs in order to let FB put a POP there. That was an absolute dealbreaker, so we served India from Singapore and other POPs in nearby countries.
    • RDX: So that original post, although long and full of real examples, was not about Javascript fatigue really. Its change fatigue. Let’s be clear, if you’re picking something new, you’re making a conscious choice to grow up with it.
    • @jamesurquhart: Amazing that emergent tech that’ll revolutionize software dev is already almost a commodity utility service. #streaming #serverless #events

  • The Ethics of Autonomous Cars. The obvious revenue model is highest bidder lives. During the first few milliseconds of a crash response a real-time bidding session is created and the lowest bidder assumes the risk. That at least captures the zeitgeist of the times.

  • First Go. Now poker. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker. Thank the force humans are still unbeatable at Sabacc. 

  • Medium may be the first YA (Young Adult, think Hunger Games) style publishing outlet. YA is often written in first-person present. It's a good way to fake authenticity. Traditional publications use third-person past tense, but that's not what works best on Medium. What I learned from analyzing the top 252 Medium stories of 2016: The words “you” and “I” were by far the most common, which suggests that addressing the reader directly as an individual person is a better writing strategy than writing in third person.

  • Ben Kehoe says AWS Step Functions is not the cheap, high-scale state machines using an event-driven paradigm he has been looking for. FaaS is stateless, and AWS Step Functions provides state as-a-Service: at $0.025 per 1,000 executions, it’s 125 times more expensive per invocation than Lambda; it’s not going to be cost-effective to replace existing roll-your-own Lambda solutions; the default throttling limit for a state machine is two executions per second...it’s not built to handle massively scaled but transient event scheduling.

  • Ransomware has shifted to being a reproducible strategy. @SteveD3Since I fist covered the MongoDB hacking on Jan 3, the number of compromised DBs has surpassed 32,000. Now possibly Elasticsearch. Anything you can find basically with Shodan. Which is why we now have @GossiTheDog: Found out today firms have started doing legal contracts which specifically rule out liability if they get hit by ransomware, naming it.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Friday
Jan062017

Stuff The Internet Says On Scalability For January 6th, 2017

Hey, it's HighScalability time:

 

Hot rods in space. The Smith Cloud plummets towards our galaxy at nearly 700,000 mph. Vroom!

If you like this sort of Stuff then please support me on Patreon.

  • 3 of top 5: Stackoverflow questions are about Git; 3,000: four-passenger cars could serve 98 percent of NYC taxi demand; 44%: US population lives within 20 miles of Amazon fulfillment center; 72%: Amazon customers shopped using mobile device; 110%: increase in industrial control system attacks; 455: Number of scripted television series aired this year; $28.5 billion/yr: App downloads on iOS;

  • Quotable Quotes:
    • @ValaAfshar: Number of robots working in Amazon warehouses: 2016: 45,000 / 2015: 30,000 2014: 15,000 / 2013: 1,000 — @JonErlichman
    • @jason_kint: updated duopoly #s. new IAB data came out yesterday. easy to run vs earnings for goog and fb, it's evident everyone else is zero sum game. 
    • rb2k_: I also haven't seen one [company in Germany] that isn't riddled with MBA grads that mainly push Jira tickets around.
    • Joe McCann: The best software developers I know are always hacking over the holidays. True story.
    • @kaffeecoder: Sigh. Async vs blocking protocol is irrelevant. What matters is communicating with other services outside your own req/response cycle.
    • Eric Jang: It's not a coincidence that Nvidia, the literal arms-dealer of deep learning, has had a good year in the stock market.
    • @markimbriaco: Just read a comment that said "Any good codebase has every part perfectly isolated". Oh, to be young and optimistic about software again.
    • @swardley: Asked "What do I think is the biggest impact AI would have?" ... hmmm, the largest erosion of social mobility in human history?
    • The Attention Merchants: It is therefore more effective for the State to intervene before options are seen to exist. This creates less friction with the State but requires a larger effort: total attention control.
    • StorageMojo: The cloud’s collateral damage to the legacy IT vendors continues to spread. A few billion here and a few billion there, and pretty soon you’re talking real money.
    • Janakiram MSV: The key takeaway is that Amazon wants enterprises to consume EC2 while it is pushing startups and developers towards Lambda. This move from Amazon will fuel the growth of serverless computing in the industry. 
    • Maxime Chevalier-Boisvert: Edsger Dijkstra famously said, “The question of whether machines can think is about as relevant as the question of whether submarines can swim.”
    • @karlseguin: Microservices without asynchronous messaging (queues) is actually a monolith with really slow and error prone method invocation.
    • AshleysBrain: We've been using WebRTC Datachannels for multiplayer gaming in the browser in our game editor Construct 2 (www.scirra.com) for a couple of years now. Generally they work great! However the main problem we have is switching tab suspends the game, which if you're acting as the host, freezes the game for everybody. This is really inconvenient. 
    • @lstoll: 2017: Year of the return of three tier architecture.
    • @tealtan: “I will never make a racial profiling database!” *continues working on social networks, analytics, ad tech*
    • @abt_programming: Inverse bus factor: "how many developers have to be hit by a bus before a project starts to proceed smoothly?” - @gasproni
    • M.G. Siegler: The numbers speak for themselves. 2 billion words written on Medium in the last year. 7.5 million posts during that time. 60 million monthly readers now. Pageviews galore. So step 2 is simply to slap some banner ads on the site, while step 3 is to profit, right?
    • snarf21: Writing software is hard but to me the hardest part is always taking a random abstract concept from someone's mind (or worse, several people) and converting that into something "real" in a fixed timeline and budget. There will have to be lots of tradeoffs and miscues by definition. We are always making something that doesn't already exist, it is creation and creation is hard.
    • @Pinboard: Who could have foreseen the always-on home microphone might be of interest to the cops?
    • @ThePracticalDev: I heard a rumor that Santa moved over to AWS this year. Big if true.
    • Drew Purves~ “intelligence” extends beyond brains; something as simple as self-replicating RNA exhibit intelligent behavior at the evolutionary scale. The natural world is fractal, cyclic, and fuzzy...in a biosphere, every organism is a resource to another organism. That is, learning and adaptation of each organism is not independent of other organisms
    • @meatcomputer: System clocks are always accurate and increase monotonically. Timestamps from remote machines are reliable
    • pjmlp: Everything on web development feels like an hack.
    • @kelseyhightower: In my opinion Serverless does not mean FaaS. I consider any platform that hides the management of servers from the user to be Serverless.
    • Amit: one of the lessons I learned from this journey was that the tutorials work best when I've needed that technology for a real project
    • @Carnage4Life: Snapchat' copied all the worst parts of Apple's culture & seen success. More copycats to come
    • ch: So all that's missing with the decentralized web is a centralized service to aggregate the decentralized streams?
    • @mathiasverraes: "Separation of intent and implementation" is probably a much more useful programming principle than all of SOLID combined.
    • doh: We moved back and forth between AWS and GCE (based on who gave us free credits). Once we ran out, we chose GCE and never regretted it. GCE has many quirks, for instance the inconsistency between API and the UI, it misses the richness of the services offered by AWS but everything GCE does offer is just faster, more stable and much more consistent.
    • Exponential Laws: We have argued that exponential growth would not have succeeded without sustained exponential growth at three levels of the computing ecosystem—chip, system, and adopting community. Growth (progress) feeds on itself up to the inflection point.

  • Measuring a gnat's eyebrow at a billion miles. Ivan Linscott tells the thrilling story behind the development of the New Horizons probe to Pluto. a16z Podcast: New Year, New Horizons — Pluto!  Completing the probe was a close thing. Finding enough plutonium to power system almost didn't happen. Enough wasn't found so the probe had a much lower power budget than originally spec'ed, which caused the communication system to use one FPGA instead of two. You have to use radiation hardened parts. The chips sit right next to a pile of plutonium pumping out gamma rays and neutrons. The FPGA's had a capacity of a million gates, were hardened by design, and had triple redundancy. Each gate in the array is implemented in threes. They are voted in pairs. If three agree then fine. If two agree that's the value used. They fit all the code with 5 gates margin. They also had a hero's journey sourcing a high precision oscillator. And then the frightening story of when the watchdog timer timedout and put the probe in safe mode. It turned out the JPEG compression algorithm took too long to compress an image of Pluto and that caused the timeout to fire. The reason is one of those crazy testing stories. When this feature was tested the picture of the sky was darker so it took less time to compress!

  • The impulse for folks at Twitter to delay Trump's tweets and insider trade on that information must be overwhelming.

  • 33C3 (Chaos Computer Congress) videos are now available. Great overview by Chris Hager. Lots of interesting talks. You might like: Dissecting modern (3G/4G) cellular modems; Edible Soft Robotics - An exploration of candy as an engineered material; Software Defined Emissions - A hacker’s review of Dieselgate; Rebel Cities - Towards A Global Network Of Neighbourhoods And Cities Rejecting Surveillance.

  • A compelling break down of the DNC phishing attack. Making everything viewable through a generic UI and everything programmable through a scriptable API has interesting consequences @pwnallthethings: Could have hacked? Sure. Did hack? No. Let me go through why not..The hackers weren't hacking one-by-one; so URL contraction wasn't done manually. It was done via the Bitly API...Why did the hackers include this info? Same reason they contracted links via API. Because they're not hacking 1-by-1. Are hacking at scale...When hackers hack at scale, they reuse infrastructure. They make mistakes. This isn't unusual. You can piece the bits together.

  • In the game of data you want to be at the top of the data gravity well. When your are down well nothing escapes without great cost. AWS Snowball

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Jan042017

How is Writing Lord of the Rings Like Writing Software?

 

Have you ever read a book and wondered how any human could have written something so brilliant? For me it was Lord of the Rings. I despaired that in a hundred lifetimes I could never write a book so rich, so deep, so beautiful. Since then I've learned a few things about about how LoTr was created that has made me reconsider. The kick-in-the-head is that it's the same lesson I learned long ago about writing software.

I've always been amazed how a program can start as a single source file and after years of continued effort turn into a working system that is so large no human can come close to understanding it. If you had tried from the start to build the system you ended up with you would have never ever got there. That's just not how it works. Software is path dependent.

I've experienced this growth from a single cell to a Cambrian explosion many times so I know it's a thing. What I hadn't considered is how it's also a thing for writing books too. 

Creating good software is a process of evolution through the mechanism of constant iteration for the purpose of survival. This is also how good stories are made. What both have in common is creation through thought.

Thought needs an object to contemplate. Each intermediate state of a project is that object. By linking together a series of state inspired creative jumps something wonderful can be created that may contain only the faintest trace of its beginnings.

Here's how Lord of the Rings is a good example of this process...

LoTr started out as a sequel to the Hobbit. Tolkien's publisher wanted to cash in on the success of the Hobbit with a sequel. And The Silmarillion wasn't it. So Tolkien began with the intention of writing a sequel to the Hobbit. It was horrible. 

The first book title was The Return of the Shadow, not Lord of the Rings. The prose was still written for children. Frodo was called Bingo. Strider was a hobbit called Trotter. Bilbo planned to get married. And the ring was still just a ring. The story had no clear motive or direction. "What more can hobbits do?" asked Tolkein. The ideas of the Hobbit were played out. The LoTr we know and love was far far away. 

In draft after draft Tolkien probed and searched for a direction to take the story. It all turned when Tolkien wrote the scene with the Black Rider. At first the Black Rider was really a White Rider. It was Gandalf coming to talk to Bingo. But then some insight happened. A dizzying array of neurons conspired and the color of the horse changed from white to black and Gandalf transformed into a man wrapped in a great black cloak and hood. A new framework was creating itself.

How do we know? Fortunately, from Christopher Tolkien, we have the history of changes his father made to LoTr. Dr. Corey Olsen in a great series—The Return of the Shadow, Session 1 - In Search of a Sequel—walks us through what is essentially the git log for LoTr. Imagine a kind of Papers We Love treatment from a true Tolkien expert and gifted analyst. It's magical.

We see idea after idea worked through in the text. It was a continuous process of refactoring and new development. Some ideas were kept from beginning to end. Many were cut. Many morphed. Much dialogue was kept, but was given to different characters to say in different circumstances. 

The whole feeling was very much like seeing software being developed, only the result wasn't a working app, but one of the most influential stories of all time.

The lesson for me was a deep and dramatic reconfirmation of an old idea: All successful large systems started as successful small systems.

This applies to us as writers and programmers. It's easy to get down on yourself during the creation process. Neither your story or program has to start out great; greatness is something that evolves.

In this new year, that's the lesson of Lord of the Rings for me.

Tuesday
Jan032017

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

Fun and Informative Events

  • Your event here!

Cool Products and Services

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex is a SaaS database monitoring product that provides the best way for organizations to improve their database performance, efficiency, and uptime. Currently supporting MySQL, PostgreSQL, Redis, MongoDB, and Amazon Aurora database types, it's a secure, cloud-hosted platform that eliminates businesses' most critical visibility gap. VividCortex uses patented algorithms to analyze and surface relevant insights, so users can proactively fix future performance problems before they impact customers.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...

Monday
Jan022017

Efficient storage: how we went down from 50 PB to 32 PB

As the Russian rouble exchange rate slumped two years ago, it drove us to think of cutting hardware and hosting costs for the Mail.Ru email service. To find ways of saving money, let’s first take a look at what emails consist of.

Indexes and bodies account for only 15% of the storage size, whereas 85% is taken up by files. So, files (that is attachments) are worth exploring in more detail in terms of optimization. At that time, we didn’t have file deduplication in place, but we estimated that it could shrink the total storage size by 36% since many users receive the same messages, such as price lists from online stores or newsletters from social networks containing images and so on. In this article, I’m going to describe how we implemented a deduplication system under the supervision of Albert Galimov.

Click to read more ...

Friday
Dec232016

Stuff The Internet Says On Scalability For December 23rd, 2016

Hey, it's HighScalability time:

 

A wondrous ethereal mix of technology and art. Experience of "VOID"

If you like this sort of Stuff then please support me on Patreon.

  • 2+ billion: Google lines of code distributed over 9+ million source files; $3.6 bn: lower Google taxes using Dutch Sandwich; $14.6 billion: aggregate value of all cryptocurrencies; 2x: graphene-fed silkworms produce silk that conducts electricity; < 100: scientists looking for extraterrestrial life; 48: core Qualcomm server SoC; 455: original TV series in 2016;

  • Quotable Quotes:
    • Ben Thompson~ It's so easy to think of tech with an 80s mindset with all the upstarts. We still glorify people in garages. The garage is gone...Our position in the world is not the scrappy upstart. It is the establishment.
    • The Attention Merchants: True brand advertising is therefore an effort not so much to persuade as to convert. At its most successful, it creates a product cult, whose loyalists cannot be influenced by mere information
    • @seldo: Speed of development always wins. Performance problems will (eventually) get engineered away. This is nearly always how technology changes.
    • @evgenymorozov: How Silicon Valley can support basic income: give everyone a bot farm so that we can make advertising $ from fake traffic to their platforms
    • @avdi: Apple has 33 Github repos and 56 contributors. Microsoft now has ~1,200 repos and 2,893 contributors.
    • Peter Norvig: Understanding the brain is a fascinating problem but I think it’s important to keep it separate from the goal of AI which is solving problems ... If you conflate the two it’s like aiming at two mountain peaks at the same time—you usually end up in the valley between them .... We don’t need to duplicate humans ... We want humans and machines to partner and do something that they cannot do on their own.
    • Brave New Greek: Unbounded anything—whether its queues, message sizes, queries, or traffic—is a resilience engineering anti-pattern. Without explicit limits, things fail in unexpected and unpredictable ways. Remember, the limits exist, they’re just hidden. By making them explicit, we restrict the failure domain giving us more predictability, longer mean time between failures, and shorter mean time to recovery at the cost of more upfront work or slightly more complexity.
    • Naren Shankar (Expanse): Everybody feels like they can look at the show and find parts of themselves in it. When you can give people collective ownership of the creative product you get the best from people. At the end of the day it shows. People work their asses off and accomplish the impossible.
    • Richard Jones: a corollary of Moore’s law (sometimes called Rock’s Law). This states that the capital cost of new generations of semiconductor fabs is also growing exponentially
    • Waterloo: His [Napoleon] strategy was simple. It was to divide his enemies, then pin one down while the other was attacked hard and, like a boxing match, the harder he punched the quicker the result. Then, once one enemy was destroyed, he would turn on the next. The best defense for Napoleon in 1815 was attack, and the obvious enemy to attack was the closest.
    • Daniel Lemire: beyond a certain point, reducing the possibility of a fault becomes tremendously complicated and expensive… and it becomes far more economical to minimize the harm due to expected faults
    • @greglinden: “For some products at Baidu, the main purpose is to acquire data from users, not revenue.” — @stuhlmueller
    • strebler:  Deep Learning has made some very fundamental advances, but that doesn't mean it's going to make money just as magically!
    • sulam: Twitter clearly doesn't have growth magic (or they'd be growing faster) -- but is that an engineer's fault? At the end of the day, any user facing engineering is beholden to the product team. Engineers at Twitter can run experiments, but they can't get those experiments shipped unless a PM is behind it.
    • Gil Tene: The right way to read "99%'ile latency of a" is "1 or a 100 of occurrences of 'a' took longer than this. And we have no idea how long". That is the only information captured by that metric. It can be used to roughly deduce "what is the likelihood that 'a' will take longer than that?". But deducing other stuff from it usually simply doesn't work.
    • @esh: Unheralded tiny features like AWS Lambda inside Kinesis Firehose streams replace infrastructure monstrosities with a few lines of code
    • @postwait: Listening to this twitter caching talk... *so* glad my OS doesn't even contemplate OOMs. How is that shit still in Linux? A literal WTF.
    • SomeStupidPoint: Mostly, it was just a choice to save $1-2k on a laptop (every 1-2 years) and spend the money on cellphone data and lattes.
    • @timbray: Oracle trying to monetize Java... Golang/Rust/Elixir all looking better. Assume all JVM langs are potential targets.
    • Kathryn S. McKinley: In programming languages research, the most revolutionary change on the horizon is probabilistic programming, in which developers produce models that estimate the real world and explicitly reason about uncertainty in data and computations. 
    • cindy sridharan: Four Golden Signals 1) Latency 2) Traffic 3) Errors 4) Saturation
    • @FioraAeterna: as a tech company grows in size, the probability of it developing its own in-house bug tracking system approaches 1
    • The Attention Merchants: In 1928, Paley made a bold offer to the nation’s many independent radio stations. The CBS network would provide any of them all of its sustaining content for free—on the sole condition that they agree to carry the sponsored content as well

  • philips: Essentially I see the world broken down into four potential application types: 1) Stateless applications: trivial to scale at a click of a button with no coordination. These can take advantage of Kubernetes deployments directly and work great behind Kubernetes Services or Ingress Services. 2) Stateful applications: postgres, mysql, etc which generally exist as single processes and persist to disks. These systems generally should be pinned to a single machine and use a single Kubernetes persistent disk. These systems can be served by static configuration of pods, persistent disks, etc or utilize StatefulSets. 3) Static distributed applications: zookeeper, cassandra, etc which are hard to reconfigure at runtime but do replicate data around for data safety. These systems have configuration files that are hard to update consistently and are well-served by StatefulSets. 4) Clustered applications: etcd, redis, prometheus, vitess, rethinkdb, etc are built for dynamic reconfiguration and modern infrastructure where things are often changing. They have APIs to reconfigure members in the cluster and just need glue to be operated natively seemlessly on Kubernetes, and thus the Kubernetes Operator concept

  • Top 5 uses for Redis: content caching; user session store; job & queue management; high speed transactions; notifications.

  • Is machine learning being used in the wild? The answer appears to be yes. Ask HN: Where is AI/ML actually adding value at your company? Many uses you might expect and some unexpected: predicting if a part scanned with an acoustic microscope has internal defects; find duplicate entries in a large, unclean data set; product recommendations; course recommendations; topic detection; pattern clustering; understand the 3D spaces scanned by customers; dynamic selection of throttle threshold; EEG interpretation; predict which end users are likely to churn for our customers; automatic data extraction from web pages; model complex interactions in electrical grids in order to make decisions that improve grid efficiency;sentiment classification; detecting fraud; credit risk modeling; Spend prediction; Loss prediction; Fraud and AML detection; Intrusion detection; Email routing; Bandit testing; Optimizing planning/ task scheduling; Customer segmentation; Face- and document detection; Search/analytics; Chat bots; Topic analysis; Churn detection; phenotype adjudication in electronic health records; asset replacement modeling; lead scoring;  semantic segmentation to identify objects in the users environment to build better recommendation systems and to identify planes (floor, wall, ceiling) to give us better localization of the camera pose for height estimates; classify bittorrent filenames into media classify bittorrent filenames into media categories; predict how effective a given CRISPR target site will be; check volume, average ticket $, credit score and things of that nature to determine the quality and lifetime of a new merchant account; anomaly detection; identify available space in kit from images; optimize email marketing campaigns; investigate & correlate events, initially for security logs; moderate comments; building models of human behavior to provide interactive intelligent agents with a conversational interface; automatically grading kids' essays; Predict probability of car accidents based on the sensors of your smartphone; predict how long JIRA tickets are going to take to resolve; voice keyword recognition; produce digital documents in legal proceedings; PCB autorouting.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge (which means this post has many more items to read so please keep on reading)...

Click to read more ...

Wednesday
Dec212016

Sponsored Post: Loupe, New York Times, ScaleArc, Aerospike, Scalyr, VividCortex, MemSQL, InMemory.Net, Zohocorp

Who's Hiring?

  • The New York Times is looking for a Software Engineer for its Delivery/Site Reliability Engineering team. You will also be a part of a team responsible for building the tools that ensure that the various systems at The New York Times continue to operate in a reliable and efficient manner. Some of the tech we use: Go, Ruby, Bash, AWS, GCP, Terraform, Packer, Docker, Kubernetes, Vault, Consul, Jenkins, Drone. Please send resumes to: technicaljobs@nytimes.com

Fun and Informative Events

  • Your event here!

Cool Products and Services

  • A note for .NET developers: You know the pain of troubleshooting errors with limited time, limited information, and limited tools. Log management, exception tracking, and monitoring solutions can help, but many of them treat the .NET platform as an afterthought. You should learn about Loupe...Loupe is a .NET logging and monitoring solution made for the .NET platform from day one. It helps you find and fix problems fast by tracking performance metrics, capturing errors in your .NET software, identifying which errors are causing the greatest impact, and pinpointing root causes. Learn more and try it free today.

  • ScaleArc's database load balancing software empowers you to “upgrade your apps” to consumer grade – the never down, always fast experience you get on Google or Amazon. Plus you need the ability to scale easily and anywhere. Find out how ScaleArc has helped companies like yours save thousands, even millions of dollars and valuable resources by eliminating downtime and avoiding app changes to scale. 

  • Scalyr is a lightning-fast log management and operational data platform.  It's a tool (actually, multiple tools) that your entire team will love.  Get visibility into your production issues without juggling multiple tabs and different services -- all of your logs, server metrics and alerts are in your browser and at your fingertips. .  Loved and used by teams at Codecademy, ReturnPath, Grab, and InsideSales. Learn more today or see why Scalyr is a great alternative to Splunk.

  • InMemory.Net provides a Dot Net native in memory database for analysing large amounts of data. It runs natively on .Net, and provides a native .Net, COM & ODBC apis for integration. It also has an easy to use language for importing data, and supports standard SQL for querying data. http://InMemory.Net

  • VividCortex measures your database servers’ work (queries), not just global counters. If you’re not monitoring query performance at a deep level, you’re missing opportunities to boost availability, turbocharge performance, ship better code faster, and ultimately delight more customers. VividCortex is a next-generation SaaS platform that helps you find and eliminate database performance problems at scale.

  • MemSQL provides a distributed in-memory database for high value data. It's designed to handle extreme data ingest and store the data for real-time, streaming and historical analysis using SQL. MemSQL also cost effectively supports both application and ad-hoc queries concurrently across all data. Start a free 30 day trial here: http://www.memsql.com/

  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.

  • www.site24x7.com : Monitor End User Experience from a global monitoring network. 

If any of these items interest you there's a full description of each sponsor below...

Click to read more ...