Faving spam countermeasures. deviantART relates a gripping story of how they detected and stopped a deviant user who was attacking their servers with an automated script that faved every 10 seconds, 24 hours a day. They applied the same spam filter they use on the rest of the site. Problem solved. More detail on that spam filter would be welcome, though.
Translation Memory. Etsy has a problem. They deploy software continuously, which can be tricky but is doable. What is far trickier is supporting multiple languages, because this implies translations must also be continuous. Usually translation is an arduous process that causes a schedule hit. Etsy's clever solution is to integrate translations into the deployment process using Lucene/Solr's MoreLikeThis feature, which suggests possible translations in real time. Very nice.
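To make the translation memory idea concrete, here is a minimal sketch in Python. It is not Etsy's implementation and does not use Solr: it swaps in the standard library's difflib string similarity where Etsy uses MoreLikeThis. The function name, the threshold and the sample strings are all made up for illustration.

```python
from difflib import SequenceMatcher


def suggest_translations(source, memory, limit=3, min_score=0.6):
    """Return (score, known_source, translation) for the closest known strings.

    `memory` maps previously translated source strings to their translations.
    The similarity measure is difflib's ratio, a stand-in for a real
    search-engine similarity query.
    """
    scored = []
    for known_source, translation in memory.items():
        score = SequenceMatcher(None, source.lower(), known_source.lower()).ratio()
        if score >= min_score:
            scored.append((score, known_source, translation))
    # Best matches first.
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:limit]


# Hypothetical memory of strings that were translated in earlier deploys.
memory = {
    "Add to cart": "Ajouter au panier",
    "Add to favorites": "Ajouter aux favoris",
    "Your order has shipped": "Votre commande a été expédiée",
}

# A new, slightly different string shows up in today's deploy.
suggestions = suggest_translations("Add item to cart", memory)
```

The translator still makes the final call; the memory only surfaces past translations of similar strings so the work does not start from scratch.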
Facebook explains how they've made the HipHop Virtual Machine dynamically translate PHP code into native machine code. Lots of good detail, and it's well written.
Why wireless mesh networks won’t save us from censorship. Shaddi Hasan harshes the buzz on the utopian vision of a darknet freeing us from a SOPA/RIAA/everything tyranny. The reasons: Management is hard and expensive; Omni-directional antennas suck; Single-radio equipment doesn’t work; multi-radio equipment is very expensive; Your RF tricks won’t help you here; Unplanned mesh networks break routing. My take: what can't be routed around must be crushed.
Graphity: An efficient Graph Model for Retrieving the Top-k News Feeds for users in social networks. René Pickhardt shows us how to make retrieval of news feeds in social networks very efficient. Using graph databases (like neo4j), Graphity is able to dynamically retrieve more than 10,000 temporally ordered news feeds per second in social networks with millions of users, like Facebook and Twitter. His index is O(1) in reading and O(d) in writing. Great discussion of a problem a lot of people want to know how to solve.
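As a rough illustration of top-k feed retrieval, the sketch below lazily merges per-author streams that are already sorted newest first and stops after k items. This is a generic k-way merge, not Graphity's index, and it does not have Graphity's O(1) read cost; the data layout and names are assumptions.

```python
import heapq
from itertools import islice


def top_k_feed(followed_streams, k):
    """Return the k newest items across streams that are each sorted newest first.

    Each item is a (timestamp, author, text) tuple. heapq.merge is lazy, so
    only as many items as needed are pulled from each stream.
    """
    merged = heapq.merge(*followed_streams, key=lambda item: item[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical streams for three followed users.
streams = [
    [(105, "alice", "post C"), (90, "alice", "post A")],
    [(101, "bob", "post B")],
    [(110, "carol", "post D"), (80, "carol", "post 0")],
]

feed = top_k_feed(streams, 3)
```

The point of the exercise is that a reader's feed never needs every item from every followed user, only the head of each stream.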
James Hamilton with a nice gloss on Hyder: Transactional Indexed Record Manager for Shared Flash Storage. In the Hyder system, all cores operate on a single shared transaction log. Each core (or thread) processes Optimistic Concurrency Control (OCC) database transactions one at a time.
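For readers new to OCC, here is a minimal sketch of optimistic transactions validated against a single shared log. It shows textbook backward validation only and is not Hyder's actual algorithm; the class and method names are invented for the example.

```python
class SharedLog:
    """A single ordered log of committed write sets plus the current state."""

    def __init__(self):
        self.entries = []  # committed write sets, in commit order
        self.state = {}    # latest committed value per key

    def begin(self):
        # Remember how long the log was when the transaction started.
        return Transaction(self, snapshot_version=len(self.entries))

    def try_commit(self, txn):
        # Validate: abort if anything this transaction read was overwritten
        # by a transaction that committed after it started.
        for writes in self.entries[txn.snapshot_version:]:
            if not txn.read_set.isdisjoint(writes):
                return False
        self.entries.append(dict(txn.write_set))
        self.state.update(txn.write_set)
        return True


class Transaction:
    def __init__(self, log, snapshot_version):
        self.log = log
        self.snapshot_version = snapshot_version
        self.read_set = set()
        self.write_set = {}

    def read(self, key):
        self.read_set.add(key)
        if key in self.write_set:
            return self.write_set[key]
        return self.log.state.get(key)

    def write(self, key, value):
        # Writes are buffered locally until commit.
        self.write_set[key] = value


log = SharedLog()
t1, t2 = log.begin(), log.begin()
t1.read("x")
t1.write("y", 1)
t2.write("x", 42)
t2_ok = log.try_commit(t2)  # commits: nothing it read has changed
t1_ok = log.try_commit(t1)  # aborts: "x" was written after t1 started
```

No locks are taken while a transaction runs; conflicts are only discovered at commit time, which is the optimistic part.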
Is Your Kernel Reading /proc Too Slowly? Mark Seger, author of Collectl, with a detailed analysis of a serious problem for anyone running an HPC cluster, particularly if you worry about system noise and its impact on fine-grained MPI codes: reading /proc/stat has been measured to be slower by over a factor of 50 on a system with 8 sockets and 48 cores. If you love to run top continually in the background you may be in for a shock. And you should be using collectl anyway.
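If you want a rough feel for the cost on your own machines, a sketch like the following times repeated reads of a file such as /proc/stat. It is not Mark Seger's methodology, the function name and defaults are made up, and the numbers include Python overhead, so treat them as relative rather than absolute.

```python
import os
import time


def time_reads(path="/proc/stat", repeats=1000):
    """Return the mean number of seconds taken to open and fully read `path`."""
    start = time.perf_counter()
    for _ in range(repeats):
        with open(path, "rb") as f:
            f.read()
    return (time.perf_counter() - start) / repeats


# /proc/stat only exists on Linux, so check before measuring.
if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"mean read time: {time_reads() * 1e6:.1f} microseconds")
```

Running the same script on a small box and on a large multi-socket box gives a quick comparison of how the read cost grows with core count.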
eBay on Rapid Development Setup in Large Environments. Mahesh Somani with a great description of the release process at eBay, which tries to balance agility and code sharing with a huge code base. While many web properties have gone branchless, eBay uses a more traditional feature branch approach. To handle large projects they: split large projects into several small projects; decouple applications from common code areas; create meta-information (DSL) instead of compiled source code; use source element changes in combination with binary bundles.
For a cloud to really work, everything has to be in software appliances rather than hardware appliances, which requires the ability to effectively scale virtual appliances. Greg Ferro and Ivan Pepelnjak both have good articles on how a new startup called Embrane hopes to do this using IP flows. An IP flow is the stateful conversation of IP packets between a specific source and destination: not just one IP packet but the whole two-way, full-duplex, stateful session of TCP or UDP packets that form the Layer 5 session flow. Embrane scales out by managing IP flows and then directing them to other appliances, in effect creating what I would call two-tier load balancing.
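The flow idea can be sketched in a few lines of Python. This is only an illustration of flow-based dispatch, not Embrane's design: the key layout, the hashing choice and the appliance names are assumptions.

```python
import hashlib


def flow_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Build a direction-independent key so both halves of a session match."""
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (protocol, low, high)


def pick_appliance(key, appliances):
    """Deterministically choose an appliance by hashing the flow key."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return appliances[int.from_bytes(digest[:8], "big") % len(appliances)]


class FlowDispatcher:
    """First tier: pins every packet of a flow to one second-tier appliance."""

    def __init__(self, appliances):
        self.appliances = list(appliances)
        self.table = {}  # flow key -> appliance handling that flow

    def route(self, src_ip, src_port, dst_ip, dst_port, protocol="tcp"):
        key = flow_key(src_ip, src_port, dst_ip, dst_port, protocol)
        if key not in self.table:
            self.table[key] = pick_appliance(key, self.appliances)
        return self.table[key]


# Hypothetical pool of virtual appliances.
dispatcher = FlowDispatcher(["appliance-a", "appliance-b", "appliance-c"])
forward = dispatcher.route("10.0.0.5", 40000, "192.0.2.10", 443)
reverse = dispatcher.route("192.0.2.10", 443, "10.0.0.5", 40000)
```

Because the request and the reply share one key, the stateful session stays on a single appliance while new flows spread across the pool.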
Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.