Prismatic Architecture - Using Machine Learning on Social Networks to Figure Out What You Should Read on the Web 
Monday, July 30, 2012 at 8:15AM
HighScalability Team in Example, Machine Learning

This post on Prismatic’s Architecture is adapted from an email conversation with Prismatic programmer Jason Wolfe.

What should you read on the web today? Any thoroughly modern person must solve this dilemma every day, usually using some occult process to divine what’s important in their many feeds: Twitter, RSS, Facebook, Pinterest, G+, email, Techmeme, and an uncountable number of other information sources.

Jason Wolfe from Prismatic has generously agreed to describe their thoroughly modern solution for answering the “what to read” question using lots of sexy words like Machine Learning, Social Graphs, BigData, functional programming, and in-memory real-time feed processing. The result is possibly even more occult, but this or something very much like it will be how we meet the challenge of finding interesting topics and stories hidden inside infinitely deep pools of information.

A couple of things stand out about Prismatic. They want you to know that Prismatic is being built by a small team of four computer scientists, three of them very strong young PhDs from Stanford and Berkeley. Haphazard methods won’t do. They are bringing brain power to solving the information overload problem. But these PhDs are also programmers, working on everything from websites and iOS programming to the sexy BigData/ML backend programming.

What excites me about Prismatic as an architecture is that it tackles a problem that will need to be solved over and over again in the future: applying Machine Learning to great streams of socially mediated information in real-time. Secrecy prevents them from saying very much about their Machine Learning tech, but we do get a peek behind the curtain.

As you might expect, they are doing things a little differently. They’ve chosen Clojure, a modern Lisp that compiles to Java bytecode, as their programming language of choice. The idea is to use functional programming to build fine-grained, flexible abstractions that are composed to express problem-specific logic.

One example of functional power is their graph library, which they use all over the place. For example, a graph of computations can be described to create the equivalent of a low-latency, pipelined set of map-reduce jobs for each user. Another example is the use of subgraphs to compactly describe service configuration in a modular way.
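
To make the idea of a "graph of computations" concrete, here is a minimal sketch in Python rather than Clojure. It is not Prismatic's graph library, whose API is not shown in this article. The names `run_graph` and `feed_graph`, and the toy word-count pipeline, are hypothetical and chosen only for illustration. The assumption is that a graph is a plain map from node names to functions, and that each function's parameter names declare which other nodes or inputs it depends on.

```python
import inspect
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_graph(graph, inputs):
    """Evaluate a graph given as {node_name: function}.

    Each function's parameter names identify the nodes (or external
    inputs) it depends on. Hypothetical helper, for illustration only.
    """
    # Read each node's dependencies from its function signature.
    deps = {name: list(inspect.signature(fn).parameters)
            for name, fn in graph.items()}
    results = dict(inputs)
    # static_order() yields every name with dependencies before dependents.
    for name in TopologicalSorter(deps).static_order():
        if name in results:
            continue  # already supplied as an input
        if name not in graph:
            raise KeyError(f"missing input: {name}")
        args = {dep: results[dep] for dep in deps[name]}
        results[name] = graph[name](**args)
    return results


# A toy per-user pipeline: a map step, a reduce step, and a final pick.
feed_graph = {
    "words": lambda docs: [w.lower() for d in docs for w in d.split()],
    "counts": lambda words: {w: words.count(w) for w in set(words)},
    "top_word": lambda counts: max(sorted(counts), key=counts.get),
}

out = run_graph(feed_graph, {"docs": ["Clojure graphs", "graphs of functions"]})
print(out["top_word"])  # graphs
```

Because the graph is ordinary data, two subgraphs can be combined by merging their maps, which is one plausible reading of how subgraphs could describe configuration in a modular way.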

Given this focus on functional elaboration, they avoid large frameworks like Hadoop, going for a smaller, more reliable, easier to debug, easier to extend, and easier to understand codebase.

A criticism of Prismatic’s approach is the long training periods needed to get results. First, they say it doesn’t take that long at all to start getting good content. Second, I would add, start thinking about these types of systems using the Long Sight. ML-based recommenders will start being trained from childhood and will stay with you your entire life. A scalable digital analog of your mind will act as both information gatekeeper and wingman.

In a TechCrunch article, Prismatic founder Bradford Cross pithily describes Prismatic as being “built around a complex system that provides large scale, real-time, dynamic personalized re-ranking of information, as well classifying and grouping topics into an ontology.” Now let’s see what that system looks like...

Stats

Platform

Data Storage and IO

Services

Data Ingest - Backend

Onboarding - Backend

API - Client-facing

Other services - Client facing

There are a few other separate client-facing services:

Batch and other services

Graph Library

Machine Learning on Documents and Users

Documents and users are two areas where Prismatic applies ML (machine learning):

ML on Documents

ML on Users

Lessons Learned

Related Articles

Article originally appeared on (http://highscalability.com/).
See website for complete article licensing information.