Thursday, July 22, 2010

How can we spark the movement of research out of the Ivory Tower and into production?

Over the years I've read a lot of research papers looking for better ways of doing things. Sometimes I find ideas I can use, but more often than not I come up empty. The problem is there are very few good papers. And by good I mean: can a reasonably intelligent person read a paper and turn it into something useful? 

Now, clearly I'm not an academic and clearly I'm no genius. I'm just an everyday programmer searching for leverage, and as a common specimen of the species I've often thought how much better our industry would be if we could move research from academia into production with some sort of self-conscious professionalism. Currently the process is horribly hit or miss. And this problem extends equally to companies with research divisions, which often do very little to help front-line developers succeed.

How many ideas break out of academia into industry in computer science? We have many brilliant examples: encryption, microprocessors, compression, transactions, distributed file systems, vector clocks, gossip protocols, MapReduce, search, algorithms, networking, communication, and on ad infinitum. For every Google that breaks out there must be thousands of other potential ideas that go nowhere, even in this hyper-VC aware age. 

We need to do a better job of using research. There's a lot out there in the literature that we could be making use of right now, but it's closed off from the people, i.e., developers, who can turn this research into gold. And it's largely closed off because researchers don't consider developers as an audience and they don't write their papers with the intention of being applied. Change the publication process and we can save the cheerleader and save the world.

I'm bringing this up now because:

  • It's been an issue I've thought about for a long while, but I never thought there was much that could be done about it.
  • I attended a talk by Daria Mochly-Rosen, PhD at Stanford titled: How Are Drugs Developed and Why Is It So Expensive? Her talk made me think maybe there's something we in computer science could do about this problem because they are already doing something about a similar problem in the drug discovery space.

The Gap Between Research and Development is More Like an Ocean

Dr. Mochly-Rosen is Senior Associate Dean for Research and George D. Smith Professor in Translational Medicine, School of Medicine. Her talk was fascinating. She walked us through the incredible gauntlet every drug must run before being approved for human use. She also talked about a project called SPARK:

SPARK was created to advance promising discoveries from the research laboratory into medical practice. SPARK’s mission is to facilitate collaboration between academia and industry by providing Stanford researchers with the support, knowledge, and partnerships that can bridge the gap between discovery and its application into important new medical therapies.

I thought the idea of SPARK might also work for the computer industry.

My thoughts on the state of research versus industry were reinforced after listening to Barbara Liskov during her Turing Award Speech. In her speech she recounted what to her was an amusing little anecdote of how she discovered that one of the key rules of OO modeling, the Liskov Substitution Principle (LSP), came to be named after her. If you haven't read a design patterns book in a while, LSP says: objects of subtypes should behave like those of supertypes if used via supertype methods. This is a very famous design rule in OO.
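As a rough illustration of what the rule asks for (this is not from Liskov's talk), here is a minimal Python sketch. The `Account` classes and the `add_bonus` and `behaves_like_account` helpers are hypothetical names invented for this example. The idea is that code written only against the supertype's methods should keep working when handed any subtype.

```python
class Account:
    """Supertype. Contract: depositing a positive amount raises the balance."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        return self.balance


class LoggedAccount(Account):
    """Subtype that keeps the supertype's contract and adds a log."""

    def __init__(self, balance=0):
        super().__init__(balance)
        self.log = []

    def deposit(self, amount):
        new_balance = super().deposit(amount)
        self.log.append(amount)
        return new_balance


class FrozenAccount(Account):
    """Subtype that breaks the contract: valid deposits are rejected."""

    def deposit(self, amount):
        raise RuntimeError("account is frozen")


def add_bonus(account):
    """Client code that only knows about the supertype's methods."""
    return account.deposit(10)


def behaves_like_account(account):
    """Check whether the client code sees supertype behavior."""
    before = account.balance
    try:
        return add_bonus(account) == before + 10
    except RuntimeError:
        return False
```

In this sketch `LoggedAccount` can stand in for `Account` anywhere, while `FrozenAccount` surprises any caller that relied on the supertype's behavior, which is the kind of surprise the principle warns against.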

I was absolutely floored to learn that she had no idea people in industry had taken some of her ideas and elevated them to a principle. She says that 5 or 10 years later she discovered this idea had been picked up by a community on the internet, and they were all discussing what they referred to as the Liskov Substitution Principle. They call it LSP. "That's just sort of amazing."

While amusing to her, it signaled a very serious problem to me. It made me realize how distant academia is from industry. How could she possibly not know? How could she possibly think so little of this practical little gem of an idea? It's hard for me to imagine, but it illustrates the vast gulf between industry and academia, the same sort of divide seen in the drug industry.

During her talk Dr. Mochly-Rosen shared how difficult it was to get research acted on by industry. As an example she used one of her own discoveries, a drug to reduce the damage from a heart attack by up to 70%. Her discovery, like so many others, was published in a prestigious journal, was impeccably well researched, and showed a lot of potential in a sadly growing market, yet drug companies were not beating down her door. She eventually ended up starting her own company to get the ball rolling. But this experience taught her something was broken in the process. And it wasn't what you might think, it wasn't simply that drug companies are evil.

Only about 20 new drugs are approved for human use every year. Only 5 of those drugs are new compounds; the other 15 are called "me too" drugs, slightly different versions of existing drugs. Take Viagra, for example (insert joke here). The discovery of Viagra would be one of the five new drugs. The other five versions of Viagra-like drugs we see on the market would be me-too drugs. They don't exactly advance the state-of-the-art, but they are safer to develop.

It can take 10+ years and a billion+ dollars to develop a new drug. It takes numerous trials to determine if a drug is stable, safe for humans, and to then determine the effective dosage. You know when on a label it says take 2 pills every 24 hours? How do they know the dosage? Someone has to figure that out. And the only way to figure that out is with a very wide range of trials on a very wide variety of humans. This is extremely expensive and risky. Very few drugs make it through the entire process.

The Gap isn't Caused by Evilness, it's More of an Awareness Problem on Both Sides

When the drug companies weren't beating down their door it wasn't necessarily because they were evil, it's because developing a drug is a risky and expensive business. Why should they take a risk on novel compounds they are unsure about when a miss is so costly? When drug companies market the hell out of an approved drug it's because they finally got an approved drug, it makes sense to make the most money they can from it given the investment costs. I know, I think this is a screwy situation too, but given market forces it's rational.

Dr. Mochly-Rosen's insight was to realize there was a disconnect between researchers like herself and industry, a situation very similar to that of computer science. To illustrate this point she told a story about a talk she gave at a Cardiology Convention about a discovery she made on how to control the rate at which a heart beats. An amazing and novel discovery. But the cardiologists didn't care and she was puzzled as to why. A friend told her that cardiologists simply don't have a heart rate problem. What they care about are heart attacks. So she figured out a way to frame her research as something cardiologists would be interested in. And once you have a market you can make a case for developing a drug.

That's what the SPARK program is about: figuring out how researchers can reframe and present their research in a way that is more easily consumed by potential productizers. Typically the goals and motivations of researchers and productizers are diametrically opposed. Researchers only get rewarded for publishing something new and moving on to the next new thing. Productizers have to stick with something for long periods of time and ride out every vicissitude. These are not the same people. Dr. Mochly-Rosen thinks researchers can be great at generating new drug ideas and companies can develop them. The problem is how to get them to speak the same language and agree on some basic goals and protocols.

In the drug development world there's a clear process that can be mastered: trial design, intellectual property law, consent forms, regulatory submissions, regulatory documents and amendments, Case Report Forms, coordinating with other departments and specialties, confidentiality, protocol development and deviations, preparing adverse event documentation, and Data Safety Monitoring Boards.

As a researcher once you understand the process it will be more straightforward for you to position your research in a way that's easier for industry to understand, see the value of, and to take the risk of financing. SPARK tries to fill the void between laboratory work and the delivery of products, increasing the value and readiness of commercial interventions.  

Do we Bridge the Gap with Bureaucracy or Something Else?

What would a similar process look like for computer science? I don't know, I was hoping others might have some ideas. I don't think it needs to be an official agency. That would just end up a big bloated bureaucracy with little to show for it.

This is the internet age. What we need is to make all the pieces of the puzzle easier to find, easier to understand, and easier to put together. If we can make the pieces more linkable the system will self-catalyze.

If I am, for example, looking for a sharded, replicated, secure storage system and I find a research project to that effect, it would help me immensely if those researchers thought it a priority to make it easy for developers like me to use it. Research is so often just to publish a paper, not to use, which means the systems are dead once the paper is published. This isn't a sustainable model. How do we make a more sustainable research model?

A Few Simple Ideas

Here are a few simple ideas from my own experience. Hopefully you have some better ideas.

  • Tear down that paywall! How many times have you researched a topic only to find the object of your desire hidden behind a ridiculously priced paywall? It happens to me all the time. How ironic that an industry built on openness and sharing is the most secretive of them all. The first step to making research more useful is to make it so people can actually read the research. This is the 21st century. Journals. Really?
  • Solve problems people in the industry are interested in, will be interested in, or should be interested in. I know, research is about the next big problems and isn't about being practical today. Understood. But is there a middle ground where you can figure out a way to help people today while you work on tomorrow? Working on distributed systems in the abstract is cool, working on a MapReduce that works in the field is also pretty cool.
  • Write papers with future developers and practical implementations in mind. Try to explain ideas thoroughly. Use examples. Examples are a gift. Explore implications. Think about potential gotchas. Try to go beyond just formulas. Use plain language. A dense writing style may seem to make a paper more significant, but dense papers are far less useful in practice. Amazon's Dynamo paper is a good example, but even that could be clearer.
  • Make buildable software that's available on a service like GitHub. I know it's just research, so there's no requirement to be formal, but please take a little extra time so other people can use the code. Add some documentation, build instructions, tests, examples. Make it modular. Turn it into something like a real open source project. If you set out from the start to make a real project, the chances are far less that you will be too embarrassed to show the code in the future.
  • Record talks on your research. Take a look at the talks Google produces on their research or the talks Apple makes on their frameworks. This can go a long way to make something useful. Everyone could do a lot better at this. Of the thousand conferences being run, how many are on-line? It's almost criminal that this information is being lost.
  • Teach better basics. Schools need to teach better. There are quite a few classes on-line now, but often they are way too high level. Often classes start out with the equivalent of teaching calculus by writing a big integral sign on the chalkboard. It wouldn't hurt to go slower and actually explain things better. A Teaching Company for computer science might be a good idea.
  • Be available via discussion lists. It would help immensely if developers could ask questions and learn more about the ideas behind the research. Often the hardest task for a developer is to build up the expertise needed to grok a new line of research. Some help in building up that background would be priceless.
  • Change the game mechanics. The game mechanics for research are all weighted towards publishing. Is there some way to rework the reward mechanism so the ease of use is also rewarded?
  • Develop business savvy. There should be something about how to take research and turn it into a startup. That might make the practical applications aspect of research more attractive, as it links how research is written to potential rewards. The easier it is for VCs and potential founders to clearly see how an idea can be applied, the easier it is for research to be turned into a product.

Anything else that might help?

Reader Comments (21)

The other issue is that a lot of excellent new computer science is intentionally unpublished and obfuscated, being kept as a trade secret. Many organizations that do fairly amazing theoretical computer science research engage in this practice because it is a market advantage; I've worked with R&D groups that were many years ahead of the published state-of-the-art for their software applications.

It is a reaction to there being little in the way of effective IP for theoretical computer science even though some theoretical computer science R&D groups will spend considerable amounts of time and money developing new algorithms for whatever it is they are working on. Instead of publishing an algorithm for, say, massively distributable join operations, they find a way to commercialize that capability without putting a hint of the solution in public literature. This greatly limits the opportunity for cross-fertilization but in my experience much of the really interesting research is increasingly done this way.

July 22, 2010 | Unregistered Commenter J. Andrew Rogers

Interesting point J. Any ideas on what can be done about it?

July 22, 2010 | Registered Commenter HighScalability Team

Excellent post, very poignant

One thing not addressed, that is a stopper for converting theoretical research into real computer code: That hand-to-mouth attitude programmers have.

I have presented research papers to professional programmers in the past but found the "how does it make *ME* money?" attitude instantly kills any momentum building. If it's not a software spec, with bias toward whatever programming language they already know and have religious zealotry for, they aren't open to innovation.

July 22, 2010 | Unregistered Commenter Todd

1) Many researchers are bloggers. (See http://cacm.acm.org/blogs/about-the-blogs/)

2) Many researchers post their papers on arxiv.org. There are handy RSS feeds. Some of the best papers make it there.

3) Many researchers use Google code. (e.g. http://code.google.com/u/lemire/)

4) Countless researchers are on Twitter, stackoverflow, mathoverflow, quora, and so on.

5) Many of my colleagues are wikipedians.

July 22, 2010 | Unregistered Commenter Daniel Lemire

You may enjoy the idea of Open Scholarship. See our facebook group:

http://www.facebook.com/group.php?gid=140408581513

July 22, 2010 | Unregistered Commenter Daniel Lemire

Nicely written article. Working in a research lab, I see similar problems. You have given some very useful tips to academics to make their research accessible, but it would be nice to see what developers can do to ease this process. It is not always easy to see all the developer's problems from the academic side. Developers have to make the effort to read the paper, run the "research" code and understand some of the limitations of research. Best results are achieved when both sides work together.

July 22, 2010 | Unregistered Commenter Pradeep Padala

As a computer science researcher myself, I agree with your view that research is often a bit disconnected from the rest of the world. A lot of the papers I read (and write) could have more practical implications than those that are implied by, or can readily be derived from, the paper. But even when your research targets a problem that already exists today, making it suitable for immediate consumption is a lot of hard work which usually isn't really valued in an academic sense. Some research ideas do make it into open-source, downloadable tools, but that takes a lot of extra effort. Implementing your idea as something that runs with most of the benchmarks and allows you to get the graphs you need for a paper is only 10% of the work; after that you need to add usable API/command-line/GUI interfaces, handle all corner cases, write documentation, do all the support, etc. And at the end of the day, as a researcher you're judged on your number of publications, not your number of GitHub downloads. It's not surprising then that most academics will choose not to spend the extra effort, but just do what they're best at and move on to their next idea (and paper).

July 22, 2010 | Unregistered Commenter Wim Heirman

But this may lead to focusing on the wrong thing. Drugs for postponing heart attacks could be what cardiologists like, but drugs for regulating heart rate might be the way to solve the heart attack problem at the root. Why give them what they "want" (a temporary solution), when you should target giving them what they "need" (a real solution)?

Food for thought.

July 22, 2010 | Unregistered Commenter murat

Thank you for this great post. This is a very significant issue. Most academics care only about acceptance ratios and complexity; it is a pissing game played with brains. This needs to change, and you suggested some nice remedies for the problem.

Another remedy I can suggest is for industry to fund more research in academia. I don't know how this can be initiated though.

July 22, 2010 | Unregistered Commenter murat

I also believe that, at least in Sweden, the other side of the spectrum, in other words the developers, needs to be convinced that research has something to offer. Not all companies or developers are of this view. Then there are companies, such as the one I am working with over the summer, that want to engage with research-type work but are simply too small to do so. Or may it be that the barrier to entry is too high?

Earlier this summer I defended my thesis and one of the questions from the audience was "So, what is the next step for company X to make it useful?" Though we had thought about it before, I don't honestly think we thought about it from the start. Researchers' own agendas shouldn't always be driving directions. Perhaps research methods such as applied research are a solution? It is growing more popular and respected within the research community. There's still a long way to go before it'll be "easy to use," but with the right motivations it may prove a better alternative to "bureaucratic integration centers."

Great post btw, got me thinking about lots of stuff. :)

July 22, 2010 | Unregistered Commenter Marcus

The biggest challenge may be the "change the game mechanics" part. Researchers are rewarded for publishing papers, and that's it. They are not rewarded for producing software or communicating better with developers. Some researchers already follow some of your suggestions, which is great, but there is no incentive to do it.

Hard-coded incentives already create problems for CS professors in some places because in our area a lot of high-quality research is published in conferences, but journals tend to be more highly regarded.

Changing this, unfortunately, requires far more than the good will of some individual researchers.

July 22, 2010 | Unregistered Commenter Andrei Formiga

The post is excellent.

The issues are real.

Here's a suggestion though on how we (a community of like-minded individuals) might progress this issue.

Get some developers into a room.

Get some researchers into a room.

Hire/get hold of a good facilitator. Tell the facilitator what the target end state is. It might be something like "material that is worked on and produced by researchers needs to be easily found and made usable to developers." Then the facilitated session can explore things like the different needs of each group and the drivers for each group, and then, using the power of that first seed group, a paper/blog/slideshow can be produced to further other discussions.

Magic happens when you get a whole lot of minds working on something. Most of us know this intrinsically. But sometimes we need a catalyst, a seed, a starting point, to get it started.

Get the discussion happening in realtime, in the room, with real people. Maybe some of you folks commenting on this post are in the same geographic area?

Todd, It's great that you've named and described the issue. Now that it's named, it can be worked on.

Cheers and good luck (I'm not a developer or a researcher, I'm more a facilitative type of system admin in Australia)

July 22, 2010 | Unregistered Commenter Francis Liu

I work on a commercial project directly inspired from academic research and have benefitted considerably from contact with computer science researchers.

However, I also see the role of looking farther down the road. I feel like so much of my day-to-day work as a practitioner is soaked in productizing details - I'm relieved to know there are others unconcerned with practical implementation spending their time thinking the bigger, more abstract thoughts, unhindered by the nagging details of the "real" world.

Thought provoking post - thanks for sharing your observations.

July 22, 2010 | Unregistered Commenter Ryan

I've been involved in several quasi-academic projects and spin-offs, and it's tough. Academia moves painfully slowly by commercial standards. Commercial outfits quickly become bogged down in IP issues. Academics care about correctness, commercial concerns care about usefulness.

One problem is that purely theoretical research is too hard for normal developers to incorporate into software. Most companies don't have the luxury of employing people at a level where they can read papers on algorithmic advances and convert that into working code. And if they did employ those sorts of people, they'd be spending much of their time doing their own work on the algorithms.

Equally, when academic concerns decide to go one step further and implement working software, they then expect to get some kind of return on that, and rarely simply give it away. But neither do they go the whole hog and turn it into commercial grade software. This is what I often deal with. Stuff that works, but needs a lot of investment in user interface, robustness, performance and so on to really be of value to the world at large. But the universities (and academics) are unwilling to simply hand over their 'baby' to some nasty commercial enterprise to mess with. And the enterprise is wary of relying on flaky academics who don't care about timetables to make important updates and fixes in the code.

July 23, 2010 | Unregistered Commenter J

Excellent post. I can totally relate to this: I was trying to implement a paper, and the math was really high level for my education even though the gist of the idea was simple. The proof of correctness was something I struggled with.

July 24, 2010 | Unregistered Commenter nascentmind

Looking at some comments, I am compelled to comment again.

" And the enterprise is wary of relying on flaky academics who don't care about timetables to make important updates and fixes in the code."

This just doesn't make any sense. Why should academics care about timetables or updates? That's not their job. No academic is rewarded for maintaining a piece of code on sourceforge and for making bug fixes. I am not saying it's boring or not worth doing. It's just not a researcher's job.

Also, in many comments people seem to think that academics = theoretical research, which is not true. There is a lot of "systems" research, which produces many practical research ideas/solutions, some of which go into open source. Look at the papers from SOSP, OSDI, Eurosys, NSDI, SIGCOMM etc. Papers from these conferences are much more easily readable and digestible from a developer's point of view.

July 25, 2010 | Unregistered Commenter Pradeep Padala

As someone who works in Corporate R&D -with some time off building production stuff- I think open source is a great way of getting stuff from us, the researchers, into the field. But, the OSS projects have to work for it -in Hadoop I'm a great advocate of plugin points- because there is no way we'd trust some random postgrads to put the big hadoop filesystems at risk. Having plugin points for scheduling (in 0.21) and hopefully in future: block placement on namenodes, block placement within namenodes, estimating system availability for new jobs, etc: all classic CS-hard problems that we can get people busy with, if we can give them a safe way to do so.

July 26, 2010 | Unregistered Commenter Steve Loughran

Great article. Are you familiar with the NSF I/UCRC program?

July 26, 2010 | Unregistered Commenter Mark Fontenot

I was not, Mark, thanks. I'm more of a bottom-up person though. To me a bottom-up approach would give the maximum chance of reuse.

July 26, 2010 | Registered Commenter HighScalability Team

A lot of researchers also provide their papers on their homepages. So if you're interested, search for the author's homepage and get it from there or just send 'em an email. Being a researcher myself I've often received papers by email when I requested them.

August 5, 2010 | Unregistered Commenter fg

> Change the game mechanics. The game mechanics for research are all weighted towards publishing. Is there some way to rework the reward mechanism so the ease of use is also rewarded?

The problem is that researchers measure themselves by the number of published papers and the number of citations they get in other papers (resulting in stuff like the H-Index). Sad but true ...

August 7, 2010 | Unregistered Commenter anonymous
