Recommend The Anatomy of Search Technology: Crawling using Combinators (Email)

This action will generate an email recommending this article to the recipient of your choice. Note that your email address and your recipient's email address are not logged by this system.

EmailEmail Article Link

The email sent will contain a link to this article, the article title, and an article excerpt (if available). For security reasons, your IP address will also be included in the sent email.

Article Excerpt:

This is the second guest post (part 1, part 3) of a series by Greg Lindahl, CTO of blekko, the spam free search engine. Previously, Greg was Founder and Distinguished Engineer at PathScale, at which he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

What's so hard about crawling the web?

Web crawlers have been around as long as the Web has -- and before the web, there were crawlers for gopher and ftp. You would think that 25 years of experience would render crawling a solved problem, but the vast growth of the web and new inventions in the technology of webspam and other unsavory content results in a constant supply of new challenges. The general difficulty of tightly-coupled parallel programming also rears its head, as the web has scaled from millions to 100s of billions of pages.

Existing Open-Source Crawlers and Crawls


Article Link:
Your Name:
Your Email:
Recipient Email:
Message: