In earlier versions, when the Peecho platform first launched, all processing was executed by EC2 instances. For a single order, it was done sequentially, page by page, as illustrated below.
Because we accept any kind of file, and often very demanding ones, the transformation process could take hours. On average, it took 15 seconds per page. Because pages were processed one after another, the processing time grew linearly with the number of pages. For example, a 400-page document would take around 1 hour and 40 minutes (400 × 15 seconds = 6,000 seconds), which is a considerable amount of time for a single file.
Recently, our development team integrated AWS Lambda functions into the processing architecture, and that changed the picture dramatically.
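To make the arithmetic concrete, here is a minimal Python sketch of the sequential estimate. The 15-second average comes from the figures above; the function names are hypothetical and are not part of the Peecho code base.

```python
SECONDS_PER_PAGE = 15  # average per-page processing time quoted in the text


def sequential_seconds(pages: int, seconds_per_page: int = SECONDS_PER_PAGE) -> int:
    """Total time when pages are processed one after another."""
    return pages * seconds_per_page


def format_duration(total_seconds: int) -> str:
    """Render a number of seconds as hours, minutes and seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


print(format_duration(sequential_seconds(400)))  # 1h 40m 00s
```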
AWS Lambda
Imagine you could simply define a piece of code that runs in the cloud, without worrying about provisioning, managing and scaling the servers that run it. That’s exactly what AWS Lambda is: a compute service where you can define functions that respond to events, such as changes to data in Amazon S3.
In the new processing architecture, we took the existing processing code and converted it into an AWS Lambda function that performs all file transformations on a single page of a document. The new function is written in Node.js and is triggered by S3 file uploads.
When processing starts, the original document is split into separate pages, and each page is uploaded to S3. Each time a page upload completes, a new Lambda instance is launched and starts crunching that page's data.
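The trigger described above can be sketched as a handler that reads the bucket and object key from an S3 upload event. This is an illustrative Python sketch only: the production function is written in Node.js, `process_page` is a hypothetical placeholder for the real transformation, and the bucket and key in the sample event are made up. The event layout (`Records`, `s3.bucket.name`, `s3.object.key`) follows the standard S3 notification format, in which object keys arrive URL-encoded.

```python
from urllib.parse import unquote_plus


def process_page(bucket: str, key: str) -> str:
    """Hypothetical placeholder for the per-page file transformation."""
    return f"processed s3://{bucket}/{key}"


def handler(event: dict, context: object) -> list:
    """Entry point invoked once per S3 upload notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event payloads, so decode first.
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append(process_page(bucket, key))
    return results


# Made-up event for a single uploaded page.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-pages"},
                "object": {"key": "orders/123/page-0001.pdf"},
            }
        }
    ]
}

print(handler(sample_event, None))
```

Because every page upload produces its own event, each page gets its own invocation, which is what allows the pages to be processed independently.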
By doing that, we are now able to run a separate processing instance for each page in parallel. For a 400-page document, we launch 400 Lambda instances simultaneously and process the entire document in roughly the time it takes to process a single page. The processing time therefore no longer grows with the number of pages, and we can process almost any document about as fast as we used to process one page!
Although AWS Lambda is a great and powerful service, it has some limitations regarding execution time, disk space and memory. For instance, we are not able to use Lambda to process files larger than 500MB. Since we still have to process these big files, the Peecho platform falls back to the previous EC2-based mechanism whenever it needs to handle corner cases like that.
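A rough model of the fan-out, as a hedged Python sketch: it assumes every page takes the 15-second average and ignores the time spent splitting the document, uploading pages and starting instances. The `max_concurrency` parameter is a hypothetical addition that shows what happens if fewer instances than pages can run at once.

```python
import math
from typing import Optional


def parallel_seconds(
    pages: int,
    seconds_per_page: int = 15,
    max_concurrency: Optional[int] = None,
) -> int:
    """Estimated wall-clock time when pages are processed in parallel.

    With no concurrency cap, every page runs at the same time, so the
    total equals the time for one page. With a cap, pages run in waves.
    """
    if pages <= 0:
        return 0
    workers = pages if max_concurrency is None else min(pages, max_concurrency)
    waves = math.ceil(pages / workers)
    return waves * seconds_per_page


print(parallel_seconds(400))                       # 15
print(parallel_seconds(400, max_concurrency=100))  # 60
```

Under these assumptions, the 400-page document drops from 6,000 seconds sequentially to about 15 seconds when all pages run at once.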
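The fallback decision can be sketched as a simple size check. This is a minimal Python illustration, not the platform's actual routing logic: the 500MB threshold comes from the text, while interpreting it as 500 × 1024 × 1024 bytes, the function name and the backend labels are all assumptions.

```python
# Threshold from the text; reading "500MB" as binary megabytes is an assumption.
LAMBDA_MAX_FILE_BYTES = 500 * 1024 * 1024


def choose_backend(file_size_bytes: int) -> str:
    """Pick the processing path for a file based on its size.

    Returns "lambda" for files within the limit and "ec2" for larger
    files that must fall back to the sequential mechanism.
    """
    if file_size_bytes > LAMBDA_MAX_FILE_BYTES:
        return "ec2"
    return "lambda"


print(choose_backend(20 * 1024 * 1024))   # lambda
print(choose_backend(750 * 1024 * 1024))  # ec2
```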
More on Lambda
Besides document processing, Peecho also uses AWS Lambda functions for other cool features, such as generating thumbnails for publication covers and content previews. For these, Lambda functions are triggered right after a publication is uploaded, so image thumbnails are available almost immediately in our dashboard, website and checkout pages.
Our development team is obsessed with making things simpler and faster. We are continuously looking for new ways to improve performance across Peecho applications. AWS Lambda is a great fit for that, and we will definitely explore it further in future releases.