Tuesday, Jan 08, 2008

Virus Scanning for Uploaded content

All,
What is the best way to scan content uploaded by users? Is there an open source solution available for this? How do YouTube, Flickr, and other sites that accept user uploads handle it?
Any insight would be greatly appreciated!
Regards,
Janakan Rajendran.

Reader Comments (6)

ClamAV? :D

http://www.clamav.net/

I use it on all our servers.

December 31, 1999 | Unregistered Commenter Alberto

Alberto,

Thanks for your reply. Yes, I have heard of ClamAV, but I haven't heard of bigger names like YouTube or Flickr using it.
Is ClamAV also effective at scanning media files, in terms of both accuracy and speed?

Regards,
Janakan Rajendran.

December 31, 1999 | Unregistered Commenter Janakan Rajendran

Janakan,

You did ask for an open source option.
ClamAV is quite good.

Is this also for the CDN project that you are planning?
We use this at our ISP for scanning files uploaded via FTP.
We use Pure-FTPd, which has an upload script hook: after a file has been uploaded, the script can do whatever you want with it and decide whether to keep or remove it.

For HTTP-based uploads, you will have to follow a similar scheme.
Let people upload files into a non-downloadable area, queue each file for inspection, and inform the uploader of the result once the inspection is done. This is a less expensive approach than doing everything in real time.
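
A minimal sketch of that quarantine-then-inspect step, assuming ClamAV's clamscan is installed. The function name inspect_upload, the SCANNER variable, and the directory arguments are made up for illustration. The only ClamAV-specific fact it relies on is that clamscan exits with 0 for a clean file, 1 when a virus is found, and 2 on error.

```shell
#!/bin/sh
# Assumption: SCANNER is the command used to scan a single file.
SCANNER="${SCANNER:-clamscan --no-summary}"

# inspect_upload FILE PUBLIC_DIR   (hypothetical helper)
# Prints OK, INFECTED or ERROR so the caller can inform the uploader.
# The file is moved to PUBLIC_DIR only when the scan reports it clean.
inspect_upload() {
    file=$1
    public_dir=$2
    # $SCANNER is left unquoted on purpose so it can carry options.
    $SCANNER "$file" >/dev/null 2>&1
    status=$?
    case $status in
        0)  # clean: publish the file
            if mv -- "$file" "$public_dir/"; then echo "OK"; else echo "ERROR"; fi ;;
        1)  # virus found: drop the file
            rm -f -- "$file"
            echo "INFECTED" ;;
        *)  # scanner error: leave the file in quarantine for a retry
            echo "ERROR" ;;
    esac
}
```

A queue worker could call something like inspect_upload "$quarantined_file" "$public_dir" for each queued file and pass the printed result on to whatever notifies the uploader.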

Whether you use scanning or not, it is a reasonably good idea anyway to separate your upload server from your other content-serving servers.

Hope this helps.

December 31, 1999 | Unregistered Commenter atif.ghaffar

Atif,

Thanks for your response. I like the idea of keeping uploads separate from the content delivery servers until they have been scanned successfully. Are there any paid commercial solutions available besides ClamAV? If I get simultaneous uploads, I'm concerned about ClamAV's support for multiple threads.

Regards,
Janakan Rajendran

December 31, 1999 | Unregistered Commenter rjanakan

Janakan,

I don't know of any paid service. I haven't had the need to look into it yet.
As for multiple files... what is bothering you?

# "$@" holds the incoming files
for file in "$@"; do
    scan_and_report.sh "$file" &   # fork it!
done
wait   # wait for all the background scans to finish

Fork as many as you can handle.
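
If forking one process per file with no upper limit is a worry, one possible way to cap it is xargs -P, which GNU and BSD xargs both support (it is not in POSIX). This sketch assumes the incoming files sit in one directory. scan_in_parallel and SCAN_SCRIPT are hypothetical names, and scan_and_report.sh is the same placeholder script as in the loop above.

```shell
#!/bin/sh
# Assumption: SCAN_SCRIPT is the command run once per uploaded file.
SCAN_SCRIPT="${SCAN_SCRIPT:-scan_and_report.sh}"

# scan_in_parallel DIR MAX_JOBS   (hypothetical helper)
# Runs SCAN_SCRIPT on every regular file under DIR,
# with at most MAX_JOBS scans running at the same time.
scan_in_parallel() {
    # -print0 / -0 keep file names with spaces or newlines intact.
    # $SCAN_SCRIPT is left unquoted on purpose so it can carry arguments.
    find "$1" -type f -print0 | xargs -0 -n 1 -P "$2" $SCAN_SCRIPT
}
```

Calling scan_in_parallel with a directory and, say, 8 would keep at most eight scans running until every file has been handed to the script.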

Or distribute the work across different machines.

Perhaps use a central database that holds a reference to every uploaded file.
Then have a dispatcher send the files to the different machines in batches. (You can set a threshold; for example, each machine receives no more than 50 requests at one time.)

When a scan finishes, the scanner reports back to the database with OK or KO.

Once you have an OK, move the file to where it should be.
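
The batching part of that dispatcher could look roughly like the sketch below. It is only an illustration: dispatch_batches is a hypothetical helper, the machine names passed to it are placeholders, and the list of pending files is read from standard input instead of a real database. Each machine gets at most LIMIT files, and anything left over is marked PENDING for the next round.

```shell
#!/bin/sh
# dispatch_batches LIMIT HOST...   (hypothetical helper)
# Reads pending file paths from stdin, one per line, and prints
# "HOST<TAB>PATH" assignments. Hosts are filled in order, LIMIT files each;
# files that do not fit in this round are printed with the host PENDING.
dispatch_batches() {
    limit=$1
    shift
    hosts="$*"
    awk -v limit="$limit" -v hosts="$hosts" '
        BEGIN { n = split(hosts, h, " "); i = 1; count = 0 }
        {
            if (i <= n) {
                print h[i] "\t" $0
                count++
                if (count == limit) { i++; count = 0 }   # host is full, move on
            } else {
                print "PENDING\t" $0                     # wait for the next round
            }
        }'
}
```

With the threshold from the example above, the call would be something like dispatch_batches 50 followed by the scanner machine names, with the output feeding whatever copies the files over and later records OK or KO.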

December 31, 1999 | Unregistered Commenter atif.ghaffar

Not open source, but there are commercial products for this type of scanning. Symantec has one:

http://www.symantec.com/business/products/overview.jsp?pcid=2251&pvid=836_1

ICAP is a protocol specification meant for handling this type of processing: http://en.wikipedia.org/wiki/Internet_Content_Adaptation_Protocol

December 31, 1999 | Unregistered Commenter Anonymous
