Virus Scanning for Uploaded content
Tuesday, January 8, 2008 at 12:33AM
All,
What is the best way to scan content uploaded by users? Is there an open source solution available for that? How do YouTube, Flickr, and other sites with user-uploaded content handle this?
Any insight would be greatly appreciated!
Regards,
Janakan Rajendran.
Reader Comments (6)
ClamAV? :D
http://www.clamav.net/
I use it on all our servers.
Alberto,
Thanks for your reply. Yes, I have heard of ClamAV, but I haven't heard of bigger names like YouTube or Flickr using it.
Is ClamAV also efficient at scanning media files, in terms of both accuracy and speed?
Regards,
Janakan Rajendran.
Janakan,
You did ask for the open source version.
ClamAV is quite good.
Is this also for the CDN project that you are planning?
We use this at our ISP for scanning files uploaded via FTP.
We use Pure-FTPd, which has an upload script hook that can do whatever you want with the file after it has been uploaded and can decide whether to keep or remove it.
For HTTP-based uploads, you will have to follow a similar scheme.
Allow people to upload a file into a non-downloadable area, queue that file for inspection, and inform the uploader of the result after the inspection. This is a less expensive approach than doing everything in real time.
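A rough Python sketch of that quarantine-then-inspect flow, in case it helps. The function names, the queue, and the directory layout are made up for illustration. The only ClamAV-specific part is the `clamscan` call, which exits with 0 for a clean file, 1 when a virus is found, and 2 on error.

```python
import shutil
import subprocess
from pathlib import Path
from queue import Queue


def clamscan_is_clean(path: Path) -> bool:
    """Run ClamAV's clamscan: exit code 0 = clean, 1 = infected, 2 = error."""
    result = subprocess.run(
        ["clamscan", "--no-summary", str(path)],
        capture_output=True,
    )
    if result.returncode not in (0, 1):
        raise RuntimeError(f"clamscan failed on {path}")
    return result.returncode == 0


def process_queue(pending: Queue, public_dir: Path, is_clean=clamscan_is_clean) -> dict:
    """Inspect each queued file and return {filename: "accepted" | "rejected"}.

    Hypothetical helper: files sit in a non-downloadable area until scanned.
    Clean files are moved to public_dir; infected files are deleted.
    """
    results = {}
    while not pending.empty():
        path = pending.get()
        if is_clean(path):
            shutil.move(str(path), str(public_dir / path.name))
            results[path.name] = "accepted"
        else:
            path.unlink()
            results[path.name] = "rejected"
    return results
```

The returned dictionary is what you would use to tell each uploader the outcome. Passing the scanner in as `is_clean` also makes it easy to swap ClamAV for another engine later.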
Whether you use scanning or not, it is a reasonably good idea anyway to separate your upload server from your other content-serving servers.
Hope this helps.
Atif,
Thanks for your response. I like the idea of keeping uploads separate from the content delivery servers until they have been scanned successfully. Are there any paid commercial solutions available other than ClamAV? If I get simultaneous uploads, I'm concerned about ClamAV's support for multiple threads.
Regards,
Janakan Rajendran
Janakan,
I don't know about any paid service. I haven't had the need to look into it yet.
For the multiple files... what is bothering you?
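for file in incoming/*; do            # incoming/ is a placeholder for your upload directory
    ./scan_and_report.sh "$file" &    # fork it!
done
wait                                  # block until every background scan has finished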
Fork as many as you can handle.
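If you want a hard cap on how many scans run at once, here is a minimal Python sketch of the same idea. `scan_and_report.sh` is the script named above; the function names and the default of 8 workers are assumptions picked for illustration, and what the script's exit code means is up to the script.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scan_one(path, command=("./scan_and_report.sh",)):
    """Run the scan script on one file and return (path, exit code)."""
    completed = subprocess.run([*command, path])
    return path, completed.returncode


def scan_all(paths, max_workers=8, command=("./scan_and_report.sh",)) -> dict:
    """Scan files concurrently, never running more than max_workers at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread just waits on a child process, so threads are
        # enough here; the actual scanning happens in the forked script.
        return dict(pool.map(lambda p: scan_one(p, command), paths))
```

Calling `scan_all(list_of_paths, max_workers=4)` returns a dictionary mapping each path to the exit code of its scan.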
Or distribute it on different machines.
Perhaps use a central database where you put a reference to every uploaded file.
Then, from a dispatcher, dispatch the files to different machines in batches (you can set your own threshold, for example so that each machine receives no more than 50 requests at one time).
When the scanning process finishes, the scanner reports back to the database with OK or KO.
Once you have an OK, move the file to where it should be.
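To make the database idea a bit more concrete, here is a small sketch using SQLite from Python. The table layout, the status values PENDING and SCANNING, and all function names are assumptions for illustration; only OK, KO, and the limit of 50 come from the description above. A real multi-machine setup would need a networked database rather than SQLite.

```python
import sqlite3

BATCH_LIMIT = 50  # example threshold: at most 50 files per machine at a time


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " path TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'PENDING',"  # PENDING, SCANNING, OK or KO
        " machine TEXT)"
    )
    return db


def register_upload(db, path):
    """Record a reference to a newly uploaded file."""
    db.execute("INSERT INTO uploads (path) VALUES (?)", (path,))
    db.commit()


def dispatch(db, machines, limit=BATCH_LIMIT):
    """Assign pending files to machines, keeping each at or below `limit` in flight."""
    assigned = {}
    for machine in machines:
        (in_flight,) = db.execute(
            "SELECT COUNT(*) FROM uploads WHERE status = 'SCANNING' AND machine = ?",
            (machine,),
        ).fetchone()
        rows = db.execute(
            "SELECT path FROM uploads WHERE status = 'PENDING' ORDER BY rowid LIMIT ?",
            (max(limit - in_flight, 0),),
        ).fetchall()
        batch = [row[0] for row in rows]
        db.executemany(
            "UPDATE uploads SET status = 'SCANNING', machine = ? WHERE path = ?",
            [(machine, p) for p in batch],
        )
        assigned[machine] = batch
    db.commit()
    return assigned


def report(db, path, clean):
    """Called by a scanner when it finishes: record OK or KO for the file."""
    db.execute(
        "UPDATE uploads SET status = ? WHERE path = ?",
        ("OK" if clean else "KO", path),
    )
    db.commit()


def releasable(db):
    """Files that passed scanning and can be moved to where they should be."""
    rows = db.execute("SELECT path FROM uploads WHERE status = 'OK' ORDER BY rowid")
    return [row[0] for row in rows]
```

The dispatcher would call `dispatch` periodically, each scanner would call `report` when it is done, and a separate job would move everything returned by `releasable` out of the upload area.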
Not open source, but there are commercial products for this type of scanning. Symantec has one such product:
http://www.symantec.com/business/products/overview.jsp?pcid=2251&pvid=836_1
ICAP is a protocol specification meant for handling this type of processing: http://en.wikipedia.org/wiki/Internet_Content_Adaptation_Protocol