A webcrawler/data extractor



jamms
15 Sep 2009, 02:40 AM
This may not be the correct term for what I am describing, but it was the closest I could find. Before you read below, understand my purpose for wanting this. I am creating a viral entertainment website (pictures, videos, etc.). To find good content efficiently, and because how others do it is a well-kept secret, I came up with this idea.

I want to write a program that will scour the internet (or a list of specified sites) for certain file extensions. The program will also accept certain parameters for the file extensions (such as number of hits, or size). When the program finds a match, I want it to download the file to my local hard drive and catalog it for review.
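As a rough sketch of the idea above, assuming Python and only its standard library: the code below pulls links out of a page, keeps the ones whose file extension is on a wanted list, downloads those under a size limit, and returns a simple catalog for review. The names `LinkCollector`, `find_media_links`, and `download_matches` are made up for illustration, and it only handles the "list of specified sites" case, one page at a time. A "number of hits" filter is left out because a crawler cannot see that figure unless the site publishes it.

```python
import os
import posixpath
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_media_links(html, base_url, extensions):
    """Return absolute URLs from `html` whose path ends in one of `extensions`."""
    parser = LinkCollector()
    parser.feed(html)
    matches = []
    for link in parser.links:
        absolute = urljoin(base_url, link)
        # Look at the URL path only, so query strings do not hide the extension.
        ext = posixpath.splitext(urlparse(absolute).path)[1].lower()
        if ext in extensions and absolute not in matches:
            matches.append(absolute)
    return matches


def download_matches(page_url, extensions, max_bytes, out_dir):
    """Fetch one page, save matching files no larger than `max_bytes`, return a catalog."""
    with urllib.request.urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    os.makedirs(out_dir, exist_ok=True)
    catalog = []
    for url in find_media_links(html, page_url, extensions):
        with urllib.request.urlopen(url) as resp:
            # Read one byte past the limit so oversized files can be detected.
            data = resp.read(max_bytes + 1)
        if len(data) > max_bytes:
            continue
        path = os.path.join(out_dir, posixpath.basename(urlparse(url).path))
        with open(path, "wb") as f:
            f.write(data)
        catalog.append({"url": url, "path": path, "bytes": len(data)})
    return catalog
```

Calling something like `download_matches("http://example.com/gallery/", {".jpg", ".mp4"}, 5_000_000, "review")` (a placeholder URL and limit) would save the matching files into a `review` folder and return a list of records that could just as easily be written to a MySQL table.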

Is something like this outside the realm of web scripting languages? Would Java do the trick? I am familiar with PHP/CSS/HTML/MySQL, but I don't believe these would work. Or would they? Thanks for your answers.

DigitalExtreme
02 Oct 2009, 12:35 PM
It's called a scraper, and it will get you banned from Google.