At the beginning of this university term I signed up for a project on the Semantic Web and Service Sciences. A really interesting topic, indeed. But that is not the main subject of this article, even though the course led me to think about how to crawl the web with a Java application (because the project leader, in other words the professor, already had some code written in Java).
So I started looking around with my basic Java skills. Fortunately, from my experience with PHP and libcurl, I already knew a little about crawling, and I knew that regular expressions are key in this area.
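As a rough illustration of why regular expressions matter here, below is a minimal Java sketch (not taken from the book or the article) that pulls the `href` values out of anchor tags in a page. The class name `LinkExtractor` and the method `extractLinks` are made up for this example, and a regex is only an approximation of real HTML parsing: unquoted attributes, comments or links built by scripts will slip through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: extracts link targets with a regex.
public class LinkExtractor {

    // Matches <a ... href="..."> or <a ... href='...'>, ignoring case.
    // Group 1 captures the link target between the quotes.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the href values of all anchor tags found in the given HTML, in order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"http://example.com/a\">A</a> and "
                + "<A HREF='/b.html'>B</A></p>";
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

Running `main` prints `http://example.com/a` followed by `/b.html`. A crawler would then resolve relative links such as `/b.html` against the page URL before queueing them.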
Searching the web, I (or should I say Google) found a good starting point at Osborne (a unit of McGraw Hill), which published the book "The Art of Java" by Herbert Schildt and James Holmes. So I started reading the article "Crawling the Web with Java".
