Java – How to Programmatically Download a Webpage in Java


I would like to fetch a web page's HTML and save it to a String so I can do some processing on it. Also, how would I handle the various types of compression a server might use?

How would I go about doing that using Java?

Best Solution

I'd use a decent HTML parser like Jsoup. It's then as easy as:

String html = Jsoup.connect("https://example.com/").get().html(); // any URL

It handles GZIP, chunked responses, and character encoding fully transparently. It offers more advantages as well, like HTML traversal and manipulation with CSS selectors, much as jQuery does. You only have to grab it as a Document, not as a String.

Document document = Jsoup.connect("https://example.com/").get(); // any URL
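For example, here's a small sketch of selector-based extraction. It parses an inline HTML string so it runs without network access; the markup and class names are made up for illustration:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupSelectDemo {
    public static void main(String[] args) {
        // An inline HTML string stands in for a fetched page.
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p class='intro'>Hello</p>"
                + "<a href='https://example.com'>link</a></body></html>";
        Document doc = Jsoup.parse(html);

        // CSS selectors, much like jQuery:
        String intro = doc.select("p.intro").first().text();
        Elements links = doc.select("a[href]");

        System.out.println(doc.title());                // Demo
        System.out.println(intro);                      // Hello
        System.out.println(links.first().attr("href")); // https://example.com
    }
}
```

Against a live page you'd get the same Document from `Jsoup.connect(url).get()` and select from it in exactly the same way.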

You really don't want to use basic String methods, or even regexes, to process HTML.
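To see what that "fully transparently" saves you, here's a sketch of handling gzip yourself with only the standard library (the `readBody`/`fetch` helpers are hypothetical names; the `main` method exercises the decoding path offline with an in-memory gzipped payload rather than a live request):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class RawFetchDemo {
    // Wrap the stream in a GZIPInputStream when the server says the body is gzipped.
    static String readBody(InputStream in, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            in = new GZIPInputStream(in);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toString(StandardCharsets.UTF_8.name());
    }

    // Hypothetical fetch: advertise gzip support, then decode whatever comes back.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = conn.getInputStream()) {
            return readBody(in, conn.getContentEncoding());
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a gzipped response body in memory, no network needed.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(raw)) {
            gz.write("<html><body>compressed page</body></html>"
                    .getBytes(StandardCharsets.UTF_8));
        }
        String html = readBody(new ByteArrayInputStream(raw.toByteArray()), "gzip");
        System.out.println(html);
    }
}
```

And this still leaves chunked transfer decoding, charset detection, and redirects to the runtime or to you, which is exactly why a library like Jsoup is the easier route.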

See also: