First a disclaimer beforehand: the posted code snippets are all basic examples. You'll need to handle trivial IOExceptions and RuntimeExceptions like NullPointerException, ArrayIndexOutOfBoundsException and consorts yourself.
In case you're developing for Android instead of Java, note also that since the introduction of API level 28, cleartext HTTP requests are disabled by default. You are encouraged to use HttpsURLConnection, but if it is really necessary, cleartext can be enabled in the Application Manifest.
Preparing
We first need to know at least the URL and the charset. The parameters are optional and depend on the functional requirements.
String url = "http://example.com";
String charset = "UTF-8"; // Or in Java 7 and later, use the constant: java.nio.charset.StandardCharsets.UTF_8.name()
String param1 = "value1";
String param2 = "value2";
// ...
String query = String.format("param1=%s&param2=%s",
    URLEncoder.encode(param1, charset),
    URLEncoder.encode(param2, charset));
The query parameters must be in name=value format and be concatenated by &. You would normally also URL-encode the query parameters with the specified charset using URLEncoder#encode().
The String#format() is just for convenience. I prefer it when I would need the String concatenation operator + more than twice.
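When the number of parameters varies, a small helper may be easier to maintain than a format string. The following is only a sketch: the class name QueryStrings and the method build are made up for illustration, and it assumes the parameters arrive in a Map whose iteration order is the order you want in the query string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of any library.
final class QueryStrings {

    // Builds name=value pairs joined by "&", URL-encoding both names and values.
    static String build(Map<String, String> params, String charset) {
        StringJoiner query = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                query.add(URLEncoder.encode(entry.getKey(), charset)
                        + "=" + URLEncoder.encode(entry.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            // Turn the checked exception into an unchecked one for convenience.
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
        return query.toString();
    }
}
```

Passing a LinkedHashMap keeps the parameters in insertion order. The result can be appended to the URL after ? for GET or written to the request body for POST.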
Firing an HTTP GET request with (optionally) query parameters
It's a trivial task. It's the default request method.
URLConnection connection = new URL(url + "?" + query).openConnection();
connection.setRequestProperty("Accept-Charset", charset);
InputStream response = connection.getInputStream();
// ...
Any query string should be concatenated to the URL using ?. The Accept-Charset header may hint to the server what encoding the parameters are in. If you don't send any query string, then you can leave out the Accept-Charset header. If you don't need to set any headers, then you can even use the URL#openStream() shortcut method.
InputStream response = new URL(url).openStream();
// ...
Either way, if the other side is an HttpServlet, then its doGet() method will be called and the parameters will be available by HttpServletRequest#getParameter().
For testing purposes, you can print the response body to standard output as below:
try (Scanner scanner = new Scanner(response)) {
String responseBody = scanner.useDelimiter("\\A").next();
System.out.println(responseBody);
}
Firing an HTTP POST request with query parameters
Setting the URLConnection#setDoOutput() to true implicitly sets the request method to POST. The standard HTTP POST as web forms do is of type application/x-www-form-urlencoded wherein the query string is written to the request body.
URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true); // Triggers POST.
connection.setRequestProperty("Accept-Charset", charset);
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=" + charset);
try (OutputStream output = connection.getOutputStream()) {
output.write(query.getBytes(charset));
}
InputStream response = connection.getInputStream();
// ...
Note: whenever you'd like to submit an HTML form programmatically, don't forget to take the name=value pairs of any <input type="hidden"> elements into the query string and of course also the name=value pair of the <input type="submit"> element which you'd like to "press" programmatically (because that's usually been used in the server side to distinguish if a button was pressed and if so, which one).
You can also cast the obtained URLConnection to HttpURLConnection and use its HttpURLConnection#setRequestMethod() instead. But if you're trying to use the connection for output you still need to set URLConnection#setDoOutput() to true.
HttpURLConnection httpConnection = (HttpURLConnection) new URL(url).openConnection();
httpConnection.setRequestMethod("POST");
// ...
Either way, if the other side is an HttpServlet, then its doPost() method will be called and the parameters will be available by HttpServletRequest#getParameter().
Actually firing the HTTP request
You can fire the HTTP request explicitly with URLConnection#connect(), but the request will automatically be fired on demand when you want to get any information about the HTTP response, such as the response body using URLConnection#getInputStream() and so on. The above examples do exactly that, so the connect() call is in fact superfluous.
Gathering HTTP response information
HTTP response status:
You need an HttpURLConnection here. Cast it first if necessary.
int status = httpConnection.getResponseCode();
HTTP response headers:
for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
System.out.println(header.getKey() + "=" + header.getValue());
}
HTTP response encoding:
When the Content-Type contains a charset parameter, then the response body is likely text based and we'd like to process the response body with the server-side specified character encoding then.
String contentType = connection.getHeaderField("Content-Type");
String charset = null;
for (String param : contentType.replace(" ", "").split(";")) {
if (param.startsWith("charset=")) {
charset = param.split("=", 2)[1];
break;
}
}
if (charset != null) {
try (BufferedReader reader = new BufferedReader(new InputStreamReader(response, charset))) {
for (String line; (line = reader.readLine()) != null;) {
// ... System.out.println(line)?
}
}
} else {
// It's likely binary content, use InputStream/OutputStream.
}
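The charset lookup above assumes that the Content-Type header is present and that the charset value is unquoted. A slightly more defensive variant could look like the sketch below. The class name ContentTypes and the method charsetOf are invented for this example, and it deliberately covers only the common cases (missing header, mixed-case parameter name, quoted value) rather than the full header grammar.

```java
// Hypothetical helper, not part of any library.
final class ContentTypes {

    // Returns the charset parameter of a Content-Type value, or null when absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null; // The server sent no Content-Type header at all.
        }
        for (String param : contentType.split(";")) {
            String trimmed = param.trim();
            // Parameter names are case-insensitive, so compare ignoring case.
            if (trimmed.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = trimmed.substring(8).trim();
                // Strip optional surrounding double quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```

It would be called as ContentTypes.charsetOf(connection.getHeaderField("Content-Type")), with a null result treated as "probably binary content" exactly as in the snippet above.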
Maintaining the session
The server side session is usually backed by a cookie. Some web forms require that you're logged in and/or are tracked by a session. You can use the CookieHandler API to maintain cookies. You need to prepare a CookieManager with a CookiePolicy of ACCEPT_ALL before sending all HTTP requests.
// First set the default cookie manager.
CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
// All the following subsequent URLConnections will use the same cookie manager.
URLConnection connection = new URL(url).openConnection();
// ...
connection = new URL(url).openConnection();
// ...
connection = new URL(url).openConnection();
// ...
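To get a feel for what the cookie manager does behind the scenes, you can feed it a response header by hand and ask which request header it would send back. This is only an offline sketch: the class CookieManagerDemo and its roundTrip method are made up for illustration, and no HTTP request is performed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of any library.
final class CookieManagerDemo {

    // Stores one Set-Cookie value for the URL, then returns the Cookie
    // request header values the manager would add for that same URL.
    static List<String> roundTrip(String url, String setCookieValue) {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        URI uri = URI.create(url);
        try {
            // What URLConnection does with the response headers.
            manager.put(uri, Collections.singletonMap(
                    "Set-Cookie", Collections.singletonList(setCookieValue)));
            // What URLConnection does before sending the next request.
            Map<String, List<String>> requestHeaders =
                    manager.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> cookies = requestHeaders.get("Cookie");
            return cookies == null ? Collections.<String>emptyList() : cookies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For a value such as "JSESSIONID=abc; Path=/; HttpOnly" the returned list should contain only the name=value pair, since the attributes are meant for the client and are not sent back.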
Note that this is known to not always work properly in all circumstances. If it fails for you, then it is best to manually gather and set the cookie headers. You basically need to grab all Set-Cookie headers from the response of the login or the first GET request and then pass them along with the subsequent requests.
// Gather all cookies on the first request.
URLConnection connection = new URL(url).openConnection();
List<String> cookies = connection.getHeaderFields().get("Set-Cookie");
// ...
// Then use the same cookies on all subsequent requests.
connection = new URL(url).openConnection();
for (String cookie : cookies) {
connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
}
// ...
The split(";", 2)[0] is there to get rid of cookie attributes which are irrelevant for the server side like expires, path, etc. Alternatively, you could also use cookie.substring(0, cookie.indexOf(';')) instead of split(), provided the value actually contains a ; (otherwise indexOf() returns -1 and substring() throws).
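The manual approach can be wrapped in a small helper that also copes with a missing Set-Cookie header and with cookies that carry no attributes at all. This is a sketch under those assumptions only. The names CookieHeaders and toCookieHeader are invented, and it does not honor expiry, path or domain rules.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of any library.
final class CookieHeaders {

    // Reduces Set-Cookie values to their name=value pairs and joins them
    // into a single Cookie header value separated by "; ".
    static String toCookieHeader(List<String> setCookieValues) {
        StringJoiner header = new StringJoiner("; ");
        if (setCookieValues != null) { // getHeaderFields().get(...) may return null.
            for (String value : setCookieValues) {
                int end = value.indexOf(';');
                // Without any ';' the whole value is the name=value pair.
                String pair = (end == -1 ? value : value.substring(0, end)).trim();
                if (!pair.isEmpty()) {
                    header.add(pair);
                }
            }
        }
        return header.toString();
    }
}
```

A subsequent request could then use connection.setRequestProperty("Cookie", CookieHeaders.toCookieHeader(cookies)), skipping the header when the result is empty.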
Streaming mode
The HttpURLConnection
will by default buffer the entire request body before actually sending it, regardless of whether you've set a fixed content length yourself using connection.setRequestProperty("Content-Length", contentLength);
. This may cause OutOfMemoryException
s whenever you concurrently send large POST requests (e.g. uploading files). To avoid this, you would like to set the HttpURLConnection#setFixedLengthStreamingMode()
.
httpConnection.setFixedLengthStreamingMode(contentLength);
But if the content length is really not known beforehand, then you can make use of chunked streaming mode by setting the HttpURLConnection#setChunkedStreamingMode() accordingly. This will set the HTTP Transfer-Encoding header to chunked which will force the request body to be sent in chunks. The below example will send the body in chunks of 1 KB.
httpConnection.setChunkedStreamingMode(1024);
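The choice between the two modes can be put in one place. The sketch below assumes a convention that is not part of the API: a negative length means "unknown". The class StreamingModes and its methods are hypothetical names. Note that the two modes are mutually exclusive, so once one of them is set, trying to set the other throws an IllegalStateException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of any library.
final class StreamingModes {

    // Opens (but does not yet connect) an HTTP connection to the given URL.
    static HttpURLConnection open(String url) {
        try {
            return (HttpURLConnection) URI.create(url).toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Picks a streaming mode; a negative contentLength means "unknown" here.
    static void configure(HttpURLConnection connection, long contentLength) {
        connection.setDoOutput(true);
        if (contentLength >= 0) {
            connection.setFixedLengthStreamingMode(contentLength);
        } else {
            // A value <= 0 lets the implementation pick its default chunk size.
            connection.setChunkedStreamingMode(0);
        }
    }
}
```

Both calls must happen before the connection is actually established, so configure() belongs right after openConnection() and before getOutputStream().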
User-Agent
It can happen that a request returns an unexpected response, while it works fine with a real web browser. The server side is probably blocking requests based on the User-Agent request header. The URLConnection will by default set it to Java/1.6.0_19 where the last part is obviously the JRE version. You can override this as follows:
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"); // Do as if you're using Chrome 41 on Windows 7.
Use the User-Agent string from a recent browser.
Error handling
If the HTTP response code is 4nn (Client Error) or 5nn (Server Error), then you may want to read the HttpURLConnection#getErrorStream() to see if the server has sent any useful error information.
InputStream error = ((HttpURLConnection) connection).getErrorStream();
If the HTTP response code is -1, then something went wrong with connection and response handling. The HttpURLConnection implementation in older JREs is somewhat buggy with keeping connections alive. You may want to turn it off by setting the http.keepAlive system property to false. You can do this programmatically in the beginning of your application by:
System.setProperty("http.keepAlive", "false");
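Reading either the normal or the error stream depending on the status code can be combined into one helper. The following is a sketch, not a definitive implementation: ResponseBodies, isError, readBody and readAll are names made up for this example, and the caller is assumed to know the charset already.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
final class ResponseBodies {

    // True for 4nn (Client Error) and 5nn (Server Error) status codes.
    static boolean isError(int status) {
        return status >= 400 && status <= 599;
    }

    // Reads the body from the error stream on 4nn/5nn, else from the input stream.
    static String readBody(HttpURLConnection connection, Charset charset) {
        try {
            InputStream stream = isError(connection.getResponseCode())
                    ? connection.getErrorStream()
                    : connection.getInputStream();
            // getErrorStream() returns null when the server sent no error body.
            return stream == null ? "" : readAll(stream, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drains and closes the stream, decoding the bytes with the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            for (int n; (n = stream.read(chunk)) != -1;) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Draining the error stream completely, as readAll() does, also helps the connection to be reused when keep-alive is left enabled.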
Uploading files
You'd normally use multipart/form-data encoding for mixed POST content (binary and character data). The encoding is described in more detail in RFC 2388 (since superseded by RFC 7578).
String param = "value";
File textFile = new File("/path/to/file.txt");
File binaryFile = new File("/path/to/file.bin");
String boundary = Long.toHexString(System.currentTimeMillis()); // Just generate some unique random value.
String CRLF = "\r\n"; // Line separator required by multipart/form-data.
URLConnection connection = new URL(url).openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
try (
OutputStream output = connection.getOutputStream();
PrintWriter writer = new PrintWriter(new OutputStreamWriter(output, charset), true);
) {
// Send normal param.
writer.append("--" + boundary).append(CRLF);
writer.append("Content-Disposition: form-data; name=\"param\"").append(CRLF);
writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF);
writer.append(CRLF).append(param).append(CRLF).flush();
// Send text file.
writer.append("--" + boundary).append(CRLF);
writer.append("Content-Disposition: form-data; name=\"textFile\"; filename=\"" + textFile.getName() + "\"").append(CRLF);
writer.append("Content-Type: text/plain; charset=" + charset).append(CRLF); // Text file itself must be saved in this charset!
writer.append(CRLF).flush();
Files.copy(textFile.toPath(), output);
output.flush(); // Important before continuing with writer!
writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.
// Send binary file.
writer.append("--" + boundary).append(CRLF);
writer.append("Content-Disposition: form-data; name=\"binaryFile\"; filename=\"" + binaryFile.getName() + "\"").append(CRLF);
writer.append("Content-Type: " + URLConnection.guessContentTypeFromName(binaryFile.getName())).append(CRLF);
writer.append("Content-Transfer-Encoding: binary").append(CRLF);
writer.append(CRLF).flush();
Files.copy(binaryFile.toPath(), output);
output.flush(); // Important before continuing with writer!
writer.append(CRLF).flush(); // CRLF is important! It indicates end of boundary.
// End of multipart/form-data.
writer.append("--" + boundary + "--").append(CRLF).flush();
}
If the other side is an HttpServlet, then its doPost() method will be called and the parts will be available by HttpServletRequest#getPart() (note, thus not getParameter() and so on!). The getPart() method is however relatively new; it was introduced in Servlet 3.0 (Glassfish 3, Tomcat 7, etc.). Prior to Servlet 3.0, your best choice is using Apache Commons FileUpload to parse a multipart/form-data request. Also see this answer for examples of both the FileUpload and the Servlet 3.0 approaches.
Dealing with untrusted or misconfigured HTTPS sites
In case you're developing for Android instead of Java, be careful: the workaround below may save your day if you don't have correct certificates deployed during development. But you should not use it for production. These days (April 2021) Google will not allow your app to be distributed on the Play Store if they detect an insecure hostname verifier, see https://support.google.com/faqs/answer/7188426.
Sometimes you need to connect to an HTTPS URL, perhaps because you're writing a web scraper. In that case, you may likely face a javax.net.ssl.SSLException: Not trusted server certificate on some HTTPS sites that don't keep their SSL certificates up to date, or a java.security.cert.CertificateException: No subject alternative DNS name matching [hostname] found or javax.net.ssl.SSLProtocolException: handshake alert: unrecognized_name on some misconfigured HTTPS sites.
The following one-time-run static initializer in your web scraper class should make HttpsURLConnection more lenient as to those HTTPS sites and thus not throw those exceptions anymore.
static {
TrustManager[] trustAllCertificates = new TrustManager[] {
new X509TrustManager() {
@Override
public X509Certificate[] getAcceptedIssuers() {
return null; // Not relevant.
}
@Override
public void checkClientTrusted(X509Certificate[] certs, String authType) {
// Do nothing. Just allow them all.
}
@Override
public void checkServerTrusted(X509Certificate[] certs, String authType) {
// Do nothing. Just allow them all.
}
}
};
HostnameVerifier trustAllHostnames = new HostnameVerifier() {
@Override
public boolean verify(String hostname, SSLSession session) {
return true; // Just allow them all.
}
};
try {
System.setProperty("jsse.enableSNIExtension", "false");
SSLContext sc = SSLContext.getInstance("SSL");
sc.init(null, trustAllCertificates, new SecureRandom());
HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
HttpsURLConnection.setDefaultHostnameVerifier(trustAllHostnames);
}
catch (GeneralSecurityException e) {
throw new ExceptionInInitializerError(e);
}
}
Last words
The Apache HttpComponents HttpClient is much more convenient in this all :)
Parsing and extracting HTML
If all you want is parsing and extracting data from HTML, then better use an HTML parser like Jsoup.
Best Solution
What you're really asking is: Is there any way to validate the "WWW-Authenticate: NTLM" tokens submitted by IE and other HTTP clients when doing Single Sign-On (SSO)? SSO is when the user enters their password a "single" time when they do Ctrl-Alt-Del and the workstation remembers and uses it as necessary to transparently access other resources without prompting the user for a password again.
Note that Kerberos, like NTLM, can also be used to implement SSO authentication. When presented with a "WWW-Authenticate: Negotiate" header, IE and other browsers will send SPNEGO wrapped Kerberos and / or NTLM tokens. More on this later but first I will answer the question as asked.
The only way to validate an NTLMSSP password "response" (like the ones encoded in "WWW-Authenticate: NTLM" headers submitted by IE and other browsers) is with a NetrLogonSamLogon(Ex) DCERPC call with the NETLOGON service of an Active Directory domain controller that is an authority for, or has a "trust" with an authority for, the target account. Additionally, to properly secure the NETLOGON communication, Secure Channel encryption should be used and is required as of Windows Server 2008.
Needless to say, there are very few packages that implement the necessary NETLOGON service calls. The only ones I'm aware of are:
Windows (of course)
Samba - Samba is a set of software programs for UNIX that implements a number of Windows protocols including the necessary NETLOGON service calls. In fact, Samba 3 has a special daemon for this called "winbind" that other programs like PAM and Apache modules can (and do) interface with. On a Red Hat system you can do a yum install samba-winbind and yum install mod_auth_ntlm_winbind. But that's the easy part - setting these things up is another story.
Jespa - Jespa (http://www.ioplex.com/jespa.html) is a 100% Java library that implements all of the necessary NETLOGON service calls. It also provides implementations of standard Java interfaces for authenticating clients in various ways such as with an HTTP Servlet Filter, SASL server, JAAS LoginModule, etc.
Beware that there are a number of NTLM authentication acceptors that do not implement the necessary NETLOGON service calls but instead do something else that ultimately leads to failure in one scenario or another. For example, for years, the way to do this in Java was with the NTLM HTTP authentication Servlet Filter from a project called JCIFS. But that Filter uses a man-in-the-middle technique that has been responsible for a long-standing "hiccup bug" and, more important, it does not support NTLMv2. For these reasons and others it is scheduled to be removed from JCIFS. There are several projects that have been unintentionally inspired by that package that are now also equally doomed. There are also a lot of code fragments posted in Java forums that decode the header token and pluck out the domain and username but do absolutely nothing to actually validate the password responses. Suffice it to say, if you use one of those code fragments, you might as well walk around with your pants down.
As I alluded to earlier, NTLM is only one of several Windows Security Support Providers (SSP). There's also a Digest SSP, Kerberos SSP, etc. But the Negotiate SSP, which is also known as SPNEGO, is usually the provider that MS uses in their own protocol clients. The Negotiate SSP actually just negotiates either the NTLM SSP or Kerberos SSP. Note that Kerberos can only be used if both the server and client have accounts in the target domain and the client can communicate with the domain controller sufficiently to acquire a Kerberos ticket. If these conditions are not satisfied, the NTLM SSP is used directly. So NTLM is by no means obsolete.
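For diagnostics it can be handy to see whether a client actually sent a raw NTLMSSP token. The sketch below only classifies the token found in the Authorization request header; it validates nothing and must not be mistaken for authentication. It relies on the fact that NTLMSSP messages start with the 8-byte signature "NTLMSSP\0" followed by a little-endian 32-bit message type. The class AuthTokens and the method ntlmMessageType are names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical diagnostic helper, not part of any library. It does NOT
// validate credentials; it only looks at the first 12 bytes of the token.
final class AuthTokens {

    private static final byte[] NTLMSSP = "NTLMSSP\0".getBytes(StandardCharsets.US_ASCII);

    // Returns the NTLM message type (1, 2 or 3 in practice) when the header
    // carries a raw NTLMSSP token, or -1 for anything else.
    static int ntlmMessageType(String authorizationHeader) {
        if (authorizationHeader == null) {
            return -1;
        }
        String[] parts = authorizationHeader.trim().split("\\s+", 2);
        if (parts.length != 2) {
            return -1;
        }
        String scheme = parts[0];
        if (!scheme.equalsIgnoreCase("NTLM") && !scheme.equalsIgnoreCase("Negotiate")) {
            return -1;
        }
        byte[] token;
        try {
            token = Base64.getDecoder().decode(parts[1].trim());
        } catch (IllegalArgumentException e) {
            return -1; // Not valid Base64.
        }
        if (token.length < 12) {
            return -1;
        }
        for (int i = 0; i < NTLMSSP.length; i++) {
            if (token[i] != NTLMSSP[i]) {
                return -1; // Some other token, e.g. SPNEGO-wrapped.
            }
        }
        // Bytes 8..11 hold the message type as a little-endian integer.
        return (token[8] & 0xFF)
                | (token[9] & 0xFF) << 8
                | (token[10] & 0xFF) << 16
                | (token[11] & 0xFF) << 24;
    }
}
```

A result of -1 under the Negotiate scheme merely means the token is something other than raw NTLMSSP. Either way the password response still has to be validated by one of the NETLOGON-capable packages listed above.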
Finally, some people have mentioned using an LDAP "simple bind" as a makeshift password validation service. LDAP is not really designed as an authentication service and for this reason it is not efficient. It is also not possible to implement SSO using LDAP. SSO requires NTLM or SPNEGO. If you can find a NETLOGON or SPNEGO acceptor, you should use that instead.
Mike