Using UTF-8 encoding with application/x-www-form-urlencoded

I ran into a problem trying to use application/x-www-form-urlencoded to submit a POST request to Facebook. Everything was working fine with ordinary English characters, but when I tried Thai characters, the post that was published to Facebook came out as garbled text.

URLConnection connection = new URL("https://graph.facebook.com/" + userModel.getFbId() +"/feed").openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
connection.setRequestProperty("Content-Length", Integer.toString(content.length()));

DataOutputStream out = new DataOutputStream(connection.getOutputStream());
// writeBytes() sends only the low byte of each char, so content must already be plain ASCII
out.writeBytes(content);
out.flush();
out.close();

BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) { 
// System.out.println(inputLine);
}

To solve the problem, I googled and found this passage in the HTML 4.01 specification (section 17.13.4, Form content types):

application/x-www-form-urlencoded  

This is the default content type. Forms submitted with this content type must be encoded as follows:

  1. Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').
  2. The control names/values are listed in the order they appear in the document. The name is separated from the value by '=' and name/value pairs are separated from each other by '&'.

And that’s it. It turns out that setting charset=utf-8 in the Content-Type header is not enough; I have to percent-encode my String myself.

So I used URLEncoder.encode(<my string>, "UTF-8") to convert characters like "ภาษาไทย" to "%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2", and everything is working fine now.
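Here is a minimal, standalone sketch of that encoding step, in case it helps. The class and helper names (EncodeDemo, encodeUtf8, decodeUtf8) are made up for illustration; only URLEncoder and URLDecoder come from the JDK. The Thai text is written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class EncodeDemo {
    // Hypothetical helper: percent-encode one form value as UTF-8.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverse the encoding, to check the round trip.
    static String decodeUtf8(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The Thai word from the post, written as Unicode escapes.
        String thai = "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22";

        String encoded = encodeUtf8(thai);
        System.out.println(encoded);

        // Spaces become '+', and '&' and '=' are escaped so they cannot
        // be mistaken for field separators.
        System.out.println(encodeUtf8("a b&c=d"));

        // Decoding gives back the original text.
        System.out.println(decodeUtf8(encoded).equals(thai));
    }
}
```

Each Thai character takes three bytes in UTF-8, which is why every character turns into three %HH groups.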

Facebook now reads my weird-looking encoded characters and interprets them correctly 🙂
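For completeness, here is a rough sketch of how the whole content string could be assembled following the two rules quoted earlier. It is only an illustration: the FormBody class, the field names "message" and "access_token", and the placeholder token value are assumptions, not taken from my actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBody {
    // Hypothetical helper: joins name=value pairs with '&', percent-encoding
    // both names and values as UTF-8. Pairs come out in the map's iteration order.
    static String build(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, matching "listed in the order they appear".
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("message", "\u0E20\u0E32\u0E29\u0E32\u0E44\u0E17\u0E22"); // assumed field name
        fields.put("access_token", "PLACEHOLDER_TOKEN");                      // placeholder value

        String content = build(fields);
        byte[] payload = content.getBytes(StandardCharsets.UTF_8);

        System.out.println(content);
        // The encoded body is pure ASCII, so the byte count equals content.length().
        System.out.println(payload.length == content.length());
    }
}
```

Because the encoded body contains only ASCII characters, content.length() matches the real byte count and writeBytes(content) sends it intact, which is presumably why the original connection code works once the string is encoded.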