Monday, September 18, 2023

3 ways to convert String to byte array in Java - Example Tutorial

Today, I am going to discuss one of the most common tasks for programmers: converting a String to a byte array. You need to do that for many reasons, such as saving content to a file or sending it over a network. Suppose you have a string "abcd" and you want to convert it into a byte array; how will you do that in a Java program? Remember, a String is a sequence of characters, so the conversion involves a character-to-byte step, which is subject to character encoding intricacies.

Thankfully, Java provides a convenient getBytes() method to convert a String to a byte array, but unfortunately, many developers don't use it correctly. Almost 70% of the code I have reviewed calls getBytes() without a character encoding, leaving it to chance that the platform's default character encoding is the encoding the receiving side (a file reader, another program, a remote service) expects.

The right way to use getBytes() is always with an explicit character encoding, as shown in this article. Java even provides constants for the standard character encodings, which every Java platform must support, in the StandardCharsets class, and we will review them as well.

It's also good practice to use these predefined constants for specifying character encoding in your code instead of free-text Strings, to avoid typos and other silly mistakes.
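To see why constants are safer, here is a small sketch (the class and method names are just for illustration). A misspelled encoding name such as "UTF-9" compiles without complaint and only fails at runtime with UnsupportedEncodingException, whereas a misspelled constant like StandardCharsets.UTF_9 would not compile at all.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class CharsetConstantDemo {

    // Returns true if String.getBytes(String) rejects the given encoding name
    static boolean isRejected(String charsetName) {
        try {
            "abcd".getBytes(charsetName);
            return false;
        } catch (UnsupportedEncodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // "UTF-9" is a deliberate typo: it compiles fine and only fails at runtime
        System.out.println("UTF-9 rejected? " + isRejected("UTF-9"));
        System.out.println("UTF-8 rejected? " + isRejected("UTF-8"));

        // With a constant, a typo such as StandardCharsets.UTF_9 is a compile-time error
        byte[] bytes = "abcd".getBytes(StandardCharsets.UTF_8);
        System.out.println("length : " + bytes.length);
    }
}
```

Running it should report that "UTF-9" is rejected, "UTF-8" is accepted, and a byte array length of 4.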

In the past, I have shown you how to convert a byte array to a String in Java; in this article, I will show you three common ways to convert a String to a byte array. Let's start with the most popular one.




String to a byte array using getBytes()

This is the most common way to convert a String into a byte array. It works most of the time, but it's error-prone and can produce a wrong result if the platform's default character encoding doesn't match the expected encoding.

Here is an example of converting String to byte[] in Java :

import java.util.Arrays;

// converts String to bytes using the platform's default character encoding;
// on Windows (e.g. when running from Eclipse) it is often Cp1252,
// on Linux it is usually UTF-8, but it could be something else
byte[] ascii = "abcdefgh".getBytes(); 

System.out.println("platform's default character encoding : "
                     + System.getProperty("file.encoding"));
System.out.println("length of byte array in default encoding : "
                     + ascii.length);
System.out.println("contents of byte array in default encoding: "
                     + Arrays.toString(ascii));

Output :
platform's default character encoding : Cp1252
length of byte array in default encoding : 8
contents of byte array in default encoding: [97, 98, 99, 100, 
                                               101, 102, 103, 104]

Remark : 
1. The platform's default encoding is used to convert characters to bytes if you don't specify any character encoding.

2. You can see the platform's default character encoding with System.getProperty("file.encoding"), which returns the default character encoding of the machine your JVM is running on; Charset.defaultCharset() returns it as a Charset object. You can also see these free Java programming courses for more details on this topic.

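3. Beware: your code may work in one environment, e.g. QA, but fail in production because the default character encoding differs. That's why you should not rely on the default character encoding. Since Java 18 (JEP 400) the default is UTF-8 on all platforms, but code that also has to run on older JVMs still needs an explicit encoding.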

4. The length of the byte array may differ from the length of the String; it depends on the character encoding. Some encodings use more than one byte per character: UTF-8, ISO-8859-1 and Cp1252 encode ASCII characters in 1 byte each, but UTF-16 uses at least 2 bytes even for ASCII.
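Here is a short sketch illustrating remarks 3 and 4 with the string "café" (the class and helper names are hypothetical). It prints the JVM's default charset, shows that the same 4-character String becomes 4 bytes in ISO-8859-1 but 5 bytes in UTF-8, and shows what happens when bytes written as UTF-8 are read back as ISO-8859-1, the kind of mismatch two machines with different default encodings can produce.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultEncodingDemo {

    // Encodes the text with the given charset and returns how many bytes it produced
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Encodes with one charset but decodes with another, which is what happens
    // when the writer and the reader rely on different default encodings
    static String decodeWithWrongCharset(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent: 4 chars, last one non-ASCII

        System.out.println("default charset  : " + Charset.defaultCharset());
        System.out.println("String length    : " + text.length());
        System.out.println("ISO-8859-1 bytes : " + byteLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 bytes      : " + byteLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-8 contents   : "
                + Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("UTF-8 read as ISO-8859-1 : "
                + decodeWithWrongCharset(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

You should see 4 and 5 bytes, and the mismatched text comes out as "cafÃ©" instead of "café" (how it looks on screen also depends on your console's own encoding).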




String to byte array using getBytes("encoding")

Here is another way to convert a String to a byte array, but this time by specifying the encoding explicitly, leaving no room for guesswork or the platform default.

import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// convert String to bytes of the specified character encoding, but
// this overload also throws the checked UnsupportedEncodingException,
// which pollutes the code
try {
    byte[] utf16 = "abcdefgh".getBytes("UTF-16");
    System.out.println("length of byte array in UTF-16 character encoding : "
            + utf16.length);
    System.out.println("contents of byte array in UTF-16 encoding: "
            + Arrays.toString(utf16));
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

Output :
length of byte array in UTF-16 character encoding : 18
contents of byte array in UTF-16 encoding: [-2, -1, 0, 97, 
0, 98, 0, 99, 0, 100, 0, 101, 0, 102, 0, 103, 0, 104]

Remark :
1. It's better than the previous approach, but it throws the checked exception java.io.UnsupportedEncodingException if the character encoding name has a typo or names an encoding not supported by the JVM.

2. The returned byte array contains the String encoded in the specified character encoding.

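3. You can see that the length of the byte array is not the same as the number of characters in the String, unlike in the previous example, because UTF-16 takes at least 2 bytes to encode a character. Java's "UTF-16" encoder also writes a 2-byte byte order mark first (the leading -2, -1, i.e. 0xFE 0xFF), so 8 characters become 2 + 8 × 2 = 18 bytes.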
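If you don't want the byte order mark, you can ask for a specific byte order instead. This sketch (the class and helper names are mine) compares the three UTF-16 constants in StandardCharsets on the same string. I'd expect 18 bytes for UTF-16 and 16 bytes for UTF-16BE and UTF-16LE, with the big-endian and little-endian arrays holding the same bytes, swapped within each 2-byte pair.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Demo {

    // Thin wrapper so each variant is encoded the same way
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "abcdefgh";

        // "UTF-16": Java's encoder writes a big-endian byte order mark (0xFE 0xFF) first
        byte[] withBom = encode(text, StandardCharsets.UTF_16);
        // UTF-16BE / UTF-16LE: byte order is fixed by the name, so no BOM is written
        byte[] bigEndian = encode(text, StandardCharsets.UTF_16BE);
        byte[] littleEndian = encode(text, StandardCharsets.UTF_16LE);

        System.out.println("UTF-16   (" + withBom.length + ") " + Arrays.toString(withBom));
        System.out.println("UTF-16BE (" + bigEndian.length + ") " + Arrays.toString(bigEndian));
        System.out.println("UTF-16LE (" + littleEndian.length + ") " + Arrays.toString(littleEndian));
    }
}
```

Whichever variant you use for encoding, the side that decodes the bytes has to use a matching one.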




String to a byte array using getBytes(Charset)

This is the third and probably the best way to convert a String to byte[] in Java. In this example, I have used java.nio.charset.StandardCharsets to specify the character encoding. This class contains constants for the six character encodings every Java platform is required to support: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16.

A good thing about this approach is that it doesn't throw the checked java.io.UnsupportedEncodingException. Unfortunately, the StandardCharsets class is only available from JDK 7 onward, so it is not an option for Java applications still running on Java 6 or lower.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// return bytes in UTF-8 character encoding
// pros - no need to handle UnsupportedEncodingException
// pros - bytes in specified encoding scheme
byte[] utf8 = "abcdefgh".getBytes(StandardCharsets.UTF_8); 
System.out.println("length of byte array in UTF-8 : " + utf8.length);
System.out.println("contents of byte array in UTF-8: " 
                      + Arrays.toString(utf8));

Output : 
length of byte array in UTF-8 : 8
contents of byte array in UTF-8: [97, 98, 99, 100, 101, 102, 103, 104]

Remarks :
1. This is the best way to convert String to a byte array in Java.

2. It doesn't throw java.io.UnsupportedEncodingException, which means no boilerplate code for handling this checked exception.

3. Keep in mind, though, that the StandardCharsets class is only available from Java 7 onward. You can also see these Java beginner courses for more such details, which also cover Java SE 8.
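If you are stuck on Java 6, a minimal workaround, sketched below under that assumption, is to look the charset up once with Charset.forName("UTF-8") and pass it to getBytes(Charset), which has existed since Java 6. Charset.forName() can still fail, but with unchecked exceptions (UnsupportedCharsetException or IllegalCharsetNameException), so no try-catch boilerplate is forced on you. The class and constant names here are illustrative.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Java6CharsetDemo {

    // Looked up once; on Java 6 this stands in for StandardCharsets.UTF_8.
    // Charset.forName throws only unchecked exceptions, so no try-catch is needed
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    static byte[] toUtf8(String text) {
        return text.getBytes(UTF_8); // String.getBytes(Charset) exists since Java 6
    }

    public static void main(String[] args) {
        byte[] utf8 = toUtf8("abcdefgh");
        System.out.println("length of byte array in UTF-8 : " + utf8.length);
        System.out.println("contents of byte array in UTF-8: " + Arrays.toString(utf8));
    }
}
```

It should print the same 8 bytes as the StandardCharsets version above.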



That's all about how to convert a String to a byte array in Java. Remember, the size of the byte array can be larger than the length of the String, because one character is not necessarily encoded in one byte; it all depends on the character encoding.

For example, UTF-8 is a multi-byte character encoding scheme that uses between 1 and 4 bytes per character. Characters in the ASCII range take 1 byte, characters from the ISO-8859-1 (Latin-1) range beyond ASCII, such as é, take 2 bytes, the rest of the Basic Multilingual Plane takes 3 bytes, and characters outside it, such as emoji, take 4 bytes.
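To make those numbers concrete, here is a small sketch (names are illustrative) that prints the String length and the UTF-8 byte count for one character from each range. Note that a character outside the Basic Multilingual Plane, such as the emoji U+1F600, is stored in Java as a surrogate pair, so String.length() reports 2 even though it is a single character.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {

    // Number of bytes needed to encode the text in UTF-8
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // One sample character per UTF-8 length class, written as Unicode escapes
        String[] samples = {
            "A",            // U+0041, ASCII
            "\u00e9",       // U+00E9, e with acute accent (Latin-1 beyond ASCII)
            "\u20ac",       // U+20AC, euro sign (rest of the Basic Multilingual Plane)
            "\ud83d\ude00"  // U+1F600, grinning face emoji (a surrogate pair in Java)
        };
        for (String s : samples) {
            System.out.printf("U+%04X: String length %d, UTF-8 bytes %d%n",
                    s.codePointAt(0), s.length(), utf8Length(s));
        }
    }
}
```

It should print 1, 2, 3 and 4 UTF-8 bytes respectively, with a String length of 2 only for the emoji.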



Thanks for reading this article. If you like this tutorial, please share it with your friends. If you have any questions or feedback, please drop us a note.

Now it's quiz time: what would be the size of the byte array if you convert the String "Java" into a byte array?
