Friday, August 18, 2023

How to Convert Byte array to String in Java with Example

There are multiple ways to convert a byte array to a String in Java, but the most straightforward is the String constructor that accepts a byte array, i.e. new String(byte[]). The key thing to remember is character encoding. Since bytes are binary data and a String is character data, it's very important to know the character encoding that was used when the byte array was created from the original text. If you decode with a different character encoding, you may not get the original String back. For example, if you read the byte array from a file encoded in "ISO-8859-1" and you don't specify any character encoding when calling new String(byte[]), it's not guaranteed that you will get the same text back. Why? Because new String(byte[]) uses the platform's default encoding (determined by the machine and JVM settings where your program runs, e.g. a Linux server), which could be different from "ISO-8859-1".
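To make the pitfall concrete, here is a minimal sketch. The class name, the helper method, and the sample bytes are my own illustration: the bytes spell "café" in ISO-8859-1, where é is the single byte 0xE9.

```java
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {

    // Decodes with an explicit charset, so the result does not depend on the platform.
    static String decodeLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "café" encoded in ISO-8859-1: c, a, f, then 0xE9 for é
        byte[] latin1 = { 99, 97, 102, (byte) 0xE9 };

        // Platform-dependent: uses whatever default charset the JVM picked up
        String implicit = new String(latin1);

        // Platform-independent: always decodes 0xE9 as é
        String explicit = decodeLatin1(latin1);

        System.out.println("new String(bytes)       : " + implicit);
        System.out.println("new String(bytes, cs)   : " + explicit);
        System.out.println("same text on this JVM?  : " + implicit.equals(explicit));
    }
}
```

On a JVM whose default encoding is UTF-8, the last line should report false, because a lone 0xE9 is not valid UTF-8.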

If it's different, you may see garbage characters or even different characters that change the meaning of the text completely. I am not saying this from reading a few books; I have faced this issue in one of my projects, where we were reading data from a database that contained some French characters.

In the absence of a specified encoding, our platform defaulted to one that could not convert all those special characters properly; I don't remember the exact encoding. The issue was solved by providing "UTF-8" as the character encoding while converting the byte array to String. Yes, there is another overloaded constructor in the String class that accepts a character encoding, i.e. new String(byte[], "character encoding").
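A small sketch of that overloaded constructor follows. The class and method names are hypothetical, and the sample bytes are the UTF-8 form of the French word "été". Besides the variant that takes the encoding as a name, String also has a variant that takes a java.nio.charset.Charset and does not throw a checked exception.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Variant 1: charset given by name; an unknown name raises a checked exception,
    // which this sketch converts to an unchecked one.
    static String decodeByName(byte[] data, String charsetName) {
        try {
            return new String(data, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Variant 2: charset given as a constant; no checked exception involved.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "été" in UTF-8: each é takes two bytes (0xC3 0xA9), t is the single byte 116
        byte[] utf8Bytes = { (byte) 0xC3, (byte) 0xA9, 116, (byte) 0xC3, (byte) 0xA9 };

        System.out.println(decodeByName(utf8Bytes, "UTF-8"));
        System.out.println(decodeUtf8(utf8Bytes));
    }
}
```

The Charset variant is usually the more convenient one when the encoding is known at compile time, since a typo in the name cannot slip through to runtime.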

BTW, if you are new to the world of character encoding and don't understand what UTF-8 or UTF-16 is, I recommend reading my article on the difference between UTF-8, UTF-16, and UTF-32 encoding. It not only explains the difference but also gives you a basic idea of character encoding.

Another article I recommend is about how Java deals with default character encoding. Since many classes that convert between bytes and characters cache the character encoding, it's important to learn how to provide the proper encoding at the JVM level. If this interests you, that article covers it in full.
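If you just want to check which default your JVM has picked up, a minimal sketch like the following can help. The class and method names are hypothetical; Charset.defaultCharset() and the file.encoding system property are standard.

```java
import java.nio.charset.Charset;

public class DefaultCharsetInfo {

    // Name of the charset the JVM falls back to when none is specified.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset        : " + defaultCharsetName());
        // The property the default is commonly derived from; its exact role
        // depends on the Java version
        System.out.println("file.encoding property : " + System.getProperty("file.encoding"));
    }
}
```

The default is commonly set at startup with a flag such as java -Dfile.encoding=UTF-8 DefaultCharsetInfo, though how that property is honored varies between Java versions, so treat it as something to verify on your own JVM.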




How to convert byte array to String in Java? Example

Everything is 0s and 1s in the computer world, yet we are able to see different things, e.g. text, images, music files, etc. The key to converting a byte array to String is character encoding. In simple words, byte values are numeric values, and a character encoding is a map that provides a character for a particular byte or sequence of bytes.

For example, in most character encoding schemes, e.g. UTF-8, if the value of the byte is 65, the character is A; for 66 it's B. Since ASCII characters, which include digits, letters, and some special characters, are very popular, they have the same values in most encoding schemes. But that's not true for every byte value; for example, -10 is decoded differently in the UTF-8 and Windows-1252 encoding schemes.
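The following sketch decodes single bytes under two encodings to show this. The class and helper names are hypothetical, and it assumes your JDK ships the windows-1252 charset, which standard JDK builds do. It prints code points rather than the characters themselves, so the result does not depend on your console's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteValueMapping {

    // Decodes a single byte under the given charset.
    static String decode(byte value, Charset charset) {
        return new String(new byte[] { value }, charset);
    }

    // Formats the first code point of a string as U+XXXX.
    static String codePoint(String s) {
        return String.format("U+%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");

        // 65 is ASCII, so both encodings agree on A (U+0041)
        System.out.println("65  in UTF-8        : " + codePoint(decode((byte) 65, StandardCharsets.UTF_8)));
        System.out.println("65  in Windows-1252 : " + codePoint(decode((byte) 65, windows1252)));

        // -10 is the byte 0xF6: ö (U+00F6) in Windows-1252, but not a valid
        // UTF-8 sequence, so the decoder substitutes U+FFFD
        System.out.println("-10 in UTF-8        : " + codePoint(decode((byte) -10, StandardCharsets.UTF_8)));
        System.out.println("-10 in Windows-1252 : " + codePoint(decode((byte) -10, windows1252)));
    }
}
```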

You can also read Core Java Volume 1 - Fundamentals by Cay S. Horstmann to learn more about converting bytes to characters in Java. It also covers Java SE 8 and is one of the most up-to-date books on the market at the moment.

Now someone may object that, since a byte has 8 bits, it can only represent 256 distinct values, which is far too few given how many languages there are in the world. That's why we have multi-byte character encoding schemes, which use more than one byte per character and can therefore represent many more characters.
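A quick way to see multi-byte encoding at work is to count how many bytes UTF-8 needs for different characters. This is a minimal sketch; the class name, helper, and sample characters are my own choices, and the characters are written as Unicode escapes so the source file's own encoding doesn't matter.

```java
import java.nio.charset.StandardCharsets;

public class MultiByteDemo {

    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("A          -> " + utf8Length("A") + " byte(s)");            // ASCII letter
        System.out.println("e-acute    -> " + utf8Length("\u00e9") + " byte(s)");       // é
        System.out.println("euro sign  -> " + utf8Length("\u20ac") + " byte(s)");       // €
        System.out.println("emoji      -> " + utf8Length("\uD83D\uDE00") + " byte(s)"); // U+1F600
    }
}
```

UTF-8 uses one byte for A, two for é, three for €, and four for the emoji, which is how it covers far more than 256 characters.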

Why do we need to convert bytes to String? One real-world example is displaying binary data as text, e.g. as a Base64-encoded or hex String, as shown in that tutorial.
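Both conversions can be sketched in a few lines. The class and method names here are hypothetical; java.util.Base64 is available from Java 8 onward, and the sample bytes are the 0xCAFEBABE magic number that starts every Java class file.

```java
import java.util.Base64;

public class BinaryAsText {

    // Renders each byte as two lowercase hex digits.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            // Mask to 0..255 so negative bytes don't sign-extend
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // Renders the bytes as standard Base64 text.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] magic = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };

        System.out.println("hex    : " + toHex(magic));    // cafebabe
        System.out.println("base64 : " + toBase64(magic)); // yv66vg==
    }
}
```

Note that neither conversion involves a character encoding choice: hex and Base64 output consists only of ASCII characters, so the text is safe to display anywhere.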

Java Byte Array to String Example

Now that we know a little theory about converting a byte array to String, let's see a working example. To keep the example simple, I have created the byte array in the program itself and then converted it into a String using different character encoding names: Cp1252, which is the default character encoding in Eclipse on Windows; Windows-1252, the standard name of that same Windows encoding (Cp1252 is Java's alias for it); and UTF-8, the most widely used character encoding in the world.

If you run this program and look at the output, you will notice that the first eight characters are the same in all three cases, because they are ASCII characters, but the last byte (-20) is rendered differently: UTF-8 cannot decode it on its own, while Cp1252 and Windows-1252, being the same encoding, both map it to ì. This is where using an incorrect character encoding can create trouble.

The rest of the example is pretty straightforward, as we already have a byte array and are just using the overloaded String constructor that also accepts an encoding. For a more complex example, where we read content from an XML file, see this tutorial. ASCII also has both printable and non-printable characters, which may be handled differently by different character encodings.

import java.io.UnsupportedEncodingException;

public class ByteArrayToStringDemo {

    public static void main(String[] args) throws UnsupportedEncodingException {

        // ASCII bytes for "CAFEBABE" followed by one non-ASCII byte (-20 is 0xEC)
        byte[] random = new byte[] { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        // Decode the same bytes under three charset names
        String utf = new String(random, "UTF-8");
        String cp1252 = new String(random, "Cp1252");
        String windows1252 = new String(random, "Windows-1252");

        System.out.println("String created from byte array in UTF-8 encoding : "
                + utf);
        System.out.println("byte array to String in Cp1252 encoding : "
                + cp1252);
        System.out.println("byte array to String in Windows-1252 encoding : "
                + windows1252);

    }

}

Output :
String created from byte array in UTF-8 encoding : CAFEBABE?
byte array to String in Cp1252 encoding : CAFEBABEì
byte array to String in Windows-1252 encoding : CAFEBABEì
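
The ? at the end of the UTF-8 line appears because the String constructor silently replaces bytes it cannot decode with the replacement character U+FFFD, which many consoles show as ?. If you would rather get an error than a silently altered String, one option is a strict CharsetDecoder. This is a sketch with hypothetical class and method names, not the only way to do it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Decoder {

    // Decodes UTF-8 and throws instead of substituting U+FFFD for bad input.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    // Convenience check: true only if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            decodeStrict(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Same bytes as in the example above
        byte[] random = { 67, 65, 70, 69, 66, 65, 66, 69, -20 };

        System.out.println("valid UTF-8? " + isValidUtf8(random));
    }
}
```

For the byte array from the example this reports false, because the trailing 0xEC starts a multi-byte sequence that is never completed.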


That's all about how to convert a byte array to String in Java. Always provide a character encoding while converting bytes to characters, and it should be the same encoding that was used for the original text. If you don't know it, UTF-8 is a good default, but don't rely on the platform's default character encoding, because that is subject to change and might not be UTF-8. A better option is to set the character encoding for your application at the JVM level, so that you have complete control over how byte arrays get converted to String.

