Compression and Decompression of String Data in Java

Compression and decompression of string data in Java

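This happens because of the following line, which turns the compressed bytes into a string. GZIP output is arbitrary binary data rather than valid UTF-8 text, so the conversion corrupts it: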

String outStr = obj.toString("UTF-8");

Instead, return the byte[] from your ByteArrayOutputStream and pass it unchanged to a ByteArrayInputStream when you construct your GZIPInputStream. The following changes are needed in your code:

byte[] compressed = compress(string); // in the main method

public static byte[] compress(String str) throws Exception {
    ...
    ...
    return obj.toByteArray();
}

public static String decompress(byte[] bytes) throws Exception {
    ...
    GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes));
    ...
}
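
For reference, here is a minimal sketch of what the complete byte[]-based round trip might look like. The class name, the 4096-byte buffer and the choice of UTF-8 are assumptions for illustration; only the two method signatures come from the snippet above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name, used only for this sketch.
final class GzipBytesExample {

    // Compresses the UTF-8 bytes of str and returns the raw gzip bytes.
    static byte[] compress(String str) throws IOException {
        ByteArrayOutputStream obj = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(obj)) {
            gzip.write(str.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the gzip trailer
        return obj.toByteArray();
    }

    // Decompresses raw gzip bytes and decodes the result as UTF-8 (assumed charset).
    static String decompress(byte[] bytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gis = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gis.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String original = "some text to compress, some text to compress";
        byte[] compressed = compress(original);
        System.out.println(decompress(compressed).equals(original));
    }
}
```

The compressed data stays a byte[] from start to finish; it is only turned back into a String after decompression.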

How to GZip decompress a compressed String data with Java code

Your string is Base64-encoded gzip data, so you need to Base64-decode it rather than encode it as UTF-8 bytes.

String input = "H4sIAAAAAAAAAHNJLQtJLS4BALwLiloHAAAA";
byte[] byteCompressed = Base64.getDecoder().decode(input);
// ... rest of your code
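
One way the rest of the code could look is sketched below. The class and method names are hypothetical, the decompressed bytes are assumed to be UTF-8 text, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

// Hypothetical class name, used only for this sketch.
final class Base64GzipDecodeExample {

    // Base64-decodes the input first, then gunzips the resulting bytes.
    static String decode(String base64Gzip) throws IOException {
        byte[] byteCompressed = Base64.getDecoder().decode(base64Gzip);
        try (GZIPInputStream gis =
                 new GZIPInputStream(new ByteArrayInputStream(byteCompressed))) {
            // Assumption: the original text was encoded as UTF-8 before compression.
            return new String(gis.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode("H4sIAAAAAAAAAHNJLQtJLS4BALwLiloHAAAA"));
    }
}
```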

String compression and decompression for data set that can be nested

We want to find each string inside [] and repeat it n times, where n is the number written just before the []. The problem is that these strings can be nested.
So when the function that reads the string inside [] hits a new [, it should call itself again. This leads to a recursive solution that loops through the input string only once.

public class StringDecomposer {

    public String process(String input) {
        StringBuilder result = new StringBuilder();
        decompress(input, 0, result);
        return result.toString();
    }

    // Expands input starting at offset into result. Returns the index of the ']'
    // that closes the current group, or an index past the end at the top level.
    private int decompress(String input, int offset, StringBuilder result) {
        StringBuilder rpt = new StringBuilder(); // digits of the pending repeat count

        while (offset < input.length()) {
            char c = input.charAt(offset);

            if (c == '[') {
                // Expand the nested group into its own buffer so that only the
                // group, not the literal text before it, gets repeated.
                StringBuilder group = new StringBuilder();
                offset = decompress(input, offset + 1, group);
                repeat(rpt, group);
                result.append(group);
                rpt.setLength(0);
            } else if (c == ']') {
                break; // the caller advances past the closing bracket
            } else if (c >= '0' && c <= '9') {
                rpt.append(c);
            } else {
                result.append(c);
            }
            offset++;
        }
        return offset;
    }

    // Repeats the contents of input in place; without a count it is left as is.
    private void repeat(StringBuilder rpt, StringBuilder input) {
        if (rpt.length() > 0) {
            String unit = input.toString();
            int times = Integer.parseInt(rpt.toString());
            for (int i = 1; i < times; i++) {
                input.append(unit);
            }
        }
    }
}
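
The same expansion can also be written without recursion by keeping an explicit stack. This is a different technique from the recursive class above, shown only as a sketch: the class and method names are hypothetical, and it assumes balanced brackets and an explicit count before every [.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical class name; an explicit-stack variant, not the recursive method.
final class StackDecoder {

    static String expand(String input) {
        Deque<StringBuilder> texts = new ArrayDeque<>(); // text built before each open '['
        Deque<Integer> counts = new ArrayDeque<>();      // repeat count of each open '['
        StringBuilder current = new StringBuilder();
        int count = 0;

        for (char c : input.toCharArray()) {
            if (c >= '0' && c <= '9') {
                count = count * 10 + (c - '0');   // accumulate multi-digit counts
            } else if (c == '[') {
                texts.push(current);              // remember the enclosing level
                counts.push(count);
                current = new StringBuilder();
                count = 0;
            } else if (c == ']') {
                String group = current.toString();
                current = texts.pop();            // return to the enclosing level
                int times = counts.pop();
                for (int i = 0; i < times; i++) {
                    current.append(group);
                }
            } else {
                current.append(c);
            }
        }
        return current.toString();
    }
}
```

For example, StackDecoder.expand("a2[b3[c]]d") expands the inner group to "ccc", the outer group to "bccc" twice, and returns "abcccbcccd".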

Obtain a string from the compressed data and vice versa in Java

I don't see any good reason for you to compress your data yourself: Cassandra can do it for you transparently (it compresses your data with LZ4 by default). So if your goal is to reduce your data footprint, you are solving a non-existent problem, and I'd feed the XML document directly to Cassandra.

By the way, all compression algorithms take an array of bytes and produce an array of bytes. If you still need the compressed result as a string, you could apply something like Base64 encoding to your compressed byte array. On decompression, reverse the logic: Base64-decode your string and then apply your decompression algorithm.
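
The encode-then-reverse logic could be sketched as follows. The class and method names are hypothetical, DEFLATE from java.util.zip is just one possible algorithm, the text is assumed to be UTF-8, and readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical class name, used only for this sketch.
final class CompressedTextCodec {

    // text -> UTF-8 bytes -> compressed bytes -> Base64 string
    static String compressToBase64(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the stream flushes the remaining compressed data
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Base64 string -> compressed bytes -> UTF-8 bytes -> text
    static String decompressFromBase64(String base64) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(base64);
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Keep in mind that Base64 turns every 3 bytes into 4 characters, so part of the space saved by compression is spent on the encoding.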

Decompress GZip string in Java

There's no such thing as a GZip string. GZip is binary, strings are text.

If you want to compress a string, you need to convert it into binary first, e.g. with an OutputStreamWriter chained to a compressing OutputStream (such as a GZIPOutputStream).

Likewise, to read the data, you can use an InputStreamReader chained to a decompressing InputStream (such as a GZIPInputStream).

One way of easily reading from a Reader is to use CharStreams.toString(Readable) from Guava, or a similar library.
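
A minimal sketch of the two chains, using only the JDK, might look like this. The class and method names are hypothetical, UTF-8 is an assumed charset, and a plain read loop stands in for Guava's CharStreams.toString to avoid the extra dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical class name, used only for this sketch.
final class GzipTextStreams {

    // Writer -> GZIPOutputStream -> byte array: text goes in, binary comes out.
    static byte[] writeCompressed(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                 new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer also finishes the gzip stream
        return bytes.toByteArray();
    }

    // byte array -> GZIPInputStream -> Reader: binary goes in, text comes out.
    static String readCompressed(byte[] compressed) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                 new GZIPInputStream(new ByteArrayInputStream(compressed)),
                 StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }
}
```

The same charset has to be used on both sides; otherwise non-ASCII characters will not survive the round trip.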


