How to Zip and Unzip a String Using GZIPOutputStream That Is Compatible with .NET

How can I zip and unzip a string using GZIPOutputStream in a way that is compatible with .NET?

The GZIP methods:

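// Needs: java.io.ByteArrayOutputStream, java.io.IOException,
// java.nio.charset.StandardCharsets, java.util.zip.GZIPOutputStream
public static byte[] compress(String string) throws IOException {
    // Encode explicitly as UTF-8; getBytes() without a charset uses the platform default.
    byte[] input = string.getBytes(StandardCharsets.UTF_8);
    ByteArrayOutputStream os = new ByteArrayOutputStream(input.length);
    GZIPOutputStream gos = new GZIPOutputStream(os);
    gos.write(input);
    gos.close(); // finishes the gzip stream; must happen before toByteArray()
    return os.toByteArray();
}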

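// Needs: java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream, java.io.IOException,
// java.nio.charset.StandardCharsets, java.util.zip.GZIPInputStream
public static String decompress(byte[] compressed) throws IOException {
    final int BUFFER_SIZE = 32;
    ByteArrayInputStream is = new ByteArrayInputStream(compressed);
    GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
    // Collect raw bytes first: decoding chunk by chunk can split a multi-byte character.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    while ((bytesRead = gis.read(data)) != -1) {
        out.write(data, 0, bytesRead);
    }
    gis.close();
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
}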

And a test:

final String text = "hello";
try {
byte[] compressed = compress(text);
for (byte character : compressed) {
Log.d("test", String.valueOf(character));
}
String decompressed = decompress(compressed);
Log.d("test", decompressed);
} catch (IOException e) {
e.printStackTrace();
}

=== Update ===

If you need .NET compatibility, my code has to be changed a little:

public static byte[] compress(String string) throws IOException {
    byte[] input = string.getBytes(StandardCharsets.UTF_8);
    // 4-byte little-endian prefix holding the uncompressed length in bytes
    // (string.length() counts chars, which differs for non-ASCII text).
    byte[] blockcopy = ByteBuffer
            .allocate(4)
            .order(java.nio.ByteOrder.LITTLE_ENDIAN)
            .putInt(input.length)
            .array();
    ByteArrayOutputStream os = new ByteArrayOutputStream(input.length);
    GZIPOutputStream gos = new GZIPOutputStream(os);
    gos.write(input);
    gos.close();
    byte[] gzipped = os.toByteArray();
    byte[] compressed = new byte[4 + gzipped.length];
    System.arraycopy(blockcopy, 0, compressed, 0, 4);
    System.arraycopy(gzipped, 0, compressed, 4, gzipped.length);
    return compressed;
}

public static String decompress(byte[] compressed) throws IOException {
    final int BUFFER_SIZE = 32;
    // Skip the 4-byte length prefix; the gzip data starts at offset 4.
    ByteArrayInputStream is = new ByteArrayInputStream(compressed, 4, compressed.length - 4);
    GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
    // Collect raw bytes first: decoding chunk by chunk can split a multi-byte character.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    while ((bytesRead = gis.read(data)) != -1) {
        out.write(data, 0, bytesRead);
    }
    gis.close();
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
}

You can use the same test script.
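
The four extra bytes in this layout hold the uncompressed length as a little-endian 32-bit integer. As a rough illustration of just that prefix, here is a minimal Java sketch that writes and reads it on its own. The class and method names (LengthPrefixSketch, encode, decode) are made up for this example and are not part of the answer's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class LengthPrefixSketch {

    // Encodes a length as 4 little-endian bytes (least significant byte first).
    static byte[] encode(int length) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(length).array();
    }

    // Reads the length back from the first 4 bytes of a framed payload.
    static int decode(byte[] framed) {
        return ByteBuffer.wrap(framed, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] prefix = encode(5);
        System.out.println(Arrays.toString(prefix));
        System.out.println(decode(prefix));
    }
}
```

Java's ByteBuffer defaults to big-endian, so the explicit order(...) call is what keeps both sides in agreement.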

Compression/Decompression string with C#

The code to compress/decompress a string

public static void CopyTo(Stream src, Stream dest) {
byte[] bytes = new byte[4096];

int cnt;

while ((cnt = src.Read(bytes, 0, bytes.Length)) != 0) {
dest.Write(bytes, 0, cnt);
}
}

public static byte[] Zip(string str) {
var bytes = Encoding.UTF8.GetBytes(str);

using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(mso, CompressionMode.Compress)) {
//msi.CopyTo(gs);
CopyTo(msi, gs);
}

return mso.ToArray();
}
}

public static string Unzip(byte[] bytes) {
using (var msi = new MemoryStream(bytes))
using (var mso = new MemoryStream()) {
using (var gs = new GZipStream(msi, CompressionMode.Decompress)) {
//gs.CopyTo(mso);
CopyTo(gs, mso);
}

return Encoding.UTF8.GetString(mso.ToArray());
}
}

static void Main(string[] args) {
byte[] r1 = Zip("StringStringStringStringStringStringStringStringStringStringStringStringStringString");
string r2 = Unzip(r1);
}

Remember that Zip returns a byte[], while Unzip returns a string. If you want a string from Zip, you can Base64-encode it (for example with Convert.ToBase64String(r1)). The result of Zip is very much binary: it isn't something you can print to the screen or write directly into XML.

The version shown targets .NET 2.0; on .NET 4.0 and later, use Stream.CopyTo instead (the commented-out calls).

IMPORTANT: The compressed contents cannot be written to the output stream until the GZipStream knows that it has all of the input (i.e., to effectively compress it needs all of the data). You need to make sure that you Dispose() of the GZipStream before inspecting the output stream (e.g., mso.ToArray()). This is done with the using() { } block above. Note that the GZipStream is the innermost block and the contents are accessed outside of it. The same goes for decompressing: Dispose() of the GZipStream before attempting to access the data.
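
The same rule applies on the Java side with GZIPOutputStream. The sketch below is only an illustration (the class and method names are invented here): it records how many bytes have reached the underlying stream before and after close(), to show that the output is incomplete until the gzip stream is closed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, for illustration only.
final class CloseBeforeReadSketch {

    // Returns {bytes visible before close(), bytes visible after close()}.
    static int[] sizesAroundClose(String text) {
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            GZIPOutputStream gos = new GZIPOutputStream(os);
            gos.write(text.getBytes(StandardCharsets.UTF_8));
            int before = os.size(); // incomplete: at least the gzip trailer is still missing
            gos.close();            // finishes compression and writes the trailer
            int after = os.size();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundClose("StringStringStringStringString");
        System.out.println("before close: " + sizes[0] + ", after close: " + sizes[1]);
    }
}
```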

Decompress string in java from compressed string in C#

My C# code to compress is

private string Compress(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);
    MemoryStream ms = new MemoryStream();
    // leaveOpen: true keeps ms usable after the GZipStream is disposed.
    using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
    {
        zip.Write(buffer, 0, buffer.Length);
    }

    byte[] compressed = ms.ToArray();

    // Layout: 4-byte uncompressed length, then the gzip data.
    byte[] gzBuffer = new byte[compressed.Length + 4];
    System.Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
    System.Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
    return Convert.ToBase64String(gzBuffer);
}

Java code to decompress the text is

private String Decompress(String compressedText) throws IOException
{
    byte[] compressed = compressedText.getBytes(java.nio.charset.StandardCharsets.UTF_8);
    compressed = org.apache.commons.codec.binary.Base64.decodeBase64(compressed);
    // Drop the 4-byte length prefix written by the C# side.
    byte[] buffer = java.util.Arrays.copyOfRange(compressed, 4, compressed.length);
    final int BUFFER_SIZE = 32;
    ByteArrayInputStream is = new ByteArrayInputStream(buffer);
    GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
    // Collect raw bytes first: decoding chunk by chunk can split a multi-byte character.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] data = new byte[BUFFER_SIZE];
    int bytesRead;
    while ((bytesRead = gis.read(data)) != -1)
    {
        out.write(data, 0, bytesRead);
    }
    gis.close();
    return new String(out.toByteArray(), java.nio.charset.StandardCharsets.UTF_8);
}

This code works perfectly fine for me.
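
For the opposite direction, a Java sketch that produces the same layout as the C# Compress method (Base64 of a 4-byte little-endian length followed by gzip data) might look like the following. It uses java.util.Base64 (Java 8+) in place of Commons Codec, which is a substitution made here rather than something the answer uses. The class name FramedGzipSketch is hypothetical, and the output has not been checked against a .NET consumer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class FramedGzipSketch {

    // Base64( [4-byte little-endian UTF-8 byte count] + [gzip data] )
    static String compress(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        try {
            framed.write(ByteBuffer.allocate(4)
                    .order(ByteOrder.LITTLE_ENDIAN).putInt(raw.length).array());
            try (GZIPOutputStream gzip = new GZIPOutputStream(framed)) {
                gzip.write(raw);
            } // closing writes the gzip trailer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(framed.toByteArray());
    }

    // Reverses compress(): decode Base64, skip the prefix, gunzip, decode UTF-8.
    static String decompress(String base64) {
        byte[] framed = Base64.getDecoder().decode(base64);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(
                new ByteArrayInputStream(framed, 4, framed.length - 4))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String packed = compress("hello");
        System.out.println(decompress(packed));
    }
}
```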

How to decompress a GZip Compressed String in C#?

The code shown works just fine, if we make reasonable assumptions about how it was compressed in the first place:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class P
{
static void Main()
{
Console.WriteLine(lipsum.Length); // 61125 chars of lipsum (not shown)
Console.WriteLine(Encoding.UTF8.GetByteCount(lipsum)); // 61125 bytes of lipsum
var bytes = Compress(lipsum);
Console.WriteLine(bytes.Length); // 16795 bytes compressed
var value = Decompress(bytes);
Console.WriteLine(value.Length); // 61125 chars again when decompressed
Console.WriteLine(value == lipsum); // True - it worked fine
}
private static byte[] Compress(string value)
{
using (var memoryStream = new MemoryStream())
{
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress))
{
gZipStream.Write(Encoding.UTF8.GetBytes(value));
}
return memoryStream.ToArray();
}
}
private static string Decompress(byte[] bytes)
{
using (var memoryStream = new MemoryStream(bytes))
using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress))
using (var memoryStreamOutput = new MemoryStream())
{
gZipStream.CopyTo(memoryStreamOutput);
var outputBytes = memoryStreamOutput.ToArray();

string decompressed = Encoding.UTF8.GetString(outputBytes);
return decompressed;
}
}

// MASSIVELY TRUNCATED FOR POST!
const string lipsum = @"Lorem ipsum dolor sit amet, ... ac dolor ac hendrerit.";
}

.NET 6 failing at Decompress large gzip text

Just confirmed that the article linked in the comments below the question contains a valid clue on the issue.

Corrected code would be:

string Decompress(string compressedText)
{
var gZipBuffer = Convert.FromBase64String(compressedText);

using var memoryStream = new MemoryStream();
int dataLength = BitConverter.ToInt32(gZipBuffer, 0);
memoryStream.Write(gZipBuffer, 4, gZipBuffer.Length - 4);

var buffer = new byte[dataLength];
memoryStream.Position = 0;

using var gZipStream = new GZipStream(memoryStream, CompressionMode.Decompress);

int totalRead = 0;
while (totalRead < buffer.Length)
{
int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
if (bytesRead == 0) break;
totalRead += bytesRead;
}

// Decode only the bytes actually read, in case the stream ended early.
return Encoding.UTF8.GetString(buffer, 0, totalRead);
}

This approach changes

gZipStream.Read(buffer, 0, buffer.Length);

to

int totalRead = 0;
while (totalRead < buffer.Length)
{
int bytesRead = gZipStream.Read(buffer, totalRead, buffer.Length - totalRead);
if (bytesRead == 0) break;
totalRead += bytesRead;
}

which takes the Read's return value into account correctly.

Without the change, the issue is easily reproducible with any string random enough to produce gzip output longer than about 10 KB.

Here's the compressor, if anyone's interested in testing this on their own:
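
Java streams have the same property: InputStream.read may return fewer bytes than requested, and signals the end of the stream with -1 rather than 0. A minimal sketch of the equivalent loop, checked against pseudo-random (poorly compressible) data, is shown here. ReadLoopSketch, readExactly and roundTrips are names invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class ReadLoopSketch {

    // Keeps reading until `length` bytes have arrived or the stream ends.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int total = 0;
        while (total < length) {
            int n = in.read(buffer, total, length - total);
            if (n == -1) {
                break; // end of stream before `length` bytes were available
            }
            total += n;
        }
        return Arrays.copyOf(buffer, total);
    }

    // Gzips `size` pseudo-random bytes and reads them back through the loop.
    static boolean roundTrips(int size) {
        byte[] original = new byte[size];
        new Random(42).nextBytes(original);
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
                gzip.write(original);
            }
            try (GZIPInputStream gzip = new GZIPInputStream(
                    new ByteArrayInputStream(compressed.toByteArray()))) {
                return Arrays.equals(original, readExactly(gzip, size));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(100_000));
    }
}
```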

string Compress(string plainText)
{
    var buffer = Encoding.UTF8.GetBytes(plainText);
    using var memoryStream = new MemoryStream();

    var lengthBytes = BitConverter.GetBytes((int)buffer.Length);
    memoryStream.Write(lengthBytes, 0, lengthBytes.Length);

    // Dispose the GZipStream before reading the buffer so the gzip trailer is written;
    // leaveOpen keeps memoryStream usable afterwards.
    using (var gZipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
    {
        gZipStream.Write(buffer, 0, buffer.Length);
    }

    var gZipBuffer = memoryStream.ToArray();

    return Convert.ToBase64String(gZipBuffer);
}

.NET GZipStream compress and decompress

Try this code:

public static bool Test()
{
string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";

System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();

byte[] data = encoding.GetBytes(sample);
bool result = false;

// Compress
MemoryStream cmpStream = new MemoryStream();

GZipStream hgs = new GZipStream(cmpStream, CompressionMode.Compress);

hgs.Write(data, 0, data.Length);
hgs.Close(); // the compressed data is only complete once the GZipStream is closed

byte[] cmpData = cmpStream.ToArray();

MemoryStream decomStream = new MemoryStream(cmpData);

hgs = new GZipStream(decomStream, CompressionMode.Decompress);
hgs.Read(data, 0, data.Length);

string sampleOut = encoding.GetString(data);

result = String.Equals(sample, sampleOut);
return result;
}

The problem was that you were not using the ASCIIEncoding to get the string back for sampleData.

EDIT: Here's a cleaned up version of the code to help with Closing/Disposing:

public static bool Test()
{
string sample = "This is a compression test of microsoft .net gzip compression method and decompression methods";

System.Text.ASCIIEncoding encoding = new System.Text.ASCIIEncoding();

byte[] data = encoding.GetBytes(sample);

// Compress.
GZipStream hgs;
byte[] cmpData;

using(MemoryStream cmpStream = new MemoryStream())
using(hgs = new GZipStream(cmpStream, CompressionMode.Compress))
{
hgs.Write(data, 0, data.Length);
hgs.Close();

// Do this AFTER the stream is closed which sounds counter intuitive
// but if you do it before the stream will not be flushed
// (even if you call flush which has a null implementation).
cmpData = cmpStream.ToArray();
}

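using(MemoryStream decomStream = new MemoryStream(cmpData))
using(hgs = new GZipStream(decomStream, CompressionMode.Decompress))
{
// Read may return fewer bytes than requested, so loop until the buffer is full.
int total = 0, read;
while (total < data.Length && (read = hgs.Read(data, total, data.Length - total)) > 0)
{
total += read;
}
}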

string sampleOut = encoding.GetString(data);

bool result = String.Equals(sample, sampleOut);
return result;
}

How to compress a string in .net c# and decompress it in flash as3?

I got it to work with ZLIB.NET. I just had to encode the string as ASCII: Encoding.ASCII.GetBytes(str);

C# file compress with .gzip

The issue could be related to the way the file stream is instantiated. In your code, you are combining a path with another fully qualified path using the Path.Combine method.

Please see the code below. Another issue could be related to the hard-coded path. Is the file named Input or Input.gz? Also note the ability to stack using statements for reduced nesting.

private void Compress(string filePath)
{
    // Open the source read-only; it must already exist.
    using (FileStream inputStream =
        new FileStream(filePath, FileMode.Open, FileAccess.Read))
    // In a verbatim string (@"...") backslashes are not doubled.
    // FileMode.Create truncates any existing file so no stale bytes remain.
    using (FileStream outputStream =
        new FileStream(@"C:\Users\maki\Desktop\Input",
            FileMode.Create, FileAccess.Write))
    using (GZipStream gzip = new GZipStream(outputStream, CompressionMode.Compress))
    {
        inputStream.CopyTo(gzip);
    }
}
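
For comparison, Java's try-with-resources gives the same flat, stacked structure. The sketch below compresses one file into another and reads it back; it assumes Java 9+ (for InputStream.transferTo and readAllBytes), works on temporary files rather than the path from the question, and uses invented names (GzipFileSketch, compress, demo).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class GzipFileSketch {

    // Resources are closed in reverse order, so the gzip stream is finished first.
    static void compress(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target); // creates or truncates
             GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            in.transferTo(gzip);
        }
    }

    // Writes text to a temp file, gzips it to a second temp file, and reads it back.
    static String demo(String text) {
        try {
            Path source = Files.createTempFile("input", ".txt");
            Path target = Files.createTempFile("input", ".gz");
            try {
                Files.write(source, text.getBytes(StandardCharsets.UTF_8));
                compress(source, target);
                try (GZIPInputStream gzip = new GZIPInputStream(Files.newInputStream(target))) {
                    return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
                }
            } finally {
                Files.deleteIfExists(source);
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello hello hello"));
    }
}
```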

