Steganography in Lossy Compression (Java)

JPEG uses a lossy compression method to achieve smaller file sizes. Unfortunately, that method changes the values of some pixels, which destroys information that was embedded directly in the pixel data. To keep a pixel-domain embedding intact, you need to save the file in a lossless format such as BMP or PNG.

JPEG steganography is somewhat more complex to code, but the concept is straightforward. You will either need to write a JPEG encoder or use an existing one. The code you linked to is indeed an encoder, and with some minor modifications you can use it for your project.

If you want to understand the code, you can read the wikipedia article on jpeg encoding. I will briefly summarise some of its key steps.

1. Split the image into 8x8 blocks.
2. Use the discrete cosine transform (DCT) on each block to obtain the float DCT coefficients and quantise them to integers.
3. Store the quantised coefficients to a file using Huffman coding and run-length encoding.

The quantisation in the second step is the lossy bit, but everything that follows afterwards is lossless. So basically, obtain the quantised coefficients from the second step, modify them with your steganography algorithm and continue with the third step.
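To make the transform step concrete, here is a minimal sketch of the forward 8x8 DCT, written as the textbook formula rather than the fast factorisation a real encoder would use. The class and method names (DctSketch, forwardDct) are made up for illustration and are not part of the linked encoder; the input is assumed to be one block of samples already centred around 0.

```java
final class DctSketch {
    /**
     * Forward 2D DCT-II of one 8x8 block (hypothetical helper).
     * Input samples are assumed to be level-shifted, i.e. centred around 0.
     */
    static double[][] forwardDct(double[][] block) {
        double[][] out = new double[8][8];
        for (int u = 0; u < 8; u++) {
            for (int v = 0; v < 8; v++) {
                double sum = 0.0;
                for (int x = 0; x < 8; x++) {
                    for (int y = 0; y < 8; y++) {
                        sum += block[x][y]
                                * Math.cos((2 * x + 1) * u * Math.PI / 16.0)
                                * Math.cos((2 * y + 1) * v * Math.PI / 16.0);
                    }
                }
                // Normalisation factors: 1/sqrt(2) for the zero-frequency index.
                double cu = (u == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                double cv = (v == 0) ? 1.0 / Math.sqrt(2.0) : 1.0;
                out[u][v] = 0.25 * cu * cv * sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A perfectly flat block has energy only in the DC coefficient out[0][0].
        double[][] flat = new double[8][8];
        for (double[] row : flat) {
            java.util.Arrays.fill(row, 8.0);
        }
        System.out.println(forwardDct(flat)[0][0]);
    }
}
```

The output of this step is still floating point; it only becomes a safe place to embed once it has been quantised to integers.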

Onto the practical modifications of the linked code. The Compress method is what you need to call to store an RGB image to a file. It takes care of writing the header data and the compressed coefficients. You just need to add a bit of code in the WriteCompressedData method. What it does for now is loop over each 8x8 image block, apply the DCT and quantise the coefficients, which are stored in dctArray3. This data is then compressed and written to file. That's where you have to intervene, by modifying dctArray3 before calling Huf.HuffmanBlockEncoder.

For example, let's say you have a byte array of your secret, called message, and you want to embed one bit per 8x8 block in the lsb of a specific coefficient.

public void WriteCompressedData(BufferedOutputStream outStream, byte[] message) {
    byte currentByte;
    int nBytes = message.length;
    int iByte = 0;
    int iBit = 7;
    if (nBytes > 0) {
        currentByte = message[0];
    } else {
        currentByte = (byte) 0;
    }
    // Original method code up until the following line
    dctArray3 = dct.quantizeBlock(dctArray2, JpegObj.QtableNumber[comp]);
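    // ******************** our stuff *******************
    // Embed one message bit per block, most significant bit first.
    if (iByte < nBytes) {
        int bit = (currentByte >> iBit) & 1;
        iBit--;
        if (iBit == -1) {
            // Finished this byte; move on to the next one, if any.
            iBit = 7;
            iByte++;
            if (iByte < nBytes) {
                currentByte = message[iByte];
            }
        }
        // Clear the lsb of coefficient 23 and replace it with the message bit.
        dctArray3[23] = (dctArray3[23] & 0xfffffffe) | bit;
    }
    // **************************************************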
    Huf.HuffmanBlockEncoder(outStream, dctArray3, lastDCvalue[comp], JpegObj.DCtableNumber[comp], JpegObj.ACtableNumber[comp]);
    ...
}

The decoding is the reverse of this, where you read the DCT coefficients and extract your secret from them with the appropriate algorithm. You will require a jpeg decoder for this, so I just borrowed the relevant files from the F5 Steganography project. Specifically, you need the files in the ortega folder and then you can use it like this.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import ortega.HuffmanDecode;

public class Extract {
    private static byte[] deZigZag = {
            0, 1, 5, 6, 14, 15, 27, 28, 2, 4, 7, 13, 16, 26, 29, 42, 3, 8, 12, 17, 25, 30, 41, 43, 9, 11, 18, 24, 31,
            40, 44, 53, 10, 19, 23, 32, 39, 45, 52, 54, 20, 22, 33, 38, 46, 51, 55, 60, 21, 34, 37, 47, 50, 56, 59, 61,
            35, 36, 48, 49, 57, 58, 62, 63 };

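    private static int[] extract(InputStream fis, int flength) throws IOException {
        byte[] carrier = new byte[flength];
        // A single read() call may return fewer bytes than requested,
        // so keep reading until the whole file is in memory.
        int offset = 0;
        while (offset < flength) {
            int n = fis.read(carrier, offset, flength - offset);
            if (n < 0) {
                throw new IOException("Unexpected end of file");
            }
            offset += n;
        }
        HuffmanDecode hd = new HuffmanDecode(carrier);
        return hd.decode();
    }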

    public static void main(String[] args) {
        // run with argument the stego jpeg filename
        File f = new File(args[0]);
        // try-with-resources closes the stream even if decoding fails
        try (FileInputStream fis = new FileInputStream(f)) {
            int[] coeff = extract(fis, (int) f.length());

            int idx = deZigZag[23];
            // The coeff array has all of the DCT coefficients in one big
            // array, so that the first 64 elements are the coefficients 
            // from the first block, the next 64 from the second and so on.
            //
            // idx is the position of the embedding DCT coefficient.
            // You can start with that and extract its lsb, then increment
            // by 64 to extract the next bit from the next "block" and so on.
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
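The extraction loop described in the comments above could look roughly like the following sketch. It is self-contained and does not use the ortega decoder; it only assumes you already have the flat coefficient array, that blocks appear in it in the same order the encoder wrote them, and that you know the message length in bytes. The class name LsbExtractSketch and the nBytes parameter are illustrative assumptions.

```java
final class LsbExtractSketch {
    /**
     * Reads one bit per 64-coefficient block from position idx and
     * reassembles the bytes most significant bit first, mirroring the
     * embedding order. Hypothetical helper; nBytes must be known
     * (or embedded as a header) by some other means.
     */
    static byte[] extractMessage(int[] coeff, int idx, int nBytes) {
        byte[] message = new byte[nBytes];
        for (int bitIndex = 0; bitIndex < nBytes * 8; bitIndex++) {
            int pos = bitIndex * 64 + idx;
            if (pos >= coeff.length) {
                throw new IllegalArgumentException("Carrier has too few blocks for the message");
            }
            // The lsb also works for negative coefficients in two's complement.
            int bit = coeff[pos] & 1;
            message[bitIndex / 8] |= (byte) (bit << (7 - bitIndex % 8));
        }
        return message;
    }
}
```

In the Extract class above, this would be called as extractMessage(coeff, idx, nBytes) once coeff and idx have been obtained.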

Hiding a message in a JPEG image

The best place to hide a message in JPEG is in the blocks that extend beyond the edge of the image (unless the image dimensions are multiples of 8).
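As a rough illustration of how many such blocks an image has, the sketch below counts the 8x8 blocks that contain at least one padded sample. It assumes no chroma subsampling, so every component is tiled in plain 8x8 blocks; with subsampling the encoder pads to a larger unit and the count differs. The names EdgeBlockSketch and paddedBlocks are made up for this example.

```java
final class EdgeBlockSketch {
    /** Number of 8x8 blocks needed to cover n pixels along one axis. */
    static int blocksAlong(int n) {
        return (n + 7) / 8;
    }

    /** Number of 8x8 blocks that contain at least one padded sample. */
    static int paddedBlocks(int width, int height) {
        int across = blocksAlong(width);
        int down = blocksAlong(height);
        boolean partialColumn = width % 8 != 0;
        boolean partialRow = height % 8 != 0;
        int count = 0;
        if (partialColumn) {
            count += down;   // the rightmost column of blocks
        }
        if (partialRow) {
            count += across; // the bottom row of blocks
        }
        if (partialColumn && partialRow) {
            count -= 1;      // the corner block was counted twice
        }
        return count;
    }
}
```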

Steganography using the DCT

The embedding of the message occurs after the lossy compression step, so there is no possibility of losing the message: the steps that lose data have already been performed. (The embedding itself loses only image data, which it replaces with your message.) Ideally you then extract the message directly from the coefficients themselves; that is, full decompression back to pixels is not involved in the extraction.

DCT Steganography

It seems like you're trying to do JPEG steganography. The standard for writing a JPEG encoder is quite involved, so it's easier to borrow an existing encoder and make small modifications to inject your hiding algorithm into it.

I've answered a similar question where I briefly summarised the key points of the algorithm and showed an example in java.

A good place to start learning about the process is the Wikipedia article on JPEG encoding. It should answer all the questions you've posed, but I'll address them here as well.

What should I pass into getDTCTransformMatrix() function? Right now I am passing integer value of the pixel, which is wrong I think. I have seen some examples where guys were passing value from 0-255 so should I convert image into gray-scale? Or should I do it for each color (R,G,B)?

Yes, you pass the integer values of each colour plane individually. That can be in either RGB or YCbCr, and the values can be either in the range 0 to 255 or centred around 0, i.e. -128 to 127. YCbCr is preferred because the chroma channels can be compressed more without any obvious loss of quality, due to how our eyes work. Shifting the range so that it is centred around 0 means that the resulting DCT coefficients (chiefly the DC coefficient) will have smaller values and will take fewer bits to store.
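A minimal sketch of that preprocessing, assuming the full-range conversion constants commonly used with JFIF files. The class and method names are illustrative only; getDTCTransformMatrix() from the question is not reproduced here.

```java
final class ColourSketch {
    /**
     * Converts one 8-bit RGB pixel to YCbCr using the full-range
     * conversion commonly used with JFIF (assumed here).
     * Returns {Y, Cb, Cr}, each nominally in the range 0 to 255.
     */
    static double[] rgbToYCbCr(int r, int g, int b) {
        double y = 0.299 * r + 0.587 * g + 0.114 * b;
        double cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128.0;
        double cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128.0;
        return new double[] {y, cb, cr};
    }

    /** Centres a 0..255 sample around 0, giving -128..127, before the DCT. */
    static double levelShift(double sample) {
        return sample - 128.0;
    }
}
```

Each of the three planes is then split into 8x8 blocks and transformed on its own.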

After getDTCTransformMatrix() is performed I get matrix of double. How can I edit LSB in double value? And is this a correct approach?

This is the main point of JPEG encoding. You're supposed to quantise the coefficients (turn them to integers) with a specific quantisation matrix. While there is a default one, various programs opt to use custom ones. The idea is that the low frequency coefficients will not be affected a lot, while most of the high frequency ones may become 0, which aids in the final file size being smaller. This is a lossy process and you're supposed to embed your information AFTER you have quantised the coefficients, as the remaining steps are all lossless.
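To illustrate the quantisation step, here is a small sketch that divides each coefficient by the matching table entry and rounds to the nearest integer. The table is the example luminance table from Annex K of the JPEG standard, in row-major order; a real encoder may scale it by a quality setting or use a custom one. The names QuantiseSketch and quantise are hypothetical.

```java
final class QuantiseSketch {
    /** Example luminance quantisation table from Annex K of the JPEG standard, row-major. */
    static final int[] LUMINANCE_TABLE = {
        16, 11, 10, 16, 24, 40, 51, 61,
        12, 12, 14, 19, 26, 58, 60, 55,
        14, 13, 16, 24, 40, 57, 69, 56,
        14, 17, 22, 29, 51, 87, 80, 62,
        18, 22, 37, 56, 68, 109, 103, 77,
        24, 35, 55, 64, 81, 104, 113, 92,
        49, 64, 78, 87, 103, 121, 120, 101,
        72, 92, 95, 98, 112, 100, 103, 99
    };

    /** Divides each DCT coefficient by its table entry and rounds to an integer. */
    static int[] quantise(double[] coeffs, int[] table) {
        int[] out = new int[64];
        for (int i = 0; i < 64; i++) {
            out[i] = (int) Math.round(coeffs[i] / table[i]);
        }
        return out;
    }
}
```

The large divisors in the bottom-right corner are what push most high-frequency coefficients to 0. The integers returned here are the values whose least significant bits you would modify.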

After I will change an LSB in double value what should I do next? How to make sure that information will be stored in the image.

You arrange the coefficients in a zigzag pattern into a 1D array, so that the low frequency coefficients come first. You then use a combination of run-length and Huffman encoding to store that information in a file. The binary data of the file bear no resemblance to either the original pixels (obviously) or the values of the DCT coefficients. It's compressed data of the coefficients.
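The reordering and the run-length idea can be sketched as follows. This is a simplification under stated assumptions: it emits plain (zeroRun, value) pairs for the AC coefficients with a {0, 0} pair as an end-of-block marker, and it leaves out the parts of real JPEG entropy coding such as differential DC coding, the 15-zero run limit and the Huffman tables. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

final class ZigZagSketch {
    /** Builds the zigzag scan: order[k] is the row-major index visited k-th. */
    static int[] zigZagOrder() {
        int[] order = new int[64];
        int idx = 0;
        for (int s = 0; s < 15; s++) { // s = row + column of the current diagonal
            int lo = Math.max(0, s - 7);
            int hi = Math.min(s, 7);
            if (s % 2 == 0) {
                // Even diagonals run from bottom-left to top-right.
                for (int row = hi; row >= lo; row--) {
                    order[idx++] = row * 8 + (s - row);
                }
            } else {
                // Odd diagonals run from top-right to bottom-left.
                for (int row = lo; row <= hi; row++) {
                    order[idx++] = row * 8 + (s - row);
                }
            }
        }
        return order;
    }

    /** Reorders a row-major block of 64 quantised coefficients into zigzag order. */
    static int[] toZigZag(int[] block) {
        int[] order = zigZagOrder();
        int[] out = new int[64];
        for (int i = 0; i < 64; i++) {
            out[i] = block[order[i]];
        }
        return out;
    }

    /** Simplified run-length pass over the AC coefficients (indices 1..63). */
    static List<int[]> runLength(int[] zz) {
        List<int[]> pairs = new ArrayList<>();
        int run = 0;
        for (int i = 1; i < 64; i++) {
            if (zz[i] == 0) {
                run++;
            } else {
                pairs.add(new int[] {run, zz[i]});
                run = 0;
            }
        }
        if (run > 0) {
            pairs.add(new int[] {0, 0}); // trailing zeros collapse into end-of-block
        }
        return pairs;
    }
}
```

Because the high-frequency zeros end up together at the tail of the zigzag array, they collapse into a single end-of-block marker, which is where much of the size saving comes from.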

what's the best practice for image steganography resistant to various attacks?

Your first question concerns lossy methods removing the 'noise' in your image, which of course includes the hidden bits. You may have to spread the message out with redundancy. Plain LSB embedding may not work well, because the bits need to be distributed: copies of each bit should be placed in several parts of the image, so that you can recover the message even when some copies are corrupted. You may also want to add a hash so that you can check whether the recovered message is intact (though the hash itself may be corrupted too). Redundancy and wider distribution give the bits a good chance of surviving.
Another idea is to encrypt the message with a proven cryptographic method such as AES or ECC (key management is another topic). This makes your data bits look like noise. The embedding positions can be chosen in a similar, key-dependent way. The principle is to make both the data and the locations of the bits uniformly distributed, to deter prediction and pattern correlation.
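As one possible reading of these suggestions, the sketch below stores each bit three times at key-dependent positions and recovers it by majority vote. It is only an illustration under several assumptions: the carrier is any array of integer values (pixels or quantised coefficients), the repetition factor of 3 is arbitrary, and java.util.Random seeded with the key stands in for a proper cryptographic generator, which it is not. Encrypting the message beforehand is left out. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RedundantEmbedSketch {
    static final int COPIES = 3; // arbitrary repetition factor for this sketch

    /** Key-dependent permutation of carrier indices (Random is NOT cryptographically secure). */
    static List<Integer> keyedPositions(int carrierLength, long key) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < carrierLength; i++) {
            positions.add(i);
        }
        Collections.shuffle(positions, new Random(key));
        return positions;
    }

    /** Writes each bit into the lsb of COPIES scattered carrier values. */
    static void embed(int[] carrier, boolean[] bits, long key) {
        if (bits.length * COPIES > carrier.length) {
            throw new IllegalArgumentException("Carrier too small");
        }
        List<Integer> pos = keyedPositions(carrier.length, key);
        for (int i = 0; i < bits.length; i++) {
            for (int c = 0; c < COPIES; c++) {
                int p = pos.get(i * COPIES + c);
                carrier[p] = (carrier[p] & ~1) | (bits[i] ? 1 : 0);
            }
        }
    }

    /** Recovers each bit by majority vote over its COPIES positions. */
    static boolean[] extract(int[] carrier, int nBits, long key) {
        if (nBits * COPIES > carrier.length) {
            throw new IllegalArgumentException("Carrier too small");
        }
        List<Integer> pos = keyedPositions(carrier.length, key);
        boolean[] bits = new boolean[nBits];
        for (int i = 0; i < nBits; i++) {
            int ones = 0;
            for (int c = 0; c < COPIES; c++) {
                ones += carrier[pos.get(i * COPIES + c)] & 1;
            }
            bits[i] = ones * 2 > COPIES;
        }
        return bits;
    }
}
```

With three copies, one corrupted copy per bit is tolerated. Surviving heavier processing would need a stronger error-correcting code, which is beyond this sketch.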

I hope this gives you some guidance for your steganographic design considerations.
