How to Extract a Tar File in Java

How do I extract a tar file in Java?

Note: This functionality was later published through a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.


I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.

Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).

GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it in the same way you'd use any InputStream:

InputStream is = new GZIPInputStream(new FileInputStream(file));

(Note that the GZIPInputStream has its own, internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
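Here is a sketch of that decorator pattern using only the JDK. The Gunzip class and gunzip method names are hypothetical. The helper gunzips a .tar.gz into a plain .tar, and Files.copy drains the GZIPInputStream like any other InputStream. The output is still a tar archive, so you need a separate tar reader to get at its entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

public class Gunzip {
    /** Decompresses a gzip file (e.g. archive.tar.gz) into a plain file (e.g. archive.tar). */
    static void gunzip(Path gzFile, Path outFile) throws IOException {
        // GZIPInputStream decorates the raw file stream; every read returns decompressed bytes
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            Files.copy(in, outFile, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Gunzip archive.tar.gz archive.tar
        gunzip(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```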

How to Access files in a .tar file in Java

Java includes built-in classes for handling both zip and gzip files (the java.util.zip package):
http://docs.oracle.com/javase/6/docs/api/java/util/zip/package-summary.html.

This can be used to turn your .tgz into a regular .tar without much trouble. And no, you cannot treat a .tgz as a regular zip: it is first archived as a tar and then compressed with gzip. Even after you gunzip it, you still need to unpack the tar archive to get any of the files out of it.
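The following minimal JDK-only sketch shows that the gunzipped stream is still a tar archive. The TgzPeek name is hypothetical. It reads the first 512-byte tar header and returns the entry name stored in the header's first 100 bytes. It assumes a plain ustar header and ignores the ustar prefix field and the GNU/pax long-name extensions.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class TgzPeek {
    /**
     * Returns the name of the first entry in a .tgz file. Gunzipping yields only
     * the raw tar stream; the entry name still has to be read from a tar header.
     */
    static String firstEntryName(Path tgzFile) throws IOException {
        try (DataInputStream tar = new DataInputStream(
                new GZIPInputStream(Files.newInputStream(tgzFile)))) {
            byte[] header = new byte[512];          // each tar header is one 512-byte block
            tar.readFully(header);
            int len = 0;
            while (len < 100 && header[len] != 0) { // name field: bytes 0-99, NUL-padded
                len++;
            }
            return new String(header, 0, len, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TgzPeek archive.tgz
        System.out.println(firstEntryName(Paths.get(args[0])));
    }
}
```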

Handling tar files is a bit more difficult, because the JDK has no built-in tar support. This previous question might help:
How do I extract a tar file in Java?

Of these solutions, I strongly recommend the Apache Commons one (Commons VFS):
http://commons.apache.org/vfs/filesystems.html

It lets you read from your tar as if it were a filesystem, without writing anything to your hard drive. You will have to know what you're looking for before you go into it, but I doubt that will hinder you much.

Extract a .tar.gz file in java (JSP)

OK, I finally figured this out. Here is my code in case it helps anyone in the future.
It's written in Java and uses the Apache Commons IO and Commons Compress libraries.

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;

public static void extractAll(File dir) throws IOException {
    // e.g. dir = new File("directory/of/.tar.gz/files/here")
    File[] listDir = dir.listFiles();
    if (listDir == null) {
        return; // not a directory, or it could not be read
    }
    for (File i : listDir) {
        // Only process .tar.gz archives; skip subdirectories and any other files
        if (i.isDirectory() || !i.getName().endsWith(".tar.gz")) {
            continue;
        }
        String fileName = i.toString();
        String tarFileName = fileName + ".tar";
        // Gunzip the .tar.gz into an intermediate .tar file
        try (GZIPInputStream ginstream = new GZIPInputStream(new FileInputStream(fileName));
             FileOutputStream outstream = new FileOutputStream(tarFileName)) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = ginstream.read(buf)) > 0) {
                outstream.write(buf, 0, len);
            }
        }
        // The following two lines remove the .tar.gz extension for the folder name
        String folderName = i.getName().substring(0, i.getName().lastIndexOf('.'));
        folderName = folderName.substring(0, folderName.lastIndexOf('.'));
        // Read every single entry in the tar file
        try (TarArchiveInputStream myTarFile = new TarArchiveInputStream(new FileInputStream(tarFileName))) {
            TarArchiveEntry entry;
            while ((entry = myTarFile.getNextTarEntry()) != null) {
                File outputFile = new File(new File(i.getParentFile(), folderName), entry.getName());
                if (!outputFile.getParentFile().exists()) {
                    outputFile.getParentFile().mkdirs();
                }
                // Directory entries must be created; only files have content to extract
                if (entry.isDirectory()) {
                    outputFile.mkdirs();
                } else {
                    // IOUtils.copy reads until the end of the current entry; a single
                    // read() call may return fewer bytes than the entry holds
                    try (FileOutputStream out = new FileOutputStream(outputFile)) {
                        IOUtils.copy(myTarFile, out);
                    }
                }
            }
        }
        // Delete the intermediate tar file, leaving the original .tar.gz and the extracted folders
        new File(tarFileName).delete();
    }
}
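The two substring() calls that strip the extension assume the file name contains at least two dots. For a name such as backup.tgz, the second lastIndexOf('.') returns -1 and substring throws StringIndexOutOfBoundsException. If you also want to accept .tgz archives, a suffix-based helper is more robust. Here is a sketch; ArchiveNames and archiveBaseName are hypothetical names, and the suffix check is case-sensitive.

```java
public class ArchiveNames {
    /**
     * Strips a .tar.gz, .tgz or .tar suffix from an archive file name.
     * Names without one of these suffixes are returned unchanged.
     */
    static String archiveBaseName(String fileName) {
        String[] suffixes = {".tar.gz", ".tgz", ".tar"};
        for (String suffix : suffixes) {
            if (fileName.endsWith(suffix) && fileName.length() > suffix.length()) {
                return fileName.substring(0, fileName.length() - suffix.length());
            }
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(archiveBaseName("my.backup.tar.gz")); // my.backup
        System.out.println(archiveBaseName("logs.tgz"));         // logs
    }
}
```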

How to Compress/Decompress tar.gz files in java

My favorite is plexus-archiver - see sources on GitHub.

Another option is Apache commons-compress (see mvnrepository).

With plexus-archiver, the code for unarchiving looks like this:

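import java.io.File;
import org.codehaus.plexus.archiver.tar.TarGZipUnArchiver;
import org.codehaus.plexus.logging.console.ConsoleLoggerManager; // from plexus-container-default

// sourceFile (the .tar.gz) and destDir are java.io.File objects
final TarGZipUnArchiver ua = new TarGZipUnArchiver();
// Logging - as @Akom noted, logging is mandatory in newer versions, so you can use code like this to configure it:
ConsoleLoggerManager manager = new ConsoleLoggerManager();
manager.initialize();
ua.enableLogging(manager.getLoggerForComponent("bla"));
// -- end of logging part
ua.setSourceFile(sourceFile);
destDir.mkdirs();
ua.setDestDirectory(destDir);
ua.extract();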

Similar *Archiver classes are available for creating archives.

With Maven, you can use this dependency:

<dependency>
    <groupId>org.codehaus.plexus</groupId>
    <artifactId>plexus-archiver</artifactId>
    <version>2.2</version>
</dependency>

How to untar a TAR file using Apache Commons

A couple of general points. First, why do you do voodoo with the File constructor, when there is a perfectly usable constructor that takes a parent File and the name of the child File you want to create?

Second, I am not too sure how spaces in paths are handled on Windows. They might be the cause of your problems. Try using the constructor I mentioned above and see if it makes a difference: File destPath = new File(dest, tarEntry.getName()); (assuming that File dest is a proper directory that exists and is accessible by you).

Third, before you do anything with a File object you should check if it exists and if it is accessible. That will ultimately help you pinpoint the problem.
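The first and third suggestions can be combined in a small helper. The EntryPaths class and resolveEntry method are hypothetical names. The helper checks the destination once and then builds each output path with new File(dest, tarEntry.getName()). The canonical-path comparison is an extra safeguard I added; it is not part of the answer. It rejects entry names such as ../x that would resolve outside dest.

```java
import java.io.File;
import java.io.IOException;

public class EntryPaths {
    /**
     * Builds the output file for a tar entry with the File(File parent, String child)
     * constructor instead of concatenating path strings.
     */
    static File resolveEntry(File dest, String entryName) throws IOException {
        // Check the destination before using it
        if (!dest.isDirectory() || !dest.canWrite()) {
            throw new IOException("Destination is not a writable directory: " + dest);
        }
        File target = new File(dest, entryName);
        // Extra safeguard (not from the answer): reject names that escape dest
        String base = dest.getCanonicalPath();
        String path = target.getCanonicalPath();
        if (!path.equals(base) && !path.startsWith(base + File.separator)) {
            throw new IOException("Entry escapes the destination directory: " + entryName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        File dest = new File(args[0]); // e.g. a folder with spaces in its name
        System.out.println(resolveEntry(dest, "docs/readme.txt"));
    }
}
```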

Mounting and untar'ing a file in Java

Making progress. In case anyone was wondering, here is how I am extracting a tar.gz file in Java. Put together from a few online tutorials.

// Assumes Apache Ant's tar classes (org.apache.tools.tar); the original code
// does not say which TarInputStream implementation it uses.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;

public static void extract(String tgzFile, String outputDirectory)
        throws Exception {

    // Create the destination directory (and any missing parents).
    File outputDir = new File(outputDirectory);
    outputDir.mkdirs();

    // Create the Tar input stream; try-with-resources closes the whole chain.
    try (TarInputStream tin = new TarInputStream(
            new GZIPInputStream(new FileInputStream(tgzFile)))) {

        // Extract files.
        TarEntry tarEntry = tin.getNextEntry();
        while (tarEntry != null) {
            File destPath = new File(outputDir, tarEntry.getName());

            if (tarEntry.isDirectory()) {
                destPath.mkdirs();
            } else {
                // If the parent directory of a file doesn't exist, create it.
                if (!destPath.getParentFile().exists()) {
                    destPath.getParentFile().mkdirs();
                }
                try (FileOutputStream fout = new FileOutputStream(destPath)) {
                    tin.copyEntryContents(fout);
                }
                // Preserve the last modified date of the tar'd files.
                destPath.setLastModified(tarEntry.getModTime().getTime());
            }
            tarEntry = tin.getNextEntry();
        }
    }
}

Extract tar.gz file in memory in Java

Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?

Yes, sure.

Just replace the code in the inner loop that opens files and writes to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.

The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding you are liable to mangle it ... irreversibly.)

Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
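Here is a minimal sketch of that idea. The JDK has no tar reader, so this version (my own, not from the answer above; class and method names are hypothetical) parses the 512-byte ustar headers by hand. It keeps each regular file as a byte[] in a Map and leaves any conversion to String to the caller. It assumes plain ustar entries, sizes under 2 GB, and no GNU long-name or pax extension headers. With Commons Compress you would instead loop over the TarArchiveInputStream entries and copy each one into its own ByteArrayOutputStream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class TarInMemory {
    /**
     * Reads every regular-file entry of an uncompressed tar stream into memory,
     * keyed by entry name. Simplified ustar parsing: only the name (bytes 0-99),
     * octal size (bytes 124-135) and type flag (byte 156) are used.
     */
    static Map<String, byte[]> readAll(InputStream tarStream) throws IOException {
        DataInputStream in = new DataInputStream(tarStream);
        Map<String, byte[]> files = new LinkedHashMap<>();
        byte[] header = new byte[512];
        while (true) {
            in.readFully(header);
            if (header[0] == 0) {
                break;                                // zero block: end of archive
            }
            String name = field(header, 0, 100);
            String sizeField = field(header, 124, 12).trim();
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            // Entry data is padded with zeros up to a whole number of 512-byte blocks
            byte[] data = new byte[(int) ((size + 511) / 512 * 512)];
            in.readFully(data);
            byte type = header[156];
            if (type == '0' || type == 0) {           // regular file ('\0' in old archives)
                files.put(name, Arrays.copyOf(data, (int) size));
            }
        }
        return files;
    }

    /** Decodes a NUL-terminated header field. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TarInMemory archive.tar.gz
        try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(args[0])))) {
            for (Map.Entry<String, byte[]> e : readAll(in).entrySet()) {
                // Decode to String only if you know the entry is text in this charset
                String text = new String(e.getValue(), StandardCharsets.UTF_8);
                System.out.println(e.getKey() + ": " + text.length() + " chars");
            }
        }
    }
}
```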

How uncompress a specific file from a TAR using apache commons?

The TAR file format is designed to be written or read as a stream (i.e., to/from a tape drive), and it does not have a centralized header (an index of its entries). So no, there's no way around reading through the archive sequentially, from the start, to extract an individual entry.
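Here is a minimal JDK-only sketch of that sequential scan. Class and method names are hypothetical, and only plain ustar headers are handled. Each 512-byte header gives the entry's name and size. For a non-matching entry, the data (padded to whole 512-byte blocks) has to be read past before the next header becomes visible. On an uncompressed .tar you could seek over the data instead, but you would still visit every header before the one you want.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class TarSeek {
    /**
     * Scans an uncompressed tar stream header by header and returns the data of
     * the first entry named wanted, or null if the archive has no such entry.
     */
    static byte[] findEntry(InputStream tarStream, String wanted) throws IOException {
        DataInputStream in = new DataInputStream(tarStream);
        byte[] header = new byte[512];
        while (true) {
            in.readFully(header);
            if (header[0] == 0) {
                return null;                          // end of archive: not found
            }
            String name = field(header, 0, 100);
            String sizeField = field(header, 124, 12).trim();
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            if (name.equals(wanted)) {
                byte[] data = new byte[(int) size];
                in.readFully(data);
                return data;
            }
            // No index to consult: step over this entry's padded data blocks
            skipFully(in, (size + 511) / 512 * 512);
        }
    }

    /** Reads and discards exactly n bytes. */
    private static void skipFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[8192];
        while (n > 0) {
            int r = in.read(scratch, 0, (int) Math.min(scratch.length, n));
            if (r < 0) {
                throw new EOFException("Truncated tar archive");
            }
            n -= r;
        }
    }

    /** Decodes a NUL-terminated header field. */
    private static String field(byte[] header, int offset, int length) {
        int end = offset;
        while (end < offset + length && header[end] != 0) {
            end++;
        }
        return new String(header, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TarSeek archive.tar.gz path/inside/archive.txt
        try (InputStream in = new GZIPInputStream(Files.newInputStream(Paths.get(args[0])))) {
            byte[] data = findEntry(in, args[1]);
            System.out.println(data == null ? "not found" : data.length + " bytes");
        }
    }
}
```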

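If you want random access, you should use the ZIP format and open the archive with the JDK's ZipFile. ZIP keeps a central directory of its entries at the end of the file, so ZipFile can look an entry up and go straight to its data. Assuming that you have enough virtual memory, the file will be memory-mapped, making random access very fast (I haven't checked whether it falls back to a random-access file if it is unable to memory-map).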
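Here is a short sketch of the ZIP alternative. ZipRandomAccess and readEntry are hypothetical names. ZipFile.getEntry looks the name up in the archive's central directory, and getInputStream returns only that entry's data, so the other entries are never decompressed. InputStream.readAllBytes requires Java 9 or later.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipRandomAccess {
    /** Reads a single entry from a ZIP file, or returns null if it is absent. */
    static byte[] readEntry(File zip, String entryName) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            ZipEntry entry = zf.getEntry(entryName);   // lookup via the central directory
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                return in.readAllBytes();              // Java 9+
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ZipRandomAccess archive.zip path/inside/archive.txt
        byte[] data = readEntry(new File(args[0]), args[1]);
        System.out.println(data == null ? "not found" : data.length + " bytes");
    }
}
```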


