How do I extract a tar file in Java?
Note: This functionality was later published through a separate project, Apache Commons Compress, as described in another answer. This answer is out of date.
I haven't used a tar API directly, but tar and bzip2 are implemented in Ant; you could borrow their implementation, or possibly use Ant to do what you need.
Gzip is part of Java SE (and I'm guessing the Ant implementation follows the same model).
GZIPInputStream is just an InputStream decorator. You can wrap, for example, a FileInputStream in a GZIPInputStream and use it in the same way you'd use any InputStream:
InputStream is = new GZIPInputStream(new FileInputStream(file));
(Note that the GZIPInputStream has its own internal buffer, so wrapping the FileInputStream in a BufferedInputStream would probably decrease performance.)
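To illustrate the decorator idea without touching the file system, here is a minimal round-trip sketch. It compresses a string with GZIPOutputStream, then reads it back through a GZIPInputStream wrapped around a ByteArrayInputStream instead of a FileInputStream. GzipDecoratorDemo and its methods are hypothetical names, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDecoratorDemo {
    // Compresses bytes in memory so the demo needs no file on disk.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(plain);
        } // closing the GZIPOutputStream writes the gzip trailer
        return bytes.toByteArray();
    }

    // GZIPInputStream decorates any InputStream; here the source is a
    // ByteArrayInputStream, but a FileInputStream works the same way.
    static String gunzipToString(byte[] compressed) throws IOException {
        try (InputStream is = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```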
How to Access files in a .tar file in Java
Java includes built-in support for handling both zip and gzip files:
http://docs.oracle.com/javase/6/docs/api/java/util/zip/package-summary.html.
This can be used to turn your .tgz into a regular .tar without much trouble. And no, you cannot treat .tgz files as regular zips. They are first archived in a tar and then compressed with gzip. Even if you gunzip it, you will still need to unpack the tar archive to get any of the files out of it.
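A minimal sketch of the gunzip step, assuming Java 7+ NIO (TgzToTar and gunzip are hypothetical names). It strips only the gzip layer; the output is still a tar archive that has to be unpacked separately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

final class TgzToTar {
    // Decompresses the gzip layer of a .tgz into a plain .tar file.
    // The tar entries themselves are NOT unpacked here.
    static void gunzip(Path tgz, Path tar) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(tgz))) {
            Files.copy(in, tar, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```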
Handling tar files is a bit more difficult. This previous question might help:
How do I extract a tar file in Java?
Of these solutions, I strongly recommend the Apache Commons solution:
http://commons.apache.org/vfs/filesystems.html
It will allow you to read from your tar as if it were a filesystem, without doing any writes to your hard drive. You will have to know what you're looking for before you go into it, but I doubt that will hinder you much.
Extract a .tar.gz file in java (JSP)
OK, I finally figured this out; here is my code in case it helps anyone in the future.
It's written in Java, using the Apache Commons IO and Compress libraries.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;

// (runs inside a method declared to throw IOException)
File dir = new File("directory/of/.tar.gz/files/here");
File[] listDir = dir.listFiles();
if (listDir != null) { // listFiles() returns null if dir is not a readable directory
    for (File i : listDir) {
        // Only process .tar.gz archives; skip subdirectories and other files
        if (i.isDirectory() || !i.getName().endsWith(".tar.gz")) {
            continue;
        }
        String fileName = i.toString();
        String tarFileName = fileName + ".tar";
        // Gunzip the .tar.gz into a temporary .tar file next to it
        try (GZIPInputStream ginstream = new GZIPInputStream(new FileInputStream(fileName));
             FileOutputStream outstream = new FileOutputStream(tarFileName)) {
            IOUtils.copy(ginstream, outstream);
        }
        // There should now be a tar file in the directory

        // The output folder name is the archive name without its .tar.gz extension
        String folderName = i.getName().substring(0, i.getName().length() - ".tar.gz".length());

        // Read every single entry in the TAR file
        try (TarArchiveInputStream myTarFile = new TarArchiveInputStream(new FileInputStream(tarFileName))) {
            TarArchiveEntry entry;
            while ((entry = myTarFile.getNextTarEntry()) != null) {
                File outputDir = new File(i.getParent() + "/" + folderName + "/" + entry.getName());
                // A directory entry only needs to be created; only files have content
                if (entry.isDirectory()) {
                    outputDir.mkdirs();
                } else {
                    if (!outputDir.getParentFile().exists()) {
                        outputDir.getParentFile().mkdirs();
                    }
                    // IOUtils.copy reads until the end of the current entry, so the whole
                    // file is written even when a single read() returns fewer bytes
                    try (FileOutputStream outputFile = new FileOutputStream(outputDir)) {
                        IOUtils.copy(myTarFile, outputFile);
                    }
                }
            }
        }
        // Delete the temporary tar file, leaving the original .tar.gz and the extracted folders
        new File(tarFileName).delete();
    }
}
How to Compress/Decompress tar.gz files in java
My favorite is plexus-archiver - see sources on GitHub.
Another option is Apache commons-compress (see mvnrepository).
With plexus-archiver, the code for unarchiving looks like this:
final TarGZipUnArchiver ua = new TarGZipUnArchiver();
// Logging - as @Akom noted, logging is mandatory in newer versions, so you can use a code like this to configure it:
ConsoleLoggerManager manager = new ConsoleLoggerManager();
manager.initialize();
ua.enableLogging(manager.getLoggerForComponent("bla"));
// -- end of logging part
ua.setSourceFile(sourceFile);
destDir.mkdirs();
ua.setDestDirectory(destDir);
ua.extract();
Similar *Archiver classes exist for creating archives.
With Maven, you can use this dependency:
<dependency>
<groupId>org.codehaus.plexus</groupId>
<artifactId>plexus-archiver</artifactId>
<version>2.2</version>
</dependency>
How to untar a TAR file using Apache Commons
A couple of general points. First, why do you do voodoo with the File constructor, when there is a perfectly usable constructor that takes a parent File and the name of the File you want to create?
Second, I am not too sure how spaces in paths are handled on Windows; they might be the cause of your problems. Try using the constructor I mentioned above and see if it makes a difference: File destPath = new File(dest, tarEntry.getName());
(assuming that File dest is a proper file that exists and is accessible to you).
Third, before you do anything with a File object, you should check whether it exists and whether it is accessible. That will ultimately help you pinpoint the problem.
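A minimal sketch of the suggested File(dest, tarEntry.getName()) constructor, wrapped in a hypothetical helper EntryPaths.resolve. The canonical-path check is an extra safeguard not mentioned in the answer: it rejects entry names such as ../evil.txt that would otherwise be written outside dest.

```java
import java.io.File;
import java.io.IOException;

final class EntryPaths {
    // Resolves a tar entry name against the destination directory using the
    // File(File parent, String child) constructor, and refuses names whose
    // canonical path falls outside dest (e.g. "../evil.txt").
    static File resolve(File dest, String entryName) throws IOException {
        File target = new File(dest, entryName);
        String destPath = dest.getCanonicalPath() + File.separator;
        String targetPath = target.getCanonicalPath();
        if (!targetPath.startsWith(destPath)) {
            throw new IOException("Entry is outside of the target dir: " + entryName);
        }
        return target;
    }
}
```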
Mounting and untar'ing a file in Java
Making progress. In case anyone was wondering, here is how I am extracting a tar.gz file in Java, put together from a few online tutorials.
// Assumed: TarInputStream/TarEntry are Ant's tar classes (org.apache.tools.tar)
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;
import org.apache.tools.tar.TarEntry;
import org.apache.tools.tar.TarInputStream;

public static void extract(String tgzFile, String outputDirectory)
        throws Exception {
    // Create the destination directory, including any missing parents.
    File outputDir = new File(outputDirectory);
    outputDir.mkdirs();
    // Create the Tar input stream; try-with-resources closes the whole chain.
    try (TarInputStream tin = new TarInputStream(
            new GZIPInputStream(new FileInputStream(tgzFile)))) {
        // Extract files.
        TarEntry tarEntry = tin.getNextEntry();
        while (tarEntry != null) {
            File destPath = new File(outputDir, tarEntry.getName());
            if (tarEntry.isDirectory()) {
                destPath.mkdirs();
            } else {
                // If the parent directory of a file doesn't exist, create it.
                if (!destPath.getParentFile().exists()) {
                    destPath.getParentFile().mkdirs();
                }
                try (FileOutputStream fout = new FileOutputStream(destPath)) {
                    tin.copyEntryContents(fout);
                }
                // Preserve the last modified date of the tar'd files.
                destPath.setLastModified(tarEntry.getModTime().getTime());
            }
            tarEntry = tin.getNextEntry();
        }
    }
}
Extract tar.gz file in memory in Java
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
Yeah, sure.
Just replace the code in the inner loop that is opening files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
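As a dependency-free sketch of that idea, the class below collects each regular file's bytes in a map instead of writing files. To avoid needing Commons Compress, it swaps in a hand-written parser for the basic tar header in place of TarArchiveInputStream. TarInMemory and readEntries are hypothetical names. It assumes an uncompressed stream, plain ustar headers, and names of at most 100 bytes. GNU/PAX long names and base-256 sizes are not handled, so a library is the better choice for arbitrary archives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class TarInMemory {
    private static final int BLOCK = 512; // tar stores everything in 512-byte blocks

    // Reads every regular-file entry of an uncompressed tar stream into memory,
    // keyed by entry name. Directories and other special entries are skipped.
    static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        byte[] header = new byte[BLOCK];
        // An all-zero header block marks the end of the archive.
        while (readFully(in, header) && !isZeroBlock(header)) {
            String name = field(header, 0, 100);
            String sizeField = field(header, 124, 12).trim(); // octal digits
            long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);
            char type = (char) header[156]; // '0' or NUL means regular file
            // Entry data fills whole blocks; keep only the first 'size' bytes.
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] block = new byte[BLOCK];
            long remaining = size;
            for (long done = 0; done < size; done += BLOCK) {
                if (!readFully(in, block)) {
                    throw new IOException("Truncated tar entry: " + name);
                }
                int n = (int) Math.min(BLOCK, remaining);
                content.write(block, 0, n);
                remaining -= n;
            }
            if (type == '0' || type == '\0') {
                entries.put(name, content.toByteArray());
            }
        }
        return entries;
    }

    // Fills buf completely; returns false on a clean end of stream.
    private static boolean readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                if (off == 0) return false;
                throw new IOException("Truncated tar block");
            }
            off += n;
        }
        return true;
    }

    private static boolean isZeroBlock(byte[] b) {
        for (byte x : b) if (x != 0) return false;
        return true;
    }

    // Header fields are NUL-terminated or NUL-padded; decoded as UTF-8 here.
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.UTF_8);
    }
}
```

For a .tar.gz, pass new GZIPInputStream(new FileInputStream(path)) as the input. Every entry is held in the heap at once, so the heap-space caveat in this answer applies.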
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding you are liable to mangle it ... irreversibly.)
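To see the difference in practice: new String(bytes, StandardCharsets.UTF_8) silently replaces invalid sequences with U+FFFD, whereas a CharsetDecoder set to REPORT errors throws, so you find out that the bytes were not UTF-8 text. This sketch assumes the entries are supposed to be UTF-8; StrictText and decodeUtf8 are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictText {
    // Decodes bytes as UTF-8, throwing instead of silently substituting U+FFFD
    // when the bytes are not valid UTF-8 (e.g. binary data or another charset).
    static String decodeUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```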
Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
How uncompress a specific file from a TAR using apache commons?
The TAR file format is designed to be written or read as a stream (i.e., to/from a tape drive), and does not have a centralized header. So no, there's no way around reading the entire file to extract individual entries.
If you want random access, you should use the ZIP format and open it using the JDK's ZipFile. Assuming that you have enough virtual memory, the file will be memory-mapped, making random access very fast (I haven't looked to see if it will use a random-access file if unable to memory-map).
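A minimal sketch of the ZipFile approach (ZipRandomAccess and readEntry are hypothetical names; readAllBytes needs Java 9+). It does not depend on the memory-mapping behavior described above; it only shows that a single entry can be looked up by name through the ZIP central directory, without reading the entries stored before it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipRandomAccess {
    // Looks up one entry by name and reads only that entry's data;
    // returns null if the archive has no entry with that name.
    static byte[] readEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes(); // Java 9+
            }
        }
    }
}
```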