Extract a Specific Folder to Specific Directory from a Tar.Gz

How to uncompress a tar.gz in another directory

gzip -dc archive.tar.gz | tar -xf - -C /destination

or, with GNU tar

tar xzf archive.tar.gz -C /destination
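
If you would rather script this than call tar, a rough Python equivalent using the standard tarfile module might look like the sketch below. The function name extract_to is made up for illustration, and the path= argument plays the role of -C.

```python
import tarfile


def extract_to(archive_path, destination):
    """Extract every member of a gzip-compressed tar archive into destination."""
    # "r:gz" opens a gzip-compressed archive, like the z flag of tar.
    with tarfile.open(archive_path, "r:gz") as archive:
        # path= is the directory the members are written under (tar's -C).
        archive.extractall(path=destination)
```

Calling extract_to("archive.tar.gz", "/destination") would then mirror the commands above. Only extract archives you trust, since member paths come from the archive itself.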

Extract a specific folder to specific directory from a tar.gz

Ok I figured it out!

Basically, I can use the --strip-components option to remove a given number of leading directories from each path in the archive. In this case, my command looks like this:

tar -xzf backup.tar.gz --strip-components=3 -C a/b/m

That removed the first three path components from my archive (backup.tar.gz : a/b/c/d) before extracting it to the destination directory.

Now it looks like this: a/b/m/d
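
For anyone doing the same from Python, the tarfile module has no strip option that I know of, but the effect can be approximated by renaming members before extracting them. This is only a sketch: extract_stripped is a hypothetical helper, and it assumes the archive holds plain files and directories (link targets are not rewritten).

```python
import tarfile


def extract_stripped(archive_path, destination, strip=3):
    """Extract members after dropping `strip` leading path components."""
    with tarfile.open(archive_path, "r:gz") as archive:
        kept = []
        for member in archive.getmembers():
            parts = member.name.split("/")
            if len(parts) <= strip:
                # Nothing would be left of this path, so skip the member.
                continue
            # Rename the member so it is written under the shortened path.
            member.name = "/".join(parts[strip:])
            kept.append(member)
        archive.extractall(path=destination, members=kept)
```

With an archive containing a/b/c/d/file.txt, extract_stripped("backup.tar.gz", "a/b/m") would write a/b/m/d/file.txt.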

Extracting specific folders in multiple tar.gz files recursively

Possible solution

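After tinkering with the above shell code, I managed to extract only the csv folders by adding a "*csv*" wildcard pattern. The -C option goes before the pattern because some tar implementations apply -C only to the member arguments that follow it. GNU tar additionally needs --wildcards before the pattern, while bsdtar matches patterns by default:

for f in *.tar.gz; do tar -xzvf "$f" -C ../synthea_output "*csv*"; done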

The output now looks like this:

|-- output_1
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_10
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_11
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_12
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_2
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_3
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_4
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_5
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_6
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_7
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
|-- output_8
|   `-- csv
|       |-- allergies.csv
|       |-- careplans.csv
|       |-- conditions.csv
|       |-- encounters.csv
|       |-- immunizations.csv
|       |-- medications.csv
|       |-- observations.csv
|       |-- patients.csv
|       `-- procedures.csv
`-- output_9
    `-- csv
        |-- allergies.csv
        |-- careplans.csv
        |-- conditions.csv
        |-- encounters.csv
        |-- immunizations.csv
        |-- medications.csv
        |-- observations.csv
        |-- patients.csv
        `-- procedures.csv
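
In case a script is easier to maintain than the one-liner, here is a minimal Python sketch of the same loop. The helper name extract_matching and its arguments are hypothetical. It uses fnmatchcase, whose * also matches across "/" separators, so "*csv*" selects the csv folder and everything under it.

```python
import glob
import os
import tarfile
from fnmatch import fnmatchcase


def extract_matching(archive_dir, destination, pattern="*csv*"):
    """Extract only members whose path matches pattern from each *.tar.gz."""
    for archive_path in sorted(glob.glob(os.path.join(archive_dir, "*.tar.gz"))):
        with tarfile.open(archive_path, "r:gz") as archive:
            # Keep the members whose full path inside the archive matches.
            wanted = [m for m in archive.getmembers()
                      if fnmatchcase(m.name, pattern)]
            archive.extractall(path=destination, members=wanted)
```

Something like extract_matching(".", "../synthea_output") would correspond to the shell loop.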

Extract tar archive excluding a specific folder and its contents

You can use '--exclude' to omit a folder:

tar -xf archive.tar -C "/home/user/target/folder" --exclude="folderC"
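
A comparable filter can be sketched in Python with tarfile. Note that this is an approximation of --exclude, not a reimplementation: the hypothetical helper extract_excluding simply skips any member that has a path component named exactly like the excluded folder.

```python
import tarfile


def extract_excluding(archive_path, destination, excluded="folderC"):
    """Extract everything except members located in a folder named `excluded`."""
    # Mode "r" lets tarfile detect the compression (or lack of it) itself.
    with tarfile.open(archive_path, "r") as archive:
        kept = [m for m in archive.getmembers()
                if excluded not in m.name.split("/")]
        archive.extractall(path=destination, members=kept)
```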

How to extract a single file from tar to a different directory?

The problem is that your arguments are in the wrong order. The single file argument must come last, after the -C option.

E.g.

$ tar xvf test.tar -C anotherDirectory/ testfile1

should do the trick.
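
If you end up doing this from a script, a small Python sketch with tarfile avoids the argument-order question entirely. The helper name extract_one is made up for this example.

```python
import tarfile


def extract_one(archive_path, member_name, destination):
    """Extract a single named member of a tar archive into destination."""
    with tarfile.open(archive_path, "r") as archive:
        # getmember raises KeyError if the name is not in the archive.
        member = archive.getmember(member_name)
        archive.extract(member, path=destination)
```

For the example above that would be extract_one("test.tar", "testfile1", "anotherDirectory").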

PS: You should have asked this question on superuser instead of SO

How do I extract only the desired files from tar.gz?

I tested it with the following folder structure:

data/
data/a
data/a/ANOTHER_SNAPSHOT.jar
data/b
data/c
data/c/SNAPSHOT.jar
data/d
data/e
data/f
data/f/SNAPSHOT.jar.with.extension
data/g
data/g/SNAPSHOT.jar
data/h

The following wildcard mask works and extracts only the files named exactly SNAPSHOT.jar, not SNAPSHOT.jar.with.extension or ANOTHER_SNAPSHOT.jar:

tar -xf data.tar.gz --wildcards "*/SNAPSHOT.jar"

Result:

data/c/SNAPSHOT.jar
data/g/SNAPSHOT.jar
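
To see why only those two paths match, the same mask can be tried against the member names in Python. This is just an illustration with fnmatchcase from the standard library, which behaves like the tar pattern for this particular case; it does not touch any archive.

```python
from fnmatch import fnmatchcase

# Member names taken from the test folder structure above (files only).
names = [
    "data/a/ANOTHER_SNAPSHOT.jar",
    "data/c/SNAPSHOT.jar",
    "data/f/SNAPSHOT.jar.with.extension",
    "data/g/SNAPSHOT.jar",
]

# The pattern must match the whole name, so the name has to end in
# "/SNAPSHOT.jar": a longer file name or a trailing extension fails.
matches = [name for name in names if fnmatchcase(name, "*/SNAPSHOT.jar")]
print(matches)
```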

Extract files contained in archive.tar.gz to new directory named archive

Update since GNU tar 1.28:
use --one-top-level, see https://www.gnu.org/software/tar/manual/tar.html#index-one_002dtop_002dlevel_002c-summary

Older versions need a script for this. You can specify the directory that the archive is extracted into with the tar -C option.

The script below assumes that the directories do not exist and must be created. If the directories already exist, the script still works: mkdir --parents simply leaves them in place.

tar -xvzf archive.tar.gz -C archive_dir

e.g.

for a in *.tar.gz
do
    # Strip the .tar.gz suffix to get the directory name.
    a_dir=${a%.tar.gz}
    mkdir --parents "$a_dir"
    tar -xvzf "$a" -C "$a_dir"
done
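
The same per-archive layout can also be sketched in Python, which may help on systems without GNU tar or a POSIX shell. The function name extract_each_to_own_dir is hypothetical, and the sketch assumes every archive ends in .tar.gz.

```python
import glob
import os
import tarfile


def extract_each_to_own_dir(archive_dir):
    """Extract each NAME.tar.gz in archive_dir into a sibling directory NAME."""
    created = []
    for archive_path in sorted(glob.glob(os.path.join(archive_dir, "*.tar.gz"))):
        # Drop the ".tar.gz" suffix to get the target directory.
        target = archive_path[: -len(".tar.gz")]
        # exist_ok=True mirrors mkdir --parents: no error if it already exists.
        os.makedirs(target, exist_ok=True)
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(path=target)
        created.append(target)
    return created
```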

How to extract a number of tar.gz files to a directory?

import glob, os, re, tarfile

# Setup main paths.
tarfile_rootdir = r'D:\SPRING2019\Tarfiles'
extract_rootdir = r'D:\SPRING2019\Test'

# Process the files.
re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')

for tar_file in glob.iglob(os.path.join(tarfile_rootdir, '*.tgz')):

    # Get the parts from the base tgz filename using regular expressions.
    part = re.findall(re_pattern, os.path.basename(tar_file))[0]

    # Build the extraction path from each part.
    extract_path = os.path.join(extract_rootdir, *part)

    # Perform the extract of all files from the tar file.
    with tarfile.open(tar_file, 'r:gz') as r:
        r.extractall(extract_path)

This code is similar to the answer to your last question. Because the information on the directory structure is uncertain, I will provide a structure as an example.

TGZ files in D:\SPRING2019\Tarfiles:

DZB1216-500058L002001.tgz
DZB1216-500058L003001.tgz

Extract directory structure in D:\SPRING2019\Test:

DZB1216
    2001
    3001

The .tgz file paths are retrieved with glob.

From example filename: DZB1216-500058L002001.tgz,
the regular expression will capture 2 groups:

  • \A is an anchor at the start of the string. This is not a group.
  • (\w+) to match DZB1216. This is the 1st group.
  • -\d+[a-zA-Z]0{0,5} matches up to the next group. This is not a group.
  • (\d+) to match 2001. This is the 2nd group.

The extraction path is joined using the values of
extract_rootdir, DZB1216, and 2001.
This results in D:\SPRING2019\Test\DZB1216\2001
as the extraction path.
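
The capture and the resulting path can be checked without any archives on disk. The snippet below is only a verification sketch; it uses ntpath (the Windows flavour of os.path from the standard library) so that the backslash-separated result is the same on any operating system.

```python
import ntpath
import re

re_pattern = re.compile(r'\A(\w+)-\d+[a-zA-Z]0{0,5}(\d+)')

# The two capture groups come back as one tuple per match.
part = re_pattern.findall('DZB1216-500058L002001.tgz')[0]
print(part)

# Join the root directory with both captured parts, Windows style.
extract_path = ntpath.join(r'D:\SPRING2019\Test', *part)
print(extract_path)
```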

The extractall method of tarfile
then extracts everything from the .tgz file.


