How to grep for a pattern in the files in a tar archive without filling up disk space
Here's my take on this:
while read filename; do tar -xOf file.tar "$filename" | grep 'pattern' | sed "s|^|$filename:|"; done < <(tar -tf file.tar | grep -v '/$')
Broken out for explanation:
while read filename; do
-- it's a loop...
tar -xOf file.tar "$filename"
-- this extracts each file to standard output...
| grep 'pattern'
-- here's where you put your pattern...
| sed "s|^|$filename:|";
-- prepend the filename, so this looks like grep's output. Salt to taste.
done < <(tar -tf file.tar | grep -v '/$')
-- end the loop, and get the list of files to feed to your while read.
One proviso: this breaks if you have OR bars (|) in your filenames, because the filename is pasted into the sed expression, which uses | as its delimiter.
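If you'd rather not worry about that, one way around it is to let grep print the prefix itself instead of sed. This is only a sketch: it assumes GNU grep (for --label), and grep_tar_members is a name I made up for illustration.

```shell
# Sketch: same idea as the loop above, but grep adds the "name:" prefix.
# Assumes GNU grep (--label); grep_tar_members is a hypothetical helper name.
grep_tar_members() {
    # $1 = pattern, $2 = tar archive
    # List the members, drop directory entries, then grep each member in turn.
    tar -tf "$2" | grep -v '/$' | while IFS= read -r filename; do
        # -H forces a prefix; --label says what to call standard input.
        tar -xOf "$2" "$filename" | grep -H --label="$filename" -- "$1"
    done
}

# Usage: grep_tar_members 'pattern' file.tar
```

Since the filename never passes through sed, characters such as | or & in member names are printed as they are.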
Hmm. In fact, this makes a nice little bash function, which you can append to your .bashrc file:
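targrep() {
    local taropt="" pattern="$1" archive filename
    if [[ $# -lt 2 ]]; then
        echo "Usage: targrep pattern file ..." >&2
        return 2
    fi
    shift    # drop the pattern; the remaining arguments are archives
    for archive in "$@"; do
        if [[ ! -f "$archive" ]]; then
            echo "targrep: $archive: No such file" >&2
            continue
        fi
        case "$archive" in
            *.tar.gz) taropt="-z" ;;
            *) taropt="" ;;
        esac
        # List the members (skipping directories), then extract each one
        # to stdout and grep it, prefixing matches with the member name.
        while IFS= read -r filename; do
            tar $taropt -xOf "$archive" "$filename" \
                | grep -- "$pattern" \
                | sed "s|^|$filename:|"
        done < <(tar $taropt -tf "$archive" | grep -v '/$')
    done
}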
Performing grep operation in tar files without extracting
The tar command has a -O switch to extract your files to standard output, so you can pipe that output to grep/awk:
tar xvf test.tar -O | awk '/pattern/{print}'
tar xvf test.tar -O | grep "pattern"
E.g., to report the name of each file in which the pattern is found:
tar tf test.tar | while read -r FILE
do
    if tar xf test.tar "$FILE" -O | grep "pattern" ;then
        echo "found pattern in : $FILE"
    fi
done
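Note that the loop above runs tar once per member, so the archive is read from the start each time. If you have GNU tar, its --to-command option can do the same job in a single pass: tar pipes each regular file to a command and sets TAR_FILENAME in that command's environment. A rough sketch, assuming GNU tar and GNU grep (grep_tar_once and GREP_PATTERN are names I picked for illustration):

```shell
# Sketch: single pass over the archive using GNU tar's --to-command.
# Assumes GNU tar and GNU grep; grep_tar_once and GREP_PATTERN are
# hypothetical names, not part of either tool.
grep_tar_once() {
    # $1 = pattern, $2 = tar archive
    # tar runs the command through the shell once per regular file, with the
    # member name in TAR_FILENAME. "|| true" keeps tar from warning when
    # grep exits non-zero because a member has no match.
    GREP_PATTERN="$1" tar -xf "$2" \
        --to-command='grep -H --label="$TAR_FILENAME" -- "$GREP_PATTERN" || true'
}

# Usage: grep_tar_once 'pattern' test.tar
```

Nothing is written to disk here either; the members only ever exist on the pipe between tar and grep.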
Getting contents of a particular file in the tar archive
This is usually documented in man pages, try running this command:
man tar
Unfortunately, Linux does not have the best set of man pages. There is an online copy of the tar man page from this OS, http://linux.die.net/man/1/tar, and it is terrible. But it points to the info command, which gives access to the "info" system widely used in the GNU world (many programs in Linux user space come from GNU projects, for example gcc). Here is a direct link to the section of the online info manual for tar about extracting specific files: http://www.gnu.org/software/tar/manual/html_node/extracting-files.html#SEC27
I can also recommend the documentation from BSD (e.g. FreeBSD) or opengroup.org. The utilities can differ in detail but behave the same in general.
For example, there is a rather old but good man page from opengroup (XCU means 'Commands and Utilities' of the Single UNIX Specification, Version 2, 1997):
http://pubs.opengroup.org/onlinepubs/7908799/xcu/tar.html
tar key [file...]
The following operands are supported:
key --
The key operand consists of a function letter followed immediately by zero or more modifying letters. The function letter is one of the following:
x --
Extract the named file or files from the archive. If a named file matches a directory whose contents had been written onto the archive, this directory is (recursively) extracted. If a named file in the archive does not exist on the system, the file is created with the same mode as the one in the archive, except that the set-user-ID and set-group-ID modes are not set unless the user has appropriate privileges. If the files exist, their modes are not changed except as described above. The owner, group, and modification time are restored (if possible). If no file operand is given, the entire content of the archive is extracted. Note that if several files with the same name are in the archive, the last one overwrites all earlier ones.
And to fully understand the command tar xf test.tar $FILE you should also read about the f option:
f --
Use the first file operand (or the second, if b has already been specified) as the name of the archive instead of the system-dependent default.
So, test.tar in your command will be used by the f key as the archive name; then x will use the second argument ($FILE) as the name of the file or directory to extract from the archive.
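To see the x and f keys at work, here is a small throwaway example. The file names and the temporary directory are made up for illustration, and the O key (extract to standard output) is a GNU tar extension that is not part of the opengroup text quoted above.

```shell
# Build a scratch archive with two members (names are illustrative only).
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\n'  > "$workdir/b.txt"
tar cf "$workdir/test.tar" -C "$workdir" a.txt b.txt

# Key "xOf": x = extract, O = write to stdout (GNU tar), f = the next
# operand is the archive name. The operand after that names the member.
FILE=b.txt
member_contents=$(tar xOf "$workdir/test.tar" "$FILE")
echo "$member_contents"
```

Only b.txt is read out of the archive; a.txt is skipped and nothing is written to the current directory.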
How can I grep for a text pattern in a zipped text file?
Use zgrep on Linux. If you're on Windows, you can download GnuWin, which contains a Windows port of zgrep.
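A quick sketch of what that looks like, using a scratch file created just for the example (the file name and contents are made up):

```shell
# Create a small gzipped text file to search (illustrative data only).
workdir=$(mktemp -d)
printf 'first line\nsecond pattern line\n' | gzip > "$workdir/notes.txt.gz"

# zgrep decompresses on the fly and passes the usual grep options through,
# here -n for line numbers.
zgrep_result=$(zgrep -n 'pattern' "$workdir/notes.txt.gz")
echo "$zgrep_result"
```

The compressed file is never unpacked to disk; zgrep streams it through grep.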
Listing (or counting) files in .tar/.tar.gz archives: What is the time complexity?
Depends on your storage!
uncompressed tar
For tape archives (you know, "tar"s): linear in the byte length, in any case, because fast-forwarding a tape still takes time linear in the distance you need to fast-forward.
For small files on modern storage: the same; you don't ask your SSD for 20 bytes of storage, you get 4 kB at once. In theory, this means you could pretty much instantly skip over that 1 GB file. In practice, my experience tells me that doesn't happen; I honestly don't know why. To me, the "next_block_after" function should just skip forward. (shrugs)
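To make the "should just skip forward" idea concrete: in an uncompressed tar, each member is a 512-byte header followed by its data, padded up to a multiple of 512 bytes, and the header records the data size. So where the next header starts is plain arithmetic. A small sketch (tar_member_span is a made-up helper, and it ignores the extra header blocks used for long names or pax attributes):

```shell
# Hypothetical helper: number of bytes one member occupies in an
# uncompressed tar, i.e. one 512-byte header plus the data rounded up
# to whole 512-byte blocks. Ignores extended (pax / GNU long-name) headers.
tar_member_span() {
    # $1 = member data size in bytes
    echo $(( 512 + ($1 + 511) / 512 * 512 ))
}

tar_member_span 20            # a 20-byte file
tar_member_span 1073741824    # a 1 GiB file
```

So a lister that can seek only needs the header of each member to find the next one; whether a given tar build actually seeks or reads through the data is a separate matter.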
compressed tar
Yes, in general you'll have to decompress to know how long the content is before you can seek anywhere. I don't think there's a compression format that keeps some kind of table with "intermediate" lengths to speed up seeking.
grep json value of a key name. (busybox without option -P)
With busybox awk:
busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]]+", "", $2); print $2}'
-F '[:,]' sets the field separator to : or ,
/"one"/ {gsub("[[:blank:]]+", "", $2); print $2} matches if the line contains "one"; if so, it strips all horizontal whitespace from the second field and then prints the field.
If you want to strip off the quotes too:
busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]\"]+", "", $2); print $2}'
Example:
$ cat file.json
{
"one": "apple",
"two": "banana"
}
$ busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]]+", "", $2); print $2}' file.json
"apple"
$ busybox awk -F '[:,]' '/"one"/ {gsub("[[:blank:]\"]+", "", $2); print $2}' file.json
apple