Short Command to Find Total Size of Files Matching a Wild Card

Try du to summarize disk usage:

du -csh *.jpg

Output (for example):

8.0K sane-logo.jpg
16K sane-umax-advanced.jpg
28K sane-umax-histogram.jpg
24K sane-umax.jpg
16K sane-umax-standard.jpg
4.0K sane-umax-text2.jpg
4.0K sane-umax-text4.jpg
4.0K sane-umax-text.jpg
104K total

du does not add up the sizes of the files themselves; it adds up the blocks they occupy in the file system. If a file is 13K in size and the file system uses a 4K block size, du reports 16K for that file.
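If the byte count of the contents is what you want rather than the allocated blocks, GNU du can report the apparent size instead. The following is a minimal sketch, assuming GNU coreutils (the -b option is GNU-specific) and a scratch file named sample.jpg that is made up for the illustration:

```shell
# Sketch: allocated size vs. apparent size (GNU du assumed).
# sample.jpg is a hypothetical 13 KiB file created only for this demo.
tmpdir=$(mktemp -d)
head -c 13312 /dev/zero > "$tmpdir/sample.jpg"

# Default: space allocated on disk (the value depends on the file system).
du -ch "$tmpdir"/*.jpg | tail -n1

# -b is GNU shorthand for --apparent-size --block-size=1, i.e. bytes of content.
apparent=$(du -cb "$tmpdir"/*.jpg | tail -n1 | cut -f1)
echo "$apparent bytes"

rm -rf "$tmpdir"
```

On BSD or macOS du the -b option is not available, so this sketch would need a different approach there.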

Output file size for all files of certain type in directory recursively?

Use stat -c %n,%s to get the file name and size of the individual files. Then use awk to sum the size and print.

$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'

In fact you don't need %n, since you want only the sum:

$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
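With many files, starting one stat process per file gets slow. A possible variant, sketched here under the assumption of GNU stat (for -c) and a find that supports -exec ... {} +, batches the file names into as few stat calls as possible. The function name total_size is made up for the example:

```shell
# Sketch: sum the sizes (in bytes) of all regular files below a directory
# whose names match a pattern. total_size is a hypothetical helper name.
total_size() {
    # $1: start directory, $2: name pattern for find, e.g. '*.pdf'
    find "$1" -type f -name "$2" -exec stat -c %s {} + |
        awk '{sum += $1} END {print sum + 0}'   # "+ 0" prints 0 when nothing matched
}

# Usage: total_size . '*.pdf'
```

The pattern is quoted so that find, not the shell, expands it in every subdirectory.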

How to loop through files matching wildcard in batch file

Assuming you have two programs that process the two files, process_in.exe and process_out.exe:

for %%f in (*.in) do (
    echo %%~nf
    process_in "%%~nf.in"
    process_out "%%~nf.out"
)

%%~nf is a substitution modifier that expands %%f to the file name only, without the extension.
See the other modifiers at https://technet.microsoft.com/en-us/library/bb490909.aspx (midway down the page).

total size of group of files selected with 'find'

The command du tells you about disk usage. Example usage for your specific case:

find rapidly_shrinking_drive/ -name "offender1" -mtime -1 -print0 | du --files0-from=- -hc | tail -n1

(Previously I wrote du -hs, but on my machine that appears to disregard find's input and instead summarises the size of the cwd.)
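The same pipeline can be wrapped in a small function that returns just the byte total. This is a sketch assuming GNU find and GNU du (both --files0-from and -b are GNU options); the name recent_total_bytes is invented for the example:

```shell
# Sketch: total apparent size, in bytes, of files modified within the last day.
# recent_total_bytes is a hypothetical helper name.
recent_total_bytes() {
    # $1: start directory, $2: name pattern for find
    find "$1" -type f -name "$2" -mtime -1 -print0 |
        du --files0-from=- -cb |   # NUL-separated names survive spaces and newlines
        tail -n1 |                 # keep only the "total" line
        cut -f1                    # first tab-separated field is the number
}

# Usage: recent_total_bytes rapidly_shrinking_drive/ 'offender1'
```

Passing names with -print0 and --files0-from is what keeps file names containing spaces from being split.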

find file with wild card matching

Node core does not cover this. The glob module on npm does what you are after.

Setup

npm i glob

Usage

var glob = require("glob")

// options is optional
glob("**/*.js", options, function (er, files) {
  // files is an array of filenames.
  // If the `nonull` option is set, and nothing
  // was found, then files is ["**/*.js"]
  // er is an error object or null.
})

Wildcard single file

Use this:

$ var=(*.txt)
$ echo $var
bar.txt

The key is the parentheses, which put the matching file names into an array. echo $var then prints only the first element of that array (bar.txt). You can see this by printing the whole array:

$ echo ${var[@]}
bar.txt baz.txt qux.txt
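Where arrays are not available (plain POSIX sh), the positional parameters can play the same role. A minimal sketch, assuming a scratch directory that holds the three files from the example above:

```shell
# Sketch: emulate var=(*.txt) in plain POSIX sh with positional parameters.
oldpwd=$PWD
tmpdir=$(mktemp -d)
touch "$tmpdir/bar.txt" "$tmpdir/baz.txt" "$tmpdir/qux.txt"
cd "$tmpdir" || exit 1

set -- *.txt              # $1, $2, ... now hold the matches in sorted order
if [ -e "$1" ]; then      # with no match, $1 would be the literal '*.txt'
    first=$1
    count=$#
    echo "$first (1 of $count)"
fi

cd "$oldpwd" && rm -rf "$tmpdir"
```

Note that set -- replaces whatever arguments the script or function was called with, so save them first if they are still needed.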

The way to check a HDFS directory's size?

Prior to 0.20.203, and officially deprecated in 2.6.0:

hadoop fs -dus [directory]

Since 1.0.4, and still compatible through 2.6.0:

hdfs dfs -du [-s] [-h] URI [URI …]

You can also run hadoop fs -help for more info and specifics.

Using wildcards with sed

choroba's helpful answer works well with GNU sed, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.

Also, the OP has since expressed a desire to use patterns to match similar element names.

Here's a solution that makes use of extended regular expressions, which should work on both Linux (GNU Sed) and BSD/OSX platforms (BSD Sed):

sed -E 's%<([^>]*Name|[^>]*SSN|Address[^>]*)>[^<]*%<\1>***%g' file

Note:

It is important to match the variable parts of the element names with [^>]* rather than .* so as to ensure that the matches remain confined to the opening tag.
BSD/OSX extended regular expressions (in accordance with POSIX extended regular expressions) do not support backreferences inside the regular expression itself (as opposed to the "backreferences" that refer to capture-group matches in the replacement string), so no attempt is made to match the closing tag with one.
While this command works on the stated platforms, it is not POSIX-compliant, because POSIX only mandates support for basic regular expressions in Sed.

The above command is the equivalent of the following GNU Sed command using a basic regular expression - note the need to escape (, ), and |:

sed 's%<\([^>]*Name\|[^>]*SSN\|Address[^>]*\)>[^<]*%<\1>***%g' file

Note that it is the use of alternation (\|) that makes this command not portable, because POSIX basic regular expressions do not support it.
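One thing to check on your own data: because [^>]* can also match the leading / of a closing tag, a tag such as </FirstName> appears to satisfy the [^>]*Name alternative too, which would put *** after the closing tag as well. The sketch below is a variant of the extended-regex command, not the original one: it excludes / from the name class, which is an adjustment assumed here. The function name mask_fields and the sample element names are made up for the illustration:

```shell
# Sketch: mask the text content of elements whose names end in Name or SSN,
# or start with Address. mask_fields is a hypothetical helper name.
# Variant of the command above: [^>/]* keeps closing tags from matching.
mask_fields() {
    sed -E 's%<([^>/]*Name|[^>/]*SSN|Address[^>/]*)>[^<]*%<\1>***%g'
}

printf '%s\n' '<FirstName>John</FirstName><City>Oslo</City>' | mask_fields
# prints <FirstName>***</FirstName><City>Oslo</City>
```

The function reads standard input, so it can be used as mask_fields < file as well.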
