Need to remove the count from the output when using uniq -c command
The count from uniq -c is preceded by spaces unless the count has 7 or more digits, so you need to do something like:
uniq -c | sort -nr | cut -c 9-
to get columns (character positions) 9 upwards. Or you can use sed:
uniq -c | sort -nr | sed 's/^.\{8\}//'
or:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
This second option is robust in the face of a repeat count of 10,000,000 or more; if you think that might be a problem, it is probably better than the cut alternative. There are undoubtedly other options available too.
Caveat: the counts were determined by experimentation on Mac OS X 10.7.3 but using GNU uniq from coreutils 8.3. The BSD uniq -c produced 3 leading spaces before a single-digit count. The POSIX spec says the output from uniq -c shall be formatted as if with:
printf("%d %s", repeat_count, line);
which would not have any leading blanks. Given this possible variance in output formats, the sed script with the [0-9] regex is the most reliable way of dealing with the variability in observed and theoretical output from uniq -c:
uniq -c | sort -nr | sed 's/^ *[0-9]* //'
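As a rough illustration of why the regex form copes with all three layouts, the sketch below feeds hand-written lines imitating the GNU, BSD, and POSIX formats described above through the same sed expression. The helper name strip_count and the sample text are made up for the demonstration; the lines are not real uniq output.

```shell
# strip_count is a hypothetical helper wrapping the sed expression from above.
strip_count() {
    sed 's/^ *[0-9]* //'
}

# Hand-written lines imitating the three count layouts (not real uniq output).
gnu=$(printf '      3 apple pie\n' | strip_count)   # count right-aligned in 7 columns
bsd=$(printf '   3 apple pie\n' | strip_count)      # 3 leading spaces before the count
posix=$(printf '3 apple pie\n' | strip_count)       # no padding at all

printf '%s\n' "$gnu" "$bsd" "$posix"
```

Each of the three lines should come out as `apple pie`. Spaces inside the line itself are left alone because the pattern is anchored at the start of the line.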
Remove count numbers in the text from uniq command | Bash Linux
With GNU grep:
sort file | uniq -cd | awk '$1>50' | sort -nr | grep -oP '^ *[0-9]+ \K.*'
Why uniq -c output with space instead of \t?
Try this:
uniq -c | sed -r 's/^( *[^ ]+) +/\1\t/'
What is the three spaces after ' uniq -c ' command in shell
Common uniq implementations add padding spaces on the left to align the counts. This both looks neater and allows for correct sorting by count even with a "brutal" lexicographical sort; note, however, that this courtesy doesn't seem to be mandated by POSIX.
You can easily trim them by adding sed to the pipeline:
uniq -c | sed 's/^ *//'
uniq -c unable to count unique lines
awk-free version with cut, sort and uniq:
cut -f 3 bisulfite_seq_set0_v_set1.tsv | sort | uniq -c
uniq operates on adjacent matching lines, so the input has to be sorted first.
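A small sketch of the adjacency rule, using made-up input in which the duplicates are not next to each other. The sample function and its data are assumptions for the demonstration; leading padding is trimmed with sed so the result does not depend on which uniq implementation is installed.

```shell
# Hypothetical sample input with duplicates that are not adjacent.
sample() {
    printf '%s\n' b a b a
}

# Without sort, no two neighbouring lines match, so every count is 1.
unsorted=$(sample | uniq -c | sed 's/^ *//')

# With sort, equal lines become adjacent and are counted together.
sorted=$(sample | sort | uniq -c | sed 's/^ *//')

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```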
How can I use uniq -c command of unix in python code?
Just for completeness, this is how you could solve it in Python:
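import collections
import re

paragraph = "how are you now? Are you better now?"
# Split on any non-word character; a raw string keeps \W from being read as a string escape.
splitter = re.compile(r'\W')
# Count lowercased words, skipping the empty strings the split leaves behind.
counts = collections.Counter(word.lower()
                             for word in splitter.split(paragraph)
                             if word)
# Print in "count word" order, most frequent first, like uniq -c | sort -nr.
for word, count in counts.most_common():
    print(count, word)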
uniq -c without additional spaces
You can try to make the sed command as short as possible with
sort | uniq -c | sed 's/^ *//'
If you have GNU grep, you can also use the -P flag:
sort | uniq -c | grep -Po '\d.*'
(Do not use awk '{$1=$1};1'; it rebuilds the whole line and collapses every run of whitespace, so it trims more than you want.)
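To see what "more than you want" means in practice, the sketch below runs one hand-written line (imitating padded uniq -c output, with deliberate extra spacing inside the text) through both commands. The sample line is an assumption for the demonstration.

```shell
# Hand-written line imitating padded uniq -c output; note the spacing inside the text.
line='      2 name    value'

# awk rebuilds the record from its fields, joining them with single spaces.
with_awk=$(printf '%s\n' "$line" | awk '{$1=$1};1')

# sed removes only the leading spaces and leaves the rest of the line untouched.
with_sed=$(printf '%s\n' "$line" | sed 's/^ *//')

# Brackets make the remaining whitespace visible.
printf '[%s]\n' "$with_awk" "$with_sed"
```

The awk result loses the four-space gap between `name` and `value`, while the sed result keeps it.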
When you need this often, you can make a function or script calling
sort | uniq -c | sed 's/^ *//'
or only
uniq -c | sed 's/^ *//'
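One possible shape for such a function is sketched here. The name count_lines is hypothetical, and passing file arguments through to sort is a design choice made for this example rather than something the answer specifies.

```shell
# count_lines is a hypothetical name. Any file arguments are handed to sort;
# with no arguments, sort reads standard input.
count_lines() {
    sort -- "$@" | uniq -c | sed 's/^ *//'
}

# Usage with made-up input on standard input.
result=$(printf '%s\n' pear fig pear | count_lines)
printf '%s\n' "$result"
```

Defined in a shell startup file, it could then be called as `count_lines somefile` or at the end of a pipeline.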