Using Linux Cut, Sort and Uniq

using Linux cut, sort and uniq

You can add a delimiter, which is a comma in your case:

cut -f 3 -d, list.txt | sort | uniq

Note that -c selects character positions rather than fields; fields are selected with -f.

To strip leading spaces, you can pipe the result through, e.g., awk '{print $1}':

cut -f 3 -d, list.txt | awk '{print $1}' | sort | uniq


Aaaaand: if you cut the 3rd field out, only one field is left after the pipe, so sorting on the 3rd field won't work, which is why I omitted it in my example. You get one field, so you just sort on it and then apply uniq.
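A small self-contained run may make the pipeline easier to follow. The sample rows and the temporary file are made up for illustration; only the pipeline itself comes from the answer above. Keep in mind that awk '{print $1}' keeps just the first whitespace-separated word, so this sketch assumes single-word values in the third field.

```shell
# Hypothetical sample data standing in for list.txt (not from the question).
list=$(mktemp)
cat > "$list" <<'EOF'
1,alice, red
2,bob, blue
3,carol, red
4,dave, green
EOF

# Take field 3 (comma-delimited), strip the leading space, sort, de-duplicate.
result=$(cut -f 3 -d, "$list" | awk '{print $1}' | sort | uniq)
printf '%s\n' "$result"   # prints blue, green, red (one per line)
rm -f "$list"
```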

Pipelining cut sort uniq

Use

cut -f 2 practice.sam | sort | uniq -c

In your original code, you're redirecting the output of cut to field2.txt and at the same time, trying to pipe the output into sort. That won't work (unless you use tee). Either separate the commands as individual commands (e.g., use ;) or don't redirect the output to a file.

Ditto the second half, where you write the output to sortedfield2.txt and thus end up with nothing going to stdout, and nothing being piped into uniq.

So an alternative could be:

cut -f 2 practice.sam > field2.txt ; sort -o sortedfield2.txt field2.txt ; uniq -c sortedfield2.txt

which is the same as

cut -f 2 practice.sam > field2.txt
sort -o sortedfield2.txt field2.txt
uniq -c sortedfield2.txt
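The answer mentions tee as the one way to save an intermediate result and keep piping at the same time. A minimal sketch of that variant follows, assuming practice.sam is tab-separated; the four sample rows and the scratch directory are invented for the demonstration.

```shell
# Work in a scratch directory; this practice.sam is a made-up two-column,
# tab-separated stand-in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'r1\t0\nr2\t16\nr3\t0\nr4\t4\n' > practice.sam

# tee writes a copy of the stream to a file and passes it on down the pipe,
# so both intermediate files are kept and uniq still receives input.
counts=$(cut -f 2 practice.sam | tee field2.txt | sort | tee sortedfield2.txt | uniq -c)
printf '%s\n' "$counts"

# The intermediate files remain in the scratch directory for inspection.
cat sortedfield2.txt
```

Compared with the three separate commands, this reads the input once and still leaves field2.txt and sortedfield2.txt on disk.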

how to use sort, cut, and unique commands in pipe

Summarizing the answers excruciatingly hidden in comments:

You were close, only

  • as tripleee noticed, the shell is in the seventh field
  • as shellter noticed, since the shells are not numbers, -n is useless
  • as shellter noticed, for the counting, there's uniq -c

That gives

cut -f7 -d: /etc/passwd | sort | uniq -c
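If the goal is also to see which shell is most common, one possible extension is a second sort on the counts. The sketch below uses an invented passwd-style sample so the result is reproducible; on a real system you would read /etc/passwd instead.

```shell
# Made-up passwd-style sample (hypothetical users, not from the question).
passwd_sample=$(mktemp)
cat > "$passwd_sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
bob:x:1001:1001:Bob:/home/bob:/bin/zsh
sync:x:4:65534:sync:/bin:/usr/sbin/nologin
carol:x:1002:1002:Carol:/home/carol:/bin/bash
EOF

# Count each shell, then rank by count. Here -n is meaningful because the
# lines produced by uniq -c start with numbers.
ranked=$(cut -f7 -d: "$passwd_sample" | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"
rm -f "$passwd_sample"
```

The first sort is still needed, because uniq -c only merges adjacent identical lines.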

Ant equivalent of cut | sort | uniq

You can do this using a loadresource task with a filterchain. Something like this perhaps:

<property name="list.of.files">
web/src/main/test/com/whatever/Ralph
business/src/main/test/com/whatever/Alice
web/src/main/test/com/whatever/Bob
</property>

<loadresource property="dirs">
<string value="${list.of.files}" />
<filterchain>
<replaceregex pattern="/.*" replace="" />
<sortfilter />
<uniqfilter />
</filterchain>
</loadresource>

<echo message="${dirs}" />

Result:

     [echo] business
     [echo] web

BUILD SUCCESSFUL

In older versions of Ant (<1.7) you could do the same by writing the property out to a file, then using a loadfile task with filterchain.
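For comparison, this is roughly the shell pipeline that the filterchain above mirrors: the replaceregex step drops everything from the first slash onward, which corresponds to cut -d/ -f1. The variable names are only illustrative.

```shell
# Same three paths as in the Ant property above.
list_of_files='web/src/main/test/com/whatever/Ralph
business/src/main/test/com/whatever/Alice
web/src/main/test/com/whatever/Bob'

# Keep only the text before the first slash, then sort and de-duplicate.
dirs=$(printf '%s\n' "$list_of_files" | cut -d/ -f1 | sort | uniq)
printf '%s\n' "$dirs"   # prints business, then web
```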

Even after `sort`, `uniq` is still repeating some values

The file has DOS line endings: each line ends with a \r (CR) character.

You can inspect your tail output with, for example, hexdump -C (the lines starting with # were added by me):

$ awk '!/^#/ { print $1; print $2; }' ./wiki-Vote.txt | sort | uniq | tail | hexdump -C
00000000  39 39 32 0a 39 39 33 0a  39 39 33 0d 0a 39 39 34  |992.993.993..994|
#                                           ^^ HERE
00000010  0a 39 39 34 0d 0a 39 39  35 0d 0a 39 39 36 0a 39  |.994..995..996.9|
#                     ^^              ^^
00000020  39 38 0a 39 39 39 0a 39  39 39 0d 0a              |98.999.999..|
#                                        ^^
0000002c

Because uniq sees two different lines, one with a CR and one without, neither is removed. Remove the CR characters before piping into sort. Note also that sort | uniq can be written more simply as sort -u.

$ awk '!/^#/ { print $1; print $2; }' ./wiki-Vote.txt | tr -d '\r' | sort -u | wc -l
7115
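The effect can be reproduced on a tiny input. The five sample lines below are invented, and LC_ALL=C is set as an assumption to make comparisons bytewise, so the result does not depend on the locale's collation rules.

```shell
# Compare bytes, not locale collation keys.
export LC_ALL=C

# Made-up input: 993 and 994 each appear once with a Unix line ending
# and once with a DOS line ending.
sample=$(printf '992\n993\n993\r\n994\r\n994\n')

with_cr=$(printf '%s\n' "$sample" | sort | uniq | wc -l)
without_cr=$(printf '%s\n' "$sample" | tr -d '\r' | sort -u | wc -l)

echo "distinct lines with CR kept:    $with_cr"
echo "distinct lines with CR removed: $without_cr"
```

With the CR characters kept, "993" and "993\r" count as different lines; after tr -d '\r' they collapse into one.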

What is the difference between sort data.txt | uniq -u and just 'uniq -u'?

The uniq utility reads the specified input file, compares only adjacent lines, and writes a copy of each unique input line to the output. With -u, it prints only the lines that are not repeated, again judging by adjacent lines only.

For example if your data is:

1
1
2
5
4
1

Output of uniq -u data would be:

2
5
4
1

With sort data | uniq -u, the data is sorted first, so all identical lines become adjacent.

Output of sort:

1
1
1
2
4
5

And when this is passed to uniq -u the output would be:

2
4
5
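Both cases can be checked directly with the same six-line input used above; the temporary file name is the only thing introduced here.

```shell
# The example data from above, written to a scratch file.
data=$(mktemp)
printf '1\n1\n2\n5\n4\n1\n' > "$data"

# uniq only compares adjacent lines, so the trailing 1 survives unsorted.
unsorted=$(uniq -u "$data")
# After sorting, all three 1s are adjacent and are dropped together.
sorted=$(sort "$data" | uniq -u)

echo "uniq -u:"
printf '%s\n' "$unsorted"   # 2 5 4 1, one per line
echo "sort | uniq -u:"
printf '%s\n' "$sorted"     # 2 4 5, one per line
rm -f "$data"
```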

