How to Get Unique Values from an Array in Bash

How can I get unique values from an array in Bash?

A bit hacky, but this should do it:

echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '

To save the sorted unique results back into an array, use array assignment:

sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))

If your shell supports here-strings (bash does), you can save an echo process by changing it to:

tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '

A note as of Aug 28 2021:

According to ShellCheck wiki entry SC2207, read -a (or mapfile) should be used instead of an unquoted command substitution, to avoid unwanted word splitting and globbing.
Thus, in bash the command would be:

IFS=" " read -r -a ids <<< "$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"

or

IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' ')"

Input:

ids=(aa ab aa ac aa ad)

Output:

aa ab ac ad

Explanation:

  • "${ids[@]}" - Syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array"
  • tr ' ' '\n' - Convert all spaces to newlines. This is needed because echo prints the array elements on a single line, separated by spaces, while sort expects each item on its own line. Note that this also splits any element that itself contains a space.
  • sort -u - sort and retain only unique elements
  • tr '\n' ' ' - convert the newlines we added in earlier back to spaces.
  • $(...) - Command Substitution
  • Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing: echo "${ids[@]}" | tr ' ' '\n'
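
Because the pipeline above converts every space to a newline, an element such as "a c" would be broken in two. A minimal sketch that avoids this, assuming bash 4 or later (for mapfile) and elements without embedded newlines; the sample data and the name unique_ids are illustrative:

```bash
#!/usr/bin/env bash
# Sample data; note the element that contains a space.
ids=(aa ab aa "a c" aa ad "a c")

# printf emits one element per line, sort -u de-duplicates the lines,
# and mapfile -t reads each line back as a single array element.
mapfile -t unique_ids < <(printf '%s\n' "${ids[@]}" | sort -u)

declare -p unique_ids
```

The result has four elements, and "a c" stays intact. The exact order of the elements depends on the locale that sort runs under.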

Keeping only the unique elements in an array in Bash

One way to do it is to build a new array that keeps the first value found for each numeric prefix. Say your values are in an indexed array named array. You could do:

new_array=( $(printf "%s\n" "${array[@]}" | sort -n -u) )

This uses command substitution: printf outputs each element on a separate line, and sort -n -u sorts numerically and keeps one entry per distinct numeric value. The results populate new_array.

Now new_array would contain:

67A
257B

How to sort and get unique values from an array in bash?

Try:

$ list=(a b b b c c)
$ unique_sorted_list=($(printf "%s\n" "${list[@]}" | sort -u))
$ echo "${unique_sorted_list[@]}"
a b c

Update based on comments:

$ uniq=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
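
The updated command orders the unique values by frequency, most common first: uniq -c prefixes each value with its count, sort -rnk1 sorts on that count in descending numeric order, and awk strips the count again. A small sketch of the same pipeline, assuming bash 4 or later and elements without whitespace; the array name by_frequency is illustrative:

```bash
#!/usr/bin/env bash
list=(a b b b c c)

# Count each value, order by count (highest first), then drop the counts.
mapfile -t by_frequency < <(
  printf '%s\n' "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'
)

echo "${by_frequency[@]}"   # b c a
```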

Count unique values in a bash array

The command

(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c

Explanation

  • Counting occurrences of unique lines is done with the idiom sort file | uniq -c.
  • Instead of using a file, we can also feed strings from the command line to sort using the here string operator <<<.
  • Lastly, we have to convert the array entries to lines inside a single string. With ${array[*]} the array is expanded to one single string where the array elements are separated by $IFS.
  • With IFS=$'\n' we set the $IFS variable to the newline character for this command exclusively. The $'...' is called ANSI-C Quoting and allows us to express the newline character as \n.
  • The subshell (...) is there to keep the change of $IFS local. After the command $IFS will have the same value as before.

Example

array=(fire air fire earth water air air)
(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c

prints

      3 air
      1 earth
      2 fire
      1 water
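
If you would rather not spawn sort and uniq, the same counts can be gathered in pure bash. This is an alternative sketch, not the method above: it assumes bash 4 or later (associative arrays) and non-empty elements, and the names counts and item are illustrative:

```bash
#!/usr/bin/env bash
array=(fire air fire earth water air air)

# Tally each element in an associative array keyed by the element itself.
declare -A counts=()
for item in "${array[@]}"; do
  counts[$item]=$(( ${counts[$item]:-0} + 1 ))
done

# Key order of an associative array is unspecified, so sort for display.
for item in "${!counts[@]}"; do
  printf '%7d %s\n' "${counts[$item]}" "$item"
done | sort -k2
```

The counts stay available in the counts array afterwards, for example as ${counts[air]}.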

How to remove duplicate elements in an existing array in bash?

Naive approach

To get the unique elements of arr and assuming that no element contains newlines:

$ printf "%s\n" "${arr[@]}" | sort -u
aa
ab
bb
cc

Better approach

To get a NUL-separated list that works even if there were newlines:

$ printf "%s\0" "${arr[@]}" | sort -uz
aaabbbcc

(This, of course, looks ugly on a terminal because it doesn't display NULs.)

Putting it all together

To capture the result in newArr:

$ newArr=(); while IFS= read -r -d '' x; do newArr+=("$x"); done < <(printf "%s\0" "${arr[@]}" | sort -uz)

After running the above, we can use declare to verify that newArr is the array that we want:

$ declare -p newArr
declare -a newArr=([0]="aa" [1]="ab" [2]="bb" [3]="cc")

For those who prefer their code spread over multiple lines, the above can be rewritten as:

newArr=()
while IFS= read -r -d '' x
do
    newArr+=("$x")
done < <(printf "%s\0" "${arr[@]}" | sort -uz)
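
On bash 4.4 or later, mapfile accepts -d to set the delimiter, so the loop can be replaced by a single call. A sketch under that assumption (and assuming a sort that supports -z, such as GNU sort), with illustrative sample data that includes an embedded newline:

```bash
#!/usr/bin/env bash
# Sample data; one value contains an embedded newline.
arr=(aa ab bb aa $'line1\nline2' ab cc bb $'line1\nline2')

# -d '' makes mapfile split on NUL; -t drops the delimiter from each element.
mapfile -d '' -t newArr < <(printf '%s\0' "${arr[@]}" | sort -uz)

declare -p newArr
```

The element with the newline survives as a single entry, which the newline-separated approach cannot guarantee.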

Additional comment

Don't use all caps for your variable names. The system and the shell use all caps for their names and you don't want to accidentally overwrite one of them.

Select unique or distinct values from a list in UNIX shell script

You might want to look at the uniq and sort applications.


./yourscript.ksh | sort | uniq

(FYI: yes, the sort is necessary in this command line; uniq only strips duplicate lines that are immediately adjacent to each other.)

EDIT:

Contrary to what has been posted by Aaron Digulla in relation to uniq's commandline options:

Given the following input:


class
jar
jar
jar
bin
bin
java

uniq will output all lines exactly once:


class
jar
bin
java

uniq -d will output all lines that appear more than once, and it will print them once:


jar
bin

uniq -u will output all lines that appear exactly once, and it will print them once:


class
java
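
The three behaviours can be reproduced with a short sketch. The variable names are illustrative; the input is the sample from above, where duplicates are already adjacent, which is what uniq requires:

```bash
#!/usr/bin/env bash
# The sample input from above, one value per line.
input=$(printf '%s\n' class jar jar jar bin bin java)

plain=$(uniq <<< "$input")        # one copy of each run of equal lines
repeated=$(uniq -d <<< "$input")  # only lines that are repeated
singles=$(uniq -u <<< "$input")   # only lines that are not repeated

# Join the lines with spaces for compact display.
printf 'uniq:    %s\n' "${plain//$'\n'/ }"
printf 'uniq -d: %s\n' "${repeated//$'\n'/ }"
printf 'uniq -u: %s\n' "${singles//$'\n'/ }"
```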

Find and show unique value from bash array

Try this:

printf "%s\n" "${x[@]}" | sort | uniq -u

Output:


4

bash to store unique value of array in variable

Maybe something like this:

dir=$(ls -td "$dir"/*/ | head -1)
find "$dir" -maxdepth 1 -type f -name '*_*.html' -printf "%f\n" |
cut -d_ -f1 | sort -u

For input directory structure created like:

dir=dir
mkdir -p dir/dir
touch dir/dir/id{1,2,3}_{a,b,c}.html

So it looks like this:

dir/dir/id2_b.html
dir/dir/id1_c.html
dir/dir/id2_c.html
dir/dir/id1_b.html
dir/dir/id3_b.html
dir/dir/id2_a.html
dir/dir/id3_a.html
dir/dir/id1_a.html
dir/dir/id3_c.html

The script will output:

id1
id2
id3

Tested on repl.
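
The pipeline above only prints the IDs. To keep them in a variable, one option is to read them into an array. The sketch below is an assumption-laden variant rather than the answer's own method: it uses globbing and parameter expansion instead of find -printf (a GNU extension), needs bash 4 or later for mapfile, and builds a throwaway copy of the layout in a temporary directory. The names tmp, ids, f and name are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sandbox that mirrors the directory layout shown above.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir"
touch "$tmp"/dir/id{1,2,3}_{a,b,c}.html

# Strip the directory part, then everything from the first underscore on,
# and let sort -u reduce the prefixes to unique values.
mapfile -t ids < <(
  for f in "$tmp"/dir/*_*.html; do
    name=${f##*/}
    printf '%s\n' "${name%%_*}"
  done | sort -u
)

printf '%s\n' "${ids[@]}"   # id1, id2, id3 (one per line)
rm -rf "$tmp"
```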

How do I get the unique values of a list in bash preserving order and keep the last value for each unique?

Without Associative Arrays

You can do it with indexed arrays by using an intermediate indexed array to hold unique values from A. This requires a nested loop over values stored in c[] for each element of A, e.g.

#!/bin/bash

declare -a result # declare result indexed array
declare -a c # declare temp intermediate indexed array

A=( D B A C D ) # original with duplicates

## loop descending over A, reset found flag, loop over c, if present continue,
#  otherwise store A at index in c
for ((i = ${#A[@]} - 1; i >= 0; i--)); do
    found=0
    for j in "${c[@]}"; do
        [ "$j" = "${A[i]}" ] && { found=1; break; }
    done
    [ "$found" -eq 1 ] && continue
    c[i]=${A[i]}
done

## loop over c testing if index for A exists, add from c to result
for ((i = 0; i < ${#A[@]}; i++)); do
    [ "${c[i]}" ] && result+=("${c[i]}")   # quoted so values with spaces stay whole
done

declare -p result # output result

Example Use/Output

$ bash lastuniqindexed.sh
declare -a result='([0]="B" [1]="A" [2]="C" [3]="D")'

Using Associative Arrays with BASH_VERSION Test

You can do it with a combination of indexed and associative arrays, making only a single pass through each array. An associative array B, keyed by the values of A, records whether an element of A has already been seen. Each element seen for the first time (looping from the end) is stored in a temporary indexed array c[] at its original index, so that the unique values can be added to result in their original order.

You can address whether associative array functionality is present with a bash version test at the beginning, e.g.

#!/bin/bash

case $BASH_VERSION in
    ## empty or beginning with 1, 2, 3
    ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2
                exit 1;;
esac

declare -A B # declare associative array
declare -a result # declare indexed array

A=( D B A C D ) # original with duplicates

## loop descending over A, if B[A] doesn't exist, set B[A]=1, store in c[]
for ((i = ${#A[@]} - 1; i >= 0; i--)); do
    [ -n "${B[${A[i]}]}" ] || { B[${A[i]}]=1; c[i]=${A[i]}; }
done

## loop over c testing if index for A exists, add from c to result
for ((i = 0; i < ${#A[@]}; i++)); do
    [ "${c[i]}" ] && result+=("${c[i]}")   # quoted so values with spaces stay whole
done

declare -p result # output result

Without the use of associative arrays, the nested loops looping over the original checking against each entry in c[] will be much less efficient as the size of the array grows.

Example Use/Output

$ bash lastuniq.sh
declare -a result='([0]="B" [1]="A" [2]="C" [3]="D")'

Look things over and let me know if you have further questions.


