How can I get unique values from an array in Bash?
A bit hacky, but this should do it:
echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '
To save the sorted unique results back into an array, use array assignment:
sorted_unique_ids=($(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
If your shell supports herestrings (bash should), you can spare an echo process by altering it to:
tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' '
A note as of Aug 28 2021:
According to the ShellCheck wiki entry SC2207, read -a should be used instead of an unquoted command substitution, to avoid unintended word splitting.
Thus, in bash the command would be:
IFS=" " read -r -a ids <<< "$(echo "${ids[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' ')"
or
IFS=" " read -r -a ids <<< "$(tr ' ' '\n' <<< "${ids[@]}" | sort -u | tr '\n' ' ')"
Input:
ids=(aa ab aa ac aa ad)
Output:
aa ab ac ad
Explanation:
- "${ids[@]}": syntax for working with shell arrays, whether used as part of echo or a herestring. The @ part means "all elements in the array".
- tr ' ' '\n': converts all spaces to newlines, because the shell sees your array as elements on a single line separated by spaces, and because sort expects input to be on separate lines.
- sort -u: sorts and retains only unique elements.
- tr '\n' ' ': converts the newlines we added earlier back to spaces.
- $(...): command substitution.
- Aside: tr ' ' '\n' <<< "${ids[@]}" is a more efficient way of doing echo "${ids[@]}" | tr ' ' '\n'.
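Note that the tr-based pipeline treats every space as a separator, so an element such as "a b" would be split in two. A minimal sketch that avoids this, assuming bash 4.0 or later (for mapfile) and elements without embedded newlines; the sample data and the name unique_ids are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical sample data; one element contains a space.
ids=(aa "a b" aa ac "a b" ad)

# printf emits one element per line, sort -u de-duplicates,
# and mapfile -t reads each line back in as exactly one array element.
# LC_ALL=C is set only to make the sort order predictable.
mapfile -t unique_ids < <(printf '%s\n' "${ids[@]}" | LC_ALL=C sort -u)

declare -p unique_ids
```

Because each line becomes one element, no IFS juggling or tr round trip is needed.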
Keeping only the unique elements in an array Bash
One way to do it would be to create a new array holding one value per distinct number, taking the first value found for each numeric prefix. Say your values are in the indexed array named array. You could do:
new_array=( $(printf "%s\n" "${array[@]}" | sort -n -u) )
Above you are just using the command substitution of printf (used to output each element on a separate line) piped to sort -n -u (which sorts numerically and keeps only one line per number). You use the results to populate new_array.
Now new_array would contain:
67A
257B
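To see why sort -n -u collapses values that share a numeric prefix, here is a small sketch. The input values are hypothetical (the original array is not shown above), and mapfile assumes bash 4.0 or later:

```shell
#!/usr/bin/env bash
# Hypothetical input: two values share the prefix 67, two share 257.
array=(67A 67B 257B 257C)

# With -n, sort compares only the leading number, so -u treats
# 67A and 67B as equal and emits a single line for each number.
mapfile -t new_array < <(printf '%s\n' "${array[@]}" | sort -n -u)

echo "${#new_array[@]}"   # number of distinct numeric prefixes
```

The trailing letters play no part in the comparison, so this only suits data where the number alone identifies a value.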
How to sort and get unique values from an array in bash?
Try:
$ list=(a b b b c c)
$ unique_sorted_list=($(printf "%s\n" "${list[@]}" | sort -u))
$ echo "${unique_sorted_list[@]}"
a b c
Update based on comments (unique values ordered by descending frequency):
$ uniq=($(printf "%s\n" "${list[@]}" | sort | uniq -c | sort -rnk1 | awk '{ print $2 }'))
Count unique values in a bash array
The command
(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c
Explanation
- Counting occurrences of unique lines is done with the idiom sort file | uniq -c.
- Instead of using a file, we can also feed strings from the command line to sort using the here string operator <<<.
- Lastly, we have to convert the array entries to lines inside a single string. With "${array[*]}" the array is expanded to one single string where the array elements are separated by the first character of $IFS.
- With IFS=$'\n' we set the $IFS variable to the newline character for this command exclusively. The $'...' is called ANSI-C Quoting and allows us to express the newline character as \n.
- The subshell (...) is there to keep the change of $IFS local. After the command, $IFS will have the same value as before.
Example
array=(fire air fire earth water air air)
(IFS=$'\n'; sort <<< "${array[*]}") | uniq -c
prints
3 air
1 earth
2 fire
1 water
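If you would rather keep the counts in a variable than parse the output of uniq -c, a sketch using an associative array is shown below. It assumes bash 4.0 or later; the names count and item are illustrative:

```shell
#!/usr/bin/env bash
array=(fire air fire earth water air air)

declare -A count            # associative array, requires bash 4.0+
for item in "${array[@]}"; do
    # Treat a missing key as 0, then add one.
    count[$item]=$(( ${count[$item]:-0} + 1 ))
done

# Iteration order of an associative array is unspecified, so sort the report by name.
for item in "${!count[@]}"; do
    printf '%d %s\n' "${count[$item]}" "$item"
done | sort -k2
```

This avoids the subshell and the external sort for the counting itself, at the cost of requiring associative arrays.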
How to remove duplicate elements in an existing array in bash?
Naive approach
To get the unique elements of arr, assuming that no element contains newlines:
$ printf "%s\n" "${arr[@]}" | sort -u
aa
ab
bb
cc
Better approach
To get a NUL-separated list that works even if there were newlines:
$ printf "%s\0" "${arr[@]}" | sort -uz
aaabbbcc
(This, of course, looks ugly on a terminal because it doesn't display NULs.)
Putting it all together
To capture the result in newArr:
$ newArr=(); while IFS= read -r -d '' x; do newArr+=("$x"); done < <(printf "%s\0" "${arr[@]}" | sort -uz)
After running the above, we can use declare to verify that newArr is the array that we want:
$ declare -p newArr
declare -a newArr=([0]="aa" [1]="ab" [2]="bb" [3]="cc")
For those who prefer their code spread over multiple lines, the above can be rewritten as:
newArr=()
while IFS= read -r -d '' x
do
    newArr+=("$x")
done < <(printf "%s\0" "${arr[@]}" | sort -uz)
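On newer shells the read loop can be replaced by a single mapfile call. This is a sketch that assumes bash 4.4 or later (where mapfile accepts -d) and a sort that supports -z; the sample array is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical data; the third element contains a newline.
arr=(bb aa $'x\ny' aa bb)

# mapfile -d '' splits its input on NUL bytes, and -t drops the
# delimiter from each element, so embedded newlines survive intact.
mapfile -d '' -t newArr < <(printf '%s\0' "${arr[@]}" | sort -uz)

declare -p newArr
```

On older bash versions the while/read loop above remains the portable choice.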
Additional comment
Don't use all caps for your variable names. The system and the shell use all caps for their names and you don't want to accidentally overwrite one of them.
Select unique or distinct values from a list in UNIX shell script
You might want to look at the uniq and sort applications.
./yourscript.ksh | sort | uniq
(FYI, yes, the sort is necessary in this command line; uniq only strips duplicate lines that are immediately after each other.)
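A quick way to convince yourself that the sort matters is to feed uniq a list whose duplicates are not adjacent. The input below is made up for the illustration:

```shell
#!/usr/bin/env bash
# Hypothetical input with non-adjacent duplicates.
input=$'jar\nbin\njar'

without_sort=$(uniq <<< "$input")        # the two jar lines are not adjacent, so both stay
with_sort=$(sort <<< "$input" | uniq)    # sorting makes duplicates adjacent first

printf 'uniq only:\n%s\n' "$without_sort"
printf 'sort | uniq:\n%s\n' "$with_sort"
```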
EDIT:
Contrary to what has been posted by Aaron Digulla in relation to uniq's command-line options:
Given the following input:
class
jar
jar
jar
bin
bin
java
uniq will output all lines exactly once:
class
jar
bin
java
uniq -d will output all lines that appear more than once, and it will print them once:
jar
bin
uniq -u will output all lines that appear exactly once, and it will print them once:
class
java
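The three behaviours can be reproduced side by side with the same input. This sketch only wraps the commands above in variables (the names all, repeated and single are illustrative):

```shell
#!/usr/bin/env bash
# Same input as above; duplicates are already adjacent, so no sort is needed here.
input=$'class\njar\njar\njar\nbin\nbin\njava'

all=$(uniq <<< "$input")          # each run of identical lines, once
repeated=$(uniq -d <<< "$input")  # only lines that occur more than once
single=$(uniq -u <<< "$input")    # only lines that occur exactly once

printf 'uniq:    %s\n' "${all//$'\n'/ }"
printf 'uniq -d: %s\n' "${repeated//$'\n'/ }"
printf 'uniq -u: %s\n' "${single//$'\n'/ }"
```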
Find and show unique value from bash array
Try this:
printf "%s\n" "${x[@]}" | sort | uniq -u
Output:
4
bash to store unique value if array in variable
Maybe something like this:
dir=$(ls -td "$dir"/*/ | head -1)
find "$dir" -maxdepth 1 -type f -name '*_*.html' -printf "%f\n" |
cut -d_ -f1 | sort -u
For input directory structure created like:
dir=dir
mkdir -p dir/dir
touch dir/dir/id{1,2,3}_{a,b,c}.html
So it looks like this:
dir/dir/id2_b.html
dir/dir/id1_c.html
dir/dir/id2_c.html
dir/dir/id1_b.html
dir/dir/id3_b.html
dir/dir/id2_a.html
dir/dir/id3_a.html
dir/dir/id1_a.html
dir/dir/id3_c.html
The script will output:
id1
id2
id3
Tested on repl.
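If the IDs are needed in a bash array rather than on standard output, the same prefix extraction can be sketched with a glob and parameter expansion instead of find and cut. This assumes bash 4.0 or later; the temporary directory and the names seen, ids, name and id are assumptions made for the example:

```shell
#!/usr/bin/env bash
# Recreate the sample files in a throwaway directory (an assumption of this
# sketch; the original example uses dir/dir).
dir=$(mktemp -d)
touch "$dir"/id{1,2,3}_{a,b,c}.html

declare -A seen      # associative array, requires bash 4.0+
ids=()
for path in "$dir"/*_*.html; do
    [ -f "$path" ] || continue      # skip the literal pattern when nothing matches
    name=${path##*/}                # strip the directory part
    id=${name%%_*}                  # keep everything before the first underscore
    [ -n "${seen[$id]}" ] || { seen[$id]=1; ids+=("$id"); }
done

printf '%s\n' "${ids[@]}"
rm -rf "$dir"
```

The glob expands in sorted order, so the IDs come out sorted without calling sort.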
How do I get the unique values of a list in bash preserving order and keep the last value for each unique?
Without Associative Arrays
You can do it with indexed arrays by using an intermediate indexed array to hold unique values from A. This requires a nested loop over values stored in c[] for each element of A, e.g.
#!/bin/bash
declare -a result   # declare result indexed array
declare -a c        # declare temp intermediate indexed array
A=( D B A C D )     # original with duplicates
## loop descending over A, reset found flag, loop over c, if present continue,
#  otherwise store A at index in c
for ((i = ${#A[@]} - 1; i >= 0; i--)); do
    found=0
    for j in "${c[@]}"; do
        [ "$j" = "${A[i]}" ] && { found=1; break; }
    done
    [ "$found" -eq 1 ] && continue
    c[i]=${A[i]}
done
## loop over c testing if index for A exists, add from c to result
for ((i = 0; i < ${#A[@]}; i++)); do
    [ "${c[i]}" ] && result+=("${c[i]}")
done
declare -p result   # output result
Example Use/Output
$ bash lastuniqindexed.sh
declare -a result='([0]="B" [1]="A" [2]="C" [3]="D")'
Using Associative Arrays with BASH_VERSION Test
You can do it with a combination of indexed and associative arrays, making only a single pass through each array. You use an associative array B keyed with the value of A, using B as a frequency array indicating whether an element of A has been seen. You then store the element of A in a temporary indexed array c[] so that the unique values can be added to result, preserving the original order.
You can address whether associative array functionality is present with a bash version test at the beginning, e.g.
#!/bin/bash
case $BASH_VERSION in
    ## empty or beginning with 1, 2, 3
    ''|[123].*) echo "ERROR: Bash 4.0 needed" >&2
                exit 1;;
esac
declare -A B        # declare associative array
declare -a result   # declare indexed array
A=( D B A C D )     # original with duplicates
## loop descending over A, if B[A] doesn't exist, set B[A]=1, store in c[]
for ((i = ${#A[@]} - 1; i >= 0; i--)); do
    [ -n "${B[${A[i]}]}" ] || { B[${A[i]}]=1; c[i]=${A[i]}; }
done
## loop over c testing if index for A exists, add from c to result
for ((i = 0; i < ${#A[@]}; i++)); do
    [ "${c[i]}" ] && result+=("${c[i]}")
done
declare -p result   # output result
Without the use of associative arrays, the nested loops looping over the original checking against each entry in c[] will be much less efficient as the size of the array grows.
Example Use/Output
$ bash lastuniq.sh
declare -a result='([0]="B" [1]="A" [2]="C" [3]="D")'
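For comparison, a more compact variant is sketched below. It is not the method above but a variation on it: it drops the temporary array c[] and prepends each newly seen value to result. It assumes bash 4.0 or later and non-empty elements; the names seen and v are illustrative:

```shell
#!/usr/bin/env bash
A=( D B A C D )      # original with duplicates

declare -A seen      # associative array, requires bash 4.0+
result=()
# Walk from the end so the last occurrence of each value is the one kept,
# and prepend so the survivors stay in their original relative order.
for ((i = ${#A[@]} - 1; i >= 0; i--)); do
    v=${A[i]}
    [ -n "${seen[$v]}" ] && continue
    seen[$v]=1
    result=("$v" "${result[@]}")
done

declare -p result
```

Prepending copies the array on every insertion, so for very large arrays the two-pass version with c[] may still be preferable.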
Look things over and let me know if you have further questions.