How to Compare 2 Lists of Ranges in Bash

How to compare 2 lists of ranges in bash?

It depends on how big your files are, of course. If they are small enough to fit in memory, you can try this 100% bash solution:

declare -a min=() # array of lower bounds of ranges
declare -a max=() # array of upper bounds of ranges

# read ranges in second file, store them in arrays min and max
while read -r a b; do
  min+=( "$a" )
  max+=( "$b" )
done < file2

# read ranges in first file
while read -r a b; do
  # loop over indexes of min (and max) array
  for i in "${!min[@]}"; do
    if (( max[i] >= a && min[i] <= b )); then # if ranges overlap
      echo "${min[i]} ${max[i]}" # print range
      # performance optimization: never test (or print) this range again;
      # quoting stops the shell from glob-expanding min[i] / max[i]
      unset 'min[i]' 'max[i]'
    fi
  done
done < file1

This is just a starting point. There are many possible improvements to performance and memory footprint, but which ones help depends strongly on the sizes of your files and on the distributions of your ranges.
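One such improvement, sketched here under the assumption that both files hold one integer "low high" pair per line with low <= high: sort both files by lower bound, merge the file1 ranges into disjoint intervals, and sweep file2 once. This swaps the nested loop for sorting plus a sweep, so it is a different technique from the one above; the function name overlapping_ranges is hypothetical, and the output comes out ordered by lower bound rather than in file2 order.

```bash
#!/usr/bin/env bash
# Sketch: print each range of file2 that overlaps at least one range of file1.
# Assumes one "low high" integer pair per line, with low <= high.
overlapping_ranges() {
  local file1=$1 file2=$2
  local -a us=() ue=()   # starts and ends of the merged file1 ranges
  local a b c d n j

  # 1) merge file1 ranges, visiting them in increasing lower-bound order
  while read -r a b; do
    n=${#us[@]}
    if (( n > 0 )) && (( a <= ue[n-1] )); then
      # overlaps the last merged range: extend it if needed
      if (( b > ue[n-1] )); then ue[n-1]=$b; fi
    else
      us+=( "$a" )
      ue+=( "$b" )
    fi
  done < <(sort -n -k1,1 "$file1")

  # 2) sweep file2 ranges, also in increasing lower-bound order
  n=${#us[@]}
  j=0
  while read -r c d; do
    # merged ranges ending before c cannot overlap this or any later range
    while (( j < n && ue[j] < c )); do (( ++j )); done
    if (( j < n && us[j] <= d )); then
      echo "$c $d"
    fi
  done < <(sort -n -k1,1 "$file2")
}
```

Call it as overlapping_ranges file1 file2. Sorting costs O(n log n), after which each file is read only once, so this mainly pays off when both files are large.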

EDIT 1: improved the range overlap test.
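The overlap test rests on the fact that two closed ranges [a1, b1] and [a2, b2] are disjoint exactly when one ends before the other starts (b1 < a2 or b2 < a1); negating that gives a1 <= b2 && a2 <= b1, which is what max[i] >= a && min[i] <= b checks. A minimal sketch of the same test as a standalone function (ranges_overlap is a hypothetical name), assuming integer bounds with each lower bound <= its upper bound:

```bash
#!/usr/bin/env bash
# Sketch: closed ranges [a1, b1] and [a2, b2] overlap unless one ends
# before the other starts. Assumes integers with a1 <= b1 and a2 <= b2.
ranges_overlap() {
  local a1=$1 b1=$2 a2=$3 b2=$4
  (( a1 <= b2 && a2 <= b1 ))   # exit status 0 (true) when they overlap
}

ranges_overlap 1 5 4 9 && echo "1-5 and 4-9 overlap"
ranges_overlap 1 5 6 9 || echo "1-5 and 6-9 do not overlap"
```

Touching ranges such as 1-5 and 5-9 count as overlapping; use < instead of <= if they should not.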

EDIT 2: reused the excellent optimization proposed by RomanPerekhrest (unset already printed ranges from file2). The performance should be better when the probability that ranges overlap is high.

EDIT 3: performance comparison with the awk version proposed by RomanPerekhrest (after fixing the initial small bugs): awk is between 10 and 20 times faster than bash on this problem. If performance is important and you hesitate between awk and bash, prefer:

awk 'NR == FNR { a[FNR] = $1; b[FNR] = $2; next }
     {
       for (i in a)
         if ($1 <= b[i] && a[i] <= $2) {
           print a[i], b[i]
           delete a[i]; delete b[i]
         }
     }' file2 file1

bash - compare two lists as strings, find items in list A but not in list B

jq -r '.[]' <<<'["a1ex","oliver","maggie","walter","ben"]' |
  grep -vxF -f <(jq -r '.[]' <<<'[ "a1ex", "oliver", "ben" ]')

Output:

maggie
walter
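If jq is only needed for parsing, the set difference itself can also be done in bash 4 or later with an associative array. A sketch, assuming non-empty items without embedded newlines; not_in is a hypothetical helper that reads list A from stdin and takes list B as arguments:

```bash
#!/usr/bin/env bash
# Sketch: print the lines of stdin (list A) that are not among the
# arguments (list B). Requires bash 4+ for associative arrays.
not_in() {
  local -A skip=()
  local item
  for item in "$@"; do
    skip["$item"]=1            # remember every item of list B
  done
  while IFS= read -r item; do
    # whole-line lookup, so "ben" does not hide "benjamin"
    [[ -n ${skip["$item"]} ]] || printf '%s\n' "$item"
  done
}

printf '%s\n' a1ex oliver maggie walter ben | not_in a1ex oliver ben
```

To feed it from JSON, mapfile -t list_b < <(jq -r '.[]' <<<'[ "a1ex", "oliver", "ben" ]') fills an array that can be passed as "${list_b[@]}".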

Evaluating overlap of number ranges in bash

Try it in awk:

awk -F'-' 'NR > 1 && Q >= $1 { print } { Q = $NF }' Input_file

This uses - (dash) as the field separator. On every line after the first, if the previous line's last field (stored in Q) is greater than or equal to the current line's first field ($1), the current line is printed (printing the previous line instead is also possible). Then Q is (re)assigned the current line's last field.

EDIT: Since the OP wants the previous line printed instead, here is that version:

awk -F'-' 'NR > 1 && Q >= $1 { print val } { Q = $NF; val = $0 }' Input_file
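A pure-bash equivalent of the second awk command, for when awk is not wanted. This is a sketch assuming one non-negative integer range per line in start-end form; print_overlapping_prev is a hypothetical name. The 10# prefix forces base 10, so values with leading zeros such as 08 are not read as octal.

```bash
#!/usr/bin/env bash
# Sketch: for lines like "start-end", print the previous line whenever its
# end is >= the current line's start (same logic as the awk version).
# Assumes non-negative integer bounds, one range per line.
print_overlapping_prev() {
  local line start end prev_line="" prev_end=""
  while IFS= read -r line; do
    start=${line%%-*}   # text before the first dash, like awk's $1
    end=${line##*-}     # text after the last dash, like awk's $NF
    if [[ -n $prev_end ]] && (( 10#$prev_end >= 10#$start )); then
      printf '%s\n' "$prev_line"
    fi
    prev_line=$line
    prev_end=$end
  done
}

printf '%s\n' 1-5 4-8 10-12 11-20 | print_overlapping_prev
```

For the input 1-5, 4-8, 10-12, 11-20 this prints 1-5 and 10-12. Checking whether prev_end is empty, rather than whether it is non-zero, also handles a previous range that ends at 0.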

Comparing a variable to a range of numbers in BASH

In simple terms, you can just test the argument passed to the script when you run it:

#!/bin/bash

if (( 0 <= $1 && $1 <= 5 )); then
  echo "In range"
else
  echo "Not in range"
fi

Pass the number to the script and it will test it against your range. For example, if the above is put in a script called check.sh, then:

$ bash check.sh 10
Not in range
$ bash check.sh 3
In range

You can make the script executable (chmod +x check.sh) to avoid typing bash ... whenever you run it. The $1 used above is the first parameter passed to the script. If you don't like using positional parameters, you can save it in a named variable inside the script.
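A sketch of the same check with the argument saved in a named variable and validated first, assuming the 0 to 5 range from above. The in_range function name, the regular expression and the return codes are illustrative choices, not part of the original script. Validating first matters because ((...)) evaluates its contents as an arithmetic expression: an empty argument is a syntax error, a word such as abc is treated as a variable name, and 08 is rejected as an invalid octal number.

```bash
#!/usr/bin/env bash
# Sketch: range check on a named, validated copy of the first argument.
in_range() {
  local n=$1 low=0 high=5
  local re='^-?(0|[1-9][0-9]*)$'   # optional minus sign, no leading zeros
  if [[ ! $n =~ $re ]]; then
    echo "Not a number: $n" >&2
    return 2
  fi
  if (( low <= n && n <= high )); then
    echo "In range"
  else
    echo "Not in range"
    return 1
  fi
}
```

Put the function in check.sh and end the script with in_range "$1"; the script's exit status is then 0 inside the range, 1 outside it, and 2 for invalid input.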

compare value to multiple ranges in bash script and set other variables based on that match

If the values can be calculated directly (as shown in another answer), that eliminates a lot of scripting. You may still need to check for valid and invalid input, but that can also be done without an extended if/elif/else/fi chain.

But to answer the question, here's an if/else that can check for ranges of numbers (I don't think a 'case' would simplify matters):

#!/bin/bash

arg=$1

if (( 1 <= arg && arg <= 5 )); then
  echo "from 1-5: $arg"
elif (( 16 <= arg && arg <= 23 )); then
  echo "from 16-23: $arg"
elif (( 24 <= arg && arg <= 31 )); then
  echo "from 24-31: $arg"
else
  echo "invalid: $arg"
fi

The main point is that ((...)) is used for arithmetic evaluation instead of [...]; it is equivalent to let and returns an exit status of 0 (true) when the expression is non-zero and 1 (false) otherwise.

Sample output:

$ ./scr 1
from 1-5: 1

$ ./scr 7
invalid: 7

$ ./scr 23
from 16-23: 23
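The title also asks about setting other variables based on the match. One way to do that without growing the elif chain, sketched here as an alternative rather than the answer's method, is to keep the bounds in parallel arrays and loop over them. classify and label are hypothetical names; the bounds mirror the example above, and the input is assumed to be an integer.

```bash
#!/usr/bin/env bash
# Sketch: table-driven range lookup; sets the (hypothetical) global
# variable "label" to the name of the matching range, or "" if none.
classify() {
  local arg=$1 i
  local -a low=(1 16 24) high=(5 23 31) name=("1-5" "16-23" "24-31")
  label=""
  for i in "${!low[@]}"; do
    if (( low[i] <= arg && arg <= high[i] )); then
      label=${name[i]}
      break
    fi
  done
  if [[ -n $label ]]; then
    echo "from $label: $arg"
  else
    echo "invalid: $arg"
    return 1
  fi
}
```

Adding a range then means adding one entry to each array, and any number of further variables can be set from the matched index i in the same way as label.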

Compare different columns of subsequent rows to merge ranges

This assumes that when the condition holds for two consecutive pairs of records (i.e. three consecutive records in total), the third record treats the merged output of records 1 and 2 as its previous record:

awk -v dist=10 'FNR==1{prev_1=$1; prev_2=$2; next} ($1<=prev_2+dist){print prev_1,$2; prev_2=$2;next} {prev_1=$1; prev_2=$2}1' file

Input :

$ cat file
1 10
9 19
10 30
51 60

Output:

1 19
1 30
51 60
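The awk command prints each intermediate merge (1 19 and then 1 30). If only the final merged ranges are wanted, a bash variant could track a running range and print it once the next record starts more than dist past its end. This is a sketch, not the answer's behavior: merge_ranges is a hypothetical name, the input is assumed to be sorted by start with integer bounds, it keeps the larger end when one range lies inside another, and, unlike the awk command, it also prints a first record that does not merge with anything.

```bash
#!/usr/bin/env bash
# Sketch: merge consecutive "start end" records whose start is within
# dist of the running end; print only the final merged ranges.
# Assumes input sorted by start, integer bounds.
merge_ranges() {
  local dist=$1 start end cur_start="" cur_end=""
  while read -r start end; do
    if [[ -n $cur_start ]] && (( start <= cur_end + dist )); then
      # close enough: extend the running range
      if (( end > cur_end )); then cur_end=$end; fi
    else
      # gap too large: emit the finished range and start a new one
      if [[ -n $cur_start ]]; then echo "$cur_start $cur_end"; fi
      cur_start=$start
      cur_end=$end
    fi
  done
  if [[ -n $cur_start ]]; then echo "$cur_start $cur_end"; fi
}
```

With the sample file, merge_ranges 10 < file prints 1 30 and 51 60.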

