Conditional Awk Hashmap Match Lookup

Assuming your files have comma-separated fields and the "id column" is field 3:

awk '
BEGIN{ FS=OFS="," }
NR==FNR { map[$1] = $2; next }
{ $3 = map[$3]; print }
' lookup_file.txt data.txt

If any of those assumptions are wrong, clue us in if the fix isn't obvious...
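For a quick sanity check, here is the command run against two tiny hypothetical files (file contents made up for illustration):

```shell
# hypothetical sample inputs
printf 'old1,new1\nold2,new2\n' > lookup_file.txt
printf 'a,b,old1,d\ne,f,old2,h\n' > data.txt

awk '
BEGIN{ FS=OFS="," }
NR==FNR { map[$1] = $2; next }   # first file: build the id -> replacement map
{ $3 = map[$3]; print }          # second file: rewrite field 3 through the map
' lookup_file.txt data.txt
# prints:
#   a,b,new1,d
#   e,f,new2,h
```

Note that an id missing from the map leaves field 3 empty; add a check like `$3 in map` first if that matters for your data.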

EDIT: and if you want to avoid the (IMHO negligible) performance impact of the NR==FNR test, this would be one of those very rare cases where use of getline is appropriate:

awk '
BEGIN {
    FS = OFS = ","
    while ( (getline line < "lookup_file.txt") > 0 ) {
        split(line, f)
        map[f[1]] = f[2]
    }
    close("lookup_file.txt")
}
{ $3 = map[$3]; print }
' data.txt

Awk conditional with multiple conditions

awk evaluates the conditions inside the quoted script, but in your attempt the && sits outside the quotes, so the shell treats it as a command separator instead of passing it to awk. Put the whole expression within a single pair of ' ':

awk -F, '$32 > 4000 && $60 < 10' *

Instead of:

awk -F, '$32 > 4000' && '$60 < 10' *

(here the first closing quote ends the awk script after $32 > 4000, and the shell then interprets the && itself)
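The same idea on a small made-up three-column sample (hypothetical data, with lower field numbers than the question's $32/$60):

```shell
# hypothetical CSV: name, amount, count
printf '%s\n' 'a,5000,5' 'b,3000,5' 'c,5000,20' > sample.csv

awk -F, '$2 > 4000 && $3 < 10' sample.csv
# prints only the row matching both conditions:
#   a,5000,5
```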

Looking for a way to have awk iteratively loop through a file (to create PERCENTRANK function in bash)

The following simpler sort-and-awk approach may also help (though I haven't tested it on millions of lines, since I don't have a file that large).

Solution 1: This will not repeat a duplicate item's rank in the output (e.g. the digit 1 in your example).

sort -nr Input_file | awk '
function sum(array) {
    tot = ""
    for (i in array)
        tot += array[i]
    return tot
}
{
    a[FNR] = $0     # values in descending order
    b[$0]++         # occurrence count per value
}
END {
    for (j = 1; j <= FNR; j++) {
        if (b[a[j]]) {                  # first time this value is reached
            val = b[a[j]]
            delete b[a[j]]
            printf("%d %0.4f\n", a[j], sum(b) / (sum(d) + sum(b)))
            d[a[j]] = val               # move its count to the "already ranked" array
        }
    }
}'

Output will be as follows.

13 1.0000
12 0.8889
11 0.7778
8 0.6667
4 0.5556
3 0.4444
2 0.3333
1 0.0000

Solution 2: A minor variant of the first one which also prints duplicate items' ranks in the output, as follows.

sort -nr Input_file | awk '
function sum(array) {
    tot = ""
    for (i in array)
        tot += array[i]
    return tot
}
{
    a[FNR] = $0
    b[$0]++
}
END {
    for (j = 1; j <= FNR; j++) {
        if (b[a[j]]) {
            val = val1 = b[a[j]]
            delete b[a[j]]
            while (val1 > 0) {          # print the rank once per duplicate
                printf("%d %0.4f\n", a[j], sum(b) / (sum(d) + sum(b)))
                val1--
            }
            d[a[j]] = val
        }
    }
}'

Output will be as follows.
13 1.0000
12 0.8889
11 0.7778
8 0.6667
4 0.5556
3 0.4444
2 0.3333
1 0.0000
1 0.0000
1 0.0000
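If what you ultimately need is Excel-style PERCENTRANK, a shorter two-array sketch may also work. This assumes the sample input is the ten numbers implied by the output above (a guess on my part), and uses the (count of strictly smaller values)/(n-1) convention, which I believe matches Excel's PERCENTRANK.INC for values present in the set; duplicate handling may therefore differ from the answer's formula on other data.

```shell
# hypothetical input reconstructed from the output above
printf '%s\n' 13 12 11 8 4 3 2 1 1 1 > Input_file

sort -n Input_file | awk '
{
    vals[NR] = $1                          # values in ascending order
    if (!($1 in less)) less[$1] = NR - 1   # first occurrence: NR-1 values are strictly smaller
}
END {
    for (i = NR; i >= 1; i--)              # print descending, like the solutions above
        printf "%d %0.4f\n", vals[i], less[vals[i]] / (NR - 1)
}'
```

On this particular data set the output is identical to Solution 2's listing, duplicates included.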

Easiest way to check for an index or a key in an array?

To check whether an element is set (this applies to both indexed and associative arrays):

[ "${array[key]+abc}" ] && echo "exists"

Basically what ${array[key]+abc} does is

  • if array[key] is set, return abc
  • if array[key] is not set, return nothing
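A short demonstration on a hypothetical associative array (requires bash; note that the colon-less form treats a set-but-empty element as existing):

```shell
declare -A arr
arr[foo]=""          # set, but empty
arr[bar]="value"

[ "${arr[foo]+abc}" ] && echo "foo exists"          # prints: foo is set, even though empty
[ "${arr[baz]+abc}" ] || echo "baz does not exist"  # prints: baz was never set
[ "${arr[foo]:+abc}" ] || echo "foo is empty"       # prints: the colon form also requires non-empty
```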

References:

  1. See Parameter Expansion in the Bash manual, and in particular the little note:

     if the colon is omitted, the operator tests only for existence [of parameter]

  2. This answer is actually adapted from the answers for this SO question: How to tell if a string is not defined in a bash shell script?

A wrapper function:

exists() {
    if [ "$2" != in ]; then
        echo "Incorrect usage."
        echo "Correct usage: exists {key} in {array}"
        return
    fi
    eval '[ ${'$3'[$1]+muahaha} ]'
}

For example

if ! exists key in array; then echo "No such array element"; fi 

Truly modal window possible?

This should be what you want:

var window = new MyWindow();
var helper = new WindowInteropHelper(window);
helper.Owner = this.Handle;
window.ShowDialog();

This is the key to ensuring correct behaviour upon minimise/restore. See this blog post for more information about the method.

(If this isn't quite what you need, perhaps you could define "truly modal".)

Do this without using an if: if(s == value1) {...} else if(s == value2) {...}

Make use of the strategy pattern.

In Java terms:

public interface Strategy {
    void execute();
}

public class SomeStrategy implements Strategy {
    @Override
    public void execute() {
        System.out.println("Some logic.");
    }
}

which you use as follows:

Map<String, Strategy> strategies = new HashMap<String, Strategy>();
strategies.put("strategyName1", new SomeStrategy1());
strategies.put("strategyName2", new SomeStrategy2());
strategies.put("strategyName3", new SomeStrategy3());

// ...

strategies.get(s).execute();

Rust function as slow as its python counterpart

You are computing the hash function multiple times (once for the lookup and once for the insert), which may matter for large n values. Try using the entry function instead of manual lookups and inserts:

while current_pos + n <= seq.len() {
    // one hash computation per window: insert the default if absent, then increment
    let en = counts.entry(&seq[current_pos..current_pos + n]).or_default();
    *en += 1;
    current_pos += 1;
}

Complete code here

Next, make sure you are running --release-compiled code, e.g. cargo run --release.

One more thing to keep in mind is discussed here: Rust's default hash function (SipHash) resists hash-flooding attacks but may be non-optimal for your case, and you can swap it for a faster one.

And finally, on large data most of the time is spent in HashMap/dict internals, which on the Python side are compiled code too, not interpreted Python. So don't expect the Rust version to scale much better.

I have a generated cookies file for a Chrome Extension; I need to load it into a HashMap<String, String> in Java

I would first pre-process the text file to get a key-value list. Something like this:

grep "^[^#]" cookies.txt | awk '{print $6 " " $7}'

This outputs:

_ga GA1.2.10834324067.1638446981
_gid GA1.2.25433025264.1638446981
_fbp fb.1.1643546988197.973328968

The command above strips lines beginning with a # as well as empty lines. The result is then reduced to the 6th (cookie name) and 7th (cookie value) columns.
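To see it end-to-end, here is the pipeline applied to a tiny hypothetical Netscape-format cookies.txt (tab-separated columns: domain, flag, path, secure, expiry, name, value; the values below are invented):

```shell
# hypothetical cookies file (fields are tab-separated)
printf '# Netscape HTTP Cookie File\n\n' > cookies.txt
printf '.example.com\tTRUE\t/\tFALSE\t1700000000\t_ga\tGA1.2.10834324067.1638446981\n' >> cookies.txt
printf '.example.com\tTRUE\t/\tFALSE\t1700000000\t_gid\tGA1.2.25433025264.1638446981\n' >> cookies.txt

grep "^[^#]" cookies.txt | awk '{print $6 " " $7}' > filtered.txt
cat filtered.txt
# prints:
#   _ga GA1.2.10834324067.1638446981
#   _gid GA1.2.25433025264.1638446981
```

The comment line and the blank line are both dropped because neither begins with a non-# character.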

If you save the output of the above bash command into filtered.txt, you can parse cookie information in Java like so:

Map<String, String> cookies = new HashMap<>();
try (Stream<String> stream = Files.lines(Paths.get("filtered.txt"))) {
    stream.forEach(line -> {
        String[] columns = line.split(" ");
        cookies.put(columns[0], columns[1]);
    });
}

We are simply grabbing the key and value from every row to fill our cookies map; I suppose the code could be shorter, however, at the expense of readability.



References

  • How to grep lines which does not begin with "#" or ";"?
  • cut column 2 from text file
  • How can I read a large text file line by line using Java?


Related Topics



Leave a reply



Submit