Bash: How to Tokenize a String Variable

Bash: How to tokenize a string variable?

Use the shell's automatic tokenization of unquoted variables:

$ string="john is 17 years old"
$ for word in $string; do echo "$word"; done
john
is
17
years
old

If you want to change the delimiter you can set the $IFS variable, which stands for internal field separator. The default value of $IFS is " \t\n" (space, tab, newline).

$ string="john_is_17_years_old"
$ (IFS='_'; for word in $string; do echo "$word"; done)
john
is
17
years
old

(Note that in this second example I added parentheses around the second line. This creates a sub-shell so that the change to $IFS doesn't persist. You generally don't want to permanently change $IFS as it can wreak havoc on unsuspecting shell commands.)
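One caveat with unquoted expansion is that the shell also performs pathname expansion on each resulting word, so a token such as * would be replaced by file names. The following is a minimal sketch, not part of the original answer, that wraps the loop in a hypothetical helper named tokenize. Its body is a sub-shell, so neither the IFS change nor set -f leaks into the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each token of $1 on its own line,
# splitting on the delimiter character(s) given in $2.
tokenize() (
    # The parentheses make the function body a sub-shell,
    # so IFS and set -f are restored automatically on return.
    IFS=$2
    set -f                    # disable pathname expansion (globbing)
    for word in $1; do        # intentionally unquoted: this is where splitting happens
        printf '%s\n' "$word"
    done
)

tokenize "john_is_*_years_old" "_"
```

Without set -f, the third token would expand to the names of the files in the current directory instead of staying a literal *.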

How to split one string into multiple variables in bash shell?

If your solution doesn't have to be general, i.e. only needs to work for strings like your example, you could do:

var1=$(echo "$STR" | cut -f1 -d-)
var2=$(echo "$STR" | cut -f2 -d-)

I chose cut here because you could simply extend the code for a few more variables...
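As a rough illustration of extending this to more variables, here is a sketch that assumes a sample value for STR (the value is made up for the example). It also shows one possible alternative, a single read call with IFS set only for that command, which avoids starting a cut process per field.

```shell
#!/usr/bin/env bash
STR="ABCDE-123456-xyz"    # hypothetical sample value

# The cut approach, extended to a third field.
var1=$(printf '%s\n' "$STR" | cut -d- -f1)
var2=$(printf '%s\n' "$STR" | cut -d- -f2)
var3=$(printf '%s\n' "$STR" | cut -d- -f3)

# Alternative: one read call splits all fields at once.
# r1, r2 and r3 are illustrative names.
IFS='-' read -r r1 r2 r3 <<< "$STR"

printf '%s %s %s\n' "$var1" "$var2" "$var3"
printf '%s %s %s\n' "$r1" "$r2" "$r3"
```

Both printf lines should show the same three fields. If the string has more fields than variables, read puts the remainder, delimiters included, into the last variable.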

How do I split a string on a delimiter in Bash?

You can set the internal field separator (IFS) variable and then let read parse the input into an array. When the assignment is written as a prefix to a command, it applies only to that single command's environment (here, read), so IFS keeps its previous value afterwards. read then splits the input according to the value of IFS into an array, which we can iterate over.

This example will parse one line of items separated by ;, pushing it into an array:

IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
    echo "$i"  # process "$i" here (a loop body cannot consist of a comment only)
done
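To make the scoping of the IFS assignment concrete, here is a small sketch with an assumed sample value for IN. It counts the fields, prints each one in brackets, and checks that IFS itself was not modified by the prefixed assignment. The name saved_ifs is introduced only for this check.

```shell
#!/usr/bin/env bash
IN="alice@example.com;bob smith;carol"   # hypothetical sample input
saved_ifs=$IFS                            # remember IFS to compare afterwards

# IFS=';' applies to this read command only.
IFS=';' read -ra ADDR <<< "$IN"

printf 'fields: %d\n' "${#ADDR[@]}"
for i in "${ADDR[@]}"; do
    printf '[%s]\n' "$i"                  # quoting keeps "bob smith" as one item
done

if [[ $IFS == "$saved_ifs" ]]; then
    echo "IFS is unchanged"
fi
```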

This other example processes the whole content of $IN, reading one line at a time and splitting each line on ;:

while IFS=';' read -ra ADDR; do
    for i in "${ADDR[@]}"; do
        echo "$i"  # process "$i" here (a loop body cannot consist of a comment only)
    done
done <<< "$IN"
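For the multi-line case, the following sketch assumes a two-line sample value for IN and reports how many fields each line produced. The names FIELDS and line_no are illustrative.

```shell
#!/usr/bin/env bash
IN=$'a;b;c\nd;e'     # hypothetical two-line input
line_no=0

# Each iteration reads one line and splits it on ';'.
while IFS=';' read -ra FIELDS; do
    line_no=$((line_no + 1))
    # "${FIELDS[*]}" joins with the first character of IFS, which is a space
    # here because IFS=';' only applied to the read command itself.
    printf 'line %d has %d fields: %s\n' "$line_no" "${#FIELDS[@]}" "${FIELDS[*]}"
done <<< "$IN"
```

Because the loop is fed by a here-string rather than a pipe, it runs in the current shell, so line_no is still available after the loop ends.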

How to create a tokenization tool in bash script?

Your script had some mistakes that I corrected. Now it works:

#!/bin/bash

STRING='
abc.xyc.kkk.com hjk.pol.lll.kkk.com
'

# Split on spaces and newlines; -r keeps backslashes literal
IFS=$' \n' read -r -d '' -a VALUES <<< "$STRING"

for i in "${VALUES[@]}"; do
    echo "$i" | sed 's/\./ /g'
done

Output

abc xyc kkk com
hjk pol lll kkk com

By the way, if you want each dot-separated token in the array instead of the entire URL, you can do this:

#!/bin/bash

STRING='
abc.xyc.kkk.com hjk.pol.lll.kkk.com
'

# Split on dots as well as spaces and newlines, so each token is its own element
IFS=$' .\n' read -r -d '' -a VALUES <<< "$STRING"

for i in "${VALUES[@]}"; do
    echo "$i"
done

Output

abc
xyc
kkk
com
hjk
pol
lll
kkk
com
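To check what actually ends up in the array, it can help to compare element counts. The sketch below, which is an illustration rather than part of the original answer, splits the same string twice: once on '.' only and once on '.', space and newline together. With '.' alone, the space between the two host names is not a separator, so one element holds "com hjk". The array names DOT_ONLY and ALL are made up for the comparison.

```shell
#!/usr/bin/env bash
STRING='
abc.xyc.kkk.com hjk.pol.lll.kkk.com
'

# read -d '' reads to end of input and therefore returns non-zero;
# "|| true" keeps the script going under set -e.
IFS='.' read -r -d '' -a DOT_ONLY <<< "$STRING" || true
IFS=$' .\n' read -r -d '' -a ALL <<< "$STRING" || true

echo "split on '.' only:            ${#DOT_ONLY[@]} elements"
echo "split on '.', space, newline: ${#ALL[@]} elements"

# declare -p prints the array with every element quoted, which
# makes embedded spaces and newlines visible.
declare -p DOT_ONLY
declare -p ALL
```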

Let me know if it works!

How to tokenise string and call a function on each token in bash?

Would you please try the following:

validate_token() {
    local rule="???" # matches a three-character string
    if [[ $1 == $rule ]]; then
        echo 1
    else
        echo 0
    fi
}

final=1 # final result
while IFS=',' read -ra ary; do
    for i in "${ary[@]}"; do
        final=$(( final & $(validate_token "$i") ))
        # take AND with the individual test result
    done
done < "str_data.txt"

(( final )) && echo "true" || echo "false"

I've also modified your function for several reasons.

  • When defining a bash function, the form name() { .. } is preferred.
  • Starting a user variable name with an underscore is not recommended.
    Since the variable is declared local, you don't have to worry about
    name collisions.
  • When comparing with the == or = operator within [[ .. ]], the unquoted
    right-hand side is treated as a pattern, so the pattern or rule belongs
    to the right of the operator.
  • Having the function print 1 or 0 rather than true or false is convenient for further calculation.
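To try the idea without a data file, here is a self-contained sketch that applies the same AND logic to a single comma-separated string. The wrapper check_line is hypothetical and only exists for this illustration; validate_token is the function from the answer.

```shell
#!/usr/bin/env bash
validate_token() {
    local rule="???"              # glob pattern: exactly three characters
    if [[ $1 == $rule ]]; then    # unquoted right-hand side is matched as a pattern
        echo 1
    else
        echo 0
    fi
}

# Hypothetical wrapper: prints "true" only if every token in $1 passes.
check_line() {
    local final=1 ary i
    IFS=',' read -ra ary <<< "$1"
    for i in "${ary[@]}"; do
        # bitwise AND: one failing token turns the result into 0
        final=$(( final & $(validate_token "$i") ))
    done
    (( final )) && echo "true" || echo "false"
}

check_line "abc,def,ghi"    # every token has three characters
check_line "abc,de,ghi"     # "de" does not match the rule
```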

Hope this helps.

How to split a string in shell and get without the last field

To get the last element, remove the longest prefix ending in the delimiter with "${toto##*-}".

And similarly, to remove the last element:

echo "${toto%-*}"
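As a quick reference, this sketch runs the four prefix and suffix removal expansions on an assumed sample value for toto (the value is made up for the example).

```shell
#!/usr/bin/env sh
toto="first-second-third"    # hypothetical sample value

last=${toto##*-}         # remove longest prefix matching *-   : last field
without_last=${toto%-*}  # remove shortest suffix matching -*  : all but last
first=${toto%%-*}        # remove longest suffix matching -*   : first field
without_first=${toto#*-} # remove shortest prefix matching *-  : all but first

echo "$last"
echo "$without_last"
echo "$first"
echo "$without_first"
```

A single # or % removes the shortest match, a doubled ## or %% removes the longest one.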

How to split a string by underscore and extract an element as a variable in bash?

There is no need to spawn a sub-shell to call cut -d'_' -f1, or to use the Bash-specific here-string <<< "$s".

The POSIX shell grammar has a built-in provision for stripping trailing elements through parameter expansion, so nothing has to be forked and no non-standard syntax is required.

#!/usr/bin/env sh

s=DNA128533_mutect2_filtered.vcf.gz
id=${s%%_*}
echo "$id"
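The same expansions can be chained to reach fields other than the first. The sketch below stays within POSIX sh; the variable names rest, second and last are illustrative and not from the original answer.

```shell
#!/usr/bin/env sh
s=DNA128533_mutect2_filtered.vcf.gz

id=${s%%_*}          # drop the longest suffix starting at "_": first field
rest=${s#*_}         # drop the shortest prefix ending at "_": everything after the first field
second=${rest%%_*}   # first field of the remainder, i.e. the second field
last=${s##*_}        # drop the longest prefix ending at "_": last field

echo "$id"
echo "$second"
echo "$last"
```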

