Using Sed to Split a String with a Delimiter

To split a string on a delimiter with GNU sed, use:

sed 's/delimiter/\n/g'     # GNU sed

For example, to split using : as a delimiter:

$ sed 's/:/\n/g' <<< "he:llo:you"
he
llo
you

Or with a non-GNU sed (bash's $'...' quoting passes sed a backslash followed by a literal newline):

$ sed $'s/:/\\\n/g' <<< "he:llo:you"
he
llo
you
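
If bash's $'...' quoting is not available either, a backslash followed by an actual line break inside ordinary single quotes should work with any POSIX sed. A minimal sketch (the variable name parts is only for illustration):

```shell
# The replacement is a backslash followed by a real newline,
# which POSIX sed reads as "insert a newline here".
parts=$(printf '%s\n' "he:llo:you" | sed 's/:/\
/g')
printf '%s\n' "$parts"
```

This prints he, llo and you on separate lines, the same as the two commands above.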

In this particular case, you left out the g flag after the substitution, so it is applied only once per line. Compare:

$ echo "string1:string2:string3:string4:string5" | sed s/:/\\n/g
string1
string2
string3
string4
string5

g stands for global and means that the substitution is applied to every occurrence on the line. Without a flag, only the first occurrence is replaced. A numeric flag selects a single occurrence: with 2, for example, only the second match is replaced.
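
A quick way to check how the flags behave is to run the same substitution with no flag, with a numeric flag, and with g. This is only a sketch; the sample string and variable names are made up for the illustration:

```shell
line="a:b:c:d"

# No flag: only the first ':' on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/:/-/')

# Numeric flag 2: only the second ':' is replaced.
second_only=$(printf '%s\n' "$line" | sed 's/:/-/2')

# g flag: every ':' is replaced.
all=$(printf '%s\n' "$line" | sed 's/:/-/g')

printf '%s\n' "$first_only" "$second_only" "$all"
# a-b:c:d
# a:b-c:d
# a-b-c-d
```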

All together, in your case you would need to use:

sed 's/:/\n/g' ~/Desktop/myfile.txt     # GNU sed; inside single quotes use \n, not \\n

Note that sed can read the file directly (sed ... file), so piping through cat (cat file | sed ...) is unnecessary.

How to use sed to split a string?

This sed may work:

sed -E 's/[^-+_[:alnum:]]+/ /g; s/ +$//; s/-(.)/ \1/g' <<< 'chr14:81370042-81371098(+)'

chr14 81370042 81371098 +

Or else:

sed -E 's/[^-+_[:alnum:]]+/ /g; s/ +$//; s/-(.)/ \1/g' <<< 'chr14:81370042-81371098(-)'

chr14 81370042 81371098 -

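[^-+_[:alnum:]]+ matches 1 or more characters that are not -, +, _ or alphanumeric. s/ +$// then removes the trailing space left by the closing parenthesis, and s/-(.)/ \1/g turns every - that is followed by another character into a space.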
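
To see what each of the three substitutions contributes, you can run them one at a time. A sketch, assuming a sed that accepts -E (GNU and BSD sed both do); step1, step2 and step3 are names invented for this illustration:

```shell
input='chr14:81370042-81371098(+)'

# Step 1: replace each run of characters outside [-+_[:alnum:]] with one space.
step1=$(printf '%s\n' "$input" | sed -E 's/[^-+_[:alnum:]]+/ /g')

# Step 2: strip the trailing space left behind by the closing parenthesis.
step2=$(printf '%s\n' "$step1" | sed -E 's/ +$//')

# Step 3: turn a '-' that is followed by another character into a space.
step3=$(printf '%s\n' "$step2" | sed -E 's/-(.)/ \1/g')

# Brackets make the trailing space in step 1 visible.
printf '[%s]\n' "$step1" "$step2" "$step3"
# [chr14 81370042-81371098 + ]
# [chr14 81370042-81371098 +]
# [chr14 81370042 81371098 +]
```

The third step is also why a strand of (-) survives: that final - has no character after it, so -(.) does not match it.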

Splitting string with - delimiter using sed not working

You can use awk this way:

awk 'NR==1{a=$2;cnt=0} /^-/{rta[cnt]=$3;getline;rtn[cnt]=$2; getline; n[cnt]=$2;cnt++} END{ for(i=0;i<cnt;i++) { print a","n[i]","rtn[i]","rta[i] } }' file > outputfile

Here is the same script with sample input and comments:

#!/bin/bash
string="name: MAIN_ROLE
description: ROLE DESCRIPTION
readOnly:
roleReferences:
- roleTemplateAppId: app1
roleTemplateName: template1
name: Name1
- roleTemplateAppId: app2
roleTemplateName: template2
name: Name2
"
awk 'NR==1{ # When on Line 1
a=$2;cnt=0 # Set a (main name) and cnt (counter) vars
}
/^-/{ # When line starts with -
rta[cnt]=$3; getline; # Add role template app ID to rta array, read next line
rtn[cnt]=$2; getline; # Add role template name to rtn array, read next line
n[cnt]=$2;cnt++ # Add name to n array, increment the cnt variable
}
END{ # When the file processing is over
for(i=0;i<cnt;i++) { # Iterate over the found values and...
print a","n[i]","rtn[i]","rta[i] # print them
}
}' <<< "$string"

# => MAIN_ROLE,Name1,template1,app1
# MAIN_ROLE,Name2,template2,app2

How do I split a string on a delimiter in Bash?

You can set the internal field separator (IFS) variable and then let read parse the input into an array. When the assignment prefixes a command, as in IFS=';' read ..., it applies only to that single command's environment (here, read). read then splits the input according to the value of IFS into an array, which we can iterate over.

This example will parse one line of items separated by ;, pushing it into an array:

IFS=';' read -ra ADDR <<< "$IN"
for i in "${ADDR[@]}"; do
# process "$i"
done

This other example processes the whole content of $IN line by line, splitting each line on ;:

while IFS=';' read -ra ADDR; do
for i in "${ADDR[@]}"; do
# process "$i"
done
done <<< "$IN"
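
For a version you can paste and run, here is a sketch with a made-up value for IN (the addresses are hypothetical sample data, and saved_ifs is a name introduced only to show that IFS is not changed for the rest of the script). It needs bash because of read -a and the here-string:

```shell
#!/usr/bin/env bash
# Hypothetical sample input; any ;-separated string works the same way.
IN="alice@example.com;bob@example.com;carol@example.com"
saved_ifs=$IFS

# IFS=';' applies only to this read; -r keeps backslashes, -a fills an array.
IFS=';' read -ra ADDR <<< "$IN"

printf 'fields: %d\n' "${#ADDR[@]}"
for i in "${ADDR[@]}"; do
    printf 'item: %s\n' "$i"
done

# The shell's own IFS is untouched afterwards.
[ "$IFS" = "$saved_ifs" ] && echo "IFS unchanged"
```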

Split string with for loop and sed in bash shell


Notes on the revised scenario

The question has been modified to include a shell fragment:

ListOfFiles=$(sed '1,2d' $LstFile) #delete first 2 lines
for line in $ListOfFiles
do
$line=$(echo "${line}" | sed # I want to print only file name and date
done

Saving the results into a variable, as in the first line, is simply the wrong way to deal with it. A simple adaptation of the code in my original answer (below) does what you need. It is very simple with awk, and it is still possible with sed if you're set on using sed.

awk variant

awk 'NR <= 2 { next } { print $6, $7, $8, $9 }' "$LstFile"

The NR <= 2 { next } portion skips the first two lines; the rest is unchanged, except that the data source is the list file you downloaded.

sed variant

sed -nE -e '1,2d' -e 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p' "$LstFile"

In practice, the 1,2d command is unlikely to be necessary, but it is safer to use it, just in case one of the first two lines has 9 fields. (Yes, I could avoid using the -e option twice — no, I prefer to have separate commands in separate options; it makes it easier to read IMO.)
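
To try both variants without the real download, you can build a small stand-in file. This is only a sketch: the two header lines are invented placeholders (the question doesn't show what the real first two lines contain), and awk_out and sed_out are names made up here:

```shell
# Temporary stand-in for the downloaded list file.
LstFile=$(mktemp)
cat > "$LstFile" <<'EOF'
header line one
header line two
-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip
EOF

# awk variant: skip lines 1-2, print fields 6-9 of the rest.
awk_out=$(awk 'NR <= 2 { next } { print $6, $7, $8, $9 }' "$LstFile")

# sed variant: delete lines 1-2, keep fields 6-9 of 9-field lines.
sed_out=$(sed -nE -e '1,2d' -e 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p' "$LstFile")

printf '%s\n' "$awk_out"
rm -f "$LstFile"
```

With this sample data, both variables end up holding the same two lines (the date, time and file name of each entry).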

An answer for the original question

If you treat this as an exercise in string manipulation (disregarding legitimate caveats about trying to parse the output from ls reliably), then you don't need sed. In fact, sed is almost definitely the wrong tool for the job — awk would be a better choice — but the shell alone could be used. For example, assuming the data is in the string $variable, you could use:

set -- $variable
echo $6 $7 $8 $9
echo ${15} ${16} ${17} ${18}    # braces are required above $9; $15 would mean ${1}5

This gives you 18 positional parameters and prints the 8 you're interested in. Using awk, you might use:

echo $variable | awk '{ print $6, $7, $8, $9; print $15, $16, $17, $18 }'

Both these automatically split a string at spaces and allow you to reference the split elements with numbers. Using sed, you don't get that automatic splitting, which makes the job extremely cumbersome.
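
One detail worth checking when using positional parameters this way: anything above $9 needs braces, because the shell reads $15 as $1 followed by a literal 5. A small sketch with the two entries joined on one line (count, first, second and wrong are names made up for the illustration):

```shell
variable="-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip -rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"

# Unquoted on purpose: the shell splits the string on white space.
set -- $variable
count=$#

first="$6 $7 $8 $9"
second="${15} ${16} ${17} ${18}"

# Without braces this is $1 followed by the character 5.
wrong="$15"

printf '%s\n' "$count" "$first" "$second" "$wrong"
# 18
# Dec 6 11:58 1.zip
# Dec 6 11:59 10.zip
# -rw-r--r--5
```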

Suppose the variable actually holds two lines, so:

echo "$variable"

reports:

-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip

The code above assumed that $variable held a single line (though it works unchanged if the variable holds two lines), but the code below assumes one entry per line. In fact, the code below works for any number of lines, whereas the set and awk versions are tied to exactly 18 fields in the input.

Assuming that the -E option to sed enables extended regular expressions, then you could use:

variable="-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"
echo "$variable" |
sed -nE 's/^([^[:space:]]+[[:space:]]+){5}([^[:space:]]+([[:space:]]+[^[:space:]]+){3})$/\2/p'

That looks for a sequence of non-white-space characters followed by a sequence of white space characters, repeated 5 times, followed by a sequence of non-white-space characters and 3 sets of a sequence of white space followed by a sequence of non-white-space. The grouping parentheses match fields 1-5 with \1 (which is ignored) and capture fields 6-9 in \2 (which is preserved); the p flag then prints the result. If you decide you can assume no tabs etc, you can simplify the sed command to:

echo "$variable" | sed -nE 's/^([^ ]+[ ]+){5}([^ ]+([ ]+[^ ]+){3})$/\2/p'

Both of those produce the output:

Dec 6 11:58 1.zip
Dec 6 11:59 10.zip

Dealing with the single line variant of the input is excruciating — sufficiently so that I'm not going to show it.

Note that with the two-line value in $variable, the awk version could become:

echo "$variable" | awk '{ print $6, $7, $8, $9 }'

This will also handle an arbitrary number of lines.

Note how it is crucial to understand the difference between echo $variable and echo "$variable". The first treats all white space sequences as equivalent to a single blank but the other preserves the internal spacing. And capturing output such as with:

variable=$(ls -l 1*.zip)

preserves the spacing (especially the newline) in the assignment (see Capturing multiple line output into a Bash variable). Thus there's a moderate chance that the sed shown would work for you, but it isn't certain because you didn't answer clarifications sought before this answer was posted.
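
The quoting difference is easy to see by counting lines. A sketch using the two-line value from above (unquoted_lines and quoted_lines are names invented here; tr only strips the padding that some wc implementations add):

```shell
variable="-rw-r--r-- 0 1068 1001  4870 Dec 6 11:58 1.zip
-rw-r--r-- 0 1068 1001 20246 Dec 6 11:59 10.zip"

# Unquoted: the newline and the double space collapse into single blanks.
unquoted_lines=$(echo $variable | wc -l | tr -d ' ')

# Quoted: the newline and the internal spacing are preserved.
quoted_lines=$(echo "$variable" | wc -l | tr -d ' ')

printf 'unquoted: %s line(s), quoted: %s line(s)\n' "$unquoted_lines" "$quoted_lines"
# unquoted: 1 line(s), quoted: 2 line(s)
```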

How to split a string by pattern into tokens using sed or awk

As the question was about sed and awk, here's the sed version (\n in the replacement needs GNU sed):

echo filename.b.c | sed 's/\./\n/g'

And the awk version:

echo filename.b.c | awk -F'.' '{for (i=1; i<= NF; i++) print $i}'
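
Note the asymmetry between the two: sed needs the dot escaped because an unescaped . is a regular expression that matches any character, while awk takes a single-character field separator literally. A short sketch that makes this visible (the variable names are made up; - is used as the replacement so it runs on any sed):

```shell
name="filename.b.c"

# awk: a single-character -F value is used as a literal separator.
awk_tokens=$(printf '%s\n' "$name" | awk -F'.' '{ for (i = 1; i <= NF; i++) print $i }')

# sed, dot not escaped: every character matches and is replaced.
unescaped=$(printf '%s\n' "$name" | sed 's/./-/g')

# sed, dot escaped: only the literal dots are replaced.
escaped=$(printf '%s\n' "$name" | sed 's/\./-/g')

printf '%s\n' "$awk_tokens" "$unescaped" "$escaped"
# filename
# b
# c
# ------------
# filename-b-c
```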

Split a string into fixed-size pieces using sed

There you go:

echo $(echo -n "$d"; printf "%`echo $(((5 - (${#d} % 5)) % 5))`s" | tr ' ' '%') | sed 's/.\{5\}/&%/g'

EDIT:

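For macOS (where echo may not support -n), use printf instead, with an explicit %s format so that any % or \ in $d is not interpreted:

echo $(printf '%s' "$d"; printf "%`echo $(((5 - (${#d} % 5)) % 5))`s" | tr ' ' '%') | sed 's/.\{5\}/&%/g'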
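
The same padding idea can be written in separate steps, which may be easier to follow. A sketch, assuming a sample value for d and a piece size of 5; size, pad, padding and result are names made up here and not part of the original command:

```shell
d="abcdefgh"
size=5

# Number of pad characters needed to reach a multiple of $size (0 if already aligned).
pad=$(( (size - ${#d} % size) % size ))

# Build the padding: $pad spaces, converted to '%'.
padding=$(printf "%${pad}s" "" | tr ' ' '%')

# Append the padding, then insert a '%' after every 5 characters.
result=$(printf '%s%s\n' "$d" "$padding" | sed 's/.\{5\}/&%/g')

printf '%s\n' "$result"
# abcde%fgh%%%
```

Here the 8-character input is padded to 10 characters (fgh%%), and each 5-character piece is then followed by a % separator.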

How to split the string with delimiter and swap the substring's position in bash?

Assumptions/understanding:

  • the objective is to rename the files (the question mentions Goal is to name them as ... but there is no mention of the mv command ... ???)
  • all file names contain a single period
  • we want to switch the before-period/after-period portions of the file names to create new file names
  • no files exist with the new file name (eg, we don't currently have files named a.txt and txt.a)

One idea using parameter expansion which allows us to eliminate the overhead of sub-process calls:

for fname in *.txt
do
new_fname="${fname#*.}.${fname%.*}"
echo mv "${fname}" "${new_fname}"
done

For files named a.txt, b.txt and c.txt this generates:

mv a.txt txt.a
mv b.txt txt.b
mv c.txt txt.c

Once OP is satisfied with the results the echo can be removed and the script should perform the actual mv/rename.
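
To see what the two expansions do on their own, and why the single-period assumption matters, here is a small sketch that touches no files (ext, base and the a.b.txt name are made up for the illustration):

```shell
fname="a.txt"
ext="${fname#*.}"      # drop the shortest prefix ending in '.'  -> txt
base="${fname%.*}"     # drop the shortest suffix starting at '.' -> a
new_fname="${ext}.${base}"

# With two periods the two halves overlap, so the result is not a clean swap.
multi="a.b.txt"
multi_new="${multi#*.}.${multi%.*}"

printf '%s\n' "$new_fname" "$multi_new"
# txt.a
# b.txt.a.b
```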


