How to Cut First Column (Variable Length) of a String in Shell

How to cut a string using length in Unix shell

Given your new requirements, is this what you're trying to do?

$ cat tst.awk
BEGIN { FIELDWIDTHS="9 18 11 5" }
NR==FNR { f2[$1]=$2; f3[$1]=$3; next }
$1 in f2 { print $1 f2[$1] f3[$1] $4 $5 }

$ awk -f tst.awk file1 file2
000123 moorsevi har NC asee terel
000125 staevil strd NC klass aklsd
000126 carolie asdr NC skdkld kaks

Uses GNU awk for FIELDWIDTHS.

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
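If GNU awk isn't available, here is a bash-only sketch of the same fixed-width idea using substring expansion ${var:offset:length}. The function name split_fixed_width and the sample record are illustrative assumptions. It only splits one string into fields and does not do the two-file join that the awk script above performs. Offsets count characters in a UTF-8 locale and bytes in the C locale.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each fixed-width field of a record on its own line,
# similar in spirit to gawk's FIELDWIDTHS (widths are passed as arguments).
split_fixed_width() {
    local record=$1 offset=0 width
    shift
    for width in "$@"; do
        # ${var:offset:length} extracts a substring by position and length
        printf '%s\n' "${record:offset:width}"
        offset=$((offset + width))
    done
}

split_fixed_width 'abcdefghij' 3 4 3   # prints abc, defg, hij on separate lines
```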

How to cut a string after a specific character in Unix

Using sed:

$ var=server@10.200.200.20:/home/some/directory/file
$ echo "$var" | sed 's/.*://'
/home/some/directory/file
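Bash parameter expansion can do the same cut without starting sed. The second variable t is an assumption added only to show where cutting after the first ':' and after the last ':' differ; with the server path above, both give the same result because it contains a single colon.

```shell
#!/usr/bin/env bash
var='server@10.200.200.20:/home/some/directory/file'
# ${var#*:}  removes the shortest prefix matching '*:' (through the FIRST ':')
# ${var##*:} removes the longest prefix matching '*:' (through the LAST ':'),
#            which matches what the greedy sed 's/.*://' does
echo "${var#*:}"   # /home/some/directory/file
t='a:b:c'
echo "${t#*:}"     # b:c
echo "${t##*:}"    # c
```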

Length of string in bash

UTF-8 string length

In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen

This prints:

Généralités is 11 char len, but 14 bytes len.

You can also look at the bytes actually stored:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"

This prints:

Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').

Note: Following Isabell Cowan's comment, I've added setting $LC_ALL along with $LANG.
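If you only need the byte length and would rather not toggle LANG and LC_ALL, a minimal sketch is to let wc -c count the bytes. The name byte_len is hypothetical. It starts a pipeline on every call, so it is noticeably slower than ${#var} inside a loop.

```shell
#!/usr/bin/env bash
# Hypothetical helper: byte length of a string without switching the locale.
# wc -c always counts bytes; the $(( )) wrapper strips the leading blanks
# that some wc implementations print before the number.
byte_len() {
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

byte_len 'Généralités'   # 14, assuming the script file is saved as UTF-8
```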

Length of an argument, working sample

Arguments work the same way as regular variables:

showStrLen() {
local bytlen sreal oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#1}
printf -v sreal %q "$1"
LANG=$oLang LC_ALL=$oLcAll
printf "String '%s' is %d bytes, but %d chars len: %s.\n" "$1" $bytlen ${#1} "$sreal"
}

It works like this:

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'.

Useful printf correction tool:

If you run:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
printf " - %-14s is %2d char length\n" "'$string'" ${#string}
done

 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ←'    is  7 char length
 - 'Yin Yang ☯' is 10 char length

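Not really pretty output! The %-14s padding counts bytes, not characters, so strings containing multibyte characters come out misaligned.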

For this, here is a little function:

strU8DiffLen() {
local charlen=${#1} LANG=C LC_ALL=C
return $(( ${#1} - charlen ))
}

or written in one line:

strU8DiffLen() { local chLen=${#1} LANG=C LC_ALL=C;return $((${#1}-chLen));}

Now:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
strU8DiffLen "$string"
printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
"'$string'" ${#string} $((${#string}+$?))
done

 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ←'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ☯'   is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some UTF-8 behaviour is still not this simple to handle: double-width characters, zero-width characters, reverse-direction text, and others.

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
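strU8DiffLen hands its result back as an exit status, and exit statuses are limited to 0-255, so a string with more than 255 extra bytes would wrap around. Here is a sketch of a variant that prints the difference instead, at the cost of a subshell per call. The name u8_extra_bytes is hypothetical, and like the original it relies on bash switching locale when LANG/LC_ALL are assigned with local.

```shell
#!/usr/bin/env bash
# Hypothetical variant of strU8DiffLen: print (bytes - chars) on stdout
# instead of returning it as an exit status, which wraps above 255.
u8_extra_bytes() {
    local chars=${#1}          # length in the caller's locale (chars under UTF-8)
    local LANG=C LC_ALL=C      # from here on ${#1} counts bytes
    echo $(( ${#1} - chars ))
}

extra=$(u8_extra_bytes 'Généralités')   # 3 in a UTF-8 locale, 0 in the C locale
echo "$extra"
```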

Extract substring in Bash

Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"
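In bash you can get the same second field without a pipe or subshell by trimming from both ends with parameter expansion. The intermediate variable name rest is illustrative.

```shell
#!/usr/bin/env bash
INPUT='someletters_12345_moreleters.ext'
rest=${INPUT#*_}        # drop through the first '_'  -> 12345_moreleters.ext
SUBSTRING=${rest%%_*}   # drop from the next '_' on   -> 12345
echo "$SUBSTRING"
```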

How to split a string in shell and get the last field

You can use string operators:

$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front up to and including the last ':', because ## removes the longest (greedy) match of the pattern *:.

${foo  <-- from variable foo
## <-- greedy front trim
* <-- matches anything
: <-- until the last ':'
}
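If you need the other fields too, a bash sketch is to split into an array with read and take the last element. The names parts and last are illustrative. One difference from ${foo##*:}: read drops a trailing empty field, so for 1:2: it yields 2 rather than an empty string. The complementary expansion ${foo%:*} keeps everything except the last field.

```shell
#!/usr/bin/env bash
foo=1:2:3:4:5
# Split on ':' into an array; the IFS change applies only to this read
IFS=: read -r -a parts <<< "$foo"
last=${parts[${#parts[@]}-1]}   # last element; bash 4.3+ also accepts ${parts[-1]}
echo "$last"       # 5
echo "${foo%:*}"   # 1:2:3:4
```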

How to get the length of each word in a column without AWK, sed or a loop?

while read -r num word; do
printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file

How to grep only the first word of the output

You can use awk to print just the first column of the output:

[ /Downloads - 11:34 AM ]$ du -s /Users/test_user
80839384 /Users/test_user
[ /Downloads - 11:34 AM ]$ du -s /Users/test_user | awk '{print $1}'
80839384
[ /Downloads - 11:34 AM ]$
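Without awk, bash's read can split the output for you and keep only the first word. This is a sketch that uses du -s . in place of the /Users/test_user path from the example. The variable name size is illustrative, and _ simply absorbs the rest of the line.

```shell
#!/usr/bin/env bash
# read splits on whitespace (default IFS); the first word goes to size,
# everything else on the line goes to the throwaway variable _
read -r size _ < <(du -s .)
echo "$size"
```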


