How to Cut First Column (Variable Length) of a String in Shell

how to cut a string using length in unix shell

Given your new requirements, is this what you're trying to do:

$ cat tst.awk
BEGIN { FIELDWIDTHS="9 18 11 5" }
NR==FNR { f2[$1]=$2; f3[$1]=$3; next }
$1 in f2 { print $1 f2[$1] f3[$1] $4 $5 }

$ awk -f tst.awk file1 file2
000123   moorsevi har      NC asee    terel
000125   staevil strd      NC klass   aklsd
000126   carolie asdr      NC skdkld  kaks

This uses GNU awk (gawk) for FIELDWIDTHS.

Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
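To cut a single string by length rather than a whole file, you don't need gawk. Below is a minimal sketch that assumes bash, since it relies on `${var:offset:length}`. The sample line is made up, and its field widths match the FIELDWIDTHS value above. One caveat: GNU coreutils `cut -c` is byte-based in practice, so it matches bash only for ASCII data.

```shell
#!/usr/bin/env bash
# Hypothetical fixed-width record: fields of 9, 18, 11 and 5 characters,
# mirroring FIELDWIDTHS="9 18 11 5" above. The data itself is made up.
line='000123   moorsevi har      NC asee    terel'

# Bash substring expansion: ${var:offset:length}, offset counted from 0.
f1=${line:0:9}
f2=${line:9:18}
f3=${line:27:11}
f4=${line:38:5}
printf '[%s] [%s] [%s] [%s]\n' "$f1" "$f2" "$f3" "$f4"

# POSIX alternative: cut -c takes 1-based, inclusive positions.
printf '%s\n' "$line" | cut -c1-9
```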

How to cut a string after a specific character in unix

Using sed:

$ var=server@10.200.200.20:/home/some/directory/file
$ echo "$var" | sed 's/.*://'
/home/some/directory/file
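The sed pattern `.*:` is greedy, so it removes everything up to the last colon. If you need to cut after the first colon instead, or want to avoid an external process, parameter expansion and cut can do the same job. Here is a sketch using the same sample value. With only one colon in the string, all three results are identical; they differ only when the string contains several colons.

```shell
#!/usr/bin/env bash
var='server@10.200.200.20:/home/some/directory/file'

# ${var#*:}  removes the shortest prefix matching '*:' (up to the FIRST colon)
after_first=${var#*:}
# ${var##*:} removes the longest prefix matching '*:' (up to the LAST colon),
# which is what sed 's/.*://' does
after_last=${var##*:}
# cut: field 2 onwards, using ':' as the delimiter
after_cut=$(printf '%s\n' "$var" | cut -d: -f2-)

printf '%s\n' "$after_first" "$after_last" "$after_cut"
```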

Length of string in bash

UTF-8 string length

In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen

This prints:

Généralités is 11 char len, but 14 bytes len.

You can even look at the stored bytes:

myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"

This prints:

Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').

Note: following Isabell Cowan's comment, I've added setting $LC_ALL along with $LANG.
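Saving and restoring LANG and LC_ALL by hand works, but you can instead confine the locale change to a subshell so nothing leaks back to the caller. This is a minimal sketch that assumes bash. The function name byteLen is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the length of $1 in bytes.
# The ( ... ) subshell keeps LC_ALL=C local: the caller's locale is untouched.
byteLen() {
    (
        LC_ALL=C
        printf '%d\n' "${#1}"
    )
}

myvar='Généralités'
printf '%s: %d chars (in a UTF-8 locale), %d bytes\n' \
    "$myvar" "${#myvar}" "$(byteLen "$myvar")"
```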

Length of an argument, working sample

Arguments work the same as regular variables:

showStrLen() {
    local bytlen sreal oLang=$LANG oLcAll=$LC_ALL
    LANG=C LC_ALL=C
    bytlen=${#1}
    printf -v sreal %q "$1"
    LANG=$oLang LC_ALL=$oLcAll
    printf "String '%s' is %d bytes, but %d chars len: %s.\n" "$1" $bytlen ${#1} "$sreal"
}

It works like this:

showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'.
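Because positional parameters behave like ordinary variables, you can apply the same expansion to every argument. Don't confuse the two forms: `$#` is the number of arguments, while `${#1}` is the length of the first one. The function name in this sketch is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the argument count, then each argument
# with its length in characters.
argLens() {
    printf '%d argument(s)\n' "$#"      # $#   = number of arguments
    local arg
    for arg in "$@"; do                 # "$@" = each argument, unsplit
        printf '%s %d\n' "$arg" "${#arg}"
    done
}

argLens foo 'hello world' ''
```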

Useful `printf` correction tool:

If you run:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    printf " - %-14s is %2d char length\n" "'$string'"  ${#string}
done

 - 'Généralités' is 11 char length
 - 'Language'     is  8 char length
 - 'Théorème'   is  8 char length
 - 'Février'     is  7 char length
 - 'Left: ←'    is  7 char length
 - 'Yin Yang ☯' is 10 char length

Not really pretty output!

To fix this, here is a small function:

strU8DiffLen() {
    local charlen=${#1} LANG=C LC_ALL=C
    return $(( ${#1} - charlen ))
}

or written in one line:

strU8DiffLen() { local chLen=${#1} LANG=C LC_ALL=C;return $((${#1}-chLen));}

Now:

for string in Généralités Language Théorème Février  "Left: ←" "Yin Yang ☯";do
    strU8DiffLen "$string"
    printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
        "'$string'" ${#string} $((${#string}+$?))
done

 - 'Généralités'  is 11 chars length, but uses 14 bytes
 - 'Language'     is  8 chars length, but uses  8 bytes
 - 'Théorème'     is  8 chars length, but uses 10 bytes
 - 'Février'      is  7 chars length, but uses  8 bytes
 - 'Left: ←'      is  7 chars length, but uses  9 bytes
 - 'Yin Yang ☯'   is 10 chars length, but uses 12 bytes

Unfortunately, this is not perfect!

Some strange UTF-8 behaviours remain, such as double-width characters, zero-width characters, reversed (right-to-left) display and others, which cannot be handled this simply...

Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.
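One variant of the same idea, assuming bash: return can only carry values from 0 to 255, so a helper could do the padding itself and print the result instead of returning the byte/character difference. The name padU8 is hypothetical. Like strU8DiffLen, it assumes that bash's printf pads %s by bytes, and it has the same limits with double-width and zero-width characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: left-align $2 in a field of $1 characters.
# printf pads %s by bytes, so widen the field by the extra UTF-8 bytes.
padU8() {
    local width=$1 str=$2 chars bytes
    chars=${#str}                               # length in the current locale
    bytes=$(LC_ALL=C; printf '%d' "${#str}")    # length in bytes (subshell)
    printf '%-*s' "$((width + bytes - chars))" "$str"
}

for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯"; do
    printf ' - %s is %2d chars\n' "$(padU8 14 "'$string'")" "${#string}"
done
```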

Extract substring in Bash

Use cut:

echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2

More generic:

INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"
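When the input is already in a variable, bash can extract the same field without running echo and cut. This sketch assumes the number always sits between the first and second underscore.

```shell
#!/usr/bin/env bash
INPUT='someletters_12345_moreleters.ext'

tmp=${INPUT#*_}           # drop up to the first '_'  -> 12345_moreleters.ext
SUBSTRING=${tmp%%_*}      # drop from the next '_' on -> 12345
echo "$SUBSTRING"

# Alternative: split on '_' with read; unwanted fields go to the throwaway _
IFS=_ read -r _ SUBSTRING2 _ <<< "$INPUT"
echo "$SUBSTRING2"
```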

How to split a string in shell and get the last field

You can use parameter expansion (string operators):

$ foo=1:2:3:4:5
$ echo ${foo##*:}
5

This trims everything from the front up to the last ':' (## is the greedy form).

${foo  <-- from variable foo
  ##   <-- greedy front trim
  *    <-- matches anything
  :    <-- until the last ':'
 }
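The same pattern works with all four trimming operators. Doubling # or % makes the match greedy. # trims from the front and % trims from the back, so the first and last fields are each one expansion away:

```shell
#!/usr/bin/env bash
foo=1:2:3:4:5

echo "${foo##*:}"   # 5        longest prefix matching '*:' removed (last field)
echo "${foo#*:}"    # 2:3:4:5  shortest prefix matching '*:' removed
echo "${foo%%:*}"   # 1        longest suffix matching ':*' removed (first field)
echo "${foo%:*}"    # 1:2:3:4  shortest suffix matching ':*' removed
```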

How to get the length of each word in a column without AWK, sed or a loop?

while read -r num word; do
    printf '%s %s %s\n' "$num" "$word" "${#word}"
done < file
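Here is a self-contained run of the loop above. The two-column input is made up and comes from printf rather than a file, and the wrapper function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the loop above: reads "number word" lines
# from stdin and appends the length of the word.
wordLens() {
    while read -r num word; do
        printf '%s %s %s\n' "$num" "$word" "${#word}"
    done
}

# Made-up sample input instead of a file
printf '%s\n' '1 apple' '2 kiwi' '3 banana' | wordLens
```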

how to grep only the first word of the output

You can use awk to print just the first column of the output:

[ /Downloads - 11:34 AM ]$ du -s /Users/test_user
80839384    /Users/test_user
[ /Downloads - 11:34 AM ]$ du -s /Users/test_user | awk '{print $1}'
80839384
[ /Downloads - 11:34 AM ]$
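GNU du separates the size from the path with a tab, so cut (whose default delimiter is tab) or plain bash also works. This sketch runs on a hard-coded sample line taken from the output above, so it doesn't depend on real du output.

```shell
#!/usr/bin/env bash
# Sample du -s output line: size, TAB, path
line=$'80839384\t/Users/test_user'

# cut's default field delimiter is TAB
printf '%s\n' "$line" | cut -f1

# Pure bash: strip everything from the first whitespace character onward
echo "${line%%[[:space:]]*}"

# read splits on IFS (space, tab, newline by default); the rest goes to _
read -r size _ <<< "$line"
echo "$size"
```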
