String Parsing in Java with Delimiter Tab "\t" Using Split

Splitting a String in Java at a tab

The tab character is written \t. The code for splitting the line thus looks like this:

String[] thisLine = line.split("\t");

More flexible, if feasible for your use case: to split on any whitespace character, including space and tab, use \\s (note the double backslash: \s is a regex token, and the backslash itself has to be escaped inside a Java string literal).
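As a rough sketch of the two variants side by side (the class name `TabSplitDemo`, the method names, and the sample line are made up for illustration):

```java
import java.util.Arrays;

class TabSplitDemo {
    // Each single tab character is one delimiter.
    static String[] splitOnTab(String line) {
        return line.split("\t");
    }

    // Each single whitespace character (space, tab, newline, ...) is one delimiter.
    static String[] splitOnWhitespace(String line) {
        return line.split("\\s");
    }

    public static void main(String[] args) {
        String line = "id\tname with space\t42";
        // prints [id, name with space, 42]
        System.out.println(Arrays.toString(splitOnTab(line)));
        // prints [id, name, with, space, 42]
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
    }
}
```

Splitting on the tab alone keeps fields that contain spaces intact, which is usually what you want for tab-separated data.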

What does the pattern in split("[\t ]+") mean?

  • [...] Character class - matches one character in between the brackets.
    • \t The tab character
    • A space
  • + Quantifier - match the previous token 1 or more times.

Examples:

"foo   bar qux"   => ["foo", "bar", "qux"]
"foo\t bar\tqux" => ["foo", "bar", "qux"]

You might want to use the \s whitespace metacharacter instead, which in Java is equivalent to [ \t\n\x0B\f\r].
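The two example inputs can be checked with a small Java sketch. The class and method names here are hypothetical; only `String.split` is the real API.

```java
import java.util.Arrays;

class CharClassSplitDemo {
    // One or more tabs or spaces in a row count as a single delimiter.
    static String[] splitOnTabsOrSpaces(String s) {
        return s.split("[\t ]+");
    }

    public static void main(String[] args) {
        // prints [foo, bar, qux]
        System.out.println(Arrays.toString(splitOnTabsOrSpaces("foo   bar qux")));
        // prints [foo, bar, qux]
        System.out.println(Arrays.toString(splitOnTabsOrSpaces("foo\t bar\tqux")));
    }
}
```

One thing to watch for: if the input starts with a space or tab, the result begins with an empty string, so you may want to call trim() on the line first.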

Understanding regex in Java: split(\t) vs split(\\t) - when do they both work, and when should they be used

When using "\t", the escape sequence \t is replaced by Java with the character U+0009. When using "\\t", the escape sequence \\ in \\t is replaced by Java with \, resulting in \t that is then interpreted by the regular expression parser as the character U+0009.

So both notations will be interpreted correctly. The only difference is where the escape sequence is turned into the tab character: in the Java compiler or in the regular expression parser.
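A minimal sketch that makes the two levels visible (the class and constant names are invented for this example):

```java
import java.util.Arrays;

class EscapeLevelsDemo {
    // The compiler turns \t into one character, U+0009.
    static final String TAB_VIA_COMPILER = "\t";
    // The compiler turns \\ into one backslash; the string holds two
    // characters, '\' and 't', which the regex parser then reads as a tab.
    static final String TAB_VIA_REGEX = "\\t";

    // True when both patterns split the line into the same tokens.
    static boolean sameResult(String line) {
        return Arrays.equals(line.split(TAB_VIA_COMPILER), line.split(TAB_VIA_REGEX));
    }

    public static void main(String[] args) {
        System.out.println(TAB_VIA_COMPILER.length()); // prints 1
        System.out.println(TAB_VIA_REGEX.length());    // prints 2
        System.out.println(sameResult("a\tb\tc"));     // prints true
    }
}
```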

How to split a string with \\t in Java

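By escaping the backslash you put two characters into the string: a backslash followed by a lowercase t. Whether that still matches a tab depends on what reads the string next: a regex engine decodes \t as a tab, but anything that takes the delimiter literally will look for a backslash and a t. A single backslash followed by t in a string literal is always a tab character.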

resultingTokens = currentLine.split("\t"); 

is what will give you the result you were expecting.
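For the opposite case, where the data really does contain a backslash followed by t as text, one possible approach is `Pattern.quote`, which makes the regex engine treat the delimiter literally. This is only a sketch; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralBackslashTDemo {
    // Splits on a real tab character.
    static String[] splitOnTab(String s) {
        return s.split("\t");
    }

    // Splits on the two-character text backslash + 't', never on a tab.
    static String[] splitOnLiteralBackslashT(String s) {
        return s.split(Pattern.quote("\\t"));
    }

    public static void main(String[] args) {
        String withTab = "a\tb";   // three characters: a, TAB, b
        String withText = "a\\tb"; // four characters: a, \, t, b

        // prints [a, b]
        System.out.println(Arrays.toString(splitOnTab(withTab)));
        // prints [a, b]
        System.out.println(Arrays.toString(splitOnLiteralBackslashT(withText)));
        // prints 1, because the tab is not treated as a delimiter here
        System.out.println(splitOnLiteralBackslashT(withTab).length);
    }
}
```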

How to split a string with any whitespace chars as delimiters

Something along the lines of

myString.split("\\s+");

This treats any run of consecutive whitespace characters as a single delimiter.

So if I have the string:

"Hello[space character][tab character]World"

This should yield the strings "Hello" and "World" and omit the empty string between the [space] and the [tab].

As VonC pointed out, the backslash has to be escaped, because the Java compiler processes escape sequences in the string literal first and only then passes the result on to the regex engine. What you want the regex engine to see is the literal \s, which means you need to write "\\s" in the source code. It can get a bit confusing.

The \\s is equivalent to [ \\t\\n\\x0B\\f\\r].
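To see why the + matters, here is a small sketch comparing \\s+ with plain \\s on the "Hello[space][tab]World" string from above (the class and method names are made up):

```java
import java.util.Arrays;

class WhitespaceRunDemo {
    // A whole run of whitespace is one delimiter.
    static String[] splitOnRuns(String s) {
        return s.split("\\s+");
    }

    // Every single whitespace character is its own delimiter.
    static String[] splitOnEach(String s) {
        return s.split("\\s");
    }

    public static void main(String[] args) {
        String s = "Hello \tWorld"; // space followed by tab
        // prints [Hello, World]
        System.out.println(Arrays.toString(splitOnRuns(s)));
        // prints [Hello, , World] (empty string between space and tab)
        System.out.println(Arrays.toString(splitOnEach(s)));
    }
}
```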

How to safely parse a tab-delimited string?

strtok() is the standard C function for parsing strings with arbitrary delimiters. It is, however, not thread-safe, and it modifies the string it parses. Your C library of choice might have a thread-safe variant, such as POSIX strtok_r().

Another standard-compliant way (just wrote this up, it is not tested):

#include <string.h>
#include <stdio.h>

int main( void )
{
    char string[] = "foo\tbar\tbaz";
    char * start = string;
    char * end;
    while ( ( end = strchr( start, '\t' ) ) != NULL )
    {
        // %.*s prints at most N characters, where N is the int argument
        // that precedes the string (your token is not zero-terminated!)
        printf( "%.*s\n", (int)( end - start ), start );
        start = end + 1;
    }
    // start points to last token, zero-terminated
    printf( "%s\n", start );
    return 0;
}
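For comparison with the Java answers above, the same scan-for-the-next-tab loop can be sketched in Java with `String.indexOf` in place of strchr(). The class name `IndexOfTokenizer` is hypothetical, and this is just one way to do it without regular expressions.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfTokenizer {
    // Walks the string from tab to tab, copying out each field.
    static List<String> splitOnTab(String s) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int end;
        while ((end = s.indexOf('\t', start)) != -1) {
            tokens.add(s.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last tab is the final token.
        tokens.add(s.substring(start));
        return tokens;
    }

    public static void main(String[] args) {
        // prints [foo, bar, baz]
        System.out.println(splitOnTab("foo\tbar\tbaz"));
    }
}
```

Unlike `split("\t")`, which drops trailing empty strings, this loop keeps every field, so "a\t" yields two tokens: "a" and an empty string.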

How to split a string in bash delimited by tab

If your file looks something like this (with tab as separator):

1st-field   2nd-field

you can use cut to extract the first field (operates on tab by default):

$ cut -f1 input
1st-field

If you're using awk, there is no need to use tail to get the last line. Suppose the input is changed to:

1:1st-field     2nd-field
2:1st-field 2nd-field
3:1st-field 2nd-field
4:1st-field 2nd-field
5:1st-field 2nd-field
6:1st-field 2nd-field
7:1st-field 2nd-field
8:1st-field 2nd-field
9:1st-field 2nd-field
10:1st-field 2nd-field

Solution using awk:

$ awk 'END {print $1}' input
10:1st-field

Pure bash-solution:

#!/bin/bash

while read -r a b; do last=$a; done < input
echo "$last"

outputs:

$ ./tab.sh 
10:1st-field

Lastly, a solution using sed

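$ sed -n '$s/^\([^\t]*\).*$/\1/p' input
10:1st-field

Here, $ is the address of the last line, so the substitution applies to that line only; -n together with the p flag suppresses all other lines. Note that \t inside a bracket expression is a GNU sed extension.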

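For your original question, the pattern has to contain an actual tab character. Storing it in a variable avoids having to type a literal tab:

tab=$'\t'
x="1st-field${tab}2nd-field"
echo "${x%$tab*}"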

outputs:

1st-field

