How to Use a Delimiter with Scanner.useDelimiter in Java

How do I use a delimiter with Scanner.useDelimiter in Java?

The scanner can also use delimiters other than whitespace.

Easy example from Scanner API:

String input = "1 fish 2 fish red fish blue fish";

// \\s* means 0 or more repetitions of any whitespace character
// fish is the pattern to find
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");

System.out.println(s.nextInt()); // prints: 1
System.out.println(s.nextInt()); // prints: 2
System.out.println(s.next()); // prints: red
System.out.println(s.next()); // prints: blue

// don't forget to close the scanner!!
s.close();

The key is to understand the regular expression (regex) passed to Scanner::useDelimiter. The notes below summarize the most common regex constructs.

Notes

abc…       Letters
123…       Digits
\d         Any Digit
\D         Any Non-digit character
.          Any Character
\.         Period
[abc]      Only a, b, or c
[^abc]     Not a, b, nor c
[a-z]      Characters a to z
[0-9]      Numbers 0 to 9
\w         Any Alphanumeric character
\W         Any Non-alphanumeric character
{m}        m Repetitions
{m,n}      m to n Repetitions
*          Zero or more repetitions
+          One or more repetitions
?          Optional character
\s         Any Whitespace
\S         Any Non-whitespace character
^…$        Starts and ends
(…)        Capture Group
(a(bc))    Capture Sub-group
(.*)       Capture all
(ab|cd)    Matches ab or cd
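
As a rough illustration of how entries from these notes can be combined into a delimiter, here is a minimal sketch. The class name, the helper method, and the sample strings are made up for this example; only Scanner and useDelimiter come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class RegexDelimiterNotesDemo {
    // Collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // \d+ : one or more digits act as the separator
        System.out.println(tokens("red1green22blue", "\\d+"));   // prints: [red, green, blue]
        // \s*,\s* : a comma with optional whitespace on either side
        System.out.println(tokens("a , b,c", "\\s*,\\s*"));      // prints: [a, b, c]
    }
}
```

Note that every backslash from the notes has to be doubled inside a Java string literal.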

How do I use Scanner.useDelimiter in Java to split on [ and , ?

You need to escape the [ character, which has a special meaning in a regex:

myScanner.useDelimiter("[\\[,]");
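
A small self-contained sketch of that delimiter in action follows. The class name and the input string "a[b,c" are assumptions chosen for illustration; the original question's input is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class BracketDelimiterDemo {
    // Splits the text on '[' or ',' and returns the tokens in order.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // "\\[" in Java source becomes \[ in the regex: a literal '['
            scanner.useDelimiter("[\\[,]");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a[b,c")); // prints: [a, b, c]
    }
}
```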

Java useDelimiter and nextLine

I figured out that I should have used three strings and called next() for each one; after that, everything worked fine.

// Strings holding the parts of the path
String sokvag1, sokvag2, sokvag3;

//Creating scanner object for reading from input stream
Scanner userInput = new Scanner(System.in);

// Set delimiter to ':' or '/' or whitespace
userInput.useDelimiter("[:/\\s]+");

// Ask the user to type a Windows path, e.g. C://Windows/System/
System.out.print("Enter the path: ");

//Input
sokvag1 = userInput.next();
sokvag2 = userInput.next();
sokvag3 = userInput.next();

//Print the result
System.out.println(sokvag1);
System.out.println(sokvag2);
System.out.println(sokvag3);

userInput.close();
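
To try the same delimiter without typing anything, the sketch below scans a fixed string instead of System.in. That substitution, the class name, and the helper method are assumptions made for this example; the sample path is the one from the comment above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class PathDelimiterDemo {
    // Splits a path on runs of ':', '/' or whitespace.
    static List<String> parts(String path) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(path)) {
            // The trailing + treats "://" as one delimiter, so no empty tokens appear
            scanner.useDelimiter("[:/\\s]+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parts("C://Windows/System/")); // prints: [C, Windows, System]
    }
}
```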

Use delimiter to separate a pattern

Setting the delimiter for class java.util.Scanner to comma (,) means that each call to method next() will read all the data up to the next comma, including newlines. Hence the call to nextInt reads the score plus the name on the next line and that isn't an int. Hence the InputMismatchException.

Just read the entire line and split it on the comma (,).

(Note: Below code uses try-with-resources)

import java.io.File;
import java.util.Scanner;

public class ReadData {
    public static void main(String[] args) throws Exception {
        File file = new File("scores.txt");
        try (Scanner input = new Scanner(file)) {
            // input.useDelimiter(","); <- not required
            while (input.hasNextLine()) {
                // Read one whole line, then split it on the comma
                String line = input.nextLine();
                String[] parts = line.split(",");
                String name1 = parts[0];
                int score1 = Integer.parseInt(parts[1].trim());
                System.out.println(name1 + " " + score1);
            }
        }
    }
}
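
The failure described above can be reproduced with a short sketch. The contents of scores.txt are not shown in the question, so the two-line sample string here is a guess at its shape, and the class and method names are invented for illustration.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

class CommaDelimiterPitfall {
    // Returns the second token when ',' is the only delimiter.
    static String secondToken(String data) {
        try (Scanner input = new Scanner(data)) {
            input.useDelimiter(",");
            input.next();        // first token: the name
            return input.next(); // second token: runs on to the next comma
        }
    }

    // Returns true if nextInt() rejects the second token.
    static boolean nextIntFails(String data) {
        try (Scanner input = new Scanner(data)) {
            input.useDelimiter(",");
            input.next();
            try {
                input.nextInt();
                return false;
            } catch (InputMismatchException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) {
        String data = "Alice,90\nBob,85"; // hypothetical file contents
        // The newline is not a delimiter, so it ends up inside the token
        System.out.println(secondToken(data).replace("\n", "\\n")); // prints: 90\nBob
        System.out.println(nextIntFails(data));                     // prints: true
    }
}
```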

Java Scanner, Pattern and difference between useDelimiter() and skip()

Here's what you've specified as a delimiter:

scan.useDelimiter("[-*:*\t*%]*");

The square brackets contain a list of characters, and using them means "match a character that is in this list". The * outside the square brackets means "match 0 or more occurrences of one of these characters."

The reason you're getting one character at a time is that when you match 0 or more occurrences, an empty string (a string of length 0) also matches the delimiter pattern. Since there is an empty string between every two adjacent characters in the input (there are no characters between them, so an empty string matches), the scanner will consider each character to be its own token. So the first thing you want to do is to change that last * to +, which means "match 1 or more occurrences". Now an empty string won't match.

The second problem with your pattern is that * inside square brackets just means that an asterisk is one of the characters you match; the meaning of "0 or more" does not apply inside square brackets. In fact, whenever you have square brackets, no matter what is inside them, this pattern always matches exactly one character. So any *, +, or anything else that you want to specify as repeating needs to be outside square brackets.

If you take the * characters out of the square brackets and change the trailing * to +:

scan.useDelimiter("[-:\t%]+");

Now this will match any sequence of -, :, tab, and % characters. It won't match a space, though, and I see spaces in some of your examples. So you may want to add a space inside the square brackets. Or you could say this:

scan.useDelimiter("[-:\\s%]+");

since a \s combination inside square brackets means match "any whitespace character", which includes space, tab, and a few others like newlines. (But only do this if you really do want to match the newlines.)
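
The difference between the trailing * and the trailing + can be checked with a small experiment. This is only a sketch: the class name, the helper method, and the sample input "ab:cd" are assumptions, not taken from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class QuantifierDelimiterDemo {
    // Collects every token the scanner yields for the given delimiter regex.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(text)) {
            scan.useDelimiter(delimiterRegex);
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // * allows an empty delimiter, so every character becomes a token
        System.out.println(tokens("ab:cd", "[-:\t%]*")); // prints: [a, b, c, d]
        // + requires at least one delimiter character
        System.out.println(tokens("ab:cd", "[-:\t%]+")); // prints: [ab, cd]
    }
}
```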

One other thing: you were right to put - first inside the square brackets. If you don't, it may have a different meaning:

"[a-z]"

matches any character from a to z, and it doesn't match hyphen. However:

"[a\\-z]"

matches a, z, or hyphen. Some programmers (including me), when we want a hyphen to be in the character set, would use this backslash on the hyphen even when it isn't necessary, to avoid any possible confusion:

scan.useDelimiter("[\\-:\t%]+");
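
The hyphen rules above can be verified directly with Pattern.matches. The class and method names in this sketch are invented for the example.

```java
import java.util.regex.Pattern;

class HyphenInClassDemo {
    // Returns true if the whole input is matched by the given character class.
    static boolean accepts(String characterClass, String input) {
        return Pattern.matches(characterClass, input);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[a-z]", "m"));   // prints: true  (range a to z)
        System.out.println(accepts("[a-z]", "-"));   // prints: false (hyphen defines the range)
        System.out.println(accepts("[a\\-z]", "-")); // prints: true  (escaped hyphen is literal)
        System.out.println(accepts("[a\\-z]", "m")); // prints: false (only a, z, or hyphen)
        System.out.println(accepts("[-:\t%]", "-")); // prints: true  (leading hyphen is literal)
    }
}
```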

How do I remove delimiter leftovers from a scanner? (Java)

There are a number of ways to deal with this, depending on your actual requirements (see footnote 1):

  1. Don't change the delimiter. The token after "Blackjack" will be "LasVegas, NewYork to Poker Blackjack LasVegas NewYork". Create another scanner to parse that token. (Or use String::split.)
  2. Use a delimiter regex that will match either delimiter; e.g. "[;,]\\s*".
  3. Parse like this:

    String line = scanner.nextLine();
    String[] parts = line.split(";\\s*");
    String[] parts2 = parts[2].split(",\\s*");

    This is assuming that ; is a primary delimiter and , is a secondary delimiter.

  4. Change the input file syntax so that it uses only one delimiter character. (This assumes that you are free to do that, AND that an alternative syntax would "make more sense".)
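
Option 2 from the list above might look like the following sketch. The input line is a guess at the file's syntax (the real file is not shown), and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class EitherDelimiterDemo {
    // Splits on ';' or ',' followed by optional whitespace.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            // \\s* swallows the spaces after each separator,
            // so no leftover whitespace ends up in the tokens
            scanner.useDelimiter("[;,]\\s*");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical line mixing both separators
        System.out.println(tokens("Poker; Blackjack; LasVegas, NewYork"));
        // prints: [Poker, Blackjack, LasVegas, NewYork]
    }
}
```

This treats both separators as equal, so it fits only if you do not need to know which values were grouped by the comma.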


1 - Obviously, we cannot infer the syntax of the file that you are trying to parse from a single line of input. Or, in general, from a single example input file.


