Saturday, February 1, 2014

StringTokenizer Example in Java with Multiple Delimiters

StringTokenizer is a legacy class for splitting a String into tokens. To break a String into tokens, you create a StringTokenizer object and give it the delimiters to split on. You can pass more than one delimiter, e.g. you can break a String into tokens on , and : at the same time. If you don't provide any delimiter, it splits on white-space by default (space, tab, newline, carriage return and form feed).

It's inferior to split() because it doesn't support regular expressions, and it is not very efficient either. Since it's a legacy class, don't expect any performance improvement. On the other hand, split() got a major performance boost in Java 7. StringTokenizer looks easier to use, but you should avoid it except for trivial tasks. Prefer String's split() method for splitting a String, and for repeated splits use the Pattern.split() method.

Coming back to StringTokenizer, we will see three examples in this article: the first breaks a String on white-space, the second shows how to use multiple delimiters, and the third shows how to count the number of tokens. To get the tokens, you follow an Enumeration-style model, i.e. you check for more tokens using hasMoreTokens() and then fetch the next one using nextToken().
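To make the advice about repeated splits concrete, here is a minimal sketch of compiling a Pattern once and reusing it to split on , and : together. The class name, the tokens() helper and the sample lines are made up for illustration and are not part of the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RepeatedSplitSketch {

    // Compiled once and reused; the character class matches a comma or a colon.
    private static final Pattern COMMA_OR_COLON = Pattern.compile("[,:]");

    // Hypothetical helper: splits one line on either delimiter.
    static List<String> tokens(String line) {
        return Arrays.asList(COMMA_OR_COLON.split(line));
    }

    public static void main(String[] args) {
        // Illustrative input lines, not taken from the article.
        String[] lines = { "id:42,name:duke", "id:7,name:tux" };
        for (String line : lines) {
            System.out.println(tokens(line));
        }
    }
}
```

Keeping the compiled Pattern in a constant means the regular expression is parsed once, not on every call, which is the point of preferring Pattern.split() when the same split runs many times.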



Java StringTokenizer Example

Here is the full code of our Java StringTokenizer example. You can copy and paste this code into your favourite IDE and run it straight away. It doesn't require any third-party library like Apache Commons or Google Guava. All you need to do is create a Java source file with the same name as the public class of this example, and the IDE will take care of compiling and running it. Alternatively, you can compile and execute this example from the command prompt. In the first example, we have a String whose words are separated by white-space. To get each word from that String, we create a StringTokenizer object by passing only the String itself. Notice that we have not provided any delimiter, because by default StringTokenizer uses white-space as the token separator.

To get each token, in our case a word, you just loop until hasMoreTokens() returns false. To get the word itself, call the nextToken() method of StringTokenizer. This is similar to iterating over a Java Collection using an Iterator, where we use the hasNext() method as the while loop condition and the next() method to get the next element. The second example is more interesting, because here our text is a web address, which has a protocol, an IP address and a port. Here we pass multiple delimiters to split the http String: / (slash), : (colon) and . (dot). Every character in the delimiter String is a separate delimiter, so "://." does not mean a double slash; the second / is simply redundant. StringTokenizer ends a token whenever any one of these characters is found in the target String. The third example shows how to get the total number of tokens from StringTokenizer, which is quite useful if you want to copy the tokens into an array or collection, as you can use this number to decide the length of the array or the initial size of the collection.

import java.util.StringTokenizer;

/**
 * Java program to show how to use StringTokenizer for breaking a delimited
 * String into tokens. StringTokenizer allows you to use multiple delimiters as
 * well, which means you can split a String containing commas and colons in one call.
 *
 * @author Javin Paul
 */
public class StringTokenizerDemo{
   
    public static void main(String args[]) {

        // Example 1 - By default StringTokenizer breaks String on space
        System.out.println("StringTokenizer Example in Java, split String on whitespace");

        String word = "Which one is better, StringTokenizer vs Split?";
        StringTokenizer tokenizer = new StringTokenizer(word);
        while (tokenizer.hasMoreTokens()) {
            System.out.println(tokenizer.nextToken());
        }


        // Example 2 - StringTokenizer with multiple delimiter
        System.out.println("StringTokenizer multiple delimiter Example in Java");

        String msg = "http://192.173.15.36:8084/";
        StringTokenizer st = new StringTokenizer(msg, "://.");
        while (st.hasMoreTokens()) {
            System.out.println(st.nextToken());
        }
       
       
        // Example 3 - Counting number of String tokens
        System.out.println("StringTokenizer count Token Example");

        String records = "one,two,three,four,five,six,seven";
        StringTokenizer breaker = new StringTokenizer(records, ",");
        System.out.println("Total number of tokens : " + breaker.countTokens());
    }
}
Output:
StringTokenizer Example in Java, split String on whitespace
Which
one
is
better,
StringTokenizer
vs
Split?

StringTokenizer multiple delimiter Example in Java
http
192
173
15
36
8084

StringTokenizer count Token Example
Total number of tokens : 7
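
The count from the third example is handy for sizing an array up front. Here is a minimal sketch of that idea, assuming a hypothetical helper named toArray() that is not part of the original program. One thing to keep in mind is that countTokens() reports the tokens still remaining, so the count should be read before any token is consumed.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class TokensToArraySketch {

    // Hypothetical helper: copies every token into an array sized with countTokens().
    static String[] toArray(String text, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        // Read the count once, before consuming anything, because
        // countTokens() shrinks each time nextToken() is called.
        String[] tokens = new String[tokenizer.countTokens()];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokenizer.nextToken();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] numbers = toArray("one,two,three,four,five,six,seven", ",");
        System.out.println(numbers.length + " tokens : " + Arrays.toString(numbers));
    }
}
```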

As I said, all this functionality is also available through the String class's split() method, and you should use that as your default tool for creating tokens from a String or breaking it on any delimiter. To learn more about the pros and cons of using StringTokenizer and the split() method, you can see my post difference between Split vs StringTokenizer in Java.
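
For comparison, here is a rough sketch of the second example done with split(), next to a small case where the two approaches differ. The class and helper names are invented for illustration. The regular expression [:/.]+ is one possible way to treat a run of delimiters as a single separator, and the "one,,three" input is a made-up sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SplitVersusTokenizerSketch {

    // Hypothetical helper: collects all tokens produced by StringTokenizer.
    static List<String> withTokenizer(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Hypothetical helper: the same job done with String.split().
    static List<String> withSplit(String text, String regex) {
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        String url = "http://192.173.15.36:8084/";
        System.out.println(withTokenizer(url, ":/."));  // [http, 192, 173, 15, 36, 8084]
        System.out.println(withSplit(url, "[:/.]+"));   // [http, 192, 173, 15, 36, 8084]

        // Adjacent delimiters: StringTokenizer skips the empty token, split() keeps it.
        String sparse = "one,,three";
        System.out.println(withTokenizer(sparse, ","));  // [one, three]
        System.out.println(withSplit(sparse, ","));      // [one, , three]
    }
}
```

The second pair of calls is worth noticing when you switch from one approach to the other: data with empty fields yields a different number of tokens.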


That's all on how to use StringTokenizer in Java with multiple delimiters. Yes, it's convenient, especially if you are not very comfortable with regular expressions. By the way, if that's the case, then you had better spend some time learning regular expressions, not just to split a String into tokens but as a skill in its own right. You will be surprised by the power of regular expressions for searching, replacing and other text processing. Remember that StringTokenizer is a legacy class, retained only for compatibility reasons, and you should not use it in new code. Use the split() method of String, or the Pattern.split() method from the java.util.regex package, instead. In terms of performance too, split() got a major boost from Java 6 to Java 7, and it's reasonable to expect further improvements only in split(), because no further work is expected on StringTokenizer.

2 comments :

Andreas Haufler said...

Nice writeup Paul. I think StringTokenizer is one of those classes which' existence is often forgotten. I also 100% agree on the importance on regular expressions. Might look scary at first, but they are simply unavoidable sooner or later.

Just a sidehint: For parsing more complex strings, one can use parsii which provides a quite flexible tokenizer capable of frequently used token types and unlimited lookahead: https://github.com/scireum/parsii (open source + MIT license of course)

cheers Andy

Anonymous said...

Nice explanation yar...daily i am reading your articles those are inspired me a lot ...
