Tuesday, November 11, 2008

Read a text file using java.util.Scanner

Scanner is a utility class that has shipped with Java SE since version 5.0. It lets you read a plain text file much like BufferedReader does, but it is more flexible: you can read the contents line by line, word by word, or split on a custom delimiter.

The readFileByLine method reads a plain text file one line at a time.

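// Requires java.io.File, java.io.FileNotFoundException, and java.util.Scanner.
public static void readFileByLine(String fileName) {
    try {
        File file = new File(fileName);
        Scanner scanner = new Scanner(file);
        // hasNextLine()/nextLine() return whole lines; hasNext()/next() would
        // return whitespace-separated words instead.
        while (scanner.hasNextLine()) {
            System.out.println(scanner.nextLine());
        }
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}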
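With the default delimiter, hasNext()/next() walk through whitespace-separated tokens, while hasNextLine()/nextLine() walk through whole lines. The sketch below shows the difference on an in-memory string rather than a file. The class and method names are made up for illustration, and try-with-resources assumes Java 7 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerTokens {

    // Collects the whitespace-separated tokens that next() returns by default.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Collects the whole lines that nextLine() returns.
    static List<String> lines(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                result.add(scanner.nextLine());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";
        System.out.println(words(text)); // [first, line, second, line]
        System.out.println(lines(text)); // [first line, second line]
    }
}
```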

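The following method also reads the file line by line, except that it uses the platform's line separator as the delimiter. This only splits correctly when the file uses the same line endings as the platform the program runs on.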

public static void readFileByLine(String fileName) {
    try {
        Scanner scanner = new Scanner(new File(fileName));
        scanner.useDelimiter(System.getProperty("line.separator"));
        while (scanner.hasNext()) {
            System.out.println(scanner.next());
        }
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

What if you want to read the entire text file at once? You can do so by setting the delimiter to "\\z", a regular expression that matches only at the very end of the input.

public static void readEntireFile(String fileName) {
    try {
        Scanner scanner = new Scanner(new File(fileName));
        scanner.useDelimiter("\\z");
        // An empty file has no token, and next() would throw NoSuchElementException.
        if (scanner.hasNext()) {
            System.out.println(scanner.next());
        }
        scanner.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

scanner.useDelimiter("\\z"); means the same as
scanner.useDelimiter(Pattern.compile("\\z"));

By using the "\\z" delimiter, you can also read an entire web page as an HTML string.

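// Requires java.io.IOException, java.net.MalformedURLException, java.net.URL,
// java.net.URLConnection, and java.util.Scanner.
public static void readWebPage(String url) {
    URLConnection connection;
    try {
        connection = new URL(url).openConnection();
        Scanner scanner = new Scanner(connection.getInputStream());
        scanner.useDelimiter("\\z");
        // An empty response has no token, so check before calling next().
        String text = scanner.hasNext() ? scanner.next() : "";
        System.out.println(text);
        scanner.close(); // also closes the underlying input stream
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}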
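The same delimiter works for any InputStream, not only files and URL connections. Here is a minimal sketch of a reusable helper, assuming the content is UTF-8 encoded and that Java 7 or later is available for try-with-resources. The class name ReadAllExample and the method readAll are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ReadAllExample {

    // Reads everything from the stream as one string.
    // Assumption: the bytes are UTF-8 encoded text.
    static String readAll(InputStream in) {
        try (Scanner scanner = new Scanner(in, "UTF-8")) {
            scanner.useDelimiter("\\z");
            // An empty stream yields no token, so fall back to an empty string.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String html = "<html>\n<body>hi</body>\n</html>";
        InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in));
    }
}
```

Passing the charset name explicitly avoids depending on the platform default encoding, which the single-argument Scanner constructors use.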

You can also use Scanner to read a semi-structured text file that stores data separated by tabs, commas, or semicolons. For more information read this.
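As a rough illustration of reading delimited data, the sketch below reads the input line by line and then splits each line with a second Scanner whose delimiter is a comma, semicolon, or tab. The class name DelimitedReader, the method parse, and the sample data are all invented for this example. It does not handle quoted fields, so it is not a full CSV parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimitedReader {

    // Hypothetical helper: splits each line into fields on a comma, semicolon,
    // or tab, ignoring any whitespace around the separator.
    static List<List<String>> parse(String text) {
        List<List<String>> rows = new ArrayList<>();
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                List<String> row = new ArrayList<>();
                try (Scanner fields = new Scanner(lines.nextLine())) {
                    fields.useDelimiter("\\s*[,;\\t]\\s*");
                    while (fields.hasNext()) {
                        row.add(fields.next());
                    }
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String data = "Alice, 30, London\nBob;25;Paris";
        System.out.println(parse(data)); // [[Alice, 30, London], [Bob, 25, Paris]]
    }
}
```

To read from a file instead, the outer Scanner could be constructed from a File as in the earlier methods.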

Consult the java.util.regex.Pattern Javadoc for other delimiters.
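Because the delimiter is a regular expression, it can describe more than a single character. As one possible example, the sketch below splits text into paragraphs by treating two or more consecutive line breaks as the delimiter. The class name ParagraphReader, the method paragraphs, and the pattern are illustrative choices, not something prescribed by the Scanner API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ParagraphReader {

    // Two or more consecutive line breaks, with Unix or Windows line endings.
    private static final Pattern BLANK_LINES = Pattern.compile("(\\r?\\n){2,}");

    // Hypothetical helper: returns the blocks of text between blank lines.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(BLANK_LINES);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "first paragraph, line 1\nfirst paragraph, line 2\n\nsecond paragraph";
        for (String paragraph : paragraphs(text)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```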
