Project Technical Lead
Java file handling: How to read files in Java quickly and efficiently
File handling is a fundamental part of programming, and Java offers many different ways to read files. A search for a simple file-reading example in Java turns up a long list of classes: InputStream, FileInputStream, DataInputStream, SequenceInputStream, Reader, InputStreamReader, FileReader, BufferedReader, FileChannel, SeekableByteChannel, Scanner, StreamTokenizer, Files and the like.
This list is not exhaustive, and it does not yet include the third-party libraries for working with files, which are also available.
Most of the built-in classes for reading and writing files in Java are located in the packages java.io and java.nio.file. With the introduction of the NIO.2 (New I/O) file API, the situation became even more complicated, and how to work with files quickly and efficiently is a frequent question on programming forums, even among experienced developers.
When working with files, we should know in advance what kind of files we are dealing with: whether they are binary files (e.g. music in MP3 format) or text files, and whether they are small enough to load entirely into memory or too large to fit, in which case they need a different type of processing, such as reading and processing line by line.
Each of the file handling classes mentioned above has its use in specific cases. In general, however, classes whose names end in InputStream read binary data, and classes whose names end in Reader read text. Classes that have both of these expressions in their name bridge the two. For example, InputStreamReader consumes an InputStream but behaves as a Reader. FileReader is essentially a FileInputStream combined with an InputStreamReader.
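The bridging described above can be sketched as follows. This is only an illustration: the class name ReaderBridgeSketch and the method readFirstLine are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only for illustration.
final class ReaderBridgeSketch {

    // Returns the first line of a text file, or null if the file is empty.
    // UTF-8 is an assumption; pass the charset your files actually use.
    static String readFirstLine(String fileName) throws IOException {
        try (InputStream bytes = new FileInputStream(fileName);                    // binary source
             Reader chars = new InputStreamReader(bytes, StandardCharsets.UTF_8);  // bytes -> characters
             BufferedReader lines = new BufferedReader(chars)) {                   // adds readLine()
            return lines.readLine();
        }
    }
}
```

Each layer has one job: the stream delivers bytes, the InputStreamReader decodes them into characters, and the BufferedReader adds buffering and line handling on top.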
As the preceding text shows, reading data in Java can get pretty messy, so this article focuses on three basic data loading scenarios that cover most everyday use cases:
- Reading a whole text file into a String or a list of lines (and a whole binary file into a byte array).
- Reading and processing large files that don’t fit entirely in memory.
- Reading files whose content can be split by a separator, e.g. CSV files.
A short history of reading files in Java
Until Java 7, reading files with the standard Java libraries was quite cumbersome. The most commonly used class for reading a file was FileInputStream, and in addition to handling exceptions, the developer had to make sure that the stream was closed both after a successful read and after an error. Automatic closing of resources (as we know it today with try-with-resources) did not exist back then. Therefore, many Java developers preferred third-party libraries such as Apache Commons or Google Guava, which provided much more convenient options.
That changed with the arrival of Java 7, which brought the long-awaited NIO.2 file API. Besides a lot of other functionality, it introduced the helper class java.nio.file.Files, which can read a whole text or binary file with a single method call.
Reading a binary file into a byte array
Using the Files.readAllBytes() method, we can read the contents of an entire file into a byte array:
import java.nio.file.Files;
import java.nio.file.Path;
String fileName = "fileName.dat";
byte[] bytes = Files.readAllBytes(Path.of(fileName));
Path is an interface that represents the location of a file or directory in the file system. The Path.of() factory method is available since Java 11; on earlier versions, Paths.get() does the same job.
Reading a text file into a variable of the String type
Since Java 11, the Files.readString() method reads the contents of an entire text file into a String:
import java.nio.file.Files;
import java.nio.file.Path;
String fileName = "fileName.dat";
String text = Files.readString(Path.of(fileName));
Internally, the readString() method uses readAllBytes() and then decodes the bytes into a String, using UTF-8 unless a different charset is passed in.
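When the file is not encoded in UTF-8, the charset can be passed explicitly. The following sketch shows that overload next to a comparable approach for Java versions before 11; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, used only for illustration.
final class ReadStringSketch {

    // Java 11+: decode the whole file with an explicitly chosen charset.
    static String readText(Path file, Charset charset) throws IOException {
        return Files.readString(file, charset);
    }

    // Java 7+: read all bytes first, then decode them manually.
    static String readTextViaBytes(Path file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, charset);
    }
}
```

The two variants differ on invalid input: readString() throws an exception when it meets a malformed byte sequence, while the String constructor silently substitutes a replacement character.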
Reading a text file line by line
Text files usually consist of multiple lines. If we want to process the text line by line, we can use the readAllLines() method, which reads the whole file and returns its lines as a list. The single-argument variant shown here has been available since Java 8.
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
String fileName = "fileName.dat";
List<String> lines = Files.readAllLines(Path.of(fileName));
Then we simply iterate over the list and process each line.
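Such an iteration might look like the sketch below, which counts the lines that are not blank. The class and method names are hypothetical, and the counting is just a stand-in for whatever per-line processing is needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, used only for illustration.
final class ReadAllLinesSketch {

    // Loads the whole file into memory and counts the non-blank lines.
    static int countNonBlankLines(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file); // decodes as UTF-8 by default
        int count = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```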
Reading a text file line by line using String stream
Java 8 introduced streams as a significant enhancement. The same version extended the Files class with a new lines() method that returns the lines of a text file as a Stream<String>. This allows us to use stream operations, for example to filter the data.
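import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;
String fileName = "fileName.dat";
// The stream keeps the file open, so close it with try-with-resources
try (Stream<String> lines = Files.lines(Path.of(fileName))) {
    lines.filter(line -> line.contains("ERROR"))
         .forEach(System.out::println);
}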
This example prints to the console all lines of the file that contain the string “ERROR”.
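If the matching lines are needed for further processing rather than printed, they can be collected into a list. The stream returned by Files.lines() keeps the file open until it is closed, which is why the sketch below uses try-with-resources. The class name, the method name and the marker parameter are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, used only for illustration.
final class LinesFilterSketch {

    // Returns all lines of the file that contain the given marker text.
    static List<String> linesContaining(Path file, String marker) throws IOException {
        // Closing the stream also closes the underlying file.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> line.contains(marker))
                        .collect(Collectors.toList());
        }
    }
}
```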
The methods above cover the most common scenarios for reading smaller files. With the exception of Files.lines(), which reads the file lazily as the stream is consumed, they load the entire file into RAM. For large files, it is advisable to read the data in chunks and process each chunk immediately. We will demonstrate this below.
Reading a large binary file using BufferedInputStream
Reading a binary file through a plain FileInputStream one byte at a time (until read() returns -1 at the end of the file) takes a long time for large files. This can be sped up by reading the data through a BufferedInputStream, which wraps the FileInputStream and fetches data from the operating system not byte by byte, but in blocks of 8 KB (the default buffer size) that are kept in memory. Our code still reads the file byte by byte, but it is much faster because the bytes come directly from memory.
import java.io.BufferedInputStream;
import java.io.FileInputStream;
String fileName = "fileName.dat";
try (FileInputStream is = new FileInputStream(fileName);
     BufferedInputStream bis = new BufferedInputStream(is)) {
    int b;
    while ((b = bis.read()) != -1) {
        // TODO: process b
    }
}
Reading a file block by block (known as buffering) is significantly faster than reading it byte by byte.
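An alternative to single-byte reads is to pass our own byte array to read() and process one chunk at a time. The sketch below uses this to compute a CRC-32 checksum of a file of any size with constant memory. The class name, the method name and the 8192-byte chunk size are assumptions chosen for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Hypothetical helper class, used only for illustration.
final class ChunkedReadSketch {

    // Computes the CRC-32 checksum of a file without loading it into memory.
    static long checksum(String fileName) throws IOException {
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[8192]; // assumed chunk size; our array acts as the buffer
        try (InputStream in = new FileInputStream(fileName)) {
            int bytesRead;
            // read(byte[]) returns the number of bytes stored, or -1 at end of file
            while ((bytesRead = in.read(chunk)) != -1) {
                crc.update(chunk, 0, bytesRead); // use only the filled part of the array
            }
        }
        return crc.getValue();
    }
}
```

Because the array already serves as a buffer, wrapping the stream in a BufferedInputStream is not needed here.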
Reading a large text file using BufferedReader
The FileReader class combines FileInputStream and InputStreamReader. For faster reading, we use the BufferedReader class, which wraps the FileReader. The reader underneath uses an 8 KB byte buffer, and BufferedReader adds its own buffer for 8192 decoded characters. Another advantage of BufferedReader is that it allows us to read and process the text file line by line (instead of character by character).
import java.io.BufferedReader;
import java.io.FileReader;
String fileName = "fileName.dat";
try (FileReader reader = new FileReader(fileName);
     BufferedReader bufferedReader = new BufferedReader(reader)) {
    String line;
    while ((line = bufferedReader.readLine()) != null) {
        System.out.println("Line: " + line);
    }
}
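The same loop can also be written with Files.newBufferedReader(), which creates the BufferedReader directly and takes the charset as an argument. The sketch below uses it to find the length of the longest line without keeping the whole file in memory. The class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, used only for illustration.
final class BufferedReaderSketch {

    // Returns the length of the longest line; only one line is held in memory at a time.
    static int longestLineLength(Path file) throws IOException {
        int longest = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of file
                longest = Math.max(longest, line.length());
            }
        }
        return longest;
    }
}
```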
Reading a file in parts using Scanner
Sometimes, instead of reading a file line by line, we need to read it in parts. The Scanner class splits the contents of a file into tokens using a delimiter, which is specified as a regular expression (whitespace by default). This class is commonly used for simple CSV (comma-separated values) files, in which the values are separated by commas. Such files can be opened as tables in spreadsheet applications such as Excel.
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
List<String> words = new ArrayList<>();
String fileName = "fileName.csv";
// try-with-resources closes the scanner even if reading fails
try (Scanner scanner = new Scanner(Path.of(fileName))) {
    // Split on commas and on line breaks so that values do not span two lines
    scanner.useDelimiter(",|\\R");
    while (scanner.hasNext()) {
        words.add(scanner.next());
    }
}
In this example, we read the CSV file token by token (instead of the classic line-by-line approach) and store the tokens in a list for further processing. The delimiter pattern also matches line breaks: with a plain "," delimiter, the last value of one line and the first value of the next line would be returned as a single token.
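When the row structure of the CSV file matters, a different technique may be more convenient: reading line by line with a BufferedReader and splitting each line with String.split(). The sketch below is a simplification that does not handle quoted values containing commas; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, used only for illustration.
final class CsvRowsSketch {

    // Reads a simple CSV file into rows of values. Quoted values are NOT supported.
    static List<String[]> readRows(Path file) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file)) { // UTF-8 by default
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip empty lines
                }
                rows.add(line.split(",", -1)); // limit -1 keeps trailing empty values
            }
        }
        return rows;
    }
}
```

For CSV files with quoting or escaped delimiters, a dedicated CSV library is usually the safer choice.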
In this article, we have shown the most common scenarios for reading data from a file. Handling small files in Java is a breeze: a single method call reads them into memory. And when it comes to reading and processing large files, Java offers efficient solutions based on buffering classes.