When I have free time, I take part in contests on online judge websites just for fun. To get an AC verdict (accepted: passed all test cases) in these contests, your solution has to be time and memory efficient. Most problems have several test cases to evaluate solutions, so reading those test cases from a file and writing your results to another file can take a long time, because file input and output (IO) operations are very slow compared to other operations in most programming languages.
I decided to benchmark the common ways of reading and writing files in Java, and here I present some of the results. Note that this post only shows results without any tuning of disk file IO such as low-level caching. Below are several approaches in Java, with explanations, that you can use for writing to and reading from files. The benchmarks were run on an Intel® Core™ i3-3120M CPU @ 2.50GHz × 4 under 64-bit Ubuntu 14.04 with Java SE 1.8.0_20 on an SSD. Several factors (CPU, disk, and OS) can make these results differ, but the relative differences are roughly the same on all machines.
| Java Class / File Size     | 1 MB   | 10 MB   | 100 MB   | 1000 MB   | Code |
| -------------------------- | ------ | ------- | -------- | --------- | ---- |
| FileInputStream            | 600 ms | 4500 ms | 41200 ms | 425400 ms | Link |
| BufferedInputStream        | 30 ms  | 350 ms  | 3300 ms  | 32100 ms  | Link |
| BufferedReader             | 60 ms  | 390 ms  | 3400 ms  | 32900 ms  | Link |
| Direct BufferedInputStream | 50 ms  | 80 ms   | 280 ms   | 3300 ms   | Link |
| MappedByteBuffer           | 20 ms  | 25 ms   | 160 ms   | 1450 ms   | Link |
What a difference! You may be wondering why the famous and most widely used input stream approach is so slow. The main reason is that FileInputStream calls the native read() method for every single byte (one raw octet, 8 bits) of the file content. When the file size is 1 MB, read() is called 1,048,576 times, that is 1024 × 1024. Now you can imagine how many times it is called for 1000 MB. So, when working with files, the first thing to consider to speed up your application is the number of method calls to the underlying system and disk.
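For reference, here is a minimal sketch of the per-byte FileInputStream approach described above. The file name input.txt and the class name are placeholders, not the exact benchmark code linked in the table.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class FileInputStreamRead {
    public static void main(String[] args) throws IOException {
        long count = 0;
        // Each read() call goes down to the native layer for a single byte,
        // which is why this approach is so slow on large files.
        try (FileInputStream in = new FileInputStream("input.txt")) {
            int b;
            while ((b = in.read()) != -1) {
                count++;
            }
        }
        System.out.println("Read " + count + " bytes");
    }
}
```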
Rule 1. Try to reduce the number of method calls to the underlying system and disk.
To reduce those method calls in FileInputStream, we can read a chunk of data (say 1 MB or 2 MB) from disk, store it in memory, and then work with the stored data. In this case we don't go to the disk for each next byte. Since accessing memory is much faster than accessing disk, this speeds up reading considerably. BufferedInputStream uses the same strategy to make file IO faster by storing data in an internal buffer (8192 bytes by default). When read() is called on a buffered stream, data is served from the buffer array in memory, and the underlying system is accessed only rarely.
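A minimal sketch of the buffered approach (again, the file name and class name are placeholders). The only change from the previous example is wrapping the stream in a BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class BufferedInputStreamRead {
    public static void main(String[] args) throws IOException {
        long count = 0;
        // BufferedInputStream fills an internal 8192-byte buffer from disk,
        // so most read() calls are served from memory instead of the OS.
        try (BufferedInputStream in =
                 new BufferedInputStream(new FileInputStream("input.txt"))) {
            int b;
            while ((b = in.read()) != -1) {
                count++;
            }
        }
        System.out.println("Read " + count + " bytes");
    }
}
```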
We can also create our own, larger buffer array and speed up reading even more. The Direct BufferedInputStream row shows the benchmark of using a 2 MB byte array as the buffer. As you can see, it is nearly 10 times faster than Java's default buffered stream when reading data from a 1000 MB file. You can even buffer the whole file into a byte array, as long as you have enough memory.
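Here is a rough sketch of what such a direct-buffer read might look like, assuming a 2 MB buffer and the same placeholder file name; it is not the exact code linked in the table.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class DirectBufferedRead {
    private static final int BUFFER_SIZE = 2 * 1024 * 1024; // 2 MB

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long count = 0;
        try (FileInputStream in = new FileInputStream("input.txt")) {
            int bytesRead;
            // One native call per 2 MB chunk instead of one per byte.
            while ((bytesRead = in.read(buffer)) != -1) {
                count += bytesRead;
                // process buffer[0..bytesRead) here
            }
        }
        System.out.println("Read " + count + " bytes");
    }
}
```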
Rule 2. Use your own buffer, sized according to your business logic.
If you paid attention above, we only discussed reading bytes from a file. However, in many cases we want to read the content of the file as characters. Unlike buffered streams, BufferedReader is all about reading characters: in the background it reads bytes from disk (like the streams) and translates them into characters. You can see that BufferedReader works a little slower than BufferedInputStream because of this translation process.
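A small sketch of character-based reading with BufferedReader, again with a placeholder file name; it counts lines just to have something to do with the decoded characters.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BufferedReaderRead {
    public static void main(String[] args) throws IOException {
        long lines = 0;
        // BufferedReader reads bytes in bulk and decodes them to characters,
        // which adds a small translation cost on top of buffering.
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
            }
        }
        System.out.println("Read " + lines + " lines");
    }
}
```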
Next comes MappedByteBuffer, Java's memory-mapped IO class. So, what is a memory-mapped file in Java? Memory-mapped files are a feature of the Java NIO package that lets you access the contents of a file directly from memory. This is achieved by mapping the whole file, or a portion of it, into memory; the operating system takes care of loading the requested pages and writing them back to the file, while the application deals only with memory, which results in very fast IO operations. The memory used to load a memory-mapped file is outside the Java heap. In Java we use MappedByteBuffer to read from and write to this mapped memory.
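Below is a minimal sketch of reading a file through MappedByteBuffer, assuming the file fits in a single mapping (FileChannel.map is limited to about 2 GB per mapping) and using the same placeholder file name.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedByteBufferRead {
    public static void main(String[] args) throws IOException {
        long count = 0;
        try (FileChannel channel = FileChannel.open(
                Paths.get("input.txt"), StandardOpenOption.READ)) {
            // Map the whole file into memory; the OS pages it in on demand.
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (buffer.hasRemaining()) {
                buffer.get();   // reads straight from the mapped region
                count++;
            }
        }
        System.out.println("Read " + count + " bytes");
    }
}
```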
Memory-mapped files are so fast that they are used when performance really matters. However, there are trade-offs, such as page faults when a requested page is not in memory.
If your application deals with small files, choosing between MappedByteBuffer and buffered streams does not make much difference. However, when reading from large files and modifying certain pages of them from multiple cores, memory mapping makes a huge difference in performance. Since a mapped file loads the whole file, or a region of it, into shared memory, each process does not have to load its own copy of the file.