Java – Reading and processing a big text file of 25 GB


I have to read a big text file of, say, 25 GB and need to process it within 15-20 minutes. The file contains multiple header and footer sections.

I tried csplit to split the file based on the headers, but it takes around 24 to 25 minutes to split it into a number of files, which is not acceptable at all.

I tried sequential reading and writing using BufferedReader and BufferedWriter along with FileReader and FileWriter. That takes more than 27 minutes, which again is not acceptable.

I also tried another approach: getting the start index of each header and then running multiple threads, each reading the file from a specific location using RandomAccessFile. I had no luck with this either.

How can I achieve my requirement?

Best Solution

Try using a large read buffer (for example, 20 MB instead of 2 MB) to process your data more quickly. Also, don't use a BufferedReader: it decodes bytes into characters as it reads, and that conversion slows things down. Read raw bytes instead.
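As a rough illustration of that advice, here is a minimal sketch that reads a file as raw bytes through one large buffer, with no Reader or character decoding involved. The class name LargeBufferRead, the method countLines, and the newline counting are all hypothetical stand-ins for your real header/footer processing, and the 20 MB buffer size is just the figure mentioned above, not a measured optimum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LargeBufferRead {
    // Assumed buffer size: 20 MB, as suggested above. Tune it for your hardware.
    static final int BUFFER_SIZE = 20 * 1024 * 1024;

    // Reads the file in large raw-byte chunks and counts '\n' bytes.
    // Counting newlines is only a placeholder for the real per-chunk processing.
    static long countLines(Path file, int bufferSize) throws IOException {
        long lines = 0;
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = Files.newInputStream(file)) {
            int bytesRead;
            // read() returns the number of bytes placed in the buffer, or -1 at end of file
            while ((bytesRead = in.read(buffer)) != -1) {
                for (int i = 0; i < bytesRead; i++) {
                    if (buffer[i] == '\n') {
                        lines++;
                    }
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LargeBufferRead <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        long start = System.nanoTime();
        long lines = countLines(file, BUFFER_SIZE);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(lines + " lines in " + elapsedMs + " ms");
    }
}
```

Timing a run like this against your file with a few different buffer sizes should show whether the read itself or the processing is the bottleneck. Whether it gets you under 15-20 minutes depends on your disk throughput, which this sketch cannot change.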

