
Performance #5: File Buffering

July 19, 2007

This article is part of the series An Exercise in Performance Tuning in C#.Net.

It's time to stop ignoring the 800-pound gorilla in the room: System.String. Scrolling all the way over in the Allocation Graph, it is clear that strings take up most of the memory, and it seems logical that most of that comes from the input file.

[Figure: allocation graph from the profiler, showing memory allocated to System.String (filebuffer1.jpg)]

40 MB total are allocated to strings. The input file I am processing is only about 1.02 MB, so the file itself accounts for roughly 1/40th of that total. Let's trace back to the left along the blue, pink, yellow, and gray lines that feed into System.String. The blue line traces all the way back to the StreamReader – the file itself. Shockingly, the StreamReader uses 8.0 MB. Why does it use so much more memory than the file takes up? Because StreamReader has a lot of overhead, and we haven't tweaked the buffer size, among other things. Here's the original code:


StreamReader stream = new StreamReader(inputFileName);
string fileContents = stream.ReadToEnd();

// Pass file contents to upload routine
MyClass upload = new MyClass(option, fileContents, fileId);

// Begin processing file
upload.ProcessFile();


ProcessFile:

// Split the inputfile string into an array based on EOL
string[] fileLines = _fileContents.Split('\n');

// Cycle through and process lines
for (int i = 0; i < fileLines.Length; i++)
{
     string singleLine = fileLines[i];
     // perform various tasks

I rewrote the application to use StreamReader.ReadLine() instead of reading in the entire file and then splitting it into lines, which seemed like it might be more efficient. I also wrapped the read in a using block to make sure the stream is properly disposed. This did not result in a meaningful performance difference, but it might lead me to some other improvements later.

Then I started playing with the buffer size. Each line is supposed to be 512 characters, though in the future lines may be longer. I tried buffer sizes of 520, 800, and 1000 characters and found the best result with 800, though it was only 1.1% faster than before I started using ReadLine().
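If you want to repeat that kind of comparison, one simple approach is to wrap the read loop in a System.Diagnostics.Stopwatch and try each candidate size in turn. The sketch below shows the idea; the file path and candidate sizes are placeholders, not the exact harness behind the numbers above.


// Requires: using System; using System.Collections.Generic;
// using System.Diagnostics; using System.IO; using System.Text;
int[] bufferSizes = { 520, 800, 1000 };   // candidate buffer sizes, in characters
string inputFileName = "input.txt";       // placeholder path

foreach (int size in bufferSizes)
{
     Stopwatch timer = Stopwatch.StartNew();

     List<string> lines = new List<string>();
     using (StreamReader stream = new StreamReader(inputFileName, Encoding.ASCII, true, size))
     {
          string line;
          while ((line = stream.ReadLine()) != null)
               lines.Add(line);
     }

     timer.Stop();
     Console.WriteLine("Buffer size {0}: {1} ms, {2} lines", size, timer.ElapsedMilliseconds, lines.Count);
}
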

Here's what I have now:

// Read input file into a list of lines
List<string> fileLines = new List<string>();
using (StreamReader stream = new StreamReader(inputFileName, Encoding.ASCII, true, 800))
{
     string line;
     while ((line = stream.ReadLine()) != null)
          fileLines.Add(line);
}

// Parse the input file
MyClass upload = new MyClass(fileLines);
upload.ProcessFile();


ProcessFile:

// perform various tasks
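
With the lines already collected in a List<string>, the processing loop no longer needs the Split('\n') call at all. The internals of ProcessFile aren't shown here, but the loop presumably reduces to something like the following sketch, where _fileLines is an assumed field holding the list passed to the constructor.

// _fileLines is a hypothetical field holding the constructor's List<string>
// Cycle through and process lines; no Split('\n') allocation needed
foreach (string singleLine in _fileLines)
{
     // perform various tasks
}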