We have large log files that we parse on a daily basis to extract summary information about mechanical systems. We read the files and then output a summary at one-second resolution, one line at a time. Recently, I ran afoul of Perl's file reading mechanisms. There are any number of ways to read a file in Perl, and it turns out that for the longest time, we'd been using the wrong one. Previously, we had been using:
foreach my $line_of_log (<LOG>)
{
    # DO STUFF WITH $line_of_log
}
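(For completeness: both snippets in this post assume a filehandle named LOG that was opened earlier, along the lines of the sketch below. The filename is made up for illustration.)

open(LOG, '<', '/var/log/machine.log')
    or die "Can't open log file: $!";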
We thought that this was reading the log file one line at a time, processing each line, and then moving on. What it was actually doing was reading (or "slurping") the whole file into memory, handing us a list of strings, which we then processed one line at a time. After ten minutes of cursory Googling, I ran across a tutorial which presented this:
while (<LOG>)
{
    my $line_of_log = $_;
    # DO STUFF WITH $line_of_log
}
The 'while' version of the file read actually does what we thought we were doing all along: reading one line from the file, and then doing stuff with it. The difference between the two methods is that in the 'foreach' version, the entire input file gets read into memory, whereas in the 'while' version, only a single line is held in memory at any given time. In practice, reading a 7 MB file caused Perl to grab 34 MB of memory with the 'foreach' version, but only 2.2 MB with the 'while' version. That's more than an ENTIRE ORDER OF MAGNITUDE of difference! This makes a huge difference when running Perl on memory-limited systems, as we are.
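For what it's worth, a more modern Perl idiom uses a lexical filehandle with the three-argument form of open, and names the loop variable explicitly instead of relying on $_. A minimal sketch, again with a made-up filename:

use strict;
use warnings;

open my $log_fh, '<', '/var/log/machine.log'
    or die "Can't open log file: $!";

while (my $line_of_log = <$log_fh>) {
    chomp $line_of_log;    # strip the trailing newline
    # DO STUFF WITH $line_of_log
}

close $log_fh;

The behavior is the same line-at-a-time read as the 'while' version above; the lexical filehandle just avoids clobbering a global name, and the explicit loop variable avoids surprises if something in the loop body resets $_.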