Reading files line by line in Bash – performance

I’m processing stored HL7 message files with a bash script, and have been getting pretty poor performance (around one message per second for simple parsing, processing and outputting to CSV format). I thought I might be able to improve performance by using a different method of loading the text file.

The two methods of reading a file in line by line:

bashreadtest1.sh:
#!/bin/bash
while read line ; do
    echo "$line"
done < "$1"

bashreadtest2.sh:
#!/bin/bash
IFS=$'\n'
LINES=(`cat "$1"`)
for i in "${LINES[@]}" ; do
    echo "$i"
done

The test:

phil@mig1:~/$ time ./bashreadtest1.sh 20120821_prjBsqrPacsIn.dat | wc -l
12706

real 0m2.708s
user 0m2.492s
sys 0m0.096s

phil@mig1:~/$ time ./bashreadtest2.sh 20120821_prjBsqrPacsIn.dat | wc -l
12706

real 0m1.096s
user 0m0.920s
sys 0m0.116s

The verdict: reading the file into an array and iterating through the array is somewhere around twice as fast.

(I did run the test multiple times and the results were fairly consistent. The original script was already using the second, faster method, so I’ll need to look elsewhere for performance improvements.)
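One possible next step: bash 4 added the mapfile (also known as readarray) builtin, which reads a file straight into an array without the subshell that the backtick/cat version spawns. Below is a minimal sketch of a hypothetical bashreadtest3.sh; I haven’t benchmarked it against the two scripts above, so any speed difference is an assumption.

bashreadtest3.sh (sketch):
#!/bin/bash
# Hypothetical, not benchmarked here: use the bash 4+ mapfile builtin to load
# the file into an array, avoiding the `cat` subshell used in bashreadtest2.sh.
mapfile -t LINES < "$1"
for i in "${LINES[@]}" ; do
    echo "$i"
done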

