Reading files line by line in Bash – performance

I’m processing stored HL7 message files with a bash script, and have been getting pretty poor performance (around one message per second for simple parsing, processing and outputting to CSV format). I thought I might be able to improve performance by using a different method of loading the text file.

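The two methods of reading a file in line by line:

# Method 1: read one line at a time in a while loop
while read line ; do
    echo "$line"
done < "$1"

# Method 2: load the whole file into an array, then iterate over it
LINES=(`cat $1`)
for i in ${LINES[@]} ; do
    echo "$i"
done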

The test:

phil@mig1:~/$ time ./ 20120821_prjBsqrPacsIn.dat | wc -l

real 0m2.708s
user 0m2.492s
sys 0m0.096s

phil@mig1:~/$ time ./ 20120821_prjBsqrPacsIn.dat | wc -l

real 0m1.096s
user 0m0.920s
sys 0m0.116s

The verdict: reading the file into an array and iterating through the array is somewhere around twice as fast (roughly 1.1 seconds against 2.7 seconds in the runs above).

(I did actually run the test multiple times and the results were fairly consistent. My HL7 script was already using the second, faster method, so I’ll need to look elsewhere for performance improvements.)
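
One caveat on the array method: with the default IFS, LINES=(`cat $1`) splits on spaces and tabs as well as newlines, so a line containing spaces ends up as several array elements. Below is a minimal sketch of a variant that keeps each line whole, assuming bash 4 or later for the mapfile builtin. The function name print_lines is made up for this example, and I have not timed it against the two methods above.

```shell
#!/usr/bin/env bash
# Sketch: read a file into an array, one element per line (needs bash 4+).
# print_lines is a hypothetical name used only for this illustration.
print_lines () {
    local -a lines
    local line
    # -t strips the trailing newline from each element
    mapfile -t lines < "$1"
    # Quoting "${lines[@]}" keeps lines with spaces as single elements
    for line in "${lines[@]}" ; do
        printf '%s\n' "$line"
    done
}

# Usage: print_lines some_file.dat | wc -l
```

Whether this is faster or slower than the backtick version would need to be measured on the same file.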




Using find in a bash script

Here’s something that will hopefully save someone the frustration that I experienced getting this working. I am writing a script that needs to traverse a directory structure, then operate on the files it finds (read DICOM tags, modify/add tags). The reading and modifying was simple enough, but I could not get find to work inside the script, so I basically resorted to running a command like:

$ find . -type f -exec {} \;

That worked fine, but I really wanted to have it all integrated into a single command. The problem I had was that the output of a find command couldn’t be used to iterate in a for loop – as a scalar (FILES=`find . -type f`) it was a single string, and as an array (FILES=(`find . -type f`)) elements would be split on spaces and newlines, which breaks if your paths have spaces in them.
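
The splitting problem is easy to reproduce without running find at all. Here is a minimal sketch, assuming bash's default word splitting is in effect; the helper count_elements and the sample paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: default word splitting breaks paths that contain spaces.
# count_elements is a hypothetical helper for this illustration.
count_elements () {
    local -a parts
    # Unquoted on purpose, so bash applies its default word splitting
    parts=( $1 )
    echo "${#parts[@]}"
}

# Two paths, one per line, the way find would print them
sample=$'dir one/file a.dcm\ndir two/file b.dcm'
count_elements "$sample"   # prints 6, not 2
```

Each path has two spaces in it, so the two paths are broken into six separate words.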

The trick here is to change the IFS variable, which tells bash what to split strings on (by default it is space, tab or newline). In my case, the output of find is separated by newline characters, so I set IFS to a newline and now it works as expected. Yay! Here’s the code:

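# IFS controls how bash splits strings (by default, whitespace).
# save the current value (the variable name OLDIFS is arbitrary),
# then change to newline
OLDIFS=$IFS
IFS=$'\n'

# dummy function for demo
fileops () {
    echo "$1"
}

FILES=(`find "$1" -type f`)
for i in "${FILES[@]}" ; do
    fileops "$i"
done

# restore IFS
IFS=$OLDIFS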
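
An alternative that leaves the global IFS alone is to have find separate paths with NUL bytes and read them one at a time. This is only a sketch, assuming a find that supports -print0 (GNU and BSD find do) and bash for process substitution. The names walk_files and fileops are placeholders, not part of my actual script.

```shell
#!/usr/bin/env bash
# Sketch: iterate over find results without changing the global IFS.
# fileops and walk_files are hypothetical demo names.
fileops () {
    echo "$1"
}

walk_files () {
    local path
    # -print0 separates paths with NUL bytes; read -d '' splits on NUL,
    # so spaces and even newlines inside file names are preserved.
    # IFS= applies only to the read command and keeps leading/trailing
    # whitespace; -r stops backslashes being treated as escapes.
    while IFS= read -r -d '' path ; do
        fileops "$path"
    done < <(find "$1" -type f -print0)
}

# Usage: walk_files /path/to/dicom/tree
```

Because the loop reads from a process substitution rather than a pipe, it runs in the current shell, so any variables set inside the loop are still available afterwards.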