Tuesday, January 24, 2012

Bash'ing in Parallel

When you need to process many items (for the video in my previous post I had 4817 source pictures), it pays to think about how long that might take. ImageMagick is an excellent tool, but it is not yet threaded well enough to use all CPU cores, and at about 2 seconds per picture the total time for 4817 pictures is a nightmare to wait for.
So, as I am processing on Linux with a script, here is a trick to use all local and remote CPU cores even from a Bash script, and (!) with minimal changes.
Instead of the loop that has been traditional for many years:

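# Sequential: one picture at a time, so only one CPU core is busy.
for i in $(find . -type f -name "file*.png"); do
    # Strip the ".png" suffix and everything up to the last "_".
    do_view_port.sh "$(echo "$i" | sed -e 's/\.png//' -e 's/^.*_//')"
done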

Here I process all the .png files in the folder one by one with do_view_port.sh, my "magic" script that performs the needed actions. To go parallel, just get GNU Parallel (IMHO the best option by now) and make a very minor change:


find . -type f -name "file*.png"  | sed -e 's/\.png//' -e 's/^.*_//' |
parallel -j+0 --eta do_view_port.sh {.}

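Now it is cleaner to look at and faster (in my case: 7.65x). Here -j+0 runs one job per CPU core and --eta prints the estimated time left.
Just note {.}, which stands for the current argument with its file extension removed. Since sed has already stripped .png here, plain {} would do the same job.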
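To make the argument handling easier to follow, here is a small sketch of what the sed step and the GNU Parallel replacement strings produce. The file names and the frame_id helper are made up for illustration (my real files just match file*.png), and the last part only runs if GNU Parallel is installed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: applies the same sed expressions as the pipeline above.
frame_id() {
    # Drop the ".png" suffix, then everything up to the last underscore.
    printf '%s\n' "$1" | sed -e 's/\.png//' -e 's/^.*_//'
}

frame_id ./file_0042.png          # prints 0042
frame_id ./shots/file_0007.png    # prints 0007

# Only if the installed "parallel" is GNU Parallel: print (but do not run)
# the command it would build, to compare the replacement strings:
#   {}   the input line as is
#   {.}  the input line without its extension
#   {/}  the basename of the input line
#   {/.} the basename without its extension
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    printf '%s\n' ./file_0042.png |
        parallel --dry-run echo '{}' '{.}' '{/}' '{/.}'
fi
```

The --dry-run flag is a handy way to check what each job will look like before starting a few thousand of them for real.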

All in all, this one small change let me finish the whole job in just 23 min (instead of 175 min, or close to 3 hours).
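As a rough sanity check of those numbers, the arithmetic can be sketched in a few lines of shell. The helper names are hypothetical, and the inputs are the rounded figures from this post, so the ratio comes out slightly different from the 7.65x I measured:

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes needed for N items at S seconds each.
estimate_minutes() {
    echo $(( $1 * $2 / 60 ))
}

# Hypothetical helper: speedup of a run that took $2 minutes instead of $1.
speedup() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", before / after }'
}

estimate_minutes 4817 2    # prints 160 (4817 pictures at ~2 s each)
speedup 175 23             # prints 7.6 (175 min sequential vs 23 min parallel)
```

The 160 min estimate is in the same ballpark as the 175 min quoted above, which fits "about 2 seconds per picture" being a rounded figure.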

PS: I personally liked the free ETA that comes with it, as always meaning the time yet to go/Estimated Time to Achieve :)
