Middleman Parallelization

I recently discovered a very clever tool called Middleman. It's a quick way to set up and manage a multi-process workload queue. Process output is displayed inside a screen session, so if the work is going to take a while you can just detach and check on it again later. In the past I used make's -j option to do this, but that's always a pain to set up.

It is composed of three programs: mdm.screen, mdm-run, and mdm-sync. The first is the top-level supervisor that you use to launch the enhanced shell script. The second prefixes every command to be run in parallel. The third prefixes the final command that depends on all of the individual processes.

The linked Middleman page has a good example, but I'll share my own anyway. I used it over the weekend to download a long series of videos with youtube-dl. Because the transfer rate for a single video is throttled, I wanted to grab several at a time, but I didn't want to grab them all at once. Here's the dumb version of the script, download.sh, that does them all at once.

youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX0" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX1" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX2" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX3" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX4" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX5" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX6" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX7" &
youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX8" &

With Middleman, all I had to do was this:

mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX0"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX1"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX2"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX3"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX4"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX5"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX6"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX7"
mdm-run youtube-dl -t "http://www.youtube.com/watch?v=XXXXXXXXXX8"

Then just launch the script with mdm.screen. It defaults to 6 processes at a time, but you can adjust it to whatever you want with the -n switch. I used 4.

$ mdm.screen -n 4 ./download.sh
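
Typing out one mdm-run line per video gets tedious for a long series. As a rough sketch, download.sh could loop over a list of video IDs instead. The function name queue_downloads and the ids.txt file (one video ID per line) are my own inventions for illustration, not anything Middleman provides. The runner is passed as an argument so that echo can stand in for mdm-run as a dry run.

```shell
#!/bin/bash
# Sketch of a loop-driven download.sh. queue_downloads and ids.txt are
# hypothetical names, not part of Middleman or youtube-dl.

# Read video IDs from stdin (one per line) and hand one youtube-dl
# command per ID to the given runner, e.g. mdm-run, or echo for a dry run.
queue_downloads() {
    local runner=$1 id
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        # </dev/null keeps the child from consuming the remaining IDs
        "$runner" youtube-dl -t "http://www.youtube.com/watch?v=$id" </dev/null
    done
}

# Only queue work when the (assumed) ID list is present.
if [ -f ids.txt ]; then
    queue_downloads mdm-run <ids.txt
fi
```

Calling queue_downloads echo <ids.txt prints the commands without running them, which is a cheap way to check the list first. The script is still launched through mdm.screen exactly as above.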

There is a screen window that lists the process queue and highlights the currently active jobs. I could switch between screen windows to see the output from individual processes and see how they were doing.

From the perspective of the shell script, the first four commands finish instantly, but the fifth command will block. As soon as Middleman sees one of the first four processes complete, the fifth one will begin work, returning control to the shell script, and the sixth command will block, since the queue is full again.
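
The same blocking pattern can be imitated in plain bash with a job counter. To be clear, this is not how Middleman works internally, and it gives you none of the screen display. It is only a sketch of the behavior described above. The names run_limited, MAX, and running are made up for the example, and wait -n needs bash 4.3 or newer.

```shell
#!/bin/bash
# Plain-bash imitation of a fixed-size job queue (not Middleman's code).
# run_limited, MAX, and running are hypothetical names for this sketch.

MAX=4        # how many jobs may be in flight at once
running=0    # upper bound on the number of background jobs still alive

# Start "$@" in the background. If MAX jobs are already counted as
# running, block first until any one background job exits.
run_limited() {
    if [ "$running" -ge "$MAX" ]; then
        wait -n                      # returns when one job finishes
        running=$((running - 1))
    fi
    "$@" &
    running=$((running + 1))
}
```

With MAX=4, the first four calls to run_limited return immediately and the fifth blocks until a slot frees up, just like the fifth mdm-run line. A bare wait at the end of the script holds it open until the last jobs finish.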

I'm sure I'll be using this more in the future, especially for tasks like batch audio and video encoding. I bet this could be useful on a cluster.

null program

Chris Wellons