ZEW

runjobs: An XP Automation Tool for Parallel Processing of GAMS Jobs

Thomas F. Rutherford

May 2006

* This web page documents a computational tool which has been developed within the project "Indicators and Quantitative Tools for Improving the Impact Assessment Process for Sustainability" (I.Q. Tools), 6th Framework Programme of the European Commission, Contract SSP1-CT-2003-502078, Thematic Priority 8: Policy Oriented Research.

Motivation

Mathematical programs can take from minutes to hours of CPU time to solve. In many applied settings, a large number of these jobs need to be processed at once. In the days of single processor computing, the DOS batch file was the standard tool for this type of work. A typical batch script (using the GAMS modelling language) would look like this:

runserial.bat

gams model gdx=sc1 --alpha=0.4 --beta=0.1 --sigma=2
gams model gdx=sc2 --alpha=0.4 --beta=0.2 --sigma=2
gams model gdx=sc3 --alpha=0.4 --beta=0.3 --sigma=2
gams model gdx=sc4 --alpha=0.5 --beta=0.1 --sigma=2
gams model gdx=sc5 --alpha=0.5 --beta=0.2 --sigma=2
...

In this example there are three input parameters (alpha, beta and sigma). If there were five different values for each of these inputs, a systematic sensitivity analysis (SSA), i.e. one based on all possible combinations of input values, would involve 5x5x5=125 simulations. With 10 possible values per input, an SSA would involve 10x10x10=1000 simulations.
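Writing such a command file by hand quickly becomes tedious. As a sketch, the full factorial design can be generated with Python's itertools.product. The function name ssa_commands and the five illustrative values per input are hypothetical. All three parameters are passed with the GAMS double-dash --name=value syntax.

```python
from itertools import product


def ssa_commands(alphas, betas, sigmas, model="model"):
    """Yield one GAMS command line per combination of input values."""
    combos = product(alphas, betas, sigmas)  # alpha varies slowest, sigma fastest
    for n, (a, b, s) in enumerate(combos, start=1):
        yield f"gams {model} gdx=sc{n} --alpha={a} --beta={b} --sigma={s}"


# Hypothetical grid: five values for each of the three inputs.
lines = list(ssa_commands([0.1, 0.2, 0.3, 0.4, 0.5],
                          [0.1, 0.2, 0.3, 0.4, 0.5],
                          [1, 2, 3, 4, 5]))
print(len(lines))  # 125
print(lines[0])    # gams model gdx=sc1 --alpha=0.1 --beta=0.1 --sigma=1
```

The resulting lines can be written to a text file, one per line, and used directly as a command file.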

In a serial computing environment, it made perfect sense to solve these models one after the other, as there would be no time savings by submitting the jobs to run in parallel -- the operating system could queue up the jobs, but it would only solve them in sequence. (If we needed more speed, we just bought more computers.)

Dual Core Computing

I recently bought a computer with a dual core processor. I was at first sceptical about the effectiveness of this architecture, but an afternoon of experimentation was enough to make me a big enthusiast. I verified that these machines solve two mathematical programs in the same elapsed time as a single program. I soon set out to modify my serial batch file so that the jobs would be submitted automatically in parallel. My first attempt involved a simple edit of the batch script:

massiveparallel.bat

start "Job sc1" gams model gdx=sc1 --alpha=0.4 --beta=0.1 --sigma=2
start "Job sc2" gams model gdx=sc2 --alpha=0.4 --beta=0.2 --sigma=2
start "Job sc3" gams model gdx=sc3 --alpha=0.4 --beta=0.3 --sigma=2
start "Job sc4" gams model gdx=sc4 --alpha=0.5 --beta=0.1 --sigma=2
start "Job sc5" gams model gdx=sc5 --alpha=0.5 --beta=0.2 --sigma=2
...

This launched all the jobs at once, and it provided faster performance than the serial submission, but it also created a few big problems:
  1. GAMS runs out of scratch directories. The current implementation of GAMS generates scratch directories named 225a, 225b, etc., so there is an implicit limit of 26 on the number of GAMS jobs which can be launched from a single directory at a single time.
  2. This approach created big problems when I decided, after having started all the jobs at once, that I wanted to stop the computations because I had detected a problem in the results of earlier runs.
  3. A massive parallel submission cannot be paused to temporarily free up machine resources for more pressing tasks. With the serial implementation, the batch window can be suspended simply by bringing it into focus and entering Ctrl-S. This suspends the job immediately, so that email can be answered or papers written. Once these tasks are finished, it is simply a matter of bringing the SSA window back into focus and entering Ctrl-Q to resume computations.

Issues in Scripting for Parallel Computation

My key requirements for a utility to control GAMS jobs running in parallel are:
  1. limit the number of jobs submitted at one time based on the number of processors available on the computer,
  2. impose a time limit on each job,
  3. easily suspend computations,
  4. retrieve a status code for each job, and
  5. easily terminate the process prematurely.
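A present-day analogue may help make requirements 1, 2 and 4 concrete. The following is a minimal Python sketch and is not part of runjobs. The names run_jobs and run_one are hypothetical. Jobs are run through the shell with subprocess.run, at most `forks` at a time via a thread pool, and each job gets a wall-clock timeout. The sketch does not cover suspension or early termination (requirements 3 and 5).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

TIMED_OUT = -1  # same marker runjobs prints for killed jobs


def run_one(cmd, timeout_s):
    """Run one command line through the shell; return its exit code or TIMED_OUT."""
    try:
        # On timeout, subprocess.run kills only the direct child process;
        # anything it started (e.g. a solver launched by gams, or a window
        # opened by "cmd /c start") may keep running.
        return subprocess.run(cmd, shell=True, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        return TIMED_OUT


def run_jobs(lines, forks, minutes):
    """Run the non-blank lines of a command file, at most `forks` at a time.

    Returns {line number: exit code or TIMED_OUT}, numbered from 1 like the
    "Exit Status" table printed by runjobs.
    """
    jobs = {n: line.strip() for n, line in enumerate(lines, start=1) if line.strip()}
    with ThreadPoolExecutor(max_workers=max(forks, 1)) as pool:
        # Work items are dequeued in submission order, i.e. in file order.
        futures = {n: pool.submit(run_one, cmd, minutes * 60) for n, cmd in jobs.items()}
        return {n: future.result() for n, future in futures.items()}
```

Two differences from runjobs are deliberate. The sketch returns the process exit code (0 normally means success) rather than a "finished" flag, and it measures the timeout from launch with a real clock. The comment in run_one points at the same child-process problem that the taskkill /t switch, discussed in the text that follows, addresses.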
I came across a VBScript from 2001 that came close to doing the job (Multi-threading agent... (vbscript) by Jeff Price). Jeff's program seemed perfect, but it did not seem able to terminate a GAMS job. His script attempted this with a call to:

kill.exe ProcessID

After some experimentation I replaced this with the following Windows XP system command:

taskkill /PID ProcessID /f /t

This utility is provided in my version of Windows XP as c:\windows\system32\taskkill.exe. I seem to recall reading somewhere on the web that it is not provided with XP Home Edition; some resourceful graduate student somewhere on the globe will have to sort this out.

I found that the "/t" switch on the taskkill command is crucial, because it instructs the operating system to kill the GAMS job and all of its child processes. I include the "/f" (forceful termination) switch as well, but I'm not sure whether it is crucial.
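The same call can be issued from other scripting languages. Here is a minimal Python sketch; taskkill_args and kill_tree are hypothetical helper names. It builds exactly the argument list used by runjobs and runs it only on Windows.

```python
import subprocess
import sys


def taskkill_args(pid):
    """Argument list for the call runjobs makes: kill pid and its children (/t), forcefully (/f)."""
    return ["taskkill", "/PID", str(pid), "/f", "/t"]


def kill_tree(pid):
    """Kill a process tree on Windows; return taskkill's exit code (0 means success)."""
    if sys.platform != "win32":
        raise OSError("taskkill.exe is only available on Windows")
    result = subprocess.run(taskkill_args(pid), capture_output=True, text=True)
    return result.returncode
```

The process ID would typically come from the pid attribute of the subprocess.Popen object that launched the job.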

A minor annoyance with taskkill.exe is that it requires a library, framedyn.dll, which is not installed. (Google pointed me to a copy of this file immediately, and I have included it below if you want to be as cavalier as I am ("give it a go, and see what happens"). If you are more cautious, you may want to get a copy from official Microsoft distribution media.)

A second annoyance with this implementation is that GAMS scratch directories for terminated jobs are not (yet) deleted automatically. If you are running thousands of jobs, be aware that the script will run into trouble once 26 (or fewer) jobs have been terminated. (I will be working on a solution for this, but I don't know when it will be fixed.)
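Until this is fixed, leftover scratch directories can be cleared by hand between batches. Below is a minimal Python sketch. It assumes, from the naming described above, that GAMS scratch directories are named 225 followed by lower-case letters. It also assumes that no GAMS job is running in the directory at the time. The helper names are hypothetical. The function only lists matching directories unless dry_run is False.

```python
import re
import shutil
from pathlib import Path

# Assumption: scratch directories are named "225" plus lower-case letters (225a, 225b, ...).
SCRATCH_NAME = re.compile(r"^225[a-z]+$")


def scratch_dirs(workdir):
    """Return matching scratch directories in workdir, sorted by name."""
    return sorted(p for p in Path(workdir).iterdir()
                  if p.is_dir() and SCRATCH_NAME.match(p.name))


def remove_scratch_dirs(workdir, dry_run=True):
    """List (and, unless dry_run, delete) leftover scratch directories.

    Only safe when no GAMS job is running in workdir.
    """
    dirs = scratch_dirs(workdir)
    if not dry_run:
        for d in dirs:
            shutil.rmtree(d)
    return [d.name for d in dirs]
```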

Command Syntax for runjobs

I've put a batch wrapper around the resulting VBScript so that the command line syntax is:

runjobs CmdFile [/minutes:xx] [/forks:xx]

If you prefer to call the VBScript directly, the syntax is just about the same:

cscript runjobs.vbs CmdFile [/minutes:xx] [/forks:xx]

Two optional parameters define (i) the maximum number of minutes allowed for a single job (default 10), and (ii) the maximum number of forks (separate jobs running at one time). The default for forks is one less than the number of processors, or 1, whichever is greater.
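These defaults can be stated precisely in code. The Python function below is a hypothetical re-statement of the argument handling in runjobs.vbs and is not part of the distribution. It takes one unnamed argument for the command file plus /name:value switches. By default it allows 10 minutes per job and max(processors - 1, 1) forks. The cpu_count parameter is an assumption added for testing; by default it falls back to os.cpu_count().

```python
import os


def parse_runjobs_args(argv, cpu_count=None):
    """Return (cmd_file, max_minutes, max_forks) using the runjobs defaults."""
    unnamed, named = [], {}
    for arg in argv:
        if arg.startswith("/") and ":" in arg:
            key, value = arg[1:].split(":", 1)   # "/forks:2" -> ("forks", "2")
            named[key.lower()] = value
        else:
            unnamed.append(arg)
    if len(unnamed) != 1:
        raise ValueError("usage: runjobs CmdFile [/minutes:xx] [/forks:xx]")

    max_minutes = float(named.get("minutes", 10))  # default: 10 minutes per job
    if "forks" in named:
        max_forks = int(named["forks"])
    else:
        # Default: one less than the number of processors, but at least 1.
        n_cpu = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        max_forks = max(n_cpu - 1, 1)
    return unnamed[0], max_minutes, max_forks


print(parse_runjobs_args(["joblist.txt", "/minutes:0.10", "/forks:2"]))
# ('joblist.txt', 0.1, 2)
```

One consequence is that the default on a dual core machine is a single fork. Running two jobs at once therefore needs an explicit /forks:2, as in test.bat.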

A typical command file can be identical to runserial.bat. With this syntax, output from all jobs is displayed in the same console window. If you want to run each job in a separate window, to make it easier to assess the convergence of individual jobs, the following syntax can be used:

onewindowperjob.txt

cmd /c start "Job sc1" /min /wait gams model gdx=sc1 --alpha=0.4 --beta=0.1 --sigma=2
cmd /c start "Job sc2" /min /wait gams model gdx=sc2 --alpha=0.4 --beta=0.2 --sigma=2
cmd /c start "Job sc3" /min /wait gams model gdx=sc3 --alpha=0.4 --beta=0.3 --sigma=2
cmd /c start "Job sc4" /min /wait gams model gdx=sc4 --alpha=0.5 --beta=0.1 --sigma=2
cmd /c start "Job sc5" /min /wait gams model gdx=sc5 --alpha=0.5 --beta=0.2 --sigma=2
...
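Converting a plain command file to this form is mechanical. Below is a Python sketch using a hypothetical helper, one_window_per_job. It prefixes each non-blank line and titles the window with that line's number in the command file. The title is placed first, as in the start command's documented syntax and in the test output below.

```python
def one_window_per_job(lines):
    """Wrap each non-blank command so it runs minimized in its own window.

    The window title is "Job <n>", where n is the line number in the
    command file, so titles match the line numbers runjobs reports.
    """
    wrapped = []
    for n, line in enumerate(lines, start=1):
        cmd = line.strip()
        if cmd:
            wrapped.append(f'cmd /c start "Job {n}" /min /wait {cmd}')
    return wrapped


print(one_window_per_job(["gams gamsjob --seed=1 o=1.lst"])[0])
# cmd /c start "Job 1" /min /wait gams gamsjob --seed=1 o=1.lst
```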

Program Output

Below is the console output produced by the test script provided in the program distribution. This script processes a sequence of GAMS jobs, each of which takes between 1 and 12 seconds to complete. The invocation line in test.bat is:


cscript runjobs.vbs joblist.txt /minutes:0.10 /forks:2

Note that this invocation applies non-default values both for the maximum number of minutes per job and for the number of jobs processed at a time. The upper time limit per job is 0.10 minutes (6 seconds), so a little less than half of the jobs should time out. (This was done deliberately to test task termination.)
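The 6-second limit is enforced only approximately, because runjobs.vbs does not read a clock. Instead, on each ProcessCheck pass (each preceded by a 1.5-second sleep), it adds 1.5/60 minute to the timer of every job still running. The timer starts at 1/60 minute to account for the 1-second settle pause after launch. The hypothetical Python helper polls_until_kill replays that bookkeeping. It assumes the script is polling continuously (all fork slots busy) and that Exec and taskkill overhead is negligible.

```python
def polls_until_kill(max_minutes, start=1/60, step=1.5/60):
    """Count ProcessCheck passes until runjobs kills a still-running job.

    Mirrors runjobs.vbs: the timer starts at 1/60 min and grows by 1.5/60 min
    per pass; a running job is killed on the first pass at which its timer
    already exceeds MaxMinutes.
    """
    timer, polls = start, 0
    while True:
        polls += 1
        if timer > max_minutes:
            return polls
        timer += step


polls = polls_until_kill(0.10)
elapsed_s = 1.0 + 1.5 * polls  # 1 s settle pause + one 1.5 s sleep per pass
print(polls, elapsed_s)  # 5 8.5
```

Under these assumptions a job launched with /minutes:0.10 is killed on the fifth pass, roughly 8.5 seconds after launch. Time spent launching other jobs is not added to a job's timer, so in practice the margin can be somewhat larger.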

The program begins with an echo-print of the options and then announces the start-up of each new job, in the form:

Line Number(Process ID): command string
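Thereafter, each time a job ends, a report line gives the running totals of normal completions (status=1) and of timed-out jobs. The status is the Status property of the WshScriptExec object that launched the job; it is 1 once the process has finished, regardless of the GAMS return code.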

The log output ends with a report of the exit status for each line of the command file: 1 for a job that finished and -1 for a job that was killed after exceeding the time limit.

Console Output from test.bat

MaxMinutes = 0.1
MaxForks=2
1(2888): cmd /c start "Job 1" /min /wait gams gamsjob --seed=1 o=1.lst
2(3200): cmd /c start "Job 2" /min /wait gams gamsjob --seed=2 o=2.lst
Normal: 1  TimedOut: 0
3(3680): cmd /c start "Job 3" /min /wait gams gamsjob --seed=3 o=3.lst
Normal: 2  TimedOut: 0
4(1060): cmd /c start "Job 4" /min /wait gams gamsjob --seed=4 o=4.lst
Normal: 3  TimedOut: 0
5(3512): cmd /c start "Job 5" /min /wait gams gamsjob --seed=5 o=5.lst
Normal: 4  TimedOut: 0
Normal: 4  TimedOut: 1
6(404): cmd /c start "Job 6" /min /wait gams gamsjob --seed=6 o=6.lst
7(168): cmd /c start "Job 7" /min /wait gams gamsjob --seed=7 o=7.lst
Normal: 5  TimedOut: 1
Normal: 5  TimedOut: 2

Exit Status:
1 1
2 1
3 1
4 -1
5 1
6 1
7 -1
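Because the exit-status table uses 1 for finished jobs and -1 for killed ones, it is easy to post-process. For example, you might pick out the command lines that should be rerun with a larger /minutes value. The Python sketch below reads the table. It assumes the console output has been saved to a text file by redirecting the cscript output. The helper name summarize_exit_status is hypothetical.

```python
def summarize_exit_status(log_text):
    """Return (finished, timed_out) line numbers from a runjobs console log."""
    lines = [line.strip() for line in log_text.splitlines()]
    start = lines.index("Exit Status:") + 1  # ValueError if the table is missing
    status = {}
    for line in lines[start:]:
        parts = line.split()
        if len(parts) == 2:  # blank command-file lines are reported without a status
            status[int(parts[0])] = int(parts[1])
    finished = sorted(n for n, s in status.items() if s == 1)
    timed_out = sorted(n for n, s in status.items() if s == -1)
    return finished, timed_out
```

Applied to the output above, it returns lines 1, 2, 3, 5 and 6 as finished and lines 4 and 7 as timed out. This is consistent with the final "Normal: 5 TimedOut: 2" report.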

Installation

framedyn.dll: A Windows XP file required by the system console application taskkill.exe but omitted from the standard XP installation.
runjobs.vbs: My update of Jeff Price's VBScript, ThreadForker.vbs. (My script adds command-line arguments, uses a different method for killing jobs, and reports the status of all processed jobs at the end of computations, but it is otherwise based on Jeff's program structure.)
runjobs.bat: A little batch wrapper for the VBScript. Edit it if you choose to install the script in some common directory.
test.bat: A batch file which runs a simple test of the program. This procedure uses a very short limit on job durations (6 seconds) in order to verify that jobs exceeding the time limit are killed.

All files are provided in runjobs.zip

Requirements

A PC running Windows XP Professional with Version 5.6 or later of Windows Script Host.

The runjobs.vbs Script

'Original Author: Jeff Price
'Description:

'       Based on ThreadForker.vbs by Jeff Price, jeff.price@rocketmail.com, Dec-2001
'       Updated May 2006 by Thomas Rutherford -- use taskkill /f /t
'       in place of kill, base number of forks on number of
'       processors and introduce a program argument list.

'       Price's description:
'       Multi-threading agent to run parallel instances of your
'       app/script/process/etc, in order to reduce the total run time of the
'       desired process (eg, auditing 500+ systems). It also has a sentinel
'       which will kill any errant threads after a specified timeout. 

Option Explicit

dim MaxMinutes
dim MaxForks 

' ================================================================================

dim wshShell, oFileSys
dim aThreadInfo(), iThreadNum

'initialise the core objects req'd
Set wshShell = WScript.CreateObject("WScript.Shell")
Set oFileSys=CreateObject("Scripting.FileSystemObject")
Dim oJobDict
Set oJobDict = CreateObject("Scripting.Dictionary")

'check we've at least v5.6 of WSH
if CDbl(wScript.Version) < CDbl("5.6") then
   wScript.Echo " ***************** "
   wScript.Echo " This script requires at least v5.6 of Windows Script Host."
   wScript.Echo " Your current version is " & wScript.Version
   wScript.Echo " http://msdn.microsoft.com/downloads/default.asp"
   wScript.Echo "***************** "
   wScript.Quit
end if

if InStr(LCase(wScript.FullName), "wscript.exe") then
   wScript.Echo "You have run this script from the GUI (wscript)" & vbCRLF & "Please rerun from a command prompt as" & vbCRLF & " 'cscript runjobs.vbs'"
   wScript.Quit
End if

dim CmdFile
if Wscript.Arguments.Unnamed.Count=1 then
   CmdFile = Wscript.Arguments.Unnamed(0)
else
   wScript.Echo "Command syntax:"
   wScript.Echo ""
   wScript.Echo " cscript runjobs.vbs CmdFile [/minutes:xx] [/forks:xx]"
   wScript.Echo ""
   wScript.Quit
end if
if not oFileSys.FileExists (CmdFile) then
   wScript.Echo "Did not find command file: "& CmdFile
   wScript.Quit
End if

dim oArgs
set oArgs = Wscript.Arguments.Named
if oArgs.Exists("minutes") then
   MaxMinutes = csng(oArgs("minutes"))
else
   MaxMinutes = 10
end if
if oArgs.Exists("forks") then
   MaxForks = cint(oArgs("forks"))
else
   MaxForks = cint(wshShell.Environment.Item("NUMBER_OF_PROCESSORS"))-1
   if MaxForks<1 Then MaxForks=1
end if
wScript.Echo "MaxMinutes = "&cstr(MaxMinutes) & vbCRLF & "MaxForks="& cstr(MaxForks)

'       Redimension the thread tracker and reset the "time" to -1 seconds
redim aThreadInfo( MaxForks-1, 2)
dim i
for i = 0 to UBound( aThreadInfo, 1)
  aThreadInfo(i,1) = -1
next

dim oCmdFile
set oCmdFile = oFileSys.OpenTextFile(CmdFile)
dim CmdLine
dim LineNo : LineNo = 0
dim Normal:Normal=0
dim TimedOut:TimedOut=0
While NOT oCmdFile.AtEndOfStream
   CmdLine = oCmdFile.ReadLine
   LineNo = LineNo + 1
   if Trim(CmdLine) <> "" then
      While GetNextThread = -1

'       Loop until we've a free process
         ProcessCheck
      Wend

'       Get a thread number:
      iThreadNum = GetNextThread

'       Run the command:

      set aThreadInfo(iThreadNum, 0) = wshShell.Exec(CmdLine)
      wScript.Echo cstr(LineNo)&"("&aThreadInfo(iThreadNum, 0).ProcessID&"): "&CmdLine

'       Let the process settle
      wScript.Sleep 1000

'       Start the thread timer
      aThreadInfo( iThreadNum, 1) = 1/60
      aThreadInfo( iThreadNum, 2) = LineNo
   End If
Wend
oCmdFile.Close
set oCmdFile = Nothing
While ProcessCheck
Wend
wScript.Echo ""
wScript.Echo "Exit Status:"
for i = 1 to LineNo
   wScript.Echo i,oJobDict.Item(cstr(i))
next
wScript.Quit

' ==============
' return TRUE = yes we have active processes
' return FALSE = no active processes
' ==============
Function ProcessCheck()

     dim i

     ProcessCheck = False

     wScript.Sleep 1500
     for i = 0 to UBound( aThreadInfo, 1)
          'wScript.Echo "checking " & i & " with timeout " & aThreadInfo(i, 1)
          if aThreadInfo(i,1) > -1 then
'               wScript.Echo aThreadInfo(i,2) & ": " & cstr(round(MaxMinutes-aThreadInfo(i,1),1)) & " min."
               if aThreadInfo(i, 0).Status = 0 then
                    if aThreadInfo(i,1) > MaxMinutes then
                         'aThreadInfo(i,0).Terminate
                         wshShell.Run "taskkill /PID " & aThreadInfo(i,0).ProcessID & " /f /t", 2, True
                         wScript.Sleep 1000
                         oJobDict.Add cstr(aThreadInfo(i,2)), -1
                         set aThreadInfo(i,0) = Nothing
                         aThreadInfo(i,1) = -1
                         aThreadInfo(i,2) = 0
                         TimedOut = TimedOut + 1
                         wScript.Echo "Normal:",Normal," TimedOut:",TimedOut
                    else
                         ProcessCheck = True
                         aThreadInfo(i, 1) = aThreadInfo(i, 1) + 1.5/60
                    End if
               Else
                  if aThreadInfo(i, 0).Status = 1 Then Normal = Normal + 1
                  oJobDict.Add cstr(aThreadInfo(i,2)), aThreadInfo(i,0).Status
                  set aThreadInfo(i, 0) = Nothing
                  aThreadInfo(i, 1) = -1
                  aThreadInfo(i, 2) = 0
                  wScript.Echo "Normal:",Normal," TimedOut:",TimedOut
                  'wScript.Echo i, aThreadInfo(i, 1)

               End If
          End if
     next
     'wScript.Echo "active processes? :" & ProcessCheck
End Function
' ==============
' ==============

Function GetNextThread ( )
     dim i
     GetNextThread = -1
     for i = 0 to UBound( aThreadInfo, 1)
          'wScript.Echo "GetNextThread" & i & ":" & aThreadInfo(i, 1)
          if aThreadInfo( i, 1) = -1 then
               'aThreadInfo( i, 1) = 0
               GetNextThread = i
               i = UBound( aThreadInfo, 1) +1
          End if
     next
End Function