1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331
|
<html>
<head>
<title>Makeflow User's Manual</title>
</head>
<body>
<style type="text/css">
pre {
background: #ffffcc;
font-family: monospace;
font-size: 75%
font-align: left;
white-space: pre;
border: solid 1px black;
padding: 5px;
margin: 20px;
}
</style>
<h1>Makeflow User's Manual</h1>
<b>Last Updated Febuary 2011</b>
<p>
Makeflow is Copyright (C) 2009 The University of Notre Dame.
This software is distributed under the GNU General Public License.
See the file COPYING for details.
<p>
<h2>Overview</h2>
Makeflow is a <b>workflow engine</b> for distributed computing.
It accepts a specification of a large amount of work to be
performed, and runs it on remote machines in parallel where possible.
In addition, Makeflow is fault-tolerant, so you can use it to coordinate
very large tasks that may run for days or weeks in the face of failures.
Makeflow is designed to be similar to <b>Make</b>, so if you can write
a Makefile, then you can write a Makeflow.
<p>
You can run a Makeflow on your local machine to test it out.
If you have a multi-core machine, then you can run multiple tasks simultaneously.
If you have a Condor pool or a Sun Grid Engine batch system, then you can send
your jobs there to run. If you don't already have a batch system, Makeflow comes with a
system called Work Queue that will let you distribute the load across any collection
of machines, large or small.
<p>
Makeflow is part of the <a href=http://www.cse.nd.edu/~ccl/software>Cooperating Computing Tools</a>. You can download the CCTools from <a href=http://www.cse.nd.edu/~ccl/software/download>this web page</a>, follow the <a href=install.html>installation instructions</a>, and you are ready to go.
<h2>The Makeflow Language</h2>
The Makeflow language is very similar to Make.
A Makeflow script consists of a set of rules.
Each rule specifies a set of <i>target files</i> to create,
a set of <i>source files</i> needed to create them,
and a <i>command</i> that generates the target files from the source files.
<p>
Makeflow attempts to generate all of the target files in a script.
It examines all of the rules and determines which rules must run before
others. Where possible, it runs commands in parallel to reduce the
execution time.
<p>
Here is a Makeflow that uses the <tt>convert</tt> utility to make an animation.
It downloads an image from the web, creates four variations
of the image, and then combines them back together into an animation.
The first and the last task are marked as LOCAL to force them to
run on the controlling machine.
<pre>
CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg
capitol.montage.gif: capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg
LOCAL $CONVERT -delay 10 -loop 0 capitol.jpg capitol.90.jpg capitol.180.jpg capitol.270.jpg capitol.360.jpg capitol.270.jpg capitol.180.jpg capitol.90.jpg capitol.montage.gif
capitol.90.jpg: capitol.jpg $CONVERT
$CONVERT -swirl 90 capitol.jpg capitol.90.jpg
capitol.180.jpg: capitol.jpg $CONVERT
$CONVERT -swirl 180 capitol.jpg capitol.180.jpg
capitol.270.jpg: capitol.jpg $CONVERT
$CONVERT -swirl 270 capitol.jpg capitol.270.jpg
capitol.360.jpg: capitol.jpg $CONVERT
$CONVERT -swirl 360 capitol.jpg capitol.360.jpg
capitol.jpg: $CURL
LOCAL $CURL -o capitol.jpg $URL
</pre>
Note that Makeflow differs from Make in a few important ways.
Read section 4 below to get all of the details.
<h2>Running Makeflow</h2>
To try out the example above, copy and paste it into a file named <tt>example.makeflow</tt>.
To run it on your local machine:
<pre>
% makeflow example.makeflow
</pre>
Note that if you run it a second time, nothing will happen, because all of the files are built:
<pre>
% makeflow example.makeflow
makeflow: nothing left to do
</pre>
Use the <tt>-c</tt> option to clean everything up before trying it again:
<pre>
% makeflow -c example.makeflow
</pre>
If you have access to a batch system running <a href=http://www.sun.com/software/sge>SGE</a>,
then you can direct Makeflow to run your jobs there:
<pre>
% makeflow -T sge example.makeflow
</pre>
Or, if you have a <a href=http://www.cs.wisc.edu/condor>Condor Pool</a>,
then you can direct Makeflow to run your jobs there:
<pre>
% makeflow -T condor example.makeflow
</pre>
To submit Makeflow as a Condor job that submits more Condor jobs:
<pre>
% condor_submit_makeflow example.makeflow
</pre>
You will notice that a workflow can run very slowly if you submit
each batch job to SGE or Condor, because it typically
takes 30 seconds or so to start each batch job running. To get
around this limitation, we provide the Work Queue system.
This allows Makeflow to function as a master process that
quickly dispatches work to remote worker processes.
<p>
To begin, let's assume that you are logged into a machine
named <tt>barney.nd.edu</tt>. start your Makeflow like this:
<pre>
% makeflow -T wq example.makeflow
</pre>
Then, submit 10 worker processes to Condor like this:
<pre>
% condor_submit_workers barney.nd.edu 9123 10
Submitting job(s)..........
Logging submit event(s)..........
10 job(s) submitted to cluster 298.
</pre>
Or, submit 10 worker processes to SGE like this:
<pre>
% sge_submit_workers barney.nd.edu 9123 10
</pre>
Or, you can start workers manually on any other machine you can log into:
<pre>
% work_queue_worker barney.nd.edu 9123
</pre>
Once the workers begin running, Makeflow will dispatch multiple
tasks to each one very quickly. If a worker should fail, Makeflow
will retry the work elsewhere, so it is safe to submit many
workers to an unreliable system.
<p>
When the Makeflow completes, your workers will still be available,
so you can either run another Makeflow with the same workers,
remove them from the batch system, or wait for them to expire.
If you do nothing for 15 minutes, they will automatically exit.
<p>
Note that <tt>condor_submit_workers</tt> and <tt>sge_submit_workers</tt>
are simple shell scripts, so you can edit them directly if you would
like to change batch options or other details.
<h2>The Fine Details</h2>
The Makeflow language is very similar to Make, but it does have a few important differences that you should be aware of.
<h3>Get the Dependencies Right</h3>
You must be careful to accurately specify <b>all of the files that a rule requires and creates</b>, including any custom executables. This is because Makeflow requires all these information to construct the environment for a remote job. For example, suppose that you have written a simulation program called <tt>mysim.exe</tt> that reads <tt>calib.data</tt> and then produces and output file. The following rule won't work, because it doesn't inform Makeflow what files are neded to execute the simulation:
<pre>
# This is an incorrect rule.
output.txt:
./mysim.exe -c calib.data -o output.txt
</pre>
However, the following is correct, because the rule states all of the files needed to run the simulation. Makeflow will use this information to
construct a batch job that consists of <tt>mysim.exe</tt> and <tt>calib.data</tt> and uses it to produce <tt>output.txt</tt>:
<pre>
# This is a correct rule.
output.txt: mysim.exe calib.data
./mysim.exe -c calib.data -o output.txt
</pre>
When a regular file is specified as an input file, it means the command relies on the contents of that file. When a directory is specified as an input file, however, it could mean one of two things. First, the command depends on the contents inside the directory. Second, the command relies on the existence of the directory (for example, you just want to add more things into the directory later, it does not matter what's already in it). <b>Makeflow assumes that an input directory indicates that the command relies on the directory's existence</b>.
<h3>No Phony Rules</h3>
For a similar reason, you cannot have "phony" rules that don't actually
create the specified files. For example, it is common practice to define
a <tt>clean</tt> rule in Make that deletes all derived files. This doesn't
make sense in Makeflow, because such a rule does not actually create
a file named <tt>clean</tt>. Instead use the <tt>-c</tt> option as shown above.
<h3>Just Plain Rules</h3>
Makeflow does not support all of the syntax that you find in various versions of Make. Each rule must have exactly one command to execute. If you have multiple commands, simply join them together with semicolons. Makeflow allows you to define and use variables, but it does not support pattern rules, wildcards, or special variables like <tt>$<</tt> or <tt>$@</tt>. You simply have to write out the rules longhand, or write a script in your favorite language to generate a large Makeflow.
<h3>Local Job Execution</h3>
Certain jobs don't make much sense to distribute. For example, if you have a very fast running job that consumes a large amount of data, then it should simply run on the same machine as Makeflow. To force this, simply add the word <tt>LOCAL</tt> to the beginning of the command line in the rule.
<h3>Batch Job Refinement</h3>
When executing jobs, Makeflow simply uses the default settings in your batch system. If you need to pass additional options, use the <tt>BATCH_OPTIONS</tt> variable or the <tt>-B</tt> option to Makeflow.
<p>
When using Condor, this string will be added to each submit file. For example, if you want to add <tt>Requirements</tt> and <tt>Rank</tt> lines to your Condor submit files, add this to your Makeflow:
<pre>
BATCH_OPTIONS = Requirements = (Memory>1024)
</pre>
<p>
When using SGE, the string will be added to the qsub options. For example, to specify that jobs should be submitted to the <tt>devel</tt> queue:
<pre>
BATCH_OPTIONS = -q devel
</pre>
<h3>Displaying a Makeflow</h3>
When run with the <tt>-D</tt> option, Makeflow will emit a diagram of the Makeflow in the
<a href=http://www.graphviz.org>Graphviz DOT</a> format. If you have <tt>dot</tt> installed,
then you can generate an image of your workload like this:
<pre>
% makeflow -D example.makeflow | dot -T gif > example.gif
</pre>
<h2>Running Makeflow with Work Queue</h2>
With the '-T wq' option, Makeflow runs as a master process that dispatches
tasks to remote worker processes.
<h3>Change master port</h3>
The master process listens on a port which the remote workers would connect to.
The default port number is 9123. Sometimes, however, the port number might be
not available on your system. You can change the default port via the '-p
<port number>' option. For example, if you want the master to listen on port
9567 by default, you can run the following command:
<pre>
% makeflow -T wq -p 9567 exmaple.makeflow
</pre>
<h3>Master Worker Matching</h3>
Explicitly specifying the hostname:port is the default way for a worker to
locate a master. A simpler way to match workers to masters is to use the
project name matching. You can give the master a project name with the -N
option:
<pre>
% makeflow -T wq -a -N MyProj example.makeflow
</pre>
The -N option gives the master a project name called 'MyProj'. The -a option
enables the catalog mode of the master. Only in the catalog mode a master
would advertise its information, such as the project name, running status,
hostname and port number, to a catalog server. Then a worker could retrieve
these information from the same catalog server. The above command uses the
default catalog server at Notre Dame which runs 24/7. We will talk about how to
set up your own catalog server later.
<p>
To start a worker that automatically finds MyProj's master via the default Notre
Dame catalog server:
<pre>
% work_queue_worker -a -N MyProj
</pre>
The '-a' option enables the catalog mode on the worker, which tells the worker
to contact a catalog server to find out a project's (specified by -N option)
hostname and port.
<p>
You can also give multiple -N options to a worker. The worker will find out
which ones of the specified projects are running from the catalog server and
randomly select one to work for. When one project is done, the worker would
repeat this process. Thus, the worker can work for a different master without
being stopped and given the different master's hostname and port. An example of
specifying multiple projects:
<pre>
% work_queue_worker -a -N proj1 -N proj2 -N proj3
</pre>
<p>
Now let's look at how to set up your own catalog server. Say you want to run
your catalog server on a machine named barney.nd.edu (the default port that the
catalog server will be listening on is 9097, you can change it via the '-p'
option), do:
<pre>
barney% catalog_server
</pre>
Now you have a catalog server listening at barney.nd.edu:9097. To make your
masters and workers contact this catalog server, simply add the '-C
hostname:port' option to both of your master and worker:
<pre>
% makeflow -T wq -C barney.nd.edu:9097 -N MyProj example.makeflow
% work_queue_worker -C barney.nd.edu:9097 -a -N MyProj
</pre>
<h3>Supported Makeflow Drivers</h3>
The full list of supported Makeflow drivers include
<ul>
<li>Local - local execution on a single multicore system</li>
<li>Condor - execution on a campus grid via the local <a href="http://www.cs.wisc.edu/condor">Condor Pool</a></li>
<li>SGE - execution on a high-performance cluster using the <a href="http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html">Oracle Grid Engine</a> (formerly the Sun Grid Engine).</li>
<li>Moab - execution on a high-performance cluster using the <a href="http://www.adaptivecomputing.com/resources/docs/mwm/6-0/index.php">Moab Workload Manager</a></li>
<li>Cluster - execution on a high-performance cluster using a user-defined cluster manager</li>
<li>Work Queue - execution on a lightweight, scalable master-worker <a href="http://cse.nd.edu/~ccl/software/workqueue/">framework</a> managed directly by Makeflow</li>
<li>Hadoop - execution on a cluster managed by the <a href="http://hadoop.apache.org/">Hadoop</a> framework, using data stored in <a href="http://hadoop.apache.org/hdfs">HDFS</a>.</li>
</ul>
<h2>For More Information</h2>
For the latest information about Makeflow, please visit our <a href=http://www.cse.nd.edu/~ccl/software/makeflow>web site</a> and subscribe to our <a href=http://www.cse.nd.edu/~ccl/software>mailing list</a>.
</body>
</html>
|