File: README.debian

dqs for DEBIAN
----------------------

   The package installs itself as both a client and a server for cell
"Local".  No queues are created on this cell.  DQS generates many error and
warning messages.  Most of these (adding hosts, creating files) can be
ignored.
   To create and enable a queue, type
	qconf -aq
   This command starts vi on a standard queue configuration.
Edit the hostname and queue name, and whatever other parameters you wish.
Save and exit from vi, and qconf will attempt to add the queue.
	qmod -e <queue name>
enables the newly created queue.
	qstat -f
will list the status of all queues in the cell.  If the cell is in an
UNKNOWN state, the qmaster and dqs_execd aren't speaking to each other.
Check your networking and hostname, and make sure you can ping `hostname`.
Verify that the hostname is in /etc/dqs/resolve_file, then shut down and
restart dqs:
	/etc/init.d/dqs stop ; /etc/init.d/dqs start
Watch dqs_execd, as it sometimes survives a kill.
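
   The checks above can be sketched as two small shell helpers.  This is
only a sketch: the function names are made up for illustration, and it
assumes that hostnames appear in resolve_file as plain whitespace-separated
words, which may not match the real file format exactly.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of DQS.

# Succeed if host $1 appears as a whole whitespace-separated word in file $2.
# Assumption: resolve_file lists hostnames as plain words.
host_listed() {
    awk -v h="$1" '{ for (i = 1; i <= NF; i++) if ($i == h) found = 1 }
                   END { exit !found }' "$2"
}

# Print the PIDs of any dqs_execd processes still running,
# for instance after "/etc/init.d/dqs stop".
# Assumption: a ps that accepts the POSIX -e and -o options.
surviving_execd() {
    ps -e -o pid= -o comm= | awk '$2 == "dqs_execd" { print $1 }'
}
```

For example, host_listed "`hostname`" /etc/dqs/resolve_file should succeed
on every node, and surviving_execd should print nothing once dqs has been
stopped.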
   If the qconf or qmod fails with an alarm timeout, try fiddling with the
ALARM* parameters in /etc/dqs/conf_file.  The current values aren't quite
right for most systems.  If you come up with good ones, let me know.

   Use qsub to submit a job.  For instance:
	qsub
	env
	^D
will run a batch job that prints out its environment.  The output will go
to a set of files whose names start with STDIN in your home directory.
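
   The same job can probably be submitted without typing at the terminal
by using a here-document, since qsub reads the script from standard input
in the example above.  A sketch; the QSUB variable and the function name
are hypothetical, added only so that the command can be swapped out when
trying this without a running cell.

```shell
#!/bin/sh
# Submit a one-line job that prints its environment.
# QSUB is a hypothetical override for the qsub command name.
submit_env_job() {
    "${QSUB:-qsub}" <<'EOF'
env
EOF
}
```

Setting QSUB=cat before calling submit_env_job simply echoes the script
that would have been submitted.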
   If you have more than one machine, you'll want to change the default
setup.  Choose one machine as the cell master, and whatever name you want
for the cell (a domain or hostname is conventional). Edit
/etc/dqs/resolve_file, and distribute a copy to each client.  Restart dqs on
all cell nodes.  A qmaster daemon should start on the cell master, and a
dqs_execd should start on every machine.  You'll need to add queues on the
master initially, in order for the qmaster to build up a trust list.
   DQS requires three TCP ports: 610, 611, and 612.  If these conflict with
existing port numbers, edit /etc/services, or choose different existing
names in /etc/dqs/conf_file.  All machines in a cell must use the same port
numbers.
DQS is not going to obtain official IANA port numbers, as DQS 4 is under
development and will use a different protocol.  Copying /etc/dqs/* and
/etc/services to all machines in your cell is the easiest way to keep
everything consistent.
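
   A quick way to look for conflicts is to list whatever /etc/services
already assigns to those three ports.  The following is a sketch with a
made-up function name; it makes no assumption about what the DQS entries
themselves are called, so you have to recognise them in the output.

```shell
#!/bin/sh
# Print the name and port of every uncommented entry in a services-format
# file ($1, default /etc/services) that claims TCP port 610, 611, or 612.
# Any entry shown that is not a DQS service indicates a conflict.
ports_claimed() {
    awk '$1 !~ /^#/ && $2 ~ /^(610|611|612)\/tcp$/ { print $1, $2 }' \
        "${1:-/etc/services}"
}
```

Running ports_claimed on each machine and comparing the output is also a
cheap check that all nodes agree.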
   Sharing user home directories across the cell is recommended, as it will
simplify writing DQS jobs.  Static NFS mounts may be the simplest.
Automounts from multiple hosts are what I use, but there are some
difficulties.  amd does not automatically mount files accessed in the /amd
directory, so the -cwd option often fails. autofs should resolve this
difficulty, but has only recently been released in a stable kernel (2.0.31).
   There are serious security implications to using DQS.  Running dqs
on a cell is essentially the same as adding all machines in the cell
to /etc/hosts.equiv.  uid 1015 on one host will be trusted to be uid
1015 on all the others in the cell.  Make sure these really are the
same user.  Root on any machine can become any user they wish without
authentication.  Consider installing BIOS passwords.
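
   One way to check that a uid really is the same user everywhere is to
compare passwd files from two hosts.  This is a sketch under the assumption
that you have copied the other host's /etc/passwd somewhere local; the
function name is made up.

```shell
#!/bin/sh
# Compare two passwd-format files and print every uid that maps to
# different login names in the two files.
# $1 = passwd file from this host, $2 = copy of another host's passwd file.
# Assumption: the first file is not empty.
uid_conflicts() {
    awk -F: 'NR == FNR { name[$3] = $1; next }
             ($3 in name) && name[$3] != $1 {
                 print $3 ": " name[$3] " vs " $1
             }' "$1" "$2"
}
```

No output means no uid is shared by two different login names in the files
compared.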
   Use of NIS is recommended, as it isn't any less secure than DQS, and
should reduce the need to manage accounts on all machines in the cell.  You
may need to edit /etc/nsswitch.conf.  Putting your cell behind a firewall or
off the internet entirely is also a good idea.
   Moving the cell master: kill the qmaster.  Copy the
/var/spool/qmaster/hostname directory from the old master to the new master,
renaming the hostname component.  Edit the resolve_file on all nodes in the
cell.  Restart the qmaster.  Long-running jobs can survive, but if a
dqs_execd dies, any jobs on that host will die.  Try not to damage them. 
The job id restarts at #1, so don't expect low-numbered jobs to survive a
transition.
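
   The copy-and-rename step above might look like the following sketch.
It assumes the old spool directory has already been transferred to (or is
reachable from) the new master, and that the qmaster is stopped; the
function name is made up, and whether files inside the directory also
mention the old hostname is not something this checks.

```shell
#!/bin/sh
# Copy the qmaster spool directory under a new hostname component.
# $1 = spool root (e.g. /var/spool/qmaster)
# $2 = old master's hostname, $3 = new master's hostname
# Refuses to overwrite an existing directory for the new name.
rename_master_spool() {
    if [ ! -d "$1/$2" ]; then
        echo "no such directory: $1/$2" >&2
        return 1
    fi
    if [ -e "$1/$3" ]; then
        echo "already exists: $1/$3" >&2
        return 1
    fi
    cp -Rp "$1/$2" "$1/$3"
}
```

The old directory is left in place, so it can serve as a fallback until the
new master is known to work.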
   Parallel jobs are supported.  For instance, to submit a 3-node pvmpov job:
	qsub -par PVM -master `hostname` -l qty.eq.2,linux
	povray -i /usr/doc/povray/povscn/level2/skyvase.pov \
	+v1 +ft -x +a0.300 +r3 -q9 -mv2.0 -w640 -h480 -d +N
	^D


Drake Diedrich <Drake.Diedrich@anu.edu.au>, Tue,  1 Jul 1997 17:31:56 +1000