OAR CHANGELOG
=============
version 2.6.1:
--------------
- [pam_oar_adopt] Make it easier to install and configure the oar-node PAM profile
- [pam_oar_adopt] Fix and improve the script
- [pam_oar_adopt] Move pam_oar_adopt back to the oar-node package
- [pam_oar_adopt] Rework man page
- [job_resource_manager_systemd/pam_oar_adopt] Make OAR env compatible with pam_env
- [job_resource_manager_systemd] 1 liner log message for tmp clean-up
- [job_resource_manager_systemd/oarsh] Fix cgroup v2 path extraction
- [setup] cleanup script
version 2.6.0:
--------------
- [oarsh_shell] Add support for cgroupv2/systemd and use hwloc
- [job_resource_manager] Add support for cgroupv2/systemd and use hwloc
- [oarcgdev] Add the oarcgdev tool to manage the devices blacklist in cgroupv2
- [suspend_resume_manager] Add support for cgroupv2/systemd
- [Monika] Fix display of wanted resources
- [all] Fix for Perl exporter (Perl 5.39.1)
- [oarsub] Ignore ^Z from terminal (SIGTSTP)
- [API] Remove cycle limits
- [IO.pm] Fix issue with resource_log
- [IO.pm/MetaSched] Add functions to allow scheduling perf statistics
- [oaraccounting] Now use a systemd timer
- [pam_oar_adopt] Add PAM script to enable ssh connection to jobs
- [oar_resources_init] Now use hwloc to find resources on nodes
- [oar_resources_add] Add support for resources in the hwloc format
- [man] Move administrator's commands to section 8 and some rewriting
- [cron] Provide both cron and systemd timers (oaraccounting, oarnodecheck)
- [oarnodecheck] Add support of cgroupv2/systemd, refactor
Starting from OAR 2.6, a new mechanism using cgroupv2/systemd and hwloc is in
place to map OAR resources to physical compute resources (cores, GPUs) on nodes
and to manage the processes of jobs.
As a consequence, on an existing installation, the cpuset property of the OAR
resources may need some changes to adopt the hwloc description of the machines.
The oar_resources_init script can be used to look at the expected values for the
cpuset resource property.
Regarding the gpudevice resource property: it must now contain the list of the
special devices (e.g. /dev/nvidia0) associated with a resource.
Finally, cgroupv1 activation is no longer needed; only the unified cgroupv2
file hierarchy is now required. Thus, cgroupv1-related directives in the
kernel command line can be removed on OAR nodes.
version 2.5.10:
---------------
- [oarnodesetting] Allow setting the suspected state
- [oarstat] Allow to see other users' initial_request
- [oarsub] ensure the inline job key is in a valid format, by adding \n
- [oarsub] remove ssh inline keys from initial_request
- [scheduler] add the SCHEDULER_RESOURCE_ORDER_ADV_RESERVATIONS_THRESHOLD
option (advance reservations in the near future can use the same resource
order as batch jobs, taking into account the standby and besteffort jobs)
- [oarstat] Add a machine parseable output for -p option
- [oarstat] Add a machine parseable output for -e (events) option
- [oarstat] Exit with an error if -p and --sql options are used together
- [oarnodes] Factorize SQL query used to get resources
- [oarnodes] Factorize SQL query to get nodes' events
- [oarnodes] Optimize OAR's resources querying (option -r)
- [oarnodes] Optimize nodes' states querying
- [oarnodes] Update usage and man page
- [oarsub] allow specifying a reservation end time with the -r option
- [oarsub] factorize the reservation parsing in interactive condition
- [oarsub] be more flexible about the reservation dates parsing
- [oarsub] add the 'now' keyword support to reservation request
- [man] rework oarsub man page
- [oarsub] update usage
- [oarnodesetting] allow for unsetting the value of resource properties
- [oarsub] rename OARSUB_NODE_EXEC_FILE to INTERACTIVE_JOB_HOOK_EXEC_FILE
- [oarexec] add PASSIVE_JOB_HOOK_EXEC_FILE
- [NodeChangeState] fix DB transaction error when resubmitting a job
- [NodeChangeState] do not resubmit jobs for deploy/cosystem and if
server_prologue error
- [NodeChangeState] on prologue/epilogue error, only set the first node to
suspected
- [NodeChangeState] Do not suspect nodes for prologue/epilogue error of
deploy/cosystem jobs
- [oarsub] add the advance reservation validation hook feature
- [oarnodes] fix oarnodes --sql if no resources to show
- [Hulot] Add debug info regarding timeouts
- [sarko] fix debug message for job frag
- [accounting] change the behavior of accounting to be more efficient
- [leon] optimize SQL query of get_to_kill_jobs subroutine
- [MetaSched] Optimize querying of the last wake up for a list of nodes
- [IO.pm] Add constants to store often used OAR's job states
- [Hulot] Fix unknown method oar_warning
- [job_resource_manager] add support for AMD gpus
- [oarsh] add support of AMD gpus
- [oardodo] parametrize the OOM killer setup of oardodo
- [NodeChangeState] factorize the resources state changing
- [NodeChangeState] correctly lock and make transaction with PostgreSQL
- [IO.pm] factorize jobs fragging made by NodeChangeState
- [IO.pm] Add a lock_table_exclusive subroutine, for postgresql
- [oarnodes] fix machine parseable output by returning an empty set
- [oarnodes] fix uninitialized value with -r option
- [oarnodes] fix issue when using non existing nodes in arguments
- [oar.conf] annotate configuration lines with modules it is used for
- [oarwalltime] always allow admins to perform walltime reduction
- [oarapi] add an environment variable to configure the max cycle of cgi
- [IO.pm] do not specify a file name for stderr/stdout for interactive jobs
- [judas] allow a custom instance name in notification mail subjects
- [ssh] Remove deprecated option in sshd config
- [core/scripts] add oar-node start/stop scripts to launch with systemd
- [setup] Add systemd services unit for oar-node and oar-server
- [Makefiles] Add support for systemd
- [oar-database] Add option to skip the installation of the default admission
rules when creating a new OAR database
- [finaud/hulot] fix a race condition resulting in nodes being suspected when
put in standby by hulot while finaud is running
- [monika] show job events in job stats
- [monika] add nodes filtering
- [oarsub] add a -v switch (usable multiple times) to select verbosity
- [misc] remove desktop_computing possibility in oarsub and API
- [oarsub] add # before each output string (except the job ids) and better
format of information and error messages
- [oar.conf/bipbip] add DEPLOY_COSYSTEM_JOB_EXEC_SYSTEM option
- [oarsub] Use systemd-run when running processes on the frontend for
deploy/cosystem jobs
- [criu checkpoint] added a timeout to exit when the resume fails
- [oarapi] replace workdir by directory in OAR's rest API
- [all] change the log format and make it configurable, plus various log
message improvements
- [oarstat] add jobs events to the gantt output (--gantt/-g)
- [oarstat] make machine parseable output detailed view without using -f
- [oarstat] add missing fields and uniq some when calling with --gantt
- [oarstat] add a new output format '3' fixing several field names
- [job_resource_manager] write env file without variable references
- [oarwalltime] allow request cancellation
- [oarwalltime] add --whole option
- [oarwalltime] add a timeout for walltime change requests
- [oarnodes/API] add functionality to provide the detailed jobs occupation on nodes
- [api] better returned messages and add an OAR_IN_API env variable
- [IO] make job creation with add_micheline_subjob() more consistent
- [oarsub] enhance -C option to be more user friendly
- [IO.pm] optimize get_gantt_jobs_to_launch() SQL query
- [oarapi] fix an error in logs due to redefined Dump() subroutine
- [tools] add the pam_oar_adopt script
version 2.5.9:
--------------
- [scheduler] add the SCHEDULER_RESOURCE_ORDER_ADV_RESERVATIONS option, so
that the scheduling of advance reservations is not impacted by the current
state of the resources (e.g. nodes in standby, current besteffort jobs)
- [admission rules] add an admission rule to restrict advance reservation
inner jobs to use container jobs that are advance reservations as well
- [schedulers] fix issues with the scheduling of an inner job before its
container
- [schedulers] waiting inner jobs in a container that vanished are set to error
- [oarexec] add an option to have inner jobs killed along with their container
- [oarexec] do not run inner jobs before their container is already running
- [oarwalltime] make walltime change respect a possibly defined job deadline
- [oarwalltime] add an option to disable the walltime reduction
- [oarwalltime] fix the per queue configuration of oarwalltime
- [oarsub/scheduler] fix a bug with the recent Perl max recursion depth limit
- [drawgantt] show the timezone in the dates
- [oarexec] fix oarsub shell termination when the job is killed
- [database] add an index to the resource_log table
- [oar_resource_add] add support for repairing the resource properties
- [oarexec] add support to disable the auto-repair of suspected nodes
- [job_resource_manager/oarsh] add the COMPUTE_THREAD_SIBLINGS option, to let
OAR automatically set the HT thread siblings if not set in the resources
hierarchy with a thread resource, or in the resource cpuset field
- [job_resource_manager] rework code, support more cgroup subsystems
- [oarsh] add support to let oarsh create a sub cgroup with either a subset of
the cpuset or of the devices in the shell opened on the node. See an
example of usage with GNU Parallel in the website documentation
- [oarsub] add the OARSUB_NODE_EXEC_FILE configuration to run a custom command
on the head node of the job before the job shell
- [oarsub] make oarsub accept the submission of a noop job with no script
- [oarstat] fix JSON/YAML/XML output when no job to display
- [oarstat] oarstat -j can now use the OAR_JOB_ID environment variable
- [oarstat] fix YAML display with the YAML::Syck library
version 2.5.8:
--------------
- [job_resource_manager] manage nvidia gpu with the cgroup devices
- [oarwalltime] add functionality to allow changing the walltime of a
running job. See the oarwalltime command and oar.conf
- [scheduler] fix the besteffort + deploy VS adv. reservation case
- [scheduler] add the state=permissive job type, allowing jobs to be scheduled
and run (only if they are also noop or cosystem) regardless of the aliveness
of resources
- [oarsub/scheduler] fix warning "Use of uninitialized value $resource_value"
- [oarsub] fix unknown error message in case of job termination + typos
- [oarnodesetting] do not kill noop jobs using resources changed to
Dead or Absent
- [finaud] fix: make pingchecker run only on resources of type default
- [oar-database] fix the privileges of oar's read only user in PostgreSQL
in new installations. For an existing database, the following command applies
the fix: `oar-database --fix-ro-user-priv ...`
- [api] some improvements in the Apache configuration and tests
- [api] added POST /media/force to overwrite a file
- [finaud] bugfix: make pingchecker run only on resources of type default
- [api] hardening on the syntax of the URIs (should not impact good URIs!)
- [drawgantt-svg] add a mark next to the label of the resources pointed by
the mouse
- [drawgantt-svg] fix possible SQL injection with the filters
- [drawgantt-svg] improve the label_display_regex text replacement mechanism
- [drawgantt-svg/oarstat] fix past and current moldable jobs display
- [drawgantt-svg] fix drain display
- [drawgantt-svg] fix nav_filter with only one option
- [oar.conf] update SSH options to those of OpenSSH 7.6p1
- [oar-database] support --db-is-local (UNIX socket) for MySQL (MariaDB)
- [oar-node] fix warnings with OAR's sshd configuration
- [oar-resource-add] fix the auto-offset option
- [oar-resource-add] add support for creating GPU resources
- [oar-resource-add] add support to handle the CPU and GPU topologies
version 2.5.7:
--------------
This version mainly brings a security fix for the oarsh command. It is highly
recommended to upgrade (server, frontend(s) and nodes), since all previous
versions of OAR are affected.
- [oarsh] fix a security hole when passing option to OpenSSH. See oar.conf to
adapt settings to your setup, if required (OARSH_* variables)
- [oarsh] dropped the mechanism to select whether to use oarsh or fall back
to ssh, given a list of hostname patterns
- [oarsub] fix the job-key information of the manual page
- [oarsub] handle cases where trailing spaces were breaking oarsub script directives
- [api] added an example of Apache configuration for the authentication
- [documentation] improve the SSH keys setup explanations for OAR installation
version 2.5.6:
--------------
- [oar.conf] add the SCHEDULER_MIN_TIME_BETWEEN_2_CALLS option
- [metascheduler] fix a bug with advance reservations when predicted resources
must be recomputed
- [metascheduler] fix a bug with advance reservations with standby start job
types (noop/cosystem/deploy=standby)
- [oar-node init] create /var/run/sshd if needed
- [oarsub] fix several bugs with the array job submission
- [oarstat] allow using Perl's YAML::Syck for a quicker YAML output
- [oarstat] improve performance and information for the --gantt option
- [oarstat] prettier print of job events
- [oarnodesetting] optimize grouped operations on resources and add a lock
around property changes
- [oaradmissionrules] fix bug: changing a rule priority does not enable it
- [oar_resources_init] fix node read from standard input
- [oarnodecheck] use /var/lib/oar instead of /etc/oar for working files
- [logs] several cosmetic fixes
- [api] add colmet extraction function
- [api] proposed apache configuration now uses a virtual host on port 6668
- [drawgantt-svg] fix the possibly very long delay when zooming
- [drawgantt-svg] add forecast buttons + relative start/stop url arguments
- [drawgantt-svg] rework configuration for the default display
- [drawgantt-svg] allow displaying resources of type != default
- [drawgantt-svg] improve support for use as a widget in custom HTML pages
(multisite, etc)
- [monika] fix bugs with recent Perl/Perl CGI versions
- [monika] fix harmless bug in configuration
- [visualization] remove overlib.js (license issue), this breaks the legacy
drawgantt (which is not supported anymore)
- [misc] remove some old development code from sources
- [misc] fix inconsistent copyrights and licenses
- [doc] update the installation documentation
version 2.5.5:
--------------
- [iolib] fix deadlock with TRUNCATE in postgresql
- [almighty] add SCHEDULER_MIN_TIME_BETWEEN_2_CALLS:
the scheduler is launched at most once every t seconds (t=5 by default),
which prevents the scheduler from starving the other
modules
- [scheduler] fix some memory leaks.
- [scheduler] add a cache to the resources tree computation: improve
the scheduler speed by reducing the number of SQL queries.
- [scheduler] backport the expire/postpone/deadline job types.
- [scheduler] rename the placeholder job types: placeholder/allowed.
- [scheduler] fix timesharing (adv reservation and \*_placeholder schedulers).
- [scheduler] allow noop/cosystem/deploy jobs to start on resources in
standby, no wake-up is triggered (requires activating energy saving).
- [oarsub] use jobkey (-k) if the OAR_JOB_KEY_FILE env variable is set.
- [oarstat] fix accounting display
- [oar_resources_init] fix HyperThreading bug + improve CLI
- [oar_resources_add] make HyperThreading optional + fix long options + make
nicer warning outputs for auto-offset
- [admission rules] rewrite the job type check rule
- [admission rules] fix oaradmissionrules bug with MySQL when modifying a rule
- [oar-node] fix pid in init script.
- [api] some optimizations + rework authentication configuration (apache).
- [api][drawgantt-svg][monika] fix apache config (apache 2.4).
- [drawgantt-svg] new version with aggregation of resources and more.
- [monika] add thread to the hidden properties.
- [api] fastcgi config now using suexec
- [api] now using apache environment variables when headers are not available
- [api] optimization of /jobs query response time (especially efficient for
mysql based installations)
- [api] security fix: HTML outputs which did not break on errors
version 2.5.4:
--------------
- [api] Implemented GET /resources/<property>/jobs to get jobs running on
resources, grouped by a given property.
- [api] Implemented HTTP_X_API_PATH_PREFIX header variable to prefix all
returned URIs.
- [api] Added GET /jobs/<id>/details support.
- [api] Implemented the ability to get a set of jobs at once with
GET /jobs?ids=<id1>:<id2>:<id3>:...
- [api] BUGFIX: stderr and stdout were reversed.
- [api] BUGFIX: memory leak in the API when used with FastCGI.
- [api] Rewritten/commented apache config file.
- [kamelot] BUGFIX: fix hierarchies manipulation (remove toplevel resource).
- [accounting] Fixed a memory leak and a rare case of bad consumption count.
- [oar.conf] Replace the MAX_CONCURRENT_JOB_TERMINATIONS option by
MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING
- [almighty] Rewrote the handling of starting and finishing jobs: limit
bipbip processes to MAX_CONCURRENT_JOBS_STARTING_OR_TERMINATING
to avoid overloading the server.
- [oarexec] Introduced BASH_ENV=~oar/.batch_job_bashrc for batch jobs.
Batch jobs using the bash shell had difficulty sourcing the
right bash scripts when launching.
BASH_ENV=~oar/.batch_job_bashrc is now set before launching the
user bash process, so that OAR controls which script is sourced.
By default, ~/.bashrc is sourced.
- [commands] Exit immediately on wrong arguments.
- [oarsh] Propagate OAR shell environment variables:
The users have access to the same OAR environment variables when
connecting on all the job nodes with oarsh
- [job_uid] Removed job uid feature (not used).
- [job_resource_manager] Use fstrim (for SSD) when cleaning files.
- [deploy] Do not check the nodes when ACTIVATE_PINGCHECKER_AT_JOB_END is on
and the job is of the deploy type (bug #17128).
- [judas] Disabled sending log by email on errors as this could generate too
many mails.
- [noop] Added the 'noop' job type. If specified, nothing is done on computing
nodes. The job just reserves the resources for the specified
walltime.
- [quotas] Added the possibility to make quotas on:
- the number of used resources
- the number of running jobs
- the result of resources X hours of running jobs
- [runner] Added runner bipbip processes in the bipbip_launcher in Almighty.
- [database] Replaced field "maintenance" by "drain".
The administrator can disable resources without killing
current jobs by::
oarnodesetting -h n12 -p drain=YES
or::
oarnodesetting --drain -h n12
:WARNING: any admission rule using the "maintenance" keyword
must be adapted to use the "drain" keyword.
- [oar_resources_init] Added support for SMT (hyperthreading)
- [cpuset] The cpuset resources field is now a varchar.
It is now possible to specify several cpu ids in the cpuset field,
as needed in some cases where SMT is enabled on nodes, e.g.::
1+4+8
- [oarsub] Added a filter for notifications.
It is now possible to specify which TAGs must trigger notifications::
oarsub --notify "[END,ERROR]mail:name@domain.com" -I
- [admission rules] Added a priority to rules, which makes it easier to manage
the rule execution order.
- [admission rules] Added an enable/disable flag to rules to allow activating
or deactivating rules without having to comment the code.
- [oaradmin] The oaradmin rules command is now disabled since it does not
handle priority and enable flags.
- [oaradmin] The oaradmin conf command is disabled.
- [oar_resources_add] Added the oar_resources_add command to help adding
resources and replace the oaradmin resources command.
- [oaradmissionrules] oaradmissionrules is a new command to manage the
admission rules.
- [oarnodesetting] Removed dependency on oarnodes.
- [drawgantt-svg] Various bugfixes and improvements
- [metasched] If a besteffort job has a checkpoint duration defined
(oarsub --checkpoint) then OAR tries to checkpoint it before killing it.
It is possible to define a limit of the checkpoint duration with an
admission rule ($checkpoint variable).
- [drawgantt] Drawgantt is now deprecated (and not shipped with packages)
- [misc] OAR packaged components do not require Ruby anymore.
- [oaraccounting] Fix bug reported in Debian tracker #678976
- [sources] Clean up some unused or irrelevant files/code
- [scheduler] change default schedulers to quota
The default scheduler of the queues default, admin and besteffort is
now oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas.
The configuration file /etc/oar/scheduler_quotas.conf contains no quota
enforcement so the behaviour remains the same as before.
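As an illustration of the multi-valued cpuset field described in the [cpuset]
entry above (e.g. 1+4+8), the following Python sketch splits such a value into
individual CPU ids. The function name parse_cpuset_field is hypothetical and is
not part of OAR; the sketch only assumes the '+'-separated format shown in that
entry.

```python
def parse_cpuset_field(value):
    """Split a '+'-separated cpuset field such as '1+4+8' into a list of ints.

    Hypothetical helper for illustration only; not part of OAR.
    """
    ids = []
    for part in value.split("+"):
        part = part.strip()
        # Reject anything that is not a plain non-negative integer.
        if not part.isdigit():
            raise ValueError(f"invalid CPU id in cpuset field: {part!r}")
        ids.append(int(part))
    return ids


if __name__ == "__main__":
    print(parse_cpuset_field("1+4+8"))
```

A single id such as "3" is handled by the same code path and yields a
one-element list.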
version 2.5.3:
--------------
- Add the "Name" field on the main Monika page. This makes it easier for users
to find their jobs.
- Add MAX_CONCURRENT_JOB_TERMINATIONS into the oar.conf of the master. This
limits the number of concurrent processes launched by the Almighty when the
jobs finish.
- Bug fix in ssh key feature in oarsub.
- Added --compact, -c option to oarstat (compact view of array jobs).
- Improvements of the API: media upload from html forms, listing of files,
security fixes, add of new configuration options, listing of the scheduled
nodes into jobs, fixed bad reinitialization of the limit parameter,
stress_factor, accounting...
See OAR-DOCUMENTATION-API-USER for more information.
- CGROUP: handle cgroup hierarchy already mounted by the OS like in Fedora 18
(by systemd in /sys/fs/cgroup) in job_resource_manager_cgroups.pl.
- Bug fix oar-database: fix the reset function for mysql.
- SVG version of drawgantt: all features are now implemented to replace the
legacy drawgantt. Both can be installed.
- Bug fix schedulers: rewrite schedulers with placeholders.
- Rework default admission rules.
- Add support to the oar_resource_init command to generate resources with
a "thread" property (useful if HyperThreading is activated/used on nodes).
- Fix stdout/stderr bug: check the allowed characters in the path given by
the users.
- Fix: the user shell (bash) didn't source /etc/bash.bashrc in batch jobs.
- Add quota which limits the number of used resources at a time depending on
the job attributes: queue, project, types, user
(available with the scheduler
"oar_sched_gantt_with_timesharing_and_fairsharing_and_quotas").
- Add comments in user job STDERR files to know if a job was killed or
checkpointed.
- Add the variable $jobproperties_applied_after_validation. It can be used in
an admission rule to add a constraint after the validation of the job. Ex:
$jobproperties_applied_after_validation = "maintenance='off'";
So, even if all the resources have "maintenance='on'", the new jobs will be
accepted but not scheduled now.
- Add the oardel option --force-terminate-finishing-job: to use when a job is
stuck in the Finishing state.
- Bug #15911: Energy saving now waits SCHEDULER_NODE_MANAGER_IDLE_TIME for
nodes that have been woken up, even if they didn't run any job.
- Simplify job dependencies: do not check the exit code of the jobs in
dependencies.
- Admission rules: add the "estimate_job_nb_resources" function that is
useful to know the number of resources that will be used by a job.
- oarstat: add another output format that can be used by using "--format 2"
or by setting "OARSTAT_DEFAULT_OUTPUT_FORMAT=2" in oar.conf.
- oarsub: Add the capability to use the tag %jobname% in the STDOUT (-O)
and/or STDERR (-E) filenames (like %jobid%).
- bug #14935: fix timesharing jobs within a container issue
- add schedulers with the placeholder feature.
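The %jobid% and %jobname% tags mentioned above for the oarsub -O and -E
filenames can be pictured as a plain textual substitution. The Python sketch
below is only an illustration under that assumption; the function name
expand_output_filename and the sample values are hypothetical, and the code
does not reproduce how OAR itself performs the expansion.

```python
def expand_output_filename(template, job_id, job_name):
    """Replace the %jobid% and %jobname% tags in a filename template.

    Hypothetical helper for illustration only; not part of OAR.
    """
    return (
        template
        .replace("%jobid%", str(job_id))
        .replace("%jobname%", job_name)
    )


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(expand_output_filename("%jobname%.%jobid%.stdout", 42, "sim"))
```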
version 2.5.2:
--------------
- Bugfix: /var/lib/oar/.bash_oar was empty due to an error in the common
setup script.
- Bugfix: the PINGCHECKER_COMMAND in oar.conf depends now on %%OARDIR%%.
- Bug #13939: job_resource_manager.pl and job_resource_manager_cgroups.pl
now delete the user files in /tmp, /var/tmp and /dev/shm at
the end of the jobs.
- Bugfix: in oardodo.c, the preprocessed variables were not defined correctly.
- Finaud: fix race condition when there was a PINGCHECKER error just before
another problem. The node became Alive again when the PINGCHECKER said OK
even though there was another error to resolve.
- Bugfix: The feature CHECK_NODES_WITH_RUNNING_JOB=yes never worked before.
- Speedup monika (X5).
- Monika: Add the conf max_cores_per_line to have several lines if the number
of cores is too big.
- Minor changes into API:
- added cmd_output into POST /jobs.
- API: Added GET /select_all?query=<query> (read only mode).
- Add the field "array_index" into the jobs table, so that resubmitting a job
from an array yields the right array_index environment variable.
- oarstat: order the output by job_id.
- Speedup oarnodes.
- Fix a spelling error in the oaradmin manpage.
- Bugfix #14122: the oar-node init.d script wasn't executing
start_oar_node/stop_oar_node during the 'restart' action.
- Allow the dash character into the --notify "exec:..." oarsub option.
- Remove some old stuff from the tarball:
- visualization_interfaces/{tgoar,accounting,poar};
- scheduler/moldable;
- pbs-oar-lib.
- Fix some licence issues.
version 2.5.1:
--------------
- Sources directories reorganized
- New "Phoenix" tool to try to reboot automatically broken nodes
(to setup into /etc/oar/oar_phoenix.pl)
- New (experimental!) scheduler written in Ocaml
- Cpusets are activated by default
- Bugfix #11065: oar_resource_init fix (add a space)
- Bug #10999: memory leak in Hulot when used with postgresql. The leak has
been minimized, but it is still there (DBD::Pg bug)
- Almighty cleans ipcs used by oar on exit
- Bugfix #10641 and #10999: Hulot is automatically and periodically restarted
- Feature request #10565: add the possibility to check the aliveness of the
nodes of a job at the end of this one (pingchecker)
- REST API heavily updated: new data structures with paginated results,
desktop computing functions, rspec tests, oaradmin resources management,
admission rules edition, relative/absolutes uris fixed
- New ruby desktop computing agent using REST API (experimental)
- Experimental testsuite
- Poar: web portal using the REST API (experimental)
- Oaradmin YAML export support for resources creation (for the REST API)
- Bugfix #10567: enabling to bypass window mechanism of hulot.
- Bugfix #10568: Wake up timeout changing with the number of nodes
- Add in oar.conf the tag "RUNNER_SLIDING_WINDOW_SIZE": it allows the runner
to use a sliding window to launch the bipbip processes if
"DETACH_JOB_FROM_SERVER=1". This feature avoids the overload of the server
if plenty of jobs have to be launched at the same time.
- Fix problem when deleting a job in the Suspended state (oarexec was stopped
by a SIGSTOP so it was not able to handle the delete operation)
- Make the USER_SIGNAL feature of oardel multi job independent and remove the
temporary file at the end of the job
- Monika: display if the job is of timesharing type or not
add in the job listing the initial_request (is there a reason to
not display it?)
- IoLib: update scheduler_priority resources property for timesharing jobs,
so that the scheduler can avoid launching all timesharing
jobs on the same resources (they can be dispatched)
- OAREXEC: unmask SIGHUP and SIGPIPE for user script
- node_change_state: do not Suspect the first node of a job which was
EXTERMINATED by Leon if the cpuset feature is configured (let the cpuset do
the job)
- OAREXEC: ESRF detected that oarexec sometimes thinks that it has notified
Almighty of its exit code while nothing was seen on the server. oarexec now
tries to resend the exit code until it is killed.
- oar_Tools: add in notify_almighty a check on the print and on the close of
the socket connected to Almighty.
- oaraccounting: --sql is now possible into a "oarstat --accounting" query
- Add more logs to the command "oarnodes -e host" when a node turns into
Suspected
- Execute user commands with /proc/self/oom_adj set to 15. So the first
processes that will be killed when there is no more memory available are
the user ones.
Hence the system will remain up and running and the user job will be finished.
Drawback: this file can be changed manually by the user, so if someone knows
a method to do the same thing but managed only by root, we are interested.
- Bugfix API: quotes were badly escaped in job submission
- Add the possibility to automatically resubmit idempotent jobs which end
with an exit code of 99: oarsub -t idempotent "sleep 5; exit 99"
- Bugfix API: Some information was missing from jobs/details, especially
the scheduled resources.
- API: added support of "param_file" value for array job submissions. This value
is a string representing the content of a parameters file. Sample submission::
{"resource":"/cpu=1", "command":"sleep", "param_file":"60\n90\n30"}
This submits 3 sleep jobs with different sleep values.
- Remove any reference to gridlibs and gridapi as these components are obsolete
- Add stdout and stderr files of each job in oarstat output.
- API now supports fastcgi (big performance boost!)
- Add "-f" option to oarnodesetting to read hostnames from a file.
- API can get/upload files (GET or POST /media/<file_path>)
- Make "X11 forwarding" work even if the user's XAUTHORITY environment
variable does not contain ~/.Xauthority (GDM issue).
- Add job_resource_manager_cgroups which handles cpuset + other cgroup
features like network packet tagging, IO disk shares, ...
- Bugfix #13351: now oar_psql_db_init is executed with root privileges
- Bugfix #13434: reservations were not handled correctly with the energy
saving feature
- Add cgroups FREEZER feature to the suspend/resume script (better than kill
SIGSTOP/SIGCONT).
This is doable thanks to the new job_resource_manager_cgroups.
- Implement a new script 'oar-database' to manage the oar database.
oar_mysql_init & oar_psql_init are dropped.
- Huge code reorganisation to allow a better packaging and system integration
- Drop the oarsub/oarstat 2.3 version that was kept for compatibility issues
during the 2.4.x branch.
- By default the oar scheduler is now
'oar_sched_gantt_with_timesharing_and_fairsharing' and the following values
have been set in oar.conf: SCHEDULER_TIMEOUT to 30, SCHEDULER_NB_PROCESSES to 4
and SCHEDULER_FAIRSHARING_MAX_JOB_PER_USER to 30
- Add a limitation on the number of concurrent bipbip processes on the server
(for detached jobs).
- Add IPC cleaning to the job_resource_manager* when there is no other job of
the same user on the nodes.
- Improve scheduling behaviour for jobs with dependencies
- API: added missing stop_time into /jobs/details
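The "param_file" submission described in the list above can be sketched as follows. This is only an illustration: the helper names build_array_submission and expand_param_file are hypothetical, the expansion rule (one sub-job per non-empty line) is an assumption based on the sample, and no request is sent to the API.

```python
import json

def build_array_submission(resource, command, params):
    """Build the JSON body of an array job submission (hypothetical helper)."""
    return json.dumps({
        "resource": resource,
        "command": command,
        # One line of parameters per sub-job, joined into a single string.
        "param_file": "\n".join(params),
    })

def expand_param_file(command, param_file):
    """Assumed expansion: one command line per non-empty parameter line."""
    return [f"{command} {line}" for line in param_file.splitlines() if line.strip()]

body = build_array_submission("/cpu=1", "sleep", ["60", "90", "30"])
payload = json.loads(body)
commands = expand_param_file(payload["command"], payload["param_file"])
print(commands)
```

With the sample values from the changelog entry, the expansion yields three sleep commands, one per line of the parameter string.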
version 2.4.4:
--------------
- oar_resource_init: bad awk delimiter. There's a space and if the property
is the first one then there is not a ','.
- job suspend: oardo does not exist anymore (long long time ago). Replace it
with oardodo.
- oarsub: when an admission rule died micheline returns an integer and not an
array ref. Now oarsub ends nicely.
- Monika: add a link on each jobid on the node display area.
- sshd_config: on nodes with a lot of cores, 10 parallel connections could be
too few
version 2.4.3:
--------------
- Hulot module now has customizable keepalive feature
- Added a hook to launch a healing command when nodes are suspected
(activate the SUSPECTED_HEALING_EXEC_FILE variable)
- Bugfix #9995: oaraccounting script doesn't freeze anymore when db is unreachable.
- Bugfix #9990: prevent inserting jobs with an invalid username (like an empty username)
- Oarnodecheck improvements: node is not checked if a job is already running
- New oaradmin option: --auto-offset
- Feature request #10565: add the possibility to check the aliveness of the
nodes of a job at the end of this one (pingchecker)
version 2.4.2:
--------------
- New "Hulot" module for intelligent and configurable energy saving
- Bug #9906: fix bad optimization in the gantt lib (so bad scheduling decisions).
version 2.4.1:
--------------
- Bug #9038: Security flaw in oarsub --notify option
- Bug #9601: Cosystem jobs are no longer killed when a resource is set to Absent
- Fixed some packaging bugs
- API bug fixes in job submission parsing
- Added standby info into `oarnodes -s` and available_upto info into
/resources uri of the API
- Bug Grid'5000 #2687 Fix possible crashes of the scheduler.
- Bug fix: with MySQL DB Finaud suspected resources which are not of the
"default" type.
- Signed debian packages (install oar-keyring package)
version 2.4.0:
--------------
- Bug #8791: added CHECK_NODES_WITH_RUNNING_JOB=no to prevent checking
occupied nodes
- Fix bug in oarnodesetting command generated by oar_resources_init (detect_resources)
- Added a --state option to oarstat to only get the status of specified jobs
(optimized query, to allow scripting)
- Added a REST API for OAR and OARGRID
- Added JSON support into oarnodes, oarstat and oarsub
- New Makefile adapted to build packages as non-root user
- add the command "oar_resources_init" to easily detect and initialize the
whole resources of a cluster.
- "oaradmin version": now retrieve the most recent database schema number
- Fix rights on the "schema" table in postgresql.
- Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
- Ctrl-C was not working anymore in oarsub.
It seems that the signal handler does not handle the previous syntax
($SIG = 'qdel')
- Fix bug in oarsh with the "-l" option
- Bug #7487: bad initialisation of the Gantt for the container jobs.
- Scheduler: move the "delete_unnecessary_subtrees" directly into
"find_first_hole". Thus it is possible to query a job like::
oarsub -I -l nodes=1/core=1+nodes=4/core=2
(no hard separation between each group)
For the same behaviour as before, you can query:
oarsub -I -l {prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2
- Bug #7634: test if the resource property value is effectively defined
otherwise print a ''
- Optional script to take into account cpu/core topology of the nodes at boot
time (to activate inside oarnodesetting_ssh)
- Bug #7174: Cleaned default PATH from "./" into oardodo
- Bug #7674: remove the computation of the scheduler_priority field for
besteffort jobs from the asynchronous OAR part. Now the value is set when
the jobs are turned into toLaunch state and in Error/Terminated.
- Bug #7691: add --array and --array-param-file options parsing into the
submitted script. Fix also some parsing errors.
- Bug #7962: enable resource property "cm_availability" to be manipulated by
the oarnodesetting command
- Added the (standby) information to a node state in oarnodes when its state
is Absent and cm_availability != 0
- Changed the name of cm_availability to available_upto which is more relevant
- add a --maintenance option to oarnodesetting that sets the state of a resource
to Absent and its available_upto to 0 if maintenance is on and resets previous
values if maintenance is off.
- added a --signal option to oardel that allows a user to send a signal to one of
his jobs
- added a name field in the schema table that will refer to the OAR version name
- added a table containing scheduler name, script and description
- Bug #8559: Almighty: Moved OAREXEC_XXXX management code out of the queue for
immediate action, to prevent potential problems in case of scheduler timeouts.
- oarnodes, oarstat and the REST API no longer retry connecting to the
database in case of failure, but exit with an error instead. The retry behavior
is left for daemons.
- improved packaging (try to install files in more standard places)
- improved init script for Almighty (into deb and rpm packages)
- fixed performance issue on oarstat (array_id index missing)
- fixed performance issue (job_id index missing in event_log table)
- fixed a performance issue at job submission (optimized a query and added an
index on challenges table)
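The two resource requests quoted in the scheduler entry above have a simple shape: groups joined by "+", each group a "/"-separated hierarchy, optionally prefixed by a property block in braces. The sketch below only illustrates that shape. It is not OAR's own parser, the function name parse_resource_request is hypothetical, and it assumes property blocks contain no "+" and that no walltime suffix is present.

```python
import re

def parse_resource_request(request):
    """Split an 'oarsub -l' style string into groups of (level, count) pairs."""
    groups = []
    for part in request.split("+"):
        # An optional leading {...} block carries a property filter for the group.
        match = re.match(r"^\{([^}]*)\}/?(.*)$", part)
        if match:
            prop, hierarchy = match.group(1), match.group(2)
        else:
            prop, hierarchy = None, part
        levels = []
        for item in hierarchy.strip("/").split("/"):
            name, count = item.split("=")
            levels.append((name, int(count)))
        groups.append({"properties": prop, "levels": levels})
    return groups

plain = parse_resource_request("nodes=1/core=1+nodes=4/core=2")
separated = parse_resource_request("{prop=1}/nodes=1/core=1+{prop=2}/nodes=4/core=2")
print(plain)
print(separated)
```

Both strings parse into two groups with the same levels; only the second carries a property filter per group, which is what restores the separation between groups.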
version 2.3.5:
--------------
- Bug #8139: Drawgantt nil error (Add condition to test the presence of nil
value in resources table.)
- Bug #8416: When the automatic halt/wakeup feature is enabled then there
was a problem to determine idle nodes.
- Debug a mis-initialization of the Gantt with running jobs in the
metascheduler (concurrency access to PG database)
version 2.3.4:
--------------
- add the command "oar_resources_init" to easily detect and initialize the
whole resources of a cluster.
- "oaradmin version": now retrieve the most recent database schema number
- Fix rights on the "schema" table in postgresql.
- Bug #7509: fix bug in add_micheline_subjob for array jobs + jobtypes
- Ctrl-C was not working anymore in oarsub.
It seems that the signal handler does not handle the previous syntax
($SIG = 'qdel')
- Bug #7487: bad initialisation of the Gantt for the container jobs.
- Fix bug in oarsh with the "-l" option
- Bug #7634: test if the resource property value is effectively defined
otherwise print a ''
- Bug #7674: remove the computation of the scheduler_priority field for
besteffort jobs from the asynchronous OAR part. Now the value is set when
the jobs are turned into toLaunch state and in Error/Terminated.
- Bug #7691: add --array and --array-param-file options parsing into the
submitted script. Fix also some parsing errors.
- Bug #7962: enable resource property "cm_availability" to be manipulated by
the oarnodesetting command
version 2.3.3:
--------------
- Fix default admission rules: case-insensitive check for properties used in
oarsub
- Add new oaradmin subcommand: oaradmin conf. Useful to edit conf files and
keep changes in a Subversion repository.
- Kill correctly each taktuk command children in case of a timeout.
- New feature: array jobs (option --array) (on oarsub, oarstat oardel,
oarhold and oarresume) and file-based parametric array jobs
(oarsub --array-param-file)
/!\ in this version the DB scheme has changed. If you want to upgrade your
installation from a previous 2.3 release then you have to execute in your
database one of these SQL scripts (stop OAR before)::
mysql:
DB/mysql_structure_upgrade_2.3.1-2.3.3.sql
postgres:
DB/pg_structure_upgrade_2.3.1-2.3.3.sql
version 2.3.2:
--------------
- Change scheduler timeout implementation to schedule the maximum of jobs.
- Bug #5879: do not show initial_request in oarstat when it is not a job of
the user who launched the oarstat command (oar or root).
- Add a --event option to oarnodes and oarstat to display events recorded for
a job or node
- Display reserved resources for a validated waiting reservation, with a hint
in their state
- Fix oarproperty: property names are lowercase
- Fix OAR_JOB_PROPERTIES_FILE: do not display system properties
- Add a new user command: oarprint which allows pretty-printing the resource
properties of a job
- Debug temporary job UID feature
- Add 'kill -9' on subprocesses that reached a timeout (prevents Perl from
waiting on something)
- desktop computing feature is now available again. (ex: oarsub -t
desktop_computing date)
- Add versioning feature for admission rules with Subversion
version 2.3.1:
--------------
- Add new oarmonitor command. This will permit to monitor OAR jobs on compute
nodes.
- Remove sudo dependency and replace it by the commands "oardo" and
"oardodo".
- Add possibility to create a temporary user for each job on compute nodes.
So you can perform very strong restrictions for each job (ex: bandwidth
restrictions with iptables, memory management, ... everything that can be
handled with a user id)
- Debian packaging: Run OAR specific sshd with root privileges (under heavy
load, kernel may be more responsive for root processes...)
- Remove ALLOWED_NETWORKS tag in oar.conf (it added more complexity than
it solved problems)
- /!\ change database scheme for the field *exit_code* in the table *jobs*.
Now *oarstat* *exit_code* line reflects the right exit code of the user
passive job (before, even when the user script was not launched the
*exit_code* was 0 which was BAD)
- /!\ add DB field *initial_request* in the table *jobs* that stores the
oarsub line of the user
- Feature Request #4868: Add a parameter to specify what the "nodes" resource
is a synonym for. Network_address must be seen as an internal data and not
used.
- Scheduler: add timeout for each job == 1/4 of the remaining scheduler
timeout.
- Bug #4866: now the whole node is Suspected instead of just the part where
there is no job. So it is possible to have a job on Suspected nodes.
- Add job walltime (in seconds) in parameter of prologue and epilogue on
compute nodes.
- oarnodes does not show system properties anymore.
- New feature: container job type now allows to submit inner jobs for a
scheduling within the container job
- Monika refactoring and now in the oar packaging.
- Added a table schema in the db with the field version, representing the
version of the db schema.
- Added a field DB_PORT in the oar config file.
- Bug #5518: add right initialization of the job user name.
- Add new oaradmin command. This will permit to create resources and
manage admission rules more easily.
- Bug #5692: change source code into a right Perl 5.10 syntax.
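The per-job scheduler timeout mentioned in the list above (each job gets 1/4 of the remaining scheduler timeout) can be illustrated with a little arithmetic. This is a sketch under assumptions: the function name per_job_timeouts and the list of per-job scheduling costs are hypothetical, and a job that exceeds its budget is assumed to be cut off exactly when the budget is used up.

```python
def per_job_timeouts(scheduler_timeout, job_costs):
    """Give each job a budget of 1/4 of the time still left to the scheduler.

    job_costs holds hypothetical scheduling times in seconds; a job that
    needs more than its budget only consumes the budget.
    """
    remaining = float(scheduler_timeout)
    budgets = []
    for cost in job_costs:
        budget = remaining / 4
        budgets.append(budget)
        remaining -= min(cost, budget)
    return budgets

budgets = per_job_timeouts(32, [1, 100, 2])
print(budgets)
```

The budgets shrink as the remaining time is consumed, so a single slow job cannot use up the whole scheduler timeout.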
version 2.2.12:
---------------
- Bug #5239: fix the bug if there are spaces into job name or project
- Fix the bug in Iolib if DEAD_SWITCH_TIME >0
- Fix a bug in bipbip when calling the cpuset_manager to clean jobs in error
- Bug #5469: fix the bug with reservations and Dead resources
- Bug #5535: checks for reservations made at the same time were wrong.
- New feature: local checks on nodes can be plugged in the oarnodecheck
mechanism. Results can be asynchronously checked from the server (taktuk
ping checker)
- Add 2 new tables to keep track of the scheduling decisions
(gantt_jobs_predictions_log and gantt_jobs_resources_log). This will help
debugging scheduling troubles (see SCHEDULER_LOG_DECISIONS in oar.conf)
- Now reservations are scheduled only once (at submission time). Resources
allocated to a reservation are definitively set once the validation is
done and won't change in the scheduler's next pass.
- Fix DrawGantt to not display besteffort jobs in the future which is
meaningless.
version 2.2.11:
---------------
- Fix Debian package dependency on a CGI web server.
- Fix little bug: remove notification (scheduled start time) for Interactive
reservation.
- Fix bug in reservation: take care of the SCHEDULER_JOB_SECURITY_TIME for
reservations to check.
- Fix bug: add a lock around the section which creates and feeds the OAR
cpuset.
- Taktuk command line API has changed (we need taktuk >= 3.6).
- Fix extra ' in the name of output files when using a job name.
- Bug #4740: open the file in oarsub with user privileges (-S option)
- Bug #4787: check if the remote socket is defined (problem of timing with
nmap)
- Feature Request #4874: check system names when renaming properties
- DrawGantt can export charts to be reused to build a global multi-OAR view
(e.g. DrawGridGantt).
- Bug #4990: DrawGantt now uses the database localtime as its time reference.
version 2.2.10:
---------------
- Job dependencies: if a required job is not in the state Terminated with an
exit code == 0 then the schedulers refuse to schedule this job.
- Add the possibility to disable the halt command on nodes with
cm_availability value.
- Enhance oarsub "-S" option (more #OAR parsed).
- Add the possibility to use oarsh without configuring the CPUSETs (can be
useful for users that don't want to configure their ssh keys)
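The job dependency rule from the list above can be written as a small predicate. This is a minimal sketch, assuming jobs are represented as dictionaries with "state" and "exit_code" keys; that representation and the function name dependencies_satisfied are hypothetical.

```python
def dependencies_satisfied(required_jobs):
    """Return True only if every required job ended in Terminated with exit code 0."""
    return all(
        job["state"] == "Terminated" and job["exit_code"] == 0
        for job in required_jobs
    )

ok = dependencies_satisfied([{"state": "Terminated", "exit_code": 0}])
failed = dependencies_satisfied([
    {"state": "Terminated", "exit_code": 0},
    {"state": "Terminated", "exit_code": 1},
])
in_error = dependencies_satisfied([{"state": "Error", "exit_code": 0}])
print(ok, failed, in_error)
```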
version 2.2.9:
--------------
- Bug 4225: Dump only 1 data structure when using -X or -Y or -D.
- Bug fix in Finishing sequence (Suspect right nodes).
version 2.2.8:
--------------
- Bug 4159: remove unneeded Dump print from oarstat.
- Bug 4158: replace XML::Simple module by XML::Dumper one.
- Bug fix for reservation (recalculate the right walltime).
- Print job dependencies in oarstat.
version 2.2.7:
--------------
- Bug 4106: fix oarsh and oarcp issue with some options (erroneous leading
space).
- Bug 4125: remove exit_code data when it is not relevant.
- Fix potential bug when changing asynchronously the state of the jobs into
"Terminated" or "Error".
version 2.2.6:
--------------
- Bug fix: job types were not sent to cpuset manager script anymore.
(border effect from bug 4069 resolution)
version 2.2.5:
--------------
- Bug fix: remove user command when oar executes the epilogue script on the
nodes.
- Clean debug and mail messages format.
- Remove bad oarsub syntax from oarsub doc.
- Debug xauth path.
- bug 3995: set project correctly when resubmitting a job
- debug 'bash -c' on Fedora
- bug 4069: reservations with CPUSET_ERROR (remove bad hosts and continue
with a right integrity in the database)
- bug 4044: fix free resources query for reservation (get the nearest hole
from the beginning of the reservation)
- bug 4013: now Dead, Suspected and Absent resources have different colors in
drawgantt with a popup on them.
version 2.2.4:
--------------
- Redirect third party commands into oar.log (easier to debug).
- Add user info into drawgantt interface.
- Some bug fixes.
version 2.2.3:
--------------
- Debug prologue and epilogue when oarexec receives a signal.
version 2.2.2:
--------------
- Switch nice value of the user processes into 0 in oarsh_shell (in case of
sshd was launched with a different priority).
- debug taktuk zombies in pingchecker and oar_Tools
version 2.2.1:
--------------
- install the "allow_classic_ssh" feature by default
- debug DB installer
version 2.2:
------------
- oar_server_proepilogue.pl: can be used for server prologue and epilogue to
authorize users to access nodes that are completely allocated by OAR. If
the whole node is assigned then it kills all jobs from the user if all cpus
are assigned.
- the same thing can be done with cpuset_manager_PAM.pl as the script used to
configure the cpuset. More efficient if cpusets are configured.
- debug cm_availability feature to switch on and off nodes automatically
depending on waiting jobs.
- reservations now take care of cm_availability field
version 2.1.0:
--------------
- add "oarcp" command to help the users to copy files using oarsh.
- add sudo configuration to deal with bash. Now oarsub and oarsh have the
same behaviour as ssh (the bash configuration files are loaded correctly)
- bug fix in drawgantt (lost jobs after submission of a moldable one)
- add SCHEDULER_RESOURCES_ALWAYS_ASSIGNED_TYPE into oar.conf. Thus admin can
add some resources for each job (like a frontend node)
- add possibility to use taktuk to check the aliveness of the nodes
- %jobid% is now replaced in stdout and stderr file names by the effective
job id
- change interface to shut down or wake up nodes automatically (now the node
list is read on STDIN)
- add OARSUB_FORCE_JOB_KEY in oar.conf. It says to create a job ssh key by
default for each job.
- %jobid% is now replaced in the ssh job key name (oarsub -k ...).
- add NODE_FILE_DB_FIELD_DISTINCT_VALUES in oar.conf that enables the admin
to configure the generated content of the OAR_NODE_FILE
- change ssh job key oarsub options behaviour
- add options "--reinitialize" and "--delete-before" to the oaraccounting
command
- cpuset are now stored in /dev/cpuset/oar
- debian packaging: configure and launch a specific sshd for the user oar
- use a file descriptor to send the node list --> able to handle a very large
amount of nodes
- all config files are now in /etc/oar/
- oardel can add a besteffort type to jobs and vice versa
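The %jobid% substitution mentioned in the list above amounts to a plain string replacement. A minimal sketch, assuming a literal "%jobid%" marker; the function name expand_jobid and the file name pattern are hypothetical examples:

```python
def expand_jobid(template, job_id):
    """Replace every %jobid% marker with the effective job id."""
    return template.replace("%jobid%", str(job_id))

stdout_name = expand_jobid("run.%jobid%.stdout", 42)
unchanged = expand_jobid("run.stdout", 42)
print(stdout_name, unchanged)
```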
version 2.0.2:
--------------
- add warnings and exit code to oarnodesetting when there is a bad node name
or resource number
- change package version
- change default behaviour for the cpuset_manager.pl (more portable)
- enable a user to use the same ssh key for several jobs (at his own risk!)
- add node hostnames in oarstat -f
- add --accounting and -u options in oarstat
- bug fix on index fields in the database (syncro): bug 2020
- bug fix about server pro/epilogue: bug 2022
- change the default output of oarstat. Now it is usable: bug 1875
- remove keys in authorized_keys of oar (on the nodes) that do not
correspond to an active cpuset (clean after a reboot)
- reread oar.conf after each database connection tries
- add support for X11 forwarding in oarsub -I and -C
- debug mysql initialization script in debian package
- add a variable in oarsh for the default options of ssh to use
(more useful to change if the ssh version installed does not
handle one of these options)
- read oar.conf in oarsh (so admin can more easily change options in this
script)
- add support for X11 forwarding via oarsh
- change variable for oarsh: OARSH_JOB_ID --> OAR_JOB_ID
version 2.0.0:
--------------
- Now, with the ability to declare any type of resources like licences,
VLAN, IP range, computing resources must have the type *default* and a
network_address not null.
- Possibility to declare associated resources like licences, IP ranges, ...
and to reserve them like others.
- Now you can connect to your jobs (not only for reservations).
- Add "cosystem" job type (execute and do nothing for these jobs).
- New scheduler: "oar_sched_gantt_with_timesharing". You can specify jobs
with the type "timesharing" that indicates that this scheduler can launch
more than 1 job on a resource at a time. It is possible to restrict this
feature with words "user and name". For example, '-t
timesharing=user,name' indicates that only a job from the same user with
the same name can be launched at the same time as it.
- Add PostgreSQL support. So there is a choice to make between MySQL and
PostgreSQL.
- New approach for the scheduling: administrators have to insert into the
databases descriptions about resources and not nodes. Resources have a
network address (physical node) and properties. For example, if you have a
dual-processor node, then you can create 2 different resources with the same
network address but with 2 different processor names.
- The scheduler can now handle resource properties in a hierarchical
manner. Thus, for example, you can do "oarsub -l /switch=1/cpu=5" which
submits a job on 5 processors on the same switch.
- Add a signal handler in oarexec and propagate this signal to the user
process.
- Support '#OAR -p ...' options in user script.
- Add in oar.conf:
* DB_BASE_PASSWD_RO: for security issues, it is possible to execute
request with parts specified by users with a read only account (like
"-p" option).
* OARSUB_DEFAULT_RESOURCES: when nothing is specified with the oarsub
command then OAR takes this default resource description.
* OAREXEC_DEBUG_MODE: turn on or off debug mode in oarexec (create
/tmp/oar/oar.log on nodes).
* FINAUD_FREQUENCY: indicates the frequency at which OAR launches Finaud
(search dead nodes).
* SCHEDULER_TIMEOUT: indicates to the scheduler the amount of time
after which it must end itself.
* SCHEDULER_JOB_SECURITY_TIME: time between each job.
* DEAD_SWITCH_TIME: after this time Absent and Suspected resources are
turned to the Dead state.
* PROLOGUE_EPILOGUE_TIMEOUT: the possibility to specify a different
timeout for prologue and epilogue (PROLOGUE_EPILOGUE_TIMEOUT).
* PROLOGUE_EXEC_FILE: you can specify the path of the prologue script
executed on nodes.
* EPILOGUE_EXEC_FILE: you can specify the path of the epilogue script
executed on nodes.
* GENERIC_COMMAND: a specific script may be used instead of ping to
check aliveness of nodes. The script must return bad nodes on STDERR
(1 line for a bad node and it must have exactly the same name that
OAR has given in argument of the command).
* JOBDEL_SOFTWALLTIME: time after a normal frag that the system waits
to retry to frag the job.
* JOBDEL_WALLTIME: time after a normal frag that the system waits
before deleting the job arbitrarily and suspecting nodes.
* LOG_FILE: specify the path of OAR log file (default: /var/log/oar.log).
- Add wait() in pingchecker to avoid zombies.
- Better code modularization.
- Remove node install part to launch jobs. So it is easier to upgrade from
one version to another (oarnodesetting must already be installed on each
nodes if we want to use it).
- Users can specify a method to be notified (mail or script).
- Add cpuset support
- Add prologue and epilogue script to be executed on the OAR server before
and after launching a job.
- Add dependency support between jobs ("-a" option in oarsub).
- In oarsub you can specify the launching directory ("-d" option).
- In oarsub you can specify a job name ("-n" option).
- In oarsub you can specify stdout and stderr file names.
- User can resubmit a job (option "--resubmit" in oarsub).
- It is possible to specify a read only database account and it will be
used to evaluate SQL properties given by the user with the oarsub command
(more secure).
- Add possibility to order assigned resources with their properties by the
scheduler. So you can favor some resources over others
(SCHEDULER_RESOURCE_ORDER tag in oar.conf file)
- a command can be specified to switch off idle nodes
(SCHEDULER_NODE_MANAGER_SLEEP_CMD, SCHEDULER_NODE_MANAGER_IDLE_TIME,
SCHEDULER_NODE_MANAGER_SLEEP_TIME in oar.conf)
- a command can be specified to switch on nodes in the Absent state
according to the resource property *cm_availability* in the table
resources (SCHEDULER_NODE_MANAGER_WAKE_UP_CMD in oar.conf).
- if a job goes in Error state and this is not its fault then OAR will
resubmit this one.
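The "timesharing=user,name" restriction described in the list above can be sketched as a compatibility check between two jobs. This is an illustration under assumptions, not the scheduler's code: jobs are modelled as dictionaries, the "timesharing" key holds the set of restriction words given by the user, and the function name can_share is hypothetical.

```python
def can_share(job_a, job_b):
    """Two timesharing jobs may overlap only if each job's restrictions hold."""
    for job in (job_a, job_b):
        restrictions = job.get("timesharing")
        if restrictions is None:
            return False  # not a timesharing job at all
        if "user" in restrictions and job_a["user"] != job_b["user"]:
            return False
        if "name" in restrictions and job_a["name"] != job_b["name"]:
            return False
    return True

a = {"user": "alice", "name": "sim", "timesharing": {"user", "name"}}
b = {"user": "alice", "name": "sim", "timesharing": {"user", "name"}}
c = {"user": "alice", "name": "other", "timesharing": {"user", "name"}}
d = {"user": "alice", "name": "sim"}
print(can_share(a, b), can_share(a, c), can_share(a, d))
```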