CH_TEST_TAG=$ch_test_tag
load "${CHTEST_DIR}/common.bash"

setup () {
    scope full
    prerequisites_ok "$ch_tag"
    pmix_or_skip
    if [[ $srun_mpi != pmix* ]]; then
        skip 'pmix required'
    fi
}

# Print the rank count reported by rank 0 in the last "init ok" line of $1.
count_ranks () {
      echo "$1" \
    | grep -E '^0: init ok' \
    | tail -1 \
    | sed -r 's/^.+ ([0-9]+) ranks.+$/\1/'
}

@test "${ch_tag}/guest starts ranks" {
    openmpi_or_skip
    # shellcheck disable=SC2086
    run ch-run $ch_unslurm "$ch_img" -- mpirun $ch_mpirun_np /hello/hello
    echo "$output"
    [[ $status -eq 0 ]]
    rank_ct=$(count_ranks "$output")
    echo "found ${rank_ct} ranks, expected ${ch_cores_node}"
    [[ $rank_ct -eq "$ch_cores_node" ]]
    [[ $output = *'0: send/receive ok'* ]]
    [[ $output = *'0: finalize ok'* ]]
}

@test "${ch_tag}/inject cray mpi ($cray_prov)" {
    cray_ofi_or_skip "$ch_img"
    run ch-run "$ch_img" -- fi_info
    echo "$output"
    [[ $output == *"provider: $cray_prov"* ]]
    [[ $output == *"fabric: $cray_prov"* ]]
    [[ $status -eq 0 ]]
}

@test "${ch_tag}/validate $cray_prov injection" {
    [[ -n "$ch_cray" ]] || skip "host is not cray"
    [[ -n "$CH_TEST_OFI_PATH" ]] || skip "--fi-provider not set"
    # shellcheck disable=SC2086
    run $ch_mpirun_node ch-run --join "$ch_img" -- sh -c \
        "FI_PROVIDER=$cray_prov FI_LOG_LEVEL=info /hello/hello 2>&1"
    echo "$output"
    [[ $status -eq 0 ]]
    if [[ "$cray_prov" == gni ]]; then
        [[ "$output" == *' registering provider: gni'* ]]
        [[ "$output" == *'gni:'*'gnix_ep_nic_init()'*'Allocated new NIC for EP'* ]]
    fi
    if [[ "$cray_prov" == cxi ]]; then
        [[ "$output" == *'cxi:mr:ofi_'*'stats:'*'searches'*'deletes'*'hits'* ]]
    fi
}

@test "${ch_tag}/MPI version" {
    [[ -z $ch_cray ]] || skip 'serial launches unsupported on Cray'
    # shellcheck disable=SC2086
    run ch-run $ch_unslurm "$ch_img" -- /hello/hello
    echo "$output"
    [[ $status -eq 0 ]]
    if [[ $ch_mpi = openmpi ]]; then
        [[ $output = *'Open MPI'* ]]
    else
        [[ $ch_mpi = mpich ]]
        if [[ $ch_cray ]]; then
            [[ $output = *'CRAY MPICH'* ]]
        else
            [[ $output = *'MPICH Version:'* ]]
        fi
    fi
}

@test "${ch_tag}/empty stderr" {
    multiprocess_ok
    # Keep only stderr: stdout goes to /dev/null, stderr to the capture.
    # shellcheck disable=SC2086
    output=$($ch_mpirun_core ch-run --join "$ch_img" -- \
                             /hello/hello 2>&1 1>/dev/null)
    echo "$output"
    [[ -z "$output" ]]
}

@test "${ch_tag}/serial" {
    [[ -z $ch_cray ]] || skip 'serial launches unsupported on Cray'
    # This seems to start up the MPI infrastructure (daemons, etc.) within the
    # guest even though there's no mpirun.
    # shellcheck disable=SC2086
    run ch-run $ch_unslurm "$ch_img" -- /hello/hello
    echo "$output"
    [[ $status -eq 0 ]]
    [[ $output = *' 1 ranks'* ]]
    [[ $output = *'0: send/receive ok'* ]]
    [[ $output = *'0: finalize ok'* ]]
}

@test "${ch_tag}/host starts ranks" {
    multiprocess_ok
    echo "starting ranks with: ${ch_mpirun_core}"
    guest_mpi=$(ch-run "$ch_img" -- mpirun --version | head -1)
    echo "guest MPI: ${guest_mpi}"
    # shellcheck disable=SC2086
    run $ch_mpirun_core ch-run --join "$ch_img" -- /hello/hello 2>&1
    echo "$output"
    [[ $status -eq 0 ]]
    rank_ct=$(count_ranks "$output")
    echo "found ${rank_ct} ranks, expected ${ch_cores_total}"
    [[ $rank_ct -eq "$ch_cores_total" ]]
    [[ $output = *'0: send/receive ok'* ]]
    [[ $output = *'0: finalize ok'* ]]
}

@test "${ch_tag}/Cray bind mounts" {
    [[ $ch_cray ]] || skip 'host is not a Cray'
    ch-run "$ch_img" -- mount | grep -F /dev/hugepages
    if [[ $cray_prov == 'gni' ]]; then
        ch-run "$ch_img" -- mount | grep -F /var/opt/cray/alps/spool
    else
        ch-run "$ch_img" -- mount | grep -F /var/spool/slurmd
    fi
}

@test "${ch_tag}/revert image" {
    unpack_img_all_nodes "$ch_cray"
}