// Copyright (c) 2018-2022, Sylabs Inc. All rights reserved.
// This software is licensed under a 3-clause BSD license. Please consult the
// LICENSE.md file distributed with the sources of this project regarding your
// rights to use or distribute this software.
// Includes code from https://github.com/containers/podman
// Released under the Apache License Version 2.0
package oci
import (
"bufio"
"context"
"encoding/json"
"fmt"
"os"
"os/exec"
"path"
"path/filepath"
"syscall"
"time"
"github.com/google/uuid"
"github.com/sylabs/singularity/v4/internal/pkg/buildcfg"
"github.com/sylabs/singularity/v4/internal/pkg/util/bin"
"github.com/sylabs/singularity/v4/pkg/sylog"
"golang.org/x/sys/unix"
)
type ociError struct {
Level string `json:"level,omitempty"`
Time string `json:"time,omitempty"`
Msg string `json:"msg,omitempty"`
}
// Create creates a container from an OCI bundle
func Create(containerID, bundlePath string, systemdCgroups bool) error {
conmon, err := bin.FindBin("conmon")
if err != nil {
return err
}
runtimeBin, err := Runtime()
if err != nil {
return err
}
// chdir to bundle and lock it, so another oci create cannot use the same bundle
absBundle, err := filepath.Abs(bundlePath)
if err != nil {
return fmt.Errorf("failed to determine bundle absolute path: %w", err)
}
if err := os.Chdir(absBundle); err != nil {
return fmt.Errorf("failed to change directory to %s: %w", absBundle, err)
}
if err := lockBundle(absBundle); err != nil {
return fmt.Errorf("while locking bundle: %w", err)
}
// Create our own state location for conmon and singularity related files
sd, err := stateDir(containerID)
if err != nil {
return fmt.Errorf("while computing state directory: %w", err)
}
err = os.MkdirAll(sd, 0o700)
if err != nil {
return fmt.Errorf("while creating state directory: %w", err)
}
containerUUID, err := uuid.NewRandom()
if err != nil {
return err
}
// Pipes for sync and start communication with conmon
syncFds, err := unix.Socketpair(unix.AF_LOCAL, unix.SOCK_SEQPACKET|unix.SOCK_CLOEXEC, 0)
if err != nil {
return fmt.Errorf("could not create sync socket pair: %w", err)
}
syncChild := os.NewFile(uintptr(syncFds[0]), "sync_child")
syncParent := os.NewFile(uintptr(syncFds[1]), "sync_parent")
defer syncParent.Close()
startFds, err := unix.Socketpair(unix.AF_LOCAL, unix.SOCK_SEQPACKET|unix.SOCK_CLOEXEC, 0)
if err != nil {
return fmt.Errorf("could not create start socket pair: %w", err)
}
startChild := os.NewFile(uintptr(startFds[0]), "start_child")
startParent := os.NewFile(uintptr(startFds[1]), "start_parent")
defer startParent.Close()
singularityBin := filepath.Join(buildcfg.BINDIR, "singularity")
rsd, err := runtimeStateDir()
if err != nil {
return err
}
cmdArgs := []string{
"--api-version", "1",
"--cid", containerID,
"--name", containerID,
"--cuuid", containerUUID.String(),
"--runtime", runtimeBin,
"--conmon-pidfile", path.Join(sd, conmonPidFile),
"--container-pidfile", path.Join(sd, containerPidFile),
"--log-path", path.Join(sd, containerLogFile),
"--runtime-arg", "--root",
"--runtime-arg", rsd,
"--runtime-arg", "--log",
"--runtime-arg", path.Join(sd, runcLogFile),
"--full-attach",
"--terminal",
"--bundle", absBundle,
"--exit-command", singularityBin,
"--exit-command-arg", "--debug",
"--exit-command-arg", "oci",
"--exit-command-arg", "cleanup",
"--exit-command-arg", containerID,
}
if systemdCgroups {
cmdArgs = append(cmdArgs, "--systemd-cgroup")
}
cmd := exec.Command(conmon, cmdArgs...)
cmd.Dir = absBundle
cmd.Env = append(cmd.Env, fmt.Sprintf("_OCI_SYNCPIPE=%d", 3), fmt.Sprintf("_OCI_STARTPIPE=%d", 4))
cmd.Stdin = os.Stdin
cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
cmd.SysProcAttr = &syscall.SysProcAttr{
Setpgid: true,
}
cmd.ExtraFiles = append(cmd.ExtraFiles, syncChild, startChild)
// Run conmon and close its end of the pipes in our parent process
sylog.Debugf("Starting conmon with args %v", cmdArgs)
if err := cmd.Start(); err != nil {
if err2 := releaseBundle(absBundle); err2 != nil {
sylog.Errorf("while releasing bundle: %v", err2)
}
return fmt.Errorf("while starting conmon: %w", err)
}
syncChild.Close()
startChild.Close()
// No other setup at present... just signal conmon to start work
writeConmonPipeData(startParent)
// After conmon receives from start pipe it should start container and exit
// without error.
err = cmd.Wait()
if err != nil {
if err2 := releaseBundle(absBundle); err2 != nil {
sylog.Errorf("while releasing bundle: %v", err2)
}
return fmt.Errorf("while starting conmon: %w", err)
}
// We check for errors from runc (which conmon invokes) via the sync pipe
pid, err := readConmonPipeData(syncParent, path.Join(sd, runcLogFile))
if err != nil {
if err2 := Delete(context.TODO(), containerID, systemdCgroups); err2 != nil {
sylog.Errorf("Removing container %s from runtime after creation failed: %v", containerID, err2)
}
return err
}
// Create a symlink from the state dir to the bundle, so it's easy to find later on.
bundleLink := path.Join(sd, "bundle")
if err := os.Symlink(absBundle, bundleLink); err != nil {
return fmt.Errorf("could not create bundle symlink: %w", err)
}
sylog.Infof("Container %s created with PID %d", containerID, pid)
return nil
}
// The following utility functions are taken from https://github.com/containers/podman
// Released under the Apache License Version 2.0
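// readConmonPipeData waits for conmon to report the container PID on the sync
// pipe. On failure it tries to return the error message from the runtime log
// at ociLog, if that path is set and the log can be parsed.
func readConmonPipeData(pipe *os.File, ociLog string) (int, error) {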
// syncInfo is used to return data from monitor process to daemon
type syncInfo struct {
Data int `json:"data"`
Message string `json:"message,omitempty"`
}
// Wait to get container pid from conmon
type syncStruct struct {
si *syncInfo
err error
}
// Buffered so the reader goroutine can exit even if we stop listening after a timeout.
ch := make(chan syncStruct, 1)
go func() {
var si *syncInfo
rdr := bufio.NewReader(pipe)
b, err := rdr.ReadBytes('\n')
if err != nil {
ch <- syncStruct{err: err}
// Stop here: a second send would have no receiver.
return
}
if err := json.Unmarshal(b, &si); err != nil {
ch <- syncStruct{err: err}
return
}
ch <- syncStruct{si: si}
}()
data := -1
select {
case ss := <-ch:
if ss.err != nil {
if ociLog != "" {
ociLogData, err := os.ReadFile(ociLog)
if err == nil {
var ociErr ociError
if err := json.Unmarshal(ociLogData, &ociErr); err == nil {
return -1, fmt.Errorf("runc error: %s", ociErr.Msg)
}
}
}
return -1, fmt.Errorf("container create failed (no logs from conmon): %w", ss.err)
}
sylog.Debugf("Received: %d", ss.si.Data)
if ss.si.Data < 0 {
if ociLog != "" {
ociLogData, err := os.ReadFile(ociLog)
if err == nil {
var ociErr ociError
if err := json.Unmarshal(ociLogData, &ociErr); err == nil {
return ss.si.Data, fmt.Errorf("runc error: %s", ociErr.Msg)
}
}
}
// If we failed to parse the JSON errors, then print the output as it is
if ss.si.Message != "" {
return ss.si.Data, fmt.Errorf("runc error: %s", ss.si.Message)
}
return ss.si.Data, fmt.Errorf("container creation failed")
}
data = ss.si.Data
case <-time.After(createTimeout):
return -1, fmt.Errorf("container creation timeout")
}
return data, nil
}
// writeConmonPipeData writes nonce data to a pipe
func writeConmonPipeData(pipe *os.File) error {
someData := []byte{0}
_, err := pipe.Write(someData)
return err
}