File: sequoia-git.md

---
title: Supply Chain Security for Version Control Systems
abbrev: Supply Chain Security for VCSs
docname: draft-nhw-openpgp-supply-chain-security-vcs-00
date: 2023-06-20
category: info
submissiontype: independent

ipr: trust200902
area: int
workgroup: openpgp
keyword: Internet-Draft

stand_alone: yes
pi: [toc, sortrefs, symrefs]

venue:
  group: "OpenPGP"
  type: "Working Group"
  mail: "openpgp@ietf.org"
  arch: "https://mailarchive.ietf.org/arch/browse/openpgp/"
  repo: "https://gitlab.com/sequoia-pgp/sequoia-git"
  latest: "https://sequoia-pgp.gitlab.io/sequoia-git/"

author:
 -
    ins: N.H. Walfield
    name: Neal H. Walfield
    org: Sequoia PGP
    email: neal@sequoia-pgp.org
 -
    ins: J. Winter
    name: Justus Winter
    org: Sequoia PGP
    email: justus@sequoia-pgp.org
normative:
  RFC2119:
  RFC4880:
  RFC8174:
  toml:
    author:
      -
        ins: T. Preston-Werner
        name: Tom Preston-Werner
      -
        ins: P. Gedam
        name: Pradyun Gedam
    title: TOML v1.0.0
    date: 2021-01-12
    target: https://toml.io/en/v1.0.0
informative:
  event-stream:
    author:
      -
        ins: T. Hunter
        name: Thomas Hunter II
    title: "Compromised npm Package: event-stream"
    date: 2018-11-27
    target: https://medium.com/intrinsic-blog/compromised-npm-package-event-stream-d47d08605502
  dependency-confusion:
    author:
      -
        ins: A. Birsan
        name: Alex Birsan
    title: "Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies"
    date: 2021-02-09
    target: https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610
  reflections-on-trusting-trust: DOI.10.1145/358198.358210
  guix:
    author:
      -
        ins: L. Courtès
        name: Ludovic Courtès
    title: Building a Secure Software Supply Chain with GNU Guix
    date: 2022-06
    doi: 10.48550/arXiv.2206.14606
    target: https://arxiv.org/abs/2206.14606
--- abstract

In a software supply chain attack, an attacker injects malicious code
into some software, which they then leverage to compromise systems
that depend on that software.  A simple example of a supply chain
attack is when SourceForge, a once popular open source software forge,
injected advertising into the binaries that they delivered on behalf
of the projects that they hosted.  Software supply chain attacks are
different from normal bugs in that the intent of the perpetrator is
different: in the former case, bugs are added with the intent to harm,
and in the latter they are added inadvertently, or due to negligence.

Software supply chain security starts on a developer's machine.  By
signing a commit or a tag, a developer can assert that they wrote or
approved the change.  This allows users of a code base to determine
whether a version has been approved, and by whom, and then make a
policy decision based on that information.  For instance, a packager
may require that software releases be signed with a particular
certificate.

Version control systems such as git have long included support for
signed commits and tags.  Most developers don't sign their commits,
and in the cases where they do, it is usually unclear what the
semantics are.

This document describes a set of semantics for signed commits and
tags, and a framework to work with them in a version control system,
in particular, in a git repository.  The framework is designed to be
self contained.  That is, given a repository, it is possible to add
changes, or authenticate a version without consulting any third
parties; all of the relevant information is stored in the repository
itself.

By publishing this draft, we hope to clarify and enrich the semantics
of signing in version control system repositories, thereby enabling a
new tooling ecosystem that can strengthen software supply chain
security.

--- middle

# Introduction

## Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as described
in BCP 14 {{RFC2119}} {{RFC8174}} when, and only when, they appear in
all capitals, as shown here.

## Terminology

  - "Maintainer" is a software developer, who is responsible for a
    software project in the sense that they act as a gatekeeper, and
    decide with other maintainers what changes are acceptable, and
    should be added to the software.

  - "Contributor" is someone who contributes changes to a software
    project.  Unlike a maintainer, a contributor cannot add their
    changes to a project on their own accord.

  - "Software supply chain" is the collection of software that
    something depends on.  For instance, a software package depends on
    libraries, it is built by a compiler, it is distributed by a
    package registry, etc.

  - "Software supply chain attack" is an attack in which an attacker
    compromises a software supply chain.  For instance, a maintainer
    or a contributor may stealthily insert malicious code into a
    software project in order to compromise the security of a system
    that depends on that software.

  - "Version control system" is a database, which contains versions of
    a software project.  Each version includes links to preceding
    versions.

  - "git" is a popular version control system.  Although "git" is
    distributed and does not rely on a central authority, it is often
    used with one to simplify collaboration.  Examples of centralized
    authorities include gitea, GitHub, and GitLab.

  - "Commit" is a version that is added to the "version control
    system".  In git, commits are identified by their message digest.

  - "Branch" is a typically human readable name given to a particular
    commit.  When a commit is superseded, the branch is updated to
    point to the new commit.  Repositories normally have at least one
    branch called "main" or "master" where most work is done.

  - "Tag" is a name given to a particular commit.  Tags are usually
    only added for significant versions like releases and are normally
    not changed once published.

  - "Change" is a commit or a tag.

  - "Forge" is a service which hosts software repositories, and often
    provides additional services like a bug tracker.  Examples of
    forges are codeberg, GitHub, and GitLab.

  - "Registry" or "Package Registry" is a service that provides an
    index of software packages.  Maintainers register their software
    there under a well-known name.  Build tools like `cargo` fetch
    dependencies by looking up the software by its name.

  - "Authentication" is the process of determining whether something
    should be considered authentic.

  - "Trust model" is a process for determining what evidence to
    consider, and how to weigh it when doing authentication.

  - "OpenPGP certificate" or just "certificate" is the data structure
    that section 11.2 of {{RFC4880}} defines as a "Transferable Public
    Key".  A certificate is sometimes called a key, but this is
    confusing, because a certificate contains components that are also
    called keys.

  - "Liveness" is a property of a certificate, a signature, etc.  An
    object is considered live with respect to some reference time if,
    as of the reference time, its creation time is in the past, and it
    has not expired.
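
The liveness definition can be sketched as a small predicate.  This is
only an illustration: the function name `is_live` and its parameters
are hypothetical, and treating an object as expired exactly at its
expiration time is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def is_live(created: datetime, expires: Optional[datetime], reference: datetime) -> bool:
    """Return True if an object is live at the reference time.

    `expires=None` models an object that never expires.
    """
    if created > reference:
        return False  # created after the reference time
    if expires is not None and expires <= reference:
        return False  # assumption: expired at or before the reference time
    return True


reference = datetime(2023, 6, 9, tzinfo=timezone.utc)
created = datetime(2023, 1, 1, tzinfo=timezone.utc)
print(is_live(created, None, reference))
```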

# Problem Statement

Consider the following scenario.  Alice and Bob are developers.  They
are the primary maintainers of the Xyzzy project, which is a free and
open source project.  Although they do most of the work on the
project, they also have occasional collaborators like Carol, and
drive-by contributions from people like Dave.  Paul packages their
software for an operating system distribution.  Ted from Ty Coon
Corporation integrates it into his company's software.  And, Mallory
is an adversary who is trying to subvert the project.

When someone updates their local copy of Xyzzy's source code
repository, they want to authenticate any changes before they use them.
That is, they want to know that each change was made or approved by
someone whom they consider authorized to make that change.

In the Xyzzy project, Alice is willing to rely on Bob to check-in
changes he makes, and to approve contributions from third parties
without auditing the code herself.  But, she doesn't want to rely on
anyone else without checking their proposed changes manually.  Bob
feels the same way about Alice.

In version control systems like `git`, the meta-data for a commit or
tag includes `author` and `committer` fields.  By themselves, these
fields cannot be used to reliably determine who a change's author and
committer are, because these fields are set by the committer and
unauthenticated.  That is, Mallory could author a commit, set both of
these fields to "Bob," and push the malicious commit.  No one would be
able to tell that they came from Mallory and not Bob.

There are two main ways to authenticate changes.  First, changes to a
repository or branch can be mediated by a trusted third party, which
enforces a policy at the time a change is added to the repository.
Second, individual changes can be signed, and a policy can be
evaluated at any time.  These two approaches can be mixed.

## Repositories Protected by a Trusted Third Party

When using a trusted third party, only certain users are allowed to
change the repository.  This is often realized using access control
lists: the trusted third party has a list of users who are allowed to
do certain types of modifications.  Before the trusted third party
allows a user to modify the repository, the user has to authenticate
themselves.  When they attempt to make a change, the trusted third
party checks that they are authorized.  If they are, the third party
allows the modification.  If not, it is rejected.  A user of this
repository can now conclude that if they can authenticate the trusted
third party, then the changes were approved.

A drawback of using a trusted third party is that it relies on
centralized infrastructure.  This means the only way for a user to
determine if a version of Xyzzy is authentic is to fetch it from the
trusted third party; the repository is not self authenticating.  If
the third party ever disappears, users will no longer be able to
authenticate the project's source code.

Another disadvantage is that this approach doesn't expose the
project's policy to its users.  This means that both first-parties
like Alice and third-parties like Paul are not able to audit the
trusted third party.  This is the case even if the set of users that
are currently authorized to make changes are exposed via a separate
API end point: because the set of authorized users changes with time,
all updates to the ACLs would need to be exposed along with
information about what user authorized each change.

## Self-Authenticating Repositories

An alternative approach is to have authors and committers sign their
changes.  Users then check that the changes are signed correctly, and
authenticate the signers.  For instance, for the Xyzzy project, Paul
might decide that Alice or Bob are allowed to make changes.  So when
Paul fetches changes, he checks whether Alice or Bob signed the new
changes, and flags changes made by anyone else.  If Alice and Bob
later decide that Carol should also be allowed to directly commit her
changes, Paul needs to update his policy.  If Bob leaves the team,
Paul needs to pay enough attention to notice, and then disallow
changes made by Bob after a certain date.

For projects that sign their commits today, this is more or less the
status quo.  Most users, however, do not want to maintain their own
policy, and aren't even in a good position to do so.  Since users are
willing to rely on the maintainers to make changes to the project,
they can just as well delegate the policy to them.  Now, a user like
Paul just needs to designate an initial policy.  If he knows when the
policy changes, and can authenticate changes to the policy based on
the existing policy, then he is able to authenticate any subsequent
changes to the repository.

An easy way to manage the policy is to include it in the repository
itself.  Then changes to the policy can be authenticated in the same
way as normal changes.  This also makes the repository self
authenticating, because it is self contained.

One issue is how users should handle forks of a project.  A fork in a
project may occur due to a social or technical conflict, or because
the project dies, and is later revived by a different party.  In both
cases, it may not be possible for there to be a clean hand off to the
new maintainer.  That is, Alice or Bob may not be willing or able to
change the policy file to allow Dave to seamlessly continue the
development of Xyzzy.

Forks are straightforward to handle, but require user intervention:
from the system's perspective, Dave is not authorized, so his changes
are rejected.  And that's good, as Dave may be an attacker; the system
can't tell.  Users opt in to a fork by changing their trust root to
designate a version in which Dave is authorized to make changes.

# Threat Model

Consider an attacker, Mallory, who is trying to compromise a user,
Ursula, by injecting a vulnerability into the software supply chain of
a piece of software, Super Frob, that she uses.  There are several
different ways that Mallory could accomplish this.  These include:

 - Mallory could pose as a contributor, and convince a developer to
   authorize a malicious change to one of Super Frob's dependencies,
   such as a library.

 - Mallory could take over an abandoned package that Super Frob
   depends on, and publish a new version with malicious code.

 - Mallory could use typo squatting to opportunistically or through
   social engineering inject malicious software into Super Frob's
   supply chain.

   For instance, Mallory could publish a library called `libevent`,
   which is a copy of `libevents`, but includes a malicious change,
   and Super Frob accidentally includes `libevent` as a dependency
   instead of `libevents`.

 - Mallory could publish a malicious package that has the same name as
   a package on another registry in order to confuse Super Frob's
   build tools.

   This type of attack is called a dependency confusion attack,
   {{dependency-confusion}}.  It can be launched when an organization
   uses an internal registry and a public registry to find
   dependencies.  As dependencies are often referenced by name, and
   that name does not include the registry, an attacker may trick the
   organization into using their malicious version of the package.

 - Mallory could sneak a change into one of Super Frob's build
   dependencies, like the compiler.

   Whereas software maintainers have a large degree of control over
   their direct dependencies, they have more limited control over the
   tools downstream users use to build their software.  In the
   extreme, a software project may include a copy of a dependency in
   their version control system, or depend on a specific version of a
   dependency by cryptographic hash, but only specify a standard that
   the compiler needs, like C99.

   This attack is best known from Ken Thompson's Turing Award
   lecture, Reflections on Trusting Trust,
   {{reflections-on-trusting-trust}}.

 - Mallory could compromise the tools that a developer uses, e.g., by
   publishing a useful, but malicious plug-in for an editor, which
   detects certain code patterns, and quietly modifies them to insert
   malicious code.

 - Mallory could compromise the systems that the developers use, and
   modify their source code repositories.

   For instance, if Mallory gets access to a developer's machine, he
   could stealthily modify code before it is signed and committed.  Or,
   he could exfiltrate the developer's signing key, or login
   credentials and impersonate the developer.  Similarly, if a software project uses
   a forge and Mallory is able to compromise the forge, he could
   modify the source code.

 - Mallory could compromise Super Frob or one of its dependencies as
   it is being downloaded.

   For instance, if a package registry like `crates.io` depends on a
   content delivery network (CDN) to distribute packages, a
   compromised node in the CDN may return a modified version of the
   software to the user.

The setting is as follows.  To protect herself from Mallory, Ursula
has to make sure that versions of the software she obtains do not
contain malicious code.  Ursula cannot afford to audit every version
of the software, but she is willing to rely on the maintainers of the
project to not add malicious code, and to review contributions from
third parties.

The framework presented in this specification allows Ursula to audit a
dependency and its developers once, and then to delegate decisions of
what code and dependencies to include to the developers.  Assuming the
developers are reliable, this can protect Ursula from attacks where
Mallory is not explicitly authorized to make a change.  For instance,
if the developers of an abandoned software package do not authorize a
new maintainer, Ursula will be warned when a package has a new
maintainer, as she can no longer authenticate it.  She can then
reaudit it.  Similarly, when the software is modified in transit by a
machine in the middle, Ursula will not be able to authenticate it.
This can also stop dependency confusion attacks, because the software
cannot be authenticated.  It won't, however, stop a downgrade attack,
as older versions can still be authenticated.

This framework cannot protect Ursula from mistakes that she or a
developer of the software that she depends on makes.  For instance, if
Mallory is able to convince a developer to authorize a malicious
change to their software, this framework considers the change to be
legitimate.  This framework can facilitate forensic analysis in these
cases by making it easier to identify changes approved by the same
person (potentially across different projects) and thereby conduct a
targeted audit.

# Authentication

This framework helps users authenticate three types of artifacts:
commits, tags, and tarballs or other archives.

## Policy

Every commit has an associated policy.  If a commit contains the file
`openpgp-policy.toml` in the root directory, then that file describes
the commit's policy.  If the commit does not contain that file, the
void policy is used.  The void policy rejects everything.

`openpgp-policy.toml` is a TOML v1.0.0 file {{toml}}.  Version 0
defines the following three top-level keys: `version`,
`authorization`, and `commit_goodlist`.

If a parser recognizes the version, but encounters keys that it does
not know, then it must ignore the unknown keys.  This allows a degree
of forwards compatibility.

### version

The value of the `version` key is an integer and must be `0`:

    version = 0

If the value of `version` is not recognized, the implementation SHOULD
error out.  It MAY instead treat the policy as the void policy.
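
A minimal sketch of the version check, taking the "error out" option.
The exception class and function name are hypothetical, and rejecting
a missing `version` key is an assumption; the text above only covers
unrecognized values.

```python
SUPPORTED_VERSIONS = {0}


class UnsupportedPolicyVersion(Exception):
    """Raised when a policy's version is missing or not recognized."""


def check_version(policy: dict) -> int:
    """Return the policy version, or raise if it is not recognized."""
    version = policy.get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise UnsupportedPolicyVersion(f"version must be an integer, got {version!r}")
    if version not in SUPPORTED_VERSIONS:
        raise UnsupportedPolicyVersion(f"unrecognized policy version {version}")
    return version


print(check_version({"version": 0}))
```

An implementation that prefers the second option would catch the
exception and fall back to the void policy instead.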

### authorization

`authorization` is a table of authorization entries.

Each key in the `authorization` table is a free-form identifier, which
is chosen by the user of the system.  The identifier SHOULD be a UTF-8
encoded, human-readable string that identifies an entity.  Examples of
identifiers are `alice`, `Bob <bob@example.org>`, `Boty McBotface
<bot@mcbotface.org>`.

The value of each authorization entry is another table.  The table has
the following entries:

 - `keyring`
 - `sign_commit`
 - `sign_tag`
 - `sign_archive`
 - `audit`
 - `add_user`
 - `retire_user`

#### keyring

The value of `keyring` is a string.  It contains one or more OpenPGP
certificates.  The OpenPGP certificates MUST be ASCII-armored.  An
ASCII-armored block MAY contain more than one OpenPGP certificate.
The string MAY contain multiple ASCII-armored blocks.

An implementation SHOULD ignore valid OpenPGP certificates that it
does not support, and MAY emit a warning that a certificate or
component is not supported.  An implementation SHOULD return an error
if it encounters something other than an OpenPGP certificate encoded
with ASCII armor.

When adding a certificate, an implementation SHOULD only add
components that are needed to validate the signatures.  That is, an
implementation SHOULD strip subkeys that are not signing capable, and
third-party signatures.  For components that are kept, an
implementation SHOULD include all known self signatures, and not just
the newest self signature.

#### sign_commit

The value of `sign_commit` is a boolean.  If `true`, then the entity
is authorized to sign commits.

#### sign_tag

The value of `sign_tag` is a boolean.  If `true`, then the entity is
authorized to sign tags.

#### sign_archive

The value of `sign_archive` is a boolean.  If `true`, then the entity
is authorized to sign tarballs or other archives.

#### audit

The value of `audit` is a boolean.  If `true`, then the entity is
authorized to add commits to the top-level `commit_goodlist` array.

#### add_user

The value of `add_user` is a boolean.  If `true`, then the entity is
authorized to add new entities to the authorization table, to grant
them any capabilities that they have, and to add new certificates to
any entity's keyring.

Note: no special capability is required to extend an existing
certificate.  For instance, an entity that has the `sign_commit`
capability can add new user IDs, new subkeys, and new signatures to
any existing certificate.  Adding new certificates requires the
`add_user` capability, and removing most packets from an existing
certificate requires the `retire_user` capability.
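
The delegation rule, that an entity can only pass on capabilities it
holds itself, amounts to a subset test.  The sketch below models
capabilities as sets of strings; the function name `may_grant` is
hypothetical.

```python
def may_grant(granter: set[str], granted: set[str]) -> bool:
    """Return True if `granter` may give `granted` to another entity.

    The granter needs `add_user`, and can only grant capabilities
    that it has.
    """
    return "add_user" in granter and granted <= granter


alice = {"sign_commit", "add_user"}
print(may_grant(alice, {"sign_commit"}))
print(may_grant(alice, {"sign_tag"}))
```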

#### retire_user

The value of `retire_user` is a boolean.  If `true`, then the entity
is authorized to retire capabilities from any entity.  This includes
capabilities that they do not have.  The entity is also authorized to
remove certificates, and to strip components and signatures from
existing certificates.

If an entity does not have the `retire_user` capability, it is still
possible for the entity to remove some packets.  The following algorithm
determines whether a change is allowed:

  - Ignore marker packets.
  - Ignore third-party certifications.  A third-party certification is a
    signature packet where none of the issuer packets and none of the
    issuer fingerprint packets alias the certificate's fingerprint.
  - Consider all of the remaining non-signature packets to be
    components.
  - Iterate over the packets in the certificate in the parent commit's
    policy in order.  For each signature create a tuple consisting of
    the signature and the preceding component.  Call the set of tuples
    `P`.
  - Repeat the previous step for the version of the certificate in the
    child commit's policy, but call the set of tuples `C`.
  - If `P`, the set of tuples derived from the version of the
    certificate in the parent policy, minus `C`, the set of tuples
    derived from the version of the certificate in the child policy, is
    not empty, then the update requires the `retire_user` right.

Note: This algorithm does not check signatures for cryptographic validity.
This means it is possible to handle signatures that use signature
versions and cryptographic algorithms that the implementation does not
support.

Changing a signature's associated component is only allowed if the entity
has the `retire_user` right.

An entity can always add new signatures.

Components are only considered in the context of a signature.  Consider
the following certificate:

  - Primary Key
  - Signature
  - User ID A
  - User ID B
  - Signature

Since the algorithm above would not create any tuples consisting of user
ID `A` and a signature, removing the user ID `A` packet does not require
the `retire_user` right.
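
The algorithm can be sketched over a simplified packet model.  The
`Packet` class below is a hypothetical stand-in, not an OpenPGP
parser: `body` represents the packet's serialized bytes, and a
signature is assumed to carry a single issuer fingerprint, whereas a
real signature may carry several issuer subpackets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    kind: str                     # "marker", "signature", or a component kind
    body: str                     # stand-in for the serialized packet
    issuer: Optional[str] = None  # signatures only: issuer's fingerprint


def signature_tuples(packets: list[Packet], fingerprint: str) -> set:
    """Pair each first-party signature with the component preceding it."""
    tuples = set()
    component = None
    for packet in packets:
        if packet.kind == "marker":
            continue  # marker packets are ignored
        if packet.kind == "signature":
            if packet.issuer != fingerprint:
                continue  # third-party certification: ignored
            tuples.add((packet, component))
        else:
            component = packet  # every other packet is a component
    return tuples


def needs_retire_user(parent: list[Packet], child: list[Packet], fingerprint: str) -> bool:
    """True if P minus C is not empty."""
    return bool(signature_tuples(parent, fingerprint) - signature_tuples(child, fingerprint))


FPR = "CERT-FPR"  # hypothetical fingerprint
key = Packet("key", "Primary Key")
uid_a = Packet("userid", "User ID A")
uid_b = Packet("userid", "User ID B")
sig_key = Packet("signature", "signature over the primary key", FPR)
sig_b = Packet("signature", "signature over User ID B", FPR)

parent = [key, sig_key, uid_a, uid_b, sig_b]
print(needs_retire_user(parent, [key, sig_key, uid_b, sig_b], FPR))  # User ID A removed
print(needs_retire_user(parent, [key, sig_key, uid_a, uid_b], FPR))  # a signature removed
```

With the example certificate above, removing user ID `A` leaves the
set of tuples unchanged, while removing the signature over user ID `B`
does not.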

#### Example

The following is an example of an authorization entry.  The user has
been granted all the capabilities.  The user is identified by two
different OpenPGP certificates.  The certificates are contained in two
concatenated ASCII armored blocks.

    [authorization."Neal H. Walfield <neal@pep.foundation>"]
    sign_commit = true
    sign_tag = true
    sign_archive = true
    add_user = true
    retire_user = true
    audit = true
    keyring = """
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    Comment: F717 3B3C 7C68 5CD9 ECC4  191B 74E4 45BA 0E15 C957
    Comment: Neal H. Walfield (Code Signing Key) <neal@pep.foundatio
    
    xjMEWhaZ2xYJKwYBBAHaRw8BAQdAinglS6SRXyMb51hMk+mpM4y0Uh0vcGcTyXa+
    ...
    =i3xd
    -----END PGP PUBLIC KEY BLOCK-----
    -----BEGIN PGP PUBLIC KEY BLOCK-----
    Comment: 8F17 7771 18A3 3DDA 9BA4  8E62 AACB 3243 6300 52D9
    Comment: Neal H. Walfield <neal@gnupg.org>
    Comment: Neal H. Walfield <neal@pep-project.org>
    Comment: Neal H. Walfield <neal@pep.foundation>
    Comment: Neal H. Walfield <neal@sequoia-pgp.org>
    Comment: Neal H. Walfield <neal@walfield.org>
    
    xsEhBFUjmukBDqCpmVI7Ve+2xTFSTG+mXMFHml63/Yai2nqxBk9gBfQfRFIjMt74
    =MESu
    -----END PGP PUBLIC KEY BLOCK-----
    """

### commit_goodlist

The value of `commit_goodlist` is an array of strings where each
string contains a commit identifier.  The commit identifier MUST be a
full hash.  The commit identifier MUST NOT be a branch name, a tag
name, or a truncated hash.

Commits listed in the `commit_goodlist` are commits that have
retroactively been marked as valid.  This may be useful when a
certificate's private key material has been compromised.
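
A sketch of the full-hash requirement follows.  It assumes that a
full hash is 40 lowercase hexadecimal digits (SHA-1 repositories) or
64 (SHA-256 repositories); the function name is hypothetical.

```python
import re

# Assumption: full git object names are 40 (SHA-1) or 64 (SHA-256)
# lowercase hexadecimal digits.
FULL_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def invalid_goodlist_entries(policy: dict) -> list:
    """Return the commit_goodlist entries that are not full hashes."""
    return [
        entry
        for entry in policy.get("commit_goodlist", [])
        if not (isinstance(entry, str) and FULL_HASH.fullmatch(entry))
    ]


policy = {"commit_goodlist": ["a" * 40, "main", "abc1234", "b" * 64]}
print(invalid_goodlist_entries(policy))
```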

## Authenticating Commits

Each commit in a `git` repository is part of a directed acyclic graph
(DAG) where a node is a commit, and a directed edge shows how two
commits are related.  Specifically, the head of a directed edge is a
commit that is derived from the tail.  Except for the root commits,
each commit has one or more parents.  A commit that has multiple
parents is derived from multiple commits.  Conceptually, it merges
multiple paths, and as such is called a merge commit.

A commit is considered authenticated if at least one of its parent
commits considers the commit to be authenticated.  This rule is
different from Guix's *authorization invariant* as described in
{{guix}}, which states that all parent commits must consider the
commit to be authenticated.  The semantics described here allow a
developer to add commits from unauthorized third-parties as-is using a
merge commit.  Using Guix's authorization invariant, the third party's
commit would have to be resigned, which loses the third-party's
signature, and consequently complicates forensic analysis.
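
The difference between the two rules comes down to `any` versus
`all`.  In this sketch, the callback `authenticates(parent, commit)`
is a hypothetical stand-in for evaluating the commit against one
parent's policy, and the commit names are made up.

```python
from typing import Callable, Iterable

Authenticates = Callable[[str, str], bool]  # (parent, commit) -> verdict


def authenticated_any(commit: str, parents: Iterable[str], authenticates: Authenticates) -> bool:
    """Rule described here: one approving parent is enough."""
    return any(authenticates(parent, commit) for parent in parents)


def authenticated_all(commit: str, parents: Iterable[str], authenticates: Authenticates) -> bool:
    """Guix-style authorization invariant: every parent must approve."""
    return all(authenticates(parent, commit) for parent in parents)


# Hypothetical merge commit signed by a maintainer.  The policy at
# "main-tip" authorizes the maintainer; the one at "third-party-tip"
# does not.
approvals = {("main-tip", "merge"): True, ("third-party-tip", "merge"): False}


def lookup(parent: str, commit: str) -> bool:
    return approvals[(parent, commit)]


parents = ["main-tip", "third-party-tip"]
print(authenticated_any("merge", parents, lookup))
print(authenticated_all("merge", parents, lookup))
```

Under the first rule the merge is accepted and the third party's
commit stays in the history with its original signature.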

A commit's parent authenticates it as follows.

First, the implementation looks up the signer's certificate in the
parent commit's policy file.  If the implementation finds a
certificate, it scans the commit's policy file for any updates to that
certificate (and only that certificate) except for revocations.  That
is, the implementation iterates over all of the certificates in the
commit's policy file, and looks for certificates with the same
fingerprint.  If it finds any, it merges them into the original
certificate with the exception of any revocation signatures.  In this
way, it is straightforward for a user to recover if the certificate in
the parent commit's policy file is no longer usable, e.g., because it
has expired, or the signing subkey has been replaced.  Consider a
parent commit whose policy file contains a certificate that
expires at time `t`.  After `t`, the certificate is unusable; it can't
be used to authenticate any commits made at or after `t`.  This
mechanism allows the user to easily add new commits by extending their
certificate's expiration, and adding the update to a new commit.
Revocation certificates are skipped so that it is possible for a user
to add a commit that revokes their own certificate, or a component
thereof.
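
The lookup and merge step can be sketched with a deliberately
simplified model in which a keyring maps a fingerprint to a set of
signatures.  The `Sig` class, the function name, and the fingerprints
are hypothetical; a real implementation merges OpenPGP packets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sig:
    body: str
    revocation: bool = False


def effective_certificate(signer: str,
                          parent_keyring: dict[str, set[Sig]],
                          commit_keyring: dict[str, set[Sig]]) -> Optional[set[Sig]]:
    """Return the signer's certificate as used to check the commit."""
    if signer not in parent_keyring:
        return None  # the parent's policy does not know the signer
    merged = set(parent_keyring[signer])
    # Merge updates for this certificate only, skipping revocations.
    for sig in commit_keyring.get(signer, ()):
        if not sig.revocation:
            merged.add(sig)
    return merged


old = Sig("binding signature, expires at t")
new = Sig("binding signature, extended expiration")
rev = Sig("revocation", revocation=True)

parent_keyring = {"FPR-A": {old}}
commit_keyring = {"FPR-A": {old, new, rev}, "FPR-B": {Sig("unrelated")}}
print(effective_certificate("FPR-A", parent_keyring, commit_keyring) == {old, new})
```

A certificate that first appears in the commit's own policy, such as
`FPR-B` here, is not picked up, because the lookup starts from the
parent's policy.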

The implementation SHOULD then canonicalize the certificate so that
the active self signatures are those that were active when the
signature was made.  A self signature is valid, if it is not revoked,
and not expired.  A self signature is active, if it is the most
recent, valid self signature prior to a reference time.  That is, if a
new commit was made on June 9, 2023, then each component's most recent
signature as of June 9, 2023, which is also not revoked, and not
expired, is considered that component's active self signature.
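
Selecting the active self signature can be sketched as follows.  The
`SelfSig` class is a hypothetical simplification, and the sketch
assumes that a signature created exactly at the reference time counts
and that expiration is judged as of the reference time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class SelfSig:
    created: datetime
    expires: Optional[datetime] = None  # None: never expires
    revoked: bool = False


def active_self_signature(sigs: Iterable[SelfSig], reference: datetime) -> Optional[SelfSig]:
    """Return the most recent valid self signature as of `reference`."""
    candidates = [
        sig for sig in sigs
        if sig.created <= reference
        and not sig.revoked
        and (sig.expires is None or sig.expires > reference)
    ]
    return max(candidates, key=lambda sig: sig.created, default=None)


first = SelfSig(datetime(2022, 1, 1))
second = SelfSig(datetime(2023, 1, 1), expires=datetime(2023, 5, 1))
third = SelfSig(datetime(2023, 7, 1))

commit_time = datetime(2023, 6, 9)
print(active_self_signature([first, second, third], commit_time) == first)
```

For a commit made on June 9, 2023, `second` has already expired and
`third` does not exist yet, so `first` is the active self signature.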

If the canonicalized certificate is valid as of the signature's time,
not expired as of that time, not soft revoked as of that time, not
hard revoked at any time, and the signature is correct, then the
signature is considered verified.  The implementation MAY consider
certificate updates from other sources.  If it does, it SHOULD only
consider hard revocations.
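
The conditions above can be summarized in a short sketch.
`CertStatus` and `signature_verified` are hypothetical names, and the
cryptographic check itself is passed in as a boolean rather than
computed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CertStatus:
    """Hypothetical summary of a canonicalized certificate."""
    valid_from: datetime
    expires: Optional[datetime]          # None: never expires
    soft_revoked_at: Optional[datetime]  # None: not soft revoked
    hard_revoked: bool                   # Applies at any time


def signature_verified(cert, sig_time, signature_correct):
    """Apply the verification conditions as of the signature's time."""
    if cert.hard_revoked:
        return False  # A hard revocation applies regardless of time.
    if sig_time < cert.valid_from:
        return False  # Not yet valid when the signature was made.
    if cert.expires is not None and sig_time >= cert.expires:
        return False  # Expired as of the signature's time.
    if cert.soft_revoked_at is not None and sig_time >= cert.soft_revoked_at:
        return False  # Soft revoked as of the signature's time.
    return signature_correct
```

Note the asymmetry: a soft revocation only affects signatures made at
or after the revocation, whereas a hard revocation invalidates
signatures made at any time.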

The implementation MUST then check that the type of change is
authorized by the policy.

The following capabilities allow the specified types of changes:

  - `sign_commit`: Needed for any change.
  - `add_user`: Needed to delegate a capability to another user.
    Updating `keyring` does not require this capability if a
    certificate is only updated, and not added.
  - `retire_user`: Needed to rescind a capability from another user.
  - `audit`: Needed to modify the `version` field, and the
    `commit_goodlist` list.

If the signature is considered verified, and the signer is authorized
to make the type of change that was made, then the commit is
considered authenticated.
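
A minimal sketch of the authorization check follows.  The capability
names come from the list above; the change labels
(`delegates_capability` and so on) and the function names are
hypothetical, and classifying a commit's changes is assumed to happen
elsewhere.

```python
def required_capabilities(change):
    """Map a set of change labels to the capabilities the signer needs.

    `change` is a set of hypothetical labels describing what the commit
    modified in the policy, if anything.
    """
    required = {"sign_commit"}  # Needed for any change.
    if "delegates_capability" in change:
        # Merely updating an existing certificate in `keyring` is not
        # a delegation and would not carry this label.
        required.add("add_user")
    if "rescinds_capability" in change:
        required.add("retire_user")
    if change & {"modifies_version", "modifies_commit_goodlist"}:
        required.add("audit")
    return required


def is_authenticated(signature_verified, signer_capabilities, change):
    """A commit is authenticated if its signature is verified and the
    signer holds every capability that the change requires."""
    return (signature_verified
            and required_capabilities(change) <= set(signer_capabilities))
```

For example, a signer with only `sign_commit` can change source files,
but a commit of theirs that edits `commit_goodlist` is rejected because
it additionally requires `audit`.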

If the commit is not considered authenticated, because the signer's
certificate has been hard revoked, but the commit is included in a
later commit's `commit_goodlist`, then the commit is considered to be
authenticated.

A commit is considered to occur later than the commit in question if,
when authenticating a range of commits, it is a direct descendant of
the commit in question, and it is in the commit range.  Consider the
three commits `a`, `b`, and `c` where `a` is `b`'s parent, `b` is
`c`'s parent, the certificate used to sign `b` has been hard revoked,
and `c` includes `b` in its `commit_goodlist`.  In this case, the hard
revocation for the certificate used to sign `b` is ignored.  All other
criteria, including the fact that the signature on `b` is valid, are
still checked.
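
The `a`, `b`, `c` example can be expressed as a small sketch.  The
function names and the `later_goodlists` mapping (from each later
commit in the range to its `commit_goodlist`) are hypothetical
simplifications.

```python
def goodlisted_by_later_commit(commit_id, later_goodlists):
    """Return True if any later commit in the range lists commit_id.

    `later_goodlists` maps each later commit in the range to the
    contents of its `commit_goodlist`.
    """
    return any(commit_id in goodlist
               for goodlist in later_goodlists.values())


def authenticated_with_goodlist(commit_id, other_checks_pass,
                                hard_revoked, later_goodlists):
    """Ignore a hard revocation only if the commit is goodlisted.

    `other_checks_pass` covers everything else, including that the
    signature itself is correct; a goodlist entry never overrides it.
    """
    if not other_checks_pass:
        return False
    if not hard_revoked:
        return True
    return goodlisted_by_later_commit(commit_id, later_goodlists)


# `c` is `b`'s child and includes `b` in its `commit_goodlist`.
later = {"c": ["b"]}
b_ok = authenticated_with_goodlist("b", True, True, later)
```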

## Authenticating Tags

A tag is a special type of commit in `git`, which has no content, but
assigns a name to a specific commit.  A tag is usually used to mark
release points.

A tag is authenticated in the same way as a commit, as described in
the previous section, with the following exceptions.

First, the tagged commit is considered a parent commit, and the tag is
considered its child commit.

The entity that signed the tag needs the `sign_tag` capability, and
only the `sign_tag` capability.

## Authenticating Archives

Archives like tarballs are often generated as part of a software
project's release process.  These may be signed.  To authenticate an archive
with respect to a signature, and a trust root, the trust root's policy
is used to authenticate the tarball's signature.  The entity that
signed the tarball must have the `sign_archive` capability.

Unlike a commit, an archive does not have a pointer to the commit that
it was derived from.  Thus, if an archive is derived from commit `c`,
it may be possible to authenticate commit `c`, as well as tags
referring to commit `c`, using a given trust root, but not an archive
derived from commit `c` using the same trust root, because the policy
changed in the meantime.

If the signature includes the notation
`commit@notations.sequoia-pgp.org`, then the value of the notation is
interpreted as the commit that the archive is derived from.  The value
of the notation is a hexadecimal value corresponding to the commit's
full hash.  Truncated hashes MUST be considered erroneous.  The commit
identifier MUST NOT be a branch name, a tag name, or a truncated hash.
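
A minimal sketch of validating the notation's value is shown below.
It assumes that a full hash is either 40 hexadecimal digits (SHA-1
repositories) or 64 hexadecimal digits (SHA-256 repositories);
`parse_commit_notation` is a hypothetical name, and accepting
uppercase digits is an assumption that the text above does not
address.

```python
import re

# Assumption: full commit hashes are 40 (SHA-1) or 64 (SHA-256) hex digits.
_FULL_HASH = re.compile(r"\A(?:[0-9a-fA-F]{40}|[0-9a-fA-F]{64})\Z")


def parse_commit_notation(value):
    """Return the commit hash, or raise ValueError if it is not a full hash.

    Branch names, tag names, and truncated hashes are all rejected.
    """
    if not _FULL_HASH.match(value):
        raise ValueError(f"not a full commit hash: {value!r}")
    return value.lower()
```

Rejecting anything other than a full hash means that the value never
has to be resolved against a repository's refs, which could differ
between repositories or change over time.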

Since archives are often verified outside of a repository, one or more
repositories may be specified using the
`repository@notations.sequoia-pgp.org` notation.  In that case, each
notation indicates a git repository.  For example, the main repository
of the reference implementation, `sq-git`, is
`https://gitlab.com/sequoia-pgp/sequoia-git.git`.  So, archives SHOULD
include the `repository@notations.sequoia-pgp.org` notation with
`https://gitlab.com/sequoia-pgp/sequoia-git.git` as the value.

When `commit@notations.sequoia-pgp.org` is present in the signature,
the implementation MUST use that commit's policy to authenticate the
archive, and then authenticate that commit by chaining back to the
trust root, as described above; in this case, it MUST NOT use the
trust root's policy directly unless the specified commit is also the
trust root.
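
The choice of policy can be sketched as follows.
`archive_policy_source` is a hypothetical helper, and `notations` is
assumed to be a mapping from notation names to values taken from the
archive's signature.

```python
COMMIT_NOTATION = "commit@notations.sequoia-pgp.org"


def archive_policy_source(notations, trust_root):
    """Decide which commit's policy authenticates the archive.

    Returns a pair `(commit, needs_chain)`.  `needs_chain` is True when
    the returned commit must itself be authenticated by chaining back
    to the trust root.
    """
    commit = notations.get(COMMIT_NOTATION)
    if commit is None or commit == trust_root:
        # No commit notation, or the commit is the trust root itself:
        # the trust root's policy is used directly.
        return trust_root, False
    return commit, True
```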

# Reference implementation

A Rust implementation of this specification is part of Sequoia.  See
https://gitlab.com/sequoia-pgp/sequoia-git for the source code.

# Security Concerns

## Malicious vs. Buggy Changes

The scheme presented here can help mitigate malicious attacks on a
code base, but it does nothing to prevent design flaws or code errors.
That is, this scheme does not and cannot provide any protections from
normal bugs.

## Trusted Developers

The protections outlined in this document are mainly designed to stop
third parties from adding malicious code to a project.  This system
provides no protection from a developer who is authorized to make
changes and turns out to be malicious.  That said, when malicious code
is discovered, an audit is required to restore trust in the code base.
Because commits are signed, this system makes it easier to identify
other code added by the same person, and to focus an audit on that
code.

## Judging Code vs. Judging Humans

The approach described in this document relies on transitive trust.
The basic idea is that if a user is willing to run a developer's code,
then they can reasonably rely on that developer to modify the code,
and to delegate that capability to a third party.

Yet, writing and reviewing code is fundamentally different from
evaluating another person's intents.  This is demonstrated quite well
by the events surrounding the popular `event-stream` npm package,
{{event-stream}}.  In 2018, a new developer gained the trust of the
package's maintainer by contributing a number of high-quality changes.
The original developer eventually made the new developer the
maintainer, and the new maintainer introduced malicious code to steal
users' credentials.

## Operational Security

Signing commits relies on each developer having a long-term identity
key, which they keep safe.  If the key is compromised, the attacker is
able to impersonate the developer.  It is possible to limit the damage
by revoking the compromised key, or having another authorized user
retire the developer's access.

In this regard, sigstore appears to be better as it relies on
ephemeral signing keys, which are issued by a central authority.
However, in order to obtain a signing key, the user needs to log in.
If they use a password, then an attacker who gets access to the
password can impersonate the developer.  If the developer uses a
second factor like a hardware token, then they are again using
public-key cryptography, and may as well put their private keys on a
hardware token, and forgo the centralized infrastructure.

## Dependencies

This specification has concentrated on enabling a user of a software
project to authenticate new versions.  But most software has its own
dependencies, and those also need to be authenticated.  A user could
identify all software that they are willing to rely on, but this is
more work than most users are willing and able to do.  But, just as
developers are usually in a better position to evaluate who should be
allowed to contribute to their project, they are also in a better
position to designate a trust root for their dependencies.

Enabling this functionality requires ecosystem-specific tooling.  The
developer needs to be able to specify a trust root for each
dependency, and the build infrastructure needs to authenticate the
dependencies.  For instance, the Rust ecosystem uses Cargo for
building and dependency management.  Currently, to add
`sequoia-openpgp` as a dependency to a project, a developer would
modify their `Cargo.toml` file as follows:

    [dependencies]
    sequoia-openpgp = { version = "1" }

Instead, they would also specify a trust root, which they've
presumably audited:

    [dependencies]
    sequoia-openpgp = { version = "1", trust-root = "HASH" }

When downloading the dependency, `cargo` would make sure that the
dependency can be authenticated from the specified trust root, and, if
not, report an error.

## Document History

This is a first draft that has not been published.

# Acknowledgments

My thanks go---in particular, but not only---to the Sequoia PGP team
for many fruitful discussions.  Funding for this project was provided
by the Sovereign Tech Fund.