File: what_to_display.md

<!-- Copyright (C) 2024 THE PACKAGE'S COPYRIGHT HOLDER -->
# Determining which entries the user has already seen

## Historical perspective

Earlier versions of this program used the following approach to
determine which changelog or NEWS entries (hereafter "entries") are new
and should be displayed to the user:

- Group packages by source package.
- Keep track of the highest version number of any of the packages in
  the group and use that as the threshold for identifying new entries.
- Display any entries with version numbers not less than the
  previously determined version number.

This approach was based on two assumptions, neither of which is always
true:

1. Assume that the version numbering for all packages that come from
   the same source package is in the same series.
2. Assume that the version numbering of entries always matches the
   aforementioned version numbering.

For an example of where these assumptions break down, look at the
dmsetup package:

- The source package for dmsetup is lvm2.
- The version number for the dmsetup package is lower than the version
  number for the lvm2 package, but has a higher epoch.
- The entries in changelog.Debian.gz use lvm2 version numbers, while
  the ones in changelog.Debian.devmapper.gz use dmsetup version
  numbers.

This approach was also limited in that it only looked at
NEWS.Debian[.gz], changelog.Debian[.gz], changelog.Debian.arch[.gz],
and changelog[.gz]. For an example of where this fails, again look at
dmsetup, which has changelog.Debian.devmapper.gz.

Another technique used in earlier versions of this program was to
attempt heuristically to ignore version number suffixes which should
not be considered when evaluating whether a particular entry was new.
The employed heuristics were brittle, potentially leading to missed
entries or entries displayed multiple times.

## Current approach

The current approach continues to use version numbers to assist in
determining which entries to display to users. However, it does so
more cautiously and in a more limited way which dramatically reduces
the likelihood of failing to show the user entries that they should
see.

Specifically:

* Version numbers are only considered when looking at changelog files,
  not NEWS files.
* Version numbers are used independently per package, not grouped by
  source package.
* Version numbers are only used for packages that already have an
  earlier version installed when apt-listchanges is considering which
  changelog entries to display for a new version.
* The program must see both the new version's and the earlier
  version's version number in a changelog before it will decide to
  ignore the rest of the entries in that changelog.
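
The last rule in the list can be illustrated with a minimal Python
sketch. The function name `entries_to_display` and the plain list of
version strings are assumptions made for illustration, not the
program's actual interface, and versions are compared only for
equality here.

```python
def entries_to_display(versions, old_version, new_version):
    """Return the versions of the changelog entries that remain candidates.

    versions: entry versions in file order, newest entry first.
    Hypothetical helper; not part of apt-listchanges itself.
    """
    saw_new = False
    shown = []
    for version in versions:
        if version == new_version:
            saw_new = True
        if version == old_version and saw_new:
            # Both versions have been seen in this changelog, so this
            # entry and everything older can be ignored.
            break
        shown.append(version)
    return shown


print(entries_to_display(["1.3", "1.2", "1.1", "1.0"], "1.1", "1.3"))
print(entries_to_display(["9.3", "9.2", "1.1", "1.0"], "1.1", "1.3"))
```

In the second call the changelog never mentions the new version, as
can happen when a file uses a different version series, so nothing is
cut off and every entry remains a candidate for display.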

The final requirement above is only half-enforced if the package's
name matches one of a list of package patterns whose changelog version
numbers are trusted. In that case, we don't require seeing the new
version's version number before ignoring the rest; it's sufficient for
us to see a version number that is semantically less than the new
version. At the time this paragraph was written, the only pattern on
this list was `linux-image-*`, which is necessary because the
maintainers of the signed kernel packages do funky things with the
version numbers in their changelog files which prevent the full test
from working properly.
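
As a rough illustration of the pattern check only, the sketch below
uses shell-style matching from Python's `fnmatch` module. Whether the
program matches patterns this way is an assumption, and
`TRUSTED_PATTERNS` and `is_trusted` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Assumed representation of the list; the text names only this pattern.
TRUSTED_PATTERNS = ["linux-image-*"]


def is_trusted(package_name):
    """Return True if the package name matches any trusted pattern."""
    return any(fnmatchcase(package_name, pattern)
               for pattern in TRUSTED_PATTERNS)


print(is_trusted("linux-image-amd64"))
print(is_trusted("dmsetup"))
```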

In addition to version numbers, the current approach also uses
checksums of changelog and NEWS entries to determine which entries the
user has already seen and therefore does not need to see again.

For each entry, the program stores two checksums: a checksum of the
entire entry including its header (the line that contains the package
name, version number, suite, and urgency), content, and footer (the line
containing the maintainer and timestamp); and a second checksum of the
content and footer, with the header omitted.
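
A minimal sketch of the two checksums follows. The hash algorithm
(SHA-256 here), the way the three parts are joined, and the function
name `entry_checksums` are all assumptions for illustration; the text
above does not specify them.

```python
import hashlib


def entry_checksums(header, content, footer):
    """Return (full_checksum, content_footer_checksum) for one entry.

    Hypothetical helper: SHA-256 and newline joining are assumptions.
    """
    full_text = "\n".join([header, content, footer])
    body_text = "\n".join([content, footer])
    full = hashlib.sha256(full_text.encode("utf-8")).hexdigest()
    body = hashlib.sha256(body_text.encode("utf-8")).hexdigest()
    return full, body


content = "  * Fix a bug."
footer = " -- Jane Doe <jane@example.org>  Mon, 01 Jan 2024 12:00:00 +0000"
first = entry_checksums("foo (1.2-1) unstable; urgency=medium", content, footer)
second = entry_checksums("foo (1.2-1+b1) unstable; urgency=medium", content, footer)

# Only the header differs, so the full checksums differ while the
# content/footer checksums are identical.
print(first[0] == second[0], first[1] == second[1])
```

Keeping the second checksum lets the program recognize an entry whose
text has already been shown even when its header line has changed.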

Whenever the program sees an entry whose full checksum matches a
checksum already in the database, it stops parsing the NEWS or
changelog file at that point. Whenever the program sees an entry whose
content/footer checksum matches a checksum already in the database, it
omits that entry from what is displayed to the user but continues
parsing the file to see if any earlier entries should be displayed.
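
The stop-versus-skip behavior described above can be sketched as
follows. The tuple layout, the two sets standing in for the database,
and the name `select_entries` are assumptions for illustration; the
sketch also leaves out how new checksums are recorded.

```python
def select_entries(entries, seen_full, seen_body):
    """Return the texts of the entries that should be displayed.

    entries: (full_checksum, body_checksum, text) tuples in file order,
             newest entry first.
    seen_full, seen_body: sets of checksums already in the database.
    Hypothetical helper; not part of apt-listchanges itself.
    """
    shown = []
    for full, body, text in entries:
        if full in seen_full:
            # Exact entry already seen: stop parsing the file here.
            break
        if body in seen_body:
            # Same content under a different header: omit this entry
            # but keep looking at earlier ones.
            continue
        shown.append(text)
    return shown


entries = [
    ("f3", "b3", "new entry"),
    ("f2", "b2", "re-headed entry"),
    ("f1", "b1", "old entry"),
    ("f0", "b0", "older entry"),
]
print(select_entries(entries, {"f1"}, {"b2"}))
```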

Caveat: none of the logic above applies when `--since`, `--latest`, or
`--show-all` is specified.

The database used by the current approach is significantly larger than
the database required for the historical approach -- a few megabytes
vs. a few kilobytes -- but it is still relatively small, and we
consider this an acceptable amount of space to use for a significantly
better-performing algorithm.

Because this approach uses entry checksums, it could theoretically
handle entries from files like changelog.Debian.devmapper that the
historical approach ignored, though that functionality has not yet
been implemented.

### Edge case: no database, or no data for a package in the database

When the persistent database is not being used in a particular
invocation of the program, or when there is no data for a particular
package in the database, then the above approach requires
modification.

In this case, we read the files of the same package as it currently
exists on disk and calculate checksums for their entries to seed the
database before we parse the files in the new package.
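
A minimal sketch of the seeding step is shown below. It assumes that
"on disk" refers to the copy of the package that is already installed,
uses a plain dictionary in place of the real database, and stores a
single SHA-256 checksum per entry for brevity; `seed_if_missing` and
`checksum` are hypothetical names.

```python
import hashlib


def checksum(text):
    """Hash one entry's text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def seed_if_missing(db, package, installed_entries):
    """Seed db with checksums of the installed entries if needed.

    db: dict mapping package name -> set of checksums (assumed layout).
    Returns True if seeding happened, False if data already existed.
    """
    if package in db:
        return False
    db[package] = {checksum(entry) for entry in installed_entries}
    return True


db = {}
print(seed_if_missing(db, "dmsetup", ["entry one", "entry two"]))
print(seed_if_missing(db, "dmsetup", ["entry three"]))
```

After seeding, entries that were already present in the installed copy
are treated as seen, so only entries introduced by the new version
remain candidates for display.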