Package: git / 1:2.11.0-3+deb9u4

Metadata

Package  Version            Patches format
git      1:2.11.0-3+deb9u4  3.0 (quilt)

Patch series

Patch File delta Description
0001 pre rebase hook capture documentation in a here docum.diff | (download)

templates/hooks--pre-rebase.sample | 6 3 + 3 - 0 !
1 file changed, 3 insertions(+), 3 deletions(-)

 pre-rebase hook: capture documentation in a <<here document

Without this change, the sample hook does not pass a syntax check
(sh -n):

  $ sh -n hooks--pre-rebase.sample
  hooks--pre-rebase.sample: line 101: syntax error near unexpected token `('
  hooks--pre-rebase.sample: line 101: `   merged into it again (either directly or indirectly).'

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>

Normalize generated asciidoc timestamps with SOURCE_D.diff | (download)

Documentation/Makefile | 7 5 + 2 - 0 !
Documentation/technical/api-index.sh | 4 4 + 0 - 0 !
2 files changed, 9 insertions(+), 2 deletions(-)

 Normalize generated asciidoc timestamps with SOURCE_DATE_EPOCH

This is needed to pass the Debian build reproducibility test
(https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal).
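
A minimal sketch of the idea, written in Python rather than the Makefile and shell that the patch touches: when SOURCE_DATE_EPOCH is set, the date is derived from it in UTC instead of from the wall clock. The function name build_date and the output format are illustrative assumptions, not taken from the patch.

```python
import os
import time


def build_date(environ=None):
    """Return a YYYY-MM-DD build date, honoring SOURCE_DATE_EPOCH if set."""
    if environ is None:
        environ = os.environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is None:
        # No override: fall back to the current time (not reproducible).
        seconds = time.time()
    else:
        # Seconds since the Unix epoch, fixed by the build tooling.
        seconds = int(epoch)
    # Format in UTC so the builder's time zone does not leak into the output.
    return time.strftime("%Y-%m-%d", time.gmtime(seconds))
```

Two builds that share the same SOURCE_DATE_EPOCH value then embed the same date, regardless of when or where they run.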

Signed-off-by: Anders Kaseorg <andersk@mit.edu>

git gui Sort entries in optimized tclIndex.diff | (download)

git-gui/Makefile | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 git-gui: sort entries in optimized tclIndex

auto_mkindex expands wildcards in directory order, which depends on
the underlying filesystem.  To improve build reproducibility, sort the
list of *.tcl files in the Makefile.
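
The same principle can be sketched outside the Makefile. This Python example (the helper name tcl_sources is hypothetical) expands a wildcard and sorts the result, so that the order no longer depends on how the filesystem happens to return directory entries.

```python
import glob
import os


def tcl_sources(lib_dir):
    """List *.tcl file names in lib_dir in a filesystem-independent order."""
    paths = glob.glob(os.path.join(lib_dir, "*.tcl"))
    # glob returns entries in directory order; sort for a stable result.
    return sorted(os.path.basename(p) for p in paths)
```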

The unoptimized loading case was previously fixed in
v2.11.0-rc0~31^2^2~14 “git-gui: sort entries in tclIndex”.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>

xdiff Do not enable XDL_FAST_HASH by default.diff | (download)

Makefile | 1 0 + 1 - 0 !
config.mak.uname | 5 0 + 5 - 0 !
2 files changed, 6 deletions(-)

 xdiff: do not enable XDL_FAST_HASH by default

Although XDL_FAST_HASH computes hashes slightly faster on some
architectures, its collision characteristics are much worse, resulting

shell disallow repo names beginning with dash.patch | (download)

shell.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 shell: disallow repo names beginning with dash

When a remote server uses git-shell, the client side will
connect to it like:

  ssh server "git-upload-pack 'foo.git'"

and we literally exec ("git-upload-pack", "foo.git"). In
early versions of upload-pack and receive-pack, we took a
repository argument and nothing else. But over time they
learned to accept dashed options. If the user passes a
repository name that starts with a dash, the results are
confusing at best (we complain of a bogus option instead of
a non-existent repository) and malicious at worst (the user
can start an interactive pager via "--help").

We could pass "--" to the sub-process to make sure the
user's argument is interpreted as a repository name. I.e.:

  git-upload-pack -- -foo.git

But adding "--" automatically would make us inconsistent
with a normal shell (i.e., when git-shell is not in use),
where "-foo.git" would still be an error. For that case, the
client would have to specify the "--", but they can't do so
reliably, as existing versions of git-shell do not allow
more than a single argument.

The simplest thing is to simply disallow "-" at the start of
the repo name argument. This hasn't worked either with or
without git-shell since version 1.0.0, and nobody has
complained.

Note that this patch just applies to do_generic_cmd(), which
runs upload-pack, receive-pack, and upload-archive. There
are two other types of commands that git-shell runs:

  - do_cvs_cmd(), but this already restricts the argument to
    be the literal string "server"

  - admin-provided commands in the git-shell-commands
    directory. We'll pass along arbitrary arguments there,
    so these commands could have similar problems. But these
    commands might actually understand dashed arguments, so
    we cannot just block them here. It's up to the writer of
    the commands to make sure they are safe. With great
    power comes great responsibility.

Reported-by: Timo Schmid <tschmid@ernw.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

connect reject ssh hostname that begins with a dash.diff | (download)

connect.c | 3 3 + 0 - 0 !
1 file changed, 3 insertions(+)

 connect: reject ssh hostname that begins with a dash

When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc., and
then spawns the underlying "ssh" program by formulating argv[] array
that has:

 - the path to ssh command taken from GIT_SSH_COMMAND, etc.

 - dashed options like '-batch' (for Tortoise), '-p <port>' as
   needed.

 - ssh_host, which is supposed to be the hostname parsed out of
   $rest_of_URL.

 - then the command to be run on the other side, e.g. git
   upload-pack.

If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:

    { "ssh", "-p", "22", "-<anything>", "command", "to", "run", NULL }

which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.

Prevent this by forbidding ssh_host that begins with a "-".
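
A rough sketch of the argv construction and the new check, in Python rather than the C of connect.c. The function name build_ssh_argv, its parameters, and the error text are illustrative assumptions; only the shape of the array and the leading-dash rule come from the description above.

```python
def build_ssh_argv(ssh_host, command, port=None, ssh="ssh"):
    """Assemble the argument vector used to spawn ssh for a remote command."""
    # A host such as "-oProxyCommand=..." would be parsed by ssh as an
    # option rather than a hostname, so refuse it before building argv.
    if ssh_host.startswith("-"):
        raise ValueError("hostname %r looks like an option" % ssh_host)
    argv = [ssh]
    if port is not None:
        argv += ["-p", str(port)]
    argv.append(ssh_host)
    argv.append(command)
    return argv
```

Because the check runs before the array is built, no caller can end up passing a dashed "host" to the ssh program through this path.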

Noticed-by: Joern Schneeweisz of Recurity Labs
Reported-by: Brian at GitLab
Signed-off-by: Junio C Hamano <gitster@pobox.com>

t5813 add test for hostname starting with dash.diff | (download)

t/t5813-proto-disable-ssh.sh | 9 9 + 0 - 0 !
1 file changed, 9 insertions(+)

 t5813: add test for hostname starting with dash

Per the explanation in the previous patch, this should be
(and is) rejected.

Signed-off-by: Jeff King <peff@peff.net>

connect factor out looks like command line option che.diff | (download)

cache.h | 8 8 + 0 - 0 !
connect.c | 2 1 + 1 - 0 !
path.c | 5 5 + 0 - 0 !
3 files changed, 14 insertions(+), 1 deletion(-)

 connect: factor out "looks like command line option" check

We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.

Signed-off-by: Jeff King <peff@peff.net>

connect reject dashed arguments for proxy commands.diff | (download)

connect.c | 5 5 + 0 - 0 !
t/t5532-fetch-proxy.sh | 5 5 + 0 - 0 !
2 files changed, 10 insertions(+)

 connect: reject dashed arguments for proxy commands

If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).

We should err on the side of caution and reject this before
we even run the command.

The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.

It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:

  git fetch git://remote:-1234/repo.git

keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).

Signed-off-by: Jeff King <peff@peff.net>

connect reject paths that look like command line opti.diff | (download)

connect.c | 3 3 + 0 - 0 !
t/t5810-proto-disable-local.sh | 23 23 + 0 - 0 !
t/t5813-proto-disable-ssh.sh | 14 14 + 0 - 0 !
3 files changed, 40 insertions(+)

 connect: reject paths that look like command line options

If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.

You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "."  on the command line.

Signed-off-by: Jeff King <peff@peff.net>

cvsserver move safe_pipe_capture to the main package.diff | (download)

git-cvsserver.perl | 47 22 + 25 - 0 !
1 file changed, 22 insertions(+), 25 deletions(-)

 cvsserver: move safe_pipe_capture() to the main package

As a preparation for replacing `command` with a call to this
function from outside GITCVS::updater package, move it to the main
package.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsserver use safe_pipe_capture instead of backticks.diff | (download)

git-cvsserver.perl | 22 11 + 11 - 0 !
1 file changed, 11 insertions(+), 11 deletions(-)

 cvsserver: use safe_pipe_capture instead of backticks

This makes the script pass arguments that are derived from end-user
input in a safer way when invoking subcommands.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsserver use safe_pipe_capture for constant commands.diff | (download)

git-cvsserver.perl | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 cvsserver: use safe_pipe_capture for `constant commands` as well

This is not strictly necessary, but it is good code hygiene.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

shell drop git cvsserver support by default.diff | (download)

Documentation/git-shell.txt | 16 16 + 0 - 0 !
shell.c | 14 0 + 14 - 0 !
t/t9400-git-cvsserver-server.sh | 48 48 + 0 - 0 !
3 files changed, 64 insertions(+), 14 deletions(-)

 shell: drop git-cvsserver support by default

The git-cvsserver script is old and largely unmaintained
these days. But git-shell allows untrusted users to run it
out of the box, significantly increasing its attack surface.

Let's drop it from git-shell's list of internal handlers so
that it cannot be run by default.  This is not backwards
compatible. But given the age and development activity on
CVS-related parts of Git, this is likely to impact very few
users, while helping many more (i.e., anybody who runs
git-shell and had no intention of supporting CVS).

There's no configuration mechanism in git-shell for us to
add a boolean and flip it to "off". But there is a mechanism
for adding custom commands, and adding CVS support here is
fairly trivial. Let's document it to give guidance to
anybody who really is still running cvsserver.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archimport use safe_pipe_capture for user input.diff | (download)

git-archimport.perl | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 archimport: use safe_pipe_capture for user input

Refnames can contain shell metacharacters which need to be
passed verbatim to sub-processes. Using safe_pipe_capture
skips the shell entirely.
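
The scripts in question are Perl; as a language-neutral illustration, here is a Python sketch of what "skipping the shell" means. It borrows the name safe_pipe_capture, but the implementation is an assumption and not a translation of the Perl helper: the arguments are handed to the child process as a list, so metacharacters are never interpreted.

```python
import subprocess


def safe_pipe_capture(*argv):
    """Run argv directly (no shell) and return its standard output as text."""
    # Passing a list with the default shell=False means each element reaches
    # the child process verbatim, whatever characters it contains.
    result = subprocess.run(list(argv), stdout=subprocess.PIPE, check=True)
    return result.stdout.decode()
```

With backticks or a shell string, an argument such as `$(rm -rf ~)` would be expanded; here it is just an odd-looking string.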

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsimport shell quote variable used in backticks.diff | (download)

git-cvsimport.perl | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 cvsimport: shell-quote variable used in backticks

We run `git rev-parse` through the shell, and quote its
argument only with single-quotes. This prevents most
metacharacters from being a problem, but misses the obvious
case when $name itself has single-quotes in it. We can fix
this by applying the usual shell-quoting formula.
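
The "usual shell-quoting formula" is commonly written as: wrap the value in single quotes and replace every embedded single quote with '\''. A small Python sketch of that rule follows (the patch itself is Perl; the function name shell_quote is an illustrative choice).

```python
def shell_quote(value):
    """Quote value so a POSIX shell reads it back as one literal word."""
    # Close the quoted string, emit an escaped quote, then reopen it.
    return "'" + value.replace("'", "'\\''") + "'"
```

A name without quotes is simply wrapped, while a name containing a single quote can no longer terminate the quoted string early.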

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule config verify submodule names as paths.diff | (download)

builtin/submodule--helper.c | 26 25 + 1 - 0 !
git-submodule.sh | 5 5 + 0 - 0 !
submodule-config.c | 31 31 + 0 - 0 !
submodule-config.h | 7 7 + 0 - 0 !
t/t7415-submodule-names.sh | 77 77 + 0 - 0 !
5 files changed, 145 insertions(+), 1 deletion(-)

 submodule-config: verify submodule names as paths

commit 0383bbb9015898cbc79abd7b64316484d7713b44 upstream.

Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).

Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:

  1. What should the allowed syntax be?

     It's tempting to reuse verify_path(), since submodule
     names typically come from in-repo paths. But there are
     two reasons not to:

       a. It's technically more strict than what we need, as
          we really care only about breaking out of the
          $GIT_DIR/modules/ hierarchy.  E.g., having a
          submodule named "foo/.git" isn't actually
          dangerous, and it's possible that somebody has
          manually given such a funny name.

       b. Since we'll eventually use this checking logic in
          fsck to protect downstream repositories, it should
          be consistent across platforms. Because
          verify_path() relies on is_dir_sep(), it wouldn't
          block "foo\..\bar" on a non-Windows machine.

  2. Where should we enforce it? These days most of the
     .gitmodules reads go through submodule-config.c, so
     I've put it there in the reading step. That should
     cover all of the C code.

     We also construct the name for "git submodule add"
     inside the git-submodule.sh script. This is probably
     not a big deal for security since the name is coming
     from the user anyway, but it would be polite to remind
     them if the name they pick is invalid (and we need to
     expose the name-checker to the shell anyway for our
     test scripts).

     This patch issues a warning when reading .gitmodules
     and just ignores the related config entry completely.
     This will generally end up producing a sensible error,
     as it works the same as a .gitmodules file which is
     missing a submodule entry (so "submodule update" will
     barf, but "git clone --recurse-submodules" will print
     an error but not abort the clone).

     There is one minor oddity, which is that we print the
     warning once per malformed config key (since that's how
     the config subsystem gives us the entries). So in the
     new test, for example, the user would see three
     warnings. That's OK, since the intent is that this case
     should never come up outside of malicious repositories
     (and then it might even benefit the user to see the
     message multiple times).
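
The policy described in point 1 can be sketched as follows. This is a Python approximation, not the C code from submodule-config.c, and the boolean return convention is an assumption: a name is rejected if it is empty or if any component is "..", treating both "/" and "\" as separators on every platform.

```python
import re


def check_submodule_name(name):
    """Return True if name is safe to append to $GIT_DIR/modules."""
    if not name:
        return False
    # Split on both separators regardless of the host platform so that the
    # rule is identical everywhere, as the commit message requires.
    components = re.split(r"[/\\]", name)
    return ".." not in components
```

Under this rule "foo/.git" is accepted, as discussed in (a), while "../evil" and "foo\..\bar" are both rejected, as required by (b).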

Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.

[jn: the original patch expects 'git clone' to succeed in
 the test because v2.13.0-rc0~10^2~3 (clone: teach
 --recurse-submodules to optionally take a pathspec,
 2017-03-17) makes 'git clone' skip invalid submodules.
 Updated the test to pass in older Git versions where the
 submodule name check makes 'git clone' fail.]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit use a size_t for traversing string.diff | (download)

path.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 is_ntfs_dotgit: use a size_t for traversing string

commit 11a9f4d807a0d71dc6eff51bb87baf4ca2cccf1d upstream.

We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.

Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.
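
The wraparound arithmetic can be checked with a few lines of Python. The sketch below uses ctypes only to obtain the platform's SIZE_MAX and models the rest with modular arithmetic; the helper names are illustrative, and the 32-bit helper assumes a two's-complement int.

```python
import ctypes

# Assigning -1 to a size_t yields the largest representable value.
SIZE_MAX = ctypes.c_size_t(-1).value


def increment_size_t(value):
    """Mimic `len++` on a size_t: unsigned arithmetic wraps modulo SIZE_MAX + 1."""
    return (value + 1) % (SIZE_MAX + 1)


def as_int32(value):
    """Reinterpret value as a 32-bit two's-complement int."""
    return (value + 2**31) % 2**32 - 2**31
```

Incrementing SIZE_MAX gives 0, which is why the "-1 then len++" idiom keeps working, whereas a 32-bit signed counter that reaches 2**31 reads back as a negative index.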

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_hfs_dotgit match other .git files.diff | (download)

utf8.c | 58 46 + 12 - 0 !
utf8.h | 5 5 + 0 - 0 !
2 files changed, 51 insertions(+), 12 deletions(-)

 is_hfs_dotgit: match other .git files

commit 0fc333ba20b43a8afee5023e92cb3384ff4e59a6 upstream.

Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit match other .git files.diff | (download)

cache.h | 10 9 + 1 - 0 !
path.c | 84 84 + 0 - 0 !
2 files changed, 93 insertions(+), 1 deletion(-)

 is_ntfs_dotgit: match other .git files

commit e7cb0b4455c85b53aeba40f88ffddcf6d4002498 upstream.

When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to

is_ hfs ntfs _dotgitmodules add tests.diff | (download)

t/helper/test-path-utils.c | 20 20 + 0 - 0 !
t/t0060-path-utils.sh | 86 86 + 0 - 0 !
2 files changed, 106 insertions(+)

 is_{hfs,ntfs}_dotgitmodules: add tests

commit dc2d9ba3187fcd0ca8eeab9aa9ddef70cf8627a6 upstream.

This tests primarily for NTFS issues, but also adds one example of an
HFS+ issue.

Thanks go to Congyi Wu for coming up with the list of examples where
NTFS would possibly equate the filename with `.gitmodules`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

skip_prefix add case insensitive variant.diff | (download)

git-compat-util.h | 17 17 + 0 - 0 !
1 file changed, 17 insertions(+)

 skip_prefix: add case-insensitive variant

commit 41a80924aec0e94309786837b6f954a3b3f19b71 upstream.

We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.
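
For illustration, a Python sketch of a case-insensitive prefix skip. The C helper in git-compat-util.h has a different signature; returning the remainder or None is an assumption made for this example, as is the helper _ascii_lower, which restricts folding to ASCII letters.

```python
def _ascii_lower(text):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def skip_iprefix(text, prefix):
    """Return text without prefix if it matches ignoring ASCII case, else None."""
    head = text[:len(prefix)]
    if _ascii_lower(head) == _ascii_lower(prefix):
        return text[len(prefix):]
    return None
```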

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_path drop clever fallthrough.diff | (download)

read-cache.c | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 verify_path: drop clever fallthrough

commit e19e5e66d691bdeeeb5e0ed2ffcecdd7666b0d7b upstream.

We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_dotfile mention case insensitivity in comment.diff | (download)

read-cache.c | 5 4 + 1 - 0 !
1 file changed, 4 insertions(+), 1 deletion(-)

 verify_dotfile: mention case-insensitivity in comment

commit 641084b618ddbe099f0992161988c3e479ae848b upstream.

We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

update index stat updated files earlier.diff | (download)

builtin/update-index.c | 25 17 + 8 - 0 !
1 file changed, 17 insertions(+), 8 deletions(-)

 update-index: stat updated files earlier

commit eb12dd0c764d2b71bebd5ffffb7379a3835253ae upstream.

In update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.

This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_path disallow symlinks in .gitmodules.diff | (download)

apply.c | 4 2 + 2 - 0 !
builtin/update-index.c | 6 3 + 3 - 0 !
cache.h | 2 1 + 1 - 0 !
read-cache.c | 40 31 + 9 - 0 !
4 files changed, 37 insertions(+), 15 deletions(-)

 verify_path: disallow symlinks in .gitmodules

commit 10ecfa76491e4923988337b2e2243b05376b40de upstream.

There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:

  1. It won't be portable to systems without symlinks.

  2. It may behave inconsistently, since Git may look at
     this file in the index or a tree without bothering to
     resolve any symbolic links. We don't do this _yet_, but
     the config infrastructure is there and it's planned for
     the future.

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:

  a. A symlinked .gitmodules file may circumvent any fsck
     checks of the content.

  b. Git may read and write from the on-disk file without
     sanity checking the symlink target. So for example, if
     you link ".gitmodules" to "../oops" and run "git
     submodule add", we'll write to the file "oops" outside
     the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.

Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
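
A simplified Python sketch of the mode-dependent rule. It is not the verify_path() implementation: the function name, the decision to inspect every path component, and the plain case-insensitive comparison (ignoring the HFS+ and NTFS variants handled elsewhere in this series) are all assumptions for illustration.

```python
import stat


def path_allowed(path, mode):
    """Reject symlinks named .gitmodules; accept everything else."""
    if stat.S_ISLNK(mode):
        for component in path.split("/"):
            if component.lower() == ".gitmodules":
                return False
    # A mode of 0 (no filesystem information) is not a symlink, so the
    # extra check is skipped, matching the update_one() case above.
    return True
```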

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

sha1_file add read_loose_object function.diff | (download)

cache.h | 13 13 + 0 - 0 !
sha1_file.c | 132 130 + 2 - 0 !
2 files changed, 143 insertions(+), 2 deletions(-)

 sha1_file: add read_loose_object() function

commit f6371f9210418f1beabc85b097e2a3470aeeb54d upstream.

It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck parse loose object paths directly.diff | (download)

builtin/fsck.c | 46 33 + 13 - 0 !
t/t1450-fsck.sh | 16 16 + 0 - 0 !
2 files changed, 49 insertions(+), 13 deletions(-)

 fsck: parse loose object paths directly

commit c68b489e56431cf27f7719913ab09ddc62f95912 upstream.

When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.

In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.

The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

index pack make fsck error message more specific.diff | (download)

builtin/index-pack.c | 2 1 + 1 - 0 !
builtin/unpack-objects.c | 2 1 + 1 - 0 !
2 files changed, 2 insertions(+), 2 deletions(-)

 index-pack: make fsck error message more specific

commit db5a58c1bda5b20169b9958af1e8b05ddd178b01 upstream.

If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck simplify .git check.diff | (download)

fsck.c | 4 1 + 3 - 0 !
1 file changed, 1 insertion(+), 3 deletions(-)

 fsck: simplify ".git" check

commit ed9c3220621d634d543bc4dd998d12167dfc57d4 upstream.

There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck actually fsck blob data.diff | (download)

builtin/fsck.c | 51 25 + 26 - 0 !
fsck.c | 8 7 + 1 - 0 !
sha1_file.c | 2 1 + 1 - 0 !
3 files changed, 33 insertions(+), 28 deletions(-)

 fsck: actually fsck blob data

commit 7ac4f3a007e2567f9d2492806186aa063f9a08d6 upstream.

Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

  1. The fsck_object() function just returns success for any
     blob. Let's add a noop fsck_blob(), which we can fill in
     with actual logic later.

  2. The fsck_loose() function in builtin/fsck.c
     just threw away blob content after loading it. Let's
     hold onto it until after we've called fsck_object().

     The easiest way to do this is to just drop the
     parse_loose_object() helper entirely. Incidentally,
     this also fixes a memory leak: if we successfully
     loaded the object data but did not parse it, we would
     have left the function without freeing it.

  3. When fsck_loose() loads the object data, it
     does so with a custom read_loose_object() helper. This
     function streams any blobs, regardless of size, under
     the assumption that we're only checking the sha1.

     Instead, let's actually load blobs smaller than
     big_file_threshold, as the normal object-reading
     code-paths would do. This lets us fsck small files, and
     a NULL return is an indication that the blob was so big
     that it needed to be streamed, and we can pass that
     information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck detect gitmodules files.diff | (download)

fsck.c | 90 90 + 0 - 0 !
fsck.h | 7 7 + 0 - 0 !
2 files changed, 97 insertions(+)

 fsck: detect gitmodules files

commit 159e7b080bfa5d34559467cacaa79df89a01afc0 upstream.

In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.
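
The marking scheme can be sketched with a toy object store. Everything here is hypothetical (the dictionary layout, the field names, and the function gitmodules_blobs), and the sketch sidesteps the ordering problem by assuming all objects are already available and making two passes over them.

```python
def gitmodules_blobs(objects):
    """Return the ids of blobs found at .gitmodules in the root tree of a commit.

    objects maps an object id to a dict with a "type" key; commits also
    carry "tree" (the root tree id) and trees carry "entries" (name -> id).
    """
    # Pass 1: every tree referenced directly by a commit is a root tree.
    root_trees = {o["tree"] for o in objects.values() if o["type"] == "commit"}
    # Pass 2: a .gitmodules entry in a root tree marks its blob for checking.
    marked = set()
    for oid, obj in objects.items():
        if obj["type"] == "tree" and oid in root_trees:
            blob = obj["entries"].get(".gitmodules")
            if blob is not None:
                marked.add(blob)
    return marked
```

A .gitmodules entry in a tree that no commit uses as its root is deliberately ignored, since the file only has meaning at the top level.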

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

  1. git-fsck may show us the objects in arbitrary order
     (loose objects are fed in sha1 order, but we may also
     have multiple packs, and we process each pack fully in
     sequence).

  2. The type ordering is just what git-pack-objects happens
     to write now. The pack format does not require a
     specific order, and it's possible that future versions
     of Git (or a custom version trying to fool official

fsck check .gitmodules content.diff | (download)

fsck.c | 59 58 + 1 - 0 !
1 file changed, 58 insertions(+), 1 deletion(-)

 fsck: check .gitmodules content

commit ed8b10f631c9a71df3351d46187bf7f3fa4f9b7e upstream.

This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck call fsck_finish after fscking objects.diff | (download)

builtin/fsck.c | 3 3 + 0 - 0 !
t/t7415-submodule-names.sh | 4 4 + 0 - 0 !
2 files changed, 7 insertions(+)

 fsck: call fsck_finish() after fscking objects

commit 1995b5e03e1cc97116be58cdc0502d4a23547856 upstream.

Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.

With this, we can now catch the malicious case in t7415 with
git-fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
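
Illustration (not part of the patch): the "queue now, check in a
final pass" pattern described above can be sketched with a toy
model. Everything here is hypothetical: objects are plain
integers, and toy_fsck, toy_mark(), toy_finish() and toy_is_bad()
are invented names, not the real fsck.c interface.

```c
#include <stddef.h>

#define TOY_MAX 16

/* Hypothetical toy state: ids of blobs that still need a content check. */
struct toy_fsck {
    int queued[TOY_MAX];
    size_t nr;
};

/* Remember a blob for later, whatever order the objects arrive in. */
int toy_mark(struct toy_fsck *f, int blob_id)
{
    size_t i;

    for (i = 0; i < f->nr; i++)
        if (f->queued[i] == blob_id)
            return 0;           /* already queued */
    if (f->nr == TOY_MAX)
        return -1;              /* toy queue is full */
    f->queued[f->nr++] = blob_id;
    return 0;
}

/* Stand-in checker: pretend that negative ids are malicious blobs. */
int toy_is_bad(int blob_id)
{
    return blob_id < 0;
}

/*
 * Called once, after every object has been seen. Runs the checker on
 * each queued blob, empties the queue, and returns the number of
 * blobs the checker complained about.
 */
int toy_finish(struct toy_fsck *f, int (*check)(int blob_id))
{
    int bad = 0;
    size_t i;

    for (i = 0; i < f->nr; i++)
        if (check(f->queued[i]))
            bad++;
    f->nr = 0;
    return bad;
}
```

A caller that forgets the final toy_finish() call never looks at
the queued blobs, which is the gap this commit closes for git-fsck.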

unpack objects call fsck_finish after fscking objects.diff | (download)

builtin/unpack-objects.c | 5 4 + 1 - 0 !
t/t7415-submodule-names.sh | 7 7 + 0 - 0 !
2 files changed, 11 insertions(+), 1 deletion(-)

 unpack-objects: call fsck_finish() after fscking objects

commit 6e328d6caef218db320978e3e251009135d87d0e upstream.

As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.

This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.

This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

index pack check .gitmodules files with strict.diff | (download)

builtin/index-pack.c | 10 10 + 0 - 0 !
t/lib-pack.sh | 12 12 + 0 - 0 !
t/t7415-submodule-names.sh | 38 38 + 0 - 0 !
3 files changed, 60 insertions(+)

 index-pack: check .gitmodules files with --strict

commit 73c3f0f704a91b6792e0199a3f3ab6e3a1971675 upstream.

Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

  1. We currently don't bother calling fsck_object() at all
     for blobs, since it has traditionally been a no-op. We'd
     actually catch these blobs in fsck_finish() at the end,
     but it's more efficient to check them when we already
     have the object loaded in memory.

  2. The second pass done by fsck_finish() needs to access
     the objects, but we're actually indexing the pack in
     this process. In theory we could give the fsck code a
     special callback for accessing the in-pack data, but
     it's actually quite tricky:

       a. We don't have an internal efficient index mapping
          oids to packfile offsets. We only generate it on
          the fly as part of writing out the .idx file.

       b. We'd still have to reconstruct deltas, which means
          we'd basically have to replicate all of the
          reading logic in packfile.c.

     Instead, let's avoid running fsck_finish() until after
     we've written out the .idx file, and then just add it
     to our internal packed_git list.

     This does mean that the objects are "in the repository"
     before we finish our fsck checks. But unpack-objects
     already exhibits this same behavior, and it's an
     acceptable tradeoff here for the same reason: the
     quarantine mechanism means that pushes will be
     fully protected.

In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.

This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
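
Illustration (not part of the patch): a toy model of why the
"sneaky" reversed pack needs the final pass. It assumes two
per-object flags, "found" (a tree pointed at this blob as
.gitmodules) and "done" (its content was checked). The names
toy_state, toy_saw_tree_entry(), toy_saw_blob() and
toy_final_pass() are invented for this sketch.

```c
#define TOY_IDS 32

/* Hypothetical toy state; ids must be in the range 0..TOY_IDS-1. */
struct toy_state {
    unsigned char found[TOY_IDS];   /* a tree named this blob .gitmodules */
    unsigned char done[TOY_IDS];    /* blob content was already checked */
    int checks;                     /* number of content checks performed */
};

/* A tree entry called .gitmodules points at blob `id`. */
void toy_saw_tree_entry(struct toy_state *s, int id)
{
    s->found[id] = 1;
}

/* The blob passes by while it is in memory: check it only if marked. */
void toy_saw_blob(struct toy_state *s, int id)
{
    if (s->found[id] && !s->done[id]) {
        s->done[id] = 1;
        s->checks++;
    }
}

/*
 * Final pass: every marked blob that was not checked in passing has
 * to be loaded again. Returns how many blobs needed that reload.
 */
int toy_final_pass(struct toy_state *s)
{
    int reloaded = 0;
    int id;

    for (id = 0; id < TOY_IDS; id++) {
        if (s->found[id] && !s->done[id]) {
            s->done[id] = 1;
            s->checks++;
            reloaded++;
        }
    }
    return reloaded;
}
```

With the usual order (tree, then blob) the final pass has nothing
to reload. With the reversed order (blob, then tree) the blob is
not yet marked when it passes by, so only the final pass can
catch it.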

fsck complain when .gitmodules is a symlink.diff | (download)

fsck.c | 11 9 + 2 - 0 !
t/t7415-submodule-names.sh | 29 29 + 0 - 0 !
2 files changed, 38 insertions(+), 2 deletions(-)

 fsck: complain when .gitmodules is a symlink

commit b7b1fca175f1ed7933f361028c631b9ac86d868d upstream.

We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
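
Illustration (not part of the patch): in a tree object a symlink
entry carries mode 120000 (octal), while a regular file is 100644
or 100755. A minimal sketch of the test, assuming only the exact
name ".gitmodules" is of interest, might look like this.
gitmodules_is_symlink() and the TOY_ constants are invented names.

```c
#include <string.h>

#define TOY_S_IFMT  0170000u    /* mask for the file-type bits */
#define TOY_S_IFLNK 0120000u    /* file type: symbolic link */

/*
 * Hypothetical helper, not Git's API.
 * Returns 1 if the tree entry is a symlink named exactly ".gitmodules".
 */
int gitmodules_is_symlink(const char *name, unsigned int mode)
{
    return strcmp(name, ".gitmodules") == 0 &&
           (mode & TOY_S_IFMT) == TOY_S_IFLNK;
}
```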

submodule_init die cleanly on submodules without url .diff | (download)

builtin/submodule--helper.c | 6 3 + 3 - 0 !
t/t7400-submodule-basic.sh | 8 8 + 0 - 0 !
2 files changed, 11 insertions(+), 3 deletions(-)

 submodule_init: die cleanly on submodules without url defined

commit 627fde102515a7807dba89acaa88cb053b38a44a upstream.

When we init a submodule, we try to die when it has no URL
defined:

  url = xstrdup(sub->url);
  if (!url)
          die(...);

But that's clearly nonsense. xstrdup() will never return
NULL, and if sub->url is NULL, we'll segfault.

These two bits of code need to be flipped, so we check
sub->url before looking at it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
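
Illustration (not part of the patch): the flipped order can be
sketched in standalone C. toy_submodule and toy_copy_url() are
hypothetical stand-ins. The sketch returns NULL where the real
code would die, and uses malloc() in place of xstrdup().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the submodule record; only url matters here. */
struct toy_submodule {
    const char *url;
};

/*
 * Returns a heap copy of the url, or NULL if no url is configured
 * (or if allocation fails). The NULL test comes first, so the copy
 * step never dereferences a NULL pointer.
 */
char *toy_copy_url(const struct toy_submodule *sub)
{
    size_t len;
    char *copy;

    if (sub->url == NULL)
        return NULL;            /* caller reports the missing url */

    len = strlen(sub->url) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, sub->url, len);
    return copy;
}
```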

submodule helper use to signal end of clone options.diff | (download)

builtin/submodule--helper.c | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 submodule--helper: use "--" to signal end of clone options

commit 98afac7a7cefdca0d2c4917dd8066a59f7088265 upstream.

When we clone a submodule, we call "git clone $url $path".
But there's nothing to say that those components can't begin
with a dash themselves, confusing git-clone into thinking
they're options. Let's pass "--" to make it clear what we
expect.

There's no test here, because it's actually quite hard to
make these names work, even with "git clone" parsing them
correctly. And we're going to restrict these cases even
further in future commits. So we'll leave off testing until
then; this is just the minimal fix to prevent us from doing
something stupid with a badly formed entry.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
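
Illustration (not part of the patch): a simplified sketch of
building the argument vector with the separator in place. The
real helper passes more arguments than shown here, and
toy_clone_argv() is an invented name.

```c
#include <stddef.h>

/*
 * Hypothetical helper. Fills argv, which must have room for at least
 * 6 entries, and returns the number of arguments before the
 * terminating NULL.
 */
size_t toy_clone_argv(const char **argv, const char *url, const char *path)
{
    size_t n = 0;

    argv[n++] = "git";
    argv[n++] = "clone";
    argv[n++] = "--";   /* everything after this is positional */
    argv[n++] = url;    /* taken as the url even if it starts with '-' */
    argv[n++] = path;   /* taken as the path even if it starts with '-' */
    argv[n] = NULL;
    return n;
}
```

The separator only tells the option parser where options end; it
does not make a dash-prefixed url or path usable.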

submodule config ban submodule urls that start with d.diff | (download)

submodule-config.c | 8 8 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 34 34 + 0 - 0 !
2 files changed, 42 insertions(+)

 submodule-config: ban submodule urls that start with dash

commit f6adec4e329ef0e25e14c63b735a5956dc67b8bc upstream.

The previous commit taught the submodule code to invoke our
"git clone $url $path" with a "--" separator so that we
aren't confused by urls or paths that start with dashes.

However, that's just one code path. It's not clear if there
are others, and it would be an easy mistake to add one in
the future. Moreover, even with the fix in the previous
commit, it's quite hard to actually do anything useful with
such an entry. Any url starting with a dash must fall into
one of three categories:

 - it's meant as a file url, like "-path". But then any
   clone is not going to have the matching path, since it's
   by definition relative inside the newly created clone. If
   you spell it as "./-path", the submodule code sees the
   "/" and translates this to an absolute path, so it at
   least works (assuming the receiver has the same
   filesystem layout as you). But that trick does not apply
   for a bare "-path".

 - it's meant as an ssh url, like "-host:path". But this
   already doesn't work, as we explicitly disallow ssh
   hostnames that begin with a dash (to avoid option
   injection against ssh).

 - it's a remote-helper scheme, like "-scheme::data". This
   _could_ work if the receiver bends over backwards and
   creates a funny-named helper like "git-remote--scheme".
   But normally there would not be any helper that matches.

Since such a url does not work today and is not likely to do
anything useful in the future, let's simply disallow them
entirely. That protects the existing "git clone" path (in a
belt-and-suspenders way), along with any others that might
exist.

Our tests cover two cases:

  1. A file url with "./" continues to work, showing that
     there's an escape hatch for people with truly silly
     repo names.

  2. A url starting with "-" is rejected.

Note that we expect case (2) to fail, but it would have done
so even without this commit, for the reasons given above.
So instead of just expecting failure, let's also check for
the magic word "ignoring" on stderr. That lets us know that
we failed for the right reason.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
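
Illustration (not part of the patch): the ban boils down to a
first-character test. The sketch below assumes that a rejected
value is simply left unset. toy_looks_like_option() and
toy_set_url() are invented names, and the warning that the real
code prints is omitted.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if the value could be mistaken for an option. */
int toy_looks_like_option(const char *s)
{
    return s != NULL && s[0] == '-';
}

/*
 * Hypothetical helper. Stores value in *slot and returns 0, or
 * leaves *slot untouched and returns -1 if the value starts with
 * a dash.
 */
int toy_set_url(const char **slot, const char *value)
{
    if (toy_looks_like_option(value))
        return -1;
    *slot = value;
    return 0;
}
```

Note that "./-path" passes the test, which matches the escape
hatch covered by the first test case above.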

submodule config ban submodule paths that start with .diff | (download)

submodule-config.c | 2 2 + 0 - 0 !
t/t7417-submodule-path-url.sh | 20 20 + 0 - 0 !
2 files changed, 22 insertions(+)

 submodule-config: ban submodule paths that start with a dash

commit 273c61496f88c6495b886acb1041fe57965151da upstream.

We recently banned submodule urls that look like
command-line options. This is the matching change to ban
leading-dash paths.

As with the urls, this should not break any use cases that
currently work. Even with our "--" separator passed to
git-clone, git-submodule.sh gets confused. Without the code
portion of this patch, the clone of "-sub" added in t7417
would yield results like:

    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    Fetched in submodule path '-sub', but it did not contain b56243f8f4eb91b2f1f8109452e659f14dd3fbe4. Direct fetching of that commit failed.

Moreover, naively adding such a submodule doesn't work:

  $ git submodule add $url -sub
  The following path is ignored by one of your .gitignore files:
  -sub

even though there is no such ignore pattern (the test script
hacks around this with a well-placed "git mv").

Unlike leading-dash urls, though, it's possible that such a
path _could_ be useful if we eventually made it work. So
this commit should be seen not as recommending a particular
policy, but rather temporarily closing off a broken and
possibly dangerous code-path. We may revisit this decision
later.

fsck detect submodule urls starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 15 15 + 0 - 0 !
2 files changed, 22 insertions(+)

 fsck: detect submodule urls starting with dash

commit a124133e1e6ab5c7a9fef6d0e6bcb084e3455b46 upstream.

Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck detect submodule paths starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7417-submodule-path-url.sh | 8 8 + 0 - 0 !
2 files changed, 15 insertions(+)

 fsck: detect submodule paths starting with dash

commit 1a7fd1fb2998002da6e9ff2ee46e1bdd25ee8404 upstream.

As with urls, submodule paths with leading dashes are
ignored by git, but may end up confusing older versions.
Detecting them via fsck lets us prevent modern versions of
git from being a vector to spread broken .gitmodules to older
versions.

Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:

  1. While such paths provide confusing and broken results,
     they don't seem to actually work as option injections
     against anything except "cd". In particular, the
     submodule code seems to canonicalize to an absolute
     path before running "git clone" (so it passes
     /your/clone/-sub).

  2. It's more likely that we may one day make such names
     actually work correctly. Even after we revert this fsck
     check, it will continue to be a hassle until hosting
     servers are all updated.

On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).

So on balance, this is probably a good protection.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

cvsimport apply shell quoting regex globally.diff | (download)

git-cvsimport.perl | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 cvsimport: apply shell-quoting regex globally

commit 8c87bdfb2137c9e9e945df13e2f2e1eb995ddf83 upstream.

Commit 5b4efea666 (cvsimport: shell-quote variable used in
backticks, 2017-09-11) tried to shell-quote a variable, but
forgot to use the "/g" modifier to apply the quoting to the
whole variable. This means we'd miss any embedded
single-quotes after the first one.

Reported-by: <littlelailo@yahoo.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
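
Illustration (not part of the patch): the actual fix is a one-word
change to a Perl regex, but the point of quoting "globally" can be
shown in C. This sketch replaces every single quote, not only the
first, with the sequence '\'' so that the result is safe between
single quotes in a shell command. toy_sq_escape() is an invented
name.

```c
#include <stddef.h>

/*
 * Hypothetical helper. Copies src into dst, turning each ' into '\''.
 * Returns the length written (excluding the terminating NUL), or -1
 * if dst is too small.
 */
int toy_sq_escape(char *dst, size_t dstlen, const char *src)
{
    size_t n = 0;

    for (; *src; src++) {
        if (*src == '\'') {
            if (n + 4 >= dstlen)    /* 4 bytes plus room for NUL */
                return -1;
            dst[n++] = '\'';        /* close the current quoted run */
            dst[n++] = '\\';        /* backslash-escaped literal quote */
            dst[n++] = '\'';
            dst[n++] = '\'';        /* reopen the quoted run */
        } else {
            if (n + 1 >= dstlen)    /* 1 byte plus room for NUL */
                return -1;
            dst[n++] = *src;
        }
    }
    if (n >= dstlen)
        return -1;
    dst[n] = '\0';
    return (int)n;
}
```

A version that stops after the first match would leave the second
quote in a'b'c untouched, which is the gap the "/g" modifier
closes.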