Package: git / 1:2.11.0-3+deb9u7

Metadata

Package Version Patches format
git 1:2.11.0-3+deb9u7 3.0 (quilt)

Patch series

view the series file
Patch File delta Description
0001 pre rebase hook capture documentation in a here docum.diff | (download)

templates/hooks--pre-rebase.sample | 6 3 + 3 - 0 !
1 file changed, 3 insertions(+), 3 deletions(-)

 pre-rebase hook: capture documentation in a <<here document

Without this change, the sample hook does not pass a syntax check
(sh -n):

  $ sh -n hooks--pre-rebase.sample
  hooks--pre-rebase.sample: line 101: syntax error near unexpected token `('
  hooks--pre-rebase.sample: line 101: `   merged into it again (either directly or indirectly).'

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>

Normalize generated asciidoc timestamps with SOURCE_D.diff | (download)

Documentation/Makefile | 7 5 + 2 - 0 !
Documentation/technical/api-index.sh | 4 4 + 0 - 0 !
2 files changed, 9 insertions(+), 2 deletions(-)

 normalize generated asciidoc timestamps with source_date_epoch

This is needed to pass the Debian build reproducibility test
(https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal).

Signed-off-by: Anders Kaseorg <andersk@mit.edu>

git gui Sort entries in optimized tclIndex.diff | (download)

git-gui/Makefile | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 git-gui: sort entries in optimized tclindex
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

auto_mkindex expands wildcards in directory order, which depends on
the underlying filesystem.  To improve build reproducibility, sort the
list of *.tcl files in the Makefile.

The unoptimized loading case was previously fixed in
v2.11.0-rc0~31^2^2~14 “git-gui: sort entries in tclIndex”.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>

xdiff Do not enable XDL_FAST_HASH by default.diff | (download)

Makefile | 1 0 + 1 - 0 !
config.mak.uname | 5 0 + 5 - 0 !
2 files changed, 6 deletions(-)

 xdiff: do not enable xdl_fast_hash by default
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Although XDL_FAST_HASH computes hashes slightly faster on some
architectures, its collision characteristics are much worse, resulting
shell disallow repo names beginning with dash.patch | (download)

shell.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 [patch] shell: disallow repo names beginning with dash

When a remote server uses git-shell, the client side will
connect to it like:

  ssh server "git-upload-pack 'foo.git'"

and we literally exec ("git-upload-pack", "foo.git"). In
early versions of upload-pack and receive-pack, we took a
repository argument and nothing else. But over time they
learned to accept dashed options. If the user passes a
repository name that starts with a dash, the results are
confusing at best (we complain of a bogus option instead of
a non-existent repository) and malicious at worst (the user
can start an interactive pager via "--help").

We could pass "--" to the sub-process to make sure the
user's argument is interpreted as a branch name. I.e.:

  git-upload-pack -- -foo.git

But adding "--" automatically would make us inconsistent
with a normal shell (i.e., when git-shell is not in use),
where "-foo.git" would still be an error. For that case, the
client would have to specify the "--", but they can't do so
reliably, as existing versions of git-shell do not allow
more than a single argument.

The simplest thing is to simply disallow "-" at the start of
the repo name argument. This hasn't worked either with or
without git-shell since version 1.0.0, and nobody has
complained.

Note that this patch just applies to do_generic_cmd(), which
runs upload-pack, receive-pack, and upload-archive. There
are two other types of commands that git-shell runs:

  - do_cvs_cmd(), but this already restricts the argument to
    be the literal string "server"

  - admin-provided commands in the git-shell-commands
    directory. We'll pass along arbitrary arguments there,
    so these commands could have similar problems. But these
    commands might actually understand dashed arguments, so
    we cannot just block them here. It's up to the writer of
    the commands to make sure they are safe. With great
    power comes great responsibility.

Reported-by: Timo Schmid <tschmid@ernw.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

connect reject ssh hostname that begins with a dash.diff | (download)

connect.c | 3 3 + 0 - 0 !
1 file changed, 3 insertions(+)

 connect: reject ssh hostname that begins with a dash

When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc., and
then spawns the underlying "ssh" program by formulating argv[] array
that has:

 - the path to ssh command taken from GIT_SSH_COMMAND, etc.

 - dashed options like '-batch' (for Tortoise), '-p <port>' as
   needed.

 - ssh_host, which is supposed to be the hostname parsed out of
   $rest_of_URL.

 - then the command to be run on the other side, e.g. git
   upload-pack.

If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:

    { "ssh", "-p", "22", "-<anything>", "command", "to", "run", NULL }

which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.

Prevent this by forbidding ssh_host that begins with a "-".

Noticed-by: Joern Schneeweisz of Recurity Labs
Reported-by: Brian at GitLab
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t5813 add test for hostname starting with dash.diff | (download)

t/t5813-proto-disable-ssh.sh | 9 9 + 0 - 0 !
1 file changed, 9 insertions(+)

 t5813: add test for hostname starting with dash

Per the explanation in the previous patch, this should be
(and is) rejected.

Signed-off-by: Jeff King <peff@peff.net>
connect factor out looks like command line option che.diff | (download)

cache.h | 8 8 + 0 - 0 !
connect.c | 2 1 + 1 - 0 !
path.c | 5 5 + 0 - 0 !
3 files changed, 14 insertions(+), 1 deletion(-)

 connect: factor out "looks like command line option" check

We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.

Signed-off-by: Jeff King <peff@peff.net>
connect reject dashed arguments for proxy commands.diff | (download)

connect.c | 5 5 + 0 - 0 !
t/t5532-fetch-proxy.sh | 5 5 + 0 - 0 !
2 files changed, 10 insertions(+)

 connect: reject dashed arguments for proxy commands

If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).

We should err on the side of caution and reject this before
we even run the command.

The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.

It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:

  git fetch git://remote:-1234/repo.git

keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).

Signed-off-by: Jeff King <peff@peff.net>
connect reject paths that look like command line opti.diff | (download)

connect.c | 3 3 + 0 - 0 !
t/t5810-proto-disable-local.sh | 23 23 + 0 - 0 !
t/t5813-proto-disable-ssh.sh | 14 14 + 0 - 0 !
3 files changed, 40 insertions(+)

 connect: reject paths that look like command line options

If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.

You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "."  on the command line.

Signed-off-by: Jeff King <peff@peff.net>
cvsserver move safe_pipe_capture to the main package.diff | (download)

git-cvsserver.perl | 47 22 + 25 - 0 !
1 file changed, 22 insertions(+), 25 deletions(-)

 cvsserver: move safe_pipe_capture() to the main package

As a preparation for replacing `command` with a call to this
function from outside GITCVS::updater package, move it to the main
package.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsserver use safe_pipe_capture instead of backticks.diff | (download)

git-cvsserver.perl | 22 11 + 11 - 0 !
1 file changed, 11 insertions(+), 11 deletions(-)

 cvsserver: use safe_pipe_capture instead of backticks

This makes the script pass arguments that are derived from end-user
input in safer way when invoking subcommands.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsserver use safe_pipe_capture for constant commands.diff | (download)

git-cvsserver.perl | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 cvsserver: use safe_pipe_capture for `constant commands` as well

This is not strictly necessary, but it is a good code hygiene.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

shell drop git cvsserver support by default.diff | (download)

Documentation/git-shell.txt | 16 16 + 0 - 0 !
shell.c | 14 0 + 14 - 0 !
t/t9400-git-cvsserver-server.sh | 48 48 + 0 - 0 !
3 files changed, 64 insertions(+), 14 deletions(-)

 shell: drop git-cvsserver support by default

The git-cvsserver script is old and largely unmaintained
these days. But git-shell allows untrusted users to run it
out of the box, significantly increasing its attack surface.

Let's drop it from git-shell's list of internal handlers so
that it cannot be run by default.  This is not backwards
compatible. But given the age and development activity on
CVS-related parts of Git, this is likely to impact very few
users, while helping many more (i.e., anybody who runs
git-shell and had no intention of supporting CVS).

There's no configuration mechanism in git-shell for us to
add a boolean and flip it to "off". But there is a mechanism
for adding custom commands, and adding CVS support here is
fairly trivial. Let's document it to give guidance to
anybody who really is still running cvsserver.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

archimport use safe_pipe_capture for user input.diff | (download)

git-archimport.perl | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 archimport: use safe_pipe_capture for user input

Refnames can contain shell metacharacters which need to be
passed verbatim to sub-processes. Using safe_pipe_capture
skips the shell entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

cvsimport shell quote variable used in backticks.diff | (download)

git-cvsimport.perl | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 cvsimport: shell-quote variable used in backticks

We run `git rev-parse` though the shell, and quote its
argument only with single-quotes. This prevents most
metacharacters from being a problem, but misses the obvious
case when $name itself has single-quotes in it. We can fix
this by applying the usual shell-quoting formula.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

submodule config verify submodule names as paths.diff | (download)

builtin/submodule--helper.c | 26 25 + 1 - 0 !
git-submodule.sh | 5 5 + 0 - 0 !
submodule-config.c | 31 31 + 0 - 0 !
submodule-config.h | 7 7 + 0 - 0 !
t/t7415-submodule-names.sh | 77 77 + 0 - 0 !
5 files changed, 145 insertions(+), 1 deletion(-)

 submodule-config: verify submodule names as paths

commit 0383bbb9015898cbc79abd7b64316484d7713b44 upstream.

Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).

Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:

  1. What should the allowed syntax be?

     It's tempting to reuse verify_path(), since submodule
     names typically come from in-repo paths. But there are
     two reasons not to:

       a. It's technically more strict than what we need, as
          we really care only about breaking out of the
          $GIT_DIR/modules/ hierarchy.  E.g., having a
          submodule named "foo/.git" isn't actually
          dangerous, and it's possible that somebody has
          manually given such a funny name.

       b. Since we'll eventually use this checking logic in
          fsck to prevent downstream repositories, it should
          be consistent across platforms. Because
          verify_path() relies on is_dir_sep(), it wouldn't
          block "foo\..\bar" on a non-Windows machine.

  2. Where should we enforce it? These days most of the
     .gitmodules reads go through submodule-config.c, so
     I've put it there in the reading step. That should
     cover all of the C code.

     We also construct the name for "git submodule add"
     inside the git-submodule.sh script. This is probably
     not a big deal for security since the name is coming
     from the user anyway, but it would be polite to remind
     them if the name they pick is invalid (and we need to
     expose the name-checker to the shell anyway for our
     test scripts).

     This patch issues a warning when reading .gitmodules
     and just ignores the related config entry completely.
     This will generally end up producing a sensible error,
     as it works the same as a .gitmodules file which is
     missing a submodule entry (so "submodule update" will
     barf, but "git clone --recurse-submodules" will print
     an error but not abort the clone.

     There is one minor oddity, which is that we print the
     warning once per malformed config key (since that's how
     the config subsystem gives us the entries). So in the
     new test, for example, the user would see three
     warnings. That's OK, since the intent is that this case
     should never come up outside of malicious repositories
     (and then it might even benefit the user to see the
     message multiple times).

Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.

[jn: the original patch expects 'git clone' to succeed in
 the test because v2.13.0-rc0~10^2~3 (clone: teach
 --recurse-submodules to optionally take a pathspec,
 2017-03-17) makes 'git clone' skip invalid submodules.
 Updated the test to pass in older Git versions where the
 submodule name check makes 'git clone' fail.]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit use a size_t for traversing string.diff | (download)

path.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 is_ntfs_dotgit: use a size_t for traversing string

commit 11a9f4d807a0d71dc6eff51bb87baf4ca2cccf1d upstream.

We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.

Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_hfs_dotgit match other .git files.diff | (download)

utf8.c | 58 46 + 12 - 0 !
utf8.h | 5 5 + 0 - 0 !
2 files changed, 51 insertions(+), 12 deletions(-)

 is_hfs_dotgit: match other .git files

commit 0fc333ba20b43a8afee5023e92cb3384ff4e59a6 upstream.

Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit match other .git files.diff | (download)

cache.h | 10 9 + 1 - 0 !
path.c | 84 84 + 0 - 0 !
2 files changed, 93 insertions(+), 1 deletion(-)

 is_ntfs_dotgit: match other .git files

commit e7cb0b4455c85b53aeba40f88ffddcf6d4002498 upstream.

When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
is_ hfs ntfs _dotgitmodules add tests.diff | (download)

t/helper/test-path-utils.c | 20 20 + 0 - 0 !
t/t0060-path-utils.sh | 86 86 + 0 - 0 !
2 files changed, 106 insertions(+)

 is_{hfs,ntfs}_dotgitmodules: add tests

commit dc2d9ba3187fcd0ca8eeab9aa9ddef70cf8627a6 upstream.

This tests primarily for NTFS issues, but also adds one example of an
HFS+ issue.

Thanks go to Congyi Wu for coming up with the list of examples where
NTFS would possibly equate the filename with `.gitmodules`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

skip_prefix add case insensitive variant.diff | (download)

git-compat-util.h | 17 17 + 0 - 0 !
1 file changed, 17 insertions(+)

 skip_prefix: add case-insensitive variant

commit 41a80924aec0e94309786837b6f954a3b3f19b71 upstream.

We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_path drop clever fallthrough.diff | (download)

read-cache.c | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 verify_path: drop clever fallthrough

commit e19e5e66d691bdeeeb5e0ed2ffcecdd7666b0d7b upstream.

We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_dotfile mention case insensitivity in comment.diff | (download)

read-cache.c | 5 4 + 1 - 0 !
1 file changed, 4 insertions(+), 1 deletion(-)

 verify_dotfile: mention case-insensitivity in comment

commit 641084b618ddbe099f0992161988c3e479ae848b upstream.

We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

update index stat updated files earlier.diff | (download)

builtin/update-index.c | 25 17 + 8 - 0 !
1 file changed, 17 insertions(+), 8 deletions(-)

 update-index: stat updated files earlier

commit eb12dd0c764d2b71bebd5ffffb7379a3835253ae upstream.

In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.

This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

verify_path disallow symlinks in .gitmodules.diff | (download)

apply.c | 4 2 + 2 - 0 !
builtin/update-index.c | 6 3 + 3 - 0 !
cache.h | 2 1 + 1 - 0 !
read-cache.c | 40 31 + 9 - 0 !
4 files changed, 37 insertions(+), 15 deletions(-)

 verify_path: disallow symlinks in .gitmodules

commit 10ecfa76491e4923988337b2e2243b05376b40de upstream.

There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:

  1. It won't be portable to systems without symlinks.

  2. It may behave inconsistently, since Git may look at
     this file in the index or a tree without bothering to
     resolve any symbolic links. We don't do this _yet_, but
     the config infrastructure is there and it's planned for
     the future.

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:

  a. A symlinked .gitmodules file may circumvent any fsck
     checks of the content.

  b. Git may read and write from the on-disk file without
     sanity checking the symlink target. So for example, if
     you link ".gitmodules" to "../oops" and run "git
     submodule add", we'll write to the file "oops" outside
     the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.

Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

sha1_file add read_loose_object function.diff | (download)

cache.h | 13 13 + 0 - 0 !
sha1_file.c | 132 130 + 2 - 0 !
2 files changed, 143 insertions(+), 2 deletions(-)

 sha1_file: add read_loose_object() function

commit f6371f9210418f1beabc85b097e2a3470aeeb54d upstream.

It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck parse loose object paths directly.diff | (download)

builtin/fsck.c | 46 33 + 13 - 0 !
t/t1450-fsck.sh | 16 16 + 0 - 0 !
2 files changed, 49 insertions(+), 13 deletions(-)

 fsck: parse loose object paths directly

commit c68b489e56431cf27f7719913ab09ddc62f95912 upstream.

When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.

In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.

The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

index pack make fsck error message more specific.diff | (download)

builtin/index-pack.c | 2 1 + 1 - 0 !
builtin/unpack-objects.c | 2 1 + 1 - 0 !
2 files changed, 2 insertions(+), 2 deletions(-)

 index-pack: make fsck error message more specific

commit db5a58c1bda5b20169b9958af1e8b05ddd178b01 upstream.

If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck simplify .git check.diff | (download)

fsck.c | 4 1 + 3 - 0 !
1 file changed, 1 insertion(+), 3 deletions(-)

 fsck: simplify ".git" check

commit ed9c3220621d634d543bc4dd998d12167dfc57d4 upstream.

There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck actually fsck blob data.diff | (download)

builtin/fsck.c | 51 25 + 26 - 0 !
fsck.c | 8 7 + 1 - 0 !
sha1_file.c | 2 1 + 1 - 0 !
3 files changed, 33 insertions(+), 28 deletions(-)

 fsck: actually fsck blob data

commit 7ac4f3a007e2567f9d2492806186aa063f9a08d6 upstream.

Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

  1. The fsck_object() function just returns success for any
     blob. Let's a noop fsck_blob(), which we can fill in
     with actual logic later.

  2. The fsck_loose() function in builtin/fsck.c
     just threw away blob content after loading it. Let's
     hold onto it until after we've called fsck_object().

     The easiest way to do this is to just drop the
     parse_loose_object() helper entirely. Incidentally,
     this also fixes a memory leak: if we successfully
     loaded the object data but did not parse it, we would
     have left the function without freeing it.

  3. When fsck_loose() loads the object data, it
     does so with a custom read_loose_object() helper. This
     function streams any blobs, regardless of size, under
     the assumption that we're only checking the sha1.

     Instead, let's actually load blobs smaller than
     big_file_threshold, as the normal object-reading
     code-paths would do. This lets us fsck small files, and
     a NULL return is an indication that the blob was so big
     that it needed to be streamed, and we can pass that
     information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck detect gitmodules files.diff | (download)

fsck.c | 90 90 + 0 - 0 !
fsck.h | 7 7 + 0 - 0 !
2 files changed, 97 insertions(+)

 fsck: detect gitmodules files

commit 159e7b080bfa5d34559467cacaa79df89a01afc0 upstream.

In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

  1. git-fsck may show us the objects in arbitrary order
     (loose objects are fed in sha1 order, but we may also
     have multiple packs, and we process each pack fully in
     sequence).

  2. The type ordering is just what git-pack-objects happens
     to write now. The pack format does not require a
     specific order, and it's possible that future versions
     of Git (or a custom version trying to fool official
fsck check .gitmodules content.diff | (download)

fsck.c | 59 58 + 1 - 0 !
1 file changed, 58 insertions(+), 1 deletion(-)

 fsck: check .gitmodules content

commit ed8b10f631c9a71df3351d46187bf7f3fa4f9b7e upstream.

This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck call fsck_finish after fscking objects.diff | (download)

builtin/fsck.c | 3 3 + 0 - 0 !
t/t7415-submodule-names.sh | 4 4 + 0 - 0 !
2 files changed, 7 insertions(+)

 fsck: call fsck_finish() after fscking objects

commit 1995b5e03e1cc97116be58cdc0502d4a23547856 upstream.

Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.

With this, we can now catch the malicious case in t7415 with
git-fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

unpack objects call fsck_finish after fscking objects.diff | (download)

builtin/unpack-objects.c | 5 4 + 1 - 0 !
t/t7415-submodule-names.sh | 7 7 + 0 - 0 !
2 files changed, 11 insertions(+), 1 deletion(-)

 unpack-objects: call fsck_finish() after fscking objects

commit 6e328d6caef218db320978e3e251009135d87d0e upstream.

As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.

This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.

This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

index pack check .gitmodules files with strict.diff | (download)

builtin/index-pack.c | 10 10 + 0 - 0 !
t/lib-pack.sh | 12 12 + 0 - 0 !
t/t7415-submodule-names.sh | 38 38 + 0 - 0 !
3 files changed, 60 insertions(+)

 index-pack: check .gitmodules files with --strict

commit 73c3f0f704a91b6792e0199a3f3ab6e3a1971675 upstream.

Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

  1. We currently don't bother calling fsck_object() at all
     for blobs, since it has traditionally been a noop. We'd
     actually catch these blobs in fsck_finish() at the end,
     but it's more efficient to check them when we already
     have the object loaded in memory.

  2. The second pass done by fsck_finish() needs to access
     the objects, but we're actually indexing the pack in
     this process. In theory we could give the fsck code a
     special callback for accessing the in-pack data, but
     it's actually quite tricky:

       a. We don't have an internal efficient index mapping
	  oids to packfile offsets. We only generate it on
	  the fly as part of writing out the .idx file.

       b. We'd still have to reconstruct deltas, which means
          we'd basically have to replicate all of the
	  reading logic in packfile.c.

     Instead, let's avoid running fsck_finish() until after
     we've written out the .idx file, and then just add it
     to our internal packed_git list.

     This does mean that the objects are "in the repository"
     before we finish our fsck checks. But unpack-objects
     already exhibits this same behavior, and it's an
     acceptable tradeoff here for the same reason: the
     quarantine mechanism means that pushes will be
     fully protected.

In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.

This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck complain when .gitmodules is a symlink.diff | (download)

fsck.c | 11 9 + 2 - 0 !
t/t7415-submodule-names.sh | 29 29 + 0 - 0 !
2 files changed, 38 insertions(+), 2 deletions(-)

 fsck: complain when .gitmodules is a symlink

commit b7b1fca175f1ed7933f361028c631b9ac86d868d upstream.

We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule_init die cleanly on submodules without url .diff | (download)

builtin/submodule--helper.c | 6 3 + 3 - 0 !
t/t7400-submodule-basic.sh | 8 8 + 0 - 0 !
2 files changed, 11 insertions(+), 3 deletions(-)

 submodule_init: die cleanly on submodules without url defined

commit 627fde102515a7807dba89acaa88cb053b38a44a upstream.

When we init a submodule, we try to die when it has no URL
defined:

  url = xstrdup(sub->url);
  if (!url)
	  die(...);

But that's clearly nonsense. xstrdup() will never return
NULL, and if sub->url is NULL, we'll segfault.

These two bits of code need to be flipped, so we check
sub->url before looking at it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule helper use to signal end of clone options.diff | (download)

builtin/submodule--helper.c | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 submodule--helper: use "--" to signal end of clone options

commit 98afac7a7cefdca0d2c4917dd8066a59f7088265 upstream.

When we clone a submodule, we call "git clone $url $path".
But there's nothing to say that those components can't begin
with a dash themselves, confusing git-clone into thinking
they're options. Let's pass "--" to make it clear what we
expect.

There's no test here, because it's actually quite hard to
make these names work, even with "git clone" parsing them
correctly. And we're going to restrict these cases even
further in future commits. So we'll leave off testing until
then; this is just the minimal fix to prevent us from doing
something stupid with a badly formed entry.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule config ban submodule urls that start with d.diff | (download)

submodule-config.c | 8 8 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 34 34 + 0 - 0 !
2 files changed, 42 insertions(+)

 submodule-config: ban submodule urls that start with dash

commit f6adec4e329ef0e25e14c63b735a5956dc67b8bc upstream.

The previous commit taught the submodule code to invoke our
"git clone $url $path" with a "--" separator so that we
aren't confused by urls or paths that start with dashes.

However, that's just one code path. It's not clear if there
are others, and it would be an easy mistake to add one in
the future. Moreover, even with the fix in the previous
commit, it's quite hard to actually do anything useful with
such an entry. Any url starting with a dash must fall into
one of three categories:

 - it's meant as a file url, like "-path". But then any
   clone is not going to have the matching path, since it's
   by definition relative inside the newly created clone. If
   you spell it as "./-path", the submodule code sees the
   "/" and translates this to an absolute path, so it at
   least works (assuming the receiver has the same
   filesystem layout as you). But that trick does not apply
   for a bare "-path".

 - it's meant as an ssh url, like "-host:path". But this
   already doesn't work, as we explicitly disallow ssh
   hostnames that begin with a dash (to avoid option
   injection against ssh).

 - it's a remote-helper scheme, like "-scheme::data". This
   _could_ work if the receiver bends over backwards and
   creates a funny-named helper like "git-remote--scheme".
   But normally there would not be any helper that matches.

Since such a url does not work today and is not likely to do
anything useful in the future, let's simply disallow them
entirely. That protects the existing "git clone" path (in a
belt-and-suspenders way), along with any others that might
exist.

Our tests cover two cases:

  1. A file url with "./" continues to work, showing that
     there's an escape hatch for people with truly silly
     repo names.

  2. A url starting with "-" is rejected.

Note that we expect case (2) to fail, but it would have done
so even without this commit, for the reasons given above.
So instead of just expecting failure, let's also check for
the magic word "ignoring" on stderr. That lets us know that
we failed for the right reason.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule config ban submodule paths that start with .diff | (download)

submodule-config.c | 2 2 + 0 - 0 !
t/t7417-submodule-path-url.sh | 20 20 + 0 - 0 !
2 files changed, 22 insertions(+)

 submodule-config: ban submodule paths that start with a dash

commit 273c61496f88c6495b886acb1041fe57965151da upstream.

We recently banned submodule urls that look like
command-line options. This is the matching change to ban
leading-dash paths.

As with the urls, this should not break any use cases that
currently work. Even with our "--" separator passed to
git-clone, git-submodule.sh gets confused. Without the code
portion of this patch, the clone of "-sub" added in t7417
would yield results like:

    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    Fetched in submodule path '-sub', but it did not contain b56243f8f4eb91b2f1f8109452e659f14dd3fbe4. Direct fetching of that commit failed.

Moreover, naively adding such a submodule doesn't work:

  $ git submodule add $url -sub
  The following path is ignored by one of your .gitignore files:
  -sub

even though there is no such ignore pattern (the test script
hacks around this with a well-placed "git mv").

Unlike leading-dash urls, though, it's possible that such a
path _could_ be useful if we eventually made it work. So
this commit should be seen not as recommending a particular
policy, but rather temporarily closing off a broken and
possibly dangerous code-path. We may revisit this decision
later.

fsck detect submodule urls starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 15 15 + 0 - 0 !
2 files changed, 22 insertions(+)

 fsck: detect submodule urls starting with dash

commit a124133e1e6ab5c7a9fef6d0e6bcb084e3455b46 upstream.

Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck detect submodule paths starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7417-submodule-path-url.sh | 8 8 + 0 - 0 !
2 files changed, 15 insertions(+)

 fsck: detect submodule paths starting with dash

commit 1a7fd1fb2998002da6e9ff2ee46e1bdd25ee8404 upstream.

As with urls, submodule paths with dashes are ignored by
git, but may end up confusing older versions. Detecting them
via fsck lets us prevent modern versions of git from being a
vector to spread broken .gitmodules to older versions.

Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:

  1. While such paths provide confusing and broken results,
     they don't seem to actually work as option injections
     against anything except "cd". In particular, the
     submodule code seems to canonicalize to an absolute
     path before running "git clone" (so it passes
     /your/clone/-sub).

  2. It's more likely that we may one day make such names
     actually work correctly. Even after we revert this fsck
     check, it will continue to be a hassle until hosting
     servers are all updated.

On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).

So on balance, this is probably a good protection.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

cvsimport apply shell quoting regex globally.diff | (download)

git-cvsimport.perl | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 cvsimport: apply shell-quoting regex globally

commit 8c87bdfb2137c9e9e945df13e2f2e1eb995ddf83 upstream.

Commit 5b4efea666 (cvsimport: shell-quote variable used in
backticks, 2017-09-11) tried to shell-quote a variable, but
forgot to use the "/g" modifier to apply the quoting to the
whole variable. This means we'd miss any embedded
single-quotes after the first one.

Reported-by: <littlelailo@yahoo.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t9300 drop some useless uses of cat.diff | (download)

t/t9300-fast-import.sh | 10 5 + 5 - 0 !
1 file changed, 5 insertions(+), 5 deletions(-)

 t9300: drop some useless uses of cat

commit f94804c1f2626831c6bdf8cc269a571324e3f2f2 upstream.

These waste a process, and make the line longer than it needs to be.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t9300 create marks files for double import marks test.diff | (download)

t/t9300-fast-import.sh | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 t9300: create marks files for double-import-marks test

commit 816f806786e12435163c591942a204c5a3bdd795 upstream.

Our tests confirm that providing two "import-marks" options in a
fast-import stream is an error. However, the invoked command would fail
even without covering this case, because the marks files themselves do
not actually exist.  Let's create the files to make sure we fail for the
right reason (we actually do, because the option parsing happens before
we open anything, but this future-proofs our test).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import tighten parsing of boolean command line o.diff | (download)

fast-import.c | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 fast-import: tighten parsing of boolean command line options

commit 11e934d56e46875b24d8a047d44b45ff243f6715 upstream.

We parse options like "--max-pack-size=" using skip_prefix(), which
makes sense to get at the bytes after the "=". However, we also parse
"--quiet" and "--stats" with skip_prefix(), which allows things like
"--quiet-nonsense" to behave like "--quiet".

This was a mistaken conversion in 0f6927c229 (fast-import: put option
parsing code in separate functions, 2009-12-04). Let's tighten this to
an exact match, which was the original intent.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import stop creating leading directories for imp.diff | (download)

fast-import.c | 1 0 + 1 - 0 !
1 file changed, 1 deletion(-)

 fast-import: stop creating leading directories for import-marks

commit e075dba3723875f478654068609f69b2a5af8566 upstream.

When asked to import marks from "subdir/file.marks", we create the
leading directory "subdir" if it doesn't exist. This makes no sense for
importing marks, where we only ever open the path for reading.

Most of the time this would be a noop, since if the marks file exists,
then the leading directories exist, too. But if it doesn't (e.g.,
because --import-marks-if-exists was used), then we'd create the useless
directory.

This dates back to 580d5f83e7 (fast-import: always create marks_file
directories, 2010-03-29). Even then it was useless, so it seems to have
been added in error alongside the --export-marks case (which _is_
helpful).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import delay creating leading directories for ex.diff | (download)

fast-import.c | 7 6 + 1 - 0 !
t/t9300-fast-import.sh | 13 11 + 2 - 0 !
2 files changed, 17 insertions(+), 3 deletions(-)

 fast-import: delay creating leading directories for export-marks

commit 019683025f1b14d7cb671312ab01f7330e9b33e7 upstream.

When we parse the --export-marks option, we don't immediately open the
file, but we do create any leading directories. This can be especially
confusing when a command-line option overrides an in-stream one, in
which case we'd create the leading directory for the in-stream file,
even though we never actually write the file.

Let's instead create the directories just before opening the file, which
means we'll create only useful directories. Note that this could change
the handling of relative paths if we chdir() in between, but we don't
actually do so; the only permanent chdir is from setup_git_directory()
which runs before either code path (potentially we should take the
pre-setup dir into account to avoid surprising the user, but that's an
orthogonal change).

The test just adapts the existing "override" test to use paths with
leading directories. This checks both that the correct directory is
created (which worked before but was not tested), and that the
overridden one is not (our new fix here).

While we're here, let's also check the error result of
safe_create_leading_directories(). We'd presumably notice any failure
immediately after when we try to open the file itself, but we can give a
more specific error message in this case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import disallow feature export marks by default.diff | (download)

Documentation/git-fast-import.txt | 14 14 + 0 - 0 !
fast-import.c | 25 25 + 0 - 0 !
t/t9300-fast-import.sh | 23 15 + 8 - 0 !
transport-helper.c | 1 1 + 0 - 0 !
4 files changed, 55 insertions(+), 8 deletions(-)

 fast-import: disallow "feature export-marks" by default

commit 68061e3470210703cb15594194718d35094afdc0 upstream.

The fast-import stream command "feature export-marks=<path>" lets the
stream write marks to an arbitrary path. This may be surprising if you
are running fast-import against an untrusted input (which otherwise
cannot do anything except update Git objects and refs).

Let's disallow the use of this feature by default, and provide a
command-line option to re-enable it (you can always just use the
command-line --export-marks as well, but the in-stream version provides
an easy way for exporters to control the process).

This is a backwards-incompatible change, since the default is flipping
to the new, safer behavior. However, since the main users of the
in-stream versions would be import/export-based remote helpers, and
since we trust remote helpers already (which are already running
arbitrary code), we'll pass the new option by default when reading a
remote helper's stream. This should minimize the impact.

Note that the implementation isn't totally simple, as we have to work
around the fact that fast-import doesn't parse its command-line options
until after it has read any "feature" lines from the stream. This is how
it lets command-line options override in-stream. But in our case, it's
important to parse the new --allow-unsafe-features first.

There are three options for resolving this:

  1. Do a separate "early" pass over the options. This is easy for us to
     do because there are no command-line options that allow the
     "unstuck" form (so there's no chance of us mistaking an argument
     for an option), though it does introduce a risk of incorrect
     parsing later (e.g,. if we convert to parse-options).

  2. Move the option parsing phase back to the start of the program, but
     teach the stream-reading code never to override an existing value.
     This is tricky, because stream "feature" lines override each other
     (meaning we'd have to start tracking the source for every option).

  3. Accept that we might parse a "feature export-marks" line that is
     forbidden, as long we don't _act_ on it until after we've parsed
     the command line options.

     This would, in fact, work with the current code, but only because
     the previous patch fixed the export-marks parser to avoid touching
     the filesystem.

     So while it works, it does carry risk of somebody getting it wrong
     in the future in a rather subtle and unsafe way.

I've gone with option (1) here as simple, safe, and unlikely to cause
regressions.

This fixes CVE-2019-1348.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import disallow feature import marks by default.diff | (download)

Documentation/git-fast-import.txt | 3 2 + 1 - 0 !
fast-import.c | 2 2 + 0 - 0 !
t/t9300-fast-import.sh | 22 17 + 5 - 0 !
3 files changed, 21 insertions(+), 6 deletions(-)

 fast-import: disallow "feature import-marks" by default

commit a52ed76142f6e8d993bb4c50938a408966eb2b7c upstream.

As with export-marks in the previous commit, import-marks can access the
filesystem. This is significantly less dangerous than export-marks
because it only involves reading from arbitrary paths, rather than
writing them. However, it could still be surprising and have security
implications (e.g., exfiltrating data from a service that accepts
fast-import streams).

Let's lump it (and its "if-exists" counterpart) in with export-marks,
and enable the in-stream version only if --allow-unsafe-features is set.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

clone recurse submodules prevent name squatting on Wi.diff | (download)

builtin/clone.c | 2 1 + 1 - 0 !
builtin/submodule--helper.c | 13 12 + 1 - 0 !
git-submodule.sh | 6 6 + 0 - 0 !
t/t7415-submodule-names.sh | 31 31 + 0 - 0 !
4 files changed, 50 insertions(+), 2 deletions(-)

 clone --recurse-submodules: prevent name squatting on windows

commit 0060fd1511b94c918928fa3708f69a3f33895a4a upstream.

In addition to preventing `.git` from being tracked by Git, on Windows
we also have to prevent `git~1` from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name `.git`
is `git~1`, otherwise it would be possible for malicious repositories to
write directly into the `.git/` directory, e.g. a `post-checkout` hook
that would then be executed _during_ a recursive clone.

When we implemented appropriate protections in 2b4c6efc821 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the `.git` directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named `..git` to be assigned the short name `git~1` and
subsequently, the short name generated for `.git` would be `git~2`. Or
`git~3`. Or even `~9999999` (for a detailed explanation of the lengths
we have to go to protect `.gitmodules`, see the commit message of
e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)).

However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:

- On Windows, file names cannot end in a space or a period (for
  historical reasons: the period separating the base name from the file
  extension was not actually written to disk, and the base name/file
  extension was space-padded to the full 8/3 characters, respectively).
  Helpfully, when creating a directory under the name, say, `sub.`, that
  trailing period is trimmed automatically and the actual name on disk
  is `sub`.

  This means that while Git thinks that the submodule names `sub` and
path.c document the purpose of is_ntfs_dotgit.diff | (download)

path.c | 28 28 + 0 - 0 !
1 file changed, 28 insertions(+)

 path.c: document the purpose of `is_ntfs_dotgit()`

commit 525e7fba7854c23ee3530d0bf88d75f106f14c95 upstream.

Previously, this function was completely undocumented. It is worth,
though, to explain what is going on, as it is not really obvious at all.

Suggested-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit only verify the leading segment.diff | (download)

fsck.c | 18 17 + 1 - 0 !
path.c | 5 1 + 4 - 0 !
read-cache.c | 8 8 + 0 - 0 !
3 files changed, 26 insertions(+), 5 deletions(-)

 is_ntfs_dotgit(): only verify the leading segment

commit 288a74bcd28229a00c3632f18cba92dbfdf73ee9 upstream.

The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.

As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.

However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.

This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.

Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.

Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.

Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.

The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.

This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.

Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

path safeguard .git against NTFS Alternate Streams Ac.diff | (download)

path.c | 12 11 + 1 - 0 !
t/t1014-read-tree-confusing.sh | 1 1 + 0 - 0 !
2 files changed, 12 insertions(+), 1 deletion(-)

 path: safeguard `.git` against ntfs alternate streams accesses

commit 7c3745fc6185495d5765628b4dfe1bd2c25a2981 upstream.

Probably inspired by HFS' resource streams, NTFS supports "Alternate
Data Streams": by appending `:<stream-name>` to the file name,
information in addition to the file contents can be written and read,
information that is copied together with the file (unless copied to a
non-NTFS location).

These Alternate Data Streams are typically used for things like marking
an executable as having just been downloaded from the internet (and
hence not necessarily being trustworthy).

In addition to a stream name, a stream type can be appended, like so:
`:<stream-name>:<stream-type>`. Unless specified, the default stream
type is `$DATA` for files and `$INDEX_ALLOCATION` for directories. In
other words, `.git::$INDEX_ALLOCATION` is a valid way to reference the
`.git` directory!

In our work in Git v2.2.1 to protect Git on NTFS drives under
`core.protectNTFS`, we focused exclusively on NTFS short names, unaware
of the fact that NTFS Alternate Data Streams offer a similar attack
vector.

Let's fix this.

Seeing as it is better to be safe than sorry, we simply disallow paths
referring to *any* NTFS Alternate Data Stream of `.git`, not just
`::$INDEX_ALLOCATION`. This also simplifies the implementation.

This closes CVE-2019-1352.

Further reading about NTFS Alternate Data Streams:
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

path also guard .gitmodules against NTFS Alternate Da.diff | (download)

path.c | 2 1 + 1 - 0 !
t/t0060-path-utils.sh | 7 6 + 1 - 0 !
2 files changed, 7 insertions(+), 2 deletions(-)

 path: also guard `.gitmodules` against ntfs alternate data streams

commit 91bd46588e6959e6903e275f78b10bd07830d547 upstream.

We just safe-guarded `.git` against NTFS Alternate Data Stream-related
attack vectors, and now it is time to do the same for `.gitmodules`.

Note: In the added regression test, we refrain from verifying all kinds
of variations between short names and NTFS Alternate Data Streams: as
the new code disallows _all_ Alternate Data Streams of `.gitmodules`, it
is enough to test one in order to know that all of them are guarded
against.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit speed it up.diff | (download)

path.c | 55 30 + 25 - 0 !
1 file changed, 30 insertions(+), 25 deletions(-)

 is_ntfs_dotgit(): speed it up

commit 3a85dc7d534fc2d410ddc0c771c963b20d1b4857 upstream.

Previously, this function was written without focusing on speed,
intending to make reviewing the code as easy as possible, to avoid any
bugs in this critical code.

Turns out: we can do much better on both accounts. With this patch, we
make it as fast as this developer can make it go:

- We avoid the call to `is_dir_sep()` and make all the character
  comparisons explicit.

- We avoid the cost of calling `strncasecmp()` and unroll the test for
  `.git` and `git~1`, not even using `tolower()` because it is faster to
  compare against two constant values.

- We look for `.git` and `.git~1` first thing, and return early if not
  found.

- We also avoid calling a separate function for detecting chains of
  spaces and periods.

Each of these improvements has a noticeable impact on the speed of
`is_ntfs_dotgit()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

protect_ntfs turn on NTFS protection by default.diff | (download)

config.mak.uname | 2 0 + 2 - 0 !
environment.c | 2 1 + 1 - 0 !
2 files changed, 1 insertion(+), 3 deletions(-)

 protect_ntfs: turn on ntfs protection by default

commit 9102f958ee5254b10c0be72672aa3305bf4f4704 upstream.

Back in the DOS days, in the FAT file system, file names always
consisted of a base name of length 8 plus a file extension of length 3.
Shorter file names were simply padded with spaces to the full 8.3
format.

Later, the FAT file system was taught to support _also_ longer names,
with an 8.3 "short name" as primary file name. While at it, the same
facility allowed formerly illegal file names, such as `.git` (empty base
names were not allowed), which would have the "short name" `git~1`
associated with it.

For backwards-compatibility, NTFS supports alternative 8.3 short
filenames, too, even if starting with Windows Vista, they are only
generated on the system drive by default.

We addressed the problem that the `.git/` directory can _also_ be
accessed via `git~1/` (when short names are enabled) in 2b4c6efc821
(read-cache: optionally disallow NTFS .git variants, 2014-12-16), i.e.
since Git v1.9.5, by introducing the config setting `core.protectNTFS`
and enabling it by default on Windows.

In the meantime, Windows 10 introduced the "Windows Subsystem for Linux"
(short: WSL), i.e. a way to run Linux applications/distributions in a
thinly-isolated subsystem on Windows (giving rise to many a "2016 is the
Year of Linux on the Desktop" jokes). WSL is getting increasingly
popular, also due to the painless way Linux application can operate
directly ("natively") on files on Windows' file system: the Windows
drives are mounted automatically (e.g. `C:` as `/mnt/c/`).

Taken together, this means that we now have to enable the safe-guards of
Git v1.9.5 also in WSL: it is possible to access a `.git` directory
inside `/mnt/c/` via the 8.3 name `git~1` (unless short name generation
was disabled manually). Since regular Linux distributions run in WSL,
this means we have to enable `core.protectNTFS` at least on Linux, too.

To enable Services for Macintosh in Windows NT to store so-called
resource forks, NTFS introduced "Alternate Data Streams". Essentially,
these constitute additional metadata that are connected to (and copied
with) their associated files, and they are accessed via pseudo file
names of the form `filename:<stream-name>:<stream-type>`.

In a recent patch, we extended `core.protectNTFS` to also protect
against accesses via NTFS Alternate Data Streams, e.g. to prevent
contents of the `.git/` directory to be "tracked" via yet another
alternative file name.

While it is not possible (at least by default) to access files via NTFS
Alternate Data Streams from within WSL, the defaults on macOS when
mounting network shares via SMB _do_ allow accessing files and
directories in that way. Therefore, we need to enable `core.protectNTFS`
on macOS by default, too, and really, on any Operating System that can
mount network shares via SMB/CIFS.

A couple of approaches were considered for fixing this:

1. We could perform a dynamic NTFS check similar to the `core.symlinks`
   check in `init`/`clone`: instead of trying to create a symbolic link
   in the `.git/` directory, we could create a test file and try to
   access `.git/config` via 8.3 name and/or Alternate Data Stream.

2. We could simply "flip the switch" on `core.protectNTFS`, to make it
   "on by default".

The obvious downside of 1. is that it won't protect worktrees that were
clone with a vulnerable Git version already. We considered patching code
paths that check out files to check whether we're running on an NTFS
system dynamically and persist the result in the repository-local config
setting `core.protectNTFS`, but in the end decided that this solution
would be too fragile, and too involved.

The obvious downside of 2. is that everybody will have to "suffer" the
performance penalty incurred from calling `is_ntfs_dotgit()` on every
path, even in setups where.

After the recent work to accelerate `is_ntfs_dotgit()` in most cases,
it looks as if the time spent on validating ten million random
file names increases only negligibly (less than 20ms, well within the
standard deviation of ~50ms). Therefore the benefits outweigh the cost.

Another downside of this is that paths that might have been acceptable
previously now will be forbidden. Realistically, though, this is an
improvement because public Git hosters already would reject any `git
push` that contains such file names.

Note: There might be a similar problem mounting HFS+ on Linux. However,
this scenario has been considered unlikely and in light of the cost (in
the aforementioned benchmark, `core.protectHFS = true` increased the
time from ~440ms to ~610ms), it was decided _not_ to touch the default
of `core.protectHFS`.

This change addresses CVE-2019-1353.

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Helped-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

Disallow dubiously nested submodule git directories.diff | (download)

builtin/submodule--helper.c | 24 24 + 0 - 0 !
git-submodule.sh | 5 5 + 0 - 0 !
submodule.c | 41 41 + 0 - 0 !
submodule.h | 5 5 + 0 - 0 !
t/t7415-submodule-names.sh | 23 23 + 0 - 0 !
5 files changed, 98 insertions(+)

 disallow dubiously-nested submodule git directories

commit a8dee3ca610f5a1d403634492136c887f83b59d2 upstream.

Currently it is technically possible to let a submodule's git
directory point right into the git dir of a sibling submodule.

Example: the git directories of two submodules with the names `hippo`
and `hippo/hooks` would be `.git/modules/hippo/` and
`.git/modules/hippo/hooks/`, respectively, but the latter is already
intended to house the former's hooks.

In most cases, this is just confusing, but there is also a (quite
contrived) attack vector where Git can be fooled into mistaking remote
content for file contents it wrote itself during a recursive clone.

Let's plug this bug.

To do so, we introduce the new function `validate_submodule_git_dir()`
which simply verifies that no git dir exists for any leading directories
of the submodule name (if there are any).

Note: this patch specifically continues to allow sibling modules names
of the form `core/lib`, `core/doc`, etc, as long as `core` is not a
submodule name.

This fixes CVE-2019-1387.

[jn: backported to 2.11.y:
 - port to git-submodule.sh
 - use explicit chdir to emulate test_commit -C in test]

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

unpack trees let merged_entry pass through do_add_ent.diff | (download)

unpack-trees.c | 3 2 + 1 - 0 !
1 file changed, 2 insertions(+), 1 deletion(-)

 unpack-trees: let merged_entry() pass through do_add_entry()'s errors

commit cc756edda63769cf6d7acc99e6ad3a9cbb5dc3ec upstream.

A `git clone` will end with exit code 0 when `merged_entry()` returns a
positive value during a call of `unpack_trees()` to `traverse_trees()`.
The reason is that `unpack_trees()` will interpret a positive value not
to be an error.

The problem is, however, that `add_index_entry()` (which is called by
`merged_entry()` can report an error, and we really should fail the
entire clone in such a case.

Let's fix this problem, in preparation for a Windows-specific patch
disallowing `mkdir()` with directory names that contain a trailing space
(which is illegal on NTFS): we want `git clone` to abort when a path
cannot be checked out due to that condition.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>