## Package: git / 1:2.11.0-3+deb9u7

| Package | Version | Patches format |
|---|---|---|
| git | 1:2.11.0-3+deb9u7 | 3.0 (quilt) |

### Patch series

Patch File delta Description
0001 pre rebase hook capture documentation in a here docum.diff | (download)

templates/hooks--pre-rebase.sample | 6 3 + 3 - 0 !
1 file changed, 3 insertions(+), 3 deletions(-)

 pre-rebase hook: capture documentation in a <<here document

Without this change, the sample hook does not pass a syntax check
(sh -n):

$ sh -n hooks--pre-rebase.sample
hooks--pre-rebase.sample: line 101: syntax error near unexpected token `('
hooks--pre-rebase.sample: line 101: ` merged into it again (either directly or indirectly).'

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>


Normalize generated asciidoc timestamps with SOURCE_D.diff | (download)

Documentation/Makefile | 7 5 + 2 - 0 !
Documentation/technical/api-index.sh | 4 4 + 0 - 0 !
2 files changed, 9 insertions(+), 2 deletions(-)

 normalize generated asciidoc timestamps with SOURCE_DATE_EPOCH

This is needed to pass the Debian build reproducibility test
(https://wiki.debian.org/ReproducibleBuilds/TimestampsProposal).

Signed-off-by: Anders Kaseorg <andersk@mit.edu>


git gui Sort entries in optimized tclIndex.diff | (download)

git-gui/Makefile | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 git-gui: sort entries in optimized tclIndex

auto_mkindex expands wildcards in directory order, which depends on
the underlying filesystem. To improve build reproducibility, sort the
list of *.tcl files in the Makefile.

The unoptimized loading case was previously fixed in
v2.11.0-rc0~31^2^2~14 “git-gui: sort entries in tclIndex”.

Signed-off-by: Anders Kaseorg <andersk@mit.edu>


xdiff Do not enable XDL_FAST_HASH by default.diff | (download)

Makefile | 1 0 + 1 - 0 !
config.mak.uname | 5 0 + 5 - 0 !
2 files changed, 6 deletions(-)

 xdiff: do not enable XDL_FAST_HASH by default

Although XDL_FAST_HASH computes hashes slightly faster on some
architectures, its collision characteristics are much worse, resulting


shell disallow repo names beginning with dash.patch | (download)

shell.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 shell: disallow repo names beginning with dash

When a remote server uses git-shell, the client side will
connect to it like:

ssh server "git-upload-pack 'foo.git'"

and we literally exec ("git-upload-pack", "foo.git"). In early
versions of upload-pack and receive-pack, we took a
repository argument and nothing else. But over time they
learned to accept dashed options. If the user passes a
repository name that starts with a dash, the results are
confusing at best (we complain of a bogus option instead of
a non-existent repository) and malicious at worst (the user
can start an interactive pager via "--help").

We could pass "--" to the sub-process to make sure the
user's argument is interpreted as a repository name. I.e.:

git-upload-pack -- -foo.git

But adding "--" automatically would make us inconsistent
with a normal shell (i.e., when git-shell is not in use),
where "-foo.git" would still be an error. For that case, the
client would have to specify the "--", but they can't do so
reliably, as existing versions of git-shell do not allow
more than a single argument.

The simplest thing is to simply disallow "-" at the start of
the repo name argument. This hasn't worked either with or
without git-shell since version 1.0.0, and nobody has
complained.

Note that this patch just applies to do_generic_cmd(), which
runs upload-pack, receive-pack, and upload-archive. There
are two other types of commands that git-shell runs:

- do_cvs_cmd(), but this already restricts the argument to
  be the literal string "server"

- admin-provided commands in the git-shell-commands
  directory. We'll pass along arbitrary arguments there,
  so these commands could have similar problems. But these
  commands might actually understand dashed arguments, so
  we cannot just block them here. It's up to the writer of
  the commands to make sure they are safe. With great
  power comes great responsibility.

Reported-by: Timo Schmid <tschmid@ernw.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>


connect reject ssh hostname that begins with a dash.diff | (download)

connect.c | 3 3 + 0 - 0 !
1 file changed, 3 insertions(+)

 connect: reject ssh hostname that begins with a dash

When commands like "git fetch" talk with ssh://$rest_of_URL/, the
code splits $rest_of_URL into components like host, port, etc.,
and then spawns the underlying "ssh" program by formulating
argv[] array that has:

- the path to ssh command taken from GIT_SSH_COMMAND, etc.

- dashed options like '-batch' (for Tortoise), '-p <port>' as
  needed.

- ssh_host, which is supposed to be the hostname parsed out of
  $rest_of_URL.

- then the command to be run on the other side, e.g. git

If the ssh_host ends up getting '-<anything>', the argv[] that is
used to spawn the command becomes something like:

{ "ssh", "-p", "22", "-<anything>", "command", "to", "run", NULL }

which obviously is bogus, but depending on the actual value of
"<anything>", will make "ssh" parse and use it as an option.

Prevent this by forbidding ssh_host that begins with a "-".

Noticed-by: Joern Schneeweisz of Recurity Labs
Reported-by: Brian at GitLab
Signed-off-by: Junio C Hamano <gitster@pobox.com>
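
To make the argv[] layout above concrete, here is a minimal sketch in C of the construction together with the new check. The function name `build_ssh_argv` and the fixed six-slot array are assumptions made for illustration; this is not Git's actual code in connect.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not Git's real implementation): fill argv[] for
 * spawning ssh. Returns 0 on success, or -1 if ssh_host begins with '-'
 * and would therefore be parsed by ssh as an option.
 */
static int build_ssh_argv(const char *ssh_host, const char *port,
			  const char *remote_cmd, const char *argv[6])
{
	size_t n = 0;

	assert(ssh_host && remote_cmd && argv);

	/* The check the patch describes: refuse "-<anything>" as a host. */
	if (ssh_host[0] == '-')
		return -1;

	argv[n++] = "ssh";
	if (port) {
		argv[n++] = "-p";
		argv[n++] = port;
	}
	argv[n++] = ssh_host;
	argv[n++] = remote_cmd;
	argv[n] = NULL;
	return 0;
}
```

With a host such as `example.com` the array becomes `{ "ssh", "-p", "22", "example.com", <command>, NULL }`. With a host that starts with a dash, the function returns -1 before writing anything to the array, so nothing is ever handed to ssh.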


t/t5813-proto-disable-ssh.sh | 9 9 + 0 - 0 !
1 file changed, 9 insertions(+)

 t5813: add test for hostname starting with dash

Per the explanation in the previous patch, this should be
(and is) rejected.

Signed-off-by: Jeff King <peff@peff.net>


cache.h | 8 8 + 0 - 0 !
connect.c | 2 1 + 1 - 0 !
path.c | 5 5 + 0 - 0 !
3 files changed, 14 insertions(+), 1 deletion(-)

 connect: factor out "looks like command line option" check

We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.

Signed-off-by: Jeff King <peff@peff.net>


connect.c | 5 5 + 0 - 0 !
t/t5532-fetch-proxy.sh | 5 5 + 0 - 0 !
2 files changed, 10 insertions(+)

 connect: reject dashed arguments for proxy commands

If you have a GIT_PROXY_COMMAND configured, we will run it
with the host/port on the command-line. If a URL contains a
mischievous host like "--foo", we don't know how the proxy
command may handle it. It's likely to break, but it may also
do something dangerous and unwanted (technically it could
even do something useful, but that seems unlikely).

We should err on the side of caution and reject this before
we even run the command.

The hostname check matches the one we do in a similar
circumstance for ssh. The port check is not present for ssh,
but there it's not necessary because the syntax is "-p
<port>", and there's no ambiguity on the parsing side.

It's not clear whether you can actually get a negative port
to the proxy here or not. Doing:

git fetch git://remote:-1234/repo.git

keeps the "-1234" as part of the hostname, with the default
port of 9418. But it's a good idea to keep this check close
to the point of running the command to make it clear that
there's no way to circumvent it (and at worst it serves as a
belt-and-suspenders check).

Signed-off-by: Jeff King <peff@peff.net>
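
The interaction between port parsing and the dash check can be sketched as follows. This is a simplified model written for illustration, not Git's actual parser: `split_host_port`, `proxy_args_ok`, and `looks_like_option` are hypothetical names, bracketed IPv6 addresses are ignored, and the rule "a suffix is a port only if it is a number in 0..65535" is an assumption chosen to reproduce the behavior described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The "looks like a command-line option" notion: a leading dash. */
static int looks_like_option(const char *s)
{
	return s && s[0] == '-';
}

/*
 * Simplified splitter: the text after the last ':' counts as a port only
 * if it parses as a number in 0..65535. On success the ':' is overwritten
 * with NUL and *port points at the digits; otherwise the string is left
 * whole and *port is set to default_port.
 */
static void split_host_port(char *hostport, const char **port,
			    const char *default_port)
{
	char *colon, *end;
	long n;

	assert(hostport && port && default_port);

	*port = default_port;
	colon = strrchr(hostport, ':');
	if (!colon)
		return;
	n = strtol(colon + 1, &end, 10);
	if (end != colon + 1 && *end == '\0' && n >= 0 && n < 65536) {
		*colon = '\0';
		*port = colon + 1;
	}
}

/* Returns 1 if host and port may be placed on a proxy command line. */
static int proxy_args_ok(const char *host, const char *port)
{
	return !looks_like_option(host) && !looks_like_option(port);
}
```

Under these assumptions, `remote:-1234` is not split, so the host stays `remote:-1234` with the default port and passes the check, while a host of `--foo` or a port of `-1234` is refused. Keeping `proxy_args_ok` next to the point where the command is run mirrors the belt-and-suspenders reasoning in the commit message.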


connect.c | 3 3 + 0 - 0 !
t/t5810-proto-disable-local.sh | 23 23 + 0 - 0 !
t/t5813-proto-disable-ssh.sh | 14 14 + 0 - 0 !
3 files changed, 40 insertions(+)

 connect: reject paths that look like command line options

If we get a repo path like "-repo.git", we may try to invoke
"git-upload-pack -repo.git". This is going to fail, since
upload-pack will interpret it as a set of bogus options. But
let's reject this before we even run the sub-program, since
we would not want to allow any mischief with repo names that
actually are real command-line options.

You can still ask for such a path via git-daemon, but there's no
security problem there, because git-daemon enters the repo itself
and then passes "."  on the command line.

Signed-off-by: Jeff King <peff@peff.net>


git-cvsserver.perl | 47 22 + 25 - 0 !
1 file changed, 22 insertions(+), 25 deletions(-)

 cvsserver: move safe_pipe_capture() to the main package

As a preparation for replacing `command` (backtick invocations)
with a call to this function from outside the GITCVS::updater
package, move it to the main package.

Signed-off-by: Junio C Hamano <gitster@pobox.com>



git-cvsserver.perl | 22 11 + 11 - 0 !
1 file changed, 11 insertions(+), 11 deletions(-)

 cvsserver: use safe_pipe_capture instead of backticks

This makes the script pass arguments that are derived from end-user
input in a safer way when invoking subcommands.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>



git-cvsserver.perl | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 cvsserver: use safe_pipe_capture for constant commands as well

This is not strictly necessary, but it is good code hygiene.

Signed-off-by: Junio C Hamano <gitster@pobox.com>



Documentation/git-shell.txt | 16 16 + 0 - 0 !
shell.c | 14 0 + 14 - 0 !
t/t9400-git-cvsserver-server.sh | 48 48 + 0 - 0 !
3 files changed, 64 insertions(+), 14 deletions(-)

 shell: drop git-cvsserver support by default

The git-cvsserver script is old and largely unmaintained
these days. But git-shell allows untrusted users to run it
out of the box, significantly increasing its attack surface.

Let's drop it from git-shell's list of internal handlers so
that it cannot be run by default.  This is not backwards
compatible. But given the age and development activity on
CVS-related parts of Git, this is likely to impact very few
users, while helping many more (i.e., anybody who runs
git-shell and had no intention of supporting CVS).

There's no configuration mechanism in git-shell for us to
add a boolean and flip it to "off". But there is a mechanism
for adding custom commands, and adding CVS support is
fairly trivial. Let's document it to give guidance to
anybody who really is still running cvsserver.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>



git-archimport.perl | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 archimport: use safe_pipe_capture for user input

Refnames can contain shell metacharacters which need to be
passed verbatim to sub-processes. Using safe_pipe_capture
skips the shell entirely.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
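
The idea of skipping the shell entirely can be illustrated with a rough C analogue of the Perl helper. The scripts in these patches are Perl; the function below, `pipe_capture`, is a hypothetical name and a sketch for POSIX systems only. The key point is that the argument vector goes straight to `execvp()`, so metacharacters in an argument are never interpreted.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical analogue of safe_pipe_capture(): run argv[0] with the
 * given arguments, without a shell, and capture up to bufsz - 1 bytes
 * of its stdout into buf (NUL-terminated). Returns the child's exit
 * status, or -1 on error or abnormal termination.
 */
static int pipe_capture(char *const argv[], char *buf, size_t bufsz)
{
	int fds[2], status;
	size_t used = 0;
	ssize_t n;
	pid_t pid;

	assert(argv && argv[0] && buf && bufsz > 0);

	if (pipe(fds) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: stdout goes to the pipe; exec directly, no "sh -c". */
		close(fds[0]);
		dup2(fds[1], STDOUT_FILENO);
		close(fds[1]);
		execvp(argv[0], argv);
		_exit(127);
	}
	/* Parent: read until EOF or until the buffer is full. */
	close(fds[1]);
	while (used < bufsz - 1 &&
	       (n = read(fds[0], buf + used, bufsz - 1 - used)) > 0)
		used += (size_t)n;
	buf[used] = '\0';
	close(fds[0]);
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Running `echo` with the single argument `$(id);x` through this function yields that literal string back, because no shell ever sees it. A sketch like this does not retry on EINTR and treats output larger than the buffer as an error case, which a production helper would need to handle.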



git-cvsimport.perl | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 cvsimport: shell-quote variable used in backticks

We run git rev-parse through the shell, and quote its
argument only with single-quotes. This prevents most
metacharacters from being a problem, but misses the obvious
case when $name itself has single-quotes in it. We can fix
this by applying the usual shell-quoting formula.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>


submodule config verify submodule names as paths.diff | (download)

builtin/submodule--helper.c | 26 25 + 1 - 0 !
git-submodule.sh | 5 5 + 0 - 0 !
submodule-config.c | 31 31 + 0 - 0 !
submodule-config.h | 7 7 + 0 - 0 !
t/t7415-submodule-names.sh | 77 77 + 0 - 0 !
5 files changed, 145 insertions(+), 1 deletion(-)

 submodule-config: verify submodule names as paths

commit 0383bbb9015898cbc79abd7b64316484d7713b44 upstream.

Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).

Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:

1. What should the allowed syntax be?

It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:

a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a submodule
named "foo/.git" isn't actually dangerous, and it's
possible that somebody has manually given such a funny
name.

b. Since we'll eventually use this checking logic in
fsck to protect downstream repositories, it should be
consistent across platforms. Because verify_path()
relies on is_dir_sep(), it wouldn't block
"foo\..\bar" on a non-Windows machine.

2. Where should we enforce it?

These days most of the .gitmodules reads go through
submodule-config.c, so I've put it there in the reading
step. That should cover all of the C code.

We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).

This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone).

There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).

Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.

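
The syntax decision in point 1 can be sketched in C as follows. This is a minimal illustration under the rules stated above (reject empty names and any ".." path component, treating both '/' and '\' as separators on every platform); the function name is chosen for the example and the body is not copied from the patch.

```c
#include <assert.h>

/*
 * Sketch of the name check: returns 0 if `name` is acceptable, -1 if it
 * is empty or contains ".." as a path component. Both '/' and '\\' count
 * as separators regardless of platform, so the verdict does not depend
 * on the host's notion of a directory separator.
 */
static int check_submodule_name(const char *name)
{
	const char *p = name;

	assert(name);

	if (!*name)
		return -1;

	while (*p) {
		/* p points at the start of a path component here. */
		if (p[0] == '.' && p[1] == '.' &&
		    (p[2] == '\0' || p[2] == '/' || p[2] == '\\'))
			return -1;
		/* Advance to just past the next separator, if any. */
		while (*p && *p != '/' && *p != '\\')
			p++;
		if (*p)
			p++;
	}
	return 0;
}
```

With this rule, "foo/.git" is accepted (it cannot escape $GIT_DIR/modules/), while "../escape" and "foo\..\bar" are both rejected, including on non-Windows machines.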
[jn: the original patch expects 'git clone' to succeed in
the test because v2.13.0-rc0~10^2~3 (clone: teach
--recurse-submodules to optionally take a pathspec,
2017-03-17) makes 'git clone' skip invalid submodules.
Updated the test to pass in older Git versions where the
submodule name check makes 'git clone' fail.]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


is_ntfs_dotgit use a size_t for traversing string.diff | (download)

path.c | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 is_ntfs_dotgit: use a size_t for traversing string

commit 11a9f4d807a0d71dc6eff51bb87baf4ca2cccf1d upstream.

We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.

Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


is_hfs_dotgit match other .git files.diff | (download)

utf8.c | 58 46 + 12 - 0 !
utf8.h | 5 5 + 0 - 0 !
2 files changed, 51 insertions(+), 12 deletions(-)

 is_hfs_dotgit: match other .git files

commit 0fc333ba20b43a8afee5023e92cb3384ff4e59a6 upstream.

Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


is_ntfs_dotgit match other .git files.diff | (download)

cache.h | 10 9 + 1 - 0 !
path.c | 84 84 + 0 - 0 !
2 files changed, 93 insertions(+), 1 deletion(-)

 is_ntfs_dotgit: match other .git files

commit e7cb0b4455c85b53aeba40f88ffddcf6d4002498 upstream.

When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before .gitmodules and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to


is_ hfs ntfs _dotgitmodules add tests.diff | (download)

t/helper/test-path-utils.c | 20 20 + 0 - 0 !
t/t0060-path-utils.sh | 86 86 + 0 - 0 !
2 files changed, 106 insertions(+)

 is_{hfs,ntfs}_dotgitmodules: add tests

commit dc2d9ba3187fcd0ca8eeab9aa9ddef70cf8627a6 upstream.

This tests primarily for NTFS issues, but also adds one example of an
HFS+ issue.

Thanks go to Congyi Wu for coming up with the list of examples where
NTFS would possibly equate the filename with .gitmodules.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


skip_prefix add case insensitive variant.diff | (download)

git-compat-util.h | 17 17 + 0 - 0 !
1 file changed, 17 insertions(+)

 skip_prefix: add case-insensitive variant

commit 41a80924aec0e94309786837b6f954a3b3f19b71 upstream.

We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


verify_path drop clever fallthrough.diff | (download)

read-cache.c | 8 4 + 4 - 0 !
1 file changed, 4 insertions(+), 4 deletions(-)

 verify_path: drop clever fallthrough

commit e19e5e66d691bdeeeb5e0ed2ffcecdd7666b0d7b upstream.

We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


verify_dotfile mention case insensitivity in comment.diff | (download)

read-cache.c | 5 4 + 1 - 0 !
1 file changed, 4 insertions(+), 1 deletion(-)

 verify_dotfile: mention case-insensitivity in comment

commit 641084b618ddbe099f0992161988c3e479ae848b upstream.

We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


update index stat updated files earlier.diff | (download)

builtin/update-index.c | 25 17 + 8 - 0 !
1 file changed, 17 insertions(+), 8 deletions(-)

 update-index: stat updated files earlier

commit eb12dd0c764d2b71bebd5ffffb7379a3835253ae upstream.

In the update_one(), we check verify_path() on the proposed
path before doing anything else.
In preparation for having verify_path() look at the file
mode, let's stat the file earlier, so we can check the mode
accurately.

This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


verify_path disallow symlinks in .gitmodules.diff | (download)

apply.c | 4 2 + 2 - 0 !
builtin/update-index.c | 6 3 + 3 - 0 !
cache.h | 2 1 + 1 - 0 !
read-cache.c | 40 31 + 9 - 0 !
4 files changed, 37 insertions(+), 15 deletions(-)

 verify_path: disallow symlinks in .gitmodules

commit 10ecfa76491e4923988337b2e2243b05376b40de upstream.

There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:

1. It won't be portable to systems without symlinks.

2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:

a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.

b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.

Note the slightly tricky call to verify_path() in
update-index's update_one().
There we may not have a mode if we're not updating from the
filesystem (e.g., we might just be removing the file). Passing
"0" as the mode there works fine; since it's not a symlink,
we'll just skip the extra checks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


sha1_file add read_loose_object function.diff | (download)

cache.h | 13 13 + 0 - 0 !
sha1_file.c | 132 130 + 2 - 0 !
2 files changed, 143 insertions(+), 2 deletions(-)

 sha1_file: add read_loose_object() function

commit f6371f9210418f1beabc85b097e2a3470aeeb54d upstream.

It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


fsck parse loose object paths directly.diff | (download)

builtin/fsck.c | 46 33 + 13 - 0 !
t/t1450-fsck.sh | 16 16 + 0 - 0 !
2 files changed, 49 insertions(+), 13 deletions(-)

 fsck: parse loose object paths directly

commit c68b489e56431cf27f7719913ab09ddc62f95912 upstream.

When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.

In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.

The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


index pack make fsck error message more specific.diff | (download)

builtin/index-pack.c | 2 1 + 1 - 0 !
builtin/unpack-objects.c | 2 1 + 1 - 0 !
2 files changed, 2 insertions(+), 2 deletions(-)

 index-pack: make fsck error message more specific

commit db5a58c1bda5b20169b9958af1e8b05ddd178b01 upstream.

If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


fsck simplify .git check.diff | (download)

fsck.c | 4 1 + 3 - 0 !
1 file changed, 1 insertion(+), 3 deletions(-)

 fsck: simplify ".git" check

commit ed9c3220621d634d543bc4dd998d12167dfc57d4 upstream.

There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


fsck actually fsck blob data.diff | (download)

builtin/fsck.c | 51 25 + 26 - 0 !
fsck.c | 8 7 + 1 - 0 !
sha1_file.c | 2 1 + 1 - 0 !
3 files changed, 33 insertions(+), 28 deletions(-)

 fsck: actually fsck blob data

commit 7ac4f3a007e2567f9d2492806186aa063f9a08d6 upstream.

Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

1. The fsck_object() function just returns success for any
blob. Let's add a noop fsck_blob(), which we can fill in
with actual logic later.

2. The fsck_loose() function in builtin/fsck.c just threw
away blob content after loading it. Let's hold onto it
until after we've called fsck_object().

The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.

3. When fsck_loose() loads the object data, it does so
with a custom read_loose_object() helper. This function
streams any blobs, regardless of size, under the
assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


fsck detect gitmodules files.diff | (download)

fsck.c | 90 90 + 0 - 0 !
fsck.h | 7 7 + 0 - 0 !
2 files changed, 97 insertions(+)

 fsck: detect gitmodules files

commit 159e7b080bfa5d34559467cacaa79df89a01afc0 upstream.

In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

1. git-fsck may show us the objects in arbitrary order
(loose objects are fed in sha1 order, but we may also
have multiple packs, and we process each pack fully in
sequence).

2. The type ordering is just what git-pack-objects happens
to write now.
The pack format does not require a specific order, and it's
possible that future versions of Git (or a custom version
trying to fool official


fsck check .gitmodules content.diff | (download)

fsck.c | 59 58 + 1 - 0 !
1 file changed, 58 insertions(+), 1 deletion(-)

 fsck: check .gitmodules content

commit ed8b10f631c9a71df3351d46187bf7f3fa4f9b7e upstream.

This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


fsck call fsck_finish after fscking objects.diff | (download)

builtin/fsck.c | 3 3 + 0 - 0 !
t/t7415-submodule-names.sh | 4 4 + 0 - 0 !
2 files changed, 7 insertions(+)

 fsck: call fsck_finish() after fscking objects

commit 1995b5e03e1cc97116be58cdc0502d4a23547856 upstream.

Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.

With this, we can now catch the malicious case in t7415 with
git-fsck.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


unpack objects call fsck_finish after fscking objects.diff | (download)

builtin/unpack-objects.c | 5 4 + 1 - 0 !
t/t7415-submodule-names.sh | 7 7 + 0 - 0 !
2 files changed, 11 insertions(+), 1 deletion(-)

 unpack-objects: call fsck_finish() after fscking objects

commit 6e328d6caef218db320978e3e251009135d87d0e upstream.

As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.

This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.

This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>


index pack check .gitmodules files with strict.diff | (download)

builtin/index-pack.c | 10 10 + 0 - 0 !
t/lib-pack.sh | 12 12 + 0 - 0 !
t/t7415-submodule-names.sh | 38 38 + 0 - 0 !
3 files changed, 60 insertions(+)

 index-pack: check .gitmodules files with --strict

commit 73c3f0f704a91b6792e0199a3f3ab6e3a1971675 upstream.

Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

1. We currently don't bother calling fsck_object() at all
for blobs, since it has traditionally been a noop. We'd
actually catch these blobs in fsck_finish() at the end,
but it's more efficient to check them when we already
have the object loaded in memory.

2. The second pass done by fsck_finish() needs to access
the objects, but we're actually indexing the pack in
this process. In theory we could give the fsck code a
special callback for accessing the in-pack data, but
it's actually quite tricky:

a. We don't have an internal efficient index mapping
oids to packfile offsets.
      We only generate it on the fly as part of writing out the .idx
      file.

   b. We'd still have to reconstruct deltas, which means we'd basically
      have to replicate all of the reading logic in packfile.c.

   Instead, let's avoid running fsck_finish() until after we've written
   out the .idx file, and then just add it to our internal packed_git
   list.

   This does mean that the objects are "in the repository" before we
   finish our fsck checks. But unpack-objects already exhibits this
   same behavior, and it's an acceptable tradeoff here for the same
   reason: the quarantine mechanism means that pushes will be fully
   protected.

In addition to a basic push test in t7415, we add a sneaky pack that
reverses the usual object order in the pack, requiring that index-pack
access the tree and blob during the "finish" step.

This already works for unpack-objects (since it will have written out
loose objects), but we'll check it with this sneaky pack for good
measure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck complain when .gitmodules is a symlink.diff | (download)

fsck.c | 11 9 + 2 - 0 !
t/t7415-submodule-names.sh | 29 29 + 0 - 0 !
2 files changed, 38 insertions(+), 2 deletions(-)

 fsck: complain when .gitmodules is a symlink

commit b7b1fca175f1ed7933f361028c631b9ac86d868d upstream.

We've recently forbidden .gitmodules to be a symlink in verify_path().
And it's an easy way to circumvent our fsck checks for .gitmodules
content. So let's complain when we see it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule_init die cleanly on submodules without url .diff | (download)

builtin/submodule--helper.c | 6 3 + 3 - 0 !
t/t7400-submodule-basic.sh | 8 8 + 0 - 0 !
2 files changed, 11 insertions(+), 3 deletions(-)

 submodule_init: die cleanly on submodules without url defined

commit 627fde102515a7807dba89acaa88cb053b38a44a upstream.
When we init a submodule, we try to die when it has no URL defined:

    url = xstrdup(sub->url);
    if (!url)
        die(...);

But that's clearly nonsense. xstrdup() will never return NULL, and if
sub->url is NULL, we'll segfault.

These two bits of code need to be flipped, so we check sub->url before
looking at it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule helper use to signal end of clone options.diff | (download)

builtin/submodule--helper.c | 1 1 + 0 - 0 !
1 file changed, 1 insertion(+)

 submodule--helper: use "--" to signal end of clone options

commit 98afac7a7cefdca0d2c4917dd8066a59f7088265 upstream.

When we clone a submodule, we call "git clone $url $path". But there's
nothing to say that those components can't begin with a dash themselves,
confusing git-clone into thinking they're options. Let's pass "--" to
make it clear what we expect.

There's no test here, because it's actually quite hard to make these
names work, even with "git clone" parsing them correctly. And we're
going to restrict these cases even further in future commits. So we'll
leave off testing until then; this is just the minimal fix to prevent us
from doing something stupid with a badly formed entry.

Reported-by: joernchen <joernchen@phenoelit.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule config ban submodule urls that start with d.diff | (download)

submodule-config.c | 8 8 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 34 34 + 0 - 0 !
2 files changed, 42 insertions(+)

 submodule-config: ban submodule urls that start with dash

commit f6adec4e329ef0e25e14c63b735a5956dc67b8bc upstream.

The previous commit taught the submodule code to invoke our "git clone
$url $path" with a "--" separator so that we aren't confused by urls or
paths that start with dashes. However, that's just one code path.
It's not clear if there are others, and it would be an easy mistake to
add one in the future. Moreover, even with the fix in the previous
commit, it's quite hard to actually do anything useful with such an
entry. Any url starting with a dash must fall into one of three
categories:

- it's meant as a file url, like "-path". But then any clone is not
  going to have the matching path, since it's by definition relative
  inside the newly created clone. If you spell it as "./-path", the
  submodule code sees the "/" and translates this to an absolute path,
  so it at least works (assuming the receiver has the same filesystem
  layout as you). But that trick does not apply for a bare "-path".

- it's meant as an ssh url, like "-host:path". But this already doesn't
  work, as we explicitly disallow ssh hostnames that begin with a dash
  (to avoid option injection against ssh).

- it's a remote-helper scheme, like "-scheme::data". This _could_ work
  if the receiver bends over backwards and creates a funny-named helper
  like "git-remote--scheme". But normally there would not be any helper
  that matches.

Since such a url does not work today and is not likely to do anything
useful in the future, let's simply disallow them entirely. That protects
the existing "git clone" path (in a belt-and-suspenders way), along with
any others that might exist.

Our tests cover two cases:

1. A file url with "./" continues to work, showing that there's an
   escape hatch for people with truly silly repo names.

2. A url starting with "-" is rejected.

Note that we expect case (2) to fail, but it would have done so even
without this commit, for the reasons given above. So instead of just
expecting failure, let's also check for the magic word "ignoring" on
stderr. That lets us know that we failed for the right reason.
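The policy described above can be sketched in a few lines. This is only an illustration in Python, not git's C code: the helper names `looks_like_option` and `accept_submodule_url` are hypothetical, and returning `None` stands in for git ignoring the entry.

```python
from typing import Optional


def looks_like_option(value: str) -> bool:
    """Hypothetical helper: would a command-line parser see this as an option?"""
    return value.startswith("-")


def accept_submodule_url(url: str) -> Optional[str]:
    """Return the url if it is acceptable, or None if it should be ignored.

    Assumption for illustration: the only rule applied is the leading-dash ban.
    """
    if looks_like_option(url):
        return None  # a url such as "-path" or "-host:path" is ignored
    return url


# The "./" spelling remains available as an escape hatch for odd repo names.
print(accept_submodule_url("./-path"))
print(accept_submodule_url("-scheme::data"))
```

The check runs on the value itself, before any command is built, so it protects every code path that later consumes the url rather than only the "git clone" invocation.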
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

submodule config ban submodule paths that start with .diff | (download)

submodule-config.c | 2 2 + 0 - 0 !
t/t7417-submodule-path-url.sh | 20 20 + 0 - 0 !
2 files changed, 22 insertions(+)

 submodule-config: ban submodule paths that start with a dash

commit 273c61496f88c6495b886acb1041fe57965151da upstream.

We recently banned submodule urls that look like command-line options.
This is the matching change to ban leading-dash paths.

As with the urls, this should not break any use cases that currently
work. Even with our "--" separator passed to git-clone, git-submodule.sh
gets confused. Without the code portion of this patch, the clone of
"-sub" added in t7417 would yield results like:

    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    /path/to/git-submodule: 410: cd: Illegal option -s
    /path/to/git-submodule: 417: cd: Illegal option -s
    Fetched in submodule path '-sub', but it did not contain b56243f8f4eb91b2f1f8109452e659f14dd3fbe4. Direct fetching of that commit failed.

Moreover, naively adding such a submodule doesn't work:

    $ git submodule add $url -sub
    The following path is ignored by one of your .gitignore files:
    -sub

even though there is no such ignore pattern (the test script hacks
around this with a well-placed "git mv").

Unlike leading-dash urls, though, it's possible that such a path _could_
be useful if we eventually made it work. So this commit should be seen
not as recommending a particular policy, but rather temporarily closing
off a broken and possibly dangerous code-path. We may revisit this
decision later.

fsck detect submodule urls starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7416-submodule-dash-url.sh | 15 15 + 0 - 0 !
2 files changed, 22 insertions(+)

 fsck: detect submodule urls starting with dash

commit a124133e1e6ab5c7a9fef6d0e6bcb084e3455b46 upstream.

Urls with leading dashes can cause mischief on older versions of Git. We
should detect them so that they can be rejected by receive.fsckObjects,
preventing modern versions of git from being a vector by which attacks
can spread.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fsck detect submodule paths starting with dash.diff | (download)

fsck.c | 7 7 + 0 - 0 !
t/t7417-submodule-path-url.sh | 8 8 + 0 - 0 !
2 files changed, 15 insertions(+)

 fsck: detect submodule paths starting with dash

commit 1a7fd1fb2998002da6e9ff2ee46e1bdd25ee8404 upstream.

As with urls, submodule paths with dashes are ignored by git, but may
end up confusing older versions. Detecting them via fsck lets us prevent
modern versions of git from being a vector to spread broken .gitmodules
to older versions.

Compared to blocking leading-dash urls, though, this detection may be
less of a good idea:

1. While such paths provide confusing and broken results, they don't
   seem to actually work as option injections against anything except
   "cd". In particular, the submodule code seems to canonicalize to an
   absolute path before running "git clone" (so it passes
   /your/clone/-sub).

2. It's more likely that we may one day make such names actually work
   correctly. Even after we revert this fsck check, it will continue to
   be a hassle until hosting servers are all updated.

On the other hand, it's not entirely clear that the behavior in older
versions is safe. And if we do want to eventually allow this, we may end
up doing so with a special syntax anyway (e.g., writing "./-sub" in the
.gitmodules file, and teaching the submodule code to canonicalize it
when comparing).

So on balance, this is probably a good protection.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

cvsimport apply shell quoting regex globally.diff | (download)

git-cvsimport.perl | 2 1 + 1 - 0 !
1 file changed, 1 insertion(+), 1 deletion(-)

 cvsimport: apply shell-quoting regex globally

commit 8c87bdfb2137c9e9e945df13e2f2e1eb995ddf83 upstream.

Commit 5b4efea666 (cvsimport: shell-quote variable used in backticks,
2017-09-11) tried to shell-quote a variable, but forgot to use the "/g"
modifier to apply the quoting to the whole variable. This means we'd
miss any embedded single-quotes after the first one.

Reported-by: <littlelailo@yahoo.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t9300 drop some useless uses of cat.diff | (download)

t/t9300-fast-import.sh | 10 5 + 5 - 0 !
1 file changed, 5 insertions(+), 5 deletions(-)

 t9300: drop some useless uses of cat

commit f94804c1f2626831c6bdf8cc269a571324e3f2f2 upstream.

These waste a process, and make the line longer than it needs to be.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

t9300 create marks files for double import marks test.diff | (download)

t/t9300-fast-import.sh | 2 2 + 0 - 0 !
1 file changed, 2 insertions(+)

 t9300: create marks files for double-import-marks test

commit 816f806786e12435163c591942a204c5a3bdd795 upstream.

Our tests confirm that providing two "import-marks" options in a
fast-import stream is an error. However, the invoked command would fail
even without covering this case, because the marks files themselves do
not actually exist. Let's create the files to make sure we fail for the
right reason (we actually do, because the option parsing happens before
we open anything, but this future-proofs our test).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import tighten parsing of boolean command line o.diff | (download)

fast-import.c | 4 2 + 2 - 0 !
1 file changed, 2 insertions(+), 2 deletions(-)

 fast-import: tighten parsing of boolean command line options

commit 11e934d56e46875b24d8a047d44b45ff243f6715 upstream.

We parse options like "--max-pack-size=" using skip_prefix(), which
makes sense to get at the bytes after the "=". However, we also parse
"--quiet" and "--stats" with skip_prefix(), which allows things like
"--quiet-nonsense" to behave like "--quiet".

This was a mistaken conversion in 0f6927c229 (fast-import: put option
parsing code in separate functions, 2009-12-04). Let's tighten this to
an exact match, which was the original intent.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import stop creating leading directories for imp.diff | (download)

fast-import.c | 1 0 + 1 - 0 !
1 file changed, 1 deletion(-)

 fast-import: stop creating leading directories for import-marks

commit e075dba3723875f478654068609f69b2a5af8566 upstream.

When asked to import marks from "subdir/file.marks", we create the
leading directory "subdir" if it doesn't exist. This makes no sense for
importing marks, where we only ever open the path for reading.

Most of the time this would be a noop, since if the marks file exists,
then the leading directories exist, too. But if it doesn't (e.g.,
because --import-marks-if-exists was used), then we'd create the useless
directory.

This dates back to 580d5f83e7 (fast-import: always create marks_file
directories, 2010-03-29). Even then it was useless, so it seems to have
been added in error alongside the --export-marks case (which _is_
helpful).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import delay creating leading directories for ex.diff | (download)

fast-import.c | 7 6 + 1 - 0 !
t/t9300-fast-import.sh | 13 11 + 2 - 0 !
2 files changed, 17 insertions(+), 3 deletions(-)

 fast-import: delay creating leading directories for export-marks

commit 019683025f1b14d7cb671312ab01f7330e9b33e7 upstream.

When we parse the --export-marks option, we don't immediately open the
file, but we do create any leading directories. This can be especially
confusing when a command-line option overrides an in-stream one, in
which case we'd create the leading directory for the in-stream file,
even though we never actually write the file.

Let's instead create the directories just before opening the file, which
means we'll create only useful directories. Note that this could change
the handling of relative paths if we chdir() in between, but we don't
actually do so; the only permanent chdir is from setup_git_directory()
which runs before either code path (potentially we should take the
pre-setup dir into account to avoid surprising the user, but that's an
orthogonal change).

The test just adapts the existing "override" test to use paths with
leading directories. This checks both that the correct directory is
created (which worked before but was not tested), and that the
overridden one is not (our new fix here).

While we're here, let's also check the error result of
safe_create_leading_directories(). We'd presumably notice any failure
immediately after when we try to open the file itself, but we can give a
more specific error message in this case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import disallow feature export marks by default.diff | (download)

Documentation/git-fast-import.txt | 14 14 + 0 - 0 !
fast-import.c | 25 25 + 0 - 0 !
t/t9300-fast-import.sh | 23 15 + 8 - 0 !
transport-helper.c | 1 1 + 0 - 0 !
4 files changed, 55 insertions(+), 8 deletions(-)

 fast-import: disallow "feature export-marks" by default

commit 68061e3470210703cb15594194718d35094afdc0 upstream.
The fast-import stream command "feature export-marks=<path>" lets the
stream write marks to an arbitrary path. This may be surprising if you
are running fast-import against an untrusted input (which otherwise
cannot do anything except update Git objects and refs).

Let's disallow the use of this feature by default, and provide a
command-line option to re-enable it (you can always just use the
command-line --export-marks as well, but the in-stream version provides
an easy way for exporters to control the process).

This is a backwards-incompatible change, since the default is flipping
to the new, safer behavior. However, since the main users of the
in-stream versions would be import/export-based remote helpers, and
since we trust remote helpers already (which are already running
arbitrary code), we'll pass the new option by default when reading a
remote helper's stream. This should minimize the impact.

Note that the implementation isn't totally simple, as we have to work
around the fact that fast-import doesn't parse its command-line options
until after it has read any "feature" lines from the stream. This is how
it lets command-line options override in-stream. But in our case, it's
important to parse the new --allow-unsafe-features first.

There are three options for resolving this:

1. Do a separate "early" pass over the options. This is easy for us to
   do because there are no command-line options that allow the
   "unstuck" form (so there's no chance of us mistaking an argument for
   an option), though it does introduce a risk of incorrect parsing
   later (e.g., if we convert to parse-options).

2. Move the option parsing phase back to the start of the program, but
   teach the stream-reading code never to override an existing value.
   This is tricky, because stream "feature" lines override each other
   (meaning we'd have to start tracking the source for every option).

3. Accept that we might parse a "feature export-marks" line that is
   forbidden, as long as we don't _act_ on it until after we've parsed
   the command line options.

   This would, in fact, work with the current code, but only because
   the previous patch fixed the export-marks parser to avoid touching
   the filesystem.

   So while it works, it does carry risk of somebody getting it wrong
   in the future in a rather subtle and unsafe way.

I've gone with option (1) here as simple, safe, and unlikely to cause
regressions.

This fixes CVE-2019-1348.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

fast import disallow feature import marks by default.diff | (download)

Documentation/git-fast-import.txt | 3 2 + 1 - 0 !
fast-import.c | 2 2 + 0 - 0 !
t/t9300-fast-import.sh | 22 17 + 5 - 0 !
3 files changed, 21 insertions(+), 6 deletions(-)

 fast-import: disallow "feature import-marks" by default

commit a52ed76142f6e8d993bb4c50938a408966eb2b7c upstream.

As with export-marks in the previous commit, import-marks can access the
filesystem. This is significantly less dangerous than export-marks
because it only involves reading from arbitrary paths, rather than
writing them. However, it could still be surprising and have security
implications (e.g., exfiltrating data from a service that accepts
fast-import streams).

Let's lump it (and its "if-exists" counterpart) in with export-marks,
and enable the in-stream version only if --allow-unsafe-features is set.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

clone recurse submodules prevent name squatting on Wi.diff | (download)

builtin/clone.c | 2 1 + 1 - 0 !
builtin/submodule--helper.c | 13 12 + 1 - 0 !
git-submodule.sh | 6 6 + 0 - 0 !
t/t7415-submodule-names.sh | 31 31 + 0 - 0 !
4 files changed, 50 insertions(+), 2 deletions(-)

 clone --recurse-submodules: prevent name squatting on windows

commit 0060fd1511b94c918928fa3708f69a3f33895a4a upstream.
In addition to preventing .git from being tracked by Git, on Windows
we also have to prevent git~1 from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name .git
is git~1, otherwise it would be possible for malicious repositories to
write directly into the .git/ directory, e.g. a post-checkout hook
that would then be executed _during_ a recursive clone.

When we implemented appropriate protections in 2b4c6efc821 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the .git directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named ..git to be assigned the short name git~1 and
subsequently, the short name generated for .git would be git~2. Or
git~3. Or even ~9999999 (for a detailed explanation of the lengths we
have to go to protect .gitmodules, see the commit message of
e7cb0b4455c (is_ntfs_dotgit: match other .git files, 2018-05-11)).

However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:

- On Windows, file names cannot end in a space or a period (for
  historical reasons: the period separating the base name from the file
  extension was not actually written to disk, and the base name/file
  extension was space-padded to the full 8/3 characters, respectively).

  Helpfully, when creating a directory under the name, say, sub., that
  trailing period is trimmed automatically and the actual name on disk
  is sub.

  This means that while Git thinks that the submodule names sub and

path.c document the purpose of is_ntfs_dotgit.diff | (download)

path.c | 28 28 + 0 - 0 !
1 file changed, 28 insertions(+)

 path.c: document the purpose of is_ntfs_dotgit()

commit 525e7fba7854c23ee3530d0bf88d75f106f14c95 upstream.

Previously, this function was completely undocumented.
It is worth explaining what is going on, though, as it is not really
obvious at all.

Suggested-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

is_ntfs_dotgit only verify the leading segment.diff | (download)

fsck.c | 18 17 + 1 - 0 !
path.c | 5 1 + 4 - 0 !
read-cache.c | 8 8 + 0 - 0 !
3 files changed, 26 insertions(+), 5 deletions(-)

 is_ntfs_dotgit(): only verify the leading segment

commit 288a74bcd28229a00c3632f18cba92dbfdf73ee9 upstream.

The config setting core.protectNTFS is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.

As a consequence, is_ntfs_dotgit() manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.

However, the two callers, verify_path() and fsck_tree(), are supposed
to feed only individual path segments to the is_ntfs_dotgit() function.

This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of is_ntfs_dotgit() was optimized for
readability rather than for speed).

Let's simplify the design of is_ntfs_dotgit() by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.

Consequently, the verify_path() function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when core.protectNTFS is turned on, even on
platforms where the backslash is _not_ a directory separator.
Note that we have to repeat some code in verify_path(): if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like \, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.

The fsck_tree() function (the other caller of is_ntfs_dotgit()) now
needs to look for backslashes in tree entries' names specifically when
core.protectNTFS is turned on. While it would be tempting to
completely disallow backslashes in that case (much like fsck reports
names containing forward slashes as "full paths"), this would be
overzealous: when core.protectNTFS is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.

This simplification will make subsequent changes easier to implement,
such as turning core.protectNTFS on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.

Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form dir\.gitmodules already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to is_ntfs_dotgit(), the
is_ntfs_dotgitmodules() function already expects the caller to split
the paths by directory separators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>

path safeguard .git against NTFS Alternate Streams Ac.diff | (download)

path.c | 12 11 + 1 - 0 !
t/t1014-read-tree-confusing.sh | 1 1 + 0 - 0 !
2 files changed, 12 insertions(+), 1 deletion(-)

 path: safeguard .git against ntfs alternate streams accesses

commit 7c3745fc6185495d5765628b4dfe1bd2c25a2981 upstream.
Probably inspired by HFS' resource streams, NTFS supports "Alternate
Data Streams": by appending :<stream-name> to the file name,
information in addition to the file contents can be written and read,
information that is copied together with the file (unless copied to a
non-NTFS location).

These Alternate Data Streams are typically used for things like marking
an executable as having just been downloaded from the internet (and
hence not necessarily being trustworthy).

In addition to a stream name, a stream type can be appended, like so:
:<stream-name>:<stream-type>. Unless specified, the default stream
type is $DATA for files and $INDEX_ALLOCATION for directories. In
other words, .git::$INDEX_ALLOCATION is a valid way to reference the
.git directory!

In our work in Git v2.2.1 to protect Git on NTFS drives under
core.protectNTFS, we focused exclusively on NTFS short names, unaware
of the fact that NTFS Alternate Data Streams offer a similar attack
vector.

Let's fix this.

Seeing as it is better to be safe than sorry, we simply disallow paths
referring to *any* NTFS Alternate Data Stream of .git, not just
::$INDEX_ALLOCATION. This also simplifies the implementation.
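As a rough illustration of the naming scheme and of the "disallow any stream" policy, the following Python sketch splits a path segment into its name, stream name and stream type. The function names are hypothetical and this is not git's implementation; in particular it ignores the 8.3 short-name and trailing space/period variants that git also has to handle.

```python
def split_stream(segment: str):
    """Split 'name[:stream-name[:stream-type]]' into its three parts."""
    name, _, rest = segment.partition(":")
    stream_name, _, stream_type = rest.partition(":")
    return name, stream_name, stream_type


def refers_to_dotgit_stream(segment: str) -> bool:
    """True if the segment names *any* Alternate Data Stream of .git.

    NTFS is case-insensitive, hence the lower-casing of the file name.
    """
    name, _, _ = split_stream(segment)
    return ":" in segment and name.lower() == ".git"


print(split_stream(".git::$INDEX_ALLOCATION"))
print(refers_to_dotgit_stream(".git:anything"))
```

Because the check only asks whether a colon follows the name, it never needs to know which stream names or stream types exist, which is why banning all streams is simpler than banning one.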

This closes CVE-2019-1352.

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>



path.c | 2 1 + 1 - 0 !
t/t0060-path-utils.sh | 7 6 + 1 - 0 !
2 files changed, 7 insertions(+), 2 deletions(-)

 path: also guard .gitmodules against ntfs alternate data streams

commit 91bd46588e6959e6903e275f78b10bd07830d547 upstream.

We just safe-guarded .git against NTFS Alternate Data Stream-related
attack vectors, and now it is time to do the same for .gitmodules.

Note: In the added regression test, we refrain from verifying all kinds
of variations between short names and NTFS Alternate Data Streams: as
the new code disallows _all_ Alternate Data Streams of .gitmodules, it
is enough to test one in order to know that all of them are guarded
against.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>



path.c | 55 30 + 25 - 0 !
1 file changed, 30 insertions(+), 25 deletions(-)

 is_ntfs_dotgit(): speed it up

commit 3a85dc7d534fc2d410ddc0c771c963b20d1b4857 upstream.

Previously, this function was written without focusing on speed,
intending to make reviewing the code as easy as possible, to avoid any
bugs in this critical code.

Turns out: we can do much better on both counts. With this patch, we
make it as fast as this developer can make it go:

- We avoid the call to is_dir_sep() and make all the character
comparisons explicit.

- We avoid the cost of calling strncasecmp() and unroll the test for
.git and git~1, not even using tolower() because it is faster to
compare against two constant values.

- We look for .git and git~1 first thing, and return early if not
found.

- We also avoid calling a separate function for detecting chains of
spaces and periods.

Each of these improvements has a noticeable impact on the speed of
is_ntfs_dotgit().
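The control flow described in the list above (match one of two prefixes first, return early otherwise, then scan the remainder inline) can be sketched as follows. This is a simplified Python illustration with a hypothetical function name, not the C code from path.c; it uses `lower()` for brevity, whereas the commit deliberately compares against constant characters.

```python
def is_ntfs_dotgit_sketch(name: str) -> bool:
    """Return True if `name` could resolve to .git on NTFS (simplified)."""
    head = name[:5].lower()
    if head.startswith(".git"):
        rest = name[4:]
    elif head == "git~1":
        rest = name[5:]
    else:
        return False  # early return: neither candidate prefix matched

    # Inline scan of what follows the prefix, with no helper function.
    for ch in rest:
        if ch in "\\/:":
            return True  # a separator or a stream marker ends the segment
        if ch not in ". ":
            return False  # e.g. ".gitmodules" continues with other characters
    return True  # nothing, or only trailing spaces and periods, followed


for candidate in (".git", "GIT~1", ".git. . ", ".gitmodules"):
    print(candidate, is_ntfs_dotgit_sketch(candidate))
```

Most file names fail the prefix test on their first character, so the common case costs a handful of comparisons.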

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>



config.mak.uname | 2 0 + 2 - 0 !
environment.c | 2 1 + 1 - 0 !
2 files changed, 1 insertion(+), 3 deletions(-)

 protect_ntfs: turn on ntfs protection by default

commit 9102f958ee5254b10c0be72672aa3305bf4f4704 upstream.

Back in the DOS days, in the FAT file system, file names always
consisted of a base name of length 8 plus a file extension of length 3.
Shorter file names were simply padded with spaces to the full 8.3
format.

Later, the FAT file system was taught to support _also_ longer names,
with an 8.3 "short name" as primary file name. While at it, the same
facility allowed formerly illegal file names, such as .git (empty base
names were not allowed), which would have the "short name" git~1
associated with it.

For backwards-compatibility, NTFS supports alternative 8.3 short
filenames, too, even if starting with Windows Vista, they are only
generated on the system drive by default.

We addressed the problem that the .git/ directory can _also_ be
accessed via git~1/ (when short names are enabled) in 2b4c6efc821
(read-cache: optionally disallow NTFS .git variants, 2014-12-16), i.e.
since Git v1.9.5, by introducing the config setting core.protectNTFS
and enabling it by default on Windows.

In the meantime, Windows 10 introduced the "Windows Subsystem for Linux"
(short: WSL), i.e. a way to run Linux applications/distributions in a
thinly-isolated subsystem on Windows (giving rise to many a "2016 is the
Year of Linux on the Desktop" joke). WSL is getting increasingly
popular, also due to the painless way Linux applications can operate
directly ("natively") on files on Windows' file system: the Windows
drives are mounted automatically (e.g. C: as /mnt/c/).

Taken together, this means that we now have to enable the safe-guards of
Git v1.9.5 also in WSL: it is possible to access a .git directory
inside /mnt/c/ via the 8.3 name git~1 (unless short name generation
was disabled manually). Since regular Linux distributions run in WSL,
this means we have to enable core.protectNTFS at least on Linux, too.

To enable Services for Macintosh in Windows NT to store so-called
resource forks, NTFS introduced "Alternate Data Streams". Essentially,
these are additional pieces of information attached to (and copied
with) their associated files, and they are accessed via pseudo file
names of the form filename:<stream-name>:<stream-type>.

In a recent patch, we extended core.protectNTFS to also protect
against accesses via NTFS Alternate Data Streams, e.g. to prevent
contents of the .git/ directory from being "tracked" via yet another
alternative file name.

While it is not possible (at least by default) to access files via NTFS
Alternate Data Streams from within WSL, the defaults on macOS when
mounting network shares via SMB _do_ allow accessing files and
directories in that way. Therefore, we need to enable core.protectNTFS
on macOS by default, too, and really, on any Operating System that can
mount network shares via SMB/CIFS.

A couple of approaches were considered for fixing this:

1. We could perform a dynamic NTFS check similar to the core.symlinks
check in init/clone: instead of trying to create a symbolic link
in the .git/ directory, we could create a test file and try to
access .git/config via 8.3 name and/or Alternate Data Stream.

2. We could simply "flip the switch" on core.protectNTFS, to make it
"on by default".

The obvious downside of 1. is that it won't protect worktrees that were
cloned with a vulnerable Git version already. We considered patching code
paths that check out files to check whether we're running on an NTFS
system dynamically and persist the result in the repository-local config
setting core.protectNTFS, but in the end decided that this solution
would be too fragile, and too involved.

The obvious downside of 2. is that everybody will have to "suffer" the
performance penalty incurred from calling is_ntfs_dotgit() on every
path, even in setups where NTFS is not involved.

After the recent work to accelerate is_ntfs_dotgit() in most cases,
it looks as if the time spent on validating ten million random
file names increases only negligibly (less than 20ms, well within the
standard deviation of ~50ms). Therefore the benefits outweigh the cost.

Another downside of this is that paths that might have been acceptable
previously now will be forbidden. Realistically, though, this is an
improvement because public Git hosters already would reject any git
push that contains such file names.

Note: There might be a similar problem mounting HFS+ on Linux. However,
this scenario has been considered unlikely and in light of the cost (in
the aforementioned benchmark, core.protectHFS = true increased the
time from ~440ms to ~610ms), it was decided _not_ to touch the default
of core.protectHFS.

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Helped-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>



builtin/submodule--helper.c | 24 24 + 0 - 0 !
git-submodule.sh | 5 5 + 0 - 0 !
submodule.c | 41 41 + 0 - 0 !
submodule.h | 5 5 + 0 - 0 !
t/t7415-submodule-names.sh | 23 23 + 0 - 0 !
5 files changed, 98 insertions(+)

 disallow dubiously-nested submodule git directories

commit a8dee3ca610f5a1d403634492136c887f83b59d2 upstream.

Currently it is technically possible to let a submodule's git
directory point right into the git dir of a sibling submodule.

Example: the git directories of two submodules with the names hippo
and hippo/hooks would be .git/modules/hippo/ and
.git/modules/hippo/hooks/, respectively, but the latter is already
intended to house the former's hooks.

In most cases, this is just confusing, but there is also a (quite
contrived) attack vector where Git can be fooled into mistaking remote
content for file contents it wrote itself during a recursive clone.

Let's plug this bug.

To do so, we introduce the new function validate_submodule_git_dir()
which simply verifies that no git dir exists for any leading directories
of the submodule name (if there are any).

Note: this patch specifically continues to allow sibling module names
of the form core/lib, core/doc, etc., as long as core is not a
submodule name.
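
The check described above can be sketched in Python as follows. This is
a minimal illustration under stated assumptions, not the C code added by
the patch: the function name no_nested_git_dir is hypothetical, and a
directory is treated as a git dir simply because it contains a HEAD file.

```python
import os
import tempfile


def no_nested_git_dir(modules_dir: str, name: str) -> bool:
    """Return False if a leading directory of `name` already is a git dir."""
    parts = name.split("/")
    # Check every proper prefix: for "a/b/c" these are "a" and "a/b".
    for i in range(1, len(parts)):
        prefix = os.path.join(modules_dir, *parts[:i])
        # Assumption for this sketch: a HEAD file marks a git dir.
        if os.path.isfile(os.path.join(prefix, "HEAD")):
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as modules:
        # Pretend the submodule "hippo" has already been cloned.
        os.makedirs(os.path.join(modules, "hippo"))
        with open(os.path.join(modules, "hippo", "HEAD"), "w") as f:
            f.write("ref: refs/heads/master\n")
        print(no_nested_git_dir(modules, "hippo/hooks"))  # nested: rejected
        print(no_nested_git_dir(modules, "core/lib"))     # siblings: allowed
```

Because only leading directories are inspected, names such as core/lib
and core/doc remain acceptable as long as core itself has no git dir.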

This fixes CVE-2019-1387.

[jn: backported to 2.11.y:
- port to git-submodule.sh
- use explicit chdir to emulate test_commit -C in test]

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>



unpack-trees.c | 3 2 + 1 - 0 !
1 file changed, 2 insertions(+), 1 deletion(-)

 unpack-trees: let merged_entry() pass through do_add_entry()'s errors

A git clone will end with exit code 0 when merged_entry() returns a
positive value during a call of unpack_trees() to traverse_trees().
The reason is that unpack_trees() will interpret a positive value not
to be an error.

The problem is, however, that add_index_entry() (which is called by
merged_entry()) can report an error, and we really should fail the
entire clone in such a case.
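
The effect of swallowing versus passing through the error can be shown
with a toy Python model. All names below (add_entry, merged_entry_old,
merged_entry_fixed, unpack) are hypothetical stand-ins, and the
trailing-space failure condition is an assumption chosen only to have
something that fails; none of this is Git's actual code.

```python
def add_entry(index: dict, path: str) -> int:
    """Stand-in for the index update: 0 on success, -1 on error."""
    if path.endswith(" "):  # assumed failure condition for this sketch
        return -1
    index[path] = True
    return 0


def merged_entry_old(index: dict, path: str) -> int:
    add_entry(index, path)  # the error is silently dropped
    return 1                # positive value: not treated as an error


def merged_entry_fixed(index: dict, path: str) -> int:
    if add_entry(index, path):  # non-zero means the update failed
        return -1               # negative value: reported as an error
    return 1


def unpack(paths, merged_entry) -> int:
    """Toy driver: fail only when a callback returns a negative value."""
    index = {}
    for p in paths:
        if merged_entry(index, p) < 0:
            return -1
    return 0
```

In this model, unpack(["a", "b "], merged_entry_old) reports success even
though one entry could not be added, whereas the fixed variant makes the
whole operation fail.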

Let's fix this problem, in preparation for a Windows-specific patch
disallowing mkdir() with directory names that contain a trailing space
(which is illegal on NTFS): we want git clone to abort when a path
cannot be checked out due to that condition.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>