From: Pradyun Gedam <pradyunsg@gmail.com>
Date: Tue, 11 May 2021 20:04:10 -0400
Subject: Security: Don't split git references on unicode separators

Previously, maliciously formatted tags could be used to hijack a
commit-based pin. Using the fact that the split here allowed for
all of unicode's whitespace characters as separators -- which git allows
as a part of a tag name -- it is possible to force a different revision
to be installed; if an attacker gains access to the repository.

This change stops splitting the string on unicode characters, by forcing
the splits to happen on newlines and ASCII spaces.

Origin: upstream, https://github.com/pypa/pip/pull/9827
---
 src/pip/_internal/vcs/git.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/src/pip/_internal/vcs/git.py b/src/pip/_internal/vcs/git.py
index 565961a..4423a91 100644
--- a/src/pip/_internal/vcs/git.py
+++ b/src/pip/_internal/vcs/git.py
@@ -149,9 +149,15 @@ class Git(VersionControl):
             on_returncode='ignore',
         )
         refs = {}
-        for line in output.strip().splitlines():
+        # NOTE: We do not use splitlines here since that would split on other
+        #       unicode separators, which can be maliciously used to install a
+        #       different revision.
+        for line in output.strip().split("\n"):
+            line = line.rstrip("\r")
+            if not line:
+                continue
             try:
-                sha, ref = line.split()
+                sha, ref = line.split(" ", maxsplit=2)
             except ValueError:
                 # Include the offending line to simplify troubleshooting if
                 # this error ever occurs.
