# coding: utf-8
# This script autogenerates `IPython.core.latex_symbols.py`, which contains a
# single dict named `latex_symbols`. The keys in this dict are LaTeX symbols,
# such as `\\alpha`, and the values are their unicode equivalents. Most
# importantly, only unicode symbols that are valid in Python 3 identifiers
# are included.
#
# The original mapping of LaTeX symbols to unicode comes from the
# `latex_symbols.jl` file from Julia.
from pathlib import Path
# Import the Julia LaTeX symbols
print("Importing latex_symbols.jl from Julia...")
import requests
url = "https://raw.githubusercontent.com/JuliaLang/julia/master/stdlib/REPL/src/latex_symbols.jl"
r = requests.get(url)
r.raise_for_status()  # fail early if the download did not succeed
# Build a list of key, value pairs
print("Building a list of (latex, unicode) key-value pairs...")
lines = r.text.splitlines()
prefixes_line = lines.index('# "font" prefixes')
symbols_line = lines.index("# manual additions:")
prefix_dict = {}
for l in lines[prefixes_line + 1 : symbols_line]:
    p = l.split()
    if not p or p[1] == "latex_symbols":
        continue
    # p[1] is the prefix name, p[3] its quoted value
    prefix_dict[p[1]] = p[3]
idents = []
for l in lines[symbols_line:]:
    if "=>" not in l:
        continue  # if it's not a def, skip
    if "#" in l:
        l = l[: l.index("#")]  # get rid of eol comments
    x, y = l.strip().split("=>")
    if "*" in x:  # if a prefix is present substitute it with its value
        p, x = x.split("*")
        # drop the closing quote of the prefix and the opening quote of x
        x = prefix_dict[p][:-1] + x[1:]
    try:
        x = x.split('"')[1]
        y = y.split('"')[1]  # get the values in quotes
    except IndexError:
        continue  # skip entries without a quoted key and value
    if not x.startswith("\\\\"):
        # reverse mapping
        continue
    idents.append((x, y))
# Filter out non-valid identifiers
print("Filtering out characters that are not valid Python 3 identifiers")
def test_ident(i):
    """Is the unicode string valid in a Python 3 identifier."""
    # Some characters are not valid at the start of a name, but we still want to
    # include them. So prefix with 'a', which is valid at the start.
    return ("a" + i).isidentifier()
assert test_ident("α")
assert not test_ident("‴")
valid_idents = [line for line in idents if test_ident(line[1])]
# Write the `latex_symbols.py` module in the cwd
s = f"""# encoding: utf-8
# DO NOT EDIT THIS FILE BY HAND.
# To update this file, run the script /tools/gen_latex_symbols.py using Python 3
# This file is autogenerated from the file:
# {url}
# This original list is filtered to remove any unicode characters that are not valid
# Python identifiers.
latex_symbols = {{
"""
for line in valid_idents:
    s += '    "%s": "%s",\n' % (line[0], line[1])
s += "}\n"
s += """
reverse_latex_symbol = {v: k for k, v in latex_symbols.items()}
"""
HERE = Path(__file__).parent
fn = HERE / ".." / "IPython" / "core" / "latex_symbols.py"
print("Writing the file: %s" % str(fn))
fn.write_text(s, encoding="utf-8")