# Python Helpers

The Python helpers in this IPython notebook generate Elixir modules from Pygments's styles. This gives us 29 styles for free. Some of them are a little weird, but it's good to have a choice. Because the code introspects Python code, it is written in Python and will stay in Python for as long as it's needed.

I've done this in an IPython notebook because it's the best environment for exploratory programming in Python.

This notebook is not meant to be run regularly during development, so I didn't bother creating a proper Python package or even a requirements file. The external dependencies needed to run it are:

* jupyter
* pygments
* jinja2
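
Since there is no requirements file, a quick check like the one below can confirm that the packages listed above are importable before running the notebook. This is only a sketch: the helper name `missing_packages` is made up for illustration, and it assumes each package can be found under the same name as in the list.

```python
import importlib.util

# Package names taken from the dependency list above.
REQUIRED = ["jupyter", "pygments", "jinja2"]

def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All notebook dependencies are importable.")
```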

## Style Modules Generation

The code is pretty simple. It introspects the Python style classes with some help from Pygments itself, and then generates Elixir modules with the same functionality. The architecture is of course quite different (Elixir lexers are *data*, not objects).
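
One step of that translation is turning a Pygments token type into an Elixir atom. The sketch below illustrates the idea on plain strings, assuming names of the form `Token.Keyword.Constant` (which is how Pygments token types print). The function name `token_name_to_atom` is made up for this example and is not part of the notebook.

```python
def token_name_to_atom(token_name):
    """Convert a stringified Pygments token type into Elixir atom source text."""
    # Pygments nests strings and numbers under Token.Literal; drop that level.
    flattened = token_name.replace("Token.Literal.", "Token.")
    # Remove the leading "Token." prefix.
    trimmed = flattened[len("Token."):]
    # Lowercase the rest and join the parts with underscores.
    return ":" + trimmed.lower().replace(".", "_")

print(token_name_to_atom("Token.Keyword.Constant"))       # :keyword_constant
print(token_name_to_atom("Token.Literal.String.Double"))  # :string_double
print(token_name_to_atom("Token.Name.Builtin.Pseudo"))    # :name_builtin_pseudo
```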

Please don't edit the part of the `lib/makeup/styles/html/style_map.ex` file between these markers, because it will be overwritten if this notebook is run again:

```elixir
  # %% Start Pygments %%
  ...
  # %% End Pygments %%
```

Don't remove the marker lines themselves either: the notebook uses them to find the region to replace.
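
The overwrite can be pictured as a regular expression substitution over the text between the two markers. The following is a minimal, self-contained sketch of that idea, not the notebook's exact code. It assumes the marker text `# %% Start Pygments %%` and `# %% End Pygments %%` with a two-space indent, and the function name `replace_between_markers` is hypothetical.

```python
import re

# Assumed marker lines, including their two-space indentation.
START = "  # %% Start Pygments %%"
END = "  # %% End Pygments %%"

def replace_between_markers(source, new_fragment):
    """Replace everything between the marker lines, keeping the markers."""
    pattern = re.compile(
        "(" + re.escape(START) + r"\r?\n)"  # start marker and its line ending
        r".*?"                              # old generated text (non-greedy)
        "(" + re.escape(END) + ")",         # end marker
        re.DOTALL,
    )
    # A function is used as the replacement so that any backslashes in
    # new_fragment are inserted literally.
    return pattern.sub(
        lambda m: m.group(1) + new_fragment + m.group(2), source, count=1
    )

source = (
    "defmodule M do\n"
    "  # %% Start Pygments %%\n"
    "  old generated code\n"
    "  # %% End Pygments %%\n"
    "  hand-written code\n"
    "end\n"
)
print(replace_between_markers(source, "  new generated code\n"))
```

Anything outside the markers, such as the hand-written line in this example, is left untouched.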

In [67]:
import pygments.styles
import jinja2
import textwrap
from itertools import chain
import os
import re

tokens = [
  ':text',
  ':whitespace',
  ':escape',
  ':error',
  ':other' ,
  ':keyword',
  ':keyword_constant',
  ':keyword_declaration',
  ':keyword_namespace',
  ':keyword_pseudo',
  ':keyword_reserved',
  ':keyword_type' ,
  ':name',
  ':name_attribute',
  ':name_builtin',
  ':name_builtin_pseudo',
  ':name_class',
  ':name_constant',
  ':name_decorator',
  ':name_entity',
  ':name_exception',
  ':name_function',
  ':name_function_magic',
  ':name_property',
  ':name_label',
  ':name_namespace',
  ':name_other',
  ':name_tag',
  ':name_variable',
  ':name_variable_class',
  ':name_variable_global',
  ':name_variable_instance',
  ':name_variable_magic',
  ':literal',
  ':literal_date',
  ':string',
  ':string_affix',
  ':string_backtick',
  ':string_char',
  ':string_delimiter',
  ':string_doc',
  ':string_double',
  ':string_escape',
  ':string_heredoc',
  ':string_interpol',
  ':string_other',
  ':string_regex',
  ':string_sigil',
  ':string_single',
  ':string_symbol',
  ':number',
  ':number_bin',
  ':number_float',
  ':number_hex',
  ':number_integer',
  ':number_integer_long',
  ':number_oct',
  ':operator',
  ':operator_word',
  ':punctuation',
  ':comment',
  ':comment_hashbang',
  ':comment_multiline',
  ':comment_preproc',
  ':comment_preproc_file',
  ':comment_single',
  ':comment_special',
  ':generic',
  ':generic_deleted',
  ':generic_emph',
  ':generic_error',
  ':generic_heading',
  ':generic_inserted',
  ':generic_output',
  ':generic_prompt',
  ':generic_strong',
  ':generic_subheading',
  ':generic_traceback']

style_module_template = jinja2.Template('''
defmodule Makeup.Styles.HTML.{{module_name}} do
  @moduledoc false

  require Makeup.Token.TokenTypes
  alias Makeup.Token.TokenTypes, as: Tok

  @styles %{
    {% for tok in tokens %}
    {%- if styles[ex_to_py[tok]] %}{{ tok }} => "{{ styles[ex_to_py[tok]] }}"{% if not loop.last %},{% endif %}
    {% endif -%}
    {%- endfor %}
  }
  
  alias Makeup.Styles.HTML.Style
  
  @style_struct Style.make_style(
      short_name: "{{ short_name }}",
      long_name: "{{ long_name }}",
      background_color: "{{ background_color }}",
      highlight_color: "{{ highlight_color }}",
      styles: @styles)
      
  def style() do
    @style_struct
  end
end
''')

style_map_file_fragment = jinja2.Template('''
  {% for (lowercase, uppercase) in pairs %}
  @doc """
  The *{{ lowercase }}* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#{{ lowercase }}).
  """
  def {{ lowercase }}_style, do: HTML.{{ uppercase }}.style()
  
  {% endfor -%}
  
  # All styles
  @pygments_style_map_binaries %{
  {% for (lowercase, uppercase) in pairs %}  "{{ lowercase }}" => HTML.{{ uppercase }}.style(),
  {% endfor %}  }
    
  @pygments_style_map_atoms %{
  {% for (lowercase, uppercase) in pairs %}  {{ lowercase }}: HTML.{{ uppercase }}.style(),
  {% endfor %}}


''')

def py_to_ex(cls):
    # We don't want to operate on token classes, only on their names
    name = str(cls)
    # Names are of the form "Token.*".
    # Drop the "Literal" level, so "Token.Literal.String" becomes "Token.String"
    name = name.replace('Token.Literal.', 'Token.')
    # Trim the "Token." prefix
    trimmed = name[6:]
    # Convert to lower case.
    # It would be confusing to have them in uppercase in Elixir
    # because they could be mistaken for aliases.
    # Besides, having them in lowercase allows us to use macros
    # to make sure at compile time we're not using any nonexistent styles.
    lowered = trimmed.lower()
    # Continue turning them into valid identifiers
    replaced = lowered.replace('.', '_')
    # Return the original Python name paired with the Elixir atom
    return (str(cls), ':' + replaced)

def invert(pairs):
    return [(y, x) for (x, y) in pairs]
        
def stringify_styles(styles):
    return dict((str(k), v) for (k,v) in styles.items())

def correct_docs(text, level=2):
    # The module docs are written in rST.
    # rST is similar enough to markdown that we can fake it
    # by removing the first lines with the title and
    # replacing some directives.
    
    # First, remove all indent
    md = textwrap.dedent(text)
    # Replace the :copyright: directive
    md = md.strip().replace(':copyright:', '&copy;')
    # Replace the :license: directive
    md = md.replace(':license:', 'License:')
    # Add a link to the BSD license
    md = md.replace('see LICENSE for details',
                    'see [here](https://opensource.org/licenses/BSD-3-Clause) for details')
    # Escape the '*' character, which is probably not used for emphasis by the license
    md = md.replace('*', '\\*')
    # remove the first 3 lines, which contain the title
    # and indent all lines (2 spaces by default)
    indented = "\n".join(((" " * level) + line) for line in md.split('\n')[3:])
    return indented

def style_to_ex_module(key, value, tokens):
    # Pygments stores the module name and the class name under this weird format
    module_name, class_name = value.split('::')
    # Import the module
    __import__('pygments.styles.' + module_name)
    # Store the module in a variable
    module = getattr(pygments.styles, module_name)
    short_name = module_name
    long_name = class_name[:-5] + " " + class_name[-5:]
    # Extract the class from the module
    style_class = getattr(module, class_name)
    # Map the Elixir styles into Python stringified token classes
    ex_to_py = dict(invert([py_to_ex(k) for k in style_class.styles.keys()]))
    stringified_styles = stringify_styles(style_class.styles)
    # Render the tokens
    return style_module_template.render(
        # Preprocess the docs
        moduledoc=correct_docs(module.__doc__, 2),
        # We take the style name unchanged from Python
        # (including the *Style suffix)
        module_name=style_class.__name__,
        # The elixir token styles
        tokens=tokens,
        # Other class attributes
        short_name=short_name,
        long_name=long_name,
        background_color=style_class.background_color,
        highlight_color=style_class.highlight_color,
        styles=stringified_styles,
        ex_to_py=ex_to_py)

def all_styles(style_map, tokens):
    # This function generates an Elixir file (with one module) for each Pygments style.
    # It will overwrite existing files.
    for key, value in style_map.items():
        source = style_to_ex_module(key, value, tokens)
        # The path where we'll generate the file
        file_path = os.path.join('lib/makeup/styles/html/pygments/', key + '.ex')
        with open(file_path, 'wb') as f:
            f.write(source.encode())

def generate_style_map_file(style_map):
    sorted_pairs = sorted([
      # Turn the key into a valid Elixir identifier
      (key.replace('-', '_'), value.split('::')[1])
        for (key, value) in style_map.items()
    ])
    # Generate the new text fragment
    new_fragment = style_map_file_fragment.render(pairs=sorted_pairs)
    file_path = os.path.join('lib/makeup/styles/html/style_map.ex')
    with open(file_path, 'r') as f:
        source = f.read()
    
    # Recognize the pattern to replace
    pattern = re.compile(
        r"(?<=  # %% Start Pygments %%)(\r?\n)"
        r"(.*?\r?\n)"
        r"(?=  # %% End Pygments %%)", re.DOTALL)
    # Replace the text between the markers.
    # A function is used as the replacement so that backslashes in the
    # generated fragment are inserted literally instead of being read as escapes.
    replaced = pattern.sub(lambda _match: new_fragment, source)
    # Check we've done the right thing
    print(replaced)
    # Replace the file contents
    with open(file_path, 'wb') as f:
        f.write(replaced.encode())
    
# (Re)generate modules for all styles
all_styles(pygments.styles.STYLE_MAP, tokens)
# Regenerate the style_map file
generate_style_map_file(pygments.styles.STYLE_MAP)

defmodule Makeup.Styles.HTML.StyleMap do
  alias Makeup.Styles.HTML

  # %% Start Pygments %%
  
  @doc """
  The *abap* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#abap).
  """
  def abap_style, do: HTML.AbapStyle.style()
  
  
  @doc """
  The *algol* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#algol).
  """
  def algol_style, do: HTML.AlgolStyle.style()
  
  
  @doc """
  The *algol_nu* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#algol_nu).
  """
  def algol_nu_style, do: HTML.Algol_NuStyle.style()
  
  
  @doc """
  The *arduino* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#arduino).
  """
  def arduino_style, do: HTML.ArduinoStyle.style()
  
  
  @doc """
  The *autumn* style. Example [here](https://elixir-makeup.github.io/makeup_demo/elixir.html#autumn).
  """
  def autumn_style, do: HTML.AutumnStyle.style()
  
  
  @doc """
  The *borland* style.

# Default Language Names



In [5]:
import pygments.lexers
from pygments.lexers import get_all_lexers
import jinja2

In [38]:
def get_lexer_data():
    data = []
    for (name, shortnames, filetypes, mimetypes) in get_all_lexers():
        # get_lexer_by_name returns a lexer instance, not a class,
        # so the class name is read through __class__
        lexer = pygments.lexers.get_lexer_by_name(shortnames[0])
        row = {
            'class_name': lexer.__class__.__name__,
            'name': name,
            'shortnames': shortnames,
            'filetypes': filetypes,
            'mimetypes': mimetypes
        }
        data.append(row)
    return data
    
template = jinja2.Template("""
%{
{% for lexer in lexer_data %}
  %{\
module: Makeup.Lexers.{{ lexer['class_name'] }}, \
name: "{{ lexer['name'] }}", \
shortnames: [{% for shortname in lexer['shortnames'] %}"{{ shortname }}"{% if not loop.last  %}, {% endif %}{% endfor %}]}\
{% if not loop.last %},{% endif %}
{%- endfor %}\
""")

print(template.render(lexer_data=get_lexer_data()))


%{

  %{module: Makeup.Lexers.ABAPLexer, name: "ABAP", shortnames: ["abap"]},
  %{module: Makeup.Lexers.APLLexer, name: "APL", shortnames: ["apl"]},
  %{module: Makeup.Lexers.AbnfLexer, name: "ABNF", shortnames: ["abnf"]},
  %{module: Makeup.Lexers.ActionScript3Lexer, name: "ActionScript 3", shortnames: ["as3", "actionscript3"]},
  %{module: Makeup.Lexers.ActionScriptLexer, name: "ActionScript", shortnames: ["as", "actionscript"]},
  %{module: Makeup.Lexers.AdaLexer, name: "Ada", shortnames: ["ada", "ada95", "ada2005"]},
  %{module: Makeup.Lexers.AdlLexer, name: "ADL", shortnames: ["adl"]},
  %{module: Makeup.Lexers.AgdaLexer, name: "Agda", shortnames: ["agda"]},
  %{module: Makeup.Lexers.AheuiLexer, name: "Aheui", shortnames: ["aheui"]},
  %{module: Makeup.Lexers.AlloyLexer, name: "Alloy", shortnames: ["alloy"]},
  %{module: Makeup.Lexers.AmbientTalkLexer, name: "AmbientTalk", shortnames: ["at", "ambienttalk", "ambienttalk/2"]},
  %{module: Makeup.Lexers.AmplLexer, name: "Ampl", shor

In [15]:
dir(pygments.lexers)

['ClassNotFound',
 'LEXERS',
 'PythonLexer',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_automodule',
 '_fn_matches',
 '_iter_lexerclasses',
 '_lexer_cache',
 '_load_lexers',
 '_mapping',
 '_pattern_cache',
 'actionscript',
 'agile',
 'basename',
 'css',
 'd',
 'data',
 'factor',
 'find_lexer_class',
 'find_lexer_class_by_name',
 'find_lexer_class_for_filename',
 'find_plugin_lexers',
 'fnmatch',
 'get_all_lexers',
 'get_filetype_from_buffer',
 'get_lexer_by_name',
 'get_lexer_for_filename',
 'get_lexer_for_mimetype',
 'guess_decode',
 'guess_lexer',
 'guess_lexer_for_filename',
 'html',
 'iolang',
 'itervalues',
 'javascript',
 'jvm',
 'lisp',
 'load_lexer_from_file',
 'perl',
 'php',
 'python',
 're',
 'ruby',
 'scripting',
 'tcl',
 'web',
 'webmisc']

In [23]:
pygments.lexers.get_lexer_by_name("HTML+Cheetah").__class__.__name__

'CheetahHtmlLexer'