File: postprocess.ex

elixir-makeup 1.2.1-2
defmodule Makeup.Lexer.Postprocess do
  @moduledoc """
  Often you'll want to run the token list through a postprocessing stage before
  running the formatter.

  Most of what can be done in a post-processing stage could also be done with more
  parsing rules, but doing it in a post-processing stage is often easier, and may be faster.
  Never assume one of the options is faster than the other; always measure performance.
  """

  @doc """
  Takes a list of the format `[{key1, [val11, val12, ...]}, {key2, [val21, ...]}]` and
  returns a map of the form `%{val11 => key1, val12 => key1, ..., val21 => key2, ...}`.

  The resulting map may be useful to highlight some tokens in a special way
  in a postprocessing step.

  You can also use pattern matching instead of the inverted map,
  and it will probably be faster, but always benchmark the alternatives before
  committing to an implementation.
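
  For example, with a hypothetical word map (the token types here are
  arbitrary placeholders, not ones Makeup mandates):

  ```elixir
  iex> Makeup.Lexer.Postprocess.invert_word_map([
  ...>   {:keyword, ["if", "else"]},
  ...>   {:operator, ["+", "-"]}
  ...> ])
  %{"if" => :keyword, "else" => :keyword, "+" => :operator, "-" => :operator}
  ```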
  """
  def invert_word_map(pairs) do
    nested =
      for {ttype, words} <- pairs do
        for word <- words, do: {word, ttype}
      end

    nested
    |> List.flatten()
    |> Enum.into(%{})
  end

  @doc """
  Converts the value of a token into a binary.

  Token values are usually iolists for performance reasons.
  The BEAM is actually quite fast at printing or concatenating iolists,
  and some of the basic combinators output iolists, so there is no need
  to convert the token values into binaries.

  This function should only be used for testing purposes, when you might
  want to compare the token list against a reference output.

  Converting the tokens into binaries has two advantages:
  1. It's much easier to compare tokens by visual inspection when the value is a binary
  2. When testing, two iolists that print to the same binary should be considered equal.

  This function hasn't been optimized for speed.
  Don't use in production code.
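
  For instance (the token type and metadata below are arbitrary placeholders;
  only the iolist-to-binary conversion of the value is the point):

  ```elixir
  iex> Makeup.Lexer.Postprocess.token_value_to_binary({:name, %{}, [?f, "o", ["o"]]})
  {:name, %{}, "foo"}
  ```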
  """
  def token_value_to_binary({ttype, meta, value}) do
    {ttype, meta, to_string([value])}
  end

  @doc """
  Converts the values of the tokens in the list into binaries.

  Token values are usually iolists for performance reasons.
  The BEAM is actually quite fast at printing or concatenating iolists,
  and some of the basic combinators output iolists, so there is no need
  to convert the token values into binaries.

  This function should only be used for testing purposes, when you might
  want to compare the token list against a reference output.

  Converting the tokens into binaries has two advantages:
  1. It's much easier to compare tokens by visual inspection when the value is a binary
  2. When testing, two iolists that print to the same binary should be considered equal.

  ## Example

  ```elixir
  defmodule MyTest do
    use ExUnit.Case
    alias Makeup.Lexers.ElixirLexer
    alias Makeup.Lexer.Postprocess

    test "binaries are much easier on the eyes" do
      naive_tokens = ElixirLexer.lex(":atom")
      # Hard to inspect visually
      assert naive_tokens == [{:string_symbol, %{language: :elixir}, [":", 97, 116, 111, 109]}]
      better_tokens =
        ":atom"
        |> ElixirLexer.lex()
        |> Postprocess.token_values_to_binaries()
      # Easy to inspect visually
      assert better_tokens == [{:string_symbol, %{language: :elixir}, ":atom"}]
    end
  end
  ```

  In practice, you'll probably want to define a small helper to make this less verbose.
  For example:

      defmodule MyTest do
        use ExUnit.Case
        alias Makeup.Lexers.ElixirLexer
        alias Makeup.Lexer.Postprocess

        def lex(text) do
          text
          |> ElixirLexer.lex(group_prefix: "group")
          |> Postprocess.token_values_to_binaries()
        end

        test "even better with our little helper" do
          assert lex(":atom") == [{:string_symbol, %{language: :elixir}, ":atom"}]
        end
      end

  """
  def token_values_to_binaries(tokens) do
    Enum.map(tokens, &token_value_to_binary/1)
  end
end