---
layout: default
title: Inline Parsing
description: Parsing inline elements with a custom parser
---

# Inline Parsing

There are two ways to implement custom inline syntax:

- Inline Parsers (covered here)
- [Delimiter Processors](/2.1/customization/delimiter-processing/)

The difference between normal inlines and delimiter-run-based inlines is subtle but important to understand.  In a nutshell, delimiter-run-based inlines:

- Are denoted by "wrapping" text with one or more characters before **and** after those inner contents
- Can contain other delimiter runs or inlines inside of them

An example of this would be emphasis:

```markdown
This is an example of **emphasis**. Note how the text is *wrapped* with the same character(s) before and after.
```

If your syntax looks like that, consider using a [delimiter processor](/2.1/customization/delimiter-processing/) instead.  Otherwise, an inline parser is your best bet.

## Implementing Inline Parsers

Inline parsers should implement `InlineParserInterface` and the following two methods:

### getMatchDefinition()

This method should return an instance of `InlineParserMatch` which defines the text the parser is looking for.  Examples of this might be something like:

```php
use League\CommonMark\Parser\Inline\InlineParserMatch;

InlineParserMatch::string('@');                  // Match any '@' characters found in the text
InlineParserMatch::string('foo');                // Match the text 'foo' (case insensitive)

InlineParserMatch::oneOf('@', '!');              // Match either character
InlineParserMatch::oneOf('http://', 'https://'); // Match either string

InlineParserMatch::regex('\d+');                 // Match the regular expression (omit the regex delimiters and any flags)
```

Once a match is found, the `parse()` method below may be called.

### parse()

This method will be called if both conditions are met:

1. The engine has found a matching string in the current line; and,
2. No other inline parsers with a [higher priority](/2.1/customization/environment/#addinlineparser) have successfully parsed the text at this point in the line

#### Parameters

- `InlineParserContext $inlineContext` - Encapsulates the current state of the inline parser - see more information below.

##### InlineParserContext

This class has several useful methods:

- `getContainer()` - Returns the current container block the inline text was found in.  You'll almost always call `$inlineContext->getContainer()->appendChild(...)` to add the parsed inline text inside that block.
- `getReferenceMap()` - Returns the document's reference map
- `getCursor()` - Returns the current [`Cursor`](/2.1/customization/cursor/) used to parse the current line.  (Note that the cursor will be positioned **before** the matched text, so you must advance it yourself if you determine it's a valid match)
- `getDelimiterStack()` - Returns the current delimiter stack. Only used in advanced use cases.
- `getFullMatch()` - Returns the full string that matched your `InlineParserMatch` definition
- `getFullMatchLength()` - Returns the length of the full match - useful for advancing the cursor
- `getSubMatches()` - If your `InlineParserMatch` used a regular expression with capture groups, this will return the text matched by those groups.
- `getMatches()` - Returns an array where index `0` is the "full match", plus any sub-matches.  It basically simulates `preg_match()`'s behavior.
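To make the relationship between these match accessors concrete, here is a minimal plain-PHP sketch. It does not call the library at all: it only uses `preg_match()` to imitate the behavior described above, on the assumption that the context exposes matches in the same shape. The sample line and the `@example_user` handle are made up for illustration.

```php
<?php
// Illustration only: imitates the match accessors with preg_match().
// The pattern is the one used in the Twitter example further down, wrapped in
// delimiters with the 'i' flag because matching is case insensitive.
$pattern = '/@([A-Za-z0-9_]{1,15}(?!\w))/i';
$line    = 'Follow @example_user for updates'; // hypothetical input line

preg_match($pattern, $line, $matches);

$fullMatch       = $matches[0];               // comparable to getFullMatch()
$subMatches      = array_slice($matches, 1);  // comparable to getSubMatches()
$fullMatchLength = strlen($fullMatch);        // comparable to getFullMatchLength() (ASCII text here)

printf("%s / %s / %d\n", $fullMatch, $subMatches[0], $fullMatchLength);
```

In this sketch `$matches` as a whole plays the role of `getMatches()`: index `0` holds the full match and the remaining entries hold the capture groups.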

#### Return value

`parse()` should return `false` if it's unable to handle the text at the current position for any reason.  Other parsers will then have a chance to try parsing that text.  If all registered parsers return false, the text will be added as plain text.

Returning `true` tells the engine that you've successfully parsed the character (and related ones after it).  It is your responsibility to:

1. Advance the cursor to the end of the parsed/matched text
2. Add the parsed inline to the container (`$inlineContext->getContainer()->appendChild(...)`)

## Inline Parser Examples

### Example 1 - Twitter Handles

Let's say you wanted to autolink Twitter handles without using the link syntax.  This could be accomplished by registering a new inline parser to handle the `@` character:

```php
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Link;
use League\CommonMark\Parser\Inline\InlineParserInterface;
use League\CommonMark\Parser\Inline\InlineParserMatch;
use League\CommonMark\Parser\InlineParserContext;

class TwitterHandleParser implements InlineParserInterface
{
    public function getMatchDefinition(): InlineParserMatch
    {
        return InlineParserMatch::regex('@([A-Za-z0-9_]{1,15}(?!\w))');
    }

    public function parse(InlineParserContext $inlineContext): bool
    {
        $cursor = $inlineContext->getCursor();
        // The @ symbol must be at the start of the line or preceded by a space
        $previousChar = $cursor->peek(-1);
        if ($previousChar !== null && $previousChar !== ' ') {
            // peek() doesn't modify the cursor, so no need to restore state first
            return false;
        }

        // This seems to be a valid match
        // Advance the cursor to the end of the match
        $cursor->advanceBy($inlineContext->getFullMatchLength());

        // Grab the Twitter handle
        [$handle] = $inlineContext->getSubMatches();
        $profileUrl = 'https://twitter.com/' . $handle;
        $inlineContext->getContainer()->appendChild(new Link($profileUrl, '@' . $handle));
        return true;
    }
}

// And here's how to hook it up:

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addInlineParser(new TwitterHandleParser());
```

### Example 2 - Emoticons

Let's say you want to automatically convert smilies (or "frownies") to emoticon images.  This is incredibly easy with an inline parser:

```php
use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Image;
use League\CommonMark\Parser\Inline\InlineParserInterface;
use League\CommonMark\Parser\Inline\InlineParserMatch;
use League\CommonMark\Parser\InlineParserContext;

class SmilieParser implements InlineParserInterface
{
    public function getMatchDefinition(): InlineParserMatch
    {
        return InlineParserMatch::oneOf(':)', ':(');
    }

    public function parse(InlineParserContext $inlineContext): bool
    {
        $cursor = $inlineContext->getCursor();

        // Advance the cursor past the 2 matched chars since we're able to parse them successfully
        $cursor->advanceBy(2);

        // Add the corresponding image
        if ($inlineContext->getFullMatch() === ':)') {
            $inlineContext->getContainer()->appendChild(new Image('/img/happy.png'));
        } elseif ($inlineContext->getFullMatch() === ':(') {
            $inlineContext->getContainer()->appendChild(new Image('/img/sad.png'));
        }

        return true;
    }
}

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addInlineParser(new SmilieParser());
```

## Tips

- For best performance:
  - Avoid using overly-complex regular expressions in `getMatchDefinition()` - use the simplest regex you can and have `parse()` do the heavier validation
  - Have your `parse()` method `return false` **as soon as possible**.
- You can `peek()` without modifying the cursor state. This makes it useful for validating nearby characters, as it's quick and you can bail without needing to restore state.
- You can look at (and modify) any part of the AST if needed (via `$inlineContext->getContainer()`).
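
The `peek()` tip can be tried in isolation with a standalone `Cursor`. This is a small sketch rather than a full parser: the autoloader path and the sample line are assumptions, and offsets are passed to `peek()` explicitly so that nothing depends on its default argument.

```php
<?php
// Assumption: Composer's autoloader lives at this path in your project.
require __DIR__ . '/vendor/autoload.php';

use League\CommonMark\Parser\Cursor;

$cursor = new Cursor('Hi @user'); // hypothetical line of text
$cursor->advanceBy(3);            // move onto the '@' (positions: H=0, i=1, space=2, @=3)

$positionBefore = $cursor->getPosition();
$previousChar   = $cursor->peek(-1); // character just before the cursor
$currentChar    = $cursor->peek(0);  // character under the cursor
$positionAfter  = $cursor->getPosition();

// Peeking left the position untouched, so a parser could `return false`
// at this point without saving or restoring any cursor state.
var_dump($previousChar, $currentChar, $positionBefore === $positionAfter);
```

By contrast, once `parse()` has called `advanceBy()`, the cursor has moved, which is why the validation in Example 1 happens before the cursor is advanced.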