# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

league/commonmark is a highly extensible PHP Markdown parser that fully supports the CommonMark spec and GitHub-Flavored Markdown (GFM). It is based on the CommonMark JS reference implementation and provides an extension-driven architecture for parsing and rendering Markdown content.

## Development Commands

### Testing
- `composer test` - Run all tests (includes linting, static analysis, unit tests, and pathological tests)
- `composer phpunit` - Run PHPUnit tests only (no coverage)
- `composer pathological` - Run pathological performance tests

### Code Quality
- `composer phpcs` - Run PHP CodeSniffer for coding standards
- `composer phpcbf` - Automatically fix coding standards issues
- `composer phpstan` - Run PHPStan static analysis
- `composer psalm` - Run Psalm static analysis with stats

(IMPORTANT: you MUST ALWAYS use PHP 7.4 to run `phpcs` and `phpcbf`. You SHOULD use the `php` service from docker-compose, which uses that version. Example: `docker compose exec php composer phpcs`)

### Benchmarking
- `./tests/benchmark/benchmark.php` - Compare performance against other Markdown parsers

## Architecture Overview

### Core Components

**Converters**: Main entry points using Facade pattern
- `CommonMarkConverter` - Preconfigured with `CommonMarkCoreExtension`
- `GithubFlavoredMarkdownConverter` - Includes GFM extensions bundle
- `MarkdownConverter` - Base class orchestrating `MarkdownParser` + `HtmlRenderer`
- Pattern: Factory with default configurations + Facade for complex pipeline
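
A minimal usage sketch of the two preconfigured converters. It assumes the script sits in the repository root and that `composer install` has generated `vendor/autoload.php`; the sample Markdown strings are illustrative only.

```php
<?php

declare(strict_types=1);

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\GithubFlavoredMarkdownConverter;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

// CommonMark only: the converter wires up CommonMarkCoreExtension itself.
$converter = new CommonMarkConverter();
$html = $converter->convert('# Hello *World*')->getContent();

// CommonMark plus the GFM bundle (strikethrough, tables, task lists, ...).
$gfmConverter = new GithubFlavoredMarkdownConverter();
$gfmHtml = $gfmConverter->convert('~~old~~ new')->getContent();

echo $html;    // <h1>Hello <em>World</em></h1>
echo $gfmHtml; // <p><del>old</del> new</p>
```

`convert()` returns a rendered-content object; `getContent()` (or a string cast) yields the HTML.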

**Environment System**: Service container and registry
- `Environment` - Central registry managing parsers/renderers with priorities
- Implements PSR-14 event dispatcher for pre/post processing hooks
- Uses lazy initialization - extensions registered on first use
- Pattern: Registry + Builder + Dependency Injection
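
When the preconfigured converters are not enough, the environment can be assembled by hand. The sketch below assumes the same autoloader location as above; the `max_nesting_level` value of 50 is an arbitrary illustration, not a recommended setting.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\Table\TableExtension;
use League\CommonMark\MarkdownConverter;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

// Configuration is handed to the Environment; 50 is an illustrative limit.
$environment = new Environment(['max_nesting_level' => 50]);

// Extensions are queued here and only initialized when first needed.
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new TableExtension());

// MarkdownConverter drives the parser and HTML renderer for this environment.
$converter = new MarkdownConverter($environment);
$html = $converter->convert("| a | b |\n|---|---|\n| 1 | 2 |")->getContent();

echo $html;
```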

**Parser Architecture**: Two-phase recursive descent parsing
- **Block Phase**: `MarkdownParser` processes line-by-line with active parser stack
  - `BlockStartParserInterface` - Strategy pattern for block detection
  - State machine with continuation tracking and reference processing
  - Security: NUL character replacement, configurable nesting limits
- **Inline Phase**: `InlineParserEngine` with regex pre-compilation
  - `InlineParserInterface` - Strategy with regex-based matching
  - Position-based parser coordination with delimiter processing
  - Adjacent text merging optimization

**AST (Abstract Syntax Tree)**: Composite pattern with doubly-linked structure
- `Node` base class with tree navigation/manipulation methods
- `AbstractBlock`/`AbstractInline` - Template method pattern for element types
- `Document` - Root node with reference map storage
- Uses `Dflydev\DotAccessData\Data` for flexible metadata storage
- Supports multiple traversal methods: iterator, walker, and query system

**Rendering**: Visitor pattern with strategy delegation
- `HtmlRenderer` - Traverses AST, delegates to node-specific renderers
- `NodeRendererInterface` - Strategy pattern for extensible rendering
- Hierarchical renderer lookup supporting inheritance
- Pre/post-render events with configurable block separators

**Extension System**: Plugin pattern with composite support
- `ExtensionInterface` - Simple contract for environment configuration
- `CommonMarkCoreExtension` - Complete spec implementation with priorities
- `GithubFlavoredMarkdownExtension` - Composite bundling multiple GFM features
- Performance: Optimized parser ordering and lazy registration

### Key Directories

**`src/Extension/`**: All built-in extensions
- `CommonMark/` - Core CommonMark specification features
- `GithubFlavoredMarkdownExtension.php` - GFM bundle extension
- Individual feature extensions: `Table/`, `Strikethrough/`, `TaskList/`, etc.

**`src/Parser/`**: Parsing logic
- `Block/` - Block-level parsing components
- `Inline/` - Inline parsing components
- `MarkdownParser.php` - Main parsing coordinator

**`src/Node/`**: AST node definitions
- `Block/` - Block-level nodes (paragraphs, headings, lists, etc.)
- `Inline/` - Inline nodes (text, emphasis, links, etc.)

**`src/Renderer/`**: Output rendering
- `Block/` and `Inline/` subdirectories mirror node structure
- `HtmlRenderer.php` - Main HTML output renderer

## AST (Abstract Syntax Tree) Manipulation

The library uses a doubly-linked AST in which every element (including the root `Document`) extends the `Node` class.

### AST Traversal Methods

- **Iterator**: `$node->iterator()` - Fastest for complete tree traversal
- **Walker**: `$node->walker()` - Full control with enter/leave events, use `resumeAt()` for safe modifications
- **Query**: `(new Query())->where()->findAll($node)` - Easy but memory-intensive, creates snapshots
- **Manual**: `$node->next()`, `$node->parent()`, `$node->children()` - Best for direct relationships
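
A short sketch of the iterator and query styles side by side, assuming the autoloader location used elsewhere in this file. The Markdown input and the `example` URLs are made up for illustration.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Link;
use League\CommonMark\Node\Query;
use League\CommonMark\Parser\MarkdownParser;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());

// Parse only (no rendering) to get the Document root node.
$document = (new MarkdownParser($environment))
    ->parse('[a](https://a.example) and [b](https://b.example)');

// Iterator: visits every node once, in document order.
$urlsFromIterator = [];
foreach ($document->iterator() as $node) {
    if ($node instanceof Link) {
        $urlsFromIterator[] = $node->getUrl();
    }
}

// Query: declarative filter over the same tree.
$urlsFromQuery = [];
$matches = (new Query())->where(Query::type(Link::class))->findAll($document);
foreach ($matches as $link) {
    $urlsFromQuery[] = $link->getUrl();
}

print_r($urlsFromIterator);
```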

### AST Modification

- **Adding**: `appendChild()`, `prependChild()`, `insertAfter()`, `insertBefore()`
- **Removing**: `detach()`, `replaceWith()`, `detachChildren()`, `replaceChildren()`
- **Data**: `$node->data->set('custom/info', $value)`, `$node->data->set('attributes/class', 'css-class')`
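
The modification methods can be combined as in the sketch below, which tags links and drops images before rendering. It collects nodes first and edits afterwards, a conservative choice so the tree is not changed while it is being traversed. The input string and the `rel` value are illustrative assumptions.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Image;
use League\CommonMark\Extension\CommonMark\Node\Inline\Link;
use League\CommonMark\Parser\MarkdownParser;
use League\CommonMark\Renderer\HtmlRenderer;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());

$document = (new MarkdownParser($environment))
    ->parse('See [docs](https://example.com) ![logo](logo.png)');

// Pass 1: collect the nodes of interest without touching the tree.
$links  = [];
$images = [];
foreach ($document->iterator() as $node) {
    if ($node instanceof Link) {
        $links[] = $node;
    } elseif ($node instanceof Image) {
        $images[] = $node;
    }
}

// Pass 2: modify. Entries under "attributes/" become HTML attributes.
foreach ($links as $link) {
    $link->data->set('attributes/rel', 'nofollow');
}
foreach ($images as $image) {
    $image->detach(); // removes the node (and its children) from the tree
}

$html = (new HtmlRenderer($environment))->renderDocument($document)->getContent();

echo $html;
```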

## Extension Development

### Creating Extensions
1. Implement `ExtensionInterface` with `register(EnvironmentBuilderInterface $environment)` method
2. Register components with priorities: `addInlineParser()`, `addBlockStartParser()`, `addRenderer()`
3. Follow existing extension patterns in `src/Extension/`
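
The steps above can be sketched as a small extension that turns `@handle` mentions into links. `MentionParser`, `MentionExtension`, and the `https://example.com/users/` URL are hypothetical names invented for this illustration; only the interfaces and registration calls come from the library.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Environment\Environment;
use League\CommonMark\Environment\EnvironmentBuilderInterface;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Link;
use League\CommonMark\Extension\ExtensionInterface;
use League\CommonMark\MarkdownConverter;
use League\CommonMark\Parser\Inline\InlineParserInterface;
use League\CommonMark\Parser\Inline\InlineParserMatch;
use League\CommonMark\Parser\InlineParserContext;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

// Hypothetical inline parser: "@name" becomes a link to a made-up profile URL.
final class MentionParser implements InlineParserInterface
{
    public function getMatchDefinition(): InlineParserMatch
    {
        return InlineParserMatch::regex('@([A-Za-z0-9_]{1,15}(?!\w))');
    }

    public function parse(InlineParserContext $inlineContext): bool
    {
        $cursor = $inlineContext->getCursor();

        // Only match at the start of the text or after a space.
        $previous = $cursor->peek(-1);
        if ($previous !== null && $previous !== ' ') {
            return false;
        }

        // Consume the matched text, then append the replacement node.
        $cursor->advanceBy($inlineContext->getFullMatchLength());
        [$handle] = $inlineContext->getSubMatches();
        $inlineContext->getContainer()->appendChild(
            new Link('https://example.com/users/' . $handle, '@' . $handle)
        );

        return true;
    }
}

// Hypothetical extension: step 1 (implement the interface) and step 2 (register).
final class MentionExtension implements ExtensionInterface
{
    public function register(EnvironmentBuilderInterface $environment): void
    {
        $environment->addInlineParser(new MentionParser(), 0);
    }
}

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());
$environment->addExtension(new MentionExtension());

$html = (new MarkdownConverter($environment))->convert('Ping @alice')->getContent();

echo $html;
```

Returning `false` from `parse()` leaves the cursor untouched, so the text falls through to the remaining inline parsers.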

### Key Interfaces
- **Block Parsers**: `BlockStartParserInterface` - implement `tryStart()`; the block's `BlockContinueParserInterface` implements `tryContinue()`
- **Inline Parsers**: `InlineParserInterface` - implement `getMatchDefinition()` and `parse()`
- **Delimiter Processors**: `DelimiterProcessorInterface` - for emphasis-style wrapping syntax
- **Renderers**: `NodeRendererInterface` - implement `render()`, use `HtmlElement` for safety
- **Events**: PSR-14 events like `DocumentParsedEvent` for AST manipulation
- **Configuration**: `ConfigurableExtensionInterface` with `league/config` validation
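
As an illustration of the renderer contract, the sketch below overrides how `Strong` nodes are rendered. `BoldTagRenderer`, the `<b>` tag choice, and the `strong` class name are hypothetical; the priority of 10 is an assumed value chosen to be higher than the default of 0.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Environment\Environment;
use League\CommonMark\Extension\CommonMark\CommonMarkCoreExtension;
use League\CommonMark\Extension\CommonMark\Node\Inline\Strong;
use League\CommonMark\MarkdownConverter;
use League\CommonMark\Node\Node;
use League\CommonMark\Renderer\ChildNodeRendererInterface;
use League\CommonMark\Renderer\NodeRendererInterface;
use League\CommonMark\Util\HtmlElement;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

// Hypothetical renderer: emits <b class="strong"> instead of <strong>.
final class BoldTagRenderer implements NodeRendererInterface
{
    public function render(Node $node, ChildNodeRendererInterface $childRenderer)
    {
        if (! ($node instanceof Strong)) {
            throw new \InvalidArgumentException('Incompatible node type: ' . \get_class($node));
        }

        // HtmlElement escapes attribute values; children are rendered recursively.
        return new HtmlElement(
            'b',
            ['class' => 'strong'],
            $childRenderer->renderNodes($node->children())
        );
    }
}

$environment = new Environment();
$environment->addExtension(new CommonMarkCoreExtension());

// Higher priority than the default (0), so this renderer is consulted first.
$environment->addRenderer(Strong::class, new BoldTagRenderer(), 10);

$html = (new MarkdownConverter($environment))->convert('**hi**')->getContent();

echo $html;
```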

### Cursor Usage & Parsing
- `Cursor` class: dual ASCII/UTF-8 paths, character caching, position state management
- Key methods: `peek()`, `match()`, `saveState()`/`restoreState()`, `advanceBy()`
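
A standalone sketch of the key `Cursor` methods. The input line is made up, and the save/restore pattern shown is one common way to back out of a failed parse attempt, not the only one.

```php
<?php

declare(strict_types=1);

use League\CommonMark\Parser\Cursor;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

$cursor = new Cursor('@alice: hello');

// Remember where we started so a failed attempt can be undone.
$state = $cursor->saveState();

$cursor->advanceBy(1);                   // skip the '@'
$name = $cursor->match('/^[a-z]+/');     // returns the match and advances past it

$positionAfterMatch = $cursor->getPosition();
$current            = $cursor->peek(0);  // character at the current position
$remainder          = $cursor->getRemainder();

// Back out, as a parser would when the syntax turns out not to apply.
$cursor->restoreState($state);
$positionAfterRestore = $cursor->getPosition();

echo $name, ' | ', $current, ' | ', $remainder, "\n";
```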

## Testing Strategy

### Test Categories
- **Unit Tests** (`tests/unit/`) - Component testing, mirrors source structure
- **Functional Tests** (`tests/functional/`) - End-to-end with `.md`/`.html` pairs
- **Pathological Tests** (`tests/pathological/`) - Security/DoS prevention
- **Extension Tests** (`tests/functional/Extension/`) - Per-extension testing

### Running Tests
- `composer test` - Full test suite
- `composer phpunit` - PHPUnit tests only
- `composer pathological` - Security/performance tests

## Security Configuration (CRITICAL for Untrusted Input)

When handling untrusted user input, certain security settings are essential to prevent XSS, DoS, and other attacks. Review each of the following settings whenever Markdown may come from an untrusted source:

### HTML Input Security (`html_input`)

**Implementation**: `HtmlFilter::filter()` in `HtmlBlockRenderer` and `HtmlInlineRenderer`
**Default**: `'allow'` (unsafe for untrusted input)
**Attack Vector**: XSS through raw HTML injection

**Options**:
- `HtmlFilter::STRIP` returns empty string
- `HtmlFilter::ESCAPE` uses `htmlspecialchars($html, ENT_NOQUOTES)`
- `HtmlFilter::ALLOW` returns raw HTML unchanged
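
A sketch comparing the three `html_input` modes on the same input. The `<script>` payload is an illustrative example of untrusted input, and the autoloader path is assumed as elsewhere in this file.

```php
<?php

declare(strict_types=1);

use League\CommonMark\CommonMarkConverter;
use League\CommonMark\Util\HtmlFilter;

require __DIR__ . '/vendor/autoload.php'; // assumption: repository root after `composer install`

$untrusted = '<script>alert(1)</script>';

// Render the same input once per mode.
$results = [];
foreach ([HtmlFilter::ALLOW, HtmlFilter::ESCAPE, HtmlFilter::STRIP] as $mode) {
    $converter = new CommonMarkConverter(['html_input' => $mode]);
    $results[$mode] = $converter->convert($untrusted)->getContent();
}

// ALLOW passes the tag through, ESCAPE encodes it, STRIP drops it.
foreach ($results as $mode => $html) {
    echo $mode, ': ', \json_encode($html), "\n";
}
```

For untrusted input, `HtmlFilter::ESCAPE` or `HtmlFilter::STRIP` is the safer choice; the default `'allow'` leaves the script tag intact.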

### Unsafe Links Protection (`allow_unsafe_links`)

**Implementation**: `RegexHelper::isLinkPotentiallyUnsafe()` in `LinkRenderer` and `ImageRenderer`
**Default**: `true` (allows unsafe links)
**Attack Vector**: XSS through malicious protocols (javascript:, vbscript:, file:, data:)