# Design
This document covers the architecture and key patterns in `yaml-edit`.
## Lossless editing
The primary goal is lossless editing — modifying YAML files while preserving
formatting, comments, whitespace, quote styles, key ordering, and
anchors/aliases. This distinguishes `yaml-edit` from libraries that parse into
a data model and re-serialize, losing formatting.
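
In miniature, the difference looks like this. The sketch below works on plain strings and byte offsets through a hypothetical `replace_span` helper, whereas `yaml-edit` works on a syntax tree. The property is the same, though: only the edited value changes, and the comments and quote style around it survive byte for byte.

```rust
use std::ops::Range;

/// Hypothetical helper: replace only the bytes in `span` and keep everything
/// else verbatim. `span` must lie on UTF-8 character boundaries.
fn replace_span(src: &str, span: Range<usize>, replacement: &str) -> String {
    let mut out = String::with_capacity(src.len() + replacement.len());
    out.push_str(&src[..span.start]); // untouched prefix (comments, quotes, ...)
    out.push_str(replacement); // the only changed bytes
    out.push_str(&src[span.end..]); // untouched suffix
    out
}
```

A re-serializing library would instead rebuild the whole text from a data model, dropping the `# keep quotes` comment and possibly the single quotes.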
## Rowan foundation
The crate is built on [rowan](https://github.com/rust-analyzer/rowan), the
lossless syntax tree library from rust-analyzer. Rowan provides a two-layer
tree: immutable, deduplicated "green" nodes for storage, and "red" nodes
(`SyntaxNode`) for navigation and mutation. Red nodes use interior mutability,
so mutating methods take `&self` rather than `&mut self`. Rowan only allows
mutation on a tree that was created as mutable (for example via
`SyntaxNode::clone_for_update()`); `splice_children` panics on an immutable one.

The crate-level docs explain this with examples, because mutating through
`&self` surprises most Rust developers.
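
The `&self` pattern can be modelled with the standard library alone. This is a hypothetical sketch using `Rc<RefCell<_>>`; rowan's internals differ, but the API consequence is the same: a `Node` handle is cheap to clone, every clone points at the same data, and `push_child` mutates through a shared reference.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a red node: a cheap, cloneable handle to
/// shared, mutable data.
#[derive(Clone)]
struct Node(Rc<RefCell<Vec<String>>>);

impl Node {
    fn new() -> Self {
        Node(Rc::new(RefCell::new(Vec::new())))
    }

    /// Mutates through `&self`; the `RefCell` provides interior mutability.
    fn push_child(&self, text: &str) {
        self.0.borrow_mut().push(text.to_string());
    }

    /// Concatenated text of all children.
    fn text(&self) -> String {
        self.0.borrow().concat()
    }
}
```

Because both handles share storage, a mutation made through one clone is visible through the other.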
## Type hierarchy
The syntax tree types (`Document`, `Mapping`, `Sequence`, `Scalar`,
`TaggedNode`) are thin wrappers around `SyntaxNode`. A `YamlFile` contains one
or more `Document`s, each of which holds a tree of these wrappers.
`MappingEntry` is the key-value pair inside a `Mapping`; `Directive` covers
`%YAML` and `%TAG` headers.
`YamlNode` is a type-erased enum (`Scalar | Mapping | Sequence | TaggedNode |
Alias`) returned by navigation methods like `Mapping::get()`.
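
A simplified, hypothetical model of type-erased navigation: `get` returns one enum no matter what the key holds, and the caller matches on the variant. The real `YamlNode` variants wrap syntax-tree types; here they hold plain data so the sketch is self-contained.

```rust
/// Hypothetical model of a type-erased node enum (not the crate's definition).
#[derive(Debug, Clone, PartialEq)]
enum YamlNodeModel {
    Scalar(String),
    Mapping(Vec<(String, YamlNodeModel)>),
    Sequence(Vec<YamlNodeModel>),
    Tagged(String, Box<YamlNodeModel>),
    Alias(String),
}

impl YamlNodeModel {
    /// Look up a key; like `Mapping::get()`, the result is type-erased.
    fn get(&self, key: &str) -> Option<&YamlNodeModel> {
        match self {
            YamlNodeModel::Mapping(entries) => {
                entries.iter().find(|(k, _)| k == key).map(|(_, v)| v)
            }
            _ => None,
        }
    }

    /// Narrow to a scalar, or `None` for any other variant.
    fn as_scalar(&self) -> Option<&str> {
        match self {
            YamlNodeModel::Scalar(s) => Some(s.as_str()),
            _ => None,
        }
    }
}
```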
## `splice_children` for mutation
Never rebuild entire nodes. Use rowan's `splice_children` to replace only the
parts that changed. This preserves formatting on everything else.
When using `splice_children`, collect the new children into a `Vec` first;
rowan's `splice_children` takes a `Vec<SyntaxElement>`, not an iterator:
```rust
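// Collect the replacement elements (nodes and tokens) up front.
let children: Vec<_> = new_node
    .children_with_tokens()
    .map(|c| c.into())
    .collect();
// `range` indexes `self.0.children_with_tokens()`, tokens included.
self.0.splice_children(range, children);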
```
Also note that `splice_children` uses `children_with_tokens()` indices, not
`children()` indices.
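
The index pitfall can be modelled with a plain `Vec`. In this hypothetical sketch, `Element` stands in for rowan's node-or-token elements: `VALUE` is child node 1 (as `children()` counts) but element 3 (as `children_with_tokens()` counts, and as `splice_children` expects). `Vec::splice` plays the role of `splice_children`, with the replacement collected into a `Vec` first.

```rust
/// Hypothetical stand-in for rowan's node-or-token elements.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Node(&'static str),
    Token(&'static str),
}

/// Translate a `children()`-style index (nodes only) into a
/// `children_with_tokens()`-style index (nodes and tokens).
fn element_index_of_node(elems: &[Element], n: usize) -> Option<usize> {
    elems
        .iter()
        .enumerate()
        .filter(|(_, e)| matches!(e, Element::Node(_)))
        .nth(n)
        .map(|(i, _)| i)
}

/// Replace the `n`th child node, leaving the surrounding tokens untouched.
fn replace_child_node(elems: &mut Vec<Element>, n: usize, replacement: Vec<Element>) -> bool {
    match element_index_of_node(elems, n) {
        Some(i) => {
            // Analogous to `splice_children(i..i + 1, replacement)`.
            elems.splice(i..i + 1, replacement);
            true
        }
        None => false,
    }
}
```

Passing the node index `1` straight to the splice would have replaced the `:` token instead of the value.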
## `AsYaml` trait
Mutation APIs accept `impl AsYaml` rather than concrete types. This lets
syntax nodes pass through without serializing to a string and re-parsing,
preserving formatting and comments. Primitive types (`i64`, `&str`, `bool`,
etc.) also implement `AsYaml` for ergonomic use.
The trait:
```rust
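pub trait AsYaml {
    /// The existing syntax node, if this value is already part of a tree.
    fn as_node(&self) -> Option<&SyntaxNode>;
    fn kind(&self) -> YamlKind;
    /// Emit this value's tokens into `builder`.
    /// Returns `true` if the emitted content ends with a NEWLINE.
    fn build_content(
        &self,
        builder: &mut rowan::GreenNodeBuilder,
        indent: usize,
        flow_context: bool,
    ) -> bool;
    /// Whether the value stays on the same line as its key.
    fn is_inline(&self) -> bool;
}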
```
Syntax wrappers return `Some` from `as_node()` and copy their existing tree
structure in `build_content()`. Primitive types return `None` and emit fresh
tokens. `is_inline()` controls whether the value goes on the same line as the
key (scalars, flow collections, empty collections) or on a new indented line
(non-empty block collections).
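
A std-only sketch of why `impl Trait` parameters help: one entry point accepts both primitives and existing content. `ToYamlText`, `ExistingBlock`, and `render_entry` are hypothetical simplifications of `AsYaml`. Values write plain text instead of green-tree tokens, `ExistingBlock` copies its text verbatim the way syntax wrappers copy their tree, and `is_inline()` picks the layout after the colon.

```rust
/// Hypothetical, simplified analogue of `AsYaml`.
trait ToYamlText {
    /// Append this value's text to `out`.
    fn build_content(&self, out: &mut String);
    /// Whether the value belongs on the same line as its key.
    fn is_inline(&self) -> bool {
        true
    }
}

// Primitives emit fresh text.
impl ToYamlText for i64 {
    fn build_content(&self, out: &mut String) {
        out.push_str(&self.to_string());
    }
}

impl ToYamlText for bool {
    fn build_content(&self, out: &mut String) {
        out.push_str(if *self { "true" } else { "false" });
    }
}

/// Stand-in for a syntax wrapper: already-formatted block content
/// (indentation, comments, trailing newline) that is copied verbatim.
struct ExistingBlock(String);

impl ToYamlText for ExistingBlock {
    fn build_content(&self, out: &mut String) {
        out.push_str(&self.0);
    }
    fn is_inline(&self) -> bool {
        false
    }
}

/// Render `key: value` for any `impl ToYamlText`.
fn render_entry(key: &str, value: &impl ToYamlText) -> String {
    let mut out = String::from(key);
    out.push(':');
    if value.is_inline() {
        out.push(' ');
        value.build_content(&mut out);
        out.push('\n');
    } else {
        // Block content starts on a new line and brings its own newlines.
        out.push('\n');
        value.build_content(&mut out);
    }
    out
}
```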
## `ScalarValue`: `string()` vs `parse()`
Two factory methods with explicit intent:
```rust
let s = ScalarValue::string("123"); // always a string, no type detection
let n = ScalarValue::parse("123");  // detects type → integer
```
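
A hypothetical model of the two factories. The detection in `parse` is a deliberately small subset of the YAML 1.2 core schema (no hex, octal, `.inf`, or `.nan`) and is not the crate's actual detection logic.

```rust
/// Hypothetical model of explicit-intent scalar construction.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

impl Scalar {
    /// Always a string: no type detection at all.
    fn string(s: &str) -> Self {
        Scalar::Str(s.to_string())
    }

    /// Detects null, bool, int, and float; anything else is a string.
    fn parse(s: &str) -> Self {
        match s {
            "" | "~" | "null" | "Null" | "NULL" => Scalar::Null,
            "true" | "True" | "TRUE" => Scalar::Bool(true),
            "false" | "False" | "FALSE" => Scalar::Bool(false),
            _ => {
                // Only digits, signs, dots, and exponents may be numeric, so
                // Rust-only spellings such as "inf" or "NaN" stay strings.
                let numeric_chars = s
                    .chars()
                    .all(|c| c.is_ascii_digit() || matches!(c, '+' | '-' | '.' | 'e' | 'E'));
                if let Ok(i) = s.parse::<i64>() {
                    Scalar::Int(i)
                } else if numeric_chars {
                    match s.parse::<f64>() {
                        Ok(f) => Scalar::Float(f),
                        Err(_) => Scalar::Str(s.to_string()),
                    }
                } else {
                    Scalar::Str(s.to_string())
                }
            }
        }
    }
}
```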
## Newline ownership
YAML newlines are terminators, not separators. Every block-style
`MAPPING_ENTRY` and `SEQUENCE_ENTRY` owns its trailing `NEWLINE` token as a
direct child. The parent `MAPPING`/`SEQUENCE` node does not own any newlines
itself.
```
MAPPING_ENTRY
  KEY → SCALAR "host"
  COLON
  WHITESPACE
  VALUE → SCALAR "localhost"
  NEWLINE   ← owned by the entry
```
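When inserting or replacing entries, check whether the value's content already
ends with a newline before adding the entry's terminator: block collections do
(their last entry owns one), scalars don't. The `bool` returned by
`build_content()` reports exactly this.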
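
A minimal sketch of that check on plain strings rather than tokens. `push_entry` is hypothetical; its `ends_with('\n')` test plays the role of `build_content()`'s return value, so each entry ends up with exactly one terminator.

```rust
/// Hypothetical helper: append `key: value`, adding a terminating newline
/// only if the value's content does not already end with one.
fn push_entry(out: &mut String, key: &str, value_text: &str) {
    out.push_str(key);
    out.push(':');
    if value_text.starts_with('\n') {
        // Block value: it brings its own NEWLINE + INDENT.
        out.push_str(value_text);
    } else {
        // Inline value: WHITESPACE after the colon.
        out.push(' ');
        out.push_str(value_text);
    }
    if !value_text.ends_with('\n') {
        out.push('\n');
    }
}
```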
### Inline vs block values
After a colon in a mapping:
- **Inline** (WHITESPACE after colon): scalars, flow collections, empty collections
- **Block** (NEWLINE + INDENT after colon): non-empty block collections

After a dash in a sequence, content is always inline. For block
mappings/sequences as sequence items, the first entry shares the dash line;
subsequent entries go on new indented lines. This is the "inline-start"
pattern, as opposed to the "block-start" pattern used after colons.
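
A string-level sketch contrasting the two patterns, assuming two-space indentation and non-empty mappings (an empty one would need flow style, `{}`). Both helpers are hypothetical.

```rust
/// "Inline-start": a block mapping as a sequence item, with the first entry
/// on the dash line and later entries aligned with the first key.
fn seq_item_mapping(entries: &[(&str, &str)], indent: usize) -> String {
    let pad = " ".repeat(indent);
    let mut out = String::new();
    for (i, (key, value)) in entries.iter().enumerate() {
        out.push_str(&pad);
        // Only the first entry gets the dash; the rest get two spaces.
        out.push_str(if i == 0 { "- " } else { "  " });
        out.push_str(key);
        out.push_str(": ");
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// "Block-start": a block mapping as a mapping value, starting on a new
/// line after the colon, indented one level deeper.
fn mapping_value_mapping(key: &str, entries: &[(&str, &str)], indent: usize) -> String {
    let mut out = format!("{}{}:\n", " ".repeat(indent), key);
    let child_pad = " ".repeat(indent + 2);
    for (k, v) in entries {
        out.push_str(&format!("{child_pad}{k}: {v}\n"));
    }
    out
}
```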
### CST example
```yaml
config:
  host: localhost
  port: 8080
```
```
DOCUMENT
  MAPPING
    MAPPING_ENTRY
      KEY → SCALAR "config"
      COLON
      VALUE
        NEWLINE
        INDENT "  "
        MAPPING
          MAPPING_ENTRY
            KEY → SCALAR "host"
            COLON
            WHITESPACE
            VALUE → SCALAR "localhost"
            NEWLINE
          INDENT "  "
          MAPPING_ENTRY
            KEY → SCALAR "port"
            COLON
            WHITESPACE
            VALUE → SCALAR "8080"
            NEWLINE
  NEWLINE
```
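
To make the lossless property concrete, here is a std-only miniature of such a tree. The `Tree` type is hypothetical, not rowan's API, and it simplifies by treating `SCALAR` as a token kind rather than a node. Concatenating token text in order reproduces the source exactly, which is the invariant that lossless round-trip tests check. `dump` prints a plain indented view; the real `debug::print_tree()` output format may differ.

```rust
/// Hypothetical miniature CST: every byte of the source lives in some token.
enum Tree {
    Node(&'static str, Vec<Tree>),
    Token(&'static str, &'static str),
}

impl Tree {
    /// Concatenate token text in order; for a lossless tree this is the source.
    fn text(&self) -> String {
        match self {
            Tree::Token(_, t) => t.to_string(),
            Tree::Node(_, children) => children.iter().map(Tree::text).collect(),
        }
    }

    /// Indented dump: one line per node or token, two spaces per level.
    fn dump(&self, depth: usize, out: &mut String) {
        let pad = "  ".repeat(depth);
        match self {
            Tree::Token(kind, text) => out.push_str(&format!("{pad}{kind} {text:?}\n")),
            Tree::Node(kind, children) => {
                out.push_str(&format!("{pad}{kind}\n"));
                for child in children {
                    child.dump(depth + 1, out);
                }
            }
        }
    }
}
```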
## Error handling
The primary error type is `YamlError` (in `src/error.rs`), with variants for
I/O, parse errors (with optional line/column), key-not-found (with available
keys), type mismatch, invalid index, and invalid operation.
Domain-specific error types (`CustomTagError`, `ValidationError`, etc.) are
kept separate intentionally.
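
A sketch of how an enum with those variants might look. The variants follow the list in the previous paragraph, but the fields, messages, and the `YamlErrorSketch` name are illustrative assumptions, not the definitions in `src/error.rs`.

```rust
use std::fmt;

/// Hypothetical error enum mirroring the variants described for `YamlError`.
#[derive(Debug)]
enum YamlErrorSketch {
    Io(std::io::Error),
    Parse { message: String, line: Option<usize>, column: Option<usize> },
    KeyNotFound { key: String, available: Vec<String> },
    TypeMismatch { expected: &'static str, found: &'static str },
    InvalidIndex { index: usize, len: usize },
    InvalidOperation(String),
}

impl fmt::Display for YamlErrorSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Io(e) => write!(f, "I/O error: {e}"),
            // Include the position only when both parts are known.
            Self::Parse { message, line: Some(l), column: Some(c) } => {
                write!(f, "parse error at {l}:{c}: {message}")
            }
            Self::Parse { message, .. } => write!(f, "parse error: {message}"),
            // Listing the available keys helps spot typos.
            Self::KeyNotFound { key, available } => {
                write!(f, "key '{key}' not found (available: {})", available.join(", "))
            }
            Self::TypeMismatch { expected, found } => {
                write!(f, "type mismatch: expected {expected}, found {found}")
            }
            Self::InvalidIndex { index, len } => {
                write!(f, "index {index} out of range for length {len}")
            }
            Self::InvalidOperation(msg) => write!(f, "invalid operation: {msg}"),
        }
    }
}

impl std::error::Error for YamlErrorSketch {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(e) => Some(e),
            _ => None,
        }
    }
}

impl From<std::io::Error> for YamlErrorSketch {
    fn from(e: std::io::Error) -> Self {
        Self::Io(e)
    }
}
```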
## Code organization
- `lib.rs` — public API re-exports
- `yaml.rs` — core types, `YamlFile`, parser
- `lex.rs` — lexer, token types (`SyntaxKind`)
- `parse.rs` — parse result types
- `nodes/` — AST node wrappers (`Document`, `Mapping`, `Sequence`, …)
- `as_yaml.rs` — `AsYaml` trait, `YamlNode` enum, tagged collection types
- `value.rs` — `YamlValue` (deprecated)
- `scalar.rs` — `ScalarValue` and type detection
- `builder.rs` — fluent builder API
- `path.rs` — dot-separated path access
- `error.rs` — error types
- `schema.rs` — schema validation (Failsafe, JSON, Core)
- `custom_tags.rs` — custom tag registry
- `visitor.rs` — visitor pattern traversal
- `anchor_resolution.rs` — anchor/alias resolution
- `error_recovery.rs` — parse error recovery
- `validator.rs` — YAML spec validation
- `debug.rs` — tree visualization
## Checklist
- Use `splice_children` for mutations — don't rebuild nodes
- Methods take `&self`, not `&mut self` (interior mutability)
- Test that formatting is preserved (lossless round-trip)
- Use `debug::print_tree()` to understand the CST structure
- Use `assert_eq!` with exact expected values in tests
## References
- [rowan](https://github.com/rust-analyzer/rowan)
- [YAML 1.2 Spec](https://yaml.org/spec/1.2.2/)