.. _python:

======================================================================
The new Python parser
======================================================================

:Maintainer: Colomban Wendling <ban@herbesfolles.org>

Introduction
---------------------------------------------------------------------

The old Python parser was line-oriented and had grown well beyond what
that design could support: it ended up riddled with hacks and was easily
fooled by perfectly valid input.  By design, it had particular trouble
with constructs spanning multiple lines, such as triple-quoted strings
or implicitly continued lines, but several less tricky constructs were
also mishandled.  In addition, the handling of lexical constructs was
duplicated, and each copy evolved in its own direction, so the supported
features and the bugs differed depending on the location in the code.

All this made it very hard to fix existing bugs or to add new
features.  To remedy this, the parser has been rewritten from scratch,
separating lexical analysis (generating tokens) from syntactic analysis
(understanding what the tokens mean).  This moves token recognition to
a single location, making it consistent and easier to extend with new
tokens, and lightens the burden on the parsing code, making it more
concise, robust and clear.
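
The difference between the two designs can be illustrated with a minimal
sketch.  It is not the parser's actual code: Python's standard
``tokenize`` module stands in for the lexer, and the names ``SOURCE``,
``naive_line_scan``, ``lex`` and ``find_functions`` are hypothetical,
chosen only for this illustration.  The sample input combines an
implicitly continued line with a triple-quoted string that happens to
contain text looking like a definition.

```python
import io
import re
import tokenize

# Hypothetical sample input: valid Python with a continued parameter list
# and a docstring whose second line starts with "def".
SOURCE = '''\
def real(a,
         b):
    """Docstring mentioning
def fake(): in its text.
    """
    return a + b
'''


def naive_line_scan(source):
    """Line-oriented approach: look for 'def NAME' at the start of each line."""
    pattern = re.compile(r"^\s*def\s+(\w+)")
    names = []
    for line in source.splitlines():
        match = pattern.match(line)
        if match:
            names.append(match.group(1))
    return names


def lex(source):
    """Lexical analysis: turn the text into (type, string) token pairs."""
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]


def find_functions(tokens):
    """Syntactic analysis: a NAME token 'def' followed by a NAME is a function."""
    names = []
    for (kind, text), (next_kind, next_text) in zip(tokens, tokens[1:]):
        if kind == tokenize.NAME and text == "def" and next_kind == tokenize.NAME:
            names.append(next_text)
    return names


print(naive_line_scan(SOURCE))      # ['real', 'fake']
print(find_functions(lex(SOURCE)))  # ['real']
```

The line-oriented scan reports a function that does not exist because it
cannot tell that the second ``def`` sits inside a string.  In the
token-based version the whole docstring arrives as a single string
token, so the code that looks for definitions never sees its contents.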

This rewrite made it fairly easy to fix all known bugs of the old
parser and to add many new features, including:

- Tagging function parameters
- Extracting decorators
- Proper handling of semicolons
- Extracting multiple variables in a combined declaration
- More accurate support of mixed indentation
- Tagging local variables
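
As an illustration, the following hypothetical input exercises several
of the constructs listed above.  It is only a sample of valid Python
written for this document; the names are made up, and the comments
describe the constructs present in the source, not the exact tags the
parser emits for them.

```python
import functools


def logged(func):                   # function with one parameter: func
    @functools.wraps(func)          # decorator applied to wrapper()
    def wrapper(*args, **kwargs):   # parameters: args, kwargs
        return func(*args, **kwargs)
    return wrapper


@logged                             # decorator applied to scale()
def scale(value, factor=2):         # parameters: value, factor
    result = value * factor         # local variable: result
    return result


width, height = 3, 4                # two variables in a combined declaration
x = 1; y = 2                        # two statements separated by a semicolon

print(scale(width), height, x, y)   # 6 4 1 2
```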


The new parser is intended to be compatible with the old one.