--[[
********************************************************************************
*                                                                              *
*                      Polygen Grammars Syntax Definition                      *
*                                                                              *
*               v1.0.1 (2018/01/18) | Highlight v3.41 | Lua 5.3                *
*                                                                              *
*                              by Tristano Ajmone                              *
*                                                                              *
********************************************************************************
Associated file extensions: ".grm"
Syntax type: EBNF
--------------------------------------------------------------------------------
Polygen is a cross-platform command line tool for generating random sentences
according to a grammar definition, i.e. following custom syntactical and
lexical rules. It takes an ASCII text file ("*.grm") as a source program that
defines a grammar by means of EBNF-like probabilistic rules, and executes it.
Each execution runs the grammar with a different random seed, and therefore
produces a different text output.
The main goal of Polygen is to generate cursory nonsense for entertainment;
or, in the words of its author, "a first effort towards satyre in computer
science". Polygen was created by Alvise Spanò.
Polygen Website and GitHub repository:
http://www.polygen.org/
https://github.com/alvisespano/Polygen
Polygen grammars documentation (in Italian):
http://www.polygen.org/it/manuale
An outdated English translation (probably from an earlier version of Polygen,
since it doesn't cover the full syntax) can be found at:
http://lapo.it/polygen/polygen-1.0.6-20040705-doc.zip
--------------------------------------------------------------------------------
Written by Tristano Ajmone:
<tajmone@gmail.com>
https://github.com/tajmone
Released into the public domain according to the Unlicense terms:
http://unlicense.org/
--------------------------------------------------------------------------------
--]]
Description="Polygen"
Categories = {"source", "script"}
IgnoreCase=false
EnableIndentation=false
---------------------------------------------------------------------------------
-- DISABLE/OVERRIDE UNUSED SYNTAX ELEMENTS
---------------------------------------------------------------------------------
NEVER_MATCH_RE=[=[ \A(?!x)x ]=] -- A Never-Matching RegEx!
Digits=NEVER_MATCH_RE -- Numbers are just text in Polygen!
Identifiers=NEVER_MATCH_RE -- Highlight's default Identifiers RegEx prevents
-- capturing the Epsilon operator ('_'). Since in this syntax, all identifiers
-- are defined as RegEx Keywords, and because we don't use any Keywords lists,
-- we may as well disable Identifiers by defining them as a never-matching RegEx.
-- NOTE: Defining Identifiers as a never-matching RegEx prevents using Keywords
-- lists (the parser will fail to capture them).
-- ==============================================================================
-- COMMENTS
-- ==============================================================================
-- OCaml style comments, no nesting: (* ...COMMENT BLOCK... *)
Comments={
  { Block=true,
    Nested=false,
    Delimiter = {
      [=[ \(\* ]=], -- Comment start: '(*'
      [=[ \*\) ]=]  -- Comment end:   '*)'
    }
  },
}
-- =============================================================================
-- STRINGS
-- =============================================================================
Strings={
------------------------------------------------------------------------------
-- STRING DELIMITERS
------------------------------------------------------------------------------
-- Polygen recognises only double quotes as string delimiters: "...STRING..."
Delimiter=[=[ " ]=],
--[[----------------------------------------------------------------------------
ESCAPE SEQUENCES
----------------------------------------------------------------------------
Escape sequences can occur only inside strings -- here enforced via a custom
OnStateChange() hook-function, further on. Valid escape sequences:
    \\     Backslash
    \"     Quote
    \n     New line
    \r     Carriage return
    \b     Backspace
    \t     Tab
    \nnn   ASCII decimal code (must always be three digits) --]]
Escape=[=[ \\\d{3}|\\[nrbt\\"] ]=],
}
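--[==[--------------------------------------------------------------------------
ESCAPE SEQUENCES: STANDALONE CHECK (SKETCH)
--------------------------------------------------------------------------------
A minimal sketch, not part of the syntax definition, that mirrors the Escape
RegEx above with plain Lua patterns, in case candidate sequences need checking
in a standalone Lua 5.3 interpreter. Lua patterns have no alternation, so the
two branches of the RegEx are tested separately. "is_escape" is a hypothetical
helper name introduced for illustration; it is not a Highlight API.

  local function is_escape(s)
    -- Branch 1: backslash followed by exactly three decimal digits.
    -- Branch 2: backslash followed by one of: n r b t \ "
    return s:match('^\\%d%d%d$') ~= nil
        or s:match('^\\[nrbt\\"]$') ~= nil
  end

  assert(is_escape('\\065'))     -- three-digit ASCII decimal code
  assert(is_escape('\\n'))       -- new line
  assert(is_escape('\\"'))       -- quote
  assert(not is_escape('\\65'))  -- only two digits
  assert(not is_escape('\\x'))   -- not a listed escape
--]==]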
--[[============================================================================
OPERATORS
============================================================================
::= := : ; ^ . , _ | + - > < \
>> << ( ) [ ] { }
--]]
Operators=[=[ ::?=|\^|\.|:|\+|-|>|<|\(|\)|\[|]|\{|}|\||,|;|_|\\ ]=]
-- =============================================================================
-- KEYWORDS
-- =============================================================================
Keywords={
-- KNOWN ISSUES: An unspaced non-terminal symbol definition will be parsed as
-- a label (e.g. 'S::=' and 'X:=', instead of 'S ::=' and 'X :=') because of
-- the colon; and a label with spaces before the colon will be parsed as a non-
-- terminal symbol (e.g. 'Label :' instead of 'Label:'). Since both usages are
-- considered bad (albeit valid) style in Polygen grammars (and indeed are
-- rarely found in actual grammars), it's not worth implementing complex RegExs
-- to capture such edge cases.
------------------------------------------------------------------------------
-- Non-Terminal Symbol
------------------------------------------------------------------------------
{ Id=1,
Regex=[=[ (?<!\.)([A-Z][A-Za-z0-9]*)\b(?!:) ]=],
Group=1
},
------------------------------------------------------------------------------
-- Label Identifier
------------------------------------------------------------------------------
-- Captures a label identifier at definition time: LABEL: <..definition...>
{ Id=2,
Regex=[=[ ([A-Za-z0-9]+)(?::) ]=],
Group=1
},
------------------------------------------------------------------------------
-- Label Selector
------------------------------------------------------------------------------
-- Either a dot followed by a single Label or by a group of labels within round
-- brackets: .LABEL .(LABEL1|LABEL2) .(++LABEL1|-LABEL2)
-- The dot selector is excluded from the match; the whole bracketed group will
-- be treated as a single keyword (as in PolyGUI tool).
{ Id=3,
Regex=[=[ (?:\.)(\(.*?\)|[A-Za-z0-9]+) ]=],
Group=1
},
}
-- *****************************************************************************
-- * *
-- * CUSTOM HOOK-FUNCTIONS *
-- * *
-- *****************************************************************************
-- =============================================================================
-- Escape Sequences Only Inside String
-- =============================================================================
function OnStateChange(oldState, newState, token, kwgroup)
  -- This function ensures that escape sequences outside strings are ignored.
  -- Based on André Simon's reply to Issue #23:
  -- https://github.com/andre-simon/highlight/issues/23#issuecomment-332002639
  if newState==HL_ESC_SEQ and oldState~=HL_STRING then
    return HL_STANDARD
  end
  return newState
end
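--[==[--------------------------------------------------------------------------
OnStateChange(): STANDALONE CHECK (SKETCH)
--------------------------------------------------------------------------------
A minimal sketch for exercising the hook above outside Highlight, e.g. after
loading this file with dofile() in a standalone Lua 5.3 interpreter. Highlight
injects the real HL_* state constants at load time; the numeric values below
are placeholders assumed purely for illustration, and 99 stands in for any
state other than an escape sequence.

  HL_STANDARD, HL_STRING, HL_ESC_SEQ = 0, 1, 2   -- placeholder values (assumed)

  -- An escape sequence reached from inside a string is kept as is.
  assert(OnStateChange(HL_STRING, HL_ESC_SEQ, '\\n', 0) == HL_ESC_SEQ)
  -- An escape sequence reached from any other state falls back to standard.
  assert(OnStateChange(HL_STANDARD, HL_ESC_SEQ, '\\n', 0) == HL_STANDARD)
  -- Any other state change passes through untouched.
  assert(OnStateChange(HL_STANDARD, 99, 'x', 0) == 99)
--]==]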
--[[============================================================================
CHANGELOG
================================================================================
v1.0.1 (2018/01/18) | Highlight v3.41
- Changed "PolyGen" to "Polygen" (the author has now officially adopted the
latter spelling).
v1.0.0 (2018/01/04) | Highlight v3.41
- First release.
--]]