NAME
  visual-regexp - graphical front-end to write and debug regular expressions

SYNOPSIS
  visual-regexp file

DESCRIPTION
  visual-regexp is a program for interactively writing Perl-style regular
  expressions and viewing what they match. It is well suited to debugging
  complicated expressions.

  It helps you design, debug, or more generally work with Perl-style regular
  expressions. As it is often difficult to write the right regexp on the first
  try, this tool shows the effect of your regexp on a sample text of your choice.

DESIGN OF REGEXP
  To design a regexp, type the expression in the top text widget.
  Press the 'Go' button to highlight the matched part of the text in the sample
  text widget.

  To get a quickref of the regexp syntax use the menu 'View/Show regexp help'.

  You can specify some options using the checkboxes (please read Tcl help to
  learn the meaning of these options).

RECURSIVE DESIGN OF REGEXPS
  Sometimes you will need more than one step to extract the information you want
  from the sample. For example, imagine you want to retrieve information from
  an HTML table nested inside another HTML table:

      <html><body>
      <table border=1>
        <tr><td>
          <table bgcolor="#FFFF00" border=1>
            <tr> <td>One</td> <td>1</td> </tr>
            <tr> <td>Two</td> <td>2</td> </tr>
          </table>
        <tr> <td>Foo</td> <td>Bar</td> </tr>
      </table>
      </body></html>

  You cannot use a single global regexp to extract the two lines "One 1" and
  "Two 2". You first have to use a regexp that narrows the processed region.
  Type the regexp '<table bg[^>]*?>(.*?)</table>' and press 'Go'.
  The area of interest is now shown in blue. Press the Match '1' button to
  extract the blue text (the regexp used to get this text is then printed on
  the console).
  Now use '<td>(.*?)</td>.*?<td>(.*?)</td>' to get the information you need.
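
  The two-step narrowing can also be reproduced outside the GUI. The sketch
  below uses Python's re module as a stand-in (visual-regexp itself is a Tcl
  program, so this is not its implementation). The variable names are
  illustrative, and re.DOTALL is an assumption added so that "." can cross
  line breaks in this sample.

```python
import re

SAMPLE = """<html><body>
<table border=1>
  <tr><td>
    <table bgcolor="#FFFF00" border=1>
      <tr> <td>One</td> <td>1</td> </tr>
      <tr> <td>Two</td> <td>2</td> </tr>
    </table>
  <tr> <td>Foo</td> <td>Bar</td> </tr>
</table>
</body></html>"""

# Step 1: narrow the processed region to the inner table
# (the equivalent of pressing the Match '1' button).
inner = re.search(r'<table bg[^>]*?>(.*?)</table>', SAMPLE, re.DOTALL)
region = inner.group(1)

# Step 2: extract the two cells of each row inside that region only.
rows = re.findall(r'<td>(.*?)</td>.*?<td>(.*?)</td>', region, re.DOTALL)

for name, value in rows:
    print(name, value)
```

  Because the second regexp only sees the narrowed region, the outer row
  "Foo Bar" is not picked up.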

OPTIMIZATION OF REGEXPS
  When you need to match a list of words, use the menu
  'Insert regexp/Make regexp' to design an optimized version of the word list.

  For example, the list 'aa aab ab ad' is optimized into 'a(ab?|b|d)'.
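
  One way to obtain this kind of factored expression is to merge common
  prefixes in a trie. The sketch below is an assumption about how such an
  optimization could work, written in Python for illustration. It is not the
  algorithm visual-regexp uses, and build_trie and to_regexp are hypothetical
  helper names.

```python
import re

def build_trie(words):
    """Store the words in a nested dict; the key "" marks the end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def to_regexp(node):
    """Render a trie node as a regexp that factors out shared prefixes."""
    ends_here = "" in node
    branches = [re.escape(ch) + to_regexp(child)
                for ch, child in sorted(node.items()) if ch != ""]
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
        if not ends_here:
            return body
        # A word may stop here, so the remainder is optional.
        return body + "?" if len(body) == 1 else "(" + body + ")?"
    body = "(" + "|".join(branches) + ")"
    return body + "?" if ends_here else body

words = "aa aab ab ad".split()
pattern = to_regexp(build_trie(words))
print(pattern)
```

  For the word list above this sketch yields the same expression as the one
  quoted in the text; other inputs may be factored differently by the tool.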

PROCESSING THE SAMPLE TEXT
  You can also use visual-regexp to modify a text.
  Use the menu 'Select mode/Use replace', then design a regexp that matches
  what you want to change. In the replace text widget, enter the substitution
  you want to apply. Use \0 for the whole match and \1, \2, ... for the
  subexpressions; the colours show which number maps to which matched
  subexpression.

  After the substitution, you can save the new text using the 'File/Save ...'
  menu. You can let the program choose the end-of-line format or force the
  format of a specific environment (Unix, Windows, Mac).
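
  As a rough illustration of numbered back-references in a substitution, here
  is a sketch using Python's re.sub. Note the difference in notation: Python
  accepts \1, \2, ... like the replace widget, but writes the whole match as
  \g<0> rather than \0. The sample text and variable names are made up for
  the example.

```python
import re

text = "One 1\nTwo 2"

# Swap the word and the number on each line using the two subexpressions.
swapped = re.sub(r"(\w+) (\d+)", r"\2:\1", text)

# Wrap every number in brackets by referring to the whole match.
bracketed = re.sub(r"\d+", r"[\g<0>]", text)

print(swapped)
print(bracketed)
```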

KNOWN PROBLEMS
  - Some regexps can consume a lot of CPU time. This seems to be caused by using
    the -all, -inline and -indices flags together.
  - When a subexpression is not matched (empty match), the last character of the
    previous match is coloured. This is due to a problem in Tcl (bug submitted to
    Scriptics).

REGULAR EXPRESSIONS IN PERL
 METACHARACTERS
  "^"    beginning of string
  "$"    end of string
  "."    any character except newline
  "*"    match 0 or more times
  "+"    match 1 or more times
  "?"    match 0 or 1 times; or: shortest match
  "|"    alternative
  "( )"  grouping; “storing”
  "[ ]"  set of characters
  "{ }"  repetition modifier
  "\"    quote or special

 REPETITION
  a*           zero or more a’s
  a+           one or more a’s
  a?           zero or one a’s (i.e., optional a)
  a{m}         exactly m a’s
  a{m,}        at least m a’s
  a{m,n}       at least m but at most n a’s
  repetition?  same as repetition but the shortest match is taken
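
  The difference between a plain repetition and its shortest-match form is
  easiest to see on a small sample. The sketch below uses Python's re module,
  which follows the same notation for these operators; the sample string is
  made up for the example.

```python
import re

text = "<b>bold</b>"

# Greedy: ".+" takes as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group(0)

# Shortest match: ".+?" stops at the first ">" that lets the regexp succeed.
lazy = re.search(r"<.+?>", text).group(0)

# a{m,n}: at least m but at most n repetitions.
two_to_three = [s for s in ("a", "aa", "aaa", "aaaa") if re.fullmatch(r"a{2,3}", s)]

print(greedy)
print(lazy)
print(two_to_three)
```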

 SPECIAL NOTATIONS WITH \

  Single characters
  \t    tab
  \n    newline
  \r    return (CR)
  \xhh  character with hex. code hh

  “Zero-width assertions”
  \b  “word” boundary
  \B  not a “word” boundary

  Matching
  \w  matches any single character classified as a “word” character (alphanumeric or “_”)
  \W  matches any non-“word” character
  \s  matches any whitespace character (space, tab, newline)
  \S  matches any non-whitespace character
  \d  matches any digit character, equivalent to [0-9]
  \D  matches any non-digit character

 CHARACTER SETS: SPECIALITIES INSIDE [...]
  [characters]  matches any of the characters in the sequence
  [x-y]         matches any of the characters from x to y (inclusively) in the ASCII code
  [\-]          matches the hyphen character “-”
  [\n]          matches the newline; other single character denotations with \ apply normally, too
  [^something]  matches any character except those that [something] denotes; that is, immediately after the leading “[”, the circumflex “^” means “not” applied to all of the rest 

EXAMPLES
  abc        abc (that exact character sequence, but anywhere in the string)
  ^abc       abc at the beginning of the string
  abc$       abc at the end of the string
  a|b        either of a and b
  ^abc|abc$  the string abc at the beginning or at the end of the string
  ab{2,4}c   an a followed by two, three or four b’s followed by a c
  ab{2,}c    an a followed by at least two b’s followed by a c
  ab*c       an a followed by any number (zero or more) of b’s followed by a c
  ab+c       an a followed by one or more b’s followed by a c
  ab?c       an a followed by an optional b followed by a c; that is, either abc or ac
  a.c        an a followed by any single character (not newline) followed by a c
  a\.c       a.c exactly
  [abc]      any one of a, b and c
  [Aa]bc     either of Abc and abc
  [abc]+     any (nonempty) string of a’s, b’s and c’s (such as a, abba, acbabcacaa)
  [^abc]+    any (nonempty) string which does not contain any of a, b and c (such as 'defg')
  \d\d       any two decimal digits, such as 42; same as \d{2}
  \w+        a “word”: a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo and 12bar8 and foo_1
  100\s*mk   the strings 100 and mk optionally separated by any amount of white space (spaces, tabs, newlines)
  abc\b      abc when followed by a word boundary (e.g. in abc! but not in abcd)
  perl\B     perl when not followed by a word boundary (e.g. in perlert but not in perl stuff)
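
  Several of the examples above can be checked mechanically. The sketch below
  does so with Python's re module, whose behaviour matches the table for
  these cases; the CASES list and its sample strings are made up for the
  check and are not part of visual-regexp.

```python
import re

# (regexp, sample text, whether a match is expected somewhere in the text)
CASES = [
    (r"^abc|abc$", "xxabc", True),
    (r"ab{2,4}c", "abbbc", True),
    (r"ab{2,4}c", "abc", False),
    (r"ab?c", "ac", True),
    (r"a.c", "a\nc", False),      # "." does not match a newline
    (r"a\.c", "abc", False),      # "\." matches only a literal dot
    (r"[^abc]+", "defg", True),
    (r"100\s*mk", "100 \t mk", True),
    (r"abc\b", "abc!", True),
    (r"abc\b", "abcd", False),
    (r"perl\B", "perlert", True),
    (r"perl\B", "perl stuff", False),
]

results = [bool(re.search(pattern, text)) == expected
           for pattern, text, expected in CASES]
print(all(results))
```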

REQUIREMENTS
  The script version of this program requires Tcl/Tk 8.3.0 or later.
  The standalone program has no requirements.

SEE ALSO
  perlre(1), perlrequick(1)

AUTHOR
  visual-regexp was written by Laurent Riesterer <laurent.riesterer@free.fr>.

  This manual page was written by Braulio Henrique Marques Souto <braulio@disroot.org>
  for the Debian project (but may be used by others).