.\" Process this file with
.\" groff -man -Tascii foo.1
.\"
.TH ONLY 1 "" Haskell ""
.SH NAME
only \- an advanced filter for words, lines, and more
.SH SYNOPSIS
.B only [\-[bcwlf]
.I EXPR
.B ] ...
.I file
.B ...
.SH DESCRIPTION
.B Only
is an advanced filtering tool, like
.B grep,
but instead of filtering only lines,
it can also filter bytes, characters, words, or whole files, collectively called
.I tokens.
When tokens
.I match,
two kinds of indices give finer control than
.B grep
offers.
They can appear before and/or after a
.I regex
and are called
.I absolute indices
and
.I relative indices,
respectively.
.I Absolute indices
refer to matches, whereas
.I relative indices
refer to tokens, with the match itself as token zero.
For example,
.B \-l
.I N/regex/M
will show the M-th line after the N-th occurrence of
.I regex.
.P
For a more detailed description, see below.
.SH OPTIONS
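.IP (\-b|\-\-bytes)=EXPR
Byte mode: each 8-bit octet is a token.
.IP (\-c|\-\-chars)=EXPR
Character mode: each character, which may span several bytes, is a token.
.IP (\-w|\-\-words)=EXPR
Word mode: each white-space separated word is a token.
.IP (\-l|\-\-lines)=EXPR
Line mode: each line is a token.
.IP (\-f|\-\-files)=EXPR
File mode: each input file is a token.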
.SH EXTENDED DESCRIPTION
The original goal of
.B only
was to combine the features of
.B head,
.B tail,
.B grep,
and
.B cut
into a single utility that could do everything they do, and more.
For example,
.B head
and
.B tail
are good for selecting the first or last n lines of a file,
but what if you want lines 10 to 30? Neither utility can do that
alone, and combining them to get there is clumsy. Granted, one could probably
write a one-liner in
.B awk
or
.B perl
to achieve the desired effect, but at the expense of clarity.
.P
To summarize the features of
.B only,
there are two major kinds of inputs:
.I files
and
.I modes.
A file can
either be a filename or
.B '\-'
which means standard input.
The modes currently supported are:
.B bytes,
.B characters,
.B words,
.B lines,
and
.B files.
The modes differ in what counts as a single token, that is, in what the pattern /^.*$/ matches.
When a number is given instead of a pattern,
it selects tokens of the current mode by position, for example the first word
or the second line.
.P
In byte mode, the input is broken up into 8-bit octets,
so a pattern must match a single byte. In character mode, the input
is decoded according to the specified encoding (UTF-8 if unspecified),
so each character may span multiple bytes. In word mode, the separators
can be any white-space, so
.B only
tries to remember which separator was originally there
and puts it back before displaying.
In line mode,
.B only
behaves much like
.B grep
but with a few extra features.
In file mode, the filenames are not shown (unless \-F is used), but the
entire file is shown if it matches the pattern.
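The word-mode behaviour described above can be illustrated with a rough Haskell sketch. It is not the code of
.B only
itself: the names WordTok, wordTokens and renderWords are hypothetical, and the separator of a word is assumed to be the run of white-space immediately before it.
Because every word keeps its original separator, rendering all tokens reproduces the input exactly.

```haskell
  import Data.Char (isSpace)

  -- Hypothetical word-mode token: the word plus the white-space
  -- that preceded it in the input.
  data WordTok = WordTok { sepBefore :: String, wordText :: String }
    deriving (Show, Eq)

  -- Split input into words, remembering each word's leading separator.
  -- White-space after the last word is returned separately.
  wordTokens :: String -> ([WordTok], String)
  wordTokens s
    | null rest = ([], sep)
    | otherwise = (WordTok sep w : toks, trailing)
    where
      (sep, rest) = span isSpace s
      (w, rest') = break isSpace rest
      (toks, trailing) = wordTokens rest'

  -- Put the separators back; rendering every token gives the input again.
  renderWords :: ([WordTok], String) -> String
  renderWords (toks, trailing) = concatMap renderTok toks ++ trailing
    where
      renderTok t = sepBefore t ++ wordText t
```

Loaded in GHCi, map wordText (fst (wordTokens "a  b")) gives ["a","b"], and the second token remembers its two-space separator.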
.SH \ \ \ Syntax
.I Matching expressions
are expressions written in a small language
that forms a superset of regular expressions.
The
.B syntax
of matching expressions
is the same regardless of the current mode.
This is true even of byte mode, where you must write "\\xFF" if you want a
non-printable character. Matching expressions can be as simple as a number
or a word.
.B Only
first tries to parse an expression as a number, then as an expression of the form
.I N/regex/M,
and if both fail, it treats the entire expression as a regex. Both N
and M may be numeric expressions.
.I Numeric expressions
have the syntax (in pseudo-Parsec):

  num = [+\-]?[0-9]+
  numeric = sepBy numbers ','
  numbers = num ';' num ':' num     # from A to C step (A-B)
          | num ':' num ';' num     # from A to B step C
          | num ':' num             # from A to B
          | num

which means you can specify just a single number (3) or something as complicated
as multiple ranges (such as 3:5,100:109). These numeric expressions can occur on
either side of the regex, or both sides with a combined effect. The
syntax of the entire matching expression is:

  expr = do optional numeric
            c <- punct ; regex ; c
            (try c ; regex ; c ; optional num
               | optional numeric)
       | numeric
       | regex

where
.I punct
is any ASCII punctuation character except ".,:;",
and
.I regex
is a POSIX extended regular expression. 
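To make the numeric grammar concrete, here is a hedged Haskell sketch that uses ReadP from the base library in place of Parsec.
It parses the A, A:B and A:B;C forms (the A;B:C form is left out) and expands them against a list of n matches.
The expansion rests on assumptions, not on
.B only's
source: indices are 1-based, a negative index counts back from the end (\-1 is the last match), and out-of-range indices are dropped.
The names Range, parseNumeric and absoluteIndices are hypothetical.

```haskell
  import Data.Char (isDigit)
  import Text.ParserCombinators.ReadP

  -- One comma-separated piece of a numeric expression.
  -- (Hypothetical names; the "A;B:C" form is not modelled here.)
  data Range
    = Single Int              -- N
    | FromTo Int Int          -- A:B, from A to B
    | FromToStep Int Int Int  -- A:B;C, from A to B step C
    deriving (Show, Eq)

  -- num = [+-]?[0-9]+
  num :: ReadP Int
  num = do
    sign <- option '+' (char '+' +++ char '-')
    digits <- munch1 isDigit
    let n = read digits
    return (if sign == '-' then negate n else n)

  range :: ReadP Range
  range = do
    a <- num
    option (Single a) $ do
      _ <- char ':'
      b <- num
      option (FromTo a b) (FromToStep a b <$> (char ';' *> num))

  -- Comma-separated ranges; the whole input must be consumed.
  parseNumeric :: String -> Maybe [Range]
  parseNumeric s =
    case [rs | (rs, "") <- readP_to_S (sepBy1 range (char ',') <* eof) s] of
      [rs] -> Just rs
      _ -> Nothing

  -- Expand ranges against n matches (assumed: 1-based, -1 is the last,
  -- out-of-range indices are dropped).
  absoluteIndices :: Int -> [Range] -> [Int]
  absoluteIndices n = filter inBounds . concatMap expand
    where
      fix i = if i < 0 then n + 1 + i else i
      inBounds i = i >= 1 && i <= n
      expand (Single a) = [fix a]
      expand (FromTo a b) = [fix a .. fix b]
      expand (FromToStep a b c)
        | c == 0 = [fix a]
        | otherwise = [fix a, fix a + c .. fix b]
```

For example, parseNumeric "3:5,100:109" yields Just [FromTo 3 5, FromTo 100 109], and expanding FromTo 1 (\-1) against five matches gives [1,2,3,4,5], which is what the default N of 1:\-1 selects under these assumptions.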
.\" This serves to describe the syntax of matching expressions.
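The parse order described above (a number first, then the delimited N/regex/M form, then a plain regex) can be sketched as a small classifier.
This is a simplified illustration with hypothetical names (Expr, classify, delimited), not
.B only's
parser: a crude character check stands in for the numeric grammar, the index parts are kept as unparsed strings, escaped delimiters and the second-regex alternative of the grammar are not handled, and when + or \- serves as the delimiter the split may differ from what
.B only
does.

```haskell
  import Data.Char (isAscii, isDigit, isPunctuation, isSymbol)

  -- Hypothetical result of classifying a matching expression.
  data Expr
    = Numeric String                       -- e.g. "3:5,100:109"
    | Delimited String Char String String  -- N, delimiter, regex, M
    | PlainRegex String
    deriving (Show, Eq)

  -- Characters that may appear in a numeric expression.
  numChar :: Char -> Bool
  numChar ch = isDigit ch || ch `elem` "+-:;,"

  -- Crude stand-in for the numeric grammar: only numeric characters,
  -- with at least one digit.
  looksNumeric :: String -> Bool
  looksNumeric s = any isDigit s && all numChar s

  -- Any ASCII punctuation character except ".,:;".
  isDelim :: Char -> Bool
  isDelim ch =
    isAscii ch && (isPunctuation ch || isSymbol ch) && ch `notElem` ".,:;"

  -- Try "N<c>regex<c>M", where N and M may be empty.
  delimited :: String -> Maybe Expr
  delimited s
    | c : rest <- body
    , isDelim c
    , (re, _ : post) <- break (== c) rest
    , null post || looksNumeric post
    = Just (Delimited pre c re post)
    | otherwise = Nothing
    where
      (numPart, afterNum) = span numChar s
      (pre, body)
        | any isDigit numPart = (numPart, afterNum)
        | otherwise = ("", s)

  -- Number first, then the delimited form, then a plain regex.
  classify :: String -> Expr
  classify s
    | looksNumeric s = Numeric s
    | Just e <- delimited s = e
    | otherwise = PlainRegex s
```

In GHCi, classify "2/foo/-1" gives Delimited "2" '/' "foo" "-1", while classify "error" falls through to PlainRegex "error".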
.SH \ \ \ Semantics
The
.B semantics
of matching expressions are a little harder to describe. However, a generalization
of the example given above should hold true:

  "N/regex/M"  means the M-th 
.I tokens 
relative to the N-th 
.I matches

The default for
.I N
is
.B 1:\-1
(every match, from the first to the last), and the default for
.I M
is
.B 0
(the match itself).
The
.I N
are known as
.I absolute indices,
and the
.I M
are known as
.I relative indices.
Absolute indices select from the list of matches (the tokens matched by the regular expression), using the numbers in N as indices into that list.
This lets you select the first match (1) or the last match (\-1).
Negative numbers count backwards from the end of the list, so (\-2) is the second-to-last match.
Relative indices work on the original list of tokens: for each selected match, they index a virtual list in which 0 is the match itself, negative numbers are the tokens before it, and positive numbers are the tokens after it.
This allows one to emulate
.B grep's
\-A (after) and \-B (before) options.
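As a rough model of these semantics, the Haskell sketch below selects tokens for N/regex/M over an already tokenized input.
It is an illustration under assumptions, not
.B only's
implementation: a plain predicate stands in for the POSIX regex, absolute indices are given as (from, to) pairs so that the default 1:\-1 becomes [(1, \-1)], relative offsets are a plain list, and the names matchPositions and select are hypothetical.
Unlike
.B grep,
which merges overlapping context, this sketch emits a token twice when the contexts of two matches overlap.

```haskell
  -- Hypothetical model of N/regex/M over an already tokenized input.

  -- 0-based positions of the tokens that satisfy the predicate.
  matchPositions :: (a -> Bool) -> [a] -> [Int]
  matchPositions p xs = [i | (i, x) <- zip [0 ..] xs, p x]

  -- absRanges: 1-based (from, to) pairs over the matches; negative
  -- values count back from the last match, so (1, -1) means "all".
  -- relOffsets: offsets from each chosen match; 0 is the match itself.
  select :: (a -> Bool) -> [(Int, Int)] -> [Int] -> [a] -> [a]
  select p absRanges relOffsets xs =
    [ xs !! j
    | (from, to) <- absRanges
    , i <- [resolve from .. resolve to]
    , i >= 1 && i <= n
    , r <- relOffsets
    , let j = (ms !! (i - 1)) + r
    , j >= 0 && j < len
    ]
    where
      ms = matchPositions p xs
      n = length ms
      len = length xs
      resolve k = if k < 0 then n + 1 + k else k
```

With the tokens ["a", "ERR", "b", "c"], select (== "ERR") [(1, 1)] [0, 1] gives ["ERR", "b"], mirroring an expression such as 1/ERR/0:1 in line mode.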
.P
Here are equivalent command lines for "after" a match:

  grep \-A3 expr file.txt
  only \-l/expr/0:3 file.txt

Here are equivalent command lines for "before" a match:

  grep \-B3 expr file.txt
  only \-l/expr/\-3:0 file.txt

Normally, these would be used to select
lines, for example when you get a compiler error in a file with 10 million lines
and just want to see the surrounding text.
.SH FILES
.I ~/.onlyrc
.RS
A user configuration file. [Not implemented yet]
.RE