File: differ.txi

package info (click to toggle)
slib 2d2-1
links: PTS
area: main
in suites: woody
size: 2,468 kB
ctags: 1,726
sloc: lisp: 20,736; makefile: 333; sh: 158
file content (95 lines) | stat: -rw-r--r-- 3,382 bytes
@noindent
This package implements the algorithm:

@ifinfo
@example
S. Wu, E. Myers, U. Manber, and W. Miller,
   "An O(NP) Sequence Comparison Algorithm,"
   Information Processing Letters 35, 6 (1990), 317-323.
   @url{http://www.cs.arizona.edu/people/gene/vita.html}
@end example
@end ifinfo
@ifset html
S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html">
E. Myers,</A> U. Manber, and W. Miller,
<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps">
"An O(NP) Sequence Comparison Algorithm,"</A>
Information Processing Letters 35, 6 (1990), 317-323.
@end ifset

@noindent
If the items being sequenced are text lines, then the computed
edit-list is equivalent to the output of the @dfn{diff} utility
@cindex diff
program.  If the items being sequenced are words, then it is like the
lesser known @dfn{spiff} program.
@cindex spiff

@noindent
The values returned by @code{diff:edit-length} can be used to gauge
the degree of match between two sequences.

@noindent
I believe that this algorithm is currently the fastest for these
tasks, but genome sequencing applications fuel extensive research in
this area.


@defun diff:longest-common-subsequence array1 array2 =?


@defunx diff:longest-common-subsequence array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
common to both @var{array}s.
@end defun

@defun diff:edits array1 array2 =?


@defunx diff:edits array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of
a shortest sequence of edits transformaing @var{array1} to @var{array2}.

Each edit is a list of an integer and a symbol:
@table @asis
@item (@var{j} insert)
Inserts @code{(array-ref @var{array1} @var{j})} into the sequence.
@item (@var{k} delete)
Deletes @code{(array-ref @var{array2} @var{k})} from the sequence.
@end table
@end defun

@defun diff:edit-length array1 array2 =?


@defunx diff:edit-length array1 array2
@var{array1} and @var{array2} are one-dimensional arrays.  The procedure @var{=?} is used
to compare sequence tokens for equality.  @var{=?} defaults to @code{eqv?}.
@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing
@var{array1} to @var{array2}.
@end defun
@example
(diff:longest-common-subsequence '#(f g h i e j c k l m)
                                 '#(f g e h i j k p q r l m))
                                 @result{} #(f g h i j k l m)

(diff:edit-length '#(f g h i e j c k l m)
                  '#(f g e h i j k p q r l m))
@result{} 6

(pretty-print (diff:edits '#(f g h i e j c k l m)
                          '#(f g e h i j k p q r l m)))
@print{}
((3 insert)                           ; e
 (4 delete)                           ; c
 (6 delete)                           ; h
 (7 insert)                           ; p
 (8 insert)                           ; q
 (9 insert))                          ; r
@end example