1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
|
@noindent
This package implements the algorithm:
@ifinfo
@example
S. Wu, E. Myers, U. Manber, and W. Miller,
"An O(NP) Sequence Comparison Algorithm,"
Information Processing Letters 35, 6 (1990), 317-323.
@url{http://www.cs.arizona.edu/people/gene/vita.html}
@end example
@end ifinfo
@ifset html
S. Wu, <A HREF="http://www.cs.arizona.edu/people/gene/vita.html">
E. Myers,</A> U. Manber, and W. Miller,
<A HREF="http://www.cs.arizona.edu/people/gene/PAPERS/np_diff.ps">
"An O(NP) Sequence Comparison Algorithm,"</A>
Information Processing Letters 35, 6 (1990), 317-323.
@end ifset
@noindent
If the items being sequenced are text lines, then the computed
edit-list is equivalent to the output of the @dfn{diff} utility
@cindex diff
program. If the items being sequenced are words, then it is like the
lesser known @dfn{spiff} program.
@cindex spiff
@noindent
The values returned by @code{diff:edit-length} can be used to gauge
the degree of match between two sequences.
@noindent
I believe that this algorithm is currently the fastest for these
tasks, but genome sequencing applications fuel extensive research in
this area.
@defun diff:longest-common-subsequence array1 array2 =?
@defunx diff:longest-common-subsequence array1 array2
@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
@code{diff:longest-common-subsequence} returns a one-dimensional array of length @code{(quotient (- (+
len1 len2) (fp:edit-length @var{array1} @var{array2})) 2)} holding the longest sequence
common to both @var{array}s.
@end defun
@defun diff:edits array1 array2 =?
@defunx diff:edits array1 array2
@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
@code{diff:edits} returns a list of length @code{(fp:edit-length @var{array1} @var{array2})} composed of
a shortest sequence of edits transformaing @var{array1} to @var{array2}.
Each edit is a list of an integer and a symbol:
@table @asis
@item (@var{j} insert)
Inserts @code{(array-ref @var{array1} @var{j})} into the sequence.
@item (@var{k} delete)
Deletes @code{(array-ref @var{array2} @var{k})} from the sequence.
@end table
@end defun
@defun diff:edit-length array1 array2 =?
@defunx diff:edit-length array1 array2
@var{array1} and @var{array2} are one-dimensional arrays. The procedure @var{=?} is used
to compare sequence tokens for equality. @var{=?} defaults to @code{eqv?}.
@code{diff:edit-length} returns the length of the shortest sequence of edits transformaing
@var{array1} to @var{array2}.
@end defun
@example
(diff:longest-common-subsequence '#(f g h i e j c k l m)
'#(f g e h i j k p q r l m))
@result{} #(f g h i j k l m)
(diff:edit-length '#(f g h i e j c k l m)
'#(f g e h i j k p q r l m))
@result{} 6
(pretty-print (diff:edits '#(f g h i e j c k l m)
'#(f g e h i j k p q r l m)))
@print{}
((3 insert) ; e
(4 delete) ; c
(6 delete) ; h
(7 insert) ; p
(8 insert) ; q
(9 insert)) ; r
@end example
|