% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stringdist.R
\docType{package}
\name{stringdist-package}
\alias{stringdist-package}
\title{A package for string distance calculation and approximate string matching.}
\description{
The \pkg{stringdist} package offers fast and platform-independent string
metrics. Its main purpose is to compute various string distances and to do 
approximate text matching between character vectors. As of version 0.9.3,
it is also possible to compute distances between sequences represented by
integer vectors.
}
\details{
A typical use is to match strings that are not precisely the same. For
example

\code{  amatch(c("hello","g'day"),c("hi","hallo","ola"),maxDist=2)}

returns \code{c(2,NA)} since \code{"hello"} is closest to \code{"hallo"}, at
an optimal string alignment distance of 1, which is within the maximum
distance of 2. The second element, \code{"g'day"}, is closest to \code{"ola"},
but since that distance equals 4, no match is reported.

A second typical use is to compute string distances. For example 

\code{  stringdist(c("g'day"),c("hi","hallo","ola"))}

returns \code{c(5,5,4)} since these are the distances between \code{"g'day"}
and, respectively, \code{"hi"}, \code{"hallo"}, and \code{"ola"}.
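
The matching and distance examples above are related. As a minimal sketch
(assuming the default optimal string alignment method in both functions; the
variable names are purely illustrative), the match for \code{"g'day"} can be
recovered from the distances by hand:

\preformatted{
library(stringdist)

x     <- "g'day"
table <- c("hi", "hallo", "ola")

# Distances from x to every candidate in the lookup table.
d <- stringdist(x, table)

# Index of the closest candidate; report it only when it lies
# within the maximum allowed distance (2, as in the amatch example).
best   <- which.min(d)
result <- if (d[best] <= 2) best else NA_integer_
}

Here the closest candidate is \code{"ola"} at distance 4, which exceeds the
maximum of 2, so \code{result} is \code{NA}, in line with the \code{amatch}
example.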

A third typical use would be to compute a \code{dist} object. The command

\code{stringdistmatrix(c("foo","bar","boo","baz"))}

returns an object of class \code{dist} that can be used by clustering
algorithms such as \code{stats::hclust}.
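
As a rough sketch of that workflow (the choice of two clusters and the
variable names are assumptions made for illustration only), the \code{dist}
object can be passed directly to \code{stats::hclust}:

\preformatted{
library(stringdist)

words <- c("foo", "bar", "boo", "baz")

# Pairwise string distances, returned as an object of class "dist".
d <- stringdistmatrix(words)

# Hierarchical clustering with the default linkage of stats::hclust.
hc <- stats::hclust(d)

# Cut the tree into two groups; the result has one label per word.
groups <- stats::cutree(hc, k = 2)
}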

A fourth use is to compute string distances between general sequences,
represented as integer vectors (which must be stored in a \code{list}):

\code{seq_dist( list(c(1L,1L,2L)), list(c(1L,2L,1L),c(2L,3L,1L,2L)) )}

The above code yields the vector \code{c(1,2)} (the shorter first
argument is recycled over the longer second argument).

Besides documentation for each function, the main topics documented are:

\itemize{
\item{\code{\link{stringdist-metrics}} -- string metrics supported by the package}
\item{\code{\link{stringdist-encoding}} -- how encoding is handled by the package}
\item{\code{\link{stringdist-parallelization}} -- on multithreading }
}
}
\section{Acknowledgements}{

\itemize{
  \item{The code for the full Damerau-Levenshtein distance was adapted from Nick Logan's
  \href{https://github.com/ugexe/Text--Levenshtein--Damerau--XS/blob/master/damerau-int.c}{public GitHub repository}.}
  \item{C code for converting UTF-8 to integer was copied from the R core for performance reasons.}
  \item{The code for soundex conversion and string similarity was kindly contributed by Jan van der Laan.}
}
}

\section{Citation}{

If you would like to cite this package, please cite the \href{https://journal.r-project.org/archive/2014-1/loo.pdf}{R Journal paper}:
\itemize{
\item{M.P.J. van der Loo (2014). The \code{stringdist} package for approximate string matching.
 R Journal 6(1), pp. 111-122.}
}
Or use \code{citation('stringdist')} to get a BibTeX entry.
}

\seealso{
Useful links:
\itemize{
  \item \url{https://github.com/markvanderloo/stringdist}
  \item Report bugs at \url{https://github.com/markvanderloo/stringdist/issues}
}

}
\author{
\strong{Maintainer}: Mark van der Loo \email{mark.vanderloo@gmail.com} (\href{https://orcid.org/0000-0002-9807-4686}{ORCID})

Other contributors:
\itemize{
  \item Jan van der Laan [contributor]
  \item R Core Team [contributor]
  \item Nick Logan [contributor]
  \item Chris Muir [contributor]
  \item Johannes Gruber [contributor]
  \item Brian Ripley [contributor]
}

}