Source: r-cran-tokenizers
Maintainer: Debian R Packages Maintainers <r-pkg-team@alioth-lists.debian.net>
Uploaders: Andreas Tille <tille@debian.org>
Section: gnu-r
Testsuite: autopkgtest-pkg-r
Priority: optional
Build-Depends: debhelper-compat (= 13),
               dh-r,
               r-base-dev,
               r-cran-stringi,
               r-cran-rcpp,
               r-cran-snowballc
Standards-Version: 4.6.2
Vcs-Browser: https://salsa.debian.org/r-pkg-team/r-cran-tokenizers
Vcs-Git: https://salsa.debian.org/r-pkg-team/r-cran-tokenizers.git
Homepage: https://cran.r-project.org/package=tokenizers
Rules-Requires-Root: no

Package: r-cran-tokenizers
Architecture: any
Depends: ${R:Depends},
         ${shlibs:Depends},
         ${misc:Depends}
Recommends: ${R:Recommends}
Suggests: ${R:Suggests}
Description: GNU R fast, consistent tokenization of natural language text
 Convert natural language text into tokens. Includes tokenizers for
 shingled n-grams, skip n-grams, words, word stems, sentences,
 paragraphs, characters, shingled characters, lines, tweets, Penn
 Treebank, regular expressions, as well as functions for counting
 characters, words, and sentences, and a function for splitting longer
 texts into separate documents, each with the same number of words.
 The tokenizers have a consistent interface, and the package is built
 on the 'stringi' and 'Rcpp' packages for fast yet correct
 tokenization in 'UTF-8'.
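
For reference, a minimal sketch of the consistent interface the description
mentions, using functions documented in the upstream tokenizers package
(the sample sentence and parameter values are illustrative only):

    library(tokenizers)
    # Each tokenizer takes a character vector and returns a list with one
    # element per input document, containing that document's tokens.
    tokenize_words("The quick brown fox jumps over the lazy dog.")
    # Shingled n-grams follow the same pattern, with extra parameters.
    tokenize_ngrams("The quick brown fox jumps over the lazy dog.", n = 2)
    # Counting helpers return one integer per input document.
    count_words("The quick brown fox jumps over the lazy dog.")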