% File TextDocument.Rd
\name{TextDocument}
\alias{TextDocument}
\title{Text Documents}
\description{
  Representing and computing on text documents.
}
\details{
  \emph{Text documents} are documents containing (natural language)
  text.  In packages which employ the infrastructure provided by package
  \pkg{NLP}, such documents are represented via the virtual S3 class
  \code{"TextDocument"}; such packages then provide S3 text document
  classes extending this virtual base class (such as the
  \code{\link{AnnotatedPlainTextDocument}} class provided by package
  \pkg{NLP} itself).

  All extension classes must provide an \code{\link{as.character}()}
  method which extracts the natural language text in documents of the
  respective classes in a \dQuote{suitable} (not necessarily structured)
  form, as well as \code{\link{content}()} and \code{\link{meta}()}
  methods for accessing the (possibly raw) document content and metadata.
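
  As a minimal sketch of these requirements (illustrative only, not code
  from \pkg{NLP}), the following defines a hypothetical class
  \code{"SimpleTextDocument"} that stores its text as a character vector
  and its metadata as a named list, lists \code{"TextDocument"} last in
  its class vector, and provides the three required methods.  It assumes
  that \code{content()} and \code{meta()} are the S3 generics exported by
  \pkg{NLP}, with \code{meta()} accepting a \code{tag} argument.
  \preformatted{
    library("NLP")

    ## Hypothetical extension class (not part of NLP): raw text as a
    ## character vector, metadata as a named list.  "TextDocument" comes
    ## last in the class vector, so objects extend the virtual base class.
    SimpleTextDocument <- function(x, meta = list()) {
        structure(list(content = as.character(x), meta = meta),
                  class = c("SimpleTextDocument", "TextDocument"))
    }

    ## Natural language text in a "suitable" form: a single string.
    as.character.SimpleTextDocument <- function(x, ...)
        paste(x$content, collapse = " ")

    ## Raw document content.
    content.SimpleTextDocument <- function(x)
        x$content

    ## All metadata, or the value of a single tag.
    meta.SimpleTextDocument <- function(x, tag = NULL, ...)
        if (is.null(tag)) x$meta else x$meta[[tag]]

    d <- SimpleTextDocument(c("First line.", "Second line."),
                            meta = list(id = "doc1"))
    as.character(d)   # "First line. Second line."
    content(d)        # c("First line.", "Second line.")
    meta(d, "id")     # "doc1"
  }

  Methods defined at top level like this are found by S3 dispatch when
  the generics are called from the global environment; a package would
  instead register them via \code{S3method()} directives in its
  \file{NAMESPACE} file.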

  In addition, the infrastructure features generic functions such as
  \code{\link{words}()} and \code{\link{sents}()}, for which extension
  classes can provide methods giving a structured view of the text
  contained in documents of these classes (e.g., \code{words()}
  returning a character vector with the word tokens in a document, and
  \code{sents()} returning a list of such character vectors, one per
  sentence).
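
  A sketch of how an extension class might provide such methods,
  assuming a hypothetical class \code{"SentenceListDocument"} (not
  provided by \pkg{NLP}) whose content is already a list of tokenized
  sentences, so that \code{sents()} can return it directly and
  \code{words()} only needs to flatten it.  It assumes \code{words()} and
  \code{sents()} are S3 generics with arguments \code{(x, ...)}.
  \preformatted{
    library("NLP")

    ## Hypothetical extension class (not part of NLP) whose content is a
    ## list of sentences, each a character vector of word tokens.
    SentenceListDocument <- function(x, meta = list()) {
        structure(list(content = x, meta = meta),
                  class = c("SentenceListDocument", "TextDocument"))
    }

    ## Methods required of all extension classes.
    as.character.SentenceListDocument <- function(x, ...)
        paste(unlist(x$content, use.names = FALSE), collapse = " ")
    content.SentenceListDocument <- function(x) x$content
    meta.SentenceListDocument <- function(x, tag = NULL, ...)
        if (is.null(tag)) x$meta else x$meta[[tag]]

    ## Structured views: sentences as a list of token vectors, and all
    ## word tokens flattened into a single character vector.
    sents.SentenceListDocument <- function(x, ...) x$content
    words.SentenceListDocument <- function(x, ...)
        unlist(x$content, use.names = FALSE)

    d <- SentenceListDocument(list(c("Hello", "world", "."), c("Bye", ".")))
    words(d)           # c("Hello", "world", ".", "Bye", ".")
    length(sents(d))   # 2
    sents(d)[[2]]      # c("Bye", ".")
  }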
}
\seealso{
  \code{\link{AnnotatedPlainTextDocument}},
  \code{\link{CoNLLTextDocument}},
  \code{\link{TaggedTextDocument}}, and
  \code{\link{WordListDocument}}
  for the text document classes provided by package \pkg{NLP}.
}