\name{TextDocument}
\alias{TextDocument}
\title{Text Documents}
\description{
  Representing and computing on text documents.
}
\details{
  \emph{Text documents} are documents containing (natural language)
  text.  In packages which employ the infrastructure provided by package
  \pkg{NLP}, such documents are represented via the virtual S3 class
  \code{"TextDocument"}: such packages then provide S3 text document
  classes extending the virtual base class (such as the
  \code{\link{AnnotatedPlainTextDocument}} objects provided by package
  \pkg{NLP} itself).

  All extension classes must provide an \code{\link{as.character}()}
  method which extracts the natural language text in documents of the
  respective classes in a \dQuote{suitable} (not necessarily structured)
  form, as well as \code{\link{content}()} and \code{\link{meta}()}
  methods for accessing the (possibly raw) document content and metadata.

  In addition, the infrastructure features the generic functions
  \code{\link{words}()}, \code{\link{sents}()}, etc., for which
  extension classes can provide methods giving a structured view of the
  text contained in documents of these classes (returning, e.g., a
  character vector with the word tokens in these documents, and a list
  of such character vectors, one per sentence, respectively).
}
\seealso{
  \code{\link{AnnotatedPlainTextDocument}},
  \code{\link{CoNLLTextDocument}},
  \code{\link{CoNLLUTextDocument}},
  \code{\link{TaggedTextDocument}}, and
  \code{\link{WordListDocument}}
  for the text document classes provided by package \pkg{NLP}.
}
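\examples{
## A minimal sketch (not part of package NLP) of how an extension class
## could provide the methods described in the details above.  The class
## name "SimpleTextDocument", its constructor, and the whitespace-based
## tokenization are assumptions made purely for illustration.
library(NLP)

## Hypothetical constructor: stores the text and metadata in a list and
## extends the virtual base class "TextDocument".
SimpleTextDocument <- function(x, meta = list()) {
    doc <- list(content = as.character(x), meta = meta)
    class(doc) <- c("SimpleTextDocument", "TextDocument")
    doc
}

## Required methods: as.character(), content() and meta().
as.character.SimpleTextDocument <- function(x, ...) x$content
content.SimpleTextDocument <- function(x) x$content
meta.SimpleTextDocument <- function(x, tag = NULL, ...)
    if(is.null(tag)) x$meta else x$meta[[tag]]

## Optional structured view: a naive words() method which splits the
## text on whitespace (real classes would typically use annotations).
words.SimpleTextDocument <- function(x, ...)
    unlist(strsplit(x$content, "[[:space:]]+"))

doc <- SimpleTextDocument("A short sentence.",
                          meta = list(language = "en"))
inherits(doc, "TextDocument")
as.character(doc)
meta(doc, "language")
words(doc)
}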