File: features.Rd

\name{features}
\alias{features}
\title{Extract Annotation Features}
\description{
  Conveniently extract features from annotations and annotated plain
  text documents.
}
\usage{
features(x, type = NULL, simplify = TRUE)
}
\arguments{
  \item{x}{an object inheriting from class \code{"Annotation"} or
    \code{"AnnotatedPlainTextDocument"}.}
  \item{type}{a character vector of annotation types used for
    selecting annotations, or \code{NULL} (default) to use all
    annotations.  When selecting, the elements of \code{type} are
    partially matched against the annotation types.}
  \item{simplify}{a logical indicating whether to simplify feature
    values to an atomic vector where possible (see \bold{Details}).}
}
\details{
  \code{features()} gathers all feature tag-value pairs in the selected
  annotations into a data frame with one variable for each tag found,
  holding the values for that tag (with \code{NULL} for annotations
  that have no value for the tag).  In general, variables are
  \emph{lists} of extracted values.  By default, variables in which all
  elements are length one atomic vectors are simplified into an atomic
  vector of values.  The values for specific tags can be extracted by
  suitably subscripting the resulting data frame.
}
\examples{
## Use a pre-built annotated plain text document,
## see ? AnnotatedPlainTextDocument.
d <- readRDS(system.file("texts", "stanford.rds", package = "NLP"))
## Extract features of all *word* annotations in doc:
x <- features(d, "word")
## Could also have abbreviated "word" to "w".
x
## Only lemmas:
x$lemma
## Words together with lemmas:
paste(words(d), x$lemma, sep = "/")
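## A minimal sketch with a hand-built annotation object, to illustrate
## the simplification rule described in the Details section.  The spans
## and feature values below are made up for illustration and do not
## come from the document above.  The third word has no lemma feature.
a <- Annotation(1 : 3,
                rep.int("word", 3L),
                c(1L, 5L, 9L),
                c(3L, 7L, 12L),
                list(list(POS = "DT", lemma = "the"),
                     list(POS = "NN", lemma = "cat"),
                     list(POS = "VBZ")))
f <- features(a, "word")
## Every annotation has a length one POS value, so this variable should
## be simplified to an atomic vector:
f$POS
## The lemma variable should remain a list, with NULL for the third word:
f$lemma
## Keep all variables as lists:
features(a, "word", simplify = FALSE)$POS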
}