File: suffix_extract.Rd

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/suffix.R
\name{suffix_extract}
\alias{suffix_extract}
\title{extract the suffix from domain names}
\usage{
suffix_extract(domains, suffixes = NULL)
}
\arguments{
\item{domains}{a vector of domains, from \code{\link{domain}}
or \code{\link{url_parse}}. Alternatively, full URLs can be provided;
they will then be run through \code{\link{domain}} internally.}

\item{suffixes}{a dataset of suffixes. By default this is NULL, and the function
relies on \code{\link{suffix_dataset}}. Optionally, if you want more up-to-date
suffix data, you can provide the result of \code{\link{suffix_refresh}} for
this parameter.}
}
\value{
a data.frame of four columns: "host", "subdomain", "domain" and "suffix".
"host" is the domain name that was passed in. "subdomain" contains any
subdomain preceding the domain proper. "domain" contains the part of the
domain name that came before the matched suffix. "suffix" contains the
matched public suffix itself.
}
\description{
Domain names have suffixes: common endings that people
can (or could) register domains under. This includes single-label
endings like ".org", but also multi-label endings like ".edu.co". A
simple top-level domain list, as a result, won't cut it.

\code{\link{suffix_extract}} takes the list of public suffixes,
as maintained by Mozilla (see \code{\link{suffix_dataset}}), and
a vector of domain names, and produces a data.frame containing the
suffix that each domain uses and the remaining fragments.
}
\examples{

# Using url_parse
domain_name <- url_parse("http://en.wikipedia.org")$domain
suffix_extract(domain_name)
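# The matched public suffix sits in the "suffix" column of the returned
# data.frame (for "en.wikipedia.org", this should be "org"):
suffix_extract(domain_name)$suffix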

# Using domain()
domain_name <- domain("http://en.wikipedia.org")
suffix_extract(domain_name)
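
# suffix_extract is vectorised, and matches multi-label suffixes such as
# ".co.uk" that a plain TLD list would miss; the hostnames below are
# illustrative assumptions, not package data
suffix_extract(c("en.wikipedia.org", "www.bbc.co.uk"))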

\dontrun{
# Relying on a fresh version of the suffix dataset
suffix_extract(domain("http://en.wikipedia.org"), suffix_refresh())
}

}
\seealso{
\code{\link{suffix_dataset}} for the dataset of suffixes.
}