'\" t
.\" $Id$
.tr ~
.TH WNINPUT 5WN "Dec 2006" "WordNet 3.0" "WordNet\(tm File Formats"
.SH NAME
noun.\fIsuffix\fP, verb.\fIsuffix\fP, adj.\fIsuffix\fP, adv.\fIsuffix\fP \-
WordNet lexicographer files that are input to
.BR grind (1WN)
.SH DESCRIPTION
WordNet's source files are written by lexicographers. They are the
product of a detailed relational analysis of lexical semantics: a
variety of lexical and semantic relations are used to represent the
organization of lexical knowledge. Two kinds of building blocks are
distinguished in the source files: word forms and word meanings. Word
forms are represented in their familiar orthography; word meanings are
represented by synonym sets (\fIsynset\fPs) \- lists of synonymous
word forms that are interchangeable in some context. Two kinds of
relations are recognized: lexical and semantic. Lexical relations
hold between word forms; semantic relations hold between word
meanings.
.LP
Lexicographer files correspond to the syntactic categories implemented
in WordNet \- noun, verb, adjective and adverb. All of the synsets in
a lexicographer file are in the same syntactic category. Each synset
consists of a list of synonymous words or collocations
(e.g., \fB"fountain pen"\fP, \fB"take in"\fP), and pointers that
describe the relations between this synset and other synsets. These
relations include (but are not limited to) hypernymy/hyponymy,
antonymy, entailment, and meronymy/holonymy. A word or collocation
may appear in more than one synset, and in more than one part of
speech. Each use of a word in a synset represents a sense of that
word in the part of speech corresponding to the synset.
.LP
Adjectives may be organized into clusters containing head synsets and
satellite synsets. Adverbs generally point to the adjectives from
which they are derived.
.LP
See
.BR wngloss (7WN)
for a glossary of WordNet terminology and a discussion of the
database's content and logical organization.
.SS Lexicographer File Names
The names of the lexicographer files are of the form:
.RS
.IR pos . suffix
.RE
where \fIpos\fP is either \fBnoun\fP, \fBverb\fP, \fBadj\fP or
\fBadv\fP. \fIsuffix\fP may be used to organize groups of synsets
into different files, for example \fBnoun.animal\fP and
\fBnoun.plant\fP. See
.BR lexnames (5WN)
for a list of lexicographer file names that are used in building
WordNet.
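.LP
The naming rule can be checked mechanically. The Python sketch below is
illustrative only and is not part of WordNet; the function name
\fBsplit_lex_filename\fP is hypothetical. It splits a name at the first
period and rejects any \fIpos\fP other than the four listed above.
.RS
.nf
```python
# Illustrative sketch, not part of WordNet: split_lex_filename is a
# hypothetical helper name.
VALID_POS = ("noun", "verb", "adj", "adv")


def split_lex_filename(name):
    """Return (pos, suffix) for a name of the form pos.suffix."""
    pos, sep, suffix = name.partition(".")
    if pos not in VALID_POS or not sep or not suffix:
        raise ValueError("not a lexicographer file name: " + name)
    return pos, suffix


print(split_lex_filename("noun.animal"))
```
.fi
.RE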
.SS Pointers
Pointers are used to represent the relations between the words in one
synset and another. Semantic pointers represent relations between
word meanings, and therefore pertain to all of the words in the source
and target synsets. Lexical pointers represent relations between word
forms, and pertain only to specific words in the source and target
synsets. The following pointer types are usually used to indicate
lexical relations: Antonym, Pertainym, Participle, Also See, Derivationally
Related. The remaining pointer types are generally used to represent semantic
relations.
.LP
A relation from a source to a target synset is formed by specifying
a word from the target synset in the source synset, followed by the
\fIpointer_symbol\fP indicating the pointer type. The location of a pointer
within a synset defines it as either lexical or semantic.
The
.SB "Lexicographer File Format"
section describes the syntax for entering a semantic pointer, and
.SB "Word Syntax"
describes the syntax for entering a lexical pointer.
.LP
Although there are many pointer types, only certain types of relations
are permitted between synsets of each syntactic category.
The \fIpointer_symbol\fPs for nouns are:
.RS
.nf
\fB!\fP Antonym
\fB@\fP Hypernym
\fB@i\fP Instance Hypernym
\fB\(ap\fP Hyponym
\fB\(api\fP Instance Hyponym
\fB#m\fP Member holonym
\fB#s\fP Substance holonym
\fB#p\fP Part holonym
\fB%m\fP Member meronym
\fB%s\fP Substance meronym
\fB%p\fP Part meronym
\fB=\fP Attribute
\fB+\fP Derivationally related form
\fB;c\fP Domain of synset - TOPIC
\fB-c\fP Member of this domain - TOPIC
\fB;r\fP Domain of synset - REGION
\fB-r\fP Member of this domain - REGION
\fB;u\fP Domain of synset - USAGE
\fB-u\fP Member of this domain - USAGE
.fi
.RE
The \fIpointer_symbol\fPs for verbs are:
.RS
.nf
\fB!\fP Antonym
\fB@\fP Hypernym
\fB\(ap\fP Hyponym
\fB*\fP Entailment
\fB>\fP Cause
\fB^\fP Also see
\fB$\fP Verb Group
\fB+\fP Derivationally related form
\fB;c\fP Domain of synset - TOPIC
\fB;r\fP Domain of synset - REGION
\fB;u\fP Domain of synset - USAGE
.fi
.RE
The \fIpointer_symbol\fPs for adjectives are:
.RS
.nf
\fB!\fP Antonym
\fB&\fP Similar to
\fB<\fP Participle of verb
\fB\e\fP Pertainym (pertains to noun)
\fB=\fP Attribute
\fB^\fP Also see
\fB;c\fP Domain of synset - TOPIC
\fB;r\fP Domain of synset - REGION
\fB;u\fP Domain of synset - USAGE
.fi
.RE
The \fIpointer_symbol\fPs for adverbs are:
.RS
.nf
\fB!\fP Antonym
\fB\e\fP Derived from adjective
\fB;c\fP Domain of synset - TOPIC
\fB;r\fP Domain of synset - REGION
\fB;u\fP Domain of synset - USAGE
.fi
.RE
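.LP
The four lists above amount to a lookup table from syntactic category to
permitted symbols. The following Python sketch transcribes them; the
names \fBALLOWED\fP and \fBis_permitted\fP are hypothetical, and the
check is only an illustration of how a tool might reject a symbol that
is not valid for a category, not a description of what
.BR grind (1WN)
does internally.
.RS
.nf
```python
# Pointer symbols permitted per syntactic category, transcribed from the
# lists above. chr(126) is the tilde and chr(92) the backslash; they are
# spelled this way only so that no troff escape is needed on this page.
TILDE, BACKSLASH = chr(126), chr(92)
DOMAIN = [";c", ";r", ";u"]
ALLOWED = {
    "noun": ["!", "@", "@i", TILDE, TILDE + "i", "#m", "#s", "#p",
             "%m", "%s", "%p", "=", "+", "-c", "-r", "-u"] + DOMAIN,
    "verb": ["!", "@", TILDE, "*", ">", "^", "$", "+"] + DOMAIN,
    "adj": ["!", "&", "<", BACKSLASH, "=", "^"] + DOMAIN,
    "adv": ["!", BACKSLASH] + DOMAIN,
}


def is_permitted(pos, symbol):
    """Return True if symbol is listed for the given category."""
    return symbol in ALLOWED[pos]


print(is_permitted("verb", "*"), is_permitted("noun", "*"))
```
.fi
.RE
.LP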
Many pointer types are reflexive, meaning that if a synset contains a
pointer to another synset, the other synset should contain a
corresponding reflexive pointer.
.BR grind (1WN)
automatically inserts missing reflexive pointers for the following
pointer types:
.TS
center box ;
c | c
l | l .
\fBPointer\fP	\fBReflect\fP
_
Antonym	Antonym
Hyponym	Hypernym
Hypernym	Hyponym
Instance Hyponym	Instance Hypernym
Instance Hypernym	Instance Hyponym
Holonym	Meronym
Meronym	Holonym
Similar to	Similar to
Attribute	Attribute
Verb Group	Verb Group
Derivationally Related	Derivationally Related
Domain of synset	Member of Domain
.TE
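.LP
The effect of this table can be pictured with a small Python sketch.
It is a toy model, not the algorithm used by
.BR grind (1WN):
relations are assumed to be (source, pointer type, target) triples, and
the names \fBREFLECT\fP and \fBadd_reflexive\fP are hypothetical.
.RS
.nf
```python
# Reflected pointer type for each pointer type, transcribed from the table.
REFLECT = {
    "Antonym": "Antonym",
    "Hyponym": "Hypernym",
    "Hypernym": "Hyponym",
    "Instance Hyponym": "Instance Hypernym",
    "Instance Hypernym": "Instance Hyponym",
    "Holonym": "Meronym",
    "Meronym": "Holonym",
    "Similar to": "Similar to",
    "Attribute": "Attribute",
    "Verb Group": "Verb Group",
    "Derivationally Related": "Derivationally Related",
    "Domain of synset": "Member of Domain",
}


def add_reflexive(pointers):
    """Return pointers plus any missing (target, reflected type, source)."""
    result = list(pointers)
    seen = set(pointers)
    for source, ptr_type, target in pointers:
        if ptr_type in REFLECT:
            back = (target, REFLECT[ptr_type], source)
            if back not in seen:
                seen.add(back)
                result.append(back)
    return result


links = [("collie", "Hypernym", "dog"), ("hot", "Antonym", "cold")]
for link in add_reflexive(links):
    print(link)
```
.fi
.RE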
.SS Verb Frames
Each verb synset contains a list of generic sentence frames
illustrating the types of simple sentences in which the verbs in the
synset can be used. For some verb senses, example sentences
illustrating actual uses of the verb are provided. (See
.SB "Verb Example Sentences"
in
.BR wndb (5WN).)
Whenever there is no example sentence, the generic sentence frames
specified by the lexicographer are used. The generic sentence frames
are entered in a synset as a comma-separated list of integer frame
numbers. The following list is the text of the generic frames,
preceded by their frame numbers:
.RS
.nf
1 Something ----s
2 Somebody ----s
3 It is ----ing
4 Something is ----ing PP
5 Something ----s something Adjective/Noun
6 Something ----s Adjective/Noun
7 Somebody ----s Adjective
8 Somebody ----s something
9 Somebody ----s somebody
10 Something ----s somebody
11 Something ----s something
12 Something ----s to somebody
13 Somebody ----s on something
14 Somebody ----s somebody something
15 Somebody ----s something to somebody
16 Somebody ----s something from somebody
17 Somebody ----s somebody with something
18 Somebody ----s somebody of something
19 Somebody ----s something on somebody
20 Somebody ----s somebody PP
21 Somebody ----s something PP
22 Somebody ----s PP
23 Somebody's (body part) ----s
24 Somebody ----s somebody to INFINITIVE
25 Somebody ----s somebody INFINITIVE
26 Somebody ----s that CLAUSE
27 Somebody ----s to somebody
28 Somebody ----s to INFINITIVE
29 Somebody ----s whether INFINITIVE
30 Somebody ----s somebody into V-ing something
31 Somebody ----s something with something
32 Somebody ----s INFINITIVE
33 Somebody ----s VERB-ing
34 It ----s that CLAUSE
35 Something ----s INFINITIVE
.fi
.RE
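.LP
As a rough illustration of how frame numbers map to sentences, the
Python sketch below stores a few of the frames above and substitutes a
verb for the \fB----\fP placeholder. The names \fBFRAMES\fP and
\fBfill_frames\fP are hypothetical, only a subset of the frames is
included, and the substitution is purely textual: it makes no attempt
at correct inflection.
.RS
.nf
```python
# A few of the generic frames listed above, keyed by frame number.
FRAMES = {
    1: "Something ----s",
    2: "Somebody ----s",
    8: "Somebody ----s something",
    10: "Something ----s somebody",
    26: "Somebody ----s that CLAUSE",
}


def fill_frames(verb, frame_numbers):
    """Substitute verb for the ---- placeholder in each listed frame."""
    return [FRAMES[n].replace("----", verb) for n in frame_numbers]


print(fill_frames("blur", [8, 10]))
```
.fi
.RE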
.SS Lexicographer File Format
Synsets are entered one per line, and each line is terminated with a
newline character. A line containing a synset may be as long as
necessary, but no newlines can be entered within a synset. Within a
synset, spaces or tabs may be used to separate entities. Items
enclosed in italicized square brackets are optional.
The general synset syntax is:
.RS
.nf
\fB{\fP \fI~~words~~pointers~~\fP \fB(\fP \fI~gloss~\fP \fB)~~}\fR
.fi
.RE
Synsets of this form are valid for all syntactic categories except
verb, and are referred to as basic synsets. At least one \fIword\fP
and a \fIgloss\fP are required to form a valid synset. Pointers
entered following all the \fIwords\fP in a synset represent semantic
relations between all the words in the source and target synsets.
For verbs, the basic synset syntax is defined as follows:
.RS
.nf
\fB{\fP \fI~~words~~pointers~~frames~~\fP \fB(\fP ~\fIgloss~\fP \fB)~~}\fR
.fi
.RE
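.LP
A reader of these files might separate the gloss from the rest of a
synset roughly as follows. This Python sketch is not the parser used by
.BR grind (1WN);
the name \fBsplit_synset\fP is hypothetical, and it assumes that the
gloss opens with a parenthesis preceded by a space, while syntactic
markers are attached directly to their word.
.RS
.nf
```python
def split_synset(line):
    """Return (body, gloss) for one synset line; gloss may be None."""
    text = line.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("a synset must be enclosed in braces")
    inner = text[1:-1].strip()
    # Assumption: the gloss starts at the first "(" that follows a space.
    start = inner.find(" (")
    if start == -1 or not inner.endswith(")"):
        return inner, None
    return inner[:start].strip(), inner[start + 2:-1].strip()


body, gloss = split_synset(
    "{ collie, dog1,@ (large multi-colored dog with pointy nose) }")
print(body)
print(gloss)
```
.fi
.RE
.LP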
Adjectives may be organized into clusters containing one or more head
synsets and optional satellite synsets. Adjective clusters are of the
form:
.RS
.nf
\fB[
\fIhead synset
[satellite synsets]
[\-]
[additional head/satellite synsets]
\fB]\fR
.fi
.RE
Each adjective cluster is enclosed in square brackets, and may have
one or more parts. Each part consists of a head synset and optional
satellite synsets that are conceptually similar to the head synset's
meaning. Parts of a cluster are separated by one or more hyphens
(\fB\-\fP) on a line by themselves, with the terminating square
bracket following the last synset. Head and satellite synsets follow
the syntax of basic synsets; however, a "Similar to" pointer must be
specified in a head synset for each of its satellite synsets. Most
adjective clusters contain two antonymous parts. See
.BR wngloss (7WN)
for a discussion of adjective clusters, and
.SB "Special Adjective Syntax"
for more information on adjective cluster syntax.
Synsets for relational adjectives (pertainyms) and participial
adjectives do not adhere to the cluster structure. They use the basic
synset syntax.
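.LP
The cluster layout can be illustrated with a short Python sketch that
groups the lines of one cluster into its parts. It assumes that the
enclosing square brackets and the hyphen separators each sit on a line
of their own, as in the template above; the name \fBsplit_cluster\fP is
hypothetical. In each part, the first synset is the head and the rest
are its satellites.
.RS
.nf
```python
def split_cluster(lines):
    """Split the lines of one adjective cluster into its parts."""
    parts, current = [], []
    for raw in lines:
        line = raw.strip()
        if line in ("[", "]", ""):
            continue  # enclosing brackets and blank lines carry no synset
        if set(line) == {"-"}:
            parts.append(current)  # a line of hyphens ends the current part
            current = []
        else:
            current.append(line)
    parts.append(current)
    return parts


cluster = [
    "[",
    "{ [ HOT, COLD,! ] lukewarm(a), TEPID,^ (hot to the touch) }",
    "{ warm, }",
    "-",
    "{ [ COLD, HOT,! ] frigid, (cold to the touch) }",
    "{ freezing, }",
    "]",
]
for number, part in enumerate(split_cluster(cluster), start=1):
    print(number, "head:", part[0], "satellites:", part[1:])
```
.fi
.RE
.LP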
Comments can be entered in a lexicographer file by enclosing the text
of the comment in parentheses. Note that comments \fBcannot\fP appear
within a synset, as parentheses within a synset have an entirely
different meaning (see
.SB "Gloss Syntax"
). However, entire synsets (or adjective clusters) can be "commented
out" by enclosing them in parentheses. Lexicographers often do this to
verify the syntax of files under development or to leave themselves
notes while working on entries.
.SS Word Syntax
A synset must have at least one word, and the words of a synset must
appear after the opening brace and before any other synset constructs.
A word may be entered in either the simple word or word/pointer
syntax.
A simple word is of the form:
.RS
.nf
\fIword[\fP \fB(\fP \fImarker\fP \fB)\fP \fI][lex_id]\fP \fB,\fR
.fi
.RE
\fIword\fP may be entered in any combination of upper and lower case
unless it is in an adjective cluster. A collocation is entered by
joining the individual words with an underscore character (\fB_\fP).
Numbers (integer or real) may be entered, either by themselves or as
part of a word string, by following the number with a double quote
(\fB"\fP).
See
.SB "Special Adjective Syntax"
for a description of adjective clusters and markers.
\fIword\fP may be followed by an integer \fIlex_id\fP from \fB1\fP to
\fB15\fP. The \fIlex_id\fP is used to distinguish different senses of
the same word within a lexicographer file. The lexicographer assigns
\fIlex_id\fP values, usually in ascending order, although there is no
requirement that the numbers be consecutive. The default is \fB0\fP,
which does not have to be specified. A \fIlex_id\fP must be used on
pointers if the desired sense has a non-zero \fIlex_id\fP in its
synset specification.
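.LP
The fields of a simple word can be pulled apart as in the Python sketch
below. It is an illustration under simplifying assumptions, not the
parser used by
.BR grind (1WN):
the name \fBparse_word\fP is hypothetical, a marker is accepted either
before or after the \fIlex_id\fP, trailing digits are always read as a
\fIlex_id\fP, and numbers protected by a double quote are not handled.
.RS
.nf
```python
def parse_word(token):
    """Split a simple word such as dog1 or lukewarm(a) into its fields."""
    text = token.strip().rstrip(",")
    marker = None
    if "(" in text and ")" in text:
        open_at, close_at = text.index("("), text.index(")")
        marker = text[open_at + 1:close_at]
        text = text[:open_at] + text[close_at + 1:]
    # Trailing digits form the lex_id; none means the default of 0.
    stem = text.rstrip("0123456789")
    lex_id = int(text[len(stem):] or 0)
    # Underscores join the words of a collocation; show them as spaces.
    return {"word": stem.replace("_", " "), "marker": marker, "lex_id": lex_id}


for token in ("dog1,", "lukewarm(a),", "hunting_dog,"):
    print(parse_word(token))
```
.fi
.RE
.LP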
Word/pointer syntax is of the form:
.RS
.nf
\fB[~~\fP \fIword[\fP \fB(\fP \fImarker\fP \fB)\fP \fI][lex_id]\fP \fB,\fP \fI~~pointers~~\fP \fB]\fR
.fi
.RE
This syntax is used when one or more pointers correspond only to the
specific word in the word/pointer set, rather than all the words in
the synset, and represents a lexical relation. Note that a
word/pointer set appears within a synset, therefore the square
brackets used to enclose it are treated differently from those used to
define an adjective cluster. Only one word can be specified in each
word/pointer set, and any number of pointers may be included. A
synset can have any number of word/pointer sets. Each is treated by
.BR grind (1WN)
essentially as a \fIword\fP, so they all must appear
before any synset \fIpointers\fP representing semantic relations.
For verbs, the word/pointer syntax is extended in the following manner
to allow the user to specify generic sentence frames that, like
pointers, correspond only to a specific word, rather than all the
words in the synset. In this case, \fIpointers\fP are optional.
.RS
.nf
\fB[~~\fP \fIword\fP \fB,\fP ~~\fI[pointers]~~frames~~\fP \fB]\fR
.fi
.RE
.SS Pointer Syntax
Pointers are optional in synsets. If a pointer is specified outside
of a word/pointer set, the relation is applied to all of the words in
the synset, including any words specified using the word/pointer
syntax. This indicates a semantic relation between the meanings of
the words in the synsets. If specified within a word/pointer set, the
relation corresponds only to the word in the set and represents a
lexical relation.
.LP
A pointer is of the form:
.RS
.nf
\fI[lex_filename\fP\fB:\fP \fI]word[lex_id]\fP\fB,\fP\fIpointer_symbol\fR
.fi
.RE
or:
.RS
.nf
\fI[lex_filename\fP\fB:\fP \fI]word[lex_id]\fP\fB^\fP\fIword[lex_id]\fP\fB,\fP\fIpointer_symbol\fR
.fi
.RE
For pointers, \fIword\fP indicates a word in another synset. When the
second form of a pointer is used, the first \fIword\fP indicates a
word in a head synset, and the second is a word in a satellite of that
cluster. \fIword\fP may be followed by a \fIlex_id\fP that is used to
match the pointer to the correct target synset. The synset containing
\fIword\fP may reside in another lexicographer file. In this case,
\fIword\fP is preceded by \fIlex_filename\fP as shown.
See
.SB "Pointers"
for a list of \fIpointer_symbol\fPs and their meanings.
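.LP
Both pointer forms can be decomposed with plain string splitting, as in
the Python sketch below. The name \fBparse_pointer\fP and the field
names in its result are hypothetical; the sketch does not validate the
symbol or separate a \fIlex_id\fP from its word.
.RS
.nf
```python
def parse_pointer(token):
    """Split a pointer such as noun.animal:dog1,@ into its fields."""
    # The pointer symbol is everything after the last comma.
    target, _, symbol = token.strip().rpartition(",")
    lex_filename, sep, words = target.partition(":")
    if not sep:
        lex_filename, words = None, target
    # In the second form, a caret separates head word and satellite word.
    word, _, satellite = words.strip().partition("^")
    return {
        "lex_filename": lex_filename,
        "word": word,
        "satellite": satellite or None,
        "symbol": symbol,
    }


print(parse_pointer("dog1,@"))
print(parse_pointer("adj.all:essential^basic,&"))
```
.fi
.RE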
.SS Verb Frame List Syntax
Frame numbers corresponding to generic sentence frames must be entered
in each verb synset. If a frame list is specified outside of a
word/pointer set, the verb frames in the list apply to all of the
words in the synset, including any words specified using the
word/pointer syntax. If specified within a word/pointer set, the verb
frames in the list correspond only to the word in the set.
A frame number list is entered as follows:
.RS
\fBframes:\fP~~\fIf_num\fP[\fB,\fP\fIf_num...]\fR
.RE
where \fIf_num\fP specifies a generic frame number.
See
.SB "Verb Frames"
for a list of generic sentences and their corresponding frame numbers.
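.LP
A frame list can be read with a few lines of Python. The sketch below
is illustrative only; the names \fBparse_frames\fP and \fBMAX_FRAME\fP
are hypothetical, and the range check simply reflects the 35 frames
listed under
.SB "Verb Frames".
.RS
.nf
```python
MAX_FRAME = 35  # highest frame number in the generic frame list


def parse_frames(text):
    """Return the frame numbers from a list such as frames: 8, 10."""
    keyword, sep, numbers = text.partition(":")
    if not sep or keyword.strip() != "frames":
        raise ValueError("expected a list introduced by frames:")
    result = [int(item) for item in numbers.split(",")]
    for number in result:
        if not 1 <= number <= MAX_FRAME:
            raise ValueError("unknown frame number: " + str(number))
    return result


print(parse_frames("frames: 8, 10"))
```
.fi
.RE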
.SS Gloss Syntax
A gloss is included in all synsets. The lexicographer may enter a
text string of any length desired. A gloss is simply a string
enclosed in parentheses with no embedded carriage returns. It
provides a definition of what the synset represents and/or example
sentences.
.SS Special Adjective Syntax
The syntax for representing antonymous adjective synsets requires
several additional conditions.
The first word of a head synset \fBmust\fP be entered in upper case,
and can be thought of as the head word of the head synset. The
\fIword\fP part of a pointer from one head synset to another head
synset within the same cluster (usually an antonym) must also be
entered in upper case. Usually antonymous adjectives are entered
using the word/pointer syntax described in
.SB "Word Syntax"
to indicate a lexical relation. There is no restriction on the number
of parts that a cluster may have, and some clusters have three parts,
representing antonymous triplets, such as \fBsolid\fP, \fBliquid\fP,
and \fBgas\fP.
.LP
A cross-cluster pointer may be specified, allowing a head or satellite
synset to point to a head synset in a different cluster. A
cross-cluster pointer is indicated by entering the \fIword\fP part of
the pointer in upper case.
.LP
An adjective may be annotated with a syntactic marker indicating a
limitation on the syntactic position the adjective may have in
relation to the noun that it modifies. If so marked, the marker appears
between the word and its following comma. If a \fIlex_id\fP is
specified, the marker immediately follows it. The syntactic markers
are:
.RS
.nf
\fB(p)\fP predicate position
\fB(a)\fP prenominal (attributive) position
\fB(ip)\fP immediately postnominal position
.fi
.RE
.SH EXAMPLES
\fI(Note that these are hypothetical examples not found in the WordNet
lexicographer files.)\fP
Sample noun synsets:
.RS
.nf
{ canine, [ dog1, cat,! ] pooch, canid,@ }
{ collie, dog1,@ (large multi-colored dog with pointy nose) }
{ hound, hunting_dog, pack,#m dog1,@ }
{ dog, }
.fi
.RE
Sample verb synsets:
.RS
.nf
{ [ confuse, clarify,! frames: 1 ] blur, obscure, frames: 8, 10 }
{ [ clarify, confuse,! ] make_clear, interpret,@ frames: 8 }
{ interpret, construe, understand,@ frames: 8 }
.fi
.RE
Sample adjective clusters:
.RS
.nf
[
{ [ HOT, COLD,! ] lukewarm(a), TEPID,^ (hot to the touch) }
{ warm, }
\-
{ [ COLD, HOT,! ] frigid, (cold to the touch) }
{ freezing, }
]
.fi
.RE
Sample adverb synsets:
.RS
.nf
{ [ basically, adj.all:essential^basic,\e ] [ essentially, adj.all:basic^fundamental,\e ] ( by one's very nature )}
{ pointedly, adj.all:pungent^pointed,\e }
{ [ badly, adj.all:bad,\e well,! ] ill, ("He was badly prepared") }
.fi
.RE
.SH SEE ALSO
.BR grind (1WN),
.BR wnintro (5WN),
.BR lexnames (5WN),
.BR wndb (5WN),
.BR uniqbeg (7WN),
.BR wngloss (7WN).
.LP
Fellbaum, C. (1998), ed.
\fI"WordNet: An Electronic Lexical Database"\fP.
MIT Press, Cambridge, MA.