.. raw:: html

    <style>.center {margin-left:20%}</style>


The Structure of a Wordnet
==========================
A **wordnet** is an online lexicon organized by concepts.

The basic unit of a wordnet is the synonym set (**synset**), a group of words that all refer to the
same concept. Words and synsets are linked by conceptual-semantic relations, which together form the
structure of the wordnet.

Words, Senses, and Synsets
--------------------------
**Words** are the basic building blocks of a language. A word has two parts: its form and its
meaning. In natural languages, however, word forms and word meanings do not match one-to-one, and a
single word form may carry many different meanings. We therefore need **senses** as the unit of
word meaning. For example, the word *bank* has at least two senses:

1. bank\ :sup:`1`\: a financial institution, as in *City Bank*;
2. bank\ :sup:`2`\: sloping land, as in *river bank*.

Since **synsets** are groups of words that share the same concept, bank\ :sup:`1`\ and bank\ :sup:`2`\ are members of
two different synsets, even though they have the same word form.

Conversely, different word forms may convey the same concept, such as *cab* and *taxi*.
Word forms that share a concept are grouped together into one synset.

.. raw:: html
    :file: images/word-sense-synset.svg


.. role:: center
    :class: center

:center:`Figure: relations between words, senses and synsets`
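
The many-to-many mapping described above can be sketched with plain Python data structures.
This is a toy model, not the API of any wordnet library: the synset identifiers and the
``index_senses`` helper are hypothetical names chosen for illustration.

.. code-block:: python

    from collections import defaultdict

    # Hypothetical (word form, synset id) pairs; each pair stands for one sense.
    SENSES = [
        ("bank", "financial-institution"),
        ("bank", "sloping-land"),
        ("cab", "car-for-hire"),
        ("taxi", "car-for-hire"),
    ]


    def index_senses(senses):
        """Build word -> synsets and synset -> words lookups from sense pairs."""
        synsets_of_word = defaultdict(list)
        words_of_synset = defaultdict(list)
        for form, synset_id in senses:
            synsets_of_word[form].append(synset_id)
            words_of_synset[synset_id].append(form)
        return dict(synsets_of_word), dict(words_of_synset)


    synsets_of_word, words_of_synset = index_senses(SENSES)
    print(synsets_of_word["bank"])          # ['financial-institution', 'sloping-land']
    print(words_of_synset["car-for-hire"])  # ['cab', 'taxi']

One word form (*bank*) maps to two synsets, while two word forms (*cab*, *taxi*) map to one.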


Synset Relations
----------------
In a wordnet, synsets are linked to each other by various kinds of relations. For example, if
the concept expressed by one synset is more general than the concept expressed by another, the first
synset is in a *hypernym* relation with the second. As shown in the figure below, the synset with
*car*, *auto*, and *automobile* as its members is the *hypernym* of the synset with *cab*, *taxi*,
and *hack*. Relations such as this one, which hold at the synset level, are called synset relations.

.. raw:: html
    :file: images/synset-synset.svg

:center:`Figure: example of synset relations`
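
As a minimal sketch, synset relations can be stored as (source, relation type, target) triples
between synset identifiers. The identifiers and the ``related`` and ``hyponyms`` helpers below are
hypothetical and only illustrate the idea. They do not reflect how any particular library stores
relations.

.. code-block:: python

    # Hypothetical synset ids mapped to their member word forms.
    SYNSETS = {
        "car": ["car", "auto", "automobile"],
        "cab": ["cab", "taxi", "hack"],
    }

    # Synset relations as (source, relation type, target) triples.
    SYNSET_RELATIONS = [
        ("cab", "hypernym", "car"),
    ]


    def related(synset_id, rel_type):
        """Return ids of synsets reachable from synset_id via rel_type."""
        return [t for s, r, t in SYNSET_RELATIONS if s == synset_id and r == rel_type]


    def hyponyms(synset_id):
        """Hyponym is the inverse of hypernym, so read the triples backwards."""
        return [s for s, r, t in SYNSET_RELATIONS if t == synset_id and r == "hypernym"]


    for target in related("cab", "hypernym"):
        print(SYNSETS[target])  # ['car', 'auto', 'automobile']
    print(hyponyms("car"))      # ['cab']

Because the relation is attached to the synset, it holds for every member word: *taxi* and *hack*
inherit the same hypernym as *cab*.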

Sense Relations
---------------

Some relations in a wordnet hold at the sense level. These fall into two types:
relations that link a sense to another sense, and relations that link a sense to a synset.

.. note::  In a wordnet, a single relation type, such as
    `domain topic <https://globalwordnet.github.io/gwadoc/#domain_topic>`_, may be used both as a
    synset relation and as a sense relation.

**Sense-Sense**

Sense-sense relations capture connections between individual senses, especially between
morphologically related words. For example, *behavioral* is the adjective corresponding to the noun
*behavior*, so it is in a *pertainym* relation with *behavior*. No such relation exists between
*behavioral* and *conduct*, even though *conduct* is a synonym of *behavior* and belongs to the same
synset. Here *pertainym* is a sense-sense relation.

.. raw:: html
    :file: images/sense-sense.svg

:center:`Figure: example of sense-sense relations`
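
The difference from a synset relation can be sketched by keying the relation on the sense itself
rather than on its synset. The synset identifiers and helper names below are made up for
illustration and are not taken from any library.

.. code-block:: python

    # Each sense is a (word form, synset id) pair; the ids are hypothetical.
    BEHAVIOR = ("behavior", "behavior-n")
    CONDUCT = ("conduct", "behavior-n")  # same synset as BEHAVIOR
    BEHAVIORAL = ("behavioral", "behavioral-a")

    # Sense-sense relations are keyed by the source sense, not by its synset.
    SENSE_RELATIONS = {
        (BEHAVIORAL, "pertainym"): [BEHAVIOR],
    }


    def pertainyms(sense):
        """Return the senses that the given sense is a pertainym of."""
        return SENSE_RELATIONS.get((sense, "pertainym"), [])


    def is_pertainym_of(source, target):
        """True only if the relation links these two specific senses."""
        return target in pertainyms(source)


    print(is_pertainym_of(BEHAVIORAL, BEHAVIOR))  # True
    print(is_pertainym_of(BEHAVIORAL, CONDUCT))   # False

Although *behavior* and *conduct* share a synset, only the *behavior* sense is a target of the
relation.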

**Sense-Synset**

Sense-synset relations connect a particular sense with a synset. For example, *cursor* is a term in the
*computer science* discipline, so in a wordnet it is in the *has domain topic* relation with the
*computer science* synset. By contrast, *pointer*, which is in the same synset as *cursor*, is not
such a term and therefore has no such relation with the *computer science* synset.

.. raw:: html
    :file: images/sense-synset.svg

:center:`Figure: example of sense-synset relations`
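
A sense-synset relation can be sketched in the same toy style: the source is a single sense and
the target is a whole synset. The identifiers and the ``domain_topics`` helper are hypothetical,
and the relation label is simply the one used in the text above.

.. code-block:: python

    # Hypothetical synset ids mapped to their member word forms.
    SYNSETS = {
        "cursor-n": ["cursor", "pointer"],
        "computer-science-n": ["computer science"],
    }

    # Sense-synset relations: (source sense, relation type, target synset id).
    # A sense is written as a (word form, synset id) pair.
    SENSE_SYNSET_RELATIONS = [
        (("cursor", "cursor-n"), "has domain topic", "computer-science-n"),
    ]


    def domain_topics(form, synset_id):
        """Return target synset ids of 'has domain topic' for one sense."""
        return [
            target
            for sense, rel_type, target in SENSE_SYNSET_RELATIONS
            if sense == (form, synset_id) and rel_type == "has domain topic"
        ]


    print(domain_topics("cursor", "cursor-n"))   # ['computer-science-n']
    print(domain_topics("pointer", "cursor-n"))  # []
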

Other Information
-----------------
A wordnet should be encoded in an appropriate format. Two schemas are accepted:

* XML schema based on the Lexical Markup Framework (LMF)
* JSON-LD using the Lexicon Model for Ontologies
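
To give a rough idea of the XML form, the sketch below parses a small hand-written fragment with
Python's standard library. The element and attribute names are assumptions modelled on the LMF-based
schema and are not validated against it, and the identifiers and definition text are invented.
Consult the schema itself for the authoritative structure.

.. code-block:: python

    import xml.etree.ElementTree as ET

    # A simplified, hand-written fragment. Element and attribute names are
    # assumptions; ids and the definition text are made up for illustration.
    LMF_FRAGMENT = """
    <LexicalResource>
      <Lexicon id="example-en" label="Example Wordnet" language="en" version="1.0">
        <LexicalEntry id="w-cab">
          <Lemma writtenForm="cab" partOfSpeech="n"/>
          <Sense id="s-cab-1" synset="ss-cab"/>
        </LexicalEntry>
        <LexicalEntry id="w-taxi">
          <Lemma writtenForm="taxi" partOfSpeech="n"/>
          <Sense id="s-taxi-1" synset="ss-cab"/>
        </LexicalEntry>
        <Synset id="ss-cab" partOfSpeech="n">
          <Definition>a car that carries passengers for a fare</Definition>
        </Synset>
      </Lexicon>
    </LexicalResource>
    """

    root = ET.fromstring(LMF_FRAGMENT.strip())

    # Group word forms by the synset their senses point to.
    members = {}
    for entry in root.iter("LexicalEntry"):
        form = entry.find("Lemma").get("writtenForm")
        for sense in entry.iter("Sense"):
            members.setdefault(sense.get("synset"), []).append(form)

    print(members)  # {'ss-cab': ['cab', 'taxi']}
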

A wordnet should contain the following information:

**Definition**

A definition defines a sense or synset in a wordnet. It is given in the language
of the wordnet it comes from.

**Example**

An example clarifies a sense or synset in a wordnet. With an example, users can understand the
definition more clearly.

**Metadata**

A wordnet has its own metadata, based on the `Dublin Core <https://dublincore.org/>`_, that states
basic information about it. The table below lists all the items in the metadata of a wordnet:

+------------------+-----------+-----------+
| Item             | Presence  | Type      |
+==================+===========+===========+
| contributor      | Optional  |  str      |
+------------------+-----------+-----------+
| coverage         | Optional  |  str      |
+------------------+-----------+-----------+
| creator          | Optional  |  str      |
+------------------+-----------+-----------+
| date             | Optional  |  str      |
+------------------+-----------+-----------+
| description      | Optional  |  str      |
+------------------+-----------+-----------+
| format           | Optional  |  str      |
+------------------+-----------+-----------+
| identifier       | Optional  |  str      |
+------------------+-----------+-----------+
| publisher        | Optional  |  str      |
+------------------+-----------+-----------+
| relation         | Optional  |  str      |
+------------------+-----------+-----------+
| rights           | Optional  |  str      |
+------------------+-----------+-----------+
| source           | Optional  |  str      |
+------------------+-----------+-----------+
| subject          | Optional  |  str      |
+------------------+-----------+-----------+
| title            | Optional  |  str      |
+------------------+-----------+-----------+
| type             | Optional  |  str      |
+------------------+-----------+-----------+
| status           | Optional  |  str      |
+------------------+-----------+-----------+
| note             | Optional  |  str      |
+------------------+-----------+-----------+
| confidence       | Optional  |  float    |
+------------------+-----------+-----------+
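
The table above can be mirrored as a record in which every item is optional. The ``Metadata`` class
and its ``present`` method below are a hypothetical sketch for illustration, not a class provided by
any library. Only the field names and types come from the table.

.. code-block:: python

    from dataclasses import asdict, dataclass
    from typing import Optional


    @dataclass
    class Metadata:
        """Hypothetical container; every item defaults to None (not given)."""

        contributor: Optional[str] = None
        coverage: Optional[str] = None
        creator: Optional[str] = None
        date: Optional[str] = None
        description: Optional[str] = None
        format: Optional[str] = None
        identifier: Optional[str] = None
        publisher: Optional[str] = None
        relation: Optional[str] = None
        rights: Optional[str] = None
        source: Optional[str] = None
        subject: Optional[str] = None
        title: Optional[str] = None
        type: Optional[str] = None
        status: Optional[str] = None
        note: Optional[str] = None
        confidence: Optional[float] = None

        def present(self) -> dict:
            """Return only the items that were actually provided."""
            return {k: v for k, v in asdict(self).items() if v is not None}


    # Made-up values, used only to show that unspecified items are omitted.
    meta = Metadata(creator="Example Author", confidence=0.9)
    print(meta.present())  # {'creator': 'Example Author', 'confidence': 0.9}
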