wn.morphy
=========

.. automodule:: wn.morphy

.. seealso::

   The Princeton WordNet `documentation
   <https://wordnet.princeton.edu/documentation/morphy7wn>`_ describes
   the original implementation of Morphy.

   The :doc:`../guides/lemmatization` guide describes how Wn handles
   lemmatization in general.


Initialized and Uninitialized Morphy
------------------------------------

There are two ways of using Morphy in Wn: initialized and
uninitialized.

Uninitialized Morphy is a simple callable that returns lemma
*candidates* for some given wordform. That is, the results might not
be valid lemmas, but this is not a problem in practice because
subsequent queries against the database will filter out the invalid
ones. This callable is obtained by creating a :class:`Morphy` object
with no arguments:

>>> from wn import morphy
>>> m = morphy.Morphy()

As an uninitialized Morphy cannot predict which lemmas in the result
are valid, it always returns the original form and any transformations
it can find for each part of speech. When no part of speech is given,
the original form is listed under the ``None`` key:

>>> m('lemmata', pos='n')  # exceptional form
{'n': {'lemmata'}}
>>> m('lemmas', pos='n')   # regular morphology with part-of-speech
{'n': {'lemma', 'lemmas'}}
>>> m('lemmas')            # regular morphology for any part-of-speech
{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}
>>> m('wolves')            # invalid forms may be returned
{None: {'wolves'}, 'n': {'wolf', 'wolve'}, 'v': {'wolve', 'wolv'}}
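
The candidate generation can be pictured as applying suffix-detachment
rules without checking the results against any lexicon. The following
is a minimal sketch, assuming a reduced rule set modelled on the rules
in the Princeton Morphy documentation; ``RULES`` and ``candidates()``
are hypothetical names and not part of Wn. The documented output for
``'wolves'`` also includes ``'wolf'``, which this reduced rule set does
not generate, so Wn's own rules evidently go beyond this list.

.. code-block:: python

   # Hypothetical sketch: suffix-detachment rules modelled on the Princeton
   # Morphy documentation. Not necessarily the rule set Wn itself uses.
   RULES = {
       'n': [('s', ''), ('ses', 's'), ('xes', 'x'), ('zes', 'z'),
             ('ches', 'ch'), ('shes', 'sh'), ('men', 'man'), ('ies', 'y')],
       'v': [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
             ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
       'a': [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
   }


   def candidates(form, pos=None):
       """Return unfiltered lemma candidates keyed by part of speech."""
       # The original form is always kept, under None if no pos was given.
       result = {pos: {form}}
       for p in ([pos] if pos else RULES):
           for suffix, replacement in RULES.get(p, []):
               if form.endswith(suffix) and len(form) > len(suffix):
                   stem = form[:-len(suffix)]
                   result.setdefault(p, set()).add(stem + replacement)
       return result

Called as ``candidates('lemmas')``, this sketch returns
``{None: {'lemmas'}, 'n': {'lemma'}, 'v': {'lemma'}}``, the same shape
as the uninitialized output above.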


This lemmatizer can also be used with a :class:`wn.Wordnet` object to
expand queries:

>>> import wn
>>> ewn = wn.Wordnet('ewn:2020')
>>> ewn.words('lemmas')
[]
>>> ewn = wn.Wordnet('ewn:2020', lemmatizer=morphy.Morphy())
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
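
Query expansion can be thought of as looking up every candidate the
lemmatizer returns instead of only the exact form. The sketch below
assumes a toy in-memory ``LEXICON`` of ``(lemma, pos)`` pairs and a
stand-in ``toy_lemmatizer``; both are hypothetical and do not describe
how Wn actually queries its database.

.. code-block:: python

   # Hypothetical stand-in for the lemmas stored in a wordnet database.
   LEXICON = {('lemma', 'n'), ('wolf', 'n')}


   def toy_lemmatizer(form, pos=None):
       """Keep the form and, if it ends in 's', also try it without the 's'."""
       result = {pos: {form}}
       if form.endswith('s'):
           for p in ([pos] if pos else ['n', 'v']):
               result.setdefault(p, set()).add(form[:-1])
       return result


   def lookup(form, pos=None, lemmatizer=None):
       """Return sorted (lemma, pos) pairs in LEXICON matching ``form``."""
       if lemmatizer is None:
           # Without a lemmatizer only the exact form is searched.
           forms_by_pos = {pos: {form}}
       else:
           forms_by_pos = lemmatizer(form, pos)
       matches = set()
       for p, forms in forms_by_pos.items():
           for lemma, lemma_pos in LEXICON:
               # A None key places no restriction on part of speech.
               if lemma in forms and p in (None, lemma_pos):
                   matches.add((lemma, lemma_pos))
       return sorted(matches)

Here ``lookup('lemmas')`` finds nothing, while
``lookup('lemmas', lemmatizer=toy_lemmatizer)`` finds
``[('lemma', 'n')]``, mirroring the two ``ewn.words('lemmas')`` calls
above. Candidates that are not in the lexicon, such as the verb reading
``'lemma'``, simply fail to match.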

An initialized Morphy is created with a :class:`wn.Wordnet` object as
its argument. It then uses the wordnet to build lists of valid lemmas
and exceptional forms (this takes a few seconds). Once this is done,
it will only return lemmas it knows about:

>>> ewn = wn.Wordnet('ewn:2020')
>>> m = morphy.Morphy(ewn)
>>> m('lemmata', pos='n')  # exceptional form
{'n': {'lemma'}}
>>> m('lemmas', pos='n')   # regular morphology with part-of-speech
{'n': {'lemma'}}
>>> m('lemmas')            # regular morphology for any part-of-speech
{'n': {'lemma'}}
>>> m('wolves')            # invalid forms are pre-filtered
{'n': {'wolf'}}
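
One way to picture the initialized behaviour is candidate generation
followed by filtering against known lemmas, with exceptional forms
mapped directly to their lemmas. This is a minimal sketch under stated
assumptions: ``KNOWN_LEMMAS``, ``EXCEPTIONS``, ``RULES`` and
``initialized_lemmatize()`` are hypothetical stand-ins for data Wn
reads from the wordnet, and the ``('ves', 'f')`` rule is assumed here
only so that ``'wolves'`` yields ``'wolf'``.

.. code-block:: python

   # Hypothetical data standing in for what is built from the wordnet.
   KNOWN_LEMMAS = {'n': {'lemma', 'wolf'}, 'v': set()}
   EXCEPTIONS = {'n': {'lemmata': {'lemma'}}}
   # Assumed rules for illustration; ('ves', 'f') is not a Princeton rule.
   RULES = {
       'n': [('s', ''), ('ves', 'f')],
       'v': [('s', ''), ('es', ''), ('es', 'e')],
   }


   def initialized_lemmatize(form, pos=None):
       result = {}
       for p in ([pos] if pos else RULES):
           # Exceptional forms map directly to their lemmas.
           found = set(EXCEPTIONS.get(p, {}).get(form, set()))
           # The original form and rule-based candidates are also considered.
           found.add(form)
           for suffix, replacement in RULES.get(p, []):
               if form.endswith(suffix):
                   found.add(form[:-len(suffix)] + replacement)
           # Keep only candidates that are known lemmas for this pos.
           valid = found & KNOWN_LEMMAS.get(p, set())
           if valid:
               result[p] = valid
       return result

With this toy data the sketch returns ``{'n': {'lemma'}}`` for
``'lemmata'`` and ``'lemmas'``, and ``{'n': {'wolf'}}`` for
``'wolves'``; parts of speech with no valid candidate are omitted
rather than returned with the original form.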

In order to use an initialized Morphy lemmatizer with a
:class:`wn.Wordnet` object, it must be assigned to the object after
creation, because initializing Morphy requires the wordnet object
itself:

>>> ewn = wn.Wordnet('ewn:2020')  # default: lemmatizer=None
>>> ewn.words('lemmas')
[]
>>> ewn.lemmatizer = morphy.Morphy(ewn)
>>> ewn.words('lemmas')
[Word('ewn-lemma-n')]
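
The two-step pattern exists because the lemmatizer needs a finished
wordnet object to read its lemmas from. The sketch below illustrates
that dependency with hypothetical ``ToyWordnet`` and ``ToyLemmatizer``
classes; they are not Wn classes, and their ``lemmas()`` and
``words()`` methods are simplified stand-ins.

.. code-block:: python

   class ToyWordnet:
       """Hypothetical wordnet holding a set of lemma strings."""

       def __init__(self, lemmas, lemmatizer=None):
           self._lemmas = set(lemmas)
           self.lemmatizer = lemmatizer

       def lemmas(self):
           return set(self._lemmas)

       def words(self, form):
           # With no lemmatizer, only exact matches are found.
           if self.lemmatizer is None:
               candidates = {form}
           else:
               candidates = set().union(*self.lemmatizer(form).values())
           return sorted(candidates & self._lemmas)


   class ToyLemmatizer:
       """Hypothetical initialized lemmatizer built from a wordnet."""

       def __init__(self, wordnet):
           # Initialization reads the lemmas from an existing wordnet.
           self.known = wordnet.lemmas()

       def __call__(self, form):
           candidates = {form}
           if form.endswith('s'):
               candidates.add(form[:-1])
           return {None: candidates & self.known}

Creating ``w = ToyWordnet(['lemma', 'wolf'])`` first and then assigning
``w.lemmatizer = ToyLemmatizer(w)`` mirrors the assignment above:
``w.words('lemmas')`` returns ``[]`` before the assignment and
``['lemma']`` after it.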

There is little to no difference in the results obtained from a
:class:`wn.Wordnet` object using an initialized or uninitialized
:class:`Morphy` object. Beyond the few seconds needed to initialize
Morphy, the two may perform slightly differently on subsequent
queries.


Default Morphy Lemmatizer
-------------------------

As a convenience, an uninitialized Morphy lemmatizer is provided in
this module via the :data:`morphy` member.

.. data:: morphy

   A :class:`Morphy` object created without a :class:`wn.Wordnet`
   object.


The Morphy Class
----------------

.. autoclass:: Morphy