# Usage

## Matcher

The [pfzy](https://github.com/jhawthorn/fzy) package provides an async entry point, [fuzzy_match](https://pfzy.readthedocs.io/en/latest/pages/api.html#pfzy.match.fuzzy_match), which
fuzzy-matches a query string against a list of strings and ranks the results automatically.

```{code-block} python
---
caption: main.py
---
import asyncio

from pfzy import fuzzy_match

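async def main():
    # Match the query "ab" against two candidates; results come back ranked.
    return await fuzzy_match("ab", ["acb", "acbabc"])

if __name__ == "__main__":
    # fuzzy_match is a coroutine, so run it inside an event loop.
    print(asyncio.run(main()))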
```

```{code-block} shell
$ python main.py
[{'value': 'acbabc', 'indices': [3, 4]}, {'value': 'acb', 'indices': [0, 2]}]
```
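
The `indices` in each result are positions within the matched string, so they can be used to highlight the matched characters. The helper below is a minimal sketch and not part of pfzy: `highlight` and its bracket markers are hypothetical, and the sample data is copied from the output above rather than produced by calling the library.

```python
def highlight(result, key="value", left="[", right="]"):
    """Wrap each matched character of result[key] in markers.

    Hypothetical helper for illustration; not a pfzy API.
    """
    matched = set(result["indices"])
    return "".join(
        f"{left}{ch}{right}" if i in matched else ch
        for i, ch in enumerate(result[key])
    )


# Sample data copied from the output shown above (not computed here).
results = [
    {"value": "acbabc", "indices": [3, 4]},
    {"value": "acb", "indices": [0, 2]},
]

highlighted = [highlight(r) for r in results]
print(highlighted)  # ['acb[a][b]c', '[a]c[b]']
```

The `key` parameter defaults to `"value"` because that is the key used in the results shown above.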

### Matching against dictionaries

The second argument can also be a list of dictionaries. In that case you must also pass the `key` argument so that
the function knows which key in each dictionary holds the value to match.

```{code-block} python
import asyncio

from pfzy import fuzzy_match

result = asyncio.run(fuzzy_match("ab", [{"val": "acb"}, {"val": "acbabc"}], key="val"))
```

```{code-block} python
>>> print(result)
[{'val': 'acbabc', 'indices': [3, 4]}, {'val': 'acb', 'indices': [0, 2]}]
```

### Using a different scorer

By default, `fuzzy_match` uses the [fzy_scorer](#fzy_scorer) to perform string matching. You can
select a different scorer with the `scorer` argument. See [Scorer](#scorer) for the list of
available scorers.

```{code-block} python
import asyncio

from pfzy import fuzzy_match, substr_scorer

result = asyncio.run(fuzzy_match("ab", ["acb", "acbabc"], scorer=substr_scorer))
```

```{code-block} python
>>> print(result)
[{'value': 'acbabc', 'indices': [3, 4]}]
```

## Scorer

### [fzy_scorer](https://pfzy.readthedocs.io/en/latest/pages/api.html#pfzy.score.fzy_scorer)

```{Tip}
The higher the score, the higher the string similarity.
```

The `fzy_scorer` uses the [fzy](https://github.com/jhawthorn/fzy) matching logic to perform fuzzy string
matching.

The returned value is a tuple with the matching score and the matching indices.

```{code-block} python
from pfzy import fzy_scorer

score, indices = fzy_scorer("ab", "acbabc")
```

```{code-block} python
>>> print(score)
0.98
>>> print(indices)
[3, 4]
```
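
The score of 0.98 can be reproduced by hand if we assume that pfzy keeps the default scoring constants of upstream fzy: a bonus of 1.0 for a consecutive match and a penalty of 0.005 for each leading or trailing gap character. These values and the variable names below are assumptions taken from fzy, not something this page states, so treat the block as a sketch of the arithmetic rather than the library's implementation.

```python
# Constants assumed from upstream fzy's defaults; names are illustrative.
SCORE_GAP_LEADING = -0.005
SCORE_GAP_TRAILING = -0.005
SCORE_MATCH_CONSECUTIVE = 1.0

haystack = "acbabc"
indices = [3, 4]  # positions of "a" and "b" reported above

# Three characters are skipped before the first matched character.
leading = indices[0] * SCORE_GAP_LEADING
# "b" directly follows "a", so it earns the consecutive-match bonus.
consecutive = SCORE_MATCH_CONSECUTIVE
# One character remains after the last matched character.
trailing = (len(haystack) - 1 - indices[-1]) * SCORE_GAP_TRAILING

score = leading + consecutive + trailing
print(round(score, 2))  # 0.98
```

Under the same assumption, the first matched character earns no positional bonus here because it follows a lowercase letter.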

### [substr_scorer](https://pfzy.readthedocs.io/en/latest/pages/api.html#pfzy.score.substr_scorer)

```{Note}
The score returned by `substr_scorer` may be negative; a negative score does not mean the string failed to match.
As a rule of thumb, the higher the score, the higher the string similarity.
```

Use this scorer when exact substring matching is preferred. Unlike the [fzy_scorer](#fzy_scorer),
`substr_scorer` only matches exact substrings, and it calculates the score differently.

The returned value is a tuple with the matching score and the matching indices.

```{code-block} python
from pfzy import substr_scorer

score, indices = substr_scorer("ab", "awsab")
```

```{code-block} python
>>> print(score)
-1.3
>>> print(indices)
[3, 4]
```

```{code-block} python
from pfzy import substr_scorer

score, indices = substr_scorer("ab", "asdafswabc")
```

```{code-block} python
>>> print(score)
-1.6388888888888888
>>> print(indices)
[7, 8]
```
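
In both examples the indices cover one contiguous run of the haystack, which is what exact substring matching implies. The sketch below reproduces only those indices with `str.find`. `substring_indices` is a hypothetical helper, using the first occurrence is an assumption, and the score calculation is deliberately left out because this page does not describe it.

```python
def substring_indices(needle, haystack):
    """Return the indices of the first exact occurrence of needle, or None.

    Hypothetical helper for illustration; it does not compute a score
    and is not pfzy's implementation.
    """
    start = haystack.find(needle)
    if start == -1:
        return None
    return list(range(start, start + len(needle)))


print(substring_indices("ab", "awsab"))       # [3, 4]
print(substring_indices("ab", "asdafswabc"))  # [7, 8]
print(substring_indices("ab", "acb"))         # None
```

The last call returns `None` because `"acb"` contains the letters of `"ab"` in order but not as a contiguous substring.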