Metadata-Version: 2.4
Name: pysolr
Version: 3.11.0
Summary: Lightweight Python client for Apache Solr
Home-page: https://github.com/django-haystack/pysolr/
Author: Daniel Lindsley
Author-email: daniel@toastdriven.com
License: BSD
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: requests>=2.32.5
Requires-Dist: setuptools
Provides-Extra: solrcloud
Requires-Dist: kazoo>=2.5.0; extra == "solrcloud"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

======
pysolr
======

``pysolr`` is a lightweight Python client for `Apache Solr`_. It provides an
interface that queries the server and returns results based on the query.

.. _`Apache Solr`: https://solr.apache.org/

Status
======

`Changelog <https://github.com/django-haystack/pysolr/blob/master/CHANGELOG.rst>`_

Features
========

* Basic operations such as selecting, updating & deleting.
* Index optimization.
* `"More Like This" <http://wiki.apache.org/solr/MoreLikeThis>`_ support (if set up in Solr).
* `Spelling correction <http://wiki.apache.org/solr/SpellCheckComponent>`_ (if set up in Solr).
* Timeout support.
* SolrCloud awareness.

Requirements
============

* Python 3.10+
* Requests 2.32.5+
* **Optional** - ``simplejson``
* **Optional** - ``kazoo`` for SolrCloud mode

Installation
============

pysolr is on PyPI:

.. code-block:: console

   $ pip install pysolr

Or if you want to install directly from the repository:

.. code-block:: console

    $ python setup.py install

Usage
=====

Basic usage looks like:

.. code-block:: python

    import pysolr

    # Create a client instance. The timeout and auth arguments are optional;
    # to authenticate, also pass `auth=<authentication object>` (see
    # "Custom Authentication" below).
    solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)

    # Note that always_commit defaults to False for performance. You can set
    # `always_commit=True` to have commands always update the index immediately, make
    # an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
    # to have your data be committed following a particular policy.

    # Do a health check.
    solr.ping()

    # How you'd index data.
    solr.add([
        {
            "id": "doc_1",
            "title": "A test document",
        },
        {
            "id": "doc_2",
            "title": "The Banana: Tasty or Dangerous?",
            "_doc": [
                { "id": "child_doc_1", "title": "peel" },
                { "id": "child_doc_2", "title": "seed" },
            ]
        },
    ])

    # You can index a parent/child document relationship by
    # associating a list of child documents with the special key '_doc'. This
    # is helpful for queries that join together conditions on children and parent
    # documents.

    # Later, searching is easy. In the simple case, just a plain Lucene-style
    # query is fine.
    results = solr.search('bananas')

    # The ``Results`` object stores the total number of results found, the
    # returned documents (by default the ten most relevant), and any additional
    # data such as facets, highlighting and spelling suggestions.
    print("Saw {0} result(s).".format(len(results)))

    # Just loop over it to access the results.
    for result in results:
        print("The title is '{0}'.".format(result['title']))

    # For a more advanced query, say involving highlighting, you can pass
    # additional options to Solr.
    results = solr.search('bananas', **{
        'hl': 'true',
        'hl.fragsize': 10,
    })

    # Traverse a cursor using its iterator:
    for doc in solr.search('*:*', fl='id', sort='id ASC', cursorMark='*'):
        print(doc['id'])

    # You can also perform More Like This searches, if your Solr is configured
    # correctly.
    similar = solr.more_like_this(q='id:doc_2', mltfl='text')

    # Finally, you can delete individual documents...
    solr.delete(id='doc_1')

    # ...several documents in one batch...
    solr.delete(id=['doc_1', 'doc_2'])

    # ...or all documents.
    solr.delete(q='*:*')

.. code-block:: python

    # For SolrCloud mode, initialize your Solr like this. To authenticate, also
    # pass `auth=<authentication object>` (see "Custom Authentication" below).

    zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection1")

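The cursor traversal in the basic usage example relies on Solr's ``cursorMark``
deep-paging protocol: each response carries a ``nextCursorMark``, which is sent
back as the next request's ``cursorMark`` until the value stops changing. The
sketch below illustrates that loop in isolation. ``fake_search`` and
``iter_cursor`` are hypothetical stand-ins written for this example (they are
not part of pysolr), so the code runs without a Solr server.

.. code-block:: python

    # Hypothetical in-memory "index"; a real Solr server would hold these documents.
    DOCS = [{"id": "doc_%d" % i} for i in range(1, 6)]


    def fake_search(cursor_mark, rows=2):
        """Stand-in for one Solr request: return a page and the next cursor mark.

        Real cursor marks are opaque strings; stringified offsets are used
        here only to keep the example small.
        """
        start = 0 if cursor_mark == "*" else int(cursor_mark)
        page = DOCS[start:start + rows]
        # Like Solr, return the same mark again once there is nothing left to read.
        next_mark = str(start + len(page)) if page else cursor_mark
        return {"docs": page, "nextCursorMark": next_mark}


    def iter_cursor(search, rows=2):
        """Yield every document by following nextCursorMark until it stops changing."""
        cursor_mark = "*"
        while True:
            response = search(cursor_mark, rows=rows)
            yield from response["docs"]
            if response["nextCursorMark"] == cursor_mark:
                break
            cursor_mark = response["nextCursorMark"]


    ids = [doc["id"] for doc in iter_cursor(fake_search)]
    print(ids)

When iterating over the result of ``solr.search(..., cursorMark='*')`` as shown
earlier, this bookkeeping is not needed in your own code.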

Multicore Index
~~~~~~~~~~~~~~~

Simply point the URL to the index core:

.. code-block:: python

    # Set up a Solr instance. The timeout is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)


Custom Request Handlers
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Set up a Solr instance. The trailing slash is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)


If ``use_qt_param`` is ``True``, the handler name must be exactly what is
configured in ``solrconfig.xml``, including the leading slash if any. If
``use_qt_param`` is ``False`` (the default), the leading and trailing slashes
can be omitted.

If ``search_handler`` is not specified, pysolr will default to ``/select``.

The handlers for MoreLikeThis, Update, Terms etc. all default to the values set
in the ``solrconfig.xml`` that Solr ships with: ``mlt``, ``update``, ``terms``
etc. The corresponding methods of pysolr's ``Solr`` class (like
``more_like_this``, ``suggest_terms`` etc.) accept a ``handler`` kwarg to
override that value. This includes the ``search`` method: passing ``handler``
to ``search`` explicitly overrides the ``search_handler`` setting (if any).

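The rules above can be pictured with a small stand-alone sketch.
``resolve_handler`` is a hypothetical helper written for this example; it
mirrors the behaviour described here rather than pysolr's internal code, and it
assumes that with ``use_qt_param=True`` the request goes to ``select`` with the
handler name carried in the ``qt`` parameter.

.. code-block:: python

    def resolve_handler(search_handler=None, use_qt_param=False, handler=None):
        """Return (path, extra_params) for a search request.

        Hypothetical helper: a per-call ``handler`` wins over ``search_handler``,
        and ``select`` is used when neither is given.
        """
        chosen = handler or search_handler
        if not chosen:
            return "select", {}
        if use_qt_param:
            # The name is passed through untouched, so it must match solrconfig.xml.
            return "select", {"qt": chosen}
        # Used as a URL path segment, so surrounding slashes do not matter.
        return chosen.strip("/"), {}


    print(resolve_handler())
    print(resolve_handler(search_handler="/autocomplete"))
    print(resolve_handler(search_handler="/autocomplete", use_qt_param=True))
    print(resolve_handler(search_handler="/autocomplete", handler="suggest"))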

Custom Authentication
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Set up a Solr instance in a Kerberized environment
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)

.. code-block:: python

    # Set up a SolrCloud instance in a Kerberized environment
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL
    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)


If your Solr servers run off HTTPS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Set up a Solr instance in an HTTPS environment
    solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')

.. code-block:: python

    # Set up a SolrCloud instance in an HTTPS environment

    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')


Custom Commit Policy
~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Set up a Solr instance. The trailing slash is optional.
    # All requests to Solr will be immediately committed because `always_commit=True`:
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)

``always_commit`` tells the Solr object whether to commit by default after each
update request. Be sure to set it to ``True`` if you are upgrading from a
version in which every such request was committed by default.

Methods like ``add`` and ``delete`` still let you override the default for a
single call by passing the ``commit`` kwarg.

It is generally good practice to limit the number of commits to Solr, because
excessive commits risk opening too many searchers or consuming excessive system
resources. See the Solr documentation for more information and details about
the ``autoCommit`` and ``commitWithin`` options:

https://lucene.apache.org/solr/guide/7_7/updatehandlers-in-solrconfig.html#UpdateHandlersinSolrConfig-autoCommit

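One way to keep the number of commits low is to send documents in batches and
commit only once at the end. The following is a minimal sketch of that idea,
not a pysolr feature: ``add_in_batches`` and ``RecordingClient`` are
hypothetical names introduced here, and the client is a stand-in that only
records calls so the example runs without a Solr server. With a real
``pysolr.Solr`` object, the same ``add(docs, commit=...)`` calls would apply.

.. code-block:: python

    def add_in_batches(client, docs, batch_size=500):
        """Send docs in batches, asking for a commit only on the final batch."""
        docs = list(docs)
        for start in range(0, len(docs), batch_size):
            batch = docs[start:start + batch_size]
            is_last = start + batch_size >= len(docs)
            # commit=False overrides the client's default for intermediate batches.
            client.add(batch, commit=is_last)


    class RecordingClient:
        """Stand-in for a Solr client: records (batch length, commit flag) per call."""

        def __init__(self):
            self.calls = []

        def add(self, docs, commit=None):
            self.calls.append((len(docs), commit))


    client = RecordingClient()
    add_in_batches(client, [{"id": "doc_%d" % i} for i in range(5)], batch_size=2)
    print(client.calls)

Whether this is preferable to Solr's own ``autoCommit`` or ``commitWithin``
settings depends on the deployment; the sketch only shows the client-side
option.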

LICENSE
=======

``pysolr`` is licensed under the New BSD license.

Contributing to pysolr
======================

For consistency, this project uses `pre-commit <https://pre-commit.com/>`_ to manage Git commit hooks:

#. Install the ``pre-commit`` package: e.g. ``brew install pre-commit``,
   ``pip install pre-commit``, etc.
#. Run ``pre-commit install`` each time you check out a new copy of this Git
   repository to ensure that every subsequent commit will be processed by
   running ``pre-commit run``, which you may also do as desired. To test the
   entire repository or in a CI scenario, you can check every file rather than
   just the staged ones using ``pre-commit run --all-files``.


Running Tests
=============

The ``run-tests.py`` script will automatically perform the steps below and is
recommended for testing by default unless you need more control.

Running a test Solr instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Downloading, configuring and running Solr 4 looks like this::

    ./start-solr-test-server.sh

Running the tests
~~~~~~~~~~~~~~~~~

.. code-block:: console

    $ python -m unittest tests