Metadata-Version: 2.4
Name: pysolr
Version: 3.11.0
Summary: Lightweight Python client for Apache Solr
Home-page: https://github.com/django-haystack/pysolr/
Author: Daniel Lindsley
Author-email: daniel@toastdriven.com
License: BSD
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: requests>=2.32.5
Requires-Dist: setuptools
Provides-Extra: solrcloud
Requires-Dist: kazoo>=2.5.0; extra == "solrcloud"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary
======
pysolr
======
``pysolr`` is a lightweight Python client for `Apache Solr`_. It provides an
interface that queries the server and returns results based on the query.
.. _`Apache Solr`: https://solr.apache.org/
Status
======
`Changelog <https://github.com/django-haystack/pysolr/blob/master/CHANGELOG.rst>`_
Features
========
* Basic operations such as selecting, updating & deleting.
* Index optimization.
* `"More Like This" <http://wiki.apache.org/solr/MoreLikeThis>`_ support (if set up in Solr).
* `Spelling correction <http://wiki.apache.org/solr/SpellCheckComponent>`_ (if set up in Solr).
* Timeout support.
* SolrCloud awareness.
Requirements
============
* Python 3.10+
* Requests 2.32.5+
* **Optional** - ``simplejson``
* **Optional** - ``kazoo`` for SolrCloud mode
Installation
============
pysolr is on PyPI:

.. code-block:: console

    $ pip install pysolr

Or if you want to install directly from the repository:

.. code-block:: console

    $ python setup.py install
Usage
=====
Basic usage looks like:

.. code-block:: python

    import pysolr

    # Create a client instance. The `timeout` and `auth` arguments are optional;
    # see "Custom Authentication" below for an `auth` example.
    solr = pysolr.Solr('http://localhost:8983/solr/', always_commit=True, timeout=10)

    # Note that always_commit defaults to False for performance. You can set
    # `always_commit=True` to have commands always update the index immediately, make
    # an update call with `commit=True`, or use Solr's `autoCommit` / `commitWithin`
    # to have your data be committed following a particular policy.

    # Do a health check.
    solr.ping()

    # How you'd index data.
    solr.add([
        {
            "id": "doc_1",
            "title": "A test document",
        },
        {
            "id": "doc_2",
            "title": "The Banana: Tasty or Dangerous?",
            "_doc": [
                {"id": "child_doc_1", "title": "peel"},
                {"id": "child_doc_2", "title": "seed"},
            ]
        },
    ])

    # You can index a parent/child document relationship by
    # associating a list of child documents with the special key '_doc'. This
    # is helpful for queries that join together conditions on children and parent
    # documents.

    # Later, searching is easy. In the simple case, just a plain Lucene-style
    # query is fine.
    results = solr.search('bananas')

    # The ``Results`` object stores the total number of results found, the top
    # ten most relevant results (by default), and any additional data such as
    # facets, highlighting or spelling suggestions.
    print("Saw {0} result(s).".format(len(results)))

    # Just loop over it to access the results.
    for result in results:
        print("The title is '{0}'.".format(result['title']))

    # For a more advanced query, say involving highlighting, you can pass
    # additional options to Solr.
    results = solr.search('bananas', **{
        'hl': 'true',
        'hl.fragsize': 10,
    })

    # Traverse a cursor using its iterator:
    for doc in solr.search('*:*', fl='id', sort='id ASC', cursorMark='*'):
        print(doc['id'])

    # You can also perform More Like This searches, if your Solr is configured
    # correctly.
    similar = solr.more_like_this(q='id:doc_2', mltfl='text')

    # Finally, you can delete individual documents,
    solr.delete(id='doc_1')

    # several documents in one batch...
    solr.delete(id=['doc_1', 'doc_2'])

    # ...or all documents.
    solr.delete(q='*:*')
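The highlighting query above passes its options as ``**{...}`` because Solr parameter names such as ``hl.fragsize`` contain dots and so cannot be written as Python keyword arguments. As a minimal sketch, a small helper can assemble such a dictionary. ``highlight_params`` is a hypothetical name and not part of pysolr; ``hl.fl`` is Solr's parameter for the fields to highlight.

```python
def highlight_params(fields, fragsize=100):
    """Build Solr highlighting parameters as a plain dict (hypothetical helper)."""
    return {
        "hl": "true",
        # Solr expects the highlighted fields as a comma-separated list.
        "hl.fl": ",".join(fields),
        "hl.fragsize": fragsize,
    }


params = highlight_params(["title"], fragsize=10)
print(params)
```

With a client created as in the usage example, the dictionary would be unpacked into the call, for instance ``solr.search('bananas', **params)``.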
.. code-block:: python

    # For SolrCloud mode, initialize your Solr like this. An optional `auth`
    # argument can also be passed; see "Custom Authentication" below.
    zookeeper = pysolr.ZooKeeper("zkhost1:2181,zkhost2:2181,zkhost3:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection1")
Multicore Index
~~~~~~~~~~~~~~~
Simply point the URL to the index core:

.. code-block:: python

    # Set up a Solr instance. The timeout is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', timeout=10)
Custom Request Handlers
~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    # Set up a Solr instance. The trailing slash is optional.
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', use_qt_param=False)
If ``use_qt_param`` is ``True``, it is essential that the name of the handler is
exactly what is configured in ``solrconfig.xml``, including the leading slash
if any. If ``use_qt_param`` is ``False`` (default), the leading and trailing
slashes can be omitted.

If ``search_handler`` is not specified, pysolr will default to ``/select``.

The handlers for MoreLikeThis, Update, Terms etc. all default to the values set
in the ``solrconfig.xml`` that Solr ships with: ``mlt``, ``update``, ``terms`` etc.

The specific methods of pysolr's ``Solr`` class (like ``more_like_this``,
``suggest_terms`` etc.) accept a ``handler`` kwarg to override that value.
This includes the ``search`` method. Setting a handler in ``search`` explicitly
overrides the ``search_handler`` setting (if any).
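The precedence described above can be summarized in a few lines. This is an illustrative sketch only, not pysolr's actual code: ``resolve_search_handler`` is a hypothetical function, and it assumes the ``use_qt_param=False`` case, where surrounding slashes are optional.

```python
def resolve_search_handler(search_handler=None, handler=None):
    """Pick the handler name for a search call (hypothetical helper).

    A per-call ``handler`` wins over the client-wide ``search_handler``;
    if neither is given, fall back to ``select``.
    """
    chosen = handler or search_handler or "select"
    # Leading and trailing slashes are optional here, so normalize them away.
    return chosen.strip("/")


print(resolve_search_handler())
print(resolve_search_handler(search_handler="/autocomplete"))
print(resolve_search_handler(search_handler="/autocomplete", handler="suggest/"))
```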
Custom Authentication
~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    # Set up a Solr instance in a Kerberized environment
    import pysolr
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    solr = pysolr.Solr('http://localhost:8983/solr/', auth=kerberos_auth)

.. code-block:: python

    # Set up a SolrCloud instance in a Kerberized environment
    import pysolr
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    kerberos_auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL, sanitize_mutual_error_response=False)

    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", auth=kerberos_auth)
If your Solr servers run off https
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    # Set up a Solr instance in an https environment.
    # `verify` takes the path to the certificate file as a string.
    solr = pysolr.Solr('https://localhost:8983/solr/', verify='path/to/cert.pem')

.. code-block:: python

    # Set up a SolrCloud instance in an https environment
    zookeeper = pysolr.ZooKeeper("zkhost1:2181/solr, zkhost2:2181,...,zkhostN:2181")
    solr = pysolr.SolrCloud(zookeeper, "collection", verify='path/to/cert.pem')
Custom Commit Policy
~~~~~~~~~~~~~~~~~~~~
.. code-block:: python

    # Set up a Solr instance. The trailing slash is optional.
    # All requests to Solr will be immediately committed because `always_commit=True`:
    solr = pysolr.Solr('http://localhost:8983/solr/core_0/', search_handler='/autocomplete', always_commit=True)
``always_commit`` tells the ``Solr`` object whether to commit by default on
every update request. Be sure to set it to ``True`` if you are upgrading from a
version where the default policy was to always commit.

Functions like ``add`` and ``delete`` still provide a way to override the
default by passing the ``commit`` kwarg.

It is generally good practice to limit the number of commits to Solr, as
excessive commits risk opening too many searchers or consuming excessive system
resources. See the Solr documentation for more information and
details about the ``autoCommit`` and ``commitWithin`` options:

https://lucene.apache.org/solr/guide/7_7/updatehandlers-in-solrconfig.html#UpdateHandlersinSolrConfig-autoCommit
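One way to keep commits infrequent is to send documents in batches with ``commit=False`` and commit once at the end. The sketch below assumes a client that exposes ``add(docs, commit=...)`` and ``commit()``, as pysolr's ``Solr`` class does. ``index_in_batches``, ``RecordingClient`` and the batch size are hypothetical; the recording class only stands in for a real Solr connection so the example runs without a server.

```python
def index_in_batches(client, docs, batch_size=500):
    """Add docs in fixed-size batches and issue a single commit at the end."""
    docs = list(docs)
    for start in range(0, len(docs), batch_size):
        # Defer the commit so each batch does not trigger one.
        client.add(docs[start:start + batch_size], commit=False)
    if docs:
        client.commit()
    return len(docs)


class RecordingClient:
    """Stand-in for a Solr client that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add(self, docs, commit=None):
        self.calls.append(("add", len(docs), commit))

    def commit(self):
        self.calls.append(("commit",))


client = RecordingClient()
count = index_in_batches(client, [{"id": f"doc_{i}"} for i in range(5)], batch_size=2)
print(count, client.calls)
```

With a real connection, a ``pysolr.Solr`` instance would be passed in place of ``RecordingClient``. The explicit ``commit=False`` overrides the client's ``always_commit`` default for each batch.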
LICENSE
=======
``pysolr`` is licensed under the New BSD license.
Contributing to pysolr
======================
For consistency, this project uses `pre-commit <https://pre-commit.com/>`_ to manage Git commit hooks:

#. Install the ``pre-commit`` package: e.g. ``brew install pre-commit``,
   ``pip install pre-commit``, etc.
#. Run ``pre-commit install`` each time you check out a new copy of this Git
   repository to ensure that every subsequent commit will be processed by
   running ``pre-commit run``, which you may also do as desired. To test the
   entire repository or in a CI scenario, you can check every file rather than
   just the staged ones using ``pre-commit run --all-files``.
Running Tests
=============
The ``run-tests.py`` script will automatically perform the steps below and is
recommended for testing by default unless you need more control.
Running a test Solr instance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Downloading, configuring and running Solr 4 looks like this::

    ./start-solr-test-server.sh
Running the tests
~~~~~~~~~~~~~~~~~
.. code-block:: console

    $ python -m unittest tests