======
Parsel
======

.. image:: https://github.com/scrapy/parsel/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Tests

.. image:: https://img.shields.io/pypi/pyversions/parsel.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: https://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report

Parsel is a BSD-licensed Python_ library to extract and remove data from HTML_
and XML_ using XPath_ and CSS_ selectors, optionally combined with
`regular expressions`_.

Find the Parsel online documentation at https://parsel.readthedocs.org.

Example (`open online demo`_):

.. code-block:: python

    >>> from parsel import Selector
    >>> selector = Selector(text="""<html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
            </body>
            </html>""")
    >>> selector.css('h1::text').get()
    'Hello, Parsel!'
    >>> selector.xpath('//h1/text()').re(r'\w+')
    ['Hello', 'Parsel']
    >>> for li in selector.css('ul > li'):
    ...     print(li.xpath('.//@href').get())
    http://example.com
    http://scrapy.org

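
The example above reads one value at a time. As a further sketch that is not
part of the upstream example, the snippet below collects every match with
``getall()``, reads attributes with the ``::attr()`` pseudo-element and the
``attrib`` property, and supplies a default for a selector that matches
nothing. The sample markup and the variable names are illustrative
assumptions:

.. code-block:: python

    from parsel import Selector

    # Illustrative markup, reduced from the example above.
    html = """<ul>
      <li><a href="http://example.com">Link 1</a></li>
      <li><a href="http://scrapy.org">Link 2</a></li>
    </ul>"""
    selector = Selector(text=html)

    # getall() returns every match as a list of strings, not only the first.
    labels = selector.css("li a::text").getall()

    # ::attr(name) is a Parsel extension to CSS that selects an attribute value.
    hrefs = selector.css("li a::attr(href)").getall()

    # attrib exposes the attributes of the first matched element as a dict.
    first_href = selector.css("li a").attrib["href"]

    # get() accepts a default that is returned when nothing matches.
    missing = selector.css("h2::text").get(default="")

Here ``get()`` without a default would return ``None`` for the unmatched
``h2`` selector, so passing ``default`` keeps the result a string.
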
.. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
.. _HTML: https://en.wikipedia.org/wiki/HTML
.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
.. _Python: https://www.python.org/
.. _regular expressions: https://docs.python.org/library/re.html
.. _XML: https://en.wikipedia.org/wiki/XML
.. _XPath: https://en.wikipedia.org/wiki/XPath