File: README.rst

package info (click to toggle)
python-parsel 1.7.0%2Bdfsg-1
  • links: PTS, VCS
  • area: main
  • in suites: bookworm
  • size: 440 kB
  • sloc: python: 2,203; makefile: 214; xml: 15; sh: 8
file content (58 lines) | stat: -rw-r--r-- 2,012 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
======
Parsel
======

.. image:: https://github.com/scrapy/parsel/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Tests

.. image:: https://img.shields.io/pypi/pyversions/parsel.svg
   :target: https://github.com/scrapy/parsel/actions/workflows/tests.yml
   :alt: Supported Python versions

.. image:: https://img.shields.io/pypi/v/parsel.svg
   :target: https://pypi.python.org/pypi/parsel
   :alt: PyPI Version

.. image:: https://img.shields.io/codecov/c/github/scrapy/parsel/master.svg
   :target: https://codecov.io/github/scrapy/parsel?branch=master
   :alt: Coverage report


Parsel is a BSD-licensed Python_ library to extract and remove data from HTML_
and XML_ using XPath_ and CSS_ selectors, optionally combined with
`regular expressions`_.

Find the Parsel online documentation at https://parsel.readthedocs.org.

Example (`open online demo`_):

.. code-block:: python

    >>> from parsel import Selector
    >>> selector = Selector(text="""<html>
            <body>
                <h1>Hello, Parsel!</h1>
                <ul>
                    <li><a href="http://example.com">Link 1</a></li>
                    <li><a href="http://scrapy.org">Link 2</a></li>
                </ul>
            </body>
            </html>""")
    >>> selector.css('h1::text').get()
    'Hello, Parsel!'
    >>> selector.xpath('//h1/text()').re(r'\w+')
    ['Hello', 'Parsel']
    >>> for li in selector.css('ul > li'):
    ...     print(li.xpath('.//@href').get())
    http://example.com
    http://scrapy.org


.. _CSS: https://en.wikipedia.org/wiki/Cascading_Style_Sheets
.. _HTML: https://en.wikipedia.org/wiki/HTML
.. _open online demo: https://colab.research.google.com/drive/149VFa6Px3wg7S3SEnUqk--TyBrKplxCN#forceEdit=true&sandboxMode=true
.. _Python: https://www.python.org/
.. _regular expressions: https://docs.python.org/library/re.html
.. _XML: https://en.wikipedia.org/wiki/XML
.. _XPath: https://en.wikipedia.org/wiki/XPath