File: PKG-INFO

Metadata-Version: 2.1
Name: Protego
Version: 0.1.16
Summary: Pure-Python robots.txt parser with support for modern conventions
Home-page: UNKNOWN
Author: Anubhav Patel
Author-email: anubhavp28@gmail.com
License: BSD
Description: # Protego
        
        ![build-badge](https://api.travis-ci.com/scrapy/protego.svg?branch=master)
        [![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
        ## Overview
        Protego is a pure-Python `robots.txt` parser with support for modern conventions.
        
        ## Requirements
        * Python 2.7 or Python 3.5+
        * Works on Linux, Windows, macOS, BSD
        
        ## Install
        
        To install Protego, simply use pip:
        
        ```
        pip install protego
        ```
        
        ## Usage
        
        ```python
        >>> from protego import Protego
        >>> robotstxt = """
        ... User-agent: *
        ... Disallow: /
        ... Allow: /about
        ... Allow: /account
        ... Disallow: /account/contact$
        ... Disallow: /account/*/profile
        ... Crawl-delay: 4
        ... Request-rate: 10/1m                 # 10 requests every 1 minute
        ... 
        ... Sitemap: http://example.com/sitemap-index.xml
        ... Host: http://example.co.in
        ... """
        >>> rp = Protego.parse(robotstxt)
        >>> rp.can_fetch("http://example.com/profiles", "mybot")
        False
        >>> rp.can_fetch("http://example.com/about", "mybot")
        True
        >>> rp.can_fetch("http://example.com/account", "mybot")
        True
        >>> rp.can_fetch("http://example.com/account/myuser/profile", "mybot")
        False
        >>> rp.can_fetch("http://example.com/account/contact", "mybot")
        False
        >>> rp.crawl_delay("mybot")
        4.0
        >>> rp.request_rate("mybot")
        RequestRate(requests=10, seconds=60, start_time=None, end_time=None)
        >>> list(rp.sitemaps)
        ['http://example.com/sitemap-index.xml']
        >>> rp.preferred_host
        'http://example.co.in'
        ```
        
        Using Protego with [Requests](https://3.python-requests.org/)
        
        ```python
        >>> from protego import Protego
        >>> import requests
        >>> r = requests.get("https://google.com/robots.txt")
        >>> rp = Protego.parse(r.text)
        >>> rp.can_fetch("https://google.com/search", "mybot")
        False
        >>> rp.can_fetch("https://google.com/search/about", "mybot")
        True
        >>> list(rp.sitemaps)
        ['https://www.google.com/sitemap.xml']
        ```
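        
        The same flow also works without Requests, using only the standard library. Below is a minimal sketch assuming Python 3; the URL is just an example:
        
        ```python
        import urllib.request
        
        from protego import Protego
        
        # Fetch robots.txt with the standard library instead of Requests
        with urllib.request.urlopen("https://www.python.org/robots.txt") as response:
            body = response.read().decode("utf-8")
        
        rp = Protego.parse(body)
        print(rp.can_fetch("https://www.python.org/some/page", "mybot"))
        print(list(rp.sitemaps))
        ```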
        
        ## Documentation
        
        Class `protego.Protego`:
            
        ### Properties
        
        * `sitemaps` {`list_iterator`} An iterator over the sitemap URLs specified in `robots.txt`.
        * `preferred_host` {string} Preferred host specified in `robots.txt`.
        
        ### Methods
        
        * `parse(robotstxt_body)` Parse `robots.txt` and return a new instance of `protego.Protego`. 
        * `can_fetch(url, user_agent)` Return True if the user agent can fetch the URL, otherwise return False.
        * `crawl_delay(user_agent)` Return the crawl delay specified for the user agent as a float. If nothing is specified, return None.
        * `request_rate(user_agent)` Return the request rate specified for the user agent as a named tuple `RequestRate(requests, seconds, start_time, end_time)`. If nothing is specified, return None (see the sketch below).
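        
        The None-return behaviour of `crawl_delay` and `request_rate` can be seen with a minimal sketch; the robots.txt body below is a made-up example that omits both directives:
        
        ```python
        from protego import Protego
        
        # A robots.txt without Crawl-delay or Request-rate directives
        rp = Protego.parse("User-agent: *\nDisallow: /private\n")
        
        print(rp.can_fetch("https://example.com/private/data", "mybot"))  # False
        print(rp.can_fetch("https://example.com/public", "mybot"))        # True
        print(rp.crawl_delay("mybot"))    # None - no Crawl-delay given
        print(rp.request_rate("mybot"))   # None - no Request-rate given
        ```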
        
Keywords: robots.txt,parser,robots,rep
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
Description-Content-Type: text/markdown