File: README.md

# robot_detection

robot_detection is a Python module to detect whether a given HTTP User-Agent string belongs to a web crawler. It uses the list of registered robots from http://www.robotstxt.org: [Robots Database](http://www.robotstxt.org/db.html)

## Usage

There is only one function, ``is_robot``, which takes a string (unicode or not) and returns ``True`` if and only if that string matches a known robot in the robotstxt.org robot database.

### Example

    >>> import robot_detection
    >>> robot_detection.is_robot(user_agent_string)
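
For example, a crawler User-Agent should match while an ordinary browser string should not (the strings below are illustrative; the exact results depend on the version of the bundled database):

    >>> robot_detection.is_robot("Googlebot/2.1 (+http://www.google.com/bot.html)")
    True
    >>> robot_detection.is_robot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0")
    False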

## Updating

You can download a new version of the Robot Database from [this link](http://www.robotstxt.org/dbexport.html).

Download the database dump, and run ``robot_detection.py`` with the dump file as its first argument.

    $ wget http://www.robotstxt.org/db/all.txt
    $ python robot_detection.py all.txt

If the database has changed, it will print out the new version of the ``robot_useragents`` variable, which you then need to paste into the source code.

## Tests

Some simple unit tests are included; running the ``tests.py`` file executes them.
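
For example, from the source directory:

    $ python tests.py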