.. _topics-index:

==============================
Scrapy |version| documentation
==============================

Scrapy is a fast, high-level `web crawling`_ and `web scraping`_ framework,
used to crawl websites and extract structured data from their pages. It can be
used for a wide range of purposes, from data mining to monitoring and
automated testing.

.. _web crawling: https://en.wikipedia.org/wiki/Web_crawler
.. _web scraping: https://en.wikipedia.org/wiki/Web_scraping

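To give a concrete taste of the "crawl websites and extract structured data"
part, here is a minimal, self-contained sketch of a spider. The spider name
and the target site (``quotes.toscrape.com``, the public practice site used in
the :doc:`intro/tutorial`) are examples only; the tutorial develops a full
project step by step.

.. code-block:: python

    import scrapy


    class QuotesSpider(scrapy.Spider):
        """Fetch one page and yield structured items from it."""

        name = "quotes"  # example name, used to refer to the spider
        start_urls = ["https://quotes.toscrape.com/"]

        def parse(self, response):
            # Each ``div.quote`` element holds one quote; CSS selectors
            # pull out the fields we want as a plain dict.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }

Saved as ``quotes_spider.py``, this can be run without creating a project
using ``scrapy runspider quotes_spider.py -o quotes.json``.
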
.. _getting-help:

Getting help
============

Having trouble? We'd like to help!

* Try the :doc:`FAQ <faq>` -- it's got answers to some common questions.
* Looking for specific information? Try the :ref:`genindex` or :ref:`modindex`.
* Ask or search questions on `StackOverflow using the scrapy tag`_.
* Ask or search questions in the `Scrapy subreddit`_.
* Search for questions in the archives of the `scrapy-users mailing list`_.
* Ask a question in the `#scrapy IRC channel`_.
* Report bugs with Scrapy in our `issue tracker`_.
* Join the `Scrapy Discord`_ community.

.. _scrapy-users mailing list: https://groups.google.com/forum/#!forum/scrapy-users
.. _Scrapy subreddit: https://www.reddit.com/r/scrapy/
.. _StackOverflow using the scrapy tag: https://stackoverflow.com/tags/scrapy
.. _#scrapy IRC channel: irc://irc.freenode.net/scrapy
.. _issue tracker: https://github.com/scrapy/scrapy/issues
.. _Scrapy Discord: https://discord.com/invite/mv3yErfpvq

First steps
===========

.. toctree::
   :caption: First steps
   :hidden:

   intro/overview
   intro/install
   intro/tutorial
   intro/examples

:doc:`intro/overview`
    Understand what Scrapy is and how it can help you.

:doc:`intro/install`
    Get Scrapy installed on your computer.

:doc:`intro/tutorial`
    Write your first Scrapy project.

:doc:`intro/examples`
    Learn more by playing with a pre-made Scrapy project.

.. _section-basics:

Basic concepts
==============

.. toctree::
   :caption: Basic concepts
   :hidden:

   topics/commands
   topics/spiders
   topics/selectors
   topics/items
   topics/loaders
   topics/shell
   topics/item-pipeline
   topics/feed-exports
   topics/request-response
   topics/link-extractors
   topics/settings
   topics/exceptions

:doc:`topics/commands`
    Learn about the command-line tool used to manage your Scrapy project.

:doc:`topics/spiders`
    Write the rules to crawl your websites.

:doc:`topics/selectors`
    Extract the data from web pages using XPath or CSS expressions (see the
    short example after this list).

:doc:`topics/shell`
    Test your extraction code in an interactive environment.

:doc:`topics/items`
    Define the data you want to scrape.

:doc:`topics/loaders`
    Populate your items with the extracted data.

:doc:`topics/item-pipeline`
    Post-process and store your scraped data.

:doc:`topics/feed-exports`
    Output your scraped data using different formats and storages.

:doc:`topics/request-response`
    Understand the classes used to represent HTTP requests and responses.

:doc:`topics/link-extractors`
    Convenient classes to extract links to follow from pages.

:doc:`topics/settings`
    Learn how to configure Scrapy and see all :ref:`available settings <topics-settings-ref>`.

:doc:`topics/exceptions`
    See all available exceptions and their meaning.

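The selectors and shell pages above pair naturally: the shell lets you try
selector expressions against a live page before committing them to a spider.
A minimal sketch of that workflow, assuming ``https://quotes.toscrape.com``
purely as an example target:

.. code-block:: python

    # After running ``scrapy shell "https://quotes.toscrape.com"`` the fetched
    # page is bound to ``response``; the same calls work in a spider callback.
    response.css("title::text").get()          # first match as a string, or None
    response.css("div.quote span.text::text").getall()   # all matches, as a list
    response.xpath("//small[@class='author']/text()").get()  # XPath works too

Once an expression returns what you expect, it can be dropped unchanged into a
spider's ``parse`` method.
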
Built-in services
=================

.. toctree::
   :caption: Built-in services
   :hidden:

   topics/logging
   topics/stats
   topics/email
   topics/telnetconsole

:doc:`topics/logging`
    Learn how to use Python's built-in logging in Scrapy.

:doc:`topics/stats`
    Collect statistics about your scraping crawler.

:doc:`topics/email`
    Send email notifications when certain events occur.

:doc:`topics/telnetconsole`
    Inspect a running crawler using a built-in Python console.

Solving specific problems
=========================

.. toctree::
   :caption: Solving specific problems
   :hidden:

   faq
   topics/debug
   topics/contracts
   topics/practices
   topics/broad-crawls
   topics/developer-tools
   topics/dynamic-content
   topics/leaks
   topics/media-pipeline
   topics/deploy
   topics/autothrottle
   topics/benchmarking
   topics/jobs
   topics/coroutines
   topics/asyncio

:doc:`faq`
    Get answers to the most frequently asked questions.

:doc:`topics/debug`
    Learn how to debug common problems of your Scrapy spider.

:doc:`topics/contracts`
    Learn how to use contracts for testing your spiders.

:doc:`topics/practices`
    Get familiar with some Scrapy common practices.

:doc:`topics/broad-crawls`
    Tune Scrapy for crawling many domains in parallel.

:doc:`topics/developer-tools`
    Learn how to scrape with your browser's developer tools.

:doc:`topics/dynamic-content`
    Read webpage data that is loaded dynamically.

:doc:`topics/leaks`
    Learn how to find and get rid of memory leaks in your crawler.

:doc:`topics/media-pipeline`
    Download files and/or images associated with your scraped items.

:doc:`topics/deploy`
    Deploy your Scrapy spiders and run them on a remote server.

:doc:`topics/autothrottle`
    Adjust crawl rate dynamically based on load.

:doc:`topics/benchmarking`
    Check how Scrapy performs on your hardware.

:doc:`topics/jobs`
    Learn how to pause and resume crawls for large spiders.

:doc:`topics/coroutines`
    Use the :ref:`coroutine syntax <async>` (a short sketch follows this list).

:doc:`topics/asyncio`
    Use :mod:`asyncio` and :mod:`asyncio`-powered libraries.

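To illustrate the coroutine support mentioned above, here is a hedged sketch
of a callback declared with ``async def``. The spider name, URL, and the
``asyncio.sleep`` stand-in are examples only; :doc:`topics/coroutines` and
:doc:`topics/asyncio` cover the real requirements and restrictions.

.. code-block:: python

    import asyncio

    import scrapy


    class AsyncQuotesSpider(scrapy.Spider):
        name = "async_quotes"  # illustrative name
        start_urls = ["https://quotes.toscrape.com/"]

        async def parse(self, response):
            # A coroutine callback may await asynchronous work before (or
            # between) yielding items and follow-up requests. Awaiting
            # asyncio code requires the asyncio reactor (see topics/asyncio).
            await asyncio.sleep(0.1)  # stand-in for real asynchronous work
            for quote in response.css("div.quote"):
                yield {"text": quote.css("span.text::text").get()}
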
.. _extending-scrapy:

Extending Scrapy
================

.. toctree::
   :caption: Extending Scrapy
   :hidden:

   topics/architecture
   topics/addons
   topics/downloader-middleware
   topics/spider-middleware
   topics/extensions
   topics/signals
   topics/scheduler
   topics/exporters
   topics/components
   topics/api

:doc:`topics/architecture`
    Understand the Scrapy architecture.

:doc:`topics/addons`
    Enable and configure third-party extensions.

:doc:`topics/downloader-middleware`
    Customize how pages get requested and downloaded.

:doc:`topics/spider-middleware`
    Customize the input and output of your spiders.

:doc:`topics/extensions`
    Extend Scrapy with your custom functionality (see the sketch after this
    list).

:doc:`topics/signals`
    See all available signals and how to work with them.

:doc:`topics/scheduler`
    Understand the scheduler component.

:doc:`topics/exporters`
    Quickly export your scraped items to a file (XML, CSV, etc.).

:doc:`topics/components`
    Learn the common API and some good practices when building custom Scrapy
    components.

:doc:`topics/api`
    Use it in extensions and middlewares to extend Scrapy functionality.

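As a minimal sketch of how extensions and signals fit together, an extension
is a plain class that Scrapy builds through ``from_crawler()`` and that
subscribes to the signals it cares about. The class name below is
hypothetical.

.. code-block:: python

    import logging

    from scrapy import signals

    logger = logging.getLogger(__name__)


    class SpiderOpenedLogger:
        """Hypothetical extension that logs a line when a spider opens."""

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            # Ask the running crawler to call our method on spider_opened.
            crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
            return ext

        def spider_opened(self, spider):
            logger.info("Spider opened: %s", spider.name)

Such a class would be enabled by adding its import path to the ``EXTENSIONS``
setting; :doc:`topics/extensions` and :doc:`topics/signals` describe the
details.
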
All the rest
============

.. toctree::
   :caption: All the rest
   :hidden:

   news
   contributing
   versioning

:doc:`news`
    See what has changed in recent Scrapy versions.

:doc:`contributing`
    Learn how to contribute to the Scrapy project.

:doc:`versioning`
    Understand Scrapy versioning and API stability.