Developing with fsspec
----------------------

Most of this documentation describes the use of ``fsspec`` from the
end user's point of view. However, ``fsspec`` is also used by many libraries
as their primary or only interface to file operations.

Clients of the library
~~~~~~~~~~~~~~~~~~~~~~

The most common entry point for libraries that wish to rely on ``fsspec``
is ``open`` or ``open_files``, as a way of generating an object compatible
with the Python file interface. This actually produces an ``OpenFile`` instance,
which can be serialised across a network; resources are only engaged when
entering a context, e.g.

.. code-block:: python

    with fsspec.open("protocol://path", 'rb', param=value) as f:
        process_file(f)

Note the backend-specific parameters that can be passed in this call.
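
The lazy, serialisable behaviour of ``OpenFile`` can be illustrated with a minimal sketch.
It assumes the built-in ``memory`` backend; the path ``fsspec_demo.bin`` is a hypothetical
name chosen for illustration, and pickling stands in for sending the object across a network.

```python
import pickle

import fsspec

# Seed the in-memory filesystem with some bytes (hypothetical demo path).
fs = fsspec.filesystem("memory")
fs.pipe_file("/fsspec_demo.bin", b"hello fsspec")

# Creating the OpenFile does not open the underlying file yet.
of = fsspec.open("memory://fsspec_demo.bin", "rb")

# Round-trip through pickle, as a stand-in for shipping it to another process.
restored = pickle.loads(pickle.dumps(of))

# The file is only opened on entering the context, and closed on leaving it.
with restored as f:
    payload = f.read()
```

Here ``payload`` should hold the bytes that were written when the filesystem was seeded.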

In cases where the caller wants to control the context directly, they can use the
``open`` method of the ``OpenFile``, or get the filesystem object directly,
skipping the ``OpenFile`` route. In the latter case, text encoding and compression
are **not** handled for you. The resulting file-like object can be used as a context
manager; otherwise, its ``close()`` method must be called explicitly to release resources.

.. code-block:: python

    # OpenFile route
    of = fsspec.open("protocol://path", 'rb', param=value)
    f = of.open()
    process_file(f)
    f.close()

    # filesystem class route, context
    fs = fsspec.filesystem("protocol", param=value)
    with fs.open("path", "rb") as f:
        process_file(f)

    # filesystem class route, explicit close
    fs = fsspec.filesystem("protocol", param=value)
    f = fs.open("path", "rb")
    process_file(f)
    f.close()
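
The difference between the two routes can be seen in a small sketch, again assuming the
built-in ``memory`` backend and a hypothetical file name. The ``OpenFile`` route applies
gzip and UTF-8 handling, whereas a plain binary ``fs.open`` call without extra arguments
returns the stored bytes as they are.

```python
import fsspec

url = "memory://fsspec_demo.txt.gz"  # hypothetical demo path

# OpenFile route: compression and text encoding are applied on write...
with fsspec.open(url, "wt", compression="gzip", encoding="utf-8") as f:
    f.write("héllo")

# ...and undone on read.
with fsspec.open(url, "rt", compression="gzip", encoding="utf-8") as f:
    text = f.read()

# Filesystem route: a binary open yields the raw, still-compressed bytes.
fs = fsspec.filesystem("memory")
with fs.open("/fsspec_demo.txt.gz", "rb") as f:
    raw = f.read()

# gzip streams start with the two magic bytes 0x1f 0x8b.
is_gzip = raw[:2] == b"\x1f\x8b"
```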

Implementing a backend
~~~~~~~~~~~~~~~~~~~~~~

The class ``AbstractFileSystem`` provides a template of the methods
that a potential implementation should supply, as well as default
implementations of functionality that depends on these. Methods that
*could* be implemented are marked with ``NotImplementedError`` or
``pass`` (the latter specifically for directory operations that might
not be required for some backends where directories are emulated).

Note that not all of the methods need to be implemented: for example,
some implementations may be read-only, in which case things like ``pipe``,
``put``, ``touch``, ``rm``, etc., can be left as not-implemented
(or you might implement them and raise ``PermissionError``, ``OSError`` with
errno 30 (``EROFS``, read-only file system), or some other read-only exception).
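
As a rough illustration, a read-only backend can be sketched by supplying only ``ls`` and
``_open`` and relying on the defaults for the rest. Everything named here
(``ReadOnlyDictFS``, the ``rodict`` protocol, the flat dict of files) is hypothetical and
not part of ``fsspec``; a real backend would need to handle directories, caching and more.

```python
import errno
import io

from fsspec.spec import AbstractFileSystem


class ReadOnlyDictFS(AbstractFileSystem):
    """Hypothetical read-only backend serving bytes from a flat dict."""

    protocol = "rodict"  # illustrative protocol name, not registered anywhere
    cachable = False  # skip instance caching to keep the sketch simple

    def __init__(self, files, **kwargs):
        super().__init__(**kwargs)
        self.files = dict(files)

    def ls(self, path, detail=True, **kwargs):
        path = self._strip_protocol(path).strip("/")
        if path == "":
            names = sorted(self.files)  # flat namespace: all files at the root
        elif path in self.files:
            names = [path]
        else:
            raise FileNotFoundError(path)
        entries = [
            {"name": n, "size": len(self.files[n]), "type": "file"} for n in names
        ]
        return entries if detail else names

    def _open(self, path, mode="rb", **kwargs):
        if mode != "rb":
            # Refuse writes with errno 30 (EROFS on Linux).
            raise OSError(errno.EROFS, "read-only filesystem", path)
        if path not in self.files:
            raise FileNotFoundError(path)
        return io.BytesIO(self.files[path])


fs = ReadOnlyDictFS({"hello.txt": b"hi"})
listing = fs.ls("", detail=False)
content = fs.cat_file("hello.txt")  # default implementation, built on open()
size = fs.info("hello.txt")["size"]  # default implementation, built on ls()

try:
    fs.pipe_file("new.txt", b"x")  # default implementation opens in "wb"
    write_errno = None
except OSError as exc:
    write_errno = exc.errno
```

In this sketch, write operations whose default implementation goes through ``open``
(such as ``pipe_file``) fail with the read-only error, while ``cat_file`` and ``info``
work without being written explicitly.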

We may eventually refactor ``AbstractFileSystem`` to split the default implementation,
the set of methods that you might implement in a new backend, and the
documented end-user API.

To register a new backend with ``fsspec``, use the
`entry_points <https://setuptools.readthedocs.io/en/latest/userguide/quickstart.html#entry-points-and-automatic-script-creation>`_
facility from setuptools. For example, to register a new
filesystem protocol ``myfs`` that is provided by the ``MyFS`` class in
the ``myfs`` package, add the following to your ``setup.py``:

.. code-block:: python

    setuptools.setup(
        ...
        entry_points={
            'fsspec.specs': [
                'myfs=myfs.MyFS',
            ],
        },
        ...
    )
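
Once a package declaring such an entry point is installed, one way to check that it is
visible is to list the ``fsspec.specs`` group with the standard library. This is only a
sketch: the helper name ``registered_fsspec_protocols`` is hypothetical, and the result
depends on which packages are installed in the current environment.

```python
from importlib.metadata import entry_points


def registered_fsspec_protocols():
    """Return {protocol name: target string} for the 'fsspec.specs' group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selection API.
        selected = eps.select(group="fsspec.specs")
    else:
        # Older Python versions return a plain dict keyed by group name.
        selected = eps.get("fsspec.specs", [])
    return {ep.name: ep.value for ep in selected}


protocols = registered_fsspec_protocols()
```

With the ``setup.py`` above installed, ``protocols`` would be expected to contain a
``myfs`` key whose value is ``myfs.MyFS``.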


Alternatively, the previous methods of registering a new backend can still be used.
That is, a new backend can register itself on import
(``register_implementation``), or you can post a PR to the ``fsspec`` repo
asking for it to be included in ``fsspec.registry.known_implementations``.
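
Registration on import can be sketched as follows. The ``MyFS`` class here is a
hypothetical stand-in that simply reuses the in-memory implementation; in a real package
this call would sit in the module that defines the backend.

```python
import fsspec
from fsspec.implementations.memory import MemoryFileSystem


class MyFS(MemoryFileSystem):
    """Hypothetical backend; it only reuses the in-memory implementation."""

    protocol = "myfs"


# clobber=True replaces any earlier registration under the same name,
# which keeps this snippet safe to run more than once.
fsspec.register_implementation("myfs", MyFS, clobber=True)

# The protocol name now resolves to the registered class.
fs = fsspec.filesystem("myfs")
resolved = fsspec.get_filesystem_class("myfs")
```

``register_implementation`` also accepts the class as a dotted string (for example
``"myfs.MyFS"``), in which case the module is not imported until the protocol is first used.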

Implementing async
~~~~~~~~~~~~~~~~~~

Starting in version 0.7.5, we provide async operations for some methods
of some implementations. Async support in storage implementations is
optional. Async development requires special considerations;
see :doc:`async`.

Developing the library
~~~~~~~~~~~~~~~~~~~~~~

The following can be used to install ``fsspec`` in development mode:

.. code-block::

   git clone https://github.com/fsspec/filesystem_spec
   cd filesystem_spec
   pip install -e ".[dev,doc,test]"

A number of additional dependencies are required to run the tests in full; see
``ci/environment*.yml``. Docker is also required. Most implementation-specific
tests should skip if their requirements are not met.

Development happens by submitting pull requests (PRs) on GitHub.
This repo adheres to flake8 and black coding conventions. You may wish to install
commit hooks if you intend to make PRs, as linting is done as part of the CI.

Docs use Sphinx and the NumPy docstring style. Please add an entry to the changelog
along with any PR.