Command Line Tools
==================

The package installs two executable commands: one for starting the catalog server, and
a client for accessing catalogs and manipulating the configuration.

.. _configuration:

Configuration
-------------

A file-based configuration service is available to Intake. This file is by default
sought at the location ``~/.intake/conf.yaml``, but either of the environment variables
``INTAKE_CONF_DIR`` or ``INTAKE_CONF_FILE`` can be used to specify another directory
or file. If both are given, the latter takes priority.
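
The lookup order described above can be sketched in Python as follows. This is only
an illustration of the precedence rule, not Intake's own implementation; the function
name ``resolve_conf_path`` and its ``env`` argument are hypothetical.

.. code-block:: python

   import os
   import posixpath


   def resolve_conf_path(env=None):
       """Return the configuration file path implied by the rules above."""
       env = os.environ if env is None else env
       # INTAKE_CONF_FILE names the file directly and takes priority.
       if env.get("INTAKE_CONF_FILE"):
           return env["INTAKE_CONF_FILE"]
       # INTAKE_CONF_DIR replaces only the directory; the file is conf.yaml.
       conf_dir = env.get("INTAKE_CONF_DIR") or posixpath.join(
           os.path.expanduser("~"), ".intake")
       return posixpath.join(conf_dir, "conf.yaml")


   # Both variables set: the file variable wins.
   print(resolve_conf_path({"INTAKE_CONF_DIR": "/etc/intake",
                            "INTAKE_CONF_FILE": "/tmp/my_conf.yaml"}))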

At present, the configuration file might look as follows:

.. code-block:: yaml

   auth:
     cls: "intake.auth.base.BaseAuth"
   port: 5000
   catalog_path:
     - /home/myusername/special_dir

These are the defaults, and any parameter not specified takes the value shown above:

* the Intake Server will listen on port 5000 (this can be overridden on the command line,
  see below)
* and the auth system used will be the fully qualified class given (which, for BaseAuth,
  always allows access). For further information on securing
  the Intake Server, see the :ref:`authplugins`.

See ``intake.config.defaults`` for a full list of keys and their default values.
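
As a rough illustration of how unspecified parameters fall back to their defaults, the
following sketch overlays a user-supplied mapping on the values from the example
configuration. The ``DEFAULTS`` dictionary and the ``effective_config`` helper are
hypothetical stand-ins and do not reproduce Intake's internal code.

.. code-block:: python

   # Values copied from the example configuration above.
   DEFAULTS = {
       "auth": {"cls": "intake.auth.base.BaseAuth"},
       "port": 5000,
       "catalog_path": ["/home/myusername/special_dir"],
   }


   def effective_config(user_settings):
       """Overlay user settings on the defaults; other keys keep defaults."""
       merged = dict(DEFAULTS)
       merged.update(user_settings)
       return merged


   # Only the port is overridden; auth and catalog_path keep their defaults.
   config = effective_config({"port": 8080})
   print(config["port"], config["auth"]["cls"])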

Log Level
---------

The logging level is configurable using Python's built-in logging module.

The config option ``'logging'`` holds the current level for the intake logger, and
can take values such as ``'INFO'`` or ``'DEBUG'``. This can be set in the ``conf.yaml``
file of the config directory (e.g., ``~/.intake/``), or overridden by the environment
variable ``INTAKE_LOG_LEVEL``.

Furthermore, the level and settings of the logger can be changed programmatically in code::

  import logging
  logger = logging.getLogger('intake')
  logger.setLevel(logging.DEBUG)
  logger.addHandler(logging.StreamHandler())  # or any other logging.Handler
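
A slightly fuller version of the snippet above attaches a stream handler with a
formatter and takes the level from ``INTAKE_LOG_LEVEL`` when it is set. This is a
sketch under assumptions: the ``configure_intake_logger`` helper, the ``'INFO'``
fallback, and the format string are illustrative choices, and Intake handles the
environment variable itself independently of this code.

.. code-block:: python

   import logging
   import os


   def configure_intake_logger(default_level="INFO"):
       """Set level and handler on the 'intake' logger (illustrative helper)."""
       # Assumption: fall back to default_level if the variable is unset.
       level_name = os.environ.get("INTAKE_LOG_LEVEL", default_level).upper()
       logger = logging.getLogger("intake")
       logger.setLevel(level_name)  # setLevel accepts names such as 'DEBUG'
       handler = logging.StreamHandler()
       handler.setFormatter(
           logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
       logger.addHandler(handler)
       return logger


   logger = configure_intake_logger()
   logger.info("intake logger configured")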

Intake Server
-------------

The server takes one or more catalog files as input and makes them available on
port 5000 by default.

You can see the full description of the server command with:

::

  >>> intake-server --help

  usage: intake-server [-h] [-p PORT] [--list-entries] [--sys-exit-on-sigterm]
                       [--flatten] [--no-flatten] [-a ADDRESS]
                       FILE [FILE ...]

  Intake Catalog Server

  positional arguments:
    FILE                  Name of catalog YAML file

  optional arguments:
    -h, --help            show this help message and exit
    -p PORT, --port PORT  port number for server to listen on
    --list-entries        list catalog entries at startup
    --sys-exit-on-sigterm
                          internal flag used during unit testing to ensure
                          .coverage file is written
    --flatten
    --no-flatten
    -a ADDRESS, --address ADDRESS
                          address to use as a host, defaults to the address in
                          the configuration file, if provided otherwise localhost

To start the server with a local catalog file, use the following:

::

  >>> intake-server intake/catalog/tests/catalog1.yml
  Creating catalog from:
    - intake/catalog/tests/catalog1.yml
  catalog_args ['intake/catalog/tests/catalog1.yml']
  Entries: entry1,entry1_part,use_example1
  Listening on port 5000

You can then query the running server with the catalog client (described below):

::

  $ intake list intake://localhost:5000
  entry1
  entry1_part
  use_example1
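
When the server needs to be launched from a script, the command line can be assembled
from the options listed in the help text above. The sketch below only builds the
argument list; the helper name ``server_command`` is hypothetical, and running the
result assumes that ``intake-server`` is available on the ``PATH``.

.. code-block:: python

   def server_command(catalog_files, port=None, address=None,
                      list_entries=False):
       """Build an argv list for intake-server from the documented options."""
       if not catalog_files:
           raise ValueError("at least one catalog file is required")
       argv = ["intake-server"]
       if port is not None:
           argv += ["--port", str(port)]
       if address is not None:
           argv += ["--address", address]
       if list_entries:
           argv.append("--list-entries")
       return argv + list(catalog_files)


   cmd = server_command(["intake/catalog/tests/catalog1.yml"], port=5001)
   print(" ".join(cmd))
   # To start the server from Python: subprocess.Popen(cmd)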

Intake Client
-------------

While the Intake data sources will typically be accessed through the Python
API, you can use the client to verify a catalog file.

Unlike the server command, the client has several subcommands to access a
catalog. You can see the list of available subcommands with:

::

  >>> intake --help
  usage: intake {list,describe,exists,get,discover} ...

We go into further detail in the following sections.

List
''''

This subcommand lists the names of all available catalog entries. This is
useful since other subcommands require these names.

If you wish to see the details about each catalog entry, use the ``--full`` flag.
This is equivalent to running the ``intake describe`` subcommand for all catalog
entries.

::

  >>> intake list --help
  usage: intake list [-h] [--full] URI

  positional arguments:
    URI         Catalog URI

  optional arguments:
    -h, --help  show this help message and exit
    --full

::

  >>> intake list intake/catalog/tests/catalog1.yml
  entry1
  entry1_part
  use_example1
  >>> intake list --full intake/catalog/tests/catalog1.yml
  [entry1] container=dataframe
  [entry1] description=entry1 full
  [entry1] direct_access=forbid
  [entry1] user_parameters=[]
  [entry1_part] container=dataframe
  [entry1_part] description=entry1 part
  [entry1_part] direct_access=allow
  [entry1_part] user_parameters=[{'default': '1', 'allowed': ['1', '2'], 'type': u'str', 'name': u'part', 'description': u'part of filename'}]
  [use_example1] container=dataframe
  [use_example1] description=example1 source plugin
  [use_example1] direct_access=forbid
  [use_example1] user_parameters=[]
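
For scripting, the ``[entry] key=value`` lines printed by ``--full`` can be collected
into a nested dictionary. The parser below is a minimal sketch that assumes the line
layout shown in the example output; ``parse_full_listing`` is a hypothetical helper,
not part of Intake, and all values are kept as plain strings.

.. code-block:: python

   def parse_full_listing(text):
       """Map entry name -> {key: value} from `intake list --full` output."""
       entries = {}
       for line in text.splitlines():
           line = line.strip()
           if not line.startswith("["):
               continue  # skip blank or unrelated lines
           name, _, rest = line[1:].partition("] ")
           # Split at the first '=' only, since values may contain '='.
           key, _, value = rest.partition("=")
           entries.setdefault(name, {})[key] = value
       return entries


   sample = "\n".join([
       "[entry1] container=dataframe",
       "[entry1] direct_access=forbid",
       "[entry1_part] direct_access=allow",
   ])
   listing = parse_full_listing(sample)
   print(listing["entry1"]["direct_access"])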


Describe
''''''''

Given the name of a catalog entry, this subcommand lists the details of the
respective catalog entry.

::

  >>> intake describe --help
  usage: intake describe [-h] URI NAME

  positional arguments:
    URI         Catalog URI
    NAME        Catalog name

  optional arguments:
    -h, --help  show this help message and exit

::

  >>> intake describe intake/catalog/tests/catalog1.yml entry1
  [entry1] container=dataframe
  [entry1] description=entry1 full
  [entry1] direct_access=forbid
  [entry1] user_parameters=[]


Discover
''''''''

Given the name of a catalog entry, this subcommand returns a key-value
description of the data source. The exact details are subject to change.

::

  >>> intake discover --help
  usage: intake discover [-h] URI NAME

  positional arguments:
    URI         Catalog URI
    NAME        Catalog name

  optional arguments:
    -h, --help  show this help message and exit

::

  >>> intake discover intake/catalog/tests/catalog1.yml entry1
  {'npartitions': 2, 'dtype': dtype([('name', 'O'), ('score', '<f8'), ('rank', '<i8')]), 'shape': (None,), 'datashape':None, 'metadata': {'foo': 'bar', 'bar': [1, 2, 3]}}


Exists
''''''

Given the name of a catalog entry, this subcommand returns whether or not the
respective catalog entry is valid.

::

  >>> intake exists --help
  usage: intake exists [-h] URI NAME

  positional arguments:
    URI         Catalog URI
    NAME        Catalog name

  optional arguments:
    -h, --help  show this help message and exit

::

  >>> intake exists intake/catalog/tests/catalog1.yml entry1
  True
  >>> intake exists intake/catalog/tests/catalog1.yml entry2
  False
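
The same check can be wrapped for use in a script. The sketch below shells out to
``intake exists`` and interprets its output; it assumes the command prints only
``True`` or ``False`` on standard output, as in the example above. The helper name
``entry_exists`` and its ``run`` parameter (which allows substituting a fake runner
in tests) are hypothetical.

.. code-block:: python

   import subprocess


   def entry_exists(uri, name, run=subprocess.run):
       """Return True if `intake exists URI NAME` prints 'True'."""
       result = run(["intake", "exists", uri, name],
                    capture_output=True, text=True, check=True)
       return result.stdout.strip() == "True"

With Intake installed, ``entry_exists("intake/catalog/tests/catalog1.yml", "entry1")``
would correspond to the first command shown above.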


Get
'''

Given the name of a catalog entry, this subcommand outputs the entire data
source to standard output.

::

  >>> intake get --help
  usage: intake get [-h] URI NAME

  positional arguments:
    URI         Catalog URI
    NAME        Catalog name

  optional arguments:
    -h, --help  show this help message and exit

::

  >>> intake get intake/catalog/tests/catalog1.yml entry1
         name  score  rank
  0    Alice1  100.5     1
  1      Bob1   50.3     2
  2  Charlie1   25.0     3
  3      Eve1   25.0     3
  4    Alice2  100.5     1
  5      Bob2   50.3     2
  6  Charlie2   25.0     3
  7      Eve2   25.0     3


Config and Cache
''''''''''''''''

CLI functions starting with ``intake cache`` and ``intake config`` are available to
provide information about the system: the locations and values of configuration
parameters, and the state of cached files.