File: quick-start.md

package info (click to toggle)
zarr 3.1.5-1
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 3,068 kB
  • sloc: python: 31,589; makefile: 10
file content (176 lines) | stat: -rw-r--r-- 4,849 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
This section  will help you get up and running with
the Zarr library in Python to efficiently manage and analyze multi-dimensional arrays.

### Creating an Array

To get started, you can create a simple Zarr array:

```python exec="true" session="quickstart"
import shutil
shutil.rmtree('data', ignore_errors=True)
import numpy as np
from pprint import pprint
import io
import warnings

warnings.filterwarnings(
  "ignore",
  message="Numcodecs codecs are not in the Zarr version 3 specification*",
  category=UserWarning
)
np.random.seed(0)
```

```python exec="true" session="quickstart" source="above" result="ansi"
import zarr
import numpy as np

# Create a 2D Zarr array
z = zarr.create_array(
    store="data/example-1.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
```

Here, we created a 2D array of shape `(100, 100)`, chunked into blocks of
`(10, 10)`, and filled it with random floating-point data. This array was
written to a `LocalStore` in the `data/example-1.zarr` directory.

#### Compression and Filters

Zarr supports data compression and filters. For example, to use Blosc compression:


```python exec="true" session="quickstart" source="above" result="code"

# Create a 2D Zarr array with Blosc compression
z = zarr.create_array(
    store="data/example-2.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
    compressors=zarr.codecs.BloscCodec(
        cname="zstd",
        clevel=3,
        shuffle=zarr.codecs.BloscShuffle.shuffle
    )
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
```

This compresses the data using the Blosc codec with shuffle enabled for better compression.


### Hierarchical Groups

Zarr allows you to create hierarchical groups, similar to directories:

```python exec="true" session="quickstart" source="above" result="ansi"

# Create nested groups and add arrays
root = zarr.group("data/example-3.zarr")
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)
spam = foo.create_array(name="spam", shape=(10,), dtype="i4")

# Assign values
bar[:, :] = np.random.random((100, 10))
spam[:] = np.arange(10)

# print the hierarchy
print(root.tree())
```

This creates a group with two datasets: `foo` and `bar`.

#### Batch Hierarchy Creation

Zarr provides tools for creating a collection of arrays and groups with a single function call.
Suppose we want to copy existing groups and arrays into a new storage backend:

```python exec="true" session="quickstart" source="above" result="html"

# Create nested groups and add arrays
root = zarr.group("data/example-4.zarr", attributes={'name': 'root'})
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)
nodes = {'': root.metadata} | {k: v.metadata for k,v in root.members()}
# Report nodes
output = io.StringIO()
pprint(nodes, stream=output, width=60, depth=3)
result = output.getvalue()
print(result)
# Create new hierarchy from nodes
new_nodes = dict(zarr.create_hierarchy(store=zarr.storage.MemoryStore(), nodes=nodes))
new_root = new_nodes['']
assert new_root.attrs == root.attrs
```

Note that [`zarr.create_hierarchy`][] will only initialize arrays and groups -- copying array data must
be done in a separate step.

### Persistent Storage

Zarr supports persistent storage to disk or cloud-compatible backends. While examples above
utilized a [`zarr.storage.LocalStore`][], a number of other storage options are available.

Zarr integrates seamlessly with cloud object storage such as Amazon S3 and Google Cloud Storage
using external libraries like [s3fs](https://s3fs.readthedocs.io) or
[gcsfs](https://gcsfs.readthedocs.io):

```python

import s3fs

z = zarr.create_array("s3://example-bucket/foo", mode="w", shape=(100, 100), chunks=(10, 10), dtype="f4")
z[:, :] = np.random.random((100, 100))
```

A single-file store can also be created using the [`zarr.storage.ZipStore`][]:

```python exec="true" session="quickstart" source="above"

# Store the array in a ZIP file
store = zarr.storage.ZipStore("data/example-5.zip", mode="w")

z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# write to the array
z[:, :] = np.random.random((100, 100))

# the ZipStore must be explicitly closed
store.close()
```

To open an existing array from a ZIP file:

```python exec="true" session="quickstart" source="above" result="code"

# Open the ZipStore in read-only mode
store = zarr.storage.ZipStore("data/example-5.zip", read_only=True)

z = zarr.open_array(store, mode='r')

# read the data as a NumPy Array
print(z[:])
```

Read more about Zarr's storage options in the [User Guide](user-guide/index.md).