File: userguide_data.md

### Data API
`libpyvinyl` provides several abstract classes to create data interfaces.

#### DataCollection
`DataCollection` is a thin interface layer between the Calculator and the DataClass. A Calculator's
inputs are aggregated into one `DataCollection` and its outputs into another, so each side can be
handled as a single variable.

A `DataCollection` can be initialized with several DataClass instances like this:
```py
collection = DataCollection(data_1, data_2, ..., data_n)
```
or filled after creation with `add_data()`:
```py
collection = DataCollection()
collection.add_data(data_1, data_2, ..., data_n)
```

A data object can be accessed by its key:
```py
data_1 = collection["data_1_key"]
```

The data dictionaries of all the data objects in a `DataCollection` can be retrieved at once:
```py
collection.get_data()
```

You can also create a list of the Data objects in the `DataCollection`:
```py
collection.to_list()
```

To get an overview of the `DataCollection`, just print it out:
```py
print(collection)
```
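
The access patterns above can be tried out without installing anything by using a toy stand-in. The sketch below is not the libpyvinyl implementation: `ToyData` and `ToyCollection` are hypothetical classes that only mimic the keyed aggregation described in this section.

```py
class ToyData:
    """Hypothetical stand-in for a DataClass: a key plus a data dictionary."""

    def __init__(self, key, data_dict):
        self.key = key
        self._data = dict(data_dict)

    def get_data(self):
        return self._data


class ToyCollection:
    """Hypothetical stand-in for DataCollection, indexed by each object's key."""

    def __init__(self, *args):
        self._objects = {}
        self.add_data(*args)

    def add_data(self, *args):
        # Register every data object under its own key.
        for obj in args:
            self._objects[obj.key] = obj

    def __getitem__(self, key):
        return self._objects[key]

    def to_list(self):
        return list(self._objects.values())


a = ToyData("a_key", {"number": 1})
b = ToyData("b_key", {"number": 2})

collection = ToyCollection(a)   # initialize with one object
collection.add_data(b)          # add another one afterwards

print(collection["b_key"].get_data())              # → {'number': 2}
print([obj.key for obj in collection.to_list()])   # → ['a_key', 'b_key']
```
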
#### BaseData
A specialized DataClass for a kind of data with similar attributes can be derived
from the abstract `BaseData` class, which provides useful helper
functions and a template for the Data interface.

A DataClass maps either to a python dictionary or to a file. A file-mapping DataClass does not
read the file until the user calls `get_data()`, which calls the `read()` method of its
`file_format_class` and returns the data as a python dictionary. The mapping type, and for a
file mapping the `file_format_class`, is set by one of these functions:

To create/set a DataClass as a python dictionary mapping:
- `from_dict()`: Create a class instance mapping from a python dictionary.
- `set_dict()`: Set the class as a python dictionary mapping.

To create/set a DataClass as a file mapping:
- `set_file()`: Set the class as a file mapping.
- `from_file()`: Create a class instance mapping from a file.
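
The deferred reading of a file mapping can be illustrated with a small stand-in. This is a sketch under stated assumptions, not libpyvinyl code: `LazyNumberData` and `read_json` are hypothetical, a plain function plays the role of the `read()` method of a `file_format_class`, and JSON is used only to keep the example within the standard library.

```py
import json
import os
import tempfile


class LazyNumberData:
    """Hypothetical stand-in for a DataClass; not the libpyvinyl implementation."""

    def __init__(self, key, data_dict=None, filename=None, reader=None):
        self.key = key
        self._data_dict = data_dict
        self._filename = filename
        self._reader = reader  # plays the role of file_format_class.read

    @classmethod
    def from_dict(cls, data_dict, key):
        return cls(key, data_dict=data_dict)

    @classmethod
    def from_file(cls, filename, reader, key):
        # Only the file name is stored here; nothing is read yet.
        return cls(key, filename=filename, reader=reader)

    def get_data(self):
        if self._data_dict is not None:
            return self._data_dict
        # File mapping: the file is read only now.
        return self._reader(self._filename)


read_calls = []


def read_json(filename):
    """Hypothetical reader that records every call it receives."""
    read_calls.append(filename)
    with open(filename) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "number.json")
    with open(path, "w") as handle:
        json.dump({"number": 4.0}, handle)

    file_data = LazyNumberData.from_file(path, read_json, key="from_file")
    calls_before = len(read_calls)  # no read has happened yet
    loaded = file_data.get_data()   # triggers the read
    calls_after = len(read_calls)

dict_data = LazyNumberData.from_dict({"number": 4.0}, key="from_dict")
print(calls_before, calls_after, loaded == dict_data.get_data())  # → 0 1 True
```

Both mappings return the same dictionary from `get_data()`; the only difference is when the data is loaded.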

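To write a DataClass to a file in a given format:
```py
data_file = data.write(filename="test_file", format_class=FormatClass)
```
The data is written to `test_file` in the format defined by the `FormatClass` you specify.
`data_file` holds the value returned by the format's `write()` method, ideally a DataClass
object mapping to the newly written file.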

To list the formats supported by the DataClass:
- `list_formats()`: prints the return value of `supported_formats()`, which must
be defined in the derived class.

##### Develop a derived DataClass
A DataClass derived from the `BaseData` class only needs to provide two pieces of information:
- `expected_data`: a dictionary whose keys define the data needed.
- `supported_formats()`: a method returning a dictionary that describes the supported formats.
  The information is extracted from each format class with the `_add_ioformat()` method.

An example:

```py
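from libpyvinyl.BaseData import BaseData

# TXTFormat and H5Format are assumed to be modules that each define
# a format class of the same name (see the BaseFormat section).


class NumberData(BaseData):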
    def __init__(
        self,
        key,
        data_dict=None,
        filename=None,
        file_format_class=None,
        file_format_kwargs=None,
    ):

        expected_data = {}

        ### DataClass developer's job start
        expected_data["number"] = None
        ### DataClass developer's job end

        super().__init__(
            key,
            expected_data,
            data_dict,
            filename,
            file_format_class,
            file_format_kwargs,
        )

    @classmethod
    def supported_formats(cls):
        format_dict = {}
        ### DataClass developer's job start
        cls._add_ioformat(format_dict, TXTFormat.TXTFormat)
        cls._add_ioformat(format_dict, H5Format.H5Format)
        ### DataClass developer's job end
        return format_dict
```

#### BaseFormat
The Format class is the interface between an actual file and
the python object.

Each derived FormatClass has to implement:
- `format_register()`: provides the metadata of this format.
- `read()`: reads the file into a python dictionary whose
keys must include the keys of the `expected_data` of the DataClass connected to
this format.
- `write()`: writes the data of the DataClass into a file in this format.
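
The key requirement on `read()` amounts to a subset check, which can be sketched as follows. `missing_keys` is a hypothetical helper written for illustration; it is not part of libpyvinyl.

```py
def missing_keys(expected_data, data_dict):
    """Return the expected keys that `data_dict` does not provide, sorted."""
    return sorted(set(expected_data) - set(data_dict))


expected_data = {"number": None}

good = {"number": 1.5, "unit": "m"}  # extra keys do not violate the requirement
bad = {"value": 1.5}                 # the expected key "number" is absent

print(missing_keys(expected_data, good))  # → []
print(missing_keys(expected_data, bad))   # → ['number']
```

A check like this can be handy in a unit test for a new FormatClass, to confirm that its `read()` output covers the `expected_data` of the DataClass.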

Optionally, a direct conversion method can be defined to avoid reading the whole
data into memory. See:
- `BaseFormat.direct_convert_formats()`
- `BaseFormat.convert()`

##### read() and write()
The `read()` method must return the python dictionary required by its corresponding
DataClass. Example:
```py
import numpy as np


class NumberData(BaseData):
    ...
    expected_data = {}

    ### DataClass developer's job start
    expected_data["number"] = None
    ...


class TXTFormat(BaseFormat):
    ...
    @classmethod
    def read(cls, filename: str) -> dict:
        """Read the data from the file with the `filename` to a dictionary. The dictionary will
        be used by its corresponding data class."""
        number = float(np.loadtxt(filename))
        data_dict = {"number": number}
        return data_dict
    ...
```
The `write()` method should call `object.get_data()`, where `object` is an instance of the DataClass
corresponding to the FormatClass, and write the data to the intended file. It is recommended to return
a DataClass object that maps to the newly written file.

```py
class TXTFormat(BaseFormat):
    ...
    @classmethod
    def write(cls, object: NumberData, filename: str, key: str = None):
        """Save the data with the `filename`."""
        data_dict = object.get_data()
        arr = np.array([data_dict["number"]])
        np.savetxt(filename, arr, fmt="%.3f")
        if key is None:
            original_key = object.key
            key = original_key + "_to_TXTFormat"
        return object.from_file(filename, cls, key)
    ...
```
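
One consequence of the `fmt="%.3f"` argument in the `write()` example is that the file keeps only three decimal places, so a write followed by a read does not necessarily reproduce the original value. A minimal sketch of the effect, using plain file I/O from the standard library in place of `np.savetxt` and `np.loadtxt`:

```py
import os
import tempfile

original = {"number": 3.14159265}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "number.txt")
    # Same format string as in the write() example: three decimal places.
    with open(path, "w") as handle:
        handle.write("%.3f\n" % original["number"])
    # Read the value back, as read() would.
    with open(path) as handle:
        restored = {"number": float(handle.read())}

print(restored)              # → {'number': 3.142}
print(restored == original)  # → False
```

If an exact round trip matters for a format, a higher-precision format string is one option to consider.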


##### Example of a FormatClass
```py
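import numpy as np
from libpyvinyl.BaseFormat import BaseFormat

# NumberData is the DataClass defined in the BaseData section above.


class TXTFormat(BaseFormat):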
    def __init__(self) -> None:
        super().__init__()

    @classmethod
    def format_register(cls):
        key = "TXT"
        description = "TXT format for NumberData"
        file_extension = ".txt"
        read_kwargs = [""]
        write_kwargs = [""]
        return cls._create_format_register(
            key, description, file_extension, read_kwargs, write_kwargs
        )

    @staticmethod
    def direct_convert_formats():
        # Assume the format can be converted directly to the formats supported by these classes:
        # AFormat, BFormat
        # Redefine this `direct_convert_formats` for a concrete format class
        return []

    @classmethod
    def read(cls, filename: str) -> dict:
        """Read the data from the file with the `filename` to a dictionary. The dictionary will
        be used by its corresponding data class."""
        number = float(np.loadtxt(filename))
        data_dict = {"number": number}
        return data_dict

    @classmethod
    def write(cls, object: NumberData, filename: str, key: str = None):
        """Save the data with the `filename`."""
        data_dict = object.get_data()
        arr = np.array([data_dict["number"]])
        np.savetxt(filename, arr, fmt="%.3f")
        if key is None:
            original_key = object.key
            key = original_key + "_to_TXTFormat"
        return object.from_file(filename, cls, key)
```