File: PKG-INFO

python-airr 1.3.1-1
  • area: main
  • in suites: bookworm, bullseye, sid
  • size: 364 kB
  • sloc: python: 1,734; sh: 19; makefile: 10
Metadata-Version: 1.1
Name: airr
Version: 1.3.1
Summary: AIRR Community Data Representation Standard reference library for antibody and TCR sequencing data.
Home-page: http://docs.airr-community.org
Author: AIRR Community
Author-email: UNKNOWN
License: CC BY 4.0
Description: Installation
        ------------------------------------------------------------------------------
        
        Install in the usual manner from PyPI::
        
            > pip3 install airr --user
        
        Or from the `downloaded <https://github.com/airr-community/airr-standards>`__
        source code directory::
        
            > python3 setup.py install --user
        
        
        Quick Start
        ------------------------------------------------------------------------------
        
        Reading AIRR Repertoire metadata files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        The ``airr`` package contains functions to read and write AIRR repertoire metadata
        files. The file format is either YAML or JSON, and the package provides a
        light wrapper over the standard parsers. The file needs a ``json``, ``yaml``, or ``yml``
        file extension so that the proper parser is used. All of the repertoires are loaded
        into memory at once; no streaming interface is provided::
        
            import airr
        
            # Load the repertoires
            data = airr.load_repertoire('input.airr.json')
            for rep in data['Repertoire']:
                print(rep)
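        
        The extension rule can be pictured with the stdlib-only sketch below. It is not
        the ``airr`` implementation. ``load_metadata`` is a hypothetical helper, and its
        YAML branch assumes the third-party PyYAML package is installed. Because
        ``os.path.splitext`` looks only at the last suffix, a name such as
        ``input.airr.json`` is still treated as JSON::
        
            import json
            import os
            import tempfile
        
            def load_metadata(path):
                """Pick a parser from the file extension (hypothetical helper)."""
                ext = os.path.splitext(path)[1].lower()
                if ext == '.json':
                    with open(path) as handle:
                        return json.load(handle)
                if ext in ('.yaml', '.yml'):
                    import yaml  # assumption: PyYAML is installed
                    with open(path) as handle:
                        return yaml.safe_load(handle)
                raise ValueError('unsupported metadata extension: %r' % ext)
        
            # Demonstration with a throwaway JSON file
            with tempfile.NamedTemporaryFile('w', suffix='.airr.json', delete=False) as tmp:
                json.dump({'Repertoire': [{'repertoire_id': 'rep1'}]}, tmp)
            data = load_metadata(tmp.name)
            os.remove(tmp.name)
            print(data['Repertoire'][0]['repertoire_id'])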
        
        Why are the repertoires in a list versus in a dictionary keyed by the ``repertoire_id``?
        There are two primary reasons for this. First, the ``repertoire_id`` might not have been
        assigned yet. Some systems might allow MiAIRR metadata to be entered but the
        ``repertoire_id`` is assigned to that data later by another process. Without the
        ``repertoire_id``, the data could not be stored in a dictionary. Second, the list allows
        the repertoire data to have a default ordering. If you know that the repertoires all have
        a unique ``repertoire_id`` then you can quickly create a dictionary object using a
        comprehension::
        
            rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }
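        
        Before building that dictionary, it can be worth checking that every identifier is
        assigned and unique. The sketch below uses only the standard library.
        ``index_repertoires`` and the sample records are hypothetical, and the sketch treats
        a missing, ``None``, or empty ``repertoire_id`` as "not yet assigned"::
        
            from collections import Counter
        
            def index_repertoires(reps):
                """Return {repertoire_id: repertoire}, rejecting missing or duplicate ids."""
                ids = [rep.get('repertoire_id') for rep in reps]
                missing = [i for i, rid in enumerate(ids) if not rid]
                if missing:
                    raise ValueError('repertoire_id not assigned at positions %s' % missing)
                dups = sorted(rid for rid, n in Counter(ids).items() if n > 1)
                if dups:
                    raise ValueError('duplicate repertoire_id values: %s' % dups)
                return {rep['repertoire_id']: rep for rep in reps}
        
            # Hypothetical records standing in for data['Repertoire']
            reps = [{'repertoire_id': 'rep1'}, {'repertoire_id': 'rep2'}]
            rep_dict = index_repertoires(reps)
            print(sorted(rep_dict))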
        
        Writing AIRR Repertoire metadata files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        Writing AIRR repertoire metadata is also a light wrapper over the standard YAML and
        JSON libraries. The ``airr`` library provides a function to create a blank repertoire
        object in the appropriate format with all of the required fields. As with the load
        function, the complete list of repertoires is written at once; there is no streaming
        interface::
        
            import airr
        
            # Create some blank repertoire objects in a list
            reps = []
            for i in range(5):
                reps.append(airr.repertoire_template())
        
            # Write the repertoires
            airr.write_repertoire('output.airr.json', reps)
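        
        If identifiers are to be assigned at write time rather than by a later process, one
        option is to generate them. The following is a minimal sketch under two assumptions.
        First, plain dictionaries stand in for ``airr.repertoire_template()``, whose real
        templates contain many more fields. Second, ``assign_ids`` and its ``rep-`` prefix are
        hypothetical and not part of the ``airr`` API::
        
            import uuid
        
            def assign_ids(reps, prefix='rep'):
                """Give each repertoire without a repertoire_id a new unique one."""
                for rep in reps:
                    if not rep.get('repertoire_id'):
                        rep['repertoire_id'] = '%s-%s' % (prefix, uuid.uuid4().hex)
                return reps
        
            # Stand-ins for blank templates; assumes repertoire_id starts out empty
            reps = [{'repertoire_id': None} for i in range(5)]
            assign_ids(reps)
            print([rep['repertoire_id'] for rep in reps])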
        
        Reading AIRR Rearrangement TSV files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        The ``airr`` package contains functions to read and write AIRR rearrangement files
        either as iterables or as pandas data frames. Usage is straightforward,
        as the file format is a typical tab-delimited file, but the package
        performs additional validation and type conversion beyond what a
        standard CSV reader provides::
        
            import airr
        
            # Create an iterable that returns a dictionary for each row
            reader = airr.read_rearrangement('input.tsv')
            for row in reader:
                print(row)
        
            # Load the entire file into a pandas data frame
            df = airr.load_rearrangement('input.tsv')
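        
        To illustrate the kind of type conversion meant here, the stdlib-only sketch below
        reads a tab-delimited string with ``csv.DictReader`` and converts selected columns.
        It is not the ``airr`` reader. ``FIELD_TYPES`` is a hypothetical, hand-written
        excerpt, whereas the library takes types from the AIRR Schema. The sketch relies on
        the AIRR TSV convention of encoding logical values as ``T``/``F`` and treats empty
        cells as ``None``::
        
            import csv
            import io
        
            # Hypothetical type map; real types come from the AIRR Schema
            FIELD_TYPES = {'productive': 'boolean', 'junction_length': 'integer'}
        
            def convert(field, value):
                """Turn one TSV cell into a Python value; empty cells become None."""
                if value == '':
                    return None
                kind = FIELD_TYPES.get(field, 'string')
                if kind == 'boolean':
                    return {'T': True, 'F': False}[value]
                if kind == 'integer':
                    return int(value)
                return value
        
            def read_rows(handle):
                """Yield one dictionary per data row with converted values."""
                for row in csv.DictReader(handle, delimiter='\t'):
                    yield {field: convert(field, value) for field, value in row.items()}
        
            tsv = 'sequence_id\tproductive\tjunction_length\nseq1\tT\t42\nseq2\tF\t\n'
            rows = list(read_rows(io.StringIO(tsv)))
            print(rows)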
        
        Writing AIRR Rearrangement TSV files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        Similar to the read operations, write functions are provided either for creating
        a writer object to perform row-wise output or for writing the entire contents of
        a pandas data frame to a file. Again, usage is straightforward, with the ``airr``
        output functions simply performing some type conversion and field ordering
        operations::
        
            import airr
        
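            # Read rearrangements and create a writer object for row-wise output
            reader = airr.read_rearrangement('input.tsv')
            writer = airr.create_rearrangement('output.tsv')
            for row in reader:
                writer.write(row)
        
            # Load a pandas data frame, then write it to a file
            df = airr.load_rearrangement('input.tsv')
            airr.dump_rearrangement(df, 'file.tsv')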
        
        By default, ``create_rearrangement`` will only write the ``required`` fields
        in the output file. Additional fields can be included in the output file by
        providing the ``fields`` parameter with a list of additional field names::
        
            # Specify additional fields in the output
            fields = ['new_calc', 'another_field']
            writer = airr.create_rearrangement('output.tsv', fields=fields)
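        
        The field ordering and output conversion can be sketched with the standard library
        alone. This is an illustration, not the ``airr`` writer. ``REQUIRED`` is a
        hypothetical three-field excerpt; the real required list comes from the AIRR Schema.
        ``write_rows`` is also hypothetical. It mimics the behavior described above: it
        writes the required fields first, appends the requested extra fields, and ignores
        any other keys::
        
            import csv
            import io
        
            # Hypothetical excerpt of the required fields, in output order
            REQUIRED = ['sequence_id', 'sequence', 'productive']
        
            def to_tsv(value):
                """Format a Python value as a TSV cell (None -> '', bool -> T/F)."""
                if value is None:
                    return ''
                if isinstance(value, bool):
                    return 'T' if value else 'F'
                return str(value)
        
            def write_rows(handle, rows, fields=()):
                """Write required fields, then extra fields; other keys are ignored."""
                columns = REQUIRED + [f for f in fields if f not in REQUIRED]
                writer = csv.DictWriter(handle, fieldnames=columns, delimiter='\t',
                                        lineterminator='\n')
                writer.writeheader()
                for row in rows:
                    writer.writerow({c: to_tsv(row.get(c)) for c in columns})
        
            out = io.StringIO()
            rows = [{'productive': True, 'sequence_id': 'seq1', 'sequence': 'ACGT',
                     'new_calc': 1.5, 'scratch': 'not written'}]
            write_rows(out, rows, fields=['new_calc'])
            print(out.getvalue())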
        
        A common operation is to read an AIRR rearrangement file and then
        write a new AIRR rearrangement file that keeps all of the existing
        fields from the original file and adds new ones. The
        ``derive_rearrangement`` function provides this capability::
        
            import airr
        
            # Read rearrangement data and write new file with additional fields
            reader = airr.read_rearrangement('input.tsv')
            fields = ['new_calc']
            writer = airr.derive_rearrangement('output.tsv', 'input.tsv', fields=fields)
            for row in reader:
                row['new_calc'] = 'a value'
                writer.write(row)
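        
        The idea of keeping the input's fields and adding new ones can be pictured with a
        stdlib-only sketch. Its ordering, with the input columns first and then any new field
        names not already present, is an assumption of the sketch. ``derive`` and ``compute``
        are hypothetical names, and type conversion is omitted for brevity::
        
            import csv
            import io
        
            def derive(input_handle, output_handle, new_fields, compute):
                """Copy every input row, adding the values returned by compute(row)."""
                reader = csv.DictReader(input_handle, delimiter='\t')
                columns = list(reader.fieldnames)
                columns += [f for f in new_fields if f not in columns]
                writer = csv.DictWriter(output_handle, fieldnames=columns,
                                        delimiter='\t', lineterminator='\n')
                writer.writeheader()
                for row in reader:
                    row.update(compute(row))
                    writer.writerow(row)
        
            source = io.StringIO('sequence_id\tsequence\nseq1\tACGT\n')
            target = io.StringIO()
            derive(source, target, ['new_calc'], lambda row: {'new_calc': 'a value'})
            print(target.getvalue())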
        
        
        Validating AIRR data files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        The ``airr`` package can validate repertoire and rearrangement data files
        to ensure that they contain all required fields and that the field types
        match the AIRR Schema. Validation can be run with the ``airr-tools`` command
        line program or by calling the validation functions in the library::
        
            # Validate a rearrangement file
            airr-tools validate rearrangement -a input.tsv
        
            # Validate a repertoire metadata file
            airr-tools validate repertoire -a input.airr.json
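        
        The checks themselves can be pictured with a small stdlib-only sketch. It is not
        the ``airr`` validator. ``SCHEMA`` is a hypothetical two-field excerpt, and only two
        rules are applied: every listed field must appear in the header, and logical fields
        must contain ``T``, ``F``, or nothing::
        
            import csv
            import io
        
            # Hypothetical excerpt of the schema: field name -> type
            SCHEMA = {'sequence_id': 'string', 'productive': 'boolean'}
        
            def validate_tsv(handle):
                """Return a list of problems; an empty list means the file passed."""
                reader = csv.DictReader(handle, delimiter='\t')
                header = reader.fieldnames or []
                problems = ['missing required field: %s' % f
                            for f in SCHEMA if f not in header]
                # Data rows start on line 2, after the header
                for line_no, row in enumerate(reader, start=2):
                    for field, kind in SCHEMA.items():
                        value = row.get(field)
                        if kind == 'boolean' and value not in (None, '', 'T', 'F'):
                            problems.append('line %d: %s=%r is not T/F'
                                            % (line_no, field, value))
                return problems
        
            print(validate_tsv(io.StringIO('sequence_id\tproductive\ns1\tT\ns2\tyes\n')))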
        
        Combining Repertoire metadata and Rearrangement files
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        
        The ``airr`` package does not keep track of which repertoire metadata files
        are associated with which rearrangement files, so users need to manage those
        associations themselves. Within the data, however, the ``repertoire_id`` field forms
        the link. A typical program performs some computation on the rearrangements
        and needs access to the repertoire metadata as part of that computation.
        This example code shows the basic framework for doing so, in this case
        performing sex-specific computation based on the subject's ``sex`` field::
        
            import airr
        
            # Load the repertoires
            data = airr.load_repertoire('input.airr.json')
        
            # Put repertoires in dictionary keyed by repertoire_id
            rep_dict = { obj['repertoire_id'] : obj for obj in data['Repertoire'] }
        
            # Create an iterable for rearrangement data
            reader = airr.read_rearrangement('input.tsv')
            for row in reader:
                # Get the repertoire metadata for this rearrangement
                rep = rep_dict[row['repertoire_id']]
        
                # Check the subject's sex
                if rep['subject']['sex'] == 'male':
                    pass  # do male-specific computation here
                elif rep['subject']['sex'] == 'female':
                    pass  # do female-specific computation here
                else:
                    pass  # do computation for other or unrecorded values here
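        
        In practice, a rearrangement may reference a ``repertoire_id`` that is absent from
        the metadata, and the plain ``rep_dict[...]`` lookup above then raises ``KeyError``.
        A subject's ``sex`` may also be unrecorded. The stdlib-only sketch below tallies
        rearrangements by subject sex and handles both cases. Hypothetical in-memory
        records stand in for the two files::
        
            from collections import Counter
        
            # Hypothetical stand-ins for the repertoire dictionary and the reader
            rep_dict = {
                'rep1': {'subject': {'sex': 'female'}},
                'rep2': {'subject': {'sex': 'male'}},
                'rep3': {'subject': {}},
            }
            rows = [
                {'sequence_id': 's1', 'repertoire_id': 'rep1'},
                {'sequence_id': 's2', 'repertoire_id': 'rep2'},
                {'sequence_id': 's3', 'repertoire_id': 'rep3'},
                {'sequence_id': 's4', 'repertoire_id': 'rep9'},
            ]
        
            counts = Counter()
            for row in rows:
                rep = rep_dict.get(row['repertoire_id'])
                if rep is None:
                    counts['no metadata'] += 1  # repertoire_id not in the metadata
                    continue
                counts[rep['subject'].get('sex') or 'not recorded'] += 1
        
            print(counts)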
        
Keywords: AIRR,bioinformatics,sequencing,immunoglobulin,antibody,adaptive immunity,T cell,B cell,BCR,TCR
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics