Introduction
------------

The ``astropy.io.votable.dataorigin`` module extracts basic provenance information from a VOTable header.
This information is described in the DataOrigin IVOA Note: https://www.ivoa.net/documents/DataOrigin/.

DataOrigin covers both the query information (such as the publisher, contact, and software version)
and the dataset origin (such as the creator, bibliographic links, and URLs).

This API retrieves this metadata from the ``INFO`` elements of a VOTable.
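
As a rough picture of what reading DataOrigin from ``INFO`` involves, the following stdlib-only sketch (not
astropy's implementation) parses a small hand-written VOTable fragment and lists the ``name``, ``value`` and
text content of every ``INFO`` element. The names and values are borrowed from the example on this page; the
namespace URI is an assumption, and the hypothetical ``local_name`` helper strips whatever namespace is present::

    import xml.etree.ElementTree as ET

    # Hypothetical fragment: two INFO on VOTABLE, two INFO (with descriptions) on a RESOURCE.
    VOTABLE_XML = """<VOTABLE version="1.4" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
      <INFO name="publisher" value="CDS"/>
      <INFO name="contact" value="cds-question@unistra.fr"/>
      <RESOURCE type="results">
        <INFO name="ivoid" value="ivo://cds.vizier/j/aj/167/18">IVOID of underlying data collection</INFO>
        <INFO name="creator" value="Hong K.">First author or institution</INFO>
      </RESOURCE>
    </VOTABLE>"""

    def local_name(tag):
        """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
        return tag.rsplit("}", 1)[-1]

    def collect_infos(xml_text):
        """Return (name, value, content) for every INFO element, in document order."""
        root = ET.fromstring(xml_text)
        infos = []
        for elem in root.iter():  # depth-first walk, in document order
            if local_name(elem.tag) == "INFO":
                infos.append((elem.get("name"), elem.get("value"), (elem.text or "").strip()))
        return infos

    for name, value, content in collect_infos(VOTABLE_XML):
        print(f"{name}: {value} ({content})" if content else f"{name}: {value}")

Running it prints the four entries in document order, the two ``VOTABLE``-level ones first.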


Getting Started
---------------

For the following example, we first reconstruct a VOTable carrying DataOrigin information, based on a query to
the VizieR catalogue J/AJ/167/18. In practice, you would obtain this table directly from
the VO service of interest::

    >>> from astropy.io.votable.dataorigin import add_data_origin_info
    >>> from astropy.io.votable.tree import VOTableFile
    >>> from astropy.table import Column, Table
    >>> # For this example, the table data itself is irrelevant.
    >>> table = Table([
    ...     Column(name="id", data=[1, 2, 3, 4]),
    ...     Column(name="bmag", unit="mag", data=[5.6, 7.9, 12.4, 11.3])])
    >>> votable = VOTableFile().from_table(table)
    >>> votable.description = "Period variations of 32 contact binaries (Hong+, 2024)"
    >>> # Order is important here for the example.
    >>> add_data_origin_info(votable, "ivoid", "ivo://cds.vizier/j/aj/167/18",
    ...                      content="IVOID of underlying data collection")
    >>> add_data_origin_info(votable, "creator", "Hong K.",
    ...                      content="First author or institution")
    >>> add_data_origin_info(votable, "cites", "bibcode:2024AJ....167...18H",
    ...                      content="Article or Data origin sources")
    >>> add_data_origin_info(votable, "editor", "Astronomical Journal (AAS)",
    ...                      content="Editor name (article)")
    >>> add_data_origin_info(votable, "original_date", "2024",
    ...                      content="Year of the article publication")
    >>> # The rest in alphabetical order.
    >>> add_data_origin_info(votable, "citation", "doi:10.26093/cds/vizier.51670018")
    >>> add_data_origin_info(votable, "contact", "cds-question@unistra.fr")
    >>> add_data_origin_info(votable, "publication_date", "2024-11-06")
    >>> add_data_origin_info(votable, "publisher", "CDS")
    >>> add_data_origin_info(votable, "reference_url", "https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18")
    >>> add_data_origin_info(votable, "request_date", "2025-03-05T14:18:05")
    >>> add_data_origin_info(votable, "rights_uri", "https://cds.unistra.fr/vizier-org/licences_vizier.html")
    >>> add_data_origin_info(votable, "server_software", "7.4.5")
    >>> add_data_origin_info(votable, "service_protocol", "ivo://ivoa.net/std/ConeSearch/v1.03")


To extract the DataOrigin information from the VOTable::

    >>> from astropy.io.votable.dataorigin import extract_data_origin
    >>> data_origin = extract_data_origin(votable)
    >>> print(data_origin)
    publisher: CDS
    server_software: 7.4.5
    service_protocol: ivo://ivoa.net/std/ConeSearch/v1.03
    request_date: 2025-03-05T14:18:05
    contact: cds-question@unistra.fr
    <BLANKLINE>
    ivoid: ivo://cds.vizier/j/aj/167/18
    citation: doi:10.26093/cds/vizier.51670018
    reference_url: https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18
    rights_uri: https://cds.unistra.fr/vizier-org/licences_vizier.html
    creator: Hong K.
    editor: Astronomical Journal (AAS)
    cites: bibcode:2024AJ....167...18H
    original_date: 2024
    publication_date: 2024-11-06


Contents and metadata
---------------------

`astropy.io.votable.dataorigin.extract_data_origin` returns an `astropy.io.votable.dataorigin.DataOrigin` container made of:

* an `astropy.io.votable.dataorigin.QueryOrigin` container describing the request.
  ``QueryOrigin`` is considered to be unique for the whole VOTable.
  It includes metadata such as the publisher, the contact, the date of execution, and the query.

* a list of `astropy.io.votable.dataorigin.DatasetOrigin` containers, one for each element carrying DataOrigin information.
  ``DatasetOrigin`` describes the basic provenance of the queried datasets; each of its attributes is a list.
  It includes metadata such as the authors, the IVOID, and landing pages.
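
The split between the two containers can be sketched in plain Python. ``QuerySketch``, ``DatasetSketch`` and
``split_infos`` are hypothetical names, not astropy classes; the set of query-level names is only the part visible
in the printed example above plus ``query``, and treating every other name as dataset-level is a simplifying
assumption::

    from dataclasses import dataclass, field

    # Assumed (partial) list of query-level INFO names, taken from the printed example.
    QUERY_NAMES = {"publisher", "server_software", "service_protocol",
                   "request_date", "contact", "query"}

    @dataclass
    class QuerySketch:
        """One value per name: the query description is unique for the VOTable."""
        values: dict = field(default_factory=dict)

    @dataclass
    class DatasetSketch:
        """Every attribute holds a list, mirroring DatasetOrigin."""
        values: dict = field(default_factory=dict)

    def split_infos(elements):
        """``elements``: one list of (name, value) pairs per VO element carrying INFO.

        Returns a single QuerySketch and one DatasetSketch per element that has
        at least one dataset-level INFO.
        """
        query = QuerySketch()
        datasets = []
        for infos in elements:
            dataset = DatasetSketch()
            for name, value in infos:
                if name in QUERY_NAMES:
                    query.values[name] = value  # a later value overwrites an earlier one
                else:
                    dataset.values.setdefault(name, []).append(value)
            if dataset.values:
                datasets.append(dataset)
        return query, datasets

    query, datasets = split_infos([
        [("publisher", "CDS"), ("ivoid", "ivo://cds.vizier/j/aj/167/18"),
         ("creator", "Hong K."), ("request_date", "2025-03-05T14:18:05")],
    ])
    print(query.values["publisher"])      # CDS
    print(datasets[0].values["creator"])  # ['Hong K.']

As with ``data_origin.origin[0].creator``, the dataset-level value comes back as a list even when there is a single creator.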

Examples
--------

Get the publisher (data center) and the creator of the dataset::

    >>> print(data_origin.query.publisher)
    CDS
    >>> print(data_origin.origin[0].creator)
    ['Hong K.']

Other capabilities
------------------

The DataOrigin container also gives access to the underlying VO elements:

* Extract the list of `astropy.io.votable.tree.Info` elements:

    >>> # print each DataOrigin INFO with its description
    >>> for dataset_origin in data_origin.origin:
    ...     for info in dataset_origin.infos:
    ...         print(f"{info.name}: {info.value} ({info.content})")
    ivoid: ivo://cds.vizier/j/aj/167/18 (IVOID of underlying data collection)
    creator: Hong K. (First author or institution)
    cites: bibcode:2024AJ....167...18H (Article or Data origin sources)
    editor: Astronomical Journal (AAS) (Editor name (article))
    original_date: 2024 (Year of the article publication)
    ...

* Extract the tree node `astropy.io.votable.tree.Element`.
  The following example builds an APA-style citation from the header metadata:

    >>> # get the title stored in the element
    >>> origin = data_origin.origin[0]
    >>> vo_elt = origin.get_votable_element()
    >>> title = vo_elt.description if vo_elt else ""
    >>> print(f"APA: {', '.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}")
    APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018

* Add DataOrigin INFO elements to a VOTable:

    >>> from astropy.io.votable import dataorigin
    >>> dataorigin.add_data_origin_info(votable, "query", "Data center name")
    >>> dataorigin.add_data_origin_info(votable.resources[0], "creator", "Author name")
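
The APA one-liner in the list above can be factored into a helper that copes with several creators and a
missing title. ``format_apa`` is a hypothetical name and a minimal sketch: it reproduces the layout of that
example rather than every APA rule, and takes the first entry of list-valued attributes, as the example does.
In practice its arguments would be ``origin.creator``, ``origin.publication_date``, the element description,
``data_origin.query.publisher`` and ``origin.citation``::

    def format_apa(creators, publication_dates, title, publisher, citations):
        """Build an APA-like dataset citation from DataOrigin-style values.

        ``creators``, ``publication_dates`` and ``citations`` are lists, like
        DatasetOrigin attributes; ``title`` and ``publisher`` are strings.
        """
        authors = ", ".join(creators)
        date = publication_dates[0] if publication_dates else "n.d."  # APA "no date"
        citation = citations[0] if citations else ""
        title_part = f" {title}" if title else ""  # skip the title when the element has none
        # rstrip() removes the trailing space left when there is no citation
        return f"APA: {authors} ({date}).{title_part} [Dataset]. {publisher}. {citation}".rstrip()

    print(format_apa(["Hong K."], ["2024-11-06"],
                     "Period variations of 32 contact binaries (Hong+, 2024)",
                     "CDS", ["doi:10.26093/cds/vizier.51670018"]))

With the values used on this page, it prints the same line as the doctest above.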