Internet Archive Metadata
=========================
`Metadata <https://en.wikipedia.org/wiki/Metadata>`_ is data about data.
In the case of Internet Archive items, the metadata describes the contents of the items.
Metadata can include information such as the performance date for a concert, the name of the artist, and a set list for the event.
Metadata is a very important element of items in the Internet Archive.
Metadata allows people to locate and view information.
Items with little or poor metadata may never be seen and can become lost.
Note that metadata keys must be valid XML tags.
Please refer to the XML Naming Rules section `here <https://www.w3schools.com/xml/xml_elements.asp>`_.
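
As a rough illustration of the naming rules, the sketch below checks a candidate key against a simplified, ASCII-only reading of them: the key starts with a letter or underscore, continues with letters, digits, hyphens, underscores or periods, and does not begin with ``xml`` in any letter case. The helper name ``is_valid_metadata_key`` is hypothetical and is not part of the ``internetarchive`` library; the validation performed by Internet Archive itself may differ.

.. code:: python

    import re

    # Simplified, ASCII-only approximation of the XML naming rules (an assumption).
    _KEY_PATTERN = re.compile(r'[A-Za-z_][A-Za-z0-9_.-]*')

    def is_valid_metadata_key(key):
        """Return True if ``key`` looks like a valid XML tag name."""
        # Names beginning with "xml" (in any letter case) are rejected.
        if key.lower().startswith('xml'):
            return False
        return _KEY_PATTERN.fullmatch(key) is not None

Under these assumptions a key such as ``subject`` passes, while ``2nd-author`` (leading digit) and ``my key`` (contains a space) do not.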
Archive.org Identifiers
-----------------------
Each item at Internet Archive has an identifier. An identifier is composed of any unique combination of alphanumeric characters, underscores (``_``) and dashes (``-``). While there are no official limits, it is strongly suggested that identifiers be between 5 and 80 characters in length.
Identifiers must be unique across the entirety of Internet Archive, not simply unique within a single collection.
Once defined, an identifier **cannot** be changed. It travels with the item or object and is involved in every manner of accessing or referring to the item.
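
The character and length rules can be checked locally before an upload is attempted. The following is a minimal sketch, assuming "alphanumeric" means ASCII letters and digits; the helper name ``check_identifier`` is hypothetical. Uniqueness across Internet Archive cannot be verified this way.

.. code:: python

    import re

    # Alphanumeric characters, underscore and dash only (ASCII assumed).
    _IDENTIFIER_PATTERN = re.compile(r'[A-Za-z0-9_-]+')

    def check_identifier(identifier, min_len=5, max_len=80):
        """Return (is_well_formed, within_suggested_length) for ``identifier``."""
        well_formed = _IDENTIFIER_PATTERN.fullmatch(identifier) is not None
        # The 5 to 80 character range is a suggestion, not a hard limit.
        suggested_length = min_len <= len(identifier) <= max_len
        return well_formed, suggested_length

The two results are reported separately because only the character rule is a requirement; the length range is a recommendation.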
Standard Internet Archive Metadata Fields
-----------------------------------------
There are several standard metadata fields recognized for Internet Archive items.
Most metadata fields are optional.
addeddate
^^^^^^^^^
Contains the date on which the item was added to Internet Archive.
Please use an `ISO 8601`_ compatible format for this date.
For instance, these are all valid date formats:
- YYYY
- YYYY-MM-DD
- YYYY-MM-DD HH:MM:SS
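
The listed layouts can be checked mechanically. The sketch below tries each one in turn with ``datetime.strptime`` from the Python standard library. The helper name ``parse_archive_date`` is hypothetical, and ``strptime`` is slightly more lenient than the layouts suggest (for example, it accepts an unpadded month or day).

.. code:: python

    from datetime import datetime

    # The three layouts listed above, written as strptime patterns.
    _DATE_FORMATS = ('%Y', '%Y-%m-%d', '%Y-%m-%d %H:%M:%S')

    def parse_archive_date(value):
        """Return a datetime if ``value`` matches a listed layout, else None."""
        for fmt in _DATE_FORMATS:
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue  # try the next layout
        return None

A value such as ``15/07/2004`` matches none of the layouts, so the function returns ``None`` for it.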
While it is possible to set the ``addeddate`` metadata value it is not recommended.
This value is typically set by automated processes.
adder
^^^^^
The name of the account which added the item to the Internet Archive.
While it is possible to set the ``adder`` metadata value it is not recommended.
This value is typically set by automated processes.
collection
^^^^^^^^^^
A collection is a specialized item used for curation and aggregation of other items.
Assigning an item to a collection defines where the item may be located by a user browsing Internet Archive.
A collection **must** exist prior to assigning any items to it.
Currently collections can only be created by Internet Archive staff members.
Please `contact Internet Archive <mailto:info@archive.org?subject=[Collection Creation Request]>`_ if you need a collection created.
All items **should** belong to a collection.
If a collection is not specified at the time of upload, the item will be added to the `Community Texts <https://archive.org/details/opensource>`_ collection.
For testing purposes, you may upload to the ``test_collection`` collection. The following collections are also available to the public at the time of writing:
* `Community Audio <https://archive.org/details/opensource_audio>`_
* `Community Media <https://archive.org/details/opensource_media>`_
* `Community Software <https://archive.org/details/open_source_software>`_
* `Community Texts <https://archive.org/details/opensource>`_ (default collection)
* `Community Video <https://archive.org/details/opensource_movies>`_
* `Test collection <https://archive.org/details/test_collection>`_
contributor
^^^^^^^^^^^
The value of the ``contributor`` metadata field is information about the entity responsible for making contributions to the content of the item.
This is often the library, organization or individual making the item available on Internet Archive.
The value of this metadata field may contain HTML. ``<script>`` tags and CSS are not allowed.
coverage
^^^^^^^^
The extent or scope of the content of the material available in the item.
The value of the ``coverage`` metadata field may include geographic place, temporal period, jurisdiction, etc.
For items which contain multi-volume or serial content, place the statement of holdings in this metadata field.
creator
^^^^^^^
An entity primarily responsible for creating the files contained in the item.
credits
^^^^^^^
The participants in the production of the materials contained in the item.
The value of this metadata field may contain HTML. ``<script>`` tags and CSS are not allowed.
date
^^^^
The publication, production or other similar date of this item.
Please use an `ISO 8601`_ compatible format for this date.
description
^^^^^^^^^^^
A description of the item.
The value of this metadata field may contain HTML. ``<script>`` tags and CSS are not allowed.
language
^^^^^^^^
The primary language of the material available in the item.
While the ``language`` metadata field can hold any value, Internet Archive prefers `MARC21 Language Codes <https://www.loc.gov/marc/languages/language_name.html>`_.
licenseurl
^^^^^^^^^^
A URL to the license which covers the works contained in the item.
Internet Archive recommends (but does not require) `Creative Commons <https://creativecommons.org>`_ licensing.
Creative Commons provides a `license selector <https://creativecommons.org/choose/?partner=ia&exit_url=http%3A%2F%2Fwww.archive.org%2Fservices%2Flicense-chooser.php%3Flicense_url%3D%5Blicense_url%5D%26license_name%3D%5Blicense_name%5D%26license_image%3D%5Blicense_button%5D%26deed_url%3D%5Bdeed_url%5D&jurisdiction_choose=1>`_ for finding the correct license for your needs.
mediatype
^^^^^^^^^
The primary type of media contained in the item.
While an item can contain files of diverse mediatypes the value in this field defines the appearance and functionality of the item's detail page on Internet Archive.
In particular, the mediatype of an item defines what sort of online viewer is available for the files contained in the item.
The mediatype metadata field recognizes a limited set of values:
- ``audio``: The majority of audio items should receive this mediatype value.
  Items for the `Live Music Archive <https://www.archive.org/details/etree>`_ should instead use the ``etree`` value.
- ``collection``: Denotes the item as a collection to which other collections and items can belong.
- ``data``: This is the default value for mediatype.
  Items with a mediatype of ``data`` will be available in Internet Archive but you will not be able to browse to them.
  In addition there will be no online reader/player for the files.
- ``etree``: Items which contain files for the `Live Music Archive <https://www.archive.org/details/etree>`_ should have a mediatype value of ``etree``.
  The Live Music Archive has very specific upload requirements.
  Please consult the `documentation <https://www.archive.org/about/faqs.php#Live_Music_Archive>`_ for the Live Music Archive prior to creating items for it.
- ``image``: Items which predominantly consist of image files should receive a mediatype value of ``image``.
  Currently these items will not be available for browsing or online viewing in Internet Archive but they will require no additional changes when this mediatype receives additional support in the Archive.
- ``movies``: All videos (television, features, shorts, etc.) should receive a mediatype value of ``movies``.
  These items will be displayed with an online video player.
- ``software``: Items with a mediatype of ``software`` are accessible to browse via Internet Archive's `software collection <http://www.archive.org/details/software>`_.
  There is no online viewer for software but all files are available for download.
- ``texts``: All text items (PDFs, EPUBs, etc.) should receive a mediatype value of ``texts``.
- ``web``: The ``web`` mediatype value is reserved for items which contain web archive `WARC <http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml>`_ files.
If the mediatype value you set is not in the list above it will be saved but ignored by the system. The item will be treated as though it has a mediatype value of ``data``.
If a value is not specified for this field it will default to ``data``.
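
The fallback behaviour described above can be summarized in a few lines. This is a sketch of the documented rule only, assuming the metadata is held in a plain dictionary with a single string value for ``mediatype``; the helper name ``effective_mediatype`` is hypothetical and does not reflect how Internet Archive implements the rule.

.. code:: python

    # Values listed above as recognized by the mediatype field.
    RECOGNIZED_MEDIATYPES = frozenset({
        'audio', 'collection', 'data', 'etree', 'image',
        'movies', 'software', 'texts', 'web',
    })

    def effective_mediatype(metadata):
        """Return the mediatype the item would be treated as having."""
        value = metadata.get('mediatype')
        if value in RECOGNIZED_MEDIATYPES:
            return value
        # Missing or unrecognized values are treated as ``data``.
        return 'data'

For example, an item saved with the unrecognized value ``video`` would keep that value in its metadata but be treated as ``data``.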
noindex
^^^^^^^
By default, all items have their metadata included in the Internet Archive search engine.
To disable indexing in the search engine, include a ``noindex`` metadata tag.
The value of the tag does not matter.
Its presence alone is enough to exclude the metadata from the search engine.
If an item's metadata has already been indexed in the search engine, setting ``noindex`` will remove it from the index.
Items whose metadata is not included in the search engine index are not considered "public" per se and therefore will not have a value in the ``publicdate`` metadata field (see below).
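
Because only the presence of the tag matters, a check for it should test the key and ignore the value. A minimal sketch, assuming the metadata is held in a plain dictionary (the helper name ``is_searchable`` is hypothetical):

.. code:: python

    def is_searchable(metadata):
        """Return True unless a ``noindex`` key is present, whatever its value."""
        return 'noindex' not in metadata

Note that under this rule even a value such as ``false`` disables indexing, since the tag is still present.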
notes
^^^^^
Contains user-defined information about the item.
The value of this metadata field may contain HTML. ``<script>`` tags and CSS are not allowed.
pick
^^^^
On the v1 archive.org site, each collection page may include a "Staff Picks" section.
This section will highlight a single item in the collection.
This item will be selected at random from the items with a ``pick`` metadata value of ``1``.
If there are no items with this ``pick`` metadata value the "Staff Picks" section will not appear on the collection page.
By default, new items have no ``pick`` metadata value.
**Note:** v2 of the archive.org website does not make use of this value.
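
The v1 selection rule can be sketched as follows. This only illustrates the behaviour described above and is not the site's actual code; the helper name ``choose_staff_pick`` and the representation of items as dictionaries are assumptions.

.. code:: python

    import random

    def choose_staff_pick(items, rng=random):
        """Return a random item whose ``pick`` value is 1, or None if there is none."""
        candidates = [item for item in items if str(item.get('pick')) == '1']
        if not candidates:
            return None  # no "Staff Picks" section would be shown
        return rng.choice(candidates)

Passing a seeded ``random.Random`` instance as ``rng`` makes the choice reproducible, which can be useful in tests.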
publicdate
^^^^^^^^^^
Items which have had their metadata included in the Internet Archive search engine index are considered to be public.
The date the metadata is added to the index is the public date for the item.
Please use an `ISO 8601`_ compatible format for this date.
For instance, these are all valid date formats:
- YYYY
- YYYY-MM-DD
- YYYY-MM-DD HH:MM:SS
While it is possible to set the ``publicdate`` metadata value it is not recommended.
This value is typically set by automated processes.
publisher
^^^^^^^^^
The publisher of the material available in the item.
rights
^^^^^^
A statement of the rights held in and over the files in the item.
The value of this metadata field may contain HTML. ``<script>`` tags and CSS are not allowed.
subject
^^^^^^^
Keyword(s) or phrase(s) that may be searched for to find your item.
This field can contain multiple values:

.. code:: bash

    $ ia metadata <identifier> --modify='subject:foo' --modify='subject:bar'

Or, in Python:

.. code:: python

    >>> from internetarchive import modify_metadata
    >>> md = {'subject': ['foo', 'bar']}
    >>> r = modify_metadata('<identifier>', md)

It is helpful but **not** necessary to use `Library of Congress Subject Headings <http://id.loc.gov/authorities/subjects.html>`_ for the value of this metadata field.
title
^^^^^
The title for the item.
This appears in the header of the item's detail page on Internet Archive.
If a value is not specified for this field it will default to the identifier for the item.
updatedate
^^^^^^^^^^
The date on which an update was made to the item.
This field is repeatable.
Please use an `ISO 8601`_ compatible format for this date.
While it is possible to set the ``updatedate`` metadata value it is not recommended.
This value is typically set by automated processes.
updater
^^^^^^^
The name of the account which updated the item.
This field is repeatable.
While it is possible to set the ``updater`` metadata value it is not recommended.
This value is typically set by automated processes.
uploader
^^^^^^^^
The name of the account which uploaded the file(s) to the item.
The uploader has ownership over the item and is allowed to maintain it.
This value is set by automated processes.
Custom Metadata Fields
----------------------
Internet Archive strives to be metadata agnostic, enabling users to define the metadata format which best suits the needs of their material.
In addition to the standard metadata fields listed above you may also define as many custom metadata fields as you require.
These metadata fields can be defined ad hoc at item creation or metadata editing time and do not have to be defined in advance.
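
To illustrate how custom fields sit alongside standard ones, the sketch below builds a metadata dictionary and lists the keys that are not among the standard fields documented above. The helper name ``custom_fields`` and the fields ``venue`` and ``taper`` are made up for this example.

.. code:: python

    # Standard field names documented in the section above.
    STANDARD_FIELDS = frozenset({
        'addeddate', 'adder', 'collection', 'contributor', 'coverage',
        'creator', 'credits', 'date', 'description', 'language',
        'licenseurl', 'mediatype', 'noindex', 'notes', 'pick',
        'publicdate', 'publisher', 'rights', 'subject', 'title',
        'updatedate', 'updater', 'uploader',
    })

    def custom_fields(metadata):
        """Return the sorted keys in ``metadata`` that are not standard fields."""
        return sorted(key for key in metadata if key not in STANDARD_FIELDS)

    # Hypothetical metadata for a concert recording; ``venue`` and ``taper``
    # are custom fields defined ad hoc.
    md = {
        'title': 'Example Concert',
        'date': '2004-07-15',
        'venue': 'Example Hall',
        'taper': 'Example Taper',
    }

A dictionary like ``md`` could then be passed to ``modify_metadata`` in the same way as in the ``subject`` example above.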
.. _ISO 8601: https://en.wikipedia.org/wiki/ISO_8601