File: db.rst

package info (click to toggle)
python-ase 3.26.0-2
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 15,484 kB
  • sloc: python: 148,112; xml: 2,728; makefile: 110; javascript: 47
file content (604 lines) | stat: -rw-r--r-- 17,132 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
.. module:: ase.db

====================
A database for atoms
====================

ASE has its own database that can be used for storing and retrieving atoms and
associated data in a compact and convenient way.

There are currently five back-ends:

JSON_:
    Simple human-readable text file with a ``.json`` extension.
SQLite3_:
    Self-contained, server-less, zero-configuration database.  Lives in a file
    with a ``.db`` extension.
PostgreSQL_:
    Server based database.
MySQL_:
    Server based database.
MariaDB_:
    Server based database.

The JSON and SQLite3 back-ends work "out of the box", whereas PostgreSQL, MySQL
and MariaDB requires a server (See :ref:`server` or :ref:`MySQL_server`).

.. note::

   You need to install the back-ends for PostgreSQL, MySQL and MariaDB with::

      pip install git@gitlab.com:ase/ase-db-backends

There is a command-line tool called :ref:`ase-db` that can be
used to query and manipulate databases and also a `Python interface`_.

.. _JSON: http://www.json.org/
.. _SQLite3: https://www.sqlite.org/index.html
.. _PostgreSQL: https://www.postgresql.org/
.. _MySQL: https://www.mysql.com/
.. _MariaDB: https://mariadb.org/

.. contents::


What's in the database?
=======================

Every row in the database contains:

* all the information stored in the :class:`~ase.Atoms` object
  (positions, atomic numbers, ...)
* calculator name and parameters (if a calculator is present)
* already calculated properties such as energy and forces
  (if a calculator is present)
* key-value pairs (for finding the calculation again)
* an integer ID (unique for each database) starting with 1 and always
  increasing for each new row
* a unique ID which is a 128 bit random number which should be globally
  unique (at least in the lifetime of our universe)
* constraints (if present)
* user-name
* creation and modification time


.. _ase-db:

ase db
======

The :ref:`ase-db` command-line tool can be used to query databases and for
manipulating key-value pairs.  Try::

    $ ase db --help

Example: Show all rows of SQLite database abc.db:

.. literalinclude:: ase-db.txt

Show all details for a single row:

.. literalinclude:: ase-db-long.txt

.. seealso::

    * :ref:`cli`
    * `texase <https://github.com/steenlysgaard/texase>`_ is a
      terminal user interface (TUI) for ASE databases


Querying
--------

Here are some example query strings:

.. list-table::
    :widths: 25 75

    * - Cu
      - contains copper
    * - H<3
      - less than 3 hydrogen atoms
    * - Cu,H<3
      - contains copper and has less than 3 hydrogen atoms
    * - H2O
      - At least two hydrogens and at least one oxygen
    * - formula=H2O
      - Exactly two hydrogens and one oxygen
    * - v3
      - has 'v3' key
    * - abc=bla-bla
      - has key 'abc' with value 'bla-bla'
    * - v3,abc=bla-bla
      - both of the above
    * - calculator=nwchem
      - calculations done with NWChem
    * - 2.2<bandgap<3.0
      - 'bandgap' key has value between 2.2 and 3.0
    * - natoms>=10
      - 10 or more atoms
    * - id=2345
      - specific id
    * - age<1h
      - not older than 1 hour
    * - age>1y
      - older than 1 year
    * - pbc=TTT
      - Periodic boundary conditions along all three axes
    * - pbc=TTF
      - Periodic boundary conditions along the first two axes (F=False, T=True)

These names are special:

.. list-table::
    :widths: 25 75

    * - id
      - integer identifier
    * - natoms
      - number of atoms
    * - pbc
      - Periodic boundary conditions
    * - formula
      - formula
    * - energy
      - potential energy
    * - charge
      - total charge
    * - magmom
      - total magnetic moment
    * - calculator
      - name of calculator
    * - user
      - who did it
    * - age
      - age of calculation (use s, m, h, d, w, M and y for second, minute,
        hour, day, week, month and year respectively)


Integration with other parts of ASE
===================================

ASE's :func:`ase.io.read` function can also read directly from databases:

>>> from ase.io import read
>>> a = read('abc.db@42')
>>> a = read('abc.db@id=42')  # same thing
>>> b = read('abc.db@v3,abc=H')

Also the :ref:`ase-gui` program can read from databases using the
same syntax.


.. _ase-db-web:

Browse database with your web-browser
=====================================

You can use your web-browser to look at and query databases like this::

    $ ase db abc.db -w
    $ firefox http://0.0.0.0:5000/

Click individual rows to see details.  See the CMR_ web-page for an example of
how this works.

.. _CMR: https://cmrdb.fysik.dtu.dk/


Python Interface
================

.. module:: ase.db.core

First, we :func:`connect` to the database:

>>> from ase.db import connect
>>> db = connect('abc.db')

or

>>> import ase.db
>>> db = ase.db.connect('abc.db')

Let's do a calculation for a hydrogen molecule and write some results to a
database:

>>> from ase import Atoms
>>> from ase.calculators.emt import EMT
>>> h2 = Atoms('H2', [(0, 0, 0), (0, 0, 0.7)])
>>> h2.calc = EMT()
>>> h2.get_forces()
array([[ 0.   ,  0.   , -9.803],
       [ 0.   ,  0.   ,  9.803]])

Write a row to the database with a key-value pair (``'relaxed'``, ``False``):

>>> db.write(h2, relaxed=False)
1

The :meth:`~Database.write` method returns an integer id.

Do one more calculation and write results:

>>> from ase.optimize import BFGS
>>> BFGS(h2).run(fmax=0.01)  # doctest: +SKIP
BFGS:   0  12:49:25        1.419427       9.8029
BFGS:   1  12:49:25        1.070582       0.0853
BFGS:   2  12:49:25        1.070544       0.0236
BFGS:   3  12:49:25        1.070541       0.0001
>>> db.write(h2, relaxed=True)
2

Loop over selected rows using the :meth:`~Database.select` method:

>>> for row in db.select(relaxed=True):
...     print(row.forces[0, 2], row.relaxed)  # doctest: +SKIP
-9.8029057329 False
-9.2526347333e-05 True

The :meth:`~Database.select` method will generate :ref:`row objects`
that one can loop over.

Write the energy of an isolated hydrogen atom to the database:

>>> h = Atoms('H')
>>> h.calc = EMT()
>>> h.get_potential_energy()
3.21
>>> db.write(h)
3

Select a single row with the :meth:`~Database.get` method:

>>> row = db.get(relaxed=1, calculator='emt')
>>> for key in row:
...    print('{0:22}: {1}'.format(key, row[key]))  # doctest: +SKIP
pbc                   : [False False False]
relaxed               : True
calculator_parameters : {}
user                  : jensj
mtime                 : 15.3439399027
calculator            : emt
ctime                 : 15.3439399027
positions             : [[ ... ]]
id                    : 2
cell                  : [[ 0.  0.  0.] [ 0.  0.  0.] [ 0.  0.  0.]]
forces                : [[ ... ]]
energy                : 1.07054126233
unique_id             : bce90ff3ea7661690b54f9794c1d7ef6
numbers               : [1 1]

Calculate the atomization energy and :meth:`~Database.update` a row in
the database:

>>> e2 = row.energy
>>> e1 = db.get(H=1).energy
>>> ae = 2 * e1 - e2
>>> print(ae)
5.34945873767
>>> id = db.get(relaxed=1).id
>>> db.update(id, atomization_energy=ae)
1

Delete a single row:

>>> del db[db.get(relaxed=0).id]

or use the :meth:`~Database.delete` method to delete several rows.


Description of a row
--------------------

The first 9 keys (from "id" to "positions") are always present --- the rest
may be there:

=====================  =================================  ============  ======
key                    description                        datatype      shape
=====================  =================================  ============  ======
id                     Local database id                  int
unique_id              Globally unique hexadecimal id     str
ctime                  Creation time                      float
mtime                  Modification time                  float
user                   User name                          str
numbers                Atomic numbers                     int           (N,)
pbc                    Periodic boundary condition flags  bool          (3,)
cell                   Unit cell                          float         (3, 3)
positions              Atomic positions                   float         (N, 3)
initial_magmoms        Initial atomic magnetic moments    float         (N,)
initial_charges        Initial atomic charges             float         (N,)
masses                 Atomic masses                      float         (N,)
tags                   Tags                               int           (N,)
momenta                Atomic momenta                     float         (N, 3)
constraints            Constraints                        list of dict
energy                 Total energy                       float
forces                 Atomic forces                      float         (N, 3)
stress                 Stress tensor                      float         (6,)
dipole                 Electrical dipole                  float         (3,)
charges                Atomic charges                     float         (N,)
magmom                 Magnetic moment                    float
magmoms                Atomic magnetic moments            float         (N,)
calculator             Calculator name                    str
calculator_parameters  Calculator parameters              dict
=====================  =================================  ============  ======


Extracting Atoms objects from the database
------------------------------------------

If you want an :class:`~ase.Atoms` object insted of an
:class:`~ase.db.row.AtomsRow` object, you should use the
:meth:`~Database.get_atoms` method:

>>> h2 = db.get_atoms(H=2)

Add additional data
-------------------

When you write a row to a database using the :meth:`~Database.write` method,
you can add key-value pairs where the values can be
strings, floating point numbers, integers and booleans:

>>> db.write(atoms, functional='LDA', distance=7.2)

More complicated data can be written like this:

>>> db.write(atoms, ..., data={'parents': [7, 34, 14], 'stuff': ...})

and accessed like this:

>>> row = db.get(...)
>>> row.data.parents
[7, 34, 14]


.. _row objects:

Row objects
-----------

There are three ways to get at the columns of a row:

1) as attributes (``row.key``)

2) indexing (``row['key']``)

3) the :meth:`~ase.db.row.AtomsRow.get` method (``row.get('key')``)

The first two will fail if there is no ``key`` column whereas the last will
just return ``None`` in that case.  Use ``row.get('key', ...)`` to use
another default value.

.. autoclass:: ase.db.row.AtomsRow
    :members:
    :member-order: bysource


Writing and updating many rows efficiently
------------------------------------------

If you do this::

    db = connect('mols.db')
    for mol in molecules:
        db.write(mol, ...)

the database will make sure that each molecule is written to permanent
storage (typically a harddisk) before it moves on to the next molecule.  This
can be quite slow.  To speed this up, you can write all the molecules in a
single transaction like this::

    with connect('mols.db') as db:
        for mol in molecules:
            db.write(mol, ...)

When the for-loop is done, the database will commit (or roll back if there
was an error) the transaction.

Similarly, if you want to :meth:`~Database.update` many rows, you should
do it in one transaction::

    with db:
        for id in ...:
            db.update(id, foo='bar')


Writing rows in parallel
------------------------

Say you want to run a series of jobs and store the calculations in one
database::

    for name in many_molecules:
        mol = read(name)
        calculate_something(mol)
        db.write(mol, name=name)

With four extra lines (see the :meth:`~Database.reserve` method)::

    for name in many_molecules:
        id = db.reserve(name=name)
        if id is None:
            continue
        mol = read(name)
        calculate_something(mol)
        db.write(mol, id=id, name=name)

you will be able to run several jobs in parallel without worrying about two
jobs trying to do the same calculation.  The :meth:`~Database.reserve` method
will write an empty row with the ``name`` key and return the ID of that row.
Other jobs trying to make the same reservation will fail.  While the jobs are
running, you can keep an eye on the ongoing (reserved) calculations by
identifying empty rows::

    $ ase db many_results.db natoms=0


More details
------------

Use this function for getting a connection to a database:

.. autofunction:: connect

Here is a description of the database object:

.. autoclass:: ase.db.core.Database
    :members:
    :member-order: bysource
    :exclude-members: write, reserve, update

    .. decorators hide these three from Sphinx, so we add them by hand:

    .. automethod:: write(atoms, id=None, key_value_pairs={}, data={}, **kwargs)
    .. automethod:: reserve(**key_value_pairs)
    .. automethod:: update(id, atoms=None, delete_keys=[], data=None, **add_key_value_pairs)

    .. attribute:: metadata

       Dictionary


.. _metadata:

Metadata
--------

It's can be useful to add a discription of the data that's in a database.
This description will be used when browsing the database with a
:ref:`web-browser <ase-db-web>`.
You can add the desciption using the :attr:`~Database.metadata` attribute:

>>> db.metadata = {
...     'title': 'Project 1',
...     'key_descriptions':
...         {'v0': ('Voltage', 'Longer description ...', 'V')},
...     'default_columns': ['id', 'formula', 'v0']}

ASE already knows all about the following keys:

.. csv-table::
    :file: known-keys.csv
    :header-rows: 1
    :widths: 2 3 4 2

You can also write/read to/from JSON using::

    $ ase db proj1.db --set-metadata metadata.json
    $ ase db proj1.db --show-metadata > metadata.json


External Tables
----------------

If the number of *key_value_pairs* becomes large, for example when
saving a large number of features for your machine learning model, ASE
DB offers an alternative way of storing them. Internally ASE can
create a dedicated table to store groups of *key_value_pairs*. You can
store a group of *key_value_pairs* in a separate table named
*features* by:

>>> atoms = Atoms()
>>> no_features = 5000
>>> feature_dict = dict(('feature' + str(i), i) for i in range(no_features))
>>> id = db.write(atoms, external_tables={'features': feature_dict})

Values stored in external tables can be accessed using:

>>> row = db.get(id=id)
>>> f1 = row['features']['feature1']
>>> f4999 = row['features']['feature4999']


.. _server:

Running a PostgreSQL server
===========================

.. highlight:: bash

With your PostgreSQL server up and running, you should run the following
command as the ``postgres`` user::

    $ sudo -u postgres psql

Then you create an 'ase' user and one database for each project you have::

    postgres=# create user ase login password 'pw';
    postgres=# create database project1;
    postgres=# create database project2;

Show databases and quit::

    postgres=# \l
    postgres=# \q

You should now be able to
query the database using a URI like
``postgresql://user:pw@host:port/dbname``::

    $ PROJ1=postgresql://ase:pw@localhost:5432/project1
    $ ase db $PROJ1

If you have some data in a ``data.db`` SQLite3 file, then you can insert that
into the PostgreSQL database like this::

    $ ase db data.db --insert-into $PROJ1

Now you can start the Flask_\ -app ``ase.db.app.app`` with any WSGI_
compatible server.

.. note::

    Please review the code carefully before exposing the ``ase.db.app`` to
    the internet or `bad things <https://xkcd.com/327/>`__ could happen.

.. _Flask: https://palletsprojects.com/p/flask/
.. _WSGI: https://www.python.org/dev/peps/pep-3333/


.. _MySQL_server:

Running a MySQL server
========================

ASE DB can also be run with a MySQL server. First, we need to get the MySQL
server up and running. There are many online resources describing how to to
that, but on a Ubuntu system the following should work::

    $ sudo apt-get install mysql-server
    $ sudo mysql_secure_installation

Then we need to check if the server is running::

    $ systemctl status mysql.service

if it is not running you can start the service by::

    $ systemctl start mysql.service

Note that on some Linux distributions *mysql.service* should be replaced by
*mysqld.service*.

Once the service is running, we can enter the MySQL shell::

    $ mysql -u root -p

where we assume that there is a user named **root**, that will be prompted
for a password. Now, we can create a user::

    mysql> CREATE USER 'ase'@'localhost' IDENTIFIED BY 'strongPassword';

and then a database for our project::

    mysql> CREATE DATABASE my_awesome_project;

We need to give the ase user privileges to edit this database::

    mysql> GRANT ALL PRIVILEGES ON my_awesome_project.* TO 'ase'@'localhost' IDENTIFIED BY 'strongPassword';

From a Python script we can now connect to the database via

>>> mysql_url = 'mysql://ase:strongPassword@localhost:3306/my_awesome_project'
>>> connect(mysql_url)  # doctest: +SKIP