1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
|
.. _cookbook:
Cookbook
========
In this section, we will show by example how to complete common tasks with idiomatic use of loompy.
Working with a newly created file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loompy 2 changes the behaviour of ``loompy.create``: it no longer returns a value. Thus in order to work with a newly created file you have to do this:
.. code:: python
loompy.create("filename.loom", m, row_attrs, col_attrs)
with loompy.connect("filename.loom") as ds:
....do something with ds
# File closes automatically
The reason for the change is that we would often create files without closing the returned file handle, which causes issues especially in multi-process scenarios.
Note: if you simply want to create the file, and not access it, there is no need to use a ``with`` statement:
.. code:: python
loompy.create("filename.loom", m, row_attrs, col_attrs)
This will leave the file closed.
Loading attributes from Pandas
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you have your metadata in a Pandas DataFrame, you can easily use it to create a Loom file. You will
need one DataFrame for the column metadata and one for the row metadata. Convert each DataFrame into a dictionary
of lists:
.. code:: python
df_row_metadata = ... # A pandas DataFrame holding row metadata
df_col_metadata = ... # A pandas DataFrame holding column metadata
data = ... # A numpy ndarray holding the main dataset
loompy.create(filename, data, df_row_metadata.to_dict("list"), df_col_metadata.to_dict("list"))
Combining data using scan() and new()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We often want to scan through a number of input files (for example, raw
data files from multiple experiments), select a subset of the columns (e.g. cells passing QC)
and write them to a new file. This can be accomplished by creating an empty file using ``loompy.new()`` and
then filling it up using the ``scan()`` method.
For example, let's select cells that have more than 500 detected UMIs in each of several files:
.. code:: python
with loompy.new(out_file) as dsout: # Create a new, empty, loom file
for f in input_files:
with loompy.connect(f) as ds:
totals = ds.map([np.sum], axis=1)[0]
cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
for (ix, selection, view) in ds.scan(items=cells, axis=1):
dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)
Note that by using ``new()`` we will first be creating the new file, then appending columns to it.
But what if the input files do not have their rows in the same order? ``scan()`` accepts a ``key`` argument
to designate a primary key; each view is then returned sorted on the primary key on the *other axis*.
For example, if you're scanning across columns, you should provide a row attribute as key, and each view will be sorted
on that attribute.
Here's the same example, but this time we provide the ``key="Accession"`` argument to ensure that the input files
are sorted on the accession identifier along rows:
.. code:: python
with loompy.new(out_file) as dsout: # Create a new, empty, loom file
for f in input_files:
with loompy.connect(f) as ds:
totals = ds.map([np.sum], axis=1)[0]
cells = np.where(totals > 500)[0] # Select the cells that passed QC (totals > 500)
for (ix, selection, view) in ds.scan(items=cells, axis=1, key="Accession"):
dsout.add_columns(view.layers, col_attrs=view.ca, row_attrs=view.ra)
Fitting an incremental PCA
^^^^^^^^^^^^^^^^^^^^^^^^^^
Incremental algorithms are a powerful way of working with datasets that won't fit in RAM. For
example, we can use incremental PCA to learn a PCA transform by batch-wise partial fits:
.. code:: python
from sklearn.decomposition import IncrementalPCA
genes = (ds.ra.Selected == 1)
pca = IncrementalPCA(n_components=50)
for (ix, selection, view) in ds.scan(axis=1):
self.pca.partial_fit(view[genes, :].transpose())
|