.. _context: =========================== Context and Metadata system =========================== .. contents:: Jump to: :local: -------- Overview -------- The Context and Metadata system is intended to provide easy programmatic access to combinations of time-ordered detector data and supporting data products ("metadata"). The section on :ref:`context-section` deals with the configuration files that describe a dataset, and the use of the Context object to load data and metadata into sotodlib containers; the section on :ref:`metadata-section` describes how to store and index metadata information for use with this system. Information on TOD indexing and loading can be found in the section on :ref:`ObsFileDb `. .. _context-section: ------- Context ------- .. py:module:: sotodlib.core Assuming someone has set up a Context for a particular dataset, you would instantiate it like this:: from sotodlib.core import Context ctx = Context('path/to/the/context.yaml') This will cause the specified ``context.yaml`` file to be parsed, as well as user and site configuration files, if those have been set up. Once the Context is loaded, you will probably have access to the various databases (``detdb``, ``obsdb``, ``obsfiledb``) and you'll be able to load TOD and metadata using ``get_obs(...)``. Dataset, User, and Site Context files ===================================== The three configuration files are: Dataset Context File This will often be called ``context.yaml`` and will contain a description of how to access a particular TOD data set (for example a set of observations output from a particular simulation run) and supporting metadata (including databases listing the observations, detector properties, and intermediate analysis products such as cuts or pointing offsets). When instantiating a Context object, the path to this file will normally be the only argument. Site Context File This file contains settings that are common to a particular computing site but are likely to differ from one site to the next. It is expected that a single file will be made available to all sotodlib users on a system. The main purpose of this is to describe file locations, so that the Dataset Context File can be written in a way that is portable between computing sites. User Context File This is a yaml file contains user-specific settings, and is loaded after the Site Context File but before the Dataset Context File. This plays the same role as Site Context but allows for per-user tweaking of parameters. The User and Site Context Files will be looked for in certain places if not specified explicitly; see :class:`Context` for details. Annotated Example ================= The Dataset Context File is yaml, that decodes to a dict. Here is an annotated example. .. code-block:: yaml # Define some "tags". These are string variables that are eligible # for substitution into other settings. The most common use is to # establish base paths for data and metadata products. tags: actpol_shared: /mnt/so1/shared/data/actpol/depots/scratch depot_scratch: /mnt/so1/shared/data/actpol/depots/scratch metadata_lib: /mnt/so1/shared/data/actpol/depots/scratch/uranus_200327/metadata # List of modules to import. Importing modules such as # moby2.analysis.socompat or sotodlib.io.metadata causes certain IO # functions to be registered for the metadata system to use. imports: - sotodlib.io.metadata - moby2.analysis.socompat # The basic databases. The sotodlib TOD loading system uses an # ObsFileDb to determine what files to read. The metadata association # and loading system uses a DetDb and an ObsDb to find and read # different kinds of metadata. obsfiledb: '{metadata_lib}/obsfiledb_200407.sqlite' detdb: '{metadata_lib}/detdb_200109.sqlite' obsdb: '{metadata_lib}/obsdb_200618.sqlite' # Additional settings related to TOD loading. obs_colon_tags: ['band'] obs_loader_type: actpol_moby2 # A list of metadata products that should be loaded along with the # TOD. Each entry in the list points to a metadata database (an # sqlite file) and specifies the name under which that information # should be associated (unpacked) in the loaded data structure. # The entries here are all fairly simple -- see documentation for # more complex examples. metadata: - db: "{metadata_lib}/cuts_s17_c11_200327_cuts.sqlite" unpack: "glitch_flags&flags" - db: "{metadata_lib}/cuts_s17_c11_200327_planet_cuts.sqlite" unpack: "source_flags&flags" - db: "{metadata_lib}/cal_s17_c11_200327.sqlite" unpack: "relcal&cal" - db: "{metadata_lib}/timeconst_200327.sqlite" unpack: "timeconst&" loader: "PerDetectorHdf5" - db: "{metadata_lib}/abscal_190126.sqlite" unpack: "abscal&cal" - db: "{metadata_lib}/detofs_200218.sqlite" unpack: "focal_plane" - db: "{metadata_lib}/pointofs_200218.sqlite" unpack: "pointofs" With a context like the one above, a user can load a TOD and its supporting data very simply:: from sotodlib.core import Context from moby2.analysis import socompat # For special ACT loader functions context = Context('context.yaml') # Get a random obs_id from the ObsDb we loaded: context.obsdb.get()[10] # output is: OrderedDict([('obs_id', '1500022312.1500087647.ar6'), ('timestamp', 1500022313.0)]) # Load the TOD and metadata for that obs_id: tod = context.get_obs('1500022312.1500087647.ar6') # The result is an AxisManager with members [axes]: # - signal ['dets', 'samps'] # - timestamps ['samps'] # - flags ['samps'] # - boresight ['samps'] # - array_data ['dets'] # - glitch_flags ['dets', 'samps'] # - source_flags ['dets', 'samps'] # - relcal ['dets'] # - timeconst ['dets'] # - abscal ['dets'] # - focal_plane ['dets'] # - pointofs ['dets'] Context Schema ============== Here are all of the top-level entries with special meanings in the Context system: ``tags`` A map from string to string. This entry is treated in a special way when the Site, User, and Dataset context files are evalauted in series; see below. ``imports`` A list of modules that should be imported prior to attempting any metadata operations. The purpose of this is to allow IO functions to register themselves for use by the Metadata system. This list will usually need to include at least ``sotodlib.io.metadata``. ``obsfiledb``, ``obsdb``, ``detdb`` Each of these should provide a string. The string represents the path to files carrying an ObsFileDb, ObsDb, and DetDb, respectively. These are all technically optional but it will be difficult to load TOD data with the ObsFileDb and it will be difficult to load metadata without the ObsDb and DetDb. ``obsfiledb.prefix`` Optional specification of ``prefix`` attribute for the ObsFileDb. (This will be ignored if the ObsFileDb specifies absolute paths.) ``obs_colon_tags`` A list of strings. The strings in this list must refer to columns from the DetDb. When a string appears in this list, then the values appearing in that column of the DetDb may be used to modify an obs_id when requestion TOD data. For example, suppose DetDb has a column 'band' with values ['f027', 'f039', ...]. Suppose that ``'obs201230'`` is an observation ID for an array that has detectors in bands 'f027' and 'f039'. Then passing ``'obs201230:f027'`` to Context.get_obs will read and return only the timestream and metadata for the 'f027' detectors. ``obs_loader_type`` A string, giving the name of a loader function that should be used to load the TOD. The functions are registered in the module variable ``sotodlib.core.OBSLOADER_REGISTRY``; also see :func:`sotodlib.core.context.obsloader_template`. ``metadata`` A list of metadata specs. Each metadata spec is a dict with the schema as described by :class:`metadata.MetadataSpec`. ``context_hooks`` A string that identifies a set of functions to hook into particular parts of the Context system. See uses of _call_hook() in the Context code, for any implemented hook function names and their signatures. To use this feature, an imported module must register the hook set in Context.hook_sets (a dict), e.g. ``Context.hook_sets["my_sim_hooks"] = {"on-context-ready": ...}``, and then also set ``context_hooks: "my_sim_hooks"`` in the context.yaml. Context system APIs =================== Context ------- .. autoclass:: Context :special-members: __init__ :members: obsloader --------- Data formats are abstracted in the Context system, and "obsloader" functions provide the implementations to load data for a particular storage format. The API is documented in the ``obsloader_template`` function: .. autofunction:: sotodlib.core.context.obsloader_template .. py:module:: sotodlib.core.metadata .. _metadata-section: -------- Metadata -------- Background ========== The "metadata" we are talking about here consists of things like detector pointing offsets, calibration factors, detector time constants, and so on. It may also include more complicated or voluminous data, such as per-sample flags, or signal modes. The purpose of the "metadata system" in sotodlib is to help identify, load, and correctly label such supporting ancillary data for a particular observation and set of detectors. Storing metadata information on disk requires both a **Metadata Archive**, which is a set of files containing the actual data, and a **Metadata Index**, represented by the ManifestDb class, which encodes instructions for associating metadata information from the archive with particular observations and detector sets. When the Context system processes a metadata entry, it will make use of the DetDb and ObsDb in its interactions with the ManifestDb. Ultimately the system will load numbers and associate with them with detectors in a particular observation. But a Metadata Archive does not need to have separate records of each number, for each detector and each observation. When appropriate, the data could be stored so that there is a single value for each detector, applicable for an entire season. Or there could be different calibration numbers for each wafer in an array, that change depending on the observation ID. As long as the ObsDb and DetDb know about the parameters being used to index things in the Metadata Archive (e.g., the timestamp of observations and the wafer for each detector), the system can support resolving the metadata request, loading the results, and broadcasting them to their intended targets. Examples ======== These examples demonstrate the creation of a metadata archive and index, and adding an entry to the context.yaml metadata list. Example 1 --------- Let's say we want to build an HDF5 database with a number ``thing`` per detector per observation:: from sotodlib.core import Context, metadata import sotodlib.io.metadata as io_meta context = Context('context_file_no_thing.yaml') obs_rs = context.obsdb.query() h5_file = 'thing.h5' for i in range(len(obs_rs)): aman = context.get_obs(obs_rs[i]['obs_id']) things = calculate_thing(aman) thing_rs = metadata.ResultSet(keys=['dets:readout_id', 'thing']) for d, det in enumerate(aman.dets.vals): thing_rs.rows.append((det, things[d])) io_meta.write_dataset(thing_rs, h5_file, f'thing_for_{obs_id}') Once we've built the lower level HDF5 file we need to add it to a metadata index:: scheme = metadata.ManifestScheme() scheme.add_exact_match('obs:obs_id') scheme.add_data_field('dataset') db = metadata.ManifestDb(scheme=scheme) for i in range(len(obs_rs)): obs_id = obs_rs[i]['obs_id'] db.add_entry({'obs:obs_id': obs_id, 'dataset': f'thing_for_{obs_id}',}, filename=h5_file) db.to_file('thing_db.gz') Then have a new context file that includes:: metadata: - db: 'thing_db.gz' unpack: 'thing' Using that context file:: context = Context('context_file_with_thing.yaml') aman = context.get_obs(your_favorite_obs_id) will return an AxisManager that includes ``aman.thing`` for that specific observation. Example 2 --------- In this example, we loop over observations found in an ObsDb and create an AxisManager for each one that contains new, interesting supporting data. The AxisManager is written to HDF5, and the ManifestDb is updated with indexing information so the metadata system can find the right dataset automatically. Here is the script to generate the HDF5 and ManifestDb on disk:: from sotodlib import core # We will create an entry for every obs found in this context. ctx = core.Context('context/context-basic.yaml') # Set up our scheme -- one result per obs_id, to be looked up in an # archive of HDF5 files at address stored in dataset. scheme = core.metadata.ManifestScheme() scheme.add_exact_match('obs:obs_id') scheme.add_data_field('dataset') man_db = core.metadata.ManifestDb(scheme=scheme) # Path for the ManifestDb man_db_filename = 'my_new_info.sqlite' # Use a single HDF5 file for now. output_filename = 'my_new_info.h5' # Loop over obs. obs = ctx.obsdb.get() for obs_id in obs['obs_id']: print(obs_id) # Load the observation, without signal, so we can see the samps # and dets axes, timestamps, etc. tod = ctx.get_obs(obs_id, no_signal=True) # Create an AxisManager using tod's axes. (Note that the dets # axis is required for compatibility with the metadata system, # even if you're not going to use it.) new_info = core.AxisManager(tod.dets, tod.samps) # Add some stuff to it new_info.wrap_new('my_special_vector', shape=('samps', )) # Save the info to HDF5 h5_address = obs_id new_info.save(output_filename, h5_address, overwrite=True) # Add an entry to the database man_db.add_entry({'obs:obs_id': obs_id, 'dataset': h5_address}, filename=output_filename) # Commit the ManifestDb to file. man_db.to_file(man_db_filename) The new context.yaml file, that includes this new metadata, would have a metadata list that includes:: metadata: - db: 'my_new_info.sqlite' unpack: 'new_info' If you want to load the single vector ``my_special_vector`` into the top level of the AxisManager, under name ``special``, use this syntax:: metadata: - db: 'my_new_info.sqlite' unpack: 'special&my_special_vector' Example 3 --------- In this example we store a calibration table, with a separate number for each passband value. .. code-block:: python from sotodlib import core import sotodlib.io.metadata as io_meta # This cal result has one number per wafer x passband. data = [ ('ws0', 'f090', 1.02), ('ws1', 'f090', 1.00), ('ws2', 'f090', 1.08), ('ws0', 'f150', 0.95), ('ws1', 'f150', 0.98), ('ws2', 'f150', 0.96), ('ws0', 'NC', 1.00), ('ws1', 'NC', 1.00), ('ws2', 'NC', 1.00), ] # Write to HDF5 rs = core.metadata.ResultSet( keys=['dets:wafer_slot', 'dets:bandpass', 'cal']) rs.rows = data io_meta.write_dataset(rs, 'db.h5', 'bandpass_cal') # Record in ManifestDb. db = core.metadata.ManifestScheme(). \ add_data_field('dataset'). \ new_db() db.add_entry({'dataset': 'bandpass_cal'}, filename='db.h5') db.to_file('db.sqlite') Example 4 --------- This example uses an AxisManager to store a generic data structure that is not tied to specific detectors (and also does not refer to the "samps" axis). It also uses ``obs:timestamp`` to index the results. .. code-block:: python from sotodlib import core # Model data (t0, t1, scale, offset) data = [ (1700000000., 1800000000., 1.00, 4.50), (1800000000., 1900000000., 0.98, 4.42), ] # Get a new database ready db = core.metadata.ManifestScheme(). \ add_range_match('obs:timestamp'). \ add_data_field('dataset'). \ new_db() # Store each dataset to HDF5 and add record to ManifestDb. filename = 'db.h5' for dset, (t0, t1, scale, offset) in enumerate(data): group = f'encoder_model_{dset}' output = core.AxisManager() output.wrap('model_version', 'v1') output.wrap('scale', scale) output.wrap('offset', offset) output.save(filename, group=group) db.add_entry({'obs:timestamp': (t0, t1), 'dataset': group}, filename=filename) # Save db. db.to_file('db.sqlite') Tips ---- If these examples are almost, but not quite, what you need, consider the following: - You can use multiple HDF5 files in your Metadata Archive -- the filename is a parameter to ``db.add_entry``. That helps to keep your HDF5 files a manageable size, and is good practice in cases where there are regular (e.g. daily or hourly) updates to an archive. - You can use an archive format other than HDF5 (if you must), see the example in :ref:`metadata-archives-custom`. - The entries in the Index do not have to be per-``obs_id``. You can associate results to ranges of time or to other fields in the ObsDb. See examples in :ref:`metadata-indexes`. - The entries in the Archive do not need to be per-detector. You can specify results for a whole group of detectors, if that group is enumerated in the DetDb (or added to det_info using other metadata). For example, if DetDb contains a column ``passband``, the dataset could contain columns ``dets:passband`` and ``cal`` and simply report one calibration number for each frequency band. (On load, the Context Metadata system will automatically broadcast the ``cal`` number so that it has shape ``(n_dets,)`` in the fully populated AxisManager.) - You can store all the results (i.e., results for multiple ``obs_id``) in a single HDF5 dataset. This is not usually a good idea if your results are per-detector, per-observation... the dataset will be huge, and not easy to update incrementally. But for smaller things (one or two numbers per observation, as in the ``dets:passband`` example above) it can be convenient. Doing this requires including ``obs:obs_id`` (or some other ObsDb column) in the dataset. - Committing changes to the database (i.e. writing them to disk) can be slow, especially on certain filesystems. If you are making many updates in a row to the metadata index, consider using :py:class:`DbBatchManager` or :py:class:`MultiDbBatchManager` (see examples in their respective API documentations) to reduce I/O overhead. .. _metadata-archives: Metadata Archives ================= A **Metadata Archive** is a set of files that hold metadata to be accessed by the Context/Metadata system. The metadata system is designed to be flexible with respect to how such archives are structured, at least in terms of the numbers and formats of the files. A **Metadata Instance** is a set of numerical and string data, stored at a particular address within a Metadata Archive. The Instance will typically have information organized in a tabular form, equivalent to a grid where the columns have names and each row contains a set of coherent data. Some of the columns describe the detectors or observations to which the row applies. The other columns provide numerical or string information that is to be assocaited with those detectors or observations. For example, consider a table with columns called `dets:id` and `cal`. The entry for `dets:id` in each row specifies what detector ID the number in the `cal` column of that row should be applied to. The indexing information in `dets:id` is called *instrinsic indexing information*, because it is carried within the Metadata Instance. In contrast, *extrinsic indexing information* is specified elsewhere. Usualy, extrinsic indexing information is provided through the Metadata Index (ManifestDb). One possible form for an archive is a set of HDF5 files, where simple tabular data are stored in datasets. This type of archive is used for the reference implementation of the Metadata system interface, so we will describe support for that first and then later go into using other formats. ResultSet over HDF5 ------------------- The ResultSet is a container for tabular data. Functions are available in sotodlib.io.metadata for reading and writing ResultSet data to HDF5 datasets. Here's an example using h5py that creates a compatible dataset, and writes it to an HDF5 file:: import h5py import numpy as np obs_id = 'obs123456' timeconst = np.array([('obs123456', 0.001)], dtype=[('obs:obs_id', 'S20'), ('timeconst', float)]) with h5py.File('test.h5', 'w') as fout: fout.create_dataset('timeconst_for_obs123456', data=timeconst) Here's the equivalent operation, accomplished using :class:`ResultSet` and :func:`write_dataset`:: from sotodlib.core import metadata from sotodlib.io.metadata import write_dataset timeconst = metadata.ResultSet(keys=['obs:obs_id', 'timeconst']) timeconst.rows.append(('obs123456', 0.001)) write_dataset(timeconst, 'test2.h5', 'timeconst_for_obs123456', overwrite=True) The advantages of using write_dataset instead of h5py primitives are: - You can pass it a ResultSet directly and it will handle creating the right types. - Passing overwrite=True will handle the removal of any existing entries at the target path. To inspect datasets in a HDF5 file, you can load them using h5py primitives, or with :func:`read_dataset`:: from sotodlib.io.metadata import read_dataset timeconst = read_dataset('test2.h5', 'timeconst_for_obs123456') The returned object looks like this:: >>> print(timeconst) ResultSet<[obs:obs_id,timeconst], 1 rows> The metadata handling code does not use read_dataset. Instead it uses ResultSetHdfLoader, which has some optimizations for loading batches of metadata from HDF5 files and datasets, and will forcefully reconcile any columns prefixed by ``obs:`` or ``dets:`` against the provided request (using detdb and obsdb, potentially). Loading the time constants for ``obs123456`` is done like this:: from sotodlib.io.metadata import ResultSetHdfLoader loader = ResultSetHdfLoader() request = {'filename': 'test2.h5', 'dataset': 'timeconst_for_obs123456', 'obs:obs_id': 'obs123456'} timeconst = loader.from_loadspec(request) The resulting object looks like this:: >>> print(timeconst) ResultSet<[timeconst], 1 rows> Note the ``obs:obs_id`` column is gone -- it was taken as index information, and matched against the ``obs:obs_id`` in the ``request``. .. _metadata-archives-custom: Custom Archive Formats ---------------------- HDF5 is cool but sometimes you need or want to use a different storage system. Setting up a custom loader function involves the following: - A loader class that can read the metadata from that storage system, respecting the request API. - A module, containg the loader class, and also the ability to register the loader class with sotodlib, under a particular laoder name. - A ManifestDb data field called ``loader``, with the value set to the loader name. Here's a sketchy example. We start by defining a loader class, that will read a number from a text file:: from sotodlib.io import metadata from sotodlib.core.metadata import ResultSet, SuperLoader, LoaderInterface class TextLoader(LoaderInterface): def from_loadspec(self, load_params): with open(load_params['filename']) as fin: the_answer = float(fin.read()) rs = ResultSet(keys=['answer'], [(the_answer, )]) SuperLoader.register_metadata('text_loader', TextLoader) Let's suppose that code (including the SuperLoader business) is in a module called ``othertel.textloader``. To get this code to run whenever we're working with a certain dataset, add it to the ``imports`` list in the context.yaml: .. code-block:: yaml # Standard i/o import, and TextLoader for othertel. imports: - sotodlib.io.metadata - othertel.textloader Now for the ManifestDb:: scheme = metadata.ManifestScheme() scheme.add_exact_match('obs:obs_id') scheme.add_data_field('loader') db = metadata.ManifestDb(scheme=scheme) db.add_entry({'obs:obs_id: 'obs12345', 'loader': 'text_loader'}, filename='obs12345_timeconst.txt') db.add_entry({'obs:obs_id: 'obs12600', 'loader': 'text_loader'}, filename='obs12600_timeconst.txt') Now if a metadata request is made for ``obs12345``, for example, a single number will be loaded from ``obs12345_timeconst.txt``. Note the thing returned by ``TextLoader.from_loadspec`` is a ResultSet. Presently the only types you can return from a loader class function are ResultSet and AxisManager. .. _metadata-indexes: Metadata Indexes ================ In the last example above, a ``request`` dictionary was passed to ResultSetLoaderHdf, and provided instructions for locating a particular result. Such request dictionaries will normally be generated by a ManifestDb object, which is connected to an sqlite3 database that provides a means for converting high-level requests for metadata into specific request dictionaries. The database behind a ManifestDb has 2 main tables. One of them is a table with columns for Index Criteria and Endpoint Data. The Index Criteria columns are intended to be matched against observation metadata, such as the ``obs_id`` or the timestamp of the observation. Endpoint Data contain a filename and other instructions required to locate and load the data, as well as additional restrictions to put on the result. Please see the class documentation for :class:`ManifestDb` and :class:`ManifestScheme`. The remainder of this section demonstrates some basic usage patterns. Examples -------- Example 1: Observation ID ````````````````````````` The simplest Metadata index will translate an ``obs_id`` to a particular dataset in an HDF file. The ManifestScheme for this case is constructed as follows:: from sotodlib.core import metadata scheme = metadata.ManifestScheme() scheme.add_exact_match('obs:obs_id') scheme.add_data_field('dataset') Then we can instantiate a ManifestDb using this scheme, add some data rows, and write the database (including the scheme) to disk:: db = metadata.ManifestDb(scheme=scheme) db.add_entry({'obs:obs_id': 'obs123456', 'dataset': 'timeconst_for_obs123456'}, filename='test2.h5') db.add_entry({'obs:obs_id': 'obs123500', 'dataset': 'timeconst_for_obs123500'}, filename='test2.h5') db.add_entry({'obs:obs_id': 'obs123611', 'dataset': 'timeconst_for_obs123611'}, filename='test2.h5') db.add_entry({'obs:obs_id': 'obs123787', 'dataset': 'timeconst_for_obs123787'}, filename='test2.h5') db.to_file('timeconst.gz') Example 2: Inspecting and modifying the index ````````````````````````````````````````````` Starting from the previous example, suppose we were updating the Index in a cronjob and needed to first check whether we had already entered some entry. We can use :func:`ManifestDb.inspect` to retrieve records, matching on *any* fields (they don't have to be index fields). Here's a quick set of examples:: # Do we have any items in the file called "test2.h5"? entries = db.inspect({'filename': 'test2.h5'}) if len(entries) > 0: # yes, we do ... # Have we already added the item for obs_id='obs123787'? entries = db.inspect({'obs:obs_id': 'obs123787'}) if len(entries) == 0: # no, so add it ... Entries retrieved using inspect are dicts and contain an '_id' element that allows you to modify or delte those records from the Index, using :func:`ManifestDb.update_entry` and :func:`ManifestDb.remove_entry`. For example:: # Delete all entries that refer to test2.h5. for entry in db.inspect({'filename': 'test2.h5'}): db.remove_entry(entry) # Change the spelling of 'timeconst' in the 'dataset' field of all records. for entry in db.inspect({}): entry['dataset'] = entry['dataset'].replace('timeconst', 'TimECOnSt') # currently it's not possible to change the filename, so don't mention it... del entry['filename'] db.update_entry(entry) Example 3: Timestamp ```````````````````` Another common use case is to map to a result based on an observation's timestamp instead of obs_id. The standardized key for timestamp is ``obs:timestamp``, and we include it in the scheme with :func:`add_range_match` instead of :func:`add_exact_match`:: scheme = metadata.ManifestScheme() scheme.add_range_match('obs:obs_timestamp') scheme.add_data_field('dataset') db = metadata.ManifestDb(scheme=scheme) db.add_entry({'obs:timestamp': (123400, 123600), 'dataset': 'timeconst_for_early_times'}, filename='test2.h5') db.add_entry({'obs:timestamp': (123600, 123800), 'dataset': 'timeconst_for_late_times'}, filename='test2.h5') db.to_file('timeconst_by_timestamp.gz') In the this case, when we add entries to the ManifestDb, we pass a tuple of timestamps (lower inclusive limit, higher non-inclusive limit) for the key ``obs:timestamp``. Example 4: Other observation selectors `````````````````````````````````````` Other fields from ObsDb can be used to build the Metadata Index. While timestamp or obs_id are quite general, a more compact and direct association can be made if ObsDb contains a field that is more directly connected to the metadata. For example, suppose there was an intermittent problem with a subset of the detectors that requires us to discard those data from analysis. The problem occurred randomly, but it could be identified and each observation could be classified as either having that problem or not. We decide to eliminate those bad detectors by applying a calibration factor of 0 to the data. We create an HDF5 file called ``bad_det_issue.h5`` with two datasets: - ``cal_all_ok``: has columns ``dets:name`` (listing all detectors) and ``cal``, where ``cal`` is all 1s. - ``cal_mask_bad``: same but with ``cal=0`` for the bad detectors. We update the ObsDb we are using to include a column ``bad_det_issue``, and for each observation we set it to value 0 (if the problem is not seen in that observation) or 1 (if it is). We build the Metadata Index to select the right dataset from ``bad_det_issue.h5``, depending on the value of ``bad_det_issue`` in the ObsDb:: scheme = metadata.ManifestScheme() scheme.add_exact_match('obs:bad_det_issue') scheme.add_data_field('dataset') db = metadata.ManifestDb(scheme=scheme) db.add_entry({'obs:bad_det_issue': 0, 'dataset': 'cal_all_ok'}, filename='bad_det_issue.h5') db.add_entry({'obs:bad_det_issue': 1, 'dataset': 'cal_mask_bad'}, filename='bad_det_issue.h5') db.to_file('cal_bad_det_issue.gz') The context.yaml metadata entry would probably look like this:: metadata: ... - db: '{metadata_lib}/cal_bad_det_issue.gz' unpack: 'cal_remove_bad&cal' ... ManifestDb reference -------------------- *The class documentation of ManifestDb should appear below.* .. autoclass:: ManifestDb :special-members: __init__ :members: DbBatchManager reference ------------------------ .. autoclass:: DbBatchManager :special-members: __init__ :members: MultiDbBatchManager reference ----------------------------- .. autoclass:: MultiDbBatchManager :special-members: __init__ :members: ManifestScheme reference ------------------------ *The class documentation of ManifestScheme should appear below.* .. autoclass:: ManifestScheme :special-members: __init__ :members: Metadata Request Processing =========================== Metadata loading is triggered automatically when :func:`Context.get_obs()` (or ``get_meta``) is called. The parameters to ``get_obs`` define an observation of interest, through an ``obs_id``, as well as (potentially) a limited set of detectors of interest. Processing the metadata request may require the code to refer to the ObsDb for more information about the specified ``obs_id``, and to the DetDb or ``det_info`` dataset for more information about the detectors. Steps in metadata request processing ------------------------------------ For each item in the context.yaml ``metadata`` entry, a series of steps are performed: 1. Read ManifestDb. The ``db`` file specified in the ``metadata`` entry is loaded into memory. 2. Promote Request. The metadata request is likely to include information about what observation and what detectors are of interest. But the ManifestDb may desire a slightly different form for this information. For example, an ``obs_id`` might be provided in the request, but the ManifestDb might index its results using timestamps (``obs:timestamp``). In the *Promote Request* step, the ManifestDb is interrogated for what ``obs`` and ``dets`` fields it requires as Index Data. If those fields are not already in the request, then the request is augmented to include them; this typically requires interaction with the ObsDb and/or DetDb and det_info. The augmented request is the result of the Promotion Step. 3. Get Endpoints. Once the augmented request is computed, the ManifestDb can be queried to see what endpoints match the request. The ManifestDb will return a list of Endpoint results. Each Endpoint result describes the location of some metadata (i.e. a filename and possibly some address within that file), as well as any limitations on the applicability of that data (e.g. it may specify that although the results include values for 90 GHz and 150 GHz detectors, only the results for 150 GHz detectors should be kept). The metadata results are not yet loaded. 4. Read Metadata. Each Endpoint result is processed and the relevant files accessed to load the specified data products. The data within are trimmed to only include items that were actually requested by the Index data (for example, although results for 90 GHz and 150 GHz detectors are included in an HDF5 dataset, the data may be trimmed to only include the 150 GHz detector results). This will yield one metadata result per Endpoint item. 5. Combine Metadata. The individual results from each Endpoint are combined into a single object, of the same type. 6. Wrap Metadata. The metadata object is converted to an AxisManager, and wrapped as specified by the user (this could include storing the entire object as a field; or it could mean extracting and renaming a single field from the result, for example). Incomplete or missing metadata ------------------------------ Metadata is considered "incomplete" for a particular request, if any of the following are true: - No endpoints were returned from the query to the ManifestDb; i.e. the database has no records of what files contain data pertinent to the request. - The results loaded from the metadata archive do not cover the requested sample range or detector set completely (this includes the case where there is zero overlap). When a metadata result is incomplete, the metadata loader code will take one of the following actions: - `trim`: take the limited metadata result, and trim the merged data object down so it contains only the detectors and samples that are found in the metadata result. - `skip`: discard this metadata result; do not include it in the merged data object. - `fail`: raise an error. In the metadata list in ``context.yaml``, each metadata entry can declare a preferred action using the ``on_missing`` key. For example:: metadata: - label: optional_relcal on_missing: skip db: "relcal1.sqlite" unpack: "optional_relcal&cal" - label: critical_relcal on_missing: fail db: "relcal2.sqlite" unpack: "critical_relcal&cal" The default behavior is ``trim``. The behavior specified in ``context.yaml`` can be overridden through the call to :func:`Context.get_obs()` or :func:`Context.get_meta()`; just use the ``on_missing`` argument to specify what should happen for metadata with a specific label. Examples:: # Suppose this fails because relcal2.sqlite does not cover the obs ... tod = ctx.get_obs(obs_id) # This will instead ignore the result, and not populate critical_relcal. tod = ctx.get_obs(obs_id, on_missing={'critical_relcal': 'skip'}) Note that "skipping" can have confusing downstream consequences, such as when a ``det_info`` entry doesn't get added and then some metadata product tries to use it as an index field. Rules for augmenting and using det_info --------------------------------------- The metadata loader associates metadata results to individual detectors using fields from the observation's ``det_info``. For any single observation, the ``det_info`` is first initialized from: - The ObsFileDb; this provides the unique ``readout_id`` for each channel, as well as the ``detset`` to which each belongs. - If a DetDb has been specified, everything in there is copied into ``det_info``. From that starting point, additional fields can be loaded into the ``det_info``; these fields can then be used to load metadata indexed in a variety of ways. For the mu-mux readout used in SO, the following additional steps will usually be performed to augment the ``det_info``: - A "channel map" (a.k.a. "det map", "det match", ...) result will be loaded, to associate a ``det_id`` with each of the loaded ``readout_id`` values (or most of them, hopefully). While the ``readout_id`` describes a specific channel of the readout hardware, the ``det_id`` corresponds to a particular optical device (e.g. a detector with known position, orientation, and passband). - Some "wafer info" will be loaded, containing various properties of the detectors, as designed (for example, their approximate locations; passbands; wiring information; etc.). This is a single table of data for each physical wafer, indexed naturally by ``det_id``, but the rows here can not be associated with each ``readout_id`` until we have loaded the "channel map". Special metadata entries in context.yaml are used to augment the ``det_info``. table; these are marked with ``det_info: true`` and do not have a ``unpack: ...`` field. For example:: metadata: - ... - db: 'more_det_info.sqlite' det_info: true - ... It is expected that the database (``more_det_info.sqlite``) is a standard ManifestDb, which will be queried in the usual way except that when we get to the "Wrap Metadata" step, instead the following is performed: - All the loaded fields are inspected, and any fields that are already found in the current ``det_info`` are used as Index fields. - The columns from the new metadata are merged into the active ``det_info``, ensuring that the index field values correspond. - Only the rows for which the index field has the same value in the two objects are kept. Here are a few more tips about det_info: - *All* fields in the det_info metdata should be prefixed with ``dets:``, to signify that they are associated with the dets axis (this is similar to fields used as index fields in standard metadata. - The field called ``dets:readout_id`` is assumed, throughout the Context/Metdata code, to correspond to the values in the ``.dets`` axis of the TOD AxisManager. - By convention, ``dets:det_id`` is used for the physical detector (or pseudo-detector device) identifier. The special value "NO_MATCH" is used for cases where the ``det_id`` could not be determined. When new det_info are merged in, any fields found in both the existing and new det_info will be used to form the association. Many-to-many matching is fully supported, meaning that a unique index (e.g. ``readout_id``) does not need to be used in the new det_info. However, it is expected that in most cases either ``readout_id`` or ``det_id`` will be used to label det_info contributions. Metadata System APIs ==================== load_metadata ------------- *This function provides a way to load metadata for an AxisManager, from a ManifestDb, without having to encode the result in a context.yaml metadata entry.* .. autofunction:: load_metadata MetadataSpec ------------ .. autoclass:: MetadataSpec :special-members: __init__ :members: SuperLoader ----------- .. autoclass:: SuperLoader :special-members: __init__ :members: ResultSet --------- .. autoclass:: ResultSet :special-members: __init__ :members: sotodlib.io.metadata -------------------- This module contains functions for working with ResultSet in HDF5. Here's the docstring for ``write_dataset``: .. autofunction:: sotodlib.io.metadata.write_dataset Here's the docstring for ``read_dataset``: .. autofunction:: sotodlib.io.metadata.read_dataset Here's the class documentation for ResultSetHdfLoader: .. autoclass:: sotodlib.io.metadata.ResultSetHdfLoader :inherited-members: __init__ :members: ------------------------------------ DetDb: Detector Information Database ------------------------------------ The purpose of the ``DetDb`` class is to give analysts access to quasi-static detector metadata. Some good example of quasi-static detector metadata are: the name of telescope the detector lives in, the approximate central frequency of its passband, the approximate position of the detector in the focal plane. This database is not intended to carry precision results needed for mapping, such as calibration information or precise pointing and polarization angle data. .. note:: DetDb is not planned for use in SO, because of the complexity of the ``readout_id`` to ``det_id`` mapping problem. The ``det_info`` system (described in the :ref:`metadata-section` section) is used for making detector information available in the loaded TOD object. DetDb is still used for simulations and for wrapping of data from other readout systems. Using a DetDb (Tutorial) ======================== Loading a database into memory ------------------------------ To load an existing :py:obj:`DetDb` into memory, use the :py:obj:`DetDb.from_file` class method:: >>> from sotodlib.core import metadata >>> my_db = metadata.DetDb.from_file('path/to/database.sqlite') This function understands a few different formats; see the method documentation. If you want an example database to play with, run this:: >>> from sotodlib.core import metadata >>> my_db = metadata.get_example('DetDb') Creating table base Creating table geometry Creating LF-type arrays... Creating MF-type arrays... Creating HF-type arrays... Committing 17094 detectors... Checking the work... >>> my_db The usage examples below are based on this example database. Detectors and Properties ------------------------ The typical use of DetDb involves alternating use of the ``dets`` and ``props`` functions. The ``dets`` function returns a list of detectors with certain indicated properties; the ``props`` function returns the properties of certain indicated detectors. We can start by getting a list of *all* detectors in the database:: >>> det_list = my_db.dets() >>> det_list ResultSet<[name], 17094 rows> The :py:obj:`ResultSet` is a simple container for tabular data. Follow the link to the class documentation for the detailed interface. Here we have a single column, giving the detector name:: >>> det_list['name'] array(['LF1_00000', 'LF1_00001', 'LF1_00002', ..., 'HF2_06501', 'HF2_06502', 'HF2_06503'], dtype='>> props = my_db.props() >>> props ResultSet<[base.instrument,base.camera,base.array_code, base.array_class,base.wafer_code,base.freq_code,base.det_type, geometry.wafer_x,geometry.wafer_y,geometry.wafer_pol], 17094 rows> The output of ``props()`` is also a ``ResultSet``; but it has many columns. The property values for the first detector are: >>> props[0] {'base.instrument': 'simonsobs', 'base.camera': 'latr', 'base.array_code': 'LF1', 'base.array_class': 'LF', 'base.wafer_code': 'W1', 'base.freq_code': 'f027', 'base.det_type': 'bolo', 'geometry.wafer_x': 0.0, 'geometry.wafer_y': 0.0, 'geometry.wafer_pol': 0.0} We can also inspect the data by column, e.g. ``props['base.camera']``. Note that `name` isn't a column here... each row corresponds to a single detector, in the order returned by my_db.dets(). Querying detectors based on properties -------------------------------------- Suppose we want to get the names of the detectors in the (nominal) 93 GHz band. These are signified, in this example, by having the value ``'f093'`` for the ``base.freq_code`` property. We call ``dets()`` with this specfied:: >>> f093_dets = my_db.dets(props={'base.freq_code': 'f093'}) >>> f093_dets ResultSet<[name], 5184 rows> The argument passed to the ``props=`` keyword, here, is a dictionary containing certain values that must be matched in order for a detector to be included in the output ResultSet. One can also pass a *list* of such dictionaries (in which case a detector is included if it fully matches any of the dicts in the list). One can, to similar effect, pass a ResultSet, which results in detectors being checked against each row of the ResultSet. Similarly, we can request the properties of some sub-set of the detectors; let's use the ``f093_dets`` list to confirm that these detectors are all in ``MF`` arrays:: >>> f093_props = my_db.props(f093_dets, props=['base.array_class']) >>> list(f093_props.distinct()) [{'base.array_class': 'MF'}] Note we've used the :py:obj:`ResultSet.distinct()` method to eliminate duplicate entries in the output from ``props()``. If you prefer to work with unkeyed data, you can work with ``.rows`` instead of converting to a list:: >>> f093_props.distinct().rows [('MF',)] Grouping detectors by property ------------------------------ Suppose we want to loop over all detectors, but with them grouped by array name and frequency band. There are many ways to do this, but a very general approach is to generate a list of tuples representing the distinct combinations of these properties. We then loop over that list, pulling out the names of the matching detectors for each tuple of property values. Here's an example, which simply counts the results:: # Get the two properties, one row per detector. >>> props = my_db.props(props=[ ... 'base.array_code', 'base.freq_code']) # Reduce to the distinct combinations (only 14 rows remain). >>> combos = props.distinct() # Loop over all 14 combos: >>> for combo in combos: ... these_dets = my_db.dets(props=combo) ... print('Combo {} includes {} dets.'.format(combo, len(these_dets))) ... Combo {'base.array_code': 'HF1', 'base.freq_code': 'f225'} includes 1626 dets. Combo {'base.array_code': 'HF1', 'base.freq_code': 'f278'} includes 1626 dets. Combo {'base.array_code': 'HF2', 'base.freq_code': 'f225'} includes 1626 dets. # ... Combo {'base.array_code': 'MF4', 'base.freq_code': 'f145'} includes 1296 dets. Extracting useful detector properties ------------------------------------- There are a couple of standard recipes for getting data out efficiently. SUppose you want to extract two verbosely-named numerical columns `geometry.wafer_x` and `geometry.wafer_y`. We want to be sure to only type those key names out once:: # Find all 'LF' detectors. >>> LF_dets = my_db.dets(props={'base.array_class': 'LF'}) >>> LF_dets ResultSet<[name], 222 rows> # Get positions for those detectors. >>> positions = my_db.props(LF_dets, props=['geometry.wafer_x', ... 'geometry.wafer_y']) >>> x, y = numpy.transpose(positions.rows) >>> y array([0. , 0.02, 0.04, 0.06, 0.08, 0.1 , 0.12, 0.14, 0.16, 0.18, 0.2 , 0.22, 0.24, 0.26, 0.28, 0.3 , 0.32, 0.34, 0.36, 0.38, ...]) # Now go plot stuff using x and y... # ... Note in the last line we've used numpy to transform the tabular data (in `ResultSet.rows`) into a simple (n,2) float array, which is then transposed to a (2,n) array, and broadcast to variables x and y. It is import to include the `.rows` there -- a direct array conversion on `positions` will not give you what you want. Inspecting a database --------------------- If you want to see a list of the properties defined in the database, just call ``props`` with an empty list of detectors. Then access the ``keys`` data member, if you want programmatic access to the list of properties:: >>> props = my_db.props([]) >>> props ResultSet<[base.instrument,base.camera,base.array_code, base.array_class,base.wafer_code,base.freq_code,base.det_type, geometry.wafer_x,geometry.wafer_y,geometry.wafer_pol], 0 rows> >>> props.keys ['base.instrument', 'base.camera', 'base.array_code', 'base.array_class', 'base.wafer_code', 'base.freq_code', 'base.det_type', 'geometry.wafer_x', 'geometry.wafer_y', 'geometry.wafer_pol'] Creating a DetDb ================ For an example, see source code of :func:`get_example`. Database organization ===================== **The dets Table** The database has a primary table, called ``dets``. The ``dets`` table has only the following columns: ``name`` The string name of each detector. ``id`` The index, used internally, to enumerate detectors. Do not assume that the correspondence between index and name will be static -- it may change if the underlying inputs are changed, or if a database subset is generated. **The Property Tables** Every other table in the database is a property table. Each row of the property table associates one or more (key,value) pairs with a detector for a particular range of time. A property table contains at least the following 3 columns: ``det_id`` The detector's internal id, a reference to ``dets.id``. ``time0`` The timestamp indicating the start of validity of the properties in the row. ``time1`` The timestamp indicating the end of validity of the properties in the row. The pair of timestamps ``time0`` and ``time1`` define a semi-open interval, including all times ``t`` such that ``time0 <= t < time1``. All other columns in the property table provide detector metadata. For a property table to contain valid data, the following criteria must be satisfied: 1. The range of validity must be non-negative, i.e. ``time0 <= time1``. Note that if ``time0 == time1``, then the interval is empty. 2. In any one property table, the time intervals associated with a single ``det_id`` must not overlap. Otherwise, there would be ambiguity about the value of a given property. Internally, query code will assume that above two conditions are satisfied. Functions exist, however, to verify compliance of property tables. DetDb auto-documentation ======================== Auto-generated documentation should appear here. .. autoclass:: DetDb :special-members: __init__ :members: .. autofunction:: sotodlib.core.metadata.detdb.get_example --------------------------- ObsDb: Observation Database --------------------------- Overview =========== The purpose of the ``ObsDb`` is to help a user select observations based on high level criteria (such as the target field, the speed or elevation of the scan, whether the observation was done during the day or night, etc.). The ObsDb is also used, by the Context system, to select the appropriate metadata results from a metadata archive (for example, the timestamp associated with an observation could be used to find an appropriate pointing offset). The ObsDb is constructed from two tables. The "obs table" contains one row per observation and is appropriate for storing basic descriptive data about the observation. The "tags table" associates observations to particular labels (called tags) in a many-to-many relationship. The ObsDb code does not really require that any particular information be present; it does not insist that there is a "timestamp" field, for example. Instead, the ObsDb can contain information that is needed in a specific context. However, recommended field names for some typical information types are given in `Standardized ObsDb field names`_. .. note:: The difference between ObsDb and ObsFileDb is that ObsDb contains arbitrary high-level descriptions of Observations, while ObsFileDb only contains information about how the Observation data is organized into files on disk. ObsFileDb is consulted when it is time to load data for an observation into memory. The ObsDb is a repository of information about Observations, independent of where on disk the data lives or how the data files are organized. Creating an ObsDb ================= To create an ObsDb, one must define columns for the obs table, and then add data for one or more observations, then write the results to a file. Here is a short program that creates an ObsDb with entries for two observations:: from sotodlib.core import metadata # Create a new Db and add two columns. obsdb = metadata.ObsDb() obsdb.add_obs_columns(['timestamp float', 'hwp_speed float', 'drift string']) # Add two rows. obsdb.update_obs('myobs0', {'timestamp': 1900000000., 'hwp_speed': 2.0, 'drift': 'rising'}) obsdb.update_obs('myobs1', {'timestamp': 1900010000., 'hwp_speed': 1.5, 'drift': 'setting'}) # Apply some tags (this could have been done in the calls above obsdb.update_obs('myobs0', tags=['hwp_fast', 'cryo_problem']) obsdb.update_obs('myobs1', tags=['hwp_slow']) # Save (in gzip format). obsdb.to_file('obsdb.gz') The column definitions must be specified in a format compatible with sqlite; see :py:meth:`ObsDb.add_obs_columns`. When the user adds data using :py:meth:`ObsDb.update_obs`, the first argument is the ``obs_id``. This is the primary key in the ObsDb and is also used to identify observations in the ObsFileDb. When we write the database using :py:meth:`ObsDb.to_file`, using a .gz extension selects gzip output by default. Using an ObsDb ============== The :py:meth:`ObsDb.query` function is used to get a list of observations with particular properties. The user may pass in an sqlite-compatible expression that refers to columns in the obs table, or to the names of tags. Basic queries ------------- Using our example database from the preceding section, we can try a few queries:: >>> obsdb.query() ResultSet<[obs_id,timestamp,hwp_speed], 2 rows> >>> obsdb.query('hwp_speed >= 2.') ResultSet<[obs_id,timestamp,hwp_speed], 1 rows> >>> obsdb.query("hwp_speed > 1. and drift=='rising'") ResultSet<[obs_id,timestamp,hwp_speed], 1 rows> The object returned by obsdb.query is a :py:obj:`ResultSet`, from which individual columns or rows can easily be extracted: >>> rs = obsdb.query() >>> print(rs['obs_id']) ['myobs0' 'myobs1'] >>> print(rs[0]) OrderedDict([('obs_id', 'myobs0'), ('timestamp', 1900000000.0), ('hwp_speed', 2.0), ('drift', 'rising')]) Queries involving tags ---------------------- Information from the tags table will only show up in the output if explicitly requested. For example, we can ask for the ``'hwp_fast'`` and ``'hwp_slow'`` fields to be included:: >>> obsdb.query(tags=['hwp_fast', 'hwp_slow']) ResultSet<[obs_id,timestamp,hwp_speed,drift,hwp_fast,hwp_slow], 2 rows> Tag columns will have value 1 if the tag has been applied to that observation, and 0 otherwise. A query can be filtered based on tags; there are two ways to do this. One is to append '=0' or '=1' to the end of some of the tag strings:: >>> obsdb.query(tags=['hwp_fast=1']) ResultSet<[obs_id,timestamp,hwp_speed,drift,hwp_fast], 1 rows> Alternately, the values of tags can be used in query strings:: >>> obsdb.query("(hwp_fast==1 and drift=='rising') or (hwp_fast==0 and drift='setting')", tags=['hwp_fast']) ResultSet<[obs_id,timestamp,hwp_speed,drift,hwp_fast], 2 rows> Getting a description of a single observation --------------------------------------------- If you just want the basic information for an observation of known obs_id, use the get function :py:meth:`ObsDb.get` function:: >>> obsdb.get('myobs0') OrderedDict([('obs_id', 'myobs0'), ('timestamp', 1900000000.0), ('hwp_speed', 2.0), ('drift', 'rising')]) If you want a list of all tags for an observation, call get with tags=True:: >>> obsdb.get('myobs0', tags=True) OrderedDict([('obs_id', 'myobs0'), ('timestamp', 1900000000.0), ('hwp_speed', 2.0), ('drift', 'rising'), ('tags', ['hwp_fast', 'cryo_problem'])]) So here we see that the observation is associated with tags ``'hwp_fast'`` and ``'cryo_problem'``. .. _obsdb-names-section: Standardized ObsDb field names ============================== Other than obs_id, specific field names are not enforced in code. However, there are certain typical bits of information for which it makes sense to strongly encourage standardized field names. These are defined below. ``timestamp`` A specific moment (as a Unix timestamp) that should be used to represent the observation. Best practice is to have this be fairly close to the start time of the observation. ``duration`` The approximate length of the observation in seconds. Class auto-documentation ======================== *The class documentation of ObsDb should appear below.* .. autoclass:: ObsDb :special-members: __init__ :members: .. _obsfiledb-section: ------------------------------------ ObsFileDb: Observation File Database ------------------------------------ The purpose of ObsFileDb is to provide a map into a large set of TOD files, giving the names of the files and a compressed expression of what time indices and detectors are present in each file. The ObsFileDb is an sqlite database. It carries some information about each "Observation" and the "detectors"; but is complementary to the ObsDb and DetDb. Data Model ========== We assume the following organization of the time-ordered data: - The data are divided into contiguous segments of time called "Observations". An observation is identified by an ``obs_id``, which is a string. - An Observation involves a certain set of co-sampled detectors. The files associated with the Observation must contain data for all the Observation's detectors at all times covered by the Observation. - The detectors involved in a particular Observation are divided into groups called detsets. The purpose of detset grouping is to map cleanly onto files, thus each file in the Observation should contain the data for exactly one detset. Here's some ascii art showing an example of how the data in an observation must be split between files:: sample index +--------------------------------------------------> d | e | +-------------------------+------------------+ t | | obs0_waf0_00000 | obs0_waf0_01000 | e | +-------------------------+------------------+ c | | obs0_waf1_00000 | obs0_waf1_01000 | t | +-------------------------+------------------+ o | | obs0_waf2_00000 | obs0_waf2_01000 | r | +-------------------------+------------------+ V In this example the data for the observation has been distributed into 6 files. There are three detsets, probably called ``waf0``, ``waf1``, and ``waf2``. In the sample index (or time) direction, each detset is associated with two files; apparently the observation has been split at sample index 1000. Notes: - Normally detsets will be coherent across a large set of observations -- i.e. because we will probably always group the detectors into files in the same way. But this is not required. - In the case of non-cosampled arrays that are observing at the same time on the same telescope: these qualify as different observations and should be given different obs_ids. - It is currently assumed that in a single observation the files for each detset will be divided at the same sample index. The database structure doesn't have this baked in, but some internal verification code assumes this behavior. So this requirement can likely be loosened, if need be. The database consists of two main tables. The first is called ``detsets`` and associates detectors (string ``detsets.det``) with a particular detset (string ``detset.name``). The second is called ``files`` and associates files (``files.name`` to each Observation (string ``files.obs_id``), detset (string ``files.detset``), and sample range (integers ``sample_start`` and ``sample_stop``). Constructing the ObsFileDb involves building the detsets and files tables, using functions ``add_detset`` and ``add_obsfile``. Using the ObsFileDb is accomplished through the functions ``get_dets``, ``get_detsets``, ``get_obs``, and through custom SQL queries on ``conn``. Relative vs. Absolute Paths =========================== The filenames stored in ObsFileDb may be specified with relative or absolute paths. (Absolute paths are assumed if the filename starts with a /.) Relative paths are taken as being relative to the directory where the ObsFileDb sqlite file lives; this can be overridden by setting the ``prefix`` attribute. Consider the following file listing as an example:: / data/ planet_obs/ obsfiledb.sqlite # the database observation0/ # directory for an obs data0_0 # one data file data0_1 observation1/ data1_0 data1_1 observation2/ ... Note the ``obsfiledb.sqlite`` file, located in ``/data/planet_obs``. The filenames in obsfiledb.sqlite might be specified in one of two ways: 1. Using paths relative to the directory where ``obsfiledb.sqlite`` is located. For example, ``observation0/data0_1``. Relative paths permit one to move the tree of data to other locations without needing to alter the ``obsfiledb.sqlite`` (as long as the relative locations of the data and sqlite file remain fixed). 2. Using absolute paths on this file system; for example ``/data/planet_obs/observation0/data0_1``. This is not portable, but it is a better choice if the ObsFileDb ``.sqlite`` file isn't kept near the TOD data files. A database may contain a mixture of relative and absolute paths. Example Usage ============= Suppose we have a coherent archive of TOD data files living at ``/mnt/so1/shared/todsims/pipe-s0001/v2/``. And suppose there's a database file, ``obsfiledb.sqlite``, in that directory. We can load the observation database like this:: import sotoddb db = sotoddb.ObsFileDb.from_file('/mnt/so1/shared/todsims/pip-s0001/v2/') Note we've given it a directory, not a filename... in such cases the code will read ``obsfiledb.sqlite`` in the stated directory. Now we get the list of all observations, and choose one:: all_obs = db.get_obs() print(all_obs[0]) # -> 'CES-Atacama-LAT-Tier1DEC-035..-045_RA+040..+050-0-0_LF' We can list the detsets present in this observation; then get all the file info (paths and sample indices) for one of the detsets:: all_detsets = db.get_detsets(all_obs[0]) print(all_detsets) # -> ['LF1_tube_LT6', 'LF2_tube_LT6'] files = db.get_files(all_obs[0], detsets=[all_detsets[0]]) print(files['LF1_tube_LT6']) # -> [('/mnt/so1/shared/todsims/pipe-s0001/v2/datadump_LAT_LF1/CES-Atacama-LAT-Tier1DEC-035..-045_RA+040..+050-0-0/LF1_tube_LT6_00000000.g3', 0, None)] Class Documentation =================== *The class documentation of ObsFileDb should appear below.* .. autoclass:: ObsFileDb :special-members: __init__ :members: ----------------- Command Line Tool ----------------- The `so-metadata` script is a tool that can be used to inspect and alter the contents of ObsFileDb, ObsDb, and ManifestDb ("metadata") sqlite3 databases. In the case of ObsFileDb and ManifestDb, it can also be used to perform batch filename updates. To summarize a database, pass the db type and then the path to the db file. It might be convenient to start by printing a summary of the context.yaml file, as this will give full paths to the various databases, that can be copied and pasted:: so-metadata context /path/to/context.yaml Analyzing individual databases:: so-metadata obsdb /path/to/context/obsdb. so-metadata obsfiledb /path/to/context/obsfiledb.sqlite so-metadata metadata /path/to/context/some_metadata.sqlite Usage ===== .. argparse:: :module: sotodlib.core.metadata.cli :func: get_parser :prog: so-metadata