G3tSMuRF

G3tSmurf is primarily a database building system for indexing the smurf timestreams and ancillary smurf data to produce the tables necessary for Context and the rest of sotodlib. It also includes data loading functionality. Most SO users will not interact much with G3tSmurf directly: it builds the ObsDb, ObsFileDb, and DetDb databases in the background, and these are then used to load data through Context. Those working closely with laboratory, level 1, or level 2 site data will likely interact with the databases more directly and/or use the G3tSmurf loaders.

Data Loading

Loading without Databases

We have a function that can accept any level of .g3 smurf data and load it into an AxisManager with the desired schema. These AxisManagers will not be as complete as they are when loaded with database information, but they will contain the information available in the status frames, such as the SMuRF band and channel mapping.

The load_file function only requires the filename or a list of file names:

aman = load_file(filename)

Loading with this method is recommended for work directly on SMuRF servers; if databases are accessible, it is better to use the G3tSmurf database loads instead.

sotodlib.io.load_smurf.load_file(filename, channels=None, samples=None, ignore_missing=True, no_signal=False, load_biases=True, load_primary=True, status=None, archive=None, obsfiledb=None, show_pb=True, det_axis='dets', linearize_timestamps=True, merge_det_info=True)[source]

Load data from file where there may or may not be a connected archive.

Parameters:
  • filename (str or list) – A filename or list of filenames (to be loaded in order). Note that SmurfStatus is only loaded from the first file

  • channels (list or None) – If not None, it should be a list that can be sent to get_channel_mask.

  • samples (tuple or None) – If not None, it should be a tuple of (sample_start, sample_stop) where the sample counts are relative to the entire g3 session, not just the files being loaded from the list.

  • ignore_missing (bool) – If true, will not raise errors if a requested channel is not found

  • no_signal (bool) – If true, will not load the detector signal from files

  • load_biases (bool) – If true, will load the bias lines for each detector

  • load_primary (bool) – If true, loads the primary data fields; old .g3 files may not have these fields.

  • archive (G3tSmurf instance, optional)

  • obsfiledb (ObsFileDb instance, optional) – used when loading from Context

  • status (SmurfStatus instance, optional) – Status to use instead of the one loaded from the first file

  • det_axis (str) – name of the axis used for channels / detectors

  • linearize_timestamps (bool) – sent to _get_timestamps. If true and using unix timing, linearize the timing based on the frame counter

  • merge_det_info (bool) – if true, emulate det_info from file info

Returns:

aman – AxisManager with the data, with axes for channels and samps. It will always have the fields timestamps, signal, flags (a FlagManager), and ch_info (an AxisManager with bands, channels, frequency, etc.).

Return type:

AxisManager
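
For example, memory use can be reduced by skipping the detector signal or by loading only part of a session (a minimal sketch using the parameters above; the sample range shown is arbitrary):

from sotodlib.io.load_smurf import load_file

# Load only headers and channel info, no detector signal
aman_meta = load_file(filename, no_signal=True)

# Load a subset of samples, counted relative to the whole .g3 session
aman_part = load_file(filename, samples=(0, 20000))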

Loading with G3tSmurf

To load data with database information we first have to instantiate our G3tSmurf object. This is most easily done using a YAML configuration file because those are easily shared between people. The configuration file requires two keys:

data_prefix: "/path/to/data"
g3tsmurf_db: "/path/to/database.db"

This configuration file is set up so that other keys, such as HWP or HK prefixes, could also be added. With a config file, you connect to the G3tSmurf database as:

SMURF = G3tSmurf.from_configs("configs.yaml")

Without a configuration file, you can directly pass the required paths:

SMURF = G3tSmurf(archive_path='/path/to/data/timestreams/',
                 meta_path='/path/to/data/smurf/',
                 db_path='/path/to/database.db')

From here we can load files, time ranges, or find individual observations.

Load a file with database information for the readout channel names:

aman = load_file(filename, archive=SMURF)

Warning

The SMURF.load_data function no longer has complete functionality compared to load_file. Use load_file where possible. The example below shows how to use load_file to load a complete observation.

To find a specific Observation, a time when data was just streaming with no other specific actions being run, we can search the Observations table. Here is an example of finding the first Observation after a specific ctime:

session = SMURF.Session()
obs = session.query(Observations).filter(
    Observations.timestamp > my_ctime
)
obs = obs.order_by(Observations.start).first()
aman = load_file( [f.name for f in obs.files], archive=SMURF )

These queries are built using SQLAlchemy commands and can filter on any of the columns in the Observations table.
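
For example, the same query can be narrowed to a single stream_id or restricted to non-calibration observations (a sketch; the stream_id value here is hypothetical):

obs = (
    session.query(Observations)
    .filter(
        Observations.timestamp > my_ctime,
        Observations.stream_id == "ufm_mv5",  # hypothetical stream_id
        Observations.calibration == False,    # skip IVs, bias steps, etc.
    )
    .order_by(Observations.start)
    .first()
)
aman = load_file([f.name for f in obs.files], archive=SMURF)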

Channel Masks on Load

Since UFMs have a large number of channels and we have many long observations, it is often important to reduce active memory in some manner. One way to do that is to reduce the number of channels we load into memory at once.

All loading functions accept a channels argument that is sent to the get_channel_mask function. If this argument is given, only the channels in that list will be in the returned AxisManager. The documentation for get_channel_mask includes which types of information it accepts.
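
For example, the accepted formats can be mixed in the same list (a sketch; the band/channel numbers and readout name here are hypothetical):

# Load a few channels: by file index, by (band, channel), or by readout name
aman = load_file(
    filename,
    channels=[0, 1, (2, 421), "sch_ufm_mv5_1700000000_2_421"],
    archive=SMURF,  # needed to resolve channel names
)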

sotodlib.io.load_smurf.get_channel_mask(ch_list, status, archive=None, obsfiledb=None, ignore_missing=True)[source]

Take a list of desired channels and parse them so the different data loading functions can load them.

Parameters:
  • ch_list (list) –

    List of desired channels. The type of each list element is used to determine what it is:

    • int : index of channel in file. Useful for batching.

    • (int, int) : band, channel

    • string : channel name (requires archive or obsfiledb)

    • float : frequency in the smurf status (or should we use channel assignment?)

  • status (SmurfStatus instance) – Status to use to generate channel loading mask

  • archive (G3tSmurf instance) – Archive used to search for channel names / frequencies

  • obsfiledb (ObsFileDb instance) – ObsFileDb used to search for channel names if archive is None

  • ignore_missing (bool) – If true, will not raise errors if a requested channel is not found

Returns:

  • mask (bool array) – Mask for the channels in the SmurfStatus

  • TODO (When loading from name, need to check tune file in use during file.)

G3tSmurf AxisManagers

AxisManagers loaded with G3tSmurf will all have the form:

Axes:
    samps -- samples in the data
    dets -- channels in the data
    bias_lines (optional) -- bias information

Fields:
    timestamps : (samps,)
        ctime timestamps for the loaded data
    signal : (dets, samps)
        Array of the squid phase in units of radians for each channel
    primary : AxisManager (samps,)
        "primary" data included in the packet headers
        'AveragingResetBits', 'Counter0', 'Counter1', 'Counter2',
        'FluxRampIncrement', 'FluxRampOffset', 'FrameCounter',
        'TESRelaySetting', 'UnixTime'
    biases (optional): (bias_lines, samps)
        Bias values during the data
    det_info : AxisManager (dets,)
        Information about channels, including SMuRF band, channel,
         frequency.
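
A quick way to inspect the resulting structure (a minimal sketch; the fields follow the layout above):

aman = load_file(filename, archive=SMURF)

print(aman.dets.count, aman.samps.count)      # axis sizes
print(aman.timestamps[0], aman.signal.shape)
print(aman.det_info)                          # band, channel, frequency, etc.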

Database Creation and Upkeep

All our database building scripts are based on the file system architecture expected at the DAQ nodes. Note that this system will not work on the SMuRF servers. The system requires the folder structure to follow the pattern:

/prefix/
    timestreams/
        16000/
            stream_id1/
                1600000000_000.g3
                1600000000_001.g3
                ...
                1600000000_XYZ.g3
            stream_id2/
                1600000000_000.g3
                1600000000_001.g3
                ...
                1600000000_XYZ.g3
        16001/
            stream_id1/
                1600100000_000.g3
                1600100000_001.g3
                ...
                1600100000_XYZ.g3
            stream_id2/
                1600100000_000.g3
                1600100000_001.g3
                ...
                1600100000_XYZ.g3
        ...
    smurf/
        16000/
            stream_id1/
                1600000000_action1/
                1600000ABC_action2/
                ...
                1600000XYZ_actionN/
            stream_id2/
                1600000000_action1/
                1600000ABC_action2/
                ...
                1600000XYZ_actionN/
        16001/
            stream_id1/
                1600100000_action1/
                1600100ABC_action2/
                ...
                1600100XYZ_actionN/
            stream_id2/
                1600100000_action1/
                1600100ABC_action2/
                ...
                1600100XYZ_actionN/
        ...

The ctimes used here are just examples, but files in a specific stream_id folder will be assumed to belong to that stream_id. The .g3 files with the same ctime before the “_” will be assumed to be in the same observation/streaming session. The metadata searches are done through the action folders and their produced timestreams.

Database Creation and Update Script

Keeping the databases updated requires a little care when we are building databases while data is actively being taken. To assist with this there is an update_g3tsmurf_db.py script saved within the sotodlib.site_pipeline folder. This script requires a config file that is an expanded version of the one used for connecting to the G3tSmurf database:

data_prefix : "/path/to/daq-node/"
g3tsmurf_db: "/path/to/g3tsmurf.db"
g3thk_db: "/path/to/g3hk.db"


finalization:
    servers:
        - smurf-suprsync: "smurf-sync-so1" ## instance-id
          timestream-suprsync: "timestream-sync-so1" ## instance-id
          pysmurf-monitor: "monitor-so1" ## instance-id
        - smurf-suprsync: "smurf-sync-so2" ## instance-id
          timestream-suprsync: "timestream-sync-so2" ## instance-id
          pysmurf-monitor: "monitor-so2" ## instance-id

The finalization information and the HK database are required for tracking the data transfer status between the level 1 servers and the level 2 DAQ node.

The user running this script must have read, write, and execute permissions to the database file in order to perform updates.

Here is the information for this script:

usage: __main__.py [-h] [--update-delay UPDATE_DELAY] [--from-scratch]
                   [--verbosity VERBOSITY] [--index-via-actions]
                   [--use-monitor]
                   config

Positional Arguments

config

g3tsmurf db configuration file

Named Arguments

--update-delay

Days to subtract from now to set as minimum ctime

Default: 2

--from-scratch

Builds or updates database from scratch

Default: False

--verbosity

increase output verbosity. 0:Error, 1:Warning, 2:Info(default), 3:Debug

Default: 2

--index-via-actions

Look through action folders to create observations

Default: False

--use-monitor

Send updates to influx

Default: False
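
A typical invocation might look like the following (a sketch; the exact entry point may differ depending on how sotodlib is installed, and the config file name is hypothetical):

python -m sotodlib.site_pipeline.update_g3tsmurf_db --update-delay 1 g3tsmurf_config.yaml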

Utilities with G3tSmurf

File System Searches

Several of the generators used in the database indexing could be useful for building search functions off the same file set. G3tSmurf.search_metadata_actions and G3tSmurf.search_metadata_files are generators which can be used in loops to easily page through either actions or files. For example:

import os

def find_last_action(SMURF, my_action, max_ctime=None):
    # search_metadata_actions yields (action, stream_id, ctime, path)
    for action, stream_id, ctime, path in SMURF.search_metadata_actions(
            max_ctime=max_ctime, reverse=True):
        if action == my_action:
            return action, ctime, path

def find_last_iv(SMURF, max_ctime=None):
    a, ctime, base_dir = find_last_action(SMURF, 'analyze_iv_and_save',
                                          max_ctime=max_ctime)
    files = os.listdir(os.path.join(base_dir, 'outputs'))
    info = [ff for ff in files if 'analyze' in ff][0]

    return os.path.join(base_dir, 'outputs', info)

Operation Searches

Many types of sodetlib operations are saved in the G3tSmurf databases, and we provide one-liner search functions to help find sodetlib operations relative to observations. Note that these searches are based on automatic file tagging implemented in Oct. 2022; operations from before then may not be found.

sotodlib.io.g3tsmurf_utils.get_last_bg_map(my_obs_id, SMURF)[source]

Find the last bias group map relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.

sotodlib.io.g3tsmurf_utils.get_last_bias_step(my_obs_id, SMURF)[source]

Find the last bias step analysis relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.

sotodlib.io.g3tsmurf_utils.get_last_iv(my_obs_id, SMURF)[source]

Find the last IV analysis relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.

sotodlib.io.g3tsmurf_utils.get_next_bg_map(my_obs_id, SMURF)[source]

Find the next bias group map relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.

sotodlib.io.g3tsmurf_utils.get_next_bias_step(my_obs_id, SMURF)[source]

Find the next bias step analysis relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.

sotodlib.io.g3tsmurf_utils.get_next_iv(my_obs_id, SMURF)[source]

Find the next IV analysis relative to a specific observation ID.

Note: Uses a tag search that was built into sodetlib ~Oct 2022.
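
For example (a sketch, assuming these helpers return paths to the saved sodetlib analysis files; my_obs_id is an obs_id string and SMURF a G3tSmurf instance as above):

import numpy as np
from sotodlib.io.g3tsmurf_utils import get_last_iv, get_next_bias_step

iv_file = get_last_iv(my_obs_id, SMURF)
bias_step_file = get_next_bias_step(my_obs_id, SMURF)

# sodetlib analyses are typically saved as .npy files containing dictionaries
iv_analysis = np.load(iv_file, allow_pickle=True).item()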

Batched load of Observations

Loading long or large observations into memory all at once can cause issues with memory usage, especially on smaller computing facilities. This function is a generator that can be called to automatically split observations into smaller sections.

sotodlib.io.g3tsmurf_utils.get_batch(obs_id, archive, ram_limit=None, n_det_chunks=None, n_samp_chunks=None, n_dets=None, n_samps=None, det_chunks=None, samp_chunks=None, test=False, load_file_args={})[source]

A generator to loop through and load AxisManagers of sections of Observations. When run with none of the optional arguments it will default to returning the full observation. Some arguments override others, as described in the parameter descriptions below. When splitting the Observation by both detectors and samples, the chunks of samples for the same detectors are looped through first (samples is the inner for loop).

Example usage:

for aman in get_batch(obs_id, archive, ram_limit=2e9):
    run_analysis_pipeline(aman)

for aman in get_batch(obs_id, archive, n_dets=200):
    run_analysis_pipeline(aman)
Parameters:
  • obs_id (string) – Level 2 observation IDs

  • archive (G3tSmurf Instance) – The G3tSmurf database connected to the obs_id

  • ram_limit (None or float) – A (very simplistically calculated) limit on RAM per AxisManager. If specified it overrides all other inputs for how the AxisManager is split.

  • n_det_chunks (None or int) – number of chunks of detectors to split the observation by. Each AxisManager will have N_det = N_obs_det / n_det_chunks. If specified, it overrides n_dets and det_chunks arguments.

  • n_samp_chunks (None or int) – number of chunks of samples to split the observation by. Each AxisManager will have N_samps = N_obs_samps / n_samp_chunks. If specified, it overrides n_samps and samp_chunks arguments.

  • n_dets (None or int) – number of detectors to load per AxisManager. If specified, it overrides the det_chunks argument.

  • n_samps (None or int) – number of samples to load per AxisManager. If specified it overrides the samp_chunks argument.

  • det_chunks (None or list of lists, tuples, or ranges) – if specified, each entry in the list is successively passed to load the AxisManagers as load_file(… channels = list[i] … )

  • samp_chunks (None or list of tuples) – if specified, each entry in the list is successively passed to load the AxisManagers as load_file(… samples = list[i] … )

  • test (bool) – If true, yields a tuple of (det_chunks, samp_chunks) instead of a loaded AxisManager

  • load_file_args (dict) – additional arguments to pass to load_file

Yields:

AxisManagers with loaded sections of data

Observation Files

There are many instances where we might want to load the SMuRF metadata associated with actions that have made it into the database. These functions take an obs_id and a G3tSmurf instance and return paths or file lists.

sotodlib.io.g3tsmurf_utils.get_obs_folder(obs_id, archive)[source]

Get the folder associated with the observation action. Assumes everything is following the standard suprsync formatting.

sotodlib.io.g3tsmurf_utils.get_obs_outputs(obs_id, archive)[source]

Get the output files associated with the observation action. Assumes everything is following the standard suprsync formatting. Returns absolute paths since everything in g3tsmurf is absolute paths.

sotodlib.io.g3tsmurf_utils.get_obs_plots(obs_id, archive)[source]

Get the output plots associated with the observation action. Assumes everything is following the standard suprsync formatting. Returns absolute paths since everything in g3tsmurf is absolute paths.
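
For example (a sketch, assuming the functions return a folder path and lists of absolute file paths, respectively):

from sotodlib.io.g3tsmurf_utils import get_obs_folder, get_obs_outputs, get_obs_plots

folder = get_obs_folder(my_obs_id, SMURF)
outputs = get_obs_outputs(my_obs_id, SMURF)
plots = get_obs_plots(my_obs_id, SMURF)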

Usage with Context

The G3tSmurf database can be used with the larger sotodlib Context system. In this setup, the main G3tSmurf database is both the ObsFileDb and the ObsDb.

A minimal example context yaml file is:

tags:
    g3tsmurf_dir: '/path/to/things/smurf_context'

obsfiledb: '{g3tsmurf_dir}/g3t_smurf_db.db'
obsdb: '{g3tsmurf_dir}/g3t_smurf_db.db'

imports:
- sotodlib.io.load_smurf

context_hooks: 'obs_detdb_load'

obs_loader_type: 'g3tsmurf'

#metadata:
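
With a file like this, data can be loaded through the standard Context interface (a minimal sketch; the file name context.yaml is hypothetical):

from sotodlib.core import Context

ctx = Context("context.yaml")
obs = ctx.obsdb.query()              # observations known to the G3tSmurf db
tod = ctx.get_obs(obs[0]["obs_id"])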

SmurfStatus

SmurfStatus objects are used to represent and parse all the data held inside the .g3 Status frames. These can be loaded from individual .g3 files using SmurfStatus.from_file(filename); this version does not require a database connection. With a database connection, we can also load the Status object based on a ctime using SmurfStatus.from_time(ctime, archive).
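
For example (with filename, my_ctime, and SMURF as in the earlier examples):

from sotodlib.io.load_smurf import SmurfStatus

# Without a database connection
status = SmurfStatus.from_file(filename)

# With a G3tSmurf archive, load the status in effect at a given ctime
status = SmurfStatus.from_time(my_ctime, SMURF)
print(status.num_chans, status.downsample_factor)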

class sotodlib.io.load_smurf.SmurfStatus(status)[source]

This is a class that attempts to extract essential information from the SMuRF status dictionary so it is more easily accessible. If the necessary information for an attribute is not present in the dictionary, the attribute will be set to None.

Parameters:

status (dict) – A SMuRF status dictionary

status

Full smurf status dictionary

Type:

dict

dump_frame

Whether the SmurfStatus was built off a dump frame

Type:

bool

num_chans

Number of channels that are streaming

Type:

int

mask

Array with length num_chans that describes the mapping of readout channel to absolute smurf channel.

Type:

Optional[np.ndarray]

mask_inv

Array with dimensions (NUM_BANDS, CHANS_PER_BAND) where mask_inv[band, chan] tells you the readout channel for a given band, channel combination.

Type:

np.ndarray

freq_map

An array of size (NUM_BANDS, CHANS_PER_BAND) that has the mapping from (band, channel) to resonator frequency. If the mapping is not present in the status dict, the array will be full of np.nan.

Type:

Optional[np.ndarray]

filter_a

The A parameter of the readout filter.

Type:

Optional[np.ndarray]

filter_b

The B parameter of the readout filter.

Type:

Optional[np.ndarray]

filter_gain

The gain of the readout filter.

Type:

Optional[float]

filter_order

The order of the readout filter.

Type:

Optional[int]

filter_enabled

True if the readout filter is enabled.

Type:

Optional[bool]

downsample_factor

Downsampling factor

Type:

Optional[int]

downsample_enabled

Whether downsampler is enabled

Type:

Optional[bool]

flux_ramp_rate_hz

Flux Ramp Rate calculated from the RampMaxCnt and the digitizer frequency.

Type:

float

aman
AxisManager containing the following fields:

filter_a, filter_b, filter_gain, filter_order, filter_enabled, downsample_factor, flux_ramp_rate_hz

Type:

AxisManager

classmethod from_file(filename)[source]

Generates a Smurf Status from a .g3 file.

Parameters:

filename (str or list)

classmethod from_time(time, archive, stream_id=None, show_pb=False)[source]

Generates a Smurf Status at specified unix timestamp. Loads all status frames between session start frame and specified time.

Parameters:
  • time (timestamp) – Time at which you want the rogue status. Assumed to be in UTC unless specified

  • archive (G3tSmurf instance) – The G3tSmurf archive to use to find the status

  • show_pb (bool) – Turn on or off loading progress bar

  • stream_id (string) – stream_id to look for status

Returns:

status – object indexing of rogue variables at specified time.

Return type:

(SmurfStatus instance)

G3tSmurf Full API

class sotodlib.io.load_smurf.G3tSmurf(archive_path, db_path=None, meta_path=None, echo=False, db_args={}, finalize={}, hk_db_path=None, make_db=False)[source]
classmethod from_configs(configs, **kwargs)[source]

Create a G3tSmurf instance from a configs dictionary or yaml file. An example configuration file with all relevant entries:

data_prefix : "/path/to/daq-node/"
g3tsmurf_db: "/path/to/g3tsmurf.db"
g3thk_db: "/path/to/g3hk.db"


finalization:
    servers:
        - smurf-suprsync: "smurf-sync-so1" ## instance-id
          timestream-suprsync: "timestream-sync-so1" ## instance-id
          pysmurf-monitor: "monitor-so1" ## instance-id
        - smurf-suprsync: "smurf-sync-so2" ## instance-id
          timestream-suprsync: "timestream-sync-so2" ## instance-id
          pysmurf-monitor: "monitor-so2" ## instance-id

The HK and finalization entries in the configuration are required for maintaining the g3tsmurf database but not for simply accessing or querying information.

Parameters:

configs – dictionary or yaml file path containing the data_prefix and g3tsmurf_db keys

add_file(path, session, overwrite=False)[source]

Indexes a single file and adds it to the sqlite database. Creates a single entry in Files and as many Frame entries as there are frames in the file.

Parameters:
  • path (path) – Path of the file to index

  • session (SQLAlchemy session) – Current, active sqlalchemy session

  • overwrite (bool) – If true and file exists in the database, update it.

index_archive(stop_at_error=False, skip_old_format=True, min_ctime=None, max_ctime=None, show_pb=True, session=None)[source]

Adds all files from an archive to the File and Frame sqlite tables. Files must be indexed before the metadata entries can be made.

Parameters:
  • stop_at_error (bool) – If True, will stop if there is an error indexing a file.

  • skip_old_format (bool) – If True, will skip over indexing files before the name convention was changed to be ctime_###.g3.

  • min_ctime (int, float, or None) – If set, files with session-ids less than this ctime will be skipped.

  • max_ctime (int, float, or None) – If set, files with session-ids higher than this ctime will be skipped.

  • show_pb (bool) – If true, will show progress bar for file indexing

delete_file(db_file, session=None, dry_run=False, my_logger=None)[source]

WARNING: Deletes data from the file system

Delete both a database file entry, its associated frames, AND the file itself. Only to be run by automated data management systems.

Parameters:
  • db_file (File instance) – database File instance to be deleted

  • session (optional, SQLAlchemy session) – should be passed if file is called as part of a larger cleanup function

  • dry_run (boolean) – if true, just prints deletion to my_logger.info

  • my_logger (logger, optional) – option to pass different logger to this function

add_new_channel_assignment(stream_id, ctime, cha, cha_path, session)[source]

Add new entry to the Channel Assignments table. Called by the index_metadata function.

Parameters:
  • stream_id (string) – The stream id for the particular SMuRF slot

  • ctime (int) – The ctime of the SMuRF action called to create the channel assignment

  • cha (string) – The file name of the channel assignment

  • cha_path (path) – The absolute path to the channel assignment

  • session (SQLAlchemy Session) – The active session

add_channels_from_assignment(ch_assign, session)[source]

Add the channels that are associated with a particular channel assignment entry

Parameters:
  • ch_assign (ChanAssignment instance)

  • session (session used to find ch_assign)

add_new_tuning(stream_id, ctime, tune_path, session)[source]

Add new entry to the Tune table. Called by the index_metadata function.

Parameters:
  • stream_id (string) – The stream id for the particular SMuRF slot

  • ctime (int) – The ctime of the SMuRF action called to create the tuning file.

  • tune_path (path) – The absolute path to the tune file

  • session (SQLAlchemy Session) – The active session

add_tuneset_to_file(db_tune, db_file, session)[source]

Uses the tune found in the file to decide if we should create a tuneset. Assigns both tune and tuneset to the file entry. Called by add_file.

Parameters:
  • db_tune (Tunes instance for tune in file)

  • db_file (File instance for file)

  • session (sqlalchemy session used to create instances)

add_new_observation_from_status(status, session)[source]

Wrapper to pull required information from SmurfStatus and create a new observation. Works based on tags in Status frame, so may not work with older files.

Parameters:
  • status (SmurfStatus instance)

  • session (SQLAlchemy Session) – The active session

add_new_observation(stream_id, action_name, action_ctime, session, calibration, session_id=None, max_early=5)[source]

Add new entry to the observation table. Called by the index_metadata function.

Parameters:
  • stream_id (string) – The stream id for the particular SMuRF slot

  • action_ctime (int) – The ctime of the SMuRF action called to create the observation. Often slightly different than the .g3 session ID

  • session (SQLAlchemy Session) – The active session

  • calibration (boolean) – Boolean that indicates whether the observation is a calibration observation.

  • session_id (int (optional, but much more efficient)) – session id, if known, for timestream files that should go with the observations

  • max_early (int (optional)) – Buffer time to allow the g3 file to be earlier than the smurf action

update_observation_files(obs, session, max_early=5, force=False, force_stop=False)[source]

Update existing observation. A separate function to make it easier to deal with partial data transfers. See add_new_observation for args

Parameters:
  • max_early (int) – Buffer time to allow the g3 file to be earlier than the smurf action

  • session (SQLAlchemy Session) – The active session

  • force (bool) – If true, will recalculate file/tune information even if observation appears complete

  • force_stop (bool) – If true, will force the end of the observation to be set to the end of the current file list. Useful for completing observations where computer systems crashed/restarted during the observations so no end frames were written.

delete_observation_files(obs, session, dry_run=False, my_logger=None)[source]

WARNING: Deletes files from the file system

Parameters:
  • obs (observation instance)

  • session (SQLAlchemy session used to query obs)

  • dry_run (boolean) – if true, only prints deletion to my_logger.info

search_metadata_actions(min_ctime=1600000000.0, max_ctime=None, reverse=False)[source]

Generator used to page through smurf folder returning each action formatted for easy use.

Parameters:
  • min_ctime (lowest timestamped action to return)

  • max_ctime (highest timestamped action to return)

  • reverse (if true, goes backward)

Yields:
  • tuple (action, stream_id, ctime, path)

  • action (Smurf Action string with ctime removed for easy comparison)

  • stream_id (stream_id of Action)

  • ctime (ctime of Action folder)

  • path (absolute path to action folder)

search_metadata_files(min_ctime=1600000000.0, max_ctime=None, reverse=False, skip_plots=True, skip_configs=True)[source]

Generator used to page through smurf folder returning each file formatted for easy use.

Parameters:
  • min_ctime (int or float) – Lowest timestamped action to return

  • max_ctime (int or float) – highest timestamped action to return

  • reverse (bool) – if true, goes backward

  • skip_plots (bool) – if true, skips all the plots folders because we probably don’t want to look through them

  • skip_configs (bool) – if true, skips all the config folders because we probably don’t want to look through them

Yields:
  • tuple (fname, stream_id, ctime, abs_path)

  • fname (string) – file name with ctime removed

  • stream_id (string) – stream_id where the file is saved

  • ctime (int) – file ctime

  • abs_path (string) – absolute path to file

search_suprsync_files(min_ctime=1600000000.0, max_ctime=None, reverse=False)[source]

Generator used to page through smurf folder returning each suprsync finalization file formatted for easy use

Parameters:
  • min_ctime (lowest timestamped action to return)

  • max_ctime (highest timestamped action to return)

  • reverse (if true, goes backward)

Yields:
  • tuple (action, stream_id, ctime, path)

  • action (Smurf Action string with ctime removed for easy comparison)

  • stream_id (stream_id of Action)

  • ctime (ctime of Action folder)

  • path (absolute path to action folder)

index_timecodes(session=None, min_ctime=1600000000.0, max_ctime=None)[source]

Index the timecode finalization files coming out of suprsync

update_finalization(update_time, session=None)[source]

Update the finalization time rows in the database

get_final_time(stream_ids, start=None, stop=None, check_control=True, session=None)[source]

Return the ctime to which database is finalized for a set of stream_ids between ctimes start and stop. If check_control is True it will use the pysmurf-monitor entries in the HK database to determine which pysmurf-monitors were in control of which stream_ids between start and stop.

index_channel_assignments(session, min_ctime=1600000000.0, max_ctime=None, pattern='channel_assignment', stop_at_error=False)[source]

Index all channel assignments newer than a minimum ctime

Parameters:
  • session (G3tSmurf session connection)

  • min_time (int or float) – minimum time for indexing

  • max_time (int, float, or None) – maximum time for indexing

  • pattern (string) – string pattern to look for channel assignments

index_tunes(session, min_ctime=1600000000.0, max_ctime=None, pattern='tune.npy', stop_at_error=False)[source]

Index all tune files newer than a minimum ctime

Parameters:
  • session (G3tSmurf session connection)

  • min_time (int or float) – minimum time for indexing

  • max_time (int, float, or None) – maximum time for indexing

  • pattern (string) – string pattern to look for tune files

index_observations(session, min_ctime=1600000000.0, max_ctime=None, stop_at_error=False)[source]

Index all observations newer than a minimum ctime. Uses SMURF_ACTIONS to define which actions are observations.

Parameters:
  • session (G3tSmurf session connection)

  • min_time (int or float) – minimum time for indexing

  • max_time (int, float, or None) – maximum time for indexing

index_metadata(min_ctime=1600000000.0, max_ctime=None, stop_at_error=False, session=None)[source]

Adds all channel assignments, tunes, and observations in the archive to the database, adding relevant entries to Files as well.

Parameters:
  • min_ctime (int) – Lowest ctime to start looking for new metadata

  • max_ctime (None or int) – Highest ctime to look for new metadata

  • stop_at_error (bool) – If True, will stop if there is an error indexing a file.

index_action_observations(min_ctime=1600000000.0, max_ctime=None, stop_at_error=False, session=None)[source]

Looks through Action folders to build Observations not built off of tags in add_file. This function is a hold-over from when tags were not used to find Observations (change made ~Jan. 2023)

Parameters:
  • min_ctime (int) – Lowest ctime to start looking for new metadata

  • max_ctime (None or int) – Highest ctime to look for new metadata

  • stop_at_error (bool) – If True, will stop if there is an error indexing a file.

lookup_file(filename, fail_ok=False)[source]

Look up a file’s observation details in the database. Meant to look and act like core.metadata.obsfiledb.lookup_file.

load_data(start, end, stream_id=None, channels=None, show_pb=True, load_biases=True, status=None)[source]

Loads smurf G3 data for a given time range. This will return a chunk of data that includes the specified time range.

This function returns an AxisManager with the following properties:

* Axes:
    * samps : samples
    * dets : resonator channels reading out
    * bias_lines (optional) : bias lines

* Fields:
    * timestamps : (samps,)
        unix timestamps for loaded data
    * signal : (dets, samps)
        Array of the squid phase in units of radians for each channel
    * primary : AxisManager (samps,)
        "primary" data included in the packet headers
        'AveragingResetBits', 'Counter0', 'Counter1', 'Counter2',
        'FluxRampIncrement', 'FluxRampOffset', 'FrameCounter',
        'TESRelaySetting', 'UnixTime'
    * biases (optional): (bias_lines, samps)
        Bias values during the data
    * ch_info : AxisManager (dets,)
        Information about channels, including SMuRF band, channel,
        frequency.
Parameters:
  • start (timestamp or DateTime) – start time for data, assumed to be in UTC unless specified

  • end (timestamp or DateTime) – end time for data, assumed to be in UTC unless specified

  • stream_id (String) – stream_id to load, in case there are multiple

  • channels (list or None) – If not None, it should be a list that can be sent to get_channel_mask.

  • detset (string) – the name of the detector set (tuning file) to load

  • show_pb (bool, optional:) – If True, will show progress bar.

  • load_biases (bool, optional) – If True, will return biases.

  • status (SmurfStatus, optional) – If not None, will use this Status on the data load

Returns:

aman – AxisManager for the data

Return type:

AxisManager

load_status(time, stream_id=None, show_pb=False)[source]

Returns the status dict at specified unix timestamp. Loads all status frames between session start frame and specified time.

Parameters:

time (timestamp) – Time at which you want the rogue status

Returns:

object indexing of rogue variables at specified time.

Return type:

status (SmurfStatus instance)

Database Tables

_images/g3tsmurf_table_relations_202112.png
class sotodlib.io.g3tsmurf_db.Observations(**kwargs)[source]

Times of continuous detector readout. This table is named obs and serves as the ObsDb table when loading via Context. This table is meant to be built off of Level 2 data, before the data from different stream_id/smurf slots have been bookbound and perfectly co-sampled.

Observations are not created if the action folder has no associated .g3 files.

Dec. 2021 – The definitions of obs_id and timestamp changed to better match the operation of the smurf-streamer / sodetlib / pysmurf. Oct. 2022 – The definition of obs_id changed again to include obs or oper tags based on if the observation is an sodetlib operation or not.

obs_id

<obs|oper>_<stream_id>_<session_id>.

Type:

string

timestamp

The .g3 session_id, which is also the ctime the .g3 streaming started and the first part of the .g3 file name.

Type:

integer

action_ctime

The ctime of the pysmurf action, generally slightly different than the .g3 session_id

Type:

integer

action_name

The name of the action used to create the observation.

Type:

string

stream_id

The stream_id of this observation. Generally corresponds to UFM or Smurf slot. Column is implemented since level 2 data is not perfectly co-sampled across stream_ids.

Type:

string

timing

If true, the files of this observation were made with times aligned to the external timing system and have high precision timestamps.

Type:

bool

duration

The total observation time in seconds

Type:

float

n_samples

The total number of samples in the observation

Type:

integer

start

The start of the observation as a datetime object

Type:

datetime.datetime

stop

The end of the observation as a datetime object

Type:

datetime.datetime

tag

Tags for this observation in a single comma delimited string. These are populated through tags set while running sodetlib’s stream data functions.

Type:

string

calibration

Boolean that stores whether or not the observation is a calibration-type observation i.e. an IV curve, a bias step, etc.

Type:

bool

files

The list of .g3 files in this observation built through a relationship to the Files table. [f.name for f in Observation.files] will return absolute paths to all the files.

Type:

list of SQLAlchemy instances of Files

tunesets

The TuneSets used in this observation. There is expected to be one per stream_id (SMuRF crate slot).

Type:

list of SQLAlchemy instances of TuneSets

class sotodlib.io.g3tsmurf_db.Tunes(**kwargs)[source]

Indexing of ‘tunes’ available during observations. Should correspond to all tune files.

id

primary key

Type:

integer

name

name of tune file

Type:

string

path

absolute path of tune file

Type:

string

stream_id

stream_id for file

Type:

string

start

The time the tune file is made

Type:

datetime.datetime

files

All files using this tune file

Type:

list of SQLAlchemy File instances

tuneset_id

id of tuneset this tune belongs to. Used to link Tunes table to TuneSets table

Type:

integer

tuneset
Type:

SQLAlchemy TuneSet instance

static get_name_from_status(status)[source]

Return the name format expected from SmurfStatus instance

class sotodlib.io.g3tsmurf_db.TuneSets(**kwargs)[source]

Indexing of ‘tune sets’ available during observations. Should correspond to the tune files where new_master_assignment=True. TuneSets exist to combine sets of <=8 Channel Assignments, since SMuRF tuning and setup is run “per smurf band” while channel readout reads all tuned bands. Every TuneSet is a Tune file, but not all Tunes are TuneSets, because new Tunes can be made for the same set of channel assignments as the cryostat / readout environment evolves.

id

primary key

Type:

integer

name

name of tune file

Type:

string

path

absolute path of tune file

Type:

string

stream_id

stream_id for file

Type:

string

start

The time the tune file is made

Type:

datetime.datetime

stop

Not Implemented Yet

Type:

datetime.datetime

files
Type:

list of SQLAlchemy File instances.

tunes
Type:

list of Tunes that are part of the TuneSet

observations
Type:

list of observations that have this TuneSet

chan_assignments

Type:

list of Channel Assignments that are part of this TuneSet

channels
Type:

list of Channels that are part of this TuneSet
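
For example, the TuneSet in use for a given stream_id at a particular time can be found by querying on the columns above (a sketch; the stream_id and time are hypothetical, and session is a G3tSmurf session as in the earlier examples):

import datetime as dt
from sotodlib.io.g3tsmurf_db import TuneSets

session = SMURF.Session()
target = dt.datetime(2023, 3, 1)  # hypothetical time of interest (UTC)

tuneset = (
    session.query(TuneSets)
    .filter(TuneSets.stream_id == "ufm_mv5",  # hypothetical stream_id
            TuneSets.start <= target)
    .order_by(TuneSets.start.desc())
    .first()
)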

class sotodlib.io.g3tsmurf_db.ChanAssignments(**kwargs)[source]

The available channel assignments. TuneSets are made of up to eight of these assignments.

id

primary key

Type:

integer

ctime

ctime where the channel assignment was made

Type:

integer

name

name of the channel assignment file

Type:

string

path

absolute path of channel assignment file

Type:

string

stream_id

stream_id for file

Type:

string

band

Smurf band for this channel assignment

Type:

integer

tunesets

The Tunesets the channel assignment is in

Type:

list of SQLAlchemy tunesets

channels

The channels in the channel assignment

Type:

list of SQLAlchemy channels

class sotodlib.io.g3tsmurf_db.Channels(**kwargs)[source]

All the channels tracked by SMuRF indexed by the ctime of the channel assignment file, SMuRF band and channel number. Many channels will map to one detector on a UFM.

Dec. 2021 – Updated channel names to include stream_id to ensure uniqueness

id

primary key

Type:

integer

name

name of channel. This is the unique readout id that will be matched with the unique detector id. Has the form sch_<stream_id>_<ctime>_<band>_<channel>

Type:

string

stream_id

stream_id for file

Type:

string

subband

The subband of the channel

Type:

integer

channel

The assigned smurf channel

Type:

integer

frequency

The frequency of the resonator when the channel assignments were made

Type:

float

band

The smurf band for the channel

Type:

integer

ca_id

The id of the channel assignment for the channel. Used for SQL mapping

Type:

integer

chan_assignment
Type:

SQLAlchemy ChanAssignments Instance

detsets

The tunesets the channel can be found in

Type:

List of SQLAlchemy Tunesets

class sotodlib.io.g3tsmurf_db.Files(**kwargs)[source]

Table to store file indexing info. This table is named files in sql and serves as the ObsFileDb when loading via Context.

id

auto-incremented primary key

Type:

integer

name

complete absolute path to file

Type:

string

start

the start time for the file

Type:

datetime.datetime

stop

the stop time for the file

Type:

datetime.datetime

sample_start

Not Implemented Yet

Type:

integer

sample_stop

Not Implemented Yet

Type:

integer

obs_id

observation id linking Files table to the Observation table

Type:

String

observation
Type:

SQLAlchemy Observation Instance

stream_id

The stream_id for the file. Generally of the form crateXslotY. These are expected to map one per UXM.

Type:

string

timing

If true, every frame in the file has high precision timestamps (slightly different definition than obs.timing)

Type:

bool

n_frames

Number of frames in the .g3 file

Type:

Integer

frames

List of database entries for the frames in this file

Type:

list of SQLALchemy Frame Instances

n_channels

The number of channels read out in this file

Type:

Integer

detset

TuneSet.name for this file. Used to map the TuneSet table to the Files table. Called detset to serve a dual purpose and map files to detsets while loading through Context.

Type:

string

tuneset
Type:

SQLAlchemy TuneSet Instance

tune_id

id of the tune file used in this file. Used to map Tune table to the Files table.

Type:

integer

tune
Type:

SQLAlchemy Tune Instance

class sotodlib.io.g3tsmurf_db.Frames(**kwargs)[source]

Table to store frame indexing info.

id

Primary Key

Type:

Integer

file_id

id of the file the frame is part of, used for SQL mapping

Type:

integer

file
Type:

SQLAlchemy instance

frame_idx

frame index in the file

Type:

integer

offset

frame offset used by so3g indexed reader

Type:

integer

type_name

frame type; we use Observation, Wiring, and Scan frames

Type:

string

time

The start of the frame

Type:

datetime.datetime

n_samples

the number of samples in the frame

Type:

integer

n_channels

the number of channels in the frame

Type:

integer

start

the start of the frame

Type:

datetime.datetime

stop

the end of the frame

Type:

datetime.datetime