vsf.dataset module

class vsf.dataset.BaseDataset[source]

Bases: object

Base class for sequential datasets. Each entry is a sequence of items, typically dicts mapping sequence names to NumPy arrays.

get_sequence(sequenceIdx: int) → list[source]: Returns the sequence with index sequenceIdx. The returned sequence is assumed to behave like a list.
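As an illustration of the interface described above, here is a minimal sketch of a subclass that keeps its sequences in memory. The BaseDataset class below is a local stand-in so the example runs without vsf installed, and InMemoryDataset and the entry names "force" and "position" are hypothetical, not part of the library.

```python
import numpy as np


class BaseDataset:
    """Local stand-in for vsf.dataset.BaseDataset (assumption: one abstract method)."""

    def get_sequence(self, sequenceIdx: int) -> list:
        raise NotImplementedError


class InMemoryDataset(BaseDataset):
    """Hypothetical subclass that stores every sequence in a Python list."""

    def __init__(self, sequences):
        self._sequences = list(sequences)

    def get_sequence(self, sequenceIdx: int) -> list:
        # Each sequence is a list of frames; each frame maps names to arrays.
        return self._sequences[sequenceIdx]


# Build one sequence of four frames with hypothetical entry names.
frames = [
    {"force": np.zeros(3), "position": np.full(3, float(t))}
    for t in range(4)
]
dataset = InMemoryDataset([frames])

sequence = dataset.get_sequence(0)
for frame in sequence:  # the returned sequence can be iterated like a list
    print(frame["position"])
```

Returning a plain list is the simplest way to satisfy the "behaves like a list" assumption; a lazily loading object with __len__ and __getitem__ would presumably work as well.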

class vsf.dataset.DatasetConfig(type: str = '', path: str = '', keys: dict = <factory>, cache: bool = True, sensor_keys: ~typing.Dict[str, str] = <factory>, control_keys: ~typing.Dict[str, str] = <factory>)[source]

Bases: object

Configuration of a dataset, covering the two supported dataset types (selected by type).

cache: bool = True: Whether to cache data in memory. Only used for MultiModalDataset

control_keys: Dict[str, str]: the keys in the dataset used for the controls

keys: dict: key definition for MultiModalDataset

path: str = '': where the dataset lies

sensor_keys: Dict[str, str]: the keys in the dataset used for matching sensor measurements

type: str = '': can be ‘file_loader’ or empty

class vsf.dataset.MultiModalDataset(data_types: dict, dir_path: str, cache_data=True, sensor_keys=None, control_keys=None)[source]

Bases: BaseDataset

A dataset that can store multiple types of data along with metadata.

The dataset structure consists of a top-level directory containing multiple subfolders, each named with seq_ followed by a sequence ID.

Parameters:

data_types (dict) – Mapping between dataset entry names and their types. Each type can be:
    - int: 1D column vector of the default NumPy array type.
    - (int, dtype): 1D column vector of the specified type.
    - ((shape), dtype): N-dimensional array of the given shape and type.
    dtype can be any array type, or “img” for images (saved and loaded as multiple PNG files instead of a NumPy array).
dir_path (str) – Path to the top-level folder where data is loaded and saved. If it does not exist, it is created. Every time a new sequence is collected, a subfolder named with the Unix timestamp is created and populated with data (NumPy arrays / images). This class can read and write multiple sequences within this top-level folder.
cache_data (bool, optional) – Whether to cache data in memory. Defaults to True.
sensor_keys (list, optional) – Keys used for matching sensor measurements.
control_keys (list, optional) – Keys used for the controls.
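To make the data_types forms concrete, the sketch below builds one example mapping and resolves each value to a (shape, dtype) pair. The helper describe_entry and the entry names are hypothetical; the sketch assumes that the default NumPy array type is float64 and that "img" can be paired with a shape tuple. How the library itself processes the mapping is not shown here.

```python
import numpy as np


def describe_entry(spec):
    """Return (shape, dtype) for one data_types value (hypothetical helper)."""
    if isinstance(spec, int):
        # int: 1D vector of the default NumPy array type (float64).
        return (spec,), np.dtype(float)
    first, dtype = spec
    # (int, dtype) gives a 1D vector; ((shape), dtype) gives an N-d array.
    shape = (first,) if isinstance(first, int) else tuple(first)
    if isinstance(dtype, str) and dtype == "img":
        # Images are stored as PNG files, so the marker is kept as-is.
        return shape, "img"
    return shape, np.dtype(dtype)


# Hypothetical entry names, one per documented form.
data_types = {
    "joint_angles": 6,
    "gripper_state": (1, np.int32),
    "depth": ((48, 64), np.float32),
    "rgb": ((48, 64, 3), "img"),
}

for name, spec in data_types.items():
    shape, dtype = describe_entry(spec)
    print(f"{name}: shape={shape}, dtype={dtype}")
```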

data_types

Processed mapping of dataset entry names and their types.

Type: dict

dir_path

The dataset storage directory.

Type: str

seq_names

List of sequence subfolders.

Type: list

cache_data

Whether data is cached in memory.

Type: bool

seq_cache

Cached data storage.

Type: dict

sensor_keys

Keys used for sensor matching.

Type: list

control_keys

Keys used for control matching.

Type: list

IMAGE16_DTYPE: alias of uint16

IMAGE8_DTYPE: alias of uint8

create_sequence(seq_id=None, ret_id=False)[source]

Create a new data sequence folder, using the given sequence id.

NOTE: this routine raises an error if the given sequence id already exists.
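The folder layout and the duplicate-id behavior can be pictured with the standalone sketch below. It does not call vsf: make_sequence_dir is a hypothetical helper, the sequence ids are made up, and the use of FileExistsError is an assumption, since the exception type raised by create_sequence is not stated here.

```python
import tempfile
from pathlib import Path


def make_sequence_dir(root: Path, seq_id: str) -> Path:
    """Hypothetical helper: create root/seq_<seq_id>, failing on a duplicate id."""
    seq_dir = root / f"seq_{seq_id}"
    # exist_ok=False makes mkdir raise FileExistsError if the folder exists.
    seq_dir.mkdir(parents=True, exist_ok=False)
    return seq_dir


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"  # hypothetical top-level dataset folder
    make_sequence_dir(root, "0001")
    make_sequence_dir(root, "0002")

    try:
        make_sequence_dir(root, "0001")  # same id a second time
        duplicate_rejected = False
    except FileExistsError:
        duplicate_rejected = True

    # List the sequence subfolders, analogous to the seq_names attribute.
    seq_names = sorted(p.name for p in root.iterdir() if p.is_dir())

print(seq_names, duplicate_rejected)
```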

get_sequence(sequenceIdx: int | str) → MultiModalDatasetSequence[source]: Returns a sequence as a MultiModalDatasetSequence