Creating your database

Build a database file (.db) that stores DICOM header information. The create function initializes the database from the DICOM data in a specified directory.

from dicomselect import Database
from pathlib import Path

# Specify the path for the database file
db_path = Path('path/to/example.db')

# Initialize the Database object with the specified path
db = Database(db_path)

# Create the .db file using DICOM data from the provided dataset directory
db.create('path/to/dataset/')

Database API

class dicomselect.database.Database(db_path: PathLike)

Bases: Logger

A class for creating and interacting with a database of DICOM data header information.

This class allows for the creation of a database file (.db) which stores DICOM data header information. It supports querying the database to filter and retrieve specific data rows, and can convert the results of queries into more accessible file formats like mha or nii.gz.

The database supports context management (with statement) for querying, as well as explicit open and close methods for database connections. The Query objects it returns can be manipulated much like sets to filter down rows in the database, and can then be used to specify the parameters for a data conversion.

Parameters:

db_path (PathLike) – The file system path where the database file (.db) will be created or accessed.

class PreferMode

Bases: object

See Database.prefer_mode().

PreferDcmFile = 2
PreferZipFile = 1
create(data_dir: PathLike, update: bool = False, batch_size: int = 10, max_workers: int = 4, max_rows: int = -1, max_init: int = 5, reader_prefer_mode: int | None = None, custom_header_func: Callable[[DICOMImageReader], Dict[str, str | int | float]] | None = None, skip_func: Callable[[Path], bool] | None = None, additional_dicom_tags: list[str] = None)

Build a database from DICOMs in data_dir.

Parameters:
  • data_dir (PathLike) – Directory containing .dcm data or dicom.zip data.

  • update (bool) – If the database already exists and is complete, force a rescan of data_dir for any new data.

  • batch_size (int) – Number of DICOMImageReader to process per worker.

  • max_workers (int) – Max number of workers for parallel execution of database creation.

  • max_rows (int) – Maximum number of rows in the database. Useful when doing a test run. Set to -1 to disable the limit.

  • max_init (int) – Max number of items to scout, in order to define the columns for the database. Minimum 1.

  • reader_prefer_mode (int | None) – By default, DICOM metadata and image data are read using SimpleITK. Set which reader to prefer during database creation; see DICOMImageReader.PreferMode.

  • custom_header_func (Callable[[DICOMImageReader], Dict[str, str | int | float]] | None) – Create custom headers by returning a dict that maps str to str | int | float, computed from the DICOMImageReader. Note that accessing DICOMImageReader.image is a heavy operation and will significantly slow down database creation.

  • skip_func (Callable[[Path], bool] | None) – Filter out certain directories. The directory tree is traversed with os.walk, and directories for which this function returns True are skipped.

  • additional_dicom_tags (list[str]) – Any additional tags that are not included by default; see https://www.dicomlibrary.com/dicom/dicom-tags/. Each tag should be formatted as shown in the DICOM tag library, e.g. ‘(0002,0000)’. Non-existent tags will result in errors.

Examples

>>> def custom_skip_func(path: Path):
...     return '/incomplete_dicoms/' in path.as_posix()
...
>>> def custom_header_func(reader: DICOMImageReader):
...     return {'custom_header': 'text', 'custom_header_int': 23}
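
The effect of a skip_func predicate on an os.walk traversal can be sketched with the standard library alone. This is a minimal illustration, not dicomselect's actual implementation: walk_kept_dirs and skip_incomplete are hypothetical names, and the predicate here tests path components rather than the substring check used in the example above.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def skip_incomplete(path: Path) -> bool:
    # Hypothetical predicate: skip any directory named 'incomplete_dicoms'.
    return 'incomplete_dicoms' in path.parts


def walk_kept_dirs(root: Path, skip_func: Callable[[Path], bool]) -> list[Path]:
    """Walk root and return every directory that was not skipped."""
    kept = []
    for dirpath, dirnames, _ in os.walk(root):
        current = Path(dirpath)
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if not skip_func(current / d)]
        kept.append(current)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'patient_a' / 'series_1').mkdir(parents=True)
    (root / 'incomplete_dicoms' / 'series_2').mkdir(parents=True)
    kept = sorted(p.relative_to(root).as_posix() for p in walk_kept_dirs(root, skip_incomplete))

print(kept)  # ['.', 'patient_a', 'patient_a/series_1']
```

Pruning dirnames in place means the skipped subtree is never visited, which matters when the excluded directories are large.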
    

property data_dir: Path

Path to the dataset directory this database is linked to.

Raises:

sqlite3.DataError – If no database exists or the database is corrupt.

property path: Path
plan(filepath_template: str, *queries: Query) → Plan

Prepare a conversion plan, which can convert the results of queries to MHA files. You can use {dicomselect_uid} in the filepath_template to guarantee a unique string of

Parameters:
  • filepath_template (str) –

    Dictates the directory and filename structure, omitting the suffix. Use braces around a column name to substitute that column's value (see Query.columns for a full list of available columns). Use forward slashes to create a directory structure.

    Illegal characters will be replaced with ‘#’. Blank column values will be replaced with ‘(column_name)=blank’.

  • queries (Query) – The combined results of the query object will be converted to MHA.

Return type:

Plan

Examples

>>> plan = db.plan('{patient_id}/prostateX_{series_description}_{instance_creation_time}.mha', query_0000)
>>> plan.target_dir = 'tests/output/example'
>>> plan.extension = '.mha'  # this is automatic if you provide a suffix to filepath_template as in this example
>>> plan.execute()
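
The substitution rules described for filepath_template can be approximated as follows. This is a rough sketch under stated assumptions rather than the library's code: render_template is a hypothetical helper, the set of characters treated as illegal is an illustrative guess, and the row values are made up.

```python
import re

# Assumption: which characters count as "illegal" is illustrative only.
ILLEGAL_CHARS = re.compile(r'[<>:"\\|?*]')


def render_template(template: str, row: dict) -> str:
    """Fill each {column} placeholder in template from row."""
    def fill(match: re.Match) -> str:
        column = match.group(1)
        value = str(row.get(column, '')).strip()
        if not value:
            # Blank column values become '(column_name)=blank'.
            return f'({column})=blank'
        # Slashes inside a value are replaced too, so values cannot add directories.
        return ILLEGAL_CHARS.sub('#', value.replace('/', '#'))

    return re.sub(r'\{(\w+)\}', fill, template)


row = {
    'patient_id': 'ProstateX-0000',
    'series_description': 't2_tse: tra',
    'instance_creation_time': '',
}
rendered = render_template(
    '{patient_id}/prostateX_{series_description}_{instance_creation_time}', row
)
print(rendered)  # ProstateX-0000/prostateX_t2_tse# tra_(instance_creation_time)=blank
```

Only the forward slashes written in the template itself create directories in this sketch; whether dicomselect treats slashes inside column values the same way is not stated in this reference.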
property prefer_mode: int

Sometimes, a directory may contain both .dcm files and a .zip file containing the same dicom files. Set preference for which one to use during database creation using PreferMode.

property verify_dcm_filenames: bool

Verify whether .dcm files in a directory are named logically (e.g. 01.dcm, 02.dcm, …, 11.dcm with none missing). Default is False.

property version: str

The dicomselect version this database was created with.

class dicomselect.reader.DICOMImageReader(path: Path | str, prefer_mode: int = None, verify_dicom_filenames: bool = False, allow_raw_tags: bool = True, additional_tags: list[str] = None)

Bases: object

Reads a folder containing DICOM slices (possibly enclosed in a zip file). Will only read items which end with .dcm.

Parameters:
  • path (Path | str) – Path to the folder containing the DICOM slices. The folder should contain the DICOM slices, or a zip file named “dicom.zip” containing the DICOM slices.

  • verify_dicom_filenames (bool) – Verify that DICOM filenames have increasing numbers, with no gaps. Common prefixes are removed from the filenames before checking the numbers, which allows verifying filenames such as “1.2.86.1.dcm”, …, “1.2.86.12.dcm”.

  • allow_raw_tags (bool) – Allow loading of any tags contained in the DICOM, irrespective of whether they are valid.

  • additional_tags (list[str]) – Load additional tags beyond those defined in dicomselect.constants. A full list of all DICOM tags is available in dicomselect.tags_generated.

  • prefer_mode (int) –

Examples

>>> reader = DICOMImageReader('path/to/dicom/folder')
>>> image = reader.image
>>> metadata = reader.metadata
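
The filename check described for verify_dicom_filenames (strip the common prefix, then require consecutive numbers) could look roughly like this. It is a minimal sketch, not the library's implementation; filenames_are_sequential is a hypothetical name, and trimming trailing digits from the shared prefix is an assumption made so that names like 10.dcm, 11.dcm, 12.dcm still parse.

```python
import os
import re


def filenames_are_sequential(filenames: list[str]) -> bool:
    """Return True if the .dcm names carry consecutive numbers with no gaps."""
    stems = [name[:-4] for name in filenames if name.lower().endswith('.dcm')]
    if not stems:
        return False
    # Shared prefix, trimmed so it never swallows leading digits of the number.
    prefix = re.sub(r'\d+$', '', os.path.commonprefix(stems))
    try:
        numbers = sorted(int(stem[len(prefix):]) for stem in stems)
    except ValueError:
        # Something other than a plain number follows the prefix.
        return False
    return all(b - a == 1 for a, b in zip(numbers, numbers[1:]))


print(filenames_are_sequential([f'1.2.86.{i}.dcm' for i in range(1, 13)]))  # True
print(filenames_are_sequential(['01.dcm', '02.dcm', '04.dcm']))  # False
```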
class PreferMode

Bases: object

See DICOMImageReader.prefer_mode().

PreferITKImage = 4
PreferITKMetadata = 1
PreferPydicomImage = 8
PreferPydicomMetadata = 2
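
The four values are distinct powers of two, which suggests they are meant to be combined as bit flags. That reading is an assumption, not something this reference states. The sketch below uses local copies of the constants so that it runs without dicomselect installed.

```python
# Local copies of the values listed above (not imported from dicomselect).
PREFER_ITK_METADATA = 1
PREFER_PYDICOM_METADATA = 2
PREFER_ITK_IMAGE = 4
PREFER_PYDICOM_IMAGE = 8

# Assumption: flags combine with bitwise OR, e.g. pydicom for metadata
# and SimpleITK for image data.
mode = PREFER_PYDICOM_METADATA | PREFER_ITK_IMAGE

print(mode)                                  # 6
print(bool(mode & PREFER_PYDICOM_METADATA))  # True
print(bool(mode & PREFER_PYDICOM_IMAGE))     # False
```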
property dicom_slice_paths: list[str]

Usually, a DICOM image is built from multiple smaller .dcm files. This returns a list of those .dcm files.

property image: Image

This is a slow operation. More information on sitk.Image can be found in the SimpleITK documentation.

Returns:

A SimpleITK image.

property is_zipfile: bool
property metadata: Dict[str, str]

Native DICOM metadata.

Returns:

A dictionary of str -> str, where keys and values are native DICOM headers and their values.

property path: Path
property prefer_mode: int

By default, DICOM metadata and image data are read using SimpleITK. Set which reader to prefer during database creation using PreferMode.

exception dicomselect.reader.MissingDICOMFilesError(path: Path | str)

Bases: BaseException

Exception raised when a DICOM series has missing DICOM slices.

Parameters:

path (Path | str) –

exception dicomselect.reader.UnreadableDICOMError(path: Path | str)

Bases: BaseException

Exception raised when a DICOM series could not be loaded.

Parameters:

path (Path | str) –
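
Because both exceptions are documented as deriving from BaseException, a generic `except Exception` handler will not catch them. The sketch below demonstrates this with a local stand-in class; it does not import dicomselect, and the real exception's message format may differ.

```python
class StandInMissingDICOMFilesError(BaseException):
    """Local stand-in that mirrors the documented base class."""


try:
    try:
        raise StandInMissingDICOMFilesError('path/to/series')
    except Exception:
        # Never reached: BaseException subclasses bypass 'except Exception'.
        outcome = 'caught by except Exception'
except StandInMissingDICOMFilesError as err:
    outcome = f'caught by name: {err}'

print(outcome)  # caught by name: path/to/series
```

Code that needs to recover from unreadable or incomplete series would therefore have to name these exception classes explicitly in its handlers.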