Creating your database¶
Build a database file (.db) of DICOM header information. The create function populates the database from the DICOM data in a specified directory.
from dicomselect import Database
from pathlib import Path
# Specify the path for the database file
db_path = Path('path/to/example.db')
# Initialize the Database object with the specified path
db = Database(db_path)
# Create the .db file using DICOM data from the provided dataset directory
db.create('path/to/dataset/')
Custom reader¶
By default, db.create() uses DICOMImageReader to read DICOM files.
You can customise the behaviour by subclassing it and passing the result as reader_cls.
SimpleITK vs pydicom metadata
The default metadata backend is SimpleITK, which does not support DICOM tags nested inside sequences.
To collect nested tags, switch to pydicom by setting the PreferPydicomMetadata flag in prefer_mode.
Pydicom flattens nested sequences into column names using double underscores and the sequence index:
# flat tag
series_description
# tag nested one level deep (sequence index 0)
radiopharmaceutical_information_sequence__0__radiopharmaceutical
# tag nested two levels deep
sequence1__0__sequence2__0__tag
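The naming scheme above can be illustrated with a small standalone sketch. It uses neither dicomselect nor pydicom: a DICOM dataset is modelled as a plain dict, a sequence as a list of dicts, and the helper names to_snake and flatten are hypothetical. The library's actual flattening code may differ in detail.

```python
import re
from typing import Any, Dict


def to_snake(keyword: str) -> str:
    # Hypothetical helper: 'SeriesDescription' -> 'series_description'.
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', keyword).lower()


def flatten(dataset: Dict[str, Any], prefix: str = '') -> Dict[str, str]:
    # Sequences (modelled here as lists of dicts) contribute
    # '<sequence>__<index>__' to the names of the tags nested inside them.
    columns: Dict[str, str] = {}
    for keyword, value in dataset.items():
        name = prefix + to_snake(keyword)
        if isinstance(value, list):
            for index, item in enumerate(value):
                columns.update(flatten(item, f'{name}__{index}__'))
        else:
            columns[name] = str(value)
    return columns


# Made-up values, for illustration only.
example = {
    'SeriesDescription': 'PET WB',
    'RadiopharmaceuticalInformationSequence': [
        {'Radiopharmaceutical': 'FDG'},
    ],
}
columns = flatten(example)
two_levels = flatten({'Sequence1': [{'Sequence2': [{'Tag': 'value'}]}]})
```

Here columns contains the keys series_description and radiopharmaceutical_information_sequence__0__radiopharmaceutical, and two_levels contains sequence1__0__sequence2__0__tag.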
The example below enables pydicom for both metadata and image reading, and requests one additional
top-level tag (RadiopharmaceuticalInformationSequence) whose nested contents will be expanded
automatically:
from dicomselect import Database, ReaderPreferMode, DICOMImageReader
from pathlib import Path
class MyDICOMReader(DICOMImageReader):
    def __init__(self, file, is_zip=False):
        super().__init__(
            file,
            is_zip=is_zip,
            allow_raw_tags=True,
            additional_tags=['0054|0016'],  # RadiopharmaceuticalInformationSequence
            prefer_mode=(
                DICOMImageReader.PreferMode.PreferPydicomImage
                | DICOMImageReader.PreferMode.PreferPydicomMetadata
            ),
        )
db_path = Path('path/to/example.db')
db = Database(db_path)
db.create('path/to/dataset/', reader_cls=MyDICOMReader)
Database API¶
- class dicomselect.database.Database(db_path: PathLike)¶
Bases: Logger
A class for creating and interacting with a database of DICOM data header information.
This class allows for the creation of a database file (.db) which stores DICOM data header information. It supports querying the database to filter and retrieve specific data rows, and can convert the results of queries into more accessible file formats like mha or nii.gz.
The database supports context management (with statement) for querying, as well as explicit open and close methods for database connections. Query objects returned can be manipulated similar to sets to filter down rows in the database. These can be used to specify the parameters for a data conversion.
- Parameters:
db_path (PathLike) – The file system path where the database file (.db) will be created or accessed.
- create(data_dir: ~os.PathLike, update: bool = False, batch_size: int = 10, max_workers: int = 4, max_rows: int = -1, max_init: int = 5, reader_cls: ~typing.Type[~dicomselect.readers.reader.Reader] = <class 'dicomselect.readers.dicom.reader.DICOMImageReader'>, skip_func: ~typing.Callable[[~pathlib.Path], bool] | None = None)¶
Build a database from DICOMs in data_dir.
- Parameters:
data_dir (PathLike) – Directory containing .dcm data or dicom.zip data.
update (bool) – If True and the database already exists and is complete, force a rescan of data_dir for any new data.
batch_size (int) – Number of reader instances (DICOMImageReader by default) to process per worker.
max_workers (int) – Maximum number of workers for parallel execution of database creation.
max_rows (int) – Maximum number of rows in the database. Useful for a test run. Set to -1 to disable the limit.
max_init (int) – Max number of items to scout, in order to define the columns for the database. Minimum 1.
reader_cls (Type[Reader]) – File reader to use to create this database. You can create custom file readers using dicomselect.Reader. Default: DICOMImageReader.
reader_kwargs – Additional keyword arguments to pass to the __init__ of reader_cls.
header_func – Inject a function that is called before a file is added to the database. The function receives reader_cls instances, where you can edit the Reader.metadata value (a dict[str, str]). Note that using Reader.image is usually a heavy operation and will significantly slow down database creation.
skip_func (Callable[[Path], bool] | None) – Filter out certain directories. The data directory is traversed with os.walk, and each directory is passed to this function as a Path. The directory is skipped if the function returns True.
Examples –
>>> def custom_skip_func(path: Path):
>>>     return '/incomplete_dicoms/' in path.as_posix()
>>>
>>> def custom_header_func(reader: DICOMImageReader):
>>>     return {'custom_header': 'text', 'custom_header_int': 23}
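As a rough illustration of the skip_func contract, the sketch below walks a temporary directory tree with os.walk and drops every directory for which the predicate returns True. It does not call dicomselect; kept_directories is a hypothetical helper, the folder names are invented, and how the library itself handles subdirectories of a skipped directory is not something this sketch asserts.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def custom_skip_func(path: Path) -> bool:
    # Skip any directory named 'incomplete_dicoms' and everything below it.
    return 'incomplete_dicoms' in path.parts


def kept_directories(
    root: Path, skip_func: Optional[Callable[[Path], bool]] = None
) -> List[Path]:
    # Hypothetical helper: collect the directories that survive the filter.
    kept = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        path = Path(dirpath)
        if skip_func is not None and skip_func(path):
            continue
        kept.append(path)
    return sorted(kept)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'patient_1' / 'study_a').mkdir(parents=True)
    (root / 'incomplete_dicoms' / 'patient_2').mkdir(parents=True)
    kept_names = [
        p.relative_to(root).as_posix()
        for p in kept_directories(root, custom_skip_func)
    ]
```

After this runs, kept_names lists the root and the patient_1 tree, while incomplete_dicoms and its contents are absent.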
- property data_dir: Path¶
Path to the dataset directory this database is linked to.
- Raises:
sqlite3.DataError – If no database exists or it is corrupt.
- property path: Path¶
- plan(filepath_template: str, *queries: Query) Plan¶
Prepare a conversion plan, which can convert the results of queries to MHA files. You can use {dicomselect_uid} in the filepath_template to guarantee a unique string of
- Parameters:
filepath_template (str) –
Dictates the form of the directory and filename structure, omitting the suffix. Use braces along with column names to replace with that column value. Use forward slash to create a directory structure. (see Query.columns for a full list of available columns).
Illegal characters will be replaced with ‘#’. Blank column values will be replaced with ‘(column_name)=blank’
queries (Query) – The combined results of the query object will be converted to MHA.
- Return type:
Plan
Examples
>>> plan = db.plan('{patient_id}/prostateX_{series_description}_{instance_creation_time}.mha', query_0000)
>>> plan.target_dir = 'tests/output/example'
>>> plan.extension = '.mha'  # this is automatic if you provide a suffix to filepath_template as in this example
>>> plan.execute()
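The filepath_template rules described above (braces replaced by column values, illegal characters replaced with '#', blank values replaced with '(column_name)=blank') can be sketched in plain Python. This is not dicomselect's implementation: render_template is a hypothetical helper, and the set of characters treated as illegal is an assumption made for the example.

```python
import re
from string import Formatter
from typing import Dict

# Assumption: characters treated as illegal inside a column value.
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')


def render_template(template: str, row: Dict[str, str]) -> str:
    # Hypothetical helper: fill '{column}' placeholders from one database row.
    parts = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        parts.append(literal)  # literal text, including '/' directory separators
        if field is not None:
            value = row.get(field, '')
            if not value.strip():
                value = f'({field})=blank'
            parts.append(ILLEGAL_CHARS.sub('#', value))
    return ''.join(parts)


template = '{patient_id}/prostateX_{series_description}'
filled = render_template(
    template, {'patient_id': 'P001', 'series_description': 't2: tse'}
)
blank = render_template(
    template, {'patient_id': 'P001', 'series_description': ''}
)
```

Forward slashes in the template itself are kept as directory separators, whereas a slash inside a column value is replaced, so a value cannot introduce extra directory levels in this sketch.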
- property prefer_mode: int¶
Sometimes, a directory may contain both .dcm files and a .zip file containing the same DICOM files. Set the preference for which one to use during database creation using PreferMode.
- property version: str¶
dicomselect version this database is created with.
- class dicomselect.readers.dicom.reader.DICOMImageReader(file: PathLike, is_zip: bool, prefer_mode: int = None, verify_dicom_filenames: bool = False, allow_raw_tags: bool = True, additional_tags: list[str] = None)¶
Bases: Reader[Image]
Reads a folder of DICOM slices (or a zip archive containing them) into a SimpleITK image.
By default, metadata and image data are read using SimpleITK. SimpleITK does not support DICOM tags nested inside sequences; to collect nested tags, set PreferPydicomMetadata in prefer_mode. See PreferMode for all options.
- Parameters:
file (PathLike) – Path to the folder containing the DICOM slices, or to a zip archive containing them.
is_zip (bool) – Whether file is a zip archive.
prefer_mode (int) – Combination of PreferMode flags controlling whether SimpleITK or pydicom is used for metadata and image reading. Defaults to PreferITKImage | PreferITKMetadata.
verify_dicom_filenames (bool) – Verify that DICOM filenames contain increasing numbers with no gaps.
allow_raw_tags (bool) – Collect all DICOM tags present in the file, in addition to the default set. When using pydicom, nested sequence tags are expanded automatically.
additional_tags (list[str]) – Extra DICOM tags to collect, specified as 'GGGG|EEEE' strings (e.g. ['0054|0016']). A full list of available tags can be found in dicomselect.readers.dicom.constants.
Examples
Basic usage:
reader = DICOMImageReader('path/to/dicom/folder', is_zip=False)
image = reader.data
metadata = reader.metadata
Custom subclass with pydicom and nested tag support:
class MyReader(DICOMImageReader):
    def __init__(self, file, is_zip=False):
        super().__init__(
            file,
            is_zip=is_zip,
            allow_raw_tags=True,
            prefer_mode=(
                DICOMImageReader.PreferMode.PreferPydicomImage
                | DICOMImageReader.PreferMode.PreferPydicomMetadata
            ),
        )
- class PreferMode¶
Bases: object
See DICOMImageReader.prefer_mode().
- PreferITKImage = 4¶
- PreferITKMetadata = 1¶
- PreferPydicomImage = 8¶
- PreferPydicomMetadata = 2¶
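Because the four PreferMode values are distinct powers of two, they combine with bitwise OR and can be tested with bitwise AND. The sketch below uses plain integers copied from the listing above instead of importing dicomselect; the constant names are local stand-ins, and how the library resolves a combination that sets both the ITK and the pydicom flag for the same purpose is not covered here.

```python
# Local stand-ins for the PreferMode values listed above.
PREFER_ITK_METADATA = 1
PREFER_PYDICOM_METADATA = 2
PREFER_ITK_IMAGE = 4
PREFER_PYDICOM_IMAGE = 8

# The documented default: SimpleITK for both image and metadata.
default_mode = PREFER_ITK_IMAGE | PREFER_ITK_METADATA

# The combination used in the custom reader example: pydicom for both.
pydicom_mode = PREFER_PYDICOM_IMAGE | PREFER_PYDICOM_METADATA


def has_flag(mode: int, flag: int) -> bool:
    # A flag is set when its bit survives a bitwise AND with the mode.
    return bool(mode & flag)


default_uses_pydicom_metadata = has_flag(default_mode, PREFER_PYDICOM_METADATA)
custom_uses_pydicom_metadata = has_flag(pydicom_mode, PREFER_PYDICOM_METADATA)
```

With these values, default_mode is 5 and pydicom_mode is 10, and only the second has the PreferPydicomMetadata bit set.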
- property index: int¶
- property prefer_mode: int¶
- static suffixes() List[str]¶
Returns ['.dcm'].
- Return type:
List[str]
- static suffixes_zip() List[str]¶
Returns ['.zip'].
- Return type:
List[str]
- class dicomselect.readers.reader.Reader(file: PathLike, is_zip: bool, **kwargs)¶
Bases: ABC, Generic[Data]
Abstract base class for file readers used during database creation and conversion.
A Reader is responsible for reading a single file or directory and providing its metadata (for database creation) and data (for conversion). Subclass this to support custom file formats, or subclass DICOMImageReader to customise DICOM reading behaviour.
- Parameters:
file (PathLike) – Path to the file or directory to read.
is_zip (bool) – Whether the file is a zip archive.
- clear()¶
Release cached metadata, data, columns, and subpaths.
- property columns: Dict[str, str]¶
- convert(data: Data) dict[str, Data]¶
Convert loaded data into one or more output files.
Returns a dict mapping filename suffix to data. The default implementation returns {'': data}, producing a single output file with no extra suffix. Override this to split or skip output: return an empty dict to skip conversion entirely, or multiple entries to produce multiple output files per series.
- Parameters:
data (Data) – The data returned by the data property.
- Returns:
A dict mapping filename suffix (str) to output data.
- Return type:
dict[str, Data]
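The three outcomes of convert() described above (one file, several files, or none) can be sketched with standalone functions. Nothing here subclasses Reader: the functions, the list-of-ints stand-in for image data, and the way output_names joins base name, suffix, and extension are all assumptions made for illustration.

```python
from typing import Dict, List

Data = List[int]  # stand-in for real image data


def convert_default(data: Data) -> Dict[str, Data]:
    # Mirrors the documented default: a single output with no extra suffix.
    return {'': data}


def convert_split(data: Data) -> Dict[str, Data]:
    # Multiple entries: one output per suffix.
    half = len(data) // 2
    return {'_first': data[:half], '_second': data[half:]}


def convert_skip(data: Data) -> Dict[str, Data]:
    # An empty dict means nothing is written for this series.
    return {}


def output_names(base: str, converted: Dict[str, Data], extension: str = '.mha') -> List[str]:
    # Hypothetical naming rule: base name + suffix + extension.
    return [f'{base}{suffix}{extension}' for suffix in converted]


data = [1, 2, 3, 4]
single = output_names('scan', convert_default(data))
split = output_names('scan', convert_split(data))
skipped = output_names('scan', convert_skip(data))
```

In a real subclass the same return values would come from an overridden convert method; the naming rule shown here is only one plausible reading of "filename suffix".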
- property data: Data¶
- property is_zip: bool¶
- property metadata: Dict[str, str]¶
- property path: Path¶
- property subpaths: List[Path]¶
- abstractmethod static suffixes() List[str]¶
File suffixes this reader handles (e.g. ['.dcm']).
- Return type:
List[str]
- abstractmethod static suffixes_zip() List[str]¶
Zip file suffixes this reader handles (e.g. ['.zip']).
- Return type:
List[str]