When benchmarking an algorithm, it is advisable to use a standard test database (data set) so that researchers can compare results directly. Most mammographic databases are not publicly available. The most easily accessed, and therefore the most commonly used, are the Mammographic Image Analysis Society (MIAS) database and the Digital Database for Screening Mammography (DDSM). In addition, a few projects are currently developing new mammographic image databases, alongside several older projects.
The Mammographic Image Analysis Society (MIAS) is an organisation of UK research groups interested in the understanding of mammograms, and it has generated a database of digital mammograms. Films taken from the UK National Breast Screening Programme have been digitised to a 50 micron pixel edge with a Joyce-Loebl scanning microdensitometer, a device that is linear in the optical density range 0-3.2 and represents each pixel with an 8-bit word. The database contains 322 digitised films and is available on 2.3GB 8mm (ExaByte) tape. It also includes the radiologist's "truth" markings on the locations of any abnormalities that may be present. The database has also been reduced to a 200 micron pixel edge and padded/clipped so that all the images are 1024x1024 pixels. The mammographic images are available via the Pilot European Image Processing Archive (PEIPA) at the University of Essex.
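The reduction described above amounts to a factor of 4 along each axis (50 micron x 4 = 200 micron), and a 1024x1024 image at 200 micron covers 204.8 mm per side. The following is a minimal sketch of such a reduction in plain Python. It rests on assumptions that the source does not confirm: block averaging as the resampling method, padding added at the right and bottom edges, and a fill value of 0. The function names `downsample` and `pad_or_clip` are hypothetical and are not part of any MIAS tooling.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grey levels (0-255)


def downsample(image: Image, factor: int = 4) -> Image:
    """Average non-overlapping factor x factor blocks.

    Assumption: block averaging; the method actually used for the
    reduced MIAS images is not stated in the text.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out: Image = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) // len(block))  # integer mean of the block
        out.append(row)
    return out


def pad_or_clip(image: Image, size: int = 1024, fill: int = 0) -> Image:
    """Clip anything beyond size x size, then pad up to size x size.

    Assumption: padding goes on the right and bottom edges.
    """
    clipped = [row[:size] for row in image[:size]]
    padded = [row + [fill] * (size - len(row)) for row in clipped]
    padded += [[fill] * size for _ in range(size - len(padded))]
    return padded
```

For a full-resolution scan held as a list of rows, `pad_or_clip(downsample(scan))` would yield a 1024x1024 image under these assumptions.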
The Digital Database for Screening Mammography (DDSM) is another resource for possible use by the mammographic image analysis research community. It is a collaborative effort between Massachusetts General Hospital, Sandia National Laboratories and the University of South Florida Computer Science and Engineering Department. The database contains approximately 2,500 studies. Each study includes two images of each breast, along with some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, ACR keyword description of abnormalities) and image information (scanner, spatial resolution, ...). Images containing suspicious areas have associated pixel-level "ground truth" information about the locations and types of suspicious regions. Software is also provided, both for accessing the mammogram and truth images and for calculating performance figures for automated image analysis algorithms.
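To illustrate what "performance figures" against pixel-level ground truth can look like, the sketch below scores detected regions against truth regions and reports sensitivity and false positives per image. It is not the software distributed with DDSM. The matching rule (intersection over union of at least 0.5), the representation of regions as sets of pixel coordinates, and the names `iou`, `score_image` and `summarise` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]   # (row, column)
Region = Set[Pixel]       # all pixels belonging to one marked region


def iou(a: Region, b: Region) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_image(truth: List[Region], detections: List[Region],
                min_iou: float = 0.5) -> Tuple[int, int, int]:
    """Return (true positives, false negatives, false positives) for one image.

    Each truth region can be matched at most once; further detections
    on an already matched region count as false positives.
    """
    matched = set()
    false_pos = 0
    for det in detections:
        hit = None
        for idx, region in enumerate(truth):
            if idx not in matched and iou(det, region) >= min_iou:
                hit = idx
                break
        if hit is None:
            false_pos += 1
        else:
            matched.add(hit)
    true_pos = len(matched)
    return true_pos, len(truth) - true_pos, false_pos


def summarise(scores: List[Tuple[int, int, int]]) -> Tuple[float, float]:
    """Return (sensitivity, false positives per image) over all images."""
    tp = sum(s[0] for s in scores)
    fn = sum(s[1] for s in scores)
    fp = sum(s[2] for s in scores)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fp_per_image = fp / len(scores) if scores else 0.0
    return sensitivity, fp_per_image
```

The overlap threshold strongly affects the reported figures, so any comparison between algorithms should state the matching rule that was used.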
Digitisation of the Dutch breast cancer screening programme started in 2006. All screening mammograms will be stored in one national archive, which will be facilitated by the use of broadband technology. As a consequence, a large database of breast cancer cases will become available in a few years.
AMDI provides a tool that enables the user to download cases from the mammographic database, so as to make the information available to authorized medical and research communities interested in breast cancer diagnosis. The mammographic database was designed to include cases with all of the available mammographic views, radiological findings, diagnosis proven by biopsy, the patient's clinical history, and information regarding the lifestyle of the patient. Each exam of each case includes four views (two views of each breast: cranio-caudal or CC, and medio-lateral oblique or MLO). To address the teaching and research aspects, the database links each mammogram with the contour of the breast, the boundary of the pectoral muscle (MLO views only), the contours of masses (if present), the regions of clusters of calcifications and the number of calcifications (if present), and the locations and details of any other features of interest. The mammographic database also supports the inclusion of several mammographic exams of the same patient performed at different times.
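The case structure described above (cases containing exams, exams containing four views, views linked to annotations) can be sketched as a set of Python dataclasses. This is an illustrative model only and not the actual AMDI schema: every class name, field name and value encoding below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[int, int]   # (row, column) in image coordinates
Contour = List[Point]


@dataclass
class CalcificationCluster:
    region: Contour               # outline of the cluster
    calcification_count: int      # number of calcifications in the cluster


@dataclass
class View:
    side: str                     # "L" or "R" (hypothetical encoding)
    projection: str               # "CC" or "MLO"
    breast_contour: Contour = field(default_factory=list)
    pectoral_boundary: Optional[Contour] = None   # MLO views only
    mass_contours: List[Contour] = field(default_factory=list)
    clusters: List[CalcificationCluster] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text links the pectoral muscle boundary to MLO views only.
        if self.projection != "MLO" and self.pectoral_boundary is not None:
            raise ValueError("pectoral boundary is only defined for MLO views")


@dataclass
class Exam:
    date: str                     # e.g. an ISO date string (assumption)
    views: List[View] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True if the exam has CC and MLO views of both breasts."""
        present = {(v.side, v.projection) for v in self.views}
        required = {(s, p) for s in ("L", "R") for p in ("CC", "MLO")}
        return required <= present


@dataclass
class Case:
    case_id: str
    exams: List[Exam] = field(default_factory=list)   # several exams over time
    biopsy_diagnosis: Optional[str] = None
    clinical_history: str = ""
    lifestyle_notes: str = ""
```

Keeping exams as a list under each case mirrors the support for several exams of the same patient at different times.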
IRMA (Image Retrieval in Medical Applications) is a cooperative project of the Department of Diagnostic Radiology, the Department of Medical Informatics, Division of Medical Image Processing and the Chair of Computer Science VI at the Aachen University of Technology (RWTH Aachen). The aim of the project is the development and implementation of high-level methods for content-based image retrieval, with prototypical application to medico-diagnostic tasks on a radiologic image archive.
J.E.E. Oliveira, M.O. Gueld, A. de A. Araújo, B. Ott, T.M. Deserno, Towards a Standard Reference Database for Computer-Aided Mammography, Proceedings of SPIE, Vol. 6915, Paper ID 69151Y, 2008
In light of emerging Grid technology, the aim of this project is to develop a European-wide database of mammograms. The database will be used to investigate a set of important healthcare applications, as well as the potential of this Grid to support effective co-working between healthcare professionals throughout the EU.
The CALMA project (Computer Assisted Library for MAmmography), financed by INFN, ran from 1997 to 2001. Its aims were to collect a large database of digitized mammographic images and to build a CAD (Computer-Aided Detection) system for the automatic detection of microcalcifications and masses in digitized mammograms. Images were collected in several Italian hospitals (Bari, Napoli, Palermo, Sassari, Torino, Udine) and digitized in a special format. The results of CALMA were the starting point for the GPCALMA project (GRID Platform for CALMA), financed by INFN from 2002 to 2003. The basic idea of GPCALMA was to develop a GRID configuration for the CALMA utilities (database and CAD). Its results were a distributed database of mammographic images and experimental GRID connections for the CAD algorithms.