The objective of the Discovery and Early Development of Insulin project was to digitize and provide web access to a collection of approximately 7,000 images.
The project implementation can be divided into four processes:
- Scanning
- Optical Character Recognition (OCR) / Transcription
- Metadata creation
- Database creation and web access
Scanning
The scanning was done by Preservation Services, University of Toronto Library, on the Eyelike digital camera system V 3.04. The master images were captured to the following standard:
Scanning bit depth: 8 bits per channel of colour information (24 bits per pixel)
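As a rough check on what this bit depth implies for storage, the sketch below computes the uncompressed size of a 24-bit colour image. The pixel dimensions are hypothetical, chosen only for illustration; the project's actual capture dimensions are not stated here.

```python
def uncompressed_size_bytes(width_px: int, height_px: int,
                            channels: int = 3, bits_per_channel: int = 8) -> int:
    """Return the raw (uncompressed) pixel-data size in bytes."""
    bits_per_pixel = channels * bits_per_channel  # 3 x 8 = 24 bits per pixel
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical capture of 4000 x 4000 pixels (an assumption, not a project figure).
size = uncompressed_size_bytes(4000, 4000)
print(f"{size / 1_000_000:.0f} MB")  # prints "48 MB"
```

At three bytes per pixel, an image of about 16 million pixels comes to roughly 48 MB before any compression.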
Since the chosen standard for archiving generates large files (average file size 48 MB), the project team chose to convert the images to JPGs for online delivery. The site provides users with four sizes of JPGs: a thumbnail for quick reference and three other sizes for examination and use. To balance onscreen quality against download size, the JPG images were created at a medium compression level.
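To illustrate how derivative sizes can be worked out from a master image, here is a minimal sketch that scales pixel dimensions to fit a maximum long edge while preserving aspect ratio. The size names and pixel limits are hypothetical; the source does not state the actual dimensions or the tool used to create the JPGs.

```python
# Hypothetical long-edge limits in pixels for the four derivatives (assumed values).
DERIVATIVE_LONG_EDGES = {"thumbnail": 150, "small": 600, "medium": 1200, "large": 2400}


def fit_long_edge(width: int, height: int, long_edge: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals long_edge, never upscaling."""
    longest = max(width, height)
    if longest <= long_edge:
        return width, height
    scale = long_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def derivative_dimensions(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Return the target dimensions of every derivative for one master image."""
    return {name: fit_long_edge(width, height, edge)
            for name, edge in DERIVATIVE_LONG_EDGES.items()}


# Hypothetical 4000 x 3000 master image.
for name, dims in derivative_dimensions(4000, 3000).items():
    print(name, dims)
```

Computing dimensions separately from the image encoding step keeps the sizing rule easy to test, whichever imaging library performs the actual resize and JPG compression.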
Optical Character Recognition (OCR) and Transcription
The Insulin project team chose to perform OCR on the page images to allow for full-text searching and thus optimise the database for online access. OCR was done by Preservation Services using the Prime Recognition software package. For columnar text, ABBYY FineReader 5.0 Office was used to produce the OCR.
Materials that could not be processed through OCR software, such as manuscript letters and writings, notebooks, and many of the newspaper clippings, have been transcribed by retyping the page text. Both the transcribed files and the files processed by OCR software have been saved as ASCII text files and are used for full-text indexing and searching.
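Full-text indexing over plain-text page files can be pictured with a small inverted index. This is a sketch only: the page identifiers and sample text are invented for illustration, and the source does not describe the indexing software actually used.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page identifiers that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical page texts standing in for OCR or transcription output.
pages = {
    "page_0001": "Notes on the pancreatic extract",
    "page_0002": "Letter regarding the extract and its preparation",
}
index = build_index(pages)
print(sorted(search(index, "extract preparation")))  # prints ['page_0002']
```

Because OCR output and retyped transcriptions are stored in the same plain-text form, both kinds of page can be fed to one index without special handling.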
Metadata creation
The structural metadata was captured in a relational database during the scanning process. The structural metadata tables contain information about the pagination of the document, the correlation between filenames and page numbers, and features of the document, as well as particular comments about the quality of the original material.
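One way to picture structural metadata of this kind is a pair of relational tables linking documents to their page images. The schema below is a hypothetical sketch using SQLite; the table names, column names, filenames, and sample values are all assumptions, not the project's actual database design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    features    TEXT                  -- notable features of the document
);
CREATE TABLE page (
    page_id      INTEGER PRIMARY KEY,
    document_id  INTEGER NOT NULL REFERENCES document(document_id),
    sequence     INTEGER NOT NULL,    -- position of the image within the document
    page_label   TEXT,                -- page number as it appears on the original
    filename     TEXT NOT NULL UNIQUE,
    quality_note TEXT                 -- comments on the quality of the original
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO document (document_id, features) VALUES (1, 'bound notebook')")
conn.executemany(
    "INSERT INTO page (document_id, sequence, page_label, filename, quality_note) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "1", "doc0001_0001.jpg", None),
        (1, 2, "2", "doc0001_0002.jpg", "faded ink"),
    ],
)


def filename_for(conn: sqlite3.Connection, document_id: int, page_label: str):
    """Look up the image filename for a given page label, or None if absent."""
    row = conn.execute(
        "SELECT filename FROM page WHERE document_id = ? AND page_label = ?",
        (document_id, page_label),
    ).fetchone()
    return row[0] if row else None


print(filename_for(conn, 1, "2"))  # prints doc0001_0002.jpg
```

Keeping a separate sequence column alongside the page label allows unnumbered or irregularly numbered pages to keep their place in the document.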
The descriptive metadata was also saved in a relational database format. The descriptive metadata contains fields such as title, author, extent, subject and others that facilitate the discovery and retrieval of information. During the 2017 migration of the Insulin site, this descriptive metadata was converted to MODS.
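As an illustration of what such a conversion can look like, the sketch below maps a flat descriptive record to a minimal MODS XML record using Python's standard library. The input field names and sample values are hypothetical, and the mapping is an assumption for illustration rather than the one used in the 2017 migration.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of a MODS element name."""
    return f"{{{MODS_NS}}}{tag}"


def record_to_mods(record: dict) -> ET.Element:
    """Build a minimal MODS element tree from a flat descriptive record."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = record["title"]
    if record.get("author"):
        name = ET.SubElement(mods, q("name"))
        ET.SubElement(name, q("namePart")).text = record["author"]
    if record.get("extent"):
        physical = ET.SubElement(mods, q("physicalDescription"))
        ET.SubElement(physical, q("extent")).text = record["extent"]
    for topic in record.get("subjects", []):
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic")).text = topic
    return mods


# Hypothetical record; the keys mirror the fields named above.
record = {
    "title": "Sample letter",
    "author": "Sample, Author",
    "extent": "2 pages",
    "subjects": ["Insulin"],
}
print(ET.tostring(record_to_mods(record), encoding="unicode"))
```

Each subject is written as its own subject/topic pair, so a record with several subjects in the relational table produces several repeatable MODS elements.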
Database creation and web access
The Discovery and Early Development of Insulin collection was first made available online using ColdFusion technology in 2003. In 2017, the collection was migrated from ColdFusion to its current Islandora site. The ColdFusion site is available to view through the Wayback Machine at https://wayback.archive-it.org/6473/20170626191114/https://resource.library.utoronto.ca/insulin/.