Joanna – Page 2 – Vassar College Digital Library

Michael McCarthy approached the Archives & Special Collections Library about digitizing the Memorial Minutes read at faculty meetings. He has posted some recent minutes, dating back to 1990, at http://aevc.webs.com, but would like a more complete set. Michael was referred to us for this digitization.

The Archives & Special Collections Library contains three volumes of Memorial Minutes:

Volume 1, 1877-1942
Volume 2, 1943-1960
Volume 3, 1960-1978

There is a gap from 1978 to 1989; minutes from those years must be pulled from the faculty minutes in Archives & Special Collections. The digital library has already completed this digitization.

Digitization covers approximately 90-100 pages per volume. The volumes cannot be digitized as whole books; they must be digitized person by person (memorial by memorial). Each volume also has a corresponding archival folder with the same contents, except Volume 3, whose folder overlaps with the volume but does not match it exactly.

File creation

The goal is a one-to-one correspondence between person and minute (or one-to-many, when multiple minutes were submitted for one person). This project is therefore inherently slow:
No metadata exists.
Items must be hand-scanned.

File naming scheme:

aevc_last_XXX_YYY

Where:

aevc = project code (“memorial minutes” is too long)
last = last name of the person. When two people share a last name, use lastname-firstinitial.
XXX = number, zero-padded on the left, indicating which memorial minute for that person is being scanned.
YYY = number, zero-padded on the left, indicating the page sequence within that memorial minute.
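A minimal Python sketch of the aevc scheme, assuming three-digit padding for both XXX and YYY (inferred from the placeholders, not stated) and assuming last names are lowercased with punctuation and spaces removed. The function name `aevc_filename` is hypothetical, and the file extension is left to the caller.

```python
from typing import Optional


def aevc_filename(last: str, minute: int, page: int,
                  first_initial: Optional[str] = None) -> str:
    """Build aevc_last_XXX_YYY (assumed 3-digit padding for XXX and YYY)."""
    # Assumption: keep only letters/digits, lowercased (e.g. O'Brien -> obrien)
    name = "".join(ch for ch in last.lower() if ch.isalnum())
    if first_initial:
        # Shared last names are disambiguated as lastname-firstinitial
        name = f"{name}-{first_initial.lower()}"
    # XXX = which minute for this person; YYY = page within that minute
    return f"aevc_{name}_{minute:03d}_{page:03d}"


print(aevc_filename("Smith", 1, 2))                      # aevc_smith_001_002
print(aevc_filename("Smith", 2, 10, first_initial="J"))  # aevc_smith-j_002_010
```

For a (hypothetical) person with two minutes, the second minute's pages would run aevc_smith_002_001, aevc_smith_002_002, and so on.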

Vassar Wesleyan Program in Paris

Vinay Swamy approached the digital initiatives group about digitizing the old files related to the VWPP program since its inception in 1969. We’ve reviewed the files and are now awaiting Wesleyan’s response. Vassar can do this digitization in-house.

Note: this is an institutional repository project, not a digital library project.

Imaging Specs:

Canon imageRUNNER 3030: set resolution to 400 dpi, output multipage TIFFs, and send to the FTP site.

File name prefix: vwpp

Partnership: Wesleyan University archives. Wesleyan has processed the items, and Vassar will use Wesleyan's box/folder numbering for identifiers and the file naming scheme.

Einstein project

The Albert Einstein project, funded by the Polonsky Foundation, seeks to digitize and make available the contents of the following collection:

http://specialcollections.vassar.edu/findingaids/einstein_albert.html

Items:

Approximately 290 images will be produced from the series in this collection.

Filenaming scheme

einstein_series_subseries_folder_item_page[a].extension

For example:

einstein_01_01_014_001_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, first page; service copy image
einstein_01_01_014_001_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, second page; service copy image

einstein_01_01_014_002_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, first page; service copy image
einstein_01_01_014_002_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, second page; service copy image
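The examples above fix the field widths: two digits for series and subseries, three for folder, item, and page. A small Python sketch that builds and parses such names could look like the following. The meaning of the optional `[a]` suffix is not stated; this sketch assumes it marks an archival file, since the examples without it are service copies. All names here (`EinsteinID`, `build_einstein_name`, `parse_einstein_name`) are hypothetical.

```python
import re
from typing import NamedTuple


class EinsteinID(NamedTuple):
    series: int
    subseries: int
    folder: int
    item: int
    page: int
    archival: bool  # assumed meaning of the optional "a" suffix
    extension: str


# Widths inferred from the examples: 2, 2, 3, 3, 3 digits
PATTERN = re.compile(r"^einstein_(\d{2})_(\d{2})_(\d{3})_(\d{3})_(\d{3})(a?)\.(\w+)$")


def build_einstein_name(e: EinsteinID) -> str:
    """Assemble einstein_series_subseries_folder_item_page[a].extension."""
    suffix = "a" if e.archival else ""
    return (f"einstein_{e.series:02d}_{e.subseries:02d}_{e.folder:03d}_"
            f"{e.item:03d}_{e.page:03d}{suffix}.{e.extension}")


def parse_einstein_name(name: str) -> EinsteinID:
    """Split a filename back into its fields; raise ValueError if it does not match."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"not an einstein filename: {name}")
    s, ss, f, i, p, a, ext = m.groups()
    return EinsteinID(int(s), int(ss), int(f), int(i), int(p), a == "a", ext)


# Second telegram, first page, from the examples above
print(parse_einstein_name("einstein_01_01_014_002_001.tif"))
```

Round-tripping a name through `parse_einstein_name` and `build_einstein_name` is a cheap way to catch mistyped filenames before ingest.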

Longfellow and Vassar Songs Sheet Music Collections

This project plan provides background information, special considerations, and digitization recommendations for two related projects:

Babs and Bella C. Landauer collection of musical settings of the poetry of Henry Wadsworth Longfellow (“Longfellow Collection”)
Collection of Vassar College song books (“Vassar Songs”)

The information below is not a detailed technical and preservation analysis but a summary of known issues and basic road map for further consideration.

Basic information

Stakeholder(s): Sarah Canino
Time frame: AY2012-2013
Preservation needed?: yes
Funding opportunity?: yes (Farrish Foundation)

Background and Non-Digital Considerations

The Music Library contains two sheet music collections proposed as good candidates for digitization. In the course of another proposal for digitization (a more generalized sheet music digitization), Sarah Canino provided an analysis of these rare and unique collections. Both have notable strengths:

Both contain a significant portion of unique materials when searched against the Sheet Music Consortium (SMC) and the Petrucci Music Library Database at the International Music Score Library Project (IMSLP), the two premier repositories of digitized music.
Both consist largely (if not entirely) of objects created before 1923, i.e., free from copyright restriction.
One collection contains items completely unique to Vassar College.
Although the current scope is limited to the boxes of the Vassar Songs collection, further digital projects can stem from this theme, including audio, other song texts, class parties, and musicals.

In addition, there are some drawbacks:

There is virtually no item-level (EAD container / <c>) metadata for the items. Sarah is also interested in item-level cataloging / MARC records for these items.
The Vassar music, in particular, is extremely rare but in poor condition.
Some further research must occur to properly determine copyright status and weed out duplicates in the Longfellow Collection.
A precise metadata standard must be met to share items with the SMC and IMSLP databases.

Digital profiles for collections

Unit of consideration in physical collection: sheet / song
Unit of items to be digitized: page

Longfellow Collection

Item count: 378 objects, mostly sheet music, of which about 35 are vocal scores of roughly 50 pages each
Assumption: 4-6 pages per item; average 5 pages
Estimated page count: 1,890 pages. N.B.: Sarah will verify and revise this page count. Duplicate items will not be counted.
Format: loose objects in boxes (4 boxes; identifier = 78 L86 v.1-4)
Dates: most published pre-1924; most are American publications (some are British). This needs to be verified.
Oversized materials?: no
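The 1,890-page estimate is 378 × 5, which treats the ~35 vocal scores like ordinary sheet music. A rough Python check, assuming the approximate figures above hold and that duplicates have not yet been removed, shows how much the vocal scores could move the total:

```python
TOTAL_ITEMS = 378
AVG_PAGES = 5            # stated average (4-6 pages per item)
VOCAL_SCORES = 35        # approximate count of vocal scores
VOCAL_SCORE_PAGES = 50   # approximate pages per vocal score


def flat_estimate() -> int:
    """Every item at the average page count (the estimate used above)."""
    return TOTAL_ITEMS * AVG_PAGES


def split_estimate() -> int:
    """Sheet music at the average, vocal scores at ~50 pages each."""
    sheet_music = TOTAL_ITEMS - VOCAL_SCORES
    return sheet_music * AVG_PAGES + VOCAL_SCORES * VOCAL_SCORE_PAGES


print(flat_estimate())   # 1890
print(split_estimate())  # 3465
```

This is only a planning bound; Sarah's revised counts, with duplicates removed, replace both figures.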

Vassar songs collection

Item count: 8 volumes plus some additional publications of songs (“Peace I leave with you” and 1903 yearbook).
Estimated page count: 500 pages (provided by Sarah)
Format: bound volumes in poor condition
Dates: 1881-1940
Oversized materials?: TBD

Digital Considerations

Longfellow Collection

Number of items to digitize

There are 378 pieces in the Longfellow Collection (possibly including duplicates). Sarah has found that approximately 20% of the items in the collection have already been digitized and are available elsewhere. ~~We must ensure that the oversized folios are not too large for the copystand.~~

Recommendation: we digitize the collection in its entirety, minus the duplicates, and not worry about the overlap. However, our metadata schema should include a related-item reference that gives the URL of the alternate digital object at the other institution. Sharyn Cadogan and Joanna must measure the oversized folios.
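If the profile ends up in MODS, one way to express "this item is also digitized elsewhere" is a `relatedItem` carrying a `location/url`. The sketch below uses Python's standard `xml.etree.ElementTree`; the choice of `type="otherVersion"` and the use of `physicalLocation` for the holding institution are assumptions for illustration, not a decision in this plan, and the record, URL, and institution are placeholders.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a MODS element name with its namespace (Clark notation)."""
    return f"{{{MODS_NS}}}{tag}"


def add_alternate_copy(mods: ET.Element, url: str, institution: str) -> ET.Element:
    """Append a relatedItem pointing to a digital copy held at another institution."""
    related = ET.SubElement(mods, q("relatedItem"), {"type": "otherVersion"})
    location = ET.SubElement(related, q("location"))
    ET.SubElement(location, q("physicalLocation")).text = institution
    ET.SubElement(location, q("url")).text = url
    return related


# Hypothetical record, URL, and institution, for illustration only
record = ET.Element(q("mods"))
title_info = ET.SubElement(record, q("titleInfo"))
ET.SubElement(title_info, q("title")).text = "Example song"
add_alternate_copy(record, "https://example.org/item/123", "Example Library")
print(ET.tostring(record, encoding="unicode"))
```

Keeping the pointer in a separate `relatedItem` leaves the local record describing Vassar's copy, while still letting users find the other institution's version.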

Metadata and Copyright Research and Cataloging

There is virtually no metadata for these items, and significant research must be conducted to determine unique items, any background information, and copyright considerations. Additionally, we must create a metadata profile that is flexible and useful locally and worldwide.

Recommendations:

Joanna works with Sarah and Ann Churukian to create a new metadata profile that maps to Dublin Core or MODS (most likely MODS) in Islandora.
The Music Library uses part of the available funds to hire a library school student as a paid intern to research each piece, copy existing metadata when available, and provide original cataloging for other items, under Ann's direction.
Cataloging should be done directly in Islandora. This can serve as a pilot project for account management, maintenance, and documentation in our chosen digital library software.
Once the items are cataloged, Joanna can work with Ann, the library intern, and Laura Streett to create an EAD-compliant finding aid for this collection. Additionally, because data in Islandora is stored as MODS, Joanna can fairly easily transform the metadata into other standards, such as the data required for the SMC.
Joan Pirie and Shay Foley should be consulted about formatting data for MARC ingest into the library catalog.
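To illustrate the kind of transformation meant here, the sketch below maps a few MODS fields to simple Dublin Core elements with Python's `xml.etree.ElementTree`. The three-field crosswalk is a hypothetical minimum chosen for the example; the SMC's actual requirements would need to be checked, and a production mapping would more likely start from the Library of Congress's published MODS-to-Dublin Core mapping.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MODS = {"mods": "http://www.loc.gov/mods/v3"}
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical minimal crosswalk: Dublin Core element -> MODS path
CROSSWALK = {
    "title": "mods:titleInfo/mods:title",
    "creator": "mods:name/mods:namePart",
    "date": "mods:originInfo/mods:dateIssued",
}


def mods_to_dc(mods_xml: str) -> Dict[str, List[str]]:
    """Pull selected MODS values into a dict keyed by DC element name."""
    root = ET.fromstring(mods_xml)
    return {
        dc_field: [el.text or "" for el in root.findall(path, MODS)]
        for dc_field, path in CROSSWALK.items()
    }


def dc_to_xml(dc: Dict[str, List[str]]) -> str:
    """Serialize the values as dc:* elements inside a plain wrapper element."""
    wrapper = ET.Element("record")
    for dc_field, values in dc.items():
        for value in values:
            ET.SubElement(wrapper, f"{{{DC_NS}}}{dc_field}").text = value
    return ET.tostring(wrapper, encoding="unicode")


# Placeholder record, not real collection data
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Song</title></titleInfo>
  <name><namePart>Example Composer</namePart></name>
  <originInfo><dateIssued>1880</dateIssued></originInfo>
</mods>"""

dc = mods_to_dc(SAMPLE)
print(dc_to_xml(dc))
```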

Recommendations and outcomes from 11/2/2012:

Sarah will work to analyze duplicates
Sarah will provide basic metadata in electronic format — Title, Composer, Number of Pages — for each item
Once Joanna has metadata, we can begin digitization
Bound volumes will be “Phase II”
Library school intern should be hired for paid internship
Sabrina and Sarah will identify possible interested faculty in collection
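Once Sarah's Title / Composer / Number of Pages list exists, a first pass at flagging duplicates and recounting pages can be automated. This Python sketch assumes the list is exported as CSV with exactly those column headers (an assumption), and it only catches exact title/composer matches after trimming spaces and ignoring case; different editions of the same song still need human judgment. The sample rows use placeholder composers.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def match_key(row: Row) -> Tuple[str, str]:
    """Normalize title and composer so spacing/case differences still match."""
    return (row["Title"].strip().casefold(), row["Composer"].strip().casefold())


def find_duplicates(rows: List[Row]) -> Dict[Tuple[str, str], List[Row]]:
    """Group rows by (title, composer) and keep only groups with more than one row."""
    groups: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)
    return {key: group for key, group in groups.items() if len(group) > 1}


def unique_page_count(rows: List[Row]) -> int:
    """Sum Number of Pages, counting each (title, composer) pair only once."""
    pages: Dict[Tuple[str, str], int] = {}
    for row in rows:
        pages.setdefault(match_key(row), int(row["Number of Pages"]))
    return sum(pages.values())


# Hypothetical sample rows; composers are placeholders
SAMPLE = """Title,Composer,Number of Pages
The Village Blacksmith,Composer A,5
the village blacksmith ,Composer A,5
The Rainy Day,Composer B,4
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(len(find_duplicates(rows)), unique_page_count(rows))  # 1 9
```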

Vassar songs collection

Digitization process

Sharyn and Joanna, with help from Laura Streett, must assess the fragility of the binding in the context of digitization. Laura can determine the fragility of the object itself, while Sharyn and Joanna can determine the amount of shadowing, curvature, and margin; we must understand how much the condition of each item will affect the quality of the digital copy.

Recommendation: Sharyn, Joanna, and Laura examine the Vassar songs collection and take basic measurements. We cannot fully determine the feasibility of digitizing this collection in-house unless we do this critical step.

Metadata

Some items have already been digitized, but at the collection / book level. We need metadata at the song / “sheet” level. We need to determine whether there is a one-to-one correspondence between song and page; in other words, do songs share pages with other songs, or does each new song start on a new page? If songs share pages, our metadata profile and digitization may be difficult; the easiest approach may be to duplicate the pages that contain the end of one song and the beginning of the next, which adds to our image count.
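The effect of duplicating shared pages on the image count can be estimated once each song's page range is known. In this Python sketch, each song is recorded as an inclusive (first page, last page) range, a hypothetical format that the examination of the volumes would have to supply; a page shared by two songs is counted once physically but imaged once per song.

```python
from typing import List, Tuple


def images_needed(songs: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Return (physical pages, images) for one volume.

    Each song is an inclusive (first_page, last_page) range. Images are counted
    per song, so a page shared by two songs is imaged once for each song.
    """
    physical = set()
    images = 0
    for first, last in songs:
        pages = range(first, last + 1)
        physical.update(pages)
        images += len(pages)
    return len(physical), images


# Hypothetical volume: song 2 starts on the page where song 1 ends,
# and song 4 starts on the page where song 3 ends
print(images_needed([(1, 2), (2, 3), (4, 4), (4, 5)]))  # (5, 7)
```

The difference between the two numbers is the extra imaging caused by shared pages; if every song starts on a new page, the two are equal.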

Recommendations:

Joanna should examine the volumes to determine the page-to-song correspondence, which may increase the page count.
Similar to the recommendations for the Longfellow Collection, it may be useful to provide a paid internship opportunity for the right MLIS student to research and then directly catalog items into Islandora.

Recommendations and outcomes from 11/2/2012:

Joanna will work with Laura to obtain songbooks

Recommendations and outcomes from 1/4/2012:

We have asked Hudson Microimaging for a proposal and cost estimate for digitization services

Item-level metadata will be at BOOK level. We may wish to OCR and then copy the Table of Contents from songbooks (when available) to help identify which songs are in which books
Books are already cataloged, so metadata should be easy to obtain.

Printers’ Marks

About the Project

Working name:	Printers’ Marks
Sponsors:	Sabrina Pape and Ron Patkus
Duration:	Summer 2012
Nature: *[Text; image; text+image; GIS; audio/video; other]*	Text and images
Project track:	2 – VCL project with special considerations
Date prepared:	2012-08-01

Background / Purpose

The printers’ marks throughout the Main Library have been of interest to researchers and Vassar community members since they were installed in the early 20th century. A published volume, A list of the printers’ marks in the windows of the Frederick Ferris Thompson Memorial Library, Vassar College, is available online. We will digitize this volume at very high resolution, undertake a research project to document the printers, the marks, and the current location of each plate, and photograph the current marks in situ. We will apply for a Ford Scholar to assist us with this work in Summer 2013.

Scope

Phases of project *Based on item temporal coverage*	Phase 1: Photograph plates, scan images; Phase 2: Develop research with Ford Scholar; Phase 3: Publish online project
Number of items to be digitized	TBD – there are 16 pages in the volume and 66 current windows. We will also crop the individual marks from the TIFFs of the book's pages; there are 82 marks.
Total number of images *Assumption: one JPG derivative per each archival image created*	16 TIFFs page + 16 JPGs page + 82 TIFF marks + 82 JPG marks + 66 TIFFs windows + 66 JPGs windows = 328 images
Total number of records	TBD
Special considerations	Photography may be difficult

Location of Physical Items

Book is located in Special Collections; windows are dispersed throughout Main Library.

Hardware/Storage

System type	System	Space required
Archival image storage	digcol	164 images; 6560 MB
Derivative item storage	digcol	164 images; 1640 MB
TOTAL SPACE NEEDED		8,200 MB / 8.2 GB
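The image and storage figures in the tables above can be checked with a few lines of Python. The per-image sizes (40 MB per archival TIFF, 10 MB per JPG) are not stated in the plan; they are inferred by dividing the table's totals by 164 images. Note that the file naming convention also lists a service TIFF (`_s.tif`) per image, which these totals do not include.

```python
PAGES, MARKS, WINDOWS = 16, 82, 66
ARCHIVAL_MB, DERIVATIVE_MB = 40, 10  # inferred: 6560 / 164 and 1640 / 164

archival_images = PAGES + MARKS + WINDOWS          # 164 TIFFs
derivative_images = archival_images                # one JPG per archival TIFF
total_images = archival_images + derivative_images

archival_mb = archival_images * ARCHIVAL_MB
derivative_mb = derivative_images * DERIVATIVE_MB
total_mb = archival_mb + derivative_mb

print(total_images, total_mb, total_mb / 1000)  # 328 8200 8.2
```

So 8,200 MB is about 8.2 GB in decimal units (roughly 8.0 GiB in binary units).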

Software

Image capture:	Scanners and cameras to Photoshop
Metadata capture and storage:	Islandora
Final product display:	Islandora

Scanning specifications

We will scan at 400 ppi, with 3000 px on the largest dimension. Individual marks will be scanned at 1200 ppi.
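At a fixed resolution, an edge's pixel length is its physical length in inches times the ppi, so the two page-scan numbers interact: 3000 px at 400 ppi corresponds to a 7.5-inch edge, and a longer original would exceed 3000 px unless downsampled. A quick Python check (the 2-inch mark size is hypothetical):

```python
def pixels(inches: float, ppi: int) -> int:
    """Pixel length of an edge: physical length (inches) times resolution (ppi)."""
    return round(inches * ppi)


def max_inches(ppi: int, max_px: int = 3000) -> float:
    """Longest edge (inches) that fits within max_px at the given ppi."""
    return max_px / ppi


print(max_inches(400))  # 7.5
print(pixels(2, 1200))  # 2400, for a hypothetical 2-inch mark
```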

File Naming Convention

Formula

For book:

Prefix: pmarks
ID: book
ID part: page number (left pad 3 digits)
Delimiter: underscore

Example:

Page 5: pmarks_book_005

Archival file: pmarks_book_005_a.tif
Service file: pmarks_book_005_s.tif
Derivative: pmarks_book_005.jpg

For extracted images per page:

Prefix: pmarks
ID: book
ID part: page number (left pad 3 digits)
ID part: wing (e.g., “West Wing 4th” = ww4)
ID part: image number in sequence (left pad 3 digits)
Delimiter: underscore

Example:

Page 5, John Besson 1923 mark:

pmarks_book_005_ww4_001

Archival file: pmarks_book_005_ww4_001_a.tif
Service file: pmarks_book_005_ww4_001_s.tif
Derivative: pmarks_book_005_ww4_001.jpg

For windows:

Prefix: pmarks
ID: photo
ID part: wing (e.g., “West Wing 4th” = ww4)
ID part: image number in sequence (left pad 3 digits)
Delimiter: underscore

Example:

John Besson 1923 mark: pmarks_photo_ww4_001

Archival file: pmarks_photo_ww4_001_a.tif
Service file: pmarks_photo_ww4_001_s.tif
Derivative: pmarks_photo_ww4_001.jpg
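The three patterns above share one rule for derived files (`_a.tif`, `_s.tif`, `.jpg`), so a small Python helper can generate all of them. Only the `ww4` wing code is defined in the plan; the functions simply take whatever wing code they are given. Function names are hypothetical.

```python
from typing import Dict


def _pad(n: int) -> str:
    """Left-pad a number to 3 digits, as the convention specifies."""
    return f"{n:03d}"


def book_page(page: int) -> str:
    return f"pmarks_book_{_pad(page)}"


def book_mark(page: int, wing: str, seq: int) -> str:
    """A mark extracted from a book page, e.g. pmarks_book_005_ww4_001."""
    return f"pmarks_book_{_pad(page)}_{wing}_{_pad(seq)}"


def window_photo(wing: str, seq: int) -> str:
    """An in-situ photograph of a window, e.g. pmarks_photo_ww4_001."""
    return f"pmarks_photo_{wing}_{_pad(seq)}"


def file_set(base: str) -> Dict[str, str]:
    """Archival TIFF, service TIFF, and JPG derivative names for one base ID."""
    return {
        "archival": f"{base}_a.tif",
        "service": f"{base}_s.tif",
        "derivative": f"{base}.jpg",
    }


# Page 5, John Besson 1923 mark, as in the examples above
print(file_set(book_mark(5, "ww4", 1)))
```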

Bidloo digitization

Proposal to digitize Vassar’s millionth book, Bidloo’s Anatomia. After careful consideration, we realized that we don’t have the equipment in-house to digitize such a large volume, and we’ve asked the Northeast Document Conservation Center (NEDCC) for digitization estimates.

Status: approved, estimate received. Digitization will begin in the summer.

Notes:

Functionality needed:

Zoomable images (400 ppi, 48-bit archival TIFFs, JP2 derivatives generated)
Searchable text
Keep color bars on service copies?
Essays from faculty and librarians about importance of work?

Stakeholders:

Susan Kuretsky, Art History
Libraries

Salmon-Underhill Digital Exhibit

Instructions

Fill out the About the Project information below, and then use the Worksheet for Functional Specifications during consultation with stakeholders to help determine the software and steps used. The project track determination may change over time.

About the Project

Working name:	Salmon-Underhill Digital Exhibit
Sponsors:	Gretchen Lieb
Duration:	3 weeks
Nature: *[Text; image; text+image; GIS; audio/video; other]*	Text + Image
Project track:	Track 2
Date prepared:	February 1, 2012

Background / Purpose

The purpose of the Salmon-Underhill Digital Exhibit is to provide an Omeka- and CONTENTdm-ready set of images, metadata, and narratives to contribute to the Women’s History Month exhibit sponsored by HRVH.

Scope

Phases of project	One phase only
Number of items to be digitized	~ 30
Total number of images *Assumption: one JPG derivative per each archival image created*	~ 150
Total number of records
Special considerations	Some items may be fragile; letters may have bleed-through from recto to verso.

Location of Physical Items

Unit	Location
Letters	Special Collections
Pictures	Special Collections
Caption/text	Thumb drive

Hardware/Storage

System type	System	Space required
Archival image storage	Artfiles server
Derivative item storage	Omeka, CONTENTdm
TOTAL SPACE NEEDED

Software

Image capture:	VRL scanning; Archival TIFF, service TIFF
Metadata capture and storage:	Excel spreadsheet; already-written captions
Final product display:	Omeka (HRVH); CONTENTdm (VCL and HRVH)

File Naming Convention

Formula

Prefix: salmon
ID: box and folder #, delimited by hyphen
ID: left-padded, 3 characters
Page sequence: left-padded, 3 characters
Delimiter: underscore

Example: salmon_46-2_001_001.tif
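A one-function Python sketch that reproduces the example above. The plan does not say what the second, three-character ID counts; this sketch treats it as an item number within the folder (an assumption). Box and folder are left unpadded because the example (46-2) shows them that way. The function name is hypothetical.

```python
def salmon_filename(box: int, folder: int, item: int, page: int, ext: str = "tif") -> str:
    """Build salmon_<box>-<folder>_<item>_<page>.<ext>.

    Box and folder are unpadded (as in 46-2); item (assumed meaning) and page
    sequence are left-padded to 3 characters.
    """
    return f"salmon_{box}-{folder}_{item:03d}_{page:03d}.{ext}"


print(salmon_filename(46, 2, 1, 1))  # salmon_46-2_001_001.tif
```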

Turn-of-the-Century Sheet Music (Music Library)

N.B.: this project became the Longfellow sheet music project

About the Project

Working name:	Sheet Music
Sponsors:	Sarah Canino, Sabrina Pape
Duration:	TBD
Nature: *[Text; image; text+image; GIS; audio/video; other]*	Images, text, possibly audio (see “specialized software”)
Project track:
Date prepared:	November 10, 2011
Project status:	Proposed

Background / Purpose

From Sarah Canino’s proposal:

This collection includes popular sheet music from the mid-1800s to mid-1900s and comprises about 2,000 items of about 3-5 pages each. Many have the “decorative” title pages and many have been listed in a FileMaker Pro database.

This collection could be a good choice because it is of interest not only to scholars focusing on music, but also to those interested in visual images of the period and in textual representations of inventions (telephone, airplane), events (elections, wars, fairs, etc.), and depictions of race, gender, and ethnicity (in particular African American, Native American, women, and Jews). For example, Peter Antelyes drew upon these for shared imagery and text depictions of American Indians and Jews. Even though our collection is relatively small, we may have unique items.

Other schools have done quite a bit with their collections, and LC has also included sheet music in its American Memory project:
See http://library.duke.edu/music/sheetmusic/collections.html for a selective list.

Other thoughts (from Joanna):

If we are able to get high-quality TIFFs in this process, we could try to use software that uses Optical Music Recognition (OMR) to make a sheet music “transcript” — see http://journal.code4lib.org/articles/84.

Scope

Phases of project *Based on item temporal coverage*	Phase 1: Digitize pre-1900 items; Phase 2: Digitize 1900-present, checking for copyright issues; Phase 3: Explore OMR
Number of items to be digitized	Approx. 2,000
Total number of images *Assumption: one JPG derivative per each archival image created*	Approx. 3-5 pages per item, for a total of 6,000-10,000 images
Total number of records
Special considerations	Space estimates will vary widely depending on item size and on the color depth used for illustrated pages versus text-only pages (if any).
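The 6,000-10,000 range is 2,000 items × 3-5 pages, i.e. one archival image per page. Under the template's stated assumption of one JPG derivative per archival image, the number of files doubles. A quick Python check of both ranges:

```python
ITEMS = 2000
PAGES_LOW, PAGES_HIGH = 3, 5

# One archival image per page
archival_low, archival_high = ITEMS * PAGES_LOW, ITEMS * PAGES_HIGH
# Plus one JPG derivative per archival image
with_jpgs_low, with_jpgs_high = 2 * archival_low, 2 * archival_high

print(archival_low, archival_high)    # 6000 10000
print(with_jpgs_low, with_jpgs_high)  # 12000 20000
```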

Location of Physical Items

Units	Location
	Music Library

Hardware/Storage

System type	System	Space required
Archival image storage
Derivative item storage
TOTAL SPACE NEEDED

Software

Image capture:	TBD
Metadata capture and storage:	FileMaker database already created; depth of metadata unknown
Final product display:

File Naming Convention

Formula

Prefix: msheet
ID: 4 digits, left-padded with zeroes, based on the FMP item primary key
ID part: 3 digits, left-padded with zeroes, based on page order
Delimiter: underscore
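A Python sketch of the msheet formula, assuming the FileMaker Pro primary keys are plain integers (not confirmed) and that page order starts at 1. Range checks guard the stated widths; roughly 2,000 items fit comfortably in four digits. The function name and the key 42 are hypothetical.

```python
def msheet_filename(fmp_key: int, page: int) -> str:
    """Build msheet_<4-digit FMP key>_<3-digit page order>."""
    if not 0 <= fmp_key <= 9999:
        raise ValueError("FMP primary key must fit in 4 digits")
    if not 1 <= page <= 999:
        raise ValueError("page order must fit in 3 digits")
    return f"msheet_{fmp_key:04d}_{page:03d}"


print(msheet_filename(42, 3))  # msheet_0042_003
```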

Early Images of Vassar

Currently resides at: http://libweb.vassar.edu/earlyimages/

Needs migration to new home.