FJC/Polonsky grant remaining funds

Three projects have been approved for funding from the remainder of the FJC/Polonsky grant.  All three were previously discussed by the Digital Library Committee:

  • Fannie Aaron papers
    • Open questions: transcription? item-level metadata? Number of items?
  • Susan B. Anthony Papers
    • Open questions: additional transcription? items in other collections? contribute to a broader project?
  • Jasper Parrish papers
    • Open questions: transcription? Number of items? contribute to HRVH?

 

Memorial Minutes

Background

Michael McCarthy approached the Archives & Special Collections Library about digitizing the Memorial Minutes read at faculty meetings.  He has some recent ones available at http://aevc.webs.com, dating back to 1990, but would like a more robust list.  Michael was referred to us for this digitization.

The Archives & Special Collections Library contains three volumes of Memorial Minutes:

  • Volume 1, 1877-1942
  • Volume 2, 1943-1960
  • Volume 3, 1960-1978

There is a gap from 1978 to 1989; minutes for those years need to be pulled from the faculty minutes in Archives & Special Collections.  The digital library has already completed this digitization.

Digitization consists of approximately 90-100 pages per volume.  The volumes cannot be digitized as one book but will need to be digitized on a person-by-person (memorial-by-memorial) basis.  Each volume also contains an archival folder with the same contents except for Volume 3, whose folder contains overlapping but unequal content.

File creation

  • Goal is a one-to-one correspondence between person and minute (and, in the case of multiple minutes submitted per person, one-to-many).  Thus this project is inherently slow.
  • No metadata exists.
  • Items must be hand-scanned.

File naming scheme:

aevc_last_XXX_YYY

Where:

  • aevc = project code (memorial minutes is too long)
  • last = last name of person.  When same last names exist, use lastname-firstinitial.
  • XXX = three-digit number, left-padded with zeros, indicating which memorial minute for that person is being scanned.
  • YYY = three-digit number, left-padded with zeros, indicating the page sequence within that memorial minute.
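
The naming scheme above can be sketched as a small helper. This is only an illustration, not an established project tool: the function name, the lowercasing of names, and the three-digit padding width (inferred from XXX and YYY) are assumptions, and surnames containing spaces or punctuation are not handled.

```python
def aevc_filename(last, minute_no, page_no, first_initial=None):
    """Build a base file name of the form aevc_last_XXX_YYY.

    Assumptions (not stated in the plan): names are lowercased,
    and XXX / YYY are padded to exactly three digits.
    """
    name = last.lower()
    if first_initial:
        # Disambiguate shared last names as lastname-firstinitial.
        name = f"{name}-{first_initial.lower()}"
    return f"aevc_{name}_{minute_no:03d}_{page_no:03d}"


# Hypothetical people, for illustration only.
print(aevc_filename("Smith", 1, 2))
print(aevc_filename("Smith", 2, 10, first_initial="J"))
```

The file extension is left off because the scheme above does not specify one.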

Einstein project

The Albert Einstein project, funded by the Polonsky Foundation, seeks to digitize and make available the contents of the following collection:

http://specialcollections.vassar.edu/findingaids/einstein_albert.html

Items:

Approximately 290 images will be produced from these series.

Filenaming scheme

einstein_series_subseries_folder_item_page[a].extension

For example:

einstein_01_01_014_001_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, first page; service copy image
einstein_01_01_014_001_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, second page; service copy image

einstein_01_01_014_002_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, first page; service copy image
einstein_01_01_014_002_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, second page; service copy image
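
As a rough sketch, the scheme could be generated and checked with the helpers below. The field widths (two digits for series and subseries, three for folder, item, and page) are inferred from the examples above rather than stated in the plan, the function names are hypothetical, and the optional `[a]` marker is left out because its meaning is not defined here.

```python
import re

# Widths inferred from the examples: 2, 2, 3, 3, 3 digits.
EINSTEIN_RE = re.compile(
    r"^einstein_(?P<series>\d{2})_(?P<subseries>\d{2})_"
    r"(?P<folder>\d{3})_(?P<item>\d{3})_(?P<page>\d{3})\.(?P<ext>\w+)$"
)


def einstein_filename(series, subseries, folder, item, page, ext="tif"):
    """Build a file name such as einstein_01_01_014_001_001.tif."""
    return (
        f"einstein_{series:02d}_{subseries:02d}_"
        f"{folder:03d}_{item:03d}_{page:03d}.{ext}"
    )


def parse_einstein_filename(name):
    """Split a file name into its numeric parts; raise if it does not match."""
    match = EINSTEIN_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid einstein file name: {name!r}")
    parts = match.groupdict()
    return {k: (v if k == "ext" else int(v)) for k, v in parts.items()}


# Second telegram, first page, from the examples above.
print(einstein_filename(1, 1, 14, 2, 1))
print(parse_einstein_filename("einstein_01_01_014_002_001.tif"))
```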

Longfellow and Vassar Songs Sheet Music Collections

This project plan provides background information, special considerations, and digitization recommendations for two related projects:

  • Babs and Bella C. Landauer collection of musical settings of the poetry of Henry Wadsworth Longfellow (“Longfellow Collection”)
  • Collection of Vassar College song books (“Vassar Songs”)

The information below is not a detailed technical and preservation analysis but a summary of known issues and basic road map for further consideration.

Basic information

  • Stakeholder(s): Sarah Canino
  • Time frame: AY2012-2013
  • Preservation needed?: yes
  • Funding opportunity?: yes (Farrish Foundation)

Background and Non-Digital Considerations

The Music Library contains two sheet music collections proposed as good candidates for digitization.  In the course of another proposal for digitization (a more generalized sheet music digitization), Sarah Canino provided an analysis of these rare and unique collections.  Each features very strong facets:

  • Both contain a significant portion of unique materials when searched against the Sheet Music Consortium (SMC) and the Petrucci Music Library Database at the International Music Score Library Project (IMSLP), the two premier repositories of digitized music.
  • Both consist largely (if not entirely) of objects created pre-1923, i.e., free from copyright restriction.
  • One collection contains items completely unique to Vassar College.
  • Although current scope is the boxes of the Vassar Songs collection only, further digital projects can stem from this theme, including audio, other song texts, class parties, and musicals.

In addition, there are some drawbacks:

  • There is virtually no item-level (EAD container / <c>) metadata for the items.  Sarah is also interested in item-level cataloging / MARC records for these items.
  • The Vassar music, in particular, is extremely rare but in poor condition.
  • Some further research must occur to properly determine copyright status and weed out duplicates in the Longfellow Collection.
  • A precise metadata standard must be met to share items with the SMC and IMSLP databases.

Digital profiles for collections

Unit of consideration in physical collection: sheet / song
Unit of items to be digitized: page

Longfellow Collection

  • Item count: 378 objects, most of which are sheet music; about 35 are vocal scores of roughly 50 pages each
  • Assumption: 4-6 pages per item; average 5 pages
  • Estimated item count: 1,890 pages.  N.B.: Sarah will identify and modify this page count.  Duplicate items will not be counted.
  • Format: loose objects in boxes (4 boxes; identifier = 78 L86 v.1-4)
  • Dates: most published pre-1924; most are American publications (some are British).  This needs to be verified.
  • Oversized materials?: no
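
The page estimate above follows directly from the item count and the assumed pages per item. The short calculation below reproduces it and shows the range implied by the 4-6 page assumption; it is a sketch only, the variable names are illustrative, and it does not adjust for duplicates or for the longer vocal scores.

```python
ITEM_COUNT = 378                      # objects in the Longfellow Collection
PAGES_LOW, PAGES_AVG, PAGES_HIGH = 4, 5, 6   # assumed pages per item


def estimate_pages(items, pages_per_item):
    """Simple estimate: every item has the same number of pages."""
    return items * pages_per_item


low = estimate_pages(ITEM_COUNT, PAGES_LOW)
avg = estimate_pages(ITEM_COUNT, PAGES_AVG)
high = estimate_pages(ITEM_COUNT, PAGES_HIGH)

print(f"low: {low}, average: {avg}, high: {high}")
# low: 1512, average: 1890, high: 2268
```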

Vassar songs collection

  • Item count: 8 volumes plus some additional publications of songs (“Peace I leave with you” and 1903 yearbook).
  • Estimated item count: 500 pages (provided by Sarah)
  • Format: bound volumes in poor condition
  • Dates: 1881-1940
  • Oversized materials?: TBD

Digital Considerations

Longfellow Collection

Number of items to digitize

There are 378 pieces in the Longfellow Collection (possibly including duplicates).  Sarah has found that approximately 20% of the items in the collection have already been digitized and are available elsewhere.  We must ensure that the oversized folios are not too large for the copystand.

Recommendation: we digitize the collection in its entirety, minus the duplicates, and not worry about the overlap.  However, our metadata schema should include a related-item reference giving the URL of the alternate digital object at the other institution.  Sharyn Cadogan and Joanna must measure the oversized folios.

Metadata and Copyright Research and Cataloging

There is virtually no metadata for these items, and significant research must be conducted to determine unique items, any background information, and copyright considerations.  Additionally, we must create a metadata profile that is flexible and useful locally and worldwide.

Recommendations:

  • Joanna works with Sarah and Ann Churukian to create a new metadata profile that maps to Dublin Core or MODS (most likely MODS) in Islandora.
  • Music Library uses part of the available funds to hire a library school intern for a paid internship to research each piece, copy metadata when needed, and provide original cataloging of other items, under direction from Ann.
  • Cataloging should be done directly in Islandora.  This can serve as a pilot project for account management, maintenance, and documentation in our chosen digital library software.
  • Once cataloged, Joanna can work with Ann, the library intern, and Laura Streett to create an EAD-compliant finding aid for this collection.  Additionally, because data in Islandora is stored as MODS, Joanna can fairly easily transform metadata into other standards, such as data required for the SMC.
  • Joan Pirie and Shay Foley should be consulted about formatting data for MARC ingest into the library catalog.

Recommendations and outcomes from 11/2/2012:

  • Sarah will work to analyze duplicates
  • Sarah will provide basic metadata in electronic format — Title, Composer, Number of Pages — for each item
  • Once Joanna has metadata, we can begin digitization
  • Bound volumes will be “Phase II”
  • Library school intern should be hired for paid internship
  • Sabrina and Sarah will identify possible interested faculty in collection

Vassar songs collection

Digitization process

Sharyn and Joanna, with help from Laura Streett, must assess the fragility of the binding in the context of digitization.  Laura can determine the fragility of the object itself, while Sharyn and Joanna can determine the amount of shadowing, curvature, and margin; we must understand how much the condition of the item will affect the quality of the digital copy.

Recommendation: Sharyn, Joanna, and Laura examine the Vassar songs collection and take basic measurements.  We cannot fully determine the feasibility of digitizing this collection in-house unless we do this critical step.

Metadata

There are some items already digitized, but at the collection / book level.  We need metadata at the song / “sheet” level.  We need to determine whether or not a one-to-one correspondence exists between song and page; in other words, do songs begin on the same page as other songs, or does a new song begin a new page?  If the former, our metadata profile and digitization may be difficult; the easiest way to digitize may be to duplicate pages that contain the end and beginning of songs, adding to our digital count.
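
To illustrate how duplicating shared pages adds to the digital count, the sketch below counts pages per song and flags pages that belong to more than one song. The page ranges are invented for illustration and the function names are hypothetical; real ranges would come from examining the volumes.

```python
from collections import Counter


def digital_page_count(song_ranges):
    """Total scans needed when every song gets its own full set of pages.

    song_ranges: list of (first_page, last_page) tuples, inclusive.
    """
    return sum(last - first + 1 for first, last in song_ranges)


def shared_pages(song_ranges):
    """Physical pages that appear in more than one song."""
    counts = Counter(
        page
        for first, last in song_ranges
        for page in range(first, last + 1)
    )
    return sorted(page for page, n in counts.items() if n > 1)


# Hypothetical volume: song 1 ends on page 3, where song 2 begins.
songs = [(1, 3), (3, 4), (5, 6)]
print(digital_page_count(songs))  # 7 digital pages for 6 physical pages
print(shared_pages(songs))        # [3]
```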

Recommendations:

  • Joanna should examine the volumes to determine the page-to-song correspondence; if songs share pages, duplicating those pages will increase the page count.
  • Similar to the recommendations for the Longfellow Collection, it may be useful to provide a paid internship opportunity for the right MLIS student to research and then directly catalog items into Islandora.

Recommendations and outcomes from 11/2/2012:

  • Joanna will work with Laura to obtain songbooks

Recommendations and outcomes from 1/4/2012:

  • We have asked Hudson Microimaging for a proposal and cost estimate for digitization services
  • Item-level metadata will be at BOOK level.  We may wish to OCR and then copy the Table of Contents from songbooks (when available) to help identify which songs are in which books
  • Books are already cataloged, so should be easy to obtain metadata

 

 

Printers’ Marks

About the Project

Working name: Printers’ Marks
Sponsors: Sabrina Pape and Ron Patkus
Duration: Summer 2012
Nature: Text and images
Project track: 2 – VCL project with special considerations
Date prepared: 2012-08-01

Background / Purpose

The printers’ marks throughout the Main Library have been of interest to researchers and Vassar community members since they were installed in the early 20th century.  A published volume, A list of the printers’ marks in the windows of the Frederick Ferris Thompson Memorial Library, Vassar College, is available online.  We will digitize this volume to provide scans with very high resolution, as well as undertake a research project to document the printers, marks, and current locations of each plate.  Additionally, we will photograph the current marks in situ.  We will apply for a Ford Scholar to assist us in this work in Summer 2013.

Scope

Phases of project (based on item temporal coverage):
  • Phase 1: Photograph plates, scan images
  • Phase 2: Develop research with Ford Scholar
  • Phase 3: Publish online project
Number of items to be digitized: TBD.  There are 16 pages in the volume and 66 current windows.  We will also extract individual marks from the TIFFs created from the book; there are 82 marks.
Total number of images: 328 (assumption: one JPG derivative per archival image created)
  • 16 page TIFFs + 16 page JPGs + 82 mark TIFFs + 82 mark JPGs + 66 window TIFFs + 66 window JPGs = 328 images
Total number of records: TBD
Special considerations: Photography may be difficult

Location of Physical Items

Book is located in Special Collections; windows are dispersed throughout Main Library.

Hardware/Storage

System type                System    Space required
Archival image storage     digcol    164 images; 6,560 MB
Derivative item storage    digcol    164 images; 1,640 MB
TOTAL SPACE NEEDED                   8,200 MB / 8.2 GB
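
The image counts and storage figures can be checked with a short calculation. The per-image sizes used here (40 MB per archival TIFF, 10 MB per JPG derivative) are not stated in the plan; they are what the table's totals imply for 164 images each, and the conversion assumes 1,000 MB per GB.

```python
# Archival images by source, as listed under Scope.
archival_counts = {"book pages": 16, "extracted marks": 82, "windows": 66}

archival_images = sum(archival_counts.values())    # 164
derivative_images = archival_images                # one JPG per archival image
total_images = archival_images + derivative_images

# Implied by the table: 6560 MB / 164 and 1640 MB / 164 (assumption).
MB_PER_ARCHIVAL = 40
MB_PER_DERIVATIVE = 10

archival_mb = archival_images * MB_PER_ARCHIVAL
derivative_mb = derivative_images * MB_PER_DERIVATIVE
total_mb = archival_mb + derivative_mb
total_gb = total_mb / 1000

print(total_images)                # 328
print(archival_mb, derivative_mb)  # 6560 1640
print(total_mb, total_gb)          # 8200 8.2
```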

Software

Image capture: Scanners and cameras to Photoshop
Metadata capture and storage: Islandora
Final product display: Islandora

Scanning specifications

We will scan at 400ppi, 3000px for largest dimension. Individual marks at 1200ppi.

File Naming Convention

Formula

For book:

  • Prefix: pmarks
  • ID:  book
  • ID part: page number (left pad 3 digits)
  • Delimiter: underscore

Example:

Page 5: pmarks_book_005

  • Archival file: pmarks_book_005_a.tif
  • Service file: pmarks_book_005_s.tif
  • Derivative: pmarks_book_005.jpg

For extracted images per page:

  • Prefix: pmarks
  • ID:  book
  • ID part: page number (left pad 3 digits)
  • ID part: wing (e.g., “West Wing 4th” = ww4)
  • ID part: image number in sequence (left pad 3 digits)
  • Delimiter: underscore

Example:

Page 5, John Besson 1923 mark:

pmarks_book_005_ww4_001

  • Archival file: pmarks_book_005_ww4_001_a.tif
  • Service file: pmarks_book_005_ww4_001_s.tif
  • Derivative: pmarks_book_005_ww4_001.jpg

For windows:

  • Prefix: pmarks
  • ID: photo
  • ID part: wing (e.g., “West Wing 4th” = ww4)
  • ID part: image number in sequence (left pad 3 digits)
  • Delimiter: underscore

Example:

John Besson 1923 mark: pmarks_photo_ww4_001

  • Archival file: pmarks_photo_ww4_001_a.tif
  • Service file: pmarks_photo_ww4_001_s.tif
  • Derivative: pmarks_photo_ww4_001.jpg
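
The three formulas above can be sketched as helper functions that return the archival, service, and derivative names for each kind of image. The function names are hypothetical, and the wing code (such as ww4) is passed in as-is because the plan gives only one example of how wings are abbreviated.

```python
def _variants(base):
    """Archival, service, and derivative file names for one base name."""
    return {
        "archival": f"{base}_a.tif",
        "service": f"{base}_s.tif",
        "derivative": f"{base}.jpg",
    }


def book_page_names(page):
    """Names for a scanned page of the book, e.g. pmarks_book_005."""
    return _variants(f"pmarks_book_{page:03d}")


def extracted_mark_names(page, wing, seq):
    """Names for a mark extracted from a book page."""
    return _variants(f"pmarks_book_{page:03d}_{wing}_{seq:03d}")


def window_photo_names(wing, seq):
    """Names for a photograph of a window in situ."""
    return _variants(f"pmarks_photo_{wing}_{seq:03d}")


print(book_page_names(5))
print(extracted_mark_names(5, "ww4", 1))
print(window_photo_names("ww4", 1))
```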

Bidloo digitization

Proposal to digitize Vassar’s millionth book, Bidloo’s Anatomia.  After careful consideration, we realize that we don’t have the equipment in-house to digitize such a large volume, and we’ve asked for estimates from the Northeast Document Conservation Center (NEDCC) for digitization.

Status: approved, estimate received.  Digitization will begin in the summer.

Notes:

Functionality needed:

  • Zoomable images (400ppi, 48-bit archival TIFFs, jp2 generated)
  • Searchable text
  • Keep color bars on service copies?
  • Essays from faculty and librarians about importance of work?

Stakeholders:

  • Susan Kuretsky, Art History
  • Libraries

 

Salmon-Underhill Digital Exhibit

Instructions

Fill out the About the Project information below, and then use the Worksheet for Functional Specifications during consultation with stakeholders to help determine the software and steps used. The project track determination may change over time.

About the Project

Working name:  Salmon-Underhill Digital Exhibit
Sponsors:  Gretchen Lieb
Duration:  3 weeks
Nature:  Text + Image
Project track:  Track 2
Date prepared:  February 1, 2012

Background / Purpose

The purpose of the Salmon-Underhill Digital Exhibit is to provide an Omeka- and CONTENTdm-ready set of images, metadata, and narratives to contribute to the Women’s History Month exhibit sponsored by HRVH.

Scope

Phases of project: One phase only
Number of items to be digitized: ~ 30
Total number of images: ~ 150 (assumption: one JPG derivative per each archival image created)
Total number of records:
Special considerations: Some items may be fragile; letters may have bleed-through from recto to verso.

Location of Physical Items

 

Unit            Location
Letters         Special Collections
Pictures        Special Collections
Caption/text    Thumb drive

 

Hardware/Storage

System type                System               Space required
Archival image storage     Artfiles server
Derivative item storage    Omeka, CONTENTdm
TOTAL SPACE NEEDED

Software

Image capture: VRL scanning; Archival TIFF, service TIFF
Metadata capture and storage:  Excel spreadsheet; already-written captions
Final product display:  Omeka (HRVH); CONTENTdm (VCL and HRVH)

File Naming Convention

Formula

  • Prefix: salmon
  • ID: box and folder #, delimited by hyphen
  • ID: left-padded, 3 characters
  • Page sequence: left-padded, 3 characters
  • Delimiter: underscore

Example: salmon_46-2_001_001.tif
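
A minimal sketch of the formula follows. The formula does not say what the second, three-character ID refers to; the code assumes it is an item number within the folder, and the function and parameter names are hypothetical.

```python
def salmon_filename(box, folder, item, page, ext="tif"):
    """Build a file name such as salmon_46-2_001_001.tif.

    Assumption: the three-character ID after box-folder is an item
    number within the folder; the plan does not name this field.
    """
    return f"salmon_{box}-{folder}_{item:03d}_{page:03d}.{ext}"


# Box 46, folder 2, first item, first page.
print(salmon_filename(46, 2, 1, 1))
```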