FJC/Polonsky grant remaining funds

Three projects have been approved for funding from the remainder of the FJC/Polonsky grant.  The Digital Library Committee has previously discussed all three:

  • Fannie Aaron papers
    • Open questions: transcription? item-level metadata? Number of items?
  • Susan B. Anthony Papers
    • Open questions: additional transcription? items in other collections? contribute to a broader project?
  • Jasper Parrish papers
    • Open questions: transcription? Number of items? contribute to HRVH?

Matthew Vassar Papers

Introduction

The Matthew Vassar Papers consist of autobiographies, autographs, correspondence, diaries, maps, and photographs related to the founder of Vassar College.

Finding aid available at: http://specialcollections.vassar.edu/findingaids/vassar_matthew.html

This project will be funded by the Goodman grant.

Scope

We propose to digitize Series 1 (Materials concerning Vassar College), with additional series as time and funding permit.

Items

There are 225 listed items in Series 1.  A conservative estimate suggests 338 items.

Estimates and assumptions:

Assumptions:

  • Each item contains one page (most likely erroneous)
  • Each item must have recto/verso imaged, including envelopes

Estimate:

338 items x 2 (recto/verso) = 676 images

Archival images will most likely be about 40MB each, and service copies about 30MB each.

Storage requirements (based on imaging specs, below)

676 TIFFs @ 40MB = 27,040 MB = ~ 27GB

676 TIFFs @ 30MB = 20,280 MB = ~ 20GB
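
A minimal sketch of this arithmetic, so the totals can be recomputed once accurate item counts are available. It assumes one archival and one service TIFF per image (676 of each, recto and verso) and decimal units (1 GB = 1,000 MB); the function names are illustrative only.

```python
def image_count(items: int, sides_per_item: int = 2) -> int:
    """Number of images when every item is imaged recto and verso."""
    return items * sides_per_item


def storage_mb(tiff_count: int, mb_per_tiff: int) -> int:
    """Total storage in MB for a given number of TIFFs."""
    return tiff_count * mb_per_tiff


if __name__ == "__main__":
    items = 338  # conservative item estimate from this plan
    images = image_count(items)
    print(f"{items} items x 2 = {images} images")
    # Per-image sizes are the rough figures quoted in this plan.
    for label, size_mb in (("archival", 40), ("service", 30)):
        total = storage_mb(images, size_mb)
        print(f"{label}: {images} TIFFs @ {size_mb}MB = {total:,} MB (~{total / 1000:.0f} GB)")
```

Passing a different count to storage_mb shows how sensitive the totals are to the one-page-per-item assumption.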

Imaging specs

All items to be imaged are paper; a mix of flatbed scanning and copystand photography will be used.  Special handling requirements will be determined in consultation with Special Collections and Archives.

  • Archival scans at 400ppi, 24 bit color, TIFF
  • Service copies at 400ppi, 8 bit color, 4000px on largest dimension, TIFF
  • JPG copies not needed

Naming convention

Files will follow the standard practice:

project-prefix_box_folder_item_page

where box, folder, item, and page are zero-padded on the left to three places.

Project prefix: mvp (for “Matthew Vassar Papers”)

N.B.: the finding aid has some folders ending with the letter “A”.  In these cases, we will treat the “A” folder as a second item.

Examples:

  1. Folder 2.16 to James Grant Wilson, 27 Jun 1861 (1 letter)
    mvp_002_016_001_001a.tif – archival
    mvp_002_016_001_001.tif – service
  2. Folder 14.467A Matthew Vassar, Co.: Correspondence: Vassar to Dear Sir, 19 Nov 1838 (1 letter)
    mvp_014_467_002_001a.tif – archival
    mvp_014_467_002_001.tif – service
  3. Folder 6.186 from Carrie F. Stowe, 3 Jun 1862 (1 letter and photograph)
    mvp_006_186_001_001a.tif – letter, archival
    mvp_006_186_001_001.tif – letter, service
    mvp_006_186_001_002a.tif – photograph, archival
    mvp_006_186_001_002.tif – photograph, service
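
The convention can be sketched as a small helper that reproduces the example names above. This is an illustration, not an existing script: the function name is hypothetical, and the handling of "A" folders assumes the base folder holds exactly one item, so the first item in the "A" folder becomes item 002.

```python
def mvp_filename(box: int, folder: str, item: int, page: int, archival: bool) -> str:
    """Build a name such as mvp_002_016_001_001a.tif."""
    folder = folder.strip()
    if folder.upper().endswith("A"):
        # Assumption: the base folder holds one item, so items in the "A"
        # folder are shifted by one (14.467A, item 1 -> item 002).
        folder_number = int(folder[:-1])
        item += 1
    else:
        folder_number = int(folder)
    suffix = "a" if archival else ""  # trailing "a" marks the archival TIFF
    return f"mvp_{box:03d}_{folder_number:03d}_{item:03d}_{page:03d}{suffix}.tif"


if __name__ == "__main__":
    print(mvp_filename(2, "16", 1, 1, archival=True))
    print(mvp_filename(14, "467A", 1, 1, archival=False))
    print(mvp_filename(6, "186", 1, 2, archival=True))
```

Folders with more than one item before the "A" folder would need a different offset, which is one more reason to obtain accurate item counts.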

Metadata considerations

  1. Names should be cross-referenced with LC’s authority file for named creators and correspondents, etc.
  2. Standard MODS profile: title, date, identifier, creator, correspondent, rights management.
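
To make the profile concrete, the sketch below builds one MODS record with the six fields listed. It is a rough illustration only: the sample values are placeholders (not catalog data), name forms would still need to be checked against LC's authority file, and the choice of dateCreated, roleTerm text values, and accessCondition for rights is an assumption about how the profile might map to MODS.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def add_name(root: ET.Element, name: str, role: str) -> None:
    """Add a personal name with a plain-text role term."""
    name_el = ET.SubElement(root, q("name"), {"type": "personal"})
    ET.SubElement(name_el, q("namePart")).text = name
    role_el = ET.SubElement(name_el, q("role"))
    ET.SubElement(role_el, q("roleTerm"), {"type": "text"}).text = role


def build_mods(title, date, identifier, creator, correspondent, rights) -> ET.Element:
    root = ET.Element(q("mods"))
    title_info = ET.SubElement(root, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    origin = ET.SubElement(root, q("originInfo"))
    ET.SubElement(origin, q("dateCreated")).text = date
    ET.SubElement(root, q("identifier"), {"type": "local"}).text = identifier
    add_name(root, creator, "creator")
    add_name(root, correspondent, "correspondent")
    ET.SubElement(root, q("accessCondition")).text = rights
    return root


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    record = build_mods(
        title="Letter to James Grant Wilson",
        date="1861-06-27",
        identifier="mvp_002_016_001",
        creator="Vassar, Matthew",
        correspondent="Wilson, James Grant",
        rights="Placeholder rights statement",
    )
    print(ET.tostring(record, encoding="unicode"))
```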

Other considerations:

  1. Some folders have photocopies or duplicates of items in other boxes.  Should we image these as well?  E.g., Folder 5.148 to Rev. Charles A. Raymond, typescripts of letters in folders 129-147, 30 Jul 1862 – 3 Apr 1864.
  2. We need accurate counts of items for better storage estimates.

Einstein project

The Albert Einstein project, funded by the Polonsky Foundation, seeks to digitize and make available the contents of the following collection:

http://specialcollections.vassar.edu/findingaids/einstein_albert.html

Items:

Approximately 290 images will be produced from these series.

File naming scheme

einstein_series_subseries_folder_item_page[a].extension

For example:

einstein_01_01_014_001_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, first page; service copy image
einstein_01_01_014_001_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, second page; service copy image

einstein_01_01_014_002_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, first page; service copy image
einstein_01_01_014_002_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, second page; service copy image
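
A minimal sketch of how names in this scheme could be checked and split into their parts. The field widths (two digits for series and subseries, three for folder, item, and page) are inferred from the examples above, and reading the optional "a" as the archival marker is an assumption carried over from the Matthew Vassar naming convention; the function name is hypothetical.

```python
import re

# Widths are inferred from the example filenames, not from a written spec.
EINSTEIN_NAME = re.compile(
    r"^einstein_(?P<series>\d{2})_(?P<subseries>\d{2})_(?P<folder>\d{3})"
    r"_(?P<item>\d{3})_(?P<page>\d{3})(?P<archival>a?)\.(?P<extension>[a-z0-9]+)$"
)


def parse_einstein_name(filename: str) -> dict:
    """Split a filename into its numbered parts, or raise ValueError."""
    match = EINSTEIN_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a valid Einstein filename: {filename!r}")
    parts = match.groupdict()
    return {
        "series": int(parts["series"]),
        "subseries": int(parts["subseries"]),
        "folder": int(parts["folder"]),
        "item": int(parts["item"]),
        "page": int(parts["page"]),
        "archival": parts["archival"] == "a",  # assumed meaning of the suffix
        "extension": parts["extension"],
    }


if __name__ == "__main__":
    print(parse_einstein_name("einstein_01_01_014_002_001.tif"))
```

A check like this could be run over a directory listing before ingest to catch misnamed files.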

Longfellow and Vassar Songs Sheet Music Collections

This project plan provides background information, special considerations, and digitization recommendations for two related projects:

  • Babs and Bella C. Landauer collection of musical settings of the poetry of Henry Wadsworth Longfellow (“Longfellow Collection”)
  • Collection of Vassar College song books (“Vassar Songs”)

The information below is not a detailed technical and preservation analysis but a summary of known issues and basic road map for further consideration.

Basic information

  • Stakeholder(s): Sarah Canino
  • Time frame: AY2012-2013
  • Preservation needed?: yes
  • Funding opportunity?: yes (Farrish Foundation)

Background and Non-Digital Considerations

The Music Library holds two sheet music collections proposed as good candidates for digitization.  In the course of another digitization proposal (for sheet music more generally), Sarah Canino provided an analysis of these rare and unique collections.  Both have notable strengths:

  • Both contain a significant portion of unique materials when searched against the Sheet Music Consortium (SMC) and the Petrucci Music Library Database at the International Music Score Library Project (IMSLP), the two premier repositories of digitized music.
  • Both consist largely (if not entirely) of objects created pre-1923, i.e., free from copyright restriction.
  • One collection contains items completely unique to Vassar College.
  • Although current scope is the boxes of the Vassar Songs collection only, further digital projects can stem from this theme, including audio, other song texts, class parties, and musicals.

In addition, there are some drawbacks:

  • There is virtually no item-level (EAD container / <c>) metadata for the items.  Sarah is also interested in item-level cataloging / MARC records for these items.
  • The Vassar music, in particular, is extremely rare but in poor condition.
  • Some further research must occur to properly determine copyright status and weed out duplicates in the Longfellow Collection.
  • A precise metadata standard must be met to share items with the SMC and IMSLP databases.

Digital profiles for collections

Unit of consideration in physical collection: sheet / song
Unit of items to be digitized: page

Longfellow Collection

  • Item count: 378 objects, most of which are sheet music and some of which are vocal scores (about 35, roughly 50 pages each)
  • Assumption: 4-6 pages per item; average 5 pages
  • Estimated page count: 1,890 pages (378 objects × 5 pages).  N.B.: Sarah will verify and revise this page count.  Duplicate items will not be counted.
  • Format: loose objects in boxes (4 boxes; identifier = 78 L86 v.1-4)
  • Dates: most published pre-1923; most are American publications (some are British).  This needs to be verified.
  • Oversized materials?: no
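
The page estimate can be recomputed as counts are refined. The sketch below reproduces the flat figure (378 objects at 5 pages each) and, as an alternative, counts the vocal scores separately. It is not clear from the figures above whether the 5-page average is meant to cover the vocal scores, so the second calculation is an assumption offered for comparison, not a revised estimate; the function names are illustrative.

```python
def flat_estimate(objects: int, avg_pages: int) -> int:
    """Every object is assumed to have the same average page count."""
    return objects * avg_pages


def split_estimate(objects: int, vocal_scores: int,
                   pages_per_score: int, avg_sheet_pages: int) -> int:
    """Count vocal scores separately from ordinary sheet music."""
    sheet_music = objects - vocal_scores
    return sheet_music * avg_sheet_pages + vocal_scores * pages_per_score


if __name__ == "__main__":
    print("flat:", flat_estimate(378, 5))
    print("vocal scores counted separately:", split_estimate(378, 35, 50, 5))
```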

Vassar songs collection

  • Item count: 8 volumes plus some additional publications of songs (“Peace I leave with you” and 1903 yearbook).
  • Estimated page count: 500 pages (provided by Sarah)
  • Format: bound volumes in poor condition
  • Dates: 1881-1940
  • Oversized materials?: TBD

Digital Considerations

Longfellow Collection

Number of items to digitize

There are 378 pieces in the Longfellow Collection (possibly including duplicates).  Sarah has found that approximately 20% of the items in the collection have already been digitized and are available elsewhere.  We must ensure that the oversized folios are not too large for the copystand.

Recommendation: we digitize the collection in its entirety — minus the duplicates — and not worry about the overlap.  However, in our metadata schema, we should provide reference to a related item that provides a URL to an alternate digital object in another institution.  Sharyn Cadogan and Joanna must measure the oversized folios.

Metadata and Copyright Research and Cataloging

There is virtually no metadata for these items, and significant research must be conducted to determine unique items, any background information, and copyright considerations.  Additionally, we must create a metadata profile that is flexible and useful locally and worldwide.

Recommendations:

  • Joanna works with Sarah and Ann Churukian to create a new metadata profile that maps to Dublin Core or MODS (most likely MODS) in Islandora.
  • Music Library uses part of the available funds to hire a library school intern for a paid internship to research each piece, copy metadata when needed, and provide original cataloging of other items, under direction from Ann.
  • Cataloging should be done directly in Islandora.  This can serve as a pilot project for account management, maintenance, and documentation in our chosen digital library software.
  • Once cataloged, Joanna can work with Ann, the library intern, and Laura Streett to create an EAD-compliant finding aid for this collection.  Additionally, because data in Islandora is stored as MODS, Joanna can fairly easily transform metadata into other standards, such as data required for the SMC.
  • Joan Pirie and Shay Foley should be consulted about formatting data for MARC ingest into the library catalog.
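
As a rough illustration of the MODS transformation mentioned above, the sketch below reads a MODS record and maps three fields to simple Dublin Core-style element names. The sample record is invented, and the crosswalk is a small hypothetical subset; the fields the SMC actually requires are not specified here and would need to be confirmed.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Invented sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Hypothetical song title</title></titleInfo>
  <name><namePart>Hypothetical Composer</namePart></name>
  <originInfo><dateIssued>1880</dateIssued></originInfo>
</mods>"""

# Hypothetical crosswalk: target element name -> path within the MODS record.
CROSSWALK = {
    "title": "mods:titleInfo/mods:title",
    "creator": "mods:name/mods:namePart",
    "date": "mods:originInfo/mods:dateIssued",
}


def mods_to_dc(mods_xml: str) -> dict:
    """Return a dict of target element name -> list of values found."""
    root = ET.fromstring(mods_xml)
    return {
        dc_name: [el.text for el in root.findall(path, NS)]
        for dc_name, path in CROSSWALK.items()
    }


if __name__ == "__main__":
    print(mods_to_dc(SAMPLE))
```

Keeping the crosswalk in a single table means a different target standard only needs a different mapping, not new code.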

Recommendations and outcomes from 11/2/2012:

  • Sarah will work to analyze duplicates
  • Sarah will provide basic metadata in electronic format — Title, Composer, Number of Pages — for each item
  • Once Joanna has metadata, we can begin digitization
  • Bound volumes will be “Phase II”
  • Library school intern should be hired for paid internship
  • Sabrina and Sarah will identify possible interested faculty in collection

Vassar songs collection

Digitization process

Sharyn and Joanna, with help from Laura Streett, must assess the fragility of the binding in the context of digitization.  Laura can determine the fragility of the object itself, while Sharyn and Joanna can determine the amount of shadowing, curvature, and margin; we must understand how much the condition of each item will affect the quality of the digital copy.

Recommendation: Sharyn, Joanna, and Laura examine the Vassar songs collection and take basic measurements.  We cannot fully determine the feasibility of digitizing this collection in-house until this critical step is complete.

Metadata

There are some items already digitized, but at the collection / book level.  We need metadata at the song / “sheet” level.  We need to determine whether or not a one-to-one correspondence exists between song and page; in other words, do songs begin on the same page as other songs, or does a new song begin a new page?  If the former, our metadata profile and digitization may be difficult; the easiest way to digitize may be to duplicate pages that contain the end and beginning of songs, adding to our digital count.
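
The effect of duplicating shared pages on the image count can be sketched as follows. Each song is represented by its first and last page; the page ranges in the example are invented, and the function name is illustrative.

```python
def image_counts(song_page_ranges):
    """Return (unique physical pages, images if each song gets its own copy).

    song_page_ranges is a list of (first_page, last_page) tuples, inclusive.
    """
    unique_pages = set()
    images = 0
    for first, last in song_page_ranges:
        pages = range(first, last + 1)
        unique_pages.update(pages)
        images += len(pages)  # a shared page is counted once per song
    return len(unique_pages), images


if __name__ == "__main__":
    # Invented example: the second song begins on the page where the first ends.
    songs = [(1, 2), (2, 4), (5, 5)]
    unique, images = image_counts(songs)
    print(f"{unique} physical pages, {images} images, {images - unique} duplicated")
```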

Recommendations:

  • Joanna should examine the volumes to determine the page-to-song correspondence; if songs share pages, the page count will increase.
  • Similar to the recommendations for the Longfellow Collection, it may be useful to provide a paid internship opportunity for the right MLIS student to research and then directly catalog items into Islandora.

Recommendations and outcomes from 11/2/2012:

  • Joanna will work with Laura to obtain songbooks

Recommendations and outcomes from 1/4/2012:

  • We have asked Hudson Microimaging for a proposal and cost estimate for digitization services
  • Item-level metadata will be at BOOK level.  We may wish to OCR and then copy the Table of Contents from songbooks (when available) to help identify which songs are in which books
  • Books are already cataloged, so should be easy to obtain metadata

John Burroughs Journals

The John Burroughs Journals project has two parts:

  1. Migrating content to Fedora from HRVH’s CONTENTdm [complete]
  2. Uploading new content to both Fedora/Islandora and CONTENTdm [ongoing]

Primary stakeholders: Jeff Walker, Special Collections

Jeff Walker has hired a student to continue to transcribe Burroughs’ journals.  As she completes them, she sends information along to the digital library.  Joanna then uses a series of customized scripts to send the information to our repository as well as HRVH.  This process occurs approximately once per semester.

Miscellany News and other student publications

About the Project

Working name: Misc / Student Pubs
Sponsors: Ron Patkus, Laura Streett
Duration: 6 mos
Nature:

[Text; image; text+image; GIS; audio/video; other]

Images, text
Project track:
Date prepared:
Project status: In process

Background / Purpose

Scope

Phases of project

Based on item temporal coverage

Number of items to be digitized
Total number of images

Assumption: one JPG derivative per archival image


Total number of records
Special considerations

Location of Physical Items

Units Location
Special Collections & Archives Library


Hardware/Storage

System type | System | Space required
Archival image storage

Derivative item storage

TOTAL SPACE NEEDED

Software

Image capture:
Metadata capture and storage:
Final product display:

File Naming Convention

Formula

  • Prefix:
  • ID:
  • ID part:
  • Delimiter: