FJC/Polonsky grant remaining funds

Three projects have been approved for funding from the remaining FJC/Polonsky grant funds. These projects were previously discussed by the Digital Library Committee and include:

  • Fannie Aaron papers
    • Open questions: transcription? item-level metadata? Number of items?
  • Susan B. Anthony Papers
    • Open questions: additional transcription? items in other collections? contribute to a broader project?
  • Jasper Parrish papers
    • Open questions: transcription? Number of items? contribute to HRVH?

Matthew Vassar Papers

Introduction

The Matthew Vassar Papers project consists of autobiographies, autographs, correspondence, diaries, maps, and photographs related to the founder of Vassar College.

Finding aid available at: http://specialcollections.vassar.edu/findingaids/vassar_matthew.html

This project will be funded by the Goodman grant.

Scope

We propose to digitize Series 1 (Materials concerning Vassar College), adding further series as time and funding permit.

Items

There are 225 listed items in Series 1.  A conservative estimate suggests 338 items.

Estimates and assumptions:

Assumptions:

  • Each item contains one page (likely an underestimate)
  • Both recto and verso of each item must be imaged, including envelopes

Estimate:

338 items x 2 (recto/verso) = 676 images

Archival images will most likely be about 40MB each, and service copies about 30MB each.

Storage requirements (based on the imaging specs below):

676 archival TIFFs @ 40MB = 27,040 MB = ~ 27GB

676 service TIFFs @ 30MB = 20,280 MB = ~ 20GB
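The storage arithmetic can be captured in a few lines so it is easy to rerun once accurate item counts are available. This is a minimal sketch that applies both per-file sizes above (40MB archival, 30MB service) to every one of the 676 estimated images; the function and constant names are illustrative only, and all inputs are the plan's estimates rather than measured file sizes.

```python
# Storage estimate for Series 1 of the Matthew Vassar Papers.
# All inputs are the plan's own estimates, not measured values.

ESTIMATED_ITEMS = 338   # conservative estimate (about 1.5x the 225 listed items)
SIDES_PER_ITEM = 2      # recto and verso, assuming one page per item
ARCHIVAL_MB = 40        # estimated size of one archival TIFF
SERVICE_MB = 30         # estimated size of one service TIFF


def storage_estimate(items: int, sides: int, archival_mb: int, service_mb: int) -> dict:
    """Return the image count and storage in MB for archival and service copies."""
    images = items * sides
    archival = images * archival_mb
    service = images * service_mb
    return {
        "images": images,
        "archival_mb": archival,
        "service_mb": service,
        "total_mb": archival + service,
    }


if __name__ == "__main__":
    est = storage_estimate(ESTIMATED_ITEMS, SIDES_PER_ITEM, ARCHIVAL_MB, SERVICE_MB)
    for key, value in est.items():
        print(f"{key}: {value:,}")
    # Decimal gigabytes (1 GB = 1,000 MB), the same rounding the plan uses
    print(f"total: ~{est['total_mb'] / 1000:.1f} GB")
```

With the current figures this gives 676 images, 27,040 MB of archival TIFFs, and 20,280 MB of service TIFFs (47,320 MB in total). Because the one-page-per-item assumption is likely low, the totals should be recomputed once page counts are known.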

Imaging specs

All imaging is of paper; a mix of flatbed scanning and copystand photography will be used. Special handling will be determined in consultation with Special Collections and Archives.

  • Archival scans at 400ppi, 24 bit color, TIFF
  • Service copies at 400ppi, 8 bit color, 4000px on largest dimension, TIFF
  • JPG copies not needed

Naming convention

Files will follow the standard practice:

project-prefix_box_folder_item_page

where box, folder, item, and page are zero-padded on the left to three digits. Archival files add an “a” after the page number; service files have no suffix.

Project prefix: mvp (for “Matthew Vassar Papers”)

N.B.: the finding aid has some folders ending with the letter “A”. In these cases, we will treat the “A” folder as a second item in the same numbered folder (e.g., folder 14.467A becomes item 002 of folder 467).

Examples:

  1. Folder 2.16 to James Grant Wilson, 27 Jun 1861 (1 letter)
    mvp_002_016_001_001a.tif — archival
    mvp_002_016_001_001.tif — service
  2. Folder 14.467A Matthew Vassar, Co.: Correspondence: Vassar to Dear Sir, 19 Nov 1838 (1 letter)
    mvp_014_467_002_001a.tif — archival
    mvp_014_467_002_001.tif — service
  3. Folder 6.186 from Carrie F. Stowe, 3 Jun 1862 (1 letter and photograph)
    mvp_006_186_001_001a.tif — letter, archival
    mvp_006_186_001_001.tif — letter, service
    mvp_006_186_001_002a.tif — photograph, archival
    mvp_006_186_001_002.tif — photograph, service
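The naming rules can be expressed as a small helper, which is handy for checking file names during capture. This is a hypothetical sketch: `mvp_filename` and `parse_folder` are illustrative names, and `parse_folder` assumes that an “A” folder's base folder holds a single item, so the “A” material becomes item 002. A base folder with several items would need the next free item number instead.

```python
import re

PREFIX = "mvp"


def mvp_filename(box: int, folder: int, item: int, page: int, archival: bool) -> str:
    """Build prefix_box_folder_item_page with three-digit zero padding.

    Archival files carry an "a" after the page number; service files do not.
    """
    suffix = "a" if archival else ""
    return f"{PREFIX}_{box:03d}_{folder:03d}_{item:03d}_{page:03d}{suffix}.tif"


def parse_folder(label: str) -> tuple[int, int, int]:
    """Split a finding-aid folder label such as "2.16" or "14.467A".

    Returns (box, folder, first item number). An "A" folder is treated as a
    second item in the same folder, which assumes the base folder holds a
    single item.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)(A?)", label.strip())
    if match is None:
        raise ValueError(f"unrecognized folder label: {label!r}")
    box, folder, a_flag = match.groups()
    return int(box), int(folder), 2 if a_flag else 1


if __name__ == "__main__":
    # Example 2 above: Folder 14.467A, one letter, one page
    box, folder, item = parse_folder("14.467A")
    print(mvp_filename(box, folder, item, 1, archival=True))   # mvp_014_467_002_001a.tif
    print(mvp_filename(box, folder, item, 1, archival=False))  # mvp_014_467_002_001.tif
```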

Metadata considerations

  1. Names of creators, correspondents, and other named persons should be checked against the Library of Congress name authority file.
  2. Standard MODS profile: title, date, identifier, creator, correspondent, rights management.

Other considerations:

  1. Some folders have photocopies or duplicates of items in other boxes.  Should we image these as well?  E.g., Folder 5.148 to Rev. Charles A. Raymond, typescripts of letters in folders 129-147, 30 Jul 1862 – 3 Apr 1864.
  2. We need accurate counts of items for better storage estimates.

Memorial Minutes

Background

Michael McCarthy approached the Archives & Special Collections Library about digitizing the Memorial Minutes read at faculty meetings. He has posted recent minutes, dating back to 1990, at http://aevc.webs.com but would like a more complete list. Michael was referred to us for this digitization.

The Archives & Special Collections Library contains three volumes of Memorial Minutes:

  • Volume 1, 1877-1942
  • Volume 2, 1943-1960
  • Volume 3, 1960-1978

There is a gap from 1978-1989; the minutes for these years need to be pulled from the faculty minutes in Archives & Special Collections. The digital library has already completed this digitization.

Each volume contains approximately 90-100 pages. The volumes cannot be digitized as whole books; they must be digitized person by person (memorial by memorial). Each volume also has an archival folder with the same contents, except for Volume 3, whose folder overlaps with the volume but does not match it exactly.

File creation

  • The goal is a one-to-one correspondence between person and minute (or one-to-many, when more than one minute was submitted for a person). As a result, this project is inherently slow.
  • No metadata exists.
  • Items must be hand-scanned.

File naming scheme:

aevc_last_XXX_YYY

Where:

  • aevc = project code (memorial minutes is too long)
  • last = last name of person. When two people share a last name, use lastname-firstinitial.
  • XXX = three-digit, zero-padded number indicating which memorial minute for this person is being scanned.
  • YYY = three-digit, zero-padded number indicating the page sequence within that memorial minute.
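A short helper can apply this scheme consistently across scanners. This is a minimal sketch under stated assumptions: lowercase names, a literal hyphen in lastname-firstinitial, and no file extension (the scheme above does not specify one). The function names and the example surname are hypothetical, and surnames containing spaces or apostrophes are not handled.

```python
PREFIX = "aevc"


def person_key(last_name: str, first_name: str, shared_last_names: set[str]) -> str:
    """Return the "last" part of the name.

    Uses lastname-firstinitial when another person in the set shares the last name.
    Lowercasing is an assumption; the plan does not specify case.
    """
    last = last_name.strip().lower()
    if last in shared_last_names:
        return f"{last}-{first_name.strip()[0].lower()}"
    return last


def aevc_filename(person: str, minute_no: int, page_no: int) -> str:
    """aevc_last_XXX_YYY: XXX = minute number for this person, YYY = page in that minute."""
    return f"{PREFIX}_{person}_{minute_no:03d}_{page_no:03d}"


if __name__ == "__main__":
    shared = {"smith"}  # hypothetical: two people named Smith in the volumes
    key = person_key("Smith", "Jane", shared)
    # A hypothetical person's first memorial minute, pages 1-3
    for page in range(1, 4):
        print(aevc_filename(key, 1, page))
```

Keeping the list of shared surnames as explicit input, rather than detecting it on the fly, means the decision to add a first initial is made once per person and stays stable across all of that person's files.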

Einstein project

The Albert Einstein project, funded by the Polonsky Foundation, seeks to digitize and make available the contents of the following collection:

http://specialcollections.vassar.edu/findingaids/einstein_albert.html

Items:

Digitization will produce approximately 290 images from this collection.

Filenaming scheme

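einstein_series_subseries_folder_item_page[a].extension

where series and subseries are zero-padded to two digits; folder, item, and page are zero-padded to three; and the optional “a” marks the archival copy.

For example: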

einstein_01_01_014_001_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, first page; service copy image
einstein_01_01_014_001_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – first telegram, second page; service copy image

einstein_01_01_014_002_001.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, first page; service copy image
einstein_01_01_014_002_002.tif – Albert Einstein to Otto Nathan, Sept 1936 – second telegram, second page; service copy image
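The same pattern can be generated programmatically. This is a sketch with a hypothetical function name; the padding widths are inferred from the examples above, and treating “a” as the archival marker is an assumption made by analogy with the other project conventions in this document.

```python
def einstein_filename(series: int, subseries: int, folder: int, item: int,
                      page: int, archival: bool = False, ext: str = "tif") -> str:
    """einstein_series_subseries_folder_item_page[a].extension

    Padding (two digits for series/subseries, three for folder/item/page) is
    inferred from the examples; "a" marking the archival copy is assumed by
    analogy with the other projects.
    """
    suffix = "a" if archival else ""
    return (f"einstein_{series:02d}_{subseries:02d}_{folder:03d}_"
            f"{item:03d}_{page:03d}{suffix}.{ext}")


if __name__ == "__main__":
    # Folder 14 of series 1, subseries 1: two telegrams of two pages each
    for item in (1, 2):
        for page in (1, 2):
            print(einstein_filename(1, 1, 14, item, page))
```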

Longfellow and Vassar Songs Sheet Music Collections

This project plan provides background information, special considerations, and digitization recommendations for two related projects:

  • Babs and Bella C. Landauer collection of musical settings of the poetry of Henry Wadsworth Longfellow (“Longfellow Collection”)
  • Collection of Vassar College song books (“Vassar Songs”)

The information below is not a detailed technical and preservation analysis but a summary of known issues and a basic road map for further consideration.

Basic information

  • Stakeholder(s): Sarah Canino
  • Time frame: AY2012-2013
  • Preservation needed?: yes
  • Funding opportunity?: yes (Farrish Foundation)

Background and Non-Digital Considerations

The Music Library contains two sheet music collections proposed as good candidates for digitization. In the course of a separate, more general proposal to digitize sheet music, Sarah Canino provided an analysis of these rare and unique collections. Both collections have notable strengths:

  • Both contain a significant portion of unique materials when searched against the Sheet Music Consortium (SMC) and the Petrucci Music Library Database at the International Music Score Library Project (IMSLP), the two premier repositories of digitized music.
  • In both collections, most (if not all) objects were created before 1923, i.e., they are free from copyright restriction.
  • One collection contains items completely unique to Vassar College.
  • Although the current scope covers only the boxes of the Vassar Songs collection, further digital projects can grow out of this theme, including audio, other song texts, class parties, and musicals.

In addition, there are some drawbacks:

  • There is virtually no item-level (EAD container / <c>) metadata for the items.  Sarah is also interested in item-level cataloging / MARC records for these items.
  • The Vassar music, in particular, is extremely rare but in poor condition.
  • Some further research must occur to properly determine copyright status and weed out duplicates in the Longfellow Collection.
  • A precise metadata standard must be met to share items with the SMC and IMSLP databases.

Digital profiles for collections

Unit of consideration in physical collection: sheet / song
Unit of items to be digitized: page

Longfellow Collection

  • Item count: 378 objects, mostly sheet music but including about 35 vocal scores of roughly 50 pages each
  • Assumption: 4-6 pages per item; average 5 pages
  • Estimated page count: 1,890 pages. N.B.: Sarah will review and revise this page count. Duplicate items will not be counted.
  • Format: loose objects in boxes (4 boxes; identifier = 78 L86 v.1-4)
  • Dates: most published pre-1924; most are American publications (some are British).  This needs to be verified.
  • Oversized materials?: no
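The 1,890-page figure applies the 5-page average to all 378 objects. Because the profile also mentions about 35 vocal scores of roughly 50 pages each, it may be worth comparing that figure with one that counts the scores separately. This is an illustrative sketch only, pending Sarah's revised count and the removal of duplicates; the function names are hypothetical and the inputs are the approximate figures listed above.

```python
TOTAL_OBJECTS = 378
AVG_PAGES = 5            # plan's assumption: 4-6 pages per item
VOCAL_SCORES = 35        # approximate count from the item description
VOCAL_SCORE_PAGES = 50   # approximate pages per vocal score


def flat_estimate(objects: int, avg_pages: int) -> int:
    """The plan's method: every object counted at the average page length."""
    return objects * avg_pages


def split_estimate(objects: int, avg_pages: int, scores: int, score_pages: int) -> int:
    """Alternative: count vocal scores at their own length, everything else at the average."""
    return (objects - scores) * avg_pages + scores * score_pages


if __name__ == "__main__":
    print(flat_estimate(TOTAL_OBJECTS, AVG_PAGES))  # 1890
    print(split_estimate(TOTAL_OBJECTS, AVG_PAGES, VOCAL_SCORES, VOCAL_SCORE_PAGES))  # 3465
```

If the vocal scores really run to about 50 pages each, counting them separately (343 × 5 + 35 × 50 = 3,465 pages) nearly doubles the flat estimate, which would affect both storage and staffing estimates.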

Vassar songs collection

  • Item count: 8 volumes plus some additional publications of songs (“Peace I leave with you” and 1903 yearbook).
  • Estimated page count: 500 pages (provided by Sarah)
  • Format: bound volumes in poor condition
  • Dates: 1881-1940
  • Oversized materials?: TBD

Digital Considerations

Longfellow Collection

Number of items to digitize

There are 378 pieces in the Longfellow Collection (possibly including duplicates). Sarah has found that approximately 20% of the items in the collection have already been digitized and are available elsewhere. We must also ensure that the oversized folios are not too large for the copystand.

Recommendation: digitize the collection in its entirety, minus the duplicates, without worrying about the overlap with other institutions. However, our metadata schema should include a related-item reference that provides a URL to the alternate digital object at the other institution. Sharyn Cadogan and Joanna must measure the oversized folios.
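In MODS, one plausible way to record such a reference is a `relatedItem` element with `type="otherVersion"` whose `location` carries the holding institution and URL. This is a sketch only: the local metadata profile has not been defined yet, the helper names are hypothetical, and the URL and institution in the example are placeholders.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def add_other_version(mods: ET.Element, url: str, institution: str) -> ET.Element:
    """Append <relatedItem type="otherVersion"> pointing to a copy held elsewhere."""
    related = ET.SubElement(mods, q("relatedItem"), {"type": "otherVersion"})
    location = ET.SubElement(related, q("location"))
    ET.SubElement(location, q("physicalLocation")).text = institution
    ET.SubElement(location, q("url")).text = url
    return related


if __name__ == "__main__":
    record = ET.Element(q("mods"))
    title_info = ET.SubElement(record, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = "Example song title"  # placeholder
    add_other_version(record, "https://example.org/digital/item/123", "Example Library")  # placeholders
    print(ET.tostring(record, encoding="unicode"))
```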

Metadata and Copyright Research and Cataloging

There is virtually no metadata for these items, and significant research must be conducted to identify unique items, gather background information, and determine copyright status. We must also create a metadata profile that is flexible and useful both locally and worldwide.

Recommendations:

  • Joanna works with Sarah and Ann Churukian to create a new metadata profile that maps to Dublin Core or MODS (most likely MODS) in Islandora.
  • Music Library uses part of the available funds to hire a library school intern for a paid internship to research each piece, copy metadata when available, and provide original cataloging for the remaining items, under Ann’s direction.
  • Cataloging should be done directly in Islandora.  This can serve as a pilot project for account management, maintenance, and documentation in our chosen digital library software.
  • Once cataloging is complete, Joanna can work with Ann, the library intern, and Laura Streett to create an EAD-compliant finding aid for this collection. Additionally, because Islandora stores metadata as MODS, Joanna can fairly easily transform it into other standards, such as the format required by the SMC.
  • Joan Pirie and Shay Foley should be consulted about formatting data for MARC ingest into the library catalog.
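Transforming MODS into another standard is essentially a crosswalk: each source field is mapped to a target field. The sketch below maps a few MODS elements into simple (oai_dc) Dublin Core, used here only as a stand-in target; it does not reflect the SMC's actual requirements, which should be confirmed. The mapping is deliberately simplified (for example, every name becomes `dc:creator`), and the Library of Congress's published MODS-to-Dublin Core mapping should guide a production version. Function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Simplified mapping: (path under <mods>, Dublin Core element).
# Every name becomes dc:creator here; a real crosswalk would check name roles.
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:originInfo/mods:dateIssued", "date"),
    ("mods:identifier", "identifier"),
]


def mods_to_dc(mods: ET.Element) -> ET.Element:
    """Copy selected MODS values into a flat oai_dc record."""
    dc = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for path, dc_name in CROSSWALK:
        for source in mods.findall(path, NS):
            text = (source.text or "").strip()
            if text:
                ET.SubElement(dc, f"{{{DC_NS}}}{dc_name}").text = text
    return dc


if __name__ == "__main__":
    sample = """<mods xmlns="http://www.loc.gov/mods/v3">
      <titleInfo><title>Example song title</title></titleInfo>
      <name><namePart>Composer, Example</namePart></name>
      <originInfo><dateIssued>1880</dateIssued></originInfo>
    </mods>"""  # placeholder record, not real catalog data
    print(ET.tostring(mods_to_dc(ET.fromstring(sample)), encoding="unicode"))
```

Keeping the mapping in a table rather than in code makes it easy to add target-specific rows as other destinations' requirements become known.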

Recommendations and outcomes from 11/2/2012:

  • Sarah will work to analyze duplicates
  • Sarah will provide basic metadata in electronic format — Title, Composer, Number of Pages — for each item
  • Once Joanna has metadata, we can begin digitization
  • Bound volumes will be “Phase II”
  • Library school intern should be hired for paid internship
  • Sabrina and Sarah will identify faculty who may be interested in the collection

Vassar songs collection

Digitization process

Sharyn and Joanna, with help from Laura Streett, must assess the fragility of the binding in the context of digitization. Laura can determine the fragility of the object itself, while Sharyn and Joanna can determine the amount of shadowing, curvature, and margin; we must understand how much the condition of each item will affect the quality of the digital copy.

Recommendation: Sharyn, Joanna, and Laura examine the Vassar songs collection and take basic measurements. We cannot fully determine whether digitizing this collection in-house is feasible without this critical step.

Metadata

There are some items already digitized, but at the collection / book level. We need metadata at the song / “sheet” level. We need to determine whether a one-to-one correspondence exists between song and page; in other words, does a song sometimes begin on the same page where another song ends, or does each new song begin on a new page? If songs share pages, both our metadata profile and digitization become more complicated; the easiest approach may be to duplicate pages that contain the end of one song and the beginning of the next, which adds to our image count.
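The cost of duplicating shared pages can be estimated once the page range of each song is known. This is a minimal sketch with a hypothetical function name and made-up page ranges: it compares the number of unique pages in a volume with the number of images needed if every song receives its own copy of each page it touches.

```python
def image_counts(song_pages: list[tuple[int, int]]) -> tuple[int, int]:
    """Compare two ways of imaging a songbook.

    song_pages holds (first_page, last_page) for each song, inclusive.
    Returns (unique pages, images needed if every song gets its own copy of
    each page it touches, so shared pages are imaged once per song).
    """
    unique_pages: set[int] = set()
    per_song_images = 0
    for first, last in song_pages:
        pages = range(first, last + 1)
        unique_pages.update(pages)
        per_song_images += len(pages)
    return len(unique_pages), per_song_images


if __name__ == "__main__":
    # Hypothetical volume: song 2 starts on the page where song 1 ends
    print(image_counts([(1, 2), (2, 4), (5, 5)]))  # (5, 6)
```

The difference between the two numbers is the extra imaging (and storage) that a one-song-per-object approach would require; if every song starts on a new page, the two counts are equal.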

Recommendations:

  • Joanna should examine the volumes to determine the page-to-song correspondence, which may increase the page count.
  • Similar to the recommendations for the Longfellow Collection, it may be useful to provide a paid internship opportunity for the right MLIS student to research and then directly catalog items into Islandora.

Recommendations and outcomes from 11/2/2012:

  • Joanna will work with Laura to obtain songbooks

Recommendations and outcomes from 1/4/2012:

  • We have asked Hudson Microimaging for a proposal and cost estimate for digitization services
  • Item-level metadata will be at the book level. We may wish to OCR the table of contents of each songbook (when available) and copy it into the metadata to help identify which songs are in which books
  • Books are already cataloged, so metadata should be easy to obtain

Printers’ Marks

About the Project

Working name: Printers’ Marks
Sponsors: Sabrina Pape and Ron Patkus
Duration: Summer 2012
Nature: Text and images
Project track: 2 – VCL project with special considerations
Date prepared: 2012-08-01

Background / Purpose

The printers’ marks throughout the Main Library have been of interest to researchers and Vassar community members since they were installed in the early 20th century. A published volume, A list of the printers’ marks in the windows of the Frederick Ferris Thompson Memorial Library, Vassar College, is available online. We will digitize this volume to provide very high-resolution scans and undertake a research project to document the printers, their marks, and the current location of each plate. We will also photograph the current marks in situ, and we will apply for a Ford Scholar to assist with this work in Summer 2013.

Scope

Phases of project:
  • Phase 1: Photograph plates, scan images
  • Phase 2: Develop research with Ford Scholar
  • Phase 3: Publish online project
Number of items to be digitized: TBD. The volume has 16 pages, and there are 66 current windows. We will also extract the individual marks from the TIFFs created from the book; there are 82 marks.
Total number of images (assuming one JPG derivative per archival image):
16 page TIFFs + 16 page JPGs + 82 mark TIFFs + 82 mark JPGs + 66 window TIFFs + 66 window JPGs = 328 images
Total number of records: TBD
Special considerations: Photography may be difficult

Location of Physical Items

Book is located in Special Collections; windows are dispersed throughout Main Library.

Hardware/Storage

Archival image storage: digcol; 164 images; 6,560 MB
Derivative image storage: digcol; 164 images; 1,640 MB
Total space needed: 8,200 MB (~8.2 GB)
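The image counts and storage figures for this project can be reproduced from the three item counts. This sketch assumes per-file sizes of 40 MB (archival) and 10 MB (derivative), which are implied by the storage figures (6,560 MB and 1,640 MB over 164 images) rather than stated directly; the names are illustrative.

```python
PAGES = 16       # pages in the published volume
MARKS = 82       # marks extracted from the book scans
WINDOWS = 66     # current windows to photograph
ARCHIVAL_MB = 40     # implied by the storage figures: 6,560 MB / 164 images
DERIVATIVE_MB = 10   # implied by the storage figures: 1,640 MB / 164 images


def plan_totals(pages: int, marks: int, windows: int) -> dict:
    """Image counts and storage, assuming one JPG derivative per archival TIFF."""
    archival = pages + marks + windows  # one archival TIFF per page, mark, and window
    return {
        "archival_images": archival,
        "all_images": archival * 2,  # plus one JPG derivative each
        "archival_mb": archival * ARCHIVAL_MB,
        "derivative_mb": archival * DERIVATIVE_MB,
        "total_mb": archival * (ARCHIVAL_MB + DERIVATIVE_MB),
    }


if __name__ == "__main__":
    totals = plan_totals(PAGES, MARKS, WINDOWS)
    print(totals)
    print(f"~{totals['total_mb'] / 1000:.1f} GB")  # decimal GB
```

The result is 164 archival images, 328 images in total, and 8,200 MB (about 8.2 GB in decimal units). Neither the image count nor the storage figures include the service TIFFs (`_s.tif`) named in the file naming convention; those would add to the total.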

Software

Image capture: Scanners and cameras to Photoshop
Metadata capture and storage: Islandora
Final product display: Islandora

Scanning specifications

We will scan at 400ppi, with 3000px on the largest dimension. Individual marks will be scanned at 1200ppi.

File Naming Convention

Formula

For book:

  • Prefix: pmarks
  • ID:  book
  • ID part: page number (left pad 3 digits)
  • Delimiter: underscore

Example:

Page 5: pmarks_book_005

  • Archival file: pmarks_book_005_a.tif
  • Service file: pmarks_book_005_s.tif
  • Derivative: pmarks_book_005.jpg

For marks extracted from each page:

  • Prefix: pmarks
  • ID:  book
  • ID part: page number (left pad 3 digits)
  • ID part: wing (e.g., “West Wing 4th” = ww4)
  • ID part: image number in sequence (left pad 3 digits)
  • Delimiter: underscore

Example:

Page 5, John Besson 1923 mark:

pmarks_book_005_ww4_001

  • Archival file: pmarks_book_005_ww4_001_a.tif
  • Service file: pmarks_book_005_ww4_001_s.tif
  • Derivative: pmarks_book_005_ww4_001.jpg

For windows:

  • Prefix: pmarks
  • ID: photo
  • ID part: wing (e.g., “West Wing 4th” = ww4)
  • ID part: image number in sequence (left pad 3 digits)
  • Delimiter: underscore

Example:

John Besson 1923 mark: pmarks_photo_ww4_001

  • Archival file: pmarks_photo_ww4_001_a.tif
  • Service file: pmarks_photo_ww4_001_s.tif
  • Derivative: pmarks_photo_ww4_001.jpg
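The three patterns share a prefix and the same archival, service, and derivative endings, so they can be generated from one small set of helpers. This is a hypothetical sketch: the function names are illustrative, and only the wing code `ww4` comes from the examples above; codes for other wings would follow whatever scheme is adopted.

```python
PREFIX = "pmarks"
# Role -> ending appended to the base name
ENDINGS = {"archival": "_a.tif", "service": "_s.tif", "derivative": ".jpg"}


def book_page(page: int) -> str:
    """Base name for a scanned page of the book, e.g. pmarks_book_005."""
    return f"{PREFIX}_book_{page:03d}"


def book_mark(page: int, wing: str, seq: int) -> str:
    """Base name for a mark extracted from a book page."""
    return f"{book_page(page)}_{wing}_{seq:03d}"


def window_photo(wing: str, seq: int) -> str:
    """Base name for an in-situ photograph of a window."""
    return f"{PREFIX}_photo_{wing}_{seq:03d}"


def file_set(base: str) -> dict[str, str]:
    """Archival, service, and derivative file names for one base name."""
    return {role: base + ending for role, ending in ENDINGS.items()}


if __name__ == "__main__":
    # Page 5 and the John Besson 1923 mark, as in the examples above
    for base in (book_page(5), book_mark(5, "ww4", 1), window_photo("ww4", 1)):
        print(file_set(base))
```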

Salmon-Underhill Digital Exhibit

Instructions

Fill out the About the Project information below, and then use the Worksheet for Functional Specifications during consultation with stakeholders to help determine the software and steps used. The project track determination may change over time.

About the Project

Working name:  Salmon-Underhill Digital Exhibit
Sponsors:  Gretchen Lieb
Duration:  3 weeks
Nature:  Text + image
Project track:  Track 2
Date prepared:  February 1, 2012

Background / Purpose

The purpose of the Salmon-Underhill Digital Exhibit is to provide an Omeka- and CONTENTdm-ready set of images, metadata, and narratives to contribute to the Women’s History Month exhibit sponsored by HRVH.

Scope

Phases of project: One phase only
Number of items to be digitized: ~ 30
Total number of images (assuming one JPG derivative per archival image): ~ 150
Total number of records:
Special considerations: Some items may be fragile; letters may have bleed-through from recto to verso.

Location of Physical Items

Letters: Special Collections
Pictures: Special Collections
Caption/text: Thumb drive

Hardware/Storage

Archival image storage: Artfiles server
Derivative image storage: Omeka, CONTENTdm
Total space needed:

Software

Image capture: VRL scanning; Archival TIFF, service TIFF
Metadata capture and storage:  Excel spreadsheet; already-written captions
Final product display:  Omeka (HRVH); CONTENTdm (VCL and HRVH)

File Naming Convention

Formula

  • Prefix: salmon
  • ID: box and folder number, delimited by a hyphen
  • ID: item number, left-padded to 3 digits
  • Page sequence: left-padded, 3 characters
  • Delimiter: underscore

Example: salmon_46-2_001_001.tif
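A pair of helpers can build these names and check existing files against the convention during quality control. This is a sketch with hypothetical function names; following the example, box and folder are left unpadded while the item and page numbers are padded to three digits.

```python
import re

PATTERN = re.compile(r"salmon_(\d+)-(\d+)_(\d{3})_(\d{3})\.(\w+)")


def salmon_filename(box: int, folder: int, item: int, page: int, ext: str = "tif") -> str:
    """salmon_box-folder_item_page: box and folder unpadded, item and page padded to 3."""
    return f"salmon_{box}-{folder}_{item:03d}_{page:03d}.{ext}"


def parse_salmon(name: str) -> tuple[int, int, int, int, str]:
    """Split a file name back into (box, folder, item, page, extension) for QC checks."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"does not follow the salmon convention: {name!r}")
    box, folder, item, page, ext = match.groups()
    return int(box), int(folder), int(item), int(page), ext


if __name__ == "__main__":
    name = salmon_filename(46, 2, 1, 1)
    print(name, parse_salmon(name))
```

Running `parse_salmon` over a directory listing before upload to Omeka or CONTENTdm would flag any file that uses the wrong delimiter or padding.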