Jeff Ubois, Chairman

JEFF UBOIS is a Berkeley, California-based consultant on archival issues for Intelligent Television in New York; for Fujitsu Labs in Sunnyvale, California; and for the Preserving Digital Public Television Project at WNET/Thirteen in New York. Earlier, he was staff research associate at the School of Information Management and Systems at the University of California, Berkeley. For the Internet Archive, Jeff has worked on managing orphan works, the collection and retention of digital library usage data, and the launch of the Open Content Alliance. Jeff has also worked as a consultant to the Sunlight Foundation, OCLC, and the Bassetti Foundation, and has been published in First Monday, D-Lib, Release 1.0, the Journal of Digital Information, and ACM Interactions.

The Personal Digital Archiving Conference, San Francisco 2011

Jeff's list of topics (from personal conference notes)

Expanding the Collection & Digitisation

Others who have relevant materials (family members? friends? professional associations?)
Photos: need to scan and integrate
  • Stan James annotation approach
Ian Tresman's "Festschrift" for Alfred de Grazia
Video recordings

Strategies for preservation

Multiple copies
Tarballs & versions
Multiple copies to multiple services
Custodial institutions
Back out to film, paper
Deposit with Internet Archive
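
One way the "multiple copies" and "tarballs & versions" strategies might be combined is sketched below: pack a folder into a versioned tarball, then confirm by checksum that each copy sent elsewhere is still identical to the original. The function names, the version label format, and the choice of SHA-256 are assumptions made for illustration; the notes do not prescribe any of them.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_versioned_tarball(source_dir: Path, dest_dir: Path, version: str) -> Path:
    """Pack source_dir into dest_dir/<folder name>-<version>.tar.gz."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    tarball = dest_dir / f"{source_dir.name}-{version}.tar.gz"
    with tarfile.open(tarball, "w:gz") as archive:
        archive.add(source_dir, arcname=source_dir.name)
    return tarball


def copies_match(original: Path, copies: list[Path]) -> bool:
    """True only if every copy has the same checksum as the original."""
    expected = sha256_of(original)
    return all(sha256_of(copy) == expected for copy in copies)
```

Rerunning `copies_match` from time to time against the copies held by each service or institution would show when one of them has been damaged or altered.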


Deposit with Internet Archive
New Interface options
  • Social network, map, timeline
What are the logs telling us? Who is using what?
  • Al's favorites (mine: FB on punchcards; PA; Theses & Stases; ABS / Braudel, Simon; letters to the future)

Near Term Improvements & projects for grazian-archive:

* figuring out name & place extraction [see tools list below]
*/ explore tools
*/ maybe do it for a limited set
*/ rank on scale of intimacy

* adding in existing digitized material
*/ newspapers
*/ jstor articles
*/ other recent scans
*/ Festschrift

* photos
*/ probably scan on site rather than ship them off

* film & video
*/ send to Home Movie Depot
*/ video might be done at home

* inventory of most valuable items
*/ war letters, photos, other correspondence

*/ get estimate of value by putting it up for sale?

* quotes & reviews section

* landing pages for each section

* everyone contributes to general wishlist

* later: timelines & maps

* email: what is to be done with Al's email?

* things to chase down / recover
*/ movies with Mel van Peebles

* document what has been done so far

* file management
*/ de-duplication
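
For the de-duplication item, a minimal sketch is to group files by a content hash so that byte-identical copies appear together. The function names and the use of SHA-256 are assumptions for illustration. This finds exact duplicates only: two scans of the same photo at different resolutions would not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group can then be reviewed by hand before anything is deleted, since the notes leave open which copy should be kept.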

General business plan

* Directory of personal archives (& examples to analyze)
* Bertrand Russell, Donald Rumsfeld, Buckminster Fuller, Archives of American Art (50 mid-century artists); Aldo Leopold papers project; Franklin papers, Varchive, Stan James
Warren Washington;
Sally Potter;
Engelbart Archive;
Harold "Doc" Edgerton

* Develop model way of doing subject interview
*/ to estimate extent of archives, project scope
*/ to more fully describe items

* Output to print / print on demand

* Valuing manuscripts
*/ giving donors large tax deductions

* Computer History Museum donors as a target group

* Use cases & scenarios (subjects)
*/ all with mix of digital and non-digital materials
*/ professors or eventually eminent persons
*/ general family archives
*/ political figures

* What is it for?
*/ show to grandkids
*/ commercial exploitation
*/ personal use
*/ broad scholarly uses

* Other random notes from Stan James

*/ balance between effort to scan vs post process
*/ privacy
*/ contextualizing with Time Magazine covers
[also Al's "A Century of My Own"]

* Standards for files
*/ file naming
*/ file formats

* What depth of familiarity with collection is required for which tasks?

* General tools & software packages
*/ Stanford Self Archiving Legacy Toolkit
Edward Feigenbaum (Herbert Simon's student):

*/ Stan James's Drupal modules
*/ name & place extraction tools
*/ Simile Timeline
*/ Mechanical Turk (use for transcription or categorization)
*/ Picasa (face recognition for photos)

* Named entity extraction utilities
*/ See also Google Refine:
*/ CMU:
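
To get a first feel for the name extraction problem before trying the tools listed above, a deliberately naive sketch is shown below. It is not one of those tools and not a real named-entity recognizer: it only counts runs of two or more capitalized words, allowing a few lowercase particles such as "de". The pattern, the particle list, and the function name are all assumptions for illustration, and the approach will miss single-word names and wrongly pick up capitalized phrases at the start of sentences.

```python
import re
from collections import Counter

# Two or more capitalized words in a row, optionally joined by a lowercase
# particle (de, van, von). Purely heuristic; tuned for illustration only.
CANDIDATE = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:(?:de|van|von)\s+)?[A-Z][a-z]+)+"
)


def extract_name_candidates(text: str) -> Counter:
    """Count candidate personal or place names found in text."""
    counts = Counter()
    for match in CANDIDATE.finditer(text):
        # Collapse internal whitespace so line breaks do not split a name.
        counts[" ".join(match.group(0).split())] += 1
    return counts
```

Sorting the resulting counts by frequency gives a rough name list that a person familiar with the collection could correct, which is also a reasonable way to build the sample for comparing real extraction tools.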

Difficulties & unsolved problems

* Time estimates for tasks related to archiving, e.g. inventory, scanning, etc
* File naming
* Name extraction
* Presentation / user interface
* Long term custody

Design prize:

* Internet Archive design prize
*/ Name extraction
*/ Use grazian-archive as a corpus … with Buckminster Fuller
*/ Name lists
*/ Time lines
*/ Media type
*/ Interconnection … clever preservation strategy

* Design problems connected with the substance of the site, e.g.
*/ site on biology, flowers & trees; politics;
*/ designs relate to topics … or to media type
*/ for journals
*/ best archive by a faculty member
*/ best archive for a family
*/ best for born digital
*/ best for digitized
*/ love archives
*/ war archives
*/ family
*/ name lists, maps
*/ business archives, stock traders
*/ teachers
*/ by scientific field
*/ of books
*/ archive

*/ Chris: think about design elements, e.g. Simile Widgets, and by media type
*/ best use of "Mechanical Turk" / combination of elements
*/ more by media than by topic

* Martin
*/ Martin: the value is the content, not the pretty package.

Focus on short term

If not completed… we have to build the mass of information

Additions to Martin's notes

* license / certificate for dealing with personal data, that it won't be traded; security of personal data should be guaranteed
* insurance for the company: fire, water damage, and loss
* offer indexing of all the documents
* on the CD, include a table showing how many documents there are for each year … generating a report
* outbound: youtube / open library /

* promotion is part of the deal

* sample for name extraction
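
The per-year table mentioned in these notes (which year, how many documents) could be generated from an inventory along the lines sketched below. The inventory shape, a list of (title, year) pairs, and both function names are assumptions for illustration; a real inventory would more likely come from a spreadsheet or database export.

```python
from collections import Counter


def documents_per_year(inventory):
    """Return (year, count) rows, sorted by year, from (title, year) pairs."""
    counts = Counter(year for _title, year in inventory)
    return sorted(counts.items())


def format_report(rows):
    """Render the rows as a small plain-text table."""
    lines = ["Year  Documents"]
    lines += [f"{year}  {count:>9}" for year, count in rows]
    return "\n".join(lines)
```

Calling `format_report(documents_per_year(inventory))` yields text that could be written to a file on the delivered disc alongside the scans.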

Adrianna's remarks

Security problems
* transport security
* when handling / in custody
* personal data

Different levels / package
* estimation and calculation
* different budget plans

Describing the technical process

Quality control

Prospects in Silicon Valley / marketing


j e f f @ u b o i s . c o m

Copyright © 2011. All rights reserved. Wednesday, 11 April 2012. Ami de Grazia, webmaster. Contact: