Full Catalog RDA Enrichment in Alma (ELUNA 2015)
Full Catalog RDA Enrichment in Alma/Primo
Background
• 21 libraries on 5 campuses
• ~5.8 million locally managed bibliographic records
• Aleph ILS 2002-2013
• Primo implemented in 2007
• Migrated to Alma with Primo front end in December 2013
Planning
• Partnered with Backstage Library Works (BSLW)
• Planning began April 2013
• Project completion anticipated Fall 2013
• Authority control and descriptive enhancement choices made by a small team of 3 catalogers and 1 systems librarian
Testing
• Preliminary test set: April 2013
• Approximately 10,000 records in all formats and languages
• Used default profile for initial test run
• Suggested several improvements to BSLW based on test results
Testing
• July 2013: customized profile for authority and RDA enhancements
• August 2013: run of initial test set with custom profile
• September 2013: suggested more improvements to BSLW
  – Most related to determining CMC (content, media, carrier) values for music and video records
Testing
• September 2013: 3rd test run
• A few more "edge case" improvements suggested to BSLW
• October 2013: preparation of final test set (100K records) and timeline discussion for processing entire catalog
Choices for Authority Control
• No flipping of name headings with only $$a
• No 650 indicator flips for partial matches
• No conversion of $$x subdivisions to $$v
• Matched 655 terms against thesauri specified in $$2
• No authority records (Alma authority files are centrally maintained by Ex Libris)
Choices for RDA Enrichment
• Expanded most abbreviations in 260 fields
• No conversion of 260 to 264
• Expanded abbreviations in 300 and 500 fields
• Changed SMDs (specific material designations) to match RDA terms
• Accepted all options for RDA changes to 1XX/6XX/7XX/8XX fields
• Converted $$4 relator codes to $$e relator terms
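To make the $$4-to-$$e choice concrete, here is a minimal sketch of the conversion applied to one field's subfields. It is not BSLW's implementation: the record model (a list of (code, value) pairs) and the small lookup table are assumptions, and only five codes from the MARC relator code list are mapped.

```python
# Hypothetical sketch of converting $4 relator codes to $e relator terms.
# Only a few codes from the MARC Code List for Relators are mapped here;
# a production table would cover the full list.
RELATOR_TERMS = {
    "aut": "author",
    "cmp": "composer",
    "edt": "editor",
    "ill": "illustrator",
    "trl": "translator",
}


def convert_relators(subfields):
    """Replace each known $4 code with an $e term, keeping field order.

    `subfields` is a list of (code, value) pairs, e.g. from a 100 or 700.
    Unknown codes are left as $4 so nothing is silently dropped.
    """
    converted = []
    for code, value in subfields:
        term = RELATOR_TERMS.get(value.strip()) if code == "4" else None
        if term is not None:
            converted.append(("e", term))
        else:
            converted.append((code, value))
    return converted


field_700 = [("a", "Smith, Jane,"), ("4", "ill")]
print(convert_relators(field_700))  # [('a', 'Smith, Jane,'), ('e', 'illustrator')]
```

Leaving unmapped codes in place is a deliberately conservative choice, so that unfamiliar codes can be reviewed rather than silently lost.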
Choices for RDA Enrichment
• All records enriched or validated regardless of descriptive rules followed
• GMDs removed from 245 $$h but retained in a local field
• 336-337-338 $$a and $$b added to all records
• 245 [et al.] changed to [and others]; other 245 changes not selected
• No expansion of abbreviations in 250 fields
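A minimal sketch of the two 245 choices above (GMD removed from $$h, [et al.] changed to [and others]) follows, using the same hypothetical (code, value) record model. The slides do not name the local field that received the GMD, so the function just returns the removed terms. The handling of ISBD punctuation after the GMD is an assumption about what a careful conversion would do.

```python
import re

# A GMD in 245 $h is a bracketed term, optionally followed by the ISBD
# punctuation that introduces the next subfield (e.g. "[videorecording] /").
GMD_PATTERN = re.compile(r"^\s*(\[[^\]]*\])\s*(.*)$")


def enrich_245(subfields):
    """Apply the two selected 245 changes to one field's (code, value) pairs.

    Returns (new_subfields, removed_gmds); storing the removed GMDs in a
    local field is left to the caller because the tag is not specified.
    """
    result = []
    removed_gmds = []
    for code, value in subfields:
        value = value.replace("[et al.]", "[and others]")
        if code == "h":
            match = GMD_PATTERN.match(value)
            gmd, trailing = (match.group(1), match.group(2)) if match else (value, "")
            removed_gmds.append(gmd)
            # Keep the punctuation that followed the GMD by attaching it to
            # the preceding subfield: "$a Title $h [gmd] / $c" -> "$a Title / $c".
            if trailing and result:
                prev_code, prev_value = result[-1]
                result[-1] = (prev_code, prev_value.rstrip() + " " + trailing)
            continue
        result.append((code, value))
    return result, removed_gmds


field_245 = [
    ("a", "Collected essays"),
    ("h", "[electronic resource] /"),
    ("c", "edited by Jane Smith [et al.]"),
]
print(enrich_245(field_245))
```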
Delay: Alma Migration
• Early November 2013: the math doesn’t work
• Not enough time for BSLW processing + Aleph reload + Aleph indexing before pre-migration freeze.
…And we’re back!
• May 2014: Ready to proceed with testing in Alma
Process
• Summer 2014: Collaborated with staff at Ex Libris to develop process in Alma
• Major concerns:
  – Identifying subsets of full catalog to send
  – Scalability of Alma export/import processes
  – Alma indexing demand
  – Alma/Primo publishing/harvesting demand
  – Primo normalization and indexing demand
Process
• Mid-August to early November 2014
• Started with smaller sets of records and proceeded with caution
• Gradually increased file size
• Close monitoring and frequent communication with Ex Libris analysts
Process
University of Minnesota
• Bib record set of desired size created in Alma
• Alma export process run
• Alma process run to add warning note to exported records
Backstage Library Works
• Export file uploaded to BSLW FTP server
• File processed according to profile
• Processed file placed on FTP server for pickup
University of Minnesota / Ex Libris
• Spot check of file contents
• Import/overlay process runs in Alma
• Failed records corrected and reloaded
• Alma publishes overlaid records to Primo
• Primo harvests, normalizes, and indexes
Timing
University of Minnesota
• Export: 30 minutes to 4 hours
• Warning note process: 12 to 48 hours
Backstage Library Works
• File processing: 3 to 7 days
University of Minnesota
• Spot checks: 1 to 2 hours
• Import/overlay: 10 to 28 hours
• Failed record correction: 4 to 12 hours
• Primo harvest and index: 24 to 48 hours
Total: 5 to 12 days turnaround time per set of 500,000 to 1 million records
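Adding the per-stage ranges gives a quick check on the total, assuming each stage runs strictly after the previous one for a given set. The minimums sum to about 5.1 days and the maximums to about 12.9 days, roughly matching the 5 to 12 day figure.

```python
# Stage durations from the Timing slide as (minimum, maximum) in hours.
STAGES_HOURS = {
    "Export": (0.5, 4),
    "Warning note process": (12, 48),
    "BSLW file processing": (3 * 24, 7 * 24),
    "Spot checks": (1, 2),
    "Import/overlay": (10, 28),
    "Failed record correction": (4, 12),
    "Primo harvest and index": (24, 48),
}


def turnaround_days(stages):
    """Sum stage minimums and maximums, assuming the stages never overlap."""
    low = sum(lo for lo, _ in stages.values()) / 24
    high = sum(hi for _, hi in stages.values()) / 24
    return low, high


low, high = turnaround_days(STAGES_HOURS)
print(f"{low:.1f} to {high:.1f} days")  # 5.1 to 12.9 days
```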
Creating Alma sets for export
• Not obvious how to ensure all Institution Zone (IZ) records were included
• No easily determinable pattern for how MMS IDs were created during migration
• Sets created by ranges of MMS IDs; ranges determined by trial and error
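If a complete list of IZ MMS IDs can be obtained (for example from an Analytics report, which is an assumption here), the range boundaries could be computed rather than found by trial and error. The sketch below is a hypothetical helper. It assumes a set can be defined as an inclusive numeric MMS ID range, and the IDs in the usage line are made up.

```python
def mms_id_ranges(mms_ids, max_per_set):
    """Split MMS IDs into contiguous (first, last) ranges of at most max_per_set IDs.

    IDs are compared as integers, so a set defined as
    "first <= MMS ID <= last" contains exactly the IDs in each chunk,
    and together the ranges cover every ID exactly once.
    """
    if max_per_set < 1:
        raise ValueError("max_per_set must be positive")
    ordered = sorted({int(i) for i in mms_ids})
    ranges = []
    for start in range(0, len(ordered), max_per_set):
        chunk = ordered[start:start + max_per_set]
        ranges.append((chunk[0], chunk[-1]))
    return ranges


# Made-up IDs; a real run might use max_per_set=500_000.
print(mms_id_ranges(["9950", "9910", "9930", "9920", "9940"], 2))
# [(9910, 9920), (9930, 9940), (9950, 9950)]
```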
Export Process
• Mostly ran smoothly, though timing was unpredictable
• Exported in binary MARC 21 rather than MARCXML for smaller file sizes
• Some records failed export due to critical MARC coding errors; no way to identify these from export job reports
Staff Warning Note
• After export, added note in 098 to exported records instructing staff not to edit those records
• Note "deleted" automatically upon overlay with enhanced version of record
• Honor system: no way to lock down records
Staff Warning Note
• Configured 098 field as a search index
• Enabled easy retrieval of records that retained DO NOT EDIT note after they should have been overlaid
• Happy coincidence: allowed us to identify records that failed export
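One way to read the 098 search results is sketched below as hypothetical set logic. It assumes a record still carrying the note after overlay either failed the overlay (and appears in the import job's failure report) or never came back from BSLW at all, which most likely means it dropped out at export. That interpretation is ours, not something the slides spell out.

```python
def classify_leftovers(still_flagged, failed_overlay):
    """Split records that still carry the 098 DO NOT EDIT note after overlay.

    still_flagged:  MMS IDs returned by a search on the 098 index.
    failed_overlay: MMS IDs the import job reported as failed.
    Flagged records the import job never saw are assumed to have
    failed export, since they were never in the file sent to BSLW.
    """
    return {
        "failed_overlay": still_flagged & failed_overlay,
        "likely_failed_export": still_flagged - failed_overlay,
    }


print(classify_leftovers({"991", "992", "993"}, {"992"}))
```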
Import/Overlay of Enhanced Records
• Before overlay, spot checks of processed file done in MarcEdit
• Full overlay with 001/MMS ID as match point
• No merge routine needed
• Profile set to skip any unresolved records
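The overlay rules above can be modeled in a few lines. This is a simplified sketch, not Alma's import engine: records are plain dicts of tag to value, matching uses the 001 (holding the MMS ID), and unmatched records are skipped. It also shows why the 098 warning note disappears. The incoming record was exported before the note was added, and a full overlay keeps nothing from the existing record.

```python
def overlay_batch(catalog, incoming):
    """Full overlay keyed on 001 (the MMS ID), with no merge routine.

    catalog:  dict of MMS ID -> record, a record being a dict of tag -> value.
    incoming: list of enhanced records returned by the vendor.
    Records whose 001 matches nothing in the catalog are skipped, like an
    import profile set to skip unresolved records.
    """
    overlaid, skipped = [], []
    for record in incoming:
        mms_id = record.get("001")
        if mms_id in catalog:
            catalog[mms_id] = dict(record)  # full replacement: nothing is kept
            overlaid.append(mms_id)
        else:
            skipped.append(mms_id)
    return overlaid, skipped


catalog = {"991": {"001": "991", "098": "DO NOT EDIT", "245": "Old title"}}
incoming = [{"001": "991", "245": "Enhanced title"}, {"001": "999", "245": "Stray"}]
print(overlay_batch(catalog, incoming))  # (['991'], ['999'])
print(catalog["991"])  # no 098: the warning note went away with the overlay
```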
Overlay Process Pitfalls
• Initially could not download failed files to fix and re-attempt.
• Reported issue to Ex Libris; prompt fix by Development
• Failed files typically contained badly encoded characters
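Because badly encoded characters were the usual culprit, a quick pre-check of a binary MARC file can flag suspect records before the overlay. This is a minimal sketch assuming the file is meant to be UTF-8 throughout. It checks only two things: that the record length in Leader/00-04 matches the actual byte length, and that each record decodes as UTF-8. Splitting on the 0x1D record terminator is safe in UTF-8 because that byte never occurs inside a multibyte character.

```python
RECORD_TERMINATOR = b"\x1d"


def find_bad_records(data):
    """Return (index, reason) pairs for records in a binary MARC 21 file that look malformed."""
    problems = []
    for index, chunk in enumerate(data.split(RECORD_TERMINATOR)):
        if not chunk.strip():  # empty piece after the final terminator
            continue
        raw = chunk + RECORD_TERMINATOR
        declared = raw[:5]  # Leader/00-04: record length in bytes
        if not declared.isdigit() or int(declared) != len(raw):
            problems.append((index, "leader length mismatch"))
            continue
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((index, "invalid UTF-8"))
    return problems
```

Records flagged this way could be fixed in MarcEdit and reloaded, in the same way as the failed files described above.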
Overlay Process Pitfalls
• Some records failed each large overlay process
• Records typically failed basic validation checks: missing or multiple 245 fields most common reason
• Many older CJK records also failed due to incorrectly linked 880 alternative script fields
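The two most common failures can be checked before loading. Below is a simplified validator over a hypothetical model of data fields only: each field is (tag, list of (code, value) subfields), with control fields omitted. It flags any count of 245 fields other than one. It also flags any 880 whose $6 (for example "245-01/$1") has no regular field carrying the matching "880-01" link. Occurrence number 00, which MARC uses for an 880 with no associated field, is accepted.

```python
def validate(record):
    """Return a list of problems for a record made of (tag, subfields) data fields."""
    problems = []
    count_245 = sum(1 for tag, _ in record if tag == "245")
    if count_245 != 1:
        problems.append(f"expected one 245, found {count_245}")

    # (tag, occurrence) for every regular field that links out, e.g. 245 $6 880-01.
    links = {
        (tag, value[4:6])
        for tag, subfields in record if tag != "880"
        for code, value in subfields if code == "6" and value.startswith("880-")
    }
    for tag, subfields in record:
        if tag != "880":
            continue
        for code, value in subfields:
            if code != "6":
                continue
            target_tag, occurrence = value[:3], value[4:6]
            if occurrence != "00" and (target_tag, occurrence) not in links:
                problems.append(f"880 $6 {value} has no matching {target_tag}")
    return problems


record = [
    ("245", [("6", "880-01"), ("a", "Title in romanization")]),
    ("880", [("6", "245-01/$1"), ("a", "題名")]),
]
print(validate(record))  # []
```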
Overlay Process Pitfalls
• Bad 035s created since many legacy records had OCLC or RLIN source codes in 003 fields
035 $a (OCoLC) 9929094990001701
• Added normalization process to import profile to strip 003 fields
• Identified previously-affected records via Analytics and ran normalization process to clean up
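The sketch below models the bad 035s and the fix. It assumes, based on the (OCoLC) example above, that the 035 was built as "(" + 003 + ")" + 001 after the 001 had come to hold the Alma MMS ID, while the 003 still held a legacy OCLC code. Records are plain dicts of tag to value, and both functions are hypothetical illustrations, not Alma's normalization rule syntax.

```python
def derived_035(record):
    """Model of the problem: an 035 $a built from the 003 and 001.

    Assumption: when 001 holds the MMS ID but 003 still says OCoLC (or an
    RLIN code), the result looks like an OCLC number but is really the MMS ID.
    """
    if "001" in record and "003" in record:
        return f"({record['003']}){record['001']}"
    return None


def strip_003(record):
    """The fix: drop the 003 before import, as the added normalization step did."""
    return {tag: value for tag, value in record.items() if tag != "003"}


legacy = {"001": "9929094990001701", "003": "OCoLC"}
print(derived_035(legacy))             # (OCoLC)9929094990001701
print(derived_035(strip_003(legacy)))  # None
```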
Overlay Process Pitfalls
• Suppression status not part of Alma bibliographic record, so previously suppressed records published to Primo after overlay with enhanced records
• Changed process to save an itemized subset of suppressed records in each base set, and ran suppression job on those records post-overlay.
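The revised process amounts to two steps around each overlay: before export, itemize which records in the base set are suppressed, and after overlay, re-run suppression on exactly that subset. The sketch below models suppression as a separate MMS ID-to-flag mapping, since it is not stored in the bibliographic record. Resetting every flag to False stands in for the publishing behavior the slide describes. All names are hypothetical.

```python
def save_suppressed_subset(base_set, suppressed):
    """Before export: itemize which records in this base set are suppressed."""
    return sorted(base_set & suppressed)


def reapply_suppression(suppression_flags, saved_subset):
    """After overlay: run the suppression step again on the saved subset."""
    for mms_id in saved_subset:
        suppression_flags[mms_id] = True
    return suppression_flags


base_set = {"991", "992", "993"}
flags = {"991": True, "992": False, "993": True}
saved = save_suppressed_subset(base_set, {i for i, s in flags.items() if s})
# Model of the problem: after overlay the records were treated as unsuppressed.
flags = {mms_id: False for mms_id in flags}
reapply_suppression(flags, saved)
print(saved, flags)
```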
Post-Overlay: Indexing, Publishing, Harvesting
• Alma indexing: no issues
• Publishing to Primo: jobs often slow to complete, but no major issues
• Primo pipe: temporarily suspended when too many job instances queued up
• Primo processes took the longest, but encountered no serious problems
Challenges and Outcomes
• Difficult to predict running time for Alma processes
• Bad data in, bad data out
  – Bad 440 nonfiling character indicators
  – 33X fields on minimal records
  – Incorrectly paired 880 fields
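Bad nonfiling indicators can be found with a simple heuristic. The second indicator of a 440 (as in a 245) gives the number of characters to skip, so the skipped text should end at a word boundary, as in "The " (4) or "L'" (2). The sketch below flags indicators that break this rule. It is a heuristic of our own, not a MARC validation rule, and it will miss some errors and flag a few legitimate cases.

```python
def nonfiling_looks_wrong(title, indicator):
    """Heuristically check a nonfiling-characters indicator against its title."""
    if not indicator.isdigit():
        return True  # must be a digit 0-9
    skip = int(indicator)
    if skip == 0:
        return False
    if skip >= len(title):
        return True
    # The skipped characters should end with a space or an apostrophe.
    return title[skip - 1] not in " '"


for title, ind in [("The journal", "4"), ("The journal", "3"), ("Journal", "4"), ("L'amour", "2")]:
    print(title, ind, nonfiling_looks_wrong(title, ind))
```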
Challenges and Outcomes
• Cleaned up/deleted many “garbage” records not caught before system migration
• Identified records with critical MARC coding problems for later cleanup
• Descriptive changes = more consistency and clarity for users
• Heading updates especially important as we wait for improvements to Alma’s native authority control processes.