Full Catalog RDA Enrichment in Alma (ELUNA 2015)

Full Catalog RDA Enrichment in Alma/Primo

Background

• 21 libraries on 5 campuses
• ~5.8 million locally managed bibliographic records
• Aleph ILS 2002-2013
• Primo implemented in 2007
• Migrated to Alma with Primo front end in December 2013

Planning

• Partnered with Backstage Library Works (BSLW)
• Planning began April 2013
• Project completion anticipated Fall 2013
• Authority control and descriptive enhancement choices made by a small team of 3 catalogers and 1 systems librarian

Testing

• Initial test set: April 2013
• Approximately 10,000 records in all formats and languages
• Used default profile for initial test run
• Suggested several improvements to BSLW based on test results

Testing

• July 2013: customized profile for authority and RDA enhancements

• August 2013: run of initial test set with custom profile

• September 2013: suggested more improvements to BSLW
  – Most related to determining CMC (content, media, and carrier; 336/337/338) values for music and video records

Testing

• September 2013: 3rd test run
• A few more “edge case” improvements suggested to BSLW
• October 2013: preparation of final test set (100K records) and timeline discussion for processing entire catalog

Choices for Authority Control

• No flipping of name headings with only $$a
• No 650 indicator flips for partial matches
• No conversion of $$x subdivisions to $$v
• Matched 655 terms against thesauri specified in $$2
• No authority records (Alma authority files are centrally maintained by Ex Libris)

Choices for RDA Enrichment

• Expanded most abbreviations in 260 fields
• No conversion of 260 to 264
• Expanded abbreviations in 300 and 500 fields
• Changed SMDs (specific material designations) to match RDA terms
• Accepted all options for RDA changes to 1XX/6XX/7XX/8XX fields
• Converted $$4 relator codes to $$e relator terms
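The $$4-to-$$e conversion can be illustrated with a minimal Python sketch. It assumes a simplified field model (a list of (code, value) subfield pairs) and uses only a small hand-picked subset of the MARC Code List for Relators; it is not BSLW's actual routine, and it does not adjust the punctuation before $$e.

```python
# Small, hand-picked subset of the MARC Code List for Relators (illustrative only).
RELATOR_TERMS = {
    "aut": "author",
    "cmp": "composer",
    "edt": "editor",
    "ill": "illustrator",
    "trl": "translator",
}


def convert_relators(subfields):
    """Replace each recognized $4 code with a $e term; leave unknown codes as $4."""
    converted = []
    for code, value in subfields:
        term = RELATOR_TERMS.get(value.strip().lower()) if code == "4" else None
        converted.append(("e", term) if term else (code, value))
    return converted


# Hypothetical 100 field: $a Smith, Jane, $4 aut
print(convert_relators([("a", "Smith, Jane,"), ("4", "aut")]))
# [('a', 'Smith, Jane,'), ('e', 'author')]
```

Unrecognized codes are kept as $$4 rather than dropped, so nothing is lost when the lookup table is incomplete.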

Choices for RDA Enrichment

• All records enriched or validated regardless of descriptive rules followed

• GMDs removed from 245 $$h but retained in a local field

• 336-337-338 $$a and $$b added to all records
• 245 [et al.] changed to [and others]; other 245 changes not selected
• No expansion of abbreviations in 250 fields
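Two of the 245 changes above can be sketched in Python: moving a bracketed GMD out of $$h into a local field, and replacing [et al.] with [and others]. The local tag 990 is a hypothetical placeholder (the slides do not say which local field was used), the record model is simplified, and ISBD punctuation that followed the GMD is carried over to the preceding subfield so the title still reads correctly.

```python
import re

LOCAL_GMD_TAG = "990"  # hypothetical local tag; the slides do not name the field used

# A GMD in $h: bracketed term, optionally followed by ISBD punctuation such as " :" or " /".
GMD_PATTERN = re.compile(r"^\s*(\[[^\]]*\])\s*(.*?)\s*$")


def enrich_245(subfields):
    """Return (new 245 subfields, local GMD field or None)."""
    new_subfields = []
    local_field = None
    for code, value in subfields:
        match = GMD_PATTERN.match(value) if code == "h" else None
        if match:
            gmd, trailing = match.groups()
            local_field = (LOCAL_GMD_TAG, [("a", gmd)])
            # Carry trailing punctuation to the previous subfield so $b/$c still read correctly.
            if trailing and new_subfields:
                prev_code, prev_value = new_subfields[-1]
                new_subfields[-1] = (prev_code, f"{prev_value.rstrip()} {trailing}")
            continue
        new_subfields.append((code, value.replace("[et al.]", "[and others]")))
    return new_subfields, local_field


print(enrich_245([("a", "Sample film"), ("h", "[videorecording] /"),
                  ("c", "A. Director [et al.].")]))
```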

Delay: Alma Migration

• Early November 2013: the math doesn’t work

• Not enough time for BSLW processing + Aleph reload + Aleph indexing before the pre-migration freeze

…And we’re back!

• May 2014: Ready to proceed with testing in Alma

Process

• Summer 2014: Collaborated with staff at Ex Libris to develop process in Alma

• Major concerns:
  – Identifying subsets of full catalog to send
  – Scalability of Alma export/import processes
  – Alma indexing demand
  – Alma/Primo publishing/harvesting demand
  – Primo normalization and indexing demand

Process

• Mid-August to early November 2014
• Started with smaller sets of records and proceeded with caution
• Gradually increased file size
• Close monitoring and frequent communication with Ex Libris analysts

Process

University of Minnesota
• Bib record set of desired size created in Alma
• Alma export process run
• Alma process run to add warning note to exported records

Backstage Library Works
• Export file uploaded to BSLW FTP server
• File processed according to profile
• Processed file placed on FTP server for pickup

University of Minnesota / Ex Libris
• Spot check of file contents
• Import/overlay process runs in Alma
• Failed records corrected and reloaded
• Alma publishes overlaid records to Primo
• Primo harvests, normalizes, and indexes

Timing

University of Minnesota
• Export: 30 minutes to 4 hours
• Warning note process: 12 to 48 hours

Backstage Library Works
• File processing: 3 to 7 days

University of Minnesota
• Spot checks: 1 to 2 hours
• Import/overlay: 10 to 28 hours
• Failed record correction: 4 to 12 hours
• Primo harvest and index: 24 to 48 hours

Total: 5 to 12 days turnaround time per set of 500,000 to 1 million records
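As a rough check on the total, the stage ranges above can be summed, assuming the stages run strictly one after another for a given set (a simplification):

```python
# Stage durations from the timing slide, in hours (days converted at 24 hours per day).
STAGES = {
    "Export": (0.5, 4),
    "Warning note process": (12, 48),
    "BSLW file processing": (3 * 24, 7 * 24),
    "Spot checks": (1, 2),
    "Import/overlay": (10, 28),
    "Failed record correction": (4, 12),
    "Primo harvest and index": (24, 48),
}


def turnaround_days(stages):
    """Sum best-case and worst-case durations, assuming stages run back to back."""
    low = sum(lo for lo, _ in stages.values()) / 24
    high = sum(hi for _, hi in stages.values()) / 24
    return low, high


low, high = turnaround_days(STAGES)
print(f"{low:.1f} to {high:.1f} days per set")
```

This prints 5.1 to 12.9 days per set. The worst case lands slightly above the slide's 12-day figure, presumably because not every stage hit its maximum on the same set.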

Creating Alma sets for export

• Not obvious how to ensure all IZ (Institution Zone) records were included

• No easily-determinable pattern for how MMS IDs were created during migration

• Sets created by ranges of MMS IDs; ranges determined by trial and error
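The slides describe finding range boundaries by trial and error. If a complete list of IZ MMS IDs could be obtained (for example from an Analytics report, which is an assumption, not something the slides describe), boundaries for sets of a target size could be computed directly. A minimal sketch, with hypothetical IDs in the usage example:

```python
def mms_id_ranges(mms_ids, max_per_set):
    """Split MMS IDs into contiguous (first, last) ranges of at most max_per_set IDs each."""
    if max_per_set < 1:
        raise ValueError("max_per_set must be at least 1")
    ordered = sorted({int(mms_id) for mms_id in mms_ids})  # numeric order, duplicates dropped
    ranges = []
    for start in range(0, len(ordered), max_per_set):
        chunk = ordered[start:start + max_per_set]
        ranges.append((str(chunk[0]), str(chunk[-1])))
    return ranges


# Hypothetical IDs; real input would be every bibliographic MMS ID in the IZ.
sample_ids = ["9900000000000005", "9900000000000001", "9900000000000003",
              "9900000000000002", "9900000000000004"]
print(mms_id_ranges(sample_ids, 2))
```

Because each range is defined only by its first and last ID, gaps in the numbering do not matter: every listed ID falls in exactly one range.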

Export Process

• Mostly ran smoothly, though timing was unpredictable

• Exported in MARC21 rather than MARCXML for smaller file sizes

• Some records failed export due to critical MARC coding errors; no way to identify these from export job reports
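The file-size tradeoff can be seen by serializing the same record both ways. This is a simplified sketch (hand-rolled serializers, a hypothetical sample record, fixed leader values), not Alma's exporter. It also shows one structural limit of the MARC21 transmission format: the 5-digit record length and 4-digit field length cap a record at 99,999 bytes and a field at 9,999 bytes.

```python
from xml.sax.saxutils import escape, quoteattr

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"
RECORD_TERMINATOR = b"\x1d"

# Hypothetical sample record. Control fields are (tag, data);
# data fields are (tag, ind1, ind2, [(code, value), ...]).
SAMPLE = [
    ("001", "99000000000001"),
    ("245", "1", "0", [("a", "Sample title :"), ("b", "a subtitle /"), ("c", "A. Author.")]),
    ("300", " ", " ", [("a", "245 pages :"), ("b", "illustrations ;"), ("c", "24 cm")]),
    ("336", " ", " ", [("a", "text"), ("b", "txt"), ("2", "rdacontent")]),
]


def encode_field(field):
    if len(field) == 2:  # control field
        return field[1].encode("utf-8") + FIELD_TERMINATOR
    _, ind1, ind2, subfields = field
    body = (ind1 + ind2).encode("utf-8")
    for code, value in subfields:
        body += SUBFIELD_DELIMITER + (code + value).encode("utf-8")
    return body + FIELD_TERMINATOR


def to_iso2709(fields):
    """Leader (24 bytes) + directory (12 bytes per field) + field data."""
    directory, data = b"", b""
    for field in fields:
        encoded = encode_field(field)
        if len(encoded) > 9999:
            raise ValueError(f"field {field[0]} exceeds 9,999 bytes")
        directory += f"{field[0]}{len(encoded):04d}{len(data):05d}".encode("ascii")
        data += encoded
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + 1
    if record_length > 99999:
        raise ValueError("record exceeds 99,999 bytes")
    leader = f"{record_length:05d}nam a22{base_address:05d} i 4500"
    return leader.encode("ascii") + directory + data + RECORD_TERMINATOR


def to_marcxml(fields):
    lines = ['<record xmlns="http://www.loc.gov/MARC21/slim">',
             "  <leader>00000nam a2200000 i 4500</leader>"]
    for field in fields:
        if len(field) == 2:
            lines.append(f"  <controlfield tag={quoteattr(field[0])}>{escape(field[1])}</controlfield>")
            continue
        tag, ind1, ind2, subfields = field
        lines.append(f"  <datafield tag={quoteattr(tag)} ind1={quoteattr(ind1)} ind2={quoteattr(ind2)}>")
        for code, value in subfields:
            lines.append(f"    <subfield code={quoteattr(code)}>{escape(value)}</subfield>")
        lines.append("  </datafield>")
    lines.append("</record>")
    return "\n".join(lines).encode("utf-8")


iso, xml = to_iso2709(SAMPLE), to_marcxml(SAMPLE)
print(f"MARC21: {len(iso)} bytes, MARCXML: {len(xml)} bytes")
```

Every subfield in MARCXML carries an opening and closing tag, while MARC21 spends a single delimiter byte on it, which is where most of the size difference comes from.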

Staff Warning Note

• After export, added note in 098 to exported records instructing staff not to edit those records
• Note “deleted” automatically upon overlay with enhanced version of record, since the enhanced file was exported before the note was added
• Honor system: no way to lock down records

Staff Warning Note

• Configured 098 field as a search index
• Enabled easy retrieval of records that retained DO NOT EDIT note after they should have been overlaid
• Happy coincidence: allowed us to identify records that failed export, since they were never in the export file, were never overlaid, and so kept the note
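The warning-note logic, including why it surfaced failed exports, can be simulated in a few lines of Python. This is a toy model under stated assumptions: records are lists of (tag, value) pairs, the note job runs against the whole export set rather than only the records that actually exported, and a full overlay replaces the stored record with the enhanced one.

```python
WARNING_TAG = "098"
WARNING_TEXT = "DO NOT EDIT"  # phrase quoted on the slide; full note wording not given


def has_warning(record):
    return any(tag == WARNING_TAG and WARNING_TEXT in value for tag, value in record)


def still_flagged(catalog):
    """MMS IDs whose stored record still carries the warning note."""
    return sorted(mms_id for mms_id, record in catalog.items() if has_warning(record))


# Toy catalog with hypothetical IDs; assume record "3" fails export.
catalog = {
    "1": [("245", "First title")],
    "2": [("245", "Second title")],
    "3": [("245", "Third title")],
}
export_file = {mms_id: list(catalog[mms_id]) for mms_id in ("1", "2")}

# Assumed: the note job runs on the whole set after export, so the export file lacks it.
for mms_id in list(catalog):
    catalog[mms_id] = catalog[mms_id] + [(WARNING_TAG, WARNING_TEXT)]

# BSLW returns enhanced versions of the exported records (stand-in enhancement: a 336).
enhanced = {mms_id: record + [("336", "text")] for mms_id, record in export_file.items()}

# Full overlay: each enhanced record replaces the stored record, removing the note.
for mms_id, record in enhanced.items():
    catalog[mms_id] = record

print(still_flagged(catalog))  # ['3']: the record that never made it into the export file
```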

Import/Overlay of Enhanced Records

• Before overlay, spot checks of processed file done in MarcEdit

• Full overlay with 001/MMS ID as match point
• No merge routine needed
• Profile set to skip any unresolved records

Overlay Process Pitfalls

• Initially could not download failed files to fix and re-attempt.

• Reported issue to Ex Libris; prompt fix by Development

• Failed files typically contained badly encoded characters


• Some records failed each large overlay process

• Records typically failed basic validation checks: missing or multiple 245 fields most common reason

• Many older CJK records also failed due to incorrectly linked 880 alternative script fields
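The two most common failure reasons map onto simple structural checks. This minimal Python sketch (a simplified record model, not Alma's actual validation rules) counts 245 fields and pairs $$6 linkages between regular fields and 880 fields; occurrence number 00, which MARC uses for an 880 with no associated regular field, is exempt.

```python
import re

# $6 begins with the linking tag and a two-digit occurrence number, e.g. "880-01" or "245-01/$1".
LINKAGE = re.compile(r"^(\d{3})-(\d{2})")


def validate(fields):
    """Return a list of problems found in a record of (tag, [(code, value), ...]) fields."""
    problems = []
    count_245 = sum(1 for tag, _ in fields if tag == "245")
    if count_245 != 1:
        problems.append(f"expected exactly one 245, found {count_245}")

    # forward: regular fields pointing to an 880; backward: 880s pointing to a regular field.
    forward, backward = set(), set()
    for tag, subfields in fields:
        for code, value in subfields:
            if code != "6":
                continue
            match = LINKAGE.match(value)
            if not match:
                problems.append(f"{tag}: malformed $6 {value!r}")
            elif tag == "880":
                backward.add((match.group(1), match.group(2)))
            elif match.group(1) == "880":
                forward.add((tag, match.group(2)))

    for linked_tag, occurrence in sorted(backward - forward):
        if occurrence != "00":  # 00 marks an 880 with no associated field
            problems.append(f"880 points to {linked_tag}-{occurrence} but no matching field links back")
    for tag, occurrence in sorted(forward - backward):
        problems.append(f"{tag} links to 880-{occurrence} but no such 880 exists")
    return problems


# Hypothetical record whose 245 and 880 disagree on the occurrence number.
print(validate([("245", [("6", "880-02"), ("a", "Title")]),
                ("880", [("6", "245-01/$1"), ("a", "Title in original script")])]))
```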


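• Bad 035s created because many legacy records had OCLC or RLIN source codes in 003 fields; the 003 code was combined with the 001, which now held the MMS ID, producing false control numbers such as: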

035 $a (OCoLC) 9929094990001701

• Added normalization process to import profile to strip 003 fields

• Identified previously-affected records via Analytics and ran normalization process to clean up
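The bad 035 in the example looks like the legacy 003 code combined with the 001, which after overlay holds the MMS ID. This Python sketch assumes that import behavior (it models it rather than reproducing Alma's import code) and shows why stripping 003 first avoids the problem. Alma's own normalization rules use a different rule syntax; the functions here are hypothetical illustrations.

```python
def derive_035(fields):
    """Model of the assumed import behavior: build 035 $a "(003)001" when both are present."""
    values = {tag: value for tag, value in fields if tag in ("001", "003")}
    if "001" in values and "003" in values:
        return f"({values['003']}){values['001']}"
    return None


def strip_003(fields):
    """Normalization step: drop 003 so no 035 is built from a stale source code."""
    return [(tag, value) for tag, value in fields if tag != "003"]


# Hypothetical enhanced record: 001 holds the MMS ID, 003 still holds a legacy OCLC code.
record = [("001", "99000000000001"), ("003", "OCoLC"), ("245", "Title")]
print(derive_035(record))             # (OCoLC)99000000000001, a bogus "OCLC number"
print(derive_035(strip_003(record)))  # None
```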


• Suppression status not part of Alma bibliographic record, so previously suppressed records published to Primo after overlay with enhanced records

• Changed process to save an itemized subset of suppressed records in each base set, and ran suppression job on those records post-overlay.

Post-Overlay: Indexing, Publishing, Harvesting

• Alma indexing: no issues
• Publishing to Primo: jobs often slow to complete, but no major issues
• Primo pipe: temporarily suspended when too many job instances queued up
• Primo processes took the longest, but encountered no serious problems

Challenges and Outcomes

• Difficult to predict running time for Alma processes

• Bad data in, bad data out
  – Bad 440 nonfiling character indicators
  – 33X fields on minimal records
  – Incorrectly paired 880 fields

Challenges and Outcomes

• Cleaned up/deleted many “garbage” records not caught before system migration

• Identified records with critical MARC coding problems for later cleanup

• Descriptive changes = more consistency and clarity for users

• Heading updates especially important as we wait for improvements to Alma’s native authority control processes.

Thank You!

Stacie Traill

University of Minnesota Libraries

