Toward a post-MARC view of bibliographic metadata

Jean Godby, Senior Research Scientist. Triangle Research Libraries Network workshop -- Chapel Hill, North Carolina. March 15, 2012.


Page 1: Toward a post-MARC view of bibliographic metadata

Toward a post-MARC view of bibliographic metadata

Jean Godby, Senior Research Scientist

Triangle Research Libraries Network workshop -- Chapel Hill, North Carolina

March 15, 2012

Page 2: Toward a post-MARC view of bibliographic metadata

Post-MARC bibliographic metadata 2

Outline for today

1. How did I get to this place?
2. The Library of Congress Bibliographic Framework for Digital Resources
3. The OCLC ‘Beyond MARC’ work agenda
4. Four guiding assumptions
5. Some questions

Page 3: Toward a post-MARC view of bibliographic metadata

Translations in the Crosswalk service

[Diagram: OCLC MARC at the center, with the same set of formats as Inputs and as Outputs: ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC]

Page 4: Toward a post-MARC view of bibliographic metadata

Problems with mapping to and from MARC

Problem: In a MARC record, some critical information is represented redundantly.

Effect on the Crosswalk: This requires one-to-many mappings, which are semantically opaque and difficult to maintain.

Problem: Some MARC fields are ambiguous.

Effect on the Crosswalk: The distinctions are difficult to recover or may be lost.

Problem: Many MARC free-text fields have formatting requirements.

Effect on the Crosswalk: The formatting must be added when mapping to MARC and taken out when mapping from it.
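The formatting problem can be made concrete with a minimal Python sketch. It is an illustration only, not the Crosswalk's code. It assumes a simplified view of the ISBD punctuation that cataloging practice attaches to MARC 245 subfields, where the mark at the end of one subfield depends on which subfield follows. The function names, the PRECEDES table, and the sample values are hypothetical.

```python
# Simplified rule: the punctuation that ends a 245 subfield is decided by
# the code of the NEXT subfield. Real records have more cases than these two.
PRECEDES = {"b": " :", "c": " /"}


def _suffix_for(subfields, i):
    """Return the punctuation expected at the end of subfield i."""
    if i + 1 < len(subfields):
        return PRECEDES.get(subfields[i + 1][0], "")
    return "."  # the field as a whole ends with a full stop


def add_isbd(subfields):
    """Mapping to MARC: append the punctuation the rules expect."""
    return [(code, value + _suffix_for(subfields, i))
            for i, (code, value) in enumerate(subfields)]


def strip_isbd(subfields):
    """Mapping from MARC: remove the punctuation again."""
    cleaned = []
    for i, (code, value) in enumerate(subfields):
        suffix = _suffix_for(subfields, i)
        if suffix and value.endswith(suffix):
            value = value[: -len(suffix)]
        cleaned.append((code, value))
    return cleaned


# Hypothetical title statement, stored without punctuation.
plain = [("a", "An example title"), ("b", "a subtitle"), ("c", "Jane Author")]
marc = add_isbd(plain)
print(marc)
print(strip_isbd(marc) == plain)
```

Even in this reduced form, the translation has to know a rule that appears nowhere in the record itself, and a title that legitimately ends in a full stop would already need a special case.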

Page 5: Toward a post-MARC view of bibliographic metadata

And so forth… and so on

Problem: Many formatting requirements are explicitly stated only in cataloging rules, not in the data that is algorithmically processed.

Effect on the Crosswalk: Knowledge of the cataloging rules must be embedded in the translation software.

Problem: Some MARC fields are coded with hidden assumptions.

Effect on the Crosswalk: Knowledge of the hidden assumptions must be embedded in the translation software, which requires complex and brittle Boolean logic.

Problem: MARC has a “long tail.”

Effect on the Crosswalk: It is necessary to maintain a large number of mappings that are not used.
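One familiar example of a hidden assumption is that the meaning of several coded positions in a MARC record depends on the material category, which is itself inferred from two Leader positions. The slides do not say which cases the speaker had in mind, so the sketch below is only an assumed illustration. It is a simplified reading of the MARC 21 Leader/06 (type of record) and Leader/07 (bibliographic level) definitions, and the function name and category labels are hypothetical.

```python
def material_category(leader: str) -> str:
    """Infer the material category from Leader/06 and Leader/07.

    Simplified from the MARC 21 definitions; edge cases are ignored.
    """
    rec_type, bib_level = leader[6], leader[7]
    # Language material is a serial or a monograph depending on Leader/07.
    if rec_type == "a" and bib_level in {"b", "i", "s"}:
        return "continuing resource"
    if rec_type in {"a", "t"}:
        return "book"
    if rec_type in {"c", "d", "i", "j"}:
        return "music"
    if rec_type in {"e", "f"}:
        return "map"
    if rec_type in {"g", "k", "o", "r"}:
        return "visual material"
    if rec_type == "m":
        return "computer file"
    if rec_type == "p":
        return "mixed material"
    return "unknown"


print(material_category("00000nam a2200000 a 4500"))
print(material_category("00000nas a2200000 a 4500"))
```

Every mapping rule that reads a category-dependent position has to sit behind a test like this one, which is how the Boolean logic grows complex and brittle.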

Page 6: Toward a post-MARC view of bibliographic metadata

MARC’s complexity needs to be quarantined.

[Diagram: RDA or another structured metadata vocabulary at the center, with the same set of formats as Inputs and as Outputs: ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC]

Page 7: Toward a post-MARC view of bibliographic metadata

In other words, with MARC in the center of our model…

Despite the hundreds of millions of mappings that have been performed on OCLC’s bibliographic data, it is still locked up in a legacy system.

The mapping problem is complex largely because of the need to support MARC.

It is still too difficult to define and implement mappings.

So what is the alternative?

Page 8: Toward a post-MARC view of bibliographic metadata

“The new bibliographic framework we are aiming for will broaden participation in the network of resources, librarians will be able to do a much better job of linking their patrons to resources of all kinds (from the library and from many other sources), and costs can be better contained.”

-- Library of Congress

“Bibliographic framework is... an environment rather than a ‘format’”

A Bibliographic Framework for the Digital Age (October 31, 2011)

Page 9: Toward a post-MARC view of bibliographic metadata

OCLC’s ‘Beyond MARC’ research agenda theme

[Word cloud: resource, relationship, manifestation, entity, object, data, abstract, library, RDA, service, format, linked, authority, MARC, carrier, groundtruthing, FRBR, semantic, beyond, content, transformation, RDF, instance, description, statement, schema, role, hadoop, property, UML, model, identifier, legacy, web]

Page 10: Toward a post-MARC view of bibliographic metadata

The OCLC “Beyond MARC” research agenda: who’s involved

• Eric Childress, Consulting Product Manager
• Jean Godby, Senior Research Scientist
• Thom Hickey, Chief Scientist
• Devon Smith, Consulting Software Engineer
• Karen Smith-Yoshimura, Program Officer
• Roy Tennant, Senior Program Officer
• Diane Vizine-Goetz, Senior Research Scientist
• Jeff Young, Software Architect

Page 11: Toward a post-MARC view of bibliographic metadata

Assumption 1: There are many moving targets

Page 12: Toward a post-MARC view of bibliographic metadata

The OCLC Research response: Some guiding principles

• Don’t add to the complexity.
• Use publicly defined standards wherever possible.
• Leverage the work of others.
• Focus on data preparation, cleanup, and modeling that will support a variety of formats.

Page 13: Toward a post-MARC view of bibliographic metadata

Page 14: Toward a post-MARC view of bibliographic metadata

Data preparation: principles

• Make your stuff available on the web.
• Make it available as structured data…
• …in a non-proprietary format.
• Use URLs to identify things.
• Link your data to other people’s data.

Source: W3C

• Data, not text
• Identifiers, not strings
• Statements, not records
• Machine-readable schema
• Machine-readable lists

Source: Karen Coyle

Page 15: Toward a post-MARC view of bibliographic metadata

Assumption 2: Most bibliographic metadata will not be created by libraries

Page 16: Toward a post-MARC view of bibliographic metadata

Why ONIX is interesting

<Product>
  <RecordReference>0892962844</>
  <ProductIdentifier>
    <ProductIDType>02</>
    <IDValue>0892962852</>
  </ProductIdentifier>
  <ProductForm>BB</>
  <Title>
    <TitleType>01</>
    <TitleText>McBain’s Ladies</>
  </Title>
  <Contributor>
    <ContributorRole>A01</>
    <PersonNameInverted>Hunter, Evan</>
  </Contributor>
  <Subject>
    <SubjectSchemeIdentifier>02</>
    <SubjectHeadingText> Policewomen--Fiction.

Leader 00000 jm a22000005 4500
008 g eng
020 $a 0892962852
100 $a Hunter, Evan
245 $a McBain’s ladies
260 $b Mysterious Press $d 1988
300 $a 320 p.
650 #2 $a Policewomen -- Fiction

[Callout labels on the two examples: identifier, text, string, data, “A record”, “A statement”]

Page 17: Toward a post-MARC view of bibliographic metadata

A hypothetical bibliographic description expressed as linked data

<Product>
  <RecordReference> http://uri/recordID/0892962844</>
  <ProductIdentifier> http://uri/identifierisbn0892962852</>
  </ProductIdentifier>
  <ProductForm>http://uri:/format/paperback</>
  <Title> http://uri/title/primaryTitle/McBain’s Ladies</>
  </Title>
  <Contributor>
    <ContributorRole>A01</>
    <Person> http://uri/person/Hunter, Evan</>
  </Contributor>
  <Subject> http://uri/subject/LCSH/Policewomen--Fiction.
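The step from “a record” to “statements” can be sketched in a few lines of Python. This is a hedged illustration of the idea on this slide, not a description of any OCLC tool. It reuses the slide's placeholder namespace http://uri and the values from the example record, while the property names, the helper names, and the URI layout are assumptions made for the sketch.

```python
from urllib.parse import quote

BASE = "http://uri"  # placeholder namespace from the slide; not a real service


def uri(*parts):
    """Build a URI under BASE, percent-encoding each path segment."""
    return "/".join([BASE] + [quote(part, safe="") for part in parts])


def to_statements(rec):
    """Turn one flat record into (subject, property, value) statements."""
    subject = uri("recordID", rec["record_id"])
    return [
        (subject, uri("property", "identifier"), uri("identifier", "isbn", rec["isbn"])),
        (subject, uri("property", "title"), uri("title", "primaryTitle", rec["title"])),
        (subject, uri("property", "contributor"), uri("person", rec["author"])),
        (subject, uri("property", "subject"), uri("subject", "LCSH", rec["subject"])),
    ]


record = {
    "record_id": "0892962844",
    "isbn": "0892962852",
    "title": "McBain's Ladies",
    "author": "Hunter, Evan",
    "subject": "Policewomen--Fiction",
}

statements = to_statements(record)
for s, p, o in statements:
    print(s, p, o)
```

Each statement stands on its own and points at identifiers instead of strings, so a person or a subject heading can be linked to data that someone else maintains.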

Page 18: Toward a post-MARC view of bibliographic metadata

This list is inadequate for describing the range of material types held by libraries.

Page 19: Toward a post-MARC view of bibliographic metadata

Some proposed “library” extensions to Schema.org.

Page 20: Toward a post-MARC view of bibliographic metadata

The extensions are derived from MARC data for the WorldCat search interface.

Page 21: Toward a post-MARC view of bibliographic metadata

The WorldCat search interface terms reduce a complex MARC concept space to a list.

Page 22: Toward a post-MARC view of bibliographic metadata

Assumption 3: MARC will be around for a while.

Assumption 4: Mapping is still necessary.

Page 23: Toward a post-MARC view of bibliographic metadata

A publishing model

[Diagram labels: Raw Data; model (three arrows); OCLC Abstract Model; map (three arrows); Standard Vocabularies; RDA or other structured metadata vocabulary; Inputs and Outputs, each listing ONIX Books 2.1, ONIX Books 3.0, MODS, Dublin Core, DC-Qualified, MARC, OCLC MARC]
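The hub arrangement in this diagram, where every format is mapped to and from one shared model instead of directly to every other format, can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCLC Abstract Model: the hub here is a two-key dictionary, the "dc" and "onix" mappings are toy examples, and all function and key names other than the ONIX element names shown earlier in the slides are hypothetical.

```python
# Each format supplies one mapping into the hub and one mapping out of it.
TO_HUB = {
    "dc": lambda r: {"title": r["title"], "creator": r["creator"]},
    "onix": lambda r: {"title": r["TitleText"], "creator": r["PersonNameInverted"]},
}

FROM_HUB = {
    "dc": lambda h: {"title": h["title"], "creator": h["creator"]},
    "onix": lambda h: {"TitleText": h["title"], "PersonNameInverted": h["creator"]},
}


def translate(record, source, target):
    """Translate by going through the hub; no direct source-to-target map exists."""
    hub_record = TO_HUB[source](record)
    return FROM_HUB[target](hub_record)


onix_record = {"TitleText": "McBain's Ladies", "PersonNameInverted": "Hunter, Evan"}
dc_record = translate(onix_record, "onix", "dc")
print(dc_record)
```

Adding a format in this arrangement means writing one pair of mappings against the hub, so the cost of supporting MARC stays inside the MARC pair instead of leaking into every other mapping.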

Page 24: Toward a post-MARC view of bibliographic metadata

It is not enough to RDF-ify MARC.

The concepts must be extracted.

They eventually emerge.

Page 25: Toward a post-MARC view of bibliographic metadata

Some (perhaps uncomfortable) questions

1. How much work will be involved in building out the abstract model? What is the value proposition?
2. How can we engage communities of practice to contribute to the parts of the abstract model that describe their resources?
3. How will mappings be implemented in the post-MARC information landscape?
4. How much information in the MARC record will get lost?
5. What will content standards look like in post-MARC descriptions?
6. How many of the FRBR and RDA concepts are algorithmically recoverable from legacy data?
7. What happens if linked data does not live up to its promise or is not adopted quickly enough?

Page 26: Toward a post-MARC view of bibliographic metadata

Set-theoretic mappings can be implemented elegantly in RDF/OWL.

But maps from many MARC concepts look like this.

Page 27: Toward a post-MARC view of bibliographic metadata

References

Coyle, Karen. 2011. MARC 21 as data: a start. http://journal.code4lib.org/articles/5468

---. 2012. Taking library data from here to there. http://lists.w3.org/Archives/Public/public-esw-thes/2012Feb/0001.html

Godby, Carol Jean. 2010. From records to streams: merging library and publisher metadata. http://dcpapers.dublincore.org/ojs/pubs/article/view/1033

Library of Congress. 2011. A bibliographic framework for the digital age. http://www.loc.gov/marc/transition/news/framework-103111.html

Library Linked Data Incubator Group final report. 2011. http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/

OCLC. 2012. FAST Linked Data. http://experimental.worldcat.org/fast/

Schema.org. 2012. http://schema.org/

Smith-Yoshimura, Karen, et al. 2010. Implications of MARC tag usage on library metadata practices. http://www.oclc.org/research/publications/library/2010/2010-06.pdf

Page 28: Toward a post-MARC view of bibliographic metadata

Thank you!