Image recognition and image similarity - LIBER

Image recognition and image similarity A different approach for accessing large scale digital collections Dr. Thomas Wolf · Bayerische Staatsbibliothek, Digital Library Dept. June 24, 2015


Page 1

Image recognition and image similarity

A different approach for accessing large scale digital collections

Dr. Thomas Wolf · Bayerische Staatsbibliothek, Digital Library Dept. June 24, 2015

Page 2

Project background

Numerous digitization projects since 1997: manuscripts, incunabula, newspapers, books

Up to 1 M objects from the 8th to the 20th century

Deep indexing available for only part of the books

Where are the images?

Page 3

Seeking a different approach: the image-based similarity search

2011: Start of cooperation with the Fraunhofer Heinrich Hertz Institute Berlin (core technology)

Based upon existing expertise of Fraunhofer HHI

Initiated by Dr. Markus Brantl (head of Dig. Library Dept.)

2012: First prototypical implementation (250 volumes)

Optimization for BSB's holdings

Development of API and web interface

Online since April 2013

Page 4

How it works

Page 5

©Fraunhofer HHI

How it works: Edge detection and description

Edges, i.e. transitions between areas of different brightness

Edge histogram

Image descriptor
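The slide names three stages: edges, an edge histogram, and an image descriptor. As a minimal sketch of that idea (not the Fraunhofer HHI implementation, whose details the slides do not give), the following computes a histogram of edge orientations from a grayscale image. The function name `edge_histogram`, the central-difference gradient, the bin count and the threshold are all assumptions chosen for illustration.

```python
import math


def edge_histogram(gray, bins=8, threshold=1.0):
    """Histogram of edge orientations, weighted by edge strength.

    gray: 2D list of brightness values (rows of equal length).
    Returns `bins` floats that sum to 1, or all zeros if no edge is found.
    Illustrative only: names and parameters are assumptions.
    """
    height, width = len(gray), len(gray[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the brightness gradient.
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude < threshold:
                continue  # flat area: no brightness transition here
            # Fold the direction into [0, pi) so that dark-to-bright and
            # bright-to-dark transitions along the same line share a bin.
            angle = math.atan2(gy, gx) % math.pi
            index = min(int(angle / math.pi * bins), bins - 1)
            hist[index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A page region with one vertical brightness step: dark left, bright right.
vertical_step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
descriptor = edge_histogram(vertical_step)
```

Normalizing the histogram makes descriptors of differently sized images comparable, which is one common reason to prefer histograms over raw edge maps.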

Page 6

©Fraunhofer HHI

How it works: Color distribution
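One common way to capture a color distribution is a quantized color histogram. The sketch below assumes RGB pixels and a fixed number of levels per channel; the slides do not say which color space or comparison measure the actual system uses, so `color_histogram`, `histogram_intersection` and the parameter `levels` are illustrative names only.

```python
def color_histogram(pixels, levels=4):
    """Normalized histogram over quantized RGB colors.

    pixels: iterable of (r, g, b) tuples with channel values in 0..255.
    levels: quantization steps per channel, giving levels**3 bins.
    Illustrative only: the real system's color model is not documented here.
    """
    hist = [0.0] * (levels ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel from 0..255 to 0..levels-1.
        rq, gq, bq = (c * levels // 256 for c in (r, g, b))
        hist[(rq * levels + gq) * levels + bq] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
similarity = histogram_intersection(
    color_histogram(red_image), color_histogram(blue_image)
)
```

A coarse quantization such as four levels per channel keeps the descriptor small and tolerant of slight color shifts, at the cost of merging nearby shades.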

Page 7

How it works

1. One-time processing of each image

Analyzing properties

Generation of a unique descriptor

Filling an index of descriptors

2. Index is loaded into RAM permanently

High availability for the application

Fast search among millions of images
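The two steps above can be sketched as a small in-memory index: descriptors are computed once when an image is added, and every query then runs against the list held in RAM. This is a toy under stated assumptions. The class `DescriptorIndex`, the stand-in descriptor `mean_brightness` and the distance `l1` are hypothetical, and a linear scan is used for clarity; a production system searching millions of images would need a more elaborate structure.

```python
import heapq


class DescriptorIndex:
    """Holds all descriptors in memory so that queries need no disk access."""

    def __init__(self, describe, distance):
        self._describe = describe    # image -> descriptor
        self._distance = distance    # (descriptor, descriptor) -> number
        self._entries = []           # (image_id, descriptor) pairs kept in RAM

    def add(self, image_id, image):
        # Step 1: one-time processing when the image is indexed.
        self._entries.append((image_id, self._describe(image)))

    def search(self, image, k=3):
        # Step 2: compare the query descriptor with every stored descriptor.
        query = self._describe(image)
        scored = (
            (self._distance(query, descriptor), image_id)
            for image_id, descriptor in self._entries
        )
        return heapq.nsmallest(k, scored)


def mean_brightness(image):
    # Toy descriptor for the demo: a single number per image.
    flat = [value for row in image for value in row]
    return [sum(flat) / len(flat)]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


index = DescriptorIndex(mean_brightness, l1)
index.add("dark", [[0, 10], [10, 0]])
index.add("mid", [[120, 130], [130, 120]])
index.add("bright", [[250, 255], [255, 250]])
hits = index.search([[110, 120], [120, 110]], k=2)
```

Separating `describe` from `search` mirrors the slide: the expensive analysis happens once per image, while each query only pays for descriptor comparisons.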

Page 8

©Fraunhofer HHI

The similarity search in practice

[Diagram: input image and database; descriptors shown as bit strings (01100110...., 11110011...., 01100110....); comparison of descriptors; similarity-based image search; search results]
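The diagram shows descriptors as bit strings that are compared against a database. A minimal sketch of such a comparison follows, assuming Hamming distance (the number of differing bits) as the measure; the slides do not name the actual metric. The image IDs and the short 8-bit strings are made-up toy values, not real descriptors.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def most_similar(query, database, k=3):
    """Return the k (distance, image_id) pairs closest to the query."""
    scored = [(hamming(query, bits), image_id) for image_id, bits in database.items()]
    return sorted(scored)[:k]


# Hypothetical database of toy descriptors.
database = {
    "img-a": "01100110",
    "img-b": "11110011",
    "img-c": "01100111",
}
results = most_similar("01100110", database, k=2)
```

Sorting by distance yields a ranked result list, which matches the "search results" stage of the diagram: the smaller the distance, the more similar the image.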

Page 9

Additional challenges with BSB's holdings

Recognition of images inside of pages

Image/text separation

Page 10

Additional challenges with BSB's holdings

Recognition of multiple images per page (segmentation)
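To illustrate what finding multiple images on one page can involve, the sketch below labels connected regions in a binary mask and returns one bounding box per region. It assumes that an earlier step (not shown, and not described in the slides) has already marked picture pixels with 1 and everything else with 0. The function `find_regions` and its `min_area` parameter are hypothetical, and this is a generic connected-component technique, not the method used for the BSB collections.

```python
from collections import deque


def find_regions(mask, min_area=1):
    """Bounding boxes (top, left, bottom, right) of 4-connected regions of 1s."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            top, left, bottom, right, area = y0, x0, y0, x0, 0
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small regions are likely noise (specks, stains) rather than images.
            if area >= min_area:
                boxes.append((top, left, bottom, right))
    return boxes


# Toy page mask with two separate picture areas.
page_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
regions = find_regions(page_mask)
```

Each bounding box can then be cropped and described on its own, so that a page with several illustrations contributes several entries to the index.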

Page 11

Additional challenges with BSB's holdings

Recognition of book illumination, initials, drawings, woodcuts, copperplate engravings, old newspaper photographs,

Page 12

Recognition accuracy

Some examples

Page 13

Recognition accuracy, example #1

Similar body shape, dress colour and foot position

Page 15

Recognition accuracy, example #3

The same printing plate used for different books, with different coloration

Page 16

Recognition accuracy, example #3

http://bildsuche.digitale-sammlungen.de/index.html?c=suche_sim&bandnummer=bsb00026765&pimage=4&einzelsegmentsuche=1&einzelsegment=1&l=de

Page 17

Recognition accuracy, example #4

Page 18

Recognition accuracy, example #5

Church interiors and train station halls: similar shapes and lighting conditions

Page 19

Recognition accuracy, example #6

Four legs on the ground …

Page 20

The limits of image recognition

Stains & ripped pages

Decorative headlines

Exceptional shapes

Tabular works

Newspaper ads

Sheet music

Page 21

Web application: Functionalities overview

• Explorative access

• Customizable Simple Search

• Skimming of books

• All images from one book

• Compare your own image – Upload


Page 28

Main benefits of image-based similarity search

Reliable recognition of images

Optimized for the specific holdings of the BSB

Paintings, woodcuts, copperplate engravings, photographs

Color or b/w

Separation of images from text

Customizable

High performance even with large amounts of data

Page 29

Outlook

Today

85,000 digitized works indexed

2,878,261 pages

5,423,025 images available

Planned for 2015/2016

1 million digitized works (complete digital collections)

An estimated 50 million images

Page 31

Dr. Thomas Wolf