SCSIT Talk, Nottingham University, Thursday 16th June 2005 FRE 2645 Indexing of Graphic Document Images : a Perceptive Approach Mathieu Delalandre¹ , ² Thursday 16th June 2005 ¹ PSI Laboratory, Rouen University, France ² SCSIT, Nottingham University, UK

Page 1:

SCSIT Talk, Nottingham University, Thursday 16th June 2005

FRE 2645

Indexing of Graphic Document Images : a Perceptive Approach

Mathieu Delalandre¹,²

Thursday 16th June 2005

¹ PSI Laboratory, Rouen University, France

² SCSIT, Nottingham University, UK

Page 2:

Who am I?

Mathieu Delalandre
Thesis: fourth year of PhD (defence in September)
Lab: PSI Laboratory, Rouen, France
Supervisors: E. Trupin, J.M. Ogier, J. Labiche
Team: S. Adam, H. Locteau, P. Héroux, E. Barbu, Y. Lecourtier
Field: Document Image Analysis (Graphics Recognition)
Postdoc: IPI, SCSIT, from April to September (4-5 months) with Tony Pridmore

Page 3:

Indexing of Graphic Document Images : a Perceptive Approach

Introduction
Systems Overview
The Knowledge Level
Conclusion

Page 4:

Introduction: Indexing & Retrieval (I & R)

Indexing & Retrieval [Greengrass’00]
Indexing: identification and recording of attributes of data that will aid retrieval.
Retrieval: ability of a database management system to get back data that were stored there previously.

Applications
videos (MPEG, AVI, …)
Web pages (XML, XHTML, …)
structured documents (PDF, PS, Word, …)
images (JPG, GIF, …)
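The two definitions above can be illustrated with a minimal sketch. It assumes that attributes are plain strings and that documents are identified by file names. The class `AttributeIndex` and its methods are hypothetical names chosen for this illustration and do not come from the talk.

```python
from collections import defaultdict


class AttributeIndex:
    """Toy index: records attributes of documents so they can be retrieved later."""

    def __init__(self):
        # attribute -> set of document identifiers carrying that attribute
        self._postings = defaultdict(set)

    def index(self, doc_id, attributes):
        """Indexing: record the attributes that will aid retrieval."""
        for attribute in attributes:
            self._postings[attribute].add(doc_id)

    def retrieve(self, *attributes):
        """Retrieval: get back the documents that carry all requested attributes."""
        if not attributes:
            return set()
        candidates = [self._postings.get(a, set()) for a in attributes]
        return set.intersection(*candidates)


idx = AttributeIndex()
idx.index("a.png", ["image", "logo"])
idx.index("b.pdf", ["structured", "text"])
idx.index("c.gif", ["image"])
print(idx.retrieve("image", "logo"))
```

Indexing is done once per document, while retrieval runs for every query, which is why the lookup is kept to a set intersection.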

-Indexing & Retrieval (I & R)

-Categorization of Images

-I & R of Document Images

-My Topic

Page 5:

Introduction: Categorization of Images

[Figure: categorization of images. Labels: document images; trademark, logo, heading; journal; manual; photographs; foreground/background images]

Page 6:

Introduction: I & R of Document Images (1/3)

[Figure: composition of Web pages. Labels: images / markup languages (HTML, XHTML, …), 30% / 70%; document images (logos, headings, …) / photographs, 60% / 40%]

Today, document images are not indexed by search engines, due to the complexity of the Document Image Analysis (DIA) task [Doerman’98] [Walker’00] [Baird’03].

Is indexing of document images really needed? Two questions.

Question: how many document images are there, and where [Spring’95] [Cleveland’98] [Steve’99] [Ouf’01] [Baird’03] [Hu’04]?

[Figure: Deep Web and Web. Labels: Deep Web; Web (8×10^15 KB); 0.3% / 99.3%; digital libraries / others (software, databases, …); large (or main) part; document images / structured documents; minor part / main part]

Page 7:

Introduction: I & R of Document Images (2/3)

Question: new or just old document images?

Paper (and image) has too many desirable properties: document images and structured documents will increasingly co-exist in the future [Breul’04].

Page 8:

Introduction: I & R of Document Images (3/3)

To conclude:
(1) DIA is needed (and will be needed) in the future of I & R of documents [Baird’03] [Breul’04]
(2) DIA must come back today by way of I & R [Baird’03]

Page 9:

Introduction: My Topic

Indexing of graphic document images

Indexing & Retrieval → Indexing
Identification and recording of attributes of data that will aid retrieval
First step before retrieval

document images → graphic document images
line drawing, symbol, logo, Asian script, historical heading

Page 10:

Indexing of Graphic Document Images : a Perceptive Approach

Introduction
Systems Overview
The Knowledge Level
Conclusion

Page 11:

Systems Overview: Introduction

Overview of systems to index graphic document images: we talk about Graphics Indexing Systems.

Graphics Indexing Systems are specialized from DIA systems applied to the recognition and understanding of graphic document images [Tombre’03]: we talk about Graphics Recognition Systems.

-Introduction

-Graphics Recognition Systems

-Graphics Indexing Systems

-Open Problems

Page 12:

Systems Overview: Graphics Recognition Systems (1/3)

Applications deal with the graphics parts (symbol and linear): text/graphics segmentation [Tombre’02], vectorisation [Mejbri’02], symbol recognition [Llados’02], document interpretation (or understanding) [Ablameko’00], …

[Figure: document parts labelled symbol, linear, text]

Graphics Recognition Systems: graphic document images → structured documents

Page 13:

Systems Overview: Graphics Recognition Systems (2/3)

Graphics are structured and connected.

Graphics Recognition Systems are based on structural methods: “relational organization of low-level features (graphic primitives) into higher-level structures (graph)” [Tombre’96] [Shi’89]

[Figure: a symbol and its structure, and a connected symbol in a drawing. Low-level features (graphic primitives): line, connect point, T link. Higher-level structure (graph): line, connect edge, T edge. Application: symbol recognition]
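The organization of graphic primitives into a graph can be sketched as follows. This is only an illustration under simple assumptions: primitives are named lines, and links carry a label such as "connect" or "T", borrowed from the figure labels. The class `PrimitiveGraph` and the three-line symbol are hypothetical and are not the structures used by the cited systems.

```python
from dataclasses import dataclass, field


@dataclass
class PrimitiveGraph:
    """Graph whose nodes are graphic primitives and whose edges are their links."""
    nodes: dict = field(default_factory=dict)  # primitive name -> primitive type
    edges: list = field(default_factory=list)  # (name_a, name_b, link label)

    def add_primitive(self, name, kind="line"):
        self.nodes[name] = kind

    def link(self, a, b, label):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both primitives must be added before linking them")
        self.edges.append((a, b, label))

    def neighbours(self, name):
        """Return the primitives linked to `name`, each with its link label."""
        found = set()
        for a, b, label in self.edges:
            if a == name:
                found.add((b, label))
            elif b == name:
                found.add((a, label))
        return found


# Hypothetical symbol: l1 and l2 joined end to end, l3 meeting l2 in a T.
symbol = PrimitiveGraph()
for line in ("l1", "l2", "l3"):
    symbol.add_primitive(line)
symbol.link("l1", "l2", "connect")
symbol.link("l2", "l3", "T")
print(symbol.neighbours("l2"))
```

Recognition methods such as graph matching would then compare a graph like `symbol` against model graphs; that step is not shown here.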

Page 14:

Systems Overview: Graphics Recognition Systems (3/3)

Graphic primitive extraction, some methods [Wenyin’98] [Delalandre’04]: skeletonization [Hilaire’04], contouring [Ramel’00], tracking [Song’00], labelling [Badawy’02], transform [Couasnon’01], meshes [Vaxiviere’95], region segmentation [Cao’00], run-length [Burge’98], …

Recognition: graph matching [Bunke’00], graph transform [Blostein’05], primitive matching [Foggia’99], …

Architecture of Graphics Recognition Systems:

[Figure: document images → Graphic Primitive Extraction → graph of graphic primitives → Recognition (with Graphic Models) → structured document]

<network><part id="1"><symbols><labels></labels></symbols></part></network>
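As one concrete instance of the extraction methods listed above, the run-length idea can be sketched on a single row of a binary image. The sketch assumes that foreground pixels are 1 and background pixels are 0. The function `run_lengths` is a hypothetical helper and is not taken from [Burge’98].

```python
def run_lengths(row):
    """Return a (start, length) pair for each run of foreground pixels in a row."""
    runs = []
    start = None  # column where the current run began, or None outside a run
    for x, pixel in enumerate(row):
        if pixel and start is None:
            start = x  # a new run begins
        elif not pixel and start is not None:
            runs.append((start, x - start))  # the run ended just before x
            start = None
    if start is not None:
        runs.append((start, len(row) - start))  # run reaching the end of the row
    return runs


print(run_lengths([0, 1, 1, 0, 1]))
```

Applied to every row, such runs give a compact pixel-based description from which higher-level primitives can be built.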

Page 15:

Systems Overview: Graphics Indexing Systems (1/3)

Graphics Indexing Systems [Doerman’98] [Tombre’03], 3 classes:

Title block recognition [Arias’98], [Najman’01], [Lamiroy’02], …

Statistical framework [Samet’96], [Worring’99], [Tabbone’03], [Terrades’03], …

Graphics indexing [Kasturi’88], [Lorenz’95], [Huang’97], [Hu’97], [Barbu’04], [Valasoulis’04], …

[Figure labels: connected, so not matched; partial matching]

Page 16:

Systems Overview: Graphics Indexing Systems (2/3)

Architecture of Graphics Indexing Systems:

[Figure: Graphic Primitive Extraction produces a graph of graphic primitives; Indexing produces the indexing attributes (a specific set of graphic primitives); the Index stores attributes + document links]

Page 17:

Systems Overview: Graphics Indexing Systems (3/3)

Works | Graphic Primitives Extraction | Graph of Graphic Primitives | Indexing
[Huang’97] | thinning and chaining | line graph of skeleton | cycle search, width and length matching of lines
[Kasturi’88] | run length encoding and polygonisation | straight line graph of contours and skeleton | Fourier approximation of line graph
[Lorenz’95] | contouring and polygonisation | 2-D strings of contours | string matching
[Barbu’04] | thinning and neighbour analysis of skeleton’s pixels | region adjacency graph | graph mining
[Hu’04] | thinning, chaining, and polygonisation | set of straight lines of skeleton | string matching
[Dosh’04] | thinning, chaining, and polygonisation | set of straight lines of skeleton | vectorial signature

[Figure labels: thinning; contouring; region graph; skeleton graph; statistical; structural]

Page 18:

Systems Overview: Open Problems (1/2)

All these systems use a Lexical/Syntactic (or Bottom/Up) approach [Tombre’96]
Lexical (Bottom): extraction of graphical primitives from images in a fixed way
Syntactic (Up): analysis of graphical primitives without returning to the image

So, all these systems use a Document Understanding approach, but I & R is not an Understanding problem.

Group | Criterion | Understanding | I & R
complexity | Image Size | large | small and medium
complexity | Data Base Size | small | large
complexity | Process Execution | one shot | every time
robustness | Graphic Primitives | accurate | approximated
robustness | Noise Level | high and medium | low and medium
content adaptation | Prior Knowledge | yes | no
content adaptation | Document Class | few and known | several and unknown

Content adaptation is the most important feature of I & R systems.

Page 19:

Systems Overview: Open Problems (2/2)

Examples of content adaptation
A broad class of documents: region based [Roque’03], both based [Ramel’00], line based [Hilaire’04]
Context: text/graphics segmentation, noise adaptation

To conclude
I & R must deal with content adaptation
Content adaptation can’t be solved without a knowledge-based approach

Page 20:

Indexing of Graphic Document Images : a Perceptive Approach

Introduction
Systems Overview
The Knowledge Level
Conclusion

Page 21:

The Knowledge Level: Introduction

Some (general) definitions [Tuthill’90] [Holsapple’04]
Knowledge: human mental grasp of reality
Representation: placement (and meaning) of knowledge into (from) computer memory
Formalism: a set of symbols corresponding to knowledge inside computers

[Figure: Knowledge (human) and Formalism(s) (computer), linked by placement and meaning (human/computer)]

Different types of knowledge: on strategies [], on case-based reasoning [], on ontologies [], …

-Introduction

-Graphical Knowledge

-Graphics Model

-a Perceptive Approach

Page 22:

The Knowledge Level: Graphical Knowledge (1/2)

Graphical Knowledge [Delalandre’05]: a type of knowledge corresponding to the human mental grasp of graphics

Levels of Graphical Knowledge

[Figure labels: abstraction levels (image, symbol, perception, interpretation); formalism levels (graphic primitives: pixel-based and vector-based formalisms; high-level objects: graph-based formalisms); annotation “it is a gate!”]

Page 23:

The Knowledge Level: Graphical Knowledge (2/2)

Two formalism levels [Tombre’96]

Graphic primitives [Murray’96]
Pixel-based formalism: pixel, raster, run, connected component, …
Vector-based formalism: vector, arc, curve, ellipse, square, …

Graph-based formalisms [Sowa’99]: Relational Attributed Graphs (RAG), Frames, Object-Oriented Languages, …

[Figure: Relational Attributed Graphs [Seong’93]. Labels: primitives, line images]

Page 24:

The Knowledge Level: Graphics Model (1/2)

Model [Seguela’01]: a knowledge representation using given formalisms, for a given system’s purposes

Graphics Model [Delalandre’05]: a model used to represent graphical knowledge

[Figure: a (simple) shape and its graphic primitives (extremity, junction, line), represented by a line-based model (labels: junction edge, line) and by a junction-based model (labels: extremity, junction, line edge)]
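The difference between the two models of the same shape can be sketched as below. The sketch reads the figure labels as follows, which is an interpretation: in the junction-based model the nodes are extremities and junctions and the lines are edges, while in the line-based model the nodes are lines and the junctions are edges. The input format (a dictionary from line name to its two distinct end points) and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def junction_based(lines):
    """Nodes are points (extremities or junctions); each line is an edge."""
    degree = defaultdict(int)
    for a, b in lines.values():
        degree[a] += 1
        degree[b] += 1
    # A point touched by a single line is an extremity, otherwise a junction.
    nodes = {p: ("extremity" if d == 1 else "junction") for p, d in degree.items()}
    edges = dict(lines)  # line name -> (point, point)
    return nodes, edges


def line_based(lines):
    """Nodes are lines; two lines share an edge when they meet at a point."""
    lines_at = defaultdict(list)
    for name, (a, b) in lines.items():
        lines_at[a].append(name)
        lines_at[b].append(name)
    edges = set()
    for point, names in lines_at.items():
        for u, v in combinations(sorted(names), 2):
            edges.add((u, v, point))  # junction edge, labelled by its point
    return set(lines), edges


# Hypothetical L-shaped figure: two lines meeting at point p2.
shape = {"l1": ("p1", "p2"), "l2": ("p2", "p3")}
print(junction_based(shape))
print(line_based(shape))
```

Both functions describe the same shape, which illustrates why one set of graphic primitives can lead to several graphics models.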

Page 25:

The Knowledge Level: Graphics Model (2/2)

One system = one model: a considerable number of models [Joseph’92] [Pasternak’93] [Han’94] [Burgue’95] [Yu’97] [Lee’98] [Ramel’00] [Couasnon’01] [Badawy’02] [Yan’04] …

Models depend on the extracted graphic primitives; we can define a graphics model taxonomy with 3 classes [Delalandre’05]:

region-based models: component, loop, neighbour, include
contour-based models: quadrilateral, line link, junction link
skeleton-based models: extremity, junction, line edge

Page 26:

The Knowledge Level: a Perceptive Approach (1/6)

[Figure: Perception Level of Representations, from global to local: Region Level, Contour Level, Skeleton Level. Two links between levels: specialisation and aggregation]

Page 27:

The Knowledge Level: a Perceptive Approach (2/6)

[Figure: Perception Level of Representations, from global to local, with the Region, Contour and Skeleton Levels. Labels: classic models; hybrid models; perceptive approach (jump or browse)]

Page 28:

The Knowledge Level: a Perceptive Approach (3/6)

First step, the region level: connected component analysis [Alnuweiri’92]

[Figure: foreground and background; foreground’s components; background’s components (main background, loops)]
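The region level can be sketched with a basic connected component labelling. This is a simple breadth-first version, not the algorithm of [Alnuweiri’92]. It assumes a binary image given as a list of rows with 1 for foreground and 0 for background, uses 4-connectivity for both, and treats background components that touch the image border as main background and the others as loops. These conventions and the function names are assumptions made for the illustration.

```python
from collections import deque


def label_components(image, value):
    """Return the 4-connected components of pixels equal to `value` as sets of (y, x)."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = set()
    components = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != value or (y, x) in seen:
                continue
            component = set()
            queue = deque([(y, x)])
            seen.add((y, x))
            while queue:
                cy, cx = queue.popleft()
                component.add((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == value and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def region_level(image):
    """Split an image into foreground components, main background and loops."""
    height = len(image)
    width = len(image[0]) if height else 0

    def touches_border(component):
        return any(y in (0, height - 1) or x in (0, width - 1) for y, x in component)

    foreground = label_components(image, 1)
    background = label_components(image, 0)
    main_background = [c for c in background if touches_border(c)]
    loops = [c for c in background if not touches_border(c)]
    return foreground, main_background, loops


# Hypothetical test image: a square ring enclosing one background pixel.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
foreground, main_background, loops = region_level(ring)
print(len(foreground), len(main_background), len(loops))
```

On the ring image this yields one foreground component, one main background component and one loop made of the enclosed pixel.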

Page 29:

The Knowledge Level: a Perceptive Approach (4/6)

Six features
(F) Foreground
(B) Background
(R) Resolution (i.e. distance)
(N) Neighboring
(S) Size
(I) Inclusion

Page 30:

The Knowledge Level: a Perceptive Approach (5/6)

Use-case queries

[Figure: starting image and results of the queries FR1, FR2, BR2, BR2S2, BR2S2N2]

Page 31:

The Knowledge Level: a Perceptive Approach (6/6)

True-life query

[Figure: query results labelled FS1 and BR2 N>2]

Page 32:

Indexing of Graphic Document Images : a Perceptive Approach

Introduction
Systems Overview
The Knowledge Level
Conclusion

Page 33:

Conclusion

Conclusion
This is just a bibliographic study and some ideas
Start on these ideas?

Perspectives
Contour and skeleton levels?
A system to control the building of the representation?