Second International Conference on Natural Sciences and Technology in Manuscript Analysis
Centre for the Study of Manuscript Cultures (CSMC), Hamburg
29 February – 2 March 2016
Book of Abstracts
Conference Programme
CENTRE FOR THE STUDY OF MANUSCRIPT CULTURES
Preface
Dear colleagues and friends,
Welcome to the Centre for the Study of Manuscript Cultures (CSMC) in Hamburg and to the Second
International Conference on Natural Sciences and Technology in Manuscript Analysis!
Since 2011 the Centre has been engaged in fundamental research investigating a broad range of Asian,
African, and European manuscript cultures, represented by material artefacts, from both a historical
and a comparative perspective. The variety of research fields and academic disciplines, as well as the
large number of cultures under investigation, has allowed us to overcome simplistic attitudes, such as
treating historically contingent European developments as generally valid, or naive
dichotomies ("East-West") long considered self-evident not only in Europe, but also in Asia and Africa.
Within the CSMC as well as the SFB, the natural sciences and informatics play an active role in shaping
interdisciplinary research by utilizing methods and devices rooted, for example, in physico-chemical
measurement technologies, thus going beyond basic support for scholars from the humanities.
Our long-term goals include the establishment of an interdisciplinary research field of
generalized manuscript studies and the development of sustainable and functional tools.
This second conference dedicated to natural sciences and technology in manuscript analysis
continues our effort to bring together scholars and scientists from the humanities,
informatics, chemistry, physics, and biology. We hope that this conference will again provide a forum
for discussing various aspects of manuscript analysis and for presenting new results, technologies, and
approaches. Contributions reflect original research stressing the impact of the natural sciences,
technology, and informatics in the following areas:
● Material analysis of writing material
● Recovering lost writing
● Image analysis of visual manuscript features
● Cutting edge techniques
In addition to the regular presentations, we invite you to participate in the Round Table discussion
that will address various issues of interdisciplinary research.
We wish all of you a great conference and a pleasant stay in Hamburg. Yours,
Christian Brockmann
Michael Friedrich
Oliver Hahn
Volker Märgner
Ira Rabin
H. Siegfried Stiehl
Conference Chairs and Program Committee
Invited Speakers
Marina Bicchieri (ICRCPAL, Istituto Centrale per il Restauro e la Conservazione del Patrimonio
Archivistico e Librario, Rome, Italy)
Leif Glaser (DESY ‐ Deutsches Elektronen‐Synchrotron)
Vito Mocella (CNR‐IMM‐Istituto per la Microelettronica e Microsistemi‐Unità di Napoli, Italy)
Peter A. Stokes (King's College London, UK)
Conference Chair
Michael Friedrich (Director of CSMC, University of Hamburg, Germany)
Oliver Hahn (CSMC, University of Hamburg, BAM, Berlin, Germany)
Programme Committee
Christian Brockmann (CSMC, University of Hamburg, Germany)
Oliver Hahn (CSMC, University of Hamburg, BAM, Berlin, Germany)
Volker Märgner (CSMC, University of Hamburg, Germany)
Ira Rabin (CSMC, University of Hamburg, BAM, Berlin, Germany)
H. Siegfried Stiehl (CSMC, University of Hamburg, Germany)
International Advisory Board
Roger Easton (Rochester Institute of Technology, N.Y., USA)
Gregory Heyworth (University of Mississippi, USA)
Judith Schlanger (EPHE‐Sorbonne, Paris, France)
Friederike Seyfried (Egyptian Museum and Papyrus Collection, Berlin, Germany)
Daniel Stoekl Ben Ezra (EPHE‐Sorbonne, Paris, France)
Local Organising Committee (CSMC)
Karsten Helmholz
Christina Kaminski
Daniela Niggemeier
Irina Wandrey
CONFERENCE PROGRAMME
Monday, 29 February 2016
1:00 pm Registration
2:00 pm Welcome
SESSION I: MATERIAL ANALYSIS
Session Chair: H. Siegfried Stiehl
2:15 pm M. Bicchieri (invited)
Hard Science and History
3:00 pm M. M. Khorandi, M. Gulmini, M. Aceto, A. Agostino, and H. Sayyadshahri
The non‐invasive Approaches to identify the Dyes and Pigments of the
Haft Awrang‐i Jāmi (A Persian Manuscript from 1553 AD in MAO)
3:25 pm P. Çakar
Elemental Analysis of Mesnavi from 14th Century
3:50 pm Coffee Break
4:20 pm M. Mayer
ATWISE 5242 – A recently developed Device for imaging Watermarks in Medieval
Manuscripts
4:45 pm M. Geissbühler
Advanced Codicological Studies of Codex germanicus 6.
5:10 pm M. Delhey
Material Analysis of Buddhist Sanskrit Manuscripts preserved in Nepal
5:35 pm D. Nosnitzin und A. Brita
A Field Experience in Ink Studies: Manuscripts from Northern Ethiopia (East Tigray)
6:30 pm Reception
Tuesday, 1 March 2016
SESSION II: RECOVERING LOST WRITING
Session Chair: Ira Rabin
9:00 am V. Mocella (invited), E. Brun, C. Ferrero, and D. Delattre
The Quest of lost Ancient Literature: X‐ray Phase Contrast Tomography reveals the
Secrets of Herculaneum Papyri
9:45 am K. T. Knox
Image Processing Software for the Recovery of Erased or Damaged Text
10:10 am C. T. C. Arsene, P. E. Pormann, W. I. Sellers, and S. Bhayro
Computational Techniques in Multispectral Image Processing:
Application to the Syriac Galen Palimpsest
10:35 am Coffee Break
11:05 am F. Albertin, E. Peccenini, M. Bettuzzi, R. Brancaccio, M. P. Morigi, A. Patera, I. Jerjen,
S. Hartmann, and R. Kaufmann
X‐Ray Reading of Large‐size Unopened Ancient Manuscripts
11:30 am V. Lorusso and B. Pouvkova
Recovering lost Commentaries on Aristotle’s Treatise On the Heavens in Venice
Manuscript Marcianus Gr. 210
11:55 am M. Schreiner, H. Miklas, C. Rapp, R. Sablatnig, W. Vetter, B. Frühmann, F. Hollaus
The Centre of Image and Material Analysis in Cultural Heritage (CIMA) in Vienna,
Austria
12:20 pm T. Łojewski and D. Chlebda
Application of Hyperspectral Imaging for Quantitative Assessment of Conservation
Treatments for Documents
12:45 pm Lunch
SESSION III: IMAGE ANALYSIS
Session Chair: Christian Brockmann
2:30 pm P. A. Stokes (invited)
Computation and Palaeography: Where are we Now?
3:15 pm E. Arabnejad, H. Ziaei Nafchi, E. Treharne, C. Allen, and M. Cheriet
Visual Saliency for Visual Feature Analysis of Historical Manuscripts
3:40 pm Y. Elfakir, G. Khaissidi, M. Mrabti, M. A. El Yaccoubi, Z. Lakhliai, and D. Chenouni
Bag‐of‐descriptors of SIFT for Segmentation‐Free Word Spotting in Handwritten
Arabic Documents
4:05 pm Coffee Break
4:35 pm R. Cohen, K. Kedem, and J. El‐Sana
Transcript Alignment for Historical Manuscripts
5:00 pm S. Sudholt, L. Rothacker, and G. A. Fink
Simple and Effective Segmentation‐Free Word Spotting in Historic Documents
5:25 pm D. Stutzmann, T. Bluche, Y. Leydier, F. Cloppet, V. Eglin, C. Kermorvant, and
N. Vincent
Text‐Image Alignment and Automated Letter‐form Classification:
Reading vs. Looking at
5:50 pm T. Konidaris, A. L. Kesidis, and B. Gatos
A Segmentation‐Free Word Spotting Method
8:00 pm Dinner
Wednesday, 2 March 2016
SESSION IV: CUTTING EDGE TECHNIQUES
Session Chair: Volker Märgner
9:00 am L. Glaser (invited), D. Deckers, and C. Brockmann
10 Years of Iron Gall Ink X‐Ray Fluorescence Element Mapping
9:45 am F. Kergourlay, C. Andraud, A. Michelin, A. Histace, B. Lavédrine, I. Aristide‐Hastir,
and R. Lheureux
REX Project: Extraction and Processing of underlying Texts ‐ Study of a
Marie‐Antoinette secret Correspondence
10:10 am A. Garz, M. Seuret, A. Fischer, and R. Ingold
GraphManuscribble: Interact Intuitively with Digital Facsimiles
10:35 am Coffee Break
11:05 am R. Hedjam, M. Kalacska, S. S. A. Al‐ma’adeed, and M. Cheriet
Visual Literary Topology
11:30 am A. Santoro, A. Marcelli, and F. Carillo
An Interactive System to help Transcription of Historical Handwritten Documents
11:55 am Round Table
Posters
N. Akcebe
Imaging Watermarks of 15th Century Islamic Manuscript Kashf Al‐Bayan’an Sifat Al‐Hayawan
M. Bronzato, A. Zoleo, L. Nodari, C. Federici, and M. Zanetti
The Ignatius of Loyola’s Exercitia Spiritualia Autograph: Analyses before and during Conservation
Treatments
C. Colini, I. Rabin, O. Hahn
Can non‐destructive Techniques and Portable Instruments be used to analyse Ink and Paper
Degradation?
R. Farrahi Moghaddam, M. Cheriet, and S. A. Al‐Ma’adeed
Age and Fiber Structure Study Using 3D, Mesoscale Modeling and Simulation of Ink Seepage in Paper
Porous Media
B. Frühmann, F. Cappa, W. Vetter, and M. Schreiner
A combination of three complementary non‐destructive Methods applied to Historical Manuscripts
R. Hedjam, M. Kalacska, S. A. Al‐Ma’adeed, and M. Cheriet
Old Manuscript Analysis: beyond the Visible
T. Jocham, M. Marx
Scientific Analysis of Early Qur'anic Manuscripts
Y. Keheyan and G. Eliazyan
Spectroscopic Studies of Armenian Manuscripts: Paper, Inks, Pigments
M. W. A. Kesiman, J.‐C. Burie, and J.‐M. Ogier
A First Step to Balinese Script OCR: An Initial Study on Isolated Character Recognition of Balinese
Script on Palm Leaf Manuscripts
A. Kocaman
Fibre Analysis of Pattani Manuscripts
A. Rogulska and B. Łydżba‐Kopczyńska
Application of forensic Multispectral Scanner to non‐invasive Analysis of Iron Gall Inks:
a Comparison with XRF and micro‐Raman Spectroscopic Techniques
S. Snoussi
Perceptual Model with global‐local Vision Primitives for Arabic Script Recognition
D. Stoekl Ben Ezra
Why should Philologists learn Computer Vision?
Marina Toumpouri
The "decorative style" group reconsidered: A contribution to the study of twelfth and thirteenth
century production of Greek illuminated manuscripts in the Eastern Mediterranean
A. Ul‐Hasan, S. S. Bukhari, and A. Dengel
Meaningless Text OCR Model for Medieval Scripts
Abstracts
Hard science and history
Marina Bicchieri
Istituto Centrale Restauro e Conservazione del Patrimonio Archivistico e Librario (Icrcpal), Italy
Books, archival documents, and graphic works of art are among the most invaluable patrimony
in human history. Each single document is an open window on our history, and its preservation is
paramount. Often the value of books is evaluated merely on the basis of their content, whether textual
or graphical, while the history carried by the physical support is neglected: the paper used, the kind
of ink chosen, their provenance, what they are made of, the fabrication procedures. All of this
information, stored between the pages and somehow hidden from the eye, tells us of the long journey of
the paper used and of the technological and scientific discoveries made at the time the book was written
or drawn; it tells of the genius of those who invented an ink or a specific paper treatment, and it carries
with it the evolution of aesthetics, morals, and the customs of the time.
In short, they are carriers of our story, of human history. This entire incommensurable
heritage is unfortunately destined to a slow death. Supports, media, and bindings are subject to
ageing and lose their mechanical characteristics; inks can fade or induce acidity in the support,
damaging it until its complete destruction. Natural ageing is a spontaneous and
irreversible process, quite slow by itself in the absence of external interferences, such as,
for example, storage in unsuitable places, where other degradation processes (physical, biological, or
chemical) can take place. The function of scientists in the field of conservation of cultural heritage is
manifold. On the one hand, by investigating the structure of materials, they can understand the
nature and causes of degradation and find solutions to prevent further decay. On the other,
they can solve problems related to the manufacturing of the object or to its past
life, thus helping scholars in their historical studies. Moreover, each discovery also permits
exploring issues in the history of science. In this paper two case studies will be presented to underline
the synergy, positive or negative, between different areas of expertise.
Leonardo da Vinci self‐portrait
In 2012 the very famous self-portrait of Leonardo da Vinci was subjected to a completely non-
destructive diagnostic campaign at Icrcpal (Istituto Centrale Restauro e Conservazione del Patrimonio
Archivistico e Librario). The purpose of the analyses was to assess the conservation status of the
drawing, which presented an apparent fading of the graphic medium and diffuse foxing. To this end,
surveys were carried out in the chemistry laboratory using molecular (Raman and infrared)
and elemental (X-ray fluorescence) spectroscopies. In parallel, Atomic Force Microscopy (AFM) was
applied to obtain a topographic description of the paper in damaged and less damaged areas; the
topography of the paper is, in fact, related to its state of preservation. The other laboratories of the
Institute performed FORS and multispectral reflectance measurements and microbiological studies (Misiti
2014).
Only the comparative analysis of experimental results obtained with different techniques and
methods can provide the scientific information needed to correctly characterize the work, to predict how
time will alter its chemical-physical characteristics, and to estimate the expected lifetime of the work of
art. All the techniques showed a very dramatic oxidation of the paper, caused by chemical,
physical, and biological attack. Moreover, AFM topographies demonstrated a severe decrease in the
thickness of the paper in the foxed areas, ranging on average from 20% over whole foxing spots to
60% in some parts of each spot, where the spectroscopic measurements had shown the presence of triple
carbon-carbon bonds (Fig. 1). A restoration project, including chemical treatment for the stabilization
of the paper, was proposed, but the art historians rejected it for purely "philological" reasons,
in this way condemning the drawing to destruction. In this case the positive synergy between the
different branches of science did not correspond to a positive synergy with the world of the
humanities, thus leading to negative results.
Fig. 1. Leonardo self‐portrait. Left: AFM topography of the paper. Right: AFM topography of the paper in a
foxing spot.
The purple Codex Rossanensis
The Codex Rossanensis is a 6th-century illuminated manuscript written on purple parchment,
conserved at the Museo Diocesano in Rossano Calabro (Cosenza, Italy). In 1917-19 the codex was
subjected to a restoration treatment carried out by Nestore Leoni, a famous miniaturist active from
the end of the 19th century to the mid 20th century. Leoni's intervention irreversibly modified the
appearance of the illuminated sheets, and he never recorded which materials he used for the restoration. In
June 2012 the Codex arrived at Icrcpal for a complete characterization of the pigments, the support,
the materials used by Nestore Leoni, and the state of conservation, and for the conservative
restoration. A scientific commission was established, including palaeographers, biblical scholars, and
historians specialized in the study of illumination. The positive synergy between all the different
approaches allowed for the complete characterization of the manuscript: its dating (mostly on the
basis of the biblical text) and the attribution of the discovered pictorial palette to a specific
geographical area. Moreover, the chemistry laboratory was able to discover, replicate, and characterize a
peculiar lake, the elderberry lake. To the author's knowledge, this was the first time that
experimental evidence has been presented for the use of that lake in such an ancient document
(Bicchieri 2014).
Conclusions
The few examples presented in this paper allow some conclusions to be drawn. From the methodological
point of view, it is necessary to approach all problems related to cultural heritage artefacts with a
multidisciplinary strategy that includes several complementary techniques and competences. It
should also be emphasised that no single non-destructive technique can claim to be decisive
on its own, and that cooperation between all the "souls" involved in conservation, from
scientists to humanists, can add new pieces to the knowledge of our history.
References
M. Bicchieri, Environ. Sci. Pollut. R., 21(24), 14146, 2014.
M. C. Misiti, I disegni di Leonardo, diagnostica, conservazione, tutela, Sillabe, Livorno, 2014.
The non‐invasive approaches to identify the dyes and pigments of the Haft Awrang‐i Jāmi
(A Persian manuscript from 1553 AD in MAO*)
Mojtaba Mahmoudi Khorandi1, Monica Gulmini1, Maurizio Aceto2, Angelo Agostino1
and Hamed Sayyadshahri3
1 Department of Chemistry, University of Turin, Italy
2 Department of Science and Technological Innovation, University of Eastern Piemonte, Italy
3Department of Physics and Earth sciences, University of Ferrara, Italy
The influence of the Iranian empires led to the production of manuscripts with common linguistic and
textual features even beyond the borders of their political influence. Hence, the term "Persian
manuscripts" covers old books written in the Persian language dealing with various subjects, from a
wide geographical area and different epochs. After the Mongol conquest and from the fourteenth
century onwards, the support of artisans by rulers and patrons led to great achievements in all the
luxury arts. As a result, a renaissance of painting began, reaching its apex in the seventeenth
century. Although painting partially lost its social function, the great development of book
illustration generated great advances in artistic expression. The strong link with Persian
poetry led to the creation of many elaborate manuscripts that can be found in museums and
collections. One of these magnificent artworks is the Haft Awrang-i Jāmi belonging to the Museum of
Oriental Art of Turin (MAO). This Persian manuscript, the subject of this paper, is a copy of
the poems of Jāmi (1414-1492 AD) created in 961 AH (1553 AD) at Shiraz, Iran; its calligrapher
(diagonal Nasta'liq) is Ali Al-Khatib. Black and red inks were employed to write the text, while nine
illuminated headpieces, one finely gilded and painted double-page frontispiece, and nine miniatures are
the other decorative parts of this book. A wide range of colors, including blue, red, yellow, black, green,
violet, turquoise, pink, gray, brown, white, and orange, embellishes the painted pages of the book.
Non-invasive analytical techniques, namely Fiber Optics Reflectance Spectroscopy (FORS), Fiber
Optics Molecular Fluorimetry (FOMF), and portable X-Ray Fluorescence (p-XRF), were used to identify
the colorants of this manuscript.
It is generally possible to recognize inorganic pigments by combining reflectance
spectroscopy and X-ray fluorescence. Organic dyes are barely mentioned in the paper of
Purinton and Watters, although evidence of their wide use emerged from volumes kept in
European libraries that were investigated in situ by UV-visible diffuse reflectance
spectrophotometry, FOMF, and p-XRF. Therefore, the FORS technique was selected as the main
method to analyze both pigments and dyes; to confirm the results, p-XRF or FOMF analysis was
applied to the areas diagnosed by FORS, for pigments and dyes respectively. In order to
build a spectral database of dyes possibly employed in Persian manuscripts that had
not been studied before, a set of mock-up samples was prepared using the natural dyestuffs indicated in
the comprehensive book of Nadjib Mayil Harawi, which collects articles dealing with penmanship,
ink making, paper, gilding, and bookbinding. The paper employed as a support was obtained from
hemp by mimicking historical procedures, whereas Althaea officinalis, Anemone coronaria, Lawsonia
inermis, Berberis vulgaris, Rheum undulatum, Curcuma longa, and Crocus sativus were considered as
sources of dyes. The plants were treated according to ancient recipes to extract the dyes, which were
then used to dye or paint the paper substrate. FORS and spectrofluorimetry equipped with fibre optics
were then employed to record the spectral features of the mock-up samples, and the information
obtained on the mock-ups was used to interpret the reflectance and fluorescence spectra recorded on
the dyes of this manuscript.
The results revealed that dyes were employed both to dye the paper support and to impart delicate
hues to particular details in the miniatures. According to the results, the particles sprayed on the
papers (observed with a digital microscope) are gold, and the dyeing agent of the paper is a mix of
Crocus sativus (saffron) and Curcuma longa. Furthermore, the analysis showed that cochineal, and a
mix of cochineal and indigo, were used for the violet color of the faces and some of the dresses in the
miniatures of this manuscript. Likewise, it was demonstrated that, to give special properties to the
paint (e.g. brilliance, antiseptic action, or glazing), saffron and indigo were combined with verdigris
to create different hues of green in the miniature paintings, while the green colorant in the other
illustrations of the book is malachite. In addition, the analysis revealed that the blue of the
headpieces is ultramarine, while indigo and ultramarine are the blue colorants in the miniatures; the
red and orange of the headpieces are red ochre alone, whereas red lead, cinnabar, and red ochre were
used in the miniatures. White lead, carbon, and orpiment were employed to create the white, black,
and yellow colors of the book, respectively. Finally, the gray color consists of a mix of carbon with
white lead or silver; the pink color is a mix of cinnabar and red lead; and a combination of carbon
and red ochre, or carbon and an unknown colorant, was detected for the brown color of this manuscript.
* Museum of Oriental Art of Turin, Italy
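The database-matching step described above can be sketched in a few lines: a measured reflectance spectrum is compared against reference spectra of mock-ups dyed with known colorants, and the best-correlating reference is reported. This is only an illustrative sketch, not the authors' software; the wavelength grid, the toy spectra, and the noise model are all invented for the example.

```python
import numpy as np

wavelengths = np.arange(400, 801, 10)  # 400-800 nm in 10 nm steps (illustrative)

def toy_spectrum(center, width):
    """Toy reflectance curve with a single absorption band at `center` nm."""
    return 1.0 - 0.8 * np.exp(-((wavelengths - center) ** 2) / (2 * width ** 2))

# Hypothetical mock-up database: dye names from the abstract, spectra invented.
reference_db = {
    "Crocus sativus (saffron)": toy_spectrum(440, 40),
    "Curcuma longa": toy_spectrum(425, 25),
    "indigo": toy_spectrum(660, 50),
    "cochineal": toy_spectrum(530, 35),
}

def identify_dye(measured):
    """Return the reference dye whose spectrum correlates best with `measured`."""
    scores = {name: np.corrcoef(measured, ref)[0, 1]
              for name, ref in reference_db.items()}
    return max(scores, key=scores.get)

# A noisy "measurement" of an indigo-dyed area should match the indigo entry.
rng = np.random.default_rng(0)
measured = reference_db["indigo"] + rng.normal(0.0, 0.02, wavelengths.size)
print(identify_dye(measured))  # indigo
```

In practice FORS matching also weighs band positions and inflection points, but correlation against a curated mock-up database captures the core idea of the workflow described in the abstract.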
References
M. Aceto, A. Agostino, G. Fenoglio, A. Idone, M. Gulmini, M. Picollo, P. Ricciardi, J. K. Delaney, 2014, Characterisation of colourants on illuminated manuscripts by portable fibre optic UV-visible-NIR reflectance spectrophotometry, Analytical Methods, DOI: 10.1039/c3ay41904e
M. Aceto, A. Agostino, G. Fenoglio, M. Gulmini, V. Bianco, E. Pellizzi, 2012, Non invasive analysis of miniature paintings: proposal for an analytical protocol, Spectrochimica Acta Part A, 91, 34‐41
M. Bacci, 2000. UV‐Vis‐NIR, FT‐IR and FORS Spectroscopies. In: E. CILIBERTO, GL Spoto, a cura di, 2000. Modern Analytical Methods in Art and Archaeology. 1° ed. (s.l.): John Wiley & Sons, Inc, 321‐361.
M. Barkeshli, 2009. Historical and scientific analysis of Iranian illuminated manuscripts and miniature painting. Golestan‐e Honar. Quarterly on the History of Iranian Art and Architecture, 5 (2(16)).
D. Cardon, 2007. Natural Dyes. Archetype publications, London.
A. Idone, 2014, Analytical techniques for the investigation of natural dyestuffs, PhD thesis, Università degli Studi del Piemonte Orientale “Amedeo Avogadro” XXVI course
Nadjib Mayil Harawi, 1993, Art of bibliopegy in Islamic civilization, Printing and publishing department of Astan Quds Razavi, Mashhad, Iran.
Qāżi Aḥmad b. Šaraf‐al‐Din Ḥosayn Monši Qomi Ebrāhimi, 18th, GOLESTĀN‐E HONAR (هنر گلستان),Entesarat‐e Bonyad‐e Farhang‐e Iran 1973.
R. Pakbaz, 2006, Persian Painting (naghashi iran az dirbaz ta emrooz, نقاشی ايرانی از ديرباز تا .Zarrin va simin (ISBN 964‐92113‐3‐0) ,( امروز
N. Purinton and M. Watters, JAIC 30 (1991) 125‐144
Elemental Analysis of Mesnavi from 14th Century
Pınar Çakar
Department of Manuscript Conservation and Archive, Manuscripts Institution of Turkey, Istanbul
The Masnavi is the masterpiece of the philosopher Jalāl ad-Dīn Muhammad Rūmī (known as Rumi, d. 1273)
(Shakibaej and Golaiji 2012). The manuscript analysed here was copied by Ahmed bin Muhammed el
Mevlevi in 1386. It is named 'Mesnevi-i Şerif' and belongs to the collection of the Hacı Selim Ağa
Library (collection number: 554) in Istanbul. It was written in naskh style and has 185 sheets (Fig.
1a). The paper shows microorganism damage, and pigments are flaking in some of the illuminated areas.
Self-adhesive tapes are also present on the illuminated pages and, owing to the degradation of
their adhesive, the tapes damage the pages (Fig. 1b). Old repairs are present on the bookbinding. The
aim of the work is to determine the pigments and inks and to reveal the chemical composition of the
paper used in the manuscript via elemental analyses. Through these analyses, the palette of pigments
used was determined.
Fig. 1a: Illuminated pages of the manuscript, Fig. 1b: Illuminated page with self‐adhesive tape
Elemental analyses were carried out non-destructively with an ARTAX µXRF instrument. The instrument
can detect elements from Na to U, but light elements are hard to detect without a helium atmosphere
(Çakar 2011). The analyses were performed with the voltage set to 50 kV and the current to 600 µA.
Rich colors of the illuminations were analysed and interpreted; further study will be carried out for
the colors not yet detected. An example of an analysed area and its spectrum are presented in Fig. 2 and
Fig. 3, respectively.
Fig. 2: Image of the gold area of the illumination
Fig. 3: µXRF spectrum of the gold area of the illumination
The data obtained were used to understand the art history of the manuscript and to choose the
appropriate conservation methods and materials.
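The interpretive step, going from the elements seen in a µXRF spectrum to candidate pigments, can be illustrated with a small lookup sketch. The element-pigment associations below are standard heritage-science knowledge; the mapping and the detected-element sets are simplified examples, not results from this study.

```python
# Map key element combinations to the pigments they commonly indicate.
# Simplified illustration: real interpretation also weighs peak intensities,
# the paper's own elemental background, and overlapping paint layers.
PIGMENT_CANDIDATES = {
    frozenset({"Au"}): ["gold leaf / shell gold"],
    frozenset({"Hg", "S"}): ["cinnabar / vermilion"],
    frozenset({"Pb"}): ["lead white", "red lead"],
    frozenset({"Cu"}): ["verdigris", "malachite", "azurite"],
    frozenset({"As", "S"}): ["orpiment"],
    frozenset({"Fe"}): ["ochre", "iron gall ink"],
}

def candidate_pigments(detected):
    """List pigments whose key elements are all present in `detected`."""
    hits = []
    for elements, pigments in PIGMENT_CANDIDATES.items():
        if elements <= detected:  # subset test: all key elements were seen
            hits.extend(pigments)
    return hits

print(candidate_pigments({"Au", "Fe"}))
# ['gold leaf / shell gold', 'ochre', 'iron gall ink']
```

A lookup like this only narrows the field; distinguishing, say, red lead from lead white still requires the reflectance data, which is why the abstract pairs µXRF with visual interpretation of the spectra.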
References
P. Çakar, Tezhipli Elyazması Eserlerde Bakır ve Diğer Elementlerin Pigmentler Üzerine Etkisinin İncelenmesi,
Yıldız Technical University Department of Chemical Engineering, Published Master Thesis, İstanbul, (2011).
Z. Shakibaej and Y. Golaiji, The Effect of Mavlana’s Masnavi Manavi Narrative on the Extent of Adolescent’s
Philosophizing Questioning Skills, 4th World Conference on Educational Sciences (WCES‐2012) 2‐5 February
2012 Barcelona,Spain, Procedia‐Social and Behavioral Sciences, 46(2012), 2882‐2885.
ATWISE 5242 – A recently developed Device for imaging Watermarks in Medieval Manuscripts
Manfred Mayer
University Library Graz, Austria
Various methods have been developed for the documentation of historical watermarks. Drawings by
hand over a light source and rubbings are among the earliest and simplest, but they are relatively
inaccurate. In any case, the method chosen must affect neither the paper nor the watermark.
Taking this into account, the International Association of Paper Historians (IPH) published a standard
for watermark documentation methods (version 2.0, 1997). Dylux, beta-radiography, the X-ray
method, transmitted-light photography (VIS and IR), and others will be briefly discussed and
compared to each other. The challenge of watermark documentation begins exactly when the
watermark is superimposed by written or printed text or by graphics, so that normal
transmitted-light methods cannot be applied.
In 2011 the project CHARTA was launched at the University Library of Graz; its aim is the
complete documentation of all papers used in the medieval manuscripts of the collection.
Understandably, watermarks are very often located in the center of the sheet and are overlaid by
writing on both sides, and attempts to make a drawing by hand under transmitted light very often
fail. Another "classic case" is the position of the watermark in the book fold, where it is also
particularly tricky to gain access to the watermark. Having this in mind, and excluding beta-
radiography and the thermographic method for lack of budget, we faced a serious problem. So we
decided to develop a special new device that meets our requirements: "fast, easy, good results,
limited costs". The point is that infrared photography offers a chance to largely eliminate
text written in iron gall ink (Figs. 1 and 2).
Luckily, iron gall ink was widely used in the Middle Ages, so we estimate that about 70 to 80% of the
western medieval manuscripts stored in our library can be examined with this equipment
(Fig. 3). For oriental manuscripts a much lower percentage of manuscripts written in iron gall ink may
be expected, but their number is nevertheless high enough to play a certain role in the CHARTA
project.
Of course the device fulfils all conservation requirements: for example, the binding is supported
by a book cradle and there is no significant exposure to mechanical stress. We named this machine
"ATWISE 5242", which stands for "Austrian Watermark Imaging System"; 5242 indicates that the
dimensions of the page can be up to 52 x 42 cm.
The paper describes the special characteristics and challenges in the development of this device. At
the end of the presentation a short video of its practical application will be shown.
Fig. 1: MS307 fol. 96, image of a page under transmitted visible light (detail). The watermark is
superimposed by the text on both sides.
Fig. 2: MS307 fol. 96, the same detail as in Figure 1, but captured with ATWISE 5242 under infrared
illumination. The watermark is clearly visible.
Fig. 3: The Equipment: ATWISE 5242
Advanced codicological studies of Codex germanicus 6
Mirjam Geissbühler
University of Bern, Switzerland
The 15th-century manuscript Codex germanicus 6 has proved to be an extremely rewarding object for a
codicological investigation assisted by ink classification using micro X-ray fluorescence
analysis. A preliminary codicological study revealed that the order of the twelve texts that
constitute the codex, though all written by a single scribe, cannot correspond to the chronology of
writing. The scribe used iron-gall inks for the main texts and red inks for the rubrics and certain
passages at the beginning and end of the texts.
Following the codicological studies, we conducted three measurement campaigns to establish the
fingerprints of the inks in the main texts and in the transition passages between consecutive texts,
in order to clarify whether they were written with the same ink. In addition, we checked the inks of
the later marginal notes, the pagination, and the corrections to clarify their connection with the main inks.
The study of the red inks proved very fruitful: they range from pure cinnabar to mixtures of
cinnabar and red lead, and in one case of ochre. Such variety points to a rather complicated
history of the manuscript's production. Similarly, the fingerprints of the iron-gall inks helped to sort the
texts according to the order of writing.
With the help of this study we could reconstruct the history of the production of Codex germanicus 6,
which could not have been done using conventional codicology alone.
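The ink-fingerprint comparison described above can be sketched as follows: the trace-metal signals of an iron-gall ink are normalised to the iron signal, so that inks from the same batch match regardless of how thickly the ink was applied. This is a minimal illustration with invented count values and an assumed matching tolerance, not the actual measurement protocol.

```python
def fingerprint(xrf_counts):
    """Normalise trace-metal XRF counts to the Fe signal of an iron-gall ink."""
    fe = xrf_counts["Fe"]
    return {el: counts / fe for el, counts in xrf_counts.items() if el != "Fe"}

def same_ink(fp_a, fp_b, tolerance=0.2):
    """Two fingerprints match if every element ratio agrees within `tolerance`
    (relative). The 20% tolerance is an assumption for this sketch."""
    return all(
        abs(fp_a[el] - fp_b[el]) <= tolerance * max(fp_a[el], fp_b[el], 1e-9)
        for el in fp_a
    )

# Invented example measurements (raw counts), not data from Codex germanicus 6:
text_1 = fingerprint({"Fe": 1000.0, "Cu": 52.0, "Zn": 18.0})
text_2 = fingerprint({"Fe": 1400.0, "Cu": 70.0, "Zn": 26.0})  # thicker ink layer
text_3 = fingerprint({"Fe": 900.0, "Cu": 10.0, "Zn": 40.0})   # different recipe

print(same_ink(text_1, text_2))  # True  (same batch despite different Fe totals)
print(same_ink(text_1, text_3))  # False (different trace-metal ratios)
```

The point of the normalisation is that absolute counts vary with ink thickness and measurement geometry, while the metal-to-iron ratios stay characteristic of the ink batch.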
Material Analysis of Buddhist Sanskrit Manuscripts preserved in Nepal
Martin Delhey
Centre for the Study of Manuscript Cultures, Hamburg, Germany
Ongoing research at the Centre for the Study of Manuscript Cultures (CSMC) at the University of
Hamburg is devoted to the library or manuscript collection(s) of Vikramaśīla, which was one of the most
important and famous Buddhist monasteries of medieval India. In accordance with the general
approach emphasized at the CSMC, we try to gain insight into various aspects of the physical
organization of knowledge at this library, including the production and later fate of its manuscripts,
rather than being interested in these manuscripts only as carriers of texts in certain states of their
transmission.
Vikramaśīla was founded by the first rulers of the East Indian Pāla dynasty in the early 9th century and
was deserted and destroyed around 1200 CE. It can be considered fairly certain that the ruins
excavated in the east of present-day Bihar near the south bank of the Ganges are the remains of
this famous monastic establishment. There can be no doubt that most of the manuscripts produced
there are irretrievably lost; moreover, none of those that have survived are extant in situ. However, a
significant number of manuscripts produced in this or other similar Buddhist monasteries of medieval
East India have been discovered in modern times in Nepal and Tibet. Since only a small
minority of these important materials bear an explicit mark or note regarding their exact place of
origin, it is very hard to determine which of them come from Vikramaśīla.
In short, the corpus of palm-leaf manuscripts that we are examining in our present project can be
divided into three groups: some manuscripts contain colophons that explicitly mention Vikramaśīla
as the place of production (group I); c. 15 manuscripts have their layout and script in common
with one of the items belonging to group I (group II); a smaller group of manuscripts that differ in
some respects from those of group II in layout and script and are considered to be a Nepalese
imitation of the Vikramaśīla standard (group III).
This contribution presents the findings resulting from the material analysis of those manuscripts of
the aforementioned groups that are preserved in Nepal, viz. in the National Archives, Kathmandu
(NAK), and in the Kaiser Library (KL), which is likewise situated in Kathmandu. The analysis was
undertaken in March 2013. The colleagues of the NAK allocated a room in their precincts to us,
where we could set up our mobile laboratory, and gave us access to the required manuscripts from
their holdings. The officials of the KL, in turn, allowed us to take some of their valuable and ancient
manuscripts to the NAK. In this way, we were enabled to conduct multi‐instrumental studies on
writing materials of great antiquity and interest.
The main findings relate to arsenic in the palm leaves and to mercury-enriched carbon ink in the
primary texts of all the proper Vikramaśīla manuscripts (group I) and of those associated with the
monastery by codicological investigation (group II). Interestingly, the only manuscript from
group III that we were able to examine has also been written on arsenic-treated palm leaves, with
inks that display slight mercury enrichment.
Our results bear on the question of how the historical connection between the group II and
group III manuscripts of our corpus and the group I manuscripts should be conceived. The
hypothesis that there is an intimate relationship between these groups was originally formed on
evidence of a different nature, especially, but not exclusively, the striking similarities regarding the
dimensions and the standardized layout of the pages. By material analysis we have discovered
further similarities and thus corroborated the hypothesis of a common or very similar origin. We
have also seen that one of these newly discovered similarities (i.e. the use of mercury) sets our
original manuscripts apart from some recognizably later additions made on them.
A Field Experience in Ink Studies: Manuscripts from Northern Ethiopia (East Tigray)
Denis Nosnitsin1 and Antonella Brita2
1Hiob Ludolf Centre for Ethiopian Studies, Hamburg, Germany 2Centre for the Study of Manuscript Cultures, Hamburg, Germany
Within the framework of the project Ethio-Spare (supported by the European Research Council and
carried out in 2009-2015), the research team from Hamburg had a rare opportunity to access a
number of traditional ecclesiastic libraries and to digitize and study their numerous parchment
manuscripts.
The last stage of the project also included attempts at material studies. It was decided to focus on
the ink as the most important material component of the manuscript; the intention was also to try
various methods of ink analysis that could be conducted in situ, under the field conditions of
northern Ethiopia, in search of the most effective and feasible ones, usable both for the description
and study of a single manuscript and for other conservation and study tasks. Empirical observation
of a significant number of manuscripts led to the conclusion that the typology of the inks used by
Ethiopian scribes may be more diversified than was commonly assumed before, with methods of ink
preparation not completely identical across periods and regions.
The research team, advised by a specialist in manuscript material studies, began to apply a
three-colour USB microscope (Dino-Lite) for quick reflectography of the inks in the field. On one
occasion it was possible to organize a more extensive field study and include XRF spectroscopy
carried out by an invited specialist. The speakers will present some of the results and challenges
encountered in the course of the work.
10 years of Iron Gall Ink X‐Ray Fluorescence Element Mapping
Leif Glaser1, Daniel Deckers2 and Christian Brockmann3
1Deutsches Elektronen‐Synchrotron DESY, Hamburg, Germany 2Universität Hamburg, Institut für Griechische und Lateinische Philologie, Germany
In medieval times it was common practice to reuse old books by erasing the writing and preparing
the parchment to be written upon again. In these cases the previously written text was often erased
chemically by means of bleach or other reagents, removing only the organic component of the
ink but leaving the metallic part in place.
The metallic fingerprint of the erased inks can nowadays be visualized and sometimes used to
correlate texts by one author across different periods.
Among the accepted modern techniques for re-accessing this writing non-destructively, besides
several often very successful methods of photographic imaging, the use of X-rays, in particular
X-ray fluorescence (XRF), allows element-specific probing of the writing, even where it is covered,
hidden, or chemically erased. By scanning the writing with a small X-ray spot (a), one can measure
the elemental distribution on the parchment; after deconvolution of the different writings (c),
making use of their different metallic fingerprints (b), the hidden or erased text or texts become
accessible while the parchment is left unharmed (Young et al. 2005).
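The deconvolution step (c) can be illustrated as a simple linear unmixing: if each pixel's XRF counts are assumed to be a weighted sum of the two inks' metallic fingerprints, the per-pixel weights can be recovered by least squares. A minimal sketch, with invented fingerprint values for illustration:

```python
# Each ink has a characteristic "metallic fingerprint": relative XRF counts
# for a few elements (here Fe, Cu, Zn -- all values invented for illustration).
ink_a = [1.0, 0.10, 0.02]   # fingerprint of the later, visible writing
ink_b = [0.6, 0.45, 0.20]   # fingerprint of the erased, earlier writing

def unmix(pixel, fa, fb):
    """Least-squares weights (wa, wb) such that pixel ~ wa*fa + wb*fb,
    solved via the 2x2 normal equations."""
    aa = sum(x * x for x in fa)
    bb = sum(x * x for x in fb)
    ab = sum(x * y for x, y in zip(fa, fb))
    pa = sum(p * x for p, x in zip(pixel, fa))
    pb = sum(p * x for p, x in zip(pixel, fb))
    det = aa * bb - ab * ab
    wa = (pa * bb - pb * ab) / det
    wb = (pb * aa - pa * ab) / det
    return wa, wb

# A measured pixel where both writings overlap (0.5 parts ink A, 2 parts ink B):
pixel = [0.5 * f + 2.0 * g for f, g in zip(ink_a, ink_b)]
wa, wb = unmix(pixel, ink_a, ink_b)
```

Mapping the recovered weight of the erased ink across all scanned pixels yields an image of the hidden text alone.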
The talk will give an overview of what has been done with XRF element mapping since the first
successful text recovery on the Archimedes Palimpsest in 2006 (Bergmann 2007). Improvements on
the setup and detector side were achieved using storage-ring-based light sources, and
post-measurement data processing (Bergmann and Knox 2009) has optimized the readability of the
results. Additionally, steps have been taken towards transportable alternatives (Glaser and
Deckers 2014), with a few important developments still needed, the goal being eventually to move
the measuring equipment wherever it is needed and thus avoid transporting any historic material.
References
U. Bergmann, Archimedes brought to light, Physics World, November 2007, Institute of Physics Publishing, Bristol and Philadelphia, ISSN 0953-8585.
U. Bergmann and K. Knox, Pseudo-color enhanced X-ray fluorescence imaging of the Archimedes Palimpsest, in: Document Recognition and Retrieval XVI, ed. Kathrin Berkner and Laurence Likforman-Sulem, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7247, 724702-1-13 (2009).
L. Glaser and D. Deckers, Basics of fast scanning XRF element mapping for iron gall ink palimpsests, Manuscript Cultures No. 7 (2014), ISSN 1867-9617, pp. 104-112.
G. Young et al., Effect of High Flux X-radiation on Parchment, Canadian Conservation Institute Report No. Protus 92195, <http://www.archimedespalimpsest.org/pdf/archimedes_f.pdf>.
Image Processing Software for the Recovery of Erased or Damaged Text
Keith T. Knox
Imaging Consultant, Hawaii, USA
An image processing software package will be described that recovers erased or damaged text
from multispectral images of ancient documents written on parchment or paper. The software is
written in the Java programming language to make it portable to many different computer platforms.
The goal of the project is to make this package of image processing routines available for use
anywhere in the world by researchers, students, and scholars. The architecture of the software
has been designed to make it modular, easily expanded, and easy‐to‐use with an intuitive graphical
user interface. Examples of recovered text from manuscripts from the library of St. Catherine’s
Monastery in the Sinai in Egypt will demonstrate the capabilities of the software. See
http://sinaipalimpsests.org.
This software is an adaptation of a UNIX‐based package of image processing routines, written in C by
the author between 2000 and 2013, to process the multispectral images of the Archimedes
Palimpsest project. See http://www.archimedespalimpsest.org. The UNIX operating system has the
advantage that image scanlines can be passed between the modules over UNIX pipes. As a result, a
new algorithm can be incorporated by writing a new module and including it in the UNIX command
line. This is easy for a software researcher to do, but is beyond the capabilities of a non‐technical
user. In Figure 1, an example is shown of a parchment with erased text. The processing of the
multispectral imagery was done using the UNIX software package. This capability will be available in
the new Java‐language package, but will be easier to use and will be more widely available.
Holy Monastery of St. Catherine at Mount Sinai
Fig. 1: On the left is a natural light image of a parchment page in which the erased text is slightly visible. On the
right, ultraviolet illumination has enhanced the erased writing. In the pseudocolor image, the erased text is
rendered in color, giving it increased contrast.
The move to the Java programming language was made for two reasons. First, Java is a portable
language that is available on almost all computers and operating systems. Second, Java comes with
tools that make it easy to create graphical user interfaces. These two features make it possible to
create an image processing package that can be used by a large number of people with varying
degrees of technical expertise.
Although Java does not implement UNIX pipes, a modular structure was created to enable each
module to be run as an independent software “thread” with an interface that enables modules to
retrieve and send processed scanlines. As a result, a new image processing capability can be easily
incorporated into the package.
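The thread-and-scanline design described here can be sketched in miniature: each module runs as an independent worker, pulling scanlines from an input queue and pushing processed scanlines downstream. The sketch below is in Python rather than Java, with an invented "flatten" module standing in for a real processing routine:

```python
import threading
import queue

SENTINEL = None  # marks the end of the image stream

def flatten(line):
    """Toy 'module': normalize a scanline by its maximum value."""
    m = max(line) or 1
    return [v / m for v in line]

def run_module(func, inbox, outbox):
    """Run one processing module as an independent thread: pull scanlines
    from its input queue, process them, and push them downstream."""
    while True:
        line = inbox.get()
        if line is SENTINEL:
            outbox.put(SENTINEL)   # propagate end-of-stream
            break
        outbox.put(func(line))

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=run_module, args=(flatten, inbox, outbox))
t.start()

image = [[10, 20, 40], [5, 5, 10]]      # two scanlines
for line in image:
    inbox.put(line)
inbox.put(SENTINEL)
t.join()

result = []
while True:
    line = outbox.get()
    if line is SENTINEL:
        break
    result.append(line)
```

Because each module only touches the queue interface, chaining further modules (or inserting a new algorithm) means starting another worker whose input queue is the previous worker's output queue.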
The Java software package is still under development, but a preliminary user interface is shown in
Figure 2. A list of available routines is automatically created as the package starts up and is displayed
along the top. To use a module, the user simply drags it into the main body of the window. As
multiple modules are added to the processing task, links are automatically connected between
modules. In the example shown below, an image, taken in red light, is flattened. A second image,
taken in ultraviolet light, is combined with the first image in the “pseudocolor” module. The colors
are enhanced and written to a TIFF file. The task, as shown, is run in batch mode and can be applied
to any number of image files.
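The pseudocolor combination in this example can be sketched per pixel: the red-light image (in which the erased ink is nearly invisible) feeds one channel, and the ultraviolet image (in which it is dark) feeds the other two, so erased strokes come out tinted while parchment and overtext stay near-neutral. The sample values below are invented for illustration:

```python
def pseudocolor(red_img, uv_img):
    """Combine two greyscale spectral images into one RGB pseudocolor image:
    R from the red-light image, G and B from the ultraviolet image.
    Erased ink (dark only under UV) then appears as a red tint."""
    return [
        [(r, u, u) for r, u in zip(red_row, uv_row)]
        for red_row, uv_row in zip(red_img, uv_img)
    ]

red_img = [[230, 235], [228, 90]]   # erased ink nearly invisible in red light
uv_img  = [[225, 120], [222, 85]]   # erased ink dark under UV

rgb = pseudocolor(red_img, uv_img)
# rgb[0][0]: parchment (all channels bright, near-white)
# rgb[0][1]: erased text (bright red channel, dark green/blue: a red tint)
# rgb[1][1]: overtext (dark in both images, stays near-black)
```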
Fig. 2: The preliminary graphical user interface of the software package is shown. In this example, erased text is
enhanced by combining two spectral separations in pseudocolor.
There are commercial image processing systems available to process multispectral imagery; for
example, see ENVI at https://www.exelisvis.com/docs/linearspectralunmixing.html. While these
commercial packages contain many image processing features, they are typically expensive and can
be out of reach for many potential users.
The Java package described in this talk will be distributed free of charge. Currently, only the author
is developing the software. Early in 2016, the package will be sufficiently developed to
allow other developers to join the effort. The author's goal is to find a few software developers
interested in participating in the continued development of the package. Also, if sufficient
interest exists, the author would like to work with a few individuals to bring the capabilities of
the package to scholars.
Computational Techniques in Multispectral Image Processing:
Application to the Syriac Galen Palimpsest
Corneliu T.C. Arsene1, Peter E. Pormann1, William I. Sellers1, and Siam Bhayro2
1School of Arts, Languages and Cultures, University of Manchester, United Kingdom 2Department of Theology and Religion, University of Exeter, United Kingdom
Multispectral/hyperspectral image analysis has experienced much development in the last decade
(Kwon et al. 2013; Wang and Chunhui 2015; Shanmugam and Srinivasa Perumal 2014; Chang 2013;
Zhang and Du 2012). The application of these methods to palimpsests (Bhayro 2013; Pormann 2015;
Hollaus et al. 2012) has produced significant results, enabling researchers to recover texts that would
be otherwise lost under the visible overtext, by improving the contrast between the undertext and
the overtext. In this paper we explore an extended number of multispectral/hyperspectral image
analysis methods, consisting of supervised and unsupervised dimensionality reduction techniques
(van der Maaten and Hinton 2008), on a part of the Syriac Galen Palimpsest dataset
(http://www.digitalgalen.net). Of this extended set of methods, eight methods gave good results:
three were supervised methods – Generalized Discriminant Analysis (GDA), Linear Discriminant
Analysis (LDA), and Neighborhood Component Analysis (NCA); and the other five methods were
unsupervised methods – Gaussian Process Latent Variable Model (GPLVM), Isomap, Landmark
Isomap, Principal Component Analysis (PCA), and Probabilistic Principal Component Analysis (PPCA).
The relative success of these methods was determined visually, using color pictures, on the basis of
whether the undertext was distinguishable from the overtext, resulting in the following ranking of
the methods: LDA, NCA, GDA, Isomap, Landmark Isomap, PPCA, PCA, and GPLVM. These results were
compared with those obtained using the Canonical Variates Analysis (CVA) method [6,7] on the same
dataset, which showed remarkable accuracy (LDA is a particular case of CVA in which the objects are
classified into two classes). A comparison was also made with a double thresholding and processing
technique, developed as part of this project, which consists of the following: the darker overtext is
carefully identified by the human operator and colored in white (threshold 1), and then the
remaining undertext, which is black but not as black as the overtext was, is made even darker
(threshold 2). This last technique showed some initial encouraging results, but its success depends on
the human operator selecting suitable cutoff values. Figure 1 shows the results and a comparison of
the different computational techniques applied to page 102v-107r_B of the Syriac Galen Palimpsest
data (http://www.digitalgalen.net), for the image obtained under ultraviolet (365 nm)
illumination with a green color filter (designated CFUG).
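The double-thresholding scheme described above can be sketched per pixel (the threshold values below are invented; in practice the operator chooses them by inspection):

```python
def double_threshold(img, t1, t2, gamma=2.0):
    """Sketch of the two-step scheme: pixels darker than t1 (the overtext)
    are pushed to white; remaining pixels darker than t2 (the undertext)
    are made darker still by a simple gamma-like stretch.
    Greyscale convention: 0 = black, 255 = white."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            if v < t1:                       # very dark: the overtext
                new_row.append(255)          # erase it to white
            elif v < t2:                     # moderately dark: the undertext
                new_row.append(int(255 * (v / 255) ** gamma))  # darken it
            else:
                new_row.append(v)            # background left untouched
        out.append(new_row)
    return out

img = [[20, 100, 220]]          # overtext, undertext, background pixels
result = double_threshold(img, t1=50, t2=150)
```

As the abstract notes, the quality of the result hinges entirely on how well the two cutoff values separate the overtext from the undertext.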
Ultimately the choice of technique is based on the preferences of the person trying to read the
manuscript and the precise makeup of the original document but easy access to an appropriate
toolset is clearly highly desirable. Further work will consist of applying other dimensionality
reduction techniques that enable the recovery of the undertext in palimpsests, as well as applying
the above techniques to the rest of the Syriac Galen Palimpsest.
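The dimensionality reduction approach common to several of the methods above (e.g. PCA) treats each pixel as a vector of band intensities and projects it onto directions of maximal variance, which often concentrates the contrast separating undertext from overtext. A minimal sketch of PCA's first component, found by power iteration on the band covariance (the two-band pixel values are invented for illustration):

```python
def pca_first_component(pixels, iters=200):
    """Return the leading principal direction of a list of band-intensity
    vectors, plus the projection (score) of each centered pixel onto it."""
    n, d = len(pixels), len(pixels[0])
    mean = [sum(p[j] for p in pixels) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in pixels]
    # d x d covariance matrix of the bands
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    # power iteration converges to the dominant eigenvector of cov
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores

# Two invented spectral bands; the second carries most of the variation,
# so the first component aligns almost entirely with it.
pixels = [[1.0, 0.0], [1.1, 2.0], [0.9, 4.0], [1.0, 6.0]]
v, scores = pca_first_component(pixels)
```

Rendering the per-pixel scores as a greyscale image is what produces panels such as the PCA and Probabilistic PCA results in Fig. 1.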
a) Original picture b) Thresholding and processing c) CVA
d) GDA e) PCA f) Probabilistic PCA
Fig. 1: Comparison of different computational techniques applied to the Syriac Galen Palimpsest for multispec‐
tral image processing and enhancement.
Acknowledgment
The authors would like to thank the Arts and Humanities Research Council, United Kingdom, for
supporting this work (Research Grant AH/M005704/1 ‐ The Syriac Galen Palimpsest: Galen’s On
Simple Drugs and the Recovery of Lost Texts through Sophisticated Imaging Techniques).
References
S. Bhayro, P.E. Pormann, W.J. Sellers, Imaging the Syriac Galen Palimpsest: preliminary analysis and future prospects, Semitica et Classica, vol. 6 (2013), 297-300.
C. Chang, Hyperspectral data processing: algorithm design and analysis, Wiley (2013).
F. Hollaus, M. Gau, and R. Sablatnig, Multispectral Image Acquisition of Ancient Manuscripts, Progress in Cultural Heritage Preservation, Lecture Notes in Computer Science, EuroMed (2012), 30-39.
H. Kwon, X. Hu, J. Theiler, A. Zare, P. Gurram, Algorithms for Multispectral and Hyperspectral Image Analysis, Journal of Electrical and Computer Engineering (2013).
S. Shanmugam, P. Srinivasa Perumal, Spectral matching approaches in hyperspectral image processing, International Journal of Remote Sensing, vol. 35, 24 (2014).
P.E. Pormann, Interdisciplinary: Inside Manchester’s ‘arts lab’, Nature, 525 (2015).
L.J.P. van der Maaten and G.E. Hinton. Visualizing High‐Dimensional Data Using t‐SNE, Journal of Machine Learning Research, 9, (2008), 2579‐2605.
L. Wang, Z. Chunhui, Hyperspectral Image Processing, Springer (2015).
L. Zhang, B. Du, Recent advances in hyperspectral image processing, Geo‐spatial Information Science, vol. 15‐3 (2012), 143‐156.
X‐Ray Reading of Large‐size Unopened Ancient Manuscripts
F. Albertin1, E. Peccenini2,3,4, M. Bettuzzi2,3,4, R. Brancaccio2,3,4,
M. P. Morigi2,3,4, A. Patera5, I. Jerjen5, S. Hartmann6, and R. Kaufmann6
1Faculté des sciences de base, Ecole Polytechnique Fédérale de Lausanne (EPFL),
CH‐1015 Lausanne, Switzerland 2Centro Fermi, 00184 Roma, Italy
3Dipartimento di Fisica e Astronomia, Università di Bologna, 40127 Bologna, Italy 4INFN Sezione di Bologna, 40127 Bologna, Italy
5Swiss Light Source, Paul-Scherrer-Institute, Villigen, Switzerland 6Center for X-ray Analytics, Swiss Federal Laboratories for Materials Science and Technology, Dübendorf, Switzerland
In recent experiments (Albertin et al. 2015/1; Albertin et al. 2015/2; Albertin et al. 2015/3), we
successfully used X‐ray tomography to read texts inside ancient manuscripts. As an example, Fig. 1
shows a reconstructed portion of a 200‐page handwritten physics book from the 18th century. Our
tests did not use centralized synchrotron facilities: advanced microfocus X‐ray sources provided
sufficient contrast and resolution.
Fig. 1: X‐ray tomography reconstruction of handwriting from inside a 1790 physics book. The reconstructed
portion of the book exhibits readily recognizable characters and words. The tomography was based on raw
projection radiographs obtained with a laboratory‐based microfocus source (Albertin et al. 2015/3).
Also recently, we coupled this technique with photogrammetry that produced accurate 3‐
dimensional renderings of the objects. The combination provides correlated information on the
content and structure of the manuscripts.
The use of tomography to analyze ancient manuscripts (Albertin et al. 2015/1; Albertin et al. 2015/2;
Albertin et al. 2015/3) is the response to multiple challenges: (1) “reading” unopened volumes and
scrolls; (2) in general, avoiding as much as possible the manipulation of the specimens, to prevent
possible damage (we observed no radiation effects in our tests); (3) in the long term, the rapid and
non‐invasive digitization of large historical collections like the Archivio di Stato in Venice – the target
of our “Venice Time Machine” project (http://vtm.epfl.ch/). The foundation of the technique is the
widespread use throughout Europe of inks containing X-ray-absorbing heavy elements. Indeed, our
chemical analysis detected “iron gall” black inks in all the specimens we have examined so far, spanning six centuries.
The first stage of our program (Albertin et al. 2015/1) used X‐rays emitted by centralized synchrotron
sources. The beam quality was outstanding, but it forced us to move specimens outside their normal
environment, traveling over long distances. This obviously limited the potential applications.
This limitation was overcome thanks to an important recent success: the use of laboratory-based
X-ray instrumentation, suitable for the analysis of large-area manuscripts, without an unacceptable
loss of quality. Figure 1 is one of several recent results of this kind.
We are now dealing with a challenging obstacle on the path of large‐scale application: automatic
separation of individual pages ‐‐ from manuscripts that are typically warped and sometimes rolled.
We experimented with advanced algorithms, obtaining promising results. However, automatic
segmentation remains a formidable challenge: we will discuss the present problems and the possible
solutions.
Besides text recognition, X-ray techniques can also deliver a wealth of additional information on: (1)
the substrate's microstructure; (2) the writing process (e.g., paleographic features such as the
“ductus”); (3) chemical data on inks, both black and colored (which typically contain heavy elements);
(4) manuscript structural features such as seals and watermarks; (5) in general, the “hidden” structure
of the specimens. Potentially, the approach could also contribute to current studies of ink-substrate
interactions, in particular ink-induced damage.
Our main target, however, remains text recognition. The recent successes in applying the approach
to specimens with a large number of pages (see again Fig. 1) open up exciting possibilities in that
direction ‐‐ corroborating and complementing the important results recently obtained, for example,
by Mocella et al. (2015) on the Herculaneum papyri.
References
F. Albertin, A. Astolfo, M. Stampanoni, E. Peccenini, Y. Hwu, F. Kaplan and G. Margaritondo, J. Synchrotron Rad. 22, 446 (2015)
F. Albertin, A. Patera, I. Jerjen, S. Hartmann, E. Peccenini, F. Kaplan, M. Stampanoni, R. Kaufmann and G. Margaritondo, Microchemical J. (2015), In Press
F. Albertin, E. Peccenini, Y. Hwu, Tsung-Tse Lee, E. B. L. Ong, J. H. Je, F. Kaplan and G. Margaritondo, Proc. Intern. Conf. “Digital Heritage” (2015), p. 5
V. Mocella, E. Brun, C. Ferrero, and D. Delattre, Nature Commun. 6, 5895 (2015)
Recovering lost commentaries on Aristotle’s treatise On the Heavens in
Venice manuscript Marcianus Gr. 210
Vito Lorusso and Boriana Pouvkova
Centre for the Study of Manuscript Cultures, Hamburg, Germany
The manuscript Marcianus Gr. 210, written in the late twelfth or early thirteenth century on oriental
paper and kept at the Biblioteca Nazionale Marciana of Venice, consists of 207 leaves and contains
three of Aristotle’s works devoted to natural philosophy, namely On the Heavens on leaves 1r‐80v, On
Generation and Corruption on leaves 80v‐122v, and Meteorology on leaves 123r‐207r.
The text of Aristotle’s works is enriched with several commentaries written by the main scribe in the
margins of almost every page of the manuscript. Marcianus Gr. 210 has suffered badly from the
ravages of time. More specifically, the manuscript is faded and damaged by water, with the result
that nearly all the commentaries written in the margins are no longer visible to the
naked eye. In the course of a multispectral imaging campaign in October 2014, the Hamburg SFB
project Z01 acquired improved data from the manuscript. This talk will present some results of the
research carried out on these new data.
The Centre of Image and Material Analysis in Cultural Heritage (CIMA) in Vienna, Austria
Manfred Schreiner1, Heinz Miklas2, Claudia Rapp3, Robert Sablatnig4, Wilfried Vetter1,
Bernadette Frühmann1, and Fabian Hollaus4
1Institute of Science and Technology in Art, Academy of Fine Arts Vienna, Austria 2Institute of Slavic Studies, University of Vienna, Austria
3Institute of Byzantine and Modern Greek Studies, University of Vienna, Austria 4Computer Vision Lab, Vienna University of Technology, Austria
The inter-university Centre of Image and Material Analysis in Cultural Heritage (CIMA) was founded
in early 2014 within the framework of the HRSM project (HRSM: Hochschul-Raum-Struktur-Mittel /
structural fund for the Austrian higher education area), Higher Education Plan 2013 of the Austrian
Federal Ministry of Science and Research. The main aim of this centre is the “Analysis and
Conservation of Cultural Heritage – Modern Imaging and Material Analysis Methods for the
Visualization, Documentation and Classification of Historical Written Material (Manuscripts)”.
Specialized in research in the fields of imaging, image enhancement and analysis as well as the non‐
invasive chemical analysis of materials used for the production of historical objects, CIMA represents
a unique facility with an interdisciplinary approach to the investigation of cultural heritage. The
centre brings together the expertise of three disciplines from three universities: Philology (University
of Vienna), Computer Science (Vienna University of Technology) and Chemistry (Vienna Academy of
Fine Arts). The main idea behind the foundation of CIMA was to extend and strengthen co‐operations
by establishing a central laboratory that offers its services to universities, libraries, museums, private
collections etc.
One part of CIMA concerns MultiSpectral Imaging (MSI), which, in combination with digital image
processing, enables on the one hand the enhancement of the readability of palimpsests and damaged
manuscripts and, on the other, certain automated investigations of the codicology and palaeography
of manuscripts, such as layout, line structure, or identification of scribes. In the second part,
so-called non-destructive / non-invasive analytical techniques such as X-ray fluorescence (XRF),
UV-Vis, reflection infrared (FTIR) and Raman spectroscopy are applied to manuscripts in order to
determine the pigments and/or inks used for their illumination and text. This combination
facilitates the creation of new and improved data in the humanities. Until now, CIMA has applied its
methodology and technical expertise to badly preserved or rewritten manuscripts (palimpsests) from
the 8th to the 14th centuries (mainly in Slavic, Greek and Latin). The material investigations aim at
identifying the inks and pigments used, as distinct from the supporting material (presently the
focus is on parchment).
In the course of the project, a common database will be created containing the information
gained from the imaging, image enhancement, and chemical and philological investigations. CIMA's
final objective is to compare the data generated in the course of its research and to reveal
correlations across multiple modalities (writing material and its preparation, inks and
pigments, reflectivity, etc.) in order to advance the research agenda at the intersection of science
and the humanities.
Application of Hyperspectral Imaging for Quantitative Assessment of
Conservation Treatments for Documents
Tomasz Łojewski1 and Damian Chlebda2
1AGH University of Science and Technology, Faculty of Materials Science and Ceramics,
Krakow, Poland 2Jagiellonian University, Faculty of Chemistry, Krakow, Poland
Conservation procedures performed on documents (e.g. consolidation, deacidification, cleaning,
disinfection) often lead to various kinds of changes in their appearance.
Evaluation and documentation of the desired changes, as well as the unwanted ones, is based
primarily on optical methods: chiefly digital photography (or flatbed scanning) and/or
colorimetric measurements. Consumer digital cameras or scanners can produce images with very
high spatial resolution but do not provide sufficient spectral information to determine colorimetric
indices (e.g. CIE L*a*b*).
Colorimetric measurements are restricted to relatively large areas of homogeneous color on a
studied document, which in practice limits their use to the paper (or parchment) substrate and does
not allow recording and monitoring of possible alterations induced in the writing media.
Hyperspectral imaging allows these difficulties to be overcome, offering both the spatial and
spectral resolution needed for such a task.
In the presentation, a detailed workflow will be described for monitoring side effects of
conservation treatments on paper-based documents with the use of a scanning hyperspectral system
comprising a VisNIR camera (Headwall Photonics) and a broadband illumination source (xenon
lamp).
A set of colorimetric standards and model samples of modern inks on paper was used to test and
validate the procedure of datacube collection, normalization, registration, and recalculation from
reflectance spectra to CIE L*a*b* color values. The method was then applied to quantitatively
monitor color changes in archival documents subjected to two novel conservation treatments:
(1) cold plasma and (2) essential-oils disinfection. A comparison with data obtained with a
filter-based multishot imaging system (7 spectral lines in VIS) with a monochrome camera
(Point Grey/CMOSIS) will also be provided.
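The recalculation from reflectance spectra to CIE L*a*b* follows the standard colorimetric pipeline (reflectance → XYZ → L*a*b*). A minimal sketch: the three-wavelength illuminant-weighted colour-matching table below is invented for illustration (real CIE tables span 380-780 nm), while the XYZ → L*a*b* formulas are the standard ones:

```python
# Toy illuminant-weighted colour-matching samples at three wavelengths
# (invented values for illustration only).
CMF = {
    450: (15.0, 4.0, 70.0),
    550: (35.0, 80.0, 5.0),
    650: (45.0, 16.0, 0.1),
}

def reflectance_to_xyz(refl):
    """Integrate a reflectance spectrum (values 0..1 per wavelength)
    against the illuminant-weighted colour-matching functions."""
    X = sum(refl[w] * CMF[w][0] for w in CMF)
    Y = sum(refl[w] * CMF[w][1] for w in CMF)
    Z = sum(refl[w] * CMF[w][2] for w in CMF)
    return X, Y, Z

def xyz_to_lab(X, Y, Z, Xn, Yn, Zn):
    """Standard CIE XYZ -> L*a*b* conversion relative to a white point."""
    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d ** 2) + 4 / 29
    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

# White point: a perfect diffuse reflector under this illuminant.
white = {w: 1.0 for w in CMF}
Xn, Yn, Zn = reflectance_to_xyz(white)

# A neutral grey (50% reflectance everywhere) should give a* = b* = 0.
grey = {w: 0.5 for w in CMF}
L, a, b = xyz_to_lab(*reflectance_to_xyz(grey), Xn, Yn, Zn)
```

Applying this conversion pixel by pixel across the datacube yields per-pixel L*a*b* values, so colour changes caused by a treatment can be quantified (e.g. as ΔE) even on the fine strokes of the writing media.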
Computation and Palaeography: Where are we Now?
Peter A. Stokes
Department Digital Humanities, King’s College London, United Kingdom
The primary purpose of this lecture is to provide a survey of the field, focussing on developments
since the 2012 Dagstuhl Perspectives Workshop on ‘Computation and Palaeography: Potentials and
Limits’ (Hassner et al. 2013), in which a number of issues were discussed and identified as essential
to future development in the use of digital methods in the analysis of handwriting and other related
topics. Although this was by no means the first such conference, it was perhaps one of the more
significant in terms of bringing together expertise in palaeography, digital humanities and informatics
at an important time when many thousands if not millions of digital images were being produced in
large‐scale digitisation projects. However, three years is a long time in this field, and so it is worth
revisiting the discussions that were held there and asking where we are now, and where we might
want to go next.
The ‘manifesto’ from that workshop identified a number of areas for future development. Most of
these were not technological or algorithmic but related much more to aspects of communication and
collaboration. They can be broadly summarised into three overall headings:
1 Access to and sharing of data and images, including standards, metadata, harmonisation of
copyright and intellectual property.
2 Access to and sharing of results and methods, including tools, libraries and resources.
3 Increased communication and understanding particularly between disciplines, including addressing
problems of terminology, developing meaningful ontologies and ‘mid‐level features’,
avoiding ‘black boxes’, and addressing questions of context and meaning.
Since the publication of the ‘manifesto’, a Dagstuhl Seminar took place on ‘Digital Palaeography: New
Machines and Old Texts’ during which many of these questions were revisited (Hassner et al. 2014).
The conclusions then were that the problems of the ‘black box’ were at least much more widely
recognised than before, and that concerted effort was being made to address the problem. However,
it was also recognised more explicitly than before that the ‘black box’ applies not only to the
computer but also to the human specialist. This point had been made before (Davis 2007; Schomaker
2007; Stokes 2009), but raising it explicitly here changed the question somewhat from one of
obscurity to one of trust: not ‘how can we know what is happening in the box’ but ‘how can we (and
should we) trust others’ conclusions?’ Research brings with it a responsibility to be as transparent as
possible and also to challenge and question each other’s results, yet an interdisciplinary context
necessarily requires a wider range of expertise than any one individual can reasonably be expected
to understand. Indeed, it was also recognised that ‘digital palaeography’ is perhaps still not interdisciplinary enough: that it must involve more than just palaeography and image analysis, expanding to include other areas ranging from ‘GLAM’ institutions (galleries, libraries, archives and museums) through palaeography, codicology, history, art history and linguistics; infrastructure development and support; image analysis, knowledge representation, UI and UX; and so on. Further questions were raised about the possibility of creating toolboxes or suites of web services, and of reviewing and acknowledging the very different metrics for success in different disciplines.
Clearly there has been much progress in the last few years, both in methods and techniques in the
image analysis of visual manuscript features, and also in the more ‘social’ aspects of this area of
research. Old questions remain, however, and new ones are opened up by recent developments. The
promise of toolboxes, VREs, suites of web services and so on has been made for some time but
results have not yet become widespread: why this is so requires examination. The question of
measures for success seems fundamental here and easy to overlook: if different disciplines do indeed
have truly different goals, then how can we make these explicit in order to address them properly
and ensure that all parties can genuinely benefit from the work that is being done? This is perhaps
part of the reason why many palaeographers still consider image analysis to have no value for their
research: because the goal for them is not to analyse images or generate data but to understand
aspects of human history and culture, and how to get from the former to the latter is still not
sufficiently clear. If we can soon achieve very good results for key problems as some have suggested,
such as wordspotting, image segmentation, identification of allographs, and writer identification,
then what will the consequences be for our research (and what would ‘very good results’ mean to
different people in different fields)? What questions can be addressed by existing methods but have not so far been considered (for some examples see Hassner et al. 2014 and especially Stokes 2015)? These and other related questions will be raised and addressed, particularly through
examination of existing projects and perhaps less common approaches, with a view not towards
establishing the state of the art in terms of algorithms and computational methods, but rather to
broaden the discussion, widening the context in the hope of inspiring some new thoughts and
directions of research from which all parties might benefit.
References
T. Davis, The practice of handwriting identification, The Library 7th series, 8, 251–76 (2007). 10.1093/library/8.3.251
T. Hassner, M. Rehbein, P.A. Stokes, L. Wolf, Computation and palaeography: Potentials and limits, Dagstuhl Manifestos 2, 14–35 (2013). 10.4230/DagMan.2.1.14
T. Hassner, R. Sablatnig, D. Stutzmann, S. Tarte. Digital palaeography: New machines and old texts, Dagstuhl Reports 4, 127–8 (2014). 10.4230/DagRep.4.7.112
L. Schomaker, Advances in writer identification and verification, in: Proc. of 9th Int. Conf. on Document Analysis and Recognition (ICDAR), 2, 1268–73 (2007). 10.1109/ICDAR.2007.4377119
P.A. Stokes, Computer‐aided palaeography, present and future, in: M. Rehbein et al. (ed), Kodikologie und Paläographie im Digitalen Zeitalter — Codicology and Palaeography in the Digital Age, Books on Demand, Norderstedt, 2009, pp. 313–42. urn:nbn:de:hbz:38‐29782.
P.A. Stokes, Digital approaches to palaeography and book history: Some challenges, present and future, Front. Digit. Humanit. 2 (2015). 10.3389/fdigh.2015.00005
Visual Saliency for Visual Feature Analysis of Historical Manuscripts
Ehsan Arabnejad1, Hossein Ziaei Nafchi1, Elaine Treharne2, Celena Allen3, Benjamin L. Albritton4,
and Mohamed Cheriet1
1 Synchromedia Laboratory, École de Technologie Supérieure, Montreal, Canada, H3C 1K3 2 Department of English, Stanford University, CA, USA
3 Center of Spatial and Textual Analysis, (Cesta), Stanford, CA, USA 4 Stanford University Libraries, Stanford, CA, USA
Introduction
Visual feature extraction and analysis is a very important step towards the categorization and understanding of historical manuscripts. While the human visual system (HVS) can easily recognize and localize significant features in historical images, automatic detection of salient features is not an easy task. The salient features may be text or graphics with salient colors or salient shapes. Depending on the documents under study, analysis of such features may reveal important information about the organization and structure of a manuscript, such as the beginning of a new significant section, new or important text, or the beginning of a new chapter. Detection and analysis of these visual features also help us to investigate the relation and interaction between authors and writers. Current visual saliency detection algorithms are not designed to deal with documents, and degraded documents in particular. Degradations in historical documents often have irregular patterns that might mistakenly be considered salient features. While color‐saliency‐based methods cannot deal with gray‐scale images, shape‐based methods that do not use color information are unable to detect salient color regions. Moreover, there is no dataset of historical document images with associated ground truths that can be used to evaluate different saliency detection algorithms. Our project, in collaboration with Stanford University, is part of a Digging into Data project. In this project, 198 manuscripts from the 11th, 12th, and 13th centuries were selected as targets for feature extraction. Some of them belong to a single century, while others span two or three centuries. The authors used different colors, shapes, and decorations to organize the manuscripts and help readers understand them. Our aim was to extract the salient color regions (characters) in these document images. After feature extraction, the goal was to classify the features into four categories: i) Litterae Notabiliores, ii) Enlarged Capitals, iii) Rubrics, and iv) Intertextual Space. The extraction of colored characters is in fact a segmentation problem. Gaussian mixture models with expectation maximization and K‐Means are classical approaches to color image segmentation. Three main problems make the use of these approaches challenging: the images in this study are degraded, the number of classes varies from one image to another, and the background color might be very similar to the colored text. In the following, we explain the proposed method that we developed to overcome these problems.
Proposed Method for Feature Analysis
The proposed method uses a new color saliency technique, as well as the color saliency method of [1]
to segment document images. In [1], instead of computing the gradient of the images from a
luminance channel, the color images are boosted and a gradient map is computed from that boosted
image. The advantage of using this gradient map is that edge strengths at colored contours often
have higher magnitudes. This is in contrast to the traditional edge detection methods that work on a
luminance channel. Since the background often has less gradient information, it can be distinguished from text with the same color as the background. This approach, however, cannot always
distinguish between non‐salient text and the salient text (colored). Therefore, we propose a simple
and efficient color saliency method to classify image pixels into non‐salient and salient color pixels.
Our assumption is that the variation of the three color channels (RGB) is high for colored pixels. For each pixel, the standard deviation of the three channels is computed. To take different colors into account, this process is repeated three times: on the original image, on a variance‐normalized image, and on a variance‐normalized image with the red channel divided by 2. The three resulting saliency maps cover a wide range of colors, which is enough to classify salient and non‐salient regions. From each of these three maps, pixels with higher variation are selected and a binary map is generated by thresholding; the binary maps are then combined. Finally, the binary image obtained from the color gradient [1] and the binary image generated with the proposed method are combined to form the final segmented image.
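The per-pixel channel-variation idea can be sketched as follows. This is a minimal illustration in plain Python on nested lists of RGB tuples; the function names and the toy threshold are ours, as the abstract does not specify its thresholding parameters:

```python
import statistics

def saliency_map(image):
    """Per-pixel colour saliency: standard deviation of the three RGB
    channel values.  Strongly coloured pixels score high; near-grey
    (and typical background) pixels score close to zero."""
    return [[statistics.pstdev(px) for px in row] for row in image]

def binarize(sal, threshold):
    """Select pixels with high channel variation (toy threshold)."""
    return [[1 if v > threshold else 0 for v in row] for row in sal]

def combine(maps):
    """Pixel-wise OR of several binary maps, as when merging the three
    saliency variants (original, variance-normalised, red halved)."""
    h, w = len(maps[0]), len(maps[0][0])
    return [[1 if any(m[r][c] for m in maps) else 0 for c in range(w)]
            for r in range(h)]

# A red pixel next to a grey one: only the red pixel is flagged salient.
img = [[(200, 30, 30), (120, 120, 120)]]
mask = binarize(saliency_map(img), 40)
```

In practice the threshold would be tuned per saliency variant, and the combined map would then be intersected or merged with the colour-gradient binary image of [1] as described above.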
The next step is to analyze the extracted features; the goal is to automatically assign a label to each feature. Litterae Notabiliores are made of components with different colors, while only one color is used to write capitals. Therefore, we simply use second‐order image statistics, e.g. the standard deviation of the three channels, to classify features into Litterae Notabiliores and capitals. In addition, entropy was used to measure the amount of “busyness” of each feature; both the standard deviation and the entropy of capitals should be small in comparison with Litterae Notabiliores. For the final decision, a support vector machine was trained and used. Rubrics are another interesting feature in this dataset. Like capitals, rubrics are made of just one color. To distinguish between capitals and rubrics, layout analysis was employed: if the detected color saliency matched specific constraints on text‐lines, columns, or the layout in general, it was labeled as a Rubric. The last feature to be detected is Intertextual Space. For this purpose, the structure and layout of the document are analyzed to find spaces that are expected to contain text according to the layout but are kept void.
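As a rough illustration of these two statistics, the sketch below computes a per-channel standard deviation (high for multicoloured Litterae Notabiliores components, near zero for single-colour capitals) and the grey-level entropy used as the "busyness" measure. The helper names and toy pixel lists are ours; the trained SVM that makes the final decision in the paper is omitted here:

```python
import math
import statistics
from collections import Counter

def feature_std(pixels):
    """Average over the R, G, B channels of the standard deviation of
    that channel across the feature's pixels.  A single-colour capital
    yields values near zero; a multicoloured feature scores high."""
    channels = zip(*pixels)  # R values, G values, B values
    return statistics.mean(statistics.pstdev(ch) for ch in channels)

def entropy(pixels):
    """Shannon entropy of the grey-level histogram of the feature."""
    greys = [round(sum(p) / 3) for p in pixels]
    n = len(greys)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(greys).values())

multicolour = [(200, 30, 30), (30, 200, 30), (30, 30, 200)]
capital = [(180, 40, 40)] * 3   # one colour repeated
```

Both values would then be fed, together with other statistics, into the SVM that separates Litterae Notabiliores from capitals.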
The performance of the proposed method for the classification of the four features, in terms of recall and precision, is listed in Table I. The time complexity of processing such a big dataset is an important concern. The proposed color saliency method is very simple and fast, while the color gradient algorithm of [1] has moderate complexity. We could therefore process all of the images in the dataset in a relatively short time.
Table I. The classification performance of the proposed method for the four features.

                        Recall   Precision
Litterae Notabiliores    0.61      0.95
Enlarged Capitals        0.50      0.83
Rubrics                  0.70      0.81
Intertextual Space       0.75      0.60
Acknowledgement
The authors would like to thank the DiDC (NSERC RGP DD‐13) project, and also NSERC of Canada, for their financial support.
Reference
J. van de Weijer, T. Gevers, and A. D. Bagdanov, Boosting color saliency in image feature detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (1) (2006).
Bag‐of‐descriptors of SIFT for Segmentation‐Free Word Spotting in Handwritten Arabic Documents
Y. Elfakir1, G. Khaissidi1, M. Mrabti1, M. A. El Yaccoubi2, Z. Lakhliai1, and D. Chenouni1
1LIPI / ENS, Fes, Morocco 2SAMOVAR, Télécom SudParis, CNRS, Université Paris‐Saclay, France
Old manuscripts are a part of the richest cultural heritage and legacy of civilizations. Repetitive
manual manipulation of fragile documents should be avoided as it could destroy them. Digitization, therefore, is a convenient solution for the preservation of these manuscripts. Many digitization projects treating Latin scripts have been developed, such as the manuscripts d’Oc and d’Oïl in the Vatican Library (IRHT 2011) and Better Access to Manuscripts and Browsing of Images (Calabretto et al. 1999). The design of recognition systems for degraded handwritten Arabic document images is expanding rapidly today and appears to be a necessity in order to exploit the wealth of information contained in ancient manuscripts.
This paper deals with the problem of query‐by‐example word spotting in handwritten Arabic documents. Performed by manual inspection, this operation requires a great deal of time and effort. Many existing word spotting architectures rely on text, word, or line segmentation steps (Rath and Manmatha 2003; Elfakir 2015) to facilitate the search. However, any
segmentation errors of the document affect the subsequent word representations and matching
steps. This explains why research on word spotting and retrieval is oriented towards segmentation‐
free methods. Gatos and Pratikakis (2009) present an approach applied to historical printed
documents. The proposed method is based on document image block descriptors that are used in a
template matching process. Rothacker et al. (2013) propose to combine the Bag‐of‐visual‐word
representation with Hidden Markov Models in a patch‐based segmentation‐free framework in
handwritten documents. Almazán et al. (2014) represent document images by a grid of HOG
descriptors and a sliding‐window approach is used to locate in the document the regions that are
most similar to the query.
We address the search problem by using a Bag of Visual Words (BoVW) powered by Scale‐invariant
feature transform (SIFT) descriptors. The BoVW method, based on a histogram of occurrence counts
of words, is a popular technique for image classification inspired by models used in natural language
processing. This representation does not take into account the spatial distribution of the visual words.
To solve this problem, we use the Spatial Pyramid Matching method proposed by Lazebnik et al.
(2006). Then, the Latent Semantic Analysis method introduced by Landauer et al. (1998) is applied to
represent the local region descriptors in order to resolve the ambiguity and redundancy of individual visual words in the document. Finally, to reduce the memory footprint of the local region descriptors and the computational cost of nearest‐neighbor search, we encode the SIFT descriptors using the Product Quantization (PQ) method (Jégou et al. 2011), which decomposes the space into a Cartesian product of low‐dimensional subspaces and quantizes each subspace separately.
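Product Quantization itself is easy to sketch. In the snippet below the per-subspace codebooks are hard-coded toy centroids (in practice they are learned with k-means on training descriptors, and the subvector dimension would divide the 128-D SIFT descriptor); the function names are ours:

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Split the vector into len(codebooks) subvectors and store, for
    each subspace, the index of its nearest centroid."""
    d_sub = len(vector) // len(codebooks)
    code = []
    for j, centroids in enumerate(codebooks):
        sub = vector[j * d_sub:(j + 1) * d_sub]
        code.append(min(range(len(centroids)),
                        key=lambda k: dist2(sub, centroids[k])))
    return code

def pq_decode(code, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    out = []
    for j, k in enumerate(code):
        out.extend(codebooks[j][k])
    return out

# Two 2-D subspaces with two centroids each: a 4-D vector is
# compressed to two small indices.
codebooks = [[(0, 0), (10, 10)], [(0, 0), (10, 10)]]
code = pq_encode((9, 11, 1, -1), codebooks)   # -> [1, 0]
```

Because each subspace is quantized independently, the codebook sizes stay small while the Cartesian product of centroids covers a very large set of reconstructions, which is what makes PQ memory-efficient for nearest-neighbor search.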
The proposed method was applied to handwritten Arabic document images from the Ibn Sina
dataset (Moghaddam et al. 2010) and other Arabic documents. The obtained results are satisfactory
in terms of recognition rate and execution time.
References
J. Almazán, A. Gordo, A. Fornés, and E. Valveny, Segmentation‐free word spotting with exemplar SVMs, Pattern Recognition 47 (12) (2014), pp. 3967–3978.
S. Calabretto, A. Bozzi, J.‐M. Pinon, Numérisation des manuscrits médiévaux: le projet européen BAMBI, in: Actes du colloque Vers une nouvelle érudition: numérisation et recherche en histoire du livre, Rencontres Jacques Cartier, Lyon, December 1999.
Y. Elfakir, G. Khaissidi, M. Mrabti, Z. Lakhliai, D. Chenouni, and M. Elyacoubi, Contribution à l’indexation des documents manuscrits arabes scannés, Mediterranean Telecommunication Journal 5 (2) (2015).
B. Gatos and I. Pratikakis, Segmentation‐free word spotting in historical printed documents, in: International Conference on Document Analysis and Recognition, Proceedings, (2009), pp. 271–275.
IRHT, coord. Maria Careri (Université de Chieti, membre associé à l’IRHT), Anne‐Françoise Leurquin and Marie‐Laure Savoye (http://jonas.irht.cnrs.fr, 2011–2021).
H. Jégou, M. Douze, and C. Schmid, Product quantization for nearest neighbor search, IEEE Trans. Pattern Anal. Mach. Intell. 33 (1) (2011), pp. 117–128.
S. Lazebnik, C. Schmid, and J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in: International Conference on Computer Vision and Pattern Recognition, Proceedings of the IEEE Computer Society, (2006), pp. 2169–2178.
T. Landauer, P. Foltz, and D. Laham, Introduction to Latent Semantic Analysis, Discourse Processes 25 (1998), pp. 259–284.
R. F. Moghaddam, M. Cheriet, M. M. Adankon, K. Filonenko, and R. Wisnovsky, IBN SINA: A database for research on processing and understanding of Arabic manuscripts images, Proceedings of DAS’10, June 9–11, 2010, Boston, MA, USA.
T. M. Rath and R. Manmatha, Word image matching using dynamic time warping, in: International Conference on Computer Vision and Pattern Recognition, Proceedings, (2003), volume 2, pp. 521–527.
L. Rothacker, M. Rusiñol, and G. Fink, Bag‐of‐features HMMs for segmentation‐free word spotting in handwritten documents, in: 12th International Conference on Document Analysis and Recognition, Proceedings, (2013), pp. 1305–1309.
Transcript Alignment for Historical Manuscripts
Rafi Cohen, Klara Kedem, and Jihad El‐Sana
Department of Computer Science, Ben‐Gurion University of the Negev, Israel
The recent efforts invested in digitizing libraries have exposed historical datasets to scholars and the general public. However, the documents in these datasets are stored as images, not as text, which makes searching, indexing, and retrieval challenging tasks. Sometimes an ASCII
transcript is supplied together with the document’s image. A mapping (aligning) of each word in the
transcript to the corresponding word image in the document will simplify and accelerate accessing
and processing the manuscripts. In addition to allowing searching and indexing of the document
images, alignment provides an automatic way for ground truth generation, which, in turn, can be
used to evaluate various document retrieval and recognition algorithms.
Transcript alignment methods suggested in the literature can be roughly divided into two categories
depending on whether or not character/word recognition models are used. Recognition based
methods usually perform better, but require more preprocessing for training the character/word
recognizer (Yin 2013). Methods that do not use recognition models usually reduce the problem to a matching problem between features extracted from the line image and features generated from the ASCII transcript, using a matching technique such as Dynamic Time Warping (DTW) (Rabaev et al. 2015) or other heuristic matching methods, e.g., Hassner et al. (2013) and Stamatopoulos et al. (2014).
Our work is inspired by work done in the speech recognition community on speech‐to‐phoneme alignment (Keshet et al. 2007). It uses the Structured Support Vector Machine (S‐SVM) framework to learn a weight vector that separates correct alignment sequences from incorrect ones.
In the alignment problem, we are provided with a line image accompanied by a sequence of events (characters), and the goal is to align each event in the sequence with its corresponding position in the line, i.e., to find the start time of each event in the input line, where the space between words is also considered an event. More formally, we represent a line
image as a sequence of feature vectors $x = (x_1, x_2, \ldots, x_T)$, and the sequence of events is denoted by $e = (e_1, e_2, \ldots, e_K)$. Each input is thus a pair $(x, e)$, and the output is an alignment of $x$ with $e$, that is, a sequence of start‐times $y = (y_1, y_2, \ldots, y_K)$, where $y_k \in \{1, \ldots, T\}$ is the start‐time of the event $e_k$ in the line image. Our goal is to learn an alignment function, denoted $f$, which takes as input the pair $(x, e)$ and returns an event timing sequence $y$.
We use the Structured Support Vector Machine (S‐SVM) framework for predicting the correct
alignment. The S‐SVM is a machine learning algorithm that generalizes the SVM classifier. Whereas
the SVM classifier supports simple output, such as, binary classification, regression, etc. the S‐SVM
allows training of a classifier for predicting complex labels.
The first step towards a solution is to define a quantitative assessment of alignments. Let $(x, e, y)$ be a training example and let $f$ be an alignment function. We denote by $\gamma(y, f(x, e))$ the cost of predicting the timing sequence $f(x, e)$ where the true timing sequence is $y$. In this work we use the cost function defined in Eq. (1). In words, the cost is the average number of times the absolute difference between the predicted timing sequence and the true timing sequence is greater than $\varepsilon$:

$$\gamma(y, y') \;=\; \frac{1}{|y|}\,\bigl|\{\, i : |y_i - y'_i| > \varepsilon \,\}\bigr| \qquad (1)$$
We describe a large margin approach for learning $f$. Recall that a learning algorithm for alignment receives as input a training set $S = \{(x_1, e_1, y_1), \ldots, (x_m, e_m, y_m)\}$ and returns an alignment function $f$. To facilitate an efficient algorithm, we confine ourselves to a restricted class of alignment functions. Specifically, we assume a predefined set of base alignment feature functions $\{\phi_j\}_{j=1}^{n}$, where $\phi_j(x, e, y)$ returns the confidence of $\phi_j$ in the suggested timing sequence. We denote by $\phi(x, e, y)$ the vector in $\mathbb{R}^n$ whose $j$‐th element is $\phi_j(x, e, y)$. The alignment functions we use are of the form given in Eq. (2), where $w \in \mathbb{R}^n$ is a vector of importance weights that we need to learn. In words, $f$ returns a suggestion for a timing sequence by maximizing a weighted sum of the confidence scores returned by each base alignment function $\phi_j$. The actual computation of the $\arg\max$ operator is done using dynamic programming.

$$f(x, e) \;=\; \underset{y}{\arg\max}\; w \cdot \phi(x, e, y) \qquad (2)$$
We now describe a large margin approach for learning $w$ from the training set $S$. We try to rank the sequences according to their quality. Ideally, for each instance $(x_i, e_i)$ and for each possible suggested timing sequence $y'$, we would like constraint (3) to hold; that is, $w$ should rank the correct timing sequence $y_i$ above any other possible timing $y'$ by at least $\gamma(y_i, y')$:

$$w \cdot \phi(x_i, e_i, y_i) \;\geq\; w \cdot \phi(x_i, e_i, y') + \gamma(y_i, y') \qquad (3)$$

We follow the SVM approach and define the optimization problem (4), where each $\xi_i \geq 0$ is a slack variable that indicates the loss of the $i$‐th example:

$$\min_{w,\,\xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{subject to} \quad \forall i, y' : \;\; w \cdot \phi(x_i, e_i, y_i) - w \cdot \phi(x_i, e_i, y') \;\geq\; \gamma(y_i, y') - \xi_i \qquad (4)$$
Equation (4) cannot be solved using standard solvers, since the number of constraints is exponential. We therefore solve it using an iterative algorithm similar to the Perceptron algorithm (Rosenblatt 1957).
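A Perceptron-style update of this kind can be sketched as follows. This is a simplification under stated assumptions: the real method finds the argmax over all timing sequences by dynamic programming, whereas this toy version enumerates an explicit candidate list, and `phi` here is a placeholder feature function, not one of the base functions described in the paper:

```python
def dot(w, phi_vec):
    """Inner product of the weight vector and a feature vector."""
    return sum(a * b for a, b in zip(w, phi_vec))

def cost(y_true, y_pred, eps=2):
    """Eq. (1): fraction of events whose predicted start time differs
    from the true start time by more than eps positions."""
    return sum(abs(t - p) > eps
               for t, p in zip(y_true, y_pred)) / len(y_true)

def perceptron_step(w, phi, example, candidates):
    """One pass of the Perceptron-like update: predict the highest-
    scoring timing sequence among `candidates`; on a costly mistake,
    move w towards the feature vector of the correct alignment."""
    x, e, y_true = example
    y_pred = max(candidates, key=lambda y: dot(w, phi(x, e, y)))
    if cost(y_true, y_pred) > 0:
        w = [wi + a - b
             for wi, a, b in zip(w, phi(x, e, y_true), phi(x, e, y_pred))]
    return w
```

Iterating this step over the training set drives the score of the correct timing sequence above the scores of competing sequences, approximating the margin constraints of problem (4).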
We define four base functions, our first base function is a character recognizer based on the HOG
descriptor combined with linear SVM. In particular, to train a classifier for a character, e, we extract
positive and negative examples for e from the training set. Our second and third base functions are
binary indicator functions, which aim at capturing transitions between events. The second function is based on the projection profile: we compute the strict local minima of the projection profile of the line image and, for each column, return a binary indicator of whether it lies in the vicinity of such a local minimum. The third base function is based on connected components in the binarized image. We
scan the line image from left to right, and whenever we encounter within a column a new connected
component, the column and its neighboring columns are marked as 1. Our last base function scores timing sequences based on character length: it examines the length of each character, as suggested by $y$, compared to the typical length of that character in the training set.
Our Structured Support Vector Machine (S‐SVM) method was tested on several datasets and provided encouraging results. On the Saint Gall dataset it outperformed the results in (Fischer et al. 2011); in particular, we obtained Accuracy = 97.61%, Precision = 97.61%, Recall = 99.71%. Fig. 1 illustrates four examples of the alignment. The left four lines (a) are taken from (Fischer et al. 2011), whereas the right four (b) are the result of our algorithm. We mark the boundaries of words by dark bars, and the beginnings of characters by light bars.
References
A. Fischer, V. Frinken, A. Fornés, and H. Bunke. Transcription alignment of Latin manuscripts using Hidden Markov Models. In the Workshop on Historical Document Imaging and Processing (HIP’11), pages 29–36. ACM, 2011.
T. Hassner, L. Wolf, and N. Dershowitz. OCR‐free transcript alignment. In the 12th International Conference on Document Analysis and Recognition (ICDAR’13), pages 1310–1314, 2013.
J. Keshet, S. Shalev‐Shwartz, Y. Singer, and D. Chazan. A large margin algorithm for speech‐to‐phoneme and music‐to‐score alignment. IEEE Transactions on Audio, Speech, and Language Processing, 15(8):2373–2382, 2007.
I. Rabaev, R. Cohen, J. El‐Sana, and K. Kedem. Aligning transcript of historical documents using dynamic programming. In Document Recognition and Retrieval XXII (DRR’15), IS&T/SPIE, pages 94020I1–94020I9.
F. Rosenblatt. The Perceptron: a perceiving and recognizing automaton. Report 85‐460‐1, Cornell Aeronautical Laboratory, 1957.
N. Stamatopoulos, B. Gatos, and G. Louloudis. A novel transcript mapping technique for handwritten document images. In the 14th International Conference on Frontiers of Handwriting Recognition (ICFHR’14), pages 41–46, 2014.
F. Yin, Q. Wang, and C. Liu. Transcript mapping for handwritten Chinese documents by integrating character recognition model and geometric context. Pattern Recognition, 46(10):2807–2818, 2013.
Simple and Effective Segmentation‐Free Word Spotting in Historic Documents
Sebastian Sudholt, Leonard Rothacker, and Gernot A. Fink
Department of Computer Science, TU Dortmund University, Dortmund, Germany
Word spotting is the task of searching words in document images without explicitly transcribing the
documents first. Instead, possible matches are ranked according to their relevance with respect to
the query (Bluche et al. 2016). Although a complete transcription would be preferable, as it allows manual and automatic processing of the document far beyond searching, transcriptions of historic document images are hard to obtain in practice. Automatic recognizers usually fail unless the
variability in the script’s visual appearance is low or huge amounts of annotated training material are
available. Especially for historic documents these prerequisites are hardly met (Rothacker et al. 2014).
Word spotting methods, on the other hand, are much more robust in this regard. The search is
directly modeled as a retrieval problem instead of implementing the search on top of a classification
result. Users, therefore, benefit even if there are errors in the recognition, as long as the relevant
results are in the top ranks of the retrieval list (Frinken et al. 2012).
An important characteristic of word spotting systems is the input modality of the query words, usually given as an exemplary image (query‐by‐example) or textually (query‐by‐string) (Lladós et al. 2012). In
query‐by‐example scenarios the query is an exemplary occurrence of the search term that has to be
selected in the document image by the user. While otherwise no annotated training material is
required and the complexity of such systems is relatively low, the drawbacks are limitations with
respect to the feasible variability in the script’s visual appearance and the user’s effort of locating the
query first. Query‐by‐string word spotting systems do not suffer from the aforementioned
disadvantages but require annotated training material and are a lot more complex in comparison (cf.
Frinken et al. 2012). For practical applications of word spotting, the retrieval database consists of
entire document images. In order to perform retrieval on word or line level, one approach is to
heuristically segment document images. These methods often require preprocessing, like
binarization, and assume a priori knowledge about the visual appearance of text. Due to degradations in historic documents originating from writing materials, storage, or age, such assumptions will not be valid in general and lead to errors (cf. Lladós et al. 2012). Subsequent
steps in the recognition pipeline are doomed to fail if they are relying on perfect segmentations and
are sensitive to segmentation errors. One way of approaching this problem is the development of
fully segmentation‐free methods.
One of the first word spotting methods for historic documents that has been evaluated without any
dependency on given line or word segmentations was presented in (Leydier et al. 2007). Query words
are retrieved by detecting and matching zones‐of‐interest in the document image. The authors also
emphasize that their method does not require any binarization. Subsequent approaches to segmentation‐free word spotting were mainly inspired by successful methods from Computer Vision. While in
content‐based image retrieval hardly any assumptions with respect to the visual appearance of
scenes and objects are possible, the same approaches can be applied to retrieving word images. In
Rusiñol et al. (2015) and Almazán et al. (2014) methods are presented that are built on Bag‐of‐
Features Spatial Pyramids (Rusiñol et al. 2015) and Histogram‐of‐oriented‐Gradients representations
(Almazán et al. 2014). Retrieval is performed in patch‐based frameworks where densely sampled
patches are encoded in lower dimensional vector spaces that allow for very fast performance. The
high accuracy of both methods shows the features’ robustness with respect to patches that do not
exactly match with occurrences of the query in the document image. In Rothacker et al. (2014) we
presented a hierarchical method using inverted file structures for rapidly detecting regions of interest
in a first stage. Afterwards, these regions are examined with Bag‐of‐Features HMMs in a patch‐based
framework for highly accurate retrieval. Patch‐based frameworks approach the segmentation
problem by simply considering all possible word positions. Unfortunately, this leads to huge search
spaces, the rapid exploration of which requires indexing strategies as in (Almazán et al. 2014;
Rothacker et al. 2014; Rusiñol et al. 2015). Furthermore, the patch size, as well as orientation, is
crucial for the retrieval performance. Setting the size of the patches to the same size as the query
leads to good results as long as the writing style is homogeneous (Almazán et al. 2014; Rothacker et
al. 2014). For increased flexibility patches at a few sizes have been extracted in Rusiñol et al. (2015).
However, all of these attempts are far from evaluating all possible patch sizes for a given query. In
our experiments we will show that this is insufficient for handling the larger writing‐style variability found in the Bentham word spotting data set (cf. Puigcerver et al. 2015). In order to address
these problems of spotting words on historic document images, we present a simple and effective
approach achieving state‐of‐the‐art results. Using basic off‐the‐shelf methodology, we won the
learning‐free (query‐by‐example) track of the ICDAR 2015 Keyword Spotting Competition (Puigcerver
et al. 2015). Feature representations for patch‐based retrieval frameworks have already proven to be
robust against segmentation errors as patches overlapping only partly with relevant words receive
reasonably high similarity scores. For that reason we propose to use simple techniques for word
segmentation (based on binarization and connected component analysis) and apply simple word
representations from patch‐based segmentation‐free word spotting, similar to Rusiñol et al. (2015).
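The connected-component step can be sketched as follows. This is a plain-Python BFS labelling on an already-binarized image; the binarization itself (e.g. Otsu thresholding) and the grouping of nearby components into word hypotheses are omitted, and the function name is ours:

```python
from collections import deque

def connected_components(binary):
    """Label 4-connected components of foreground (1) pixels with BFS
    and return one bounding box (top, left, bottom, right) per
    component -- a crude character/word segmentation."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes
```

The resulting boxes would then be merged by horizontal proximity into word segments and ranked with the patch-based word representations, so that over-segmentation is tolerated as long as relevant segments remain in the candidate list.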
In the future we see two interesting lines of research. Given that it is sufficient to have the relevant
word segments within a list of possible word segments that will be ranked during retrieval, extracting
more variants in the segmentation process can increase accuracy. An alternative lies in the
possibilities of sequence models. Bag‐of‐Features HMMs have successfully been applied in fully
segmentation‐free word spotting and can be extended to decoding exact word positions. This is
different to the method presented in Frinken et al. (2012) because no perfect line segmentation is
required.
References
Jon Almazán, Albert Gordo, Alicia Fornés, and Ernest Valveny. Segmentation‐free word spotting with exemplar SVMs. Pattern Recognition 47(12): 3967–3978 (2014).
Volkmar Frinken, Andreas Fischer, R. Manmatha, and Horst Bunke. A novel word spotting method based on recurrent neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 34(2): 211–224 (2012).
Yann Leydier, Frank Lebourgeois, and Hubert Emptoz. Text search for medieval manuscript images. Pattern Recognition 40(12): 3552–3567 (2007).
Josep Lladós, Marçal Rusiñol, Alicia Fornés, David Fernández Mota, and Anjan Dutta. On the influence of word representations for handwritten word spotting in historical documents. IJPRAI 26(5) (2012).
Leonard Rothacker, Marçal Rusiñol, Josep Lladós, and Gernot A. Fink. A two‐stage approach to segmentation‐free query‐by‐example word spotting. manuscript cultures 1(7): 47–57 (2014).
Marçal Rusiñol, David Aldavert, Ricardo Toledo, and Josep Lladós. Efficient segmentation‐free keyword spotting in historical document collections. Pattern Recognition 48(2): 545–555 (2015).
Joan Puigcerver, Alejandro H. Toselli, and Enrique Vidal. ICDAR 2015 competition on keyword spotting for handwritten documents. In Proc. Int. Conf. on Document Analysis and Recognition, Nancy, France, 2015.
Text‐Image Alignment and Automated Letter‐form Classification: Reading vs. Looking at
Dominique Stutzmann1, Théodore Bluche2, Yann Leydier3,4, Florence Cloppet4, Véronique Eglin3,
Christopher Kermorvant5, and Nicole Vincent4
1Institut de Recherche et d’Histoire des Textes (CNRS – UPR 841), France 2A2iA, Paris, France
3LIRIS Laboratoire d’Informatique en Image et Systèmes d'information, Lyon, France 4LIPADE Laboratoire d’informatique Paris Descartes Université Paris Descartes, France
5 Teklia, Paris, France
This paper presents and compares two automated letter‐form classification methods in order to
enhance the production and analysis of text‐image alignment. The methods were applied on a large
corpus ‘GRAAL’ (130 pages, 10’700 lines, 114’268 words, and more than 400’300 characters),
available online (http://catalog.bfm‐corpus.org/qgraal_cm). The first method is based on Deep
Neural Networks and Hidden Markov Models, which achieve state‐of‐the‐art text recognition
accuracy (Bluche et al. 2014; Bluche 2015), and is primarily used to align the text of a digital scholarly edition with the image of a medieval manuscript at page, line, word, and character level. The alignment results have already been published and outmatch any other attempt by other teams so far (Stutzmann et al. 2015). In this method, diverse letter-forms and other graphical phenomena in the sequence of letters are modelled, so that the computer
may apply a letter‐form classification during the process of aligning text and image and classify the
characters or the sequences of characters according to the classification without any information in
the ground‐truth. Four phenomena were modelled: ‘allographs’ (variant letter‐forms, e.g. d/D/ꝺ,
r/ꝛ/R, and s/ſ/S), ‘ligature’ (specific forms combining two subsequent letters, e.g. ff, ss, and st),
‘conjunction’ (connected characters or overlap between two letters, e.g. de, bo…) and ‘elision’
(suppression of the initial stroke of a letter after some specific letters, Bluche et al. 2016). Results are
published (http://oriflamms.a2ialab.com/Charsegm/Graal/collage.html?chars=LIG_st). As for the
alignment, the accuracy is extremely high (e.g. 100% for 5224 occurrences of the ligature ‘st’). Lower scores are a consequence of unequal distributions (no ‘vertical d’ in the corpus, so that the ‘uncial d’ was modelled in two classes; very few ‘round s’, so that the second class of ‘s’ gathers all ‘round s’ but also occurrences of ‘ſ’). Ligatures are prominent graphical features and are very adequately identified. The modelled ligature ‘ez’ is a case in point: the computer led palaeographers to revise the notion of ligature for this sequence of letters. All in all, the results obtained while aligning, that is
by knowing which letter has to be modelled within the sliding window, are very good, and allow for
in‐depth and exhaustive palaeographic analysis. The second applied method is a learning‐free
classification of the crops of aligned characters as obtained from the first method. This method has
been developed to enhance and further analyse the letter‐forms for which we had not modelled
graphic differences, esp. in order to obtain a neat cluster of precisely aligned letters and thus foster
palaeographical analysis. This method has been developed in two steps. In the first we considered
that neatly aligned and well extracted characters would build a homogeneous group and therefore
developed a process to separate correctly aligned characters (assuming they correspond to a dense
population in the representation space) from outlying badly aligned characters (assuming they
correspond to sparse and scattered elements). Rather than just comparing colour pixels, we compare gradients (i.e. mathematical derivatives) that make use of each pixel's neighbourhood to describe the local orientation of the curves and the contrast. Moreover, we did not limit the comparison to a single point of view but used multiple views in order to highlight the differences between characters. The
results did not correspond to the expectations and proved that artefacts on the cropped image
(overlapping ascenders or descenders, ink deficiencies, and ligatures and connected characters) yield
far more consequences than expected. This first result is of high importance for palaeographers,
because it confirms that reading is not only about recognizing a known letter‐form, it is also about
discarding all the non-meaningful parts in the image of the script. Building on this result, and in order to
allow for a closer analysis, we decided to increase the number of classes. For testing this new path,
we used a cascading classification process on a ground‐truth of 627 letter samples (letter ‘r’). At each
step, the most heterogeneous class is divided into two subclasses. The resulting hierarchical
clustering brings more information than a flat collection of unrelated classes. Having up to thirty
classes for each letter allows for an analysis that goes beyond the usual allographs used in Latin
palaeography and identification of classes. Though such fine classification was not expected, the
information has to be integrated into our future conclusions. In parallel to this work, the alignment and validation software developed since 2013 (Leydier et al. 2014) has been extended to let the user exploit the automated classifications and to tag and annotate whole classes at once. It can now be used as a tool for historical research in manuscript studies.
Fig. 1: Oriflamms software, with automated classification of letter 'm'
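The cascading classification described above (at each step, the most heterogeneous class is divided into two subclasses) can be sketched as a divisive clustering loop. This NumPy-only sketch is an assumption, not the authors' implementation; the total-variance criterion and the minimal `two_means` routine are illustrative choices:

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Minimal 2-means split of a feature matrix (rows = letter crops)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), 2, replace=False)].copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign each sample to its nearest centre, then update centres
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centres[k] = X[assign == k].mean(axis=0)
    return assign

def divisive_clustering(X, n_classes=4):
    """Repeatedly split the most heterogeneous (largest total variance)
    cluster into two subclasses, yielding a shallow hierarchy."""
    X = np.asarray(X, dtype=float)
    clusters = [np.arange(len(X))]
    while len(clusters) < n_classes:
        worst = max(
            range(len(clusters)),
            key=lambda i: X[clusters[i]].var(axis=0).sum()
            if len(clusters[i]) > 1 else -1.0,
        )
        idx = clusters.pop(worst)
        assign = two_means(X[idx])
        clusters += [idx[assign == 0], idx[assign == 1]]
    return clusters
```

Recording which cluster each subclass was split from yields the hierarchical structure mentioned in the text, rather than a flat collection of unrelated classes.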
Acknowledgement
The research is funded by the ORIFLAMMS research project (Ontology Research, Image Feature, Letterform Analysis on Multilingual Medieval Scripts), ANR-12-CORP-0010 (Agence Nationale de la Recherche / Cap Digital), 2013-2016, http://www.agence‐nationale‐recherche.fr/projet‐anr/?tx_lwmsuivibilan_pi2[CODE]=ANR‐12‐CORP‐0010
References
T. Bluche, H. Ney, and C. Kermorvant, A Comparison of Sequence‐Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition, in: L. Besacier, A.‐H. Dediu, and C. Martín‐Vide (ed.), Statistical Language and Speech Processing, Springer International Publishing, 2014, 199‐210.
T. Bluche, Deep Neural Networks for Large Vocabulary Handwritten Text Recognition, PhD thesis, Université Paris‐Sud, 2015.
D. Stutzmann, T. Bluche, A. Lavrentiev, Y. Leydier, and C. Kermorvant, From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanists, in: Digital Humanities 2015, Sydney, 2015.
T. Bluche, D. Stutzmann, and C. Kermorvant, Automatic Handwritten Character Segmentation for Paleographical Character Shape Analysis, [submitted to] DAS2016 – Document Analysis Systems.
Y. Leydier, V. Eglin, S. Bres, and D. Stutzmann, Learning‐Free Text‐Image Alignment for Medieval Manuscripts, in: 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014, pp. 363–368.
A Segmentation‐Free Word Spotting Method
Thomas Konidaris1, Anastasios L. Kesidis2, and Basilis Gatos3
1Centre for the Study of Manuscript Cultures, Hamburg, Germany 2Department of Surveying Engineering, Technological Educational Institution of Athens, Greece
3Computational Intelligence Laboratory, Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece
We present a two-step segmentation-free word spotting method for historical printed documents
(Konidaris et al. 2015). The first step involves a minimum distance matching between a query
keyword image and a document page image using SIFT keypoints. In the second step of the method,
the matched keypoints on the document image serve as indicators for creating candidate image
areas. The query keyword image is matched against the candidate image areas in order to properly
estimate the bounding boxes of the detected word instances. The method is evaluated using two
datasets in different languages and is compared against state-of-the-art segmentation-free methods. The experimental results show that the proposed method significantly outperforms the competing approaches.
Introduction
Image analysis for historical manuscripts can be a challenging task. Complex layouts, degradations
and unknown fonts are some of the factors that play a crucial role in the exploitation of their invaluable content. One of the active research areas is the localization of textual information, namely word spotting, directly on the manuscript images without the need for any OCR procedure. Queries
are images that can either be interactively selected by the user, or selected from a list of predefined
images.
In the literature there are two main categories of word spotting methods. The first includes
segmentation based methods (Kim et al. 2005; Rath and Manmatha 2007) while the second concerns
segmentation-free methods (Leydier et al. 2007; Gatos and Pratikakis 2009). The latter assume that the processed documents have not undergone any kind of segmentation.
The proposed method falls into the segmentation‐free category. It consists of two distinct steps that
involve SIFT keypoint matching, the creation of candidate image areas and the final bounding box
localization. The bounding boxes are constructed using the RANSAC (Fischler and Bolles 1981)
algorithm and homographies.
Proposed Method
In the proposed method, we follow a segmentation‐free word spotting approach due to the poor
results that usually characterize the segmentation process of historical documents. The method builds on SIFT features (Lowe 2004), which have proved robust to low image quality and image degradation. Furthermore, SIFT features are scale and rotation invariant.
The first step of the method involves the matching of the query keyword keypoints with the
keypoints of an entire document page. Instead of following the traditional SIFT matching, we
calculate the K most similar keypoints for each keypoint of the query keyword. The reason is that traditional SIFT matching scatters the points: when there are multiple instances of the desired word on a document page, the algorithm fails to correctly localize the keypoints. From the produced point
correspondences we create candidate image areas. That way, we narrow the search space for the
rest of our method. The candidate image areas are created based on the relative position of the
query keyword keypoints and the scale information of the matched keypoints. An example is shown
in Figure 1.
Fig. 1: Candidate image areas that were created from the corresponding keypoints of the document page
image
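The K-most-similar matching can be sketched over precomputed descriptor arrays (the descriptors would come from a SIFT implementation; the descriptor arrays and the value of K here are illustrative assumptions):

```python
import numpy as np

def k_nearest_matches(query_desc, page_desc, k=5):
    """For every query-keypoint descriptor, return the indices of the
    K most similar page descriptors (plain Euclidean distance).
    Keeping K candidates instead of the single best match preserves
    repeated instances of the query word on the same page."""
    # pairwise squared distances, shape (n_query, n_page)
    d2 = ((query_desc[:, None, :] - page_desc[None, :, :]) ** 2).sum(axis=2)
    # argpartition gives the K smallest per row without a full sort
    return np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
```

The candidate image areas are then built around these K correspondences using the relative keypoint positions and scales, as described above.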
The next step of the method is to perform an additional matching between the query keyword image
and the candidate image area. Again, the SIFT keypoints are used. This process will serve as the final
bounding box localization. The bounding boxes are constructed using the RANSAC algorithm and
homographies. We use the keypoint correspondences in order for RANSAC to create a model that
will produce a 3 x 3 homography matrix H. This matrix is used in order to create the final bounding
boxes on the document image. An example is shown in Figure 2. Overlapping bounding boxes as well
as irregularly shaped ones are pruned.
Fig. 2: The application of the homography matrix H that is calculated based on the point correspondences be‐
tween the query keyword image and the candidate image area
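The RANSAC/homography estimation can be sketched as follows. In practice a library routine (e.g. OpenCV's `findHomography`) would typically be used; this NumPy-only version, with an unnormalized DLT and illustrative iteration count and inlier threshold, only sketches the idea:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: least-squares 3x3 H with dst ~ H @ src."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)          # null-space vector of A
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to an (n, 2) array of points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, iters=200, tol=3.0, seed=0):
    """Sample 4 correspondences, fit H, keep the hypothesis with the
    most inliers, then refit H on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

The resulting H maps the query keyword's corners onto the page, yielding the final bounding box, which is then pruned if it overlaps another box or is irregularly shaped.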
Experiments
The method has been tested on two different datasets, one based on a German book and the other on a Greek one. The bulk of the experiments concerned the German dataset, which consists of 100 pages and 100 keywords. The method showed better performance than other competing methods. Results are presented in Figure 3 and in Table 1 and are based on two
different evaluation parameters. TREC‐Eval was also used.
Table 1: Results (bold indicates best performance)

                 Proposed   Leydier et al. [4]   Gatos et al. [5]   SIFT
MAP              0.795      0.776                0.689              0.584
Geometric MAP    0.751      0.747                0.640              0.503
R Prec           0.771      0.774                0.675              0.604
Bpref            0.927      0.921                0.871              0.625
Reciprocal Rank  1.000      1.000                0.985              1.000

                 Proposed   Leydier et al. [4]   Gatos et al. [5]   SIFT
MAP              0.836      0.808                0.730              0.615
Geometric MAP    0.809      0.787                0.697              0.545
R Prec           0.796      0.792                0.700              0.626
Bpref            0.967      0.990                0.907              0.651
Reciprocal Rank  1.000      1.000                0.995              1.000
Fig. 3: P‐R Curves for the two evaluation parameters
References
M. A. Fischler and R. C. Bolles (1981) Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Communications of the ACM, 24(6):381–395.
B. Gatos and I. Pratikakis (2009) Segmentation-free Word Spotting in Historical Printed Documents. In: 10th International Conference on Document Analysis and Recognition (ICDAR'09), Barcelona, Spain, pp 271–275.
S. Kim, S. Park, C. Jeong, J. Kim, H. Park, and G. Lee (2005) Keyword Spotting on Korean Document Images by Matching the Keyword Image. In: Digital Libraries: Implementing Strategies and Sharing Experiences, vol 3815, pp 158–166.
T. Konidaris, A. L. Kesidis, and B. Gatos (2015) A Segmentation-free Word Spotting Method for Historical Printed Documents. Pattern Analysis and Applications (PAA).
Y. Leydier, F. LeBourgeois, and H. Emptoz (2007) Text Search for Medieval Manuscript Images. Pattern Recognition, 40:3552–3567.
D. G. Lowe (2004) Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2):91–110.
T. M. Rath and R. Manmatha (2007) Word Spotting for Historical Documents. International Journal of Document Analysis and Recognition (IJDAR), 9(2–4):139–152.
The Quest for Lost Ancient Literature: X-ray Phase Contrast Tomography Reveals
the Secrets of Herculaneum Papyri
Vito Mocella1, Emmanuel Brun2,3, Claudio Ferrero2, Daniel Delattre4
1CNR‐IMM‐Istituto per la Microelettronica e Microsistemi‐Unita` di Napoli, Italy 2ESRF—The European Synchrotron, Grenoble, France
3 Inserm, U836, Grenoble, France 4CNRS‐IRHT‐Institut de Recherche et d’Histoire des Textes, France
We present the first experimental demonstration of a non‐destructive technique that reveals the text
of a carbonized and thus extremely fragile Herculaneum papyrus.
Buried by the famous eruption of Vesuvius in 79 AD, the Herculaneum papyri represent a unique
treasure for humanity. Overcoming the difficulties faced by other techniques, we prove that X-ray phase contrast tomography can detect the text within the scrolls, thanks to the coherence and high-energy properties of a synchrotron source.
This new imaging technique represents a turning point for the study of literature and ancient
philosophy, disclosing texts that were believed to be completely lost.
REX Project: Extraction and processing of underlying texts
Study of a Marie‐Antoinette secret correspondence
Florian Kergourlay1, Christine Andraud1, Anne Michelin1, Aymeric Histace2, Bertrand Lavédrine1,
Isabelle Aristide‐Hastir3, and Rosine Lheureux3
1MNHN-CRCC, Paris, France 2ETIS, UMR CNRS, Cergy-Pontoise, France
3Archives Nationales, Pierrefitte‐sur‐Seine, France
Marie-Antoinette and Count Axel de Fersen maintained a secret correspondence between June 1791 and August 1792, while the royal family was confined in the Tuileries in Paris. Coded and partially crossed out, this correspondence has not yet revealed all of its secrets. Indeed, some words, lines or complete paragraphs have been cleverly censored by means of very tight curls and false strokes intended to mislead the reader.
This project is part of a field explored by several previous works (Easton and Noel 2010; Bergmann
and Knox 2009; Colombo et al. 2005; Larsen 2011), the specific issue here being the deliberate overlay of two contemporaneous inks. The main goal is thus twofold: (i) to develop an experimental methodology combining non-invasive and non-destructive analytical tools with image processing, allowing two very similar inks from the 18th–19th century to be distinguished, and ultimately (ii) to reveal the underlying text.
In this context, an experimental corpus composed of ten letters has been studied by the combination
of (i) micro X‐Ray Fluorescence (µXRF) equipped with a Mo‐Kα excitation source, (ii) Hyper‐Spectral
Imaging in Visible, Near and Short Wavelength InfraRed spectral ranges (HSI‐VNIR and HSI‐SWIR)
from 400 to 2500 nm, (iii) InfraRed flash Thermography (IRT) and stereomicroscopy in transmittance
mode, micro‐topography, 3D numerical microscopy and Reflectance Transformation Imaging (RTI). In
parallel image processing was used to enhance the readability of the obtained image. The
experimental methodology is detailed in Figure 1.
Fig. 1: Experimental methodology
60
µXRF results first confirm that the elemental composition of the under- and overlying inks is typical of metal-gall inks, most specifically iron-gall inks, principally composed of Fe, K, Mn, Ni and Zn. However, a crucial dissimilarity has been highlighted on one letter through the presence of Cu in the underlying ink, allowing elemental distribution maps to be produced that partly reveal the hidden text.
HSI-VNIR/SWIR analysis combined with multivariate statistical methods has proved effective in enhancing the ink superposition, with some similarities to the µXRF results. Several image processing methods were applied in parallel on a set of pictures, improving the differentiation between the paper and the under- and overlying inks (Figure 2). Flash thermography and the other methods used to bring out the ink topography have so far failed, but have shown great potential.
By combining these results, this study offers a new reading of the crossed-out paragraphs, revealing an entire underlying text, and provides a new way to apprehend multiple data collected with complementary analytical tools and image processing in order to reveal hidden information in ancient manuscripts.
Fig. 2: Image processing, µXRF and HSI analysis
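As an illustration of the multivariate-statistics step, a standard choice is to project the hyperspectral cube onto its first principal components, in which inks with distinct spectra often separate. This NumPy sketch assumes a PCA-style analysis and is not the project's actual pipeline:

```python
import numpy as np

def pca_bands(cube, n_components=3):
    """Project a hyperspectral cube of shape (H, W, bands) onto its
    first principal components; inks with different spectral
    signatures tend to end up in different component images."""
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)
    X -= X.mean(axis=0)                       # centre each band
    # SVD of the centred pixel-by-band matrix gives the loadings in Vt
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_components].T
    return scores.reshape(h, w, n_components)
```

Viewing each component image separately (or as a false-colour composite) is one way to enhance the contrast between the under- and overlying inks.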
References
U. Bergmann and K.T. Knox, Pseudo‐color enhanced X‐ray fluorescence imaging of the Archimedes Palimpsest, in: SPIE Proceedings 7247 (2009).
G. Colombo, F. Mercuri, F. Scudieri, U. Zammit, M. Marinelli, and R. Volterri, Restaurator 26, 92‐104 (2005).
R.L. Easton, W. Noel, Infinite possibilities: Ten years of study of the Archimedes Palimpsest, in: Proceedings of the American Philosophical Society, 50‐76 (2010).
C.A. Larsen, Document Flash Thermography. All Graduate Theses and Dissertations, Paper 1018, Utah State University (2011).
GraphManuscribble: Interact Intuitively with Digital Facsimiles
Angelika Garz, Mathias Seuret, Andreas Fischer, and Rolf Ingold
DIVA research group, Department of Informatics, University of Fribourg, Fribourg, Switzerland
This abstract presents GraphManuscribble, a new user-centred and intuitive tool developed for directly interacting with a digital facsimile of a manuscript, particularly to segment, extract, or mark
its contents. It exploits the new human‐computer interaction patterns evolving around touch‐screen
devices that are operated with a stylus, such as Microsoft Surface, and builds upon document graphs
that capture the structure of a manuscript similar to human perception. Specifically, users of the tool
scribble directly onto a digital facsimile of a manuscript in order to select or annotate manuscript
parts. They are assisted by a semi‐automatic system that facilitates imprecise interactions since
accurate operations such as the region segmentation are done automatically. A user study including
participants without any prior experience of pen-based interaction has shown the promising potential of the proposed tool.
Give a person a page of a manuscript, and they will instantly and intuitively be able to recognise its fundamental structures (such as text lines), regions, and objects (e.g. embellishments) regardless of the language or script it is written in, whether that is an ancient European script, Arabic, Cyrillic, illegible cursive handwriting, or an exotic layout such as in Babylonian Aramaic magic bowls,* complicated curving lines (Asi et al. 2014), or more elusive layouts such as writing embedded in works of art such as paintings.† To us, a manuscript is a meaningful arrangement of regions; we can
agree on the regions and their semantic meaning on a high level, establish connections and relations
between parts, and understand boundaries and belonging, without needing to read or understand
the content.
Fig. 1: St. Gallen, Stiftsbibliothek, Cod.Sang.18‡, page 95.
Left: original image, centre: document graph, right: close‐up
* http://www.metmuseum.org/collection/the‐collection‐online/search/321885 † http://www.metmuseum.org/collection/the‐collection‐online/search/454611 ‡ Composite manuscript, astronomical clock of Pacificus of Verona, DOI:10.5076/e‐codices‐csg‐0018
62
Gestalt psychology, which emerged in the early 1920s, aimed at understanding human visual grouping, i.e. the way humans recognise a pattern as an object with boundaries and extent, and differentiate it from the background. As such, perceptual grouping tries to “solve the problem of ‘what goes with what’ and the differentiation of figure from ground” (Han 1996). Our perception is remarkably powerful. For us humans, “form emerges as result of the relationship [and complex interactions] between the parts” (Breidbach et al. 2006). To fix ideas, let us consider the example of a zebra herd: we are able to identify each zebra in a herd of similar-looking zebras as an individual and agree on its boundaries, despite its complex texture and the similar texture of its herd members.
This means, we are able to identify what defines a collection of patterns, and what distinguishes it
from others, and hence, group those with respect to some similarity criterion (Breidbach et al. 2006).
A computational method exposed to the same task, on the other hand, works on the array of pixels in which the document image is digitally represented, without any inherent understanding of the structure.
Methods have been developed to classify, group, and align pixels to model our intuitive
understanding of a document and to provide a means for automatic processing (Nagy 2000). Graphs
in the context of computer science (Conte et al. 2004) are a powerful means to model structural
relationships. They are capable of representing data in a fashion similar to our perception given an
appropriate definition of their topology and criteria for defining relations: graphs are a set of
points connected by edges (see Figure 1 where the graph is a minimum spanning tree based on
triangulated contour points). Consider the analogy of road networks: graphs capture entities and their relationships much as a road network connects key points in a city. In the network, some roads are more important than others; they are wider, have more lanes or a higher speed limit, and thus a higher traffic throughput. A graph can similarly encode relationships of different quality in its edges.
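The document graph described above (a minimum spanning tree based on triangulated contour points) can be sketched with SciPy; the input points and the Euclidean edge weights are illustrative assumptions:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial import Delaunay

def contour_mst(points):
    """Build a minimum spanning tree over 2-D points, restricted to
    the edges of their Delaunay triangulation."""
    tri = Delaunay(points)
    # collect every edge of every triangle, undirected and deduplicated
    edges = set()
    for a, b, c in tri.simplices:
        edges |= {tuple(sorted(e)) for e in ((a, b), (b, c), (a, c))}
    i, j = zip(*edges)
    w = np.linalg.norm(points[list(i)] - points[list(j)], axis=1)
    n = len(points)
    graph = csr_matrix((w, (i, j)), shape=(n, n))
    return minimum_spanning_tree(graph)  # sparse matrix with n-1 edges
```

In the real system the points would be contour points of the ink strokes, and edge attributes could encode the different relationship qualities discussed above.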
We propose a user‐centred system that is inspired by the humans’ perceptual capabilities and aims
at representing the document image in a way that resembles our understanding using graphs (Garz
et al. 2015). The goal of our first prototype tool is to assist a user in selecting and segmenting entities
in a manuscript in an intuitive manner guided by the humans’ connotation of structure and belonging
in a document. The user edits this representation with a stylus on a touch‐sensitive screen. The range
of application of such a tool extends from extracting parts of a manuscript, e.g. for selecting samples for word spotting (Riba et al. 2015) or illustration retrieval in order to find other appearances of the same word, or of illuminations, decorations, or drop caps (Nguyen et al. 2011), to a tool that empowers scholars to directly mark and annotate the digital facsimile of a manuscript in an intuitive and natural fashion.
We conducted a user study with the aim of assessing the efficiency of the tool for annotating
complex historical manuscript pages, the user fatigue, and the quality of the proposed scribbling
interaction pattern. It demonstrated that our tool and the interaction with a pen on a touch screen
that displays the facsimile require neither prior training, algorithmic knowledge, nor extensive computer skills.
Fig. 2: User interface of the GraphManuscribble tool with the selected areas (polygons colour-coded according to their class), and the user scribbles (interactions) on a second view with a binary version of the image, where the foreground is white and the graph superimposed as green lines.
References
A. Asi, R. Cohen, K. Kedem, J. El‐Sana, and I. Dinstein (2014). A Coarse‐to‐Fine Approach for Layout Analysis of Ancient Manuscripts. In Proc. Int. Conf. on Frontiers in Handwriting Recognition (pp. 140‐145). IEEE.
O. Breidbach, and J. Jost (2006). On the Gestalt Concept. In Theory in Biosciences (Vol. 125, No. 1, pp. 19‐36). Springer International Publishing.
D. Conte, P. Foggia, C. Sansone, and M. Vento (2004). Thirty Years of Graph Matching in Pattern Recognition. In Int. Journal of Pattern Recognition and Artificial Intelligence (Vol. 18, No. 3, pp. 265-298). World Scientific.
A. Garz, M. Seuret, F. Simistira, A. Fischer, and R. Ingold (2015). Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction. Submitted for review to Int. Workshop on Document Analysis Systems.
S. Han, F. Xiao, and L. Chen (1996). Uniform connectedness and the classical gestalt principles of grouping. In Investigative Ophthalmology & Visual Science (Vol. 37, No. 3, pp. 1350-1350).
G. Nagy (2000). Twenty Years of Document Image Analysis in PAMI. IEEE Trans. on Pattern Analysis and Machine Intelligence (Vol. 22, No. 1, pp. 38–62). IEEE.
T. T. H. Nguyen, M. Coustaty, and J. Ogier (2011). Bags of strokes based approach for classification and indexing of drop caps. In International Conference on Document Analysis and Recognition (ICDAR), (pp. 349-353). IEEE.
P. Riba, J. Lladós, A. Fornés, and A. Dutta (2015). Large-Scale Graph Indexing Using Binary Embeddings of Node Contexts. In Graph-Based Representations in Pattern Recognition (pp. 208-217). Springer International Publishing.
Visual Literary Topology
Rachid Hedjam1, Margaret Kalacska1, Sumaya S. Ali Al‐ma’adeed2, and Mohamed Cheriet3
1Department of Geography, McGill University, Montreal, Qc H3A 2K6; Canada 2Department of Computer Science and Engineering, Qatar University, Doha, Qatar
3Department of Automated Manufacturing Engineering, ETS, University of Quebec Montreal, Canada
Introduction
This paper highlights some research directions of an ongoing project (DiDC: Digging into Data Challenge) that we are conducting in collaboration with the literary department of McGill University (Canada). The aim of this project is to combine visual image processing (VIP), pattern recognition (PR),
machine learning, network science and text analysis to study cultures of literary communication
across a broad spectrum of space and time: post‐classical Islamic philosophy, Chinese Women’s
Writing from the Ming‐Qing Dynasties, the Anglo‐Saxon Middle‐Ages, and the European
Enlightenment. How are these different periods and places characterized by networks of shared
ideas? How did such literary networks contribute to the distinct intellectual contributions of each
epoch?
Fig. 1: Footnote mark and footnoted word
More recently, modelling by networks has become of interest to literary scholars, who have explored
the nature of literary networks in a variety of contexts, including visualizations of eighteenth‐century
epistolary networks; literary geographies; and quantitative studies of social networks within different
genres. Traditionally, these networks are generated based on the similarities between available
textual information of their printed or handwritten manuscripts. Natural language processing (NLP) is
the basic tool used to find out the frequent items shared between the different manuscripts based
on which the similarity is computed (Piper and Algee-Hewitt 2014). Then, various network analysis approaches can be used to analyze these networks in order to answer existing questions in the human sciences. Unfortunately, millions of historical manuscripts are still in the form of document
images and the text they contain is not yet in a machine-readable form. Our new vision for meeting this challenge addresses manuscripts in their image form using VIP and PR. These provide the digital humanities expert with the possibility of generating network representations of the text corpora based on the extracted visual data, in the same way as one would using the semantic information of word frequencies. The nodes of the network will represent individual works or pages and the edges will
represent the similarity or dissimilarity between them based on their visual features.
Methodology
The core of VIP is based on extracting and transforming graphical marks on page images with a large set of transforms (known as “patches”). A patch can be seen as a sample taken from a character, for
example, and contains a rich amount of information that can be used to build the graphical relations
between pages. Although the approach is universal to all collections, separate descriptors and data
handling will be developed for each collection based on the nature of script and style of its
documents. In the context of the DiDC German dataset, the footnote is considered the most important visual feature that an image patch can cover. Each footnote is indicated by a footnote mark (FNM): a numeric, alphabetic or special character. Each FNM (see Figure 1) appears twice on a manuscript page: one at the top (noted TFNM) and one at the bottom (noted BFNM). The TFNM is placed just after the footnoted word, and the BFNM is placed just before the footnote itself. Therefore, a manuscript page virtually always contains an even number of footnote marks (i.e., TFNMs and BFNMs). The footnoted word is used for content analysis, looking for instance for the most frequently footnoted words, which will serve to build a common lexicon. From the literary point of view, two manuscripts can be connected or similar if they share one or more similar footnoted words.
Fig. 2: Similarity network between manuscripts
Footnoted-word detection depends on the accuracy of the FNM detection. What makes the automatic processing difficult, if not impossible in many cases, is dealing with degraded manuscript images scanned at very low resolution and in pure black and white, in addition to all the damage caused by geometrical deformation due to poor handling during scanning. Consequently, footnote detection by itself is far from accurate, and some pre-processing, such as noise and shadow removal, text stroke smoothing and skew correction, must be applied first. We have proposed two methods to detect the FNM. The first method is based on the MACH filter (Kumar et al. 2005), and the second method is based on an AdaBoost classification rule (Freund and Schapire 1997).
The MACH filter is based on the notion of correlation patterns. Given a class of FNM training patches, a MACH filter combines them into a single composite template by optimizing four performance metrics: the average correlation height, the average correlation energy, the average similarity measure, and the output noise variance. The template is then correlated with the test image in the frequency domain via a fast Fourier transform, yielding a correlation surface whose highest peak corresponds to the most likely location of the target FNM in the manuscript image. As for the Adaboost classifier, we used HAAR features together with the Adaboost algorithm. HAAR features are very simple and can be computed efficiently using the integral image; thanks to this simplicity and efficiency, they help reduce the computational time when processing such a huge collection of document images. Template matching is usually used for object detection in binary images, but it is computationally expensive. Here, we instead used a boosting-based algorithm, rarely applied to binary images, to detect the FNMs. The detector runs in real time and provides high performance. Once the FNMs are detected, the footnoted words are located and matched to each other across the manuscript collection in order to build the similarity graph shown in Figure 2. Finally, various network analysis approaches (Fortunato 2010) can be used to analyze these networks.
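The frequency-domain correlation step used for FNM localization can be illustrated with a plain matched filter; the actual MACH template is the result of the four-metric optimization described above, so the all-ones template below is only a stand-in:

```python
import numpy as np

def correlate_fft(page, template):
    """Correlate a template with a page image in the frequency domain:
    the highest peak of the correlation surface marks the most likely
    location of the target FNM."""
    h, w = page.shape
    T = np.fft.fft2(template, s=(h, w))              # zero-pad template to page size
    P = np.fft.fft2(page)
    surface = np.real(np.fft.ifft2(P * np.conj(T)))  # circular cross-correlation
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    return tuple(int(i) for i in peak)

page = np.zeros((64, 64))
page[20:24, 30:34] = 1.0                             # synthetic 4x4 "marker"
print(correlate_fft(page, np.ones((4, 4))))          # → (20, 30)
```

The FFT route makes the cost of sliding the template over the page independent of the template's position count, which is what makes exhaustive correlation practical on large page images.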
References
A. Piper and M. Algee‐Hewitt. The Werther Effect I: Goethe Topologically. In: Distant Readings: Topologies of German Culture in the Long Nineteenth Century. Ed. Matt Erlin and Lynn Tatlock (Rochester: Camden House, 2014), 155‐184.
S. Fortunato. Community detection in graphs. Physics Reports 486.3 (2010): 75‐174.
B. V. K. Kumar, A. M. Vijaya, and R. D. Juday. Correlation Pattern Recognition. 1st ed. Cambridge: Cambridge University Press, 2005. Cambridge Books Online. Web. 18 November 2015. http://dx.doi.org/10.1017/CBO9780511541087
Y. Freund and R. E. Schapire (1997). A decision‐theoretic generalization of on‐line learning and an application to boosting. Journal of computer and system sciences, 55(1), 119‐139
An Interactive System to help Transcription of Historical Handwritten Documents
Adolfo Santoro, Angelo Marcelli, and Francesco Carillo
Department of Electrical Engineering and Applied Mathematics, University of Salerno, Fisciano, Italy
The ongoing process of knowledge digitization identifies digital libraries as one of the most important
channels for creating and sharing knowledge all over the world: the concept of digital library evolved
from a simple digital collection to a space where services and people support the whole life‐cycle of
the creation, use and preservation of data, information and knowledge. The advancements in
imaging and storing technologies have favored the digitization of historical books and manuscripts
for better preservation and easy access but, in case of cursive handwritten documents, making the
text available for searching and browsing is a very challenging task. Currently, a huge amount of historical handwritten document images has been scanned and stored in image formats, but their content is not yet easily and quickly accessible, as shown by the Bentham project, a crowdsourced handwritten document transcription effort, or the tranScriptorium project, whose aim is to develop innovative and efficient solutions for indexing and transcribing the content of handwritten documents belonging to a huge digital archive. On the other hand, easy access is ensured when the content of the document can be handled with information retrieval technology, so that the system can
process user’s queries asking for documents with specific content. For that to happen, document
images must be processed and their content extracted in a way that can be easily manipulated; in
case of cursive handwritten historical documents, Optical Character Recognition is not a viable
solution, and therefore different approaches, such as word spotting or holistic handwritten text recognition, have been proposed in the literature, but the problem is still far from solved. The main reason is the large variability exhibited by those documents, usually produced by different writers in different ages and thus involving very different writing styles. This requires both very powerful recognition engines and the effort of handwriting experts, also known as “scriptores”, to interpret the content and eventually correct errors at the recognition level.
In this short abstract we propose a novel approach for helping “scriptores” to transcribe the content
of handwritten documents in huge digital historical archives, based on a keyword retrieval method following the query‐by‐string paradigm. Our approach aims to address two important issues: the small size of the training set, meaning that little effort is required from the “scriptores” to get the system ready for use, and the possibility of improving the performance of the system over time by taking advantage of the interaction with the scriptor. The proposed method is composed of four steps, described below:
1. Build a Reference Set: the input of the first step is a set of handwritten digital documents for which the transcription of the words is available. Those documents are automatically processed in order to create a Reference Set, composed of word images each associated with a transcript.
2. Build a Knowledge Base: the input of this step is the whole collection of scanned handwritten documents on which the retrieval step will be performed. All documents are processed by the ink matching algorithm, which associates with each unknown ink a possible interpretation obtained from the word images belonging to the Reference Set, thereby creating a Knowledge Base composed of the set of all possible interpretations for each ink in the data set.
3. Use and interact with the system: the retrieval step is performed by searching the keywords typed in by the human user in the Knowledge Base. The output is a set of candidate images for the keyword, which the human user (i.e., the scriptor) can confirm or reject as instances of that keyword.
4. Add knowledge: the last step takes advantage of the interaction with the user by adding evidence and knowledge to the system: the confirmed candidates are moved from the untranscribed collection to the Reference Set, which is thus updated and used in further searches.
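The four steps above form a retrieve-confirm-learn loop that can be sketched as follows; the `match` and `confirm` callables stand in for the authors' ink-matching algorithm and the scriptor's judgment, and all names are illustrative:

```python
def transcription_session(reference, unknown, query, match, confirm):
    """One retrieve-confirm-learn iteration over the four steps:
    reference : dict word_image -> transcript (the Reference Set)
    unknown   : set of untranscribed word images
    query     : keyword typed in by the scriptor
    match     : match(word_image, reference) -> interpretation
                (stand-in for the ink-matching step)
    confirm   : confirm(word_image) -> bool (the scriptor's decision)"""
    # steps 2-3: interpret each unknown ink and retrieve the keyword
    candidates = [img for img in sorted(unknown)
                  if match(img, reference) == query]
    # step 4: confirmed candidates move into the Reference Set
    for img in candidates:
        if confirm(img):
            unknown.discard(img)
            reference[img] = query
    return candidates

# toy run with a trivial stand-in matcher
ref, unk = {}, {"cat:1", "cat:2", "dog:1"}
label = lambda img, r: img.split(":")[0]
found = transcription_session(ref, unk, "cat", label, lambda img: True)
print(found, sorted(unk))  # → ['cat:1', 'cat:2'] ['dog:1']
```

Each confirmed keyword instance enlarges the Reference Set, which is how the system's precision can grow across iterations.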
We will also report the results of experiments on a historical handwritten dataset, which show that with a Reference Set composed of only 5 pages the system is able to respond to 50 keywords with a Recall of up to 80% and a Precision of 15%. The low Precision value arises from the choice of engaging the human user more at the beginning, enabling the system to learn from the interaction and to use that evidence in further searches: in fact, over the last iterations the Recall value remains stable while the Precision value doubles at each iteration. Moreover, in order to consider a realistic usage scenario, we evaluated the time gained by using our system for the aided transcription of 1000 words extracted from a digital archive of handwritten documents. This evaluation was performed by choosing a "dictionary" of the words most frequent in the manually transcribed pages, excluding stop words, which contribute little to understanding the content of a page. When the Reference Set covers 80% of all the characters and 45% of all the bigrams present in the collection, the average gain in the time needed to transcribe a collection is about 54%.
References
M. T. Rath and R. Manmatha, "Word Spotting for historical documents", International Journal of Document Analysis and Recognition, pp. 139‐152 (2007)
C. De Stefano, G. Guadagno, and A. Marcelli, "A saliency based segmentation method for on‐line cursive handwriting", IJPRAI, 18(6), 1139‐1156 (2004)
Y. Liang, R.M. Guest, and M.C. Fairhurst, "Implementing Word Retrieval in Handwritten Documents using a Small Dataset", in: Proc. 3rd International Conference on Frontiers in Handwriting Recognition (2012)
D. Aldavert, M. Rusinol, R. Toledo, and J. Llados, "Integrating Visual and Textual Cues for Query‐by‐String Word Spotting", in: Proc. 12th International Conference on Document Analysis and Recognition, ICDAR (2013)
A. Fischer, A. Keller, V. Frinken, and H. Bunke, "Lexicon‐free handwritten word spotting using character HMMs", Pattern Recognition Letters, pp. 934‐942 (2012)
L. Rothacker, M. Rusinol, and G.A. Fink, "Bag‐of‐Features HMMs for Segmentation‐Free Word Spotting in Handwritten Documents", in: Proc. 12th International Conference on Document Analysis and Recognition (2013)
A. Fischer, V. Frinken, H. Bunke, and Y.C. Suen, "Improving HMM‐based Keyword Spotting with Character Language Models", in: Proc. 12th International Conference on Document Analysis and Recognition (2013)
V. Papavassiliou, T. Stafylakis, V. Katsouros, and G. Carayannis, "Handwritten Document Image Segmentation into text lines and words", Pattern Recognition 43, pp. 369‐377 (2010)
L.P. Cordella, C. De Stefano, A. Marcelli, and A. Santoro, "Writing order recovery from off‐line Handwriting by Graph Traversal", in: Proc. International Conference on Pattern Recognition, ICPR, pp. 1896‐1899 (2010)
C. De Stefano, A. Marcelli, and A. Santoro, "On‐line cursive recognition by ink matching", in: Proc. International Graphonomics Society, IGS, pp. 23‐37 (2007)
V. Frinken, A. Fischer, R. Manmatha, and H. Bunke, "A Novel Word Spotting Method Based on Recurrent Neural Networks", IEEE Transactions on PAMI, Vol. 34, No. 2, February (2012)
Bentham project: http://blogs.ucl.ac.uk/transcribe‐bentham
tranScriptorium project: http://transcriptorium.eu
Poster
Imaging Watermarks of 15th Century Islamic Manuscript Kashf Al‐Bayan’an Sifat Al‐Hayawan
Nurgül Akcebe
Department of Manuscript Conservation and Archive (Kitap Şifahanesi) /
Manuscripts Institution of Turkey, Istanbul, Turkey
Kashf al‐Bayan 'an Sifat al‐Hayawan, an encyclopedia of 62 volumes comprising nearly 16 thousand leaves, was written by Abu al‐Fath Muhammad b. Shaykh Bedreddin, also known as Ibn Atiyah. By his own account, the author began writing the manuscript in 1487. It covers many topics, not only zoology but also botany, medicine, literature and philosophy. He also indicated that he had drawn on about 3000 references to produce this valuable work (Aslan 2015). The manuscript is part of a collection held in the Millet Manuscript Library in Istanbul, and no other copy of it exists.
The volumes of the manuscript have been documented and restored by the Department of Manuscript Conservation and Archive (Kitap Şifahanesi), one of the departments of the Manuscripts Institution of Turkey, in an ongoing restoration and preservation project. The watermarks in these volumes were investigated in the course of their documentation and restoration. Up to now, three different types of watermarks have been detected on the inner pages of the covers. Two of them are similar to watermarks seen in 15th‐century European papers*†. It can be presumed that these papers were pasted to the inner covers when the manuscript was written, in the late 1400s. The third watermark identified is of an Eastern type and can be dated between the 17th and 19th centuries (Ünver 1962; Ersoy 1963). It can be concluded that the volumes containing the Eastern‐style watermark were restored, perhaps hundreds of years after the manuscript was written: the text block of these volumes is hand‐made Eastern paper, although the inner cover pages are Western papers.
Fig. 1: Watermark examples of encyclopedia and their matches with European watermark archives (a),(b).
Eastern style watermark from the same encyclopedia (c)
* Watermarks have been retrieved from: http://www.wasserzeichen‐
online.de/wzis/struktur.php?klassi=001002001001002001001&anzeigeIDMotif=2968
† Watermarks have been retrieved from: http://www.ksbm.oeaw.ac.at/_scripts/php/BR.php
To image the inner cover pages, a Nikon D600 camera was used at different angles, and the images were then enhanced with Photoshop CS5. A thin light sheet could not be used because the pages are stuck to the covers and cannot be separated. These watermarks were then matched with similar watermarks detected in 15th‐century European manuscripts and letters (Figure 1).
References
A. Aslan, Manuscripts Institution of Turkey (2015).
O. Ersoy, XVIII ve XIX. Yüzyıllarda Türkiye’de Kağıt, Ankara Üniversitesi Dil ve Tarih‐ Coğrafya Fakültesi Yayınları, Ankara, 1963, pp 85‐87
S. Ünver, Belleten, 26,739. (1962).
The Ignatius of Loyola’s Exercitia Spiritualia autograph:
analyses before and during conservation treatments
Maddalena Bronzato1, Alfonso Zoleo2, Luca Nodari3, Carlo Federici4, and Melania Zanetti4
1Federchimica, Milano, Italy 2Department of Chemical Sciences, University of Padova, Padova, Italy
3 IENI‐CNR and INSTM, UdR of Padova, Padova, Italy 4 Department of Humanistic Studies, University of Venice Ca’ Foscari, Venezia, Italy
A large number of ancient paper manuscripts are endangered by the corrosive effect of iron gall inks (Hey 1979). It is well known that the Fe(III) and Fe(II) species occurring in these inks are powerful catalysts of paper degradation reactions (Kolar and Strlic 2006). As a consequence, iron gall inks are a main concern for paper conservators; iron mobility and migration from the written text to its surroundings, or iron penetration into the leaves, is an unwanted but frequent occurrence, because iron migration from the inked areas has been related to degradation of the paper leaves (Rouchon et al. 2009). The water solubility of Fe(II) ions calls for a cautious approach to aqueous treatments, or even humidification, which can induce halo formation. Mixtures of water and alcohol are often suggested to limit the risk of ion migration; however, both aqueous and hydroalcoholic treatments have recently been questioned, proving unreliable at limiting iron migration (Rouchon et al. 2009). At present, there is no consensus on the treatments to apply, and more data on iron migration in iron gall ink on paper under different conservation treatments are required, particularly with respect to discoloration and degradation.
This work concerns examination and conservation treatments of the oldest evidence of Ignatius of
Loyola’s Exercitia Spiritualia. The paper manuscript includes many autograph annotations by Ignatius
de Loyola, the founder of the Catholic Society of Jesus. In it, the severe degradation induced by iron
gall inks had resulted in discoloration and burn‐through. In the first half of the 20th century, each leaf was lined with silk on both recto and verso in order to prevent fragmentation of the inked areas of the paper, a treatment that also induced paper yellowing, adhesive stains and other undesirable effects.
A new intervention was therefore required to reduce the risks related to the previous intervention
and the impact due to the degradation processes.
The manuscript was investigated by means of non‐destructive and non‐invasive spectroscopic techniques, in order to gather information to guide the choice of a precise and suitable intervention procedure.
Infrared spectroscopies ATR‐IR and DRIFT (Derrick et al. 1999), both completely non‐invasive, were applied in order to obtain indications about the current state of each leaf, about the sizing materials commonly used to improve the resistance of a sheet of paper to water sorption, and about paper fillers, substances added to the cellulose matrix in order to improve the optical and surface features of paper. FORS (Fiber Optics Reflectance Spectroscopy) was used to investigate discolorations on the leaves in order to understand their chemical nature, to determine the colorimetric coordinates of the irradiated spot, and to detect the extent of Fe(II) migration, which the literature shows to be normally associated with brown halos (Picollo et al. 2002).
MicroRaman spectroscopy (Bicchieri et al. 2006) made it possible to collect useful information about molecular structure, in particular that of the inks used in the written areas. XRF (X‐Ray Fluorescence) spectroscopy (Hahn et al. 2005) was used extensively to detect chemical elements such as Ca and K, and in particular to investigate the presence of iron and copper species, known to be efficient catalysts of paper degradation reactions.
The analysis revealed the use of different iron gall inks in the various manuscript leaves and the use of gelatin as sizing material. The leaves in the worst state of conservation were brittle and presented brown halos around the written areas; the XRF analysis confirmed the migration of iron ions from the text to the surrounding areas. The most degraded leaves of the book were characterized by a generally higher amount of iron than the leaves in a good state of conservation.
References
M. Bicchieri, A. Sodo, G. Piantanida, C. Coluzza, J. Raman Spectrosc., 37, 1186 (2006).
M. Derrick, D. Stulik, J. M. Landry, Scientific Tools for Conservation: Infrared Spectroscopy in Conservation Science (The Getty Conservation Institute, 1999).
O. Hahn, B. Kanngießer, W. Malzer, Studies in Conservation, 50, 23 (2005)
M. Hey, The Paper Conservator 4, 66 (1979).
J. Kolar, M. Strlic, Iron Gall Inks: On Manufacture, Characterisation, Degradation and Stabilization (National University Library, Ljublana, Slovenia, 2006).
M. Picollo, M. Bacci, A. Casini, F. Lotti, S. Porcinai, B. Radicati, and L. Stefani, Fiber Optics Reflectance Spectroscopy: A Non‐destructive Technique for the Analysis of Works of Art, in: S. Martellucci, A.N. Chester, A.G. Mignani (Eds.), Optical Sensors and Microsystems: New Concepts, Materials, Technologies, Kluwer Academic Publishers, NY, 2002, pp. 259‐265.
V. Rouchon, B. Durocher, E. Pellizzi, J. Stordiau‐Pallot, Studies in conservation 54, 236 (2009).
Can non‐destructive techniques and portable instruments be used
to analyse ink and paper degradation?
Claudia Colini1, Ira Rabin1,2, and Oliver Hahn1,2
1Centre for the Study of Manuscript Cultures, Hamburg, Germany 2BAM Federal Institute for Materials Analysis and Testing, Berlin, Germany
The object of this poster is part of a larger project aimed at creating a database of known raw materials in the black inks and paper coatings of Arabic manuscripts, all identified by means of non‐destructive techniques and portable instruments.
In order to verify the validity of our database when confronted with ancient manuscripts, we decided
to artificially age our samples and observe the effects of their degradation.
The specific experiment shown in this panel is the pre‐test we decided to run before the start of the actual ageing process, with a smaller selection of samples (a single iron gall ink recipe and one coating material), which will allow us to observe several relevant details.
First of all, we will verify whether, for these recipes, a difference in the spectra is observable and, if so, whether it is possible to deduce the degradation mechanism using non‐destructive and portable technologies alone.
In particular, the following techniques will be applied:
● FTIR‐ATR
● Colorimetry
● XRF
● pH measurement
Moreover we will evaluate the impact of ink on the degradation of cellulose, comparing samples with
and without writing.
With FTIR‐ATR we can identify several organic groups present in papers and inks, characteristic of cellulose, gums, proteins, alcohols and perfumes, but also the SO₄²⁻ ions peculiar to vitriol (Senvaitiene et al. 2005). Moreover, it is possible to observe the behaviour of various carbonyl groups, such as carboxylic, ketonic and aldehydic groups, which are products of the partial oxidation and hydrolysis of cellulose (Calvini and Gorassini 2002; Margutti et al. 2001; Lojewska et al. 2006; Lojewska et al. 2007; Ali 2001; Urescu et al. 2009). The picture is, however, complicated by the presence of the hydrogen‐bond network, which shifts the peaks to higher frequencies, and by free water, which can cover some of our areas of interest.
Concerning colorimetry, we are particularly interested in the L* and b* values for the paper: lightness can be correlated with the degree of polymerisation (Vives et al. 2001), and increasing yellowness can be correlated with oxidative degradation forming conjugated ketonic groups at positions 2 and 3 of the glucopyranose ring in cellulose (Lojewska et al. 2007). Colorimetry will be applied to the inks as well, in order to evaluate the discolouration of the media (Csefalvayová et al. 2007).
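A minimal sketch of how the two colorimetric indicators could be tracked over the ageing campaign; the readings below are hypothetical, not measured values:

```python
def ageing_trend(series):
    """Summarize the two indicators discussed above over an ageing run:
    a drop in L* (lightness, linked to depolymerisation) and a rise in
    b* (yellowness, linked to oxidative formation of ketonic groups).
    series: (L*, a*, b*) tuples in sampling order."""
    L0, _, b0 = series[0]
    Ln, _, bn = series[-1]
    return {"delta_L": round(Ln - L0, 2), "delta_b": round(bn - b0, 2)}

# hypothetical readings at days 0, 14 and 49
print(ageing_trend([(92.0, 0.1, 4.0), (90.5, 0.2, 6.5), (88.0, 0.3, 9.2)]))
# → {'delta_L': -4.0, 'delta_b': 5.2}
```

A negative delta_L together with a positive delta_b would be consistent with the depolymerisation and yellowing pathways cited above.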
XRF will be used to follow the migration of iron ions from the ink to the paper as well as to observe
the formation of crystals of calcium sulphate in the halo surrounding the writing. pH measurements
will be taken to record the acidity of both paper and ink. Concerning non‐inked paper, pH can be related to the formation of carboxylic groups, already observed with FTIR‐ATR; however, it seems that some other group is involved in the increase of acidity during ageing (Lojewska et al. 2007). Regarding inks, the increase in acidity can be linked to iron oxidation (Rouchon et al. 2011).
Given the extreme variety of ageing protocols found in the literature, we decided to age our samples in a humid chamber at 80 °C and 65% RH for 49 days, using two different configurations: one set of samples was hung, while the other was placed inside stacks of paper sheets, covered top and bottom with polyacrylate boards in order to simulate the structure of a book.
Samples will be collected after 0 (the reference), 7, 14, 28 and finally 49 days, giving 5 samples per configuration.
This design will give us the opportunity to verify whether different experimental conditions can change the course of degradation.
References
M. Ali, “Spectroscopic studies of the ageing of cellulosic paper”, Polymer 42 (2001), pp. 2893‐2900;
P. Calvini, A. Gorassini, “FTIR – deconvolution Spectra of Paper Documents”, Restaurator 23 (2002), pp. 48‐66;
L. Csefalvayová et al., “The influence of Iron gall ink in Paper Ageing”, Restaurator 28 (2007), pp. 129‐139.
J.M. Gibert Vives et al., “A Method for the non‐destructive analysis of Paper based on Reflectance and Viscosi‐ty”, Restaurator 22 (2001), pp. 187‐207.
J. Lojewska et al., “FTIR in situ transmission studies on the kinetics of paper degradation via hydrolytic and oxidative reaction paths”, Applied Physics A 83 (2006), pp. 597‐603
J. Lojewska et al., “Carbonyl groups development on degraded cellulose. Correlation between spectroscopic and chemical results” Applied Physics A 89 (2007), pp. 883‐887
S. Margutti et al., "Hydrolytic and Oxidative Degradation of Paper", Restaurator 22 (2001), pp. 67‐83
V. Rouchon et al., “Room‐Temperature Study of Iron Gall Ink Impregnated Paper Degradation under Various Oxygen and Humidity Conditions: Time‐Dependent Monitoring by Viscosity and X‐ray Absorption Near‐Edge Spectrometry Measurements”, Analytical Chemistry 83, 7 (2011), pp 2589–2597.
J. Senvaitiene et al., "Spectroscopic evaluation and characterisation of different historical writing inks", Vibrational Spectroscopy 37 (2005), pp. 61‐67.
M. Urescu et al., “Iron gall inks influence on papers' thermal degradation; FTIR spectroscopy applications”, European Journal of Science and Theology vol.5, n.3 (2009), pp. 71‐84.
Age and Fiber Structure Study Using 3D, Mesoscale Modeling and Simulation of
Ink Seepage in Paper Porous Media
Reza Farrahi Moghaddam1, Mohamed Cheriet1, and Sumaya Ali Al‐Ma’adeed2
1Synchromedia Lab, ETS, UduQ, Montreal, QC, Canada H3C 1K3 2Department of Computer Science and Engineering, Qatar University, Doha, Qatar
Background
There is a high level of interaction between the ink molecules, their carriers, and the paper medium within the body of the paper. These interactions largely determine the behavior of the final ink‐paper product over its life cycle, including its short‐ and long‐term reactions to physical stimuli such as heat, humidity and light exposure. In particular, if the paper is thin, which will increasingly be the case given the global move toward sustainability and lower resource consumption, the ink can propagate and reach the other (verso) side of the paper. This phenomenon, usually referred to as the bleed‐through effect, is very common in ancient manuscripts, which suffer from long‐term exposure to heat and humidity. The simplest models used for studying ink seepage are the 'diffusion' models (Farrahi 2009). These models work at a macro scale but try to produce an output that approximates the associated two‐phase fluid‐dynamics problem. They usually operate on a discretized spatial domain, where the evolution of the state of each point is determined by 'governing' equations acting within a highly localized region around the point of interest (PoI), called its neighborhood. We introduced a generalized form of the diffusion models in which the diffusion terms are not limited to the neighboring points in the 'same' space (Farrahi 2009). In other words, various 'source' spaces can contribute to the same target space: the states of their points that are 'neighbors' of a PoI can be linked to the future state of that PoI in the target space. By properly modifying the diffusion coefficient, we were able to introduce the first nonlinear, patch‐based restoration method, based on what we called 'reverse' diffusion (Farrahi 2009).
Fig 1: An X‐ray image of paper’s fiber structure (left), a digital fiber structure, which resembles a real structure
in simulations (right)
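A minimal sketch of the plain macro-scale diffusion model mentioned above, using an explicit finite-difference update over each point's 4-neighborhood; coefficients and grid size are illustrative:

```python
import numpy as np

def diffuse(ink, d=0.2, steps=10):
    """One run of the plain 'diffusion' model: each grid point's state
    evolves from its 4-neighborhood (explicit finite differences with
    periodic boundaries). The generalized model in (Farrahi 2009) adds
    further 'source' spaces to this neighborhood."""
    ink = ink.astype(float).copy()
    for _ in range(steps):
        lap = (np.roll(ink, 1, 0) + np.roll(ink, -1, 0) +
               np.roll(ink, 1, 1) + np.roll(ink, -1, 1) - 4.0 * ink)
        ink += d * lap                 # d <= 0.25 keeps the update stable
    return ink

spot = np.zeros((32, 32))
spot[16, 16] = 1.0                     # a single drop of ink
out = diffuse(spot)
print(round(float(out.sum()), 6))      # → 1.0 (total ink is conserved)
```

Note that this simple scheme conserves the total amount of ink, which is precisely the 'finite volume' property that the mesoscale model discussed below enforces as an explicit constraint.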
Previous Work on 3D Modeling and Simulation of Ink Seepage
To have a low‐level simulation of the ink seepage in paper at the mesoscales, actual displacement of
the ink material within the body of paper should be modeled (see Figure 1). Although there are
various models and techniques developed for 3D simulation and analysis of the liquid propagation
within porous media, almost all these models assume an ‘infinite’ volume of liquid (ink) entering the
porous media. This assumption simplifies the calculation to a great extent. However, it is not
applicable to our case of ink seepage because by definition the volume of ink used to write or paint
on a paper is much less than the volume of the paper. In order to address this 'challenge of finite volume', we introduced a new mesoscale model, at the level of fiber‐structure discretization, that considers the actual amount of ink used as a constraint in its formulation (Farrahi 2013). To address the computational challenge associated with directly and iteratively solving the new model numerically, we also proposed a modified form of the genetic algorithm, specialized for our case, in order to speed up convergence to the final solution. Examples of the simulation results can be seen in Figure 2.
Fig 2: The digital field that represents the fiber material (left), the same field with addition of the ink
distribution (ink is in red, right)
Proposed Paper Aging Study based on 3D Fiber Structure
Although the model introduced in (Farrahi 2013) is capable of simulating 3D mesoscale seepage of ink within the porous medium of fibers, it is recommended to perform various 3D simulations over a comprehensive set of parameter values for a small (microscale) surface of paper, and then to integrate and model the results of the 3D simulations in the form of nonlinear diffusion coefficients that can be used in macroscale diffusion simulations. Furthermore, the 3D mesoscale model of (Farrahi 2013) could be extended and generalized by introducing 'sub‐fiber' potential terms, which could represent the actual carriers of the ink molecules along or 'across' the fiber pieces or strands, into the calculations. In addition, complementary optical models could start from the 3D profile of an ink‐seepage instance within the paper, along with the paper's fiber structure or its characteristics, and then calculate the estimated 'reflective' or 'transmissive' optical observation. These methods could then be used to perform reverse engineering, estimating the 3D ink profiles from the observations, which in turn can be used to generate the restored or 'original' version of a degraded manuscript. In this work, we focus on a time‐dependent, invasive approach to relating ink propagation on the surface of, or within, the paper to the paper's fiber structure. It is well known that the fiber structure of paper can serve as a reliable source for estimating the age and era of the paper. Our approach is based on a model that relates the time‐dependent radial and axial behavior of ink propagation on the surface of the paper to the fiber structure. It therefore requires high‐definition, high‐dynamic‐range, and high‐frame‐rate video observation of the ink‐paper interaction. An extended version of the model is considered that also works with the temporal distribution of ink within the paper, in addition to the 'surficial', time‐dependent observation. The relations between the temporal behaviors and the fiber structure are then confirmed using 3D simulation of ink seepage as well as X‐ray observation of the structure.
Acknowledgements
This publication was made possible by Grants RGPDD/451272‐13 and RGPIN/138344‐14 from the
NSERC of Canada, Grant 412‐2010‐1007 from the SSHRC of Canada, and NPRP Grant #NPRP 7‐442‐1‐
082 from the Qatar National Research Fund (a member of Qatar Foundation). The statements are
solely the responsibility of the authors.
References
R. Farrahi Moghaddam and M. Cheriet, Low quality document image modeling and enhancement, IJDAR, 11 (4), 183‐201, (2009). DOI: http://dx.doi.org/10.1007/s10032‐008‐0076‐2
R. Farrahi Moghaddam, F. Farrahi Moghaddam, and M. Cheriet, Computer Simulation of 3‐D Finite‐Volume Liquid Transport in Fibrous Materials: a Physical Model for Ink Seepage into Paper, arXiv preprint arXiv:1307.2789, pp 26, (2013). Arxiv: http://arxiv.org/abs/1307.2789
A combination of three complementary non‐destructive Methods applied to Historical Manuscripts
Bernadette Frühmann, Federica Cappa, Wilfried Vetter, and Manfred Schreiner
Institute of Science and Technology in Art, Academy of Fine Arts, Vienna, Austria
Within the framework of the HRSM project*, which aims at the investigation of cultural heritage, the Centre of Image and Material Analysis in Cultural Heritage (CIMA)† was established at the beginning of 2014. Within this project, several historical manuscripts of the Austrian National Library (ÖNB) were examined. The selection comprises badly preserved or rewritten manuscripts (palimpsests) on the one hand, and manuscripts with a remarkable make‐up on the other, dating from the 8th to the 14th century. The material investigations aim at determining the inks and pigments used for writing and illuminating. Besides multispectral imaging‡, different non‐destructive and non‐invasive material investigations are required.
As the manuscripts mentioned are very sensitive, owing to their age as well as to their intense use, it was essential that data on the writing inks and parchments be collected with non‐destructive techniques only. This means that methods capable of measuring in‐situ are needed. XRF analysis and Raman and FTIR spectroscopy meet these demands and, under specific conditions, can also be applied as air‐path systems.
For the elemental identification of the inks and pigments, a portable XRF device from XGLab§, type ELIO, was used. Designed especially for use in the field of art, it is equipped with a 4 W Rh X‐ray tube with a maximum voltage of 50 kV and is mounted, together with a small x‐y stage, on a tripod. In addition, two pointing lasers for alignment and an integrated camera for positioning are fitted. Its ultra‐fast silicon drift detector (active area 25 mm², energy resolution < 140 eV) provides practical advantages in spectral quality and in the detection of light elements with Z even below 20, such as Mg, Al, Si and P.
For the compound-specific identification of pigments, e.g. of the same color, Raman spectroscopy was applied**. The measurements could be carried out in situ with the Pro-Raman-L-Dual-G of Enwave Optronics, USA, a fully integrated and portable instrument. The excitation source used for this investigation was a 785 nm diode laser (approx. 350 mW) with a narrow line width of 2.0 cm⁻¹. The instrument is based on a two-dimensional CCD array detector that is temperature-regulated at −60 °C. The integrated microscope is equipped with a 1.3 Mpixel CMOS camera with in-line LED illumination.
* HochSchulraum‐StrukturMittel (Structural Fund for Austrian Higher Education) of the Austrian Federal
Ministry of Science and Research, 2013.
† CIMA is an interuniversity research institution with an interdisciplinary approach to the investigation of
cultural heritage.
‡ Imaging in spectral ranges from UV to IR, CVL – Computer Vision Lab, Vienna University of Technology
§ XGLab S.R.L. X and Gamma Ray Electronics – Spinoff del Politecnico di Milano, www.xglab.it
** A.S. Lee, V. Otieno‐Alego, D.C. Creagh, J. Of Raman Spectroscopy 39, 1079‐1084 (2008)
To obtain further information about the compounds, or even about organic mixtures, FTIR spectroscopy was a useful tool. For these measurements a portable Bruker ALPHA†† FTIR spectrometer with a measuring spot diameter of approximately 5 mm was employed. Total reflection spectra (specular and diffuse reflection) were collected in situ in the range 4000–450 cm⁻¹ at a resolution of 4 cm⁻¹ over 32 scans. The background was acquired using a gold mirror as reference. The total reflection spectra were transformed into absorption index spectra by applying the Kramers-Kronig algorithm included in the software package OPUS (version 6.5), which is used for controlling the ALPHA instrument as well as for data acquisition and evaluation. After the transformation a baseline correction was applied to the absorption index spectra.
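The Kramers-Kronig step can be illustrated numerically. The sketch below is a simplified approximation, not the OPUS implementation: the phase of the complex reflectivity is estimated with a discrete Hilbert transform of ln √R (a shortcut that ignores the finite measurement range), and the refractive and absorption indices then follow from the normal-incidence Fresnel relations.

```python
import numpy as np

def kk_absorption_index(R):
    """Kramers-Kronig transform of a normal-incidence reflectance
    spectrum R (0 < R < 1) to refractive index n and absorption index k.

    The phase of the complex reflectivity r = sqrt(R)*exp(i*phi) is
    approximated by an FFT-based discrete Hilbert transform of ln|r|."""
    ln_amp = 0.5 * np.log(R)                 # ln|r| = ln(sqrt(R))
    N = ln_amp.size
    X = np.fft.fft(ln_amp)
    h = np.zeros(N)                          # Hilbert-transform filter
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    phi = -np.imag(np.fft.ifft(X * h))       # KK-conjugate phase
    sqrtR = np.sqrt(R)
    denom = 1.0 + R - 2.0 * sqrtR * np.cos(phi)
    n = (1.0 - R) / denom                    # refractive index
    k = -2.0 * sqrtR * np.sin(phi) / denom   # absorption index
    return n, k

# Sanity check: a flat, featureless 4 % reflectance has zero phase, so k
# vanishes and n reduces to the Fresnel value (1 + sqrt(R))/(1 - sqrt(R)).
R = np.full(512, 0.04)
n, k = kk_absorption_index(R)
```

A production implementation must additionally correct for the truncation of the spectrum at the band edges, which the discrete shortcut above does not handle.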
In this presentation, preliminary results of the measurements on historical manuscripts will be shown, especially manuscripts from the Greek and Slavic regions of the 12th to 14th century. On the one hand, the black inks are classified and compared within the manuscripts; on the other, the pigments used for the illuminations and initials are identified. As some of the manuscripts contain folia with palimpsests underneath the text, identifying the writing media of these older texts was an additional challenge.
With the results of these three complementary methods it was possible to identify many of the materials used and to compare different pigments applied in similar initials or miniatures.
†† Bruker Optics, Ettlingen, Germany, http://www.brukeroptics.com/alphaaccessories.html?&L=0&print=1%25252525253F (accessed 27/10/2009)
Old Manuscript Analysis: beyond the Visible
Rachid Hedjam1, Margaret Kalacska1, Sumaya S. Ali Al‐Ma’adeed2, and Mohamed Cheriet3
1Department of Geography, McGill University, Montreal, Canada
2Department of Computer Science and Engineering, Qatar University, Doha, Qatar
3Department of Automated Manufacturing Engineering, ETS, University of Quebec, Montreal, Canada
Introduction
Multispectral (MS) imaging is used to record spectral images in both the visible and the invisible light range, i.e. from ultraviolet (UV) to infrared (IR). Thanks to the use of UV and IR light, MS imaging can extract information that the human eye cannot capture with its receptors for red, green and blue. Light that is visible to the human eye has wavelengths in the range of about 380 nm to 740 nm (Hedjam 2013). A spectral image is reproduced as a grey-scale image or an RGB color image. Visible light is situated between UV light, with short wavelengths in the 10 nm to 400 nm range, and near-IR light, with long wavelengths in the 700 nm to 1 mm range. IR spectral images can be combined into a grey-scale image, and three of them can be used to create false-color RGB images. The principle underlying MS imaging systems is the concept of the spectral signature. The idea is that every material emits, transmits, or absorbs electromagnetic radiation according to its inherent physical structure and chemical composition, and according to the wavelength of the radiation. The ratio of reflected to emitted radiation from the surface of an object varies with the wavelength and the angle of incidence of the radiation. The combination of emitted, reflected, and absorbed electromagnetic radiation across a range of wavelengths produces what we call a spectral signature, which is unique to that material. It is therefore possible to differentiate between objects based on differences in their spectral signatures (Klein et al. 2008). There are a number of applications for MS imaging, e.g., IR reflectography, UV reflectography, and UV fluorescence.
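As a concrete illustration of the false-color composites mentioned above, the sketch below maps three spectral bands onto the R, G and B channels of one image; the synthetic band arrays stand in for images recorded at three IR wavelengths and are not data from the manuscripts discussed here.

```python
import numpy as np

def false_color_rgb(band_r, band_g, band_b):
    """Map three grey-scale spectral bands (e.g. three IR bands) onto the
    R, G and B channels of a false-color composite.  Each band is
    stretched independently to the full 0-255 range."""
    def stretch(band):
        band = band.astype(np.float64)
        lo, hi = band.min(), band.max()
        if hi == lo:                       # flat band: render as mid-grey
            return np.full(band.shape, 128, dtype=np.uint8)
        return np.round(255.0 * (band - lo) / (hi - lo)).astype(np.uint8)
    return np.dstack([stretch(band_r), stretch(band_g), stretch(band_b)])

# Synthetic 4x4 "bands" standing in for three registered spectral images.
bands = [np.random.default_rng(i).random((4, 4)) for i in range(3)]
composite = false_color_rgb(*bands)        # H x W x 3, uint8
```

Because each channel is stretched independently, materials that differ only in one band become strongly colored in the composite, which is what makes the technique useful for visual inspection.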
Fig. 1: Using IR imaging for paper substrate examination. (left) RGB image; (right) IR band at 1100 nm.
IR imaging records portions of absorbed and reflected IR light, which passes through the document layers and interacts with the underlying portions of the document. It can provide a document historian with very important information about the types of ink used and the constituents of the document, all of which help to assess its condition. It can also be used to examine the sheet (leaf) substrate and reveal its physical details, as shown in Figure 1 (right): narrow lines, very close to each other, run through the leaf from top to bottom. We believe those lines may be traces left by the rollers of the production machine as they pressed the fibre substrate during papermaking. Such characteristics of the paper are an important source of information for librarians and scholars, helping them to determine the origin of a manuscript and its date of fabrication. IR imaging is also a useful technique for ink examination. This problem is well known in the area of questioned document examination: forensic scientists still face many forged documents made with the intent to deceive the human eye, for example to earn profits by changing the amount of money reported on cheques. To discriminate between different suspect inks, the examiner compares the spectral signatures recorded from ink samples with a spectrometer or a multispectral sensor. The spectral signature recorded at each pixel represents the percentage of light absorbed or reflected by that pixel. Pixels from the same ink will mostly generate similar spectral signatures, while pixels from different inks will generate different ones. For instance, an ink made from iron gall transmits IR light and thus does not show up in bands recorded at IR wavelengths; an example is the numeral text "485", which is visible in the visible bands and invisible in the IR bands, as illustrated in Figure 2. An ink made from carbon, by contrast, absorbs IR light and thus appears even in bands recorded at IR wavelengths; an example is the Arabic text shown in Figure 2. The reflection, absorption and transmission of light are related mainly to the chemical composition of the material of which the ink is made. This is key information for establishing the period in which such an ink was used.
Fig. 2: Ink discrimination. (a) color image; (b) IR band at 1100 nm.
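The ink behaviour described above can be turned into a toy per-pixel rule. Everything below, from the darkness threshold to the labels, is an illustrative assumption rather than a forensic procedure: a pixel dark in both the visible and the IR band is labelled as carbon ink (which absorbs IR), while a pixel dark only in the visible band is labelled as iron-gall ink (which transmits IR).

```python
import numpy as np

def classify_ink(vis, ir, dark_thresh=0.5):
    """Toy ink discrimination from two co-registered bands.

    vis, ir : arrays of normalized reflectance in the visible and IR band.
    A pixel dark in both bands -> 'carbon'; dark only in the visible
    band -> 'iron-gall'; bright in the visible band -> 'background'."""
    vis_dark = vis < dark_thresh
    ir_dark = ir < dark_thresh
    labels = np.full(vis.shape, "background", dtype=object)
    labels[vis_dark & ir_dark] = "carbon"
    labels[vis_dark & ~ir_dark] = "iron-gall"
    return labels

vis = np.array([0.1, 0.1, 0.9])   # two ink strokes and one paper pixel
ir = np.array([0.1, 0.9, 0.9])    # only the first stroke stays dark in IR
labels = classify_ink(vis, ir)
```

A real examination would of course use full spectral signatures across many bands rather than a single threshold on two bands.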
Another MS technique is UV imaging, which records portions of absorbed and reflected UV light. UV light is an effective tool for detecting newly retouched areas and later restorations that are not visible to the human eye. The experimental setup involves illuminating the document under study with UV lamps (usually referred to as black light) and installing a UV pass filter in front of the acquisition camera to exclude the reflected visible light and allow only the reflected UV light to pass through. The result is a grey-scale (monochromatic) image of the UV light reflected from the document. UV is also a very useful tool for investigating ancient manuscripts, for example for revealing traces of text that may have been added by an archivist and later erased, as shown in Figure 3. In the color image in Figure 3 (left) it is not possible to see this text at all. By examining the band recorded at a UV wavelength (400 nm), we can see a trace of text written at the upper left of the leaf (Figure 3 (middle)). After some enhancement of this image, the text becomes more visible (Figure 3 (right)), even though deciphering it remains difficult. More advanced methods based on MS analysis have also been proposed (Hedjam and Cheriet 2011), mainly for document image segmentation and text extraction.
Fig. 3: Using UV imaging to detect traces of writing
The idea is to use the spectral signatures of pixels as feature vectors and to apply existing classification techniques to separate text from background, as shown in Figure 4. In practice, a representative spectral signature (e.g., the mean spectral signature) is defined for each homogeneous region of a pattern, such as an area of a particular color. The homogeneity hypothesis assumes that patterns belonging to the same class share the same spectral characteristics; this hypothesis is then used to distinguish between objects belonging to different classes. A simple way to classify the patterns is to map their associated spectral signatures, or to compare each spectral signature to a reference signature according to a specific criterion.
Fig. 4: Text extraction
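A minimal sketch of this pixel-wise comparison, assuming Euclidean distance as the criterion and hand-made reference signatures (the classifiers in Hedjam and Cheriet 2011 are more elaborate):

```python
import numpy as np

def extract_text(cube, text_sig, bg_sig):
    """Label each pixel of a multispectral cube (H x W x bands) as text or
    background by comparing its spectral signature to two mean reference
    signatures, using Euclidean distance as the criterion.  In practice
    the references would be measured from small homogeneous regions."""
    d_text = np.linalg.norm(cube - text_sig, axis=-1)
    d_bg = np.linalg.norm(cube - bg_sig, axis=-1)
    return d_text < d_bg                       # True where the pixel is text

# Toy 2x2 cube with 3 bands: dark-ink spectra vs bright-paper spectra.
text_sig = np.array([0.1, 0.2, 0.1])
bg_sig = np.array([0.8, 0.9, 0.8])
cube = np.array([[[0.1, 0.2, 0.1], [0.8, 0.9, 0.8]],
                 [[0.15, 0.25, 0.1], [0.7, 0.95, 0.9]]])
mask = extract_text(cube, text_sig, bg_sig)
```

The nearest-reference rule generalizes directly to more than two classes (several inks plus background) by taking the class with the minimum distance.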
Acknowledgments
This publication was made possible by NPRP grant #NPRP 7‐442‐1‐082 from the Qatar National
Research Fund (a member of Qatar Foundation). The statements are solely the responsibility of the
authors.
References
R. Hedjam, Visual image processing in various representation spaces for documentary heritage preservation,
Ph.D. dissertation, ETS, University of Quebec, Montreal, Quebec, Canada, April 30, 2013.
M. E. Klein, B. J. Aalderink, R. Padoan, G. de Bruin, and T. A. Steemers, Quantitative hyperspectral reflectance
imaging, Sensors, vol. 9, no. 8, 2008.
R. Hedjam and M. Cheriet, Combining statistical and geometrical classifiers for text extraction in multispectral
document images, in Proc. of the 2011 HIP, ser. HIP ’11, 2011, pp. 98–105.
Scientific Analysis of Early Qur'anic Manuscripts
Tobias J. Jocham and Michael Josef Marx
Corpus Coranicum, Berlin‐Brandenburgische Akademie der Wissenschaften, Potsdam, Germany
In the context of the French‐German cooperation project Coranica an interdisciplinary group took up
the task of including modern scientific analysis methods into the respective manuscript studies. Thus,
the Coranica project includes a module named computatio radiocarbonica where palaeographical
analysis and dating of the oldest manuscripts of the Qurʾān is supplemented by scientific methods
such as radiocarbon dating or ink analysis.
The 14C ages of manuscripts dated by colophon will be set in relation to the measured values of undated pieces. To this end, a first group of samples is taken from Arabic papyri of the period 642–750 AD, which also allows the accuracy of the 14C method to be tested for this particular writing material and period (Bronk Ramsey & Shortland 2013; Dee et al. 2012). Some dated early Arabic papyri form an important basis for the palaeographical typology of the early qurʾānic manuscripts (Grohmann 1958). Since all the early manuscripts of the Qurʾān were written on parchment, dated parchments and palimpsests carrying non-qurʾānic texts from the same region and period, i.e. Syriac and Georgian manuscripts, have been consulted for comparison. This selection of documents covers an extended period (450 to 950 AD), with the additional benefit of discerning any systematic errors.
With this research, the actual precision and significance of 14C datings can be determined for early
manuscripts of the Qurʾān. The selection forms a representative sample from the known manuscripts
in ḥiǧāzī ductus, which are considered the oldest written textual witnesses of the Qur'an ‐ their
temporal proximity to the proclamation of Muḥammad is still discussed today. Until now, scientific
analysis was performed on these materials in only a few cases and thus had a limited scope of impact.
The structure of this new experimental setup, and the amount of sampled materials, will result in a
database which will provide an accurate basis for further investigations.
This method is not without conservation concerns, as it destroys a small part of the object for analysis. However, thanks to improved techniques for the extraction of carbon, only a small amount (about 20 mg) of the original material needs to be removed. For parchment this corresponds to an area of about 1 cm², depending on the material's thickness; with some even smaller samples a species determination was also possible.
Through a systematic and comparative analysis of early qurʾānic manuscripts by means of scientific
methods like radiocarbon dating ‐ regardless of palaeographic classification ‐ new findings may be
discovered about the history of the oldest witnesses of the qurʾānic text.
Some first results have already been published (Marx and Jocham 2015), and the reaction in the media* as well as in scientific circles has demonstrated the interest in, and the need for, such interdisciplinary research efforts.
* Further press coverage in BBC Persian, Deutsche Welle and the Tagesschau (the main news programme of German public broadcasting).
References
Bronk Ramsey & Shortland, Radiocarbon and the Chronologies of Ancient Egypt, 2013
M.W. Dee, J.M. Rowland, T.F.G. Higham, et al., Synchronising radiocarbon dating and the Egyptian historical chronology by improved sample selection, in Antiquity 86:333, 2012.
A. Grohmann, The Problem of Dating Early Qurʾans, in Der Islam 33:8, 1958.
M. J. Marx and T. J. Jocham, Zu den Datierungen von Koranhandschriften durch die 14C-Methode, in Frankfurter Zeitschrift für Islamisch-theologische Studien (2015).
* Cf. the published results of the Leiden University Library and the press release of the University Library.
Spectroscopic Studies of Armenian Manuscripts: Paper, Inks, Pigments
Yeghis Keheyan1 and Gayane Eliazyan2
1ISMN, CNR, c/o Dept. of Chemistry, University of Rome “La Sapienza”, Rome, Italy
2Restoration Dept. of Matenadaran Museum of Yerevan, Armenia
Since 1998, the Italian group (CNR) and the restoration department of the Matenadaran (Armenia) have collaborated in identifying the chemical composition and the degradation of ancient manuscripts from the tenth to the seventeenth century.
Several papers have been published and presentations given throughout the world (Eliazyan et al. 1998; Keheyan et al. 2001; Keheyan et al. 2012; Baraldi et al. 2013; Baraldi et al. 2014; Keheyan et al. 2014; Keheyan and Baraldi 2015; Keheyan et al. 2015).
In this presentation the results obtained with different spectroscopic techniques, such as SEM-EDX, Raman, XRF and FTIR, will be given. These include studies of individual pieces of paper, inks and pigments, as well as studies of whole manuscripts in all their parts, including cover, binding, etc.
This contribution presents results from the technical study of a rare XIV century Armenian
illuminated manuscript in the collection of the Matenadaran. Characterization of the manuscript
components has been undertaken to create a preservation plan for the manuscript.
Continuing our analysis of the painting materials and techniques of Armenian illuminated manuscripts, we report on a XIV century manuscript, no. 4915 (Gospel), from Aghtamar Island, whose colorful images were under restoration. The Aghtamar Gospel is a single bound manuscript (26.5–27 × 18.5–19 cm) of 288 leaves written in bolorgir, a medieval Armenian cursive script. While no binding is extant,
sewing holes and trimmed edges show that it was bound at least twice previously. Notable are the
full‐page miniatures of the evangelists Matthew, Mark, Luke and John, each of whom faces the
opening text of his Gospel. Yellow, blue, green, magenta and red are lavishly employed in the
miniatures in a range of shades. White, black, grey and brown are used in discrete areas. The facing
pages of the miniatures contain brightly decorated headpieces that signal the opening of the Gospel
texts, plus stylized initials, zoomorphic writing and linear arabesque marginalia. Most of the text is
written with opaque black ink, with occasional rubrication and headings in orange‐red or dark
magenta. A faint grey color and strong indentations in the paper substrate show the ruling lines. The
paper of the text block is surface sized, probably with starch, and has been burnished to give it a
smooth, glazed appearance similar to parchment. The cover is of wood and brown leather. The microsamples were analyzed with different techniques, showing that traditional pigments were used alongside products and mixtures typical of Armenian illumination, such as vergaut, a mixture of indigo and orpiment suitable for foliage. The wide gamut of materials employed is of high significance. Gilding was applied on Armenian bole together with a proper binder. Among the most frequently found pigments are carbon, white lead, gypsum, calcite, orpiment, lazurite, indigo, cinnabar, goethite, litharge, massicot, azurite and minium. One green is antlerite, a basic copper sulfate characteristic of the regions around Armenia.
References
P. Baraldi, Y. Keheyan, P. Zannini, Un altro paese, un’altra tavolozza: pigmenti e coloranti nella miniatura di codici armeni da Matenadaran, XIV Congresso nazionale di chimica dell’ambiente e dei beni culturali “La chimica nella società sostenibile”, Rimini, 3–5 giugno 2013.
P. Baraldi, Y. Keheyan, G. Eliazian, A. Mkrtichian, S. Nunziante, C. Baraldi, Armenian illuminated manuscripts, a colourful testimony of religious art examined by molecular spectroscopy techniques, VIth European Symposium on Religious Art, Restoration & Conservation, ESRARC2014, Florence, Italy, 9–11 June 2014.
G. Eliazyan, G. Alaverdyan, Y. Keheyan, Use of polymers for strengthening dilapidated museum materials, Workshop “Metodi Chimici, Fisici e Biologici per la salvaguardia dei Beni Culturali”, 18 dicembre, San Michele, Roma, 1998.
Y. Keheyan, G. Eliazyan, G. Alaverdyan, The characterization of medieval colours and papers by laser desorption FT-ICR mass spectrometry, European Materials Research Society Spring Meeting, E-MRS 2001, Strasbourg (France, 5–8), 2001.
Y. Keheyan, P. Baraldi, G. Eliazian, A study on the polychromy and technique on some Armenian illuminated manuscripts by Raman microscopy, 2nd Int. Scientific Seminar “The faces of memory: The newest technologies of preservation and restoration of manuscript and printed heritage”, October 7–11, 2012, Yerevan, Armenia.
Y. Keheyan, P. Baraldi, P. Zannini, G. Eliazian, C. Baraldi, M.C. Gamberini, S. Nunziante Cesaro, A study of some illuminated Armenian manuscripts, ICOM-CC Triennial Congress, Melbourne, Australia, September 15–19, 2014.
Y. Keheyan, P. Baraldi, The use of non-invasive micro-Raman, FT-IR and SEM-EDX analyses to study Armenian illuminated manuscripts, Technart15, Catania, April 27–30, 2015.
Y. Keheyan, P. Baraldi, A. Agostino, G. Fenoglio, M. Aceto, Spectroscopic study of an Armenian manuscript from the Biblioteca Universitaria di Bologna, ESRARC15, VIIth European Symposium on Religious Art, Restoration & Conservation, Trnava, Slovakia, 4–6 June 2015.
A First Step to Balinese Script OCR: An Initial Study on Isolated Character Recognition of Balinese
Script on Palm Leaf Manuscripts
Made Windu Antara Kesiman, Jean‐Christophe Burie, and Jean‐Marc Ogier
Laboratoire Informatique Image Interaction (L3i), University of La Rochelle, La Rochelle, France
Bali has a rich tradition of literature that dates back several hundred years. One of the most valuable cultural relics found in Bali is its collection of palm leaf manuscripts (Fig. 1). The island’s literary works were mostly recorded on dried and treated palm leaves, called lontar. Lontars store various forms of knowledge and historical records of Balinese social life long ago. They contain ancient literary texts composed in the old Javanese language of Kawi and in Sanskrit. Their content ranges from ordinary texts to Bali’s most sacred writings. Many of these epics are based on the famous Indian epics Ramayana and Mahabharata. They include texts on religion, holy formulae, rituals, family genealogies, law codes, treatises on medicine (usadha), arts and architecture, calendars, prose, poems and even magic. Many lontars contain information on important matters such as medicines and village regulations that are used as daily guidance.
Fig. 1: Balinese palm leaf manuscripts
A lontar is written on a dried palm leaf with a small knife; the leaf is then scrubbed and colored with natural dyes. Lontars are inscribed with a special tool called a pengerupak. It is made of iron, with its tip sharpened into a triangular shape so that it can make both thick and thin incisions. The writing is incised on one (or both) sides of the leaf, and the script is then blackened with soot. The leaves are held and linked together by a string that passes through the central holes and is knotted at the outer ends. Unfortunately, the physical condition of the natural palm leaf material cannot last long. Many of the lontars discovered in museum and private family collections are in a state of disrepair due to age and to inadequate storage conditions. Palm leaf manuscripts are usually of poor quality, since the documents have degraded over time under these conditions. Equipment for protecting palm leaves from rapid deterioration is still relatively scarce, and therefore the processes of digitizing and indexing lontars are important (Kesiman et al. 2015/1; Kesiman et al. 2015/2). In the last five years, ancient palm leaf manuscripts have received great attention from researchers in the field of document image analysis; the collections of palm leaf manuscripts in Southeast Asia have attracted their attention, for example in work on manuscripts from Thailand (Chamchong et al. 2014; Fung and Chamchong 2010). The majority of Balinese have never read any lontar, because of language obstacles as well as a tradition that regards doing so as sacrilege. The main objective of this project is to bring added value to digitized palm leaf manuscripts by developing tools to analyze, index and access the content of lontars quickly and efficiently. This research tries to make lontars more accessible, readable and understandable to a wider audience and to scholars and students in Bali, in Indonesia and all over the world. Lontars offer a new challenge in OCR development due to the physical characteristics and condition of the manuscripts. The manuscript images contain discolored parts and artefacts due to aging, low intensity variations or poor contrast, random noise, and fading. Several deformations in the character shapes are visible, due to the merging and fracturing of strokes, the use of non-standard character forms, and varying spaces between letters and between lines (Fig. 2).
Fig. 2: Several deformations in lontar (Kesiman et al. 2015/2)
With the aim of developing an OCR system for Balinese script on palm leaf manuscript images, this paper presents our initial study on isolated character recognition of Balinese script on palm leaf manuscripts. We investigated the performance of two image features: a gradient feature (Khayyat et al. 2013) and a more complex feature, Bag of Features with Dense SIFT (BoF DenseSIFT) (Rusinol et al. 2011), using two classifier models, Support Vector Machine (SVM) and Hidden Markov Model (HMM). We performed six experimental schemes on our first isolated Balinese character dataset, collected from our first Balinese palm leaf manuscript collection. It consists of the 37 characters of the Balinese script alphabet, with a total of 10,159 isolated character images for training and 740 isolated character images for testing. For the two-class SVM classifier and the HMM classifier, we built 37 SVM models and 37 HMM models, one for each character. Multi-class SVM classification is then performed by applying a one-vs-all scheme over all the two-class classifiers. We also present and analyse the quantitative and visual correlation between characters of the Balinese script based on the recognition performance of our six experimental schemes. This analysis will serve in our future work as the first step toward OCR of Balinese script on palm leaf manuscripts, either by using a more appropriate image feature or by proposing a new multi-class classification scheme to achieve a better recognition rate.
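The one-vs-all combination over per-character binary classifiers can be sketched as follows. The per-class "models" below are simple centroid scorers standing in for trained SVM or HMM models, and the three-character alphabet and 2-D feature vectors are invented for illustration; only the combination scheme itself mirrors the description above.

```python
import numpy as np

class BinaryCentroidScorer:
    """Stand-in for one of the 37 per-character binary classifiers.
    A real system would train an SVM per character; here each 'model'
    scores a feature vector by its negative distance to the class
    centroid, which is enough to demonstrate the one-vs-all scheme."""
    def __init__(self, class_vectors):
        self.centroid = np.mean(class_vectors, axis=0)

    def score(self, x):
        return -np.linalg.norm(x - self.centroid)

def one_vs_all_predict(models, x):
    """One-vs-all: every binary model scores the sample, and the label
    of the highest-scoring model wins."""
    scores = {label: m.score(x) for label, m in models.items()}
    return max(scores, key=scores.get)

# Toy 'alphabet' of three character classes in a 2-D feature space.
train = {"ka": np.array([[0.0, 0.0], [0.2, 0.1]]),
         "ga": np.array([[5.0, 5.0], [5.1, 4.9]]),
         "nga": np.array([[0.0, 5.0], [0.1, 5.2]])}
models = {label: BinaryCentroidScorer(v) for label, v in train.items()}
pred = one_vs_all_predict(models, np.array([4.9, 5.1]))   # a sample near "ga"
```

With real SVMs, the scores would be signed decision-function values rather than centroid distances, but the argmax combination is the same.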
References
R. Chamchong, C.C. Fung, K.W. Wong, Comparing Binarisation Techniques for the Processing of Ancient Manuscripts, in: R. Nakatsu, N. Tosa, F. Naghdy, K.W. Wong, P. Codognet (Eds.), Cultural Computing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 55–64. http://link.springer.com/10.1007/978-3-642-15214-6_6 (2014).
C.C. Fung, R. Chamchong, A Review of Evaluation of Optimal Binarization Technique for Character Segmentation in Historical Manuscripts, in: IEEE, 2010, pp. 236–240. doi:10.1109/WKDD.2010.110.
M.W.A. Kesiman, S. Prum, I.M.G. Sunarya, J.-C. Burie, J.-M. Ogier, An Analysis of Ground Truth Binarized Image Variability of Palm Leaf Manuscripts, in: 5th Int. Conf. Image Process. Theory Tools Appl. (IPTA 2015), Orleans, France, 2015/1, pp. 229–233.
M.W.A. Kesiman, S. Prum, J.-C. Burie, J.-M. Ogier, An Initial Study on the Construction of Ground Truth Binarized Images of Ancient Palm Leaf Manuscripts, 13th Int. Conf. Doc. Anal. Recognit. (ICDAR), Nancy, France, 2015/2.
M. Khayyat, L. Lam, C.Y. Suen, Verification of Hierarchical Classifier Results for Handwritten Arabic Word Spotting, in: IEEE, 2013, pp. 572–576. doi:10.1109/ICDAR.2013.119.
M. Rusinol, D. Aldavert, R. Toledo, J. Llados, Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method, in: IEEE, 2011, pp. 63–67. doi:10.1109/ICDAR.2011.22.
Fibre Analysis of Pattani Manuscripts
Ayşegül Kocaman
Manuscript Institution of Turkey, The Department of Manuscript Conservation and Archive, Istanbul, Turkey
Fiber analysis of historical manuscripts is an important step in determining the nature of the paper material and in deciding on the conservation treatment. For this purpose, three manuscripts from Pattani, Thailand, were analyzed. Test samples were collected from the three manuscripts, named N1, N2 and N5, the numbers being the manuscripts’ collection numbers given at Pattani. Microscope slides were prepared in two groups. In the first group the fibers were stained with methylene blue; these samples are called NM1, NM2 and NM5. The unstained fibers formed the second group. The identification of the fibers was carried out with an Olympus BH2 polarized microscope at 4× and 40× magnification. Stereomicroscopy was also used for visual inspection.
The microscopic examination and the optical appearance of the fibers were compared with the fiber atlas. We found mainly two kinds of fibers, one approximately 5–10 mm in length and the other 3.5–7.5 mm. The first fibers are unicellular, tough and flexible, and the nodes are usually not obvious; the cells appear oval in shape. The second type of fiber is shorter than the first and probably consists of rice fibers. Our investigation revealed that the paper used in the manuscripts is Daulang, the special paper of that region.
Pattani 1   Pattani 2   Pattani 5
Fig. 1: Stereomicroscopic image of Pattani 1
Fig. 2: Microscope slides of the samples
Fig. 3: Polarized Microscopic images of NM1, NM2 and NM5
References
M. Barkeshli, Historical and Scientific Analysis of Iranian Illuminated Manuscripts and Miniature Painting, The Book and Paper Group Annual, Vol. 22 (2003).
C.T. Beng, Marilyn Zurmuehlen Working Papers in Art Education, Vol. 3, Issue 1, Article 11.
Application of forensic multispectral scanner to non‐invasive analysis of iron‐gall inks:
a comparison with XRF and micro‐Raman spectroscopic techniques
Anna Rogulska1 and Barbara Łydżba‐Kopczyńska2
1 Faculty of Chemistry, Jagiellonian University, Kraków, Poland
2 Faculty of Chemistry, University of Wroclaw, Wroclaw, Poland
The aim of this study was to establish the efficiency of a forensic multispectral scanner, constructed
for modern inks forensic analysis, in distinguishing between diverse iron‐gall inks in historic
documents. For that purpose, the results from multispectral scanning were compared with those
obtained with XRF and micro‐Raman spectroscopic examination, traditionally applied in historic inks
analysis. By multispectral scanning we expected to obtain data which allow for differentiation of iron‐
gall inks in cases where neither visual analysis nor traditional spectroscopic techniques are sufficient.
The objects investigated in the present study were a collection of XVII- and XVIII-century Polish administrative documents from the Ossolinski National Library in Wroclaw, Poland. Each manuscript, coming from a different part of the country and containing various lists, transactions and signatures, provided good study material with iron-gall inks of diverse origin.
The text areas under consideration were first examined with a portable X-ray fluorescence spectrometer (Tracer, Bruker) and a micro-Raman spectrometer equipped with a 514 nm laser line (Horiba Jobin Yvon T6400) to confirm the presence of iron-gall ink and to analyze the differences between particular writing fragments. All measurements were performed in situ, without taking samples from the objects. For the multispectral scanning, a 2D spectral scanner constructed for forensic investigations of documents (Łydżba-Kopczyńska et al. 2012) was employed. The CADE (Computer Aided Document Examination) system integrates different optical acquisition methods: microscopy, topography, spectroscopy and scattering. The main element of the system is a special spectral head with a camera allowing the parallel acquisition of spectral (VIS/NIR) data. The second, equally essential component of the system is the dedicated software, which contains several algorithms for the spectral analysis of questioned documents as well as special visualization tools. In our study we focused mainly on reflectance spectra and the false-color imaging technique.
As a result of our investigation, we confirmed that a multispectral scanner designed to detect subtle spectral changes in modern inks, e.g. in the dye used, is also applicable to the identification of historic iron-gall inks. This technique can be an important alternative to the traditionally used non-invasive techniques, such as micro-Raman and XRF spectroscopy, which in some cases cannot demonstrate the difference between writing materials of the same type but of diverse origin.
Acknowledgments
The authors thank the Ossolinski National Library in Wrocław for access to the manuscripts and the library's conservators for their assistance with this investigation.
References
B. I. Łydżba-Kopczyńska, M. Mrzygłód, J. Reiner, G. Rusek, Application of Polymorphic Scanner 2D in Non-invasive Investigations of Writing Materials, in: 10th Biennial International Conference of the Infrared and Raman Users Group, 2012, Book of Abstracts, 52-53.
Perceptual Model with global‐local Vision Primitives for Arabic Script Recognition
Samia Snoussi
Faculty of Computing and Information Technology, Jeddah University, Saudi Arabia
Our research deals with handwritten Arabic script. We propose a recognition system based on interactive-activation and verification perceptual models.
The proposed system is based on a Transparent Neural Network (TNN). The TNN proceeds by a global vision of structural descriptors during the propagation step and a local vision by normalized Fourier Descriptors (FD) during the back-propagation step. The idea of the TNN comes from human reading behaviour. The ultimate goal of the proposed handwritten Arabic word recognition system is to imitate the human ability to read at a much faster rate. Psychological studies have shown that humans generally read a word globally: the reader does not need to recognize all the letters, as the structural shape of key letters on its own can be sufficient. If the word cannot be made out this way, the reader examines it locally. Our recognition system is based on this idea of combining local and global features of Arabic words. Indeed, psychologists have carried out experiments to study, on the one hand, how humans recognize shapes and, on the other, which primitives stimulate human vision.
These psycho-cognitive experiments aimed to observe the behaviour of humans while reading, and yielded the following observations:
● Importance of lexical context: global vision can help us to deduce local information in cases presenting distortions. From the principal primitives of the word we can recognize characters that are distorted or lack primitives.
● Flagrant primitives: global vision can be sufficient to recognize a shape.
● Fine analysis: in order to recognize distorted shapes, it is not necessary to learn all possible distortions; training on a typical prototype can be sufficient.
Based on these principles, psychologists have proposed perceptual models of human reading. Some of these models are based on specific neural networks and have been implemented by handwriting-recognition researchers.
The proposed system, called TNN-FD, is based on a Transparent Neural Network (TNN) processing the handwritten word globally by the use of flagrant structural features. Local vision is ensured by normalized Fourier Descriptors (FD). The behavior of the TNN-FD can be explained step by step; thus, it falls into the category of “transparent” systems.
The first particularity of the TNN is that it needs only a simple feature extraction step and a normalization post-processing step used to reduce the variability of handwriting. Indeed, feature extraction is one of the most difficult and important problems of Arabic script recognition due to the variability of the cursive script, so the selected set of features should be easily extracted and should efficiently discriminate between patterns of different classes while remaining invariant for patterns within the same class. The second particularity of the proposed recognition system is its ability to simulate human reading.
The proposed perceptual model for the recognition of handwritten Arabic script proceeds with a global vision of structural characteristics that are easy to detect through a TNN recognition system. In case of difficulties, a local vision by finer descriptors (FD) improves the decision rate of the TNN-FD. Normalized FDs have been chosen because of their invariance to rotation, position and size. The particularity of the TNN-FD is that each step of recognition can be explained: feature extraction, letter detection, word estimation and script analysis. This transparency gives us the possibility of recognizing script without a hard training step. The recognition rate can reach 90%. The proposed TNN is neither time- nor memory-consuming; for this reason we chose to extend it to a large vocabulary.
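The invariance properties claimed for the normalized FDs can be illustrated with a short sketch (an illustrative reading of the normalization, not the authors' implementation; the contour representation and the exact normalization choices are assumptions): dropping the DC term gives position invariance, dividing by the first harmonic's magnitude gives size invariance, and keeping magnitudes only gives invariance to rotation and starting point.

```python
import numpy as np

def normalized_fourier_descriptors(contour, n_desc=8):
    """Fourier descriptors of a closed contour sampled as complex
    points x + iy, normalized so that they are invariant to position
    (DC term discarded), size (divided by the first harmonic's
    magnitude) and rotation (magnitudes only)."""
    z = np.asarray(contour, dtype=complex)
    F = np.fft.fft(z)
    mags = np.abs(F[1:n_desc + 1])   # skip DC term
    return mags / mags[0]

# a square-ish contour, then the same shape rotated, shifted and scaled
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
square = np.sign(np.cos(t)) + 1j * np.sign(np.sin(t))
moved = 3.0 * square * np.exp(1j * 0.7) + (5 + 2j)

d1 = normalized_fourier_descriptors(square)
d2 = normalized_fourier_descriptors(moved)
# the descriptors agree despite rotation, translation and scaling
```

The second contour is the first multiplied by a complex scale/rotation factor and shifted; only the discarded DC term changes, so `d1` and `d2` coincide.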
As a first perspective, we are working on the improvement of structural feature extraction. The second perspective is the extension of the system to sentences by combining the TNN with a syntactic and semantic analyser as a post-processing step.
Why should philologists learn computer vision?
Daniel Stoekl Ben Ezra
Ecole Pratique des Hautes Etudes, Paris, France
In recent years, algorithms developed by computer vision specialists have become powerful enough to attack some problems in manuscript studies. Fruitful collaborations with experts from HTR and NLP, palaeography and forensics are driving the field forward with promising achievements for all participants. However, manuscript studies is a combination of highly complex fields with complex data that vary greatly in quality and quantity. It is hard to imagine that a “one does it all” infrastructure will be invented that can cater to Assyriologists, Egyptologists or other epigraphists, to specialists for Buddhist texts on palm leaves and bamboo, for Coptic, Hieratic or Aramaic ostraca, and for Latin, Greek, Chinese, Arabic and Hebrew manuscripts with secondary interventions, not to mention palimpsests. Furthermore, once a specific problem in computer vision is considered solved from an algorithmic perspective, the field will move on to other theoretical questions that permit its researchers to publish innovative articles in their specialist peer-reviewed journals and conferences.
Philologists should therefore prepare to apply computer vision algorithms to their specific problems. In fact, this situation is not so different from computer vision in the hard sciences: biologists, astronomers and physicists have all learned to code. In philology, this necessity may be even greater, since images of manuscripts differ from medical or satellite imagery in having less health, industrial, or strategic interest; it is therefore to be expected that less money will be available from outside academic and cultural institutions. While perhaps not producing cutting-edge informatics, coding as a manuscript specialist may have the advantage of arriving, sometimes, at better practical results due to a much shorter evaluation distance.
This contribution will present the results of code written by a philologist for automatic layout analysis (column and line segmentation) and handwriting analysis (consonant/vocalization segmentation, transcription alignment and glyph clustering) of Hebrew manuscripts from the Dead Sea Scrolls and the Middle Ages. The algorithm is based on binarization, vertical and horizontal projection profiles, morphological transformations, expectations deduced from previous knowledge of the manuscript and handwriting, synthetic manuscript creation and letter spotting via HOG feature analysis.
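One of the ingredients named above, the horizontal projection profile, can be sketched in a few lines (a toy illustration, not the author's code; the binarized-page representation and function name are assumptions): rows containing ink are grouped into runs, and each run is taken as one text line.

```python
import numpy as np

def segment_lines(binary):
    """Toy line segmentation by horizontal projection profile.
    `binary` is a 2-D boolean page image (True = ink); rows with
    ink are grouped into runs, each run being one text line."""
    profile = binary.sum(axis=1)          # ink pixels per row
    ink_rows = profile > 0
    lines, start = [], None
    for y, has_ink in enumerate(ink_rows):
        if has_ink and start is None:
            start = y                     # a line begins
        elif not has_ink and start is not None:
            lines.append((start, y))      # rows [start, y) are one line
            start = None
    if start is not None:                 # line touching the bottom edge
        lines.append((start, len(ink_rows)))
    return lines

# two "lines" of ink separated by blank rows
page = np.zeros((10, 20), dtype=bool)
page[1:3, 2:18] = True
page[6:9, 2:18] = True
print(segment_lines(page))  # [(1, 3), (6, 9)]
```

Real manuscripts need more care (skewed lines, touching ascenders/descenders), which is where the morphological transformations and prior expectations mentioned above come in.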
The “decorative style” group reconsidered: A contribution to the study of twelfth and thirteenth
century production of Greek illuminated manuscripts in the Eastern Mediterranean
Marina Toumpouri
Science and Technology in Archaeology Research Center (STARC), The Cyprus Institute, Nicosia, Cyprus
The publication of Annemarie Weyl Carr's monograph in 1987 resulted in the attribution of the largest homogeneous group of Byzantine illuminated manuscripts known to have survived to the Cypro-Palestinian region (Weyl Carr 1987). The group, which comprises 110 members and continues to expand, was given the name “decorative style” based on the stylistic features of the manuscripts' illumination (decoration and miniatures). The group, dated from ca. 1150 to 1250 (Weyl Carr 1987), is in fact almost all that is known of Byzantine illuminated manuscript production in the twelfth century and the only group of deluxe Greek manuscripts from the first half of the thirteenth century. Despite the fact that the stylistic, iconographic and codicological study of the manuscripts made possible their classification into three groups and, within them, eight subgroups, Carr stressed that the manuscripts must not have been produced in “stable workshops” but resulted from shifting patterns of relationships between scribes, illuminators and models. She furthermore made clear that several issues, such as the position of such an important group within the manuscript and artistic production of the Eastern Mediterranean during the Crusader period, require further examination (Weyl Carr 1987).
More recent studies have challenged the locality of production proposed by Carr, suggesting Constantinople as more plausible (Maxwell 2014). The time span of the production has also been challenged since the discovery of newly dated manuscripts from the end of the thirteenth century that have been assigned membership in the group (Džurova 2002). Studies which focused on the textual evidence (New Testament), aiming to shed light on the place of production of the “decorative style” or attempting to verify Carr's manuscript groupings, showed that despite codicological and stylistic ties between the manuscripts these connections could not be confirmed; instead, they revealed that subgroup boundaries are in several cases blurred (Maxwell 2014; Langford 2009).
The above bespeaks not only the complexity of the interrelationships of the manuscripts of the group but also the limitations of traditional methodologies. Information regarding types of materials and procurement and manufacturing processes at a microscopic (inks, pigments, parchment etc.) and macroscopic (bindings, page layout etc.) level, as well as automated document analyses, have a major contribution to make, since they could generate new objective and unbiased knowledge that previously remained ignored or inaccessible. The “decorative style” group and the questions it raises advocate in fact for the application of a holistic approach including technological, visual, physicochemical, biomolecular and historical evidence, which would build upon and at the same time reassess Carr's art historical analysis and classification.
The poster will present the project launched in 2014 as part of the research agenda of the mobile laboratory STAR Lab (ΝΕΑ ΥΠΟΔΟΜΥ/ΣΤΡΑΤΗ/0308/30, co-financed by the European Regional Development Fund and the Republic of Cyprus through the Research Promotion Foundation). The project strives to create a corpus of analytical information with the aim of contributing to the study of the “decorative style” group by uncovering critical information as to the manuscripts' place and context of production, as well as their interrelationships. The great potential of such a holistic
approach in the field of manuscript studies, given its contribution to our knowledge of the distribution and use of manuscripts, as well as of artistic and cultural practices within the Eastern Mediterranean, will be highlighted by presenting the results obtained during the first phase of the project. This phase concentrated mainly on the analytical characterization of manuscripts belonging to the “decorative style” group found in Cyprus, through non-invasive techniques (digital microscopy, FORS, XRF, FTIR and multispectral imaging).
References
A. Weyl Carr, Byzantine Illumination 1150‐1250. The Study of a Provincial Tradition. University of Chicago Press, Chicago, 1987.
K. Maxwell, The Afterlife of Texts: Decorative Style Manuscripts and New Testament Textual Criticism, in: L. Jones (Ed.), Byzantine Images and their Afterlives. Essays in Honor of Annemarie Weyl Carr, Ashgate, Surrey, 2014, pp. 11‐39.
A. Džurova, Byzantinische Miniaturen. Schätze der Buchmalerei vom 4. bis zum 19. Jahrhundert, Schnell und Steiner, Regensburg, 2002.
W. Langford, From Text to Art and Back Again: Verifying A. Weyl Carr's Manuscript Groupings through Textual Analysis, Unpublished Ph.D. Dissertation, Faculty of the New Orleans Baptist Theological Seminary, 2009.
Meaningless Text OCR Model for Medieval Scripts
Adnan Ul‐Hasan1, Syed Saqib Bukhari2, and Andreas Dengel1,2
1University of Kaiserslautern, Germany 2German Research Center for Artificial Intelligence, Kaiserslautern, Germany
Abstract: The availability of a large amount of ground-truth data for training an Optical Character Recognition (OCR) engine is extremely critical. Training data is usually produced by manually transcribing thousands of document images. In order to augment the limited training data, synthetic training data is also used, produced by rendering text into images in suitable fonts and styles. The most important ingredient of synthetic training data is the corresponding real-world text. If real-world text data is unavailable, as may be the case for historical manuscripts, generating synthetic training data is not possible. In this paper, this problem is addressed for the case of historical manuscripts whose vocabulary and sentence structure is neither available in text form nor similar to any existing (contemporary) script. For such a case, we introduce a novel meaningless-text OCR model, in which meaningless words of variable sizes are generated by permuting characters. Meaningless text lines are subsequently produced by randomly choosing from these meaningless words. Testing the meaningless text-line recognizer on real text lines shows good performance.
The rest of the paper answers the following questions in sequence: which types of historical documents are we dealing with here? Why is a text-line-based recognizer preferable to a character-based recognizer? What is the traditional way of training text-line-based recognizers? What novel technique do we present to overcome the limitations of the traditional training procedure? And what initial results have we achieved?
Which types of historical documents are we dealing with here?
The “Narrenschiff” is a medieval German novel of the 15th century. Its first edition was printed in German in 1494 in Basel and gained a lot of popularity at the time. Afterwards, many copies spread all over Europe, with numerous translations and variations. Almost every edition can be characterized by its use of historical fonts and vocabulary. We are digitizing these documents under the German government-funded project Kallimachos. Some sample document images from this novel are shown in Figure 1.
Why is a text-line-based recognizer preferable to a character-based recognizer?
Text-line-based recognizers, in contrast to character-based recognizers, produce better recognition accuracies without any language modeling or other post-processing, because such line-based recognizers learn characters together with their context. Albeit simple to use, LSTM-based recognizers have shown excellent OCR results for many scripts (Breuel et al. 2013; Ul-Hasan et al. 2013; Ul-Hasan and Breuel 2013; Karayil et al. 2015; Simistira et al. 2015).
What is the traditional way of training text-line-based recognizers?
For the development of a text-line-based OCR model, ground-truth data, that is, a set of text-line images and the corresponding text lines, plays an important role. Traditionally, the following paradigms are used for training an LSTM-based OCR model for a particular script: (i) from the scanned document images, extract the text lines along with their ground-truth information and use them for training an LSTM network; (ii) if the text lines obtained from this process are not sufficient, generate synthetic text lines using available text in that language. However, in the case of a historical script, neither of these traditional paradigms is applicable: first, because a lot of time is required for manual transcription, and second, because of the non-availability of digitally created text.
What novel technique do we present to overcome the limitations of the traditional training procedure?
For the training of LSTM networks for non-existing scripts, we have generated meaningless text lines. The process of generating such meaningless data for training a line-based recognizer is as follows. First, we randomly generate a word corpus consisting of all possible permutations of all characters for several word lengths. We refer to this set of words as a bag of meaningless words. For the Latin “Narrenschiff” novel, there are around 84 characters (lowercase, capital, punctuation marks, digits). The permutations of all characters with different word lengths (say, 1 to 8) produce a huge number of meaningless words, and given limited memory and processing-time resources it is difficult to include all of them. As a proof of concept, in this paper we have limited ourselves to short words (i.e. permutations of lowercase letters only, with word lengths of 3 and 4). We then generated the text-line training data using this bag of meaningless words, and rendered the text lines to build a training database. Some sample rendered text-line images from our meaningless training database are shown in Figure 2. Finally, we trained an LSTM recognizer using the meaningless training data.
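The word-generation step described above can be sketched as follows (a minimal illustration, not the paper's implementation; it reads "permutations" as orderings of distinct characters, which the abstract leaves ambiguous, and uses a five-letter toy alphabet instead of the full character set):

```python
import itertools
import random

def bag_of_meaningless_words(alphabet, lengths):
    """All orderings of distinct characters from `alphabet` for each
    requested word length: one reading of the abstract's
    'bag of meaningless words'."""
    return ["".join(p)
            for n in lengths
            for p in itertools.permutations(alphabet, n)]

def meaningless_text_line(bag, n_words, rng):
    """A training text line: `n_words` words drawn at random."""
    return " ".join(rng.choice(bag) for _ in range(n_words))

rng = random.Random(0)
bag = bag_of_meaningless_words("abcde", lengths=(3, 4))
# 5P3 + 5P4 = 60 + 120 = 180 meaningless words
line = meaningless_text_line(bag, n_words=5, rng=rng)
```

Each generated line would then be rendered into an image in the historical font, giving (image, text) pairs for LSTM training without any real transcription.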
What initial results have we achieved?
To evaluate the LSTM model trained on meaningless data, we generated 300 text-line images from real transcriptions consisting of meaningful/real words of lengths 3 and 4. For this purpose, we first prepared a word list from all words of lengths 3 and 4 and then combined them into text lines of up to 10 words per line. The results are shown in Table 1 and Figure 3.
Conclusion:
In this paper, we have first introduced a novel process for automatically generating synthetic training data for non-existing historical scripts, i.e. meaningless text. We have then shown the performance of the meaningless-trained model on real-world test samples. Even though we trained the meaningless model on only a limited range of word lengths, the initial results on meaningful/real text data are promising.
References
T. M. Breuel, A. Ul-Hasan, M. Al Azawi, F. Shafait. High Performance OCR for Printed English and Fraktur using LSTM Networks. In ICDAR, Washington D.C., USA, August 2013.
T. Karayil, A. Ul-Hasan, and T. M. Breuel. A Segmentation-Free Approach for Printed Devanagari Script Recognition. In ICDAR, Nancy, France, 2015.
F. Simistira, A. Ul-Hasan, V. Papavassiliou, B. Gatos, V. Katsouros, and M. Liwicki. Recognition of Historical Greek Polytonic Scripts Using LSTM Networks. In ICDAR, Nancy, France, 2015.
A. Ul-Hasan, S. B. Ahmed, S. F. Rashid, F. Shafait, and T. M. Breuel. Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks. In ICDAR, pages 1061–1065, Washington D.C., USA, 2013.
A. Ul-Hasan, T. M. Breuel. Can we Build Language Independent OCR using LSTM Networks? In International Workshop on Multilingual OCR, 2013.
Contacts
Maurizio Aceto Dipartimento di Scienze e Innovazione Tecnologica (DISIT) Università degli Studi del Piemonte Orientale, Viale Teresa Michel, 11 15121 Alessandria, Italy eMail: maurizio.aceto@mfn.unipmn.it. Angelo Agostino Department of Chemistry University of Turin N: 7, ST: P. Giuria, CAP 10125, Turin, Italy eMail: angelo.agostino@unito.it Nurgül Akcebe The Department of Manuscript Conservation and Archive (Kitap Şifahanesi) Manuscripts Institution of Turkey Kanuni Medresesi Sok. No: 1 Fatih 34080 İstanbul, Turkey eMail: nurgulakcebe@gmail.com Fauzia Albertin Faculté des sciences de base Ecole Polytechnique Fédérale de Lausanne (EPFL) CH‐1015 Lausanne, Switzerland eMail: fauzia.albertin@epfl.ch Celena Allen Synchromedia Laboratory École de Technologie Supérieure Montreal, Canada, H3C 1K3 eMail: celena.allen@gmail.com Sumaya S. Ali Al‐ma’adeed Department of Computer Science and Engineering Qatar University, P.O. Box 2713, Doha, Doha, Qatar eMail: s_alali@qu.edu.qa Christine Andraud MNHN‐CRCC 36 rue Geoffroy St Hilaire, 75005 Paris, France eMail: christine.andraud@mnhn.fr Ehsan Arabnejad Synchromedia Laboratory École de Technologie Supérieure Montreal, Canada, H3C 1K3 eMail: earabnejad@synchromedia.ca
Isabelle Aristide‐Hastir Archives Nationales 59, rue Guynemer, 93383 Pierrefitte‐sur‐Seine, France Corneliu T.C. Arsene School of Arts, Languages and Cultures University of Manchester United Kingdom eMail: corneliu.arsene@manchester.ac.uk Matteo Bettuzzi Centro Fermi, 00184 Roma, Italy Dipartimento di Fisica e Astronomia, Università di Bologna, 40127 Bologna, Italy INFN Sezione di Bologna, 40127 Bologna, Italy eMail: matteo.bettuzzi@unibo.it Siam Bhayro Department of Theology and Religion University of Exeter United Kingdom eMail: s.bhayro@exeter.ac.uk Marina Bicchieri Head of Chemistry Dpt. Istituto Centrale Restauro e Conservazione del Patrimonio Archivistico e Librario (Icrcpal) Via Milano 76, 00184 Roma, Italy eMail: marina.bicchieri@beniculturali.it Théodore Bluche A2iA 39 rue de la Bienfaisance, 75008 Paris, France eMail: tb@a2ia.com Rosa Brancaccio Centro Fermi, 00184 Roma, Italy Dipartimento di Fisica e Astronomia, Università di Bologna, 40127 Bologna, Italy INFN Sezione di Bologna, 40127 Bologna, Italy eMail: rosa.brancaccio@unibo.it Antonella Brita Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, 20354 Hamburg, Germany eMail: antonella.brita@uni‐hamburg.de Christian Brockmann Universität Hamburg Institut für Griechische und Lateinische Philologie Von‐Melle‐Park 6, D‐20146 Hamburg, Germany Centre for the Study of Manuscript Cultures Warburgstraße 26, D‐20354 Hamburg, Germany eMail: christian.brockmann@uni‐hamburg.de
Maddalena Bronzato Federchimica via Giovanni da Procida 11, 20149, Milano, Italy eMail: maddalena.bronzato@gmail.com Emmanuel Brun ESRF—The European Synchrotron Radiation Facility 71 Avenue des Martyrs, 38000 Grenoble, France Inserm, U836, Grenoble, F‐38043, France eMail: emmanuel.brun@esrf.fr Syed Saqib Bukhari German Research Center for Artificial Intelligence Kaiserslautern eMail: saqib.bukhari@dfki.de Jean‐Christophe Burie Laboratoire Informatique Image Interaction (L3i) University of La Rochelle, Avenue Michel Crépeau, 17042 La Rochelle Cedex 1, France eMail: jcburie@univ‐lr.fr Pınar Çakar Manuscripts Institution of Turkey Department of Manuscript Conservation and Archive Kanuni Medresesi Sok. No: 1 Suleymaniye Fatih, 34116 Istanbul, Turkey eMail: pinarcakar00@yahoo.com Frederica Cappa
Institute of Science and Technology in Art
Academy of Fine Arts, 1010 Vienna, Austria
eMail: f.cappa@akbild.ac.at
Francesco Carillo Department of Electrical Engineering and Applied Mathematics@University of Salerno 132 Fisciano (Salerno), Italy eMail: f.carillo1@hotmail.it D. Chenouni LIPI/ ENS, Fes, Morocco eMail: d_chenouni@yahoo.fr Mohamed Cheriet Department of Automated Manufacturing Engineering, ETS University of Quebec 1100, Notre‐Dame Street‐ west, Montreal, Quebec H3C 1K3; Canada eMail: mohamed.cheriet@etsmtl.ca
Damian Chlebda Jagiellonian University Faculty of Chemistry Ingardena 3, 30‐060 Krakow, Poland eMail: damian.chlebda@uj.edu.pl Florence Cloppet LIPADE Laboratoire d’informatique Paris Descartes (Université Paris Descartes) France eMail: florence.cloppet@mi.parisdescartes.fr Rafi Cohen Department of Computer Science Ben‐Gurion University of the Negev, Israel eMail: rafico@cs.bgu.ac.il Claudia Colini Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, 20354 Hamburg, Germany eMail: claudia.sirim@gmail.com Daniel Deckers Universität Hamburg Institut für Griechische und Lateinische Philologie Von‐Melle‐Park 6, D‐20146 Hamburg, Germany eMail: daniel.deckers@uni‐hamburg.de Daniel Delattre CNRS‐IRHT‐Institut de Recherche et d’Histoire des Textes 40 Avenue d' Iéna, 75116 Paris, France eMail: daniel.delattre@irht.cnrs.fr Martin Delhey Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, 20354 Hamburg, Germany eMail: martin.delhey@uni‐hamburg.de Andreas Dengel University of Kaiserslautern German Research Center for Artificial Intelligence eMail: andreas.dengel@dfki.de Véronique Eglin LIRIS Laboratoire d’Informatique en Image et Systèmes d'information (INSA de Lyon – UMR 5205) 20 av. Albert Einstein, 69621 Lyon, France eMail: veronique.eglin@insa‐lyon.fr Y. Elfakir LIPI/ ENS Fes, Morocco eMail: elfakir.youssef11@gmail.com
Gayane Eliazyan Restoration Dept. of Matenadaran Museum of Yerevan, Armenia, 0009 Yerevan, Mashtotsi Ave., 53, Armenia eMail: elgayane@yahoo.com Jihad El‐Sana Department of Computer Science Ben‐Gurion University of the Negev, Israel eMail: el‐sana@cs.bgu.ac.il Reza Farrahi Moghaddam Synchromedia Lab, ETS UduQ, Montreal, QC, Canada H3C 1K3 eMail: imriss@yahoo.com Carlo Federici Department of Humanistic Studies University of Venice Ca’ Foscari Dorsoduro 3484/D, 30123 Venezia, Italy eMail: cfederici@unive.it Claudio Ferrero ESRF—The European Synchrotron Radiation Facility 71 Avenue des Martyrs, 38000 Grenoble, France eMail: ferrero@esrf.eu Andreas Fischer DIVA research group, Department of Informatics University of Fribourg 1700 Fribourg, Switzerland. eMail: andreas.fischer@unifr.ch Gernot A. Fink Department of Computer Science TU Dortmund University Otto‐Hahn‐Str. 8, D‐44221 Dortmund, Germany eMail: gernot.fink@udo.edu Michael Friedrich Universität Hamburg Asien‐Afrika‐Institut Edmund‐Siemers‐Allee 1, Flügel Ost, D‐20146 Hamburg, Germany Centre for the Study of Manuscript Cultures Warburgstraße 26, D‐20354 Hamburg, Germany eMail: michael.friedrich@uni‐hamburg.de Bernadette Frühmann Institute of Science and Technology in Art Academy of Fine Arts Vienna Schillerplatz 3, 1010 Vienna, Austria eMail: b.fruehmann@akbild.ac.at
Basilis Gatos Computational Intelligence Laboratory Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece eMail: bgat@iit.demokritos.gr Angelika Garz University of Fribourg DIVA Group (Document, Image and Voice Analysis) Department of Informatics Boulevard de Pérolles 90, CH‐1700 Fribourg, Switzerland eMail: angelika.garz@unifr.ch Mirjam Geissbühler University of Bern Institut für Germanistik Länggasstr. 49, CH‐3000 Bern, Switzerland eMail: mirjam.geissbuehler@germ.unibe.ch Leif Glaser Deutsches Elektronen‐Synchrotron Notkestr. 85, D‐22607 Hamburg, Germany eMail: Leif.Glaser@desy.de Monica Gulmini Department of Chemistry University of Turin N: 7, ST: P. Giuria, CAP: 10125 Turin, Italy eMail: monica.gulmini@unito.it Oliver Hahn BAM Federal Institute for Materials Research and Testing, Berlin, Germany Division 4.5 Unter den Eichen 44‐46, D‐12203 Berlin, Germany Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, D‐20354 Hamburg, Germany eMail: oliver.hahn@bam.de Stefan Hartmann Center for X‐ray Analytics Swiss Federal Laboratories for Materials Science and Technology Dubendorf, Switzerland Rachid Hedjam Department of Geography, McGill University 805 Sherbrooke Street West, Montreal, Qc H3A 2K6; Canada eMail: rachid.hedjam@mcgill.ca
Aymeric Histace ETIS, UMR CNRS 8051 6, avenue du Ponceau, 95014 Cergy‐Pontoise, France eMail: aymeric.histace@u‐cergy.fr Fabian Hollaus Computer Vision Lab Vienna University of Technology Favoritenstrasse 9‐11, 1040 Vienna, Austria eMail: holl@caa.tuwien.ac.at Rolf Ingold DIVA research group, Department of Informatics University of Fribourg 1700 Fribourg, Switzerland eMail: rolf.ingold@unifr.ch Iwan Jerjen Swiss Light Source Paul‐Scherrer‐Institute Villigen, Switzerland eMail: iwan.jerjen@psi.ch Tobias J. Jocham Corpus Coranicum Berlin‐Brandenburgische Akademie der Wissenschaften Potsdam, Germany College de France eMail: jocham@bbaw.de Margaret Kalacska Department of Geography McGill University 805 Sherbrooke Street West, Montreal, Qc H3A 2K6; Canada eMail: margaret.kalacska@mcgill.ca Rolf Kaufmann Center for X‐ray Analytics Swiss Federal Laboratories for Materials Science and Technology Dubendorf, Switzerland eMail: rolf.kaufmann@empa.ch Klara Kedem Department of Computer Science Ben‐Gurion University of the Negev Israel eMail: klara@cs.bgu.ac.il
Yeghis Keheyan ISMN, CNR Dept. of Chemistry University of Rome “La Sapienza” p.le Aldo Moro 5, Rome 00185, Italy eMail: yeghis.keheyan@uniroma1.it Florian Kergourlay MNHN‐CRCC 36 rue Geoffroy St Hilaire, 75005 Paris, France eMail: florian.kergourlay@gmail.com Christopher Kermorvant TEKLIA 164 avenue de Suffren, 75015 Paris, France eMail: kermorvant@teklia.com Ghizlane Khaissidi LIPI/ ENS Fes, Morocco eMail: ghizlane.derkaoui1@hotmail.com Anastasios L. Kesidis Department of Surveying Engineering, Technological Educational Institution of Athens Greece eMail: akesidis@iit.demokritos.gr Made Windu Antara Kesiman Laboratoire Informatique Image Interaction (L3i) University of La Rochelle Avenue Michel Crépeau, 17042, La Rochelle Cedex 1, France eMail: made_windu_antara.kesiman@univ‐lr.fr Mojtaba Mahmoudi Khorandi Department of Chemistry University of Turin (Italy) N: 7, ST: P. Giuria, CAP: 10125, Turin, Italy eMail: mojt.mahmoudi@gmail.com Ayşegül Kocaman The Department of Manuscript Conservation and Archive (Kitap Şifahanesi) Manuscripts Institution of Turkey Kanuni Medresesi Sok. No: 1 Fatih 34080 İstanbul, Turkey eMail: aysegul_80@yahoo.com Thomas Konidaris Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, D‐20354 Hamburg, Germany eMail: thomas.konidaris@uni‐hamburg.de
Keith T. Knox Imaging Consultant 2739 Puu Hoolai Street, Kihei, Hawaii 96753, USA eMail: knox@cis.rit.edu Z. Lakhliai LIPI/ ENS Fes, Morocco Bertrand Lavédrine MNHN‐CRCC 36 rue Geoffroy St Hilaire 75005 Paris, France eMail: lavedrin@mnhn.fr Yann Leydier LIRIS Laboratoire d’Informatique en Image et Systèmes d'information (INSA de Lyon – UMR 5205) 20 av. Albert Einstein, 69621 Lyon, France LIPADE Laboratoire d’informatique Paris Descartes (Université Paris Descartes) France eMail: yann@leydier.info Rosine Lheureux Archives Nationales 59, rue Guynemer 93383 Pierrefitte‐sur‐Seine, France eMail: rosine.lheureux@culture.gouv.fr Tomasz Łojewski AGH University of Science and Technology Faculty of Materials Science and Ceramics Mickiewicza 30, 30‐059 Krakow, Poland eMail: lojewski@agh.edu.pl Vito Lorusso Universität Hamburg Centre for the Study of Manuscript Cultures Warburgstraße 26, 20354 Hamburg, Germany eMail: vito.lorusso@uni‐hamburg.de Barbara Łydżba‐Kopczyńska Faculty of Chemistry University of Wroclaw F. Joliot‐Curie 14, 50‐383 Wroclaw, Poland eMail: barbara.lydzba@chem.uni.wroc.pl Angelo Marcelli Department of Electrical Engineering and Applied Mathematics@University of Salerno 132 Fisciano (Salerno), Italy eMail: amarcelli@unisa.it
Michael Josef Marx Corpus Coranicum Berlin‐Brandenburgische Akademie der Wissenschaften Potsdam, Germany eMail: marx@bbaw.de Manfred Mayer University Library Graz, Special Collections Universitätsplatz 3a, 8010 Graz, Österreich eMail: manfred.mayer@uni‐graz.at Anne Michelin Muséum National d’Histoire Naturelle‐CRCC 36 rue Geoffroy St Hilaire, 75005 Paris, France eMail: anne.michelin@gmail.com Heinz Miklas Institute of Slavic Studie University of Vienna Spitalgasse 2 Hof 3, 1090 Vienna, Austria eMail: heinz.miklas@univie.ac.at Vito Mocella The Institute for Microelectronics and Microsystems (IMM) of CNR via Pietro Castellino, 111 , 80131 Napoli, Italy eMail: vito.mocella@na.imm.cnr.it Maria Pia Morigi Centro Fermi, 00184 Roma, Italy Dipartimento di Fisica e Astronomia Università di Bologna, 40127 Bologna, Italy INFN Sezione di Bologna, 40127 Bologna, Italy eMail: mariapia.morigi@unibo.it Mostafa Mrabti LIPI/ ENS Fes, Morocco eMail: mostafa.mrabti@yahoo.fr Hossein Ziaei Nafchi Synchromedia Laboratory École de Technologie Supérieure Montreal, Canada, H3C 1K3 eMail : hossein.zi@synchromedia.ca Luca Nodari IENI‐CNR and INSTM UdR of Padova Corso Stati Uniti 4, 35127, Padova, Italy eMail: nodari@ieni.cnr.it.
Denis Nosnitsin
Universität Hamburg
Hiob Ludolf Center for Ethiopian Studies
Alsterterrasse 1, D‑20354 Hamburg, Germany
eMail: denis.nosnitsin@uni‑hamburg.de

Jean‑Marc Ogier
Laboratoire Informatique Image Interaction (L3i), University of La Rochelle
Avenue Michel Crépeau, 17042 La Rochelle Cedex 1, France
eMail: jean‑marc.ogier@univ‑lr.fr

Alessandra Patera
Swiss Light Source, Paul Scherrer Institute
Villigen, Switzerland
eMail: alessandra.patera@psi.ch

Eva Peccenini
Centro Fermi, 00184 Roma, Italy
Dipartimento di Fisica e Astronomia, Università di Bologna, 40127 Bologna, Italy
INFN Sezione di Bologna, 40127 Bologna, Italy
eMail: eva.peccenini@unibo.it

Peter E. Pormann
School of Arts, Languages and Cultures, University of Manchester
United Kingdom
eMail: peter.pormann@manchester.ac.uk

Boryana Pouvkova
Universität Hamburg
Centre for the Study of Manuscript Cultures
Warburgstraße 26, D‑20354 Hamburg, Germany
eMail: boryana.pouvkova@uni‑hamburg.de

Ira Rabin
BAM Federal Institute for Materials Research and Testing, Division 4.5
Unter den Eichen 44‑46, D‑12203 Berlin, Germany
Universität Hamburg, Centre for the Study of Manuscript Cultures
Warburgstraße 26, D‑20354 Hamburg, Germany
eMail: ira.rabin@bam.de

Claudia Rapp
Institute of Byzantine and Modern Greek Studies, University of Vienna
Postgasse 7, 1010 Vienna, Austria
eMail: c.rapp@univie.ac.at
Anna Rogulska
Faculty of Chemistry, Jagiellonian University
Ingardena 3, 30‑060 Kraków, Poland
eMail: rogulska@chemia.uj.edu.pl

Leonard Rothacker
Department of Computer Science, TU Dortmund University
Otto‑Hahn‑Str. 8, D‑44221 Dortmund, Germany
eMail: leonard.rothacker@udo.edu

Robert Sablatnig
Computer Vision Lab, Vienna University of Technology
Favoritenstrasse 9‑11, 1040 Vienna, Austria
eMail: sab@caa.tuwien.ac.at

Adolfo Santoro
Department of Electrical Engineering and Applied Mathematics, University of Salerno
Via Giovanni Paolo II 132, Fisciano (Salerno), Italy
eMail: adsantoro@unisa.it

Hamed Sayyadshahri
Department of Physics and Earth Sciences, University of Ferrara
Via Saragat 1, 44122 Ferrara, Italy
eMail: hamedsayyad.sh@gmail.com

Manfred Schreiner
Institute of Science and Technology in Art, Academy of Fine Arts Vienna
Schillerplatz 3, 1010 Vienna, Austria
eMail: m.schreiner@akbild.ac.at

William I. Sellers
School of Arts, Languages and Cultures, University of Manchester
United Kingdom
eMail: william.sellers@manchester.ac.uk

Mathias Seuret
DIVA Research Group, Department of Informatics, University of Fribourg
Boulevard de Pérolles 90, 1700 Fribourg, Switzerland
eMail: mathias.seuret@unifr.ch

Samia Snoussi
Faculty of Computing and Information Technology, Jeddah University, Saudi Arabia
eMail: samia.maddouri@enit.rnu.tn
Daniel Stökl Ben Ezra
Director of Studies, EPHE‑Sorbonne, Historical and Philological Sciences
Digital Humanities Officer at the EPHE
eMail: stoekl@mmsh.univ‑aix.fr

Peter A. Stokes
Department of Digital Humanities, King's College London
26‑29 Drury Lane, London, United Kingdom
eMail: peter.stokes@kcl.ac.uk

Dominique Stutzmann
CNRS – IRHT, Institut de Recherche et d'Histoire des Textes
40 Avenue d'Iéna, 75116 Paris, France
eMail: dominique.stutzmann@irht.cnrs.fr

Sebastian Sudholt
Department of Computer Science, TU Dortmund University
Otto‑Hahn‑Str. 16, 44221 Dortmund, Germany
eMail: sebastian.sudholt@udo.edu

Marina Toumpouri
Science and Technology in Archaeology Research Center (STARC), The Cyprus Institute
20 Konstantinou Kavafi Street, 2121 Aglantzia, Nicosia, Cyprus
eMail: m.toumpouri@cyi.ac.cy

Elaine Treharne
Synchromedia Laboratory, École de Technologie Supérieure
Montreal, Canada, H3C 1K3
eMail: treharne@stanford.edu

Adnan Ul‑Hasan
University of Kaiserslautern, Germany
eMail: adnan@cs.uni‑kl.de

Wilfried Vetter
Institute of Science and Technology in Art, Academy of Fine Arts Vienna
Schillerplatz 3, 1010 Vienna, Austria
eMail: w.vetter@akbild.ac.at

Nicole Vincent
LIPADE – Laboratoire d'Informatique Paris Descartes (Université Paris Descartes), France
eMail: nicole.vincent@mi.parisdescartes.fr
M. A. El Yacoubi
SAMOVAR, Télécom SudParis, CNRS, Université Paris‑Saclay, France
eMail: mounim.el_yacoubi@telecom‑sudparis.eu

Melania Zanetti
Department of Humanistic Studies, Ca' Foscari University of Venice
Dorsoduro 3484/D, 30123 Venezia, Italy
eMail: melania.zanetti@unive.it

Alfonso Zoleo
Department of Chemical Sciences, University of Padova
Via Marzolo 1, 35131 Padova, Italy
eMail: alfonso.zoleo@unipd.it
Index of Authors
A
Aceto, M. ∙ 13
Agostino, A. ∙ 13
Akcebe, N. ∙ 73
Albertin, F. ∙ 31
Albritton, B. L. ∙ 41
Allen, C. ∙ 41
Al‐Ma’adeed, S. ∙ 65, 79, 85
Andraud, C. ∙ 59
Arabnejad, E. ∙ 41
Aristide‐Hastir, I. ∙ 59
Arsene, C. ∙ 29
B
Bettuzzi, M. ∙ 31
Bhayro, S. ∙ 29
Bicchieri, M. ∙ 11
Bluche, T. ∙ 51
Brancaccio, R. ∙ 31
Brita, A. ∙ 23
Brockmann, C. ∙ 25
Bronzato, M. ∙ 75
Brun, E. ∙ 57
Bukhari, S. S. ∙ 105
Burie, J. C. ∙ 93
C
Çakar, P. ∙ 15
Cappa, F. ∙ 83
Carillo, F. ∙ 69
Chenouni, D. ∙ 43
Cheriet, M. ∙ 41, 65, 79, 85
Chlebda, D. ∙ 37
Cloppet, F. ∙ 51
Cohen, R. ∙ 45
Colini, C. ∙ 77
D
Deckers, D. ∙ 25
Delattre, D. ∙ 57
Delhey, M. ∙ 21
Dengel, A. ∙ 105
E
Eglin, V. ∙ 51
El Yacoubi, M. A. ∙ 43
Elfakir, Y. ∙ 43
Eliazyan, G. ∙ 91
El‑Sana, J. ∙ 45
F
Federici, C. ∙ 75
Ferrero, C. ∙ 57
Fink, G. A. ∙ 49
Fischer, A. ∙ 61
Frühmann, B. ∙ 35, 83
G
Garz, A. ∙ 61
Gatos, B. ∙ 53
Geissbühler, M. ∙ 19
Glaser, L. ∙ 25
Gulmini, M. ∙ 13
H
Hahn, O. ∙ 77
Hartmann, S. ∙ 31
Hedjam, R. ∙ 65, 85
Histace, A. ∙ 59
Hollaus, F. ∙ 35
I
Ingold, R. ∙ 61
J
Jerjen, I. ∙ 31
Jocham, T. J. ∙ 89
K
Kalacska, M. ∙ 65, 85
Kaufmann, R. ∙ 31
Keheyan, Y. ∙ 91
Kergourlay, F. ∙ 59
Kermorvant, C. ∙ 51
Kesidis, A. L. ∙ 53
Kesiman, M. ∙ 93
Khaissidi, G. ∙ 43
Khorandi, M. M. ∙ 13
Knox, K. T. ∙ 27
Kocaman, A. ∙ 95
Konidaris, T. ∙ 53
L
Lakhliai, Z. ∙ 43
Lavédrine, B. ∙ 59
Leydier, Y. ∙ 51
Lheureux, R. ∙ 59
Łojewski, T. ∙ 37
Lorusso, V. ∙ 33
Łydżba‐Kopczyńska, B. ∙ 97
M
Marcelli, A. ∙ 69
Marx, M. J. ∙ 89
Mayer, M. ∙ 17
Michelin, A. ∙ 59
Miklas, H. ∙ 35
Mocella, V. ∙ 57
Moghaddam, R. F. ∙ 79
Morigi, A. P. ∙ 31
Mrabti, M. ∙ 43
N
Nafchi, H. Z. ∙ 41
Nodari, L. ∙ 75
Nosnitsin, D. ∙ 23
O
Ogier, J.‐M. ∙ 93
P
Patera, A. ∙ 31
Peccenini, E. ∙ 31
Pormann, P. E. ∙ 29
Pouvkova, B. ∙ 33
R
Rabin, I. ∙ 77
Rapp, C. ∙ 35
Rogulska, A. ∙ 97
Rothacker, L. ∙ 49
S
Sablatnig, R. ∙ 35
Santoro, A. ∙ 69
Sayyadshahri, H. ∙ 13
Schreiner, M. ∙ 35, 83
Sellers, W. I. ∙ 29
Seuret, M. ∙ 61
Snoussi, S. ∙ 99
Stökl Ben Ezra, D. ∙ 101
Stokes, P. A. ∙ 39
Stutzmann, D. ∙ 51
Sudholt, S. ∙ 49
T
Toumpouri, M. ∙ 103
Treharne, E. ∙ 41
U
Ul‐Hasan, A. ∙ 105
V
Vetter, W. ∙ 35, 83
Vincent, N. ∙ 51
Z
Zanetti, M. ∙ 75
Zoleo, A. ∙ 75
Centre for the Study of Manuscript Cultures (CSMC)
Warburgstraße 26
20354 Hamburg, Germany
Tel.: +49-(0)40-42838-7127
Fax: +49-(0)40-42838-4899
manuscript-cultures@uni-hamburg.de
www.manuscript-cultures.uni-hamburg.de