XML Watermarking & Information Hiding



Page 1: XML Watermarking & Information Hiding

XML Watermarking & Information Hiding

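Sun Xingming, Ph.D., Professor, Doctoral Supervisor

School of Computer and Communication, Hunan University
Hunan Provincial Key Laboratory of Network and Information Security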

Page 2: XML Watermarking & Information Hiding

Markup Language

SGML (Standard Generalized Markup Language)

XML (Extensible Markup Language)
HTML (HyperText Markup Language)
XHTML

Page 3: XML Watermarking & Information Hiding

Publishing Information in WWW

Page 4: XML Watermarking & Information Hiding

Publishing Information in WWW

Page 5: XML Watermarking & Information Hiding

XML Document

XML element type: text, image, video, audio, executable code, …

Corresponding watermarking and information hiding techniques can be employed.

Can we use the document's own information to do watermarking or information hiding?

Page 6: XML Watermarking & Information Hiding

Known content-based technique

Change font size, color
Append white space at the end of a line: bit 0 = space, bit 1 = tab
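The trailing-white-space scheme above can be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the function names `embed_bits` and `extract_bits` are hypothetical, and the sketch assumes one bit per line and that any existing trailing white space may be discarded.

```python
def embed_bits(text: str, bits: str) -> str:
    """Append one space (bit 0) or one tab (bit 1) to each line, in order."""
    lines = text.split("\n")
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the payload")
    marks = {"0": " ", "1": "\t"}
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip(" \t")      # drop any existing trailing white space
        if i < len(bits):
            line += marks[bits[i]]     # one hidden bit per line
        out.append(line)
    return "\n".join(out)


def extract_bits(text: str) -> str:
    """Read the hidden bits back from the trailing character of each line."""
    bits = []
    for line in text.split("\n"):
        if line.endswith(" "):
            bits.append("0")
        elif line.endswith("\t"):
            bits.append("1")
    return "".join(bits)


stego = embed_bits("line one\nline two\nline three", "101")
recovered = extract_bits(stego)
```

Every carried bit adds one byte to the page, and selecting the text in an editor reveals the padding.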

Page 7: XML Watermarking & Information Hiding

Shortcomings

White space at the end of a line:
Increases page size
Layout might be changed
Very easily detected by selecting the text

Page 8: XML Watermarking & Information Hiding

Specification

Element (Entity)
<name attribute1 … attributen> contents </name>
<name attribute1 … attributen> </name>
<name attribute1 … attributen>

Attribute
name="value"

Example
<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>

Page 9: XML Watermarking & Information Hiding

Properties of markup labels

Property 1: In HTML, element and attribute names are case-insensitive (XML and XHTML names are case-sensitive, so this property does not carry over to them)

<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<Font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<font face="Verdana" size="4" color="#FFFF00">Student Number: </Font>
<Font face="Verdana" size="4" color="#FFFF00">Student Number: </Font>
…
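Property 1 can carry one bit per tag by choosing the case of the tag name. The sketch below is one possible reading, not a method stated in the slides: the names `embed_case` and `extract_case`, the regular expression, and the convention "lowercase = 0, capitalized = 1" are all assumptions. It applies to HTML only.

```python
import re

# Matches the "<" or "</" plus the tag name of start and end tags.
# A rough pattern for illustration; it does not skip comments or scripts.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def embed_case(html: str, bits: str) -> str:
    """Write each tag name in lowercase (bit 0) or capitalized (bit 1)."""
    remaining = iter(bits)

    def repl(match):
        name = match.group(2).lower()
        if next(remaining, None) == "1":
            name = name.capitalize()
        return "<" + match.group(1) + name

    return TAG.sub(repl, html)


def extract_case(html: str, count: int) -> str:
    """Read `count` bits back from the case of the first letter of each tag."""
    bits = ["1" if m.group(2)[0].isupper() else "0" for m in TAG.finditer(html)]
    return "".join(bits[:count])


page = '<font face="Verdana" size="4">Student Number: </font>'
marked = embed_case(page, "01")
```

With the payload `01` the start tag stays `<font …>` and the end tag becomes `</Font>`, which is the third variant listed above.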

Page 10: XML Watermarking & Information Hiding

Properties of markup labels

Property 2: Attributes are order-insensitive

<font face="Verdana" size="4" color="#FFFF00">Student Number: </font>
<font size="4" face="Verdana" color="#FFFF00">Student Number: </font>

Page 11: XML Watermarking & Information Hiding

Pair attributes technique

Pair attributes order (Corinna John)

Key attribute, corresponding attribute
key / corresponding: bit 1
corresponding / key: bit 0

<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</Font>

Requires a key / corresponding table
Page size unchanged, difficult to detect
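A minimal sketch of the pair-order rule, assuming attributes are already parsed into `(name, value)` tuples. The helper names and the choice of `face` as key and `size` as corresponding attribute are illustrative assumptions taken from the example above, not a documented table.

```python
def embed_pair_bit(attrs, key, corr, bit):
    """Order the pair so that key precedes corr for bit 1, and follows it for bit 0."""
    names = [name for name, _ in attrs]
    i, j = names.index(key), names.index(corr)
    key_first = i < j
    if key_first != (bit == 1):
        attrs = list(attrs)                  # copy before swapping the two positions
        attrs[i], attrs[j] = attrs[j], attrs[i]
    return attrs


def extract_pair_bit(attrs, key, corr):
    """Return 1 if key appears before corr, otherwise 0."""
    names = [name for name, _ in attrs]
    return 1 if names.index(key) < names.index(corr) else 0


original = [("face", "Verdana"), ("size", "4"), ("color", "#FFFF00")]
carrying_zero = embed_pair_bit(original, "face", "size", 0)
carrying_one = embed_pair_bit(original, "face", "size", 1)
```

Both sides must share the key / corresponding table, which is the extra payload of this technique.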

Page 12: XML Watermarking & Information Hiding

Attributes permutation technique

Equivalent attribute permutations

<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font face="Verdana" color="#FFFF00" size="4">Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</font>
<font size="4" color="#FFFF00" face="Verdana">Student Name:</font>
<font color="#FFFF00" face="Verdana" size="4">Student Name:</font>
<font color="#FFFF00" size="4" face="Verdana">Student Name:</font>

Lexicographic (alphabetic) order: a permutation f precedes a permutation g iff f(k) < g(k) for the smallest k such that f(k) ≠ g(k).

Page 13: XML Watermarking & Information Hiding

Attributes permutation technique

Generating attributes permutations in lexicographical order

<font color="#FFFF00" face="Verdana" size="4">Student Name:</font>
<font color="#FFFF00" size="4" face="Verdana">Student Name:</font>
<font face="Verdana" color="#FFFF00" size="4">Student Name:</font>
<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>
<font size="4" color="#FFFF00" face="Verdana">Student Name:</font>
<font size="4" face="Verdana" color="#FFFF00">Student Name:</font>

Attribute permutation    Order number
color face size          0
color size face          1
face color size          2
face size color          3
size color face          4
size face color          5
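The mapping between order numbers and permutations can be computed directly instead of listing all permutations. The sketch below uses the factorial number system; the function names `unrank` and `rank` are illustrative. Under strict lexicographic order of the attribute names, `size color face` receives number 4 and `size face color` receives number 5.

```python
from math import factorial


def unrank(attrs, number):
    """Return the permutation of attrs with the given lexicographic order number."""
    pool = sorted(attrs)
    if not 0 <= number < factorial(len(pool)):
        raise ValueError("order number out of range")
    perm = []
    while pool:
        block = factorial(len(pool) - 1)   # permutations sharing the same first name
        index, number = divmod(number, block)
        perm.append(pool.pop(index))
    return perm


def rank(perm):
    """Return the lexicographic order number of a permutation of attribute names."""
    pool = sorted(perm)
    number = 0
    for name in perm:
        index = pool.index(name)           # smaller names still unused before this one
        number += index * factorial(len(pool) - 1)
        pool.pop(index)
    return number


embedded = unrank(["face", "size", "color"], 3)
recovered = rank(["size", "color", "face"])
```

Embedding writes the attributes of an element in the order `unrank(names, payload)`; extraction reads the order found in the page and calls `rank`.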

Page 14: XML Watermarking & Information Hiding

Attributes permutation technique

If an element has at least two attributes, it may be used to embed hidden information or a watermark.

Let $\{E_i\}_{i=1}^{n}$ be the elements in a web page whose number of attributes satisfies $|E_i| \ge 2$. The embedding capacity, in bits, is

$$\sum_{i=1}^{n} \log_2(|E_i|!)$$
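As a rough illustration, the capacity formula can be evaluated for a page with Python's standard `html.parser` module. The class name `AttrCounter` and the function `capacity_bits` are hypothetical; counting only distinct attribute names per element is an assumption of this sketch.

```python
from html.parser import HTMLParser
from math import factorial, log2


class AttrCounter(HTMLParser):
    """Collect |E_i| for every element that has at least two attributes."""

    def __init__(self):
        super().__init__()
        self.counts = []

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}   # distinct attribute names of this element
        if len(names) >= 2:
            self.counts.append(len(names))


def capacity_bits(html: str) -> float:
    """Sum log2(|E_i|!) over all elements with two or more attributes."""
    parser = AttrCounter()
    parser.feed(html)
    parser.close()
    return sum(log2(factorial(k)) for k in parser.counts)


sample = (
    '<font face="Verdana" size="4" color="#FFFF00">Student Name:</font>'
    '<td>plain cell</td>'
    '<a href="#top" id="back">top</a>'
)
bits = capacity_bits(sample)
```

For the sample, the `font` element contributes log2(3!) bits, the `a` element log2(2!) = 1 bit, and the `td` element nothing. An embedder that works element by element can use only the whole bits of each term.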

Page 15: XML Watermarking & Information Hiding

Embedded capacity example

Name of web page      Capacity (bytes)
www.163.com           48
www.sina.com.cn       279
www.sohu.com.cn       338
www.microsoft.com     15
www.ebay.com          78
www.yahoo.com         33

Page 16: XML Watermarking & Information Hiding

Perceivability
Cannot be perceived when browsing the page
Hard to perceive when reading the source code

Page 17: XML Watermarking & Information Hiding

Robust or resistant against editing
Contents can be changed

Page 18: XML Watermarking & Information Hiding

Robust or resistant against editing
Font, size, color can be changed

Page 19: XML Watermarking & Information Hiding

Security

Attribute permutation    Order number
color face size          0
color size face          1
face color size          2
face size color          3
size color face          4
size face color          5

Apply a hash to the concatenation of the attributes and a key to get the order number:

$\operatorname{hash}(\text{attributes} \parallel \text{key})$
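One possible reading of the keyed scheme is sketched below: each permutation is concatenated with the secret key, hashed, and the permutations are numbered by the sorted hash values, so the order-number table cannot be rebuilt without the key. SHA-256, the separator, and the function name `keyed_order_numbers` are assumptions; the slides do not name a hash function or the exact construction.

```python
import hashlib
from itertools import permutations


def keyed_order_numbers(attrs, key: str):
    """Map every permutation of attrs to an order number that depends on key."""

    def digest(perm):
        data = (" ".join(perm) + key).encode("utf-8")   # attributes || key
        return hashlib.sha256(data).hexdigest()

    # Sorting by digest replaces the public lexicographic order with a keyed one.
    ranked = sorted(permutations(sorted(attrs)), key=digest)
    return {perm: number for number, perm in enumerate(ranked)}


table = keyed_order_numbers(["color", "face", "size"], "secret key")
same_table = keyed_order_numbers(["size", "face", "color"], "secret key")
```

Enumerating all permutations costs n! hash evaluations, which is acceptable for the handful of attributes a typical element carries.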

Page 20: XML Watermarking & Information Hiding

Performance comparison

Type                   Size change  Perceivable (view)  Perceivable (code)  Capacity (bit)                   Extra payload
White space            Y            easy                easy                Page lines                       N
Case change            N            N                   easy                Tags                             N
Attribute pair         N            N                   hard                $\sum_{i=1}^{n} |E_i| / 2$       Pair table
Equivalent attributes  N            N                   hard                $\sum_{i=1}^{n} \log_2(|E_i|!)$  N

Page 21: XML Watermarking & Information Hiding

Other potential properties

String delimiters
name="value"
name='value'

White space between the element's name and the first attribute
<font face="verdana" size="3">
<font  face="verdana" size="3">

White space between attributes
<font face="verdana" size="3">
<font face="verdana"  size="3">

Page 22: XML Watermarking & Information Hiding

Other potential properties

White space after "="
<font face="verdana" size="3">
<font face= "verdana" size="3">

White space between elements
<td>con1</td><td>con2</td>
<td>con1</td> <td>con2</td>

Page 23: XML Watermarking & Information Hiding

Other potential properties

The default value of an attribute
<font face="verdana" size="3">
<font face="verdana">

Page 24: XML Watermarking & Information Hiding

Current progress

Introduce insignificant attributes
<font face="verdana">
<font face="verdana" xyz="abcd">

Break through the capacity bottleneck $\sum_{i=1}^{n} \log_2(|E_i|!)$

Web page watermarking
Text watermarking

Page 25: XML Watermarking & Information Hiding

Our focus on watermarking

Text content security
Funded by NSFC Key Project 60736016
Funded by NSFC 60373062

Software watermarking
Funded by NSFC 60573045

Wireless sensor network security
Funded by 973 Project 2006CB303000
Funded by NSFC 60873198

Steganalysis
Funded by 115 Project

Page 26: XML Watermarking & Information Hiding

Thank you

Telephone: 0731-8821341, 13875971258
Email: [email protected]
http://nisl.hnu.cn/

Page 27: XML Watermarking & Information Hiding

HyperText Markup Language (HTML), version 4.0, the publishing language of the World Wide Web

Recall that in HTML, element and attribute names are case-insensitive; the convention is meant to encourage readability.

Element and attribute names in this document have been marked up and may be rendered specially by some user agents.

http://www.w3.org/TR/1998/REC-html40-19980424/about.html#h-1.2.1

Page 28: XML Watermarking & Information Hiding

http://www.w3.org/TR/html/#xhtml

HTML 4 [HTML4] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.

SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.

SGML has been around since the middle 1980's and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.

HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.

In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to interoperability problems for documents across different platforms.

Page 29: XML Watermarking & Information Hiding

XML™ is the shorthand name for Extensible Markup Language [XML].

XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and yet still retains all of SGML's commonly used features.

While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.

Page 30: XML Watermarking & Information Hiding

XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4 [HTML4]. XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. The details of this family and its evolution are discussed in more detail in [XHTMLMOD].

XHTML 1.0 (this specification) is the first document type in the XHTML family. It is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML]. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents. Developers who migrate their content to XHTML 1.0 will realize the following benefits:

XHTML documents are XML conforming. As such, they are readily viewed, edited, and validated with standard XML tools.

XHTML documents can be written to operate as well or better than they did before in existing HTML 4-conforming user agents as well as in new, XHTML 1.0 conforming user agents.

XHTML documents can utilize applications (e.g. scripts and applets) that rely upon either the HTML Document Object Model or the XML Document Object Model [DOM].

As the XHTML family evolves, documents conforming to XHTML 1.0 will be more likely to interoperate within and among various XHTML environments.

The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its attendant benefits, while still remaining confident in their content's backward and future compatibility.

Page 31: XML Watermarking & Information Hiding

Terrorism

http://www.arabteam2000-forum.com/

Partial English translation of the Jihad training manual on information hiding techniques (original in Arabic)

Page 32: XML Watermarking & Information Hiding

Watermark embedding

Page 33: XML Watermarking & Information Hiding

Watermark detection

Page 34: XML Watermarking & Information Hiding

Classification of watermarking by host
Image
Audio
Video
Text (document)
Software / executable code
Database

Page 35: XML Watermarking & Information Hiding

Text watermarking & Information Hiding

email
web
book
PDF, WORD, WPS, PS, etc.
TXT (unformatted)

Watermarking

Information hiding

Page 36: XML Watermarking & Information Hiding

Any redundancy?

Character to code: one to one

No, no

Page 37: XML Watermarking & Information Hiding

Utilize format information

Line-shift coding: vertically displacing an entire text line
Word-shift coding: horizontally shifting the location of a word within a text line
Character feature coding: altering a particular feature of an individual character

Page 38: XML Watermarking & Information Hiding

Utilize language information
Synonym substitution
Syntactic transform
TMR tree (text meaning representation)
Add spaces at the end of a line

Page 39: XML Watermarking & Information Hiding

Text recoverable watermarking

Format-based watermarking?
Natural language watermarking?
How to combine?
Text recoverable watermarking?