Patent Claim Processing for Readability - Structure Analysis and Term Explanation -


Page 1: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Patent Claim Processing for Readability - Structure Analysis and Term Explanation -

July 12, 2003

Akihiro Shinmori†, Manabu Okumura‡, Yuzo Marukawa‡, Makoto Iwayama*

† Tokyo Institute of Technology & INTEC Web and Genome Informatics
‡ Japan Science and Technology & National Institute of Informatics
* Tokyo Institute of Technology & Hitachi

ACL2003 WS on Patent Corpus Processing

Page 2: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Problem & Approach

Problem = Improve patent claim readability
- Structural difficulty
- Term difficulty

Approach
- Analyze the structure and present it visually
  - Apply RST and utilize tools for RST
  - Cue-phrase-based approach
- Give explanations for terms
  - Utilize the “detailed explanation” part of the specification

Page 3: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Structure of Patent Document

Patent Specification
- Invention Title
- Claim
- Detailed Explanation
- Brief Explanation of Drawings

Drawings

Summary

“The claims specify the boundaries of the legal monopoly created by the patent.” (Burgunder 1995)

Page 4: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

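Sample Japanese Patent Claim

操作手段によりアクチュエータを駆動して所望の作業を行なう作業機において、前記作業機の作業機構に作用する負荷を検出する負荷検出手段と、この負荷検出手段の検出値に応じた周波数の信号を出力する第1の周波数変換器と、当該負荷検出手段の検出値に応じた周波数のパルスを出力する第2の周波数変換器と、前記第1の周波数変換器から出力される信号を前記第2の周波数変換器からのパルスの出力期間だけ間欠的に出力する変調手段と、この変調手段の出力信号に応じて振動を発生する振動発生手段とを設けたことを特徴とする作業機の操作用仮想振動生成装置。

(Publication Number=10-011111, a patent on a virtual oscillation generator for construction)

English gloss: In a working machine that performs desired work by driving an actuator through operating means, a virtual vibration generating device for operating the working machine, characterized by being provided with: load detecting means that detects the load acting on the working mechanism of the working machine; a first frequency converter that outputs a signal whose frequency corresponds to the detected value of the load detecting means; a second frequency converter that outputs pulses whose frequency corresponds to the detected value of the load detecting means; modulating means that outputs the signal from the first frequency converter intermittently, only during the output period of the pulses from the second frequency converter; and vibration generating means that generates vibration according to the output signal of the modulating means.

One sentence (a noun phrase) with 259 characters!!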

Page 5: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Characteristics of Patent Claim Description

1. Sentences are long. The average is 242 characters (cf. 55.4 characters for newspaper articles).

2. The structure is complex. Even native speakers cannot understand a claim on first reading!

3. Difficult terms are often used. Abstract terms are preferred.

4. Description styles are established. Patent specifications are usually written by professionals (such as patent attorneys and IP specialists).

Page 6: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Description Styles of Japanese Patent Claims [Kasai 1999]

- Process Sequence Style
  “・・・し [shi] (does)、・・・し [shi] (does)、・・・した [shita] (and does)”、・・・
- Element Enumeration Style
  “・・・と [to] (and)、・・・と [to] (and)、・・・とからなる [to karanaru] (comprising)”・・・
- Jepson-like Style
  “・・・において [ni-oite] (in)、・・・を特徴とする [wo-tokuchou tosuru] (be characterized by)”、・・・
  First describe the known or precondition part, then describe the new or main part.
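The three styles above can be told apart, very roughly, from their surface cues. The following is a minimal sketch under stated assumptions: it works on raw characters rather than on morphological analysis, the function name `guess_styles`, the threshold of two occurrences, and the acceptance of the full-width comma are all hypothetical choices, and the two test strings are invented for illustration.

```python
import re

# Touten or comma; accepting the full-width comma is an added assumption.
COMMA = "[、,，]"

# Hypothetical character-level heuristics, not the authors' lexical analyzer.
JEPSON_RE = re.compile(rf"に[お於]いて{COMMA}.*を特徴と(?:する|した)")
ENUMERATION_RE = re.compile(rf"と{COMMA}")
SEQUENCE_RE = re.compile(rf"し{COMMA}")


def guess_styles(claim: str) -> list:
    """Return the description styles whose surface cues appear in the claim."""
    styles = []
    if JEPSON_RE.search(claim):
        styles.append("jepson-like")
    # Two or more repetitions is an arbitrary threshold chosen for this sketch.
    if len(ENUMERATION_RE.findall(claim)) >= 2:
        styles.append("element-enumeration")
    if len(SEQUENCE_RE.findall(claim)) >= 2:
        styles.append("process-sequence")
    return styles


# Invented example strings, not taken from the patent data.
print(guess_styles("装置において、部品Aと、部品Bと、を備えたことを特徴とする装置。"))
print(guess_styles("材料を加熱し、冷却し、成形した製品。"))
```

A claim can match more than one style, because a Jepson-like claim often enumerates elements in one of its parts.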

Page 7: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Structure Analysis of Patent Claims

- Apply RST (Rhetorical Structure Theory).
- Use a cue-phrase-based approach.
- Our position: To improve the readability of Japanese patent claims, the structure of the description needs to be presented in a readable way.
- Japanese patent claims:
  - Are composed of multiple clauses that have some relationship with each other
  - Have cue phrases around clause boundaries

Page 8: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Result of Structure Analysis of Japanese Patent Claim

Graphical view by RSTTool [O'Donnell 1997]

Page 9: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Relations for Patent Claim

| Type | Relation | Description |
|---|---|---|
| Multi-Nuclear | PROCEDURE | [~し、][~し、][~する] XXX (XXX which does ~, and does ~, and does ~) |
| Multi-Nuclear | COMPONENT | [~と、][~と、][~と] を備えた XXX (~, ~, and ~) |
| Mono-Nuclear | ELABORATION | [XXX した][YYY] (YYY which does XXX) |
| Mono-Nuclear | FEATURE | [YYY][を特徴とする] (characterized by YYY) |
| Mono-Nuclear | PRE-CONDITION | [XXX であって、][YYY] (In XXX, YYY) |
| Mono-Nuclear | COMPOSE | [~と、~と、~と][を備えた] (comprising ~, ~, and ~) |
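One way to hold these relations in memory is a small tree type. The sketch below is only an illustration: the class name `Span`, its fields, and the indented rendering are assumptions, and the example tree is hand-built rather than produced by any parser. Only the relation names and their multi-nuclear or mono-nuclear grouping come from the table.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Relation names and their grouping as listed in the table.
MULTI_NUCLEAR = {"PROCEDURE", "COMPONENT"}
MONO_NUCLEAR = {"ELABORATION", "FEATURE", "PRE-CONDITION", "COMPOSE"}


@dataclass
class Span:
    """Hypothetical RS-tree node: a leaf carries text, an inner node a relation."""
    text: str = ""
    relation: Optional[str] = None
    children: List["Span"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Render the tree as indented lines, one node per line."""
        pad = "  " * depth
        if not self.children:
            return pad + self.text
        kind = "multi" if self.relation in MULTI_NUCLEAR else "mono"
        lines = [f"{pad}[{self.relation} ({kind}-nuclear)]"]
        lines += [child.render(depth + 1) for child in self.children]
        return "\n".join(lines)


# Hand-built toy tree, for illustration only.
tree = Span(relation="PRE-CONDITION", children=[
    Span("A において、"),
    Span(relation="COMPONENT", children=[Span("X と、"), Span("Y と")]),
])
print(tree.render())
```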

Page 10: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Collection of Cue Phrases

1. From description pattern analysis: に(お|於)いて (in), であって (in), ..., を特徴とした (be characterized by)

2. From the description patterns of claims that contain explicitly inserted newlines

Page 11: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

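Example of claims in which newlines are explicitly inserted

原稿が載置される原稿台と、<NL>この原稿台に対して主走査方向に移動する走査光学手段と、<NL>この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、<NL>前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられることを特徴とする画像読取装置。

(Publication Number=8-182670, an image reading device)

English gloss: In an image reading device comprising a document table on which a document is placed, <NL> scanning optical means that moves in the main scanning direction relative to the document table, and <NL> illuminating means that is arranged on the scanning optical means and illuminates the document in the sub-scanning direction, <NL> an image reading device characterized in that the illuminating means is attached to the scanning optical means so as to be rotatable substantially parallel to the scanning movement plane.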

Page 12: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Description pattern just before the newlines of newline-inserted claims

- No. 1: (Noun|Symbol) と (、|,): ratio 46.1%, cumulative ratio 46.1%
  [Note: “と” is a postpositional particle and means “and”.]
- No. 3: (Verb-Renyoukei|Adverb-Renyoukei) (、|,): ratio 17.5%, cumulative ratio 63.6%
- No. 2: (Noun|Symbol) において (、|,): ratio 16.4%, cumulative ratio 80.0%
  [Note: “において” plays the role of a postpositional particle and means “in”.]
- No. 4: (Noun|Symbol) であって (、|,): ratio 7.2%, cumulative ratio 87.2%
  [Note: “であって” plays the role of a postpositional particle and means “in”.]
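The cumulative ratios are simply running sums of the individual ratios, which can be checked with a short sketch. The variable names are illustrative; the numbers are the ones on the slide.

```python
from itertools import accumulate

# (pattern number, ratio in percent), in the order shown on the slide.
ratios = [("1", 46.1), ("3", 17.5), ("2", 16.4), ("4", 7.2)]

# Running sum, rounded to one decimal to absorb floating-point noise.
cumulative = [round(total, 1) for total in accumulate(r for _, r in ratios)]
print(cumulative)
```

The four patterns together account for 87.2% of the positions just before explicitly inserted newlines.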

Page 13: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Cue phrases which can be used to analyze patent claims

Token Name: JEPSON_CUE
Gloss: in
Cue Phrase:
  に(お|於)いて(、|,)
  であって(、|,)
  に(当|あ)(た)?り(、|,)

Token Name: FEATURE_CUE
Gloss: characterized by
Cue Phrase:
  を特徴と(した|する)(、|,)

Token Name: COMPOSE_CUE
Gloss: comprising
Cue Phrase:
  を搭載して構成され(た|る|ている)(、|,)?
  を(、|,)?(具|備|そな)え(た|る|ている)(、|,)?
  を(、|,)?具備(する|した|している|してなる)(、|,)?
  (で|から)構成され(た|る|ている)(、|,)?
  を(、|,)?有(する|した|している)(、|,)?
  を(、|,)?包含(する|した|している)(、|,)?
  を(、|,)?含(む|んだ|んでいる)(、|,)?
  から(、|,)?(なる|なった|なっている)(、|,)?
  から(、|,)?(成る|成った|成っている)(、|,)?
  を(、|,)?設け(た|ている)(、|,)?
  を(、|,)?装備(する|した|している)(、|,)?
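Because the cue phrases are already written as regular expressions, they translate almost directly into code. The following is a rough sketch rather than the authors' lexical analyzer: the names `CUES`, `CUE_RE` and `find_cues` are hypothetical, the full-width comma is accepted as an added assumption, and the patterns are copied as transcribed on the slide (so the comma after FEATURE_CUE is mandatory here).

```python
import re

C = "[、,，]"  # touten or comma; the full-width comma is an added assumption

# Patterns transcribed from the slide, with non-capturing groups.
CUES = {
    "JEPSON_CUE": [rf"に[お於]いて{C}", rf"であって{C}", rf"に[当あ]た?り{C}"],
    "FEATURE_CUE": [rf"を特徴と(?:した|する){C}"],
    "COMPOSE_CUE": [
        rf"を搭載して構成され(?:た|る|ている){C}?",
        rf"を{C}?(?:具|備|そな)え(?:た|る|ている){C}?",
        rf"を{C}?具備(?:する|した|している|してなる){C}?",
        rf"(?:で|から)構成され(?:た|る|ている){C}?",
        rf"を{C}?有(?:する|した|している){C}?",
        rf"を{C}?包含(?:する|した|している){C}?",
        rf"を{C}?含(?:む|んだ|んでいる){C}?",
        rf"から{C}?(?:なる|なった|なっている){C}?",
        rf"から{C}?(?:成る|成った|成っている){C}?",
        rf"を{C}?設け(?:た|ている){C}?",
        rf"を{C}?装備(?:する|した|している){C}?",
    ],
}

# One named group per token name, so a match reports which cue it is.
CUE_RE = re.compile(
    "|".join("(?P<%s>%s)" % (name, "|".join(pats)) for name, pats in CUES.items())
)


def find_cues(text: str) -> list:
    """Return (token name, matched string) pairs in order of appearance."""
    return [(m.lastgroup, m.group(0)) for m in CUE_RE.finditer(text)]


print(find_cues("原稿台と、照明手段と、を備えた画像読取装置において、"))
```

A character-level scan like this can over-match, for example on "から" plus "なる" inside unrelated words, which is one reason to run it on the output of morphological analysis instead.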

Page 14: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Cue phrases which can be used to analyze patent claims (cont.)

Token Name: NOUN, POSTP_TO, PUNCT_TOUTEN
Gloss: and
Cue Phrase: sequence of “(Noun|Symbol) と (、|,)”

Token Name: VERB_RENYOU, PUNCT_TOUTEN, VERB_KIHON
Gloss: does
Cue Phrase: sequence of “(Verb-Renyoukei|Adverb-Renyoukei) (、|,)”, before “(Verb-Kihonkei|Adverb-Kihonkei)+(Noun|Symbol)”

Page 15: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Algorithm

1. Morphological Analysis
   - Using ChaSen (with the -j option, specifying the sentence delimiters as “。:;”)

2. Lexical Analysis
   - Context-dependent output of token and string value
   - Judge whether the claim is in Jepson-like style or not
   - Judge whether it is in process sequence style or element enumeration style

Page 16: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Algorithm (cont.)

3. Syntax Analysis (= Structure Analysis)
   - Parser generated from a context-free grammar (CFG)
     - Using a Bison-compatible parser generator
     - CFG: 57 rules, 11 terminals, 19 non-terminals
   - Actions
     - Build up the RS-tree
     - Newline insertion and indentation
     - Paraphrase
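The newline-insertion action can be approximated without a parser by breaking the text after a few of the cue phrases. This is a deliberately naive sketch, not the CFG-based method described above: it ignores part-of-speech information, the pattern `BREAK_AFTER` and the function `insert_newlines` are hypothetical, and the test string is a shortened, invented variant of an image-reading-device claim.

```python
import re

# Break after "と", "において" or "であって" when followed by a touten/comma.
# Accepting the full-width comma is an added assumption.
BREAK_AFTER = re.compile(r"(?:と|において|に於いて|であって)[、,，]")


def insert_newlines(claim: str) -> str:
    """Insert a newline after every cue phrase matched by BREAK_AFTER."""
    return BREAK_AFTER.sub(lambda m: m.group(0) + "\n", claim)


sample = "原稿台と、走査光学手段と、を備えた装置において、前記照明手段は回動自在である。"
print(insert_newlines(sample))
```

Unlike a structure-aware approach, this sketch breaks after every "と、", including one that is immediately followed by "を備えた", so it tends to insert more newlines than a human writer would.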

Page 17: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Evaluation Data for Structure Analysis

- 59,956 claims (in 1999) extracted from the “NTCIR3 patent data collection”
  - Analysis was done using the “Sample data” (59,968 claims in 1998).
  - The IPC (International Patent Classification) code distribution was almost the same as that of the total data for 1999 published by the Japan Patent Office.

Page 18: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Evaluation and Result

- Accept Ratio (ratio of the claims accepted by the CFG): 99.77%
- Processing Speed: 0.30 sec/claim (on a Linux PC with a 1 GHz Pentium and 512 MB of memory)

Page 19: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Accuracy Evaluation

- Indirect Evaluation
  - Newline insertion using the result of RS analysis
  - Baseline: mechanically insert newlines at the end of every sequence of “(NOUN|SYMBOL)(、|,)” and “(Verb-Renyoukei|Adverb-Renyoukei)(、|,)”.
- Direct Evaluation
  - Evaluation of the results for 100 randomly selected claims

Page 20: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Accuracy Evaluation Result

Indirect Evaluation

| | Baseline | Newline Insertion Utilizing Structure Analysis | Upper Limit |
|---|---|---|---|
| Recall (R) | 0.478 | 0.674 | 0.873 |
| Precision (P) | 0.374 | 0.663 | - |
| F-measure | 0.420 | 0.669 | - |
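The slides do not spell out how the scores are computed; a common reading, assumed here, is that predicted newline positions are compared with the positions of the explicitly inserted newlines, and that the F-measure is the harmonic mean of precision and recall. The function names and the toy position sets below are hypothetical.

```python
from typing import Set, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(predicted: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure over sets of newline positions."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy positions, for illustration only.
print(prf({3, 7, 12}, {3, 12, 20, 25}))

# Recomputing F from the rounded P and R values in the table.
print(round(f_measure(0.374, 0.478), 3))
print(round(f_measure(0.663, 0.674), 3))
```

Recomputing from the rounded table values reproduces the baseline F-measure and lands within rounding distance of the value reported for structure-based insertion, since the inputs are themselves rounded to three decimals.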

Page 21: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Accuracy Evaluation Result

Direct Evaluation

| Category | Count | Percentage (Excluding “No Judgment”) |
|---|---|---|
| Correct | 76 | 80.85% |
| Partially Correct | 11 | 11.70% |
| Incorrect | 7 | 7.45% |
| No Judgment | 6 | - |
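The percentages are taken over the 94 claims that received a judgment, not over all 100 sampled claims. A short check, with illustrative variable names:

```python
counts = {"Correct": 76, "Partially Correct": 11, "Incorrect": 7, "No Judgment": 6}

# Denominator excludes the claims for which no judgment was made.
judged = sum(n for category, n in counts.items() if category != "No Judgment")

percentages = {
    category: round(100 * n / judged, 2)
    for category, n in counts.items()
    if category != "No Judgment"
}
print(judged, percentages)
```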

Page 22: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Term Explanation

- Difficult terms used in patent claims:
  - Terms specific to the invention
  - Terms specific to the domain
- Approach
  - Use the result of structure analysis
  - Give explanations for terms by utilizing the “detailed explanation” part, because what is claimed must be explained in detail in the “detailed explanation” part.

Page 23: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Structure of Patent Document

Patent Specification
- Invention Title
- Claim
- Detailed Explanation
  - Technical field
  - Prior art
  - Problem to be resolved by the invention
  - Means of solving the problems
  - Embodiments of the invention
  - Effects of the invention

Page 24: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Preliminary Survey

- For the Jepson-like claims, the words used in the first part (the known or precondition part) appear more often in the technical field and the prior art than the words used in the last part.
  - 76.3% (cf. 55.5% for the words in the last part)
- “Terms specific to the domain” are often explained in the prior art by using the following cue phrases:
  - so-called, or, ()
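The slides do not define exactly how these percentages were computed. One plausible reading, assumed here, is the share of distinct words in a claim part that also occur in the "technical field" and "prior art" sections. The function `coverage` and the word lists are hypothetical and only illustrate that reading.

```python
from typing import Iterable


def coverage(part_words: Iterable[str], section_words: Iterable[str]) -> float:
    """Share of distinct words in a claim part that also occur in the sections."""
    part = set(part_words)
    if not part:
        return 0.0
    return len(part & set(section_words)) / len(part)


# Invented word lists, for illustration only.
first_part = ["作業機", "アクチュエータ", "操作手段", "駆動"]
field_and_prior_art = ["作業機", "アクチュエータ", "油圧", "駆動"]
print(coverage(first_part, field_and_prior_art))
```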

Page 25: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Word usage in Jepson-like claims

Patent Specification
- Invention Title
- Claim (Jepson-like type)
  - First part (known things or the precondition): 76.3%
  - Last part (new things or the body): 55.5%
- Detailed Explanation
  - Technical field
  - Prior art
  - Problem to be resolved by the invention
  - Means of solving the problems
  - Embodiments of the invention
  - Effects of the invention

Page 26: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

“Terms specific to the domain” that can be extracted from “prior art”

- For the 132 patent specifications in the field of ink-jet printers:
  - 29 terms can be extracted by the cue phrase “いわゆる” (“so-called”) from the “prior art” part.
  - 9 of 27 terms are used in the claim description.
  - For 3 terms, useful explanations can be extracted from the “prior art” part.
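Extraction by the cue phrase “いわゆる” can be sketched with a single regular expression. This is only an approximation under stated assumptions: the term is taken to be the run of kanji and katakana that directly follows the cue phrase, the names `SO_CALLED_RE` and `extract_so_called_terms` are hypothetical, and the example sentence is invented rather than taken from the 132 specifications.

```python
import re

# Hypothetical pattern: the cue phrase followed by a run of kanji or katakana.
SO_CALLED_RE = re.compile(r"いわゆる([一-龥ァ-ヶー]+)")


def extract_so_called_terms(prior_art: str) -> list:
    """Return the candidate terms that directly follow the cue phrase."""
    return SO_CALLED_RE.findall(prior_art)


# Invented sentence, for illustration only.
text = "従来、いわゆるオンデマンド型のインクジェットプリンタが知られている。"
print(extract_so_called_terms(text))
```

The capture stops at the first hiragana character, so multi-part terms joined by particles are cut short; a morphological analyzer would give cleaner term boundaries.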

Page 27: Patent Claim Processing for Readability  - Structure Analysis and Term Explanation -

Conclusion

- NLP technologies can contribute toward improving the readability of patent claims.
  - Structure can be analyzed by a cue-phrase-based approach and CFG-based parsing.
  - Explanations for some terms can be given by utilizing the expressions in the detailed explanation.
- This can be a step toward the more challenging task of automatic “patent map” generation.