NCNU Linux User Group 2010, 王惟綸 (Wei-Lun Wang), 2010/07/07.

NCNU Linux User Group 2010

<Regular Expression>

王惟綸 (Wei-Lun Wang)

2010/07/07

Outline

What’s a Regular Expression?
The Purpose
What’s grep?
Various Operators
Extended Regular Expressions
Exercises
References

What’s a Regular Expression?

A regular expression is a pattern that describes a set of strings.

Examples:
X[2-7] = {X2, X3, X4, X5, X6, X7}
T[ae]ste? = {Taste, Tast, Teste, Test}
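To check that each pattern describes exactly the set listed, a minimal sketch can test candidate strings for a whole-string match. It uses Python's `re` module as a stand-in for grep (an assumption: `re` gives these operators the same meaning as `grep -E`, although the two syntaxes are not identical in general), and `matches_exactly` is a hypothetical helper name.

```python
import re

# Python's re module stands in for a POSIX regex engine here (assumption).
def matches_exactly(pattern, candidates):
    """Hypothetical helper: keep the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# X[2-7]: the letter X followed by one digit from 2 to 7.
digits = matches_exactly(r"X[2-7]", [f"X{d}" for d in range(10)])
# T[ae]ste?: T, then a or e, then "st", then an optional e.
words = matches_exactly(r"T[ae]ste?", ["Taste", "Tast", "Teste", "Test", "Tiste", "Tastee"])
print(digits)
print(words)
```

"Tastee" is rejected because `?` allows at most one final e.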

The Purpose

Regular expressions are used to process strings. With the aid of special characters, they make it easy to search for, replace, and delete text.

T[ae]ste? = {Taste, Tast, Teste, Test}: all four strings, Taste, Tast, Teste, and Test, can be found by searching for the single pattern "T[ae]ste?".
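The three uses named above (searching, replacement, and deletion) can be sketched with Python's `re.findall` and `re.sub`. This assumes Python's `re` as a stand-in for command-line tools such as grep, and the sample sentence is made up for illustration.

```python
import re

# Python's re stands in for grep-style tools here (assumption); the text is a made-up sample.
text = "Taste the Test, not the Tast or the Teste."
pattern = r"T[ae]ste?"

# Searching: every substring the pattern matches, left to right.
found = re.findall(pattern, text)
# Replacement: substitute each match with a fixed word.
replaced = re.sub(pattern, "WORD", text)
# Deletion: replace each match with the empty string.
deleted = re.sub(pattern, "", text)
print(found)
print(replaced)
print(deleted)
```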

What’s grep?

grep: global regular expression print

The grep command searches for the pattern specified by the Pattern parameter and writes each matching line to standard output.

-i : ignore case (treat uppercase and lowercase letters as the same)
-v : invert the match (output the lines that do not match)
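The effect of -i and -v on line selection can be imitated in a few lines. This is a rough sketch, not grep itself: `grep_lines` is a hypothetical helper built on Python's `re`, and it models only these two options.

```python
import re

# grep_lines is a hypothetical helper; Python's re stands in for grep (assumption).
def grep_lines(pattern, lines, ignore_case=False, invert=False):
    """Return the selected lines, roughly like grep [-i] [-v] PATTERN."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # A line is selected when it contains a match; invert (-v) flips that decision.
    return [line for line in lines if bool(regex.search(line)) != invert]

lines = ["Linux", "linux user group", "Windows"]
plain = grep_lines("linux", lines)                      # like: grep linux
nocase = grep_lines("linux", lines, ignore_case=True)   # like: grep -i linux
inverted = grep_lines("linux", lines, invert=True)      # like: grep -v linux
print(plain, nocase, inverted)
```

With -v and no -i, "Linux" is selected: it is one of the lines the case-sensitive search rejected.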

alias & unalias

Various Operators

1. [ ] represents any one of the characters inside the brackets.
2. [ - ] represents any one character within the given range of character codes.
3. [^ ] represents any one character that is not in the list or range.
4. ^ matches the empty string at the beginning of a line.
5. $ matches the empty string at the end of a line.
6. . matches any single character.
7. * matches the preceding item zero or more times.

1. [ ] represents any one of the characters inside the brackets.

th[ei] = {the, thi}
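A quick check of th[ei], assuming Python's `re` as the matching engine: `fullmatch` tests a whole word against the set, while `findall` lists every occurrence inside a line, which is closer to what grep searches for.

```python
import re

# Python's re stands in for grep here (assumption).
# th[ei]: "th" followed by exactly one character, either e or i.
words = [w for w in ["the", "thi", "tha", "thei", "thin"] if re.fullmatch(r"th[ei]", w)]
occurrences = re.findall(r"th[ei]", "the thin theme")
print(words)
print(occurrences)
```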

2. [ - ] represents any one character within the given range. Which characters a range covers depends on the character order of the locale:

LANG=C : 0 1 2 3 4 ... A B C D ... Z a b c d ... z
LANG=zh_TW.Big5 : 0 1 2 3 4 ... a A b B c C d D ... z Z
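The LANG=C ordering matters for ranges such as [A-z]. Python's `re` compares Unicode code points, which for ASCII matches the LANG=C order above; treating that as grep's behavior is an assumption that holds only for that locale, and this sketch does not model an ordering like zh_TW.Big5.

```python
import re

# Python's re stands in for grep under LANG=C here (assumption): ranges follow code points.
# Z is code 90 and a is code 97, so [A-z] also covers codes 91 to 96.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
in_range = re.findall(r"[A-z]", "Az[]^_`1 ")
print(between)
print(in_range)
```

The digit and the space fall outside the range, but the brackets, caret, underscore, and backquote do not.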

Instead of a range, a bracket expression can also contain a named character class, for example [[:digit:]] instead of [0-9]:

Symbol      Meaning
[:alnum:]   Uppercase and lowercase English letters and digits, i.e. 0-9, A-Z, a-z
[:alpha:]   Any uppercase or lowercase English letter, i.e. A-Z, a-z
[:blank:]   The space bar and the [Tab] key
[:cntrl:]   The control keys on the keyboard, including CR, LF, Tab, Del, etc.
[:digit:]   Digits only, i.e. 0-9
[:graph:]   Every key except the whitespace characters (space bar and [Tab])
[:lower:]   Lowercase letters, i.e. a-z
[:print:]   Any character that can be printed
[:punct:]   Punctuation symbols, i.e. " ' ? ! ; : # $ ...
[:upper:]   Uppercase letters, i.e. A-Z
[:space:]   Any character that produces whitespace, including the space bar, [Tab], CR, etc.
[:xdigit:]  Hexadecimal digits and letters, i.e. 0-9, A-F, a-f
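Python's `re` does not understand the [:name:] classes, so one way to experiment with them is to translate each class into an explicit ASCII set first. The mapping below assumes the C locale; `POSIX_CLASSES` and `posix_to_python` are hypothetical names, and real grep resolves the classes according to the current locale.

```python
import re
import string

# Hypothetical table of ASCII equivalents, assuming the C locale
# (grep itself resolves these classes according to the current locale).
POSIX_CLASSES = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
    "digit": "0-9",
    "graph": "!-~",   # printable characters except the space (0x21 to 0x7e)
    "lower": "a-z",
    "print": " -~",   # printable characters including the space (0x20 to 0x7e)
    "punct": re.escape(string.punctuation),  # escaped so it is literal inside [ ]
    "upper": "A-Z",
    "space": r" \t\n\r\f\v",
    "xdigit": "0-9A-Fa-f",
}

def posix_to_python(pattern):
    """Hypothetical helper: replace each [:name:] inside brackets with its ASCII set."""
    for name, chars in POSIX_CLASSES.items():
        pattern = pattern.replace(f"[:{name}:]", chars)
    return pattern

print(posix_to_python("[[:digit:]]+"))
print(re.findall(posix_to_python("[[:digit:]]+"), "abc 123 de 45"))
print(re.findall(posix_to_python("[[:punct:]]"), "a,b!c"))
```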

3. [^ ] represents any one character that is not in the list or range.
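A sketch of a negated list, using Python's `re` in place of grep (assumption) and made-up sample lines: [^g]oo selects lines that contain "oo" preceded by some character other than g.

```python
import re

# Python's re stands in for grep here (assumption); the lines are made-up samples.
lines = ["google", "tool", "goo", "good food"]
# [^g]oo: one character that is anything except g, followed by "oo".
selected = [line for line in lines if re.search(r"[^g]oo", line)]
print(selected)
```

"goo" and "google" are rejected because the only character before their "oo" is g; "good food" is kept thanks to the "foo" in "food".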

4. ^ matches the empty string at the beginning of a line.
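Anchoring with ^, sketched with Python's `re` in place of grep (assumption). Each line is stored as a separate string, so ^ means the start of that line. The second filter shows the different meaning of ^ when it is the first character inside brackets.

```python
import re

# Python's re stands in for grep here (assumption); one string per line.
lines = ["the cat", "bathe", "then", " the dog"]
# ^the: "the" only when it is the very first thing on the line.
starts_the = [line for line in lines if re.search(r"^the", line)]
# ^[^t]: the first character of the line is anything except t.
not_t = [line for line in lines if re.search(r"^[^t]", line)]
print(starts_the)
print(not_t)
```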

5. $ matches the empty string at the end of a line.
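Anchoring with $, under the same assumption (Python's `re` standing in for grep, one string per line). The period is escaped because . is itself an operator, and a trailing space is enough to stop a line from matching.

```python
import re

# Python's re stands in for grep here (assumption); one string per line.
lines = ["End.", "No period", "a.b", "Done. "]
# \.$: a literal period immediately before the end of the line.
ends_period = [line for line in lines if re.search(r"\.$", line)]
print(ends_period)
```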

6. . matches any single character.
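The . operator, sketched with Python's `re` as a stand-in for grep (assumption): g..d requires exactly two characters of any kind between g and d.

```python
import re

# Python's re stands in for grep here (assumption).
candidates = ["good", "gold", "g12d", "gd", "goood"]
# g..d: g, exactly two arbitrary characters, then d.
dotted = [s for s in candidates if re.fullmatch(r"g..d", s)]
print(dotted)
```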

7. * matches the preceding item zero or more times.

go* = {g, go, goo, gooo, …}

goo* = {go, goo, gooo, …}
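The two sets above can be reproduced with Python's `re` standing in for grep (assumption). The last check shows a common pitfall of substring search as grep performs it: because go* also matches a lone g, any line containing a g is selected.

```python
import re

# Python's re stands in for grep here (assumption).
candidates = ["g", "go", "goo", "gooo", "gx"]
# go*: g followed by zero or more o's.
zero_or_more = [s for s in candidates if re.fullmatch(r"go*", s)]
# goo*: g, one o, then zero or more o's, so at least one o is required.
one_or_more = [s for s in candidates if re.fullmatch(r"goo*", s)]
# Substring search, as grep does per line: go* matches the lone g in "big".
any_g = bool(re.search(r"go*", "big"))
print(zero_or_more, one_or_more, any_g)
```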

Extended Regular Expressions

In basic regular expressions, the metacharacters "?", "+", "{", "|", "(", and ")" lose their special meaning; use the backslashed versions "\?", "\+", "\{", "\|", "\(", and "\)" instead.

Alternatively, use grep -E (or egrep) instead of plain grep, so that the following operators can be written without backslashes:

1. + matches the preceding item one or more times.
2. ? matches the preceding item zero or one time.
3. | matches either the expression before it or the expression after it.
4. ( ) groups strings into a single item.
5. {N} matches the preceding item exactly N times.
6. {N,} matches the preceding item N or more times.
7. {N,M} matches the preceding item at least N times, but not more than M times.
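The backslash rule above can be sketched as a small translator from basic to extended syntax. `bre_to_ere` is a hypothetical helper, not part of grep; its output is meant for Python's `re`, which writes these operators unescaped like grep -E, and it does not try to cover every corner case of GNU grep's basic syntax.

```python
import re

# Characters that are operators in extended syntax but need a backslash in basic syntax.
_BRE_ESCAPED = set("?+{}|()")

def bre_to_ere(pattern):
    """Hypothetical helper: turn basic-syntax operators into unescaped ones for Python's re."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # \+ in basic syntax is the operator +; other escapes such as \. are kept.
            out.append(nxt if nxt in _BRE_ESCAPED else ch + nxt)
            i += 2
        elif ch in _BRE_ESCAPED:
            # A bare + in basic syntax is a literal character, so escape it.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(bre_to_ere(r"go\{2,5\}g"))
print(bre_to_ere(r"f\(oo\|ee\)d"))
```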

1. + matches the preceding item one or more times.

goo+ = {goo, gooo, goooo, …}
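The + set above, checked with Python's `re` as a stand-in for grep -E (assumption). With plain grep, the same pattern would be written goo\+ according to the backslash rule for basic regular expressions.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["go", "goo", "gooo", "goooo"]
# goo+: g, o, then one or more further o's, so at least two o's in total.
plus = [s for s in candidates if re.fullmatch(r"goo+", s)]
print(plus)
```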

2. ? matches the preceding item zero or one time.

goog? = {goog, goo}
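The ? set, again with Python's `re` standing in for grep -E (assumption): only the final g is optional.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["go", "goo", "goog", "googg"]
# goog?: "goo" followed by at most one g.
optional = [s for s in candidates if re.fullmatch(r"goog?", s)]
print(optional)
```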

3. | matches either the expression before it or the expression after it.

goo|fav = {goo, fav}
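Alternation sketched with Python's `re` as a stand-in for grep -E (assumption). | has the lowest precedence, so goo|fav means "goo" or "fav" as whole alternatives, not "go" followed by o or f.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["goo", "fav", "goav", "goofav"]
# goo|fav: the whole string must be exactly one of the two alternatives.
either = [s for s in candidates if re.fullmatch(r"goo|fav", s)]
print(either)
```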

4. ( ) groups strings into a single item.

f(oo|ee)d = {food, feed}
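Grouping matters because it limits how far | reaches. With Python's `re` standing in for grep -E (assumption), compare the grouped pattern with the same characters written without parentheses.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["food", "feed", "fod", "foo", "eed"]
# f(oo|ee)d: f, then "oo" or "ee", then d.
grouped = [s for s in candidates if re.fullmatch(r"f(oo|ee)d", s)]
# Without parentheses, | splits the whole pattern: "foo" or "eed".
ungrouped = [s for s in candidates if re.fullmatch(r"foo|eed", s)]
print(grouped)
print(ungrouped)
```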

5. {N} matches the preceding item exactly N times.

go\{2\} = {goo}

go\{5\} = {gooooo}

(Here the braces are backslashed as plain grep requires; with grep -E the same patterns are written go{2} and go{5}.)
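Python's `re`, used here as a stand-in (assumption), writes intervals like grep -E, without backslashes. The last check shows why a closing character such as the final g in go\{2,5\}g is useful: in a substring search, go{2} also finds the "goo" inside "gooo".

```python
import re

# Python's re stands in for grep -E here (assumption), so the braces are not escaped.
candidates = ["go", "goo", "gooo", "gooooo"]
exactly_two = [s for s in candidates if re.fullmatch(r"go{2}", s)]
exactly_five = [s for s in candidates if re.fullmatch(r"go{5}", s)]
# A grep-style substring search still succeeds on "gooo" because it contains "goo".
substring_hit = bool(re.search(r"go{2}", "gooo"))
print(exactly_two, exactly_five, substring_hit)
```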

6. {N,} matches the preceding item N or more times.

go\{2,\} = {goo, gooo, goooo, …}
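An open-ended interval, again with Python's `re` standing in for grep -E (assumption); there is no space after the comma.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["go", "goo", "gooo", "goooo"]
# go{2,}: g followed by two or more o's.
at_least_two = [s for s in candidates if re.fullmatch(r"go{2,}", s)]
print(at_least_two)
```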

7. {N,M} matches the preceding item at least N times, but not more than M times.

go\{2,5\}g = {goog, gooog, goooog, gooooog}
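The bounded interval, checked with Python's `re` as a stand-in for grep -E (assumption, hence the unescaped braces). The closing g pins down the count of o's.

```python
import re

# Python's re stands in for grep -E here (assumption).
candidates = ["gog", "goog", "gooog", "goooog", "gooooog", "goooooog"]
# go{2,5}g: g, between two and five o's, then g.
bounded = [s for s in candidates if re.fullmatch(r"go{2,5}g", s)]
print(bounded)
```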

Exercises

1. What does grep -n '^[^A-z] ' mean?
2. How do you find empty lines?
3. How do you find the literal string "[LUG2010]"?
4. Under /etc, find all files that contain the symbol "*", and show the lines that contain it.

References

http://linux.vbird.org/linux_basic/0330regularex.php

http://tldp.org/LDP/Bash-Beginners-Guide/html/chap_04.html

http://en.wikipedia.org/wiki/Regular_expression

http://www.regular-expressions.info/posix.html