2016 PyCon APAC - Your Photos Know What I Did in the Past.
PyCon APAC 2016 Regular Expression[A-Z]+
minji-yang
Transcript of PyCon APAC 2016 Regular Expression[A-Z]+
About the speaker
Minji Yang / Swordsman developer
Currently: Software Engineer at MATA COMPANY
DEVSISTERS, The Beatpacking Company
NEXON Python teaching assistant, Django Girls coach
Before we begin
This talk uses Python 3.
This talk alone will not make you fully understand regular expressions.
Contents
Why Regex?
Simple examples x 3
The re module
Practice problem and performance tips
Other useful things
Why regex?
An expression used to describe a set of strings that follow a specific rule.
Convenient for searching and replacing strings.
100312467 “Why So Lonely” “wondergirls” 3014725 20160306 2016-03-20T12:00:35+09:00
-> “Why So Lonely” “wondergirls” 2016-03-20T12:00
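The transformation above can be sketched with a single pattern. This is a minimal sketch, not the speaker's code: the pattern and the variable names are assumptions, and the field layout is only inferred from the one sample line. It keeps the two quoted fields and the timestamp down to the minute.

```python
import re

# Sample line copied from the slide (it uses curly quotes).
line = ('100312467 “Why So Lonely” “wondergirls” 3014725 20160306 '
        '2016-03-20T12:00:35+09:00')

# Assumed layout: number, two quoted fields, two numbers, ISO 8601 timestamp.
pattern = re.compile(
    r'^\d+ (“[^”]*”) (“[^”]*”) \d+ \d+ '
    r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}\+\d{2}:\d{2}$'
)

match = pattern.match(line)
result = ' '.join(match.groups())
print(result)  # “Why So Lonely” “wondergirls” 2016-03-20T12:00
```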
/^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/
WHAAAAT?
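The pattern above matches a dotted IPv4 address. One way to make it readable is to name the repeated piece and use re.VERBOSE, which lets the pattern carry whitespace and comments. This is a sketch: the names OCTET and IPV4 are made up for illustration, and the surrounding / delimiters from the slide are dropped because Python does not use them.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (leading zeros allowed).
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

IPV4 = re.compile(r'''
    ^
    (?:%s\.){3}   # three octets, each followed by a dot
    %s            # the final octet
    $
''' % (OCTET, OCTET), re.VERBOSE)

print(bool(IPV4.match('192.168.0.1')))  # True
print(bool(IPV4.match('256.1.1.1')))    # False
```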
How to learn regex?
At first it looks complicated and unreadable, so it feels hard.
But regular expressions are not as hard as they seem.
Simple examples x 3
Example 1
Matching a mobile phone number
010-3333-7777
\d{3}-\d{4}-\d{4}
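A quick way to try this pattern in Python is re.fullmatch, which succeeds only when the whole string fits the pattern. The name PHONE is chosen here for illustration.

```python
import re

PHONE = re.compile(r'\d{3}-\d{4}-\d{4}')

print(bool(PHONE.fullmatch('010-3333-7777')))  # True
print(bool(PHONE.fullmatch('010-333-7777')))   # False: middle group has 3 digits
```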
Example 2
Getting the host name from a website URL
http://www.google.com/?q=pycon
http:\/\/([^/]*)\/\?q=pycon
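Applied in Python, the capture group in the pattern above holds the host name. This sketch uses the slide's pattern unchanged. The backslashes before / are harmless in Python but not required, since / has no special meaning there.

```python
import re

url = 'http://www.google.com/?q=pycon'

# ([^/]*) captures everything between "http://" and the next slash.
match = re.search(r'http:\/\/([^/]*)\/\?q=pycon', url)
host = match.group(1)
print(host)  # www.google.com
```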
The re module
re module
In Python, regular expressions are handled with the re module.
import re
re.search(pattern, string)
re module
>>> re.search('abcd', 'abcdef')
<_sre.SRE_Match object; span=(0, 4), match='abcd'>
>>> print(re.search('zxc', 'abcdef'))
None
The examples x 3, revisited
re.sub()
import re
phone = '010-1234-5678'
re.sub(r'(\d{3}-\d{4}-)(\d{4})', r'\1****', phone)
>>> '010-1234-****'
re.match()
import re
link = 'http://www.google.com/?q=pycon'
match = re.match(r'(http:\/\/)([^/]*)(.*)', link)
match.group(2)
>>> 'www.google.com'
re.search()
import re
email = '[email protected]'
match = re.search('^[^@]*', email)
match.group()
>>> 'minji'
match vs search
import re
sample = '2016pycon'
re.match('[a-z]+', sample)
>>> None
re.search('[a-z]+', sample)
>>> <_sre.SRE_Match object; span=(4, 9), match='pycon'>
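re module
re.search(pattern, string, flags=0)
= finds the first place in string where pattern matches
re.match(pattern, string, flags=0)
= checks whether pattern matches at the beginning of string
re.findall(pattern, string, flags=0)
= finds every non-overlapping match of pattern in the whole string and returns them as a list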
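The three functions can be compared side by side on one input. This is a small sketch; the sample string and variable names are made up for illustration.

```python
import re

sample = '2016pycon apac'

first = re.search(r'[a-z]+', sample)        # first match anywhere in the string
at_start = re.match(r'[a-z]+', sample)      # only tries at position 0
all_words = re.findall(r'[a-z]+', sample)   # every non-overlapping match

print(first.group())  # pycon
print(at_start)       # None
print(all_words)      # ['pycon', 'apac']
```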
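Character classes
. matches any character except a newline
\d matches any digit, [0-9]
\D matches any non-digit character, [^0-9]
\w matches a letter, digit, or underscore, [a-zA-Z0-9_] (in Python, digits are included)
\W matches anything that is not a letter, digit, or underscore, [^a-zA-Z0-9_]
\s matches a whitespace character
\S matches a non-whitespace character
In Python 3, \d, \w, and \s on str patterns also match Unicode digits, letters, and whitespace; the bracket forms above are the ASCII equivalents (re.ASCII).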
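The classes above can be checked with re.findall. This is a sketch; the sample text is made up so that each class has something to match.

```python
import re

text = 'PyCon APAC_2016\t#1\n'

digits = re.findall(r'\d+', text)    # runs of digits
words = re.findall(r'\w+', text)     # runs of letters, digits, underscore
spaces = re.findall(r'\s', text)     # each whitespace character
non_word = re.findall(r'\W', text)   # everything \w does not match
dots = re.findall(r'.', text)        # every character except the newline

print(digits)  # ['2016', '1']
print(words)   # ['PyCon', 'APAC_2016', '1']

# On str patterns \w is Unicode-aware unless re.ASCII is given.
print(re.findall(r'\w+', '파이콘 2016'))            # ['파이콘', '2016']
print(re.findall(r'\w+', '파이콘 2016', re.ASCII))  # ['2016']
```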
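Anchors and Repetition
^abc$ ^ matches the start of the string, $ matches the end
* repeats 0 or more times
+ repeats 1 or more times
? 0 or 1 time
{x} repeats exactly x times (e.g. {3})
{x,y} repeats from x to y times
[abc] any one character from the set
[^abc] any character other than a, b, or c
[a-d] any one character in the range a to d (a, b, c, or d)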
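Each entry in the list above can be tried on a short string. This is a sketch with made-up inputs; the variable names are only for illustration.

```python
import re

anchored = re.search(r'^abc$', 'abc')       # whole string must be "abc"
not_anchored = re.search(r'^abc$', 'xabc')  # fails: string starts with "x"

star = re.findall(r'ab*', 'a ab abbb')      # b repeated 0 or more times
plus = re.findall(r'ab+', 'a ab abbb')      # b repeated 1 or more times
optional = re.findall(r'ab?', 'a ab abbb')  # b 0 or 1 time
exact = re.findall(r'\d{3}', '010-3333')    # exactly three digits
ranged = re.findall(r'b{2,3}', 'b bb bbb')  # two to three b's
in_set = re.findall(r'[a-d]', 'decaf')      # characters in a..d
not_in_set = re.findall(r'[^abc]', 'cabbage')

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab', 'ab']
```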
Let's solve a practice problem
<html op="news"><head><meta name="referrer" content="origin"><meta name="viewport" content="width=device-width, initial-scale=1.0"><link rel="stylesheet" type="text/css" href="news.css?8h9C3zM9d2ErvunVTkjK">
<link rel="shortcut icon" href="favicon.ico">
<link rel="alternate" type="application/rss+xml" title="RSS" href="rss">
<title>Hacker News</title></head><body><center><table id="hnmain" border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
<tr><td bgcolor="#ff6600"><table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px"><tr><td style="width:18px;padding-right:4px"><a href="http://www.ycombinator.com"><img src="y18.gif" width="18" height="18" style="border:1px white solid;"></a></td>
<td style="line-height:12pt; height:10px;"><span class="pagetop"><b class="hnname"><a href="news">Hacker News</a></b>
link: https://bugzilla.mozilla.org/show_bug.cgi?id=1173199#c31 title: “Our primary goal is to un-fork the Tor Browser”
link: http://siliconangle.com/blog/2016/08/05/watson-correctly-diagnoses-woman-after-doctors-were-stumped/ title: IBM Watson correctly diagnoses a form of leukemia
link: http://gping.io title: Show HN: Gping.io – Like TinyURL for your car
link: http://bit-player.org/2016/the-39th-root-of-92 title: The 39th Root of 92
link: http://www.sciencealert.com/we-just-got-even-weirder-results-about-the-alien-megastructure-star title: Tabby's star is dimming at an incredible rate
The output we want
Try coding it without regex
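A sketch of what this exercise might look like, first with str.find only and then with one pattern. The HTML fragment below is hypothetical: the `storylink` class and the row layout are assumptions made for illustration and may not match the real Hacker News markup. The function and constant names are made up as well.

```python
import re

# Hypothetical fragment; the real page markup may differ.
html = (
    '<td class="title"><a href="http://gping.io" class="storylink">'
    'Show HN: Gping.io – Like TinyURL for your car</a></td>\n'
    '<td class="title"><a href="http://bit-player.org/2016/the-39th-root-of-92" '
    'class="storylink">The 39th Root of 92</a></td>\n'
)

HREF_OPEN = '<a href="'
CLASS_TAIL = '" class="storylink">'


def extract_without_regex(page):
    """Collect (link, title) pairs using only str.find and slicing."""
    results = []
    pos = 0
    while True:
        start = page.find(HREF_OPEN, pos)
        if start == -1:
            return results
        link_start = start + len(HREF_OPEN)
        link_end = page.find('"', link_start)
        if link_end == -1:
            return results
        if not page.startswith(CLASS_TAIL, link_end):
            pos = link_end  # some other anchor; keep scanning
            continue
        title_start = link_end + len(CLASS_TAIL)
        title_end = page.find('</a>', title_start)
        if title_end == -1:
            return results
        results.append((page[link_start:link_end], page[title_start:title_end]))
        pos = title_end


STORY = re.compile(r'<a href="([^"]+)" class="storylink">([^<]+)</a>')


def extract_with_regex(page):
    """Collect the same pairs with a single pattern."""
    return STORY.findall(page)


for link, title in extract_with_regex(html):
    print('link: {} title: {}'.format(link, title))
```

Both functions return the same pairs for this fragment. The regex version is much shorter, which is the convenience the talk is pointing at.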
re.DOTALL ??
data = '<title>\nPYCON APAC 2016\n\nRegular Expressions\n\n</title>\n'
re.search('<title>(.*)</title>', data).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
re.search('<title>(.*)</title>', data, re.DOTALL).group(1)
'\nPYCON APAC 2016\n\nRegular Expressions\n\n'
By default . does not match a newline, so the first search finds nothing and returns None; re.DOTALL makes . match newlines as well.
re.compile
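re.compile turns a pattern into a reusable pattern object, which is handy when the same pattern is applied many times. This is a minimal sketch; the name TITLE and the sample pages are made up for illustration. The re module also keeps an internal cache of recently compiled patterns, so the gain is often more about readability than raw speed.

```python
import re

# Compile once, with flags attached to the pattern object.
TITLE = re.compile(r'<title>(.*)</title>', re.DOTALL)

pages = [
    '<title>\nPYCON APAC 2016\n</title>',
    '<title>Hacker News</title>',
]

# Reuse the compiled pattern for every page.
titles = [TITLE.search(page).group(1).strip() for page in pages]
print(titles)  # ['PYCON APAC 2016', 'Hacker News']
```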
Other useful things
Vim: Find and Replace
:%s/old/new/g
http://vimregex.com/
1033303 -> 1233303, 1033213 -> 1233213
:%s/103\(\d\{4}\)/123\1/g
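The same substitution can be written with re.sub in Python. As a sketch: Python's syntax needs no backslashes before the group parentheses or the repetition braces, unlike Vim's default syntax.

```python
import re

numbers = ['1033303', '1033213']

# 103 followed by four digits becomes 123 followed by the same four digits.
replaced = [re.sub(r'103(\d{4})', r'123\1', n) for n in numbers]
print(replaced)  # ['1233303', '1233213']
```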
str.find vs re.match vs in
http://stackoverflow.com/questions/4901523/whats-a-faster-operation-re-match-search-or-str-find
str.find: 0.441393852234
re.match: 2.12302494049
in: 0.251421928406
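A comparison like this can be reproduced with timeit. This is a rough sketch, not the benchmark from the linked question: the haystack, the needle, and the repeat count are assumptions. It uses re.search rather than re.match so that all three approaches look for the substring anywhere in the string. Absolute numbers depend on the machine and the Python version.

```python
import re
import timeit

# Hypothetical inputs chosen for illustration.
haystack = 'x' * 1000 + 'pycon'
needle = 'pycon'


def with_find():
    return haystack.find(needle) != -1


def with_in():
    return needle in haystack


def with_search():
    return re.search(needle, haystack) is not None


for fn in (with_find, with_search, with_in):
    elapsed = timeit.timeit(fn, number=10000)
    print('{:<12} {:.4f}s'.format(fn.__name__, elapsed))
```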
WHAAAAT?
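Performance
Regular expressions are slower than plain string operations such as str.find and in
But they are convenient to write
For performance-critical code, regex may not be the answer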
print("Thank You")