Discussion on Chinese Domain Name technology including encoding, testing.
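Clean 8 bits & UTF-8 problems
The escape-code (“\”) rule must be clear. Ex. 成功 → 成功\ (in Big5 the trailing byte of 功 is 0x5C, the same code as “\”).
Other special characters in the UNIX shell. Ex. 教育 contains a byte equal to “|”; quoting it as “教育” works.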
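The trailing-byte collision behind both examples can be checked with a short Python sketch. It relies only on Python's built-in `big5` codec; the function name `ascii_lookalike_bytes` is hypothetical. It reports every multi-byte character whose encoded form contains a byte below 0x80, which an 8-bit-unaware tool such as a shell would read as an ASCII character.

```python
def ascii_lookalike_bytes(text: str, encoding: str = "big5") -> list:
    """Return (character, ASCII look-alike) pairs for every multi-byte
    character whose encoded form contains a byte below 0x80."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) < 2:
            continue  # single-byte (ASCII) characters are not the problem
        for b in encoded:
            if b < 0x80:
                # An 8-bit-unaware tool (e.g. a shell) sees this byte as ASCII.
                hits.append((ch, chr(b)))
    return hits


print(ascii_lookalike_bytes("成功"))
print(ascii_lookalike_bytes("教育"))
```

For 成功 the report should include 功 paired with “\”, since 功 is encoded as 0xA5 0x5C in Big5.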
Clean 8 bits & UTF-8 problems (continued)
Windows 9X
http://user:account@…/
Ex. http://統一企業 produces an error (in Big5 the trailing byte of 一 is 0x40, the same code as the “@” that separates user:account from the host).
“\” is inserted automatically in DNS, but not in DHCP. Ex. ping 成功\大學
Windows 2K
UTF-8 in the resolver (ping, ftp)
Clean 8 bits in nslookup
Double encoding in IE5 and the resolver
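Transcoding between Windows client & server
Encoding of the domain name as it reaches the DNS server:

| IE browser UTF-8 option | Windows 9X DNS client | Windows 2000 DNS client (auto converts to UTF-8) |
|---|---|---|
| ON (default) | UTF-8 | UTF-8^2 (double encoded) |
| OFF | Big-5 | UTF-8 |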
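The four cells of the table can be modelled with a small Python sketch. This is a hypothetical model, not Microsoft's code: it assumes the Windows 2000 resolver treats its input as the local Big-5 code page and converts it to UTF-8, and that bytes invalid in Big-5 become U+FFFD; the function names `win2000_resolver` and `bytes_at_dns_server` are illustrative only.

```python
def win2000_resolver(data: bytes, codepage: str = "big5") -> bytes:
    """Hypothetical model: assume `data` is in the local code page and
    convert it to UTF-8. Invalid bytes become U+FFFD (an assumption)."""
    return data.decode(codepage, errors="replace").encode("utf-8")


def bytes_at_dns_server(name: str, os_name: str, ie_sends_utf8: bool) -> bytes:
    # IE step: UTF-8 when the option is ON, otherwise the local Big-5 code page.
    sent = name.encode("utf-8" if ie_sends_utf8 else "big5")
    if os_name == "win9x":
        return sent  # passed through unchanged
    if os_name == "win2000":
        return win2000_resolver(sent)  # converted a second time by the resolver
    raise ValueError(f"unknown OS: {os_name}")


for os_name in ("win9x", "win2000"):
    for ie_on in (True, False):
        print(os_name, ie_on, bytes_at_dns_server("成功", os_name, ie_on).hex())
```

Only the Windows 2000 / ON combination converts bytes that are already UTF-8, which is the double-encoded cell.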
Suggestion
For a sub-domain label that mixes Chinese and alphanumeric characters: if the label contains any 8-bit character, the whole label is case-sensitive.
For example: www.A王.tw and wWw.A王.TW are the same.
For example: www.a王.tw and www.A王.tw are different.
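The suggested comparison rule can be sketched in Python as a per-label comparison key. The names `label_key` and `same_domain` are hypothetical, and "8-bit character" is taken here to mean any non-ASCII character.

```python
def label_key(label: str) -> str:
    """Comparison key for one DNS label under the suggested rule:
    pure-ASCII labels are case-insensitive, while any 8-bit character
    makes the whole label case-sensitive."""
    return label.lower() if label.isascii() else label


def same_domain(a: str, b: str) -> bool:
    labels_a, labels_b = a.split("."), b.split(".")
    return len(labels_a) == len(labels_b) and all(
        label_key(x) == label_key(y) for x, y in zip(labels_a, labels_b)
    )


print(same_domain("www.A王.tw", "wWw.A王.TW"))  # True
print(same_domain("www.a王.tw", "www.A王.tw"))  # False
```

"www" and "TW" are folded because they are pure ASCII, while "A王" is compared exactly.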
Multi-lingual
Problems of multi-byte characters vs. single-byte characters
Multi-lingual text uses multi-byte characters
Problem (1)
Some bytes of a multi-byte character have the same code as a single-byte ASCII character, and some intermediate processing software (e.g. BIND, sendmail, web proxies) cannot tell them apart. This is especially a problem for control characters (“\”, “@”, “|”, …).
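A minimal Python sketch of how an intermediate tool goes wrong: `naive_host` is a hypothetical byte-oriented URL parser that takes everything after the last `@` as the host. Given the Big5 bytes of 統一企業, it splits inside the Chinese name because the trailing byte of 一 is 0x40.

```python
def naive_host(url: bytes) -> bytes:
    """Hypothetical byte-oriented URL parser that is not multi-byte aware:
    anything after the last b"@" is taken as the host (user:account@host)."""
    rest = url.split(b"://", 1)[1]
    _userinfo, _sep, host = rest.rpartition(b"@")
    return host


url = "http://統一企業/".encode("big5")
print("一".encode("big5"))  # the trailing byte is 0x40, i.e. "@"
print(naive_host(url))      # the host is cut inside the Chinese name
```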
Solutions
Solution 1: Escape each multi-byte character as \nnn\nnn.
Solution 2: Non-ASCII code transformation: UTF-8.
Solution 3: Transform all characters to pure ASCII code: UTF-7, UTF-5.
Solution 4: Clear byte stuffing, with the escape-code rule “\\”, “\@”.
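Solutions 1 and 4 can be combined in one Python sketch, with quick checks for Solutions 2 and 3. Assumptions: `\nnn` is read as a three-digit decimal byte value (as in the `\DDD` form of DNS master files), the set of stuffed characters in `SPECIAL` is illustrative, and `escape_label` is a hypothetical name. UTF-5 has no built-in Python codec, so only UTF-7 is shown.

```python
SPECIAL = set("\\@|.")  # assumed set of single-byte characters to stuff (Solution 4)


def escape_label(label: str, encoding: str = "big5") -> str:
    """Solutions 1 + 4 (sketch): each byte of a multi-byte character becomes
    a decimal \\nnn escape; special single-byte characters get a "\\" prefix."""
    out = []
    for ch in label:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            out.append("".join(f"\\{b:03d}" for b in encoded))
        elif ch in SPECIAL:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


print(escape_label("成功"))  # Solution 1: only ASCII bytes remain
print(escape_label("a@b"))   # Solution 4: a\@b
# Solution 2: in UTF-8 every byte of a Chinese character has the 8th bit set.
print(all(b >= 0x80 for b in "成功".encode("utf-8")))
# Solution 3: UTF-7 output is pure ASCII.
print("成功".encode("utf-7"))
```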
Problem (2)
Purely alphanumeric domain names are case-insensitive, but multi-byte characters are case-sensitive.
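Why multi-byte characters must be treated as case-sensitive can be shown in a few lines of Python. The bytes 0xA4 0x41 are one Big5 character whose trail byte equals ASCII "A"; ASCII-only case folding, as an 8-bit-unaware server might apply, turns it into a different character.

```python
raw = bytes([0xA4, 0x41])  # one Big-5 character whose trail byte equals ASCII "A"
folded = raw.lower()       # bytes.lower() folds only ASCII A-Z, byte by byte

print(raw.decode("big5"), folded.decode("big5"))  # two different characters
```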
Solutions
Solution 1: Convert alphanumeric characters to lower (or upper) case first. (client: iDNS UTF5)
Solution 2: Transform all multi-byte characters to UTF-8, so every byte of a multi-byte character has the 8th bit set (a "negative" byte) and is easily recognized. (Win 2K DNS server)
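A minimal Python sketch of Solutions 1 and 2 combined, assuming a label is normalized by transforming it to UTF-8 and then folding ASCII case; `normalize_label` is a hypothetical name. Because every byte of a multi-byte UTF-8 sequence is at least 0x80, ASCII case folding can no longer corrupt a Chinese character.

```python
def normalize_label(label: str) -> bytes:
    """Solutions 1 + 2 (sketch): transform to UTF-8, then fold ASCII case.
    bytes.lower() only changes 0x41-0x5A, and every byte of a multi-byte
    UTF-8 sequence is >= 0x80, so Chinese characters pass through untouched."""
    return label.encode("utf-8").lower()


print(normalize_label("A王") == normalize_label("a王"))  # True
print(normalize_label("A王").decode("utf-8"))           # a王
```

Note the trade-off: under this approach "A王" and "a王" match, whereas Solution 3 keeps them distinct.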
Solutions
Solution 3: If a sub-domain label contains at least one multi-byte character, then that label is case-sensitive. (BIND server)
For example: in www.A王.tw, “A王” is case-sensitive.
Why solution 3 is needed
8-bit clean handling is possible.
Leading-byte encodings (BIG5, GB, JIS, …) are in wide use.
Compression ratio and conversion efficiency?
It is an intermediate stage toward UNICODE.
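The compression-ratio point can be checked with a quick Python comparison of encoded lengths; `encoded_lengths` is a hypothetical helper. For a typical four-character name, Big5 needs 8 bytes while UTF-8 needs 12.

```python
def encoded_lengths(text: str) -> dict:
    """Byte length of `text` in a leading-byte encoding vs. Unicode transforms."""
    return {enc: len(text.encode(enc)) for enc in ("big5", "utf-8", "utf-7")}


print(encoded_lengths("成功大學"))
```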