小虫玩電腦: Automatic File Encoding Detection

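When we get a text file and don't know its encoding, it may display incorrectly or be converted incorrectly. The solutions I had seen before were not written in Java. I happened to come across some today, so I'm noting them down in case I need them later: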

jchardet: http://jchardet.sourceforge.net/ It appears to be a Java port of the Mozilla Charset Detector algorithm.
cpdetector: http://cpdetector.sourceforge.net/ Reportedly more reliable than jchardet, though it sometimes throws odd exceptions.
ROME's XML charset encoding detection: http://wiki.java.net/bin/view/Javawsxml/Rome05CharsetEncoding
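
To make the problem concrete, here is a minimal sketch that uses only the JDK and none of the libraries listed above. It is not the algorithm jchardet or cpdetector use (those rely on statistical analysis). It is a much simpler technique: check for a byte order mark, then try a list of candidate charsets with a strict decoder and keep the first one that decodes without error. The class name `NaiveCharsetGuess` and the method `guess` are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class NaiveCharsetGuess {

    /**
     * Hypothetical helper: returns the name of the first candidate charset
     * that decodes the bytes without error, or null if none does.
     * A byte order mark (BOM), if present, wins over the candidate list.
     */
    static String guess(byte[] data, String... candidates) {
        // 1. BOM check for UTF-8 and UTF-16.
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }

        // 2. Strict trial decoding: REPORT makes the decoder throw on bad
        //    input instead of silently substituting replacement characters.
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // this JVM does not ship the charset
            }
            CharsetDecoder decoder = Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return name; // decoded cleanly
            } catch (CharacterCodingException e) {
                // not valid in this charset, try the next candidate
            }
        }
        return null;
    }
}
```

The order of the candidates matters. Strict charsets such as UTF-8 should come first, because single-byte charsets such as ISO-8859-1 accept any byte sequence and would always "match". For Traditional Chinese files one might call `guess(bytes, "UTF-8", "Big5")`. A clean decode only shows that the bytes are valid in that charset, not that the charset is the right one, which is why the libraries above go further.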

The libraries listed above are Java implementations. For other languages, see the reference links that follow.

References:
http://blog.linux.org.tw/~jserv/archives/001672.html
http://www.arachna.com/roller/spidaman/entry/character_set_encoding_detection_in
http://william.cswiz.org/blog/archives/2008-10-28/charset-patch-part2/
http://fredeaker.blogspot.com/2007/01/character-encoding-detection.html

2008-12-04