Perl 5 version 32.0 documentation



Encode::CN - China-based Chinese Encodings


  use Encode qw/encode decode/;
  $euc_cn = encode("euc-cn", $utf8);   # loads Encode::CN implicitly
  $utf8   = decode("euc-cn", $euc_cn); # ditto


This module implements China-based Chinese charset encodings. Encodings supported are as follows.

  Canonical       Alias                          Description
  --------------------------------------------------------------------
  euc-cn          /\beuc.*cn$/i                  EUC (Extended Unix Character)
                  /\bcn.*euc$/i
                  /\bGB[-_ ]?2312(?:\D.*$|$)/i   (see below)
  gb2312-raw                                     The raw (low-bit) GB2312 character map
  gb12345-raw                                    Traditional Chinese counterpart to
                                                 GB2312 (raw)
  iso-ir-165                                     GB2312 + GB6345 + GB8565 + additions
  MacChineseSimp                                 GB2312 + Apple Additions
  cp936                                          Code Page 936, also known as GBK
                                                 (Extended GuoBiao)
  hz                                             7-bit escaped GB2312 encoding
  --------------------------------------------------------------------

For details on how to use this module, see Encode.
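The alias patterns in the table can be checked with find_encoding from Encode. The following is a minimal sketch; the sample alias spellings are illustrative choices made here, not names taken from this document.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Sample spellings chosen to exercise each alias pattern for euc-cn.
my @aliases = qw(euc-cn EUC_CN cn-euc GB2312 gb_2312-80);

my %resolved;
for my $alias (@aliases) {
    my $enc = find_encoding($alias);          # undef if nothing matches
    $resolved{$alias} = $enc ? $enc->name : '(unknown)';
    printf "%-12s => %s\n", $alias, $resolved{$alias};
}
```

Each spelling should report the canonical name euc-cn, since find_encoding resolves aliases before loading Encode::CN on demand.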


Due to size concerns, GB 18030 (an extension to GBK) is distributed separately on CPAN, under the name Encode::HanExtra. That module also contains extra Taiwan-based encodings.
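Because Encode::HanExtra is not part of the core distribution, code that wants GB 18030 may need to check for it first. The sketch below assumes the encoding is registered under the name gb18030 once Encode::HanExtra is loaded, and falls back to cp936 (GBK) otherwise; the fallback policy is an illustrative choice, not a recommendation from this document.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Try to load the optional CPAN module; do not die if it is absent.
my $have_hanextra = eval { require Encode::HanExtra; 1 } ? 1 : 0;

# Assumed encoding name "gb18030"; cp936 (GBK) is the bundled fallback.
my $encoding = $have_hanextra ? "gb18030" : "cp936";

my $octets = encode($encoding, "\x{4E2D}\x{6587}");   # two CJK ideographs
printf "encoded with %s: %s\n", $encoding, unpack("H*", $octets);
```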


When charset=gb2312 appears in mail and on web pages, it really means the euc-cn encoding. To accommodate that usage, gb2312 is aliased to euc-cn. Use gb2312-raw when you really mean the raw GB2312 character map.
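The difference between the alias and the raw map shows up in the encoded bytes. A minimal sketch, using U+4E2D as an arbitrary sample character:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{4E2D}";   # sample CJK ideograph

my $via_alias = unpack "H*", encode("gb2312",     $char);  # alias of euc-cn
my $via_euc   = unpack "H*", encode("euc-cn",     $char);
my $via_raw   = unpack "H*", encode("gb2312-raw", $char);  # raw low-bit map

print "gb2312     : $via_alias\n";
print "euc-cn     : $via_euc\n";
print "gb2312-raw : $via_raw\n";
```

The first two lines should be identical, with the high bit set on each byte, while gb2312-raw yields the same code point with the high bits clear.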

The ASCII region (0x00-0x7f) is preserved for all encodings, even though this conflicts with mappings by the Unicode Consortium.
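A quick way to observe this is to round-trip the whole ASCII range. The sketch below checks only euc-cn and cp936; the other encodings are left out here as a simplification. In particular, hz uses the tilde as its escape character, so its output for that one character is not a plain one-byte copy.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# All 128 ASCII characters as one string.
my $ascii = join '', map { chr } 0x00 .. 0x7f;

my %preserved;
for my $name (qw(euc-cn cp936)) {
    my $octets = encode($name, $ascii);
    my $back   = decode($name, $octets);
    # Preserved means: bytes equal the code points, and decoding restores them.
    $preserved{$name} = ($octets eq $ascii && $back eq $ascii) ? 1 : 0;
    printf "%-8s ASCII %s\n", $name, $preserved{$name} ? "preserved" : "altered";
}
```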