Lingua::KO::Hangul::Util - utility functions for Hangul in Unicode
    use Lingua::KO::Hangul::Util qw(:all);

    decomposeSyllable("\x{AC00}");          # "\x{1100}\x{1161}"
    composeSyllable("\x{1100}\x{1161}");    # "\x{AC00}"
    decomposeJamo("\x{1101}");              # "\x{1100}\x{1100}"
    composeJamo("\x{1100}\x{1100}");        # "\x{1101}"

    getHangulName(0xAC00);                  # "HANGUL SYLLABLE GA"
    parseHangulName("HANGUL SYLLABLE GA");  # 0xAC00
A Hangul syllable consists of Hangul jamo (Hangul letters).
Hangul letters are classified into three classes:
    CHOSEONG (the initial sound) as a leading consonant (L),
    JUNGSEONG (the medial sound) as a vowel (V),
    JONGSEONG (the final sound) as a trailing consonant (T).
Any Hangul syllable is a composition of (i) L + V, or (ii) L + V + T.
$resultant_string = decomposeSyllable($string)
It decomposes a precomposed syllable (LV or LVT) to a sequence of conjoining jamo (L + V or L + V + T) and returns the result as a string.
Any characters other than Hangul syllables are not affected.
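The decomposition described above follows the standard Unicode Hangul syllable arithmetic. The following is a minimal pure-Perl sketch of that arithmetic, not the module's own code; the names sketch_decompose_syllable and _decompose_one are hypothetical.

```perl
use strict;
use warnings;

# Constants of the Unicode Hangul syllable arithmetic.
my $SBASE  = 0xAC00;   # first precomposed syllable
my $LBASE  = 0x1100;   # first leading consonant
my $VBASE  = 0x1161;   # first vowel
my $TBASE  = 0x11A7;   # one before the first trailing consonant
my $NCOUNT = 21 * 28;  # 21 vowels times 28 trailing slots per leading consonant
my $TCOUNT = 28;

# Hypothetical helper: code point of one syllable to a string of jamo.
sub _decompose_one {
    my ($cp) = @_;
    my $sindex = $cp - $SBASE;
    my @jamo = (
        $LBASE + int($sindex / $NCOUNT),
        $VBASE + int(($sindex % $NCOUNT) / $TCOUNT),
    );
    my $tindex = $sindex % $TCOUNT;
    push @jamo, $TBASE + $tindex if $tindex;   # no T for an LV syllable
    return join '', map { chr($_) } @jamo;
}

# Hypothetical counterpart of decomposeSyllable: other characters pass through.
sub sketch_decompose_syllable {
    my ($string) = @_;
    $string =~ s/([\x{AC00}-\x{D7A3}])/_decompose_one(ord($1))/ge;
    return $string;
}

# Show the result as code points to avoid wide-character output.
my $out = sketch_decompose_syllable("\x{AE00}!");
print join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $out)), "\n";
# prints "U+1100 U+1173 U+11AF U+0021"
```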
$resultant_string = composeSyllable($string)
It composes a sequence of conjoining jamo (L + V or L + V + T) to a precomposed syllable (LV or LVT) if possible, and returns the result as a string. A syllable LV and final jamo T are also composed.
Any characters other than Hangul jamo and syllables are not affected.
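The two composition steps (L + V to LV, then LV + T to LVT) can be sketched with two substitutions. This is an illustration of the standard arithmetic under the assumption that only the modern jamo ranges (L U+1100..U+1112, V U+1161..U+1175, T U+11A8..U+11C2) are composable; sketch_compose_syllable is a hypothetical name, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical counterpart of composeSyllable.
sub sketch_compose_syllable {
    my ($string) = @_;

    # Step 1: a leading consonant followed by a vowel becomes an LV syllable.
    $string =~ s{([\x{1100}-\x{1112}])([\x{1161}-\x{1175}])}{
        chr(0xAC00 + ((ord($1) - 0x1100) * 21 + (ord($2) - 0x1161)) * 28)
    }ge;

    # Step 2: an LV syllable followed by a trailing consonant becomes LVT.
    # A syllable that already has a T (index not divisible by 28) is left alone.
    $string =~ s{([\x{AC00}-\x{D7A3}])([\x{11A8}-\x{11C2}])}{
        (ord($1) - 0xAC00) % 28 == 0
            ? chr(ord($1) + ord($2) - 0x11A7)
            : $1 . $2
    }ge;

    return $string;
}

my $out = sketch_compose_syllable("\x{1100}\x{1173}\x{11AF}");
printf "U+%04X\n", ord($out);   # prints "U+AE00"
```

Because step 2 also matches syllables that were already precomposed in the input, a syllable LV followed by a final jamo T is composed as well.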
$resultant_string = decomposeJamo($string)
It decomposes a complex jamo to a sequence of simple jamo if possible, and returns the result as a string. Any characters other than complex jamo are not affected.
e.g.

    CHOSEONG SIOS-PIEUP to CHOSEONG SIOS + PIEUP
    JUNGSEONG AE to JUNGSEONG A + I
    JUNGSEONG WE to JUNGSEONG U + EO + I
    JONGSEONG SSANGSIOS to JONGSEONG SIOS + SIOS
$resultant_string = composeJamo($string)
It composes a sequence of simple jamo (L1 + L2, V1 + V2 + V3, etc.) to a complex jamo if possible, and returns the result as a string. Any characters other than simple jamo are not affected.
e.g.

    CHOSEONG SIOS + PIEUP to CHOSEONG SIOS-PIEUP
    JUNGSEONG A + I to JUNGSEONG AE
    JUNGSEONG U + EO + I to JUNGSEONG WE
    JONGSEONG SIOS + SIOS to JONGSEONG SSANGSIOS
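Jamo decomposition and composition are table lookups rather than arithmetic. The sketch below illustrates the idea with a deliberately tiny table that holds only the examples quoted in this document; the real mapping is much larger, and the names %SIMPLE_FOR, %COMPLEX_FOR, sketch_decompose_jamo and sketch_compose_jamo are hypothetical.

```perl
use strict;
use warnings;

# Illustrative table only: complex jamo to simple jamo sequence.
my %SIMPLE_FOR = (
    "\x{1101}" => "\x{1100}\x{1100}",          # CHOSEONG SSANGKIYEOK
    "\x{1132}" => "\x{1109}\x{1107}",          # CHOSEONG SIOS-PIEUP
    "\x{1162}" => "\x{1161}\x{1175}",          # JUNGSEONG AE
    "\x{1170}" => "\x{116E}\x{1165}\x{1175}",  # JUNGSEONG WE
    "\x{11BB}" => "\x{11BA}\x{11BA}",          # JONGSEONG SSANGSIOS
);
my %COMPLEX_FOR = reverse %SIMPLE_FOR;

# Replace each complex jamo found in the table; leave everything else alone.
sub sketch_decompose_jamo {
    my ($string) = @_;
    $string =~ s/(.)/exists $SIMPLE_FOR{$1} ? $SIMPLE_FOR{$1} : $1/ges;
    return $string;
}

# Replace each known simple sequence, trying three-jamo sequences before
# two-jamo sequences.
sub sketch_compose_jamo {
    my ($string) = @_;
    my $alternation = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } keys %COMPLEX_FOR;
    $string =~ s/($alternation)/$COMPLEX_FOR{$1}/g;
    return $string;
}

my $out = sketch_decompose_jamo("\x{1170}");
print join(' ', map { sprintf('%04X', ord($_)) } split(//, $out)), "\n";
# prints "116E 1165 1175"
```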
$resultant_string = decomposeFull($string)
It decomposes a syllable/complex jamo to a sequence of simple jamo. Equivalent to decomposeJamo(decomposeSyllable($string)).
$string_decomposed = decomposeHangul($code_point)
@codepoints = decomposeHangul($code_point)
If the specified code point is a Hangul syllable, it returns the decomposition as a list of code points (in list context) or as a string (in scalar context).
    decomposeHangul(0xAC00) # U+AC00 is HANGUL SYLLABLE GA.
        returns "\x{1100}\x{1161}" or (0x1100, 0x1161);

    decomposeHangul(0xAE00) # U+AE00 is HANGUL SYLLABLE GEUL.
        returns "\x{1100}\x{1173}\x{11AF}" or (0x1100, 0x1173, 0x11AF);
Otherwise, returns false (empty string or empty list).
    decomposeHangul(0x0041) # outside Hangul syllables
        returns empty string or empty list.
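The context-dependent return value can be reproduced with Perl's wantarray. The following sketch assumes the standard syllable arithmetic and uses the hypothetical name sketch_decompose_hangul; it is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical counterpart of decomposeHangul, taking a code point.
sub sketch_decompose_hangul {
    my ($code_point) = @_;

    # Outside U+AC00..U+D7A3: false in either context.
    return wantarray ? () : ''
        if $code_point < 0xAC00 || $code_point > 0xD7A3;

    my $sindex = $code_point - 0xAC00;
    my @parts  = (
        0x1100 + int($sindex / 588),          # L (588 = 21 vowels * 28 slots)
        0x1161 + int(($sindex % 588) / 28),   # V
    );
    my $tindex = $sindex % 28;
    push @parts, 0x11A7 + $tindex if $tindex; # T, only for LVT syllables

    # List of code points in list context, string in scalar context.
    return wantarray ? @parts : join('', map { chr($_) } @parts);
}

my @cps = sketch_decompose_hangul(0xAE00);
print join(' ', map { sprintf('%04X', $_) } @cps), "\n";   # prints "1100 1173 11AF"
```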
$string_composed = composeHangul($src_string)
@code_points_composed = composeHangul($src_string)
Any sequence of an initial jamo L and a medial jamo V is composed to a syllable LV; then any sequence of a syllable LV and a final jamo T is composed to a syllable LVT.
composeHangul("\x{1100}\x{1173}\x{11AF}.") # returns "\x{AE00}." or (0xAE00,0x2E);
$code_point_composite = getHangulComposite($code_point_here, $code_point_next)
It returns the code point of the composite if the two code points, $code_point_here and $code_point_next, are both Hangul and composable.
Otherwise, returns undef.
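A pairwise composite lookup of this kind can be sketched as follows, assuming that only L + V and LV + T pairs in the modern jamo ranges are composable. The name sketch_hangul_composite is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical counterpart of getHangulComposite, working on code points.
sub sketch_hangul_composite {
    my ($here, $next) = @_;

    # L + V gives an LV syllable.
    if (   $here >= 0x1100 && $here <= 0x1112
        && $next >= 0x1161 && $next <= 0x1175) {
        return 0xAC00 + (($here - 0x1100) * 21 + ($next - 0x1161)) * 28;
    }

    # LV + T gives an LVT syllable; LV syllables sit at multiples of 28.
    if (   $here >= 0xAC00 && $here <= 0xD7A3
        && ($here - 0xAC00) % 28 == 0
        && $next >= 0x11A8 && $next <= 0x11C2) {
        return $here + ($next - 0x11A7);
    }

    return undef;   # not composable
}

printf "%04X\n", sketch_hangul_composite(0x1100, 0x1161);   # prints "AC00"
```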
The following functions handle only a precomposed Hangul syllable (from U+AC00 to U+D7A3), but not a Hangul jamo or other Hangul-related character.
Names of Hangul syllables have a format of "HANGUL SYLLABLE %s".
$name = getHangulName($code_point)
If the specified code point is a Hangul syllable, it returns its name; otherwise it returns undef.
    getHangulName(0xAC00) returns "HANGUL SYLLABLE GA";
    getHangulName(0x0041) returns undef.
$codepoint = parseHangulName($name)
If the specified name is the name of a Hangul syllable, it returns its code point; otherwise it returns undef.
    parseHangulName("HANGUL SYLLABLE GEUL") returns 0xAE00;

    parseHangulName("LATIN SMALL LETTER A") returns undef;

    parseHangulName("HANGUL SYLLABLE PERL") returns undef;
    # Regrettably, HANGUL SYLLABLE PERL does not exist :-)
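Hangul syllable names are derived by concatenating the Unicode short names of the L, V and T components. The sketch below shows one way to generate and parse them, assuming the short-name tables from the Unicode Jamo.txt data file; the function names and the reverse-lookup hash are hypothetical and not the module's implementation.

```perl
use strict;
use warnings;

# Jamo short names in index order (19 leading, 21 vowel, 28 trailing slots).
my @L = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @V = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @T = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                M B BS S SS NG J C K T P H));

# Hypothetical counterpart of getHangulName.
sub sketch_hangul_name {
    my ($cp) = @_;
    return undef if $cp < 0xAC00 || $cp > 0xD7A3;
    my $s = $cp - 0xAC00;
    return 'HANGUL SYLLABLE '
        . $L[ int($s / 588) ] . $V[ int(($s % 588) / 28) ] . $T[ $s % 28 ];
}

# Reverse lookup built once from all 11172 syllables.
my %CODE_FOR;
$CODE_FOR{ sketch_hangul_name($_) } = $_ for 0xAC00 .. 0xD7A3;

# Hypothetical counterpart of parseHangulName; unknown names give undef.
sub sketch_parse_hangul_name {
    my ($name) = @_;
    return $CODE_FOR{$name};
}

print sketch_hangul_name(0xAE00), "\n";   # prints "HANGUL SYLLABLE GEUL"
```

A precomputed hash keeps parsing trivial at the cost of memory; parsing the name with the three tables directly would avoid the table but needs care with ambiguous prefixes such as G and GG.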
A standard Korean syllable block consists of L+ V+ T* (a sequence of one or more L, one or more V, and zero or more T), according to the conjoining jamo behavior revised in Unicode 3.2 (cf. UAX #28). A sequence of L followed by T, with no V between them, is not a single syllable block; it consists of two nonstandard syllable blocks: one without V, and another without L and V.
$bool = isStandardForm($string)
It returns a boolean indicating whether the string is encoded in the standard form: true only if the string contains no nonstandard sequence.
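A check of this kind can be sketched with a regular expression for the L+ V+ T* pattern. The sketch assumes the input contains conjoining jamo only (precomposed syllables would have to be decomposed first) and takes the L, V and T ranges from the list of supported characters in this document, counting the fillers as L and V; sketch_is_standard_form is a hypothetical name.

```perl
use strict;
use warnings;

# Jamo classes (assumption: ranges as listed for this module, fillers included).
my $L = qr/[\x{1100}-\x{1159}\x{115F}]/;
my $V = qr/[\x{1160}-\x{11A2}]/;
my $T = qr/[\x{11A8}-\x{11F9}]/;

# Hypothetical counterpart of isStandardForm for jamo-only input.
sub sketch_is_standard_form {
    my ($string) = @_;
    # Drop every standard block; any jamo left over belongs to a
    # nonstandard block.
    (my $rest = $string) =~ s/$L+$V+$T*//g;
    return $rest =~ /$L|$V|$T/ ? 0 : 1;
}

print sketch_is_standard_form("\x{1100}\x{1161}\x{11A8}"), "\n";   # prints "1"
print sketch_is_standard_form("\x{1100}\x{11A8}"), "\n";           # prints "0"
```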
$resultant_string = insertFiller($string)
It transforms the string into standard form by inserting fillers into each nonstandard syllable block, and returns the result as a string. A choseong filler (Lf, U+115F) is inserted into a syllable block without L. A jungseong filler (Vf, U+1160) is inserted into a syllable block without V.
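The filler rules can be sketched as a single substitution over runs of jamo. This is an illustration of the documented behavior under the assumption that the input contains conjoining jamo only; the block splitting follows the rule that L followed by T forms two nonstandard blocks. The name sketch_insert_filler is hypothetical.

```perl
use strict;
use warnings;

my $L  = qr/[\x{1100}-\x{1159}\x{115F}]/;
my $V  = qr/[\x{1160}-\x{11A2}]/;
my $T  = qr/[\x{11A8}-\x{11F9}]/;
my $LF = "\x{115F}";   # choseong filler (Lf)
my $VF = "\x{1160}";   # jungseong filler (Vf)

# Hypothetical counterpart of insertFiller for jamo-only input.
# Alternatives, in order:
#   ($1)($2)  L run, optionally followed by V run and T run
#   ($3)      V run (plus T run) with no L before it
#   ($4)      T run with neither L nor V before it
sub sketch_insert_filler {
    my ($string) = @_;
    $string =~ s{($L+)($V+$T*)?|($V+$T*)|($T+)}{
          defined $3 ? $LF . $3
        : defined $4 ? $LF . $VF . $4
        : defined $2 ? $1 . $2
        :              $1 . $VF
    }ge;
    return $string;
}

my $out = sketch_insert_filler("\x{1100}\x{11A8}");
print join(' ', map { sprintf('%04X', ord($_)) } split(//, $out)), "\n";
# prints "1100 1160 115F 1160 11A8"
```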
$type = getSyllableType($code_point)
It returns the Hangul syllable type (cf. HangulSyllableType.txt) for the specified code point as a string: "L" for leading jamo, "V" for vowel jamo, "T" for trailing jamo, "LV" for LV syllables, "LVT" for LVT syllables, and "NA" for other code points (as Not Applicable).
By default:
decomposeHangul composeHangul getHangulName parseHangulName getHangulComposite
On request:
decomposeSyllable composeSyllable decomposeJamo composeJamo decomposeFull isStandardForm insertFiller getSyllableType
This module does not support Hangul jamo assigned in Unicode 5.2.0 (2009).
A list of Hangul characters this module supports:
    1100..1159 ; 1.1 #    [90] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEORINHIEUH
    115F..11A2 ; 1.1 #    [68] HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSANGARAEA
    11A8..11F9 ; 1.1 #    [82] HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YEORINHIEUH
    AC00..D7A3 ; 2.0 # [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
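Combined with the type names returned by getSyllableType, the ranges above can be sketched as a simple classifier. The sketch assumes that U+115F (choseong filler) counts as "L" and U+1160 (jungseong filler) as "V", and that an LV syllable is one whose offset from U+AC00 is a multiple of 28; sketch_syllable_type is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical counterpart of getSyllableType, limited to the ranges above.
sub sketch_syllable_type {
    my ($cp) = @_;
    return 'L' if ($cp >= 0x1100 && $cp <= 0x1159) || $cp == 0x115F;
    return 'V' if $cp >= 0x1160 && $cp <= 0x11A2;
    return 'T' if $cp >= 0x11A8 && $cp <= 0x11F9;
    if ($cp >= 0xAC00 && $cp <= 0xD7A3) {
        my $tindex = ($cp - 0xAC00) % 28;   # 0 means no trailing consonant
        return $tindex == 0 ? 'LV' : 'LVT';
    }
    return 'NA';
}

print join(' ', map { sketch_syllable_type($_) } 0x1100, 0x1161, 0x11A8,
                                                  0xAC00, 0xAE00, 0x0041), "\n";
# prints "L V T LV LVT NA"
```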
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2001, 2003, 2005, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
http://www.unicode.org/reports/tr15/
http://www.unicode.org/reports/tr28/#3_11_conjoining_jamo_behavior
http://www.unicode.org/Public/UNIDATA/HangulSyllableType.txt
http://www.unicode.org/Public/2.1-Update3/UnicodeData-2.1.8.txt
Paper by K. KIM: New canonical decomposition and composition processes for Hangeul
http://std.dkuug.dk/JTC1/SC22/WG20/docs/N954.PDF
(summary: http://std.dkuug.dk/JTC1/SC22/WG20/docs/N953.PDF) (cf. http://std.dkuug.dk/JTC1/SC22/WG20/docs/documents.html)