Unicode::Tussle - Tom's Unicode Scripts So Life is Easier
This is a collection of separate program. See the documentation for each program.
These programs are installed wherever you told the build system to install programs. These are part of Tom's Perl Unicode Essentials talk from OSCON 2011. You might have to adjust your fonts to see some of the characters in these examples. George Douros’s Symbola font, which handles Unicode 6.0, is a good one: http://users.teilar.gr/~g1951d/
HOLY TRIO OF INDISPENSABLE TOOLS FOR UNDERSTANDING THE UCD & UNICODE IN GENERAL unichars - show which code points match arbitrary criteria uniprops - show which props a code point has (by number or name, etc) uninames - intelligrep the now-excised NameList.txt (included) REWRITES OF CRITICAL UNIX PROGRAMS: uniquote - replacement for od(1) or -v option to cat(1), but for Unicode tcgrep - very ancient grep(1) replacment, needs rewrite but now supports named character unilook - look(1) rewrite but with grep and agrep support; requires included words.utf8 file ucsort - sort(1) rewrite using the UCA, includes Unicode locales, and intelligent --pre stuff unifmt - fmt(1) rewrite, using the ULA; both smarter and dumber than Damian's rename - ancient rewrite of Larry's old rename(1) rewrite; might help Unicode filesyssues uniwc - wc(1) rewrite for Unicode, includes \R support, graphemes, etc; needs refactoring PROGRAMS FOR NORMALIZATION FILTERS, CHECKER nfd, nfc, nfkd, nfkc - Unicode normalization filters nfcheck - report which which of NF{,K}[DC} apply to any given file % nfcheck leo hantest nunez tc macroman leo: NFC NFD hantest: NFC nunez: NFC NFKC tc: NFC NFKC NFD NFKD (RE)CASING FILTER PROGRAMS: lc - filter to do the Unicode toLower casemapping % echo "Filter to Convert a Title's Words to the Right Case" | lc filter to convert a title's words to the right case tc - filter to do the Unicode toTitle casemapping (intelligently) % echo "filter to convert a title's words to the right case" | tc Filter To Convert A Title's Words To The Right Case titulate - \u\L-converts string args to English **HEADLINE** case (NB: headline != titlecase) % titulate "filter to CONVERT a title's words to the right case" Filter to Convert a Title's Words to the Right Case uc - filter to do the Unicode toUpper casemapping % echo "filter to convert a title's words to the right case" | uc FILTER TO CONVERT A TITLE'S WORDS TO THE RIGHT CASE FONT GAME PROGRAMS: leo - uʍopəpᴉsdn sƃuᴉɥʇ əʇᴉɹʍ oʇ ɹəʇlᴉɟ unifont - filter for showing all Unicode "alternate font" letters % echo hic sunt data unicodica | unifont Double-Struck: 𝕙𝕚𝕔 𝕤𝕦𝕟𝕥 𝕕𝕒𝕥𝕒 𝕦𝕟𝕚𝕔𝕠𝕕𝕚𝕔𝕒 Monospace: 𝚑𝚒𝚌 𝚜𝚞𝚗𝚝 𝚍𝚊𝚝𝚊 𝚞𝚗𝚒𝚌𝚘𝚍𝚒𝚌𝚊 Sans-Serif: 𝗁𝗂𝖼 𝗌𝗎𝗇𝗍 𝖽𝖺𝗍𝖺 𝗎𝗇𝗂𝖼𝗈𝖽𝗂𝖼𝖺 Sans-Serif Italic: 𝘩𝘪𝘤 𝘴𝘶𝘯𝘵 𝘥𝘢𝘵𝘢 𝘶𝘯𝘪𝘤𝘰𝘥𝘪𝘤𝘢 Sans-Serif Bold: 𝗵𝗶𝗰 𝘀𝘂𝗻𝘁 𝗱𝗮𝘁𝗮 𝘂𝗻𝗶𝗰𝗼𝗱𝗶𝗰𝗮 Sans-Serif Bold Italic: 𝙝𝙞𝙘 𝙨𝙪𝙣𝙩 𝙙𝙖𝙩𝙖 𝙪𝙣𝙞𝙘𝙤𝙙𝙞𝙘𝙖 Script: 𝒽𝒾𝒸 𝓈𝓊𝓃𝓉 𝒹𝒶𝓉𝒶 𝓊𝓃𝒾𝒸ℴ𝒹𝒾𝒸𝒶 Italic: h𝑖𝑐 𝑠𝑢𝑛𝑡 𝑑𝑎𝑡𝑎 𝑢𝑛𝑖𝑐𝑜𝑑𝑖𝑐𝑎 Bold: 𝐡𝐢𝐜 𝐬𝐮𝐧𝐭 𝐝𝐚𝐭𝐚 𝐮𝐧𝐢𝐜𝐨𝐝𝐢𝐜𝐚 Bold Italic: 𝒉𝒊𝒄 𝒔𝒖𝒏𝒕 𝒅𝒂𝒕𝒂 𝒖𝒏𝒊𝒄𝒐𝒅𝒊𝒄𝒂 Fraktur: 𝔥𝔦𝔠 𝔰𝔲𝔫𝔱 𝔡𝔞𝔱𝔞 𝔲𝔫𝔦𝔠𝔬𝔡𝔦𝔠𝔞 Bold Fraktur: 𝖍𝖎𝖈 𝖘𝖚𝖓𝖙 𝖉𝖆𝖙𝖆 𝖚𝖓𝖎𝖈𝖔𝖉𝖎𝖈𝖆 unicaps - Fɪʟᴛᴇʀ ᴛᴏ ᴄᴏɴᴠᴇʀᴛ ᴛᴏ sᴍᴀʟʟ ᴄᴀᴘs unisubs, unisupers - filter to show subscripted₁₉₈₇ and ˢᵘᵖᵉʳˢᶜʳⁱᵖᵗᵉᵈ versions unititle - prototype to over/underline things (real version inprogress) uniwide, uninarrow - reversable filters for converting to FULLWIDTH equivs TEST AND DEMO PROGRAMS: macroman - show mapping between MacRoman and Unicode byte2uni - early prototype of general-purpose version of the macroman DEMO: byte2uni -a -ecp1252 es-sort - how to do fancy UCA sorts, using Iberian city-names hantest - demo of Unihan stuff and Unicode::{LineBreak, GCString} havshpx - vs lbh unir gb nfx, lbh qb abg jnag gb xabj hypertest - demo support trans-Unicode code point support nunez - demo accent-insensitive searches; ¡MUY BIEN COMENTADO! vowel-sigs - show how to create your own properties; also, regex subroutines MODULES ForbidUnderscore.pm - "no Underscore;" forbids unlocalized $_ access FixString.pm - tries to sort text items with numbers, including Roman, intelligently, includes support for Unicode Romans, and for Romans written in Latin script, but requires Roman.pm module for the latter. Falls back to the UCA. tchrist-unicode-charclasses__alpha.java - EGAD! I talked them into making most of this functionality part of JDK7. LIBRARIES: unicore/{all,html,uwords}_alias.pl - a forgotten charnames facility FILES: words.utf8 - dictionary list needed for for unilook
Modules: FixString.pm - program & module to do "logical" sorting w/numbers ForbidUnderscore.pm - forbid unlocalized $_ with no Underscore Libraries: unicore/html_alias.pl - allows for customer charclass names \N{egrave} etc unicore/uwords_alias.pl - ditto with specials for unilook, like \N{spu} unicore/all_alias.pl - both the above Programs for probing the UCD: unichars - list characters for one or more properties uniprops - list regex properties of one or more characters uninames - search the current Unicode NamesList Encoding Demos macroman - how the MacRoman encoding maps to Unicode byte2uni - generalized `macroman` program; try `byteuni -a -cp1252` Unix Tool Rewrites (not fmt but) unifmt - like `fmt` but uses the Unicode Linebreaking Algorithm (ULA) (not grep but) tcgrep - like `grep`, but groks unicode patterns and data (not look but) unilook - improved `look` + `grep` + `agrep` on included `words.utf8` (not mv but) rename - a better version of rename, takes a perl pattern (not od but) uniquote - like `cat -v` or `od`, but better (not sort but) ucsort - `sort` input records according to the Unicode Collation Algorithm (UCA) (not wc but) uniwc - Unicode rewrite a `wc` (needs nonslurpy rewrite) Casing Filters lc - filter into Unicode lowercase uc - filter into Unicode uppercase tc - filter into Unicode titlecase+lowercase titulate - like tc but used English headline rulers Normalization Filters nfc - convert to NFD nfd - convert to NFD nfkc - convert to NFKC nfkd - convert to NFKD nfcheck - report which NF forms file(s) are in Font Games unifont - display equivalent Math/fonted versions leo - write like Leonardo unicaps - convert lowercase to Unicode small caps unisubs - show equivalent subscripts where available unisupers - show equivalent supercripts where available uniwide - convert regular text to full-width if possible uninarrow - convert full-wdith to regular width if possible unititle - prototype to use combining underlines unifrac - create a fraction with super and subscripts Demos and Test Programs es-sort - demo how to use a custom UCA on Spanish cities hantest - demo various Unihan bits, including the ULA havshpx - you have to figure this one out yourself hypertest - demo forbidden Unicode chars, like supers and hypers vowel-sigs - get the CVCVVC signatures for word nunez - demo how to use the UCA for accent-insensitive searching
A few use Unicode not just in literals, but in identifiers, too:
% tcgrep '^\h*(use|require)\h+(charnames|re|Unicode|Encode|DBM_Filter|feature|open|Lingua|utf8|v?5|encoding|bytes|locale)\b' * hypertest: my @ὑπέρμεγας = ( leo: my $ʇndʇno = uʍopəpᴉƨdn($input); leo: say $ʇndʇno; leo: sub uʍopəpᴉƨdn($) { nunez: my $INCLUÍR_NINGUNOS = 0; nunez: my $SI_IMPORTAN_MARCAS_DIACRÍTICAS = 0; nunez: sub sí_ó_no(_) { $_[0] ? "sí" : "no" } nunez: my @ciudades_españolas = ordenar_a_la_española(<<'LA_ÚLTIMA' =~ /\S.*\S/g); nunez: my $cmáx = -(2 + max map { length } @ciudades_españolas); nunez: my @búsquedas = < {A,E,I,O,U}N AL >; nunez: my $bmáx = -(2 + max map { length } @búsquedas); nunez: for my $aldea (@ciudades_españolas) { nunez: my $déjà_imprimée; # Mais oui! C’est en français celle‐ci! nunez: for my $búsqueda (@búsquedas) { nunez: my @resultados = $ordenador->gmatch($aldea, $búsqueda); nunez: next unless @resultados || $INCLUÍR_NINGUNOS; nunez: $cmáx => !$déjà_imprimée++ && encomillar($aldea), nunez: $bmáx => "/$búsqueda/", nunez: sub cuántos_sitios { nunez: sub ordenar_a_la_española { nunez: state $ordenador_a_la_española = new Unicode::Collate:: nunez: return $ordenador_a_la_española->sort(@lista); ucsort: ($OFS, $IFS) if /\x{FFFF}/; # déjà vu uniquote: my $fh = $file; # is *so* a lexical filehandle! ☺ uniquote: sub commaʼd_list { uniwc: my $fh = $file; # is *so* a lexical filehandle! ☺
% tcgrep '^\h*(use|require)\h+(charnames|re|Unicode|Encode|DBM_Filter|feature|open|Lingua|utf8|v5|encoding|bytes|locale)\b' * FixString.pm: use v5.12; FixString.pm: use utf8; FixString.pm: use charnames qw< :full greek >; FixString.pm: use Unicode::Collate; FixString.pm: use open < :std :utf8 >; FixString.pm: require Unicode::Normalize; byte2uni: use v5.10; byte2uni: use charnames qw[ :full ]; byte2uni: use Encode ( byte2uni: require Encode; es-sort: use Unicode::Collate; hantest: use 5.10.1; hantest: use utf8; hantest: use open qw[ :utf8 :std ]; hantest: use charnames qw[ :full ]; hantest: use Encode qw[ decode ]; hantest: use Unicode::UCD qw{ hantest: use Unicode::Normalize qw[ NFC NFD ]; # UAX#15 hantest: use Unicode::Unihan; # UAX#38 hantest: use Unicode::GCString; # UAX#29 hantest: use Unicode::LineBreak qw(:all); # UAX#14-C2 hantest: use Lingua::KO::Hangul::Util qw[ :all ]; hantest: use Lingua::JA::Romanize::Japanese; hantest: use Lingua::ZH::Romanize::Pinyin; hantest: use Lingua::KO::Romanize::Hangul; havshpx: hfr puneanzrf ":shyy"; havshpx: hfr Havpbqr::Abeznyvmr; hypertest: use 5.10.0; hypertest: use utf8; lc: use v5.12; lc: use utf8; lc: use feature qw< unicode_strings >; lc: use open qw< :std :utf8 >; lc: use Unicode::Normalize qw< NFD NFC >; leo: use 5.010_000; leo: use utf8; leo: use open qw[ :std :utf8 ]; macroman: use charnames qw[ :full ]; macroman: use Encode ( nfc: use 5.10.1; nfc: use open qw[ :std IO :utf8 ]; nfc: use Unicode::Normalize; nfcheck: use open qw< :std IN :utf8 >; nfcheck: use Unicode::Normalize @NF_NAMES; nfd: use 5.10.1; nfd: use open qw[ :std IO :utf8 ]; nfd: use Unicode::Normalize; nfkc: use 5.10.1; nfkc: use open qw[ :std IO :utf8 ]; nfkc: use Unicode::Normalize; nfkd: use 5.10.1; nfkd: use open qw[ :std IO :utf8 ]; nfkd: use Unicode::Normalize; nunez: use 5.10.1; nunez: use utf8; nunez: use charnames qw< :full >; nunez: use Unicode::Collate; tc: use v5.12; tc: use utf8; tc: use feature qw< unicode_strings >; tc: use open qw< :std :utf8 >; tc: use charnames qw< :full >; tc: use Unicode::Normalize qw< NFD NFC >; tcgrep: use re "eval"; tcgrep: use charnames qw[ :full uc: use v5.12; uc: use utf8; uc: use feature qw< unicode_strings >; uc: use open qw< :std :utf8 >; uc: use Unicode::Normalize qw< NFD NFC >; ucsort: use 5.10.1; ucsort: use utf8; ucsort: use open qw[ :std IO :utf8 ]; ucsort: use Unicode::Collate; ucsort: require Unicode::Collate::Locale; unicaps: use 5.10.1; unicaps: use utf8; unichars: use 5.10.1; unichars: use charnames qw[ :full :short latin greek ]; unichars: use Encode qw[ decode ]; unichars: use Unicode::UCD qw(charinfo casefold); unichars: require Unicode::Normalize; unichars: require Unicode::Collate; unichars: require Unicode::Collate::Locale; unichars: require Unicode::UCD; unifmt: use 5.10.1; unifmt: use utf8; unifmt: use open qw[ :utf8 :std ]; unifmt: use charnames qw[ :full ]; unifmt: use Unicode::GCString; # UAX#29 unifmt: use Unicode::LineBreak qw(:all); # UAX#14-C2 unifont: use v5.12; unifont: use utf8; unifont: use open qw< :std :encoding(UTF-8) >; unifont: use charnames qw< :full >; unifont: use Unicode::Normalize; unilook: use 5.010_000; unilook: use charnames ( unilook: use Unicode::Normalize; unilook: use Encode qw( encode decode ); unilook: require Unicode::Collate; uninames: use 5.10.1; uninames: use re "eval"; uninames: use Unicode::UCD; # not exported: qw[ UnicodeVersion ]; uninarrow: use utf8; uninarrow: use open qw(:std :utf8); uniprops: use 5.10.0; # but prefer 5.12.0 uniprops: use charnames qw[ uniprops: use Encode qw[ decode ]; uniquote: use 5.10.1; uniquote: use utf8; uniquote: use charnames qw[:full]; uniquote: require Encode; unisubs: use 5.014; unisubs: use utf8; unisubs: use open qw< :std :utf8 >; unisubs: use charnames qw< :full >; unisubs: use feature qw< unicode_strings >; unisubs: use re "/msux"; unisubs: use Encode qw< encode decode >; unisubs: use Unicode::Normalize qw< NFD NFC NFKD NFKC >; unisupers: use 5.014; unisupers: use utf8; unisupers: use open qw< :std :utf8 >; unisupers: use charnames qw< :full >; unisupers: use feature qw< unicode_strings >; unisupers: use re "/msux"; unisupers: use Encode qw< encode decode >; unisupers: use Unicode::Normalize qw< NFD NFC NFKD NFKC >; unititle: use 5.010; uniwc: require Encode; uniwide: use utf8; uniwide: use open qw(:std :utf8); vowel-sigs: use v5.12; vowel-sigs: use Lingua::EN::Syllable;
There's absolutely nothing like examples.
uniprops '[' uniprops '[' '{' ')' uniprops '[' '{' '<' uniprops '[' '{' '>' uniprops ']' uniprops 08 uniprops a8 uniprops 00ff uniprops 180B uniprops 180B 303E uniprops 180B 303E FE01 uniprops 180B 303E FE01 E0101 uniprops 2026 uniprops 202e uniprops 2058 uniprops 2060 uniprops 2062 uniprops 20a8 uniprops 20e0 uniprops 2163 uniprops 2241 uniprops 2421 uniprops 2461 uniprops 26bd uniprops 2e2c uniprops 3000 uniprops fb01 uniprops feff uniprops ffef uniprops FFFD uniprops 1011c uniprops 1101c uniprops 12000 uniprops 13000 uniprops 1F42A uniprops 1F42A '$' uniprops 1F42A '$' % @ uniprops 1F4A9 uniprops 1F608 uniprops 4 uniprops -a 03 08 uniprops -a 1F42a uniprops -a 2062 uniprops -a 20e0 uniprops -a 3350 uniprops -a 3 8 uniprops -a 4 uniprops -a FFFD uniprops -a 'MATHEMATICAL BOLD FRAKTUR CAPITAL T' uniprops -ga 4 uniprops -gl | less -r uniprops HYPHEN uniprops -l uniprops 'LADY BEETLE' uniprops -l | less -r uniprops -taw75 20e0 uniprops -w75 -a 20e0
uninames ancient uninames ankh uninames arrow uninames ass uninames AT uninames atom uninames AT SIGN uninames '\bAA\b' uninames ball uninames BALL uninames balls uninames '\bALPHA\b' uninames '\band\b' uninames '\bAND\b' uninames '\bAT\b' uninames beetle uninames '\b[IJ]\b' -WITH uninames bird uninames black letter uninames BLACK LETTER uninames BLACK-LETTER uninames '\bNO\b' uninames BOLD TWO uninames book uninames brac uninames brace uninames brok uninames '\bSCRIPT\b' uninames '\bSCRIPT\b' -MATHEMATICAL uninames '\bTAU\b' uninames '\bT\b' WITH uninames bug uninames bullet uninames burro uninames '\bY\b' | tcgrep -v '^\t' | ucsort | less -r uninames '\bz\b' uninames '\bZ\b' uninames '\bz\b' | tcgrep -v '^\t' | ucsort | less -r uninames '\bZ\b' | tcgrep -v '^\t' | ucsort | less -r uninames camel uninames CAMEL uninames care uninames CARE uninames caution uninames chi uninames CHI uninames CIRCL uninames circled uninames circle k uninames circle 'one|two' uninames clown uninames colon uninames COMB 'HOOK|TAIL|CURV' uninames COMBIN uninames combin ferm uninames combin hacek uninames combining uninames COMBINING uninames COMBINING DOTS uninames combining enclosing uninames combining enclosing prohib uninames COMBIN REV SOL uninames COMB LINE uninames COMB 'MACRO|LINE' uninames commer uninames commonly abbreviated uninames coptic uninames cross uninames crying uninames CUEN | tcgrep -v '^\t' uninames cun uninames CUN uninames CUNE uninames CUNEI | tcgrep -v '^\t' uninames CUNI uninames CUNIE uninames curren uninames currenc uninames d7 uninames dagger uninames dash uninames dead uninames desert uninames destr uninames diaer uninames DIAER uninames diag uninames divis uninames does not prevent uninames does not prevent | fmt uninames dog uninames donkey uninames DOT ABOVE uninames DOT CIRC uninames dots uninames double uninames DOUBLE uninames DOUBLE HY uninames DOUBLE MATH uninames DOUBLE QUOT uninames DOUBLE STR uninames DOUBLE STR CAPITAL -MATH uninames DOUBLE STR CAPITAL -MATH | tail uninames double struc uninames DOUBLE STRUCK uninames double struct uninames DOUBL ITALI uninames earth uninames edit uninames EIGHTEEN uninames ellip uninames em uninames ENC CIR uninames EQUAL uninames EQUIV uninames evil uninames example uninames exclam uninames EYE uninames face uninames FACE uninames fair uninames farthing uninames feather uninames fem uninames fermata uninames ff lig uninames flash uninames flip uninames fl lig uninames four uninames fractu uninames FRAKT uninames fraktu uninames fraktur uninames fullwidth uninames gothic uninames GOTHIC uninames GREEK LETTER WITH uninames GREEK LETTER WITH | tcgrep -v '^\t' | ucsort | less -r uninames GREEK PHI uninames GREEK SUBSCRIPT uninames greek yp uninames GREEK YP uninames gun uninames hallo uninames hazard uninames head uninames HEAD uninames heart uninames HEART uninames HIERO uninames horse uninames hurt uninames hyphen uninames HYPHEN uninames ideo stop uninames insect uninames insters uninames INSULAR uninames INSULAR | lc uninames INSULAR | lc | tcgrep -v '^\t' uninames INSULAR | uc uninames INSULAR | uc | tcgrep -v '^\t' uninames intro uninames invis uninames invisible uninames iota sub uninames iso uninames jackol uninames jong uninames lake uninames left bracket uninames left single quot uninames LESS uninames LESS THAN uninames LIG uninames liga ff uninames liga gg uninames ligat fi uninames ligature uninames ligature -arabic uninames light uninames magic uninames mah jong uninames mah jong | tcgrep -v '^\t' uninames male uninames MATH CAPIT FRAK uninames MATH DIGIT uninames MATHE uninames MATHEM uninames MATHEM '\bA\b' uninames MATHEM CAPITA '\bA\b' uninames MATHEM CAPITA '\bA\b' | grep font uninames MATHEM CAPITA '\bA\b' | grep -v font uninames MATH FRACTU BOLD uninames MATH SCRIPT CAPITAL uninames -MATH SCRIPT E uninames MATH SCRIPT E uninames MATH SCRIPT SMALL uninames -MATH -SUB -SUPER SCRIPT uninames -MATH -SUB -SUPER SCRIPT E uninames 'MODIFIER|(?i:superscript)' uninames MODIFIER -LETTER uninames moon uninames mountain uninames multi uninames music uninames music -combin uninames music -combin | tcgrep -v '^\t' uninames MUSIC SHARP uninames NL uninames no one under uninames numeral uninames oasis uninames one uninames one way uninames ordina uninames pain uninames pen uninames people uninames person uninames PG uninames phi uninames PILE POO uninames '\pL' '\p{Latin}' | less -r uninames plum uninames PLUS uninames poo uninames POO uninames power uninames pumpkin uninames punct uninames quill uninames quot uninames radio uninames right bracket uninames right left uninames RIGHT LEFT uninames roman uninames ROMAN NUM uninames roman numeral uninames round uninames rx uninames Rx uninames SAME uninames santa uninames script uninames SCRIPT uninames set uninames SET uninames sex uninames Sigma uninames skull uninames slash uninames soccer uninames space uninames SPACE uninames spanish uninames sphere uninames square uninames st uninames start uninames st lig uninames stlig uninames subscript -SUBSCRIPT uninames sun uninames SUN uninames 'SU(PER|B)SCRIPT|MODIFIER' '\b[AT]' uninames 'SU(PER|B)SCRIPT|MODIFIER LETTER' '\b[AT]' uninames superscript uninames switch uninames SYM DEL uninames teste uninames testi uninames thin uninames tilde uninames times uninames tongue uninames traffic uninames TWO DOT LEAD uninames VARIATION uninames VARIATION | grep -c VARIATION uninames wand uninames warn uninames WIDTH uninames -WITH '\bAND\b' uninames WITH BAR uninames WITH SLASH uninames WITH STROKE uninames WITH STROKE '\b[BROKENNESS]\b' uninames wiz uninames writ uninames wrong uninames yuan uninames zero uninames ZERO
unichars is the most important and useful program, so here are 861 of them, ucsorted, of course.
unichars -aBbs '\p{Age=6}' unichars -aBbs '\p{Age=6}' '\P{Miscellaneous_Symbols_And_Pictographs}' > /tmp/u6 unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' > /tmp/na unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 - > /tmp/ua unichars -ac '/\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua unichars -ac 'checkFCC' | less -r unichars -ac 'checkFCC(NFD)' unichars -ac 'checkFCC NFD' | less -r unichars -ac 'checkFCC(NFD)' | less -r unichars -ac 'checkFCD' | less -r unichars -ac 'checkNFD' unichars -ac '! checkNFD' | less unichars -ac 'checkNFD' | less unichars -ac '! checkNFD' | less -r unichars -ac 'Comp_Ex' | less -r unichars -ac 'Exclusion' | less -r unichars -ac 'isExclusion' | less -r unichars -ac 'isSingleton' unichars -ac 'isSingleton()' unichars -ac 'isSingleton()' | less -r unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' > /tmp/n1 unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --level=1 --upper-before-lower --preprocess='s/..\K.*//' > /tmp/u2 unichars -ac 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l unichars -ac 'NFC_NO' | less -r unichars -ac 'NFD_NO' | less -r unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -ac 'NFD =~ /\pM/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r unichars -ac 'NFD =~ /\pM/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less -r unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^(?:\p{Grapheme_Base}\p{Grapheme_Extend}*|\p{Grapheme_Extend})\z/' | less -r unichars -ac 'NFD =~ /^\PM\pM*\z/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*\z/' | less -r unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -ac 'NFD =~ /^\X$/ && NFD !~ /^\PM\pM*\z/' | less -r unichars -ac 'NFD =~ /^\X$/ && NFD =~ /^\PM\pM*\z/' | less -r unichars -ac 'NFKC_NO' | less -r unichars -ac 'NonStDecomp' | less -r unichars -ac 'not checkFCC' | less -r unichars -ac 'not checkFCD' | less -r unichars -ac 'not checkNFD' | less unichars -ac 'ord>0xffff && /\p{Latin}/' unichars -ac '\p{cased}' '\PL' | less unichars -ac '\p{cased}' '\P{upper}' | less unichars -ac '\p{cased}' '\P{upper}' '\P{Lower}' | less unichars -ac '\p{Greek}' | less unichars -ac '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --normalization=NFKD > /tmp/uk unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LETTER)\b.*\b[ADP]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/\pL/ && '[\p{Latin}\p{Common}]' && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/\pL/ && [\p{Latin}\p{Common}] && NAME =~ /\b(MATHEMATICAL|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' > /tmp/uc unichars -ac '/[\pM\pL]/ && NAME =~ /\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua unichars -ac '/[\pM\pL]/ && NAME =~ /\b(MATHEMATICAL|COMBINING|MODIFIER|LETTER)\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/ua unichars -ac '\p{Upper}' 'NAME !~ /CAPITAL/' unichars -acsbBCnf '\p{Cased}' '[^\p{Ll}\p{Lu}]' unichars -acsbBCnf '/\p{CWCF}/ != /p{CWCM}/' unichars -acsbBCnf '/\p{CWCF}/ != \p{CWCM}' unichars -a -dgfs '\p{Cased}' '\PL' unichars -a -gfs '\p{Cased}' '\PL' unichars -a -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]' unichars -ags 'length NFKD > 5' unichars -a -gs 'length(uc) > 1' unichars -a -gs 'length(ucfirst) > 1' | wc -l unichars -agsn NUM unichars -ags '\p{lowercase}' '\P{Ll}' unichars -ags '\p{lowercase}' '\P{Ll}' | wc -l unichars -ags '\p{uppercase}' '\P{Lu}' unichars -ags '\p{uppercase}' '\P{Lu}' | wc -l unichars -a 'NAME =~ /BALL/' unichars -a 'NAME =~ /EARTH GLOBE/' unichars -anc 'NUM && (10*NUM) !~ /0/' unichars -anc 'UCA =~ UCA("d")' unichars -a 'NFKD =~ /\[/' unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}' unichars -a -ngfs 'ord > 0xFFFF' '\p{Cased}' '\PL' unichars -a -ngfs '\p{Cased}' '\PL' unichars -a 'ord > 0xffff' 'NAME =~ /FACE/' unichars -a '\p{Age:6.0}' '\P{Numeric_Value=NaN}' unichars -a '\P{Alnum}' '\w' | wc -l unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}' unichars -a '\P{Bidi_Class=NSM}' '\p{Mn}' | wc -l unichars -a '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l unichars -a '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l unichars -a '\p{Cased}' '\p{Lm}' | wc -l unichars -a '\p{Cased}' '\PL' | wc -l unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}' unichars -a '\p{InMiscellaneousSymbolsAnd_Pictographs}' > /tmp/emoji unichars -a '\p{IsThai}' '\P{InThai}' unichars -a '\P{IsThai}' '\p{InThai}' unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d unichars -a '\p{Latin}' '\w' | wc -l unichars -a '\p{Lower}' '\P{CWU}' | wc -l unichars -a '\PL' '\p{Alphabetic}' | wc -l unichars -a '\p{Nchar}' unichars -a '\pN' '\W' | wc -l unichars -a '\p{Other_Alphabetic}' '\PM' | less unichars -a '\p{Other_Alphabetic}' '\PM' | M unichars -a '[\p{Pf}\p{Pi}]' unichars -a '[\p{Pi}]' unichars -a '\p{Po}' unichars -a '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l unichars -a '\p{Upper}' '\P{CWL}' | wc -l unichars -a 'UCA1 eq UCA1("a")' unichars -a 'UCA1 eq UCA1("a")' | cat -n unichars -a 'UCA1 eq UCA1("a")' | less unichars -a 'UCA1 eq UCA1("d")' | cat -n unichars -a 'UCA1 eq UCA1("e")' | cat -n unichars -a 'UCA1 eq UCA1("g")' | cat -n unichars -a 'UCA1 eq UCA1("m")' | cat -n unichars -a 'UCA1 eq UCA1("p")' | cat -n unichars -a 'UCA eq UCA("d")' unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort unichars -a 'UCA eq UCA("d")' > /tmp/d unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r unichars -a '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l unichars -a '\w' '\PM' 'ord > 0xffff' '\PN' | less unichars -a '\w' '\PM' '\PL' unichars -a '\w' '\PM' '\PL' '\PN' | less unichars -Bbs '\p{Age=6}' unichars -Bbs '\p{Age=6}'o unichars -BCgsa '[\p{CCC=224}\p{CCC=226}' unichars -BCgsa '[\p{CCC=224}\p{CCC=226}]' unichars -BCgsa '[\p{CCC=Left}\p{CCC=Right}]' unichars -BCgsa '\p{Mn}' unichars --bmp --smp 'UCA1 eq UCA1("d")' | cat -n >> /tmp/lets unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n unichars --bmp --smp 'UCA1 eq UCA1("e")' | cat -n >> /tmp/lets unichars --bmp --smp 'UCA1(NFKD) eq UCA1("d")' | cat -n >> /tmp/lets2 unichars --bmp --smp 'UCA1(NFKD) eq UCA1("e")' | cat -n >> /tmp/lets2 unichars -bs . unichars -bs 1 unichars '\bSCRIPT\b' '[CEFHILMRego]' unichars -bs '\p{Age=6}' unichars -Bs '\p{Bidiclass:M}' unichars -Bs '\p{BidiM}' unichars -Bs '\p{BI:M}' unichars -B '\w' unichars -B /\w' unichars -c unichars -c '\D' NUM unichars -Cgas '\pM' unichars -Cgas '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|SOLIDUS|STROKE|LINE/' | sort -t= -k4,4n -k2,2 unichars -Cgas '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 unichars -Cgsa '\p{Mc}' unichars -Cgsa '\p{Mn}' unichars -Cgs '\p{Me}' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less unichars -Cgs '\pM' 'NAME =~ /above/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' 'NAME =~ /ABOVE/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less unichars -Cgs '\pM' 'NAME =~ /SLASH/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' 'NAME =~ /TILDE/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k2,2 | less -r unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 unichars -Cgs '\pM' '[\p{Common}\p{Inherited}]' 'NAME =~ /BAR|SLASH|STROKE|LINE/' | sort -t= -k4,4n -k2,2 | less unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | grep -i TILDE unichars -Cgs '\pM' | sort -t= -k4,4n -k2,2 | less -r unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | less unichars -c 'length NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' | wc -l unichars -c 'NAME =~ /LATIN LETTER SMALL CAPITAL/' | less -r unichars -c 'NAME =~ /ORD/' unichars -c 'NFD ==1 && ! /[a-zA-Z]/ && /(?=\pL)[\p{Latin}\p{Common}]/ && NAME =~ /\b(MATHEMATICAL|LATIN|LETTER)\b.*\b[ADO]\p{Lu}?\b/' unichars -c 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -c 'NFD =~ /^\X*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -c 'NFD =~ /^\X+$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -c 'NFD =~ /^\X$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -c 'NFD =~ /^\X$/ && NFD =~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' | less unichars -c NUM unichars -c 'NUM && (10*NUM) !~ /0/' unichars -c 'NUM && 10*NUM !~ /0/' unichars -c 'ord>0xffff && /\p{Latin}/' unichars -c 'ord == 640' unichars -c '\p{Alphabetic}' unichars -c '\p{Alphabetic}' | head -1000 | tail unichars -c '\p{Alphabetic}' | head -3000 | tail unichars -c '\p{Alphabetic}' | less -r unichars -c '\p{Alphabetic}' '\pM' unichars -c '\p{Alphabetic}' '\pM' | less unichars -c '\p{Alphabetic}' '\pM' | less -r unichars -c '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r unichars -c '\p{cased}' '\PL' unichars -c '\p{cased}' '\PL' | less unichars -c '\p{Dash}' '\P{Pd}' unichars -c '\p{Greek}' unichars -c '\p{Greek}' | less unichars -c '\p{Greek}' '\p{Lower}' 'ord <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"' unichars -c '\p{Greek}' '\p{Lower}' 'ord() < ord("\N{greek:alpha}") || ord > ord "\N{omega}"' unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}") || ord >= ord "\N{omega}"' unichars -c '\p{Greek}' '\p{Lower}' 'ord() <= ord("\N{greek:alpha}")' 'ord >= ord "\N{omega}"' unichars -c '\p{IDC}' '\W' unichars -c '\p{IDC}' '\W' | cat -n unichars -c '\p{IDC}' '\W' | wc -l unichars -c '\p{InEnclosed_Alphanumerics}' unichars -c '\p{InEnclosed_Alphanumerics}' '\p{lower}' unichars -c '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu unichars -c '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s unichars -c '\p{lower}' '\P{CWU}' unichars -c '\p{lower}' '\P{CWU}' | less unichars -c '\p{lower}' '\p{Lm}' | less unichars -c '\p{lower}' '\p{Lm}' | less -r unichars -c '\p{lower}' '\p{Lm}' | | perl -pe 's/.//' | ucsort --reverse-fields | less unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less unichars -c '\p{lower}' '\p{Lm}' | perl -pe 's/.//' | ucsort --reverse-fields | less -r unichars -c '\p{Mc}' unichars -c '\pM' '\P{Diacritic}' unichars -c '\PM' '\p{Diacritic}' unichars -c '\p{No}' unichars -c '\p{No}' | head unichars -c '\p{No}' | less unichars -c '\p{No}' '\w' unichars -c '\p{No}' '\W' unichars -c '\p{No}' '\w' | head unichars -c '\p{No}' '\W' | head unichars -c '\p{No}' '\W' | less unichars -c '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/' unichars -c '\pP' '\P{QMark}' 'NAME =~ /QUOT/' unichars -c '\p{Upper}' 'NAME !~ /CAPITAL/' unichars -cs /k/i unichars -cs 'NFD \!~ /d/i' 'NFKD \!~ /d/i' 'UCA eq UCA("d")' unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/i' 'UCA eq UCA("d")' unichars -cs 'NFD \!~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")' unichars -cs 'NFD =~ /d/i' 'NFKD =~ /d/' 'UCA eq UCA("d")' unichars -cs 'NFD \!~ /o/i' 'NFKD \!~ /o/i' 'UCA eq UCA("o")' unichars -cs 'NFKD !~ /a/i' 'UCA eq UCA("a")' unichars -cs 'NFKD \!~ /a/i' 'UCA eq UCA("a")' unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("a")' unichars -cs 'NFKD \!~ /a/i' 'UCA =~ UCA("ae")' unichars -cs 'NFKD \!~ /b/i' 'UCA eq UCA("b")' unichars -cs 'NFKD \!~ /c/i' 'UCA eq UCA("c")' unichars -cs 'NFKD \!~ /d/i' 'UCA eq UCA("d")' unichars -cs 'NFKD \!~ /e/i' 'UCA eq UCA("e")' unichars -cs 'NFKD \!~ /f/i' 'UCA eq UCA("f")' unichars -cs 'NFKD \!~ /f/i' 'UCA =~ UCA("f")' unichars -cs 'NFKD \!~ /g/i' 'UCA eq UCA("g")' unichars -cs 'NFKD \!~ /h/i' 'UCA eq UCA("h")' unichars -cs 'NFKD \!~ /o/i' 'UCA eq UCA("o")' unichars -cs 'NFKD \!~ /o/i' 'UCA =~ UCA("oe")' unichars -cs '\P{ASCII}' '(lc() . uc) =~ /\p{ASCII}/' unichars -cs '\P{ASCII}' 'NFD \!~ /\p{ASCII}/' 'NFKD =~ /\p{ASCII}' unichars -cs '\P{ASCII}' 'NFD \!~ /\p{ASCII}/' 'NFKD =~ /\p{ASCII}/' unichars -cs /s/i unichars -cs 'UCA eq UCA("o")' unichars -cs 'UCA =~ UCA ( "a" ) ' unichars -cs ' 'UCA =~ UCA ( "ae" ) ' unichars -cs 'UCA =~ UCA ( "ae" ) ' unichars -c '\w' '\W' unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | head unichars --debug 'ord() < 0x100 || die' '\p{No}' '\W' | less unichars --debug '\p{No}' '\W' | head unichars --debug '\p{No}' '\W' | less unichars --debug '\p{No}' '\W' 'ord < 0xFF || die' | head unichars --debug '\p{No}' '\W' 'ord > 0xFF && die' | head unichars --debug '\p{No}' '\W' 'ord() < 0xFF || die' | head unichars 'defined(NUM) && () ~~ [1..10]' | less -r unichars 'defined(NUM) && [1..10] ~~ NUM' | less -r unichars 'defined(NUM) && [1..1] ~~ NUM' | less -r unichars 'defined(NUM) && ! (NUM() ~~ [0..10])' | less -r unichars 'defined(NUM) && ! NUM() ~~ [0..10]' | less -r unichars 'defined(NUM) && NUM <= 10' unichars 'defined(NUM) && NUM <= 10' | less unichars 'defined(NUM) && NUM <= 10' | less -r unichars 'defined(NUM) && ! NUM() ~~ [1..10]' | less -r unichars 'defined(NUM) && NUM() ~~ [1..10]' | less -r unichars '\d' '\p{common}' unichars '\d' '\p{Latin} unichars '\d' '\p{Latin}' unichars -fgas 'CF =~ "."' unichars -fgas 'CF eq "C"' unichars -fgas 'CF eq "F"' unichars -fgas 'length(uc . ucfirst . lc) != 3' unichars -fgas 'length(uc . ucfirst . lc) != length NFKD * 3' unichars -fgas 'length(uc . ucfirst . lc) != length(NFKD) * 3' unichars -fgas 'not /\U\Q$_/i' unichars -fgas '/\U$_/i' unichars -fgas '/\U\Q$_/i' unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper})' | ucsort | cat -n | less -r unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r unichars -gacs '\p{Cased}' '\P{CWCM}' | cat -n | less -r unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '^[\p{Ll}\p{Lu}]' unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}]' unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l unichars -ga '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l unichars -ga '\P{ASCII}' '\p{Common}' '\pM' unichars -ga '\P{ASCII}' '\p{Common}' '[\pP\pS]' unichars -ga '\P{ASCII}' '\p{Inherited}' '\PL' unichars -ga '\P{ASCII}' '\p{Inherited}' '\pM' unichars -ga '\P{ASCII}' '[\pP\pS]' unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' unichars -ga '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l unichars -gas 'length(uc) > 1' unichars -gas 'NFKD eq "."' unichars -gasn 'not /\d/' 'NFKD =~ /\d/' unichars -gasn 'not /\d/' 'NFKD =~ /\d/' | wc -l unichars -gasn 'not /\pN/' 'NFKD =~ /^(?=\D*$)\pN/' unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/' unichars -gasn 'not /\pN/' 'NFKD =~ /\pN/' | wc -l unichars -gasn NUM unichars -gasn 'NUM && NUM < 0' unichars -gasn 'NUM || (/\pN/ && /\p{Enclosed_Alphanumerics}/)' unichars -gasn 'NUM || /\p{pokey(tchrist)% ls -d1F uni* unichars -gasn NUM > /tmp/num unichars -gasn NUM | wc -l unichars -gasn '\pN' 'not NUM' unichars -gasn '\PN' 'NUM' unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r unichars -gas '\p{di}' unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]' unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BK}]' '\V' unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\v' unichars -gas '[\P{LB=CR}\P{LB=LF}\P{LB=NL}\P{LB=BK}]' '\V' unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BR}' unichars -gas '[\p{LB=CR}\p{LB=LF}\p{LB=NL}\p{LB=BR}]' unichars -gas '\p{LB=LF}' unichars -gas '\p{LB=NL}' unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gas '\pL' 'NAME =~ /\bSCRIPT/' unichars -gas '\PL' 'NAME =~ /\bSCRIPT/' unichars -gas '\p{Lower}' '\P{Ll}' unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less unichars -gas '\PL' 'uc =~ /\p{Upper}/' unichars -gas '\pM' unichars -gas '\p{Me}' unichars -gas '\p{Other_Lowercase}' unichars -gas '\p{Other_Lowercase}' | wc -l unichars -gas '\p{SB=AT}' unichars -gas '\p{SB=ST}' unichars -gas '\p{sc=greek}' '\P{blk=greek}' unichars -gas '\P{Upper}' '\PL' 'uc =~ /\p{Upper}/' unichars -gas '\R' unichars -gas 'UCA eq UCA("d")' unichars -gas 'UCA eq UCA("d")' 'NFKD !~ /d/i' unichars -gas 'UCA eq UCA("o")' 'NFKD !~ /o/i' unichars -gbas '\p{sc=greek}' '\P{blk=greek}' unichars -gbas '\P{sc=greek}' '\p{blk=greek}' unichars -gcas '\pM' unichars -gCas '\pM' unichars -gc '\p{Control}' unichars -gcs '\p{Cased}' '\P{CWCF}' | cat -n | less -r unichars -gcs '\p{Cased}' '\P{CWCM}' | cat -n | less -r unichars -gcs '\p{Cased}' '[^\p{CWU}\p{CWD' | cat -n | less -r unichars -gcs '\p{Cased}' '\PL' | cat -n | less -r unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.3,5n | less -r unichars -gCs '\pM' '\P{CCC=0}' | sort -k5.4,5n | less -r unichars -gCs '\pM' '\P{CCC=0}' | sort -t= -k4,4n -k1,1 | less -r unichars -gCs '\pM' | sort -k5.4,5n | less -r unichars -gCs '\pM' | sort -k5.4n | less -r unichars -gCs '\pM' | sort -k5.5n | less unichars -gCs '\pM' | sort -k5.5n | less -r unichars -gcs '\p{Titlecase}' unichars -gcs '\p{Titlecase}' | wc -l unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r unichars -gfs '\p{Cased}' unichars -gfs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Lower}]' unichars -gfs '\p{Cased}' '[^\p{Upper}\p{Title}]' unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l unichars -g '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFKD !~ /\p{ASCII}/' | wc -l unichars -g '\P{ASCII}' '\p{Common}' '[\pP\pS]' unichars -g '\p{Cased}' '\P{Alphabetic}' unichars -g '\p{Cased}' '\PL' | cat -n | less -r unichars -g '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}' unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{alpha}+$ unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{alpha}+\z/' unichars -gs '/\A\p{alpha}+\z/ and not NFD =~ /\A\p{perlword}+\z/' unichars -gs '/\A\p{alpha}+\z/ && NFD !~ /\A\p{perlword}+\z/' unichars -gs '/\A\p{alpha}+\z/ && NFKC !~ /\A\p{perlword}+\z/' unichars -gs '/\A\p{alpha}+\z/ && NFKD =~ /\W/' unichars -gs '/\A\p{alpha}+\z/ && ! /\w/' unichars -gsCB 'ord ~~ (0x345,0x37A)' unichars -gsCB 'ord ~~ [0x345,0x37A]' unichars -gsCB 'ord == 0x345 || ord == 0x37A' unichars -gs '\d' unichars -gsfB 'ord == 0x345 || ord == 0x37A' unichars -gsf '/\p{Upper}/ && /\P{CWL}/' unichars -gs 'length(lc) > 1' | wc -l unichars -gs 'length NFKD == 2' unichars -gs 'length NFKD > 4' unichars -gs 'length NFKD > 5' unichars -gs 'length NFKD > 6' unichars -gs 'length NFKD > 7' unichars -gs 'length NFL unichars -gs 'length(uc) > 1' unichars -gs 'length(uc) > 1' 'length(ucfirst) == 1' unichars -gs 'length(uc) > 1' | wc -l unichars -gs 'length(ucfirst) > 1' unichars -gs --locale=de_phonebook ''UCA1 eq UCA1 ( "ae" ) ' unichars -gs --locale=de_phonebook 'UCA1 eq UCA1 ( "ae" ) ' unichars -gs ' NFKD !~ /d/i && UCA1 eq UCA1("d")' unichars -gs 'NFKD !~ /d/i' 'UCA1 eq UCA1("d")' unichars -gs 'NFKD !~ /f/i && UCA1 eq UCA1("f")' unichars -gs 'NFKD !~ /h/i && UCA1 eq UCA1("h")' unichars -gs 'NFKD !~ /,/i' 'UCA1 eq UCA1(",")' unichars -gs 'NFKD !~ /;/i' 'UCA1 eq UCA1(";")' unichars -gs 'NFKD !~ /\?/i' 'UCA1 eq UCA1("?")' unichars -gs 'NFKD !~ /\./i' 'UCA1 eq UCA1(".")' unichars -gs 'NFKD !~ /o/i && UCA1 eq UCA1("o")' unichars -gs ' NFKD(string) !~ /d/i && UCA1 eq UCA1("d")' unichars -gs 'NFKD(string) !~ /d/i' 'UCA1 eq UCA1("d")' unichars -gsn NUM unichars -gsn 'NUM && NUM < 0' unichars -gs '\P{ASCII}' '\p{Common}' '\pP' unichars -gs '\p{Bidi_Class=NSM}' '\P{Mn}' unichars -gs '\P{Bidi_Class=NSM}' '\p{Mn}' unichars -gs '\p{bidim}' unichars -gs '[\p{bidim}\p{Ps}]' unichars -gs '[\p{bidim}\p{Ps}\p{Pe}]' unichars -gs '\p{Cased}' 'Comp_Ex()' unichars -gs '\p{Cased}' 'Exclusion()' unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL} NFKD =~ /\W/' unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gs '\p{Cased}' 'Singleton()' unichars -gs '\p{Inherited}' unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/(?=\P{Ll})\p{Lower}|/(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '\p{Lower}' unichars -gs '\p{lowercase}' '\P{Ll}' unichars -gs '\p{Lower}' '\P{CWCM}' unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r unichars -gs '\p{Lower}' '\p{CWU}' | wc -l unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r unichars -gs '\PL' '\p{Lower}' '\p{CWCF}' unichars -gs '\PL' '\p{Lower}' '\p{CWCM}' unichars -gs '\PL' '\p{Lower}' '\P{CWCM}' unichars -gs '\PL' '\p{Lower}' '\p{CWU}' unichars -gs '\pL' '\p{Lower}' '\p{CWU}' | wc -l unichars -gs '\PL' '\p{Lower}' '\p{CWU}' | wc -l unichars -gs '\pL' '\p{Lower}' '\P{Ll}' unichars -gs '\pL' '\p{Lower}' '\P{Ll}' '\p{CWU}' | wc -l unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r unichars -gs '\pS' 'NFKD !~ /\pS/' unichars -gs '[\pS\pP]' 'NFKD !~ /[\pS\pP]/' unichars -gs '\p{Symbol}' unichars -gs '\p{uppercase}' '\P{Lu}' | wc -l unichars -gs '/\p{Upper}/ && /\P{CWL}/' unichars -gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less unichars -gsS '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' unichars -gs 'UCA1 eq UCA1(";")' unichars -gs 'UCA eq UCA("&")' unichars -gs 'UCA eq UCA("d")' unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\p{Ll}\p{Lu}\pM\pC\pZ]' | wc -l unichars -gua '\P{ASCII}' '[\p{Common}\p{Inherited}]' '[^\pM\pC\pZ]' 'NFD !~ /\p{ASCII}/' | wc -l unichars --help unichars '/ij/i' unichars 'length(lc) > 1 unichars 'length(lc) > 1' unichars 'length(lcfirst) != length' unichars 'length(lcfirst) != length(uc)' unichars 'length(lc) < length(uc)' unichars 'length(NFD) == 1 && length(NFC) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less -r unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | ucsort unichars 'length(NFD) == 1 && length(NFKD) != 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l unichars 'length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc & unichars 'length(uc) > 1' unichars 'length(ucfirst) > 1' unichars --locale=de__phonebook 'NFD =~ /a/i/' unichars --locale=de__phonebook 'NFD() =~ /a/i' unichars --locale=de__phonebook 'NFD() =~ /a/i' | less -r unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort unichars --locale=de__phonebook 'UCA eq UCA("ae")' 'NFKD !~ /d/i' | ucsort --locale=de__phonebook unichars --locale=de__phonebook 'UCA(NFKD) =~ UCA("a WITH DIAERESIS")' unichars --locale=de__phonebook 'UCA() =~ UCA("a")' unichars --locale=de__phonebook 'UCA() =~ UCA("a")' | less -r unichars --locale=de__phonebook 'UCA =~ UCA("a WITH DIAERESIS")' unichars --locale=en 'UCA eq UCA("ae")' unichars --locale=en "UCA eq UCA("ae")' unichars 'NAME =~ /BALL/' unichars 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/' unichars 'NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH\b.*\b[ABCD]\b/' | less -r unichars 'NAME =~ /PRIME/' unichars -nc 'NUM && (10*NUM) !~ /0/' unichars -nc 'UCA eq UCA("d")' unichars 'NFD =~ /ij/i' unichars 'NFD ne NFKD' unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | less unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' > /tmp/ndc & unichars 'NFD !~ /\pM/ && NAME =~ /LATIN\b.*\bLETTER\b.*\bWITH/' | wc -l unichars 'NFD !~ /\pM/ && NAME =~ /LATIN.*LETTER WITH/' unichars 'NFD =~ /^\PM\pM*$/ && NFD !~ /^\p{Grapheme_Base}\p{Grapheme_Extend}*$/' unichars 'NFKD =~ /\[/' unichars 'NFKD eq ","' unichars 'NFKD eq ":"' unichars 'NFKD eq ".."' unichars 'NFKD eq "*"' unichars 'NFKD eq 'comma' unichars 'NFKD eq "\N{PRIME}"' unichars 'NFKD =~ /ij/i' unichars 'NFKD \!~ /s/i and UCA =~ UCA "s"' unichars 'NFKD \!~ /s/i || UCA =~ UCA "s"' unichars 'NFKD =~ /s/i || UCA =~ UCA "s"' unichars -ngas 'NUM && not NUM ~~ [ 0..10 ]' unichars -ngas 'NUM && not NUM ~~ [ 1..10 ]' unichars --nopager -gaBsn 'NUM && int(NUM) != NUM' unichars --nopager -gaBsn 'NUM && NUM == 100' unichars --nopager -gasn 'NUM && NUM == 100' unichars --nopager -gsn 'NUM && int(NUM) != NUM' unichars --nopager -gsn 'NUM && NUM < 0' unichars --nopager --locale=de__phonebook 'UCA eq UCA("ae")' unichars --nopager --locale=en 'UCA eq UCA("ae")' unichars --nopager --locale=is 'UCA eq UCA("ae")' unichars --nopager 'UCA eq UCA("ae")' unichars --nopager 'UCA eq UCA("ae")' | ucsort unichars --nopager 'UCA eq UCA("ae")' | ucsort --upper unichars 'not /\d/' 'NFKD =~ /\d/' unichars 'not /\w/' unichars 'not /\w/' 'not /\W/' unichars -nsag '\p{Cased}' 'NUM' unichars -nsag '\p{Lower}' '\P{CWU}' unichars -ns 'UCA eq UCA("d")' unichars NUM unichars 'ord ~~ [ 0x2622, 0x26bd]' unichars 'ord ~~ [ 0x2622, 0x2bbd]' unichars 'ord == 0x2622 || ord == 0x26bd' unichars 'ord>0xffff' '\p{Po}' unichars 'ord < 255' '\p{pattern_syntax}' unichars 'ord() < 255' '\p{pattern_syntax}' unichars 'ord() < 255' '\p{pattern_syntax}' | less unichars 'ord() < 255' '\p{pattern_syntax}' | wc -l unichars '\p{Age:6.0}' unichars '\p{Age:6.0}' '\p{Numeric_Value=NaN}' unichars '\p{Age:6.0}' '\P{Numeric_Value=NaN}' unichars '\p{alnum}' '\P{word}' unichars '\P{alnum}' '\p{word}' unichars '\p{alnum}' '\W' unichars '\p{Alnum}' '\W' unichars '\P{Alnum}' '\w' unichars '\P{Alnum}' '\w' | less unichars '\p{alnum}' '\W' | wc -l unichars '\P{alnum}' '\w' | wc -l unichars '\P{alnum}' '\W' | wc -l unichars '\P{Alnum}' '\w' | wc -l unichars '\p{Alphabetic}' '\P{XPosixAlpha}' | less unichars '\P{Alphabetic}' '\p{XPosixAlpha}' | less unichars '\p{alpha}' '\p{CI}' | less -r unichars '\p{alpha}' '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r unichars '\p{alpha}' '\P{XPosixAlpha}' | less unichars '\P{alpha}' '\p{XPosixAlpha}' | less unichars '\P{alpha}' '\P{XPosixAlpha}' | less unichars '\P{ASCII}' '(lc() . uc) =~ /\p{ASCII}/' unichars '\P{ASCII}' 'lc.uc =~ /\p{ASCII}/ unichars '\P{ASCII}' 'lc.uc =~ /\p{ASCII}/' unichars '\P{ASCII}' 'ord() < 255' '\p{pattern_syntax}' | wc -l unichars '\P{ASCII}' 'ord() < 255' '\W' | wc -l unichars '\P{ASCII}' '\p{Common}' '\pP' unichars '\p{BC=ON}' unichars '\P{Bidi_Class=NSM}' '\p{Mn}' unichars '\p{BidiM}' '\pS' unichars '\p{Block=CombiningDiacriticalMarks}' unichars '\p{Block=CombiningDiacriticalMarks}' '\PM' unichars '\p{Block=CombiningDiacriticalMarks}' '\p{Mn}' unichars '\p{Block=CombiningDiacriticalMarks}' '\P{Mn}' unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' unichars '\P{Block=CombiningDiacriticalMarks}' '\p{Mn}' | wc -l unichars '\p{Cased}' '\P{Changes_When_Casefolded}' unichars '\p{Cased}' '\p{Changes_When_Casemapped}' unichars '\p{Cased}' '\P{Changes_When_Casemapped}' unichars '\P{Cased}' '\p{Changes_When_Casemapped}' unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less unichars '\p{Cased}' '\p{Changes_When_Casemapped}' | less -r unichars '\p{Cased}' '\P{Changes_When_Casemapped}' | less -r unichars '\P{Cased}' '\p{Changes_When_Casemapped}' | less -r unichars '\p{Cased}' '\p{CI}' unichars '\p{Cased}' '\p{CI}' | less unichars '\p{Cased}' '\p{CI}' | less -r unichars '\p{cased}' '[^\p{CWU}\p{CWL}\p{CWT}]' | less -r unichars '\p{cased}' '[\^p{CWU}\p{CWL}\p{CWT}]' | less -r unichars '\p{cased}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r unichars '\p{cased}' '\PL' unichars '\p{Cased}' '\PL' unichars '\p{cased}' '\PL' | less unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' unichars '\p{Cased}' '[^\p{Ll}\p{Lt}\p{Lu}]' | wc -l unichars '\p{Cased}' '\p{Lm}' | wc -l unichars '\p{Cased}' '\PL' | wc -l unichars '\p{Cased}' '\pM' unichars '\p{cased}' '[^\p{upper}\p{lower}]' | less unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less unichars '\p{cased}' '[^\p{upper}\p{lower}\p{title}]' | less -r unichars '\p{CC=A}' unichars '\p{CCC=A}' unichars '\p{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less unichars '\p{Changes_When_Casefolded}' '\P{Changes_When_Casemapped}' | less -r unichars '\P{Changes_When_Casefolded}' '\p{Changes_When_Casemapped}' | less -r unichars '\p{CI}' | less -r unichars '\p{CI}' '[\p{CWU}\p{CWL}\p{CWT}]' | less -r unichars '\p{Common}' '\pP' unichars '\p{Control}' unichars '\p{Control_Pictures}' unichars '\p{CWL}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/' unichars '\p{CWTC}' '\PL' unichars '\p{CWT}' '\PL' unichars '\p{CWU}' 'NAME =~ /LATIN LETTER SMALL CAPITAL/' unichars '\p{Dash}' unichars '\p{di}' unichars '\p{E unichars '\p{EA=W}' unichars '\p{Greek}' '\pP' unichars '\p{Greek}' '\pS' unichars '\p{InGreek}' '\P{IsGreek}' | wc -l unichars '\P{InGreek}' '\p{IsGreek}' | wc -l unichars '\p{InHalfwidthAndFullwidthForms}' '\p{bidim}' unichars '\p{InHiragana}' '\P{Hiragana}' unichars '\p{InHiragana}' '\P{Kana}' unichars '\p{InHirakana}' '\P{Kana}' unichars '\p{InKatakana}' '\P{Kana}' unichars '\P{InKatakana}' '\p{Kana}' unichars '\P{InKatakana}' '\p{Kana}' | less unichars '\p{InLatin}' '\P{IsLatin}' | wc -l unichars '\p{InMiscellaneousSymbolsAnd_Pictographs}' unichars '\p{InThai}' '\P{IsThai}' unichars '\p{IsThai}' '\P{InThai}' unichars '\p{Latin} unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 > /tmp/u1 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/u4 unichars '/\p{Latin}/ && length(NFD) == 1 && NAME =~ /LATIN\b.*\bLETTER\b.*\b[C-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' | less -r unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /./' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/' unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,3}$/' 'CF =~ /[SF]/' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\b[\h\pL]{2,4}$/' 'CF =~ /F/' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\b\pL{2,3}$/' 'CF =~ /F/' '$$CF{full} =~ / /' unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' unichars '\p{Latin}''NAME =~ /\bWITH\b/' 'length(NFKD) == 1' unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | less unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | ucsort | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[ABCD].*\bWITH\b/' | wc -l unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b.*\bWITH\b/' | ucsort --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l unichars '\p{Latin} && NAME =~ /LATIN\b.*\bLETTER\b.*\b[ABCD]\b.*\bWITH\b/' | wc -l unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=1 > /tmp/u1 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=2 > /tmp/u2 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=3 > /tmp/u3 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b.?[ABCD].?\b/' | ucsort --level=4 > /tmp/u4 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --level=4 > /tmp/u4 unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K\h+\S+//' --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --preprocess 's/..\K.*//' --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b([A-D]|.[A-D]|.[A-D])\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-D]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=1 | less -r unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[A-E]\p{Lu}?\b/' | ucsort --upper-before-lower --preprocess 's/..\K.*//' --level=4 > /tmp/uu unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*\b[CD]\b.*\bWITH\b/' | wc -l unichars '/\p{Latin}/ && NAME =~ /LATIN\b.*\bLETTER\b.*[CD].*\bWITH\b/' | wc -l unichars '\p{Latin}' '\w' unichars '\p{Latin}' '\w' | wc -l unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' unichars '\pL' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l unichars '\pL' 'length(NFD) == 1' 'NAME =~ /WITH/' unichars '\p{Lm}' '\p{Cased}' unichars '\p{Lm}' '\p{Changes_When_Casemapped}' unichars '\p{Lm}' '\p{upper}' unichars '\pL' 'NAME =~ /\bSCRIPT/' unichars '\PL' 'NAME =~ /\bSCRIPT/' unichars '\pL' 'NAME =~ /SCRIPT/' unichars '\pL' 'NAME =~ /WITH/' | wc -l unichars '\p{Lower}' 'length(uc) > 1' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | less unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6}' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0}' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age=6.0}' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' '\p{Age:6.0.0}' unichars '\p{Lower}' 'NAME =~ /CAPITAL/' > /tmp/s unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | wc -=l unichars '\p{Lower}' 'NAME =~ /CAPITAL/' | wc -l unichars '\p{Lower}' 'NAME !~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l unichars '\p{Lower}' 'NAME =~ /SMALL CAPITAL|CAPITAL LETTER/' | wc -l unichars '\p{lower}' '\P{CWL}' | less -r unichars '\p{Lower}' '\P{CWU}' unichars '\p{lower}' '\P{CWU}' | less -r unichars '\p{Lower}' '\P{CWU}' | wc -l unichars '[\p{Lower}\p{Upper}' '[^\p{CWU}\p{CWL}]' | less -r unichars '[\p{Lower}\p{Upper}]' '[^\p{CWU}\p{CWL}]' | less -r unichars '\pL' '\p{Alphabetic}' | wc -l unichars '\PL' '\p{Alphabetic}' | wc -l unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' unichars '\pL' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l unichars '\pL' '\p{Latin}' 'NAME =~ /WITH/' | wc -l unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %1 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2 unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | field %2 > /tmp/a unichars '\pL' '\P{Lm}' '\p{Latin}' 'length(NFD) == 1 && length(NFKD) == 1' 'NAME =~ /WITH/' | wc -l unichars '\pL' '\P{Lm}' '\p{Latin}' 'NAME =~ /WITH/' | wc -l unichars '\PL' 'uc =~ /\p{Lower}/' unichars '\PL' 'uc =~ /\p{Upper}/' unichars '\p{Math}' unichars '\p{Mc}' unichars '\p{Me}' unichars '[\p{Miscellaneous_Symbols}\p{Miscellaneous_Symbols_and_Pictographs}]' unichars '\pM' '\P{Grapheme_Extend}' unichars '\PM' '\p{Grapheme_Extend}' unichars '\PM' '\P{Grapheme_Extend}' unichars '\p{Nchar}' unichars '\p{Nl}' unichars '\p{No}' '\w' unichars '\p{No}' '\W' unichars '\p{No}' '\W' | head unichars '\p{No}' '\W' | less unichars '\pN' '\P{Nd}' | less unichars '\p{Numeric_Value=NaN}' unichars '/\P{NV=NaN}/ && ! (NUM() ~~ [0..10])' | less -r unichars '/\P{NV=NaN/ && ! (NUM() ~~ [0..10])' | less -r unichars '/\P{NV=NAN/ && ! (NUM() ~~ [0..10])' | less -r unichars '\pN' '\W' unichars '\pN' '\W' | wc -l unichars '\p{Other_Alphabetic}' unichars '\p{Other_Alphabetic}' | less unichars '\p{Other_Alphabetic}' '\PM' | less unichars '\p{Other_Lowercase}' unichars '\p{Other_Lowercase}' | less unichars '\pP' unichars '\p{Pc}' unichars '\P{Pd}' '\p{Dash}' unichars '\p{Pe}' unichars '\p{Pf}' unichars '[\p{Pf}\p{Pi}]' unichars '[\p{Pf}\p{Pi}]' '\p{BidiM}' unichars '[\p{Pf}\p{Pi}]' '\P{BidiM}' unichars '[\p{Pi}]' unichars '\p{Pi}' '\p{bidim}' unichars '[\p{Pi}]' '\P{BidiM}' unichars '[\p{Pi}\p{Ps}]' 'NAME =~ /VERTICAL/' unichars '\p{Po}' unichars '\p{Po}' '\p{bidim}' unichars '[\p{Po}\p{Pe}]' '\P{BidiM}' unichars '\pP' '\P{QMark}' 'NAME =~ /QUOT/' unichars '\p{Ps}' 'NAME =~ /VERTICAL/' unichars '\p{Ps}' '\p{bidim}' unichars '\p{Ps}' '\P{bidim}' unichars '\p{Ps}' '\P{bidim}' | less unichars '\p{Ps}' '\p{bidim}' | wc -l unichars '\p{Ps}' '\P{bidim}' | wc -l unichars '[\p{Ps}\p{Pe}]' '\P{BidiM}' unichars '\p{Qmark}' unichars '\p{QMark}' unichars '\pS' unichars '\p{Sk}' unichars '\pS' 'NAME =~ /BALL/' unichars '\p{Surrogate}' unichars '\p{Title}' 'lc !~ /\p{Lower}/' unichars '\p{Title}' '[^\p{CWL}\p{CWU}]' unichars '\p{Title}' '[^\p{CWL}\p{CWU}]' | wc -l unichars '\p{Title}' '\P{Lt}' unichars '\p{Title}' '\P{Lt|} unichars '\p{Title}' 'uc !~ /\p{Upper}/' unichars '\p{Upper}' 'NAME !~ /CAPITAL/' unichars '\p{UPPER}' 'NAME =~ /SMALL/' unichars '\p{Upper}' '\P{CWL}' unichars '\p{Upper}' '\P{CWL}' | wc -l unichars '\p{WhiteSpace}' '\PZ' unichars '\p{XPosixAlnum}' | less unichars '\p{XPosixAlnum}' '\P{XPosixAlpha}' | less unichars '\R' unichars '\R' | field %2 unichars -sag Comp_Ex unichars -sag '\p{Lower}' '\P{CWU}' unichars -sag '\PL' '\p{Lower}' '\p{CWU}' unichars -sag '\PL' '\p{Lower}' '\P{CWU}' unichars -sag '\PL' '\p{Upper}' '\p{CWL}' unichars -sag '\PL' '\p{Upper}' '\P{CWL}' unichars -sag '\p{Upper}' '\P{CWL}' unichars -sag Singleton unichars /s/i unichars -ua '\p{Assigned}' unichars -ua '\p{Assigned}' | wc -l unichars 'UCA1 eq UCA1("a")' unichars 'UCA1 eq UCA1("a")' | cat -n unichars 'UCA1 eq UCA1("ae")' unichars 'UCA1 eq UCA1("d")' unichars 'UCA1 eq UCA1("d")' | cat -n unichars 'UCA1 eq UCA1("ij")' unichars 'UCA1 =~ UCA1("d")' unichars 'UCA2 eq UCA2("ij")' unichars 'UCA3 eq UCA3("ij")' unichars 'UCA eq UCA "%"' unichars 'UCA eq UCA("a")' unichars 'UCA eq UCA("ae")' unichars 'UCA eq UCA("d")' unichars 'UCA eq UCA("d")'df unichars 'UCA eq UCA "\N{PRIME}"' unichars 'UCA eq UCA("p")' unichars 'UCA eq UCA("p")' | wc -l unichars 'UCA eq UCA("s")' | wc -l unichars 'UCA(NFKD) =~ UCA("a")' unichars 'UCA(NFKD) =~ UCA("ae")' unichars 'UCA(NFKD) =~ UCA("a")' 'NFKD !~ /a/i' unichars 'UCA(NFKD) =~ UCA("d")' unichars 'UCA(NFKD) =~ UCA("d")' 'UCA ne UCA("d")' unichars 'UCA(NFKD) =~ UCA("i")' 'NFKD !~ /i/i' unichars 'UCA(NFKD) =~ UCA("m")' unichars 'UCA(NFKD) =~ UCA("o")' unichars 'UCA(NFKD) =~ UCA("oe")' unichars 'UCA(NFKD) =~ UCA("o")' 'NFKD !~ /o/i' unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' unichars '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' 'NFKD !~ /o/i' unichars 'UCA(NFKD) =~ UCA("o")."|".UCA("a")' 'NFKD !~ /o/i' unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less unichars 'UCA(NFKD) =~ (UCA("o")."|".UCA("a"))' | ucsort | less -r unichars 'UCA(NFKD) =~ UCA("s")' unichars 'UCA(NFKD) =~ UCA("z")' unichars 'UCA(NFKD) =~ UCA("z")' 'NFKD !~ /z/i' unichars 'UCA(NFKD) =~ UCA("z")' 'UCA ne UCA("z")' unichars 'UCA =~ UCA("d")' 'UCA ne UCA("d")' unichars 'UCA =~ UCA("i") && UCA =~ UCA("j")' unichars 'UCA =~ UCA("p")' unichars 'UCA =~ UCA("p")' | wc -l unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")' unichars 'UCA =~ UCA("s")' 'UCA ne UCA("s")' | wc -l unichars 'UCA =~ UCA("s")' | wc -l unichars 'uc ne ucfirst' unichars -v unichars '\w' '[^_\p{Alphabetic}\p{Nd}]' unichars '\w' '[^_\p{Alphabetic}\p{Nd}]' | wc -l unichars '\w' '[^\pL\pN\pM]' unichars '\w' '[^\pL\pN\pM\p{Pc}]' unichars '\w' '[^\pL\pN\pM\p{Pc}]' | less unichars '\w' '[^\pL\p\pM]' unichars '\w' '\P{word}' unichars '\W' '\p{word}' unichars '/\w/ == /\W/' unichars '\w' '\W'
ucsort ../CRAFT-dumps/lc-not-unique | less ucsort --level=1 --upper-before-lower --preprocess="s/\s.*//" fing-2 | less ucsort --locale=ca /tmp/cat ucsort --locale=es /tmp/cat ucsort --locale=es__traditional /tmp/cat ucsort --locale=es_traditional /tmp/cat ucsort --locale=ru /tmp/cyril > /tmp/cyril.ru ucsort overlapping-obos | less ucsort --pre '/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --pre='/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --preprocess='/.*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --preprocess='/*\N{RIGHTWARDS ARROW} (\d+)/; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally | less ucsort --preprocess='s/(.*)([\[\]].*[\[\]])(.*)/$2 $1 $3/' --reverse-input stem-fail-tally > stem-fail-sort ucsort --preprocess='s/([\[\]].*[\[\]])(.*)/$2 $1/' --reverse-input stem-fail-tally | less ucsort --preprocess='s/(\[.*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less ucsort --preprocess='s/(\].*\[)(.*)/$2 $1/' --reverse-input stem-fail-tally | less ucsort --preprocess='s/(\[FAIL: .*\])(.*)/$2 $1/' --reverse-input stem-fail-tally | less ucsort --preprocess='s/.*\gt//' go-greek ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | & less ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//' ../CRAFT-dumps/lc-not-unique | less ucsort --preprocess='s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --preprocess='s/.*\t//' go-greek | less ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | & less ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/lc-not-unique | tac > ../CRAFT-dumps/lcnu-sort ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' ../CRAFT-dumps/not-unique | tac > ../CRAFT-dumps/nu-sort ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' lc-not-unique | tac > lc-nu-sort ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $a)/e' not-unique | tac > nu-sort ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $a = $1; s/^/sprintf("%06d", $n)/e' ../CRAFT-dumps/lc-not-unique | & less ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)(?:\N{DIVISION SIGN}\d+)?//; $n = $1; s/^/sprintf("%06d", $n)/e' ../CRAFT-dumps/lc-not-unique | & less ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | & less ucsort --preprocess='use charnames qw(:full); s/.*\N{RIGHTWARDS ARROW} (\d+)//; s/^/sprintf("%06d", $1)/e' ../CRAFT-dumps/lc-not-unique | less ucsort --pre='s/.*] //' gene* | less ucsort --pre='s/.*=> //' gene* | less ucsort --pre 's/.*\gt//' go-greek | less ucsort --pre='s/^\S+\h+//' pmc-weirds | less ucsort --reverse-input go-uglies | perl -nle ' printf "%50s\n", $_' > flip-go-uglies ucsort --reverse-input stem-fail-tally | less ucsort --reverse-input stem-fail-tally > stem-fail-sort2 ucsort --reverse-input ugly-tally > flip-ugly-tally ucsort -reverse-input ugly-tally > flip-ugly-tally ucsort --reverse-input uuglies > flip-uuglies ucsort /tmp/emoji | less ucsort /tmp/emoji | unifmt -180 | less ucsort < /tmp/u | uniq > /tmp/uu ucsort /tmp/uw > /tmp/u ucsort --upper-before-lower --preprocess="s/\h.*//" fing-2 > fa ucsort --upper-before-lower --preprocess="s/\s.*//" fing-2 ucsort --upper --pre='s/^\S+\h+//' pmc-weirds | less ucsort --upper --pre='s/^\S+\h+//' pmc-weirds > pmc-wsort unichars -a '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d unichars -a 'UCA eq UCA("d")' 'NFKD !~ /d/i' | ucsort unichars -a '(UCA(NFKD) =~ (UCA("o")."|".UCA("a"))) || NFKD =~ /[ao]/i' | ucsort | less -r unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper}' | ucsort | cat -n | less -r unichars -fgns '(?=\P{Ll})\p{Lower}|(?=\p{Lu})\p{Upper})' | ucsort | cat -n | less -r unichars -fgns '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r unichars -fgns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r unichars -fgns '(?x) (?= \P{Ll} ) \p{Lower} | (?=\P{Lu}) \p{Upper}' | ucsort | cat -n | less -r unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gas '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort --upper-before-lower | less -r unichars -gas '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gas '\p{Lower}' '\P{Ll}' | ucsort | less unichars -gfns '/\p{Lower}/ && /\p{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort | less -r unichars -gfns '/\p{Lower}/ && /\P{CWU}/ || /\p{Upper}/ && /\P{CWL}/' | ucsort --upper | less -r unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | cat -n | less -r unichars -gns '\p{Lower}' '\P{Ll}' | ucsort | less -r unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gs '\p{Cased}' '[^\p{CWU}\p{CWT}\p{CWL}]' | ucsort | less -r unichars -gs '/(?=\P{Ll})\p{Lower}|(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/(?=\P{Ll})\p{Lower}|/(?=\P{Lu})\p{Upper}/x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/(?= \P{Ll} ) \p{Lower}/x || /(?=\P{Lu}) \p{Upper}/x' | ucsort | cat -n | less -r unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/(?= \P{Ll} ) \p{Lower} /x || / (?= \P{Lu} ) \p{Upper} /x' | ucsort --upper-before-lower | cat -n | less -r unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less unichars -gs '/\p{Lower}/ && /\P{CWT}/' | ucsort | less -r unichars -gs '\p{Lower}' '\P{Ll}' | ucsort | less -r unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r unichars -gs '\pL' '\p{Lower}' '\P{Ll}' | ucsort --upper-before-lower | less -r unichars -gs '/\p{Upper}/ && /\P{CWL}/' | ucsort | less unichars -gs '/\p{Upper}/ && /\P{CWT}/' | ucsort | less unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort | less unichars '\p{Latin}' 'NAME =~ /\bWITH\b/' 'length(NFKD) == 1' | ucsort > /tmp/d uninames CYRIL | ucsort | less cat *-top | field %2 | tally | ucsort --reverse-input | less cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %13s\n", @F[0,1]' | less cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %15s\n", @F[0,1]' | less cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' > hapax50 cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %18s\n", @F[0,1]' | less cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]' cat *-top | field %2 | tally | ucsort --reverse-input | perl -lane 'printf "%2d %25s\n", @F[0,1]' | less egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | sort -u | ucsort | less egrep '^GO:(0005024|0005025|0005026|0007179|0015052)' ../CRAFT-dumps/CRAFT-go | ucsort | less ls *.obo* | ucsort ls *.obo* | ucsort | perl -le 'printf "%-40s", $_' ls *.obo* | ucsort | perl -lne 'printf "%-40s\n", $_' ls *.obo* | ucsort | perl -lne 'printf "%40s\n", $_' perl5.12.0 -S -CLA unichars '/s/i' | ucsort perl -CS -E 'say for split " ", "cat ca\x{308}t czt c\x{e4}t bat dat"' | ucsort --locale=sv perl fingerprint $cat | ucsort > fing-all2 perl fingerprint $cat | ucsort --upper-before-lower --preprocess="s/\h.*//" > fing-all2 perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'ABBREV' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ab perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep 'GEN' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/ge perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//' perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/^\h+\d+\h//; s/\h.*//' > /tmp/d perl -F'\t' -lane 'print $F[0], "\t", $F[1] if @F' collisions-uniq | tally | sort -k2 -k1rn | tcgrep '\xB1' | perl -MText::Tabs -nle 'BEGIN{$tabstop = 30} print expand $_' | ucsort --pre='s/......//; s/\h.*//' perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl | & less perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='$_ = fixstring($_)' ovl > ovl-sort perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | & less perl -I ../CRAFT-dumps -MFixString -S ucsort --preprocess='\&fixstring' ovl | less repeat 200 randline /tmp/u6 | ucsort | uniq > /tmp/u tcgrep '^0' ../../new-output/results-go-gene-stemwords-GOOD | ucsort --reverse-input | less
unilook activation unilook adi unilook adieu unilook 'alis\b' unilook angina unilook arthrit unilook ascite unilook betab unilook '\boverexertion\b' unilook capitali unilook catheterization unilook defib unilook delineate unilook digitalis unilook dofetilide unilook dysauto unilook dyssyn unilook dysyn unilook edema unilook estiv unilook etouf unilook euphon unilook fentan unilook fentanyl unilook /glob unilook gw unilook gwen unilook gyneco unilook hemochr unilook hemodialysis unilook hibern unilook hippodam unilook hippo | wc -l unilook hypogl unilook idyl unilook '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' unilook inscrut unilook ischemia unilook leucocy unilook leuc | wc -l unilook leukocy unilook leuk | wc -l unilook lighthead unilook lymp unilook lympho unilook lymphom unilook meningitis unilook '\N{oslash}' unilook oligu unilook oto unilook otot unilook overeat unilook overexert unilook overexertion unilook pacem unilook '(?\P{ASCII})\pL' unilook '(?=\P{ASCII})\pL' unilook pectoris unilook pheo unilook phonious unilook 'phonious\b' unilook '\pM' unilook -ppro unilook -ppro . unilook -Ppro . unilook -ppronoun . unilook -ppronoun wh unilook primum unilook pseudonormal unilook pulmon unilook rale unilook rheto unilook rosuv unilook secund unilook sputum unilook sum unilook tachyph unilook thiazol unilook thyrotox unilook tracheitis unilook uephon unilook uremia unilook -v . unilook -v activation unilook -v angina unilook -v ascite unilook vascul unilook -v '\boverexertion\b' unilook verna unilook vesnar unilook -v holter unilook -v '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' unilook -V '(?i)^[^\N{LEFTWARDS ARROW}]*?(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}\N{oe}])\pL' unilook -V '(?i)(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL' unilook -v meningitis unilook -v 'overexertion' unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}\N{ae}])\pL' unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}])\pL' unilook -V '(?=\P{ASCII})(?=[^\N{stress1}\N{stress2}]\pL' unilook -V '(?=\P{ASCII})\pL' unilook -v pneumoconiosis unilook -vz 'comeraderie' unilook wh unilook who unilook widespread unilook -z dofetilide unilook -z eplerenone unilook -z pectoris unilook -z sulfoxide unilook -zv anterolateral unilook -zv holter unilook -zv metformin
Tom Christiansen <tchrist@perl.org> wrote all the code.
<tchrist@perl.org>
brian d foy <brian.d.foy@gmail.com> wrapped the distribution around it.
<brian.d.foy@gmail.com>
Copyright 2011-2021, Tom Christiansen.
You can use this distribution under the same terms as Perl itself.
To install Unicode::Tussle, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Unicode::Tussle
CPAN shell
perl -MCPAN -e shell install Unicode::Tussle
For more information on module installation, please visit the detailed CPAN module installation guide.