Xcode Utility

beta version 0.61.3 2012-05-03

Purpose:

Xcode is a command-line utility which converts English text files from one character-encoding scheme to another. It can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected.

Xcode does not support multi-byte code pages such as those for Chinese, Japanese, or Korean. In particular, code pages 932, 936, 949, and 950 are not supported.

Syntax:

XCODE /H /N:cdefq /I:n /A:n /U /UN /UTF8 filename…

/H        headers: show filenames before text; /HH only when matching wildcards
/N:cdefq  disable translation of special characters; suboptions include:
     C — Circles: don’t convert ©, ®, ™ to/from (C), (R), (TM)
     D — Dashes: don’t convert em dashes — to/from double hyphens --
     E — Ellipses: don’t convert ellipses … to triple periods ...
     F — Fractions: don’t convert Unicode fractions (e.g. ⅓ or ⅝) to ASCII strings (e.g. 1/3 or 5/8)
     Q — Quotes: don’t convert Unicode quotes and apostrophes to/from ASCII near-equivalents
/N        disables all of the above
/I:n      interpret non-Unicode input text using code page n
/A        output 8-bit text using the default Windows code page
/A:n      output 8-bit text using code page n
/U        output text as Unicode (UTF-16); this is the default output encoding
/UN       output text as Unicode (UTF-16) with no Byte Order Mark
/UTF8     output text as UTF-8
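
The special-character translations controlled by /N can be pictured with a short Python sketch. This is not Xcode's implementation: the function name to_ascii, the table contents, and the choice to show only the Unicode-to-ASCII direction are all assumptions made for illustration.

```python
# Illustrative only: these tables are assumptions, not Xcode's real tables.
TRANSLATIONS = {
    "c": {"\u00a9": "(C)", "\u00ae": "(R)", "\u2122": "(TM)"},   # circles
    "d": {"\u2014": "--"},                                      # em dash
    "e": {"\u2026": "..."},                                     # ellipsis
    "f": {"\u2153": "1/3", "\u215d": "5/8"},                    # fractions
    "q": {"\u201c": '"', "\u201d": '"',
          "\u2018": "'", "\u2019": "'"},                        # quotes
}


def to_ascii(text, disabled=""):
    """Replace special characters with ASCII strings.

    `disabled` plays the role of the /N suboption letters: any class
    whose letter appears in it is left untranslated.
    """
    disabled = disabled.lower()
    for letter, table in TRANSLATIONS.items():
        if letter in disabled:
            continue  # this class was switched off, e.g. by /N:c
        for special, plain in table.items():
            text = text.replace(special, plain)
    return text
```

With these tables, to_ascii("\u00a9 2012 \u2014 \u2153") gives "(C) 2012 -- 1/3", while passing disabled="c" leaves the copyright sign untouched and still converts the dash.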

If standard input (stdin) is redirected, Xcode will read from stdin before any filenames specified on the command line. If no filenames are specified, then Xcode will read from stdin whether it is redirected or not. If /H is used, each file’s name will be printed before it is processed. (For standard input, <stdin> will be shown.)
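
The input-ordering rule above can be sketched in Python as follows. The function name input_sources and its stdin_redirected parameter are hypothetical; the sketch only models the order in which sources are read, not the reading itself.

```python
def input_sources(filenames, stdin_redirected):
    """Return the inputs in the order described above (a sketch only).

    stdin comes first when it is redirected, and is the sole source
    when no filenames are given, whether redirected or not.
    """
    sources = []
    if stdin_redirected or not filenames:
        sources.append("<stdin>")  # the label shown when /H is used
    sources.extend(filenames)
    return sources
```

A calling script could obtain stdin_redirected from "not sys.stdin.isatty()", though how Xcode itself detects redirection is not stated here.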

Input text format option: When the input text is not in Unicode, the /I:n option tells Xcode how to interpret character codes greater than 127. The default behavior is to use the default Windows code page, usually 1252 in the United States; /I:n allows you to select a different one. When the input text is in Unicode, /I:n has no effect.
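
To see why the input code page matters, here is a small Python illustration using Python's own codecs rather than Xcode. The helper name decode_input is hypothetical; it shows the same byte above 127 turning into different characters under two code pages.

```python
def decode_input(raw, code_page=1252):
    """Interpret 8-bit input bytes using the given Windows/DOS code page."""
    return raw.decode(f"cp{code_page}")


raw = bytes([0x63, 0x61, 0x66, 0xE9])  # "caf" followed by byte 0xE9

as_1252 = decode_input(raw)        # 0xE9 read as e-acute (U+00E9)
as_437 = decode_input(raw, 437)    # 0xE9 read as Greek capital theta (U+0398)
```

Codes 0 through 127 decode identically in both cases; only the final byte differs, which is the situation /I:n is meant to resolve.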

Output text format options: /A causes output to be non-Unicode. /A by itself writes text as 8-bit character codes using the default Windows code page. /A:n allows you to specify the code page used to encode character values greater than 127. Either way, any characters which can’t be represented using the default or specified code page will be replaced with a question mark. /U and its variants cause output to be in Unicode. If neither /A nor /U is specified, the output encoding defaults to UTF-16.
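
The output options can be modeled roughly in Python as shown below. The function encode_output and its mode strings are hypothetical. Two details are assumptions not stated in this document: that UTF-16 output is little-endian (the usual Windows convention), and that /UTF8 writes no Byte Order Mark.

```python
import codecs


def encode_output(text, mode="U", code_page=1252):
    """Encode text the way each output option is described (a sketch)."""
    if mode == "A":
        # Unrepresentable characters become question marks.
        return text.encode(f"cp{code_page}", errors="replace")
    if mode == "U":
        # Assumption: little-endian UTF-16, preceded by a Byte Order Mark.
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if mode == "UN":
        # Same encoding, no Byte Order Mark.
        return text.encode("utf-16-le")
    if mode == "UTF8":
        # Assumption: no Byte Order Mark for UTF-8.
        return text.encode("utf-8")
    raise ValueError(f"unknown output mode: {mode!r}")
```

For example, encode_output("a\u2153", "A") yields b"a?" because code page 1252 has no one-third fraction, whereas the "U" and "UTF8" modes preserve the character.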

Example:

rem  Convert an ASCII text file to Unicode:
xcode "Little, Big.txt" > Unicode_out.txt

Environment Variables:

The following variables specify a Unicode character used to replace a generic ASCII quote character. The value of the variable may be a single character; a decimal value 32 through 65535; or a hexadecimal value 0x20 through 0xFFFF.

OPENQUOTE:    replaces the ASCII double-quote ( " ) at the start of a quotation; the default is 0x201C ( “ ).
CLOSEQUOTE:   replaces the ASCII double-quote ( " ) at the end of a quotation; the default is 0x201D ( ” ).
OPENSQUOTE:   replaces the ASCII apostrophe ( ' ) at the start of a quotation; the default is 0x2018 ( ‘ ).
CLOSESQUOTE:  replaces the ASCII apostrophe ( ' ) at the end of a quotation; the default is 0x2019 ( ’ ).
APOSTROPHE:   replaces the ASCII apostrophe ( ' ) within a word; the default is 0x2019 ( ’ ).
'OKINA:       replaces the ASCII apostrophe ( ' ) between two vowels; the default is 0x2018 ( ‘ ).

Note that the variable name 'OKINA begins, ironically enough, with an apostrophe. To disable ‘okinas, SET 'OKINA=0X2019 (or whatever value APOSTROPHE is set to), so that an apostrophe between two vowels is treated like any other apostrophe within a word.
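
The three accepted value formats (single character, decimal, hexadecimal) could be parsed along the following lines. This is a sketch, not Xcode's parser: the name parse_quote_value is hypothetical, and treating a one-character value as a literal character before trying it as a number is an assumption (a single digit could not be a valid decimal code anyway, since the minimum is 32).

```python
def parse_quote_value(value):
    """Turn an environment-variable value into one replacement character."""
    if len(value) == 1:
        return value  # a literal character, e.g. a guillemet
    stripped = value.strip()  # assumption: tolerate stray whitespace
    if stripped[:2].lower() == "0x":
        code = int(stripped[2:], 16)   # hexadecimal form, 0x20..0xFFFF
    else:
        code = int(stripped, 10)       # decimal form, 32..65535
    if not 0x20 <= code <= 0xFFFF:
        raise ValueError(f"value out of range: {value!r}")
    return chr(code)
```

Under these assumptions, "0xab", "171", and a literal left guillemet all yield the same character, and values outside the documented range raise ValueError.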

Example:

rem  Use guillemets for quotations:
set openquote=0xab
set closequote=0xbb
xcode "Little, Big.txt"

Exit Codes:

0    All files successfully processed
0    Syntax request via /?
1    Any syntax error
2    Any error while processing files
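
A script that launches Xcode might translate these exit codes into messages as sketched below. The dictionary and the function describe_exit_code are hypothetical helpers, not part of Xcode.

```python
# Meanings taken from the table above; the wording is paraphrased.
EXIT_MEANINGS = {
    0: "success, or syntax help requested via /?",
    1: "syntax error",
    2: "error while processing files",
}


def describe_exit_code(code):
    """Map an Xcode exit code to a short description."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
```

In practice the code would come from something like subprocess.run(["xcode", "/UTF8", "input.txt"]).returncode, where the file name is a placeholder.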

Acknowledgments:

At present, I compress the binary using either UPX (http://upx.sourceforge.net/) or MPRESS (http://www.matcode.com/mpress.htm).

Status and Licensing:

This is beta software. It may very well have issues. Try it at your own risk.

This program is currently licensed only for testing purposes. I may make binaries and source code available under some free license once I consider it ready for use.