TextUtils plugin for Take Command / TCC / 4NT

beta version 0.73.6     2017-12-05

Charles Dye

Purpose:

This plugin implements a variety of text-related features. There are new commands to count words, sentences, and paragraphs in English text; find words in text and display them in context; replace words in text; generate random passwords; display the lines of a text file in reverse order; wrap text to a desired width; and save an entire array to disk and reload it later. New functions allow you to obscure text to make it unreadable, and restore it later; determine the character encoding and text format of text files; generate Metaphone codes; remove accents from text strings; and count vowels in a string.

Installation:

To use this plugin, copy TextUtils.dll and TextUtils.chm to some known location on your hard drive. (If you are using the 64-bit version of Take Command, use TextUtils-x64.dll instead of TextUtils.dll.) Load the plugin with a PLUGIN /L command, for example:

plugin /l c:\bin\tcmd\test\textutils.dll

If you copy these files to a subdirectory named PlugIns within your Take Command program directory, the plugin will be loaded automatically when TCC starts.

Plugin Features:

New commands: CONTEXT, DEDUP, DEHTML, FFIELDS, LOADARRAY, OINK, PARSEARGS, PASSWORD, REPLACETEXT, ROT13, SAVEARRAY, SHUFFLE, TEXTUTILSHELP, UNICODIFY, UPEND, UTYPE, WORDS, WRAP, XFILTER

New functions: @B85TOBIN, @BETWEEN, @BINTOB85, @CLARIFY, @INIVALUE, @METAPHONE, @OBSCURE, @OINK, @ROT13, @STRIPACCENTS, @TEXTENCODING, @TEXTFORMAT, @UQUOTES, @VOWELS

New variables: _CHARACTERS, _CHARACTERSALL, _GETACP, _INIVALUERC, _LINES, _LINESALL, _LONGESTLINE, _LONGESTLINEALL, _NONBLANKLINES, _NONBLANKLINESALL, _PARAGRAPHS, _PARAGRAPHSALL, _PASSWORD, _PROPERNOUNS, _PROPERNOUNSALL, _SENTENCES, _SENTENCESALL, _SENTENCESD, _SENTENCESDALL, _SENTENCESE, _SENTENCESEALL, _SENTENCESQ, _SENTENCESQALL, _SENTENCEWORDS, _SENTENCEWORDSALL, _TITLES, _TITLESALL, _UNIQUEWORDS, _UNIQUEWORDSALL, _WC, _WCALL, _WORDFILES, _WORDS, _WORDSALL

Syntax Note:

The syntax definitions in the following text use these conventions for clarity:

BOLD CODE       indicates text which must be typed exactly as shown.
CODE            indicates optional text, which may be typed as shown or omitted.
Bold italic     names a required argument; a value must be supplied.
Regular italic  names an optional argument.
ellipsis…       after an argument means that more than one may be given.

New Commands:

CONTEXT — Searches for words in English text and displays them in context.

Syntax:
CONTEXT /A:attribs /C:n /F:n /H:n /HA /I:n /K:n /N /S /V /W:base /X:word /Y:word filename…

/A:attribs       attributes mask; valid flags are -ACEHIORS
/C:n             specifies the number of sentences of context to display, before and after
/F:n             specifies the format of the input text; n is one of:
   0 — best guess (default)
   1 — unformatted (line breaks are used only to end paragraphs)
   2 — prewrapped (line breaks are used to wrap text)
   3 — unformatted, with blank lines between paragraphs
/H:n             set highlight colors for matching words
/HA              use ANSI codes for highlighting
/I:n             interpret non-Unicode input text using code page n
/K:n             output columns for word-wrap
/N               disable features
/S               search in subdirectories for matching filenames
/V               verbose; report counts of found items after each file and at the end
/W:base          search for forms of a word
/W:"base base…"  search for a series of word forms
/X:word          search for an exact word
/X:"word word…"  search for a series of exact words
/Y:word          search for words that sound like word

CONTEXT can read from disk files or from a pipe. If you want to pipe to CONTEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to search for words on the clipboard.

Word search: /W:base searches for forms of a word; this will probably be your most frequently-used option. Specify the base form of a word, and CONTEXT will attempt to match variations of it. For example, /W:DOG will match dog, dogs, dog’s, doggy, and even doggedly.

A word in the input text is considered a “form” of the specified base word if (1) the beginning matches for the entire length of base, except that a final Y at the end of the base word will match an I in the word from the text; and (2) the remainder of the word does not contain more than one vowel other than Y. Case is not significant, and most common accents are ignored; /W:garcon will match garçon, /W:"deja vu" will match Déjà vu, and so on.
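The matching rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not the plugin's actual code; the names strip_accents and is_form are hypothetical, and hyphenated words are not handled here.

```python
import unicodedata


def strip_accents(text):
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_form(base, word):
    """Return True if word looks like a form of base, per the rule above."""
    base = strip_accents(base).lower()
    word = strip_accents(word).lower()
    if len(word) < len(base):
        return False
    head, tail = word[:len(base)], word[len(base):]
    if head != base:
        # A final Y in the base word may match an I in the text word.
        if not (base.endswith("y") and head == base[:-1] + "i"):
            return False
    # The remainder may contain at most one vowel other than Y.
    return sum(1 for ch in tail if ch in "aeiou") <= 1


print(is_form("dog", "doggedly"))
print(is_form("dog", "dogmatic"))
```

Under this sketch "doggedly" qualifies because its remainder "gedly" holds a single vowel other than Y, while "dogmatic" does not because "matic" holds two.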

If a word from the input text contains a hyphen, the /W: search will also look for the specified base word to either side of the hyphen; /W:LEVEL will match level-headed, sub-level, and even poorly-levelled.

Word series: You can search for a series of words with /W:"base base…". To match, a series of words must appear within the same sentence in the input text; a word series cannot span the end of a sentence. Matching words must be consecutive, and may be separated by spaces, tabs, or other punctuation. CONTEXT will check for forms of each base word as above, but will not look for the base within hyphenated words. For instance, /W:"LITTLE OLD LADY" will match little, old ladies.

Exact-word search: /X:word searches for a word without checking for variant forms. /X: does not look for the specified word within hyphenated words. Case and accents are still ignored. You can search for a series of exact words with /X:"word word…".

Sound-alike search: /Y:word searches for words which sound similar to the specified word. The comparison uses a Metaphone-like algorithm to guess at a word’s pronunciation. (This type of search does not support word series.)

Surrounding context: By default, CONTEXT displays one sentence before, and one sentence after, each sentence containing any of the specified search words. You can adjust this value with /C:n; legal values are 0 to 15. Note that you may see more than 2n sentences between found words that are close together; CONTEXT will display a little extra text rather than introduce a very short break. You may also see fewer than n sentences near the start or the end of a file.

Highlighting: If CONTEXT’s output is to the screen (i.e. stdout is not redirected), text which matches your search words will be highlighted in a different color. By default, CONTEXT picks a highlight color which contrasts with the current console colors. You can specify your own highlight color either with the option /H:n, or by setting an environment variable named HIGHLIGHT. Either way, the value should be a decimal number from 1 to 254, or a hexadecimal value from 0x01 to 0xFE. The high four bits set the background color, and the low four bits set the foreground color; the two values must be different. The command-line option takes precedence over the environment variable. You can disable highlighting with /NH. Text is not highlighted if the command’s output is redirected.
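As a quick illustration of how the highlight value is put together, here is a small Python sketch. The function name highlight_value is hypothetical, and the color numbers in the example assume the usual Windows console numbering (1 is blue, 14 is bright yellow).

```python
def highlight_value(background, foreground):
    """Combine two 4-bit color numbers into one highlight value."""
    if not (0 <= background <= 15 and 0 <= foreground <= 15):
        raise ValueError("each color must be from 0 to 15")
    if background == foreground:
        raise ValueError("background and foreground must be different")
    # High four bits: background. Low four bits: foreground.
    return (background << 4) | foreground


value = highlight_value(1, 14)
print(value, hex(value))
```

Because the two colors must differ, the smallest possible result is 0x01 and the largest is 0xFE, which agrees with the range given above.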

If you specify /HA, the command will use ANSI codes to highlight found items. Unlike the usual method, ANSI highlighting can be used with redirection. However, ANSI highlighting won’t work when output is to the screen unless you have enabled TCC’s ANSI support (OPTION //ANSI=YES, or the “ANSI Colors” tick box in the OPTION dialog).

Reports: If /V is specified, CONTEXT will also report the number of times each search word was found within a file. If more than one file is processed it will also show a final report for all files, giving the number of times each search word was found in total, and in how many files.

Text encoding: CONTEXT automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /I:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell CONTEXT how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. CONTEXT will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. CONTEXT will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, CONTEXT will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)
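To make the /F:2 (prewrapped) case concrete, the following Python sketch joins lines that are separated by a single line break and treats two or more line breaks in a row as a paragraph boundary. It illustrates the idea only; the function name unwrap_prewrapped is hypothetical, and the plugin's own handling may differ in detail.

```python
import re


def unwrap_prewrapped(text):
    """Split prewrapped text into paragraphs, ignoring single line breaks."""
    text = text.replace("\r\n", "\n")
    # Two or more line breaks in a row end a paragraph.
    paragraphs = re.split(r"\n{2,}", text)
    # Within a paragraph, a single line break is just whitespace.
    return [" ".join(p.split()) for p in paragraphs if p.strip()]


sample = "The quick brown\nfox jumps.\n\nOver the\nlazy dog."
for paragraph in unwrap_prewrapped(sample):
    print(paragraph)
```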

Word wrap: Text output by CONTEXT will be word-wrapped. If output is to the screen, it will be wrapped to the screen width. If output has been redirected, the default width is 100 columns. You can set a different width using the /K:n option; the value must be between 40 and 512.

Disabling features: /N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NH  do not highlight matching words
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


C:\> context D:\download\pg11.txt /w:paint

File "D:\download\pg11.txt" :

CHAPTER VIII. The Queen's Croquet-Ground

A large rose-tree stood near the entrance of the garden: the roses growing on it were white, but there were three gardeners at it, busily painting them red. Alice thought this a very curious thing, and she went nearer to watch them, and just as she came up to them she heard one of them say, 'Look out now, Five! Don't go splashing paint over me like that!'

'I couldn't help it,' said Five, in a sulky tone; 'Seven jogged my elbow.'

*    *    *


Seven flung down his brush, and had just begun 'Well, of all the unjust things--' when his eye chanced to fall upon Alice, as she stood watching them, and he checked himself suddenly: the others looked round also, and all of them bowed low.

'Would you tell me,' said Alice, a little timidly, 'why you are painting those roses?'

Five and Seven said nothing, but looked at Two.


C:\>



DEDUP — Dumps text files to standard output, merging repeated lines.

Syntax:
DEDUP /A:attribs /B /C /D /H /I /I:n /M /N /S /T /U filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard blank lines
/C          count repeating lines
/D          show only repeating lines
/H          display filenames
/I          ignore case when comparing lines
/I:n        interpret non-Unicode input text using code page n
/M          merge repeating lines (the default)
/N          disable features
/S          search in subdirectories for matching files
/T          trim leading and trailing whitespace
/U          show only lines which do not repeat

Input filenames may be specified on the command line, or text may be redirected or piped into DEDUP. If you want to pipe to DEDUP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read text from the clipboard.

Options /C, /D, /M, and /U select the operating mode. If you don’t specify one, the default is /M. If you specify more than one, the last one wins.
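The four operating modes can be pictured with a short Python sketch. It assumes that "repeating" means identical adjacent lines, as in the Unix uniq tool; the text above does not say so, and the plugin may behave differently. The function name dedup and the (count, line) result for mode C are hypothetical.

```python
from itertools import groupby


def dedup(lines, mode="M"):
    """Illustrate the /C, /D, /M and /U modes on a list of lines."""
    # ASSUMPTION: only adjacent identical lines count as repeats.
    runs = [(len(list(group)), line) for line, group in groupby(lines)]
    if mode == "C":
        return runs                                     # count each run
    if mode == "D":
        return [line for count, line in runs if count > 1]
    if mode == "U":
        return [line for count, line in runs if count == 1]
    return [line for _, line in runs]                   # "M": merge repeats


lines = ["a", "a", "b", "c", "c", "c"]
for mode in "CDMU":
    print(mode, dedup(lines, mode))
```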

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.



DEHTML — Strips HTML tags from a file and dumps the contents to standard output.

Syntax:
DEHTML /A:attribs /C /E /H /I:n /M /N /N: /O:n /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C          include text in <!-- comments -->
/E          omit empty (blank) lines
/H          display filenames
/I:n        interpret non-Unicode input text using code page n
/M          look in <meta> tags for charset info
/N          by itself: include text in <noscript> or <applet> tags
/N:         with suboptions: disable features
/O:n        include text inside <option> tags:
   0 — don’t include any (the default)
   1 — include only the first <option>
   2 — include all <option> text
/S          search in subdirectories for matching files

Input filenames may be specified on the command line, or text may be redirected or piped into DEHTML. If you want to pipe to DEHTML, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to dump the clipboard if it contains HTML.

DEHTML will strip HTML tags from the file and replace many HTML entities with the corresponding characters; most of the remaining text will be dumped to stdout. This command will also discard: any text in the header which does not appear within <title> tags; anything in <script> or <style> tags; anything within an HTML comment unless you specify /C; anything in <noscript> or <applet> tags unless you specify /N; and anything in <option> tags within a <select> block unless you specify /O:1 or /O:2.

If you specify /M, DEHTML will look in <meta> tags in the header for information about the document’s character encoding. This only works if the file is not in Unicode; /M has no effect with Unicode files.

/N with suboptions disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


•  Note: HTML files often include some unusual characters like non-breaking spaces, bullets, em dashes, ellipses, and guillemets. If you want to pipe or redirect the output from this command, it’s a good idea to enable Unicode output with OPTION //UNICODEOUTPUT=YES. If Unicode output is disabled, some characters may be mangled in translation.



FFIELDS — Reads a file and prints fields in a specified format.

Syntax:
FFIELDS /A:attribs /C /E /F:"format" /H /I:n /L:string /N /S /W /X filename…

/A:attribs   attributes mask; valid flags are -ACEHIORS
/C           separate fields at commas
/E           separate fields at first unquoted equals sign
/F:"format"  format string; see below
/H           display filenames
/I:n         interpret non-Unicode input text using code page n
/L:string    insert line numbers on the left
/N           disable features
/S           search in subdirectories for matching files
/W           separate fields at whitespace
/X           perform variable expansion on each line

The FFIELDS command reads a file, divides each line into fields (blank lines are skipped), and then prints the fields using a format string. FFIELDS can read from disk files or from a pipe. If you want to pipe to FFIELDS, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard instead of a file.

The format string may contain $n to print field n, or $n=wf to print field n truncated to length w, where f is a final letter that selects the alignment: L left-justifies the field if it contains fewer than w characters, R right-justifies it, C centers it, and T simply truncates the field without padding it to length w. For example, a field specifier of $4=10L would print field 4, left-justified to 10 characters. Use $$ to print a literal dollar sign, or $N to insert a line break.
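The format-string rules can be sketched in Python as follows. This is a rough model of the behavior described above, not the plugin's implementation. The name render is hypothetical, and two details are assumptions: the final letter is accepted in either case (the command example in this section uses lowercase), and a missing field prints as an empty string.

```python
import re

# $$ (literal dollar), $N (line break), $n, or $n=wf with f in L, R, C, T.
_SPEC = re.compile(r"\$(?:(\$)|([Nn])|(\d+)(?:=(\d+)([LRCTlrct]))?)")


def render(fmt, fields):
    """Expand a FFIELDS-style format string against a list of fields."""
    def expand(match):
        if match.group(1):
            return "$"
        if match.group(2):
            return "\n"
        index = int(match.group(3))
        # ASSUMPTION: a field that does not exist prints as nothing.
        value = fields[index] if index < len(fields) else ""
        if match.group(4) is None:
            return value
        width = int(match.group(4))
        mode = match.group(5).upper()
        value = value[:width]              # every mode truncates to w
        if mode == "L":
            return value.ljust(width)
        if mode == "R":
            return value.rjust(width)
        if mode == "C":
            return value.center(width)
        return value                       # T: truncate, no padding

    return _SPEC.sub(expand, fmt)


print(render("[$0=8L] [$1=4T] costs $$5", ["PATH", "C:\\Windows"]))
```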

Fields are numbered starting from 0.


set |! ffields /e /f:"$0=20l $1=58t"

…displays variable names truncated to 20 characters, followed by a space and the variables’ values truncated to 58 characters.

If you include /L on the command line, FFIELDS will insert line numbers to the left of each output line. Lines are numbered starting at 0. If you include the optional string argument, FFIELDS will perform variable expansion on it before prepending it to each output line; use the variable _LINE to get the current line number. For example, /L:"%%@FORMAT[03,%%_LINE]" will prepend the line number, zero-padded to at least three digits.

If you don’t specify a format string, FFIELDS will invent one at random:


alias |! ffields /e

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.

/X does variable expansion on each line before displaying it. You could, for example, count the characters in each alias definition:


alias |! ffields /e /f:"$0 = %%@len[$1]" /x



LOADARRAY — Loads data from a file into an array variable.

Syntax:
LOADARRAY filename arrayname

filename   a file created by SAVEARRAY
arrayname  an array variable name

The arrayname must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an arrayname, the name of the original array saved in the file will be used. The array will be created (or recreated) automatically, with the correct dimensions to hold the data from the file.

All elements in the file will be loaded. There is no provision for loading a partial array.

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.


See also: the SAVEARRAY command.



OINK — Translates a text file to Pig Latin.

Syntax:
OINK /A:attribs /H /I:n /N /Q /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/H          display filenames
/I:n        interpret non-Unicode input text using code page n
/N          disable features
/Q          replace ASCII quotes and apostrophes with Unicode open and close quotes
/S          search in subdirectories for matching files

If standard input (stdin) is redirected, OINK will read from stdin before any filenames specified on the command line. If no filenames are specified, then OINK will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to OINK, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.

(Yes, this is silly. It was a simple test driver to generate gribble for testing some of the other commands and functions in this plugin. It’s very small, since most of the code is shared with other commands, so I left it in.)


See also: the @OINK function, which renders a string as Pig Latin.



PARSEARGS — Divides a string into arguments.

Syntax:
PARSEARGS /A:array /F:flags /Q /V:var !string

/A:array  name of an array to receive the arguments; the default is ARG
/F:flags  parse flags; bitmapped, see below; the default is 1
/Q        quiet; don’t display arguments to stdout
/V:var    name of an environment variable containing the string to parse
!string   the string to parse

This command exposes the plugin’s internal ParseArgs() function, which divides a string into command-line arguments. Its operation can be changed in various ways with the /F:flags option.

The string to be parsed may be passed in two different ways. You can pass the string on the command line, immediately following an exclamation point. The string must be the last item on the command line; everything following the exclamation point is considered the string to parse. Alternatively, you can store the string in an environment variable, and pass the name of the variable with the /V:var option.

The resulting arguments will be stored in an array. You can specify the name of the array with the /A:array option. The array name must begin with a letter. It may contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long. If you don’t specify an array name, the default is ARG. The number of arguments found will be stored in an environment variable; the name of this variable is the name of the array with an _N appended, for example ARG_N.

Parse flags:
1    divide the string at unquoted spaces
2    divide the string at unquoted commas
4    slashes kludge: treat /A/B like /A /B
8    quotes kludge: treat /A"foo" like /A:"foo"
16   equals kludge: break at the first unquoted equals sign
32   one-arg kludge: allow unquoted spaces in arg not beginning with /
64   don’t swallow double quotes
128  force all arguments to uppercase
256  don’t trim spaces from the end of args
512  disable special handling of double quotes

You should specify at least one of 1, 2, or 16; specifying more than one is allowed. If you don’t specify any, then 1 is assumed. Note that if you include a value of 2 (break at commas), then empty arguments are possible.
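Because the flags are bitmapped, the value passed to /F: is simply the sum of the flags you want. The Python sketch below shows the arithmetic; the member names of ParseFlag are invented for illustration and are not part of the plugin.

```python
from enum import IntFlag


class ParseFlag(IntFlag):
    # Hypothetical names; the numeric values come from the table above.
    SPACES = 1
    COMMAS = 2
    SLASHES_KLUDGE = 4
    QUOTES_KLUDGE = 8
    EQUALS_KLUDGE = 16
    ONE_ARG_KLUDGE = 32
    KEEP_QUOTES = 64
    UPPERCASE = 128
    KEEP_TRAILING_SPACES = 256
    RAW_QUOTES = 512


def flags_value(flags):
    """Return the number to pass to /F:, applying the documented default."""
    dividers = ParseFlag.SPACES | ParseFlag.COMMAS | ParseFlag.EQUALS_KLUDGE
    if not flags & dividers:
        # If none of 1, 2, or 16 is given, 1 is assumed.
        flags |= ParseFlag.SPACES
    return int(flags)


wanted = ParseFlag.SPACES | ParseFlag.SLASHES_KLUDGE | ParseFlag.KEEP_QUOTES
print(flags_value(wanted))
```

In this example the combination of 1, 4, and 64 would be passed as /F:69.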

A value of 4 causes a slash to terminate an argument beginning with a slash followed by a letter. It treats an argument like /A/B as two separate arguments.

A value of 8 checks for arguments beginning with a slash followed by a single letter and then a double quote. If this kind of construction is found, the missing colon is supplied, changing /A"foo" into /A:foo.

If you only expect one argument which does not begin with a slash, and if that argument will always be the last one in the string, you can add 32 to flags. This allows the argument to contain spaces without the necessity of double quotes.

A value of 16 is useful for commands that, like SET or ASSOC, expect a name=value pair. This mode has a number of peculiar quirks. It splits arguments at the first unquoted equals sign in an argument which does not begin with a slash. Spaces around the equals sign are dropped. Spaces in the argument after the equals sign, the value part, are retained even if they are not quoted; the name=value pair is expected to be the last item on the command line. The equals sign is retained as the first character in the value argument; this allows you to distinguish a name= construction (to clear or reset the value for name, perhaps) from a name alone (to report the value for name without changing it.)
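The name=value split can be approximated in Python as shown here. This is a simplified sketch of the behavior described above, assuming a single argument that does not begin with a slash; the function name split_name_value is hypothetical, and quote removal and trailing-space trimming are left out.

```python
def split_name_value(arg):
    """Split at the first unquoted equals sign, keeping '=' on the value."""
    in_quotes = False
    for i, ch in enumerate(arg):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == "=" and not in_quotes:
            name = arg[:i].rstrip()              # drop spaces before '='
            value = "=" + arg[i + 1:].lstrip()   # drop spaces after '='
            return [name, value]
    return [arg]                                 # no equals sign: name alone


print(split_name_value("prompt = $p $g"))
print(split_name_value("prompt="))
print(split_name_value("prompt"))
```

The three calls show why the equals sign is kept: "prompt=" yields a second argument of "=", which a caller can tell apart from a bare "prompt" with no second argument at all.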

Normal behavior is to remove double quotes from the string. Typically the double quotes are not part of the filename, value, etc. per se, but a syntactic mechanism for escaping spaces; once the string has been parsed there is no further need for them. If you want to keep the double quotes, add 64 to the value of flags.

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.



PASSWORD — Generates random strings suitable for use as passwords.

Syntax:
PASSWORD /A:min,max /C:n /D:min,max /E:min,max /F /L:min,max /N:n /P:min,max /S:min,max /Y

/A:min,max  the number of alphabetic characters to use
/C:n        specify the case of the alphabetic characters:
     0: random
     1: lowercase
     2: uppercase
     3: word case
     5: alternating
     6: leet (vowels lower, consonants upper)
     7: unleet (reverse of the above)
/D:min,max  the number of digits to use
/E:min,max  the number of extended characters to use
/F          make the first character a letter if possible
/L:min,max  the total length of the password, in characters
/N:n        the number of strings to generate
/P:min,max  the number of punctuation characters to use
/S:min,max  the number of syllables to use
/Y          also copy the password to the clipboard

This command displays proposed passwords to standard output. Output can be redirected.

The default behavior is to generate a password from 7 to 10 characters long. You can specify the desired length with /L:min,max. The allowed range is 4 to 1024 characters. If you specify only one value after the /L: it will be used as both the minimum and the maximum. (All the other options which accept a min,max range behave the same way.)
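The min,max convention can be modeled with a few lines of Python. The names parse_range and pick_length are hypothetical, and rejecting a reversed range is an assumption; the text above does not say how the command treats one.

```python
import secrets


def parse_range(text):
    """Parse 'min,max' or a single value, which serves as both limits."""
    parts = [int(p) for p in text.split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) != 2 or parts[0] > parts[1]:
        # ASSUMPTION: a reversed or malformed range is an error.
        raise ValueError("expected min,max with min <= max")
    return parts[0], parts[1]


def pick_length(text="7,10"):
    """Choose a password length within the range given to /L:."""
    low, high = parse_range(text)
    if low < 4 or high > 1024:
        raise ValueError("length must be from 4 to 1024")
    return low + secrets.randbelow(high - low + 1)


print(parse_range("10"))
print(pick_length("7,10"))
```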

/A:min,max sets the number of alphabetic characters to include. “Alphabetic characters” are the unaccented Latin letters, A to Z. The legal range is from 0 to 512 alpha characters.

/D:min,max specifies the number of digits to include; digits are of course 0 to 9. The legal range is from 0 to 128 digits.

Punctuation is by default limited to standard ASCII punctuation marks with no special meaning to TCC: !@#$*()-_=+;:,./?{}~ You can specify a custom set of punctuation characters by setting an environment variable named PUNCTUATION_CHARACTERS. You may include from 0 to 64 punctuation characters.

“Extended characters” are the Unicode code points from U+00C0 through U+00FF: accented Latin letters, thorn, eth, aesc, eszett, and a few other hard-to-type glyphs. These characters are not included unless you specify a nonzero value using /E:. You can include up to 64 extended characters.

“Syllables” are series of four letters, alternating consonant and vowel sounds. They are intended to be somewhat pronounceable, and perhaps more memorable than an entirely random letter salad. Syllables are not guaranteed to be real words; nor are they guaranteed not to be real words. You may include up to 64 syllables.

The /C:n case option, if specified, is only applied to the regular Latin letters A — Z. It does not affect extended characters. If you specify /C:3 (word case), then the first letter in a run of consecutive letters will be capitalized and the remainder will be in lowercase. These runs are not likely to correspond to actual words. The /C:5 option will give roughly equal numbers of uppercase and lowercase letters.
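Word case (/C:3) as described above can be sketched like this in Python. The function name word_case is hypothetical, and the sketch assumes that any character other than A to Z, including an extended character, ends a run of letters; the plugin may treat extended characters differently.

```python
def word_case(text):
    """Capitalize the first letter of each run of plain Latin letters."""
    out = []
    in_run = False
    for ch in text:
        if ch.isascii() and ch.isalpha():
            out.append(ch.lower() if in_run else ch.upper())
            in_run = True
        else:
            # ASSUMPTION: anything outside A to Z ends the current run.
            out.append(ch)
            in_run = False
    return "".join(out)


print(word_case("ab3cd!e"))
```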


rem  Generate a 10-character random password, and
rem  stash it on the clipboard:

password /l:10 /y


This command also saves its parameters for future calls to the _PASSWORD variable.



REPLACETEXT — Replaces strings in text from a file.

Syntax:
REPLACETEXT /A:attribs /C /H /I:n /N /R:from:to /S /W /X:from:to filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C          replace character escapes (affects following /R: and /X:)
/H          display filenames
/I:n        interpret non-Unicode input text using code page n
/N          disable features
/R:from:to  specify old and replacement text
/S          search in subdirectories for matching files
/W          whole words only (affects following /R: and /X:)
/X:from:to  specify old and replacement text (do not auto-capitalize)

If standard input (stdin) is redirected, REPLACETEXT will read from stdin before any filenames specified on the command line. If no filenames are specified, then REPLACETEXT will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to REPLACETEXT, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

Use /R: or /X: to specify the strings to search for (from) and to substitute (to). You must have at least one of these; you may add as many as you like. The text from each matching file will be dumped to stdout, with every occurrence of from replaced with the corresponding to string. If you give a from string without a matching to, then matching strings will simply be omitted from the output. The difference between the two options is that /R: automatically capitalizes the to string to match the from text which it replaces, but /X: does not. The rules for /R: are simple:

/W only affects those /R: and /X: options which follow it on the command line. /W prevents matching text which immediately follows or immediately precedes a letter or digit.
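The whole-word test can be expressed with regular-expression lookarounds, as in this Python sketch. It models only the /W condition (no letter or digit immediately before or after the match); the function name replace_whole_words is hypothetical, matching here is case-sensitive, and the auto-capitalization done by /R: is not modeled.

```python
import re


def replace_whole_words(text, old, new):
    """Replace old with new unless it touches a letter or digit."""
    # [^\W_] matches exactly one letter or digit.
    pattern = r"(?<![^\W_])" + re.escape(old) + r"(?![^\W_])"
    # A function replacement keeps backslashes in new from being interpreted.
    return re.sub(pattern, lambda match: new, text)


print(replace_whole_words("but butter rebut but.", "but", "yet"))
```

Only the first and last "but" are replaced; "butter" and "rebut" are left alone because the match would touch another letter.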

/C only affects those /R: and /X: options which follow it on the command line. /C expands character escapes of the form \nnn (decimal) or \Xxx (hexadecimal) in both the from and to text. Use this option to embed troublesome characters. For example, you could use /C /R:\x22: to strip double-quote marks from a file.
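Escape expansion of this kind might look like the following Python sketch. The exact digit counts are assumptions taken from the notation above (three decimal digits, two hexadecimal digits), and the function name expand_escapes is hypothetical.

```python
import re

# ASSUMPTION: \nnn is exactly three decimal digits, \xHH two hex digits.
_ESCAPE = re.compile(r"\\(?:[xX]([0-9A-Fa-f]{2})|(\d{3}))")


def expand_escapes(text):
    """Replace \\nnn (decimal) and \\xHH (hexadecimal) with the character."""
    def to_char(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return chr(int(match.group(2)))

    return _ESCAPE.sub(to_char, text)


print(expand_escapes(r"say \x22hello\034"))
```

Both \x22 and \034 stand for the double-quote mark, which is awkward to type directly on a TCC command line.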

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


replacetext "Engine Summer.txt" /w /r:winter:autumn /r:but:yet



ROT13 — Encodes or decodes text with ROT13.

Syntax:
ROT13 /A:attribs /H /I:n /N /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/H          display filenames
/I:n        interpret non-Unicode input text using code page n
/N          disable features
/S          search in subdirectories for matching files

If standard input (stdin) is redirected, ROT13 will read from stdin before any filenames specified on the command line. If no filenames are specified, then ROT13 will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read from the clipboard.

If you want to pipe to ROT13, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


See also: the @ROT13 function, which transforms a string using ROT13.
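For readers unfamiliar with ROT13, it rotates each of the letters A to Z by thirteen places and leaves everything else alone, so applying it twice restores the original text. The Python sketch below uses the standard library's rot_13 codec to show the effect; it illustrates the general cipher, and the plugin's handling of accented or non-Latin characters is not documented here.

```python
import codecs


def rot13(text):
    """Apply the standard ROT13 substitution to a string."""
    return codecs.encode(text, "rot_13")


encoded = rot13("Hello, World!")
print(encoded)
print(rot13(encoded))
```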



SAVEARRAY — Saves data from an array variable to a file.

Syntax:
SAVEARRAY /O /P /X:m,n /Y:m,n /Z:m,n /W:m,n arrayname filename

/O         the command may overwrite an existing file
/P         save a Partial array as if it were the whole thing; only useful with /X: /Y: /Z: /W:
/X:m,n     save only X index m through n
/Y:m,n     save only Y index m through n
/Z:m,n     save only Z index m through n
/W:m,n     save only W index m through n
arrayname  an array variable name
filename   the file to create

The arrayname should begin with a letter. It should contain only letters, digits, underscores, and dollar signs; it should not be more than 31 characters long.

All non-empty elements in the array will be saved. You can restore the data later with LOADARRAY.

The default behavior is to save the entire array. You can restrict the elements saved using the /X:, /Y:, /Z:, and /W: options. /X: restricts the first dimension of the array, /Y: affects the second, /Z: the third, and /W: the fourth.

•  Note: The maximum size for any element in the array is 8,191 characters. Longer elements may cause issues!

•  Note: This command is not available in TCC/LE, in 4NT, or in older versions of TCC which don’t support array variables.


See also: the LOADARRAY command.



SHUFFLE — Dumps randomized lines from a text file.

Syntax:
SHUFFLE /A:attribs /B /H /I:n /J /L /M:n /N /P:n /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard blank lines
/H          display the filename before each file
/I:n        interpret non-Unicode input text using code page n
/J          show line numbers (original)
/L          show line numbers (new)
/M:n        maximum number of lines to show
/N          disable features
/P:n        pause after every n lines
/S          search in subdirectories for matching files

SHUFFLE randomly reorders lines from the specified file. It can read from disk files or from a pipe. If you want to pipe to SHUFFLE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, SHUFFLE will read from stdin before any filenames specified on the command line. If no filenames are specified, then SHUFFLE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.

/P:n makes SHUFFLE pause after every n lines and wait for a keystroke. /P without a count defaults to the number of console lines minus 2.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


shuffle /b "engine summer.txt"
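For readers who think better in code, the effect of /B and /M:n can be modeled in a few lines of Python. This is only a rough sketch, not the plugin's implementation: the function name shuffle_lines and its parameters are invented for illustration, and it assumes that blank lines are dropped before shuffling and that /M:n simply truncates the shuffled result.

```python
import random

def shuffle_lines(lines, discard_blank=False, max_lines=None, rng=None):
    """Hypothetical model of SHUFFLE: reorder lines at random."""
    rng = rng or random.Random()
    if discard_blank:
        # /B: assumed to drop empty and whitespace-only lines
        lines = [ln for ln in lines if ln.strip()]
    shuffled = list(lines)
    rng.shuffle(shuffled)
    if max_lines is not None:
        # /M:n: assumed to keep only the first n shuffled lines
        shuffled = shuffled[:max_lines]
    return shuffled

sample = ["little", "", "big", "belaire"]
result = shuffle_lines(sample, discard_blank=True, rng=random.Random(1))
```

The result contains the same non-blank lines as the input, in some random order.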



TEXTUTILSHELP — Opens the TextUtils plugin help file.

Syntax:
TEXTUTILSHELP topic

topic  the page to display

The TEXTUTILSHELP command will locate and open this plugin’s help file. In most cases, the internal HELP command, and the F1 and Ctrl-F1 keys, will be more convenient. The sole advantage to this command is that it can be used to open the help file to any desired topic, not only to the names of commands, functions, and variables.



UNICODIFY — Converts text files to Unicode.

Syntax:
UNICODIFY /A:attribs /I:n /L /N /O /P /Q /S filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/I:n        interpret non-Unicode input text using code page n
/L          normalize line endings to CR/LF
/N          disable features
/O          overwrite read-only files
/Q          quietly
/S          search in subdirectories for matching files

UNICODIFY rewrites the contents of text files, changing them to UTF-16 format. By default, it will skip:

The original contents of the file will be saved in a new file with the extension .original.

•  Note: This command only converts files. Standard input, internet URLs, and the clipboard are not supported. (You can use wildcards, directory aliases, @file lists, and so on.)

OEM characters will be interpreted according to the current Windows code page by default; use the /I:n option to specify a different code page. To check the translation before you actually convert the file, try UTYPE with the /I:n option first.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.



UPEND — Displays lines from a file in reverse order.

Syntax:
UPEND /A:attribs /B /C /E /H /I:n /L:string /N /P:n /R:string /S /T /V /W:n filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard blank lines
/C          replace control characters with ^ sequences
/E          expand variables in the /L: and /R: strings
/H          display the filename before each file
/I:n        interpret non-Unicode input text using code page n
/L:string   insert string to the left of each line
/N          disable features
/P:n        pause after every n lines
/R:string   insert string to the right of each line
/S          search in subdirectories for matching files
/T          trim leading and trailing whitespace
/V          also reverse each line in the file
/W:n        truncate lines to n characters

UPEND is a low-budget substitute for the Unix tac command. It can read from disk files or from a pipe. If you want to pipe to UPEND, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UPEND will read from stdin before any filenames specified on the command line. If no filenames are specified, then UPEND will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to read lines from the clipboard.

If /L: is specified, the given string will be inserted to the left of each line; /R: inserts a string to the right. If /E is also specified, variable expansion will be performed on each string. Along with TCC’s usual complement of internal variables, functions, and so on, UPEND will set an environment variable _LINE. _LINE will contain the value 0 for the first line listed (i.e. the last line in the file), 1 for the second line listed, and so on. You can massage this value with functions like @INC, @EVAL, @FORMAT, and so on. To prevent the variables from being expanded before UPEND executes, you must either enclose the string in backquotes or double the percent signs.

/P:n makes UPEND pause after every n lines and wait for a keystroke. /P without a count defaults to the number of console lines minus 2.

/N disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


upend D:\download\pg11.txt /l:"%%@format[4,%%_line] " /e
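To make the _LINE numbering concrete, here is a small Python model of what the example above produces. It is a sketch under stated assumptions, not the plugin's code: the function upend and its left parameter are hypothetical, with left standing in for the expanded /L: string and receiving the zero-based _LINE value.

```python
def upend(lines, left=None):
    """Hypothetical model of UPEND: list lines last-to-first.

    left is an invented callable that plays the role of the expanded
    /L: string; it is given the zero-based _LINE value for each line.
    """
    out = []
    for line_no, text in enumerate(reversed(lines)):
        prefix = left(line_no) if left else ""
        out.append(prefix + text)
    return out

# Mimics /l:"%%@format[4,%%_line] " : the number right-justified in 4 columns
numbered = upend(["alpha", "beta", "gamma"], left=lambda n: f"{n:>4} ")
```

Note that the last line of the input is listed first and carries the number 0.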



UTYPE — Dumps text files to standard output.

Syntax:
UTYPE /A:attribs /B /C /D /E /F:string /H /I:n /K:n /L:format /N /P:n /Q /S /T /U:string /X filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/B          discard BEL characters (control-G, ASCII 7)
/C          replace control characters with ^ sequences
/D          discard blank lines at the start of the file
/E          discard all empty lines
/F:string   show only lines following this string; /FF: inclusive
/H          display the filename before each file
/I:n        interpret non-Unicode input text using code page n
/K:n        expand tabs to n columns
/L:format   insert line numbers on the left
/N          disable features
/P:n        pause after every n lines
/Q          replace ASCII quotes and apostrophes with Unicode open and close quotes
/S          search in subdirectories for matching files
/T          trim leading and trailing whitespace
/U:string   show only lines until (before) this string; /UU: inclusive
/X          dump file in hexadecimal

UTYPE displays files to standard output, much like the internal TYPE command. The primary advantage of UTYPE is that it recognizes and handles UTF-8 text files; you can think of it as a “UTF-8 TYPE”.

If you want to pipe to UTYPE, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

If standard input (stdin) is redirected, UTYPE will read from stdin before any filenames specified on the command line. If no filenames are specified, then UTYPE will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to display the contents of the clipboard.

/P:n makes UTYPE pause after every n lines and wait for a keystroke. /P without a count defaults to the number of console lines minus 2.

If you include /L on the command line, UTYPE will insert line numbers on the left, starting at 1, as TYPE does. If you include the optional format string, UTYPE will perform variable expansion on the string before displaying it; use the variable _LINE to get the current (zero-based) line number. For example, /L:"%%@FORMAT[03,%%_LINE] " will show the line number zero-padded to at least three digits.

/F: and /U: can be used to chop off a simple header or footer. /F: discards all lines up to and including the first line which contains the specified string (case-insensitive); /U: discards all lines including and after a line which contains the specified string (again, case-insensitive). For example, most Project Gutenberg ebooks include a header which ends in a line beginning with “*** START” and a footer beginning with “*** END”. You can strip them off like this:

utype "http://www.gutenberg.org/cache/epub/11/pg11.txt" /f:"*** start" /u:"*** end" /d | list

If you double the option letter — /FF: or /UU: — the matching line will be included in UTYPE’s output, not discarded.
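The header and footer rules can be summarized in a short Python model. This is a sketch only: the function chop and its parameter names are invented, and it assumes that the /U: search starts after the /F: match and that a string which is never found leaves the text untouched. Neither assumption is confirmed by this help file.

```python
def chop(lines, first=None, until=None,
         first_inclusive=False, until_inclusive=False):
    """Hypothetical model of UTYPE's /F: (/FF:) and /U: (/UU:) options."""
    out = list(lines)
    if first is not None:
        needle = first.lower()              # matching is case-insensitive
        for i, ln in enumerate(out):
            if needle in ln.lower():
                # /F: drops the matching line, /FF: keeps it
                out = out[i:] if first_inclusive else out[i + 1:]
                break
    if until is not None:
        needle = until.lower()
        for i, ln in enumerate(out):
            if needle in ln.lower():
                # /U: drops the matching line, /UU: keeps it
                out = out[:i + 1] if until_inclusive else out[:i]
                break
    return out

book = ["License text", "*** START OF BOOK", "Chapter 1",
        "Some prose.", "*** END OF BOOK", "More license text"]
body = chop(book, "*** start", "*** end")
```

Here body holds only the two lines between the markers; passing True for both inclusive flags would keep the marker lines as well.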

/E discards all blank lines; /D discards only those at the start of a file. If you specify both, /D wins. If you combine /D with /F:string, UTYPE will discard any blank lines following the header. A line containing only spaces or tabs is considered blank.

/N disables features:

/NB  do not write a Byte Order Mark
/NC  disable the handbrake keys
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.

The handbrake: When scrolling a long file to the console and /P was not specified, UTYPE checks for the Control and Esc keys. Hold down the Control key to slow the scrolling; press Esc to pause the file as if /P had been specified. This feature will be disabled automatically if you specify /P or if output is redirected; you can also disable it with /NC.


Quotes replacement: /Q causes UTYPE to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.


utype "Engine Summer.txt"



WORDS — Counts words, sentences, and paragraphs in English text.

Syntax:
WORDS /A:attribs /D /F:fmt /I:n /K /N /S /U:mode filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/D          dumps lists of unique words, sorted by frequency
/F:fmt      specifies the format for input text; fmt is one of:
   0 — best guess (default)
   1 — unformatted (line breaks are used only to end paragraphs)
   2 — prewrapped (line breaks are used to wrap text)
/I:n        interpret non-Unicode input text using code page n
/K          keeps hyphens when reassembling split words
/N          by itself: no words containing digits
/N          with suboptions: disable features
/S          search in subdirectories for matching files
/U:mode     controls the counting of unique words; mode is one of:
   0 — do not count unique words (faster for large files)
   1 — count unique words for each file individually (the default)
   2 — count unique words for all files together (slower)
   3 — separate counts for each file and for all files together (double oink!)

WORDS counts words, sentences, and paragraphs in English text. It can read text from standard input, or from one or more files specified on the command line. A report is written to standard output; this report can be piped or redirected. The results of the last file processed are also saved internally, and can be accessed through internal variables.

This command is designed for use with English prose. A “word” in this command has a rather complicated definition designed to catch most actual English words. The command may give strange or undesired results when used on source code, program output, HTML, or whatnot. It makes Anglocentric assumptions which may be inappropriate to other languages.

If standard input (stdin) is redirected, WORDS will read from stdin before any filenames specified on the command line. If no filenames are specified, then WORDS will read from stdin whether it is redirected or not. Filenames may include wildcards and directory aliases. You can search into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to count words on the clipboard.

This command’s definition of a “word” is complex and subject to ongoing tweaking. In general, though, a word may contain only letters, digits (unless /N is specified), periods, apostrophes, and hyphens; at least one character must be a letter. For instance, 20th, 1920s, 1969's, and post-1941 are all considered words, but 1984 is not. The first character must be alphanumeric or (very rarely) an apostrophe.

Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this command will find only three “unique” words.

A word is counted as “proper” only if it never occurs in an all-lowercase form; no proper nouns will be found in Polish polish. Acronyms like NATO will be counted as “proper nouns”; so will ordinary words capitalized at the start of a sentence. The latter are often common words like articles and prepositions, which tend to be weeded out in longer files as they recur midsentence.
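The unique-word and proper-noun rules are easy to misread, so here is a rough Python model of them. It is not the plugin's algorithm: the function vocabulary is invented for illustration, and its tokenizer is a deliberate simplification (letters, digits, apostrophes, and hyphens only, with at least one letter) of the real word definition described above.

```python
import re
from collections import Counter

def vocabulary(text):
    """Hypothetical model of how WORDS counts unique and proper words."""
    # Simplified tokenizer: starts alphanumeric, must contain a letter
    tokens = [t for t in re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", text)
              if any(c.isalpha() for c in t)]
    # Words differing only in case are the same word
    counts = Counter(t.lower() for t in tokens)
    # A word is "proper" only if it never appears in all-lowercase form
    seen_lowercase = {t for t in tokens if t == t.lower()}
    proper = {w for w in counts if w not in seen_lowercase}
    return counts, proper

counts, proper = vocabulary(
    "polish Polish furniture using Polish furniture polish")
```

For the phrase above, counts has three entries and proper is empty, because polish also occurs in lowercase.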

Note that a hyphenate is always counted as a single word. Without a dictionary, the command has no way of knowing whether it is composed of actual words (red-eye, half-baked) or not (pre-K, Wi-Fi).

WORDS also gives counts of sentences, paragraphs, lines, characters, and bytes. All counts should be viewed as estimates rather than gospel truth. The sentences count in particular must be taken with a healthy dose of salt; the command has no good way to determine whether a period ends an abbreviation, a sentence, or both.

A line, or a series of lines, which contains one or more sentences is counted as a “paragraph”. A line or series of lines which contains one or more words, but no recognized sentences, is instead counted as a “title”. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

The number of lines reported may differ from the number of carriage returns or line feeds in the text, e.g. if the last line in the file is not terminated. A line containing only whitespace characters (spaces and tabs) is considered blank. The character and byte counts do not include any Unicode byte-order mark at the beginning of the file.

Split words: If a hyphenated word is split across a line break, WORDS will reassemble it and treat it as a single word. By default, the hyphen is dropped — the command has no way of knowing whether a hyphenated compound word was broken at a hyphen, or whether a normal word was divided between syllables and a hyphen added. The latter seems more common, and I wanted to avoid cluttering the vocabulary list with differently-hyphenated versions of the same word. If /K is specified, the command will instead retain hyphens when reassembling words broken at the end of a line. This option may cause a larger number of “unique” words to be reported.

Vocabularies: In order to count unique words and “proper nouns”, WORDS must build a list of all words found. Building this list can slow down the process and use a good deal of memory if the text file involved is large. /U:mode controls the vocabulary lists. /U:0 disables vocabularies; the command executes faster, but there will be no counts of unique and proper words. /U:1 causes WORDS to build a vocabulary list for each file it processes; this is the default behavior. /U:2 builds a combined vocabulary for all files that WORDS processes; this is slower than the default. Finally, /U:3 builds a vocabulary for each file that WORDS reads, and at the same time builds a master vocabulary for all files together; this is much slower than the default behavior, and devours memory shamelessly.

If you are processing extremely large text files, or files which are not English prose — e.g. output from a program or command — I strongly recommend using /U:0 to disable vocabulary lists.

Dump: If /D is specified, the vocabulary for each file will be dumped to stdout. If /D is combined with /U:2, you’ll instead get a combined vocabulary for all files. The list is sorted by frequency, with more common words appearing first. Note that words may be shown in a different case than they appear in the input text. This is because the command stores all words in lowercase internally for speed (lowercase letters are more streamlined).

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell WORDS how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. WORDS will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. WORDS will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph. If you specify /F:0 or do not specify any /F:n, WORDS will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)

Text encoding: WORDS automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /I:n. Most single-byte (i.e., alphabetic) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Disabling features: /N with suboptions disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


C:\> type EBS.txt
This is a test.  For the next sixty seconds, this station will conduct a test
of the Emergency Broadcast System.  This is only a test.

C:\> words /d EBS.txt

File "C:\EBS.txt" :
  25 words total, 17 unique, 4 proper.  25 runs of non-blanks.
  3 sentences total:  3.  0!  0?   Average sentence 8.3 words.
  1 paragraph, 0 titles.  Average paragraph 3.0 sentences.
  2 lines total, 2 not blank; the longest had 77 characters.
  137 characters in 137 bytes (OEM, prewrapped).

3:  a test this
2:  is the
1:  Broadcast conduct Emergency For next of only seconds sixty station System will

C:\>


The results from the last file processed are saved, and can be accessed using these internal variables:

_WORDS  _UNIQUEWORDS  _PROPERNOUNS  _WC
_SENTENCES  _SENTENCESD  _SENTENCESE  _SENTENCESQ
_SENTENCEWORDS  _PARAGRAPHS  _TITLES
_LINES  _NONBLANKLINES  _LONGESTLINE  _CHARACTERS

The cumulative results from all files processed by the last invocation of WORDS can be accessed through these variables:

_WORDSALL  _UNIQUEWORDSALL  _PROPERNOUNSALL  _WCALL
_SENTENCESALL  _SENTENCESDALL  _SENTENCESEALL  _SENTENCESQALL
_SENTENCEWORDSALL  _PARAGRAPHSALL  _TITLESALL  _WORDFILES
_LINESALL  _NONBLANKLINESALL  _LONGESTLINEALL  _CHARACTERSALL


WRAP — Word-wraps English text to fit a specified number of columns.

Syntax:
WRAP /A:attribs /C:n /D /F:fmt /H /I:n /J /N:n /N /P:n,m /Q /R /S /T:n /W:width /Z:char filename…

/A:attribs  attributes mask; valid flags are -ACEHIORS
/C:n        condense repeated spaces in input text
/D          disable special handling of soft hyphens (character 173 / 0xAD)
/F:fmt      specifies the format for input text; fmt is one of:
   0 — best guess (default)
   1 — unformatted (line breaks are used only to end paragraphs)
   2 — prewrapped (line breaks are used to wrap text)
   3 — unformatted, with blank lines between paragraphs
/H          display filenames
/I:n        interpret non-Unicode input text using code page n
/J          justify right margins
/N:n        minimum characters left on each line to split at a hyphen; 0 disables breaking at hyphens
/N          disable features
/P:n,m      indent all paragraphs n spaces; if m is specified, it’s the indent for the second and later lines
/Q          replace ASCII quotes and apostrophes with Unicode open and close quotes
/R          remove hyphens from line ends
/S          search in subdirectories for matching filenames
/T:n        tab stops every n spaces
/W:width    desired width of output text
/Z:char     define a forced line-break character

The WRAP command word-wraps English text to fit a specified width. It can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected.

If you want to pipe to WRAP, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to wrap text from the clipboard.

“Width” here refers to a specified number of character positions, or columns. All characters are assumed to have the same width. The word-wrapped output should have neat, reasonably uniform line lengths when viewed or printed in a fixed-pitch font such as Courier, or displayed in a console window. Note that the specified width includes the final newline character; if you specify a width of 80, then up to 79 printable characters may appear on a line.

This command is designed for use with English prose. It may give weird or undesired results when used on source code, program output, HTML, or whatnot. It makes Anglocentric assumptions that may be inappropriate to other languages.

If standard input (stdin) is redirected, WRAP will read from stdin before any filenames specified on the command line. If no filenames are specified, then WRAP will read from stdin whether it is redirected or not. If /H is used, each file’s name will be printed before it is processed. (For standard input, <stdin> will be shown.)

Output width: /W:width sets the desired width in characters for the output text. Width may be from 40 to 512. If no /W:width is specified, the default is the console width when output goes to the console, or 100 columns when output is redirected. (You can set an environment variable COLUMNS to change this default.) If you type just a /W without a colon or width, then the current console width is assumed; this is useful if you are redirecting WRAP’s output but want it wrapped to the console width anyway, e.g. for piping to LIST.

Text format: Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to wrap text to some desired width. You can use /F:n to tell WRAP how to handle line breaks. /F:1 indicates that the text is unformatted, with line breaks only at the ends of paragraphs. WRAP will honor all line breaks, and add an extra blank line after each paragraph. /F:2 means that the input text is prewrapped, having line breaks within paragraphs and even within sentences. WRAP will skip single line breaks, honoring only sequences of two or more in a row. /F:3 is also for unformatted text and acts like /F:1, but does not insert a blank line after each paragraph; use this option to wrap the output from DEHTML. If you specify /F:0 or do not specify any /F:n, WRAP will attempt to guess how the input text is formatted. (Guessing is not reliable when there isn’t much input text.)

Tab size: The /T:n option controls the expansion of tab characters. By default, tab stops are every four columns (set an environment variable TABSIZE to change this default). /T:8 would make tabs eight columns wide. /T:0 disables special handling of tab characters, treating them like any other character; this will probably bollix word-wrapping and is not recommended. n may be 0 to 20.

Breaking at hyphens: WRAP will usually break lines at spaces. It may also break a line after a hyphen, if all of the following are true: (1) the character before the hyphen is a letter, and the following character is either a letter or a digit; (2) at least three characters, not counting the hyphen, will remain at the end of the line; and (3) at least three characters will move to the start of the following line. So, for example, if the phrase true-blue fell near the end of a line, WRAP might break the line after the hyphen, since true and blue have four letters each. The phrases do-nothing and derring-do would not be divided, however, since splitting either one would leave a two-letter do on a line by itself. You can adjust this behavior with /N:n, which sets the minimum number of characters for both lines. If you specify /N:4 then at least four characters, not counting the hyphen, must remain on each line. /N:0 prevents WRAP from breaking lines after hyphens.
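The three conditions above can be written out as a small Python check. This is an illustrative sketch, not WRAP's code: the function name may_break_after_hyphen and its parameters are invented, and it reads "characters remaining" as the word fragments on either side of the hyphen, which is what the true-blue and derring-do examples suggest.

```python
def may_break_after_hyphen(word, hyphen_index, min_chars=3):
    """Hypothetical model of WRAP's hyphen-break test; min_chars plays /N:n."""
    if min_chars == 0:
        return False                        # /N:0 disables hyphen breaks
    if word[hyphen_index] != "-":
        return False
    before = word[:hyphen_index]            # would stay on the current line
    after = word[hyphen_index + 1:]         # would move to the next line
    if not before or not after:
        return False
    # (1) letter before the hyphen, letter or digit after it
    if not before[-1].isalpha():
        return False
    if not (after[0].isalpha() or after[0].isdigit()):
        return False
    # (2) and (3) enough characters on both sides, hyphen not counted
    return len(before) >= min_chars and len(after) >= min_chars

ok = may_break_after_hyphen("true-blue", 4)
```

With the default of three characters, true-blue passes while do-nothing and derring-do fail, matching the examples in the text.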

Removing hyphens: If /R is used, WRAP may discard a hyphen at the end of a line if the preceding character was a letter, and if the first character on the following line is also a letter. Without /R, WRAP retains all hyphens from line ends.

Forced indentation: The /P:n option forcibly indents each new paragraph n spaces (not tabs.) Any indentation in the input text will be lost. n must be 0 to 20. /P:0 will strip all leading whitespace, leaving text flush with the left margin. The optional second value, if present, indents the second and later lines m spaces; m is also 0 to 20. You might use /P:0,4 to produce a hanging indent. If /P: is not specified, any indentation in the input text is preserved.

Condensing spaces: The /C:n option allows you to condense runs of consecutive spaces in the input text. Any sequence of more than n spaces will be truncated. Only spaces (character 32) are counted, not other whitespace characters. Spaces generated by the program itself (e.g. by expanding tabs or indenting paragraphs) will not be condensed. n must be 0 to 10; if n is 0, spaces are not condensed (the default.) This option might be useful for packing output text just a little more tightly; if the original text file had extra spaces inserted to justify margins; or if you are one of those unfortunates who suffer a violent reaction to the sight of two spaces after a period.
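As a quick illustration of /C:n, the following Python sketch condenses runs of spaces. It is a model only: the function condense_spaces is invented, and it assumes that "truncated" means a run longer than n spaces is cut down to exactly n.

```python
import re

def condense_spaces(text, n):
    """Hypothetical model of /C:n: shorten runs of more than n spaces."""
    if n == 0:
        return text                          # 0 means do not condense
    # Only character 32 is matched; tabs and other whitespace are left alone
    return re.sub(" {%d,}" % (n + 1), " " * n, text)

tight = condense_spaces("One.   Two.  Three.", 1)
```

With n set to 1, every run of spaces collapses to a single space; with n set to 2, a two-space gap after a period survives and only longer runs are cut.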

Quotes replacement: /Q causes WRAP to replace generic ASCII apostrophes and quote marks ( ' and " ) with Unicode open and close quote marks ( ‘ ’ and “ ” ). The new quote marks may or may not look different from the originals, depending on how they are displayed and the font used. If the output is displayed in a non-Unicode font, the curly quotes will be lost or mangled. You can set some environment variables to control this feature.

Text encoding: WRAP automatically detects Unicode text files. If the file is not Unicode, the command has no way of detecting the character encoding; the default Windows code page is assumed. You can specify a different code page for non-Unicode text files with /I:n. Most single-byte (i.e., Western) code pages are supported, but multibyte code pages (Chinese, Japanese, Korean) are not. This option only affects non-Unicode files.

Forced line break: /Z:char defines a forced line-break character. char may be entered as either a single character, or as a decimal or hexadecimal (prefixed with 0x) character code. If a matching character is found in the input file or stream, WRAP will end the current line and begin a new one.

Disabling features: /N with suboptions disables features:

/NB  do not write a Byte Order Mark
/ND  do not search into hidden directories; only useful with /S
/NH  do not add a hyphen when breaking a word
/NJ  do not search into junctions; only useful with /S

You can combine these, e.g. /NDJ.


These variables may be set to a numeric value to modify the command’s default behavior:

COLUMNS:  sets the default width when output is redirected and /W is not specified. Legal values are 40 to 512.
TABSIZE:  sets the default number of columns between tab stops when /T is not specified. Legal values are 1 to 20.

wrap /w:100 "Fishy Story.txt"



XFILTER — Processes lines of a file using variable expansion.

Syntax:
XFILTER /A:attribs /B /F:"format" /H /I:n /N /S /T filename…

/A:attribs   attributes mask; valid flags are -ACEHIORS
/B           discard blank lines
/F:"format"  format string; see below
/H           display filenames
/I:n         interpret non-Unicode input text using code page n
/N           disable features
/S           search in subdirectories for matching files
/T           trim leading and trailing whitespace

The required format string contains TCC variables and functions, which will be expanded for each line in the file. Double all percent signs to prevent variables from being expanded before the command is executed. An asterisk in the format string will be replaced with each line from the file. The current (zero-based) line number is also available in the variable _LINE.

XFILTER can be used as a filter reading from standard input, or it can read from one or more files specified on the command line. The resulting text is written to standard output; it can be piped or redirected. If you want to pipe to XFILTER, remember that pipes open a new shell. To pipe to a plugin command, you must either ensure that the plugin is loaded in the transient shell, e.g. by installing the .DLL file in the shell’s PlugIns directory; or else use temporary files or an in-process pipe.

You may specify more than one filename; wildcards and directory aliases are supported. You can search recursively into subdirectories for matching files with /S. @File lists and internet files are supported. You may also specify CLIP: to process text from the clipboard instead of from a file.

To prevent problems caused by troublesome characters in the input text, certain “dangerous” characters from the file will be temporarily replaced with safe alternatives from Unicode’s “Halfwidth and Fullwidth Forms” block. They will be restored to ASCII after variable expansion. This shuffle prevents issues when characters with special meanings to TCC are inadvertently present in the input text, but it might be confusing if you want to find or replace any of the remapped characters. The characters which are temporarily replaced are:

Character  ASCII  Hex  Remapped to
"          34     22   U+FF02
%          37     25   U+FF05
(          40     28   U+FF08
)          41     29   U+FF09
,          44     2C   U+FF0C
[          91     5B   U+FF3B
]          93     5D   U+FF3D
^          94     5E   U+FF3E
`          96     60   U+FF40
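Every entry in this table sits at the same distance from its ASCII original: the fullwidth form is the ASCII code plus 0xFEE0. The Python sketch below models the swap and the later restore. It only illustrates the mapping and is not how XFILTER is implemented; the names protect and restore are invented.

```python
DANGEROUS = '"%(),[]^`'
OFFSET = 0xFF00 - 0x20      # fullwidth forms mirror ASCII at this distance

# Translation tables built from the remapping table above
TO_SAFE = {ord(c): ord(c) + OFFSET for c in DANGEROUS}
TO_ASCII = {safe: plain for plain, safe in TO_SAFE.items()}

def protect(line):
    """Swap the troublesome characters for fullwidth stand-ins."""
    return line.translate(TO_SAFE)

def restore(line):
    """Swap the fullwidth stand-ins back to ASCII."""
    return line.translate(TO_ASCII)

original = 'Add 50% (approx), then "stir" [well]'
protected = protect(original)
```

One consequence of this model is that any of these nine fullwidth characters already present in the input would also come back as ASCII after the restore step.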

rem  Dump a file in uppercase:
xfilter /f:"%%@upper[*]" "Engine Summer.txt"

rem  Display the length of each line:
xfilter /f:"Line %%_line has %%@len[*] characters." "Engine Summer.txt"



New Functions:

@B85TOBIN — Decodes a base-85 string into a binary buffer.

Syntax:
%@B85TOBIN[handle,start,string]

handle  the handle to a binary buffer, as returned by @BALLOC
start   the offset in bytes at which to begin decoding; defaults to 0
string  a base-85 encoded string as returned by @BINTOB85

This function decodes a base-85 string returned by @BINTOB85 and stores the resulting data in a binary buffer. Note that there is no option to control the number of bytes written; the entire string is decoded and written to the buffer. If there is any error in decoding the string, no change will be made to the binary buffer.

Note that the two commas between parameters are both required. You must supply both commas even if you omit the optional start value.

The return value is the number of bytes written to the buffer.



@BETWEEN — Returns the portion of a string between two delimiters.

Syntax:
%@BETWEEN[delims,string]

delims    exactly two characters, one start and one end delimiter
string    the string to parse

You generally do not need to quote or escape the delims string; the first two characters found are assumed to be the start and end delimiter characters, and the third must be a comma. (Exception: If you want to use a close bracket as a delimiter, escape it.) To use the same character as both start and end delimiter, type it twice.

The function returns the portion of string between the start and end delimiters. If the start delimiter is not found in the string, an empty string is returned. If the start delimiter occurs more than once, the first one found is used. If the start delimiter is found but the end delimiter is not, everything after the start delimiter is returned.

echo %@between[<>,This is <only> a test.]
only

echo %@between["",Let's parse out a "quoted chunk" of text.]
quoted chunk
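
The lookup rules above can be modelled in a few lines of Python. This is a sketch of the documented behaviour only; the function name between is hypothetical, and it assumes that the end delimiter is searched for after the first start delimiter.

```python
def between(delims, text):
    """Model of the @BETWEEN rules: delims is exactly two characters."""
    start_ch, end_ch = delims[0], delims[1]
    start = text.find(start_ch)        # the first start delimiter wins
    if start < 0:
        return ""                      # no start delimiter: empty string
    rest = text[start + 1:]
    end = rest.find(end_ch)            # assumption: search after the start
    return rest if end < 0 else rest[:end]
```

With the two examples above, between("<>", ...) gives "only" and between('""', ...) gives "quoted chunk".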



@BINTOB85 — Encodes the contents of a binary buffer as a base-85 string.

Syntax:
%@BINTOB85[handle,start,length]

handle    the handle to a binary buffer, as returned by @BALLOC
start     the offset in bytes at which to begin encoding; defaults to 0
length    the number of bytes to encode; defaults to 128 or the remainder of the buffer

This function encodes binary data (from a binary buffer) as a string which can be easily handled by TCC. You can store this string in an environment variable, write it to an .INI file, and so on. To restore the original binary data, use the @B85TOBIN function.

Four bytes of data are encoded into five characters; encoding a 1024-byte buffer will result in a 1,281-character-long string (counting the terminal null). Keep in mind that encoding long series of bytes will produce even longer strings! If you don’t specify a length, the default is 128 bytes or to the end of the buffer.
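
The size arithmetic is easy to check. The Python sketch below covers only whole four-byte groups, because this text does not say how a trailing partial group is encoded; the function name is hypothetical.

```python
def b85_string_length(nbytes, count_null=False):
    """Characters produced when nbytes of data are encoded, 4 bytes -> 5 chars."""
    if nbytes % 4:
        # Partial groups are not described here, so the sketch refuses them.
        raise ValueError("only whole 4-byte groups are modelled here")
    chars = nbytes // 4 * 5
    return chars + 1 if count_null else chars
```

For the default length of 128 bytes this gives 160 characters, and for 1024 bytes it gives 1280 (1281 with the terminal null).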

This implementation of base-85 differs from others. The set of characters used to encode binary data has been chosen to avoid syntactically troublesome signs like quotes, percent signs, ampersands, carets, and so on. All characters are ASCII, so the string should not be mangled by code page translations.



@CLARIFY — Returns the original text mangled by @OBSCURE.

Syntax:
%@CLARIFY[obscured-text]

obscured-text    obfuscated text

The input obscured-text should be a string returned by the @OBSCURE function; anything else is very unlikely to return meaningful text.

You probably should not write the restored value into an environment variable, an .INI file, or a registry value, or display it to the screen. Just use it immediately, plugging the @CLARIFY function directly into the command which requires the original text. (The ditzy little example below displays a password to the screen because it’s just a ditzy little example.)

set inifile="%userprofile\Passwords.ini"
set password=%@iniread[%inifile,Personal,Password]

echo Password: %@clarify[%password]
unset inifile password


See also: the @OBSCURE function.



@INIVALUE — Returns a value from an .INI file.

Syntax:
%@INIVALUE[filename,section,entry,index,errorstr,flags]

filename    the file to examine
section     the name of the section to search for the entry
entry       the name associated with the desired value
index       which entry to return; defaults to 0 (the first); -1 returns the number of matching entries
errorstr    the string to return on any error; defaults to nothing (the empty string)
flags       a bitmapped integer controlling advanced features:
               1 — bomb out on file errors
               2 — treat section as a wildcard to match
               4 — treat entry as a wildcard to match

This function is essentially @INIREAD without GetPrivateProfileString(). It can handle some things that @INIREAD can’t, such as UTF-8 .INI files, sectionless values, multiple values with the same name, and multiple headers for the same section.

You must specify the full name and extension of the filename. If you do not include a path, the file is assumed to be in the Windows directory, not in the current directory! To force this function to look in the current directory, begin the filename with .\.

If you do not specify a section, the function will look for a matching entry before the first section header. If section is an asterisk, the function will look for a matching entry throughout the file, ignoring all section headers.

Sometimes an .INI file will contain multiple lines with the same entry name. For example, TCMD.INI may have more than one NormalKey directive. You can loop through multiple entries with the index argument. An index of 0 returns the first matching entry, 1 returns the second, and so on. Set index to -1 to return the number of matching entries.

The default behavior is to return an empty string on any error: file not found, access denied, or no matching section or entry. If you specify an errorstr, then that value will be returned instead. (This is useful if the .INI file can contain empty values.) Additionally, you can set flags to 1, and any error opening the file will result in an error message instead of returning a string value. You can also check the _INIVALUERC internal variable to get information about the last call to @INIVALUE.
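
The section, index, and errorstr rules can be summarised with a small Python model. It is a sketch under several assumptions that this text does not confirm: names are matched without regard to case, lines starting with a semicolon are comments, and wildcard flags are left out. The function name ini_value is hypothetical.

```python
def ini_value(lines, section, entry, index=0, errorstr=""):
    """Model of the lookup rules; lines is the .INI file as a list of strings."""
    current = ""                       # entries before any header: no section
    matches = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):   # assumption: ';' comments
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()       # a header may repeat later
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        wanted = section == "*" or current.lower() == section.lower()
        if wanted and name.strip().lower() == entry.lower():
            matches.append(value.strip())
    if index == -1:
        return str(len(matches))       # count of matching entries
    if 0 <= index < len(matches):
        return matches[index]
    return errorstr                    # no such entry
```

Passing a distinctive errorstr is what lets a caller tell an entry that exists with an empty value from an entry that is missing.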


See also: the _INIVALUERC variable, which returns an exit code for this function.



@METAPHONE — Returns a roughly phonetic code for an English word.

Syntax:
%@METAPHONE[word,length,flags]

word      the word or words to process
length    the maximum length of the codes to return; defaults to 8
flags     set to 1 for better compatibility

Metaphone codes are meant to roughly approximate the pronunciation of a word. Words that sound similar should have similar Metaphone codes. You can use this function to compare the sounds of words, to suggest similar words, or to group words by pronunciation.

If you pass more than one word, separate them with spaces. The resulting codes will also be separated by spaces.

rem  Compare two words:

set word1=cougher
set word2=coffer
if %@metaphone[%word1] == %@metaphone[%word2] echo "%word1" may sound like "%word2".


By default, this function returns Metaphone codes of up to eight characters. You can specify a different maximum with the length parameter, e.g. %@metaphone[word,10] to return Metaphone codes of up to ten characters. Legal values are 4 to 20.


•  Note: Values returned by this function are not guaranteed to match those generated by any other implementation. Documentation of the Metaphone algorithm is invariably unclear and self-contradictory, and never seems to agree with the corresponding code. This is my attempt to implement Lawrence Philips’s original algorithm to the best of my limited understanding, with a few additional tweaks thrown in.

More specifically, comparing against assertFull_v1.1.txt, dated 2011-11-25, by the Metaphone-standards project, @METAPHONE produces different codes for 40 out of 2753 words: about 98.5% agreement. If flags is set to 1, there are no mismatches — but I still cannot guarantee perfect agreement with any other implementation.



@OBSCURE — Mangles a text string, making it difficult to read.

Syntax:
%@OBSCURE[text]

text    text to be obfuscated

The input text should be reasonably short, preferably not more than a kilobyte or two. The resulting, mangled string will be longer than the original string, usually by about one-third. The same input text can return different obfuscated text; you cannot meaningfully compare the output from two calls to @OBSCURE. Do not edit or alter the returned text in any way.

If the input text comes from an environment variable, it’s probably a good idea to remove or overwrite that variable as soon as possible after calling @OBSCURE. One way to do this would be to simply store the returned string back in the original variable.

set inifile="%userprofile\Passwords.ini"
input /p Enter password:  %%password
set password=%@obscure[%password]

set rv=%@iniwrite[%inifile,Personal,Password,%password]
unset inifile password


•  Note: This function does not provide secure cryptography! It was designed for ease of use, not for real security. Using @OBSCURE to muddle text will discourage casual snooping, but a sophisticated user can recover the original data easily by passing the obscured text to @CLARIFY. (A determined attacker could also reverse-engineer the algorithm, although that would be a pointless waste of time when the plugin itself is readily available.)


See also: the @CLARIFY function.



@OINK — Translates text to Pig Latin.

Syntax:
%@OINK[text]

echo %@oink[This is only a test.]


See also: the OINK command, which Pig Latinizes text files.



@ROT13 — Transforms a string using ROT13.

Syntax:
%@ROT13[text]

echo %@rot13[This is only a test.]
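
For reference, standard ROT13 rotates each ASCII letter thirteen places and leaves everything else alone, so applying it twice restores the original. The Python sketch below uses the standard library's rot_13 codec; how the plugin treats non-ASCII letters is not stated here.

```python
import codecs

def rot13(text):
    """Standard ROT13: A-Z and a-z are rotated, other characters are kept."""
    return codecs.encode(text, "rot_13")
```

For the example above, this produces "Guvf vf bayl n grfg."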


See also: the ROT13 command, which encodes or decodes text files.



@STRIPACCENTS — Removes accents from letters.

Syntax:
%@STRIPACCENTS[text]

Only characters in the range U+00C0 through U+00FF will be replaced. (This function just does a table lookup; it only recognizes a few accented characters, so it’s fast.)

echo %@stripaccents[Déjà vu]
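
The effect of the range limit can be modelled in Python. Note that this sketch swaps in Unicode decomposition for the plugin's lookup table, so results may differ for characters with no decomposition (such as Æ or ß), which the sketch leaves unchanged. The function name is hypothetical.

```python
import unicodedata

def strip_accents(text):
    """Remove accents only from characters in U+00C0 through U+00FF."""
    out = []
    for ch in text:
        if "\u00c0" <= ch <= "\u00ff":
            decomposed = unicodedata.normalize("NFD", ch)
            # Keep the base letter, drop the combining accent marks.
            ch = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(ch)
    return "".join(out)
```

Here "Déjà vu" becomes "Deja vu", while an accented letter outside the range, such as U+0151, passes through untouched.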



@TEXTENCODING — Returns a guess at the character encoding of a text file.

Syntax:
%@TEXTENCODING[filename,flags]

filename    the file to examine
flags       set to 1 to also report presence of a BOM

If the file begins with a Unicode Byte Order Mark, then it is assumed to be Unicode; the encoding is inferred from the BOM. If the file does not begin with a BOM, the function can only guess at the encoding; the longer the file, the more likely the guess is to be accurate.

Possible return values include:

Empty       There is no data in the file.
OEM         The file is probably not Unicode.
UTF-16LE    The file is probably 16-bit Unicode (little-endian).
UTF-16BE    The file is probably 16-bit Unicode (big-endian).
UTF-8       The file is probably UTF-8 encoded Unicode.

If flags is 1, and if the file is Unicode and begins with a Byte Order Mark, the phrase “with BOM” will be appended.

set filename=myfile.txt
echo File %filename is %@textencoding[%filename].
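
The BOM half of the test is simple enough to sketch in Python. This models only the case where a Byte Order Mark is present; the plugin's guesswork for files without a BOM is not described here, so the sketch returns None for those. The function name is hypothetical.

```python
import codecs

def sniff_bom(data):
    """Infer an encoding name from the first bytes of a file, if a BOM is there."""
    if not data:
        return "Empty"
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "UTF-16BE"
    return None                                # no BOM: would need a guess
```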



@TEXTFORMAT — Returns a guess at the formatting of a text file.

Syntax:
%@TEXTFORMAT[filename]

filename    the file to examine

Text files use line-break characters in different ways. In some files, line break characters are used only to mark where a line end should occur: the end of a paragraph. In other files, line breaks are used to limit text to a desired width. This function attempts to determine how the specified text file is formatted.

Possible return values include:

Empty          There is no text in the file.
Unformatted    Line breaks are used to end paragraphs.
Prewrapped     Line breaks are used to limit line width.

set filename=myfile.txt
set format=%@textformat[%filename]

if %format == Unformatted echo File %filename is not word-wrapped.



@UQUOTES — Replaces ASCII apostrophes and quote marks with Unicode open and close quotes.

Syntax:
%@UQUOTES[text]

text    English text containing apostrophes or quotation marks

Generic ASCII apostrophes ( ' ) and quote marks ( " ) in text will be replaced with Unicode open and close quote marks ( “ and ” ). Also, any doubled hyphens will be replaced with em dashes.

The modified string may or may not look different from the original, depending on how you use it and the font used to display it. If it is redirected to a file and //UnicodeOutput=No, then the fancy Unicode quotes will be smashed right back into ASCII. (Worse yet, under some versions of Windows the Unicode single open-quote character may be mangled to a grave accent….) If the modified string is ECHOed to the console and the console font doesn’t support the relevant Unicode characters, then again the Unicode quotes may be lost. In Take Command, curly quotes must be supported by both the tab-window font (Options / Configure Take Command / Tabs / Font) and also the underlying console window (detach a tab to check this).

echo %@uquotes["Never use a GUI to do a shell's work!" said Tom commandingly.]


You can set some environment variables to control this feature; see UQuotes Control Variables.



@VOWELS — Returns the number of vowels in a string.

Syntax:
%@VOWELS[string]

string    the text to examine

Only vowels in the Latin alphabet are counted: A, E, I, O, U, and Y. Accented variants in the range U+00C0 through U+00FF (Unicode’s “Latin-1 Supplement”) are also recognized.

echo %@vowels[Déjà vu]
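
The counting rule can be modelled in Python as below. This is a sketch, not the plugin's table: it folds accented letters in U+00C0 through U+00FF to their base letter by Unicode decomposition, and characters with no decomposition (such as Æ) are simply not counted, which may differ from the plugin. The function name is hypothetical.

```python
import unicodedata

VOWELS = set("AEIOUY")

def count_vowels(text):
    """Count A, E, I, O, U, and Y, including Latin-1 accented variants."""
    total = 0
    for ch in text:
        if "\u00c0" <= ch <= "\u00ff":
            # The first code point of the decomposition is the base letter.
            ch = unicodedata.normalize("NFD", ch)[0]
        if ch.upper() in VOWELS:
            total += 1
    return total
```

For "Déjà vu" the sketch counts three vowels: é, à, and u.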



New Variables:

_CHARACTERS — Returns the number of characters in the last file processed by WORDS.

Syntax:
%_CHARACTERS

This count does not include any Unicode byte-order mark at the beginning of the file. If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_CHARACTERSALL — Returns the number of characters in all files processed by the last call to WORDS.

Syntax:
%_CHARACTERSALL

This count does not include any Unicode byte-order marks at the beginnings of files. If the WORDS command has not been called, this variable returns the value N/A.



_GETACP — Returns the current Windows code page.

Syntax:
%_GETACP

This variable returns the current Windows code page. (This value is also traditionally miscalled the “ANSI code page”, although it has nothing to do with ANSI.) Note that this value can and usually does differ from the OEM code page returned by %_CODEPAGE.

echo The current Windows code page is %_getacp.



_INIVALUERC — Returns an exit code for the last call to @INIVALUE.

Syntax:
%_INIVALUERC

This variable returns a code indicating the success or failure of the last call to the @INIVALUE function, and the nature of the error if it failed. Possible return values include:


(empty)         an empty string if @INIVALUE has not been called
Syntax error    any error in arguments
File error n    any error opening the file; n is a Windows error number
File empty      the file contains no data
Found n         a matching entry was found at line n
Count n         successfully counted matching entries
No section      no matching section header was found
No entry n      no matching entry, or fewer than n entries found

If the correct entry was found, the return value is Found n. The n is the line number, starting from zero and not counting any blank lines.


See also: the @INIVALUE function.



_LINES — Returns the number of lines in the last file processed by WORDS.

Syntax:
%_LINES

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_LINESALL — Returns the number of lines in all files processed by the last call to WORDS.

Syntax:
%_LINESALL

If the WORDS command has not been called, this variable returns the value N/A.



_LONGESTLINE — Returns the number of characters in the longest line of the last file processed by WORDS.

Syntax:
%_LONGESTLINE

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_LONGESTLINEALL — Returns the number of characters in the longest line in all files processed by the last call to WORDS.

Syntax:
%_LONGESTLINEALL

If the WORDS command has not been called, this variable returns the value N/A.



_NONBLANKLINES — Returns the number of non-blank lines in the last file processed by WORDS.

Syntax:
%_NONBLANKLINES

A line which contains only whitespace characters such as spaces or tabs is considered blank. Subtract %_NONBLANKLINES from %_LINES to get the number of blank lines.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_NONBLANKLINESALL — Returns the number of non-blank lines in all files processed by the last call to WORDS.

Syntax:
%_NONBLANKLINESALL

A line which contains only whitespace characters such as spaces or tabs is considered blank. Subtract %_NONBLANKLINESALL from %_LINESALL to get the number of blank lines.

If the WORDS command has not been called, this variable returns the value N/A.



_PARAGRAPHS — Returns the number of paragraphs in the last file processed by WORDS.

Syntax:
%_PARAGRAPHS

A “paragraph” is a line or series of lines which contains at least one sentence. Divide %_SENTENCES by %_PARAGRAPHS to get the average paragraph length in sentences. Divide %_SENTENCEWORDS by %_PARAGRAPHS to get the average paragraph length in words.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.
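
Because these counters can come back as N/A, any arithmetic on them needs a guard. The Python sketch below shows the divisions described above with that guard in place; the function name and the sample numbers are hypothetical.

```python
def average(total, count):
    """Divide two WORDS counters; None if either is N/A or the count is zero."""
    if total == "N/A" or count == "N/A":
        return None
    if int(count) == 0:
        return None
    return int(total) / int(count)

# Hypothetical counters, as the strings a script would read from the variables.
sentences, paragraphs, sentence_words = "24", "8", "360"
print(average(sentences, paragraphs))       # sentences per paragraph
print(average(sentence_words, paragraphs))  # words per paragraph
```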



_PARAGRAPHSALL — Returns the number of paragraphs in all files processed by the last call to WORDS.

Syntax:
%_PARAGRAPHSALL

A “paragraph” is a line or series of lines which contains at least one sentence. Divide %_SENTENCESALL by %_PARAGRAPHSALL to get the average paragraph length in sentences. Divide %_SENTENCEWORDSALL by %_PARAGRAPHSALL to get the average paragraph length in words.

If the WORDS command has not been called, this variable returns the value N/A.



_PASSWORD — Returns a random string suitable for use as a password.

Syntax:
%_PASSWORD

You can use the PASSWORD command to adjust the parameters used to generate the string.



_PROPERNOUNS — Returns the number of proper nouns in the last file processed by WORDS.

Syntax:
%_PROPERNOUNS

Counting proper nouns requires WORDS to build a vocabulary list for each file. If you disable this step with /U:0 or /U:2, the list will not be available and this variable will return the value N/A.

For the purposes of this plugin, a “proper noun” is any word which never appears in an all-lowercase form. If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_PROPERNOUNSALL — Returns the number of proper nouns in all files processed by the last call to WORDS.

Syntax:
%_PROPERNOUNSALL

Counting proper nouns in all files requires WORDS to build a vocabulary list for all files processed; this list is not built by default. Unless you enable the omnibus vocabulary list with /U:2 or /U:3, this variable will return the value N/A.

For the purposes of this plugin, a “proper noun” is any word which never appears in an all-lowercase form. If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCES — Returns the total number of sentences in the last file processed by WORDS.

Syntax:
%_SENTENCES

A “sentence” is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDS by %_SENTENCES to get the average sentence length.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESALL — Returns the total number of sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESALL

A “sentence” is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDSALL by %_SENTENCESALL to get the average sentence length.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESD — Returns the number of declarative sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESD

A “declarative sentence” is a word or series of words ending with a period.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESDALL — Returns the number of declarative sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESDALL

A “declarative sentence” is a word or series of words ending with a period.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESE — Returns the number of exclamatory sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESE

An “exclamatory sentence” is a word or series of words ending with an exclamation mark.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESEALL — Returns the number of exclamatory sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESEALL

An “exclamatory sentence” is a word or series of words ending with an exclamation mark.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCESQ — Returns the number of interrogative sentences in the last file processed by WORDS.

Syntax:
%_SENTENCESQ

An “interrogative sentence” is a word or series of words ending with a question mark.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCESQALL — Returns the number of interrogative sentences in all files processed by the last call to WORDS.

Syntax:
%_SENTENCESQALL

An “interrogative sentence” is a word or series of words ending with a question mark.

If the WORDS command has not been called, this variable returns the value N/A.



_SENTENCEWORDS — Returns the total number of words in the last file processed by WORDS which are part of a recognized sentence.

Syntax:
%_SENTENCEWORDS

A “sentence” is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDS by %_SENTENCES to get the average sentence length.

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_SENTENCEWORDSALL — Returns the total number of words in all files processed by the last call to WORDS which are part of a recognized sentence.

Syntax:
%_SENTENCEWORDSALL

A “sentence” is a word or series of words ending with a period, exclamation mark, or question mark. Divide %_SENTENCEWORDSALL by %_SENTENCESALL to get the average sentence length.

If the WORDS command has not been called, this variable returns the value N/A.



_TITLES — Returns the number of titles in the last file processed by WORDS.

Syntax:
%_TITLES

A “title” is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_TITLESALL — Returns the number of titles in all files processed by the last call to WORDS.

Syntax:
%_TITLESALL

A “title” is a line or series of lines which contains one or more words, but no recognized sentences. It might actually be a title, subtitle, or chapter heading; or it might be a byline, date line, attribution, salutation, signature, line of poetry….

If the WORDS command has not been called, this variable returns the value N/A.



_UNIQUEWORDS — Returns the number of unique words in the last file processed by WORDS.

Syntax:
%_UNIQUEWORDS

Counting unique words requires WORDS to build a vocabulary list for each file. If you disable this step with /U:0 or /U:2, the list will not be available and this variable will return the value N/A.

Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this plugin will find only three “unique” words.
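
The case-folding rule is easy to check in Python. This sketch splits on whitespace only, which is an assumption; the plugin's own idea of what counts as a word is more involved. The function name is hypothetical.

```python
def unique_words(text):
    """Count distinct words, treating words that differ only in case as one."""
    return len({word.lower() for word in text.split()})
```

For the phrase above, the distinct words are polish, furniture, and using, so the result is 3.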

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_UNIQUEWORDSALL — Returns the number of unique words in all files processed by the last call to WORDS.

Syntax:
%_UNIQUEWORDSALL

Counting unique words for all files requires WORDS to build a vocabulary list for all files processed; this list is not built by default. Unless you enable the omnibus vocabulary list with /U:2 or /U:3, this variable will return the value N/A.

Words that differ only in case are counted as the same word. In the phrase polish Polish furniture using Polish furniture polish, this plugin will find only three “unique” words.

If the WORDS command has not been called, this variable returns the value N/A.



_WC — Returns the number of contiguous series of non-blank characters in the last file processed by WORDS.

Syntax:
%_WC

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.

•  Note: Unlike the other variables set by WORDS, _WC does include any Byte Order Mark at the start of a file. A BOM will be treated as a non-blank character, and therefore count as a “word” unto itself if the following character is whitespace. This, to my mind, is stupid behavior; a leading BOM should either be ignored altogether, or else treated as whitespace. I count it this way only for compatibility with certain ports of the Unix wc.
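
The BOM quirk can be reproduced in Python, where splitting on whitespace also treats U+FEFF as an ordinary non-blank character. This is a sketch of the counting rule described in the note, not the plugin's code, and the function name is hypothetical.

```python
def wc_words(text):
    """Count contiguous runs of non-blank characters, wc style."""
    return len(text.split())

BOM = "\ufeff"
print(wc_words(BOM + " hello world"))   # BOM followed by whitespace: 3
print(wc_words(BOM + "hello world"))    # BOM glued to the first word: 2
```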



_WCALL — Returns the number of contiguous series of non-blank characters in all files processed by the last call to WORDS.

Syntax:
%_WCALL

If the WORDS command has not been called, this variable returns the value N/A.



_WORDFILES — Returns the number of files processed by the last call to WORDS.

Syntax:
%_WORDFILES



_WORDS — Returns the total number of words in the last file processed by WORDS.

Syntax:
%_WORDS

If the WORDS command has not been called, or if there was any error reading the last file, this variable returns the value N/A.



_WORDSALL — Returns the total number of words in all files processed by the last call to WORDS.

Syntax:
%_WORDSALL

If the WORDS command has not been called, this variable returns the value N/A.



Code Pages Supported:

Many of the commands in this plugin offer a /I:n option to specify a code page. The value determines how non-ASCII characters in non-Unicode files are interpreted. This option does not affect Unicode files or ASCII characters. The following code pages are supported:


number  name                  number  name
1252    Latin I               775     Baltic (OEM)
1250    Central Europe        850     Multilingual Latin I (OEM)
1251    Cyrillic              852     Latin II
1253    Greek                 855     Cyrillic (OEM)
1254    Turkish               857     Turkish (OEM)
1255    Hebrew                858     Latin I with Euro sign (OEM)
1256    Arabic                862     Hebrew (OEM)
1257    Baltic                866     Russian (OEM)
1258    Vietnam               874     Thai
437     United States (OEM)   10000   Mac OS Roman
720     Arabic (OEM)          20866   KOI8-R
737     Greek (OEM)           21866   KOI8-U

The default is the current Windows code page. You can also type /I:OEM to use the current OEM code page; /I:OEM is a synonym for /I:%_CODEPAGE (but easier to type).
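
To see why the code page matters, consider how one non-ASCII byte reads under two of the pages listed above. The Python sketch below uses Python's own cpNNN codecs, which exist for pages such as 1252 and 437 but not necessarily for every number in the table; it illustrates the concept and says nothing about how the plugin does its conversion. The function name is hypothetical.

```python
def decode_with_code_page(data, code_page):
    """Interpret non-Unicode bytes using a numbered code page."""
    return data.decode("cp%d" % code_page)

raw = b"caf\xe9"
print(decode_with_code_page(raw, 1252))  # byte E9 is e-acute in Latin I
print(decode_with_code_page(raw, 437))   # byte E9 is capital theta in OEM US
```

The ASCII bytes "caf" come out the same either way, which matches the statement that the option does not affect ASCII characters.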

UQuotes Control Variables:

The following environment variables specify a Unicode character used to replace an ASCII character in the @UQUOTES function, or in several commands when /Q is used. The value of the variable may be a single character; a decimal value 32 through 65533; or a hexadecimal value 0x20 through 0xFFFD.

OPENQUOTE:      replaces the ASCII double-quote ( " ) at the start of a quotation; the default value is 0x201C ( “ ).
CLOSEQUOTE:     replaces the ASCII double-quote ( " ) at the end of a quotation; the default is 0x201D ( ” ).
OPENSQUOTE:     replaces the ASCII apostrophe ( ' ) at the start of a quotation; the default is 0x2018 ( ‘ ).
CLOSESQUOTE:    replaces the ASCII apostrophe ( ' ) at the end of a quotation; the default is 0x2019 ( ’ ).
APOSTROPHE:     replaces the ASCII apostrophe ( ' ) within a word; the default is 0x2019 ( ’ ).
'OKINA:         replaces the ASCII apostrophe ( ' ) between two vowels; the default is 0x2018 ( ‘ ).
PRIME:          replaces the ASCII apostrophe ( ' ) after a number; the default is 0x27 ( ' ).
DOUBLEPRIME:    replaces the ASCII double-quote ( " ) after a number; the default is 0x22 ( " ).
EMDASH:         replaces pairs of ASCII hyphens ( - ); the default is 0x2014.

Note that the variable name 'OKINA begins, ironically enough, with an apostrophe. To disable ‘okinas, SET 'OKINA=0x2019 (or the same value as the apostrophe).

These environment variables control the interpretation of some old-fashioned ASCII text conventions:

UQUOTES_DOUBLES:    set to 0 to prevent replacing doubled apostrophes with quotes
UQUOTES_GRAVES:     set to 0 to prevent replacing grave accents with open quotes

For example:

rem  Use guillemets for quotations:
set openquote=0xab
set closequote=0xbb
echo %@uquotes["Sacré bleu!" he exclaimed.]
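
The three accepted value forms (a single character, a decimal number, or a hexadecimal number) can be modelled in Python. This is a sketch with assumptions: a one-character value is always taken literally, and anything invalid or out of range yields None, whereas the plugin's actual handling of bad values is not described here. The function name is hypothetical.

```python
def parse_quote_value(value):
    """Turn a control-variable value into the replacement character."""
    if len(value) == 1:
        return value                       # assumption: taken literally
    try:
        if value.lower().startswith("0x"):
            number = int(value[2:], 16)    # hexadecimal, 0x20 to 0xFFFD
        else:
            number = int(value, 10)        # decimal, 32 to 65533
    except ValueError:
        return None
    if 32 <= number <= 65533:
        return chr(number)
    return None
```

With the guillemet example above, "0xab" and "0xbb" map to the left-pointing and right-pointing guillemets, U+00AB and U+00BB.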

Startup Message:

This plugin displays an informational line when it initializes. The message will be suppressed in transient or pipe shells. You can disable it for all shells by defining an environment variable named NOLOADMSG, for example:

set /e /u noloadmsg=1

Acknowledgments:

The original Metaphone algorithm is by Lawrence Philips. The variant implemented in this plugin is my own adaptation (improvement? perversion?). Blame me, not him, for its peculiarities.

Status and Licensing:

Consider this beta software. It may well have issues. Try it at your own risk. If you do find a problem, you can report it in the JP Software support forum.

TextUtils is currently licensed only for testing purposes. I may make binaries and source code available under some free license once I consider it ready for use.

Download:

You can download the current version of the plugin from http://prospero.unm.edu/dl/textutils.zip or ftp://prospero.unm.edu/textutils.zip.