Abandon Hope, all ye who try these programs! These are lifted straight from my machines with no testing. They are utilities I have written for myself, that I do use, but that either are not ready for release or not useful to enough people for me to make the effort to polish them. However, people often do ask me to mail them a copy, so I'm putting them here for download by such brave and adventurous souls.

Consider them all alpha versions. Please don't expect documentation, or help, or, indeed, that they should work at all! If they were at that stage, they wouldn't be on this page.

If the program needs an external file, it should work if that file is in the working directory; it may or may not do so otherwise.

All of these programs are descended from gutcheck, and use more or less the same conventions: thus "jeebies myfile.txt" or "unitame myfile.txt" is the basic usage pattern for all of them. If invoked without arguments, they will display basic usage help. Beyond that, check the source!

The executables provided are all for Win32 rather than MS-DOS, and you'll need a minimum of Windows 95 to run them, so anybody still on the original MS-DOS is out of luck, I'm afraid. Of course, you can always recompile them, as *nix users do.


From time to time, I fiddle with gutcheck by adding extra checks. I try these out by using the program on incoming texts, and seeing whether, on balance, the check is worthwhile. If it is, I include it in the next release; if it isn't, I drop it. This is, more or less, the copy of gutcheck that I am using now. There are two significant changes from the current release: a "-u" switch to invoke checking against a user-defined likely-typo file named gutcheck.typ, and a "-d" switch to ignore DP-specific markup.

gutcheck.typ is a simple text file, with one word per line and one blank line at the end, up to a maximum of 999 lines. With the -u switch, gutcheck will flag any line containing one of these words.
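As a rough illustration of the idea (not gutcheck's actual code), here is a minimal Python sketch of checking text against a one-word-per-line typo list. The function names are made up for the example, and the case-insensitive, whole-word matching is an assumption; gutcheck itself may match differently.

```python
import re

MAX_WORDS = 999  # the line limit mentioned for gutcheck.typ


def parse_typo_list(text, max_words=MAX_WORDS):
    """Parse one-word-per-line text into a set, skipping blank lines."""
    words = [line.strip().lower() for line in text.splitlines() if line.strip()]
    return set(words[:max_words])


def flag_lines(lines, typo_words):
    """Return (line_number, word) for every listed word found in the text.

    Assumption: matching is on whole words and ignores case.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() in typo_words:
                hits.append((number, word))
    return hits
```

Called as flag_lines(text.splitlines(), parse_typo_list(typo_file_text)), this returns the line numbers that would need a second look.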



I made a very enthusiastic start on this, but I need a big dictionary with possible parts of speech listed for every word to do the next thing with it, and I never got around to doing that.

For now, it simply lists every word that occurs only once and isn't in its dictionary. Even so, as a superfast check, it does catch some typos. It has a bad habit of obsessing over one word sometimes, and reporting lots of instances. I must fix that one day. Its dictionary is the file gutspell.dic.
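The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary is passed in as a set of lowercase words (the real format of gutspell.dic is not described here), and a "word" is taken to be a run of letters and apostrophes.

```python
import re
from collections import Counter


def rare_unknown_words(text, dictionary):
    """List words that occur exactly once and are not in the dictionary.

    `dictionary` is assumed to be a set of lowercase words.
    """
    counts = Counter(w.lower() for w in re.findall(r"[A-Za-z']+", text))
    return sorted(w for w, n in counts.items()
                  if n == 1 and w not in dictionary)
```

The reasoning is that an unknown word used repeatedly is probably a name or a deliberate spelling, while one that turns up only once is more likely to be a typo.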



The common OCR error of mistaking a "b" for a "h" and vice versa used to lead to horrible things with the words "he" and "be". With the vast improvement in OCR programs in the last few years, this is not the nightmare it used to be.

jeebies detects common he/be errors by a simple lookup table. I really need to add some extra intelligence; I have a set of heuristics that I used previously, and I will probably get the time to plug them in at some point. For now, it's quick and does have some value, especially in checking older texts. It needs its lookup table, which is in the files he.jee and be.jee.
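To give a feel for what a lookup-table check of this kind might look like, here is a small Python sketch. Everything in it is an assumption made for illustration: the table entries are invented, the table is keyed on two-word phrases, and none of it reflects the real contents or format of he.jee and be.jee.

```python
import re

# Hypothetical table: suspicious two-word phrase -> likely intended phrase.
# These entries are invented for the example, not taken from he.jee or be.jee.
SUSPECT_PHRASES = {
    "to he": "to be",
    "be said": "he said",
}


def find_he_be_suspects(lines, table=SUSPECT_PHRASES):
    """Return (line_number, phrase_found, suggestion) for each table hit."""
    hits = []
    for number, line in enumerate(lines, start=1):
        words = re.findall(r"[a-z']+", line.lower())
        # Look at each adjacent pair of words within the line.
        for first, second in zip(words, words[1:]):
            phrase = first + " " + second
            if phrase in table:
                hits.append((number, phrase, table[phrase]))
    return hits
```

Note that this sketch does not look across line breaks, so a phrase split over two lines would be missed.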



Unitame hails from a time when about half of all UTF-8 files uploaded to PG were invalid -- that is, contained invalid UTF-8 characters. Often, texts would be a mixture of UTF-8 and ISO-8859-1. Recode is great for checking validity, but it doesn't give much of a diagnosis that helps in fixing the problems. I wrote unitame as a UTF-8 validity checker, and extended it to a very rough converter to ISO-8859-1.

Nowadays, thanks largely to Thundergnat's excellent Guiguts, which contains a good UTF-8 editor, we get very few invalid UTF-8 files, so this is a minority interest. It needs its data file unitame.dat, and will just report on the text if invoked as unitame myfile.txt, or will convert if used with a -c switch.
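The kind of diagnosis described above, saying where the bad bytes are rather than just that the file is invalid, can be sketched in Python as follows. This is not unitame's code and does not use unitame.dat; the function name and the report format are made up for the example.

```python
def find_invalid_utf8(data):
    """Return (line_number, offset_in_line, byte_value) for each bad byte.

    `data` is the raw file contents as bytes. Offsets count bytes from
    the start of the line, beginning at 0.
    """
    problems = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line is valid
            except UnicodeDecodeError as err:
                bad = pos + err.start  # err.start is relative to the slice
                problems.append((number, bad, line[bad]))
                pos = bad + 1  # carry on after the offending byte
    return problems
```

A reported byte in the range 0xA0 to 0xFF sitting alone in otherwise clean text is often a stray ISO-8859-1 character, which is the mixed-encoding case mentioned above.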



Linkchk is a simple command-line link checker for HTML. For a thorough check, use the W3C service, but for a quick local check, this is quite useful.

Linkchk is Win32 only, but it wouldn't be too hard to convert to *nix conventions, if there's a need.
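For anyone who wants a quick local check on another platform, the general idea can be sketched in Python with the standard library. This is an illustration only, not a port of linkchk: it assumes that a local check means confirming that relative href and src targets exist on disk, and it skips remote URLs and in-page #fragment links entirely.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def broken_local_links(html_text, base_dir):
    """Return links whose local target file does not exist under base_dir."""
    collector = LinkCollector()
    collector.feed(html_text)
    missing = []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme or parts.netloc or not parts.path:
            continue  # remote URL, mailto:, or a pure #fragment link
        target = os.path.join(base_dir, unquote(parts.path))
        if not os.path.exists(target):
            missing.append(link)
    return missing
```

Pass it the HTML text and the directory the file lives in; an empty list means every local target was found.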

