gutcheck/gutcheck.c
author ali <ali@juiblex.co.uk>
Tue Jan 24 23:54:05 2012 +0000 (2012-01-24)
changeset 0 c2f4c0285180
permissions -rw-r--r--
Initial version
     1 /*************************************************************************/
     2 /* gutcheck - check for assorted weirdnesses in a PG candidate text file */
     3 /*                                                                       */
     4 /* Version 0.991                                                         */
     5 /* Copyright 2000-2005 Jim Tinsley <jtinsley@pobox.com>                  */
     6 /*                                                                       */
     7 /* This program is free software; you can redistribute it and/or modify  */
     8 /* it under the terms of the GNU General Public License as published by  */
     9 /* the Free Software Foundation; either version 2 of the License, or     */
    10 /* (at your option) any later version.                                   */
    11 /*                                                                       */
    12 /* This program is distributed in the hope that it will be useful,       */
    13 /* but WITHOUT ANY WARRANTY; without even the implied warranty of        */
    14 /* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         */
    15 /* GNU General Public License for more details.                          */
    16 /*                                                                       */
    17 /* You should have received a copy of the GNU General Public License     */
    18 /* along with this program; if not, write to the                         */
    19 /*      Free Software Foundation, Inc.,                                  */
    20 /*      59 Temple Place,                                                 */
    21 /*      Suite 330,                                                       */
    22 /*      Boston, MA  02111-1307  USA                                      */
    23 /*                                                                       */
    24 /*                                                                       */
    25 /*                                                                       */
    26 /* Overview comments:                                                    */
    27 /*                                                                       */
    28 /* If you're reading this, you're either interested in how to detect     */
    29 /* formatting errors, or very very bored.                                */
    30 /*                                                                       */
    31 /* Gutcheck is a homebrew formatting checker specifically for            */
    32 /* spotting common formatting problems in a PG e-text. I typically       */
    33 /* run it once or twice on a file I'm about to submit; it usually        */
    34 /* finds a few formatting problems. It also usually finds lots of        */
    35 /* queries that aren't problems at all; it _really_ doesn't like         */
    36 /* the standard PG header, for example.  It's optimized for straight     */
    37 /* prose; poetry and non-fiction involving tables tend to trigger        */
    38 /* false alarms.                                                         */
    39 /*                                                                       */
    40 /* The code of gutcheck is not very interesting, but the experience      */
    41 /* of what constitutes a possible error may be, and the best way to      */
    42 /* illustrate that is by example.                                        */
    43 /*                                                                       */
    44 /*                                                                       */
    45 /* Here are some common typos found in PG texts that gutcheck            */
    46 /* will flag as errors:                                                  */
    47 /*                                                                       */
    48 /* "Look!John , over there!"                                             */
    49 /* <this is a HTML tag>                                                  */
    50 /* &so is this;                                                          */
    51 /* Margaret said: " Now you should start for school."                    */
    52 /* Margaret said: "Now you should start for school. (if end of para)     */
    53 /* The horse is said to he worth a lot.                                  */
    54 /* 0K - this'11 make you look close1y.                                   */
    55 /* "If you do. you'll regret it!"                                        */
    56 /*                                                                       */
    57 /* There are some complications . The extra space left around that       */
    58 /* period was an error . . . but that ellipsis wasn't.                   */
    59 /*                                                                       */
    60 /* The last line of a paragraph                                          */
    61 /* is usually short.                                                     */
    62 /*                                                                       */
    63 /* This period is an error.But the periods in a.m. aren't.               */
    64 /*                                                                       */
    65 /* Checks that are do-able but not (well) implemented are:               */
    66 /*        Single-quote chcking.                                          */
    67 /*          Despite 3 attempts at it, singlequote checking is still      */
    68 /*          crap in gutcheck. It may not be possible without analysis    */
    69 /*          of the whole paragraph.                                      */
    70 /*                                                                       */
    71 /*************************************************************************/
    72 
    73 
    74 #include <stdio.h>
    75 #include <stdlib.h>
    76 #include <string.h>
    77 #include <ctype.h>
    78 
    79 #define MAXWORDLEN    80    /* max length of one word             */
    80 #define LINEBUFSIZE 2048    /* buffer size for an input line      */
    81 
    82 #define MAX_USER_TYPOS 1000
    83 #define USERTYPO_FILE "gutcheck.typ"
    84 
    85 #ifndef MAX_PATH
    86 #define MAX_PATH 16384
    87 #endif
    88 
    89 char aline[LINEBUFSIZE];
    90 char prevline[LINEBUFSIZE];
    91 
    92                  /* Common typos. */
    93 char *typo[] = { "teh", "th", "og", "fi", "ro", "adn", "yuo", "ot", "fo", "thet", "ane", "nad",
    94                 "te", "ig", "acn",  "ahve", "alot", "anbd", "andt", "awya", "aywa", "bakc", "om",
    95                 "btu", "byt", "cna", "cxan", "coudl", "dont", "didnt", "couldnt", "wouldnt", "doesnt", "shouldnt", "doign", "ehr",
    96                 "hmi", "hse", "esle", "eyt", "fitrs", "firts", "foudn", "frmo", "fromt", "fwe", "gaurd", "gerat", "goign",
    97                 "gruop", "haev", "hda", "hearign", "seeign", "sayign", "herat", "hge", "hsa", "hsi", "hte", "htere",
    98                 "htese", "htey", "htis", "hvae", "hwich", "idae", "ihs", "iits", "int", "iwll", "iwth", "jsut", "loev",
    99                 "sefl", "myu", "nkow", "nver", "nwe", "nwo", "ocur", "ohter", "omre", "onyl", "otehr", "otu", "owrk",
   100                 "owuld", "peice", "peices", "peolpe", "peopel", "perhasp", "perhpas", "pleasent", "poeple", "porblem",
   101                 "porblems", "rwite", "saidt", "saidh", "saids", "seh", "smae", "smoe", "sohw", "stnad", "stopry",
   102                 "stoyr", "stpo", "tahn", "taht", "tath", "tehy", "tghe", "tghis", "theri", "theyll", "thgat", "thge",
   103                 "thier", "thna", "thne", "thnig", "thnigs", "thsi", "thsoe", "thta", "timne", "tirne", "tkae",
   104                 "tthe", "tyhat", "tyhe", "veyr", "vou", "vour", "vrey", "waht", "wasnt", "awtn", "watn", "wehn", "whic", "whcih",
   105                 "whihc", "whta", "wihch", "wief", "wiht", "witha", "wiull", "wnat", "wnated", "wnats",
   106                 "woh", "wohle", "wokr", "woudl", "wriet", "wrod", "wroet", "wroking", "wtih", "wuould", "wya", "yera",
   107                 "yeras", "yersa", "yoiu", "youve", "ytou", "yuor",
   108                 /* added h/b words for version 12 - removed a few with "tbe" v.25 */
   109                 "abead", "ahle", "ahout", "ahove", "altbough", "balf", "bardly", "bas", "bave", "baving", "bebind", 
   110                 "beld", "belp", "belped", "ber", "bere", "bim", "bis", "bome", "bouse", "bowever", "buge", "dehates", 
   111                 "deht", "han", "hecause", "hecome", "heen", "hefore", "hegan", "hegin", "heing", 
   112                 "helieve", "henefit", "hetter", "hetween", "heyond", "hig", "higber", "huild", "huy", "hy", "jobn", "joh", 
   113                 "meanwbile", "memher", "memhers", "numher", "numhers", 
   114                 "perbaps", "prohlem", "puhlic", "witbout", 
   115                 /* and a few more for .18 */
   116                 "arn", "hin", "hirn", "wrok", "wroked", "amd", "aud", "prornise", "prornised", "modem", "bo",
   117                 "heside", "chapteb", "chaptee", "se",
   118                  ""};
   119 
   120 char *usertypo[MAX_USER_TYPOS];
   121 
   122                  /* Common abbreviations and other OK words not to query as typos. */
   123                  /* 0.99 last-minute - removed "ms"      */
   124 char *okword[] = {"mr", "mrs", "mss", "mssrs", "ft", "pm", "st", "dr", "hmm", "h'm", "hmmm", "rd", "sh", "br",
   125                   "pp", "hm", "cf", "jr", "sr", "vs", "lb", "lbs", "ltd", "pompeii","hawaii","hawaiian",
   126                   "hotbed", "heartbeat", "heartbeats", "outbid", "outbids", "frostbite", "frostbitten",
   127                   ""};
   128 
   129                  /* Common abbreviations that cause otherwise unexplained periods. */
   130 char *abbrev[] = {"cent", "cents", "viz", "vol", "vols", "vid", "ed", "al", "etc", "op", "cit",
   131                   "deg", "min", "chap", "oz", "mme", "mlle", "mssrs",
   132                   ""};
   133                  /* Two-Letter combinations that rarely if ever start words, */
   134                  /* but are common scannos or otherwise common letter        */
   135                  /* combinations.                                            */
   136 char *nostart[] = { "hr", "hl", "cb", "sb", "tb", "wb", "tl",
   137                     "tn", "rn", "lt", "tj",
   138                     "" };
   139 
   140                  /* Two-Letter combinations that rarely if ever end words    */
   141                  /* but are common scannos or otherwise common letter        */
   142                  /* combinations                                             */
   143 char *noend[]   = { "cb", "gb", "pb", "sb", "tb", 
   144                     "wh","fr","br","qu","tw","gl","fl","sw","gr","sl","cl",
   145                     "iy",
   146                     ""};
   147 
   148 char *markup[]  = { "a", "b", "big", "blockquote", "body", "br", "center", 
   149                     "col", "div", "em", "font", "h1", "h2", "h3", "h4", 
   150                     "h5", "h6", "head", "hr", "html", "i", "img", "li", 
   151                     "meta", "ol", "p", "pre", "small", "span", "strong", 
   152                     "sub", "sup", "table", "td", "tfoot", "thead", "title", 
   153                     "tr", "tt", "u", "ul", 
   154                     ""};
   155 
   156 char *DPmarkup[] = { "<sc>", "</sc>", "/*", "*/", "/#", "#/", "/$", "$/", "<tb>",
   157                     ""}; /* <tb> added .991 */
   158 
   159 char *nocomma[]  = { "the", "it's", "their", "an", "mrs", "a", "our", "that's",
   160                      "its", "whose", "every", "i'll", "your", "my", 
   161                      "mr", "mrs", "mss", "mssrs", "ft", "pm", "st", "dr", "rd", 
   162                      "pp", "cf", "jr", "sr", "vs", "lb", "lbs", "ltd", 
   163                      "i'm", "during", "let", "toward", "among",
   164                      ""};
   165 
   166 
   167 char *noperiod[] = { "every", "i'm", "during", "that's", "their", "your", "our", "my", "or", 
   168                      "and", "but", "as", "if", "the", "its", "it's", "until", "than", "whether", 
   169                      "i'll", "whose", "who", "because", "when", "let", "till", "very",
   170                      "an", "among", "those", "into", "whom", "having", "thence",
   171                      ""}; 
   172 
   173 
   174 char vowels[] = "aeiouàáâãäæèéêëìíîïòóôõöùúûü";  /* Carlo's old suggestion, updated .991 */
   175 
   176 struct {
   177     char *htmlent;
   178     char *htmlnum;
   179     char *textent;
   180     } entities[] = { "&amp;",           "&#38;",        "&", 
   181                      "&lt;",            "&#60;",        "<",
   182                      "&gt;",            "&#62;",        ">",
   183                      "&deg;",           "&#176;",       " degrees",
   184                      "&pound;",         "&#163;",       "L",
   185                      "&quot;",          "&#34;",        "\"",   /* -- quotation mark = APL quote, */
   186                      "&OElig;",         "&#338;",       "OE",  /* -- latin capital ligature OE, */
   187                      "&oelig;",         "&#339;",       "oe",  /* -- latin small ligature oe, U+0153 ISOlat2 --> */
   188                      "&Scaron;",        "&#352;",       "S",  /* -- latin capital letter S with caron, */
   189                      "&scaron;",        "&#353;",       "s",  /* -- latin small letter s with caron, */
   190                      "&Yuml;",          "&#376;",       "Y",  /* -- latin capital letter Y with diaeresis, */
   191                      "&circ;",          "&#710;",       "",  /* -- modifier letter circumflex accent, */
   192                      "&tilde;",         "&#732;",       "~",  /* -- small tilde, U+02DC ISOdia --> */
   193                      "&ensp;",          "&#8194;",      " ", /* -- en space, U+2002 ISOpub --> */
   194                      "&emsp;",          "&#8195;",      " ", /* -- em space, U+2003 ISOpub --> */
   195                      "&thinsp;",        "&#8201;",      " ", /* -- thin space, U+2009 ISOpub --> */
   196                      "&ndash;",         "&#8211;",      "-", /* -- en dash, U+2013 ISOpub --> */
   197                      "&mdash;",         "&#8212;",      "--", /* -- em dash, U+2014 ISOpub --> */
   198                      "&lsquo;",         "&#8216;",      "'", /* -- left single quotation mark, */
   199                      "&rsquo;",         "&#8217;",      "'", /* -- right single quotation mark, */
   200                      "&sbquo;",         "&#8218;",      "'", /* -- single low-9 quotation mark, U+201A NEW --> */
   201                      "&ldquo;",         "&#8220;",      "\"", /* -- left double quotation mark, */
   202                      "&rdquo;",         "&#8221;",      "\"", /* -- right double quotation mark, */
   203                      "&bdquo;",         "&#8222;",      "\"", /* -- double low-9 quotation mark, U+201E NEW --> */
   204                      "&lsaquo;",        "&#8249;",      "\"", /* -- single left-pointing angle quotation mark, */
   205                      "&rsaquo;",        "&#8250;",      "\"", /* -- single right-pointing angle quotation mark, */
   206                      "&nbsp;",          "&#160;",       " ", /* -- no-break space = non-breaking space, */
   207                      "&iexcl;",         "&#161;",       "!", /* -- inverted exclamation mark, U+00A1 ISOnum --> */
   208                      "&cent;",          "&#162;",       "c", /* -- cent sign, U+00A2 ISOnum --> */
   209                      "&pound;",         "&#163;",       "L", /* -- pound sign, U+00A3 ISOnum --> */
   210                      "&curren;",        "&#164;",       "$", /* -- currency sign, U+00A4 ISOnum --> */
   211                      "&yen;",           "&#165;",       "Y", /* -- yen sign = yuan sign, U+00A5 ISOnum --> */
   212                      "&sect;",          "&#167;",       "--", /* -- section sign, U+00A7 ISOnum --> */
   213                      "&uml;",           "&#168;",       " ", /* -- diaeresis = spacing diaeresis, */
   214                      "&copy;",          "&#169;",       "(C) ", /* -- copyright sign, U+00A9 ISOnum --> */
   215                      "&ordf;",          "&#170;",       " ", /* -- feminine ordinal indicator, U+00AA ISOnum --> */
   216                      "&laquo;",         "&#171;",       "\"", /* -- left-pointing double angle quotation mark */
   217                      "&shy;",           "&#173;",       "-", /* -- soft hyphen = discretionary hyphen, */
   218                      "&reg;",           "&#174;",       "(R) ", /* -- registered sign = registered trade mark sign, */
   219                      "&macr;",          "&#175;",       " ", /* -- macron = spacing macron = overline */
   220                      "&deg;",           "&#176;",       " degrees", /* -- degree sign, U+00B0 ISOnum --> */
   221                      "&plusmn;",        "&#177;",       "+-", /* -- plus-minus sign = plus-or-minus sign, */
   222                      "&sup2;",          "&#178;",       "2", /* -- superscript two = superscript digit two */
   223                      "&sup3;",          "&#179;",       "3", /* -- superscript three = superscript digit three */
   224                      "&acute;",         "&#180;",       " ", /* -- acute accent = spacing acute, */
   225                      "&micro;",         "&#181;",       "m", /* -- micro sign, U+00B5 ISOnum --> */
   226                      "&para;",          "&#182;",       "--", /* -- pilcrow sign = paragraph sign, */
   227                      "&cedil;",         "&#184;",       " ", /* -- cedilla = spacing cedilla, U+00B8 ISOdia --> */
   228                      "&sup1;",          "&#185;",       "1", /* -- superscript one = superscript digit one, */
   229                      "&ordm;",          "&#186;",       " ", /* -- masculine ordinal indicator, */
   230                      "&raquo;",         "&#187;",       "\"", /* -- right-pointing double angle quotation mark */
   231                      "&frac14;",        "&#188;",       "1/4", /* -- vulgar fraction one quarter */
   232                      "&frac12;",        "&#189;",       "1/2", /* -- vulgar fraction one half */
   233                      "&frac34;",        "&#190;",       "3/4", /* -- vulgar fraction three quarters */
   234                      "&iquest;",        "&#191;",       "?", /* -- inverted question mark */
   235                      "&Agrave;",        "&#192;",       "A", /* -- latin capital letter A with grave */
   236                      "&Aacute;",        "&#193;",       "A", /* -- latin capital letter A with acute, */
   237                      "&Acirc;",         "&#194;",       "A", /* -- latin capital letter A with circumflex, */
   238                      "&Atilde;",        "&#195;",       "A", /* -- latin capital letter A with tilde, */
   239                      "&Auml;",          "&#196;",       "A", /* -- latin capital letter A with diaeresis, */
   240                      "&Aring;",         "&#197;",       "A", /* -- latin capital letter A with ring above */
   241                      "&AElig;",         "&#198;",       "AE", /* -- latin capital letter AE */
   242                      "&Ccedil;",        "&#199;",       "C", /* -- latin capital letter C with cedilla, */
   243                      "&Egrave;",        "&#200;",       "E", /* -- latin capital letter E with grave, */
   244                      "&Eacute;",        "&#201;",       "E", /* -- latin capital letter E with acute, */
   245                      "&Ecirc;",         "&#202;",       "E", /* -- latin capital letter E with circumflex, */
   246                      "&Euml;",          "&#203;",       "E", /* -- latin capital letter E with diaeresis, */
   247                      "&Igrave;",        "&#204;",       "I", /* -- latin capital letter I with grave, */
   248                      "&Iacute;",        "&#205;",       "I", /* -- latin capital letter I with acute, */
   249                      "&Icirc;",         "&#206;",       "I", /* -- latin capital letter I with circumflex, */
   250                      "&Iuml;",          "&#207;",       "I", /* -- latin capital letter I with diaeresis, */
   251                      "&ETH;",           "&#208;",       "E", /* -- latin capital letter ETH, U+00D0 ISOlat1 --> */
   252                      "&Ntilde;",        "&#209;",       "N", /* -- latin capital letter N with tilde, */
   253                      "&Ograve;",        "&#210;",       "O", /* -- latin capital letter O with grave, */
   254                      "&Oacute;",        "&#211;",       "O", /* -- latin capital letter O with acute, */
   255                      "&Ocirc;",         "&#212;",       "O", /* -- latin capital letter O with circumflex, */
   256                      "&Otilde;",        "&#213;",       "O", /* -- latin capital letter O with tilde, */
   257                      "&Ouml;",          "&#214;",       "O", /* -- latin capital letter O with diaeresis, */
   258                      "&times;",         "&#215;",       "*", /* -- multiplication sign, U+00D7 ISOnum --> */
   259                      "&Oslash;",        "&#216;",       "O", /* -- latin capital letter O with stroke */
   260                      "&Ugrave;",        "&#217;",       "U", /* -- latin capital letter U with grave, */
   261                      "&Uacute;",        "&#218;",       "U", /* -- latin capital letter U with acute, */
   262                      "&Ucirc;",         "&#219;",       "U", /* -- latin capital letter U with circumflex, */
   263                      "&Uuml;",          "&#220;",       "U", /* -- latin capital letter U with diaeresis, */
   264                      "&Yacute;",        "&#221;",       "Y", /* -- latin capital letter Y with acute, */
   265                      "&THORN;",         "&#222;",       "TH", /* -- latin capital letter THORN, */
   266                      "&szlig;",         "&#223;",       "sz", /* -- latin small letter sharp s = ess-zed, */
   267                      "&agrave;",        "&#224;",       "a", /* -- latin small letter a with grave */
   268                      "&aacute;",        "&#225;",       "a", /* -- latin small letter a with acute, */
   269                      "&acirc;",         "&#226;",       "a", /* -- latin small letter a with circumflex, */
   270                      "&atilde;",        "&#227;",       "a", /* -- latin small letter a with tilde, */
   271                      "&auml;",          "&#228;",       "a", /* -- latin small letter a with diaeresis, */
   272                      "&aring;",         "&#229;",       "a", /* -- latin small letter a with ring above */
   273                      "&aelig;",         "&#230;",       "ae", /* -- latin small letter ae */
   274                      "&ccedil;",        "&#231;",       "c", /* -- latin small letter c with cedilla, */
   275                      "&egrave;",        "&#232;",       "e", /* -- latin small letter e with grave, */
   276                      "&eacute;",        "&#233;",       "e", /* -- latin small letter e with acute, */
   277                      "&ecirc;",         "&#234;",       "e", /* -- latin small letter e with circumflex, */
   278                      "&euml;",          "&#235;",       "e", /* -- latin small letter e with diaeresis, */
   279                      "&igrave;",        "&#236;",       "i", /* -- latin small letter i with grave, */
   280                      "&iacute;",        "&#237;",       "i", /* -- latin small letter i with acute, */
   281                      "&icirc;",         "&#238;",       "i", /* -- latin small letter i with circumflex, */
   282                      "&iuml;",          "&#239;",       "i", /* -- latin small letter i with diaeresis, */
   283                      "&eth;",           "&#240;",       "eth", /* -- latin small letter eth, U+00F0 ISOlat1 --> */
   284                      "&ntilde;",        "&#241;",       "n", /* -- latin small letter n with tilde, */
   285                      "&ograve;",        "&#242;",       "o", /* -- latin small letter o with grave, */
   286                      "&oacute;",        "&#243;",       "o", /* -- latin small letter o with acute, */
   287                      "&ocirc;",         "&#244;",       "o", /* -- latin small letter o with circumflex, */
   288                      "&otilde;",        "&#245;",       "o", /* -- latin small letter o with tilde, */
   289                      "&ouml;",          "&#246;",       "o", /* -- latin small letter o with diaeresis, */
   290                      "&divide;",        "&#247;",       "/", /* -- division sign, U+00F7 ISOnum --> */
   291                      "&oslash;",        "&#248;",       "o", /* -- latin small letter o with stroke, */
   292                      "&ugrave;",        "&#249;",       "u", /* -- latin small letter u with grave, */
   293                      "&uacute;",        "&#250;",       "u", /* -- latin small letter u with acute, */
   294                      "&ucirc;",         "&#251;",       "u", /* -- latin small letter u with circumflex, */
   295                      "&uuml;",          "&#252;",       "u", /* -- latin small letter u with diaeresis, */
   296                      "&yacute;",        "&#253;",       "y", /* -- latin small letter y with acute, */
   297                      "&thorn;",         "&#254;",       "th", /* -- latin small letter thorn, */
   298                      "&yuml;",          "&#255;",       "y", /* -- latin small letter y with diaeresis, */
   299                       "", "" };
   300                     
   301 /* ---- list of special characters ---- */
   302 #define CHAR_SPACE        32
   303 #define CHAR_TAB           9
   304 #define CHAR_LF           10
   305 #define CHAR_CR           13
   306 #define CHAR_DQUOTE       34
   307 #define CHAR_SQUOTE       39
   308 #define CHAR_OPEN_SQUOTE  96
   309 #define CHAR_TILDE       126
   310 #define CHAR_ASTERISK     42
   311 #define CHAR_FORESLASH    47
   312 #define CHAR_CARAT        94
   313 
   314 #define CHAR_UNDERSCORE    '_'
   315 #define CHAR_OPEN_CBRACK   '{'
   316 #define CHAR_CLOSE_CBRACK  '}'
   317 #define CHAR_OPEN_RBRACK   '('
   318 #define CHAR_CLOSE_RBRACK  ')'
   319 #define CHAR_OPEN_SBRACK   '['
   320 #define CHAR_CLOSE_SBRACK  ']'
   321 
   322 
   323 
   324 
   325 
   326 /* ---- longest and shortest normal PG line lengths ----*/
   327 #define LONGEST_PG_LINE   75
   328 #define WAY_TOO_LONG      80
   329 #define SHORTEST_PG_LINE  55
   330 
   331 #define SWITCHES "ESTPXLOYHWVMUD" /* switches:-                            */
   332                                   /*     D - ignore DP-specific markup     */
   333                                   /*     E - echo queried line             */
   334                                   /*     S - check single quotes           */
   335                                   /*     T - check common typos            */
   336                                   /*     P - require closure of quotes on  */
   337                                   /*         every paragraph               */
   338                                   /*     X - "Trust no one" :-) Paranoid!  */
   339                                   /*         Queries everything            */
   340                                   /*     L - line end checking defaults on */
   341                                   /*         -L turns it off               */
   342                                   /*     O - overview. Just shows counts.  */
   343                                   /*     Y - puts errors to stdout         */
   344                                   /*         instead of stderr             */
   345                                   /*     H - Echoes header fields          */
   346                                   /*     M - Ignore markup in < >          */
   347                                   /*     U - Use file of User-defined Typos*/
   348                                   /*     W - Defaults for use on Web upload*/
   349                                   /*     V - Verbose - list EVERYTHING!    */
   350 #define SWITNO 14                 /* max number of switch parms            */
   351                                   /*        - used for defining array-size */
   352 #define MINARGS   1               /* minimum no of args excl switches      */
   353 #define MAXARGS   1               /* maximum no of args excl switches      */
   354 
   355 int pswit[SWITNO];                /* program switches set by SWITCHES      */
   356 
   357 #define ECHO_SWITCH      0
   358 #define SQUOTE_SWITCH    1
   359 #define TYPO_SWITCH      2
   360 #define QPARA_SWITCH     3
   361 #define PARANOID_SWITCH  4
   362 #define LINE_END_SWITCH  5
   363 #define OVERVIEW_SWITCH  6
   364 #define STDOUT_SWITCH    7
   365 #define HEADER_SWITCH    8
   366 #define WEB_SWITCH       9
   367 #define VERBOSE_SWITCH   10
   368 #define MARKUP_SWITCH    11
   369 #define USERTYPO_SWITCH  12
   370 #define DP_SWITCH        13
   371 
   372 
   373 
   374 long cnt_dquot;       /* for overview mode, count of doublequote queries */
   375 long cnt_squot;       /* for overview mode, count of singlequote queries */
   376 long cnt_brack;       /* for overview mode, count of brackets queries */
   377 long cnt_bin;         /* for overview mode, count of non-ASCII queries */
   378 long cnt_odd;         /* for overview mode, count of odd character queries */
   379 long cnt_long;        /* for overview mode, count of long line errors */
   380 long cnt_short;       /* for overview mode, count of short line queries */
   381 long cnt_punct;       /* for overview mode, count of punctuation and spacing queries */
   382 long cnt_dash;        /* for overview mode, count of dash-related queries */
   383 long cnt_word;        /* for overview mode, count of word queries */
   384 long cnt_html;        /* for overview mode, count of html queries */
   385 long cnt_lineend;     /* for overview mode, count of line-end queries */
   386 long cnt_spacend;     /* count of lines with space at end  V .21 */
   387 long linecnt;         /* count of total lines in the file */
   388 long checked_linecnt; /* count of lines actually gutchecked V .26 */
   389 
   390 void proghelp(void);
   391 void procfile(char *);
   392 
   393 #define LOW_THRESHOLD    0
   394 #define HIGH_THRESHOLD   1
   395 
   396 #define START 0
   397 #define END 1
   398 #define PREV 0
   399 #define NEXT 1
   400 #define FIRST_OF_PAIR 0
   401 #define SECOND_OF_PAIR 1
   402 
   403 #define MAX_WORDPAIR 1000
   404 
   405 char running_from[MAX_PATH];
   406 
   407 int mixdigit(char *);
   408 char *getaword(char *, char *);
   409 int matchword(char *, char *);
   410 char *flgets(char *, int, FILE *, long);
   411 void lowerit(char *);
   412 int gcisalpha(unsigned char);
   413 int gcisdigit(unsigned char);
   414 int gcisletter(unsigned char);
   415 char *gcstrchr(char *s, char c);
   416 void postprocess_for_HTML(char *);
   417 char *linehasmarkup(char *);
   418 char *losemarkup(char *);
   419 int tagcomp(char *, char *);
   420 char *loseentities(char *);
   421 int isroman(char *);
   422 int usertypo_count;
   423 void postprocess_for_DP(char *);
   424 
   425 char wrk[LINEBUFSIZE];
   426 
   427 /* This is disgustingly lazy, predefining max words & lengths,   */
   428 /* but now I'm out of 16-bit restrictions, what's a couple of K? */
   429 #define MAX_QWORD           50
   430 #define MAX_QWORD_LENGTH    40
   431 char qword[MAX_QWORD][MAX_QWORD_LENGTH];
   432 char qperiod[MAX_QWORD][MAX_QWORD_LENGTH];
   433 signed int dupcnt[MAX_QWORD];
   434 
   435 
   436 
   437 
   438 int main(int argc, char **argv)
   439 {
   440     char *argsw, *s;
   441     int i, switno, invarg;
   442     char usertypo_file[MAX_PATH];
   443     FILE *usertypofile;
   444 
   445 
   446     if (strlen(argv[0]) < sizeof(running_from))
   447         strcpy(running_from, argv[0]);  /* save the path to the executable gutcheck */
   448 
   449     /* find out what directory we're running from */
   450     for (s = running_from + strlen(running_from); *s != '/' && *s != '\\' && s >= running_from; s--)
   451         *s = 0;
   452 
   453 
   454     switno = strlen(SWITCHES);
   455     for (i = switno ; --i >0 ; )
   456         pswit[i] = 0;           /* initialise switches */
   457 
   458     /* Standard loop to extract switches.                   */
   459     /* When we come out of this loop, the arguments will be */
   460     /* in argv[0] upwards and the switches used will be     */
   461     /* represented by their equivalent elements in pswit[]  */
   462     while ( --argc > 0 && **++argv == '-')
   463         for (argsw = argv[0]+1; *argsw !='\0'; argsw++)
   464             for (i = switno, invarg = 1; (--i >= 0) && invarg == 1 ; )
   465                 if ((toupper(*argsw)) == SWITCHES[i] ) {
   466                     invarg = 0;
   467                     pswit[i] = 1;
   468                     }
   469 
   470     pswit[PARANOID_SWITCH] ^= 1;         /* Paranoid checking is turned OFF, not on, by its switch */
   471 
   472     if (pswit[PARANOID_SWITCH]) {                         /* if running in paranoid mode */
   473         pswit[TYPO_SWITCH] = pswit[TYPO_SWITCH] ^ 1;      /* force typo checks as well   */
   474         }                                                 /* v.20 removed s and p switches from paranoid mode */
   475 
   476     pswit[LINE_END_SWITCH] ^= 1;         /* Line-end checking is turned OFF, not on, by its switch */
   477     pswit[ECHO_SWITCH] ^= 1;             /* V.21 Echoing is turned OFF, not on, by its switch      */
   478 
   479     if (pswit[OVERVIEW_SWITCH])       /* just print summary; don't echo */
   480         pswit[ECHO_SWITCH] = 0;
   481 
   482     /* Web uploads - for the moment, this is really just a placeholder     */
   483     /* until we decide what processing we really want to do on web uploads */
   484     if (pswit[WEB_SWITCH]) {          /* specific override for web uploads */
   485         pswit[ECHO_SWITCH] =     1;
   486         pswit[SQUOTE_SWITCH] =   0;
   487         pswit[TYPO_SWITCH] =     1;
   488         pswit[QPARA_SWITCH] =    0;
   489         pswit[PARANOID_SWITCH] = 1;
   490         pswit[LINE_END_SWITCH] = 0;
   491         pswit[OVERVIEW_SWITCH] = 0;
   492         pswit[STDOUT_SWITCH] =   0;
   493         pswit[HEADER_SWITCH] =   1;
   494         pswit[VERBOSE_SWITCH] =  0;
   495         pswit[MARKUP_SWITCH] =   0;
   496         pswit[USERTYPO_SWITCH] = 0;
   497         pswit[DP_SWITCH] = 0;
   498         }
   499 
   500 
   501     if (argc < MINARGS || argc > MAXARGS) {  /* check number of args */
   502         proghelp();
   503         return(1);            /* exit */
   504         }
   505 
   506 
   507     /* read in the user-defined stealth scanno list */
   508 
   509     if (pswit[USERTYPO_SWITCH]) {                    /* ... we were told we had one! */
   510         if ((usertypofile = fopen(USERTYPO_FILE, "rb")) == NULL) {   /* not in cwd. try gutcheck directory. */
   511             strcpy(usertypo_file, running_from);
   512             strcat(usertypo_file, USERTYPO_FILE);
   513             if ((usertypofile = fopen(usertypo_file, "rb")) == NULL) {  /* we ain't got no user typo file! */
   514                 printf("   --> I couldn't find gutcheck.typ -- proceeding without user typos.\n");
   515                 }
   516             }
   517 
   518         usertypo_count = 0;
   519         if (usertypofile) {  /* we managed to open a User Typo File! */
   520             if (pswit[USERTYPO_SWITCH]) {
   521                 while (flgets(aline, LINEBUFSIZE-1, usertypofile, (long)usertypo_count)) {
   522                     if (strlen(aline) > 1) {
   523                         if ((int)*aline > 33) {
   524                             s = malloc(strlen(aline)+1);
   525                             if (!s) {
   526                                 fprintf(stderr, "gutcheck: cannot get enough memory for user typo file!!\n");
   527                                 exit(1);
   528                                 }
   529                             strcpy(s, aline);
   530                             usertypo[usertypo_count] = s;
   531                             usertypo_count++;
   532                             if (usertypo_count >= MAX_USER_TYPOS) {
   533                                 printf("   --> Only %d user-defined typos allowed: ignoring the rest\n");
   534                                 break;
   535                                 }
   536                             }
   537                         }
   538                     }
   539                 }
   540             fclose(usertypofile);
   541             }
   542         }
   543 
   544 
   545 
   546 
   547     fprintf(stderr, "gutcheck: Check and report on an e-text\n");
   548 
   549     cnt_dquot = cnt_squot = cnt_brack = cnt_bin = cnt_odd = cnt_long =
   550     cnt_short = cnt_punct = cnt_dash = cnt_word = cnt_html = cnt_lineend =
   551     cnt_spacend = 0;
   552 
   553     procfile(argv[0]);
   554 
   555     if (pswit[OVERVIEW_SWITCH]) {
   556                          printf("    Checked %ld lines of %ld (head+foot = %ld)\n\n",
   557                             checked_linecnt, linecnt, linecnt - checked_linecnt);
   558                          printf("    --------------- Queries found --------------\n");
   559         if (cnt_long)    printf("    Long lines:                             %5ld\n",cnt_long);
   560         if (cnt_short)   printf("    Short lines:                            %5ld\n",cnt_short);
   561         if (cnt_lineend) printf("    Line-end problems:                      %5ld\n",cnt_lineend);
   562         if (cnt_word)    printf("    Common typos:                           %5ld\n",cnt_word);
   563         if (cnt_dquot)   printf("    Unmatched quotes:                       %5ld\n",cnt_dquot);
   564         if (cnt_squot)   printf("    Unmatched SingleQuotes:                 %5ld\n",cnt_squot);
   565         if (cnt_brack)   printf("    Unmatched brackets:                     %5ld\n",cnt_brack);
   566         if (cnt_bin)     printf("    Non-ASCII characters:                   %5ld\n",cnt_bin);
   567         if (cnt_odd)     printf("    Proofing characters:                    %5ld\n",cnt_odd);
   568         if (cnt_punct)   printf("    Punctuation & spacing queries:          %5ld\n",cnt_punct);
   569         if (cnt_dash)    printf("    Non-standard dashes:                    %5ld\n",cnt_dash);
   570         if (cnt_html)    printf("    Possible HTML tags:                     %5ld\n",cnt_html);
   571         printf("\n");
   572         printf("    TOTAL QUERIES                           %5ld\n",
   573             cnt_dquot + cnt_squot + cnt_brack + cnt_bin + cnt_odd + cnt_long +
   574             cnt_short + cnt_punct + cnt_dash + cnt_word + cnt_html + cnt_lineend);
   575         }
   576 
   577     return(0);
   578 }
   579 
   580 
   581 
   582 /* procfile - process one file */
   583 
   584 void procfile(char *filename)
   585 {
   586 
   587     char *s, *t, *s1, laststart, *wordstart;
   588     char inword[MAXWORDLEN], testword[MAXWORDLEN];
   589     char parastart[81];     /* first line of current para */
   590     FILE *infile;
   591     long quot, squot, firstline, alphalen, totlen, binlen,
   592          shortline, longline, verylongline, spacedash, emdash,
   593          space_emdash, non_PG_space_emdash, PG_space_emdash,
   594          footerline, dotcomma, start_para_line, astline, fslashline,
   595          standalone_digit, hyphens, htmcount, endquote_count;
   596     long spline, nspline;
   597     signed int i, j, llen, isemptyline, isacro, isellipsis, istypo, alower,
   598          eNon_A, eTab, eTilde, eAst, eFSlash, eCarat;
   599     signed int warn_short, warn_long, warn_bin, warn_dash, warn_dotcomma,
   600          warn_ast, warn_fslash, warn_digit, warn_hyphen, warn_endquote;
   601     unsigned int lastlen, lastblen;
   602     signed int s_brack, c_brack, r_brack, c_unders;
   603     signed int open_single_quote, close_single_quote, guessquote, dquotepar, squotepar;
   604     signed int isnewpara, vowel, consonant;
   605     char dquote_err[80], squote_err[80], rbrack_err[80], sbrack_err[80], cbrack_err[80],
   606          unders_err[80];
   607     signed int qword_index, qperiod_index, isdup;
   608     signed int enddash;
   609     signed int Dutchcount, isDutch, Frenchcount, isFrench;
   610 
   611 
   612     
   613 
   614 
   615     laststart = CHAR_SPACE;
   616     lastlen = lastblen = 0;
   617     *dquote_err = *squote_err = *rbrack_err = *cbrack_err = *sbrack_err =
   618         *unders_err = *prevline = 0;
   619     linecnt = firstline = alphalen = totlen = binlen =
   620         shortline = longline = spacedash = emdash = checked_linecnt =
   621         space_emdash = non_PG_space_emdash = PG_space_emdash =
   622         footerline = dotcomma = start_para_line = astline = fslashline = 
   623         standalone_digit = hyphens = htmcount = endquote_count = 0;
   624     quot = squot = s_brack = c_brack = r_brack = c_unders = 0;
   625     i = llen = isemptyline = isacro = isellipsis = istypo = 0;
   626     warn_short = warn_long = warn_bin = warn_dash = warn_dotcomma = 
   627         warn_ast = warn_fslash = warn_digit = warn_endquote = 0;
   628     isnewpara = vowel = consonant = enddash = 0;
   629     spline = nspline = 0;
   630     qword_index = qperiod_index = isdup = 0;
   631     *inword = *testword = 0;
   632     open_single_quote = close_single_quote = guessquote = dquotepar = squotepar = 0;
   633     Dutchcount = isDutch = Frenchcount = isFrench = 0;
   634 
   635 
   636     for (j = 0; j < MAX_QWORD; j++) {
   637         dupcnt[j] = 0;
   638         for (i = 0; i < MAX_QWORD_LENGTH; i++)
   639             qword[i][j] = 0;
   640             qperiod[i][j] = 0;
   641             }
   642 
   643 
   644     if ((infile = fopen(filename, "rb")) == NULL) {
   645         if (pswit[STDOUT_SWITCH])
   646             fprintf(stdout, "gutcheck: cannot open %s\n", filename);
   647         else
   648             fprintf(stderr, "gutcheck: cannot open %s\n", filename);
   649         exit(1);
   650         }
   651 
   652     fprintf(stdout, "\n\nFile: %s\n\n", filename);
   653     firstline = shortline = longline = verylongline = 0;
   654 
   655 
   656     /*****************************************************/
   657     /*                                                   */
   658     /*  Run a first pass - verify that it's a valid PG   */
   659     /*  file, decide whether to report some things that  */
   660     /*  occur many times in the text like long or short  */
   661     /*  lines, non-standard dashes, and other good stuff */
   662     /*  I'll doubtless think of later.                   */
   663     /*                                                   */
   664     /*****************************************************/
   665 
   666     /*****************************************************/
   667     /* V.24  Sigh. Yet Another Header Change             */
   668     /*****************************************************/
   669 
   670     while (fgets(aline, LINEBUFSIZE-1, infile)) {
   671         while (aline[strlen(aline)-1] == 10 || aline[strlen(aline)-1] == 13 ) aline[strlen(aline)-1] = 0;
   672         linecnt++;
   673         if (strstr(aline, "*END") && strstr(aline, "SMALL PRINT") && (strstr(aline, "PUBLIC DOMAIN") || strstr(aline, "COPYRIGHT"))) {
   674             if (spline)
   675                 printf("   --> Duplicate header?\n");
   676             spline = linecnt + 1;   /* first line of non-header text, that is */
   677             }
   678         if (!strncmp(aline, "*** START", 9) && strstr(aline, "PROJECT GUTENBERG")) {
   679             if (nspline)
   680                 printf("   --> Duplicate header?\n");
   681             nspline = linecnt + 1;   /* first line of non-header text, that is */
   682             }
   683         if (spline || nspline) {
   684             lowerit(aline);
   685             if (strstr(aline, "end") && strstr(aline, "project gutenberg")) {
   686                 if (strstr(aline, "end") < strstr(aline, "project gutenberg")) {
   687                     if (footerline) {
   688                         if (!nspline) /* it's an old-form header - we can detect duplicates */
   689                             printf("   --> Duplicate footer?\n");
   690                         else 
   691                             ;
   692                         }
   693                     else {
   694                         footerline = linecnt;
   695                         }
   696                     }
   697                 }
   698             }
   699         if (spline) firstline = spline;
   700         if (nspline) firstline = nspline;  /* override with new */
   701 
   702         if (footerline) continue;    /* 0.99+ don't count the boilerplate in the footer */
   703 
   704         llen = strlen(aline);
   705         totlen += llen;
   706         for (i = 0; i < llen; i++) {
   707             if ((unsigned char)aline[i] > 127) binlen++;
   708             if (gcisalpha(aline[i])) alphalen++;
   709             if (i > 0)
   710                 if (aline[i] == CHAR_DQUOTE && isalpha(aline[i-1]))
   711                     endquote_count++;
   712             }
   713         if (strlen(aline) > 2
   714             && lastlen > 2 && lastlen < SHORTEST_PG_LINE
   715             && lastblen > 2 && lastblen > SHORTEST_PG_LINE
   716             && laststart != CHAR_SPACE)
   717                 shortline++;
   718 
   719         if (*aline) /* fixed line below for 0.96 */
   720             if ((unsigned char)aline[strlen(aline)-1] <= CHAR_SPACE) cnt_spacend++;
   721 
   722         if (strstr(aline, ".,")) dotcomma++;
   723         /* 0.98 only count ast lines for ignoring purposes where there is */
   724         /* locase text on the line */
   725         if (strstr(aline, "*")) {
   726             for (s = aline; *s; s++)
   727                 if (*s >='a' && *s <= 'z')
   728                     break;
   729              if (*s) astline++;
   730              }
   731         if (strstr(aline, "/"))
   732             fslashline++;
   733         for (i = llen-1; i > 0 && (unsigned char)aline[i] <= CHAR_SPACE; i--);
   734         if (aline[i] == '-' && aline[i-1] != '-') hyphens++;
   735 
   736         if (llen > LONGEST_PG_LINE) longline++;
   737         if (llen > WAY_TOO_LONG) verylongline++;
   738 
   739         if (strstr(aline, "<") && strstr(aline, ">")) {
   740             i = (signed int) (strstr(aline, ">") - strstr(aline, "<") + 1);
   741             if (i > 0) 
   742                 htmcount++;
   743             if (strstr(aline, "<i>")) htmcount +=4; /* bonus marks! */
   744             }
   745 
   746         /* Check for spaced em-dashes */
   747         if (strstr(aline,"--")) {
   748             emdash++;
   749             if (*(strstr(aline, "--")-1) == CHAR_SPACE ||
   750                (*(strstr(aline, "--")+2) == CHAR_SPACE))
   751                     space_emdash++;
   752             if (*(strstr(aline, "--")-1) == CHAR_SPACE &&
   753                (*(strstr(aline, "--")+2) == CHAR_SPACE))
   754                     non_PG_space_emdash++;             /* count of em-dashes with spaces both sides */
   755             if (*(strstr(aline, "--")-1) != CHAR_SPACE &&
   756                (*(strstr(aline, "--")+2) != CHAR_SPACE))
   757                     PG_space_emdash++;                 /* count of PG-type em-dashes with no spaces */
   758             }
   759 
   760         for (s = aline; *s;) {
   761             s = getaword(s, inword);
   762             if (!strcmp(inword, "hij") || !strcmp(inword, "niet")) 
   763                 Dutchcount++;
   764             if (!strcmp(inword, "dans") || !strcmp(inword, "avec")) 
   765                 Frenchcount++;
   766             if (!strcmp(inword, "0") || !strcmp(inword, "1")) 
   767                 standalone_digit++;
   768             }
   769 
   770         /* Check for spaced dashes */
   771         if (strstr(aline," -"))
   772             if (*(strstr(aline, " -")+2) != '-')
   773                     spacedash++;
   774         lastblen = lastlen;
   775         lastlen = strlen(aline);
   776         laststart = aline[0];
   777 
   778         }
   779     fclose(infile);
   780 
   781 
   782     /* now, based on this quick view, make some snap decisions */
   783     if (cnt_spacend > 0) {
   784         printf("   --> %ld lines in this file have white space at end\n", cnt_spacend);
   785         }
   786 
   787     warn_dotcomma = 1;
   788     if (dotcomma > 5) {
   789         warn_dotcomma = 0;
   790         printf("   --> %ld lines in this file contain '.,'. Not reporting them.\n", dotcomma);
   791         }
   792 
   793     /* if more than 50 lines, or one-tenth, are short, don't bother reporting them */
   794     warn_short = 1;
   795     if (shortline > 50 || shortline * 10 > linecnt) {
   796         warn_short = 0;
   797         printf("   --> %ld lines in this file are short. Not reporting short lines.\n", shortline);
   798         }
   799 
   800     /* if more than 50 lines, or one-tenth, are long, don't bother reporting them */
   801     warn_long = 1;
   802     if (longline > 50 || longline * 10 > linecnt) {
   803         warn_long = 0;
   804         printf("   --> %ld lines in this file are long. Not reporting long lines.\n", longline);
   805         }
   806 
   807     /* if more than 10 lines contain asterisks, don't bother reporting them V.0.97 */
   808     warn_ast = 1;
   809     if (astline > 10 ) {
   810         warn_ast = 0;
   811         printf("   --> %ld lines in this file contain asterisks. Not reporting them.\n", astline);
   812         }
   813 
   814     /* if more than 10 lines contain forward slashes, don't bother reporting them V.0.99 */
   815     warn_fslash = 1;
   816     if (fslashline > 10 ) {
   817         warn_fslash = 0;
   818         printf("   --> %ld lines in this file contain forward slashes. Not reporting them.\n", fslashline);
   819         }
   820 
   821     /* if more than 20 lines contain unpunctuated endquotes, don't bother reporting them V.0.99 */
   822     warn_endquote = 1;
   823     if (endquote_count > 20 ) {
   824         warn_endquote = 0;
   825         printf("   --> %ld lines in this file contain unpunctuated endquotes. Not reporting them.\n", endquote_count);
   826         }
   827 
   828     /* if more than 15 lines contain standalone digits, don't bother reporting them V.0.97 */
   829     warn_digit = 1;
   830     if (standalone_digit > 10 ) {
   831         warn_digit = 0;
   832         printf("   --> %ld lines in this file contain standalone 0s and 1s. Not reporting them.\n", standalone_digit);
   833         }
   834 
   835     /* if more than 20 lines contain hyphens at end, don't bother reporting them V.0.98 */
   836     warn_hyphen = 1;
   837     if (hyphens > 20 ) {
   838         warn_hyphen = 0;
   839         printf("   --> %ld lines in this file have hyphens at end. Not reporting them.\n", hyphens);
   840         }
   841 
   842     if (htmcount > 20 && !pswit[MARKUP_SWITCH]) {
   843         printf("   --> Looks like this is HTML. Switching HTML mode ON.\n");
   844         pswit[MARKUP_SWITCH] = 1;
   845         }
   846         
   847     if (verylongline > 0) {
   848         printf("   --> %ld lines in this file are VERY long!\n", verylongline);
   849         }
   850 
   851     /* If there are more non-PG spaced dashes than PG em-dashes,    */
   852     /* assume it's deliberate                                       */
   853     /* Current PG guidelines say don't use them, but older texts do,*/
   854     /* and some people insist on them whatever the guidelines say.  */
   855     /* V.20 removed requirement that PG_space_emdash be greater than*/
   856     /* ten before turning off warnings about spaced dashes.         */
   857     warn_dash = 1;
   858     if (spacedash + non_PG_space_emdash > PG_space_emdash) {
   859         warn_dash = 0;
   860         printf("   --> There are %ld spaced dashes and em-dashes. Not reporting them.\n", spacedash + non_PG_space_emdash);
   861         }
   862 
   863     /* if more than a quarter of characters are hi-bit, bug out */
   864     warn_bin = 1;
   865     if (binlen * 4 > totlen) {
   866         printf("   --> This file does not appear to be ASCII. Terminating. Best of luck with it!\n");
   867         exit(1);
   868         }
   869     if (alphalen * 4 < totlen) {
   870         printf("   --> This file does not appear to be text. Terminating. Best of luck with it!\n");
   871         exit(1);
   872         }
   873     if ((binlen * 100 > totlen) || (binlen > 100)) {
   874         printf("   --> There are a lot of foreign letters here. Not reporting them.\n");
   875         warn_bin = 0;
   876         }
   877 
   878     /* isDutch and isFrench added .991 Feb 06 for Frank, Jeroen, Renald */
   879     isDutch = 0;
   880     if (Dutchcount > 50) {
   881         isDutch = 1;
   882         printf("   --> This looks like Dutch - switching off dashes and warnings for 's Middags case.\n");
   883         }
   884 
   885     isFrench = 0;
   886     if (Frenchcount > 50) {
   887         isFrench = 1;
   888         printf("   --> This looks like French - switching off some doublepunct.\n");
   889         }
   890 
   891     if (firstline && footerline)
   892         printf("    The PG header and footer appear to be already on.\n");
   893     else {
   894         if (firstline)
   895             printf("    The PG header is on - no footer.\n");
   896         if (footerline)
   897             printf("    The PG footer is on - no header.\n");
   898         }
   899     printf("\n");
   900 
   901     /* V.22 George Davis asked for an override switch to force it to list everything */
   902     if (pswit[VERBOSE_SWITCH]) {
   903         warn_bin = 1;
   904         warn_short = 1;
   905         warn_dotcomma = 1;
   906         warn_long = 1;
   907         warn_dash = 1;
   908         warn_digit = 1;
   909         warn_ast = 1;
   910         warn_fslash = 1;
   911         warn_hyphen = 1;
   912         warn_endquote = 1;
   913         printf("   *** Verbose output is ON -- you asked for it! ***\n");
   914         }
   915 
   916     if (isDutch)
   917         warn_dash = 0;  /* Frank suggested turning it REALLY off for Dutch */
   918 
   919     if ((infile = fopen(filename, "rb")) == NULL) {
   920         if (pswit[STDOUT_SWITCH])
   921             fprintf(stdout, "gutcheck: cannot open %s\n", filename);
   922         else
   923             fprintf(stderr, "gutcheck: cannot open %s\n", filename);
   924         exit(1);
   925         }
   926 
   927     if (footerline > 0 && firstline > 0 && footerline > firstline && footerline - firstline < 100) { /* ugh */
   928         printf("   --> I don't really know where this text starts. \n");
   929         printf("       There are no reference points.\n");
   930         printf("       I'm going to have to report the header and footer as well.\n");
   931         firstline=0;
   932         }
   933         
   934 
   935 
   936     /*****************************************************/
   937     /*                                                   */
   938     /* Here we go with the main pass. Hold onto yer hat! */
   939     /*                                                   */
   940     /*****************************************************/
   941 
   942     /* Re-init some variables we've dirtied */
   943     quot = squot = linecnt = 0;
   944     laststart = CHAR_SPACE;
   945     lastlen = lastblen = 0;
   946 
   947     while (flgets(aline, LINEBUFSIZE-1, infile, linecnt+1)) {
   948         linecnt++;
   949         if (linecnt == 1) isnewpara = 1;
   950         if (pswit[DP_SWITCH])
   951             if (!strncmp(aline, "-----File: ", 11))
   952                 continue;    // skip DP page separators completely
   953         if (linecnt < firstline || (footerline > 0 && linecnt > footerline)) {
   954             if (pswit[HEADER_SWITCH]) {
   955                 if (!strncmp(aline, "Title:", 6))
   956                     printf("    %s\n", aline);
   957                 if (!strncmp (aline, "Author:", 7))
   958                     printf("    %s\n", aline);
   959                 if (!strncmp(aline, "Release Date:", 13))
   960                     printf("    %s\n", aline);
   961                 if (!strncmp(aline, "Edition:", 8))
   962                     printf("    %s\n\n", aline);
   963                 }
   964             continue;                /* skip through the header */
   965             }
   966         checked_linecnt++;
   967         s = aline;
   968         isemptyline = 1;      /* assume the line is empty until proven otherwise */
   969 
   970         /* If we are in a state of unbalanced quotes, and this line    */
   971         /* doesn't begin with a quote, output the stored error message */
   972         /* If the -P switch was used, print the warning even if the    */
   973         /* new para starts with quotes                                 */
   974         /* Version .20 - if the new paragraph does start with a quote, */
   975         /* but is indented, I was giving a spurious error. Need to     */
   976         /* check the first _non-space_ character on the line rather    */
   977         /* than the first character when deciding whether the para     */
   978         /* starts with a quote. Using *t for this.                     */
   979         t = s;
   980         while (*t == ' ') t++;
   981         if (*dquote_err)
   982             if (*t != CHAR_DQUOTE || pswit[QPARA_SWITCH]) {
   983                 if (!pswit[OVERVIEW_SWITCH]) {
   984                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
   985                     printf(dquote_err);
   986                     }
   987                 else
   988                     cnt_dquot++;
   989             }
   990         if (*squote_err) {
   991             if (*t != CHAR_SQUOTE && *t != CHAR_OPEN_SQUOTE || pswit[QPARA_SWITCH] || squot) {
   992                 if (!pswit[OVERVIEW_SWITCH]) {
   993                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
   994                     printf(squote_err);
   995                     }
   996                 else
   997                     cnt_squot++;
   998                 }
   999             squot = 0;
  1000             }
  1001         if (*rbrack_err) {
  1002             if (!pswit[OVERVIEW_SWITCH]) {
  1003                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
  1004                 printf(rbrack_err);
  1005                 }
  1006             else
  1007                 cnt_brack++;
  1008             }
  1009         if (*sbrack_err) {
  1010             if (!pswit[OVERVIEW_SWITCH]) {
  1011                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
  1012                 printf(sbrack_err);
  1013                 }
  1014             else
  1015                 cnt_brack++;
  1016             }
  1017         if (*cbrack_err) {
  1018             if (!pswit[OVERVIEW_SWITCH]) {
  1019                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
  1020                 printf(cbrack_err);
  1021                 }
  1022             else
  1023                 cnt_brack++;
  1024             }
  1025         if (*unders_err) {
  1026             if (!pswit[OVERVIEW_SWITCH]) {
  1027                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", parastart);
  1028                 printf(unders_err);
  1029                 }
  1030             else
  1031                 cnt_brack++;
  1032             }
  1033 
  1034         *dquote_err = *squote_err = *rbrack_err = *cbrack_err = 
  1035             *sbrack_err = *unders_err = 0;
  1036 
  1037 
  1038         /* look along the line, accumulate the count of quotes, and see */
  1039         /* if this is an empty line - i.e. a line with nothing on it    */
  1040         /* but spaces.                                                  */
  1041         /* V .12 also if line has just spaces, * and/or - on it, don't  */
  1042         /* count it, since empty lines with asterisks or dashes to      */
  1043         /* separate sections are common.                                */
  1044         /* V .15 new single-quote checking - has to be better than the  */
  1045         /* previous version, but how much better? fingers crossed!      */
  1046         /* V .20 add period to * and - as characters on a separator line*/
  1047         s = aline;
  1048         while (*s) {
  1049             if (*s == CHAR_DQUOTE) quot++;
  1050             if (*s == CHAR_SQUOTE || *s == CHAR_OPEN_SQUOTE)
  1051                 if (s == aline) { /* at start of line, it can only be an openquote */
  1052                     if (strncmp(s+2, "tis", 3) && strncmp(s+2, "Tis", 3)) /* hardcode a very common exception! */
  1053                         open_single_quote++;
  1054                     }
  1055                 else
  1056                     if (gcisalpha(*(s-1)) && gcisalpha(*(s+1)))
  1057                         ; /* do nothing! - it's definitely an apostrophe, not a quote */
  1058                     else        /* it's outside a word - let's check it out */
  1059                         if (*s == CHAR_OPEN_SQUOTE || gcisalpha(*(s+1))) { /* it damwell better BE an openquote */
  1060                             if (strncmp(s+1, "tis", 3) && strncmp(s+1, "Tis", 3)) /* hardcode a very common exception! */
  1061                                 open_single_quote++;
  1062                             }
  1063                         else { /* now - is it a closequote? */
  1064                             guessquote = 0;   /* accumulate clues */
  1065                             if (gcisalpha(*(s-1))) { /* it follows a letter - could be either */
  1066                                 guessquote += 1;
  1067                                 if (*(s-1) == 's') { /* looks like a plural apostrophe */
  1068                                     guessquote -= 3;
  1069                                     if (*(s+1) == CHAR_SPACE)  /* bonus marks! */
  1070                                         guessquote -= 2;
  1071                                     }
  1072                                 }
  1073                             else /* it doesn't have a letter either side */
  1074                                 if (strchr(".?!,;:", *(s-1)) && (strchr(".?!,;: ", *(s+1))))
  1075                                     guessquote += 8; /* looks like a closequote */
  1076                                 else
  1077                                     guessquote += 1;
  1078                             if (open_single_quote > close_single_quote)
  1079                                 guessquote += 1; /* give it the benefit of some doubt - if a squote is already open */
  1080                             else
  1081                                 guessquote -= 1;
  1082                             if (guessquote >= 0)
  1083                                 close_single_quote++;
  1084                             }
  1085 
  1086             if (*s != CHAR_SPACE
  1087                 && *s != '-'
  1088                 && *s != '.'
  1089                 && *s != CHAR_ASTERISK
  1090                 && *s != 13
  1091                 && *s != 10) isemptyline = 0;  /* ignore lines like  *  *  *  as spacers */
  1092             if (*s == CHAR_UNDERSCORE) c_unders++;
  1093             if (*s == CHAR_OPEN_CBRACK) c_brack++;
  1094             if (*s == CHAR_CLOSE_CBRACK) c_brack--;
  1095             if (*s == CHAR_OPEN_RBRACK) r_brack++;
  1096             if (*s == CHAR_CLOSE_RBRACK) r_brack--;
  1097             if (*s == CHAR_OPEN_SBRACK) s_brack++;
  1098             if (*s == CHAR_CLOSE_SBRACK) s_brack--;
  1099             s++;
  1100             }
  1101 
  1102         if (isnewpara && !isemptyline) {   /* This line is the start of a new paragraph */
  1103             start_para_line = linecnt;
  1104             strncpy(parastart, aline, 80); /* Capture its first line in case we want to report it later */
  1105             parastart[79] = 0;
  1106             dquotepar = squotepar = 0; /* restart the quote count 0.98 */
  1107             s = aline;
  1108             while (!gcisalpha(*s) && !gcisdigit(*s) && *s) s++;    /* V.97 fixed bug - overran line and gave false warning - rare */
  1109             if (*s >= 'a' && *s <='z') { /* and its first letter is lowercase */
  1110                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1111                 if (!pswit[OVERVIEW_SWITCH])
  1112                     printf("    Line %ld column %d - Paragraph starts with lower-case\n", linecnt, (int)(s - aline) +1);
  1113                 else
  1114                     cnt_punct++;
  1115                 }
  1116             isnewpara = 0; /* Signal the end of new para processing */
  1117             }
  1118 
  1119         /* Check for an em-dash broken at line end */
  1120         if (enddash && *aline == '-') {
  1121             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1122             if (!pswit[OVERVIEW_SWITCH])
  1123                 printf("    Line %ld column 1 - Broken em-dash?\n", linecnt);
  1124             else
  1125                 cnt_punct++;
  1126             }
  1127         enddash = 0;
  1128         for (s = aline + strlen(aline) - 1; *s == ' ' && s > aline; s--);
  1129         if (s >= aline && *s == '-')
  1130             enddash = 1;
  1131             
  1132 
  1133         /* Check for invalid or questionable characters in the line */
  1134         /* Anything above 127 is invalid for plain ASCII,  and      */
  1135         /* non-printable control characters should also be flagged. */
  1136         /* Tabs should generally not be there.                      */
  1137         /* Jan 06, in 0.99: Hm. For some strange reason, I either   */
  1138         /* never created or deleted the check for unprintable       */
  1139         /* control characters. They should be reported even if      */
  1140         /* warn_bin is on, I think, and in full.                    */
  1141 
  1142         for (s = aline; *s; s++) {
  1143             i = (unsigned char) *s;
  1144             if (i < CHAR_SPACE && i != CHAR_LF && i != CHAR_CR && i != CHAR_TAB) {
  1145                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1146                 if (!pswit[OVERVIEW_SWITCH])
  1147                     printf("    Line %ld column %d - Control character %d\n", linecnt, (int) (s - aline) + 1, i);
  1148                 else
  1149                     cnt_bin++;
  1150                 }
  1151             }
  1152 
  1153         if (warn_bin) {
  1154             eNon_A = eTab = eTilde = eCarat = eFSlash = eAst = 0;  /* don't repeat multiple warnings on one line */
  1155             for (s = aline; *s; s++) {
  1156                 if (!eNon_A && ((*s < CHAR_SPACE && *s != 9 && *s != '\n') || (unsigned char)*s > 127)) {
  1157                     i = *s;                           /* annoying kludge for signed chars */
  1158                     if (i < 0) i += 256;
  1159                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1160                     if (!pswit[OVERVIEW_SWITCH])
  1161                         if (i > 127 && i < 160)
  1162                             printf("    Line %ld column %d - Non-ISO-8859 character %d\n", linecnt, (int) (s - aline) + 1, i);
  1163                         else
  1164                             printf("    Line %ld column %d - Non-ASCII character %d\n", linecnt, (int) (s - aline) + 1, i);
  1165                     else
  1166                         cnt_bin++;
  1167                     eNon_A = 1;
  1168                     }
  1169                 if (!eTab && *s == CHAR_TAB) {
  1170                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1171                     if (!pswit[OVERVIEW_SWITCH])
  1172                         printf("    Line %ld column %d - Tab character?\n", linecnt, (int) (s - aline) + 1);
  1173                     else
  1174                         cnt_odd++;
  1175                     eTab = 1;
  1176                     }
  1177                 if (!eTilde && *s == CHAR_TILDE) {  /* often used by OCR software to indicate an unrecognizable character */
  1178                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1179                     if (!pswit[OVERVIEW_SWITCH])
  1180                         printf("    Line %ld column %d - Tilde character?\n", linecnt, (int) (s - aline) + 1);
  1181                     else
  1182                         cnt_odd++;
  1183                     eTilde = 1;
  1184                     }
  1185                 if (!eCarat && *s == CHAR_CARAT) {  
  1186                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1187                     if (!pswit[OVERVIEW_SWITCH])
  1188                         printf("    Line %ld column %d - Carat character?\n", linecnt, (int) (s - aline) + 1);
  1189                     else
  1190                         cnt_odd++;
  1191                     eCarat = 1;
  1192                     }
  1193                 if (!eFSlash && *s == CHAR_FORESLASH && warn_fslash) {  
  1194                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1195                     if (!pswit[OVERVIEW_SWITCH])
  1196                         printf("    Line %ld column %d - Forward slash?\n", linecnt, (int) (s - aline) + 1);
  1197                     else
  1198                         cnt_odd++;
  1199                     eFSlash = 1;
  1200                     }
  1201                 /* report asterisks only in paranoid mode, since they're often deliberate */
  1202                 if (!eAst && pswit[PARANOID_SWITCH] && warn_ast && !isemptyline && *s == CHAR_ASTERISK) {
  1203                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1204                     if (!pswit[OVERVIEW_SWITCH])
  1205                         printf("    Line %ld column %d - Asterisk?\n", linecnt, (int) (s - aline) + 1);
  1206                     else
  1207                         cnt_odd++;
  1208                     eAst = 1;
  1209                     }
  1210                 }
  1211             }
  1212 
  1213         /* Check for line too long */
  1214         if (warn_long) {
  1215             if (strlen(aline) > LONGEST_PG_LINE) {
  1216                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1217                 if (!pswit[OVERVIEW_SWITCH])
  1218                     printf("    Line %ld column %d - Long line %d\n", linecnt, strlen(aline), strlen(aline));
  1219                 else
  1220                     cnt_long++;
  1221                 }
  1222             }
  1223 
  1224         /* Check for line too short.                                     */
  1225         /* This one is a bit trickier to implement: we don't want to     */
  1226         /* flag the last line of a paragraph for being short, so we      */
  1227         /* have to wait until we know that our current line is a         */
  1228         /* "normal" line, then report the _previous_ line if it was too  */
  1229         /* short. We also don't want to report indented lines like       */
  1230         /* chapter heads or formatted quotations. We therefore keep      */
  1231         /* lastlen as the length of the last line examined, and          */
  1232         /* lastblen as the length of the last but one, and try to        */
  1233         /* suppress unnecessary warnings by checking that both were of   */
  1234         /* "normal" length. We keep the first character of the last      */
  1235         /* line in laststart, and if it was a space, we assume that the  */
  1236         /* formatting is deliberate. I can't figure out a way to         */
  1237         /* distinguish something like a quoted verse left-aligned or     */
  1238         /* the header or footer of a letter from a paragraph of short    */
  1239         /* lines - maybe if I examined the whole paragraph, and if the   */
  1240         /* para has less than, say, 8 lines and if all lines are short,  */
  1241         /* then just assume it's OK? Need to look at some texts to see   */
  1242         /* how often a formula like this would get the right result.     */
  1243         /* V0.99 changed the tolerance for length to ignore from 2 to 1  */
  1244         if (warn_short) {
  1245             if (strlen(aline) > 1
  1246                 && lastlen > 1 && lastlen < SHORTEST_PG_LINE
  1247                 && lastblen > 1 && lastblen > SHORTEST_PG_LINE
  1248                 && laststart != CHAR_SPACE) {
  1249                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", prevline);
  1250                     if (!pswit[OVERVIEW_SWITCH])
  1251                         printf("    Line %ld column %d - Short line %d?\n", linecnt-1, strlen(prevline), strlen(prevline));
  1252                     else
  1253                         cnt_short++;
  1254                     }
  1255             }
  1256         lastblen = lastlen;
  1257         lastlen = strlen(aline);
  1258         laststart = aline[0];
  1259 
  1260         /* look for punctuation at start of line */
  1261         if  (*aline && strchr(".?!,;:",  aline[0]))  {            /* if it's punctuation */
  1262             if (strncmp(". . .", aline, 5)) {   /* exception for ellipsis: V.98 tightened up to except only a full ellipsis */
  1263                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1264                 if (!pswit[OVERVIEW_SWITCH])
  1265                     printf("    Line %ld column 1 - Begins with punctuation?\n", linecnt);
  1266                 else
  1267                     cnt_punct++;
  1268                 }
  1269             }
  1270 
  1271         /* Check for spaced em-dashes                            */
  1272         /* V.20 must check _all_ occurrences of "--" on the line */
  1273         /* hence the loop - even if the first double-dash is OK  */
  1274         /* there may be another that's wrong later on.           */
  1275         if (warn_dash) {
  1276             s = aline;
  1277             while (strstr(s,"--")) {
  1278                 if (*(strstr(s, "--")-1) == CHAR_SPACE ||
  1279                    (*(strstr(s, "--")+2) == CHAR_SPACE)) {
  1280                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1281                     if (!pswit[OVERVIEW_SWITCH])
  1282                         printf("    Line %ld column %d - Spaced em-dash?\n", linecnt, (int) (strstr(s,"--") - aline) + 1);
  1283                     else
  1284                         cnt_dash++;
  1285                     }
  1286                 s = strstr(s,"--") + 2;
  1287                 }
  1288             }
  1289 
  1290         /* Check for spaced dashes */
  1291         if (warn_dash)
  1292             if (strstr(aline," -")) {
  1293                 if (*(strstr(aline, " -")+2) != '-') {
  1294                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1295                     if (!pswit[OVERVIEW_SWITCH])
  1296                         printf("    Line %ld column %d - Spaced dash?\n", linecnt, (int) (strstr(aline," -") - aline) + 1);
  1297                     else
  1298                         cnt_dash++;
  1299                     }
  1300                 }
  1301             else
  1302                 if (strstr(aline,"- ")) {
  1303                     if (*(strstr(aline, "- ")-1) != '-') {
  1304                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1305                         if (!pswit[OVERVIEW_SWITCH])
  1306                             printf("    Line %ld column %d - Spaced dash?\n", linecnt, (int) (strstr(aline,"- ") - aline) + 1);
  1307                         else
  1308                             cnt_dash++;
  1309                         }
  1310                     }
  1311 
  1312         /* v 0.99                                                       */
  1313         /* Check for unmarked paragraphs indicated by separate speakers */
  1314         /* May well be false positive:                                  */
  1315         /* "Bravo!" "Wonderful!" called the crowd.                      */
  1316         /* but useful all the same.                                     */
  1317         s = wrk;
  1318         *s = 0;
  1319         if (strstr(aline, "\" \"")) s = strstr(aline, "\" \"");
  1320         if (strstr(aline, "\"  \"")) s = strstr(aline, "\"  \"");
  1321         if (*s) {
  1322             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1323             if (!pswit[OVERVIEW_SWITCH])
  1324                 printf("    Line %ld column %d - Query missing paragraph break?\n", linecnt, (int)(s - aline) +1);
  1325             else
  1326                 cnt_punct++;
  1327             }
  1328 
  1329 
  1330 
  1331         /* Check for "to he" and other easy he/be errors          */
  1332         /* This is a very inadequate effort on the he/be problem, */
  1333         /* but the phrase "to he" is always an error, whereas "to */
  1334         /* be" is quite common. I chuckle when it does catch one! */
  1335         /* Similarly, '"Quiet!", be said.' is a non-be error      */
  1336         /* V .18 - "to he" is _not_ always an error!:             */
  1337         /*           "Where they went to he couldn't say."        */
  1338         /* but I'm leaving it in anyway.                          */
  1339         /* V .20 Another false positive:                          */
  1340         /*       What would "Cinderella" be without the . . .     */
  1341         /* and another "If he wants to he can see for himself."   */
  1342         /* V .21 Added " is be " and " be is " and " be was "     */
  1343         /* V .99 Added jeebies code -- removed again.             */
  1344         /*       Is jeebies code worth adding? Rare to see he/be  */
  1345         /*       errors with modern OCR. Separate program? Yes!   */
  1346         /*       jeebies does the job without cluttering up this. */
  1347         /*       We do get a few more queryable pairs from the    */
  1348         /*       project though -- they're cheap to implement.    */
  1349         /*       Also added a column number for guiguts.          */
  1350 
  1351         s = wrk;
  1352         *s = 0;
  1353         if (strstr(aline," to he ")) s = strstr(aline," to he ");
  1354         if (strstr(aline,"\" be ")) s = strstr(aline,"\" be ");
  1355         if (strstr(aline,"\", be ")) s = strstr(aline,"\", be ");
  1356         if (strstr(aline," is be ")) s = strstr(aline," is be ");
  1357         if (strstr(aline," be is ")) s = strstr(aline," be is ");
  1358         if (strstr(aline," was be ")) s = strstr(aline," was be ");
  1359         if (strstr(aline," be would ")) s = strstr(aline," be would ");
  1360         if (strstr(aline," be could ")) s = strstr(aline," be could ");
  1361         if (*s) {
  1362             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1363             if (!pswit[OVERVIEW_SWITCH])
  1364                 printf("    Line %ld column %d - Query he/be error?\n", linecnt, (int)(s - aline) +1);
  1365             else
  1366                 cnt_word++;
  1367             }
  1368 
  1369         s = wrk;
  1370         *s = 0;
  1371         if (strstr(aline," i bad ")) s = strstr(aline," i bad ");
  1372         if (strstr(aline," you bad ")) s = strstr(aline," you bad ");
  1373         if (strstr(aline," he bad ")) s = strstr(aline," he bad ");
  1374         if (strstr(aline," she bad ")) s = strstr(aline," she bad ");
  1375         if (strstr(aline," they bad ")) s = strstr(aline," they bad ");
  1376         if (strstr(aline," a had ")) s = strstr(aline," a had ");
  1377         if (strstr(aline," the had ")) s = strstr(aline," the had ");
  1378         if (*s) {
  1379             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1380             if (!pswit[OVERVIEW_SWITCH])
  1381                 printf("    Line %ld column %d - Query had/bad error?\n", linecnt, (int)(s - aline) +1);
  1382             else
  1383                 cnt_word++;
  1384             }
  1385 
  1386 
  1387         /* V .97 Added ", hut "  Not too common, hut pretty certain   */
  1388         /* V.99 changed to add a column number for guiguts            */
  1389         s = wrk;
  1390         *s = 0;
  1391         if (strstr(aline,", hut ")) s = strstr(aline,", hut ");
  1392         if (strstr(aline,"; hut ")) s = strstr(aline,"; hut ");
  1393         if (*s) {
  1394             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1395             if (!pswit[OVERVIEW_SWITCH])
  1396                 printf("    Line %ld column %d - Query hut/but error?\n", linecnt, (int)(s - aline) +1);
  1397             else
  1398                 cnt_word++;
  1399             }
  1400 
  1401         /* Special case - angled bracket in front of "From" placed there by an MTA */
  1402         /* when sending an e-mail.  V .21                                          */
  1403         if (strstr(aline, ">From")) {
  1404             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1405             if (!pswit[OVERVIEW_SWITCH])
  1406                 printf("    Line %ld column %d - Query angled bracket with From\n", linecnt, (int)(strstr(aline, ">From") - aline) +1);
  1407             else
  1408                 cnt_punct++;
  1409             }
  1410 
  1411         /* V 0.98 Check for a single character line - often an overflow from bad wrapping. */
  1412         if (*aline && !*(aline+1)) {
  1413             if (*aline == 'I' || *aline == 'V' || *aline == 'X' || *aline == 'L' || gcisdigit(*aline))
  1414                 ; /* nothing - ignore numerals alone on a line. */
  1415             else {
  1416                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1417                 if (!pswit[OVERVIEW_SWITCH])
  1418                     printf("    Line %ld column 1 - Query single character line\n", linecnt);
  1419                 else
  1420                     cnt_punct++;
  1421                 }
  1422             }
  1423 
  1424         /* V 0.98 Check for I" - often should be ! */
  1425         if (strstr(aline, " I\"")) {
  1426             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1427             if (!pswit[OVERVIEW_SWITCH])
  1428                 printf("    Line %ld column %ld - Query I=exclamation mark?\n", linecnt, strstr(aline, " I\"") - aline);
  1429             else
  1430                 cnt_punct++;
  1431             }
  1432 
  1433         /* V 0.98 Check for period without a capital letter. Cut-down from gutspell */
  1434         /*        Only works when it happens on a single line.                      */
  1435 
  1436         if (pswit[PARANOID_SWITCH])
  1437             for (t = s = aline; strstr(t,". ");) {
  1438                 t = strstr(t, ". ");
  1439                 if (t == s)  {
  1440                     t++;
  1441                     continue; /* start of line punctuation is handled elsewhere */
  1442                     }
  1443                 if (!gcisalpha(*(t-1))) {
  1444                     t++;
  1445                     continue;
  1446                     }
  1447                 if (isDutch) {  /* For Frank & Jeroen -- 's Middags case */
  1448                     if (*(t+2) == CHAR_SQUOTE &&
  1449                       *(t+3)>='a' && *(t+3)<='z' &&
  1450                       *(t+4) == CHAR_SPACE &&
  1451                       *(t+5)>='A' && *(t+5)<='Z') {
  1452                         t++;
  1453                         continue;
  1454                         }
  1455                       }
  1456                 s1 = t+2;
  1457                 while (*s1 && !gcisalpha(*s1) && !isdigit(*s1))
  1458                     s1++;
  1459                 if (*s1 >= 'a' && *s1 <= 'z') {  /* we have something to investigate */
  1460                     istypo = 1;
  1461                     for (s1 = t - 1; s1 >= s && 
  1462                         (gcisalpha(*s1) || gcisdigit(*s1) || 
  1463                         (*s1 == CHAR_SQUOTE && gcisalpha(*(s1+1)) && gcisalpha(*(s1-1)))); s1--); /* so let's go back and find out */
  1464                     s1++;
  1465                     for (i = 0; *s1 && *s1 != '.'; s1++, i++)
  1466                         testword[i] = *s1;
  1467                     testword[i] = 0;
  1468                     for (i = 0; *abbrev[i]; i++)
  1469                         if (!strcmp(testword, abbrev[i]))
  1470                             istypo = 0;
  1471 //                    if (*testword >= 'A' && *testword <= 'Z') 
  1472 //                        istypo = 0;
  1473                     if (gcisdigit(*testword)) istypo = 0;
  1474                     if (!*(testword+1)) istypo = 0;
  1475                     if (isroman(testword)) istypo = 0;
  1476                     if (istypo) {
  1477                         istypo = 0;
  1478                         for (i = 0; testword[i]; i++)
  1479                             if (strchr(vowels, testword[i]))
  1480                                 istypo = 1;
  1481                         }
  1482                     if (istypo) {
  1483                         isdup = 0;
  1484                         if (strlen(testword) < MAX_QWORD_LENGTH && !pswit[VERBOSE_SWITCH])
  1485                             for (i = 0; i < qperiod_index; i++)
  1486                                 if (!strcmp(testword, qperiod[i])) {
  1487                                     isdup = 1;
  1488                                     }
  1489                         if (!isdup) {
  1490                             if (qperiod_index < MAX_QWORD && strlen(testword) < MAX_QWORD_LENGTH) {
  1491                                 strcpy(qperiod[qperiod_index], testword);
  1492                                 qperiod_index++;
  1493                                 }
  1494                             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1495                             if (!pswit[OVERVIEW_SWITCH])
  1496                                 printf("    Line %ld column %d - Extra period?\n", linecnt, (int)(t - aline)+1);
  1497                             else
  1498                                 cnt_punct++;
  1499                             }
  1500                         }
  1501                     }
  1502                 t++;
  1503                 }
  1504 
  1505 
  1506         if (pswit[TYPO_SWITCH]) {    /* Should have put this condition in at the start of 0.99. Duh! */
  1507             /* Check for words usually not followed by punctuation 0.99 */
  1508             for (s = aline; *s;) {
  1509                 wordstart = s;
  1510                 s = getaword(s, inword);
  1511                 if (!*inword) continue;
  1512                 lowerit(inword);
  1513                 for (i = 0; *nocomma[i]; i++)
  1514                     if (!strcmp(inword, nocomma[i])) {
  1515                         if (*s == ',' || *s == ';' || *s == ':') {
  1516                             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1517                             if (!pswit[OVERVIEW_SWITCH])
  1518                                 printf("    Line %ld column %d - Query punctuation after %s?\n", linecnt, (int)(s - aline)+1, inword);
  1519                             else
  1520                                 cnt_punct++;
  1521                             }
  1522                         }
  1523                 for (i = 0; *noperiod[i]; i++)
  1524                     if (!strcmp(inword, noperiod[i])) {
  1525                         if (*s == '.' || *s == '!') {
  1526                             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1527                             if (!pswit[OVERVIEW_SWITCH])
  1528                                 printf("    Line %ld column %d - Query punctuation after %s?\n", linecnt, (int)(s - aline)+1, inword);
  1529                             else
  1530                                 cnt_punct++;
  1531                             }
  1532                         }
  1533                 }
  1534             }
  1535 
  1536 
  1537 
  1538         /* Check for commonly mistyped words, and digits like 0 for O in a word */
  1539         for (s = aline; *s;) {
  1540             wordstart = s;
  1541             s = getaword(s, inword);
  1542             if (!*inword) continue; /* don't bother with empty lines */
  1543             if (mixdigit(inword)) {
  1544                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1545                 if (!pswit[OVERVIEW_SWITCH])
  1546                     printf("    Line %ld column %ld - Query digit in %s\n", linecnt, (int)(wordstart - aline) + 1, inword);
  1547                 else
  1548                     cnt_word++;
  1549                 }
  1550 
  1551             /* put the word through a series of tests for likely typos and OCR errors */
  1552             /* V.21 I had allowed lots of typo-checking even with the typo switch     */
  1553             /* turned off, but I really should disallow reporting of them when        */
  1554             /* the switch is off. Hence the "if" below.                               */
  1555             if (pswit[TYPO_SWITCH]) {
  1556                 istypo = 0;
  1557                 strcpy(testword, inword);
  1558                 alower = 0;
  1559                 for (i = 0; i < (signed int)strlen(testword); i++) { /* lowercase for testing */
  1560                     if (testword[i] >= 'a' && testword[i] <= 'z') alower = 1;
  1561                     if (alower && testword[i] >= 'A' && testword[i] <= 'Z') {
  1562                         /* we have an uppercase mid-word. However, there are common cases: */
  1563                         /*   Mac and Mc like McGill                                        */
  1564                         /*   French contractions like l'Abbe                               */
  1565                         if ((i == 2 && testword[0] == 'm' && testword[1] == 'c') ||
  1566                             (i == 3 && testword[0] == 'm' && testword[1] == 'a' && testword[2] == 'c') ||
  1567                             (i > 0 && testword[i-1] == CHAR_SQUOTE))
  1568                                 ; /* do nothing! */
  1569 
  1570                         else {  /* V.97 - remove separate case of uppercase within word so that         */
  1571                                 /* names like VanAllen fall into qword_index and get reported only once */
  1572                             istypo = 1;
  1573                             }
  1574                         }
  1575                     testword[i] = (char)tolower(testword[i]);
  1576                     }
  1577 
  1578                 /* check for certain unlikely two-letter combinations at word start and end */
  1579                 /* V.0.97 - this replaces individual hardcoded checks in previous versions */
  1580                 if (strlen(testword) > 1) {
  1581                     for (i = 0; *nostart[i]; i++)
  1582                         if (!strncmp(testword, nostart[i], 2))
  1583                             istypo = 1;
  1584                     for (i = 0; *noend[i]; i++)
  1585                         if (!strncmp(testword + strlen(testword) -2, noend[i], 2))
  1586                             istypo = 1;
  1587                     }
  1588 
  1589 
  1590                 /* ght is common, gbt never. Like that. */
  1591                 if (strstr(testword, "cb")) istypo = 1;
  1592                 if (strstr(testword, "gbt")) istypo = 1;
  1593                 if (strstr(testword, "pbt")) istypo = 1;
  1594                 if (strstr(testword, "tbs")) istypo = 1;
  1595                 if (strstr(testword, "mrn")) istypo = 1;
  1596                 if (strstr(testword, "ahle")) istypo = 1;
  1597                 if (strstr(testword, "ihle")) istypo = 1;
  1598 
  1599                 /* "TBE" does happen - like HEARTBEAT - but uncommon.                    */
  1600                 /*  Also "TBI" - frostbite, outbid - but uncommon.                       */
  1601                 /*  Similarly "ii" like Hawaii, or Pompeii, and in Roman numerals,       */
  1602                 /*  but these are covered in V.20. "ii" is a common scanno.              */
  1603                 if (strstr(testword, "tbi")) istypo = 1;
  1604                 if (strstr(testword, "tbe")) istypo = 1;
  1605                 if (strstr(testword, "ii")) istypo = 1;
  1606 
  1607                 /* check for no vowels or no consonants. */
  1608                 /* If none, flag a typo                  */
  1609                 if (!istypo && strlen(testword)>1) {
  1610                     vowel = consonant = 0;
  1611                     for (i = 0; testword[i]; i++)
  1612                         if (testword[i] == 'y' || gcisdigit(testword[i])) {  /* Yah, this is loose. */
  1613                             vowel++;
  1614                             consonant++;
  1615                             }
  1616                         else
  1617                             if  (strchr(vowels, testword[i])) vowel++;
  1618                             else consonant++;
  1619                     if (!vowel || !consonant) {
  1620                         istypo = 1;
  1621                         }
  1622                     }
  1623 
  1624                 /* now exclude the word from being reported if it's in */
  1625                 /* the okword list                                     */
  1626                 for (i = 0; *okword[i]; i++)
  1627                     if (!strcmp(testword, okword[i]))
  1628                         istypo = 0;
  1629 
  1630                 /* what looks like a typo may be a Roman numeral. Exclude these */
  1631                 if (istypo)
  1632                     if (isroman(testword))
  1633                         istypo = 0;
  1634 
  1635                 /* check the manual list of typos */
  1636                 if (!istypo)
  1637                     for (i = 0; *typo[i]; i++)
  1638                         if (!strcmp(testword, typo[i]))
  1639                             istypo = 1;
  1640 
  1641 
  1642                 /* V.21 - check lowercase s and l - special cases */
  1643                 /* V.98 - added "i" and "m"                       */
  1644                 /* V.99 - added "j" often a semi-colon gone wrong */
  1645                 /*      - and "d" for a missing apostrophe - he d */
  1646                 /*      - and "n" for "in"                        */
  1647                 if (!istypo && strlen(testword) == 1)
  1648                     if (strchr("slmijdn", *inword))
  1649                         istypo = 1;
  1650 
  1651 
  1652                 if (istypo) {
  1653                     isdup = 0;
  1654                     if (strlen(testword) < MAX_QWORD_LENGTH && !pswit[VERBOSE_SWITCH])
  1655                         for (i = 0; i < qword_index; i++)
  1656                             if (!strcmp(testword, qword[i])) {
  1657                                 isdup = 1;
  1658                                 ++dupcnt[i];
  1659                                 }
  1660                     if (!isdup) {
  1661                         if (qword_index < MAX_QWORD && strlen(testword) < MAX_QWORD_LENGTH) {
  1662                             strcpy(qword[qword_index], testword);
  1663                             qword_index++;
  1664                             }
  1665                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1666                         if (!pswit[OVERVIEW_SWITCH]) {
  1667                             printf("    Line %ld column %d - Query word %s", linecnt, (int)(wordstart - aline) + 1, inword);
  1668                             if (strlen(testword) < MAX_QWORD_LENGTH && !pswit[VERBOSE_SWITCH])
  1669                                 printf(" - not reporting duplicates");
  1670                             printf("\n");
  1671                             }
  1672                         else
  1673                             cnt_word++;
  1674                         }
  1675                     }
  1676                 }        /* end of typo-checking */
  1677 
  1678                 /* check the user's list of typos */
  1679                 if (!istypo)
  1680                     if (usertypo_count)
  1681                         for (i = 0; i < usertypo_count; i++)
  1682                             if (!strcmp(testword, usertypo[i])) {
  1683                                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1684                                 if (!pswit[OVERVIEW_SWITCH])  
  1685                                     printf("    Line %ld column %d - Query possible scanno %s\n", linecnt, (int)(wordstart - aline) + 2, inword);
  1686                                 }
  1687 
  1688 
  1689 
  1690             if (pswit[PARANOID_SWITCH] && warn_digit) {   /* in paranoid mode, query all 0 and 1 standing alone - added warn_digit V.97*/
  1691                 if (!strcmp(inword, "0") || !strcmp(inword, "1")) {
  1692                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1693                     if (!pswit[OVERVIEW_SWITCH])
  1694                         printf("    Line %ld column %d - Query standalone %s\n", linecnt, (int)(wordstart - aline) + 2, inword);
  1695                     else
  1696                         cnt_word++;
  1697                     }
  1698                 }
  1699             }
  1700 
  1701         /* look for added or missing spaces around punctuation and quotes */
  1702         /* If there is a punctuation character like ! with no space on    */
  1703         /* either side, suspect a missing!space. If there are spaces on   */
  1704         /* both sides , assume a typo. If we see a double quote with no   */
  1705         /* space or punctuation on either side of it, assume unspaced     */
  1706         /* quotes "like"this.                                             */
  1707         llen = strlen(aline);
  1708         for (i = 1; i < llen; i++) {                               /* for each character in the line after the first */
  1709             if  (strchr(".?!,;:_", aline[i])) {                    /* if it's punctuation */
  1710                 isacro = 0;                       /* we need to suppress warnings for acronyms like M.D. */
  1711                 isellipsis = 0;                   /* we need to suppress warnings for ellipsis . . . */
  1712                 if ( (gcisalpha(aline[i-1]) && gcisalpha(aline[i+1])) ||     /* if there are letters on both sides of it or ... */
  1713                    (gcisalpha(aline[i+1]) && strchr("?!,;:", aline[i]))) { /* ...if it's strict punctuation followed by an alpha */
  1714                     if (aline[i] == '.') {
  1715                         if (i > 2)
  1716                             if (aline[i-2] == '.') isacro = 1;
  1717                         if (i + 2 < llen)
  1718                             if (aline[i+2] == '.') isacro = 1;
  1719                         }
  1720                     if (!isacro) {
  1721                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1722                         if (!pswit[OVERVIEW_SWITCH])
  1723                             printf("    Line %ld column %d - Missing space?\n", linecnt, i+1);
  1724                         else
  1725                             cnt_punct++;
  1726                         }
  1727                     }
  1728                 if (aline[i-1] == CHAR_SPACE && (aline[i+1] == CHAR_SPACE || aline[i+1] == 0)) { /* if there are spaces on both sides, or space before and end of line */
  1729                     if (aline[i] == '.') {
  1730                         if (i > 2)
  1731                             if (aline[i-2] == '.') isellipsis = 1;
  1732                         if (i + 2 < llen)
  1733                             if (aline[i+2] == '.') isellipsis = 1;
  1734                         }
  1735                     if (!isemptyline && !isellipsis) {
  1736                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1737                         if (!pswit[OVERVIEW_SWITCH])
  1738                             printf("    Line %ld column %d - Spaced punctuation?\n", linecnt, i+1);
  1739                         else
  1740                             cnt_punct++;
  1741                         }
  1742                     }
  1743                 }
  1744             }
  1745 
  1746         /* 0.98 -- split out the characters that CANNOT be preceded by space */
  1747         llen = strlen(aline);
  1748         for (i = 1; i < llen; i++) {                             /* for each character in the line after the first */
  1749             if  (strchr("?!,;:", aline[i])) {                    /* if it's punctuation that _cannot_ have a space before it */
  1750                 if (aline[i-1] == CHAR_SPACE && !isemptyline && aline[i+1] != CHAR_SPACE) { /* if aline[i+1) DOES == space, it was already reported just above */
  1751                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1752                     if (!pswit[OVERVIEW_SWITCH])
  1753                         printf("    Line %ld column %d - Spaced punctuation?\n", linecnt, i+1);
  1754                     else
  1755                         cnt_punct++;
  1756                     }
  1757                 }
  1758             }
  1759 
  1760 
  1761         /* 0.99 -- special case " .X" where X is any alpha. */
  1762         /* This plugs a hole in the acronym code above. Inelegant, but maintainable. */
  1763         llen = strlen(aline);
  1764         for (i = 1; i < llen; i++) {             /* for each character in the line after the first */
  1765             if  (aline[i] == '.') {              /* if it's a period */
  1766                 if (aline[i-1] == CHAR_SPACE && gcisalpha(aline[i+1])) { /* if the period follows a space and is followed by a letter */
  1767                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1768                     if (!pswit[OVERVIEW_SWITCH])
  1769                         printf("    Line %ld column %d - Spaced punctuation?\n", linecnt, i+1);
  1770                     else
  1771                         cnt_punct++;
  1772                     }
  1773                 }
  1774             }
  1775 
  1776 
  1777 
  1778 
  1779         /* v.21 breaking out the search for unspaced doublequotes        */
  1780         /* This is not as efficient, but it's more maintainable          */
  1781         /* V.97 added underscore to the list of characters not to query, */
  1782         /* since underscores are commonly used as italics indicators.    */
  1783         /* V.98 Added slash as well, same reason.                        */
  1784         for (i = 1; i < llen; i++) {                               /* for each character in the line after the first */
  1785             if (aline[i] == CHAR_DQUOTE) {
  1786                 if ((!strchr(" _-.'`,;:!/([{?}])",  aline[i-1]) &&
  1787                      !strchr(" _-.'`,;:!/([{?}])",  aline[i+1]) &&
  1788                      aline[i+1] != 0
  1789                      || (!strchr(" _-([{'`", aline[i-1]) && gcisalpha(aline[i+1])))) {
  1790                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1791                         if (!pswit[OVERVIEW_SWITCH])
  1792                             printf("    Line %ld column %d - Unspaced quotes?\n", linecnt, i+1);
  1793                         else
  1794                             cnt_punct++;
  1795                         }
  1796                 }
  1797             }
  1798 
  1799 
  1800         /* v.98 check parity of quotes                             */
  1801         /* v.99 added !*(s+1) in some tests to catch "I am," he said, but I will not be soon". */
  1802         for (s = aline; *s; s++) {
  1803             if (*s == CHAR_DQUOTE) {
  1804                 if (!(dquotepar = !dquotepar)) {    /* parity even */
  1805                     if (!strchr("_-.'`/,;:!?)]} ",  *(s+1))) {
  1806                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1807                         if (!pswit[OVERVIEW_SWITCH])
  1808                             printf("    Line %ld column %d - Wrongspaced quotes?\n", linecnt, (int)(s - aline)+1);
  1809                         else
  1810                             cnt_punct++;
  1811                         }
  1812                     }
  1813                 else {                              /* parity odd */
  1814                     if (!gcisalpha(*(s+1)) && !isdigit(*(s+1)) && !strchr("_-/.'`([{$",  *(s+1)) || !*(s+1)) {
  1815                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1816                         if (!pswit[OVERVIEW_SWITCH])
  1817                             printf("    Line %ld column %d - Wrongspaced quotes?\n", linecnt, (int)(s - aline)+1);
  1818                         else
  1819                             cnt_punct++;
  1820                         }
  1821                     }
  1822                 }
  1823             }
  1824 
  1825             if (*aline == CHAR_DQUOTE) {
  1826                 if (strchr(",;:!?)]} ", aline[1])) {
  1827                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1828                     if (!pswit[OVERVIEW_SWITCH])
  1829                         printf("    Line %ld column 1 - Wrongspaced quotes?\n", linecnt, (int)(s - aline)+1);
  1830                     else
  1831                         cnt_punct++;
  1832                     }
  1833                 }
  1834 
  1835         if (pswit[SQUOTE_SWITCH])
  1836             for (s = aline; *s; s++) {
  1837                 if ((*s == CHAR_SQUOTE || *s == CHAR_OPEN_SQUOTE)
  1838                      && ( s == aline || (s > aline && !gcisalpha(*(s-1))) || !gcisalpha(*(s+1)))) {
  1839                     if (!(squotepar = !squotepar)) {    /* parity even */
  1840                         if (!strchr("_-.'`/\",;:!?)]} ",  *(s+1))) {
  1841                             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1842                             if (!pswit[OVERVIEW_SWITCH])
  1843                                 printf("    Line %ld column %d - Wrongspaced singlequotes?\n", linecnt, (int)(s - aline)+1);
  1844                             else
  1845                                 cnt_punct++;
  1846                             }
  1847                         }
  1848                     else {                              /* parity odd */
  1849                         if (!gcisalpha(*(s+1)) && !isdigit(*(s+1)) && !strchr("_-/\".'`",  *(s+1)) || !*(s+1)) {
  1850                             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1851                             if (!pswit[OVERVIEW_SWITCH])
  1852                                 printf("    Line %ld column %d - Wrongspaced singlequotes?\n", linecnt, (int)(s - aline)+1);
  1853                             else
  1854                                 cnt_punct++;
  1855                             }
  1856                         }
  1857                     }
  1858                 }
  1859                     
  1860 
  1861         /* v.20 also look for double punctuation like ,. or ,,     */
  1862         /* Thanks to DW for the suggestion!                        */
  1863         /* I'm putting this in a separate loop for clarity         */
  1864         /* In books with references, ".," and ".;" are common      */
  1865         /* e.g. "etc., etc.," and vol. 1.; vol 3.;                 */
  1866         /* OTOH, from my initial tests, there are also fairly      */
  1867         /* common errors. What to do? Make these cases paranoid?   */
  1868         /* V.21 ".," is the most common, so invented warn_dotcomma */
  1869         /* to suppress detailed reporting if it occurs often       */
  1870         llen = strlen(aline);
  1871         for (i = 0; i < llen; i++)                  /* for each character in the line */
  1872             if (strchr(".?!,;:", aline[i])          /* if it's punctuation */
  1873             && (strchr(".?!,;:", aline[i+1]))
  1874             && aline[i] && aline[i+1])      /* followed by punctuation, it's a query, unless . . . */
  1875                 if (
  1876                   (aline[i] == aline[i+1]
  1877                   && (aline[i] == '.' || aline[i] == '?' || aline[i] == '!'))
  1878                   || (!warn_dotcomma && aline[i] == '.' && aline[i+1] == ',')
  1879                   || (isFrench && !strncmp(aline+i, ",...", 4))
  1880                   || (isFrench && !strncmp(aline+i, "...,", 4))
  1881                   || (isFrench && !strncmp(aline+i, ";...", 4))
  1882                   || (isFrench && !strncmp(aline+i, "...;", 4))
  1883                   || (isFrench && !strncmp(aline+i, ":...", 4))
  1884                   || (isFrench && !strncmp(aline+i, "...:", 4))
  1885                   || (isFrench && !strncmp(aline+i, "!...", 4))
  1886                   || (isFrench && !strncmp(aline+i, "...!", 4))
  1887                   || (isFrench && !strncmp(aline+i, "?...", 4))
  1888                   || (isFrench && !strncmp(aline+i, "...?", 4))
  1889                 ) {
  1890                 if ((isFrench && !strncmp(aline+i, ",...", 4))    /* could this BE any more awkward? */
  1891                   || (isFrench && !strncmp(aline+i, "...,", 4))
  1892                   || (isFrench && !strncmp(aline+i, ";...", 4))
  1893                   || (isFrench && !strncmp(aline+i, "...;", 4))
  1894                   || (isFrench && !strncmp(aline+i, ":...", 4))
  1895                   || (isFrench && !strncmp(aline+i, "...:", 4))
  1896                   || (isFrench && !strncmp(aline+i, "!...", 4))
  1897                   || (isFrench && !strncmp(aline+i, "...!", 4))
  1898                   || (isFrench && !strncmp(aline+i, "?...", 4))
  1899                   || (isFrench && !strncmp(aline+i, "...?", 4)))
  1900                     i +=4;
  1901                         ; /* do nothing for .. !! and ?? which can be legit */
  1902                     }
  1903                 else {
  1904                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1905                     if (!pswit[OVERVIEW_SWITCH])
  1906                         printf("    Line %ld column %d - Double punctuation?\n", linecnt, i+1);
  1907                     else
  1908                         cnt_punct++;
  1909                     }
  1910 
  1911         /* v.21 breaking out the search for spaced doublequotes */
  1912         /* This is not as efficient, but it's more maintainable */
  1913         s = aline;
  1914         while (strstr(s," \" ")) {
  1915             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1916             if (!pswit[OVERVIEW_SWITCH])
  1917                 printf("    Line %ld column %d - Spaced doublequote?\n", linecnt, (int)(strstr(s," \" ")-aline+1));
  1918             else
  1919                 cnt_punct++;
  1920             s = strstr(s," \" ") + 2;
  1921             }
  1922 
  1923         /* v.20 also look for spaced singlequotes ' and `  */
  1924         s = aline;
  1925         while (strstr(s," ' ")) {
  1926             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1927             if (!pswit[OVERVIEW_SWITCH])
  1928                 printf("    Line %ld column %d - Spaced singlequote?\n", linecnt, (int)(strstr(s," ' ")-aline+1));
  1929             else
  1930                 cnt_punct++;
  1931             s = strstr(s," ' ") + 2;
  1932             }
  1933 
  1934         s = aline;
  1935         while (strstr(s," ` ")) {
  1936             if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1937             if (!pswit[OVERVIEW_SWITCH])
  1938                 printf("    Line %ld column %d - Spaced singlequote?\n", linecnt, (int)(strstr(s," ` ")-aline+1));
  1939             else
  1940                 cnt_punct++;
  1941             s = strstr(s," ` ") + 2;
  1942             }
  1943 
  1944         /* v.99 check special case of 'S instead of 's at end of word */
  1945         s = aline + 1;
  1946         while (*s) {
  1947             if (*s == CHAR_SQUOTE && *(s+1) == 'S' && *(s-1)>='a' && *(s-1)<='z')  {
  1948                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1949                 if (!pswit[OVERVIEW_SWITCH])
  1950                     printf("    Line %ld column %d - Capital \"S\"?\n", linecnt, (int)(s-aline+2));
  1951                 else
  1952                     cnt_punct++;
  1953                 }
  1954             s++;
  1955             }
  1956 
  1957 
  1958         /* v.21 Now check special cases - start and end of line - */
  1959         /* for single and double quotes. Start is sometimes [sic] */
  1960         /* but better to query it anyway.                         */
  1961         /* While I'm here, check for dash at end of line          */
  1962         llen = strlen(aline);
  1963         if (llen > 1) {
  1964             if (aline[llen-1] == CHAR_DQUOTE ||
  1965                 aline[llen-1] == CHAR_SQUOTE ||
  1966                 aline[llen-1] == CHAR_OPEN_SQUOTE)
  1967                 if (aline[llen-2] == CHAR_SPACE) {
  1968                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1969                     if (!pswit[OVERVIEW_SWITCH])
  1970                         printf("    Line %ld column %d - Spaced quote?\n", linecnt, llen);
  1971                     else
  1972                         cnt_punct++;
  1973                     }
  1974             
  1975             /* V 0.98 removed aline[0] == CHAR_DQUOTE from the test below, since */
  1976             /* Wrongspaced quotes test also catches it for "                     */
  1977             if (aline[0] == CHAR_SQUOTE ||
  1978                 aline[0] == CHAR_OPEN_SQUOTE)
  1979                 if (aline[1] == CHAR_SPACE) {
  1980                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1981                     if (!pswit[OVERVIEW_SWITCH])
  1982                         printf("    Line %ld column 1 - Spaced quote?\n", linecnt);
  1983                     else
  1984                         cnt_punct++;
  1985                     }
  1986             /* dash at end of line may well be legit - paranoid mode only */
  1987             /* and don't report em-dash at line-end                       */
  1988             if (pswit[PARANOID_SWITCH] && warn_hyphen) {
  1989                 for (i = llen-1; i > 0 && (unsigned char)aline[i] <= CHAR_SPACE; i--);
  1990                 if (aline[i] == '-' && aline[i-1] != '-') {
  1991                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  1992                     if (!pswit[OVERVIEW_SWITCH])
  1993                         printf("    Line %ld column %d - Hyphen at end of line?\n", linecnt, i);
  1994                     }
  1995                 }
  1996             }
  1997 
  1998         /* v.21 also look for brackets surrounded by alpha                    */
  1999         /* Brackets are often unspaced, but shouldn't be surrounded by alpha. */
  2000         /* If so, suspect a scanno like "a]most"                              */
  2001         llen = strlen(aline);
  2002         for (i = 1; i < llen-1; i++) {           /* for each character in the line except 1st & last*/
  2003             if (strchr("{[()]}", aline[i])         /* if it's a bracket */
  2004                 && gcisalpha(aline[i-1]) && gcisalpha(aline[i+1])) {
  2005                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  2006                 if (!pswit[OVERVIEW_SWITCH])
  2007                     printf("    Line %ld column %d - Unspaced bracket?\n", linecnt, i);
  2008                 else
  2009                     cnt_punct++;
  2010                 }
  2011             }
  2012         /* The "Cinderella" case, back in again! :-S Give it another shot */
  2013         if (warn_endquote) {
  2014             llen = strlen(aline);
  2015             for (i = 1; i < llen; i++) {           /* for each character in the line except 1st */
  2016                 if (aline[i] == CHAR_DQUOTE)
  2017                     if (isalpha(aline[i-1])) {
  2018                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  2019                         if (!pswit[OVERVIEW_SWITCH])
  2020                             printf("    Line %ld column %d - endquote missing punctuation?\n", linecnt, i);
  2021                         else
  2022                             cnt_punct++;
  2023                         }
  2024                 }
  2025             }
  2026 
  2027         llen = strlen(aline);
  2028 
  2029         /* Check for <HTML TAG> */
  2030         /* If there is a < in the line, followed at some point  */
  2031         /* by a > then we suspect HTML                          */
  2032         if (strstr(aline, "<") && strstr(aline, ">")) {
  2033             i = (signed int) (strstr(aline, ">") - strstr(aline, "<") + 1);
  2034             if (i > 0) {
  2035                 strncpy(wrk, strstr(aline, "<"), i);
  2036                 wrk[i] = 0;
  2037                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  2038                 if (!pswit[OVERVIEW_SWITCH])
  2039                     printf("    Line %ld column %d - HTML Tag? %s \n", linecnt, (int)(strstr(aline, "<") - aline) + 1, wrk);
  2040                 else
  2041                     cnt_html++;
  2042                 }
  2043             }
  2044 
  2045         /* Check for &symbol; HTML                   */
  2046         /* If there is a & in the line, followed at  */
  2047         /* some point by a ; then we suspect HTML    */
  2048         if (strstr(aline, "&") && strstr(aline, ";")) {
  2049             i = (int)(strstr(aline, ";") - strstr(aline, "&") + 1);
  2050             for (s = strstr(aline, "&"); s < strstr(aline, ";"); s++)   
  2051                 if (*s == CHAR_SPACE) i = 0;                /* 0.99 don't report "Jones & Son;" */
  2052             if (i > 0) {
  2053                 strncpy(wrk, strstr(aline,"&"), i);
  2054                 wrk[i] = 0;
  2055                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", aline);
  2056                 if (!pswit[OVERVIEW_SWITCH])
  2057                     printf("    Line %ld column %d - HTML symbol? %s \n", linecnt, (int)(strstr(aline, "&") - aline) + 1, wrk);
  2058                 else
  2059                     cnt_html++;
  2060                 }
  2061             }
  2062 
  2063         /* At end of paragraph, check for mismatched quotes.           */
  2064         /* We don't want to report an error immediately, since it is a */
  2065         /* common convention to omit the quotes at end of paragraph if */
  2066         /* the next paragraph is a continuation of the same speaker.   */
  2067         /* Where this is the case, the next para should begin with a   */
  2068         /* quote, so we store the warning message and only display it  */
  2069         /* at the top of the next iteration if the new para doesn't    */
  2070         /* start with a quote.                                         */
  2071         /* The -p switch overrides this default, and warns of unclosed */
  2072         /* quotes on _every_ paragraph, whether the next begins with a */
  2073         /* quote or not.                                               */
  2074         /* Version .16 - only report mismatched single quotes if       */
  2075         /* an open_single_quotes was found.                            */
  2076 
  2077         if (isemptyline) {          /* end of para - add up the totals */
  2078             if (quot % 2)
  2079                 sprintf(dquote_err, "    Line %ld - Mismatched quotes\n", linecnt);
  2080             if (pswit[SQUOTE_SWITCH] && open_single_quote && (open_single_quote != close_single_quote) )
  2081                 sprintf(squote_err,"    Line %ld - Mismatched singlequotes?\n", linecnt);
  2082             if (pswit[SQUOTE_SWITCH] && open_single_quote
  2083                                      && (open_single_quote != close_single_quote)
  2084                                      && (open_single_quote != close_single_quote +1) )
  2085                 squot = 1;    /* flag it to be noted regardless of the first char of the next para */
  2086             if (r_brack)
  2087                 sprintf(rbrack_err, "    Line %ld - Mismatched round brackets?\n", linecnt);
  2088             if (s_brack)
  2089                 sprintf(sbrack_err, "    Line %ld - Mismatched square brackets?\n", linecnt);
  2090             if (c_brack)
  2091                 sprintf(cbrack_err, "    Line %ld - Mismatched curly brackets?\n", linecnt);
  2092             if (c_unders % 2)
  2093                 sprintf(unders_err, "    Line %ld - Mismatched underscores?\n", linecnt);
  2094             quot = s_brack = c_brack = r_brack = c_unders =
  2095                 open_single_quote = close_single_quote = 0;
  2096             isnewpara = 1;     /* let the next iteration know that it's starting a new para */
  2097             }
  2098 
  2099         /* V.21 _ALSO_ at end of paragraph, check for omitted punctuation. */
  2100         /*      by working back through prevline. DW.                      */
  2101         /* Hmmm. Need to check this only for "normal" paras.               */
  2102         /* So what is a "normal" para? ouch!                               */
  2103         /* Not normal if one-liner (chapter headings, etc.)                */
  2104         /* Not normal if doesn't contain at least one locase letter        */
  2105         /* Not normal if starts with space                                 */
  2106 
  2107         /* 0.99 tighten up on para end checks. Disallow comma and */
  2108         /* semi-colon. Check for legit para end before quotes.    */
  2109         if (isemptyline) {          /* end of para */
  2110             for (s = prevline, i = 0; *s && !i; s++)
  2111                 if (gcisletter(*s))
  2112                     i = 1;    /* use i to indicate the presence of a letter on the line */
  2113             /* This next "if" is a problem.                                             */
  2114             /* If I say "start_para_line <= linecnt - 1", that includes one-line        */
  2115             /* "paragraphs" like chapter heads. Lotsa false positives.                  */
  2116             /* If I say "start_para_line < linecnt - 1" it doesn't, but then it         */
  2117             /* misses genuine one-line paragraphs.                                      */
  2118             /* So what do I do? */
  2119             if (i
  2120                 && lastblen > 2
  2121                 && start_para_line < linecnt - 1
  2122                 && *prevline > CHAR_SPACE
  2123                 ) {
  2124                 for (i = strlen(prevline)-1; (prevline[i] == CHAR_DQUOTE || prevline[i] == CHAR_SQUOTE) && prevline[i] > CHAR_SPACE && i > 0; i--);
  2125                 for (  ; i > 0; i--) {
  2126                     if (gcisalpha(prevline[i])) {
  2127                         if (pswit[ECHO_SWITCH]) printf("\n%s\n", prevline);
  2128                         if (!pswit[OVERVIEW_SWITCH])
  2129                             printf("    Line %ld column %d - No punctuation at para end?\n", linecnt-1, strlen(prevline));
  2130                         else
  2131                             cnt_punct++;
  2132                         break;
  2133                         }
  2134                     if (strchr("-.:!([{?}])", prevline[i]))
  2135                         break;
  2136                     }
  2137                 }
  2138             }
  2139         strcpy(prevline, aline);
  2140     }
  2141     fclose (infile);
  2142     if (!pswit[OVERVIEW_SWITCH])
  2143         for (i = 0; i < MAX_QWORD; i++)
  2144             if (dupcnt[i])
  2145                 printf("\nNote: Queried word %s was duplicated %d time%s\n", qword[i], dupcnt[i], "s");
  2146 }
  2147 
  2148 
  2149 
  2150 /* flgets - get one line from the input stream, checking for   */
  2151 /* the existence of exactly one CR/LF line-end per line.       */
  2152 /* Returns a pointer to the line.                              */
  2153 
  2154 char *flgets(char *theline, int maxlen, FILE *thefile, long lcnt)
  2155 {
  2156     char c;
  2157     int len, isCR, cint;
  2158 
  2159     *theline = 0;
  2160     len = isCR = 0;
  2161     c = cint = fgetc(thefile);
  2162     do {
  2163         if (cint == EOF)
  2164             return (NULL);
  2165         if (c == 10)  /* either way, it's end of line */
  2166             if (isCR)
  2167                 break;
  2168             else {   /* Error - a LF without a preceding CR */
  2169                 if (pswit[LINE_END_SWITCH]) {
  2170                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", theline);
  2171                     if (!pswit[OVERVIEW_SWITCH])
  2172                         printf("    Line %ld - No CR?\n", lcnt);
  2173                     else
  2174                         cnt_lineend++;
  2175                     }
  2176                 break;
  2177                 }
  2178         if (c == 13) {
  2179             if (isCR) { /* Error - two successive CRs */
  2180                 if (pswit[LINE_END_SWITCH]) {
  2181                     if (pswit[ECHO_SWITCH]) printf("\n%s\n", theline);
  2182                     if (!pswit[OVERVIEW_SWITCH])
  2183                         printf("    Line %ld - Two successive CRs?\n", lcnt);
  2184                     else
  2185                         cnt_lineend++;
  2186                     }
  2187                 }
  2188             isCR = 1;
  2189             }
  2190         else {
  2191             if (pswit[LINE_END_SWITCH] && isCR) {
  2192                 if (pswit[ECHO_SWITCH]) printf("\n%s\n", theline);
  2193                 if (!pswit[OVERVIEW_SWITCH])
  2194                     printf("    Line %ld column %d - CR without LF?\n", lcnt, len+1);
  2195                 else
  2196                     cnt_lineend++;
  2197                 }
  2198              theline[len] = c;
  2199              len++;
  2200              theline[len] = 0;
  2201              isCR = 0;
  2202              }
  2203         c = cint = fgetc(thefile);
  2204     } while(len < maxlen);
  2205     if (pswit[MARKUP_SWITCH])  
  2206         postprocess_for_HTML(theline);
  2207     if (pswit[DP_SWITCH])  
  2208         postprocess_for_DP(theline);
  2209     return(theline);
  2210 }
  2211 
  2212 
  2213 
  2214 
  2215 /* mixdigit - takes a "word" as a parameter, and checks whether it   */
  2216 /* contains a mixture of alpha and digits. Generally, this is an     */
  2217 /* error, but may not be for cases like 4th or L5 12s. 3d.           */
  2218 /* Returns 0 if no error found, 1 if error.                          */
  2219 
  2220 int mixdigit(char *checkword)   /* check for digits like 1 or 0 in words */
  2221 {
  2222     int wehaveadigit, wehavealetter, firstdigits, query, wl;
  2223     char *s;
  2224 
  2225 
  2226     wehaveadigit = wehavealetter = query = 0;
  2227     for (s = checkword; *s; s++)
  2228         if (gcisalpha(*s))
  2229             wehavealetter = 1;
  2230         else
  2231             if (gcisdigit(*s))
  2232                 wehaveadigit = 1;
  2233     if (wehaveadigit && wehavealetter) {         /* Now exclude common legit cases, like "21st" and "12l. 3s. 11d." */
  2234         query = 1;
  2235         wl = strlen(checkword);
  2236         for (firstdigits = 0; gcisdigit(checkword[firstdigits]); firstdigits++)
  2237             ;
  2238         /* digits, ending in st, rd, nd, th of either case */
  2239         /* 0.99 donovan points out an error below. Turns out */
  2240         /*      I was using matchword like strcmp when the   */
  2241         /*      return values are different! Duh.            */
  2242         if (firstdigits + 2 == wl &&
  2243               (matchword(checkword + wl - 2, "st")
  2244             || matchword(checkword + wl - 2, "rd")
  2245             || matchword(checkword + wl - 2, "nd")
  2246             || matchword(checkword + wl - 2, "th"))
  2247             )
  2248                 query = 0;
  2249         if (firstdigits + 3 == wl &&
  2250               (matchword(checkword + wl - 3, "sts")
  2251             || matchword(checkword + wl - 3, "rds")
  2252             || matchword(checkword + wl - 3, "nds")
  2253             || matchword(checkword + wl - 3, "ths"))
  2254             )
  2255                 query = 0;
  2256         if (firstdigits + 3 == wl &&
  2257               (matchword(checkword + wl - 4, "stly")
  2258             || matchword(checkword + wl - 4, "rdly")
  2259             || matchword(checkword + wl - 4, "ndly")
  2260             || matchword(checkword + wl - 4, "thly"))
  2261             )
  2262                 query = 0;
  2263 
  2264         /* digits, ending in l, L, s or d */
  2265         if (firstdigits + 1 == wl &&
  2266             (checkword[wl-1] == 'l'
  2267             || checkword[wl-1] == 'L'
  2268             || checkword[wl-1] == 's'
  2269             || checkword[wl-1] == 'd'))
  2270                 query = 0;
  2271         /* L at the start of a number, representing Britsh pounds, like L500  */
  2272         /* This is cute. We know the current word is mixeddigit. If the first */
  2273         /* letter is L, there must be at least one digit following. If both   */
  2274         /* digits and letters follow, we have a genuine error, else we have a */
  2275         /* capital L followed by digits, and we accept that as a non-error.   */
  2276         if (checkword[0] == 'L')
  2277             if (!mixdigit(checkword+1))
  2278                 query = 0;
  2279         }
  2280     return (query);
  2281 }
  2282 
  2283 
  2284 
  2285 
  2286 /* getaword - extracts the first/next "word" from the line, and puts */
  2287 /* it into "thisword". A word is defined as one English word unit    */
  2288 /* -- or at least that's what I'm trying for.                        */
  2289 /* Returns a pointer to the position in the line where we will start */
  2290 /* looking for the next word.                                        */
  2291 
  2292 char *getaword(char *fromline, char *thisword)
  2293 {
  2294     int i, wordlen;
  2295     char *s;
  2296 
  2297     wordlen = 0;
  2298     for ( ; !gcisdigit(*fromline) && !gcisalpha(*fromline) && *fromline ; fromline++ );
  2299 
  2300     /* V .20                                                                   */
  2301     /* add a look-ahead to handle exceptions for numbers like 1,000 and 1.35.  */
  2302     /* Especially yucky is the case of L1,000                                  */
  2303     /* I hate this, and I see other ways, but I don't see that any is _better_.*/
  2304     /* This section looks for a pattern of characters including a digit        */
  2305     /* followed by a comma or period followed by one or more digits.           */
  2306     /* If found, it returns this whole pattern as a word; otherwise we discard */
  2307     /* the results and resume our normal programming.                          */
  2308     s = fromline;
  2309     for (  ; (gcisdigit(*s) || gcisalpha(*s) || *s == ',' || *s == '.') && wordlen < MAXWORDLEN ; s++ ) {
  2310         thisword[wordlen] = *s;
  2311         wordlen++;
  2312         }
  2313     thisword[wordlen] = 0;
  2314     for (i = 1; i < wordlen -1; i++) {
  2315         if (thisword[i] == '.' || thisword[i] == ',') {
  2316             if (gcisdigit(thisword[i-1]) && gcisdigit(thisword[i-1])) {   /* we have one of the damned things */
  2317                 fromline = s;
  2318                 return(fromline);
  2319                 }
  2320             }
  2321         }
  2322 
  2323     /* we didn't find a punctuated number - do the regular getword thing */
  2324     wordlen = 0;
  2325     for (  ; (gcisdigit(*fromline) || gcisalpha(*fromline) || *fromline == '\'') && wordlen < MAXWORDLEN ; fromline++ ) {
  2326         thisword[wordlen] = *fromline;
  2327         wordlen++;
  2328         }
  2329     thisword[wordlen] = 0;
  2330     return(fromline);
  2331 }
  2332 
  2333 
  2334 
  2335 
  2336 
  2337 /* matchword - just a case-insensitive string matcher    */
  2338 /* yes, I know this is not efficient. I'll worry about   */
  2339 /* that when I have a clear idea where I'm going with it.*/
  2340 
  2341 int matchword(char *checkfor, char *thisword)
  2342 {
  2343     unsigned int ismatch, i;
  2344 
  2345     if (strlen(checkfor) != strlen(thisword)) return(0);
  2346 
  2347     ismatch = 1;     /* assume a match until we find a difference */
  2348     for (i = 0; i <strlen(checkfor); i++)
  2349         if (toupper(checkfor[i]) != toupper(thisword[i]))
  2350             ismatch = 0;
  2351     return (ismatch);
  2352 }
  2353 
  2354 
  2355 
  2356 
  2357 
  2358 /* lowerit - lowercase the line. Yes, strlwr does the same job,  */
  2359 /* but not on all platforms, and I'm a bit paranoid about what   */
  2360 /* some implementations of tolower might do to hi-bit characters,*/
  2361 /* which shouldn't matter, but better safe than sorry.           */
  2362 
  2363 void lowerit(char *theline)
  2364 {
  2365     for ( ; *theline; theline++)
  2366         if (*theline >='A' && *theline <='Z')
  2367             *theline += 32;
  2368 }
  2369 
  2370 
  2371 /* Is this word a Roman Numeral?                                    */
  2372 /* v 0.99 improved to be better. It still doesn't actually          */
  2373 /* validate that the number is a valid Roman Numeral -- for example */
  2374 /* it will pass MXXXXXXXXXX as a valid Roman Numeral, but that's not*/
  2375 /* what we're here to do. If it passes this, it LOOKS like a Roman  */
  2376 /* numeral. Anyway, the actual Romans were pretty tolerant of bad   */
  2377 /* arithmetic, or expressions thereof, except when it came to taxes.*/
  2378 /* Allow any number of M, an optional D, an optional CM or CD,      */
  2379 /* any number of optional Cs, an optional XL or an optional XC, an  */
  2380 /* optional IX or IV, an optional V and any number of optional Is.  */
  2381 /* Good enough for jazz chords.                                     */
  2382 
  2383 int isroman(char *t)
  2384 {
  2385     char *s;
  2386 
  2387     if (!t || !*t) return (0);
  2388 
  2389     s = t;
  2390 
  2391     while (*t == 'm' && *t ) t++;
  2392     if (*t == 'd') t++;
  2393     if (*t == 'c' && *(t+1) == 'm') t+=2;
  2394     if (*t == 'c' && *(t+1) == 'd') t+=2;
  2395     while (*t == 'c' && *t) t++;
  2396     if (*t == 'x' && *(t+1) == 'l') t+=2;
  2397     if (*t == 'x' && *(t+1) == 'c') t+=2;
  2398     if (*t == 'l') t++;
  2399     while (*t == 'x' && *t) t++;
  2400     if (*t == 'i' && *(t+1) == 'x') t+=2;
  2401     if (*t == 'i' && *(t+1) == 'v') t+=2;
  2402     if (*t == 'v') t++;
  2403     while (*t == 'i' && *t) t++;
  2404     if (!*t) return (1);
  2405 
  2406     return(0);
  2407 }
  2408 
  2409 
  2410 
  2411 
  2412 /* gcisalpha is a special version that is somewhat lenient on 8-bit texts.     */
  2413 /* If we use the standard isalpha() function, 8-bit accented characters break  */
  2414 /* words, so that tete with accented characters appears to be two words, "t"   */
  2415 /* and "t", with 8-bit characters between them. This causes over-reporting of  */
  2416 /* errors. gcisalpha() recognizes accented letters from the CP1252 (Windows)   */
  2417 /* and ISO-8859-1 character sets, which are the most common PG 8-bit types.    */
  2418 
  2419 int gcisalpha(unsigned char c)
  2420 {
  2421     if (c >='a' && c <='z') return(1);
  2422     if (c >='A' && c <='Z') return(1);
  2423     if (c < 140) return(0);
  2424     if (c >=192 && c != 208 && c != 215 && c != 222 && c != 240 && c != 247 && c != 254) return(1);
  2425     if (c == 140 || c == 142 || c == 156 || c == 158 || c == 159) return (1);
  2426     return(0);
  2427 }
  2428 
  2429 /* gcisdigit is a special version that doesn't get confused in 8-bit texts.    */
  2430 int gcisdigit(unsigned char c)
  2431 {   
  2432     if (c >= '0' && c <='9') return(1);
  2433     return(0);
  2434 }
  2435 
  2436 /* gcisletter is a special version that doesn't get confused in 8-bit texts.    */
  2437 /* Yeah, we're ISO-8891-1-specific. So sue me.                                  */
  2438 int gcisletter(unsigned char c)
  2439 {   
  2440     if ((c >= 'A' && c <='Z') || (c >= 'a' && c <='z') || c >= 192) return(1);
  2441     return(0);
  2442 }
  2443 
  2444 
  2445 
  2446 
  2447 /* gcstrchr wraps strchr to return NULL if the character being searched for is zero */
  2448 
  2449 char *gcstrchr(char *s, char c)
  2450 {
  2451     if (c == 0) return(NULL);
  2452     return(strchr(s,c));
  2453 }
  2454 
  2455 /* postprocess_for_DP is derived from postprocess_for_HTML          */
  2456 /* It is invoked with the -d switch from flgets().                  */
  2457 /* It simply "removes" from the line a hard-coded set of common     */
  2458 /* DP-specific tags, so that the line passed to the main routine has*/
  2459 /* been pre-cleaned of DP markup.                                   */
  2460 
  2461 void postprocess_for_DP(char *theline)
  2462 {
  2463 
  2464     char *s, *t;
  2465     int i;
  2466 
  2467     if (!*theline) 
  2468         return;
  2469 
  2470     for (i = 0; *DPmarkup[i]; i++) {
  2471         s = strstr(theline, DPmarkup[i]);
  2472         while (s) {
  2473             t = s + strlen(DPmarkup[i]);
  2474             while (*t) {
  2475                 *s = *t;
  2476                 t++; s++;
  2477                 }
  2478             *s = 0;
  2479             s = strstr(theline, DPmarkup[i]);
  2480             }
  2481         }
  2482 
  2483 }
  2484 
  2485 
  2486 /* postprocess_for_HTML is, at the moment (0.97), a very nasty      */
  2487 /* short-term fix for Charlz. Nasty, nasty, nasty.                  */
  2488 /* It is invoked with the -m switch from flgets().                  */
  2489 /* It simply "removes" from the line a hard-coded set of common     */
  2490 /* HTML tags and "replaces" a hard-coded set of common HTML         */
  2491 /* entities, so that the line passed to the main routine has        */
  2492 /* been pre-cleaned of HTML. This is _so_ not the right way to      */
  2493 /* deal with HTML, but what Charlz needs now is not HTML handling   */
  2494 /* proper: just ignoring <i> tags and some others.                  */
  2495 /* To be revisited in future releases!                              */
  2496 
  2497 void postprocess_for_HTML(char *theline)
  2498 {
  2499 
  2500     if (strstr(theline, "<") && strstr(theline, ">"))
  2501         while (losemarkup(theline))
  2502             ;
  2503     while (loseentities(theline))
  2504         ;
  2505 }
  2506 
  2507 char *losemarkup(char *theline)
  2508 {
  2509     char *s, *t;
  2510     int i;
  2511 
  2512     if (!*theline) 
  2513         return(NULL);
  2514 
  2515     s = strstr(theline, "<");
  2516     t = strstr(theline, ">");
  2517     if (!s || !t) return(NULL);
  2518     for (i = 0; *markup[i]; i++)
  2519         if (!tagcomp(s+1, markup[i])) {
  2520             if (!*(t+1)) {
  2521                 *s = 0;
  2522                 return(s);
  2523                 }
  2524             else
  2525                 if (t > s) {
  2526                     strcpy(s, t+1);
  2527                     return(s);
  2528                     }
  2529         }
  2530     /* it's an unrecognized <xxx> */
  2531     return(NULL);
  2532 }
  2533 
  2534 char *loseentities(char *theline)
  2535 {
  2536     int i;
  2537     char *s, *t;
  2538 
  2539     if (!*theline) 
  2540         return(NULL);
  2541 
  2542     for (i = 0; *entities[i].htmlent; i++) {
  2543         s = strstr(theline, entities[i].htmlent);
  2544         if (s) {
  2545             t = malloc((size_t)strlen(s));
  2546             if (!t) return(NULL);
  2547             strcpy(t, s + strlen(entities[i].htmlent));
  2548             strcpy(s, entities[i].textent);
  2549             strcat(s, t);
  2550             free(t);
  2551             return(theline);
  2552             }
  2553         }
  2554 
  2555     /* V0.97 Duh. Forgot to check the htmlnum member */
  2556     for (i = 0; *entities[i].htmlnum; i++) {
  2557         s = strstr(theline, entities[i].htmlnum);
  2558         if (s) {
  2559             t = malloc((size_t)strlen(s));
  2560             if (!t) return(NULL);
  2561             strcpy(t, s + strlen(entities[i].htmlnum));
  2562             strcpy(s, entities[i].textent);
  2563             strcat(s, t);
  2564             free(t);
  2565             return(theline);
  2566             }
  2567         }
  2568     return(NULL);
  2569 }
  2570 
  2571 
  2572 int tagcomp(char *strin, char *basetag)
  2573 {
  2574     char *s, *t;
  2575 
  2576     s = basetag;
  2577     t  = strin;
  2578     if (*t == '/') t++; /* ignore a slash */
  2579     while (*s && *t) {
  2580         if (tolower(*s) != tolower(*t)) return(1);
  2581         s++; t++;
  2582         }
  2583     /* OK, we have < followed by a valid tag start  */
  2584     /* should I do something about length?          */
  2585     /* this is messy. The length of an <i> tag is   */
  2586     /* limited, but a <table> could go on for miles */
  2587     /* so I'd have to parse the tags . . . ugh.     */
  2588     /* It isn't what Charlz needs now, so mark it   */
  2589     /* as 'pending'.                                */
  2590     return(0);
  2591 }
  2592 
  2593 void proghelp()                  /* explain program usage here */
  2594 {
  2595     fputs("V. 0.991. Copyright 2000-2005 Jim Tinsley <jtinsley@pobox.com>.\n",stderr);
  2596     fputs("Gutcheck comes wih ABSOLUTELY NO WARRANTY. For details, read the file COPYING.\n", stderr);
  2597     fputs("This is Free Software; you may redistribute it under certain conditions (GPL);\n", stderr);
  2598     fputs("read the file COPYING for details.\n\n", stderr);
  2599     fputs("Usage is: gutcheck [-setpxloyhud] filename\n",stderr);
  2600     fputs("  where -s checks single quotes, -e suppresses echoing lines, -t checks typos\n",stderr);
  2601     fputs("  -x (paranoid) switches OFF -t and extra checks, -l turns OFF line-end checks\n",stderr);
  2602     fputs("  -o just displays overview without detail, -h echoes header fields\n",stderr);
  2603     fputs("  -v (verbose) unsuppresses duplicate reporting, -m suppresses markup\n",stderr);
  2604     fputs("  -d ignores DP-specific markup,\n",stderr);
  2605     fputs("  -u uses a file gutcheck.typ to query user-defined possible typos\n",stderr);
  2606     fputs("Sample usage: gutcheck warpeace.txt \n",stderr);
  2607     fputs("\n",stderr);
  2608     fputs("Gutcheck looks for errors in Project Gutenberg(TM) etexts.\n", stderr);
  2609     fputs("Gutcheck queries anything it thinks shouldn't be in a PG text; non-ASCII\n",stderr);
  2610     fputs("characters like accented letters, lines longer than 75 or shorter than 55,\n",stderr);
  2611     fputs("unbalanced quotes or brackets, a variety of badly formatted punctuation, \n",stderr);
  2612     fputs("HTML tags, some likely typos. It is NOT a substitute for human judgement.\n",stderr);
  2613     fputs("\n",stderr);
  2614 }
  2615 
  2616 
  2617 
  2618 /*********************************************************************
  2619   Revision History:
  2620 
  2621   04/22/01 Cleaned up some stuff and released .10
  2622 
  2623            ---------------
  2624 
  2625   05/09/01 Added the typo list, added two extra cases of he/be error,
  2626            added -p switch, OPEN_SINGLE QUOTE char as .11
  2627 
  2628            ---------------
  2629 
  2630   05/20/01 Increased the typo list,
  2631            added paranoid mode,
  2632            ANSIfied the code and added some casts
  2633               so the compiler wouldn't keep asking if I knew what I was doing,
  2634            fixed bug in l.s.d. condition (thanks, Dave!),
  2635            standardized spacing when echoing,
  2636            added letter-combo checking code to typo section,
  2637            added more h/b words to typo array.
  2638            Not too sure about putting letter combos outside of the TYPO conditions -
  2639            someone is sure to have a book about the tbaka tribe, or something. Anyway, let's see.
  2640            Released as .12
  2641 
  2642            ---------------
  2643 
  2644   06/01/01 Removed duplicate reporting of Tildes, asterisks, etc.
  2645   06/10/01 Added flgets routine to help with platform-independent
  2646            detection of invalid line-ends. All PG text files should
  2647            have CR/LF (13/10) at end of line, regardless of system.
  2648            Gutcheck now validates this by default. (Thanks, Charles!)
  2649            Released as .13
  2650 
  2651            ---------------
  2652 
  2653   06/11/01 Added parenthesis match checking. (c_brack, cbrack_err etc.)
  2654            Released as .14
  2655 
  2656            ---------------
  2657 
  2658   06/23/01 Fixed: 'No',he said. not being flagged.
  2659 
  2660            Improved: better single-quotes checking:
  2661 
  2662            Ignore singlequotes surrounded by alpha, like didn't. (was OK)
  2663 
  2664            If a singlequote is at the END of a word AND the word ends in "s":
  2665                   The dogs' tails wagged.
  2666            it's probably an apostrophe, but less commonly may be a closequote:
  2667                   "These 'pack dogs' of yours look more like wolves."
  2668 
  2669            If it's got punctuation before it and is followed by a space
  2670            or punctuation:
  2671               . . . was a problem,' he said
  2672               . . . was a problem,'"
  2673            it is probably (certainly?) a closequote.
  2674 
  2675            If it's at start of paragraph, it's probably an openquote.
  2676               (but watch dialect)
  2677 
  2678            Words with ' at beginning and end are probably quoted:
  2679                "You have the word 'chivalry' frequently on your lips."
  2680                (Not specifically implemented)
  2681            V.18 I'm glad I didn't implement this, 'cos it jest ain't so
  2682            where the convention is to punctuate outside the quotes.
  2683                'Come', he said, 'and join the party'.
  2684 
  2685            If it is followed by an alpha, and especially a capital:
  2686               'Hello,' called he.
  2687            it is either an openquote or dialect.
  2688 
  2689            Dialect breaks ALL the rules:
  2690                   A man's a man for a' that.
  2691                   "Aye, but 'tis all in the pas' now."
  2692                   "'Tis often the way," he said.
  2693                   'Ave a drink on me.
  2694 
  2695            This version looks to be an improvement, and produces
  2696            fewer false positives, but is still not perfect. The
  2697            'pack dogs' case still fools it, and dialect is still
  2698            a problem. Oh, well, it's an improvement, and I have
  2699            a weighted structure in place for refining guesses at
  2700            closequotes. Maybe next time, I'll add a bit of logic
  2701            where if there is an open quote and one that was guessed
  2702            to be a possessive apostrophe after s, I'll re-guess it
  2703            to be a closequote. Let's see how this one flies, first.
  2704 
  2705            (Afterview: it's still crap. Needs much work, and a deeper insight.)
  2706 
  2707            Released as .15
  2708 
  2709            TODO: More he/be checks. Can't be perfect - counterexamples:
  2710               I gave my son good advice: be married regardless of the world's opinion.
  2711               I gave my son good advice: he married regardless of the world's opinion.
  2712 
  2713               If by "primitive" be meant "crude", we can understand the sentence.
  2714               If by "primitive" he meant "crude", we can understand the sentence.
  2715 
  2716               No matter what be said, I must go on.
  2717               No matter what he said, I must go on.
  2718 
  2719               No value, however great, can be set upon them.
  2720               No value, however great, can he set upon them.
  2721 
  2722               Real-Life one from a DP International Weekly Miscellany:
  2723                 He wandered through the forest without fear, sleeping
  2724                 much, for in sleep be had companionship--the Great
  2725                 Spirit teaching him what he should know in dreams.
  2726                 That one found by jeebies, and it turned out to be "he".
  2727 
  2728 
  2729            ---------------
  2730 
  2731   07/01/01 Added -O option.
  2732            Improved singlequotes by reporting mismatched single quotes
  2733            only if an open_single_quotes was found.
  2734 
  2735            Released as .16
  2736 
  2737            ---------------
  2738 
  2739   08/27/01 Added -Y switch for Robert Rowe to allow his app to
  2740            catch the error output.
  2741 
  2742            Released as .17
  2743 
  2744            ---------------
  2745 
  2746   09/08/01 Added checking Capitals at start of paragraph, but not
  2747            checking them at start of sentence.
  2748 
  2749            TODO: Parse sentences out so can check reliably for start of
  2750                  sentence. Need a whole different approach for that.
  2751                  (Can't just rely on periods, since they are also
  2752                  used for abbreviations, etc.)
  2753 
  2754            Added checking for all vowels or all consonants in a word.
  2755 
  2756            While I was in, I added "ii" checking and "tl" at start of word.
  2757 
  2758            Added echoing of first line of paragraph when reporting
  2759            mismatched quoted or brackets (thanks to David Widger for the
  2760            suggestion)
  2761 
  2762            Not querying L at start of a number (used for British pounds).
  2763 
  2764            The spelling changes are sort of half-done but released anyway
  2765            Skipped .18 because I had given out a couple of test versions
  2766            with that number.
  2767 
  2768   09/25/01 Released as .19
  2769 
  2770            ---------------
  2771 
  2772            TODO:
  2773            Use the logic from my new version of safewrap to stop querying
  2774              short lines like poems and TOCs.
  2775            Ignore non-standard ellipses like .  .  . or ...
  2776 
  2777 
  2778            ---------------
  2779   10/01/01 Made any line over 80 a VERY long line (was 85).
  2780            Recognized openquotes on indented paragraphs as continuations
  2781                of the same speech.
  2782            Added "cf" to the okword list (how did I forget _that_?) and a few others.
  2783            Moved abbrev to okword and made it more general.
  2784            Removed requirement that PG_space_emdash be greater than
  2785                ten before turning off warnings about spaced dashes.
  2786            Added period to list of characters that might constitute a separator line.
  2787            Now checking for double punctuation (Thanks, David!)
  2788            Now if two spaced em-dashes on a line, reports both. (DW)
  2789            Bug: Wasn't catching spaced punctuation at line-end since I
  2790                added flgets in version .13 - fixed.
  2791            Bug: Wasn't catching spaced singlequotes - fixed
  2792            Now reads punctuated numbers like 1,000 as a single word.
  2793                (Used to give "standalone 1" type  queries)
  2794            Changed paranoid mode - not including s and p options. -ex is now quite usable.
  2795            Bug: was calling `"For it is perfectly impossible,"    Unspaced Quotes - fixed
  2796            Bug: Sometimes gave _next_ line number for queried word at end of line - fixed
  2797 
  2798   10/22/01 Released as .20
  2799 
  2800            ---------------
  2801 
  2802            Added count of lines with spaces at end. (cnt_spacend) (Thanks, Brett!)
  2803            Reduced the number of hi-bit letters needed to stop reporting them
  2804                from 1/20 to 1/100 or 200 in total.
  2805            Added PG footer check.
  2806            Added the -h switch.
  2807            Fixed platform-specific CHAR_EOL checking for isemptyline - changed to 13 and 10
  2808            Not reporting ".," when there are many of them, such as a book with many references to "Vol 1., p. 23"
  2809            Added unspaced brackets check when surrounded by alpha.
  2810            Removed all typo reporting unless the typo switch is on.
  2811            Added gcisalpha to ease over-reporting of 8-bit queries.
  2812            ECHO_SWITCH is now ON by default!
  2813            PARANOID_SWITCH is now ON by default!
  2814            Checking for ">From" placed there by e-mail MTA (Thanks Andrew & Greg)
  2815            Checking for standalone lowercase "l"
  2816            Checking for standalone lowercase "s"
  2817            Considering "is be" and "be is" "be was" "was be" as he/be errors
  2818            Looking at punct at end of para
  2819 
  2820   01/20/02 Released as .21
  2821 
  2822            ---------------
  2823 
  2824            Added VERBOSE_SWITCH to make it list everything. (George Davis)
  2825 
  2826            ---------------
  2827 
  2828   02/17/02 Added cint in flgets to try fix an EOF failure on a compiler I don't have.
  2829            after which
  2830            This line caused a coredump on Solaris - fixed.
  2831                 Da sagte die Figur: " Das ist alles gar schoen, und man mag die Puppe
  2832   03/09/02 Changed header recognition for another header change
  2833            Called it .24
  2834   03/29/02 Added qword[][] so I can suppress massive overreporting
  2835            of queried "words" like "FN", "Wm.", "th'", people's 
  2836            initials, chemical formulae and suchlike in some texts.
  2837            Called it .25
  2838   04/07/02 The qword summary reports at end shouldn't show in OVERVIEW mode. Fixed.
  2839            Added linecounts in overview mode.
  2840            Wow! gutcheck gutcheck.exe doesn't report a binary! :-) Need to tighten up. Done.
  2841            "m" is a not uncommon scanno for "in", but also appears in "a.m." - Can I get round that?
  2842   07/07/02 Added GPL.
  2843            Added checking for broken em-dash at line-end (enddash)
  2844            Released as 0.95
  2845   08/17/02 Fixed a bug that treated some hi-bit characters as spaces. Thanks, Carlo.
  2846            Released as 0.96
  2847   10/10/02 Suppressing some annoying multiple reports by default:
  2848            Standalone Ones, Asterisks, Square Brackets.
  2849               Digit 1 occurs often in many scientific texts.
  2850               Asterisk occurs often in multi-footnoted texts.
  2851               Mismatch Square Brackets occurs often in multi-para footnotes.
  2852            Added -m switch for Charlz. Horrible. Nasty. Kludgy. Evil.
  2853               . . . but it does more or less work for the main cases.
  2854            Removed uppercase within a word as a separate category so
  2855            that names like VanAllen get reported only once, like other
  2856            suspected typos.
  2857   11/24/02 Fixed - -m switch wasn't looking at htmlnum in
  2858            loseentities (Thanks, Brett!)
  2859            Fixed bug which occasionally gave false warning of
  2860            paragraph starting with lowercase.
  2861            Added underscore as character not to query around doublequotes.
  2862            Split the "Non-ASCII" message into "Non-ASCII" vs. "Non-ISO-8859"
  2863            . . . this is to help detect things like CP1252 characters.
  2864            Released as 0.97
  2865 
  2866   12/01/02 Hacked a simplified version of the "Wrongspaced quotes" out of gutspell,
  2867            for doublequotes only. Replaces "Spaced quote", since it also covers that
  2868            case.
  2869            Added "warn_hyphen" to ease over-reporting of hyphens.
  2870 
  2871   12/20/02 Added "extra period" checks.
  2872            Added single character line check
  2873            Added I" check - is usually an exclam
  2874            Released as 0.98
  2875 
  2876   1/5/03   Eeek! Left in a lowerit(argv[0]) at the start before procfile()
  2877            from when I was looking at ways to identify markup. Refuses to
  2878            open files for *nix users with upcase in the filemanes. Removed.
  2879            Fixed quickly and released as 0.981
  2880 
  2881   1/8/03   Added "arid" to the list of typos, slightly against my better
  2882            judgement, but the DP gang are all excited about it. :-)
  2883            Added a check for comma followed by capital letter, where
  2884            a period has OCRed into a comma. (DW). Not sure about this
  2885            either; we'll see.
  2886            Compiling for Win32 to allow longfilenames.
  2887 
  2888   6/1/04   A messy test release for DW to include the "gutcheck.typ"
  2889            process. And the gutcheck.jee trials. Removed "arid" --
  2890            it can go in gutcheck.typ
  2891 
  2892            Added checks for carats ^ and slants / but disabling slant
  2893            queries if more than 20 of them, because some people use them
  2894            for /italics/. Slants are commonly mistaken italic "I"s.
  2895 
  2896            Later: removed gutcheck.jee -- wrote jeebies instead.
  2897 
  2898 Random TODO: 
  2899            Check brackets more closely, like quotes, so that it becomes
  2900            easy to find the error in long paragraphs full of brackets.
  2901 
  2902 
  2903   11/4/04  Assorted cleanup. Fixed case where text started with an
  2904            unbalanced paragraph.
  2905 
  2906   1/2/05   Has it really been that long? Added "nocomma", "noperiod" check.
  2907            Bits and pieces: improved isroman(). Added isletter().
  2908            Other stuff I never noted before this.
  2909 
  2910   7/3/05   Stuck in a quick start on DP-markup ignoring 
  2911            at BillFlis's suggestion.
  2912 
  2913   1/23/06  Took out nocomma etc if typos are off. Why did I ever leave that in?
  2914            Don't count footer for dotcomma etc.
  2915 
  2916 
  2917 1       I
  2918 ail     all
  2919 arc     are
  2920 arid    and
  2921 bad     had
  2922 ball    hall
  2923 band    hand
  2924 bar     her
  2925 bat     but
  2926 be      he
  2927 bead    head
  2928 beads   heads
  2929 bear    hear
  2930 bit     hit
  2931 bo      be
  2932 boon    been
  2933 borne   home
  2934 bow     how
  2935 bumbled humbled
  2936 car     ear
  2937 carnage carriage
  2938 carne   came
  2939 cast    east
  2940 cat     cut
  2941 cat     eat
  2942 cheek   check
  2943 clay    day
  2944 coining coming
  2945 comer   corner
  2946 die     she
  2947 docs    does
  2948 ease    case
  2949 fail    fall
  2950 fee     he
  2951 haying  having
  2952 ho      he
  2953 ho      who
  2954 hut     but
  2955 is      as
  2956 lie     he
  2957 lime    time
  2958 loth    10th
  2959 m       in
  2960 modem   modern
  2961 Ms      his
  2962 ray     away
  2963 ray     my
  2964 ringer  finger
  2965 ringers fingers
  2966 rioted  noted
  2967 tho     the
  2968 tie     he
  2969 tie     the
  2970 tier    her
  2971 tight   right
  2972 tile    the
  2973 tiling  thing
  2974 tip     up
  2975 tram    train
  2976 tune    time
  2977 u       "
  2978 wen     well
  2979 yon     you
  2980 
  2981 *********************************************************************/
  2982