doc/gutcheck.txt
changeset 5 f600b0d1fc5d
parent 4 218904410231
child 6 faab25d520dd
     1.1 --- a/doc/gutcheck.txt	Fri Jan 27 00:28:11 2012 +0000
     1.2 +++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.3 @@ -1,742 +0,0 @@
     1.4 -
     1.5 -
     1.6 -                            Gutcheck documentation
     1.7 -
     1.8 -
     1.9 -gutcheck:  lists possible common formatting errors in a Project
    1.10 -Gutenberg candidate file. It is a command line program and can be used
    1.11 -under Win32 or Unix (gutcheck.c should compile anywhere; if it doesn't,
    1.12 -tell me). For Windows-only people, there is an appendix at the end
    1.13 -with brief instructions for running it.
    1.14 -
    1.15 -
    1.16 -Current version: 0.99. Users of 0.98 see end of file for changes.
    1.17 -
    1.18 -You should also have received the licence file COPYING, a README file, 
    1.19 -gutcheck.c, the source code, and gutcheck.exe, a DOS executable, with
    1.20 -this file.
    1.21 -
    1.22 -This software is Copyright Jim Tinsley 2000-2005.
    1.23 -
    1.24 -Gutcheck comes with ABSOLUTELY NO WARRANTY. For details, read the file COPYING.
    1.25 -This is Free Software; you may redistribute it under certain conditions (GPL).
    1.26 -
    1.27 -See http://gutcheck.sourceforge.net for the latest version.
    1.28 -
    1.29 -
    1.30 -Usage is: gutcheck [-setopxlywm] filename
    1.31 -      where:
    1.32 -      -s checks Single quotes 
    1.33 -      -e switches off Echoing of lines 
    1.34 -      -t checks Typos
    1.35 -      -o produces an Overview only
    1.36 -      -p sets strict quotes checking for Paragraphs
    1.37 -      -x (paranoid) switches OFF typo checking and extra checks
    1.38 -      -l turns off Line-end checks
    1.39 -      -y sets error messages to stdout
    1.40 -      -w is a special mode for web uploads (for future use)
    1.41 -      -v (verbose) forces individual reporting of minor problems
    1.42 -      -m interprets Markup of some common HTML tags and entities    
    1.43 -      -u warns about words in a user-defined typo file gutcheck.typ 
    1.44 -      -d ignores some DP-specific markup
    1.45 -
    1.46 -Running gutcheck without any parameters will display a brief help message.
    1.47 -
    1.48 -Sample usage: 
    1.49 -
    1.50 -    gutcheck warpeace.txt
    1.51 -
    1.52 -
    1.53 -More detail:
    1.54 -
    1.55 -    Echoing lines (-e to switch off)
    1.56 -
    1.57 -      You may find it convenient, when reviewing Gutcheck's 
    1.58 -      suggestions, to see the line that Gutcheck is questioning.
    1.59 -      That way, you can often see at a glance whether it is
    1.60 -      a real error that needs to be fixed, or a false positive
    1.61 -      that should be in the text, but Gutcheck's limited
    1.62 -      programming doesn't understand.
    1.63 -
    1.64 -      By default, gutcheck echoes these lines, but if you don't 
    1.65 -      want to see the lines referred to, -e will switch it OFF.
    1.66 -
    1.67 -
    1.68 -    Quotes (-s and -p switches)
    1.69 -
    1.70 -      Gutcheck always looks for unbalanced doublequotes in a 
    1.71 -      paragraph. It is a common convention for writers not to
    1.72 -      close quotes in a paragraph if the next paragraph opens
    1.73 -      with quotes and is a continuation by the same speaker.
    1.74 -
    1.75 -      Gutcheck therefore does not normally report unclosed quotes 
    1.76 -      if the next paragraph begins with a quote. If you need
    1.77 -      to see all unclosed quotes, even where the next paragraph
    1.78 -      begins with a quote, you should use the -p switch.
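
The doublequote rule described above can be sketched in a few lines of Python. This is only an illustration, not gutcheck's actual C code: the function name, the `strict` parameter (standing in for -p) and the "odd number of quotes" test are all assumptions made for the example.

```python
def unbalanced_quote_paragraphs(text, strict=False):
    """Return 1-based indexes of paragraphs with an odd number of doublequotes.

    Unless strict is True (standing in for the -p switch), an unclosed
    quote is tolerated when the next paragraph opens with a doublequote.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    normalized = text.replace("\r\n", "\n")
    paragraphs = [p for p in normalized.split("\n\n") if p.strip()]
    flagged = []
    for i, para in enumerate(paragraphs):
        if para.count('"') % 2 == 0:
            continue  # quotes balance within this paragraph
        next_opens_with_quote = (
            i + 1 < len(paragraphs)
            and paragraphs[i + 1].lstrip().startswith('"')
        )
        if strict or not next_opens_with_quote:
            flagged.append(i + 1)
    return flagged
```

Called with `strict=True`, the sketch also reports paragraphs whose speech continues into the next paragraph, which is the behaviour the text attributes to -p.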
    1.79 -
    1.80 -      Singlequotes (') are a problem, since the same character
    1.81 -      is used for an apostrophe. I'm not sure that it is 
    1.82 -      possible to get 100% accuracy on singlequotes checking,
    1.83 -      particularly since dialect, quite common in PG texts,
    1.84 -      upsets the normal rules so badly. Consider the sentence:
    1.85 -        'Tis often said that a man's a man for a' that.
    1.86 -      As humans, we recognize that both apostrophes are used
    1.87 -      for contractions rather than quotes, but it isn't easy 
    1.88 -      to get a program to recognize that.
    1.89 -
    1.90 -      Since Gutcheck makes too many mistakes when trying to match
    1.91 -      singlequotes, it doesn't look for unbalanced singlequotes
    1.92 -      unless you specify the -s switch.
    1.93 -
    1.94 -      Consider these sentences, which illustrate the main cases:
    1.95 -
    1.96 -        'Tis often said that a fool and his money are soon parted.
    1.97 -
    1.98 -        'Becky's goin' home,' said Tom.
    1.99 -
   1.100 -        The dogs' tails wagged in unison.
   1.101 -
   1.102 -        Those 'pack dogs' of yours look more like wolves.
   1.103 -
   1.104 -
   1.105 -
   1.106 -    Typos (-t switch)
   1.107 -
   1.108 -      It's not Gutcheck's job to be a spelling checker, but it
   1.109 -      does check for a list of common typos and OCR errors if you
   1.110 -      use the -t switch. (The -x switch switches typo checking off.)
   1.111 -
   1.112 -      It also checks for character combinations, especially involving
   1.113 -      h and b, which are often confused by OCR, that rarely or never
   1.114 -      occur. For example, it queries "tbe" in a word. Now, "the" often
   1.115 -      occurs, but "tbe" is very rare (heartbeat, hotbed), so I'm
   1.116 -      playing the odds - a few false positives for many errors found.
   1.117 -      Similarly with "ii", which is a very common OCR error.
   1.118 -
   1.119 -      Gutcheck suppresses multiple reporting of the first 40 "typos"
   1.120 -      found. This is to remove the annoyance of seeing something like
   1.121 -      "FN" (footnote) or "LK" (initials) flagged as a typo 147 times
   1.122 -      in a text. 
   1.123 -
   1.124 -
   1.125 -    Line-end checking (-l switch to disable)
   1.126 -
   1.127 -      All PG texts should have a Carriage Return (CR - character 13)
   1.128 -      and a Line Feed (LF - character 10) at end of each line,
   1.129 -      regardless of what O/S you made them on. DOS/Windows, Unix
   1.130 -      and Mac have different conventions, but the final text should
   1.131 -      always use a CR/LF pair as its line terminator.
   1.132 -
   1.133 -      By default, Gutcheck verifies that every line does have
   1.134 -      the correct terminator, but if you're on a work-in-progress
   1.135 -      in Linux, you might want to convert the line-ends as a final
   1.136 -      step, and not want to see thousands of errors every time you
   1.137 -      run Gutcheck before that final step, so you can turn off 
   1.138 -      this checking with the -l switch.
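
As a rough illustration of the CR/LF requirement above, the following Python sketch lists the lines whose terminator is not exactly a CR/LF pair. The function name is hypothetical, and the logic is a simplification that does not try to reproduce gutcheck's own line-end messages.

```python
def bad_line_ends(data):
    """Return 1-based numbers of lines in `data` (bytes) not ending in CR LF."""
    pieces = data.split(b"\n")
    # Whatever follows the final LF; empty when the file ends with LF.
    tail = pieces.pop()
    bad = []
    for number, piece in enumerate(pieces, start=1):
        has_cr_before_lf = piece.endswith(b"\r")
        has_stray_cr = b"\r" in piece[:-1]  # a CR not followed by LF
        if not has_cr_before_lf or has_stray_cr:
            bad.append(number)
    if tail:
        # The last line has no terminator at all.
        bad.append(len(pieces) + 1)
    return bad
```

A file saved with Unix line ends would have every line listed, which is the flood of queries that -l is there to suppress.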
   1.139 -
   1.140 -
   1.141 -    Paranoid mode (-x switch to disable: Trust No One :-)
   1.142 -
   1.143 -      -x automatically switches OFF typo-checking (the -t flag)
   1.144 -      and some extra checks, such as standalone 1 and 0 queries.
   1.145 -
   1.146 -
   1.147 -    Overview mode (-o switch)
   1.148 -
   1.149 -       This mode just gives a count of queries found
   1.150 -       instead of a detailed list.
   1.151 -
   1.152 -
   1.153 -    Header quote  (-h switch)
   1.154 -
   1.155 -       If you use the -h switch, gutcheck will also display
   1.156 -       the Title, Author, Release and Edition fields from the
   1.157 -       PG header. This is useful mostly for the automated
   1.158 -       checks we do on recently-posted texts.
   1.159 -
   1.160 -
   1.161 -    Errors to stdout (-y switch)
   1.162 -
   1.163 -       If you're just running gutcheck normally, you can ignore
   1.164 -       this. It's only there for programs that provide a front
   1.165 -       end to gutcheck. It makes error messages appear within
   1.166 -       the output of gutcheck so that the front end knows whether
   1.167 -       gutcheck ran OK.
   1.168 -
   1.169 -
   1.170 -    Verbose reporting (-v switch)
   1.171 -
   1.172 -       Normally, if gutcheck sees lots of long lines, short lines,
   1.173 -       spaced dashes, non-ASCII characters or dot-commas ".," it
   1.174 -       assumes these are features of the text, counts and summarizes
   1.175 -       them at the top of its report, but does not list them 
   1.176 -       individually. If the -v switch is on, gutcheck will list them all.
   1.177 -
   1.178 -
   1.179 -    Markup interpretation (-m switch)
   1.180 -
   1.181 -       Normally, gutcheck flags anything it suspects of being HTML
   1.182 -       markup as a possible error. When you use the -m switch,
   1.183 -       however, it matches anything that looks like markup against
   1.184 -       a short list of common HTML tags and entities. If the markup
   1.185 -       is in that list, it either ignores the markup, in the case
   1.186 -       of a tag, or "interprets" the markup as its nearest ASCII 
   1.187 -       equivalent, in the case of an entity. So, for example, using
   1.188 -       this switch, gutcheck will "see"
   1.189 -
   1.190 -       &ldquo;He went <i>thataway!</i>&rdquo;
   1.191 -
   1.192 -       as
   1.193 -
   1.194 -       "He went thataway!"
   1.195 -
   1.196 -       and report accordingly.
   1.197 -
   1.198 -       This switch does not, not, NOT check the validity of HTML;
   1.199 -       it exists so that you can run gutcheck on most HTML texts
   1.200 -       for PG, and get sane results. It does not support all tags.
   1.201 -       It does not support all entities. When it sees a tag or entity
   1.202 -       it does not recognize, it will query it as HTML just as if
   1.203 -       you hadn't specified the -m switch.
   1.204 -
   1.205 -       Gutcheck 0.99 will automatically switch on markup interpretation
   1.206 -       if it sees a lot of tags that appear to be markup, so mostly, you
   1.207 -       won't have to specify this.
   1.208 -
   1.209 -    User-defined typos (-u switch)
   1.210 -
   1.211 -        If you have a file named gutcheck.typ either in your current
   1.212 -        working directory or in the directory from which you explicitly
   1.213 -        invoked gutcheck, but not necessarily on your path, and if you
   1.214 -        specify the -u switch, gutcheck will query any word specified 
   1.215 -        in that file. The file is simple: one word, in lower case, per
   1.216 -        line. 999 lines are allowed for. Be careful not to put multiple
   1.217 -        words onto a line, or leave any rubbish other than the word on
   1.218 -        the line. You should have received a sample file gutcheck.typ
   1.219 -        with this package.
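
The gutcheck.typ format described above (one lower-case word per line, up to 999 lines) could be read along these lines. This is a sketch only: the function names are made up, and skipping malformed lines is an assumption, since the text does not say what gutcheck does with them.

```python
import re

MAX_USER_TYPOS = 999  # limit stated in the documentation above


def load_user_typos(lines):
    """Collect typo words from an iterable of lines, one word per line."""
    words = []
    for raw in lines:
        word = raw.strip()
        if not word:
            continue
        # Assumption: lines with several words or upper-case letters
        # are simply skipped in this sketch.
        if len(word.split()) != 1 or word != word.lower():
            continue
        words.append(word)
        if len(words) == MAX_USER_TYPOS:
            break
    return words


def query_user_typos(text_line, typos):
    """Return the words of text_line that appear in the typo list."""
    wanted = set(typos)
    return [w for w in re.findall(r"[a-z]+", text_line.lower()) if w in wanted]
```

For example, `load_user_typos(open("gutcheck.typ"))` would return the word list, and `query_user_typos` could then be applied to each line of the text.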
   1.220 -
   1.221 -    Ignore DP markup (-d switch)
   1.222 -        
   1.223 -        Distributed Proofreaders (http://www.pgdp.net) is currently
   1.224 -        (2005) the main source of PG texts, and proofers there use
   1.225 -        special conventions. This switch understands those conventions,
   1.226 -        so that people can use gutcheck on files in progress that
   1.227 -        haven't had the special conventions removed yet. The special
   1.228 -        conventions supported in 0.99 are page-separators and
   1.229 -        "<sc>", "</sc>", "/*", "*/", "/#", "#/", "/$", "$/".
   1.230 -
   1.231 -
   1.232 -You will probably only run gutcheck on a text once or maybe twice,
   1.233 -just prior to uploading. It usually finds a few formatting problems,
   1.234 -but it also usually raises queries that aren't problems at all: it often
   1.235 -questions Tables of Contents for having short lines, for example.
   1.236 -These are called "false positives", and need a human to decide on
   1.237 -them.
   1.238 -
   1.239 -The text should be standard prose, and already close to PG normal
   1.240 -format (plain text, about 70 characters per line with blank lines
   1.241 -between paragraphs).
   1.242 -
   1.243 -Gutcheck merely draws your attention to things that might be errors.
   1.244 -It is NOT a substitute for human judgement. Formatting choices like
   1.245 -short lines may be for a reason that this program can't understand.
   1.246 -
   1.247 -Even the most careful human proofing can leave errors behind in a
   1.248 -text, and there are several automated checks you can do to help find
   1.249 -them. Of these, spellchecking (with _very_ careful human judgement) is
   1.250 -the most important and most useful.
   1.251 -
   1.252 -Gutcheck does perform some basic typo-checking if you ask it to,
   1.253 -but its focus is on formatting errors specific to PG texts - 
   1.254 -mismatched quotes, non-ASCII characters, bad spacing, bad line
   1.255 -length, HTML tags perhaps left from a conversion, unbalanced
   1.256 -brackets.
   1.257 -
   1.258 -Suggestions for additional checks would be appreciated and duly 
   1.259 -considered, but no guarantees that they will be implemented.
   1.260 -
   1.261 -
   1.262 -
   1.263 -
   1.264 -                How do _I_ use it?
   1.265 -
   1.266 -Practically everyone I give gutcheck to asks me how _I_ use it.
   1.267 -Well, when I get a text for posting, say filename.txt, I run
   1.268 -
   1.269 -    gutcheck -o filename.txt
   1.270 -
   1.271 -That gives me a quick idea what I'm dealing with. It'll tell
   1.272 -me what kind of problems gutcheck sees, and give me an idea 
   1.273 -of how much more work needs to be done on the text. Keep in 
   1.274 -mind that gutcheck doesn't do anything like a full spellcheck,
   1.275 -but when I see a text that has a lot of problems, I assume that
   1.276 -it probably needs a spellcheck too.
   1.277 -
   1.278 -Having got a feel for the ballpark, I run
   1.279 -
   1.280 -    gutcheck filename.txt > jj
   1.281 -
   1.282 -where jj is my personal, all-purpose filename for temporary data
   1.283 -that doesn't need to be kept. Then I open filename.txt and jj in
   1.284 -a split-screen view in my editor, and work down the text, fixing
   1.285 -whatever needs fixing, and skipping whatever doesn't. If your 
   1.286 -editor doesn't split-screen, you can get much the same effect by 
   1.287 -opening your original file in your normal editor, and jj (or your
   1.288 -equivalent name) in something like Notepad, keeping both in view 
   1.289 -at the same time.
   1.290 -
   1.291 -Twice a day, an automatic process looks at all recently-posted
   1.292 -texts, and emails Michael, me, and sometimes other people with
   1.293 -their gutcheck summaries.
   1.294 -
   1.295 -
   1.296 -
   1.297 -        Future development of gutcheck
   1.298 -
   1.299 -Gutcheck has gone about as far as it can, given its current
   1.300 -structure. In order to add better singlequotes checking,
   1.301 -sentence checking, better he/be checking and other good stuff
   1.302 -that I'd like to see, I'll have to rewrite it from a different
   1.303 -angle - looking at the syntax instead of the lines. And I'll
   1.304 -probably get around to that sooner or later.
   1.305 -
   1.306 -Meantime, I'm just trying to get this version stabilized, so
   1.307 -please report any bugs you find. When it is stable, I'll run
   1.308 -up a Windows port for those timid souls who can't look a 
   1.309 -command line in the eye. :-)
   1.310 -
   1.311 -And I've started work on gutspell, a companion to gutcheck
   1.312 -which will concentrate on spelling problems. PG spelling
   1.313 -problems are unusual, since the range of texts we cover is
   1.314 -so wide, and I'll be taking a somewhat unorthodox approach
   1.315 -to writing this spelling-checker _specifically_ for texts
   1.316 -containing a lot of dialect and uncommon words that have
   1.317 -probably already been spell-checked against a standard
   1.318 -modern dictionary.
   1.319 -
   1.320 -
   1.321 -
   1.322 -
   1.323 -Explanations of common gutcheck messages:
   1.324 -
   1.325 -    --> 74 lines in this file have white space at end
   1.326 -
   1.327 -    PG texts shouldn't have extra white space added at end of line.
   1.328 -    Don't worry too much about this; they're not doing any harm,
   1.329 -    and they'll be removed during posting anyway.
   1.330 -
   1.331 -
   1.332 -    --> 348 lines in this file are short. Not reporting short lines.
   1.333 -    --> 84 lines in this file are long. Not reporting long lines.
   1.334 -    --> 8 lines in this file are VERY long!
   1.335 -
   1.336 -    If there are a lot of long or short lines, Gutcheck won't list
   1.337 -    them individually. The short lines version of this message
   1.338 -    is commonly seen when gutchecking poetry and some plays, where
   1.339 -    the normal line length is shorter than the standard for prose.
   1.340 -    A "VERY long" line is one over 80 characters.  You normally
   1.341 -    shouldn't have any of these, but sometimes you may have to render
   1.342 -    a table that must be that long, or some special preformatted
   1.343 -    quotation that can't be broken.
   1.344 -
   1.345 -
   1.346 -    --> There are 75 spaced dashes and em-dashes in this file. Not reporting them.
   1.347 -
   1.348 -    The PG standard for an emdash--like these--is two minus signs
   1.349 -    with no spaces before or after them. However, some older texts
   1.350 -    used spaced dashes - like these -- and if there are very many
   1.351 -    such spaced dashes in the file, gutcheck just draws your
   1.352 -    attention to it and doesn't list them individually.
   1.353 -
   1.354 -
   1.355 -
   1.356 -    Line 3020 - Non-ASCII character 233
   1.357 -
   1.358 -    Standard PG texts should use only ASCII characters with values
   1.359 -    up to 127; however, non-English, accented characters can be 
   1.360 -    represented according to several different non-ASCII encoding 
   1.361 -    schemes, using values over 127. If you have a plain English text
   1.362 -    with a few accented characters in words like cafe or tete-a-tete,
   1.363 -    you should replace the accented characters with their unaccented 
   1.364 -    versions. The English pound sign is another commonly-seen
   1.365 -    non-ASCII character. If you have enough non-ASCII characters in
   1.366 -    your text that you feel removing them would degrade your text
   1.367 -    unacceptably, you should probably consider doing an 8-bit text
   1.368 -    as well as a plain-ASCII version.
   1.369 -
   1.370 -
   1.371 -
   1.372 -    Line 1207 - Non-ISO-8859 character 156
   1.373 -
   1.374 -    Even in "8-bit" texts, there are distinctions between code sets.
   1.375 -    The ISO-8859 family of 8-bit code sets is the most commonly used
   1.376 -    in PG, and these sets do not define values in the range 128 through
   1.377 -    159 as printable characters. It's quite common for someone on a
   1.378 -    Windows or Mac machine to use a non-ISO character inadvertently,
   1.379 -    so this message warns that the character is not only not ASCII,
   1.380 -    but also outside the ISO-8859 range.
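
The two character messages above come down to a range test on each byte value. A minimal Python sketch follows, assuming the ranges given in the text (ASCII up to 127, 128 through 159 outside ISO-8859); the function names are hypothetical and the message wording is copied from the samples above.

```python
def classify_byte(value):
    """Classify one byte value using the ranges given in the text."""
    if value <= 127:
        return "ascii"
    if value <= 159:
        return "non-iso-8859"  # 128 through 159: not printable in ISO-8859
    return "non-ascii"


def char_queries(line, line_number):
    """Return query messages for every non-ASCII byte in `line` (bytes)."""
    messages = []
    for value in line:
        kind = classify_byte(value)
        if kind == "non-iso-8859":
            messages.append(f"Line {line_number} - Non-ISO-8859 character {value}")
        elif kind == "non-ascii":
            messages.append(f"Line {line_number} - Non-ASCII character {value}")
    return messages
```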
   1.381 -
   1.382 -
   1.383 -
   1.384 -    Line 46 - Tab character?
   1.385 -
   1.386 -    Some editors and WPs will put in Tab characters (character 9) to
   1.387 -    indicate indented text. You should not use these in a PG text,
   1.388 -    because you can't be sure how they will appear on a reader's
   1.389 -    screen. Find the Tab, and replace it with the appropriate number
   1.390 -    of spaces.
   1.391 -
   1.392 -
   1.393 -    Line 1327 - Tilde character?
   1.394 -
   1.395 -    The tilde character (~) might be legitimately used, but it's the
   1.396 -    character commonly used by OCR software to indicate a place where
   1.397 -    it couldn't make out the letter, so gutcheck flags it.
   1.398 -
   1.399 -
   1.400 -
   1.401 -    Line 1347 - Asterisk?
   1.402 -
   1.403 -    Asterisks are reported only in paranoid mode (see -x). 
   1.404 -    Like tildes, they are often used to indicate errors, but they are
   1.405 -    also legitimately used as line delimiters and footnote markers.
   1.406 -
   1.407 -
   1.408 -
   1.409 -    Line 1451 - Long line 129
   1.410 -
   1.411 -    PG texts should have lines shorter than 76 characters. There may be
   1.412 -    occasions where you decide that you really have to go out to 79
   1.413 -    characters, but the sample above says that line 1451 is 129 characters
   1.414 -    long - probably two lines run together.
   1.415 -
   1.416 -
   1.417 -
   1.418 -    Line 1590 - Short line?
   1.419 -
   1.420 -    PG texts should have lines longer than 54 characters. However,
   1.421 -    there are special cases like poetry and tables of contents where
   1.422 -    the lines _should_ be shorter. So treat Gutcheck warnings about
   1.423 -    short lines carefully. Sometimes it's a genuine formatting
   1.424 -    problem; sometimes the line really needs to be short.
   1.425 -
   1.426 -    Hint: gutcheck will not flag lines as short if they are indented
   1.427 -    - if they start with a space. I like to start inserted stanzas
   1.428 -    and other such items indented with a couple of spaces so that 
   1.429 -    they stand out from the main text anyway.
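
The length limits quoted in the messages above can be put together in one small Python sketch. The constant and function names are made up, and the sketch looks at each line in isolation, so it would also query the naturally short last line of a paragraph; gutcheck itself may well be smarter about that.

```python
LONG_LIMIT = 75       # "lines shorter than 76"
VERY_LONG_LIMIT = 80  # a "VERY long" line is "one over 80 characters"
SHORT_LIMIT = 55      # "lines longer than 54 characters"


def length_query(line):
    """Return 'very long', 'long', 'short' or None for one line of text.

    `line` must not include its line terminator.
    """
    length = len(line)
    if length > VERY_LONG_LIMIT:
        return "very long"
    if length > LONG_LIMIT:
        return "long"
    # Indented lines (starting with a space) are never flagged as short,
    # as the hint above explains; blank lines are ignored too.
    if 0 < length < SHORT_LIMIT and not line.startswith(" "):
        return "short"
    return None
```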
   1.430 -
   1.431 -
   1.432 -
   1.433 -    Line 1804 - Begins with punctuation?
   1.434 -
   1.435 -    Lines should normally not begin with commas, periods and so on.
   1.436 -    An exception is ellipses . . . which can happen at start of line.
   1.437 -
   1.438 -
   1.439 -
   1.440 -    Line 1850 - Spaced em-dash?
   1.441 -
   1.442 -    The PG standard for an em-dash--like these--is two minus signs
   1.443 -    with no spaces before or after them. Gutcheck flags non-PG
   1.444 -    em-dashes - like this one. Normally, you will replace it with a 
   1.445 -    PG-standard em-dash.
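
One way to picture the spaced em-dash test is as a regular expression that looks for one or two minus signs with white space beside them. The Python pattern below is an assumption made for illustration, not the test gutcheck actually applies.

```python
import re

# One or two minus signs with white space on both sides, or a double
# minus with white space on either side.
SPACED_DASH = re.compile(r"\s-{1,2}\s|\s--|--\s")


def has_spaced_dash(line):
    """Return True if the line seems to contain a spaced dash or em-dash."""
    return SPACED_DASH.search(line) is not None
```

Hyphenated words such as "em-dash" and PG-standard em-dashes pass untouched, because neither has white space next to the minus signs.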
   1.446 -
   1.447 -
   1.448 -
   1.449 -    Line 1904 - Query he/be error?
   1.450 -
   1.451 -    Gutcheck makes a very minor effort to look for that scourge of all
   1.452 -    proofreaders, "be" replacing "he" or vice-versa, and draws your
   1.453 -    attention to it when it thinks it has found one.
   1.454 -
   1.455 -
   1.456 -
   1.457 -    Line 2017 - Query digit in a1most
   1.458 -
   1.459 -    The digit 1 is commonly OCRed for the letter l, the digit 0 for
   1.460 -    the letter O, and so on. When gutcheck sees a mix of digits and
   1.461 -    letters, it warns you. It may generate a false positive for
   1.462 -    something like 7am.
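
The digit query above amounts to spotting words that mix letters and digits. A minimal Python sketch, with a hypothetical function name, which shows both the real catch and the 7am false positive:

```python
import re


def mixed_digit_words(line):
    """Return the words in `line` that contain both letters and digits."""
    words = re.findall(r"[A-Za-z0-9]+", line)
    return [
        w for w in words
        if any(c.isdigit() for c in w) and any(c.isalpha() for c in w)
    ]
```

A plain number such as 1905 is left alone because it contains no letters.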
   1.463 -
   1.464 -
   1.465 -
   1.466 -    Line 2083 - Query standalone 0
   1.467 -
   1.468 -    In paranoid mode (see -x) only, gutcheck warns about the digit 0
   1.469 -    and the digit 1 standing alone as a word. This can happen if the
   1.470 -    OCR misreads the words O or I.
   1.471 -
   1.472 -
   1.473 -
   1.474 -    Line 2115 - Query word whetber
   1.475 -
   1.476 -    If you have switched typo-checking on, gutcheck looks for
   1.477 -    potential typos, especially common h/b errors. It's not
   1.478 -    infallible; it sometimes queries legit words, but it's
   1.479 -    always worth taking a look.
   1.480 -
   1.481 -
   1.482 -
   1.483 -    Line 2190 column 14 - Missing space?
   1.484 -
   1.485 -    Omitting a space is a very common error,especially coming from
   1.486 -    OCRed text,and can be hard for a human to spot. The commas in
   1.487 -    the previous sentence illustrate the kind of thing I mean.
   1.488 -
   1.489 -
   1.490 -
   1.491 -    Line 2240 column 48 - Spaced punctuation?
   1.492 -
   1.493 -    The flip side of the "missing space" error , here , is when extra
   1.494 -    spaces are added before punctuation . Some old texts appear to add
   1.495 -    extra spaces around punctuation consistently, but this was a
   1.496 -    typographical convention rather than the author's intent, and the
   1.497 -    extra "spaces" should be removed when preparing a PG text.
   1.498 -
   1.499 -
   1.500 -
   1.501 -    Line 2301 column 19 - Unspaced quotes?
   1.502 -
   1.503 -    Another common spacing problem occurs in a phrase like "You wait
   1.504 -    there,"he said.
   1.505 -
   1.506 -
   1.507 -
   1.508 -    Line 2385 column 27 - Wrongspaced quotes?
   1.509 -
   1.510 -    As of version 0.98, gutcheck adds extra checks on whether a quote
   1.511 -    seems to be a start or end quote, and queries those that appear to
   1.512 -    be misplaced. This does give rise to false positives when quotes are
   1.513 -    nested, for example:
   1.514 -
   1.515 -    "And how," she asked, "will your "friends" help you now?"
   1.516 -
   1.517 -    but these false positives are worth it because of the many cases
   1.518 -    that this test catches, notably those like:
   1.519 -
   1.520 -    "And how, "she said," will your friends help you now?"
   1.521 -
   1.522 -    Sometimes a "wrongspaced quotes" query will arise because an earlier
   1.523 -    quote in the paragraph was omitted, so if the place specified seems
   1.524 -    to be OK, look back to see whether there's a problem in the preceding
   1.525 -    lines.
   1.526 -
   1.527 -
   1.528 -
   1.529 -    Line 2400 - HTML Tag? <PRE>
   1.530 -
   1.531 -    Some PG texts have been converted from HTML, and not all of the
   1.532 -    HTML tags have been removed.
   1.533 -
   1.534 -
   1.535 -
   1.536 -    Line 2402 - HTML symbol? &emdash;
   1.537 -
   1.538 -    Similarly, special HTML symbol characters can survive into PG
   1.539 -    texts. Can occasionally produce amusing false positives like
   1.540 -    . . . Marwick & Co were well known for it;
   1.541 -
   1.542 -
   1.543 -
   1.544 -    Line 2540 - Mismatched quotes
   1.545 -
   1.546 -    Another gutcheck mainstay - unclosed doublequotes in a paragraph.
   1.547 -    See the discussion of quotes in the switches section near the
   1.548 -    start of this file.
   1.549 -    
   1.550 -    Since the mismatch doesn't occur on any one line, gutcheck quotes
   1.551 -    the line number of the first blank line following the paragraph,
   1.552 -    since this is the point where it reconciles the count of quotes.
   1.553 -    However, if gutcheck is echoing lines, that is, you haven't used
   1.554 -    the -e switch, it will show the _first_ line of the paragraph, 
   1.555 -    to help you find the place without using line numbers. The 
   1.556 -    offending paragraph is therefore between the quoted line and 
   1.557 -    the line number given.
   1.558 -
   1.559 -
   1.560 -
   1.561 -    Line 2587 - Mismatched single quotes
   1.562 -
   1.563 -    Only checked with the -s switch, since checking single quotes is 
   1.564 -    not a very reliable process. Otherwise, the same logic as for 
   1.565 -    doublequotes applies.
   1.566 -
   1.567 -
   1.568 -
   1.569 -    Line 2877 - Mismatched round brackets?
   1.570 -
   1.571 -    Also curly and square brackets. Texts with a lot of brackets, like
   1.572 -    plays with bracketed stage instructions, may have mismatches.
   1.573 -
   1.574 -
   1.575 -    Line 3150 - No CR?
   1.576 -    Line 3204 - Two successive CRs?
   1.577 -    Line 3281 position 75 - CR without LF?
   1.578 -
   1.579 -    These are the invalid line-end warnings. See the discussion of
   1.580 -    line-end checking in the switches section near the start of this
   1.581 -    file. If you see these, and your editor doesn't show anything
   1.582 -    wrong, you should probably try deleting the characters just before
   1.583 -    and after the line end, and the line-end itself, then retyping the
   1.584 -    characters and the line-end.
   1.585 -
   1.586 -
   1.587 -    Line 2940 - Paragraph starts with lower-case
   1.588 -
   1.589 -    A common error in an e-text is for an extra blank line
   1.590 -
   1.591 -    to be put in, like the blank line above, and this often
   1.592 -    shows up as a new paragraph beginning with lower case.
   1.593 -    Sometimes the blank line is deliberate, as when a 
   1.594 -    quotation is inserted in a speech. Use your judgement.
   1.595 -
   1.596 -
   1.597 -    Line 2987 - Extra period?
   1.598 -
   1.599 -    An extra period. is a. common problem in OCRed text. and usually
   1.600 -    arises when a speck of dust on the page is mistaken for a period.
   1.601 -    or. as occasionally happens. when a comma loses its tail.
   1.602 -
   1.603 -
   1.604 -    Line 3012 column 12 - Double punctuation?
   1.605 -
   1.606 -    Double punctuation., like that,, is a common typo and
   1.607 -    scanno. Some books have much legit double punctuation,
   1.608 -    like etc., etc., but it's worth checking anyway.
   1.609 -
   1.610 -
   1.611 -
   1.612 -            *       *       *        *
   1.613 -
   1.614 -For Windows-only users who are unfamiliar with DOS:
   1.615 -
   1.616 -    If you're a Windows-only user, you need to save
   1.617 -    gutcheck.exe into the folder (directory) where the
   1.618 -    text file you want to check is. Let's say your
   1.619 -    text file is in C:\GUT, then you should save
   1.620 -    GUTCHECK.EXE into C:\GUT.
   1.621 -
   1.622 -    Now get to a DOS prompt. You can do this by
   1.623 -    selecting the "Command Prompt" or "MS-DOS Prompt"
   1.624 -    option that will be somewhere on your
   1.625 -    Start/Programs menu.
   1.626 -
   1.627 -    Now get into the C:\GUT directory. 
   1.628 -    You can do this using the CD (change directory) 
   1.629 -    command, like this:
   1.630 -        CD \GUT
   1.631 -    and your prompt will change to 
   1.632 -        C:\GUT>
   1.633 -    so you know you're in the right place.
   1.634 -
   1.635 -    Now type
   1.636 -        gutcheck yourfile.txt
   1.637 -    and you'll see gutcheck's report.
   1.638 -
   1.639 -    By default, gutcheck prints its queries to screen.
   1.640 -    If you want to create a file of them, to edit
   1.641 -    against the text, you can use the greater-than
   1.642 -    sign (>) to tell it to output the report to a
   1.643 -    file. For example, if you want its report in a
   1.644 -    file called QUERIES.LST, you could type
   1.645 -    
   1.646 -        gutcheck yourfile.txt > queries.lst
   1.647 -
   1.648 -    The queries.lst file will then contain the listing
   1.649 -    of possible formatting errors, and you can
   1.650 -    edit it alongside your text.
   1.651 -
   1.652 -    Whatever you do, DON'T make the filename after
   1.653 -    the greater-than sign the name of a file already
   1.654 -    on your disk that you want to keep, because
   1.655 -    the greater-than sign will cause gutcheck to
   1.656 -    replace any existing file of that name.
   1.657 -
   1.658 -    So, for example, if you have two Tolstoy files
   1.659 -    that you want to check, called WARPEACE.TXT and 
   1.660 -    ANNAK.TXT, make sure that neither of these names
   1.661 -    is ever used following the greater-than sign.
   1.662 -    To check these correctly, you might do:
   1.663 -
   1.664 -    gutcheck warpeace.txt > war.lst
   1.665 -
   1.666 -    and
   1.667 -
   1.668 -    gutcheck annak.txt > annak.lst
   1.669 -
   1.670 -    separately. Then you can look at war.lst and annak.lst
   1.671 -    to see the gutcheck reports.
   1.672 -
   1.673 -            *       *       *        *
   1.674 -
   1.675 -
   1.676 -For existing 0.98 users upgrading to 0.99:
   1.677 -
   1.678 -    If you run on old 16-bit DOS or Windows 3.x, I'm afraid
   1.679 -    you're out of luck. I'm not saying it _can't_ be compiled
   1.680 -    to run on 16-bit, but the executable with the package is
   1.681 -    for Win32 only. *nix users won't notice the change at all.
   1.682 -
   1.683 -
   1.684 -    There are two new switches: -u and -d. 
   1.685 -          See above for full rundown.
   1.686 -
   1.687 -
   1.688 -Here's a list of the new errors:
   1.689 -
   1.690 -    Line 1456 - Carat character?
   1.691 -
   1.692 -    I^ve found a few.
   1.693 -
   1.694 -
   1.695 -    Line 1821 - Forward slash?
   1.696 -
   1.697 -    Common error for italicized "I", or so /'ve found.
   1.698 -
   1.699 -
   1.700 -    Line 2139 - Query missing paragraph break?
   1.701 -
   1.702 -    "Come here, son." "Do I _have_ to go, dad?"
   1.703 -    Like that. False positives in some texts. Sorry 'bout that,
   1.704 -    but these are often errors.
   1.705 -
   1.706 -
   1.707 -    Line 2200 - Query had/bad error?
   1.708 -
   1.709 -    Clear enough. Doesn't catch as many as I'd like it to,
   1.710 -    but rarely gives false alarms.
   1.711 -
   1.712 -
   1.713 -    Line 2268 - Query punctuation after the?
   1.714 -
   1.715 -    Some words, like "the", very rarely have punctuation
   1.716 -    following them. Others, like "Mrs", usually have a
   1.717 -    period, but never a comma. Occasional false positives.
   1.718 -
   1.719 -
   1.720 -    Line 2380 - Query possible scanno arid
   1.721 -
   1.722 -    It found one of your user-defined typos when you
   1.723 -    used the -u switch.
   1.724 -
   1.725 -
   1.726 -    Line 2511 - Capital "S"?
   1.727 -
   1.728 -    Surprisingly common specific case, like: Jane'S 
   1.729 -
   1.730 -    
   1.731 -    Line 3469 - endquote missing punctuation?
   1.732 -
   1.733 -    OK. This one can really cause a lot of false positives
   1.734 -    in some books, but it switches itself off if it finds
   1.735 -    more than 20 in a text, unless you force it to list them
   1.736 -    all with the -v switch.
   1.737 -    "Hey, dad" Johnny said, "can we go now?"
   1.738 -    is a common punctuation-missing error.
   1.739 -
   1.740 -
   1.741 -    Line 4266 - Mismatched underscores?
   1.742 -
   1.743 -    Like mismatched anything else!
   1.744 -
   1.745 -