1.1 --- a/doc/gutcheck.txt Fri Jan 27 00:28:11 2012 +0000
1.2 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000
1.3 @@ -1,742 +0,0 @@
1.4 -
1.5 -
1.6 - Gutcheck documentation
1.7 -
1.8 -
1.9 -gutcheck: lists possible common formatting errors in a Project
1.10 -Gutenberg candidate file. It is a command line program and can be used
1.11 -under Win32 or Unix (gutcheck.c should compile anywhere; if it doesn't,
1.12 -tell me). For Windows-only people, there is an appendix at the end
1.13 -with brief instructions for running it.
1.14 -
1.15 -
1.16 -Current version: 0.99. Users of 0.98 see end of file for changes.
1.17 -
1.18 -You should also have received the licence file COPYING, a README file,
1.19 -gutcheck.c, the source code, and gutcheck.exe, a DOS executable, with
1.20 -this file.
1.21 -
1.22 -This software is Copyright Jim Tinsley 2000-2005.
1.23 -
1.24 -Gutcheck comes with ABSOLUTELY NO WARRANTY. For details, read the file COPYING.
1.25 -This is Free Software; you may redistribute it under certain conditions (GPL).
1.26 -
1.27 -See http://gutcheck.sourceforge.net for the latest version.
1.28 -
1.29 -
1.30 -Usage is: gutcheck [-setopxlywm] filename
1.31 - where:
1.32 - -s checks Single quotes
1.33 - -e switches off Echoing of lines
1.34 - -t checks Typos
1.35 - -o produces an Overview only
1.36 - -p sets strict quotes checking for Paragraphs
1.37 - -x (paranoid) switches OFF typo checking and extra checks
1.38 - -l turns off Line-end checks
1.39 - -y sets error messages to stdout
1.40 - -w is a special mode for web uploads (for future use)
1.41 - -v (verbose) forces individual reporting of minor problems
1.42 - -m interprets Markup of some common HTML tags and entities
1.43 - -u warns about words in a user-defined typo file gutcheck.typ
1.44 - -d ignores some DP-specific markup
1.45 -
1.46 -Running gutcheck without any parameters will display a brief help message.
1.47 -
1.48 -Sample usage:
1.49 -
1.50 - gutcheck warpeace.txt
1.51 -
1.52 -
1.53 -More detail:
1.54 -
1.55 - Echoing lines (-e to switch off)
1.56 -
1.57 - You may find it convenient, when reviewing Gutcheck's
1.58 - suggestions, to see the line that Gutcheck is questioning.
1.59 - That way, you can often see at a glance whether it is
1.60 - a real error that needs to be fixed, or a false positive:
1.61 - something that should be in the text, but that Gutcheck's
1.62 - limited programming doesn't understand.
1.63 -
1.64 - By default, gutcheck echoes these lines, but if you don't
1.65 - want to see the lines referred to, -e will switch it OFF.
1.66 -
1.67 -
1.68 - Quotes (-s and -p switches)
1.69 -
1.70 - Gutcheck always looks for unbalanced doublequotes in a
1.71 - paragraph. It is a common convention for writers not to
1.72 - close quotes in a paragraph if the next paragraph opens
1.73 - with quotes and is a continuation by the same speaker.
1.74 -
1.75 - Gutcheck therefore does not normally report unclosed quotes
1.76 - if the next paragraph begins with a quote. If you need
1.77 - to see all unclosed quotes, even where the next paragraph
1.78 - begins with a quote, you should use the -p switch.
1.79 -
1.80 - Singlequotes (') are a problem, since the same character
1.81 - is used for an apostrophe. I'm not sure that it is
1.82 - possible to get 100% accuracy on singlequotes checking,
1.83 - particularly since dialect, quite common in PG texts,
1.84 - upsets the normal rules so badly. Consider the sentence:
1.85 - 'Tis often said that a man's a man for a' that.
1.86 - As humans, we recognize that the apostrophes at the start and end
1.87 - are used for contractions rather than quotes, but it isn't easy
1.88 - to get a program to recognize that.
1.89 -
1.90 - Since Gutcheck makes too many mistakes when trying to match
1.91 - singlequotes, it doesn't look for unbalanced singlequotes
1.92 - unless you specify the -s switch.
1.93 -
1.94 - Consider these sentences, which illustrate the main cases:
1.95 -
1.96 - 'Tis often said that a fool and his money are soon parted.
1.97 -
1.98 - 'Becky's goin' home,' said Tom.
1.99 -
1.100 - The dogs' tails wagged in unison.
1.101 -
1.102 - Those 'pack dogs' of yours look more like wolves.
1.103 -
1.104 -
1.105 -
1.106 - Typos (-t switch)
1.107 -
1.108 - It's not Gutcheck's job to be a spelling checker, but it
1.109 - does check for a list of common typos and OCR errors if you
1.110 - use the -t switch. (The -x switch turns typo checking OFF.)
1.111 -
1.112 - It also checks for character combinations, especially involving
1.113 - h and b, which are often confused by OCR, that rarely or never
1.114 - occur. For example, it queries "tbe" in a word. Now, "the" often
1.115 - occurs, but "tbe" is very rare (heartbeat, hotbed), so I'm
1.116 - playing the odds - a few false positives for many errors found.
1.117 - Similarly with "ii", which is a very common OCR error.
1.118 -
1.119 - Gutcheck suppresses multiple reporting of the first 40 "typos"
1.120 - found. This is to remove the annoyance of seeing something like
1.121 - "FN" (footnote) or "LK" (initials) flagged as a typo 147 times
1.122 - in a text.
1.123 -
1.124 -
1.125 - Line-end checking (-l switch to disable)
1.126 -
1.127 - All PG texts should have a Carriage Return (CR - character 13)
1.128 - and a Line Feed (LF - character 10) at end of each line,
1.129 - regardless of what O/S you made them on. DOS/Windows, Unix
1.130 - and Mac have different conventions, but the final text should
1.131 - always use a CR/LF pair as its line terminator.
1.132 -
1.133 - By default, Gutcheck verifies that every line does have
1.134 - the correct terminator, but if you're on a work-in-progress
1.135 - in Linux, you might want to convert the line-ends as a final
1.136 - step, and not want to see thousands of errors every time you
1.137 - run Gutcheck before that final step, so you can turn off
1.138 - this checking with the -l switch.
1.139 -
1.140 -
1.141 - Paranoid mode (-x switch to disable: Trust No One :-)
1.142 -
1.143 - -x automatically switches OFF typo-checking (the -t flag)
1.144 - and some extra checks, like the standalone 1 and 0 queries.
1.145 -
1.146 -
1.147 - Overview mode (-o switch)
1.148 -
1.149 - This mode just gives a count of queries found
1.150 - instead of a detailed list.
1.151 -
1.152 -
1.153 - Header quote (-h switch)
1.154 -
1.155 - If you use the -h switch, gutcheck will also display
1.156 - the Title, Author, Release and Edition fields from the
1.157 - PG header. This is useful mostly for the automated
1.158 - checks we do on recently-posted texts.
1.159 -
1.160 -
1.161 - Errors to stdout (-y switch)
1.162 -
1.163 - If you're just running gutcheck normally, you can ignore
1.164 - this. It's only there for programs that provide a front
1.165 - end to gutcheck. It makes error messages appear within
1.166 - the output of gutcheck so that the front end knows whether
1.167 - gutcheck ran OK.
1.168 -
1.169 -
1.170 - Verbose reporting (-v switch)
1.171 -
1.172 - Normally, if gutcheck sees lots of long lines, short lines,
1.173 - spaced dashes, non-ASCII characters or dot-commas ".," it
1.174 - assumes these are features of the text, counts and summarizes
1.175 - them at the top of its report, but does not list them
1.176 - individually. If the -v switch is on, gutcheck will list them all.
1.177 -
1.178 -
1.179 - Markup interpretation (-m switch)
1.180 -
1.181 - Normally, gutcheck flags anything it suspects of being HTML
1.182 - markup as a possible error. When you use the -m switch,
1.183 - however, it matches anything that looks like markup against
1.184 - a short list of common HTML tags and entities. If the markup
1.185 - is in that list, it either ignores the markup, in the case
1.186 - of a tag, or "interprets" the markup as its nearest ASCII
1.187 - equivalent, in the case of an entity. So, for example, using
1.188 - this switch, gutcheck will "see"
1.189 -
1.190 - &ldquo;He went <i>thataway!</i>&rdquo;
1.191 -
1.192 - as
1.193 -
1.194 - "He went thataway!"
1.195 -
1.196 - and report accordingly.
1.197 -
1.198 - This switch does not, not, NOT check the validity of HTML;
1.199 - it exists so that you can run gutcheck on most HTML texts
1.200 - for PG, and get sane results. It does not support all tags.
1.201 - It does not support all entities. When it sees a tag or entity
1.202 - it does not recognize, it will query it as HTML just as if
1.203 - you hadn't specified the -m switch.
1.204 -
1.205 - Gutcheck 0.99 will automatically switch on markup interpretation
1.206 - if it sees a lot of tags that appear to be markup, so mostly, you
1.207 - won't have to specify this.
1.208 -
1.209 - User-defined typos (-u switch)
1.210 -
1.211 - If you have a file named gutcheck.typ either in your current
1.212 - working directory or in the directory from which you explicitly
1.213 - invoked gutcheck, but not necessarily on your path, and if you
1.214 - specify the -u switch, gutcheck will query any word specified
1.215 - in that file. The file is simple: one word, in lower case, per
1.216 - line. 999 lines are allowed for. Be careful not to put multiple
1.217 - words onto a line, or leave any rubbish other than the word on
1.218 - the line. You should have received a sample file gutcheck.typ
1.219 - with this package.
1.220 -
1.221 - Ignore DP markup (-d switch)
1.222 -
1.223 - Distributed Proofreaders (http://www.pgdp.net) is currently
1.224 - (2005) the main source of PG texts, and proofers there use
1.225 - special conventions. This switch understands those conventions,
1.226 - so that people can use gutcheck on files in progress that
1.227 - haven't had the special conventions removed yet. The special
1.228 - conventions supported in 0.99 are page-separators and
1.229 - "<sc>", "</sc>", "/*", "*/", "/#", "#/", "/$", "$/".
1.230 -
1.231 -
1.232 -You will probably only run gutcheck on a text once or maybe twice,
1.233 -just prior to uploading. It usually finds a few formatting problems;
1.234 -it also usually raises queries that aren't problems at all - it often
1.235 -questions Tables of Contents for having short lines, for example.
1.236 -These are called "false positives", and need a human to decide on
1.237 -them.
1.238 -
1.239 -The text should be standard prose, and already close to PG normal
1.240 -format (plain text, about 70 characters per line with blank lines
1.241 -between paragraphs).
1.242 -
1.243 -Gutcheck merely draws your attention to things that might be errors.
1.244 -It is NOT a substitute for human judgement. Formatting choices like
1.245 -short lines may be for a reason that this program can't understand.
1.246 -
1.247 -Even the most careful human proofing can leave errors behind in a
1.248 -text, and there are several automated checks you can do to help find
1.249 -them. Of these, spellchecking (with _very_ careful human judgement) is
1.250 -the most important and most useful.
1.251 -
1.252 -Gutcheck does perform some basic typo-checking if you ask it to,
1.253 -but its focus is on formatting errors specific to PG texts -
1.254 -mismatched quotes, non-ASCII characters, bad spacing, bad line
1.255 -length, HTML tags perhaps left from a conversion, unbalanced
1.256 -brackets.
1.257 -
1.258 -Suggestions for additional checks would be appreciated and duly
1.259 -considered, but no guarantees that they will be implemented.
1.260 -
1.261 -
1.262 -
1.263 -
1.264 - How do _I_ use it?
1.265 -
1.266 -Practically everyone I give gutcheck to asks me how _I_ use it.
1.267 -Well, when I get a text for posting, say filename.txt, I run
1.268 -
1.269 - gutcheck -o filename.txt
1.270 -
1.271 -That gives me a quick idea what I'm dealing with. It'll tell
1.272 -me what kind of problems gutcheck sees, and give me an idea
1.273 -of how much more work needs to be done on the text. Keep in
1.274 -mind that gutcheck doesn't do anything like a full spellcheck,
1.275 -but when I see a text that has a lot of problems, I assume that
1.276 -it probably needs a spellcheck too.
1.277 -
1.278 -Having got a feel for the ballpark, I run
1.279 -
1.280 - gutcheck filename.txt > jj
1.281 -
1.282 -where jj is my personal, all-purpose filename for temporary data
1.283 -that doesn't need to be kept. Then I open filename.txt and jj in
1.284 -a split-screen view in my editor, and work down the text, fixing
1.285 -whatever needs fixing, and skipping whatever doesn't. If your
1.286 -editor doesn't split-screen, you can get much the same effect by
1.287 -opening your original file in your normal editor, and jj (or your
1.288 -equivalent name) in something like Notepad, keeping both in view
1.289 -at the same time.
1.290 -
1.291 -Twice a day, an automatic process looks at all recently-posted
1.292 -texts, and emails Michael, me, and sometimes other people with
1.293 -their gutcheck summaries.
1.294 -
1.295 -
1.296 -
1.297 - Future development of gutcheck
1.298 -
1.299 -Gutcheck has gone about as far as it can, given its current
1.300 -structure. In order to add better singlequotes checking,
1.301 -sentence checking, better he/be checking and other good stuff
1.302 -that I'd like to see, I'll have to rewrite it from a different
1.303 -angle - looking at the syntax instead of the lines. And I'll
1.304 -probably get around to that sooner or later.
1.305 -
1.306 -Meantime, I'm just trying to get this version stabilized, so
1.307 -please report any bugs you find. When it is stable, I'll run
1.308 -up a Windows port for those timid souls who can't look a
1.309 -command line in the eye. :-)
1.310 -
1.311 -And I've started work on gutspell, a companion to gutcheck
1.312 -which will concentrate on spelling problems. PG spelling
1.313 -problems are unusual, since the range of texts we cover is
1.314 -so wide, and I'll be taking a somewhat unorthodox approach
1.315 -to writing this spelling-checker _specifically_ for texts
1.316 -containing a lot of dialect and uncommon words that have
1.317 -probably already been spell-checked against a standard
1.318 -modern dictionary.
1.319 -
1.320 -
1.321 -
1.322 -
1.323 -Explanations of common gutcheck messages:
1.324 -
1.325 - --> 74 lines in this file have white space at end
1.326 -
1.327 - PG texts shouldn't have extra white space added at end of line.
1.328 - Don't worry too much about this; they're not doing any harm,
1.329 - and they'll be removed during posting anyway.
1.330 -
1.331 -
1.332 - --> 348 lines in this file are short. Not reporting short lines.
1.333 - --> 84 lines in this file are long. Not reporting long lines.
1.334 - --> 8 lines in this file are VERY long!
1.335 -
1.336 - If there are a lot of long or short lines, Gutcheck won't list
1.337 - them individually. The short lines version of this message
1.338 - is commonly seen when gutchecking poetry and some plays, where
1.339 - the normal line length is shorter than the standard for prose.
1.340 - A "VERY long" line is one over 80 characters. You normally
1.341 - shouldn't have any of these, but sometimes you may have to render
1.342 - a table that must be that long, or some special preformatted
1.343 - quotation that can't be broken.
1.344 -
1.345 -
1.346 - --> There are 75 spaced dashes and em-dashes in this file. Not reporting them.
1.347 -
1.348 - The PG standard for an emdash--like these--is two minus signs
1.349 - with no spaces before or after them. However, some older texts
1.350 - used spaced dashes - like these -- and if there are very many
1.351 - such spaced dashes in the file, gutcheck just draws your
1.352 - attention to it and doesn't list them individually.
1.353 -
1.354 -
1.355 -
1.356 - Line 3020 - Non-ASCII character 233
1.357 -
1.358 - Standard PG texts should use only ASCII characters with values
1.359 - up to 127; however, non-English, accented characters can be
1.360 - represented according to several different non-ASCII encoding
1.361 - schemes, using values over 127. If you have a plain English text
1.362 - with a few accented characters in words like cafe or tete-a-tete,
1.363 - you should replace the accented characters with their unaccented
1.364 - versions. The English pound sign is another commonly-seen
1.365 - non-ASCII character. If you have enough non-ASCII characters in
1.366 - your text that you feel removing them would degrade your text
1.367 - unacceptably, you should probably consider doing an 8-bit text
1.368 - as well as a plain-ASCII version.
1.369 -
1.370 -
1.371 -
1.372 - Line 1207 - Non-ISO-8859 character 156
1.373 -
1.374 - Even in "8-bit" texts, there are distinctions between code sets.
1.375 - The ISO-8859 family of 8-bit code sets is the most commonly used
1.376 - in PG, and these sets do not define values in the range 128 through
1.377 - 159 as printable characters. It's quite common for someone on a
1.378 - Windows or Mac machine to use a non-ISO character inadvertently,
1.379 - so this message warns that the character is not only not ASCII,
1.380 - but also outside the ISO-8859 range.
1.381 -
1.382 -
1.383 -
1.384 - Line 46 - Tab character?
1.385 -
1.386 - Some editors and word processors will put in Tab characters
1.387 - (character 9) to indicate indented text. You should not use these
1.388 - in a PG text, because you can't be sure how they will appear on a
1.389 - reader's screen. Find the Tab, and replace it with the appropriate
1.390 - number of spaces.
1.391 -
1.392 -
1.393 - Line 1327 - Tilde character?
1.394 -
1.395 - The tilde character (~) might be legitimately used, but it's the
1.396 - character commonly used by OCR software to indicate a place where
1.397 - it couldn't make out the letter, so gutcheck flags it.
1.398 -
1.399 -
1.400 -
1.401 - Line 1347 - Asterisk?
1.402 -
1.403 - Asterisks are reported only in paranoid mode (see -x).
1.404 - Like tildes, they are often used to indicate errors, but they are
1.405 - also legitimately used as line delimiters and footnote markers.
1.406 -
1.407 -
1.408 -
1.409 - Line 1451 - Long line 129
1.410 -
1.411 - PG texts should have lines shorter than 76 characters. There may be
1.412 - occasions where you decide that you really have to go out to 79
1.413 - characters, but the sample above says that line 1451 is 129
1.414 - characters long - probably two lines run together.
1.415 -
1.416 -
1.417 -
1.418 - Line 1590 - Short line?
1.419 -
1.420 - PG texts should have lines longer than 54 characters. However,
1.421 - there are special cases like poetry and tables of contents where
1.422 - the lines _should_ be shorter. So treat Gutcheck warnings about
1.423 - short lines carefully. Sometimes it's a genuine formatting
1.424 - problem; sometimes the line really needs to be short.
1.425 -
1.426 - Hint: gutcheck will not flag lines as short if they are indented
1.427 - - if they start with a space. I like to start inserted stanzas
1.428 - and other such items indented with a couple of spaces so that
1.429 - they stand out from the main text anyway.
1.430 -
1.431 -
1.432 -
1.433 - Line 1804 - Begins with punctuation?
1.434 -
1.435 - Lines should normally not begin with commas, periods and so on.
1.436 - An exception is ellipses . . . which can happen at start of line.
1.437 -
1.438 -
1.439 -
1.440 - Line 1850 - Spaced em-dash?
1.441 -
1.442 - The PG standard for an em-dash--like these--is two minus signs
1.443 - with no spaces before or after them. Gutcheck flags non-PG
1.444 - em-dashes - like this one. Normally, you will replace it with a
1.445 - PG-standard em-dash.
1.446 -
1.447 -
1.448 -
1.449 - Line 1904 - Query he/be error?
1.450 -
1.451 - Gutcheck makes a very minor effort to look for that scourge of all
1.452 - proofreaders, "be" replacing "he" or vice-versa, and draws your
1.453 - attention to it when it thinks it has found one.
1.454 -
1.455 -
1.456 -
1.457 - Line 2017 - Query digit in a1most
1.458 -
1.459 - The digit 1 is commonly OCRed for the letter l, the digit 0 for
1.460 - the letter O, and so on. When gutcheck sees a mix of digits and
1.461 - letters, it warns you. It may generate a false positive for
1.462 - something like 7am.
1.463 -
1.464 -
1.465 -
1.466 - Line 2083 - Query standalone 0
1.467 -
1.468 - In paranoid mode (see -x) only, gutcheck warns about the digit 0
1.469 - and the number 1 standing alone as a word. This can happen if the
1.470 - OCR misreads the words O or I.
1.471 -
1.472 -
1.473 -
1.474 - Line 2115 - Query word whetber
1.475 -
1.476 - If you have switched typo-checking on, gutcheck looks for
1.477 - potential typos, especially common h/b errors. It's not
1.478 - infallible; it sometimes queries legit words, but it's
1.479 - always worth taking a look.
1.480 -
1.481 -
1.482 -
1.483 - Line 2190 column 14 - Missing space?
1.484 -
1.485 - Omitting a space is a very common error,especially coming from
1.486 - OCRed text,and can be hard for a human to spot. The commas in
1.487 - the previous sentence illustrate the kind of thing I mean.
1.488 -
1.489 -
1.490 -
1.491 - Line 2240 column 48 - Spaced punctuation?
1.492 -
1.493 - The flip side of the "missing space" error , here , is when extra
1.494 - spaces are added before punctuation . Some old texts appear to add
1.495 - extra spaces around punctuation consistently, but this was a
1.496 - typographical convention rather than the author's intent, and the
1.497 - extra "spaces" should be removed when preparing a PG text.
1.498 -
1.499 -
1.500 -
1.501 - Line 2301 column 19 - Unspaced quotes?
1.502 -
1.503 - Another common spacing problem occurs in a phrase like "You wait
1.504 - there,"he said.
1.505 -
1.506 -
1.507 -
1.508 - Line 2385 column 27 - Wrongspaced quotes?
1.509 -
1.510 - As of version 0.98, gutcheck adds extra checks on whether a quote
1.511 - seems to be a start or end quote, and queries those that appear to
1.512 - be misplaced. This does give rise to false positives when quotes are
1.513 - nested, for example:
1.514 -
1.515 - "And how," she asked, "will your "friends" help you now?"
1.516 -
1.517 - but these false positives are worth it because of the many cases
1.518 - that this test catches, notably those like:
1.519 -
1.520 - "And how, "she said," will your friends help you now?"
1.521 -
1.522 - Sometimes a "wrongspaced quotes" query will arise because an earlier
1.523 - quote in the paragraph was omitted, so if the place specified seems
1.524 - to be OK, look back to see whether there's a problem in the preceding
1.525 - lines.
1.526 -
1.527 -
1.528 -
1.529 - Line 2400 - HTML Tag? <PRE>
1.530 -
1.531 - Some PG texts have been converted from HTML, and not all of the
1.532 - HTML tags have been removed.
1.533 -
1.534 -
1.535 -
1.536 - Line 2402 - HTML symbol? &emdash;
1.537 -
1.538 - Similarly, special HTML symbol characters can survive into PG
1.539 - texts. This check can occasionally produce amusing false positives like
1.540 - . . . Marwick & Co were well known for it;
1.541 -
1.542 -
1.543 -
1.544 - Line 2540 - Mismatched quotes
1.545 -
1.546 - Another gutcheck mainstay - unclosed doublequotes in a paragraph.
1.547 - See the discussion of quotes in the switches section near the
1.548 - start of this file.
1.549 -
1.550 - Since the mismatch doesn't occur on any one line, gutcheck quotes
1.551 - the line number of the first blank line following the paragraph,
1.552 - since this is the point where it reconciles the count of quotes.
1.553 - However, if gutcheck is echoing lines, that is, you haven't used
1.554 - the -e switch, it will show the _first_ line of the paragraph,
1.555 - to help you find the place without using line numbers. The
1.556 - offending paragraph is therefore between the quoted line and
1.557 - the line number given.
1.558 -
1.559 -
1.560 -
1.561 - Line 2587 - Mismatched single quotes
1.562 -
1.563 - Only checked with the -s switch, since checking single quotes is
1.564 - not a very reliable process. Otherwise, the same logic as for
1.565 - doublequotes applies.
1.566 -
1.567 -
1.568 -
1.569 - Line 2877 - Mismatched round brackets?
1.570 -
1.571 - Also curly and square brackets. Texts with a lot of brackets, like
1.572 - plays with bracketed stage instructions, may have mismatches.
1.573 -
1.574 -
1.575 - Line 3150 - No CR?
1.576 - Line 3204 - Two successive CRs?
1.577 - Line 3281 position 75 - CR without LF?
1.578 -
1.579 - These are the invalid line-end warnings. See the discussion of
1.580 - line-end checking in the switches section near the start of this
1.581 - file. If you see these, and your editor doesn't show anything
1.582 - wrong, you should probably try deleting the characters just before
1.583 - and after the line end, and the line-end itself, then retyping the
1.584 - characters and the line-end.
1.585 -
1.586 -
1.587 - Line 2940 - Paragraph starts with lower-case
1.588 -
1.589 - A common error in an e-text is for an extra blank line
1.590 -
1.591 - to be put in, like the blank line above, and this often
1.592 - shows up as a new paragraph beginning with lower case.
1.593 - Sometimes the blank line is deliberate, as when a
1.594 - quotation is inserted in a speech. Use your judgement.
1.595 -
1.596 -
1.597 - Line 2987 - Extra period?
1.598 -
1.599 - An extra period. is a. common problem in OCRed text. and usually
1.600 - arises when a speck of dust on the page is mistaken for a period.
1.601 - or. as occasionally happens. when a comma loses its tail.
1.602 -
1.603 -
1.604 - Line 3012 column 12 - Double punctuation?
1.605 -
1.606 - Double punctuation., like that,, is a common typo and
1.607 - scanno. Some books have much legit double punctuation,
1.608 - like etc., etc., but it's worth checking anyway.
1.609 -
1.610 -
1.611 -
1.612 - * * * *
1.613 -
1.614 -For Windows-only users who are unfamiliar with DOS:
1.615 -
1.616 - If you're a Windows-only user, you need to save
1.617 - gutcheck.exe into the folder (directory) where the
1.618 - text file you want to check is. Let's say your
1.619 - text file is in C:\GUT, then you should save
1.620 - GUTCHECK.EXE into C:\GUT.
1.621 -
1.622 - Now get to a DOS prompt. You can do this by
1.623 - selecting the "Command Prompt" or "MS-DOS Prompt"
1.624 - option that will be somewhere on your
1.625 - Start/Programs menu.
1.626 -
1.627 - Now get into the C:\GUT directory.
1.628 - You can do this using the CD (change directory)
1.629 - command, like this:
1.630 - CD \GUT
1.631 - and your prompt will change to
1.632 - C:\GUT>
1.633 - so you know you're in the right place.
1.634 -
1.635 - Now type
1.636 - gutcheck yourfile.txt
1.637 - and you'll see gutcheck's report.
1.638 -
1.639 - By default, gutcheck prints its queries to screen.
1.640 - If you want to create a file of them, to edit
1.641 - against the text, you can use the greater-than
1.642 - sign (>) to tell it to output the report to a
1.643 - file. For example, if you want its report in a
1.644 - file called QUERIES.LST, you could type
1.645 -
1.646 - gutcheck yourfile.txt > queries.lst
1.647 -
1.648 - The queries.lst file will then contain the listing
1.649 - of possible formatting errors, and you can
1.650 - edit it alongside your text.
1.651 -
1.652 - Whatever you do, DON'T make the filename after
1.653 - the greater-than sign the name of a file already
1.654 - on your disk that you want to keep, because
1.655 - the greater-than sign will cause gutcheck to
1.656 - replace any existing file of that name.
1.657 -
1.658 - So, for example, if you have two Tolstoy files
1.659 - that you want to check, called WARPEACE.TXT and
1.660 - ANNAK.TXT, make sure that neither of these names
1.661 - is ever used following the greater-than sign.
1.662 - To check these correctly, you might do:
1.663 -
1.664 - gutcheck warpeace.txt > war.lst
1.665 -
1.666 - and
1.667 -
1.668 - gutcheck annak.txt > annak.lst
1.669 -
1.670 - separately. Then you can look at war.lst and annak.lst
1.671 - to see the gutcheck reports.
1.672 -
1.673 - * * * *
1.674 -
1.675 -
1.676 -For existing 0.98 users upgrading to 0.99:
1.677 -
1.678 - If you run on old 16-bit DOS or Windows 3.x, I'm afraid
1.679 - you're out of luck. I'm not saying it _can't_ be compiled
1.680 - to run on 16-bit, but the executable with the package is
1.681 - for Win32 only. *nix users won't notice the change at all.
1.682 -
1.683 -
1.684 - There are two new switches: -u and -d.
1.685 - See above for full rundown.
1.686 -
1.687 -
1.688 -Here's a list of the new errors:
1.689 -
1.690 - Line 1456 - Carat character?
1.691 -
1.692 - I^ve found a few.
1.693 -
1.694 -
1.695 - Line 1821 - Forward slash?
1.696 -
1.697 - Common error for italicized "I", or so /'ve found.
1.698 -
1.699 -
1.700 - Line 2139 - Query missing paragraph break?
1.701 -
1.702 - "Come here, son." "Do I _have_ to go, dad?"
1.703 - Like that. False positives in some texts. Sorry 'bout that,
1.704 - but these are often errors.
1.705 -
1.706 -
1.707 - Line 2200 - Query had/bad error?
1.708 -
1.709 - Clear enough. Doesn't catch as many as I'd like it to,
1.710 - but rarely gives false alarms.
1.711 -
1.712 -
1.713 - Line 2268 - Query punctuation after the?
1.714 -
1.715 - Some words, like "the", very rarely have punctuation
1.716 - following them. Others, like "Mrs", usually have a
1.717 - period, but never a comma. Occasional false positives.
1.718 -
1.719 -
1.720 - Line 2380 - Query possible scanno arid
1.721 -
1.722 - It found one of your user-defined typos when you
1.723 - used the -u switch.
1.724 -
1.725 -
1.726 - Line 2511 - Capital "S"?
1.727 -
1.728 - Surprisingly common specific case, like: Jane'S
1.729 -
1.730 -
1.731 - Line 3469 - endquote missing punctuation?
1.732 -
1.733 - OK. This one can really cause a lot of false positives
1.734 - in some books, but it switches itself off if it finds
1.735 - more than 20 in a text, unless you force it to list them
1.736 - all with the -v switch.
1.737 - "Hey, dad" Johnny said, "can we go now?"
1.738 - is a common punctuation-missing error.
1.739 -
1.740 -
1.741 - Line 4266 - Mismatched underscores?
1.742 -
1.743 - Like mismatched anything else!
1.744 -
1.745 -
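The doublequote rule described in the Quotes and "Mismatched quotes" sections of the removed documentation can be illustrated with a minimal Python sketch. This is not gutcheck's actual C code; the function name `find_mismatched_quotes`, the `strict` parameter (standing in for the -p switch), and the behaviour at end of file are assumptions made for this example only.

```python
def find_mismatched_quotes(lines, strict=False):
    """Return 1-based line numbers for paragraphs with unbalanced doublequotes.

    The number reported is that of the first blank line after the paragraph,
    as the documentation describes. If the text ends without a blank line,
    this sketch reports one past the last line (an assumption).
    """
    # Split the text into paragraphs, remembering where each one ends.
    paragraphs = []  # list of (paragraph_text, report_line)
    current = []
    for number, line in enumerate(lines, start=1):
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append((" ".join(current), number))
            current = []
    if current:
        paragraphs.append((" ".join(current), len(lines) + 1))

    queries = []
    for index, (text, report_line) in enumerate(paragraphs):
        if text.count('"') % 2 == 0:
            continue  # quotes are balanced in this paragraph
        # Convention: an unclosed quote is tolerated when the next
        # paragraph opens with a quote (same speaker continuing).
        next_opens_with_quote = (
            index + 1 < len(paragraphs)
            and paragraphs[index + 1][0].lstrip().startswith('"')
        )
        if next_opens_with_quote and not strict:
            continue
        queries.append(report_line)
    return queries


sample = [
    '"I will not go,',
    'said he.',
    '',
    '"Speech continues here.',
    '',
    '"And ends here."',
    '',
    'He said "oops.',
    '',
]
print(find_mismatched_quotes(sample))               # [9]
print(find_mismatched_quotes(sample, strict=True))  # [3, 5, 9]
```

In the default mode only the last paragraph is queried, because the first two unclosed quotes are each followed by a paragraph that opens with a quote. With `strict=True`, mirroring what the documentation says about -p, all three unclosed paragraphs are reported.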