doc/bookloupe.txt
author ali <ali@juiblex.co.uk>
Wed Oct 02 09:14:33 2013 +0100 (2013-10-02)
changeset 105 2d48e8cdda24
parent 92 7a62c77a0dbe
permissions -rw-r--r--
Fix bug #19: Update documentation for 2.1


                           Bookloupe documentation


bookloupe: lists possible common formatting errors in a Project
Gutenberg candidate file. Bookloupe is based on gutcheck, written
by Jim Tinsley. It is a command line program and can be used under
Microsoft Windows, Mac or Unix. For Windows-only people, there is
an appendix at the end with brief instructions for running it.

Current version: 2.1

This software is Copyright Jim Tinsley 2000-2005 and
J. Ali Harlow 2012 onwards.

Bookloupe comes with ABSOLUTELY NO WARRANTY. For details, read the file COPYING.
This is Free Software; you may redistribute it under certain conditions (GPL).

See http://www.juiblex.co.uk/pgdp/bookloupe/ for the latest version.


                         Recent changes in behaviour

Each new version of bookloupe brings bug fixes and improvements. Sometimes
the behaviour also changes in ways that might be unexpected:

Odd characters

    The check for "odd" characters (tab, tilde, caret, forward slash and
    asterisk) was disabled in bookloupe 2.0 when the character set was
    switched from ASCII/ISO-8859-1 to UNICODE (i.e., when the "There are a
    lot of foreign letters here." message was printed). As of bookloupe 2.1
    these tests operate independently of the character set selected.

    Users may notice this change especially in the case of the
    DP-specific /* ... */ markup. Bookloupe 2.0 often did not warn when
    this markup was encountered, even when the --dp switch was not given.
    Bookloupe 2.1 will warn about this markup unless DP-specific mode is
    switched on, paranoid mode is switched off, or the ebook contains more
    than 10 lines containing asterisks. In the last case

      --> 11 lines in this file contain asterisks. Not reporting them.

    will be printed.



Usage is: bookloupe [OPTION...] filename

Options:
      -d, --dp                  ignores some DP-specific markup
      -e, --no-echo             switches off Echoing of lines
      -s, --squote              checks Single quotes
      --typo                    checks Typos
      -p, --qpara               sets strict quotes checking for Paragraphs
      --no-paranoid             switches OFF typo checking and extra checks
      -l, --no-line-end         turns off Line-end checks
      -o, --overview            produces an Overview only
      -y, --stdout              sets error messages to stdout
      -h, --header              echoes the header fields
      -m, --markup              ignores some common HTML markup
      -u, --usertypo            warns about words in a user-defined typo file
      -v, --verbose             forces individual reporting of minor problems
      -w, --web                 special mode for web uploads (for future use)
      --charset=NAME            the set of characters valid for this ebook
      --dump-config             dumps the current configuration

There are also inverted options, which are useful when you want to
override an option set in the configuration file:

      --no-dp, --echo, --no-squote, --no-typo, --no-qpara, --paranoid,
      --line-end, --no-overview, --no-stdout, --no-header, --no-markup,
      --no-usertypo, --no-verbose.

Note: there is no --no-web since --web simply selects a set of options.

Finally, there are two options that toggle the state of other options
rather than setting or unsetting them: -t (for typo) and -x (for typo
and paranoid). These are mainly intended for compatibility with gutcheck.

Running bookloupe without any parameters will display a brief help message.

Sample usage:

    bookloupe warpeace.txt


More detail:

    Configuration file

      Bookloupe will look for a file named bookloupe.ini to read as
      a configuration file. Options set in a configuration file can
      be overridden from the command line as required.

      The following directories are searched in order:

        1) The current working directory. When run from the command
           line, this is the directory you ran it from. When run from
           guiguts, it will normally be the directory that contains the
           guiguts program.

        2) The directory containing the bookloupe program.

        3) The user's configuration directory. Under MS-Windows this
           is normally CSIDL_LOCAL_APPDATA, which is typically set to
           C:\Documents and Settings\<user>\Local Settings\Application Data.
           On other platforms this is normally $XDG_CONFIG_HOME which, if
           not set, defaults to $HOME/.config.

      The directories to search can also be changed using the
      $BOOKLOUPE_CONFIG_PATH environment variable, which is a
      colon-separated (semicolon-separated under MS-Windows) list of
      directories.
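
      The search order above can be sketched in Python as follows.
      This is only an illustration, not bookloupe's own code: the
      function names config_search_path() and find_config() are
      hypothetical, and the sketch assumes that BOOKLOUPE_CONFIG_PATH,
      when set, replaces the default list entirely, and that the
      LOCALAPPDATA environment variable stands in for
      CSIDL_LOCAL_APPDATA under MS-Windows.

```python
import os
import sys


def config_search_path(program_dir, environ=None, cwd=None):
    """Return the directories to search for bookloupe.ini, in order.

    Hypothetical sketch: assumes BOOKLOUPE_CONFIG_PATH, when set,
    replaces the default list rather than adding to it.
    """
    env = os.environ if environ is None else environ
    override = env.get("BOOKLOUPE_CONFIG_PATH")
    if override:
        # os.pathsep is ";" under MS-Windows and ":" elsewhere.
        return [d for d in override.split(os.pathsep) if d]
    dirs = [os.getcwd() if cwd is None else cwd, program_dir]
    if sys.platform == "win32":
        # Assumption: LOCALAPPDATA holds the CSIDL_LOCAL_APPDATA folder.
        user_dir = env.get("LOCALAPPDATA")
    else:
        # $XDG_CONFIG_HOME, defaulting to $HOME/.config when unset or empty.
        user_dir = env.get("XDG_CONFIG_HOME") or os.path.join(
            env.get("HOME", ""), ".config")
    if user_dir:
        dirs.append(user_dir)
    return dirs


def find_config(dirs, name="bookloupe.ini"):
    """Return the path of the first bookloupe.ini found, or None."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

      The first directory that actually contains bookloupe.ini wins,
      which is why find_config() stops at the first match.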

      The configuration file is a key file. This is very similar to,
      but not identical with, a typical ini file as found under
      MS-Windows. Key files consist of a number of groups, each of which
      starts with the group name enclosed in square brackets on a line
      by itself. Bookloupe recognises just one group, "options". Below
      the group name follow the keys and their values for that group,
      one per line, in the format key=value. Most of bookloupe's
      options are flags (i.e., either on or off). For these keys, the
      value must be either "true" or "false". The file may also contain
      comment lines, which begin with the # symbol. The names of the
      keys follow the long option names.
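
      As a rough illustration of that format, the following Python
      sketch reads such a file. It is a simplified stand-in, not GLib's
      key-file parser (the real format also has escapes, localized keys
      and other features ignored here); the function names are
      hypothetical, and values other than "true" and "false", such as a
      charset name, are simply kept as strings.

```python
def parse_key_file(text):
    """Parse key-file text into {group_name: {key: value}}.

    Simplified sketch: handles only groups, key=value lines,
    blank lines and # comments.
    """
    groups = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = groups.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
        else:
            # e.g. a key=value line before any [group] header
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return groups


def bookloupe_options(groups):
    """Return the "options" group, with true/false turned into booleans."""
    options = {}
    for key, value in groups.get("options", {}).items():
        if value in ("true", "false"):
            options[key] = value == "true"
        else:
            options[key] = value  # a non-flag option, e.g. charset=auto
    return options
```

      For example, a file containing "[options]", "dp=true" and
      "squote=false" on separate lines yields {"dp": True,
      "squote": False}.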

      A sample configuration file is provided (in sample.ini). The file
      will need to be copied to bookloupe.ini before bookloupe will
      read it. You can also use the --dump-config option to write a
      configuration file for you. For example, if you typically want
      to run bookloupe with the --dp and --squote options, then you
      might do:

        $ bookloupe --dp --squote --dump-config > configuration.ini
        $ ren configuration.ini bookloupe.ini

      (On Unix or Mac, use mv rather than ren. Don't be tempted to
      merge these two steps by redirecting straight into bookloupe.ini:
      the shell empties that file before bookloupe reads it, so
      bookloupe will see an empty configuration file and complain.)

      This same idea can also be used to modify an existing configuration.


    Character encoding

      Bookloupe will handle e-texts encoded in UTF-8 (preferred),
      ISO-8859-1 (also known as Latin-1), or WINDOWS-1252 (also known,
      incorrectly, as ANSI). The output will be in the same encoding
      as the input e-text.


    Character set (--charset)

      Character encodings have an implicit set of characters that
      can be encoded and thus define a set of characters that can
      be present in the text. However, it is sometimes desirable
      to allow only some of the characters that can be encoded to
      appear in a text. The set of characters that should be present
      is known as the character set.

      The default setting for the character set (called auto) does
      the same as gutcheck for Windows-1252 encoded texts, for
      compatibility:

      If the file is predominantly ASCII, then the set of legal
      characters is ASCII and warnings are issued whenever non-ASCII
      characters are encountered. The message will warn of either
      non-ASCII or non-ISO-8859-1 characters as appropriate.

      If the file contains a significant number of non-ASCII characters,
      then a message is printed as follows:

        --> There are a lot of foreign letters here. Not reporting them.

      and the character set is widened to include all possible
      characters.

      For UTF-8 encoded texts, auto selects UNICODE.

      Most character sets are simply defined in bookloupe as the
      set of all characters that can be encoded in the encoding of
      the same name. UNICODE is an exception and includes only the
      characters assigned in the relevant Unicode standard, excluding
      the Private Use Area characters. Note that the relevant Unicode
      standard is given by the version of glib in use rather than by
      any code in bookloupe, and thus can vary from system to system.
      PG texts, however, are likely to use only characters assigned
      in very early Unicode standards, which mitigates this issue.


    Echoing lines (--no-echo to switch off)

      You may find it convenient, when reviewing Bookloupe's
      suggestions, to see the line that Bookloupe is questioning.
      That way, you can often see at a glance whether it is
      a real error that needs to be fixed, or a false positive:
      something that should be in the text, but that Bookloupe's
      limited programming doesn't understand.

      By default, bookloupe echoes these lines, but if you don't
      want to see the lines referred to, --no-echo will switch it
      OFF.


    Quotes (--squote and --qpara switches)

      Bookloupe always looks for unbalanced double quotes in a
      paragraph. It is a common convention for writers not to
      close quotes in a paragraph if the next paragraph opens
      with quotes and is a continuation by the same speaker.

      Bookloupe therefore does not normally report unclosed quotes
      if the next paragraph begins with a quote. If you need
      to see all unclosed quotes, even where the next paragraph
      begins with a quote, you should use the -p (--qpara) switch.
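
      The paragraph rule just described can be sketched in Python as
      below. This is a simplified illustration, not bookloupe's own
      code: it splits paragraphs on blank lines, counts only the
      straight double quote character, and the function names
      split_paragraphs() and unbalanced_quote_paragraphs() are
      hypothetical. The qpara argument plays the role of -p.

```python
def split_paragraphs(text):
    """Split text into paragraphs separated by blank lines."""
    paragraphs, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            paragraphs.append("\n".join(current))
            current = []
    if current:
        paragraphs.append("\n".join(current))
    return paragraphs


def unbalanced_quote_paragraphs(text, qpara=False):
    """Return the 0-based indices of paragraphs with unbalanced quotes.

    A paragraph with an odd number of '"' characters is reported,
    unless qpara is False and the next paragraph begins with a quote
    (the "same speaker continues" convention).
    """
    paragraphs = split_paragraphs(text)
    reported = []
    for i, para in enumerate(paragraphs):
        if para.count('"') % 2 == 0:
            continue  # balanced
        next_opens = (i + 1 < len(paragraphs)
                      and paragraphs[i + 1].lstrip().startswith('"'))
        if next_opens and not qpara:
            continue  # probably continued speech
        reported.append(i)
    return reported
```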

      Single quotes (', `, ‘ and ’) are a problem, since the same
      character can be used for an apostrophe. I'm not sure that it
      is possible to get 100% accuracy on single quote checking,
      particularly since dialect, quite common in PG texts,
      upsets the normal rules so badly. Consider the sentence:

        'Tis often said that a man's a man for a' that.

      As humans, we recognize that all of these apostrophes are used
      for contractions rather than quotes, but it isn't easy
      to get a program to recognize that.

      Since bookloupe makes too many mistakes when trying to match
      single quotes, it doesn't look for unbalanced single quotes
      unless you specify the --squote switch.

      Consider these sentences, which illustrate the main cases:

        'Tis often said that a fool and his money are soon parted.

        'Becky's goin' home,' said Tom.

        The dogs' tails wagged in unison.

        Those 'pack dogs' of yours look more like wolves.


    Typos (--typo switch)

      It's not bookloupe's job to be a spelling checker, but it does
      check for a list of common typos and OCR errors if you use the
      --typo switch. (The -t and -x switches also toggle typo checking.)

      It also checks for character combinations that rarely or never
      occur in English, especially those involving h and b, which OCR
      often confuses. For example, it queries "tbe" in a word. Now,
      "the" often occurs, but "tbe" is very rare (heartbeat, hotbed),
      so I'm playing the odds: a few false positives for many errors
      found. Similarly with "ii", which is a very common OCR error.
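
      A sketch of this kind of check in Python might look like the
      following. SUSPECT_COMBINATIONS is a hypothetical, deliberately
      tiny sample, not bookloupe's real list. The sketch also skips
      words made up only of Roman-numeral letters, so that "xiii" is
      not queried; that is an assumption about how such a check
      should behave, not a description of bookloupe.

```python
import re

# Hypothetical, deliberately tiny sample of letter combinations that
# rarely occur in English words but are common OCR errors.
SUSPECT_COMBINATIONS = ("tbe", "wbich", "ii")

ROMAN_NUMERAL = re.compile(r"[ivxlcdm]+")


def query_odd_combinations(line):
    """Return the words in line that contain a suspect combination."""
    queries = []
    for word in re.findall(r"[A-Za-z]+", line):
        lower = word.lower()
        if not any(combo in lower for combo in SUSPECT_COMBINATIONS):
            continue
        if ROMAN_NUMERAL.fullmatch(lower):
            continue  # e.g. "ii" or "xiii" is probably a numeral
        queries.append(word)
    return queries
```

      Note that query_odd_combinations("The heartbeat of tbe town")
      queries "heartbeat" as well as "tbe": exactly the kind of false
      positive the odds described above are meant to tolerate.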

      Bookloupe reports each of the first 40 distinct "typos" it finds
      only once. This is to remove the annoyance of seeing something
      like "FN" (footnote) or "LK" (initials) flagged as a typo 147
      times in a text.


    Line-end checking (--no-line-end switch to disable)

      All PG texts should have a Carriage Return (CR - character 13)
      and a Line Feed (LF - character 10) at the end of each line,
      regardless of the operating system they were made on. DOS/Windows,
      Unix and Mac have different conventions, but the final text should
      always use a CR/LF pair as its line terminator.

      By default, bookloupe verifies that every line does have the
      correct terminator. However, if you're working on a text under
      Linux, you might want to convert the line-ends as a final step,
      and not want to see thousands of errors every time you run
      bookloupe before that final step, so you can turn off this
      checking with the --no-line-end switch.
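
      The line-end rule can be illustrated with a short Python sketch
      that scans the raw bytes of a file. It produces the three
      line-end messages explained later in this file, but it is not
      bookloupe's code, and the function name check_line_ends() is
      hypothetical.

```python
CR = 13  # carriage return
LF = 10  # line feed


def check_line_ends(data):
    """Return (line_number, message) pairs for bad line terminators.

    data is the raw bytes of the file; every line should end in CR LF.
    """
    problems = []
    line = 1
    for i, byte in enumerate(data):
        following = data[i + 1] if i + 1 < len(data) else None
        if byte == LF:
            if i == 0 or data[i - 1] != CR:
                problems.append((line, "No CR?"))
            line += 1
        elif byte == CR:
            if following == CR:
                problems.append((line, "Two successive CRs?"))
            elif following != LF:
                problems.append((line, "CR without LF?"))
    return problems
```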


    Paranoid mode (--no-paranoid switch to disable: Trust No One :-)

      --no-paranoid switches OFF some extra checks, like standalone
      1 and 0 queries.


    Overview mode (--overview switch)

      This mode just gives a count of queries found
      instead of a detailed list.


    Header quote (--header switch)

      If you use the --header switch, bookloupe will also display
      the Title, Author, Release and Edition fields from the
      PG header. This is useful mostly for the automated
      checks we do on recently-posted texts.


    Errors to stdout (--stdout switch)

      If you're just running bookloupe normally, you can ignore
      this. It's only there for programs that provide a front
      end to bookloupe. It makes error messages appear within
      the output of bookloupe so that the front end knows whether
      bookloupe ran OK.


    Verbose reporting (--verbose switch)

      Normally, if bookloupe sees lots of long lines, short lines,
      spaced dashes, non-ASCII characters or dot-commas ".," it
      assumes these are features of the text, counts and summarizes
      them at the top of its report, but does not list them
      individually. If the --verbose switch is on, bookloupe will list
      them all.


    Markup interpretation (--markup switch)

      Normally, bookloupe flags anything it suspects of being HTML
      markup as a possible error. When you use the --markup switch,
      however, it matches anything that looks like markup against
      a short list of common HTML tags and entities. If the markup
      is in that list, it either ignores the markup, in the case
      of a tag, or "interprets" the markup as its nearest ASCII
      equivalent, in the case of an entity. So, for example, using
      this switch, bookloupe will "see"

        &ldquo;He went <i>thataway!</i>&rdquo;

      as

        "He went thataway!"

      and report accordingly.

      This switch does not, not, NOT check the validity of HTML;
      it exists so that you can run bookloupe on most HTML texts
      for PG and get sane results. It does not support all tags.
      It does not support all entities. When it sees a tag or entity
      it does not recognize, it will query it as HTML just as if
      you hadn't specified the --markup switch.

      Bookloupe will automatically switch on markup interpretation
      if it sees a lot of tags that appear to be markup, so you
      usually won't have to specify this switch.
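
      A minimal Python sketch of this interpretation step follows. The
      tag and entity tables are hypothetical and much shorter than
      bookloupe's. Entities are mapped to ASCII in the PG style (for
      example, mdash becomes two hyphens), and anything not in the
      tables is left in place and returned, so that it can still be
      queried as possible HTML.

```python
import re

# Hypothetical, much-shortened tables; bookloupe's own lists differ.
KNOWN_TAGS = {"i", "b", "p", "br", "em", "strong"}
ENTITY_ASCII = {
    "ldquo": '"', "rdquo": '"', "lsquo": "'", "rsquo": "'",
    "amp": "&", "mdash": "--",
}

TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")
ENTITY_RE = re.compile(r"&([A-Za-z]+);")


def interpret_markup(line):
    """Return (text as seen by the checks, unrecognized markup)."""
    unknown = []

    def replace_tag(match):
        if match.group(1).lower() in KNOWN_TAGS:
            return ""  # known tag: ignore it
        unknown.append(match.group(0))
        return match.group(0)  # unknown tag: leave it to be queried

    def replace_entity(match):
        name = match.group(1)
        if name in ENTITY_ASCII:
            return ENTITY_ASCII[name]  # nearest ASCII equivalent
        unknown.append(match.group(0))
        return match.group(0)

    text = TAG_RE.sub(replace_tag, line)
    text = ENTITY_RE.sub(replace_entity, text)
    return text, unknown
```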


    User-defined typos (--usertypo switch)

      If you have a file named bookloupe.typ or gutcheck.typ either
      in your current working directory or in the directory from
      which you explicitly invoked bookloupe (not necessarily a
      directory on your path), and if you specify the --usertypo
      switch, bookloupe will query any word specified in that file.
      The file is simple: one word, in lower case, per line. Be
      careful not to put multiple words onto a line, or leave any
      rubbish other than the word on the line. You should have
      received a sample file bookloupe.typ with this package. The
      file may be encoded in UTF-8 (preferred), ISO-8859-1 (also
      known as Latin-1), or WINDOWS-1252 (also known, incorrectly,
      as ANSI).
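
      Reading such a file can be sketched in Python as shown here. This
      is an illustration rather than bookloupe's own code: the function
      names parse_user_typos() and load_user_typos() are hypothetical,
      the sketch tries UTF-8 first and falls back to Windows-1252
      (which decodes the printable ISO-8859-1 characters identically),
      and it rejects a line holding more than one word instead of
      guessing what was meant.

```python
def parse_user_typos(raw):
    """Return the set of words in a user typo file given as bytes."""
    try:
        # utf-8-sig also strips a byte-order mark, if present.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        text = raw.decode("cp1252", errors="replace")
    words = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 1:
            raise ValueError(f"line {lineno}: expected one word, got {line!r}")
        words.add(fields[0].lower())
    return words


def load_user_typos(path):
    """Read a typo file such as bookloupe.typ from disk."""
    with open(path, "rb") as f:
        return parse_user_typos(f.read())
```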


    Ignore DP markup (--dp switch)

      Distributed Proofreaders (http://www.pgdp.net) has for some
      time been the main source of PG texts, and proofers there use
      special conventions. This switch understands those conventions,
      so that people can use bookloupe on files still in progress that
      haven't yet had the special conventions removed. The special
      conventions supported are page-separators and
      "<sc>", "</sc>", "/*", "*/", "/#", "#/", "/$", "$/".


    Dump the current configuration (--dump-config switch)

      The --dump-config switch can be used to dump the current
      configuration. This is a combination of the internal defaults,
      the configuration file (if any) and the command line options.
      If a configuration file is present, any comments found in that
      file will be preserved in the dumped configuration. If there
      is no configuration file, then a default set of comments to
      go with the internal default configuration is generated.


You will probably only run bookloupe on a text once or maybe twice,
just prior to uploading; it usually finds a few formatting problems;
it also usually finds queries that aren't problems at all. It often
questions Tables of Contents for having short lines, for example.
These are called "false positives," and need a human to decide on
them.

The text should be standard prose, and already close to PG normal
format (plain text, about 70 characters per line with blank lines
between paragraphs).

Bookloupe merely draws your attention to things that might be errors.
It is NOT a substitute for human judgement. Formatting choices like
short lines may be for a reason that this program can't understand.

Even the most careful human proofing can leave errors behind in a
text, and there are several automated checks you can do to help find
them. Of these, spellchecking (with _very_ careful human judgement) is
the most important and most useful.

Bookloupe does perform some basic typo-checking if you ask it to,
but its focus is on formatting errors specific to PG texts:
mismatched quotes, non-ASCII characters, bad spacing, bad line
length, HTML tags perhaps left from a conversion, unbalanced
brackets.

Suggestions for additional checks would be appreciated and duly
considered, but there are no guarantees that they will be implemented.




        How does Jim Tinsley use gutcheck?

Practically everyone I give gutcheck to asks me how _I_ use it.
Well, when I get a text for posting, say filename.txt, I run

    gutcheck -o filename.txt

That gives me a quick idea what I'm dealing with. It'll tell
me what kind of problems gutcheck sees, and give me an idea
of how much more work needs to be done on the text. Keep in
mind that gutcheck doesn't do anything like a full spellcheck,
but when I see a text that has a lot of problems, I assume that
it probably needs a spellcheck too.

Having got a feel for the ballpark, I run

    gutcheck filename.txt > jj

where jj is my personal, all-purpose filename for temporary data
that doesn't need to be kept. Then I open filename.txt and jj in
a split-screen view in my editor, and work down the text, fixing
whatever needs fixing, and skipping whatever doesn't. If your
editor doesn't split-screen, you can get much the same effect by
opening your original file in your normal editor, and jj (or your
equivalent name) in something like Notepad, keeping both in view
at the same time.

Twice a day, an automatic process looks at all recently-posted
texts, and emails Michael, me, and sometimes other people with
their gutcheck summaries.



Explanations of common bookloupe messages:

    --> 74 lines in this file have white space at end

    PG texts shouldn't have extra white space added at the end of
    lines. Don't worry too much about this; it isn't doing any harm,
    and it will be removed during posting anyway.


    --> 348 lines in this file are short. Not reporting short lines.
    --> 84 lines in this file are long. Not reporting long lines.
    --> 8 lines in this file are VERY long!

    If there are a lot of long or short lines, bookloupe won't list
    them individually. The short lines version of this message
    is commonly seen when checking poetry and some plays, where
    the normal line length is shorter than the standard for prose.
    A "VERY long" line is one over 80 characters. You normally
    shouldn't have any of these, but sometimes you may have to render
    a table that must be that long, or some special preformatted
    quotation that can't be broken.


    --> There are 75 spaced dashes and em-dashes in this file. Not reporting them.

    The PG standard for an em-dash--like these--is two minus signs
    with no spaces before or after them. However, some older texts
    used spaced dashes - like these -- and if there are very many
    such spaced dashes in the file, bookloupe just draws your
    attention to it and doesn't list them individually.



    Line 3020 - Non-ASCII character 233

    Standard PG texts should use only ASCII characters with values
    up to 127; however, non-English, accented characters can be
    represented according to several different non-ASCII encoding
    schemes, using values over 127. If you have a plain English text
    with a few accented characters in words like cafe or tete-a-tete,
    you might replace the accented characters with their unaccented
    versions. The English pound sign is another commonly-seen
    non-ASCII character. If you have enough non-ASCII characters in
    your text that you feel removing them would degrade your text,
    you should probably consider preparing a UTF-8 text.



    Line 1207 - Non-ISO-8859 character 156

    Even in "8-bit" texts, there are distinctions between code sets.
    The ISO-8859 family of 8-bit code sets is the most commonly used
    in PG, and these sets do not define values in the range 128 through
    159 as printable characters. It's quite common for someone on a
    Windows or Mac machine to use a non-ISO character inadvertently,
    so this message warns that the character is not only not ASCII,
    but also outside the ISO-8859 range.
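
    The distinction between these two messages comes down to the byte
    value, as this Python sketch for a single line of an 8-bit
    (ISO-8859-1 or Windows-1252) text shows. The function name
    query_8bit_line() is hypothetical, and the message format simply
    copies the examples above.

```python
def query_8bit_line(line_number, data):
    """Return non-ASCII queries for one line of an 8-bit text.

    data is the line's raw bytes. Values 128-159 are not printable
    characters in any ISO-8859 code set, so they get the stronger
    "Non-ISO-8859" message.
    """
    messages = []
    for value in data:
        if value < 128:
            continue  # plain ASCII
        kind = "Non-ISO-8859" if value <= 159 else "Non-ASCII"
        messages.append(f"Line {line_number} - {kind} character {value}")
    return messages
```

    For instance, the Windows-1252 byte 156 is the ligature "œ":
    printable in that code set, but not in any ISO-8859 one.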



    Line 46 - Tab character?

    Some editors and word processors will put in Tab characters
    (character 9) to indicate indented text. You should not use these
    in a PG text, because you can't be sure how they will appear on a
    reader's screen. Find the Tab, and replace it with the appropriate
    number of spaces.



    Line 1327 - Tilde character?

    The tilde character (~) might be legitimately used, but it's the
    character commonly used by OCR software to indicate a place where
    it couldn't make out the letter, so bookloupe flags it.



    Line 1347 - Asterisk?

    Asterisks are reported only in paranoid mode (on by default; see
    --no-paranoid). Like tildes, they are often used to indicate errors,
    but they are also legitimately used as line delimiters and footnote
    markers.
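
    The per-line checks for these three characters, together with the
    "more than 10 lines" rule for asterisks described under "Recent
    changes in behaviour", can be sketched in Python as below. The
    function names are hypothetical, each kind of character is
    reported at most once per line, and DP-specific markup handling is
    left out.

```python
def odd_character_queries(line, paranoid=True):
    """Return the odd-character queries for one line of text."""
    queries = []
    if "\t" in line:
        queries.append("Tab character?")
    if "~" in line:
        queries.append("Tilde character?")
    if paranoid and "*" in line:
        queries.append("Asterisk?")
    return queries


def asterisk_summary(lines, limit=10):
    """Return the summary message if too many lines contain asterisks.

    When this returns a message, individual asterisk queries would be
    suppressed; otherwise it returns None.
    """
    count = sum(1 for line in lines if "*" in line)
    if count > limit:
        return (f"--> {count} lines in this file contain asterisks. "
                "Not reporting them.")
    return None
```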



    Line 1451 - Long line 129

    PG texts should have lines shorter than 76 characters. There may
    be occasions where you decide that you really have to go out to
    79 characters, but the sample above says that line 1451 is 129
    characters long, which probably means two lines run together.



    Line 1590 - Short line?

    PG texts should have lines longer than 54 characters. However,
    there are special cases like poetry and tables of contents where
    the lines _should_ be shorter. So treat bookloupe warnings about
    short lines carefully. Sometimes it's a genuine formatting
    problem; sometimes the line really needs to be short.

    Hint: bookloupe will not flag lines as short if they are indented,
    that is, if they start with a space. I like to start inserted
    stanzas and other such items indented with a couple of spaces so
    that they stand out from the main text anyway.
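
    The length rules from the two messages above can be sketched in
    Python like this. The limits come from this file (under 76
    characters is fine, over 80 is VERY long, 54 or fewer is short);
    the function names are hypothetical, and the sketch assumes,
    which this file does not say, that the last line of a paragraph
    is never queried as short.

```python
LONG_LIMIT = 75       # lines should be shorter than 76 characters
VERY_LONG_LIMIT = 80  # a "VERY long" line is over 80 characters
SHORT_LIMIT = 55      # lines should be longer than 54 characters


def line_length_queries(lines):
    """Return (line_number, message) pairs for long and short lines.

    Assumption: the last line of a paragraph (followed by a blank
    line or the end of the file) is never queried as short.
    """
    queries = []
    for i, line in enumerate(lines):
        number = i + 1
        length = len(line)
        if length > LONG_LIMIT:
            queries.append((number, f"Long line {length}"))
        elif line.strip() and not line.startswith(" ") and length < SHORT_LIMIT:
            # Indented lines are exempt, as the hint above says.
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            if next_line.strip():
                queries.append((number, "Short line?"))
    return queries


def count_very_long(lines):
    """Count the lines that are over 80 characters."""
    return sum(1 for line in lines if len(line) > VERY_LONG_LIMIT)
```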



    Line 1804 - Begins with punctuation?

    Lines should normally not begin with commas, periods and so on.
    An exception is an ellipsis . . . which can appear at the start
    of a line.



    Line 1850 - Spaced em-dash?

    The PG standard for an em-dash--like these--is two minus signs
    with no spaces before or after them. Bookloupe flags non-PG
    em-dashes - like this one. Normally, you will replace it with a
    PG-standard em-dash.



    Line 1904 - Query he/be error?

    Bookloupe makes a very minor effort to look for that scourge of all
    proofreaders, "be" replacing "he" or vice versa, and draws your
    attention to it when it thinks it has found one.



    Line 2017 - Query digit in a1most

    The digit 1 is commonly OCRed for the letter l, the digit 0 for
    the letter O, and so on. When bookloupe sees a mix of digits and
    letters in a word, it warns you. It may generate a false positive
    for something like 7am.



    Line 2083 - Query standalone 0

    In paranoid mode only (on by default; see --no-paranoid),
    bookloupe warns about the digits 0 and 1 standing alone as a word.
    This can happen if the OCR misreads the words O or I.
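
    Both of these digit checks can be sketched together in Python. The
    function name digit_queries() is hypothetical, only ASCII letters
    and digits are treated as parts of words, and the message wording
    copies the examples above.

```python
import re

WORD_RE = re.compile(r"[A-Za-z0-9]+")


def digit_queries(line, paranoid=True):
    """Return digit-related queries for one line of text."""
    queries = []
    for word in WORD_RE.findall(line):
        has_digit = any(c.isdigit() for c in word)
        has_letter = any(c.isalpha() for c in word)
        if has_digit and has_letter:
            queries.append(f"Query digit in {word}")
        elif paranoid and word in ("0", "1"):
            # Possibly the word O or I misread by the OCR.
            queries.append(f"Query standalone {word}")
    return queries
```

    Plain numbers such as 1850 are left alone, while "7am" is queried:
    the false positive mentioned above.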



    Line 2115 - Query word whetber

    If you have switched typo-checking on, bookloupe looks for
    potential typos, especially common h/b errors. It's not
    infallible; it sometimes queries legitimate words, but it's
    always worth taking a look.



    Line 2190 column 14 - Missing space?

    Omitting a space is a very common error,especially coming from
    OCRed text,and can be hard for a human to spot. The commas in
    the previous sentence illustrate the kind of thing I mean.
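
    A simple version of this check, sketched in Python, looks for a
    comma, semicolon or colon squeezed between two letters. Bookloupe's
    real check may cover more cases; periods are deliberately left out
    here because abbreviations such as "e.g." would produce many false
    positives. The function name is hypothetical and columns are
    counted from 1.

```python
import re

# A comma, semicolon or colon with a letter on both sides.
SQUEEZED = re.compile(r"(?<=[A-Za-z])[,;:](?=[A-Za-z])")


def missing_space_columns(line):
    """Return the 1-based columns of punctuation missing a following space."""
    return [match.start() + 1 for match in SQUEEZED.finditer(line)]
```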
   616 
   617 
   618 
   619     Line 2240 column 48 - Spaced punctuation?
   620 
   621     The flip side of the "missing space" error , here , is when extra
   622     spaces are added before punctuation . Some old texts appear to add
   623     extra spaces around punctuation consistently, but this was a
   624     typographical convention rather than the author's intent, and the
   625     extra "spaces" should be removed when preparing a PG text.
   626 
   627 
   628 
   629     Line 2301 column 19 - Unspaced quotes?
   630 
   631     Another common spacing problem occurs in a phrase like "You wait
   632     there,"he said.
   633 
   634 
   635 
   636     Line 2385 column 27 - Wrongspaced quotes?
   637 
   638     Bookloupe checks whether a quote seems to be a start or end quote,
   639     and queries those that appear to be misplaced. This does give rise
   640     to false positives when quotes are nested, for example:
   641 
   642     "And how," she asked, "will your "friends" help you now?"
   643 
   644     but these false positives are worth it because of the many cases
   645     that this test catches, notably those like:
   646 
   647     "And how, "she said," will your friends help you now?"
   648 
   649     Sometimes a "wrongspaced quotes" query will arise because an earlier
   650     quote in the paragraph was omitted, so if the place specified seems
   651     to be OK, look back to see whether there's a problem in the preceding
   652     lines.
   653 
   654 
   655 
   656     Line 2400 - HTML Tag? <PRE>
   657 
   658     Some PG texts have been converted from HTML, and not all of the
   659     HTML tags have been removed.
   660 
   661 
   662 
   663     Line 2402 - HTML symbol? &emdash;
   664 
   665     Similarly, special HTML symbol characters can survive into PG
   666     texts. Can occasionally produce amusing false positives like
   667     . . . Marwick & Co were well known for it;
   668 
   669 
   670 
   671     Line 2540 - Mismatched quotes
   672 
   673     Another bookloupe mainstay—unclosed doublequotes in a paragraph.
   674     See the discussion of quotes in the switches section near the
   675     start of this file.
   676 
   677     Since the mismatch doesn't occur on any one line, bookloupe quotes
   678     the line number of the first blank line following the paragraph,
   679     since this is the point where it reconciles the count of quotes.
   680     However, if bookloupe is echoing lines, that is, you haven't used
   681     the -e switch, it will show the _first_ line of the paragraph,
   682     to help you find the place without using line numbers. The
   683     offending paragraph is therefore between the quoted line and
   684     the line number given.
   685 
   686 
   687 
   688     Line 2587 - Mismatched single quotes
   689 
   690     Only checked with the -s switch, since checking single quotes is
   691     not a very reliable process. Otherwise, the same logic as for
   692     doublequotes applies.
   693 
   694 
   695 
   696     Line 2877 - Mismatched round brackets?
   697 
   698     Curly and square brackets are checked too. Texts with a lot of
   699     brackets, like plays with bracketed stage directions, are prone to mismatches.
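    A paragraph-level count for all three bracket types might look like the sketch below. It is an assumption for illustration, not bookloupe's code: bracket_mismatches is a hypothetical name, it takes the lines of one paragraph, and it only compares counts, so a closing bracket that comes before its opening one is not caught.

```c
/*
 * Hypothetical sketch: compare counts of each bracket type within one
 * paragraph.  Returns a bit mask: 1 = round, 2 = square, 4 = curly
 * brackets unbalanced; 0 means all counts match.
 */
int bracket_mismatches(const char *lines[], int nlines)
{
    int round = 0, square = 0, curly = 0;
    for (int n = 0; n < nlines; n++) {
        for (const char *p = lines[n]; *p != '\0'; p++) {
            switch (*p) {
            case '(': round++;  break;
            case ')': round--;  break;
            case '[': square++; break;
            case ']': square--; break;
            case '{': curly++;  break;
            case '}': curly--;  break;
            }
        }
    }
    return (round != 0 ? 1 : 0) | (square != 0 ? 2 : 0) | (curly != 0 ? 4 : 0);
}
```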
   700 
   701 
   702     Line 3150 - No CR?
   703     Line 3204 - Two successive CRs?
   704     Line 3281 position 75 - CR without LF?
   705 
   706     These are the invalid line-end warnings. See the discussion of
   707     line-end checking in the switches section near the start of this
   708     file. If you see these, and your editor doesn't show anything
   709     wrong, you should probably try deleting the characters just before
   710     and after the line end, and the line-end itself, then retyping the
   711     characters and the line-end.
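    The three warnings correspond to three byte patterns, assuming (as the messages suggest) that a valid line end is CR followed by LF. The sketch below is illustrative, not bookloupe's code; the struct line_end_report and the function check_line_ends are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical tallies of the three line-end problems. */
struct line_end_report {
    int no_cr;      /* LF not preceded by CR                 ("No CR?") */
    int double_cr;  /* CR immediately followed by another CR ("Two successive CRs?") */
    int cr_no_lf;   /* CR followed by anything but LF or CR,
                       or at the very end of the data        ("CR without LF?") */
};

/* Sketch: scan a raw buffer, treating CR LF as the only valid line end. */
struct line_end_report check_line_ends(const char *buf, size_t len)
{
    struct line_end_report r = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            if (i == 0 || buf[i - 1] != '\r')
                r.no_cr++;
        } else if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\r')
                r.double_cr++;
            else if (i + 1 >= len || buf[i + 1] != '\n')
                r.cr_no_lf++;
        }
    }
    return r;
}
```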
   712 
   713 
   714     Line 2940 - Paragraph starts with lower-case
   715 
   716     A common error in an e-text is for an extra blank line
   717 
   718     to be put in, like the blank line above, and this often
   719     shows up as a new paragraph beginning with lower case.
   720     Sometimes the blank line is deliberate, as when a
   721     quotation is inserted in a speech. Use your judgement.
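    A minimal sketch of this query, assuming (not from bookloupe's source) that a paragraph starts at the first non-blank line after a blank one and that only a plain lower-case ASCII letter counts. The name find_lowercase_paragraph is hypothetical. A paragraph opening with a quote mark followed by a lower-case letter would slip past this version.

```c
#include <ctype.h>

/*
 * Hypothetical sketch: return the 1-based number of the first line that
 * starts a paragraph with a lower-case letter, or 0 if there is none.
 * Leading spaces and tabs are skipped before testing.
 */
int find_lowercase_paragraph(const char *lines[], int nlines)
{
    int after_blank = 1;   /* the first line of the text starts a paragraph */
    for (int n = 0; n < nlines; n++) {
        const char *p = lines[n];
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0') {          /* blank line: next text starts a paragraph */
            after_blank = 1;
            continue;
        }
        if (after_blank && islower((unsigned char)*p))
            return n + 1;
        after_blank = 0;
    }
    return 0;
}
```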
   722 
   723 
   724     Line 2987 - Extra period?
   725 
   726     An extra period. is a. common problem in OCRed text. and usually
   727     arises when a speck of dust on the page is mistaken for a period.
   728     or. as occasionally happens. when a comma loses its tail.
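    One simple way to look for extra periods, assumed here for illustration rather than taken from bookloupe, is to report a period followed by a space and a lower-case letter. The name find_extra_period is hypothetical, it returns a 0-based column or -1, and unlike a real checker it has no list of abbreviations to skip.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical sketch: return the column of the first period that is
 * followed by a space and a lower-case letter, or -1.  Abbreviations
 * such as "etc. and" will also be reported.
 */
int find_extra_period(const char *line)
{
    for (size_t i = 0; line[i] != '\0'; i++) {
        /* line[i + 1] == ' ' guarantees line[i + 2] is still in bounds. */
        if (line[i] == '.' && line[i + 1] == ' ' &&
            islower((unsigned char)line[i + 2]))
            return (int)i;
    }
    return -1;
}
```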
   729 
   730 
   731     Line 3012 column 12 - Double punctuation?
   732 
   733     Double punctuation., like that,, is a common typo and
   734     scanno. Some books have a lot of legitimate double punctuation,
   735     like etc., etc., but it's worth checking anyway.
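    A sketch of a double punctuation test, under assumptions of its own: the set of marks checked (".,;:!?") is chosen for illustration, runs of periods are let through as ellipses, and find_double_punct is a hypothetical name. Columns are counted from 0 here, which may not match bookloupe's numbering.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return the 0-based column of the first pair of
 * adjacent punctuation marks, or -1.  ".." is skipped as part of an ellipsis.
 */
int find_double_punct(const char *line)
{
    const char *marks = ".,;:!?";
    /* Both line[i] and line[i + 1] are non-NUL inside the loop, so
       strchr never matches the terminator of marks. */
    for (size_t i = 0; line[i] != '\0' && line[i + 1] != '\0'; i++) {
        if (strchr(marks, line[i]) != NULL &&
            strchr(marks, line[i + 1]) != NULL &&
            !(line[i] == '.' && line[i + 1] == '.'))
            return (int)i;
    }
    return -1;
}
```

    As the etc., etc. case shows, legitimate text is reported too, which is why the query is only a prompt to look.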
   736 
   737 
   738 
   739             *       *       *        *
   740 
   741 For Windows-only users who are unfamiliar with DOS:
   742 
   743     If you're a Windows-only user, you need to save
   744     bookloupe.exe into the same folder (directory)
   745     as the text file you want to check. Let's say your
   746     text file is in C:\gut; then you should save
   747     bookloupe.exe into C:\gut.
   748 
   749     Now get to a console. You can do this by
   750     selecting the "Command Prompt" or "MS-DOS Prompt"
   751     option that will be somewhere on your
   752     Start/Programs menu.
   753 
   754     Now get into the C:\gut directory.
   755     You can do this using the cd (change directory)
   756     command, like this:
   757         cd \gut
   758     and your prompt will change to
   759         C:\gut>
   760     so you know you're in the right place.
   761 
   762     Now type
   763         bookloupe yourfile.txt
   764     and you'll see bookloupe's report.
   765 
   766     By default, bookloupe prints its queries to the screen.
   767     If you want to create a file of them, to edit
   768     against the text, you can use the greater-than
   769     sign (>) to tell it to output the report to a
   770     file. For example, if you want its report in a
   771     file called queries.lst, you could type
   772 
   773         bookloupe yourfile.txt > queries.lst
   774 
   775     The queries.lst file will then contain the listing
   776     of possible formatting errors, and you can
   777     edit it alongside your text.
   778 
   779     Whatever you do, DON'T make the filename after
   780     the greater-than sign the name of a file already
   781     on your disk that you want to keep, because
   782     the greater-than sign will cause the command prompt to
   783     replace any existing file of that name.
   784 
   785     So, for example, if you have two Tolstoy files
   786     that you want to check, called WARPEACE.TXT and
   787     ANNAK.TXT, make sure that neither of these names
   788     is ever used following the greater-than sign.
   789     To check these correctly, you might do:
   790 
   791         bookloupe warpeace.txt > war.lst
   792 
   793     and
   794 
   795         bookloupe annak.txt > annak.lst
   796 
   797     separately. Then you can look at war.lst and annak.lst
   798     to see the bookloupe reports.
   799 
   800 For Windows-only users who want to use bookloupe from Guiguts:
   801 
   802     1) If you haven't already done so, download bookloupe-win32-xxx.zip
   803     from http://www.juiblex.co.uk/pgdp/bookloupe/
   804 
   805     2) Extract the files into a suitable folder, e.g. C:\DP\bookloupe
   806 
   807     3) Start Guiguts
   808 
   809     4) Choose Preferences | File Paths | Set File Paths...
   810 
   811     5) Click the "Locate Gutcheck..." button
   812 
   813     6) Browse to the folder where you extracted bookloupe
   814 
   815     7) Double-click bookloupe.exe
   816 
   817     Now, whenever you do "Gutcheck" in Guiguts, it will run bookloupe
   818     instead. Since the output will look very like gutcheck output, you
   819     may want to check that it is actually bookloupe that is running. To do
   820     this, look at the black command line message window, which will say:
   821 
   822     "bookloupe: Check and report on an e-text".
   823 
   824     To return to using gutcheck for any reason, repeat steps 4 and 5
   825     above, and then,
   826 
   827     6b) Browse back to the gutcheck folder, which is in a "tools"
   828     folder inside the main Guiguts folder. It will be something like
   829     "C:\DP\guiguts-win\tools\gutcheck", depending on where you installed
   830     Guiguts originally.
   831 
   832     7b) Double-click gutcheck.exe
   833 
   834     Now doing "Gutcheck" in Guiguts will run gutcheck itself, and the
   835     message in the black window should read:
   836 
   837     "gutcheck: Check and report on an e-text".