Fix bug #36: Document workaround to prevent Guiguts conversion of utf-8 files passed to Bookloupe
3 Bookloupe documentation
6 bookloupe: lists possible common formatting errors in a Project
7 Gutenberg candidate file. Bookloupe is based on gutcheck, written
8 by Jim Tinsley. It is a command line program and can be used under
9 Microsoft Windows, Mac or Unix. For Windows-only people, there is
10 an appendix at the end with brief instructions for running it.
14 This software is Copyright Jim Tinsley 2000-2005 and
15 J. Ali Harlow 2012 onwards.
17 Bookloupe comes with ABSOLUTELY NO WARRANTY. For details, read the file COPYING.
18 This is Free Software; you may redistribute it under certain conditions (GPL).
20 See http://www.juiblex.co.uk/pgdp/bookloupe/ for the latest version.
23 Compatibility with guiguts v1.0.25
25 Versions of guiguts up to at least 1.0.25 have a bug in the way that they
26 prepare a copy of the ebook for gutcheck (or bookloupe) to check. This causes
27 problems with ebooks that contain Unicode characters not present in Latin-1.
29 The guiguts bug report is here: http://sourceforge.net/p/guiguts/bugs/95/
30 The bug report also includes details of how to edit guiguts to work around
31 the problem until an official fix is released.
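To judge whether a particular ebook is likely to be affected, it is
enough to look for characters outside Latin-1, that is, code points
above U+00FF. The Python sketch below is not part of bookloupe or
guiguts; the function name is invented for illustration.

```python
def chars_outside_latin1(text):
    """Return the distinct characters in text that Latin-1 cannot encode."""
    # Latin-1 (ISO-8859-1) covers exactly the code points U+0000 to U+00FF.
    return sorted({ch for ch in text if ord(ch) > 0xFF})


if __name__ == "__main__":
    sample = "The caf\u00e9 served \u201ctea\u201d."
    print(chars_outside_latin1(sample))
```

An empty result suggests the text stays within Latin-1 and should not
trigger the problem described above.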
34 Recent changes in behaviour
36 Each new version of bookloupe brings bug fixes and improvements. Sometimes
37 the behaviour is also changed in ways that might be unexpected:
41 The check for "odd" characters (tab, tilde, caret, forward slash and
42 asterisks) is disabled in bookloupe 2.0 when the character set is
43 switched from ASCII/ISO-8859-1 to UNICODE (i.e., when the "There are a
44 lot of foreign letters here." message is printed). As of bookloupe 2.1
45 these tests operate independently of the character set selected.
47 Users may notice this change most especially in the case of the
48 DP-specific /* ... */ markup. Bookloupe 2.0 often did not warn when
49 this markup was encountered even when the --dp switch was not given.
50 Bookloupe 2.1 will warn about this markup unless dp-specific mode is
51 switched on, paranoid mode is switched off or the ebook contains more
52 than 10 lines containing asterisks. In the last case
54 --> 11 lines in this file contain asterisks. Not reporting them.
60 Usage is: bookloupe [OPTION...] filename
63 -d, --dp ignores some DP-specific markup
64 -e, --no-echo switches off Echoing of lines
65 -s, --squote checks Single quotes
67 -p, --qpara sets strict quotes checking for Paragraphs
68 --no-paranoid switches OFF typo checking and extra checks
69 -l, --no-line-end turns off Line-end checks
70 -o, --overview produces an Overview only
71 -y, --stdout sets error messages to stdout
72 -h, --header echoes the header fields
73 -m, --markup ignores some common HTML markup
74 -u, --usertypo warns about words in a user-defined typo file
75 -v, --verbose forces individual reporting of minor problems
76 -w, --web special mode for web uploads (for future use)
77 --charset=NAME the set of characters valid for this ebook
78 --dump-config dump the current configuration
80 There are also inverted options available which are useful when it is
81 desired to override an option set in the configuration file:
83 --no-dp, --echo, --no-squote, --no-typo, --no-qpara, --paranoid,
84 --line-end, --no-overview, --no-stdout, --no-header, --no-markup,
85 --no-usertypo, --no-verbose.
87 Note: there is no --no-web since --web simply selects a set of options.
89 Finally there are a couple of options that toggle the state of options
90 rather than setting or unsetting them: -t (for typo) and -x (for typo
91 and paranoid). These are mainly intended for compatibility with gutcheck.
93 Running bookloupe without any parameters will display a brief help message.
97 bookloupe warpeace.txt
104 Bookloupe will look for a file named bookloupe.ini to read as
105 a configuration file. Options set in a configuration file can
106 be overridden from the command line as required.
108 The following directories are searched in order:
110 1) The current working directory. When run from the command
111 line, this is the directory you ran it from. When run from
112 guiguts it will normally be the directory that contains the
115 2) The directory containing the bookloupe program.
117 3) The user's configuration directory. Under MS-Windows this
118 is normally CSIDL_LOCAL_APPDATA which is typically set to
119 C:\Documents and Settings\<user>\Local Settings\Application Data.
120 On other platforms this is normally $XDG_CONFIG_HOME which, if
121 not set, defaults to $HOME/.config
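The search order above can be sketched in Python for the non-Windows
case. This is only an illustration of the documented order, not
bookloupe's own code; the function name and its parameters are
hypothetical, and it assumes that bookloupe.ini sits directly inside
each directory searched.

```python
import os


def config_candidates(cwd, program_dir, environ):
    """List bookloupe.ini paths in the documented search order (non-Windows)."""
    # An unset or empty XDG_CONFIG_HOME falls back to $HOME/.config.
    user_dir = environ.get("XDG_CONFIG_HOME") or os.path.join(
        environ.get("HOME", ""), ".config"
    )
    return [os.path.join(d, "bookloupe.ini") for d in (cwd, program_dir, user_dir)]


if __name__ == "__main__":
    for path in config_candidates(os.getcwd(), "/usr/local/bin", os.environ):
        print(path)
```

The "/usr/local/bin" program directory in the usage lines is a
placeholder; substitute wherever bookloupe is actually installed.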
123 The directories to search can also be changed using the
124 $BOOKLOUPE_CONFIG_PATH environment variable which is a colon
125 separated (semi-colon separated under MS-Windows) list of
128 The configuration file is a key file. This is very similar to,
129 but not identical to a typical ini file as found under MS-Windows.
130 Key files consist of a number of groups which start with the
131 group name enclosed in square brackets on a line by itself.
132 Bookloupe recognises just one group, "options". Below the
133 group name follow the keys and their values for that
134 group, one per line in the format key=value. Most of bookloupe's
135 options are flags (i.e., either on or off). For these keys, the
136 value must be either "true" or "false". The file may also contain
137 comment lines which begin with the # symbol. The names of the
138 keys follow the long option names.
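As a rough illustration of the layout just described, the Python
sketch below parses a small configuration held in a string. The key
names dp, squote and verbose are taken from the long option names
listed earlier; the sample contents and the function name are made up.
Python's configparser is an ini-style parser that is more lenient than
the key file format, so this shows the shape of the file rather than
exactly what bookloupe will accept.

```python
import configparser

SAMPLE = """\
# Comment lines begin with the # symbol.
[options]
dp=true
squote=true
verbose=false
"""


def read_flags(text):
    """Parse key-file text and return the flags in the "options" group."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {key: parser.getboolean("options", key) for key in parser["options"]}


if __name__ == "__main__":
    print(read_flags(SAMPLE))
```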
140 A sample configuration file is provided (in sample.ini). The file
141 will need to be copied to bookloupe.ini before bookloupe will
142 read it. You can also use the --dump-config option to write a
143 configuration file for you. For example, if you typically want
144 to run bookloupe with the --dp and --squote options, then you
147 $ bookloupe --dp --squote --dump-config > configuration.ini
148 $ ren configuration.ini bookloupe.ini
150 (Don't be tempted to merge these two steps or bookloupe will see
151 an empty configuration file and complain.)
153 This same idea can also be used to modify an existing configuration.
158 Bookloupe will handle e-texts encoded in UTF-8 (preferred),
159 ISO-8859-1 (also known as Latin-1), or WINDOWS-1252 (also known,
160 incorrectly, as ansi). The output will be in the same encoding
164 Character set (--charset)
166 Character encodings have an implicit set of characters that
167 can be encoded and thus define a set of characters that can
168 be present in the text. However sometimes it is desirable
169 that not all characters that can be encoded should be present
170 in a text. The set of characters that should be present is
171 known as the character set.
173 The default setting for the character set (called auto) does
174 the same as gutcheck for Windows-1252 encoded texts for
177 If the file is predominately ASCII then the set of legal
178 characters is ASCII and warnings are issued whenever non-ASCII
179 characters are encountered. The message will either warn of
180 non-ASCII or non-ISO-8859-1 characters as appropriate.
182 If the file contains a significant number of non-ASCII characters
183 then a message is printed as follows:
185 --> There are a lot of foreign letters here. Not reporting them.
187 and the character set is widened to include all possible
190 For UTF-8 encoded texts, auto selects UNICODE.
192 Most character sets are simply defined in bookloupe as the
193 set of all characters that can be encoded in the encoding of
194 the same name. UNICODE is an exception and includes only the
195 characters assigned in the relevant Unicode standard but
196 excluding the Private Use Area characters. Note that the
197 relevant Unicode standard is given by the version of glib in
198 use rather than by any code in bookloupe and thus can vary
199 from system to system. PG texts however are likely to be
200 using characters assigned in very early Unicode standards,
201 thus mitigating this issue.
204 Echoing lines (--no-echo to switch off)
206 You may find it convenient, when reviewing Bookloupe's
207 suggestions, to see the line that Bookloupe is questioning.
208 That way, you can often see at a glance whether it is
209 a real error that needs to be fixed, or a false positive
210 that should be in the text, but Bookloupe's limited
211 programming doesn't understand.
213 By default, bookloupe echoes these lines, but if you don't
214 want to see the lines referred to, --no-echo will switch it
218 Quotes (--squote and --qpara switches)
220 Bookloupe always looks for unbalanced doublequotes in a
221 paragraph. It is a common convention for writers not to
222 close quotes in a paragraph if the next paragraph opens
223 with quotes and is a continuation by the same speaker.
225 Bookloupe therefore does not normally report unclosed quotes
226 if the next paragraph begins with a quote. If you need
227 to see all unclosed quotes, even where the next paragraph
228 begins with a quote, you should use the -p switch.
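The rule described above can be sketched in a few lines of Python.
This is a simplified model, not bookloupe's implementation: it only
counts straight double quotes, and the function name and the strict
parameter (standing in for the -p switch) are invented for
illustration.

```python
def unclosed_quote_paragraphs(paragraphs, strict=False):
    """Return indexes of paragraphs with an odd number of double quotes.

    With strict=False, a paragraph is let off when the next paragraph
    opens with a double quote (the continuation convention).
    """
    reported = []
    for i, para in enumerate(paragraphs):
        if para.count('"') % 2 == 0:
            continue  # quotes balance
        next_para = paragraphs[i + 1] if i + 1 < len(paragraphs) else ""
        if strict or not next_para.lstrip().startswith('"'):
            reported.append(i)
    return reported


if __name__ == "__main__":
    paras = ['"I went on,', '"and on," he said.', 'She said, "stop.']
    print(unclosed_quote_paragraphs(paras))
    print(unclosed_quote_paragraphs(paras, strict=True))
```

In the sample, the first paragraph is left unclosed but is followed by
a paragraph that opens with a quote, so it is only reported in strict
mode; the last paragraph is reported either way.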
230 Singlequotes (', `, ‘ and ’) are a problem, since the same
231 character can be used for an apostrophe. I'm not sure that it
232 is possible to get 100% accuracy on singlequotes checking,
233 particularly since dialect, quite common in PG texts,
234 upsets the normal rules so badly. Consider the sentence:
235 'Tis often said that a man's a man for a' that.
236 As humans, we recognize that both apostrophes are used
237 for contractions rather than quotes, but it isn't easy
238 to get a program to recognize that.
240 Since bookloupe makes too many mistakes when trying to match
241 singlequotes, it doesn't look for unbalanced singlequotes
242 unless you specify the --squote switch.
244 Consider these sentences, which illustrate the main cases:
246 'Tis often said that a fool and his money are soon parted.
248 'Becky's goin' home,' said Tom.
250 The dogs' tails wagged in unison.
252 Those 'pack dogs' of yours look more like wolves.
255 Typos (--typo switch)
257 It's not bookloupe's job to be a spelling checker, but it does
258 check for a list of common typos and OCR errors if you use the
259 --typo switch. (The -t and -x switches also toggle typo checking.)
261 It also checks for character combinations, especially involving
262 h and b, which are often confused by OCR, that rarely or never
263 occur. For example, it queries "tbe" in a word. Now, "the" often
264 occurs, but "tbe" is very rare (heartbeat, hotbed), so I'm
265 playing the odds - a few false positives for many errors found.
266 Similarly with "ii", which is a very common OCR error.
268 Bookloupe suppresses multiple reporting of the first 40 "typos"
269 found. This is to remove the annoyance of seeing something like
270 "FN" (footnote) or "LK" (initials) flagged as a typo 147 times
274 Line-end checking (--no-line-end switch to disable)
276 All PG texts should have a Carriage Return (CR - character 13)
277 and a Line Feed (LF - character 10) at end of each line,
278 regardless of what O/S you made them on. DOS/Windows, Unix
279 and Mac have different conventions, but the final text should
280 always use a CR/LF pair as its line terminator.
282 By default, bookloupe verifies that every line does have
283 the correct terminator, but if you're on a work-in-progress
284 in Linux, you might want to convert the line-ends as a final
285 step, and not want to see thousands of errors every time you
286 run bookloupe before that final step, so you can turn off
287 this checking with the --no-line-end switch.
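If you want to check the line ends of a file yourself, for instance
before the final conversion step, something like the following Python
sketch will do. It is an independent illustration rather than
bookloupe's own check, and the function name is hypothetical. The file
must be read in binary mode so that Python does not translate the
line ends.

```python
def lines_without_crlf(data):
    """Return 1-based numbers of lines in data (bytes) not ending in CR LF."""
    # For bytes, splitlines() breaks on LF, CR and CR LF, so a bare CR
    # or a bare LF each ends a line and is then reported below.
    lines = data.splitlines(keepends=True)
    return [n for n, line in enumerate(lines, 1) if not line.endswith(b"\r\n")]


if __name__ == "__main__":
    print(lines_without_crlf(b"one\r\ntwo\nthree\r\n"))
```

Note that a final line with no terminator at all is also reported.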
290 Paranoid mode (--no-paranoid switch to disable: Trust No One :-)
292 --no-paranoid switches OFF some extra checks like standalone
296 Overview mode (--overview switch)
298 This mode just gives a count of queries found
299 instead of a detailed list.
302 Header quote (--header switch)
304 If you use the --header switch, bookloupe will also display
305 the Title, Author, Release and Edition fields from the
306 PG header. This is useful mostly for the automated
307 checks we do on recently-posted texts.
310 Errors to stdout (--stdout switch)
312 If you're just running bookloupe normally, you can ignore
313 this. It's only there for programs that provide a front
314 end to bookloupe. It makes error messages appear within
315 the output of bookloupe so that the front end knows whether
319 Verbose reporting (--verbose switch)
321 Normally, if bookloupe sees lots of long lines, short lines,
322 spaced dashes, non-ASCII characters or dot-commas ".," it
323 assumes these are features of the text, counts and summarizes
324 them at the top of its report, but does not list them
325 individually. If the verbose switch is on, bookloupe will list
329 Markup interpretation (--markup switch)
331 Normally, bookloupe flags anything it suspects of being HTML
332 markup as a possible error. When you use the --markup switch,
333 however, it matches anything that looks like markup against
334 a short list of common HTML tags and entities. If the markup
335 is in that list, it either ignores the markup, in the case
336 of a tag, or "interprets" the markup as its nearest ASCII
337 equivalent, in the case of an entity. So, for example, using
338 this switch, bookloupe will "see"
340 “He went <i>thataway!</i>”
346 and report accordingly.
348 This switch does not, not, NOT check the validity of HTML;
349 it exists so that you can run bookloupe on most HTML texts
350 for PG, and get sane results. It does not support all tags.
351 It does not support all entities. When it sees a tag or entity
352 it does not recognize, it will query it as HTML just as if
353 you hadn't specified the --markup switch.
355 Bookloupe will automatically switch on markup interpretation
356 if it sees a lot of tags that appear to be markup, so mostly, you
357 won't have to specify this.
360 User-defined typos (--usertypo switch)
362 If you have a file named bookloupe.typ or gutcheck.typ either
363 in your current working directory or in the directory from
364 which you explicitly invoked bookloupe, but not necessarily on
365 your path, and if you specify the --usertypo switch, bookloupe
366 will query any word specified in that file. The file is simple:
367 one word, in lower case, per line. Be careful not to put multiple
368 words onto a line, or leave any rubbish other than the word on
369 the line. You should have received a sample file bookloupe.typ
370 with this package. The file may be encoded in UTF-8 (preferred),
371 ISO-8859-1 (also known as Latin-1), or WINDOWS-1252 (also known,
372 incorrectly, as ansi).
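The idea of a user-defined typo file can be sketched as follows. This
Python fragment is an illustration only: the function names are
invented, and it assumes that words in the text are compared with the
list after lower-casing, which the description above does not
actually state.

```python
import re


def load_typo_words(lines):
    """Build the set of typo words from lines holding one word each."""
    return {line.strip() for line in lines if line.strip()}


def query_user_typos(text_lines, typo_words):
    """Return (line number, word) pairs for words found in typo_words."""
    hits = []
    for n, line in enumerate(text_lines, 1):
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", line):
            if word.lower() in typo_words:  # assumption: case-insensitive
                hits.append((n, word))
    return hits


if __name__ == "__main__":
    words = load_typo_words(["teh\n", "\n", "arid\n"])
    print(query_user_typos(["Teh cat sat.", "An arid land."], words))
```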
375 Ignore DP markup (--dp switch)
377 Distributed Proofreaders (http://www.pgdp.net) has for some
378 time been the main source of PG texts, and proofers there use
379 special conventions. This switch understands those conventions,
380 so that people can use bookloupe on files in process that have
381 not yet had the special conventions removed. The special
382 conventions supported are page-separators and
383 "<sc>", "</sc>", "/*", "*/", "/#", "#/", "/$", "$/".
386 Dump the current configuration (--dump-config switch)
388 The --dump-config switch can be used to dump the current
389 configuration. This is a combination of the internal defaults,
390 the configuration file (if any) and the command line options.
391 If a configuration file is present, any comments found in that
392 file will be preserved in the dumped configuration. If there
393 is no configuration file, then a default set of comments to
394 go with the internal default configuration is generated.
397 You will probably only run bookloupe on a text once or maybe twice,
398 just prior to uploading; it usually finds a few formatting problems;
399 it also usually finds queries that aren't problems at all - it often
400 questions Tables of Contents for having short lines, for example.
401 These are called "false positives," and need a human to decide on
404 The text should be standard prose, and already close to PG normal
405 format (plain text, about 70 characters per line with blank lines
408 Bookloupe merely draws your attention to things that might be errors.
409 It is NOT a substitute for human judgement. Formatting choices like
410 short lines may be for a reason that this program can't understand.
412 Even the most careful human proofing can leave errors behind in a
413 text, and there are several automated checks you can do to help find
414 them. Of these, spellchecking (with _very_ careful human judgement) is
415 the most important and most useful.
417 Bookloupe does perform some basic typo-checking if you ask it to,
418 but its focus is on formatting errors specific to PG texts—
419 mismatched quotes, non-ASCII characters, bad spacing, bad line
420 length, HTML tags perhaps left from a conversion, unbalanced
423 Suggestions for additional checks would be appreciated and duly
424 considered, but no guarantees that they will be implemented.
429 How does Jim Tinsley use gutcheck?
431 Practically everyone I give gutcheck to asks me how _I_ use it.
432 Well, when I get a text for posting, say filename.txt, I run
434 gutcheck -o filename.txt
436 That gives me a quick idea what I'm dealing with. It'll tell
437 me what kind of problems gutcheck sees, and give me an idea
438 of how much more work needs to be done on the text. Keep in
439 mind that gutcheck doesn't do anything like a full spellcheck,
440 but when I see a text that has a lot of problems, I assume that
441 it probably needs a spellcheck too.
443 Having got a feel for the ballpark, I run
445 gutcheck filename.txt > jj
447 where jj is my personal, all-purpose filename for temporary data
448 that doesn't need to be kept. Then I open filename.txt and jj in
449 a split-screen view in my editor, and work down the text, fixing
450 whatever needs fixing, and skipping whatever doesn't. If your
451 editor doesn't split-screen, you can get much the same effect by
452 opening your original file in your normal editor, and jj (or your
453 equivalent name) in something like Notepad, keeping both in view
456 Twice a day, an automatic process looks at all recently-posted
457 texts, and emails Michael, me, and sometimes other people with
458 their gutcheck summaries.
462 Explanations of common bookloupe messages:
464 --> 74 lines in this file have white space at end
466 PG texts shouldn't have extra white space added at end of line.
467 Don't worry too much about this; they're not doing any harm,
468 and they'll be removed during posting anyway.
471 --> 348 lines in this file are short. Not reporting short lines.
472 --> 84 lines in this file are long. Not reporting long lines.
473 --> 8 lines in this file are VERY long!
475 If there are a lot of long or short lines, bookloupe won't list
476 them individually. The short lines version of this message
477 is commonly seen when gutchecking poetry and some plays, where
478 the normal line length is shorter than the standard for prose.
479 A "VERY long" line is one over 80 characters. You normally
480 shouldn't have any of these, but sometimes you may have to render
481 a table that must be that long, or some special preformatted
482 quotation that can't be broken.
485 --> There are 75 spaced dashes and em-dashes in this file. Not reporting them.
487 The PG standard for an emdash--like these--is two minus signs
488 with no spaces before or after them. However, some older texts
489 used spaced dashes - like these -- and if there are very many
490 such spaced dashes in the file, bookloupe just draws your
491 attention to it and doesn't list them individually.
495 Line 3020 - Non-ASCII character 233
497 Standard PG texts should use only ASCII characters with values
498 up to 127; however, non-English, accented characters can be
499 represented according to several different non-ASCII encoding
500 schemes, using values over 127. If you have a plain English text
501 with a few accented characters in words like cafe or tete-a-tete,
502 you might replace the accented characters with their unaccented
503 versions. The English pound sign is another commonly-seen
504 non-ASCII character. If you have enough non-ASCII characters in
505 your text that you feel removing them would degrade your text,
506 you should probably consider doing a UTF-8 text.
510 Line 1207 - Non-ISO-8859 character 156
512 Even in "8-bit" texts, there are distinctions between code sets.
513 The ISO-8859 family of 8-bit code sets is the most commonly used
514 in PG, and these sets do not define values in the range 128 through
515 159 as printable characters. It's quite common for someone on a
516 Windows or Mac machine to use a non-ISO character inadvertently,
517 so this message warns that the character is not only not ASCII,
518 but also outside the ISO-8859 range.
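To locate such characters in an 8-bit file, you can scan for byte
values from 128 through 159. The Python sketch below is an
illustration, not bookloupe's code, and the function name is made up.
It only makes sense for single-byte encodings; in a UTF-8 file, bytes
in this range occur as parts of perfectly valid characters.

```python
def c1_range_bytes(data):
    """Return (line number, byte value) pairs for bytes in the range 128-159."""
    hits = []
    for n, line in enumerate(data.split(b"\n"), 1):
        # Iterating over bytes yields integer values.
        hits.extend((n, b) for b in line if 128 <= b <= 159)
    return hits


if __name__ == "__main__":
    print(c1_range_bytes(b"plain\r\n\x9cuvre\r\n"))
```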
522 Line 46 - Tab character?
524 Some editors and WPs will put in Tab characters (character 9) to
525 indicate indented text. You should not use these in a PG text,
526 because you can't be sure how they will appear on a reader's
527 screen. Find the Tab, and replace it with the appropriate number
532 Line 1327 - Tilde character?
534 The tilde character (~) might be legitimately used, but it's the
535 character commonly used by OCR software to indicate a place where
536 it couldn't make out the letter, so bookloupe flags it.
540 Line 1347 - Asterisk?
542 Asterisks are reported only in paranoid mode (see -x).
543 Like tildes, they are often used to indicate errors, but they are
544 also legitimately used as line delimiters and footnote markers.
548 Line 1451 - Long line 129
550 PG texts should have lines shorter than 76 characters. There may be occasions
551 where you decide that you really have to go out to 79 characters,
552 but the sample above says that line 1451 is 129 characters long—
553 probably two lines run together.
557 Line 1590 - Short line?
559 PG texts should have lines longer than 54 characters. However,
560 there are special cases like poetry and tables of contents where
561 the lines _should_ be shorter. So treat bookloupe warnings about
562 short lines carefully. Sometimes it's a genuine formatting
563 problem; sometimes the line really needs to be short.
565 Hint: bookloupe will not flag lines as short if they are indented
566 —if they start with a space. I like to start inserted stanzas
567 and other such items indented with a couple of spaces so that
568 they stand out from the main text anyway.
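The line-length limits mentioned in this file (longer than 54,
shorter than 76, and "VERY long" over 80) can be put together in a
small Python sketch. The constant and function names are invented,
the treatment of blank lines is an assumption, and bookloupe's real
rules may well have further exceptions that are not described here.

```python
SHORT_MAX = 54       # "lines longer than 54 characters"
LONG_MIN = 76        # "lines shorter than 76"
VERY_LONG_MIN = 81   # a "VERY long" line is "one over 80 characters"


def classify_length(line):
    """Return "short", "long", "very long" or None for one line of text."""
    length = len(line)
    if length >= VERY_LONG_MIN:
        return "very long"
    if length >= LONG_MIN:
        return "long"
    # Indented lines are not flagged as short; blank lines are skipped
    # here too (an assumption, since they separate paragraphs).
    if 0 < length <= SHORT_MAX and not line.startswith(" "):
        return "short"
    return None


if __name__ == "__main__":
    for sample in ("x" * 129, "x" * 76, "x" * 60, "A short line", "  indented"):
        print(len(sample), classify_length(sample))
```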
572 Line 1804 - Begins with punctuation?
574 Lines should normally not begin with commas, periods and so on.
575 An exception is ellipses . . . which can happen at start of line.
579 Line 1850 - Spaced em-dash?
581 The PG standard for an em-dash--like these--is two minus signs
582 with no spaces before or after them. Bookloupe flags non-PG
583 em-dashes - like this one. Normally, you will replace it with a
588 Line 1904 - Query he/be error?
590 Bookloupe makes a very minor effort to look for that scourge of all
591 proofreaders, "be" replacing "he" or vice-versa, and draws your
592 attention to it when it thinks it has found one.
596 Line 2017 - Query digit in a1most
598 The digit 1 is commonly OCRed for the letter l, the digit 0 for
599 the letter O, and so on. When bookloupe sees a mix of digits and
600 letters, it warns you. It may generate a false positive for
605 Line 2083 - Query standalone 0
607 In paranoid mode (see -x) only, bookloupe warns about the digit 0
608 and the number 1 standing alone as a word. This can happen if the
609 OCR misreads the words O or I.
613 Line 2115 - Query word whetber
615 If you have switched typo-checking on, bookloupe looks for
616 potential typos, especially common h/b errors. It's not
617 infallible; it sometimes queries legit words, but it's
618 always worth taking a look.
622 Line 2190 column 14 - Missing space?
624 Omitting a space is a very common error,especially coming from
625 OCRed text,and can be hard for a human to spot. The commas in
626 the previous sentence illustrate the kind of thing I mean.
630 Line 2240 column 48 - Spaced punctuation?
632 The flip side of the "missing space" error , here , is when extra
633 spaces are added before punctuation . Some old texts appear to add
634 extra spaces around punctuation consistently, but this was a
635 typographical convention rather than the author's intent, and the
636 extra "spaces" should be removed when preparing a PG text.
640 Line 2301 column 19 - Unspaced quotes?
642 Another common spacing problem occurs in a phrase like "You wait
647 Line 2385 column 27 - Wrongspaced quotes?
649 Bookloupe checks whether a quote seems to be a start or end quote,
650 and queries those that appear to be misplaced. This does give rise
651 to false positives when quotes are nested, for example:
653 "And how," she asked, "will your "friends" help you now?"
655 but these false positives are worth it because of the many cases
656 that this test catches, notably those like:
658 "And how, "she said," will your friends help you now?"
660 Sometimes a "wrongspaced quotes" query will arise because an earlier
661 quote in the paragraph was omitted, so if the place specified seems
662 to be OK, look back to see whether there's a problem in the preceding
667 Line 2400 - HTML Tag? <PRE>
669 Some PG texts have been converted from HTML, and not all of the
670 HTML tags have been removed.
674 Line 2402 - HTML symbol? &emdash;
676 Similarly, special HTML symbol characters can survive into PG
677 texts. This check can occasionally produce amusing false positives like
678 . . . Marwick & Co were well known for it;
682 Line 2540 - Mismatched quotes
684 Another bookloupe mainstay—unclosed doublequotes in a paragraph.
685 See the discussion of quotes in the switches section near the
688 Since the mismatch doesn't occur on any one line, bookloupe quotes
689 the line number of the first blank line following the paragraph,
690 since this is the point where it reconciles the count of quotes.
691 However, if bookloupe is echoing lines, that is, you haven't used
692 the -e switch, it will show the _first_ line of the paragraph,
693 to help you find the place without using line numbers. The
694 offending paragraph is therefore between the quoted line and
695 the line number given.
699 Line 2587 - Mismatched single quotes
701 Only checked with the -s switch, since checking single quotes is
702 not a very reliable process. Otherwise, the same logic as for
703 doublequotes applies.
707 Line 2877 - Mismatched round brackets?
709 Also curly and square brackets. Texts with a lot of brackets, like
710 plays with bracketed stage instructions, may have mismatches.
714 Line 3204 - Two successive CRs?
715 Line 3281 position 75 - CR without LF?
717 These are the invalid line-end warnings. See the discussion of
718 line-end checking in the switches section near the start of this
719 file. If you see these, and your editor doesn't show anything
720 wrong, you should probably try deleting the characters just before
721 and after the line end, and the line-end itself, then retyping the
722 characters and the line-end.
725 Line 2940 - Paragraph starts with lower-case
727 A common error in an e-text is for an extra blank line
729 to be put in, like the blank line above, and this often
730 shows up as a new paragraph beginning with lower case.
731 Sometimes the blank line is deliberate, as when a
732 quotation is inserted in a speech. Use your judgement.
735 Line 2987 - Extra period?
737 An extra period. is a. common problem in OCRed text. and usually
738 arises when a speck of dust on the page is mistaken for a period.
739 or. as occasionally happens. when a comma loses its tail.
742 Line 3012 column 12 - Double punctuation?
744 Double punctuation., like that,, is a common typo and
745 scanno. Some books have much legit double punctuation,
746 like etc., etc., but it's worth checking anyway.
752 For Windows-only users who are unfamiliar with DOS:
754 If you're a Windows-only user, you need to save
755 bookloupe.exe into the folder (directory) where the
756 text file you want to check is. Let's say your
757 text file is in C:\gut, then you should save
758 bookloupe.exe into C:\gut.
760 Now get to a console. You can do this by
761 selecting the "Command Prompt" or "MS-DOS Prompt"
762 option that will be somewhere on your
765 Now get into the C:\gut directory.
766 You can do this using the cd (change directory)
769 and your prompt will change to
771 so you know you're in the right place.
774 bookloupe yourfile.txt
775 and you'll see bookloupe's report
777 By default, bookloupe prints its queries to screen.
778 If you want to create a file of them, to edit
779 against the text, you can use the greater-than
780 sign (>) to tell it to output the report to a
781 file. For example, if you want its report in a
782 file called queries.lst, you could type
784 bookloupe yourfile.txt > queries.lst
786 The queries.lst file will then contain the listing
787 of possible formatting errors, and you can
788 edit it alongside your text.
790 Whatever you do, DON'T make the filename after
791 the greater-than sign the name of a file already
792 on your disk that you want to keep, because
793 the greater-than sign will cause bookloupe to
794 replace any existing file of that name.
796 So, for example, if you have two Tolstoy files
797 that you want to check, called WARPEACE.TXT and
798 ANNAK.TXT, make sure that neither of these names
799 is ever used following the greater-than sign.
800 To check these correctly, you might do:
802 bookloupe warpeace.txt > war.lst
806 bookloupe annak.txt > annak.lst
808 separately. Then you can look at war.lst and annak.lst
809 to see the bookloupe reports.
811 For Windows-only users who want to use bookloupe from guiguts:
813 1) If you haven't already done so, download bookloupe-win32-xxx.zip
814 from http://www.juiblex.co.uk/pgdp/bookloupe/
816 2) Extract the files into a suitable folder, e.g. C:\DP\bookloupe
820 4) Choose Preferences | File Paths | Set File Paths..
822 5) Click the "Locate Gutcheck..." button
824 6) Browse to the folder where you extracted bookloupe
826 7) Double-click bookloupe.exe
828 Now, whenever you do "Gutcheck" in Guiguts, it will run bookloupe
829 instead. Since the output will look very like gutcheck output, you
830 may want to check that it is actually bookloupe that is running. To do
831 this, look at the black command line message window, which will say:
833 "bookloupe: Check and report on an e-text".
835 To return to using gutcheck for any reason, repeat steps 4 and 5
838 6b) Browse back to the gutcheck folder, which is in a "tools"
839 folder inside the main Guiguts folder. It will be something like
840 "C:\DP\guiguts-win\tools\gutcheck", depending on where you installed
843 7b) Double-click gutcheck.exe
845 Now doing "Gutcheck" in Guiguts will run gutcheck itself, and the
846 message in the black window should read:
848 "gutcheck: Check and report on an e-text".