diff --git a/AUTHORS b/AUTHORS new file mode 100644 index 0000000..c371184 --- /dev/null +++ b/AUTHORS @@ -0,0 +1,7 @@ +Dmitry Kirsanov designed Fresh Eye +and implemented the original MS-DOS version in 1995. + +Vadim Penzin rewrote and ported Fresh Eye to +Linux, FreeBSD, OpenBSD and Win32 in 1999-2000. + +$Id: AUTHORS,v 1.2 2002/06/07 03:45:54 vadimp Exp $ diff --git a/BUGS b/BUGS new file mode 100644 index 0000000..c1eeaca --- /dev/null +++ b/BUGS @@ -0,0 +1,55 @@ +This file describes known deficiencies of Fresh Eye. + +* Fresh Eye imposes no artificial limitations on either the size of + input files or the length of lines. Nevertheless, Fresh Eye by itself + cannot cope if your system has little memory. In the extremely rare, but + possible, event of insufficient memory, Fresh Eye makes no attempt at + graceful recovery - it aborts processing. There are five ways to overcome + such situations: + + - If the input file has many enormously long lines ('long' means more than + tens or hundreds of kilobytes, which is quite possible with + machine-generated HTML or with text that has large paragraphs and was + exported from a word processor like Microsoft Word), it definitely makes + sense to re-format the file. While re-formatting, bear in mind that + Fresh Eye makes certain assumptions about text formatting, which affect + its computations. Please consult the program documentation for details. + + - Split the input file into smaller pieces. This will always help. + + - Try a smaller context length (see the --context-size option). This will + make Fresh Eye less precise. + + - Use a larger swap space (or 'virtual memory', in Win32 parlance). + + - Buy more RAM. + +* Messages issued by Fresh Eye are a mess - some of them are in English, + some in Russian. + +* Fresh Eye ignores locale settings, using its own default encoding for + Russian messages.
+ +* If resume is enabled (--resume) and fresheye.log contains information + about more than one pass over the same file, fe resumes using the + information from the first pass. + +* Fresh Eye does not detect the encoding of its log files, assuming they are + encoded using its own default encoding. This means that a log file produced + on Win32 using CP-866 cannot be processed "as is" on UNIX, where KOI8-R + is the default encoding - Fresh Eye will skip over all entries found in + this file. + +* The 'ce' code page conversion utility cannot be used for general conversion + tasks because it ignores non-letters; for instance, it does not attempt to + convert pseudo-graphic characters. + +Please send suggestions for improvements or new features to +Dmitry Kirsanov . Please make sure the words +'Fresh Eye' appear in the Subject: line. + +If you think you have discovered a bug, please report it to +Vadim Penzin . Please make sure the words +'Fresh Eye' appear in the Subject: line. + +$Id: BUGS,v 1.2 2002/06/08 18:32:01 vadimp Exp $ diff --git a/COPYING b/COPYING new file mode 100644 index 0000000..eeb586b --- /dev/null +++ b/COPYING @@ -0,0 +1,340 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it.
(Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. 
To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. 
You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+ +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. 
However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. 
+You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. 
If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. 
EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) 19yy + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) 19yy name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. 
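The "How to Apply These Terms" section of the license above asks an interactive program to print a short notice when it starts. A minimal C sketch of that step, assuming a hypothetical helper named format_gpl_notice() that is not part of Fresh Eye; the program name, version, year and author are caller-supplied placeholders (for example, the license's own Gnomovision sample), and a C99-conforming snprintf() is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Fresh Eye: formats the startup
 * notice that the GPL suggests for interactive programs into buf.
 * Assumes a C99-conforming snprintf(). Returns the length the full
 * notice would have (excluding the terminating NUL); a value >= size
 * means the text was truncated to fit the buffer. */
static int format_gpl_notice(char *buf, size_t size,
                             const char *program, const char *version,
                             const char *year, const char *author)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "%s version %s, Copyright (C) %s %s\n"
                    "%s comes with ABSOLUTELY NO WARRANTY; "
                    "for details type `show w'.\n"
                    "This is free software, and you are welcome to "
                    "redistribute it\n"
                    "under certain conditions; type `show c' for details.\n",
                    program, version, year, author, program);
}
```

Formatting into a caller-supplied buffer keeps the helper free of I/O: the caller decides whether to fputs() the result to stdout or stderr, and can detect truncation by comparing the return value with the buffer size.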
diff --git a/ChangeLog b/ChangeLog new file mode 100644 index 0000000..9d31ff5 --- /dev/null +++ b/ChangeLog @@ -0,0 +1,218 @@ +2002-06-29 Vadim Penzin + + * Rewrote interactive "interface" along the lines set by DK. Now there + is a help summary and a 'default key' feature. + * --resume has no argument anymore and it is off by default. + * Fixed description of --context-size in fe's usage: no upper limit, + must be at least 2. + * Word dictionary dump is sorted by the number of occurrences, in + descending order. No unique words in output. Properly handles empty + files, as well as files having only one, or only two words in them. + If all words are unique, the entire word dictionary is dumped. + * Fixed building using Microsoft Developer Studio. + * Updated win32/README with information on Cygwin. + * Merged win32/recode.dsw and win32/fe.dsw into win32/fe.dsw. + * Changed version to 1.3.7. + +2002-06-28 Vadim Penzin + + * Added --without-cygwin to configure.in. Since Cygwin does not + attempt to determine if the debug version of the Microsoft Runtime Library + is present on the system, --without-cygwin disables --enable-debug + and --enable-profile (which are meaningless in this context anyway). + * Streamlined getopt_long detection logic and handling. + * configure prints a settings summary on exit. + +2002-06-27 Vadim Penzin + + * Added --enable-encoding to configure.in. src/fe.c, src/lingtbl.c and + src/ui.c are not distributed anymore; configure creates these files from + src/fe.koi8-r, src/lingtbl.koi8-r and src/ui.koi8-r. + * Added platform detection at build time. + * Fixed src/Makefile.am for Cygwin's $EXEEXT, moved generated files + from fe_SOURCES to nodist_fe_SOURCES and added them to MOSTLYCLEANFILES. + Now building from CVS requires automake-1.5. + * Changed default value of sensitivity threshold to 500.
+ * fe --version and ce --version print information about the platform + they were compiled on, as well as the default encoding set using + --enable-encoding. + * win32/*.dsp are temporarily broken; will be fixed later. + +2002-06-27 Dmitry Kirsanov + + * src/lingtbl.c: changed information values for yo + * src/fe.c: broke infor() into two functions, infor_same() and + infor_diff() + * src/fe.c: in simwords(), added a step to decrease sim based on + infor_diff() + * src/fe.c: in set_count_coefficient() decreased divisor from 13 to 8 + to sharpen dependence on wordcount information + +2002-06-26 Vadim Penzin + + * --version prints platform type + * Added acconfig.h + * Removed src/config.h.in from CVS + +2002-06-21 Vadim Penzin + + * Build fixes for NetBSD 1.5.2
+ +2002-06-08 Vadim Penzin + + * Moved README.win32 to win32/README, added win32/README to + EXTRA_DIST in win32/Makefile.am + * Removed win32/Makefile.mingw32 + * Made error message issued when user provides no input more + informative, removed message issued when processing of a file begins + * Added win32/recode.dsw and win32/recode.dsp + * Rewrote win32/README + * Fixed compilation errors on Win32 + * Updated BUGS, wrote NEWS and TODO, added them to the distribution. + +2002-06-07 Vadim Penzin + + * Corrected creation year of original Fresh Eye in AUTHORS + * Added --enable-debug and --enable-profile for configure + * Fixed configure for platforms that do not have getopt_long + * Eliminated remaining #includes in headers + +2000-10-15 Vadim Penzin + + * Added automatic generation of config.h. Version changed to 1.3.6 + +2000-10-07 Vadim Penzin + + * Added compilation of GNU's getopt on platforms + that do not have getopt_long. This change affected + configure.in and src/Makefile.am + + * Fixed warnings issued by gcc on FreeBSD in getopt.c + + * Compiles and runs on FreeBSD. Version changed to 1.3.5 + +2000-10-05 Vadim Penzin + + * Added support for "yo". The similarity and quantity of + information coefficients are *copied* from "ye" + + * Version changed to 1.3.4 + +2000-10-02 Vadim Penzin + + * Changed #inclusion policy to "included files do not #include + other files".
+ + * All warnings issued by -Wall eliminated + + * Added dos/fe.dsw, dos/ce.dsp, dos/fe.dsp + + * Compiles with Microsoft Visual C 6.0 without warnings + + * Version changed to 1.3.3 + +2000-04-30 Vadim Penzin + + * src/ce.c: Added --strict + +1999-07-28 Vadim Penzin + + * Added wrappers.c, wrappers.h, utilities.c and utilities.h. + All calls to fopen and fclose changed accordingly except for + check_log (). + + * Added empty context.c and context.h + + * Wrote first version of README.win32 + + * All files opened for reading are now closed + +1999-07-27 Vadim Penzin + + * Created dos/Makefile.gcc for use with Mingw32 + +1999-07-26 Vadim Penzin + + * Word mode removed, globals wordmode and wordwrap eliminated, + fgs (), nextword () and displ_help () changed accordingly. + +$Id: ChangeLog,v 1.11 2002/06/29 06:50:54 vadimp Exp $ diff --git a/Makefile.am b/Makefile.am new file mode 100644 index 0000000..e9936dd --- /dev/null +++ b/Makefile.am @@ -0,0 +1,7 @@ +# $Id: Makefile.am,v 1.5 2002/06/27 00:44:00 vadimp Exp $ + +EXTRA_DIST = NEWS TODO BUGS README.Russian +SUBDIRS = src win32 + +recode: + (cd src && $(MAKE) recode) diff --git a/NEWS b/NEWS new file mode 100644 index 0000000..a1e8b60 --- /dev/null +++ b/NEWS @@ -0,0 +1,36 @@ +This file describes changes made to Fresh Eye from one release to another.
+ +* Fresh Eye 1.4 + + Fresh Eye 1.4 brings major performance and functionality improvements over + its predecessor - Fresh Eye 1.21, released in November 1995: + + - Fresh Eye 1.4 compiles and runs on a variety of platforms: Linux, + FreeBSD, OpenBSD, Sun Solaris, Win32, and Cygwin. We believe that + Fresh Eye can be compiled with few or no modifications on many + other platforms. + + - Fresh Eye 1.4 supports virtually all major single-byte Russian + encodings: KOI8-R, CP-1251, CP-866, MacCyrillic, and ISO 8859-5. + + - Fresh Eye 1.4 imposes no artificial limitations on its input: the + length of a word, the length of a line, the number of lines in a file, + the size of the word dictionary and the size of the context are limited + only by the computer's memory. + + - Fresh Eye 1.4 is a true 32-bit application that can use all the + memory available on modern desktop systems. + + - Fresh Eye has undergone massive speed optimisations: it now processes + its input up to ten times faster than Fresh Eye 1.21 on the same + machine. + + - Fresh Eye 1.4 features a better command-line interface, which uses + GNU long options, and is able to process several files at once. + + MS-DOS is no longer supported. The status of the OS/2 port made by + K. Boyandin is unknown; the link to his home page is dead. + Fresh Eye 1.4 breaks compatibility with the log file format of + Fresh Eye 1.21. + +$Id: NEWS,v 1.2 2002/06/08 18:32:01 vadimp Exp $ diff --git a/README b/README new file mode 100644 index 0000000..b2fa513 --- /dev/null +++ b/README @@ -0,0 +1,16 @@ +If you have just fetched the source tree from a CVS repository, please use +the following command to generate the files needed for building: + aclocal && autoconf && autoheader && automake -a + +Please refer to the file named INSTALL for building on UNIX or Cygwin. + +Please see win32/README for building on Win32. + +Please read the file README.Russian for a general description of the program; +the file is encoded in KOI8-R.
+ +Send suggestions for improvements to Dmitry Kirsanov +Report bugs to Vadim Penzin +Please make sure the words 'Fresh Eye' appear in the Subject: line + +$Id: README,v 1.3 2002/06/26 00:27:47 vadimp Exp $ diff --git a/TODO b/TODO new file mode 100644 index 0000000..eabbb62 --- /dev/null +++ b/TODO @@ -0,0 +1,21 @@ +This file describes features that we would like to implement in the future, +in no particular order: + +* Automatic generation of binary packages for various UNIX packaging systems + (DEB, RPM, etc.). +* Installer for Win32. +* Unicode support, UTF-8 in particular. +* Implement the '-O' option for changing the encoding of output produced by + Fresh Eye (it is currently #ifdef'ed out in src/ui.c). +* Use XML for the log file. +* Use GNU gettext for message translation and locale-aware functionality. + (Can it be done on Win32?) +* Write a man page and/or texinfo documentation. +* An embedding API that would allow integration of Fresh Eye into e-mail + clients, text editors and word processors. +* More ports: Mac OS X, HURD. + +If you wish to participate, please contact +Dmitry Kirsanov or Vadim Penzin + +$Id: TODO,v 1.1 2002/06/08 18:32:55 vadimp Exp $ diff --git a/config.cache b/config.cache new file mode 100644 index 0000000..00fc237 --- /dev/null +++ b/config.cache @@ -0,0 +1,36 @@ +# This file is a shell script that caches the results of configure +# tests run on this system so they can be shared between configure +# scripts and configure runs. It is not useful on other systems. +# If it contains results you don't want to keep, you may remove or edit it. +# +# By default, configure uses ./config.cache as the cache file, +# creating it if it does not exist already. You can give configure +# the --cache-file=FILE option to use a different cache file; that is +# what configure does when it calls configure scripts in +# subdirectories, so they share the cache. +# Giving --cache-file=/dev/null disables caching, for debugging configure.
+# config.status only pays attention to the cache file if you give it the +# --recheck option to rerun configure. +# +ac_cv_type_size_t=${ac_cv_type_size_t='yes'} +am_cv_CC_dependencies_compiler_type=${am_cv_CC_dependencies_compiler_type='gcc'} +ac_cv_prog_cc_works=${ac_cv_prog_cc_works='yes'} +ac_cv_func_getopt_long=${ac_cv_func_getopt_long='yes'} +ac_cv_prog_cc_g=${ac_cv_prog_cc_g='yes'} +ac_cv_path_install=${ac_cv_path_install='/usr/bin/install -c'} +ac_cv_func_strstr=${ac_cv_func_strstr='yes'} +ac_cv_c_const=${ac_cv_c_const='yes'} +ac_cv_prog_CC=${ac_cv_prog_CC='gcc'} +ac_cv_prog_LN_S=${ac_cv_prog_LN_S='ln -s'} +ac_cv_header_stdc=${ac_cv_header_stdc='yes'} +ac_cv_prog_make_make_set=${ac_cv_prog_make_make_set='yes'} +ac_cv_header_string_h=${ac_cv_header_string_h='yes'} +ac_cv_header_unistd_h=${ac_cv_header_unistd_h='yes'} +ac_cv_lib_m_sqrt=${ac_cv_lib_m_sqrt='yes'} +ac_cv_prog_gcc=${ac_cv_prog_gcc='yes'} +ac_cv_prog_cc_cross=${ac_cv_prog_cc_cross='no'} +ac_cv_func_strerror=${ac_cv_func_strerror='yes'} +ac_cv_type_pid_t=${ac_cv_type_pid_t='yes'} +ac_cv_prog_CPP=${ac_cv_prog_CPP='gcc -E'} +ac_cv_prog_AWK=${ac_cv_prog_AWK='gawk'} +ac_cv_func_strdup=${ac_cv_func_strdup='yes'} diff --git a/configure.in b/configure.in new file mode 100644 index 0000000..ba7d78b --- /dev/null +++ b/configure.in @@ -0,0 +1,155 @@ +dnl $Id: configure.in,v 1.8 2002/06/29 05:22:13 vadimp Exp $ +dnl Process this file with autoconf to produce a configure script. +AC_INIT(src/fe.h) +AM_INIT_AUTOMAKE(fe, 1.3.7) + +dnl Check platform +AC_CANONICAL_HOST +AC_DEFINE_UNQUOTED(PLATFORM,"$host",[Specifies platform type]) + +dnl Checks for programs. +AC_PROG_AWK +AC_PROG_CC +AC_PROG_INSTALL +AC_PROG_LN_S + +dnl Checks for libraries. +AC_CHECK_LIB(m, sqrt) +AC_CHECK_FUNC(getopt_long, HAVE_GETOPT_LONG=true, HAVE_GETOPT_LONG=false) + +dnl Checks for header files. +AC_HEADER_STDC +AC_CHECK_HEADERS(unistd.h string.h) + +dnl Checks for typedefs, structures, and compiler characteristics. 
+AC_C_CONST +AC_TYPE_PID_T +AC_TYPE_SIZE_T + +dnl Checks for library functions. +AC_CHECK_FUNCS(strdup strerror strstr) + +AM_CONFIG_HEADER(src/config.h) + +dnl Checks for user-supplied arguments +AC_ARG_ENABLE(debug, + [ --enable-debug Turn on debugging], + [case "${enableval}" in + yes) debug=true ;; + no) debug=false ;; + *) AC_MSG_ERROR(bad value ${enableval} for --enable-debug) ;; + esac],[debug=false]) + +AC_ARG_ENABLE(profile, + [ --enable-profile Turn on profiling], + [case "${enableval}" in + yes) profile=true ;; + no) profile=false ;; + *) AC_MSG_ERROR(bad value ${enableval} for --enable-profile) ;; + esac],[profile=false]) + +encoding="CYR_CP_KOI8_R" +encoding_name="koi8-r" +AC_ARG_ENABLE(encoding, + [ --enable-encoding=CP Set default Cyrillic codepage. CP must be one of: + koi8-r (default), cp1251, cp866, mac, iso8859-5], + [case "${enableval}" in + koi8-r) + encoding="CYR_CP_KOI8_R" + encoding_name="koi8-r" + ;; + cp866) + encoding="CYR_CP_CP866" + encoding_name="cp866" + ;; + cp1251) + encoding="CYR_CP_CP1251" + encoding_name="cp1251" + ;; + mac) + encoding="CYR_CP_MAC" + encoding_name="mac" + ;; + iso8859-5) + encoding="CYR_CP_ISO_8859_5" + encoding_name="iso8859-5" + ;; + *) AC_MSG_ERROR(bad value ${enableval} for --enable-encoding) ;; + esac],[]) + +case $host_os in + *cygwin* ) + AC_ARG_WITH(cygwin, + [ --without-cygwin Disable linking with Cygwin DLLs, + implies --disable-debug --disable-profile], + [case "${withval}" in + no) + no_cygwin=true + debug=false + profile=false + HAVE_GETOPT_LONG=false + LIBS= + ;; + *) no_cygwin=false ;; + esac], [no_cygwin=false]) + ;; +esac + +AM_CONDITIONAL(COMPILE_GNU_GETOPT, test x$HAVE_GETOPT_LONG = xfalse) +AC_DEFINE_UNQUOTED(CYR_CP_DEFAULT, $encoding, [Default Cyrillic code page]) +AC_DEFINE_UNQUOTED(CYR_CP_NAME, "$encoding_name", + [The name of default Cyrillic code page]) +AM_CONDITIONAL(NO_CYGWIN, test x$no_cygwin = xtrue) +AM_CONDITIONAL(DEBUG, test x$debug = xtrue) +AM_CONDITIONAL(PROFILE, test x$profile = xtrue) +
+AC_OUTPUT( Makefile src/Makefile win32/Makefile ) + +if test "${debug}" = true +then + debug_setting="enabled" +else + debug_setting="disabled" +fi + +if test "${profile}" = true +then + profile_setting="enabled" +else + profile_setting="disabled" +fi + +case $encoding_name in + koi8-r) encoding_setting="KOI8-R";; + cp1251) encoding_setting="Windows CP-1251";; + cp866) encoding_setting="MS-DOS CP-866";; + mac) encoding_setting="MacCyrillic";; + iso8859-5) encoding_setting="ISO 8859-5";; +esac + +cat <<-EOF + +Configuration of Fresh Eye's source tree is complete. + + * Debugging is $debug_setting. + * Profiling is $profile_setting. + * Default Cyrillic code page is $encoding_setting. +EOF + +if test "${no_cygwin}" = true +then + echo " * Cygwin is disabled; Fresh Eye will use the Microsoft C Runtime Library." +fi +if test "${host_os}" = cygwin -a "${no_cygwin}" = false +then + echo " * Cygwin is enabled; Fresh Eye will use the POSIX layer that Cygwin provides." +fi + +cat <<-EOF + +If you are satisfied with these settings, please run 'make' and then +'make install'. Depending on your environment, you may need to become +the superuser before running 'make install'. + +Thank you for using Fresh Eye. +EOF diff --git a/fex.gif b/fex.gif new file mode 100644 index 0000000..d036298 Binary files /dev/null and b/fex.gif differ diff --git a/readme.html b/readme.html new file mode 100644 index 0000000..db0da58 --- /dev/null +++ b/readme.html @@ -0,0 +1,434 @@ + + + + + + +Свежий Взгляд / Fresh Eye + +

http://www.kirsanov.com/fresheye/   +http://fresheye.sourceforge.net   +Version 1.4pre (1.3.7)   +© 1994-2002, OnMind Systems   + +

+ +

Fresh Eye / Свежий Взгляд

+

In some writer's work I once came across «кошку» ("cat") and «кишку» ("gut") in the same sentence. Disgusting! It nearly made me sick.

Leo Tolstoy

What it is

+

Fresh Eye can be useful to anyone who writes or edits texts in Russian. It is not a general-purpose proofreader but a specialized utility that looks for just one kind of stylistic error. That kind, however, is among the most annoying and is extremely hard to spot without a computer's help. If you no longer make elementary stylistic blunders and your texts are meant to be read carefully and more than once, Fresh Eye may become one of your most essential tools (especially since the authors know of no other program that implements its algorithm). Look at your texts with a Fresh Eye!

1 Why it is needed

2 What's new

3 Distribution terms

4 Installation

4.1 Windows

4.2 Unix (Linux, BSD, Solaris, ...)

5 Operation

5.1 Launching

5.2 Interactive mode

5.3 Log file

5.4 Resuming

5.5 Batch mode

6 Algorithm

6.1 Evaluation parameters

6.2 Frequency dictionary

6.3 Computing badness

6.4 Ignored words

6.5 What remains to be done

7 Authors

8 Other programs by Dmitry Kirsanov

8.1 Transforming Text Editor

8.2 Russian Word Constructor

1 Why it is needed

The idea behind the program is very simple. Fresh Eye searches a text for places that may contain one of the most common stylistic flaws: phonetically (and often morphologically) similar words placed close together with nothing to motivate their parallelism (so-called paronymy, or "unintentional tautology"). Here are some of the errors the program has found:

...при условии, что нелогичность эта +художественная, обусловленная композиционными +соображениями...

...чтобы его сайты соответствовали духу и букве +соответствующих стандартов...

...и тем самым установить контроль +как над самим стандартом, так и над соответствующим сегментом рынка...

...спираль развития HTML завершила свой +первый виток...

...не только на весь документ в целом, но и на какую-то точку +(точнее, на какой-то элемент)...

...единственный способ приспособить этот язык...

...придется преобразовать в XML следующим образом...

...приемов, позволяющих добиться приемлемого качества...

...к сожалению, существует сразу несколько препятствий к осуществлению этой схемы...

...соответствующим +модулем, ответственность за запуск которого...

...при определении субъективного размера неважные части формы +учитываются лишь частично или игнорируются вообще...

...границы объекта становятся трудноразличимыми и +субъективно он может восприниматься значительно большим, чем на самом деле...

...традиционная парадигма заголовка как самой +выдающейся, "издалека +видной" части композиции далеко не всегда практически оправдана...

...особо стоит упомянуть об отношениях с окружающим пространством +такого распространеннейшего элемента, как колонка текста...

...на первый взгляд выглядит более привлекательным...

...вместо нее это место занял...

...в последнее время неожиданно современно зазвучали...

...принципы подбора шрифтов достаточно подробно рассмотрены...

...были довольно дорогим удовольствием...

...представив зрителю главное действующее лицо предстоящей феерии дизайна...

...в виде частично видимой фотографии...

...хотя и не несет никакой практической пользы, как правило, +повышает субъективную оценку дизайна пользователем...

...имеет еще одно проявление, специфическое +именно для дизайна логотипов...

...без знакомства со сферой деятельности +владельца знака...

...при этом обычное направление +восприятия информации - слева направо - диктует...

...в первую очередь, очевидно, баннеры должны...

...в определенных пределах...

...аппроксимируют исходное +изображение областями плоского цвета (сходный эффект дает...

Соответственно, для XML-документов существует два уровня соответствия стандарту...

Размеченный таким образом документ нелегко будет преобразовать...

...попробуем разобрать типы изображений по выполняемым ими функциям...

...этот вариант следует выбирать для следующих кадров...

...можно пользоваться, не боясь ограничить свою аудиторию пользователями...

...несколько хитрее - они манипулируют сразу несколькими параметрами

Как правило, после посылки...

...минимально возможное количество информации, которое можно...

...своих свойств...

...определяя для себя их приоритеты. Обнаружена определенная динамика...

...ресурсов в одну общедоступную систему. Вавилонское столпотворение + форматов, протоколов и методов доступа сменяется единой...

...ситуация еще больше упростилась, так как большинство...

The most surprising thing is that the texts these examples come from had been proofread by many people, and more than once. Nevertheless, it took Fresh Eye to keep all of this from going to the printer. (If you do not believe that the program found these errors on its own, run it on the file you are reading now and see for yourself.)

Not every tautology is an error. In a technical text, frequent repetition of terms can hardly be avoided; in fiction and journalism, lexical or phonetic parallelism often serves as a means of expression. Still, in a considerable share of cases tautology results from stylistic sloppiness. Even when pointlessly similar words standing side by side do not jar, replacing one of them with a phonetically distant synonym almost always makes the text denser, sturdier, and more interesting. Unfortunately, errors of this kind are easy to make and hard to detect precisely because of their "tautological" nature. This is what prompted the creation of a program whose eye always stays fresh and never grows tired.

Of course, the program does not correct errors by itself: it only finds them, and by no means everything it finds actually needs fixing. Without some practice, even a professional writer cannot always decide quickly whether a context found by Fresh Eye is a stylistic error, let alone work out on the fly how to get rid of the tautology. Over time, however, regular users of the program develop a taste for phonetic and semantic fullness and learn to avoid much of the kind of error Fresh Eye is designed to catch.

2 What's new

The main differences in version 1.4:

  • Support for Windows, Linux, FreeBSD, OpenBSD, NetBSD, Sun Solaris, and Cygwin. The program will probably compile and run on many other platforms as well without any significant changes.

  • Support for the major Russian encodings: CP-1251, CP-866, KOI8-R, Mac, ISO 8859-5.

  • Algorithm improvements that reduce the number of false positives.

  • Removal of all limits on the size of the input file, the number of letters in a word, the line length, the number of lines in a file, the size of the frequency dictionary, and the context length. The only limit is the amount of RAM in your computer.

  • Speed optimization: under identical conditions, version 1.4 runs several times faster than its predecessor (1.21).

  • A new command-line interface (GNU standard).

For a detailed list of changes, see the ChangeLog file.

3 Distribution terms

Fresh Eye is distributed under the GNU GPL (see the COPYING file) as C source code. (The Windows distribution also includes a compiled executable.) The authors would appreciate a mention of the program in the credits of books or other texts processed with it.

4 Installation

4.1 Windows

On Windows, simply copy the program's executable (fe.exe) to one of the directories listed in PATH (or, as a last resort, to the directory where you keep your texts). To compile the program on Windows, see the file win32\readme.

4.2 Unix (Linux, BSD, Solaris, ...)

For detailed instructions on compiling and installing the program on Unix, see the INSTALL file. In short: ./configure; make; make install.

5 Operation

5.1 Launching

Fresh Eye is run from the command line and processes plain text files:

fe [options] input-file

Since the program ignores Latin letters, in practice Fresh Eye can be used just as well to check text marked up in TeX, XML (without Russian element names), HTML, and so on.

By default, the input file is assumed to be encoded in CP866 (the "alternative", or "DOS", encoding) on Windows and in KOI8-R on Unix. You can specify the encoding of the text explicitly with the -I option. For example,

fe -I cp1251 input-file

processes a file in CP1251 (the Windows encoding). (Incidentally, this encoding was not made the default on Windows because the "MS-DOS window" in which the program runs displays only CP866 correctly.)

The distribution also includes the ce utility, which converts Russian texts from one encoding to another. For help on using it, type

ce --help
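The next block makes the idea of re-encoding concrete. It is a minimal C sketch of letter-by-letter conversion from KOI8-R to CP1251, not the source of ce. The table and the names koi8r_to_cp1251 and convert_koi8r_to_cp1251 are illustrative. The sketch handles only the 64 basic Cyrillic letters and leaves ё/Ё and every other byte untouched; that choice is also an assumption made for brevity.

```c
/* Alphabet position (а = 0, б = 1, ..., я = 31, without ё) of each KOI8-R
 * letter, in KOI8-R byte order: lowercase at 0xC0-0xDF, uppercase at 0xE0-0xFF. */
static const unsigned char koi8_alphabet_pos[32] = {
    30,  0,  1, 22,  4,  5, 20,  3, 21,  8,  9, 10, 11, 12, 13, 14,
    15, 31, 16, 17, 18, 19,  6,  2, 28, 27,  7, 24, 29, 25, 23, 26
};

/* Converts one KOI8-R byte to CP1251. CP1251 stores А..Я at 0xC0-0xDF and
 * а..я at 0xE0-0xFF in alphabetical order; all other bytes pass through. */
static unsigned char koi8r_to_cp1251(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xDF)                 /* KOI8-R lowercase letter */
        return (unsigned char)(0xE0 + koi8_alphabet_pos[c - 0xC0]);
    if (c >= 0xE0)                               /* KOI8-R uppercase letter */
        return (unsigned char)(0xC0 + koi8_alphabet_pos[c - 0xE0]);
    return c;
}

/* Converts a NUL-terminated KOI8-R string to CP1251 in place. */
void convert_koi8r_to_cp1251(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)koi8r_to_cp1251((unsigned char)*s);
}
```

Both encodings keep each case's 32 letters in one contiguous block, so a single permutation table serves both cases. For example, the KOI8-R bytes F3 D7 C5 D6 C9 CA ("Свежий") become D1 E2 E5 E6 E8 E9. The other code pages (CP866, Mac, ISO 8859-5) would each need a table of their own.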

5.2 Interactive mode

When it finds an error, Fresh Eye displays the surrounding context with the suspicious words underlined and asks whether to record this place in the log file (by default the log file is named fresheye.log and is created in the same directory as the file being processed). When the check is finished, you can fix the infelicitous passages in a text editor, using the contents of the log file as a guide.

In response to each prompt you can press one of the following keys (followed by Enter):

  • Y: write the context found to the log file;

  • N or space: skip this place and continue checking;

  • S: stop the program;

  • H: display brief help.

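The next block is a hypothetical sketch of how a reply to the prompt might be mapped to an action. The enum and parse_answer are illustrative names, not Fresh Eye's code. Case-insensitive matching is an assumption. The a key, which switches to batch mode as described in section 5.5, is included for completeness.

```c
#include <ctype.h>

/* Hypothetical actions for one interactive prompt. */
typedef enum {
    ANSWER_LOG,      /* Y: record the context in the log file */
    ANSWER_SKIP,     /* N or space: skip this place and continue */
    ANSWER_STOP,     /* S: stop the program */
    ANSWER_HELP,     /* H: show brief help */
    ANSWER_BATCH,    /* a: switch to batch mode */
    ANSWER_UNKNOWN   /* anything else: ask again */
} answer;

/* Maps a reply line (as read up to Enter) to an action by its first character. */
answer parse_answer(const char *reply)
{
    switch (tolower((unsigned char)reply[0])) {
    case 'y':
        return ANSWER_LOG;
    case 'n':
    case ' ':
        return ANSWER_SKIP;
    case 's':
        return ANSWER_STOP;
    case 'h':
        return ANSWER_HELP;
    case 'a':
        return ANSWER_BATCH;
    default:
        return ANSWER_UNKNOWN;
    }
}
```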
5.3 Log file

If the log file already exists when the program starts, the results are always appended to the end of it. As a result, when you process several files in the same directory, all the results are written to a single file. For each session, besides the stylistic flaws selected by the user, the log file records the number of lines and words examined, the number of hits, and the number of fragments written to the log.

The log file is named fresheye.log by default, but you can specify any other name with the -o option, for example:

fe -o myfile.fe myfile.txt
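The append behaviour comes down to opening the log in "a" mode, which creates the file if necessary and never truncates it. The minimal sketch below assumes a hypothetical summary-line format and a hypothetical session_stats structure. Only the four counters come from the description above.

```c
#include <stdio.h>

/* Hypothetical per-session counters, mirroring the ones described above. */
struct session_stats {
    unsigned long lines;    /* lines examined */
    unsigned long words;    /* words examined */
    unsigned long hits;     /* suspicious pairs found */
    unsigned long logged;   /* fragments written to the log */
};

/* Appends one summary line; "a" mode never truncates an existing file,
 * so several runs accumulate in the same log. Returns 0 on success. */
int append_session_summary(const char *log_path, const char *input_name,
                           const struct session_stats *st)
{
    FILE *log = fopen(log_path, "a");

    if (log == NULL)
        return -1;
    fprintf(log, "%s: lines=%lu words=%lu hits=%lu logged=%lu\n",
            input_name, st->lines, st->words, st->hits, st->logged);
    return fclose(log) == 0 ? 0 : -1;
}
```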

5.4 Resuming

The -r option enables resume mode, which is convenient for editing large files in several sittings. Adding -ry to the command line makes Fresh Eye read the existing log file to find out whether it has checked the given text before. If the file has already been processed, checking resumes from the line you reached last time. With -rn (kept for compatibility with old versions of the program), or without the -r option at all, the program always processes the file from the beginning.

5.5 Batch mode

You can ask Fresh Eye to write everything it considers infelicitous to the log file without consulting you. While it works, the program can be left to itself; afterwards, all that remains is to skim the log file and mark what really needs fixing. To do this, add the -a option to the command line. For example,

fe -a -o allfiles *.txt

processes all text files in the current directory in batch mode and writes the errors found to the log file allfiles.

You can also switch to batch mode by pressing a (Latin) in response to any interactive prompt.

6 Algorithm

6.1 Evaluation parameters

Let us now look at the numerical parameters Fresh Eye uses to evaluate stylistic roughness. For each pair of words the program does not like, the following values are displayed on the screen and/or written to the log file:

  • line <number>

    The number of the line where the hit occurred. It helps you quickly find the spot in the file.

  • sim = <number>

    The similarity coefficient of the words found. It is computed by a special algorithm designed to model how people perceive the similarity of words. It ranges from 0 (for two unrelated words such as «бузина» and «дядька») to 1500 and above (long identical word forms).

  • dist = <number>

    The psychological contextual distance between the marked words. It is determined by the kind and number of separators between them: words (taking their length into account), spaces (several consecutive spaces count as one), punctuation marks, paragraph breaks (an empty line), and so on. The frequency of the words in the given text is also taken into account.

  • badness = <number>

    Fresh Eye's final verdict, in numerical form, on how undesirable it is for this pair of words to stand near each other. The value grows as the words become more similar (that is, as sim increases) and shrinks as they move farther apart (as dist increases). A hit is reported when badness exceeds a threshold (500 by default), which you can set with the -s <number> command-line option. For example:

    fe -s 650 glava5

    With a threshold of 700 or more, Fresh Eye catches almost exclusively identical words in different forms. Lowering the threshold to 400 increases the number of hits (unfortunately, false ones in particular), but lets you find less obvious flaws.

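The following is a rough sketch of how separators between two words might be summed into a distance, following the rules listed for dist. Each letter of an intervening word counts, a run of blanks counts once, and an empty line counts as a paragraph break. The weights W_LETTER, W_SPACE, W_PUNCT and W_PARAGRAPH are invented for illustration, since Fresh Eye's own table is built into the program. The correction based on the frequency dictionary is omitted.

```c
#include <ctype.h>

/* Invented weights for illustration only; Fresh Eye's own table differs. */
enum { W_LETTER = 1, W_SPACE = 1, W_PUNCT = 3, W_PARAGRAPH = 20 };

/* Nonzero at an empty line, i.e. two consecutive newlines. */
static int at_paragraph_break(const char *p)
{
    return p[0] == '\n' && p[1] == '\n';
}

/* Sums separator weights over the text lying strictly between two words. */
unsigned long separator_distance(const char *gap)
{
    unsigned long dist = 0;
    const char *p = gap;

    while (*p != '\0') {
        if (at_paragraph_break(p)) {
            dist += W_PARAGRAPH;
            p += 2;
        } else if (isspace((unsigned char)*p)) {
            dist += W_SPACE;                 /* a run of blanks counts once */
            while (isspace((unsigned char)*p) && !at_paragraph_break(p))
                p++;
        } else if (ispunct((unsigned char)*p)) {
            dist += W_PUNCT;
            p++;
        } else {
            dist += W_LETTER;                /* intervening words count by length */
            p++;
        }
    }
    return dist;
}
```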
In addition, the -l <number> command-line option sets the context length: the number of just-read words that Fresh Eye remembers and compares each new word against (15 by default). Increasing the context length not only finds tautological pairs that are farther apart but also raises the badness values of close pairs.
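The context can be pictured as a ring buffer that holds the last N words, where N is the -l value. The minimal sketch below uses hypothetical type and function names. It keeps copies of the words, lets the caller look at each remembered word through context_at, and evicts the oldest word once the window is full.

```c
#include <stdlib.h>
#include <string.h>

/* Ring buffer holding copies of the last `size` words read. */
typedef struct {
    char **words;   /* slots; NULL until first used */
    size_t size;    /* context length, e.g. the -l value (15 by default) */
    size_t count;   /* number of words currently remembered, <= size */
    size_t next;    /* slot that the next word will occupy */
} context;

int context_init(context *c, size_t size)
{
    if (size == 0)
        return -1;
    c->words = calloc(size, sizeof *c->words);
    c->size = size;
    c->count = 0;
    c->next = 0;
    return c->words != NULL ? 0 : -1;
}

/* Returns the i-th remembered word, 0 being the oldest; requires i < count. */
const char *context_at(const context *c, size_t i)
{
    return c->words[(c->next + c->size - c->count + i) % c->size];
}

/* Remembers `word`, discarding the oldest word once the window is full. */
int context_push(context *c, const char *word)
{
    char *copy = malloc(strlen(word) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, word);
    free(c->words[c->next]);             /* NULL while the window is filling */
    c->words[c->next] = copy;
    c->next = (c->next + 1) % c->size;
    if (c->count < c->size)
        c->count++;
    return 0;
}

void context_free(context *c)
{
    size_t i;

    for (i = 0; i < c->size; i++)
        free(c->words[i]);
    free(c->words);
}
```

For each new word, a caller would compare it with context_at(&c, i) for every i below c.count, and only then call context_push.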

6.2 Frequency dictionary

For the words that occur most often in the text, the dist parameter is artificially increased (the words are, as it were, pushed apart) so that the program is more tolerant of frequent words standing close together. How far they are pushed apart depends on how many times each of the compared words occurs in the whole text. This is why Fresh Eye builds a frequency dictionary of the text before the actual check begins.

The weight given to word frequency information is also controlled by a special coefficient, which ranges from 0 to 100 (50 by default) and is set with the -c <number> command-line option. The larger this coefficient, the more noticeably the number of hits on frequently occurring words drops. This lets you take the nature of the text being processed into account to some extent.

For example, scientific papers and technical descriptions abound in frequently repeated terms, and for them the -c parameter can be raised so that the program almost ignores similarities and repetitions of frequent words. Thus, checking this file with the frequency dictionary turned off

fe -c 0 README.Russian

sharply increases the number of hits on frequent words such as «программа» ("program"), «текст» ("text"), «файл» ("file"), and the like. Conversely, in texts with any literary ambitions, close proximity of words, even of words that are quite frequent across the whole text, is highly undesirable, so for such texts the -c parameter should be lowered.

For Fresh Eye to build a reasonably representative frequency dictionary, the text being processed should be at least 15-20 KB long. The -d command-line option makes the program write the resulting dictionary to the log file, sorted by number of occurrences and excluding words that occur only once in the text. Each word is given in the form in which it first occurred in the text, followed by its number of occurrences and the corresponding informativeness coefficient. This coefficient ranges from 1000 for words that occur once to roughly 200-300 for the most frequent words.
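The next block is a minimal sketch of building such a dictionary and preparing a -d style listing. It counts exact word forms, sorts them by descending count, and keeps only words seen more than once. Fresh Eye itself stores words in an AVL tree (src/avl.c) and applies a rough lemmatization. This sketch instead uses a plain array with linear search and no lemmatization. It also does not compute the informativeness coefficient, because the text does not give its formula.

```c
#include <stdlib.h>
#include <string.h>

/* One dictionary entry: a word form and the number of times it occurred. */
typedef struct {
    char *word;
    unsigned long count;
} freq_entry;

/* A growable array of entries; initialise with {0}. */
typedef struct {
    freq_entry *items;
    size_t len;
    size_t cap;
} freq_dict;

/* Counts one occurrence of `word` (exact byte match, no lemmatization). */
int freq_add(freq_dict *d, const char *word)
{
    size_t i;

    for (i = 0; i < d->len; i++) {
        if (strcmp(d->items[i].word, word) == 0) {
            d->items[i].count++;
            return 0;
        }
    }
    if (d->len == d->cap) {
        size_t cap = d->cap ? d->cap * 2 : 16;
        freq_entry *p = realloc(d->items, cap * sizeof *p);

        if (p == NULL)
            return -1;
        d->items = p;
        d->cap = cap;
    }
    d->items[d->len].word = malloc(strlen(word) + 1);
    if (d->items[d->len].word == NULL)
        return -1;
    strcpy(d->items[d->len].word, word);
    d->items[d->len].count = 1;
    d->len++;
    return 0;
}

/* qsort comparator: larger counts first. */
static int by_count_desc(const void *a, const void *b)
{
    const freq_entry *x = a;
    const freq_entry *y = b;

    return (x->count < y->count) - (x->count > y->count);
}

/* Sorts entries by descending count and returns how many occur more than
 * once, i.e. the prefix of d->items that a -d style listing would print. */
size_t freq_sort_repeated(freq_dict *d)
{
    size_t n = 0;

    if (d->len > 1)
        qsort(d->items, d->len, sizeof *d->items, by_count_desc);
    while (n < d->len && d->items[n].count > 1)
        n++;
    return n;
}

void freq_free(freq_dict *d)
{
    size_t i;

    for (i = 0; i < d->len; i++)
        free(d->items[i].word);
    free(d->items);
}
```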

6.3 Computing badness

When computing the similarity coefficient of two word forms (the sim parameter), the program takes into account:

  • the degree of "phonetic closeness" of letters ("the same letter"; "very close letters", such as «ф» and «в"; and "somewhat similar letters", such as «м» and «н»);

  • the frequency of letters in Russian: the rarer the letters that make up a pair of similar words, the more their similarity catches the eye;

  • the number and length of matching or similar fragments in the compared word forms;

  • the position of the compared fragments within the word (the beginning of a word is known to be much more informative than its end), as well as the relative placement of the fragments in the two words;

  • the ratio of the lengths of the compared words;

  • the number and frequency of letters in the non-matching parts of the words: the more letters that occur in only one of the two words, and the rarer these letters are (that is, the more information they carry), the lower sim is for the pair.

Once sim is computed, the program determines dist. Each separator (a run of consecutive spaces, a punctuation mark, a paragraph break, and so on) is assigned a weight from a table, and the sum of these weights for the given pair of words is then adjusted using the frequency dictionary. Finally, badness is computed from sim and dist by the normal distribution formula, with sigma proportional to the context length. In other words, badness falls from sim toward zero along a Gaussian curve as dist grows.

6.4 Ignored words

The program includes a small dictionary of exceptions containing common expressions such as «друг друга» ("each other"), «в конце концов» ("after all"), «полным-полно» ("plenty"), and the like. The program naturally ignores all these cases.
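The next block sketches how such an exception list might be consulted before a pair is reported. It includes only the three expressions quoted above. Storing each one as an ordered pair of adjacent words is an assumption, and so is splitting «полным-полно» at the hyphen. The strings here are UTF-8, whereas the program itself works in a single-byte encoding.

```c
#include <string.h>

/* Hypothetical exception list built from the examples above; the program's
 * actual list is larger and its storage format is not documented here. */
static const char *const exceptions[][2] = {
    { "друг",   "друга"  },   /* друг друга */
    { "конце",  "концов" },   /* в конце концов */
    { "полным", "полно"  },   /* полным-полно, assumed split at the hyphen */
};

/* Returns nonzero if (first, second), in this order, is a known fixed
 * expression whose similarity should not be reported. */
int is_exception(const char *first, const char *second)
{
    size_t i;

    for (i = 0; i < sizeof exceptions / sizeof exceptions[0]; i++)
        if (strcmp(exceptions[i][0], first) == 0 &&
            strcmp(exceptions[i][1], second) == 0)
            return 1;
    return 0;
}
```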

Fresh Eye tries, as best it can, to detect proper names in the text so that it can ignore their similarity to other words. The algorithm is very simple: a proper name is any word that begins with a capital letter and does not stand at the beginning of a sentence or right after an opening quotation mark. In most cases this feature reduces the number of false positives. If, however, your standards are so high that the phrase «На диване сидел Иван» ("Ivan was sitting on the sofa") seems utterly unacceptable to you, you can turn off proper-name detection with the -p option.
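The proper-name rule can be sketched as follows. This is a minimal illustration, not the program's code. It uses ASCII isupper and treats a straight double quote as the opening quotation mark. It also treats ., ! and ? as sentence ends. The function name looks_like_proper_name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if the word starting at text[pos] would be treated as a
 * proper name: it is capitalized, does not directly follow an opening quote,
 * and is neither the first word of the text nor the first of a sentence. */
int looks_like_proper_name(const char *text, size_t pos)
{
    size_t i = pos;

    if (!isupper((unsigned char)text[pos]))
        return 0;
    if (pos > 0 && text[pos - 1] == '"')          /* right after an opening quote */
        return 0;
    while (i > 0 && isspace((unsigned char)text[i - 1]))
        i--;                                       /* skip blanks backwards */
    if (i == 0)                                    /* first word of the text */
        return 0;
    return !(text[i - 1] == '.' || text[i - 1] == '!' || text[i - 1] == '?');
}
```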

6.5 What remains to be done

The algorithm implemented in the program can obviously be improved considerably. Here are a few ideas:

  • The frequency dictionary built by the program counts occurrences of words rather than of word forms. However, the lemmatization algorithm it uses (identifying a word from its forms) is rather primitive and works correctly in only about 90% of cases; for example, «искать» and «ищет» ("to search" and "searches") are treated as different words. Linguistically correct lemmatization could clearly make a noticeable difference to the quality of the program's work. There is no need to reinvent the wheel: Ispell, a free (GNU GPL) spell checker, has an affix file and grammar for Russian. All that is needed is to adapt to Fresh Eye the Ispell algorithm that reduces a word to its dictionary form.

    Besides building the frequency dictionary, correct lemmatization could also be used during the check itself. For example, it makes sense to vary badness depending on whether a tautological pair consists of the same word in different grammatical forms or of different, albeit similar, words. Moreover, it would become possible to implement a special checking mode that looks only for pairs of cognate words, or forms of one word, that are rare across the whole text. Their accidental proximity jars even at a large distance (in adjacent paragraphs and beyond), but searching for them with Fresh Eye's standard algorithm and a long context sharply increases the number of false positives.

  • At present, the badness computation takes into account the informativeness of individual letters only (based on their frequency in the language), and dist takes into account the informativeness of words (based on their frequency in the given text). It would be interesting to use frequency information for all possible letter combinations (say, up to 4 or 5 letters long), depending on their position in the word; currently, only the word-initial position of a letter is distinguished from all other positions. To determine the informativeness of words, it makes sense to combine the frequency of a word in the text being checked with its frequency in the language as a whole. At the very least, the program could save the frequency dictionaries of previous sessions and reuse them later.

  • The table of separator weights used to compute dist is currently hard-coded in the program. It should be made configurable through a settings file that stores, for each separator, the corresponding regular expression and weight value. Among other things, this would make the program better suited to XML texts. For example, if your headings are marked up with <head>...</head> tags, it would be enough to assign very large weights to these character strings in the settings. That would prevent hits on heading words that are repeated in the first sentence of the text following the heading.

Also planned:

  • An XML format for the log file, to make automated processing easier (for example, with XSLT transformations).

  • An API for embedding Fresh Eye in text editors, e-mail programs, and so on. In particular, we would like to be able to check texts without leaving (X)Emacs.

  • Unicode support (UTF-8, to begin with).

  • Automatic detection of the input file's encoding.

  • Options for setting the encoding of screen output and of the log file (currently it always matches the encoding of the file being processed).

  • On Unix: use of the locale's encoding, plus documentation in texinfo and as a man page.

  • Easier installation: an installer for Windows and automatic generation of binary packages (RPM, DEB) for Unix.

Naturally, you are welcome to help implement these (and other) plans. If you have made (or are planning to make) any change to the program's code, please contact the authors (see the next section).

7 Authors

  • Dmitry Kirsanov (dmitry@kirsanov.com, http://www.kirsanov.com) designed the algorithm and wrote the first version of Fresh Eye for MS-DOS in 1994-95. In 2002 he improved the algorithm and rewrote the documentation.

  • Vadim Penzin (penzin@attglobal.net) rewrote the program's code, substantially improving and optimizing it, and ported it to Linux, FreeBSD, OpenBSD, and Win32 in 1999-2002.

The authors will be glad to hear users' impressions of Fresh Eye and to answer any questions about it.

8 Other programs by Dmitry Kirsanov

8.1 Transforming Text Editor

http://tte.sourceforge.net

Transforming Text Editor is a powerful and flexible text editor, reminiscent of Emacs but using XML and XPath to store and process structural representations of documents. The project is in the planning stage; the main architectural principles of the new editor are described in detail in an article at http://tte.sourceforge.net. Interested developers can contact D. Kirsanov to discuss participating in the project.

8.2 Russian Word Constructor

http://www.kirsanov.com/rwc/

The experimental software system Russian Word Constructor (RWC) is an attempt to invent a "poet's tool", or, more strictly, a "dialogue system for creating Russian verse-like texts". The core of the program is its ability to construct Russian neologisms from a dictionary containing lexical and statistical information about the language. This gives a powerful push to the creative imagination, and the original full-screen interface lets you put the resulting ideas into practice right away.

Since it has no practical use, RWC belongs rather to the category of computer games, even though it has few of their traits: it trains neither reaction speed nor analytical skills. The program's aim is different: to deliver the joy of creativity in its purest form.

Last updated: Thu Jun 27 01:22:26 GMT-04:00 2002

\ No newline at end of file diff --git a/src/Makefile.am b/src/Makefile.am new file mode 100644 index 0000000..345984b --- /dev/null +++ b/src/Makefile.am @@ -0,0 +1,56 @@ +# $Id: Makefile.am,v 1.8 2002/06/28 05:01:33 vadimp Exp $ + +SUFFIXES=.koi8-r + +.koi8-r.c: ce + ./ce --input-codepage=koi8-r < $< > $@ + +if DEBUG +DEBUG_FLAGS=-g +else +DEBUG_FLAGS=-DNDEBUG +endif + +if PROFILE +PROFILE_FLAGS=-pg +endif + +if COMPILE_GNU_GETOPT +GNU_GETOPT_SRC = getopt.c getopt1.c getopt.h +endif + +if NO_CYGWIN +CYGWIN_FLAGS=-mno-cygwin +endif + +AM_CFLAGS=-Wall -O2 $(DEBUG_FLAGS) $(PROFILE_FLAGS) $(CYGWIN_FLAGS) + +bin_PROGRAMS = fe ce + +GENERATED_FILES = ui.c fe.c lingtbl.c + +fe_SOURCES = fe.h \ + cyrillic.c tables.c cyrillic.h \ + avl.c avl_low.c avl.h \ + reader.c reader.h \ + context.h context.c \ + util.c util.h \ + wrappers.c wrappers.h \ + ui.h \ + $(GNU_GETOPT_SRC) + +nodist_fe_SOURCES = $(GENERATED_FILES) + +ce_SOURCES = ce.c \ + cyrillic.c cyrillic.h tables.c \ + $(GNU_GETOPT_SRC) + +EXTRA_DIST = getopt.h getopt.c getopt1.c config-win32.h \ + ui.koi8-r lingtbl.koi8-r fe.koi8-r + +MOSTLYCLEANFILES = $(GENERATED_FILES) + +# Cygwin sets EXEEXT to .exe -- must use it. +fe.c: ce$(EXEEXT) +ui.c: ce$(EXEEXT) +lingtbl.c: ce$(EXEEXT) diff --git a/src/README b/src/README new file mode 100644 index 0000000..b2fa513 --- /dev/null +++ b/src/README @@ -0,0 +1,16 @@ +If you just fetched the source tree from a CVS repository, please use +the following command to generate files that are needed for building: + aclocal && autoconf && autoheader && automake -a + +Please refer to the file named INSTALL for building on UNIX or Cygwin. + +Please see win32/README for building on Win32. + +Please read the file README.Russian for general description of the program, +the file is encoded using KOI8-R. 
+ +Send suggestions for improvements to Dmitry Kirsanov +Report bugs to Vadim Penzin +Please make sure there are words 'Fresh Eye' in the Subject: line + +$Id: README,v 1.3 2002/06/26 00:27:47 vadimp Exp $ diff --git a/src/avl.c b/src/avl.c new file mode 100644 index 0000000..9bfbb67 --- /dev/null +++ b/src/avl.c @@ -0,0 +1,152 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: avl.c,v 1.1.1.1 2000/10/17 01:16:56 vadimp Exp $ + */ + +#include +#include +#include +#include + +#include "avl.h" + +node* node_init ( const char* s, int len ) { + + node* this = NULL; + + assert ( s ); + + if ( (this = (node *) malloc ( sizeof ( node ) )) == NULL ) + return NULL; + + if ( (this -> key = strdup ( s )) == NULL ) { + free ( this ); + return NULL; + } + + this -> length = len; + this -> right = this -> left = NULL; + this -> height = 1; + this -> count = 1; + this -> coefficient = 0; + + return this; +} + +void node_free ( node* this ) { + if ( this ) { + if ( this -> key ) + free ( (node *) this -> key ); + node_free ( this -> left ); + node_free ( this -> right ); + free ( this ); + } +} + +void node_print ( node* n, FILE* f ) { + if ( n ) { + node_print ( n -> left, f ); + fprintf ( f, 
"%s\t%lu\n", n -> key, n -> count ); + node_print ( n -> right, f ); + } +} + +avl* avl_init ( cmpfn cmp ) { + + avl* this = NULL; + + assert ( cmp ); + + if ( (this = (avl *) malloc ( sizeof ( avl ) )) == NULL ) + return NULL; + + this -> root = NULL; + this -> count = 0; + this -> cmp = cmp; + + return this; +} + +void avl_free ( avl* this ) { + + assert ( this ); + + node_free ( this -> root ); + free ( this ); +} + +static node* lookup ( node* n, const key_t key, int len, cmpfn cmp ) { + + while ( n ) { + + int cmpres = cmp ( key, len, n -> key, n -> length ); + if ( !cmpres ) + break; + n = cmpres < 0 ? n -> left : n -> right; + } + + return n; +} + +node* avl_insert ( avl* this, const char* s, int len ) { + + node* new_node = NULL; + + assert ( this ); + assert ( s ); + + if ( (new_node = node_init ( s, len )) == NULL ) + return NULL; + + insert ( new_node, &this -> root, this -> cmp ); + this -> count ++; + + return new_node; +} + +node* avl_lookup ( avl* this, const char* s, int len ) { + + assert ( this ); + assert ( this -> cmp ); + assert ( s ); + + return lookup ( this -> root, s, len, this -> cmp ); +} + +void avl_print ( avl* this, FILE* f ) { + assert ( this ); + node_print ( this -> root, f ); +} + +void node_foreach ( avl* t, node* n, void (*func) ( avl*, node*, void * ), + void* user_data ) { + if ( n ) + { + node_foreach ( t, n -> left, func, user_data ); + func ( t, n, user_data ); + node_foreach ( t, n -> right, func, user_data ); + } +} + +void avl_foreach ( avl* this, void (*func) ( avl*, node*, void* ), + void* user_data ) { + + assert ( this ); + node_foreach ( this, this -> root, func, user_data ); +} diff --git a/src/avl.h b/src/avl.h new file mode 100644 index 0000000..bc4a95b --- /dev/null +++ b/src/avl.h @@ -0,0 +1,56 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU 
General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: avl.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +#define key_t char* +#define AVL_MAX_HEIGHT 41 /* why this? a small exercise */ +#define heightof(tree) ((tree) == NULL ? 0 : (tree)->height) + +typedef int (*cmpfn) ( const key_t, int, const key_t, int ); + +typedef struct node { + key_t key; + int length; + unsigned long count; + unsigned long coefficient; + struct node* left; + struct node* right; + unsigned char height; +} node; + +typedef struct avl { + node* root; + unsigned long count; + cmpfn cmp; +} avl; + + +/* Public interface -- avl.c */ +avl* avl_init ( cmpfn cmp ); +void avl_free ( avl* this ); +node* avl_insert ( avl* this, const char* s, int len ); +node* avl_lookup ( avl* this, const char* s, int len ); +void avl_print ( avl* this, FILE* f ); +void avl_foreach ( avl* this, void (*func) ( avl*, node*, void* ), + void* user_data ); + +/* Low-level AVL functions -- avl_low.c */ +void insert (node* new_node, node** ptree, cmpfn cmp ); +void delete (node* node_to_delete, node** ptree, cmpfn cmp ); diff --git a/src/avl_low.c b/src/avl_low.c new file mode 100644 index 0000000..0701891 --- /dev/null +++ b/src/avl_low.c @@ -0,0 +1,232 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of 
the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: avl_low.c,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +/* + * Mon Jul 12 18:45:57 IDT 1999, vadimp: + * + * This code is taken from Linux kernel 2.2.10 and adapted for the needs of + * Fresh Eye by Vadim Penzin . + * Original source (see mm/mmap_avl.c in the Linux kernel source tree) was + * written by Bruno Haible . + */ + +#include +#include +#include + +#include "avl.h" + +/* + * task->mm->mmap_avl is the AVL tree corresponding to task->mm->mmap + * or, more exactly, its root. + * A vm_area_struct has the following fields: + * left left son of a tree node + * right right son of a tree node + * height 1+max(heightof(left),heightof(right)) + * The empty tree is represented as NULL. + */ + +/* Since the trees are balanced, their height will never be large. */ +#define AVL_MAX_HEIGHT 41 /* why this? a small exercise */ +#define heightof(tree) ((tree) == NULL ? 0 : (tree)->height) +/* + * Consistency and balancing rules: + * 1. tree->height == 1+max(heightof(tree->left),heightof(tree->right)) + * 2. abs( heightof(tree->left) - heightof(tree->right) ) <= 1 + * 3. foreach node in tree->left: node->key <= tree->key, + * foreach node in tree->right: node->key >= tree->key. + */ + +/* + * Rebalance a tree.
+ * After inserting or deleting a node of a tree we have a sequence of subtrees + * nodes[0]..nodes[k-1] such that + * nodes[0] is the root and nodes[i+1] = nodes[i]->{left|right}. + */ +static void rebalance (node*** nodeplaces_ptr, int count) +{ + for ( ; count > 0 ; count--) { + node** nodeplace = *--nodeplaces_ptr; + node* n = *nodeplace; + node* nodeleft = n->left; + node* noderight = n->right; + int heightleft = heightof(nodeleft); + int heightright = heightof(noderight); + if (heightright + 1 < heightleft) { + /* */ + /* * */ + /* / \ */ + /* n+2 n */ + /* */ + node* nodeleftleft = nodeleft->left; + node* nodeleftright = nodeleft->right; + int heightleftright = heightof(nodeleftright); + if (heightof(nodeleftleft) >= heightleftright) { + /* */ + /* * n+2|n+3 */ + /* / \ / \ */ + /* n+2 n --> / n+1|n+2 */ + /* / \ | / \ */ + /* n+1 n|n+1 n+1 n|n+1 n */ + /* */ + n->left = nodeleftright; + nodeleft->right = n; + nodeleft->height = 1 + + (n->height = 1 + heightleftright); + *nodeplace = nodeleft; + } else { + /* */ + /* * n+2 */ + /* / \ / \ */ + /* n+2 n --> n+1 n+1 */ + /* / \ / \ / \ */ + /* n n+1 n L R n */ + /* / \ */ + /* L R */ + /* */ + nodeleft->right = nodeleftright->left; + n->left = nodeleftright->right; + nodeleftright->left = nodeleft; + nodeleftright->right = n; + nodeleft->height = n->height = heightleftright; + nodeleftright->height = heightleft; + *nodeplace = nodeleftright; + } + } + else if (heightleft + 1 < heightright) { + /* + * similar to the above, + * just interchange 'left' <--> 'right' + */ + node* noderightright = noderight->right; + node* noderightleft = noderight->left; + int heightrightleft = heightof(noderightleft); + if (heightof(noderightright) >= heightrightleft) { + n->right = noderightleft; + noderight->left = n; + noderight->height = 1 + + (n->height = 1 + heightrightleft); + *nodeplace = noderight; + } else { + noderight->left = noderightleft->right; + n->right = noderightleft->left; + noderightleft->right = noderight; + 
noderightleft->left = n; + noderight->height = + n->height = heightrightleft; + noderightleft->height = heightright; + *nodeplace = noderightleft; + } + } + else { + int height = (heightleft<heightright ? heightright : heightleft) + 1; + if (height == n->height) + break; + n->height = height; + } + } +} + +/* Insert a node into a tree. */ +void insert (node* new_node, node** ptree, cmpfn cmp ) +{ + key_t key = new_node->key; + int len = new_node -> length; + node** nodeplace = ptree; + node** stack[AVL_MAX_HEIGHT]; + int stack_count = 0; + node*** stack_ptr = &stack[0]; /* = &stack[stackcount] */ + + for (;;) { + node* n = *nodeplace; + if (n == NULL) + break; + *stack_ptr++ = nodeplace; stack_count++; + if ( cmp ( key, len, n->key, n -> length ) < 0 ) + nodeplace = &n->left; + else + nodeplace = &n->right; + } + new_node->left = NULL; + new_node->right = NULL; + new_node->height = 1; + *nodeplace = new_node; + rebalance(stack_ptr,stack_count); +} + +#if 0 + +/* + * We do not need delete functionality in Fresh Eye, keep it here + * "for completeness". + */ + +/* Removes a node out of a tree. */ +void delete (node* node_to_delete, node** ptree, cmpfn cmp ) +{ + key_t key = node_to_delete->key; + int len = node_to_delete -> length; + node** nodeplace = ptree; + node** stack[AVL_MAX_HEIGHT]; + int stack_count = 0; + node*** stack_ptr = &stack[0]; /* = &stack[stackcount] */ + node** nodeplace_to_delete; + + for (;;) { + node* n = *nodeplace; + assert ( n ); /* node_to_delete must be found */ + *stack_ptr++ = nodeplace; stack_count++; + if ( !cmp ( key, len, n->key, n -> length ) ) + break; + if ( cmp ( key, len, n->key, n -> length ) < 0 ) + nodeplace = &n->left; + else + nodeplace = &n->right; + } + nodeplace_to_delete = nodeplace; + /* Have to remove node_to_delete = *nodeplace_to_delete.
*/ + if (node_to_delete->left == NULL) { + *nodeplace_to_delete = node_to_delete->right; + stack_ptr--; stack_count--; + } else { + node*** stack_ptr_to_delete = stack_ptr; + node** nodeplace = &node_to_delete->left; + node* n; + for (;;) { + n = *nodeplace; + if (n->right == NULL) + break; + *stack_ptr++ = nodeplace; stack_count++; + nodeplace = &n->right; + } + *nodeplace = n->left; + /* n replaces node_to_delete */ + n->left = node_to_delete->left; + n->right = node_to_delete->right; + n->height = node_to_delete->height; + *nodeplace_to_delete = n; /* replace node_to_delete */ + *stack_ptr_to_delete = &n->left; /* replace &node_to_delete->left */ + } + rebalance(stack_ptr,stack_count); +} + +#endif diff --git a/src/ce.c b/src/ce.c new file mode 100644 index 0000000..cb5f4ac --- /dev/null +++ b/src/ce.c @@ -0,0 +1,161 @@ +/* + * Cyrillic Encoding, a program for conversion between Cyrillic code pages + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: ce.c,v 1.4 2002/06/27 00:44:00 vadimp Exp $ + */ + +#include +#include +#include + +#include "config.h" +#include "cyrillic.h" +#include "fe.h" + +int input_codepage = CYR_CP_DEFAULT; +int output_codepage = CYR_CP_DEFAULT; +int strict = 0; + +void usage ( void ) { + + printf ( + + "Usage: ce [options]\n" + "Converts standard input from one Cyrillic code page to another.\n\n" + + "-i cp, --input-codepage Set Cyrillic code page of input to cp\n" + "-o cp, --output-codepage Set Cyrillic code page of output to cp\n" + "-s, --strict Ignores everything but ASCII and\n" + " Cyrillic letters valid for the input\n" + " code page\n" + "-h, -?, --help Display this help and exit\n" + "-v, --version Display version information and exit\n\n" + + "cp can be one of the " + "following:\n\n" + "koi8-r -- KOI8-R (default on UNIX-compatible platforms)\n" + "cp866 -- MS-DOS CP866 (aka 'alternative', default on Win32 " + "platforms)\n" + "cp1251 -- Windows CP1251\n" + "mac -- Cyrillic encoding used on Apple Macintosh\n" + "iso8859-5 -- ISO 8859-5\n\n" + + "Report bugs to Vadim Penzin \n" + "Please make sure there are words 'Fresh Eye' in the Subject: line\n" + + ); + + exit ( 0 ); + +} + +void version ( void ) { + + printf ( + + "ce ("PACKAGE" version "VERSION" ("PLATFORM" ["CYR_CP_NAME"]))\n" + "Copyright (C) 1999 OnMind Systems.\n" + "Fresh Eye is distributed in the hope that it will be useful,\n" + "but THERE IS ABSOLUTELY NO WARRANTY OF ANY KIND for this software.\n" + "You may redistribute copies of Fresh Eye\n" + "under the terms of the GNU General Public License.\n" + "For more information, see the file named COPYING.\n" + + ); + + exit ( 0 ); +} + +int parse_command_line ( int argc, char* argv [] ) { + + static const char* options = "i:o:shv?"; + int 
option_index; + static struct option long_options [] = { + { "input-codepage", 1, NULL, 'i' }, + { "output-codepage", 1, NULL, 'o' }, + { "strict", 0, NULL, 's' }, + { "help", 0, NULL, 'h' }, + { "version", 0, NULL, 'v' }, + { NULL, 0, NULL, 0 } + }; + int ch; + + while ( (ch = getopt_long ( argc, argv, options, long_options, + &option_index )) != -1 ) + switch ( ch ) { + case 'i': + input_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case 'o': + output_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case 's': + strict = 1; + break; + + case '?': + case 'h': + usage (); + + case 'v': + version (); + + } + + return optind; +} + +int main ( int argc, char* argv [] ) { + + int ch; + + parse_command_line ( argc, argv ); + + if ( input_codepage == CYR_CP_UNDEFINED ) { + fprintf ( stderr, "ce: Bad input code page specified\n" ); + return -1; + } + + if ( output_codepage == CYR_CP_UNDEFINED ) { + fprintf ( stderr, "ce: Bad output code page specified\n" ); + return -1; + } + + while ( (ch = getchar ()) != EOF ) + if ( !cyr_isletter_ex ( (cyr_letter) ch, input_codepage ) ) { + if ( !strict ) + putchar ( translate_special_character ( + output_codepage, input_codepage, ch ) ); + else { + if ( ch < 0x80 ) + putchar ( ch ); + } + } + else { + int ord = cyr_ord_ex ( (cyr_letter) ch, + input_codepage ); + putchar ( cyr_chr_ex ( ord, output_codepage ) ); + } + + return 0; +} + diff --git a/src/config-win32.h b/src/config-win32.h new file mode 100644 index 0000000..6fd0233 --- /dev/null +++ b/src/config-win32.h @@ -0,0 +1,49 @@ +/* src/config.h. Generated automatically by configure. */ +/* src/config.h.in. Generated automatically from configure.in by autoheader. */ + +/* Define to empty if the keyword does not work. */ +/* #undef const */ + +/* Define to `int' if doesn't define. */ +/* #undef pid_t */ + +/* Define to `unsigned' if doesn't define. */ +/* #undef size_t */ + +/* Define if you have the ANSI C header files. 
*/ +#define STDC_HEADERS 1 + +/* Define if you have the strdup function. */ +#define HAVE_STRDUP 1 + +/* Define if you have the strerror function. */ +#define HAVE_STRERROR 1 + +/* Define if you have the strstr function. */ +#define HAVE_STRSTR 1 + +/* Define if you have the header file. */ +/* #define HAVE_UNISTD_H */ + +/* Define if you have the header file. */ +#define HAVE_STRING_H 1 + +/* Define if you have the m library (-lm). */ +#define HAVE_LIBM 1 + +/* Name of package */ +#define PACKAGE "fe" + +/* Version number of package */ +#define VERSION "1.3.7" + +/* Specifies platform type */ +#define PLATFORM "i386-microsoft-win32" + +/* Default Cyrillic code page */ +#define CYR_CP_DEFAULT CYR_CP_CP866 + +/* The name of default Cyrillic code page */ +#define CYR_CP_NAME "cp866" + +#define __inline diff --git a/src/context.c b/src/context.c new file mode 100644 index 0000000..10dde3d --- /dev/null +++ b/src/context.c @@ -0,0 +1,401 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: context.c,v 1.4 2002/06/21 00:53:13 vadimp Exp $ + */ + +#include +#include +#include +#include + +#include "config.h" +#include "fe.h" +#include "reader.h" +#include "context.h" +#include "wrappers.h" +#include "cyrillic.h" +#include "util.h" + +static word* wrd_init ( line* line, int start, int length, + int delimiters, int proper ) { + + word* this = NULL; + + assert ( line ); + assert ( start >= 0 ); + assert ( length > 0 ); + + this = xmalloc ( sizeof ( word ) ); + this -> line = line; + this -> line -> refcnt ++; + this -> position = start; + this -> next = NULL; + this -> text = strndup ( line -> text + start, length ); + unify_word ( this -> text ); + this -> logical = xmalloc ( length + 1 ); + convert_to_logical ( this -> logical, this -> text ); + this -> length = length; + this -> delimiters = delimiters; + this -> proper = proper; + + return this; +} + +static void wrd_free ( word* this ) { + + if ( this == NULL ) + return; + + if ( this -> text ) + free ( this -> text ); + + if ( this -> logical ) + free ( this -> logical ); + + wrd_free ( this -> next ); + free ( this ); +} + +static __inline void wrd_free_one ( word* this ) { + this -> next = NULL; + wrd_free ( this ); +} + +static line* line_init ( const char* text, unsigned long number ) { + + line* this = NULL; + + assert ( text ); + assert ( number > 0 ); + + this = xmalloc ( sizeof ( line ) ); + this -> text = xstrdup ( text ); + this -> length = strlen ( text ); + this -> number = number; + this -> refcnt = 0; + this -> next = NULL; + + return this; +} + +static void line_free ( line* this ) { + + if ( this == NULL ) + return; + + if ( this -> text ) + free ( this -> text ); + line_free ( this -> next ); + free ( this ); +} + +static __inline void line_free_one ( line* 
this ) { + this -> next = NULL; + line_free ( this ); +} + +static context* ctx_fetch_line ( context* this ) { + + line* l = NULL; + const char* s = NULL; + + assert ( this ); + + s = rdr_gets ( this -> rdr ); + if ( s == NULL ) + return NULL; /* EOF */ + this -> total_lines ++; + l = line_init ( s, this -> total_lines ); + this -> cp = l -> text; + if ( this -> lines_head == NULL ) + this -> lines_tail = this -> lines_head = l; + else + this -> lines_tail = this -> lines_tail -> next = l; + + this -> line_cnt ++; + + return this; +} + +static context* ctx_remove_topmost_word ( context* this ) { + + word* w = NULL; + line* l = NULL; + + assert ( this ); + assert ( this -> words_head ); + + /* Detach topmost word from the list and free it */ + w = this -> words_head; + this -> words_head = this -> words_head -> next; + w -> line -> refcnt --; + wrd_free_one ( w ); + this -> word_cnt --; + + /* Check if some unreferenced lines can be removed */ + l = this -> lines_head; + while ( l && !l -> refcnt ) { + line* tmp = l -> next; + line_free_one ( l ); + this -> line_cnt --; + l = tmp; + } + this -> lines_head = l; + + return this; +} + +static word* ctx_make_new_word ( context* this ) { + + int start; + int end; + + start = this -> cp - this -> lines_tail -> text; + while ( cyr_isletter ( *(this -> cp) ) ) + this -> cp ++; + end = this -> cp - this -> lines_tail -> text; + return wrd_init ( this -> lines_tail, start, end - start, + this -> delimiters, this -> proper ); +} + +static int count_delimiters ( context* this ) { + + int res = 0; + + if ( this -> immediate_new && this -> new_sentence ) + this -> immediate_new = this -> new_sentence = 0; + + switch ( *(this -> cp) ) { + + case ' ': + if ( !this -> spaces ) { + res ++; + this -> spaces = 1; + } + break; + + case ',': + res += 2; + break; + + case '.': + res += 4; + this -> new_sentence = 1; + break; + + case ';': + res += 3; + break; + + case ':': + res += 3; + break; + + case '!': + res += 4; + this -> 
new_sentence = 1; + break; + + case '?': + res += 4; + this -> new_sentence = 1; + break; + + case ')': + res += 3; + break; + + case '(': + res += 3; + break; + + case '"': + res += 3; + this -> new_sentence = 1; + /* To switch new_sentence off if the word doesn't + * follow immediately */ + this -> immediate_new = 1; + break; + + case '-': + if ( this -> spaces ) + res += 3; + else + res ++; + break; + + case '^': + break; + + default: + if ( !this -> spaces ) { + res ++; + this -> spaces = 1; + } + break; + } + + return res; + +} + +/* + * Skips to a word or end-of-string in the current string. + * Returns nonzero if a word found + */ +__inline int ctx_skip_to_word ( context* this ) { + + assert ( this ); + + while ( this -> cp + && *(this -> cp) + && !cyr_isletter ( *(this -> cp) ) ) { + this -> delimiters += count_delimiters ( this ); + this -> cp ++; /* skip to a letter or end-of-string */ + } + + return this -> cp && *this -> cp; +} + +/* + * Searches for next Russian word in input. If a new word found, this -> cp + * points to its first letter. 
+ * Returns this if successful, otherwise (EOF or I/O error) NULL; + */ +__inline context* ctx_search_for_word ( context* this ) { + + int newline = 0; + + while ( !ctx_skip_to_word ( this ) ) { + if ( ctx_fetch_line ( this ) == NULL ) + return NULL; + if ( !newline ) + newline = 1; + else + this -> par = 1; + } + + return this; +} + +context* ctx_init ( const char* path, const int width, int codepage ) { + + context* this = NULL; + + assert ( path ); + assert ( width > 0 ); + + this = xmalloc ( sizeof ( context ) ); + memset ( this, 0, sizeof ( context ) ); + + this -> width = width; + this -> path = xstrdup ( path ); + this -> f = xfopen ( path, "r" ); + /* The only reasonable guess about maximal line length is 80 */ + this -> rdr = rdr_init ( this -> f, 80, codepage ); + + this -> new_sentence = 1; + + return this; +} + +void ctx_free ( context* this ) { + + assert ( this ); + + if ( this -> rdr ) + rdr_free ( this -> rdr ); + + xfclose ( this -> f ); + + if ( this -> cur_word ) + wrd_free ( this -> cur_word ); + + if ( this -> words_head ) + wrd_free ( this -> words_head ); + + if ( this -> lines_head ) + line_free ( this -> lines_head ); + + free ( this ); +} + +/* + * Shifts context by feeding the current word into word list and fetching a + * new word from the source file. + * Returns its actual parameter if successful, otherwise NULL. 
+ */ +context* ctx_shift ( context* this ) { + + word* w = NULL; + + assert ( this ); + + this -> proper = this -> spaces = this -> delimiters = 0; + + if ( ctx_search_for_word ( this ) == NULL ) + return NULL; /* EOF or I/O error */ + + if ( this -> par ) { + this -> delimiters += 8; + this -> par = 0; + } + + this -> immediate_new = 0; + if ( this -> new_sentence ) + this -> new_sentence = 0; + else + this -> proper = cyr_isletter ( *(this -> cp) ) + && cyr_isletter ( *(this -> cp + 1) ) + && cyr_iscap ( *(this -> cp) ) + && cyr_islow ( *(this -> cp + 1) ); + + w = ctx_make_new_word ( this ); + if ( this -> cur_word ) + this -> cur_word -> delimiters = w -> delimiters; + this -> total_words ++; + + /* Place current word to the end of the list of words */ + if ( this -> words_head == NULL ) + this -> words_tail = this -> words_head = this -> cur_word; + else + this -> words_tail = + this -> words_tail -> next = this -> cur_word; + + /* Increase context width, remove topmost word if + * context's capacity is exceeded */ + if ( this -> word_cnt ++ == this -> width ) + ctx_remove_topmost_word ( this ); + + /* Make new word current */ + this -> cur_word = w; + + return this; +} + +context* ctx_skip_lines ( context* this, unsigned long n ) { + + if ( !(this -> cp = rdr_skip ( this -> rdr, n )) ) + return NULL; + this -> total_lines += n; + this -> cp = NULL; + + return this; +} + diff --git a/src/context.h b/src/context.h new file mode 100644 index 0000000..29599ec --- /dev/null +++ b/src/context.h @@ -0,0 +1,136 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: context.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +/* + * Context is rather a complex object. Its main goal is to maintain a + * "sliding window" on the input file. This window has a user-defined size + * measured in the number of words that can be seen through it. + * + * Basically context consists of the following: + * + * - List of words -- it stores the words that user is able to see through + * the "sliding window" at any instant of context's life time. + * Physically it is a simple FIFO queue implemented as a linked list + * of 'word' structures. + * + * - List of lines -- it stores the lines of input file where the words + * of context (both those found in the list of words and the current + * word) appear. When a line doesn't contain words but there are words + * (currently available through the context) before and/or after it, the + * line is also kept in this list. Similarly, list of lines is a FIFO + * queue implemented as a linked list of 'line' structures. + * + * - Current word - one 'word' structure representing the current word. + * + * Each 'word' structure points to a 'line' structure so the user can determine + * the origin of a word. Each 'line' structure contains reference counter + * showing the number of words pointing to a specific line. + * + * The only operation supported by the context is shifting. When context is + * being shifted, current word (if any) is inserted at the bottom of the word + * list and the next word found in the input file becomes the current. 
If + * context's capacity is exceeded during this operation, the topmost word is + * removed from the word list and reference counter of the corresponding line + * is decremented. Unreferenced lines (those having reference counter equal to + * zero) are *not* removed from the list of lines until they are at the top of + * it. Besides shifting and handling words-lines relationship, context does + * a lot of work by computing everything that context's user may (or may not :) + * need -- lengths of words and lines, position of words in lines etc. + */ + +typedef struct line { + char* text; /* Character data as it appears in the input file */ + unsigned long number; /* Line number in the input file */ + int length; /* Line length */ + int refcnt; /* Reference count */ + struct line* next; /* Next line in the list */ +} line; + +typedef struct word { + line* line; /* Line on which the word appears */ + char* text; /* The word in unified representation */ + char* logical; /* The word in logical representation */ + int position; /* Start position in the line */ + int length; /* Length of the word */ + int delimiters; /* Count of delimiters coming after this word */ + int proper; /* Proper name */ + struct word* next; /* Next word in the list */ +} word; + +typedef struct { + + int width; /* Maximal number of words in the context */ + + line* lines_head; /* Head of the list of lines */ + line* lines_tail; /* Tail of the list of lines */ + int line_cnt; /* Count of entries in the list of lines */ + + struct word* words_head; /* Head of the list of context's words */ + struct word* words_tail; /* Tail of the list of context's words */ + int word_cnt; /* Count of words in the words' list */ + + unsigned long total_lines; /* Total line count */ + unsigned long total_words; /* Total word count */ + + const char* path; + FILE* f; /* Input file */ + reader* rdr; /* Line reader */ + + struct word* cur_word; /* Current word */ + const char* cp; /* Current position in the current 
line */ + + /* + * Various state flags and counters of context's automaton. + * This is taken as is from fe 1.2.1, to ensure that the + * functionality remains the same, probably it should be rewritten. + */ + int new_sentence; /* Looking for a new sentence */ + int immediate_new; /* Immediate speech */ + int proper; /* Proper name was detected */ + int spaces; /* 'Spaces' mode -- running spaces counted as one space */ + int par; /* Last line(s) contain(s) no words */ + int delimiters; /* Delimiters' counter. Passed on to word */ + +} context; + +/* + * Allocates and initializes a new context with the path to the input file and + * context width -- maximal number of the words in context. + * Returns pointer to the new context if successful, otherwise aborts. + */ +context* ctx_init ( const char* path, const int width, int codepage ); + +/* + * Frees memory allocated for context + */ +void ctx_free ( context* this ); + +/* + * Shifts context. + * Returns this if successful, otherwise (EOF) NULL. + */ +context* ctx_shift ( context* this ); + +/* + * Skips lines in input file. + * Returns this if successful, otherwise (EOF) NULL. + */ +context* ctx_skip_lines ( context* this, unsigned long n ); diff --git a/src/cyrillic.c b/src/cyrillic.c new file mode 100644 index 0000000..803ee07 --- /dev/null +++ b/src/cyrillic.c @@ -0,0 +1,284 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: cyrillic.c,v 1.1.1.1 2000/10/17 01:16:55 vadimp Exp $ + */ + +#include +#include +#include +#include + +#include "cyrillic.h" + +static int* code_table = NULL; +static int encoding = CYR_CP_UNDEFINED; + +extern int code_table_koi8_r []; +extern int code_table_cp866 []; +extern int code_table_cp1251 []; +extern int code_table_mac []; +extern int code_table_iso_8859_5 []; +extern int code_table_unicode []; +extern int ord2chr [] [ 6 ]; + +static int* select_codepage ( int codepage ); + +#define CODE_TABLE( x ) (code_table [ (x) - EXTENDED_ASCII_OFFSET ]) +#define CODE_TABLE_EX( x, cp ) (select_codepage ( cp ) [ (x) - EXTENDED_ASCII_OFFSET ]) + +static int* select_codepage ( int codepage ) { + + switch ( codepage ) { + + case CYR_CP_KOI8_R: + return code_table_koi8_r; + + case CYR_CP_CP866: + return code_table_cp866; + + case CYR_CP_CP1251: + return code_table_cp1251; + + case CYR_CP_MAC: + return code_table_mac; + + case CYR_CP_ISO_8859_5: + return code_table_iso_8859_5; + + case CYR_CP_UNICODE: + return code_table_unicode; + + } + + return NULL; + +} + +int cyr_set_default_codepage ( int codepage ) { + + switch ( codepage ) { + + case CYR_CP_KOI8_R: + code_table = select_codepage ( encoding = CYR_CP_KOI8_R ); + break; + + case CYR_CP_CP866: + code_table = select_codepage ( encoding = CYR_CP_CP866 ); + break; + + case CYR_CP_CP1251: + code_table = select_codepage ( encoding = CYR_CP_CP1251 ); + break; + + case CYR_CP_MAC: + code_table = select_codepage ( encoding = CYR_CP_MAC ); + break; + + case CYR_CP_ISO_8859_5: + code_table = select_codepage ( encoding = CYR_CP_ISO_8859_5 ); + break; + + case CYR_CP_UNICODE: + code_table = select_codepage ( encoding = CYR_CP_UNICODE ); + break; + + default: + code_table = select_codepage ( encoding = 
CYR_CP_UNDEFINED ); + + } + + return code_table != NULL; + +} + +int cyr_get_default_codepage ( void ) { + + return encoding; + +} + +int cyr_ord ( cyr_letter letter ) { + + assert ( cyr_isletter ( letter ) ); + + return CODE_TABLE ( letter ); + +} + +int cyr_ord_ex ( cyr_letter letter, int codepage ) { + + assert ( cyr_isletter_ex ( letter, codepage ) ); + + return select_codepage ( codepage ) != NULL ? + CODE_TABLE_EX ( letter, codepage ) : CYR_NON_LETTER; + +} + +cyr_letter cyr_chr ( int index ) { + + assert ( code_table != NULL ); + assert ( index >= 0 && index < CYR_CHARACTER_COUNT ); + + return ord2chr [ index ] [ encoding - 1 ]; +} + +cyr_letter cyr_chr_ex ( int index, int codepage ) { + + return ord2chr [ index ] [ codepage - 1 ]; + +} + +int cyr_isletter ( cyr_letter letter ) { + + return letter >= EXTENDED_ASCII_OFFSET + && CODE_TABLE ( letter ) != CYR_NON_LETTER; + +} + +int cyr_isletter_ex ( cyr_letter letter, int codepage ) { + + return select_codepage ( codepage ) ? + letter >= EXTENDED_ASCII_OFFSET + && CODE_TABLE_EX ( letter, codepage ) != CYR_NON_LETTER : 0; + +} + +int cyr_iscap ( cyr_letter letter ) { + + assert ( cyr_isletter ( letter ) ); + + return CODE_TABLE ( letter ) < CYR_LETTER_COUNT; + +} + +int cyr_iscap_ex ( cyr_letter letter, int codepage ) { + + assert ( cyr_isletter_ex ( letter, codepage ) ); + + return select_codepage ( codepage ) != NULL ? + CODE_TABLE_EX ( letter, codepage ) < CYR_LETTER_COUNT : 0; + +} + +int cyr_islow ( cyr_letter letter ) { + + assert ( cyr_isletter ( letter ) ); + + return CODE_TABLE ( letter ) >= CYR_LETTER_COUNT; + +} + +int cyr_islow_ex ( cyr_letter letter, int codepage ) { + + assert ( cyr_isletter_ex ( letter, codepage ) ); + + return select_codepage ( codepage ) != NULL ? + CODE_TABLE_EX ( letter, codepage ) >= CYR_LETTER_COUNT : 0; + +} + +cyr_letter cyr_downc ( cyr_letter letter ) { + + assert ( cyr_isletter ( letter ) ); + + return cyr_islow ( letter ) ? 
letter : + cyr_chr ( CODE_TABLE ( letter ) + CYR_LETTER_COUNT ); + +} + +cyr_letter cyr_downc_ex ( cyr_letter letter, int codepage ) { + + assert ( cyr_isletter_ex ( letter, codepage ) ); + + if ( select_codepage ( codepage ) == NULL ) + return 0; + + return cyr_islow_ex ( letter, codepage ) ? letter : + cyr_chr_ex ( CODE_TABLE_EX ( letter, codepage ) + + CYR_LETTER_COUNT, codepage ); + +} + +cyr_letter cyr_upc ( cyr_letter letter ) { + + assert ( cyr_isletter ( letter ) ); + + return cyr_iscap ( letter ) ? letter : + cyr_chr ( CODE_TABLE ( letter ) - CYR_LETTER_COUNT ); + +} + +cyr_letter cyr_upc_ex ( cyr_letter letter, int codepage ) { + + assert ( cyr_isletter_ex ( letter, codepage ) ); + + if ( select_codepage ( codepage ) == NULL ) + return 0; + + return cyr_iscap_ex ( letter, codepage ) ? letter : + cyr_chr_ex ( CODE_TABLE_EX ( letter, codepage ) - + CYR_LETTER_COUNT, codepage ); + +} + +int cyr_codepage_by_name ( const char* s ) { + + static const struct { + const char* name; + int codepage; + } codepages [] = { + { "koi8-r", CYR_CP_KOI8_R }, + { "cp866", CYR_CP_CP866 }, + { "cp1251", CYR_CP_CP1251 }, + { "mac", CYR_CP_MAC }, + { "iso8859-5", CYR_CP_ISO_8859_5 }, + { NULL, 0 } + }; + + int i; + + for ( i = 0; codepages [ i ].name; i ++ ) + if ( !strcmp ( s, codepages [ i ].name ) ) + return codepages [ i ].codepage; + + return CYR_CP_UNDEFINED; +} + +int translate_special_character ( int dst_codepage, int src_codepage, int ch ) +{ + static const int xlat [ 6 ] [ 6 ] = + { + { '"', 0xab, '"', 0xc7, '"', 0xab }, /* opening Russian quote */ + { '"', 0xbb, '"', 0xc8, '"', 0xbb }, /* closing Russian quote */ + { 0xbe, 0xb9, 0xfc, 0xdc, 0xf0, 0x2116 }, /* numero sign */ + { '-', 0x96, '-', 0xd0, '-', 0x2013 }, /* en-dash */ + { '-', 0x97, '-', 0xd1, '-', 0x2014 }, /* em-dash */ + { 0x9a, 0xa0, 0xff, 0xca, 0xa0, 0xa0 } /* non-breaking space */ + }; + int i; + + if ( ch < 0x80 ) + return ch; + + for ( i = 0; i < 6; i ++ ) + if ( ch == xlat [ i ] [ src_codepage - 1 ] 
) + return xlat [ i ] [ dst_codepage - 1 ]; + + return ch; +} diff --git a/src/cyrillic.h b/src/cyrillic.h new file mode 100644 index 0000000..04a2ff6 --- /dev/null +++ b/src/cyrillic.h @@ -0,0 +1,57 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: cyrillic.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +#define CYR_LETTER_COUNT 33 +#define CYR_CHARACTER_COUNT (CYR_LETTER_COUNT * 2) +#define CYR_NON_LETTER (CYR_CHARACTER_COUNT + 1) +#define EXTENDED_ASCII_OFFSET 0x80 + +enum +{ + CYR_CP_UNDEFINED, + CYR_CP_KOI8_R, + CYR_CP_CP1251, + CYR_CP_CP866, + CYR_CP_MAC, + CYR_CP_ISO_8859_5, + CYR_CP_UNICODE +}; + +typedef unsigned char cyr_letter; + +int cyr_set_default_codepage ( int codepage ); +int cyr_get_default_codepage ( void ); +int cyr_ord ( cyr_letter letter ); +int cyr_ord_ex ( cyr_letter letter, int codepage ); +cyr_letter cyr_chr ( int index ); +cyr_letter cyr_chr_ex ( int index, int codepage ); +int cyr_isletter ( cyr_letter letter ); +int cyr_isletter_ex ( cyr_letter letter, int codepage ); +int cyr_iscap ( cyr_letter letter ); +int cyr_iscap_ex ( cyr_letter letter, int codepage ); +int cyr_islow ( cyr_letter letter ); +int cyr_islow_ex ( cyr_letter letter, int codepage );
+cyr_letter cyr_downc ( cyr_letter letter ); +cyr_letter cyr_downc_ex ( cyr_letter letter, int codepage ); +cyr_letter cyr_upc ( cyr_letter letter ); +cyr_letter cyr_upc_ex ( cyr_letter letter, int codepage ); +int cyr_codepage_by_name ( const char* s ); +int translate_special_character ( int dst_codepage, int src_codepage, int ch ); diff --git a/src/fe.c b/src/fe.c new file mode 100644 index 0000000..f0db495 --- /dev/null +++ b/src/fe.c @@ -0,0 +1,802 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: fe.koi8-r,v 1.3 2002/06/29 05:22:14 vadimp Exp $ + */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "config.h" +#include "fe.h" +#include "cyrillic.h" +#include "avl.h" +#include "wrappers.h" +#include "reader.h" +#include "context.h" +#include "util.h" +#include "ui.h" + +const char *program_name = "Fresh Eye Свежий Взгляд"; +const char *rule = + "=================================================================="; + +unsigned long cries; /* Count of issued messages */ +unsigned long ogos; /* Count of recorded messages */ + +int twosigmasqr; /* coefficient for Gaussian distribution */ +unsigned long int first_line; /* first line of file to process */ +int not_all_words_counted; + +int context_size = 15; +int sensitivity_threshold = 500; +int wordcount_use_coefficient = 50; +int quiet_logging = 0; +int dump_wordcount = 0; +int exclude_proper_names = 1; +int resume_processing = 0; /* -1 means "ask user" */ +int cancel_processing = 0; /* Cancel processing gracefully */ +int yes_to_all = 0; /* Accept all suggestions automatically */ +char* log_path = NULL; +avl* tree = NULL; +FILE* log_file = NULL; + +int input_codepage = CYR_CP_DEFAULT; +int output_codepage = CYR_CP_DEFAULT; + +int sqr (int x); +int wordcmp ( const char * s1, int len1, const char * s2, int len2 ); +int checkvoc ( char *w1, char *w2 ); +int implen ( int x ); +int infor_same ( char *a, char *b ); +int infor_diff ( char *a, char *b ); +int simwords ( const word *a, const word *b ); +unsigned long int inf_w ( char *w, int len ); +int show ( context* ctx, word* w, int bad, int sim, int dist ); +int check ( context* ctx ); +void write_log_header ( context* ctx, FILE* f ); +char* underline ( char* s, const word* w ); +char*
prepare_offscreen_buffer ( const line* l ); +void write_log_entry ( context* ctx, word* w, int bad, int sim, int dist, + FILE* f ); +void write_log_footer ( context* ctx, FILE* f ); +unsigned long check_log ( const char* logpath, const char* path ); +void set_count_coefficient ( avl* t, node* n, void* user_data ); +void print_node ( avl* t, node* n, void* user_data ); +avl* wordcount ( avl* tree, FILE* f ); +word* patch_proper_name ( word* w ); +void validate_globals ( void ); +int init ( void ); +void cleanup ( void ); +int process_file ( const char* path ); + +__inline int sqr (int x) +{ + + return x * x; + +} + +/* + * Smart comparison routine: words sharing a common stem are considered equal + * (0 is returned); otherwise the result is ordered as with strcmp (). + */ +int wordcmp ( const char* s1, int len1, const char* s2, int len2 ) +{ + int l0 = 0; + int l1 = len1; + int l2 = len2; + + if ( *s1 != *s2 + || abs ( l1 - l2 ) > 2 + || min ( l1, l2 ) <= 2 ) + return strcmp ( (const char *) s1, (const char *) s2 ); + + while ( s1 [ l0 ] && s2 [ l0 ] && s1 [ l0 ] == s2 [ l0 ] ) + l0 ++; + + if ( abs ( l1 - l2 ) == 2 + || (abs ( l1 - l2 ) == 1 && min ( l1, l2 ) <= 3) ) + if ( s1 [ l0 ] && s2 [ l0 ] ) + return s1 [ l0 ] - s2 [ l0 ]; + + return l0 > min ( l1, l2 ) / 2 ? 0 : s1 [ l0 ] - s2 [ l0 ]; +} + +int checkvoc (char *w1, char *w2) +{ + + register int t; + + for (t = 0; t < VOCSIZE; t ++) + if ((!strcmp (w1, voc [t] [0])) && (!strcmp (w2, voc [t] [1]))) + return 1; + + return 0; +} + +/* + * Psychological importance of a word x characters long: large for short + * words, then growing more slowly than the real length. + */ +int implen (int x) +{ + if (x == 2) + return 5; + return (x - sqr ((x - 1) / 6) + (int) (4.1 / (float) x)); + +} + +/* + * Calculates the average quantity of information in the letters common to + * the two words.
+ */ +int infor_same ( char *a, char *b ) +{ + int count = 0; + int res = 0; + int beg = 1; + char *p, *pp = a; + + while (*pp) { + if ( (p = strchr (b, *pp)) != NULL ) { /* bipresent letters */ + if (beg && (p == b)) + res += inf_letters [ (int) *pp ] [ 1 ]; /* beginning of the word */ + else + res += inf_letters [ (int) *pp ] [ 0 ]; /* elsewhere */ + count ++; + } + beg = 0; + pp ++; + } + return count ? (res / count) : 0; +} + +/* + * Calculates total quantity of information in differing letters of the + * two words. + */ +int infor_diff ( char *a, char *b ) +{ + int count = 0; + int res = 0; + char *p, *pp = a; + + pp = a; /* letters in a only */ + while (*pp) { + if ((p = strchr (b, *pp)) == 0) { + if (pp == a) + res += inf_letters [ (int) *pp ] [ 1 ]; /* in the beginning of the word */ + else + res += inf_letters [ (int) *pp ] [ 0 ]; /* elsewhere */ + count ++; + } + pp ++; + } + + pp = b; /* letters in b only */ + while (*pp) { + if ((p = strchr (a, *pp)) == 0) { + if (pp == b) + res += inf_letters [ (int) *pp ] [ 1 ]; /* in the beginning of the word */ + else + res += inf_letters [ (int) *pp ] [ 0 ]; /* elsewhere */ + count ++; + } + pp ++; + } + + return count ? res : 0; +} + +/* + * Calculates similarity of words. + */ +int simwords (const word *a, const word *b) +{ + register char *tx, *ty, *ta, *tb; + char* parta = NULL; + unsigned long int res = 0; /* value to be returned */ + unsigned long int resa = 0; + int partlen; + long int prir; + int rever = 0; + int dist; + int dissimilarity_threshold = 24000; /* how much total information in + differing letters reduces res + to zero; larger values make res + more tolerant to differences */ + + if (checkvoc (a -> logical, b -> logical)) /* an exception? */ + return (0); + + if (infor_diff ( a -> logical, b -> logical ) >= dissimilarity_threshold) + /* too many too rare dissimilar letters? 
*/ + return (0); + + if ( a -> length > b -> length ) { /* swap so that a is never longer than b */ + const word* tmp = a; + a = b; + b = tmp; + rever = 1; + } + + parta = xmalloc ( a -> length + 1 ); + + for ( partlen = 1; partlen <= a -> length; partlen ++, resa = 0 ) { + for ( ta = a -> logical; + (a -> length - (int) (ta - a -> logical)) >= partlen; + ta ++) { + strncpy ( parta, ta, partlen ); + parta [ partlen ] = '\0'; + + for ( tb = b -> logical; + partlen <= (b -> length - + (int) (tb - b -> logical)); + tb ++) { + + for ( prir = 0, tx = parta, ty = tb; + *tx != 0; + tx ++, ty ++) + prir += + sim_ch [ (int) *tx ] [ (int) *ty ]; + + if ( !prir ) + continue; + + if ( ta > a -> logical ) + prir -= (prir * + (int) (ta - a -> logical)) / + (3 * a -> length); + + if ( tb > b -> logical ) + prir -= (prir * + (int) (tb - b -> logical)) / + (3 * b -> length); + + dist = rever ? + (b -> length - + (int) (tb - b -> logical + + partlen)) + + (int) (ta - a -> logical) + : (a -> length - + (int) (ta - a -> logical + + partlen)) + + (int) (tb - b -> logical); + + if ( dist < 3 ) + prir += ((prir * (2 - dist)) / 3); + + if ( (unsigned long int) prir > resa) + resa = prir; + } + } + if (resa / partlen > 6) { + prir = resa; + dist = 3 * (a -> length + b -> length) / 8 + 1; + res += resa + prir * + (partlen - min(dist, a -> length)) / (2 * dist); + } + } + + free ( parta ); + + for (partlen = 1, resa = 0; partlen <= a -> length; partlen ++) + resa += 9 * partlen; + + res = ((res * infor_same ( a -> logical, b -> logical )) / resa); + /* allowing for the info contained in the common letters */ + res = (res * (dissimilarity_threshold - infor_diff ( a -> logical, b -> logical )))/dissimilarity_threshold; + /* decreasing by a coefficient depending on infor_diff */ + + res -= (res * (b -> length - a -> length)) / (2 * b -> length); + /* decreasing if words are too different in length */ + return (int) (res * a -> length * b -> length / + (implen (a -> length) * implen (b ->
length))); + /* finally, taking into account the psychological length */ +} + +/* + * Returns the quantity of information in a word, based on the word count + */ + +__inline unsigned long int inf_w (char *w, int len ) +{ + node* n = avl_lookup ( tree, w, len ); + return n ? n -> coefficient : 1000; +} + +void help ( void ) +{ + printf ( "\n" + "Нажмите одну из клавиш (буквы латинские), затем 'Enter':\n\n" /* Press one of the keys (Latin letters), then 'Enter': */ + "'Y': записать найденный контекст в лог-файл (%s);\n" /* write this context to the log file */ + "'N': пропустить это место и продолжить проверку;\n" /* skip this place and continue checking */ + "'S': остановить работу программы;\n" /* stop the program */ + "'C': показать контекст ещё раз;\n" /* show the context again */ + "'H': вывести эту подсказку.\n\n" /* print this help */ + "Нажатие одной лишь клавиши 'Enter' равнозначно выбору" /* Pressing 'Enter' alone selects the option shown in square brackets. */ + " опции,\n" + "указанной в квадратных скобках.\n", + log_path ); +} + +int show ( context* ctx, word* w, int bad, int sim, int dist ) +{ + int answer = 'C'; /* Context */ + + cries ++; + + if ( quiet_logging ) + yes_to_all = 1; + + if ( !yes_to_all ) { + while ( answer == 'C' ) { + write_log_entry ( ctx, w, bad, sim, dist, stdout ); + while ( (answer = ask ( "\nЗапомнить", /* "Remember?" */ + "NYASCH?\n" )) == 'H' + || answer == '?'
|| answer == '\0' ) + help (); + } + } + + if ( answer == 'S' ) { + cancel_processing = 1; + return 1; + } + + if ( yes_to_all || answer == 'Y' || answer == 'A' ) { + ogos ++; + write_log_entry ( ctx, w, bad, sim, dist, log_file ); + if ( answer == 'A' ) + yes_to_all = 1; + } + + return 0; +} + +int check ( context* ctx ) +{ + int similarity; + int badness; + double dal; + long dist; + word* w = NULL; + word* cw = ctx -> cur_word; + + for ( w = ctx -> words_head; w; w = w -> next ) { + + word* tmp = NULL; + + if ( !(similarity = simwords ( w, cw )) ) + continue; + + dist = 0; + + for ( tmp = w; tmp; tmp = tmp -> next ) + dist += tmp -> delimiters; + + for ( tmp = w -> next; tmp; tmp = tmp -> next ) + dist += tmp -> length / 3 + 1; + + if ( wordcount_use_coefficient ) { + dist *= 2000; + dist /= inf_w ( w -> logical, w -> length ) + + inf_w ( cw -> logical, cw -> length ); + } + + dal = exp (((double) (- dist * dist)) / ((double) twosigmasqr)); + badness = (int) (((float) similarity) * dal); + + if ( badness > sensitivity_threshold ) + if ( show ( ctx, w, badness, similarity, (int) dist ) ) + return 1; + } + + return 0; +} + +void write_log_header ( context* ctx, FILE* f ) +{ + time_t now = time ( NULL ); + + fprintf ( f, "\n\n%s v"VERSION"\tФайл: %s%s %s\n", /* "Файл:" = "File:" */ + program_name, ctx -> path, strlen ( ctx -> path ) > 8 ? " " : "\t", + ctime ( &now ) ); + fprintf ( f, "%s\n\n", rule ); + +} + +/* + * Puts caret ('^') characters into an off-screen buffer to underline w. + */ +char* underline ( char* s, const word* w ) +{ + char* pos = s + w -> position; + int n = w -> length; + + while ( n -- ) + *pos ++ = '^'; + return s; +} + +/* + * Prepares an off-screen buffer for "underlining" of words. + * It allocates a string of an appropriate length, fills it with spaces and + * then copies all tab characters from the original string. + * The string returned by this function must be freed by the caller.
+ */ +char* prepare_offscreen_buffer ( const line* l ) +{ + char* buf = xmalloc ( l -> length + 1 ); + memset ( buf, ' ', l -> length ); + buf [ l -> length ] = '\0'; + strccpy ( buf, l -> text, '\t' ); + + return buf; +} + +void write_log_entry ( context* ctx, word* w, int bad, int sim, int dist, + FILE* f ) +{ + line* l = w -> line; + + fputc ( '\n', f ); + while ( l ) { + fprintf ( f, "%s\n", l -> text ); + if ( w -> line == l || ctx -> cur_word -> line == l ) { + + char* buf = prepare_offscreen_buffer ( l ); + + if ( w -> line == l ) + underline ( buf, w ); + if ( ctx -> cur_word -> line == l ) + underline ( buf, ctx -> cur_word ); + fprintf ( f, "%s\n", buf ); + free ( buf ); + } + l = l -> next; + } + + fprintf ( f, " " + "line %lu sim = %d dist = %d badness = %d\n", + ctx -> total_lines, sim, dist, bad ); +} + +void write_log_footer ( context* ctx, FILE* f ) +{ + fprintf ( f, "%s\n", rule ); + fprintf ( f, "Строк: %lu Начало: %lu " /* Lines, Start (first line) */ + "Слов: %lu Срабатываний: %lu Записано: %lu\n", /* Words, Messages issued, Recorded */ + ctx -> total_lines, first_line + 1, + ctx -> total_words, cries, ogos ); +} + +unsigned long check_log ( const char* logpath, const char* path ) +{ + static const char* filename_tag = "Файл:"; /* "File:" */ + static const char* lines_format = "Строк: %lu"; /* "Lines: %lu" */ + + FILE* f = NULL; + reader* r = NULL; + const char* s = NULL; + unsigned long last_checked_line = 0; + + if ( (f = fopen ( logpath, "r" )) == NULL ) + return 0L; + + r = rdr_init ( f, 80, output_codepage ); + + /* Search for log header with the given file name in it */ + while ( (s = rdr_gets ( r )) != NULL ) { + + char* tagpos; + char* file_name; + + if ( strncmp ( s, program_name, strlen ( program_name ) ) ) + continue; + tagpos = strstr ( s, filename_tag ); + file_name = tagpos ? tagpos + strlen ( filename_tag ) + 1 : NULL; + if ( file_name && !strncmp ( file_name, path, strlen ( path ) ) ) + break; + } + + if ( s ) { /* Log header with the given filename found */ + s = rdr_skip ( r, 2 ); /* Jump to the first rule */ + while ( s && (s = rdr_gets ( r )) &&
strcmp ( s, rule ) ) + ; /* Does nothing */ + if ( s && (s = rdr_gets ( r )) ) + sscanf ( s, lines_format, &last_checked_line ); + } + + rdr_free ( r ); + xfclose ( f ); + + return last_checked_line; +} + +void set_count_coefficient ( avl* t, node* n, void* user_data ) +{ + unsigned long word_count = * (unsigned long *) user_data; + + if ( n -> count == 1 ) + n -> coefficient = 1000; + else { + double tmp = ((double) n -> count) / ((double) word_count); + /* + * Decreasing the second 8.0 will sharpen the dependence + * on the -c coefficient + */ + n -> coefficient = (unsigned long) (1 - log (tmp) * 1000 / + ((8.0 + ((float) wordcount_use_coefficient) / 8.0) * + log (2))); + if ( n -> coefficient > 1000 ) + n -> coefficient = 1000; + } +} + +void print_node ( avl* t, node* n, void* user_data ) +{ + char* s = n -> key; + FILE* f = (FILE *) user_data; + + fputc ( '"', f ); + while ( *s ) + fputc ( cyr_chr ( *s ++ - 1 + CYR_LETTER_COUNT ), f ); + fputc ( '"', f ); + fprintf ( f, "\t%lu\t%lu\n", n -> count, n -> coefficient ); +} + +avl* wordcount ( avl* tree, FILE* f ) +{ + word_reader* wr = NULL; + const char* w = NULL; + unsigned long word_count = 0; + assert ( tree ); + + wr = wrr_init ( f, input_codepage ); + + while ( (w = wrr_getw ( wr )) != NULL ) { + + int length = wrr_get_word_length ( wr ); + char* buf = strndup ( w, length ); + node* n = NULL; + + word_count ++; + unify_word ( buf ); + convert_to_logical ( buf, buf ); + + if ( (n = avl_lookup ( tree, buf, length )) != NULL ) + n -> count ++; + else + if ( avl_insert ( tree, buf, length ) == NULL ) { + free ( buf ); + not_all_words_counted = 1; + break; + } + + free ( buf ); + } + + wrr_free ( wr ); + avl_foreach ( tree, set_count_coefficient, &word_count ); + return tree; +} + +__inline word* patch_proper_name ( word* w ) +{ + strcpy ( w -> text, "о" ); /* single Cyrillic letter "о" */ + *w -> logical = 15; + w -> logical [ 1 ] = 0; + w -> length = 1; + + return w; +} + +void validate_globals ( void ) +{ + if ( context_size < 2 ) +
fatal_error ( "bad context size specified", 0 ); + + if ( sensitivity_threshold < 1 || sensitivity_threshold > 1000 ) + fatal_error ( "bad sensitivity threshold specified", 0 ); + + if ( wordcount_use_coefficient < 0 || wordcount_use_coefficient > 100 ) + fatal_error ( "bad wordcount use coefficient specified", 0 ); + + if ( !(resume_processing == -1 + || resume_processing == 0 + || resume_processing == 1) ) + fatal_error ( "cannot determine if resuming is desired", 0 ); + + if ( !wordcount_use_coefficient && dump_wordcount ) + fatal_error ( "you must enable wordcount to dump it", 0 ); + + if ( input_codepage == CYR_CP_UNDEFINED ) + fatal_error ( "invalid input code page specified", 0 ); + + if ( output_codepage == CYR_CP_UNDEFINED ) + fatal_error ( "invalid output code page specified", 0 ); + + if ( !log_path ) + log_path = xstrdup ( "fresheye.log" ); +} + +/* + * One-time setup: code page, exceptions vocabulary, Gaussian coefficient. + */ +int init ( void ) +{ + int i; + + cyr_set_default_codepage ( CYR_CP_DEFAULT ); + + for ( i = 0; i < VOCSIZE; i ++ ) { + convert_to_logical ( voc [ i ] [ 0 ], voc [ i ] [ 0 ] ); + convert_to_logical ( voc [ i ] [ 1 ], voc [ i ] [ 1 ] ); + } + + twosigmasqr = 2 * sqr (context_size * 4); + + return 0; +} + +void cleanup ( void ) +{ + free ( log_path ); +} + +void count_unique_nodes ( avl* t, node* n, void* user_data ) +{ + if ( n -> count == 1 ) + (* (size_t *) user_data) ++; +} + +void map_node ( avl* tree, node* n, void* data ) +{ + node*** ppp = (node ***) data; + **ppp = n; + (*ppp) ++; +} + +void map_non_unique_node ( avl* tree, node* n, void* data ) +{ + node*** ppp = (node ***) data; + if ( n -> count == 1 ) + return; + **ppp = n; + (*ppp) ++; +} + +static int cmp_nodes ( const void* a, const void* b ) +{ + return ((*(const node **) a) -> count < (*(const node **) b) -> count) - ((*(const node **) a) -> count > (*(const node **) b) -> count); /* descending by count, without unsigned wraparound */ +} + +void do_dump_wordcount ( avl* tree, FILE* f ) +{ + node** table = NULL; + node** pp = NULL; + size_t table_size = 0; + + assert ( tree ); + assert ( f ); + + /* Save some memory -
try to count only non-unique nodes */ + avl_foreach ( tree, count_unique_nodes, &table_size ); + table_size = tree -> count - table_size ? + tree -> count - table_size : tree -> count; + + pp = table = xmalloc ( sizeof ( node *) * table_size ); + avl_foreach ( tree, tree -> count == table_size ? + map_node : map_non_unique_node, &pp ); + qsort ( table, table_size, sizeof ( node *), cmp_nodes ); + + fprintf ( f, "=== WORDCOUNT\nWords listed: %lu\n", (unsigned long) table_size ); + for ( pp = table; pp < table + table_size; pp ++ ) + print_node ( tree, *pp, f ); + fprintf ( f, "=== END WORDCOUNT\n\n"); + free ( table ); +} + +int process_file ( const char* path ) +{ + context* ctx = NULL; + FILE* f = NULL; + + cries = 0; + ogos = 0; + first_line = 0; + yes_to_all = not_all_words_counted = 0; + + if ( wordcount_use_coefficient ) { + f = xfopen ( path, "r" ); + if ( (tree = avl_init ( wordcmp )) == NULL ) + fatal_error ( "memory allocation error", 0 ); + if ( !wordcount ( tree, f ) ) + fatal_error ( "memory allocation error", 0 ); + if ( not_all_words_counted ) + fprintf ( stderr, + "fe: warning: only %lu words counted\n", + tree -> count ); + xfclose ( f ); + } + + ctx = ctx_init ( path, context_size, input_codepage ); + + log_file = xfopen ( log_path, "a" ); + write_log_header ( ctx, log_file ); + + if ( wordcount_use_coefficient && dump_wordcount ) + do_dump_wordcount ( tree, log_file ); + + if ( (first_line = check_log ( log_path, path )) != 0 ) { + int answer = 0; + if ( resume_processing ) { + printf ( "! Файл %s проверен до строки %lu.\n" /* "File %s has been checked up to line %lu. Continue (Y) or start over (N)?" */ + " Продолжить (Y) или начать сначала (N)?
", + path, first_line ); + answer = whatkey ( "NY" ); + } + if ( answer == 'Y' ) + ctx_skip_lines ( ctx, first_line + 1 ); + else + first_line = 0; + } + while ( ctx_shift ( ctx ) ) { + if ( exclude_proper_names && ctx -> cur_word -> proper ) + patch_proper_name ( ctx -> cur_word ); + if ( check ( ctx ) ) + break; + } + + write_log_footer ( ctx, log_file ); + xfclose ( log_file ); + + ctx_free ( ctx ); + if ( wordcount_use_coefficient ) { + avl_free ( tree ); + tree = NULL; + } + + return 0; +} + +int main ( int argc, char* argv [] ) +{ + int argument_index = parse_command_line ( argc, argv ); + + if ( argument_index >= argc ) + fatal_error ( + "Please specify a file to process (try --help)", 0 ); + + validate_globals (); + init (); + + while ( !cancel_processing && argument_index < argc ) + process_file ( argv [ argument_index ++ ] ); + + cleanup (); + return 0; +} diff --git a/src/fe.h b/src/fe.h new file mode 100644 index 0000000..1a155ee --- /dev/null +++ b/src/fe.h @@ -0,0 +1,47 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: fe.h,v 1.3 2002/06/27 00:44:00 vadimp Exp $ + */ + +#if !defined ( _WIN32 ) || defined ( __GNUC__ ) +#define min( a, b ) ((a) < (b) ? 
(a) : (b)) +#endif + +#define MAXWLEN 20 /* maximum word length to be stored in wordcount */ +#define MAXWIDTH 30 /* maximum number of words checked (length of context) */ +#define CONTEXT_LINES 7 /* number of lines stored */ +#define VOCSIZE 65 /* number of exceptions */ + +extern short int sim_ch [34] [34]; /* letters' similarity map */ +extern int inf_letters [34] [3]; /* quantity of information in letters */ +extern char voc [VOCSIZE] [2][20]; /* exceptions vocabulary */ + +extern int context_size; +extern int sensitivity_threshold; +extern int wordcount_use_coefficient; +extern int quiet_logging; +extern int dump_wordcount; +extern int exclude_proper_names; +extern int resume_processing; /* -1 means "ask user" */ +extern int cancel_processing; +extern int yes_to_all; +extern char* log_path; +extern int input_codepage; +extern int output_codepage; +int wordcmp ( const char* s1, int len1, const char* s2, int len2 ); diff --git a/src/getopt.c b/src/getopt.c new file mode 100644 index 0000000..2f15d16 --- /dev/null +++ b/src/getopt.c @@ -0,0 +1,1055 @@ +/* Getopt for GNU. + NOTE: getopt is now part of the C library, so if you don't know what + "Keep this file name-space clean" means, talk to roland@gnu.ai.mit.edu + before changing it! + + Copyright (C) 1987, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97 + Free Software Foundation, Inc. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Library General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Library General Public License for more details. 
+ + You should have received a copy of the GNU Library General Public + License along with the GNU C Library; see the file COPYING.LIB. If not, + write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. */ + +/* This tells Alpha OSF/1 not to define a getopt prototype in . + Ditto for AIX 3.2 and . */ +#ifndef _NO_PROTO +#define _NO_PROTO +#endif + +#ifdef HAVE_CONFIG_H +#include +#endif + +#if !defined (__STDC__) || !__STDC__ +/* This is a separate conditional since some stdc systems + reject `defined (const)'. */ +#ifndef const +#define const +#endif +#endif + +#include + +/* Comment out all this code if we are using the GNU C Library, and are not + actually compiling the library itself. This code is part of the GNU C + Library, but also included in many other GNU distributions. Compiling + and linking in this code is a waste when using the GNU C library + (especially if it is a shared library). Rather than having every GNU + program understand `configure --with-gnu-libc' and omit the object files, + it is simpler to just do this in the source for each such file. */ + +#define GETOPT_INTERFACE_VERSION 2 +#if !defined (_LIBC) && defined (__GLIBC__) && __GLIBC__ >= 2 +#include +#if _GNU_GETOPT_INTERFACE_VERSION == GETOPT_INTERFACE_VERSION +#define ELIDE_CODE +#endif +#endif + +#ifndef ELIDE_CODE + + +/* This needs to come after some library #include + to get __GNU_LIBRARY__ defined. */ +#ifdef __GNU_LIBRARY__ +/* Don't include stdlib.h for non-GNU C libraries because some of them + contain conflicting prototypes for getopt. */ +#include +#include +#endif /* GNU C library. */ + +#ifdef VMS +#include +#if HAVE_STRING_H - 0 +#include +#endif +#endif + +#if HAVE_STRING_H +#include +#endif + +#if defined (WIN32) && !defined (__CYGWIN32__) +/* It's not Unix, really. See? Capital letters. 
*/ +#include +#define getpid() GetCurrentProcessId() +#endif + +#ifndef _ +/* This is for other GNU distributions with internationalized messages. + When compiling libc, the _ macro is predefined. */ +#ifdef HAVE_LIBINTL_H +# include +# define _(msgid) gettext (msgid) +#else +# define _(msgid) (msgid) +#endif +#endif + +/* This version of `getopt' appears to the caller like standard Unix `getopt' + but it behaves differently for the user, since it allows the user + to intersperse the options with the other arguments. + + As `getopt' works, it permutes the elements of ARGV so that, + when it is done, all the options precede everything else. Thus + all application programs are extended to handle flexible argument order. + + Setting the environment variable POSIXLY_CORRECT disables permutation. + Then the behavior is completely standard. + + GNU application programs can use a third alternative mode in which + they can distinguish the relative order of options and other arguments. */ + +#include "getopt.h" + +/* For communication from `getopt' to the caller. + When `getopt' finds an option that takes an argument, + the argument value is returned here. + Also, when `ordering' is RETURN_IN_ORDER, + each non-option ARGV-element is returned here. */ + +char *optarg = NULL; + +/* Index in ARGV of the next element to be scanned. + This is used for communication to and from the caller + and for communication between successive calls to `getopt'. + + On entry to `getopt', zero means this is the first call; initialize. + + When `getopt' returns -1, this is the index of the first of the + non-option elements that the caller should itself scan. + + Otherwise, `optind' communicates from one call to the next + how much of ARGV has been scanned so far. */ + +/* 1003.2 says this must be 1 before any call. */ +int optind = 1; + +/* Formerly, initialization of getopt depended on optind==0, which + causes problems with re-calling getopt as programs generally don't + know that. 
*/ + +int __getopt_initialized = 0; + +/* The next char to be scanned in the option-element + in which the last option character we returned was found. + This allows us to pick up the scan where we left off. + + If this is zero, or a null string, it means resume the scan + by advancing to the next ARGV-element. */ + +static char *nextchar; + +/* Callers store zero here to inhibit the error message + for unrecognized options. */ + +int opterr = 1; + +/* Set to an option character which was unrecognized. + This must be initialized on some systems to avoid linking in the + system's own getopt implementation. */ + +int optopt = '?'; + +/* Describe how to deal with options that follow non-option ARGV-elements. + + If the caller did not specify anything, + the default is REQUIRE_ORDER if the environment variable + POSIXLY_CORRECT is defined, PERMUTE otherwise. + + REQUIRE_ORDER means don't recognize them as options; + stop option processing when the first non-option is seen. + This is what Unix does. + This mode of operation is selected by either setting the environment + variable POSIXLY_CORRECT, or using `+' as the first character + of the list of option characters. + + PERMUTE is the default. We permute the contents of ARGV as we scan, + so that eventually all the non-options are at the end. This allows options + to be given in any order, even with programs that were not written to + expect this. + + RETURN_IN_ORDER is an option available to programs that were written + to expect options and other ARGV-elements in any order and that care about + the ordering of the two. We describe each non-option ARGV-element + as if it were the argument of an option with character code 1. + Using `-' as the first character of the list of option characters + selects this mode of operation. + + The special argument `--' forces an end of option-scanning regardless + of the value of `ordering'. 
In the case of RETURN_IN_ORDER, only + `--' can cause `getopt' to return -1 with `optind' != ARGC. */ + +static enum +{ + REQUIRE_ORDER, PERMUTE, RETURN_IN_ORDER +} ordering; + +/* Value of POSIXLY_CORRECT environment variable. */ +static char *posixly_correct; + +#ifdef __GNU_LIBRARY__ +/* We want to avoid inclusion of string.h with non-GNU libraries + because there are many ways it can cause trouble. + On some systems, it contains special magic macros that don't work + in GCC. */ +#include +#define my_index strchr +#else + +/* Avoid depending on library functions or files + whose names are inconsistent. */ + +char *getenv (); + +static char * +my_index (str, chr) + const char *str; + int chr; +{ + while (*str) + { + if (*str == chr) + return (char *) str; + str++; + } + return 0; +} + +/* If using GCC, we can safely declare strlen this way. + If not using GCC, it is ok not to declare it. */ +#ifdef __GNUC__ +/* Note that Motorola Delta 68k R3V7 comes with GCC but not stddef.h. + That was relevant to code that was here before. */ +#if !defined (__STDC__) || !__STDC__ +/* gcc with -traditional declares the built-in strlen to return int, + and has done so at least since version 2.4.5. -- rms. */ +extern int strlen (const char *); +#endif /* not __STDC__ */ +#endif /* __GNUC__ */ + +#endif /* not __GNU_LIBRARY__ */ + +/* Handle permutation of arguments. */ + +/* Describe the part of ARGV that contains non-options that have + been skipped. `first_nonopt' is the index in ARGV of the first of them; + `last_nonopt' is the index after the last of them. */ + +static int first_nonopt; +static int last_nonopt; + +#ifdef _LIBC +/* Bash 2.0 gives us an environment variable containing flags + indicating ARGV elements that should not be considered arguments. 
*/ + +/* Defined in getopt_init.c */ +extern char *__getopt_nonoption_flags; + +static int nonoption_flags_max_len; +static int nonoption_flags_len; + +static int original_argc; +static char *const *original_argv; + +extern pid_t __libc_pid; + +/* Make sure the environment variable bash 2.0 puts in the environment + is valid for the getopt call; we must make sure that the ARGV passed + to getopt is the one passed to the process. */ +static void +__attribute__ ((unused)) +store_args_and_env (int argc, char *const *argv) +{ + /* XXX This is not a good solution. We should rather copy the args so + that we can compare them later. But we must not use malloc(3). */ + original_argc = argc; + original_argv = argv; +} +text_set_element (__libc_subinit, store_args_and_env); + +# define SWAP_FLAGS(ch1, ch2) \ + if (nonoption_flags_len > 0) \ + { \ + char __tmp = __getopt_nonoption_flags[ch1]; \ + __getopt_nonoption_flags[ch1] = __getopt_nonoption_flags[ch2]; \ + __getopt_nonoption_flags[ch2] = __tmp; \ + } +#else /* !_LIBC */ +# define SWAP_FLAGS(ch1, ch2) +#endif /* _LIBC */ + +/* Exchange two adjacent subsequences of ARGV. + One subsequence is elements [first_nonopt,last_nonopt) + which contains all the non-options that have been skipped so far. + The other is elements [last_nonopt,optind), which contains all + the options processed since those non-options were skipped. + + `first_nonopt' and `last_nonopt' are relocated so that they describe + the new indices of the non-options in ARGV after they are moved. */ + +#if defined (__STDC__) && __STDC__ +static void exchange (char **); +#endif + +static void +exchange (argv) + char **argv; +{ + int bottom = first_nonopt; + int middle = last_nonopt; + int top = optind; + char *tem; + + /* Exchange the shorter segment with the far end of the longer segment. + That puts the shorter segment into the right place. + It leaves the longer segment in the right place overall, + but it consists of two parts that need to be swapped next.
*/ + +#ifdef _LIBC + /* First make sure the handling of the `__getopt_nonoption_flags' + string can work normally. Our top argument must be in the range + of the string. */ + if (nonoption_flags_len > 0 && top >= nonoption_flags_max_len) + { + /* We must extend the array. The user plays games with us and + presents new arguments. */ + char *new_str = malloc (top + 1); + if (new_str == NULL) + nonoption_flags_len = nonoption_flags_max_len = 0; + else + { + memcpy (new_str, __getopt_nonoption_flags, nonoption_flags_max_len); + memset (&new_str[nonoption_flags_max_len], '\0', + top + 1 - nonoption_flags_max_len); + nonoption_flags_max_len = top + 1; + __getopt_nonoption_flags = new_str; + } + } +#endif + + while (top > middle && middle > bottom) + { + if (top - middle > middle - bottom) + { + /* Bottom segment is the short one. */ + int len = middle - bottom; + register int i; + + /* Swap it with the top part of the top segment. */ + for (i = 0; i < len; i++) + { + tem = argv[bottom + i]; + argv[bottom + i] = argv[top - (middle - bottom) + i]; + argv[top - (middle - bottom) + i] = tem; + SWAP_FLAGS (bottom + i, top - (middle - bottom) + i); + } + /* Exclude the moved bottom segment from further swapping. */ + top -= len; + } + else + { + /* Top segment is the short one. */ + int len = top - middle; + register int i; + + /* Swap it with the bottom part of the bottom segment. */ + for (i = 0; i < len; i++) + { + tem = argv[bottom + i]; + argv[bottom + i] = argv[middle + i]; + argv[middle + i] = tem; + SWAP_FLAGS (bottom + i, middle + i); + } + /* Exclude the moved top segment from further swapping. */ + bottom += len; + } + } + + /* Update records for the slots the non-options now occupy. */ + + first_nonopt += (optind - last_nonopt); + last_nonopt = optind; +} + +/* Initialize the internal data when the first call is made. 
*/ + +#if defined (__STDC__) && __STDC__ +static const char *_getopt_initialize (int, char *const *, const char *); +#endif +static const char * +_getopt_initialize (argc, argv, optstring) + int argc; + char *const *argv; + const char *optstring; +{ + /* Start processing options with ARGV-element 1 (since ARGV-element 0 + is the program name); the sequence of previously skipped + non-option ARGV-elements is empty. */ + + first_nonopt = last_nonopt = optind; + + nextchar = NULL; + + posixly_correct = getenv ("POSIXLY_CORRECT"); + + /* Determine how to handle the ordering of options and nonoptions. */ + + if (optstring[0] == '-') + { + ordering = RETURN_IN_ORDER; + ++optstring; + } + else if (optstring[0] == '+') + { + ordering = REQUIRE_ORDER; + ++optstring; + } + else if (posixly_correct != NULL) + ordering = REQUIRE_ORDER; + else + ordering = PERMUTE; + +#ifdef _LIBC + if (posixly_correct == NULL + && argc == original_argc && argv == original_argv) + { + if (nonoption_flags_max_len == 0) + { + if (__getopt_nonoption_flags == NULL + || __getopt_nonoption_flags[0] == '\0') + nonoption_flags_max_len = -1; + else + { + const char *orig_str = __getopt_nonoption_flags; + int len = nonoption_flags_max_len = strlen (orig_str); + if (nonoption_flags_max_len < argc) + nonoption_flags_max_len = argc; + __getopt_nonoption_flags = + (char *) malloc (nonoption_flags_max_len); + if (__getopt_nonoption_flags == NULL) + nonoption_flags_max_len = -1; + else + { + memcpy (__getopt_nonoption_flags, orig_str, len); + memset (&__getopt_nonoption_flags[len], '\0', + nonoption_flags_max_len - len); + } + } + } + nonoption_flags_len = nonoption_flags_max_len; + } + else + nonoption_flags_len = 0; +#endif + + return optstring; +} + +/* Scan elements of ARGV (whose length is ARGC) for option characters + given in OPTSTRING. + + If an element of ARGV starts with '-', and is not exactly "-" or "--", + then it is an option element. 
The characters of this element + (aside from the initial '-') are option characters. If `getopt' + is called repeatedly, it returns successively each of the option characters + from each of the option elements. + + If `getopt' finds another option character, it returns that character, + updating `optind' and `nextchar' so that the next call to `getopt' can + resume the scan with the following option character or ARGV-element. + + If there are no more option characters, `getopt' returns -1. + Then `optind' is the index in ARGV of the first ARGV-element + that is not an option. (The ARGV-elements have been permuted + so that those that are not options now come last.) + + OPTSTRING is a string containing the legitimate option characters. + If an option character is seen that is not listed in OPTSTRING, + return '?' after printing an error message. If you set `opterr' to + zero, the error message is suppressed but we still return '?'. + + If a char in OPTSTRING is followed by a colon, that means it wants an arg, + so the following text in the same ARGV-element, or the text of the following + ARGV-element, is returned in `optarg'. Two colons mean an option that + wants an optional arg; if there is text in the current ARGV-element, + it is returned in `optarg', otherwise `optarg' is set to zero. + + If OPTSTRING starts with `-' or `+', it requests different methods of + handling the non-option ARGV-elements. + See the comments about RETURN_IN_ORDER and REQUIRE_ORDER, above. + + Long-named options begin with `--' instead of `-'. + Their names may be abbreviated as long as the abbreviation is unique + or is an exact match for some defined option. If they have an + argument, it follows the option name in the same ARGV-element, separated + from the option name by a `=', or else in the next ARGV-element. + When `getopt' finds a long-named option, it returns 0 if that option's + `flag' field is nonzero, and the value of the option's `val' field + if the `flag' field is zero.
+ + The elements of ARGV aren't really const, because we permute them. + But we pretend they're const in the prototype to be compatible + with other systems. + + LONGOPTS is a vector of `struct option' terminated by an + element containing a name which is zero. + + LONGIND returns the index in LONGOPTS of the long-named option found. + It is only valid when a long-named option has been found by the most + recent call. + + If LONG_ONLY is nonzero, '-' as well as '--' can introduce + long-named options. */ + +int +_getopt_internal (argc, argv, optstring, longopts, longind, long_only) + int argc; + char *const *argv; + const char *optstring; + const struct option *longopts; + int *longind; + int long_only; +{ + optarg = NULL; + + if (optind == 0 || !__getopt_initialized) + { + if (optind == 0) + optind = 1; /* Don't scan ARGV[0], the program name. */ + optstring = _getopt_initialize (argc, argv, optstring); + __getopt_initialized = 1; + } + + /* Test whether ARGV[optind] points to a non-option argument. + Either it does not have option syntax, or there is an environment flag + from the shell indicating it is not an option. The latter information + is only used when compiled as part of the GNU libc. */ +#ifdef _LIBC +#define NONOPTION_P (argv[optind][0] != '-' || argv[optind][1] == '\0' \ + || (optind < nonoption_flags_len \ + && __getopt_nonoption_flags[optind] == '1')) +#else +#define NONOPTION_P (argv[optind][0] != '-' || argv[optind][1] == '\0') +#endif + + if (nextchar == NULL || *nextchar == '\0') + { + /* Advance to the next ARGV-element. */ + + /* Give FIRST_NONOPT & LAST_NONOPT rational values if OPTIND has been + moved back by the user (who may also have changed the arguments). */ + if (last_nonopt > optind) + last_nonopt = optind; + if (first_nonopt > optind) + first_nonopt = optind; + + if (ordering == PERMUTE) + { + /* If we have just processed some options following some non-options, + exchange them so that the options come first.
*/ + + if (first_nonopt != last_nonopt && last_nonopt != optind) + exchange ((char **) argv); + else if (last_nonopt != optind) + first_nonopt = optind; + + /* Skip any additional non-options + and extend the range of non-options previously skipped. */ + + while (optind < argc && NONOPTION_P) + optind++; + last_nonopt = optind; + } + + /* The special ARGV-element `--' means premature end of options. + Skip it like a null option, + then exchange with previous non-options as if it were an option, + then skip everything else like a non-option. */ + + if (optind != argc && !strcmp (argv[optind], "--")) + { + optind++; + + if (first_nonopt != last_nonopt && last_nonopt != optind) + exchange ((char **) argv); + else if (first_nonopt == last_nonopt) + first_nonopt = optind; + last_nonopt = argc; + + optind = argc; + } + + /* If we have done all the ARGV-elements, stop the scan + and back over any non-options that we skipped and permuted. */ + + if (optind == argc) + { + /* Set the next-arg-index to point at the non-options + that we previously skipped, so the caller will digest them. */ + if (first_nonopt != last_nonopt) + optind = first_nonopt; + return -1; + } + + /* If we have come to a non-option and did not permute it, + either stop the scan or describe it to the caller and pass it by. */ + + if (NONOPTION_P) + { + if (ordering == REQUIRE_ORDER) + return -1; + optarg = argv[optind++]; + return 1; + } + + /* We have found another option-ARGV-element. + Skip the initial punctuation. */ + + nextchar = (argv[optind] + 1 + + (longopts != NULL && argv[optind][1] == '-')); + } + + /* Decode the current option-ARGV-element. */ + + /* Check whether the ARGV-element is a long option. + + If long_only and the ARGV-element has the form "-f", where f is + a valid short option, don't consider it an abbreviated form of + a long option that starts with f. Otherwise there would be no + way to give the -f short option. 
+ + On the other hand, if there's a long option "fubar" and + the ARGV-element is "-fu", do consider that an abbreviation of + the long option, just like "--fu", and not "-f" with arg "u". + + This distinction seems to be the most useful approach. */ + + if (longopts != NULL + && (argv[optind][1] == '-' + || (long_only && (argv[optind][2] || !my_index (optstring, argv[optind][1]))))) + { + char *nameend; + const struct option *p; + const struct option *pfound = NULL; + int exact = 0; + int ambig = 0; + int indfound = -1; + int option_index; + + for (nameend = nextchar; *nameend && *nameend != '='; nameend++) + /* Do nothing. */ ; + + /* Test all long options for either exact match + or abbreviated matches. */ + for (p = longopts, option_index = 0; p->name; p++, option_index++) + if (!strncmp (p->name, nextchar, nameend - nextchar)) + { + if ((unsigned int) (nameend - nextchar) + == (unsigned int) strlen (p->name)) + { + /* Exact match found. */ + pfound = p; + indfound = option_index; + exact = 1; + break; + } + else if (pfound == NULL) + { + /* First nonexact match found. */ + pfound = p; + indfound = option_index; + } + else + /* Second or later nonexact match found. */ + ambig = 1; + } + + if (ambig && !exact) + { + if (opterr) + fprintf (stderr, _("%s: option `%s' is ambiguous\n"), + argv[0], argv[optind]); + nextchar += strlen (nextchar); + optind++; + optopt = 0; + return '?'; + } + + if (pfound != NULL) + { + option_index = indfound; + optind++; + if (*nameend) + { + /* Don't test has_arg with >, because some C compilers don't + allow it to be used on enums. 
*/ + if (pfound->has_arg) + optarg = nameend + 1; + else + { + if (opterr) + { + if (argv[optind - 1][1] == '-') + /* --option */ + fprintf (stderr, + _("%s: option `--%s' doesn't allow an argument\n"), + argv[0], pfound->name); + else + /* +option or -option */ + fprintf (stderr, + _("%s: option `%c%s' doesn't allow an argument\n"), + argv[0], argv[optind - 1][0], pfound->name); + } + nextchar += strlen (nextchar); + + optopt = pfound->val; + return '?'; + } + } + else if (pfound->has_arg == 1) + { + if (optind < argc) + optarg = argv[optind++]; + else + { + if (opterr) + fprintf (stderr, + _("%s: option `%s' requires an argument\n"), + argv[0], argv[optind - 1]); + nextchar += strlen (nextchar); + optopt = pfound->val; + return optstring[0] == ':' ? ':' : '?'; + } + } + nextchar += strlen (nextchar); + if (longind != NULL) + *longind = option_index; + if (pfound->flag) + { + *(pfound->flag) = pfound->val; + return 0; + } + return pfound->val; + } + + /* Can't find it as a long option. If this is not getopt_long_only, + or the option starts with '--' or is not a valid short + option, then it's an error. + Otherwise interpret it as a short option. */ + if (!long_only || argv[optind][1] == '-' + || my_index (optstring, *nextchar) == NULL) + { + if (opterr) + { + if (argv[optind][1] == '-') + /* --option */ + fprintf (stderr, _("%s: unrecognized option `--%s'\n"), + argv[0], nextchar); + else + /* +option or -option */ + fprintf (stderr, _("%s: unrecognized option `%c%s'\n"), + argv[0], argv[optind][0], nextchar); + } + nextchar = (char *) ""; + optind++; + optopt = 0; + return '?'; + } + } + + /* Look at and handle the next short option-character. */ + + { + char c = *nextchar++; + char *temp = my_index (optstring, c); + + /* Increment `optind' when we start to process its last character. */ + if (*nextchar == '\0') + ++optind; + + if (temp == NULL || c == ':') + { + if (opterr) + { + if (posixly_correct) + /* 1003.2 specifies the format of this message. 
*/ + fprintf (stderr, _("%s: illegal option -- %c\n"), + argv[0], c); + else + fprintf (stderr, _("%s: invalid option -- %c\n"), + argv[0], c); + } + optopt = c; + return '?'; + } + /* Convenience. Treat POSIX -W foo same as long option --foo */ + if (temp[0] == 'W' && temp[1] == ';') + { + char *nameend; + const struct option *p; + const struct option *pfound = NULL; + int exact = 0; + int ambig = 0; + int indfound = 0; + int option_index; + + /* This is an option that requires an argument. */ + if (*nextchar != '\0') + { + optarg = nextchar; + /* If we end this ARGV-element by taking the rest as an arg, + we must advance to the next element now. */ + optind++; + } + else if (optind == argc) + { + if (opterr) + { + /* 1003.2 specifies the format of this message. */ + fprintf (stderr, _("%s: option requires an argument -- %c\n"), + argv[0], c); + } + optopt = c; + if (optstring[0] == ':') + c = ':'; + else + c = '?'; + return c; + } + else + /* We already incremented `optind' once; + increment it again when taking next ARGV-elt as argument. */ + optarg = argv[optind++]; + + /* optarg is now the argument, see if it's in the + table of longopts. */ + + for (nextchar = nameend = optarg; *nameend && *nameend != '='; nameend++) + /* Do nothing. */ ; + + /* Test all long options for either exact match + or abbreviated matches. */ + for (p = longopts, option_index = 0; p->name; p++, option_index++) + if (!strncmp (p->name, nextchar, nameend - nextchar)) + { + if ((unsigned int) (nameend - nextchar) == strlen (p->name)) + { + /* Exact match found. */ + pfound = p; + indfound = option_index; + exact = 1; + break; + } + else if (pfound == NULL) + { + /* First nonexact match found. */ + pfound = p; + indfound = option_index; + } + else + /* Second or later nonexact match found. 
*/ + ambig = 1; + } + if (ambig && !exact) + { + if (opterr) + fprintf (stderr, _("%s: option `-W %s' is ambiguous\n"), + argv[0], argv[optind]); + nextchar += strlen (nextchar); + optind++; + return '?'; + } + if (pfound != NULL) + { + option_index = indfound; + if (*nameend) + { + /* Don't test has_arg with >, because some C compilers don't + allow it to be used on enums. */ + if (pfound->has_arg) + optarg = nameend + 1; + else + { + if (opterr) + fprintf (stderr, _("\ +%s: option `-W %s' doesn't allow an argument\n"), + argv[0], pfound->name); + + nextchar += strlen (nextchar); + return '?'; + } + } + else if (pfound->has_arg == 1) + { + if (optind < argc) + optarg = argv[optind++]; + else + { + if (opterr) + fprintf (stderr, + _("%s: option `%s' requires an argument\n"), + argv[0], argv[optind - 1]); + nextchar += strlen (nextchar); + return optstring[0] == ':' ? ':' : '?'; + } + } + nextchar += strlen (nextchar); + if (longind != NULL) + *longind = option_index; + if (pfound->flag) + { + *(pfound->flag) = pfound->val; + return 0; + } + return pfound->val; + } + nextchar = NULL; + return 'W'; /* Let the application handle it. */ + } + if (temp[1] == ':') + { + if (temp[2] == ':') + { + /* This is an option that accepts an argument optionally. */ + if (*nextchar != '\0') + { + optarg = nextchar; + optind++; + } + else + optarg = NULL; + nextchar = NULL; + } + else + { + /* This is an option that requires an argument. */ + if (*nextchar != '\0') + { + optarg = nextchar; + /* If we end this ARGV-element by taking the rest as an arg, + we must advance to the next element now. */ + optind++; + } + else if (optind == argc) + { + if (opterr) + { + /* 1003.2 specifies the format of this message. */ + fprintf (stderr, + _("%s: option requires an argument -- %c\n"), + argv[0], c); + } + optopt = c; + if (optstring[0] == ':') + c = ':'; + else + c = '?'; + } + else + /* We already incremented `optind' once; + increment it again when taking next ARGV-elt as argument. 
*/ + optarg = argv[optind++]; + nextchar = NULL; + } + } + return c; + } +} + +int +getopt (argc, argv, optstring) + int argc; + char *const *argv; + const char *optstring; +{ + return _getopt_internal (argc, argv, optstring, + (const struct option *) 0, + (int *) 0, + 0); +} + +#endif /* Not ELIDE_CODE. */ + +#ifdef TEST + +/* Compile with -DTEST to make an executable for use in testing + the above definition of `getopt'. */ + +int +main (argc, argv) + int argc; + char **argv; +{ + int c; + int digit_optind = 0; + + while (1) + { + int this_option_optind = optind ? optind : 1; + + c = getopt (argc, argv, "abc:d:0123456789"); + if (c == -1) + break; + + switch (c) + { + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if (digit_optind != 0 && digit_optind != this_option_optind) + printf ("digits occur in two different argv-elements.\n"); + digit_optind = this_option_optind; + printf ("option %c\n", c); + break; + + case 'a': + printf ("option a\n"); + break; + + case 'b': + printf ("option b\n"); + break; + + case 'c': + printf ("option c with value `%s'\n", optarg); + break; + + case '?': + break; + + default: + printf ("?? getopt returned character code 0%o ??\n", c); + } + } + + if (optind < argc) + { + printf ("non-option ARGV-elements: "); + while (optind < argc) + printf ("%s ", argv[optind++]); + printf ("\n"); + } + + exit (0); +} + +#endif /* TEST */ diff --git a/src/getopt.h b/src/getopt.h new file mode 100644 index 0000000..a3f75f1 --- /dev/null +++ b/src/getopt.h @@ -0,0 +1,126 @@ +/* Declarations for getopt. + Copyright (C) 1989,90,91,92,93,94,96,97 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Library General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Library General Public License for more details. + + You should have received a copy of the GNU Library General Public + License along with the GNU C Library; see the file COPYING.LIB. If not, + write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. */ + +#ifdef __cplusplus +extern "C" { +#endif + +/* For communication from `getopt' to the caller. + When `getopt' finds an option that takes an argument, + the argument value is returned here. + Also, when `ordering' is RETURN_IN_ORDER, + each non-option ARGV-element is returned here. */ + +extern char *optarg; + +/* Index in ARGV of the next element to be scanned. + This is used for communication to and from the caller + and for communication between successive calls to `getopt'. + + On entry to `getopt', zero means this is the first call; initialize. + + When `getopt' returns -1, this is the index of the first of the + non-option elements that the caller should itself scan. + + Otherwise, `optind' communicates from one call to the next + how much of ARGV has been scanned so far. */ + +extern int optind; + +/* Callers store zero here to inhibit the error message `getopt' prints + for unrecognized options. */ + +extern int opterr; + +/* Set to an option character which was unrecognized. */ + +extern int optopt; + +/* Describe the long-named options requested by the application. + The LONG_OPTIONS argument to getopt_long or getopt_long_only is a vector + of `struct option' terminated by an element containing a name which is + zero. + + The field `has_arg' is: + no_argument (or 0) if the option does not take an argument, + required_argument (or 1) if the option requires an argument, + optional_argument (or 2) if the option takes an optional argument. 
+ + If the field `flag' is not NULL, it points to a variable that is set + to the value given in the field `val' when the option is found, but + left unchanged if the option is not found. + + To have a long-named option do something other than set an `int' to + a compiled-in constant, such as set a value from `optarg', set the + option's `flag' field to zero and its `val' field to a nonzero + value (the equivalent single-letter option character, if there is + one). For long options that have a zero `flag' field, `getopt' + returns the contents of the `val' field. */ + +struct option +{ +#if defined (__STDC__) && __STDC__ + const char *name; +#else + char *name; +#endif + /* has_arg can't be an enum because some compilers complain about + type mismatches in all the code that assumes it is an int. */ + int has_arg; + int *flag; + int val; +}; + +/* Names for the values of the `has_arg' field of `struct option'. */ + +#define no_argument 0 +#define required_argument 1 +#define optional_argument 2 + +#if defined (__STDC__) && __STDC__ +#ifdef __GNU_LIBRARY__ +/* Many other libraries have conflicting prototypes for getopt, with + differences in the consts, in stdlib.h. To avoid compilation + errors, only prototype getopt for the GNU C library. */ +extern int getopt (int argc, char *const *argv, const char *shortopts); +#else /* not __GNU_LIBRARY__ */ +extern int getopt (); +#endif /* __GNU_LIBRARY__ */ +extern int getopt_long (int argc, char *const *argv, const char *shortopts, + const struct option *longopts, int *longind); +extern int getopt_long_only (int argc, char *const *argv, + const char *shortopts, + const struct option *longopts, int *longind); + +/* Internal only. Users should not call this directly. 
*/ +extern int _getopt_internal (int argc, char *const *argv, + const char *shortopts, + const struct option *longopts, int *longind, + int long_only); +#else /* not __STDC__ */ +extern int getopt (); +extern int getopt_long (); +extern int getopt_long_only (); + +extern int _getopt_internal (); +#endif /* __STDC__ */ + +#ifdef __cplusplus +} +#endif diff --git a/src/getopt1.c b/src/getopt1.c new file mode 100644 index 0000000..4aa8de6 --- /dev/null +++ b/src/getopt1.c @@ -0,0 +1,187 @@ +/* getopt_long and getopt_long_only entry points for GNU getopt. + Copyright (C) 1987,88,89,90,91,92,93,94,96,97 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Library General Public License as + published by the Free Software Foundation; either version 2 of the + License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Library General Public License for more details. + + You should have received a copy of the GNU Library General Public + License along with the GNU C Library; see the file COPYING.LIB. If not, + write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, + Boston, MA 02111-1307, USA. */ + +#ifdef HAVE_CONFIG_H +#include +#endif + +#include "getopt.h" + +#if !defined (__STDC__) || !__STDC__ +/* This is a separate conditional since some stdc systems + reject `defined (const)'. */ +#ifndef const +#define const +#endif +#endif + +#include + +/* Comment out all this code if we are using the GNU C Library, and are not + actually compiling the library itself. This code is part of the GNU C + Library, but also included in many other GNU distributions. 
Compiling + and linking in this code is a waste when using the GNU C library + (especially if it is a shared library). Rather than having every GNU + program understand `configure --with-gnu-libc' and omit the object files, + it is simpler to just do this in the source for each such file. */ + +#define GETOPT_INTERFACE_VERSION 2 +#if !defined (_LIBC) && defined (__GLIBC__) && __GLIBC__ >= 2 +#include +#if _GNU_GETOPT_INTERFACE_VERSION == GETOPT_INTERFACE_VERSION +#define ELIDE_CODE +#endif +#endif + +#ifndef ELIDE_CODE + + +/* This needs to come after some library #include + to get __GNU_LIBRARY__ defined. */ +#ifdef __GNU_LIBRARY__ +#include +#endif + +#ifndef NULL +#define NULL 0 +#endif + +int +getopt_long (argc, argv, options, long_options, opt_index) + int argc; + char *const *argv; + const char *options; + const struct option *long_options; + int *opt_index; +{ + return _getopt_internal (argc, argv, options, long_options, opt_index, 0); +} + +/* Like getopt_long, but '-' as well as '--' can indicate a long option. + If an option that starts with '-' (not '--') doesn't match a long option, + but does match a short option, it is parsed as a short option + instead. */ + +int +getopt_long_only (argc, argv, options, long_options, opt_index) + int argc; + char *const *argv; + const char *options; + const struct option *long_options; + int *opt_index; +{ + return _getopt_internal (argc, argv, options, long_options, opt_index, 1); +} + + +#endif /* Not ELIDE_CODE. */ + +#ifdef TEST + +#include + +int +main (argc, argv) + int argc; + char **argv; +{ + int c; + int digit_optind = 0; + + while (1) + { + int this_option_optind = optind ? 
optind : 1; + int option_index = 0; + static struct option long_options[] = + { + {"add", 1, 0, 0}, + {"append", 0, 0, 0}, + {"delete", 1, 0, 0}, + {"verbose", 0, 0, 0}, + {"create", 0, 0, 0}, + {"file", 1, 0, 0}, + {0, 0, 0, 0} + }; + + c = getopt_long (argc, argv, "abc:d:0123456789", + long_options, &option_index); + if (c == -1) + break; + + switch (c) + { + case 0: + printf ("option %s", long_options[option_index].name); + if (optarg) + printf (" with arg %s", optarg); + printf ("\n"); + break; + + case '0': + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + if (digit_optind != 0 && digit_optind != this_option_optind) + printf ("digits occur in two different argv-elements.\n"); + digit_optind = this_option_optind; + printf ("option %c\n", c); + break; + + case 'a': + printf ("option a\n"); + break; + + case 'b': + printf ("option b\n"); + break; + + case 'c': + printf ("option c with value `%s'\n", optarg); + break; + + case 'd': + printf ("option d with value `%s'\n", optarg); + break; + + case '?': + break; + + default: + printf ("?? getopt returned character code 0%o ??\n", c); + } + } + + if (optind < argc) + { + printf ("non-option ARGV-elements: "); + while (optind < argc) + printf ("%s ", argv[optind++]); + printf ("\n"); + } + + exit (0); +} + +#endif /* TEST */ diff --git a/src/lingtbl.c b/src/lingtbl.c new file mode 100644 index 0000000..2441c89 --- /dev/null +++ b/src/lingtbl.c @@ -0,0 +1,172 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: lingtbl.koi8-r,v 1.2 2002/06/27 04:13:00 vadimp Exp $ + */ + +#include "fe.h" + +short int sim_ch [34] [34] = + +{ /* letters' similarity map */ + /* а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я */ + { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, /* а */ + { 0,9,0,0,0,0,1,1,0,0,1,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,1,2}, /* а */ + { 0,0,9,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}, /* б */ + { 0,0,1,9,1,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1,3,0,0,0,0,0,0,0,0,0,0,0}, /* в */ + { 0,0,0,1,9,0,0,0,3,0,0,0,3,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0}, /* г */ + { 0,0,0,0,0,9,0,0,0,1,0,0,0,0,0,0,0,0,0,1,3,0,0,0,1,0,0,0,0,0,0,0,0,0}, /* д */ + { 0,1,0,0,0,0,9,9,0,0,2,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,2,1,1}, /* е */ + { 0,1,0,0,0,0,9,9,0,0,2,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,2,1,1}, /* ё */ + { 0,0,0,0,3,0,0,0,9,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,0,0,0,0,0,0}, /* ж */ + { 0,0,0,0,0,1,0,0,3,9,0,0,0,0,0,0,0,0,0,3,1,0,0,0,3,1,1,1,0,0,0,0,0,0}, /* з */ + { 0,1,0,0,0,0,2,2,0,0,9,3,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,2,0,1,1,1}, /* и */ + { 0,0,0,0,0,0,0,0,0,0,2,9,0,1,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, /* й */ + { 0,0,0,0,3,0,0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0}, /* к */ + { 0,0,0,1,0,0,0,0,0,0,0,1,0,9,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, /* л */ + { 0,0,0,1,0,0,0,0,0,0,0,1,0,1,9,3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, /* м */ + {
0,0,0,1,0,0,0,0,0,0,0,1,0,1,3,9,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}, /* н */ + { 0,2,0,0,0,0,1,1,0,0,1,0,0,0,0,0,9,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,1,1}, /* о */ + { 0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0}, /* п */ + { 0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,9,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0}, /* р */ + { 0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,9,1,0,0,0,3,1,0,0,0,0,0,0,0,0}, /* с */ + { 0,0,0,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0,0,1,9,0,0,0,1,1,0,0,0,0,0,0,0,0}, /* т */ + { 0,1,0,1,0,0,1,1,0,0,1,0,0,0,0,0,1,0,0,0,0,9,0,0,0,0,0,0,0,1,0,1,2,1}, /* у */ + { 0,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0}, /* ф */ + { 0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,9,0,1,0,0,0,0,0,0,0,0}, /* х */ + { 0,0,0,0,0,1,0,0,0,3,0,0,0,0,0,0,0,0,0,3,1,0,0,0,9,0,0,0,0,0,0,0,0,0}, /* ц */ + { 0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,9,3,3,0,0,0,0,0,0}, /* ч */ + { 0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,9,3,0,0,0,0,0,0}, /* ш */ + { 0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,9,0,0,0,0,0,0}, /* щ */ + { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,3,0,0,0}, /* ъ */ + { 0,1,0,0,0,0,1,1,0,0,2,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,9,0,1,1,1}, /* ы */ + { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,9,0,0,0}, /* ь */ + { 0,1,0,0,0,0,3,3,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,9,1,1}, /* э */ + { 0,1,0,0,0,0,1,1,0,0,1,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,0,0,1,0,1,9,1}, /* ю */ + { 0,2,0,0,0,0,1,1,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,1,9} /* я */ + /* а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я */ +}; + +int inf_letters [34] [3] = { /* quantity of information in letters */ + /* relative, average = 1000 */ +/*by itself - in the beginning of a word */ + { 0, 0 }, /* dummy */ + { 802, 959 }, /* а */ + { 1232, 1129 }, /* б */ + { 944, 859 }, /* в */ + { 1253, 1193 }, /* г */ + { 1064, 951 }, /* д */ + { 759, 1232 }, /* е */ + { 1300, 1900 }, /* ё */ /*set manually - 
dk*/ + { 1432, 1432 }, /* ж */ + { 1193, 993 }, /* з */ + { 802, 767 }, /* и */ + { 1329, 1993 }, /* й */ + { 1032, 929 }, /* к */ + { 967, 1276 }, /* л */ + { 1053, 944 }, /* м */ + { 848, 711 }, /* н */ + { 695, 853 }, /* о */ + { 1088, 454 }, /* п */ + { 929, 1115 }, /* р */ + { 895, 793 }, /* с */ + { 848, 1002 }, /* т */ + { 1115, 1129 }, /* у */ + { 1793, 1022 }, /* ф */ + { 1259, 1329 }, /* х */ /* [0] manually decreased! was 1359 */ + { 1593, 1393 }, /* ц */ + { 1276, 1212 }, /* ч */ + { 1476, 1012 }, /* ш */ + { 1676, 1676 }, /* щ */ + { 1993, 3986 }, /* ъ */ + { 1193, 3986 }, /* ы */ + { 1253, 3986 }, /* ь */ + { 1676, 1232 }, /* э */ + { 1476, 1793 }, /* ю */ + { 1159, 967 } /* я */ + }; + +char voc [VOCSIZE] [2] [ 20 ]= { /* exceptions vocabulary */ + + {"белым", "бело"}, + {"больше", "больше"}, + {"больше", "более"}, + {"более", "больше"}, + {"больше", "меньше"}, + {"бы", "бы"}, + {"бы", "был"}, + {"бы", "была"}, + {"бы", "были"}, + {"бы", "было"}, + {"бы", "вы"}, + {"был", "бы"}, + {"была", "бы"}, + {"были", "бы"}, + {"было", "бы"}, + {"вины", "виноватый"}, + {"волей", "неволей"}, + {"время", "времени"}, + {"всего", "навсего"}, + {"вы", "бы"}, + {"даже", "уже"}, + {"друг", "друга"}, + {"друг", "друге"}, + {"друг", "другом"}, + {"друг", "другу"}, + {"дурак", "дураком"}, + {"если", "если"}, + {"звонка", "звонка"}, + {"или", "или"}, + {"как", "так"}, + {"конце", "концов"}, + {"корки", "корки"}, + {"кто", "что"}, + {"либо", "либо"}, + {"мало", "помалу"}, + {"меньше", "больше"}, + {"начать", "сначала"}, + {"не", "на"}, + {"не", "не"}, + {"не", "ни"}, + {"него", "нет"}, + {"ни", "на"}, + {"ни", "не"}, + {"ни", "ни"}, + {"но", "на"}, + {"но", "не"}, + {"но", "ни"}, + {"новые", "новые"}, + {"объять", "необъятное"}, + {"одному", "тому"}, + {"полным", "полно"}, + {"постольку", "поскольку"}, + {"так", "как"}, + {"тем", "чем"}, + {"то", "то"}, + {"тогда", "когда"}, + {"ха", "ха"}, + {"чем", "тем"}, + {"что", "то"}, + {"чуть", "чуть"}, + {"шаг", "шагом"}, +
{"этой", "что"}, + {"этот", "что"}, + + {"", ""}, + {"", ""} +}; diff --git a/src/reader.c b/src/reader.c new file mode 100644 index 0000000..35e9fa3 --- /dev/null +++ b/src/reader.c @@ -0,0 +1,253 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: reader.c,v 1.3 2002/06/27 00:44:00 vadimp Exp $ + */ + +#include +#include +#include +#include + +#include "config.h" +#include "cyrillic.h" +#include "reader.h" +#include "wrappers.h" +#include "util.h" +#include "fe.h" + +/* + * Allocates and initializes a reader. Buffer length must be at least 2 + * characters long (a char + terminating zero). + * + * Returns pointer to allocated and initialized reader if successful, + * otherwise returns NULL.
*/ +reader* rdr_init ( FILE* f, const int len, int codepage ) { + + reader* this = NULL; + + assert ( len > 1 ); + assert ( f ); + + this = (reader *) xmalloc ( sizeof ( reader ) ); + this -> s = (char *) xmalloc ( len ); + this -> f = f; + this -> len = len; + this -> codepage = codepage; + return this; +} + +/* + * Deallocates a reader and its buffer. + */ +void rdr_free ( reader* this ) { + + assert ( this ); + + if ( this -> s ) + free ( this -> s ); + + free ( this ); +} + +/* + * Scans file f, starting from the current position, up to and including + * the next newline character, or up to EOF. + * + * Returns the resulting file position if successful, otherwise -1L. + */ +static long int seek_newline_or_eof ( FILE* f ) { + + int ch; + + assert ( f ); + + while ( (ch = fgetc ( f )) != EOF && ch != '\n' ) + ; /* does "nothing" */ + + if ( ch == EOF && !feof ( f ) ) { + perror ( "fe: fgetc () failed" ); + exit ( -1 ); + } + + return xftell ( f ); +} + +/* + * Resizes a reader's buffer to len characters. __BROKEN_REALLOC__ must be + * defined on platforms where realloc (3) damages the contents of the block + * of memory which is being re-allocated. + * + * Returns this if successful, otherwise NULL. + */ +static reader* resize_buffer ( reader* this, int len ) { + +#ifdef __BROKEN_REALLOC__ + char* s = NULL; +#endif + + assert ( this ); + assert ( this -> s ); + assert ( len > 0 ); + +#ifdef __BROKEN_REALLOC__ + s = xmalloc ( len ); + strcpy ( s, this -> s ); + free ( this -> s ); + this -> s = s; +#else + this -> s = xrealloc ( this -> s, len ); +#endif + this -> len = len; + + return this; +} + +/* + * Chops the trailing newline character off a string. + * + * Returns s if successful, otherwise (s isn't terminated with '\n') NULL. + */ +static char* chop ( char* s ) { + + char* tmp; + + assert ( s ); + + tmp = strchr ( s, '\0' ); + if ( tmp [ -1 ] == '\n' ) { + tmp [ -1 ] = '\0'; + return s; + } + + return NULL; +} + +/* + * Causes reader to read the next string from the input file. If a reader's
If a reader's + * buffer is shorter than the current string, it resizes the buffer and reads + * up remaining characters. + * + * Returns pointer to the internal string buffer if successful, otherwise NULL. + * It is advised to check input file for EOF condition when NULL is returned. + */ +const char* rdr_gets ( reader* this ) { + + long int eolpos; /* Newline character's position in file */ + long int curpos; /* Current position in file */ + long int gap; /* Buffer's growth value */ + int oldlen; /* Old length reader's buffer */ + + assert ( this ); + + if ( fgets ( this -> s, this -> len, this -> f ) == NULL ) + return NULL; + + if ( chop ( this -> s ) ) { + if ( this -> codepage != CYR_CP_DEFAULT ) + recode_cyrillics ( this -> s, this -> s, + CYR_CP_DEFAULT, this -> codepage ); + return this -> s; + } + + if ( feof ( this -> f ) ) + /* '\n' isn't the last character in the input file, it's Ok. */ + return this -> s; + + /* The buffer is shorter than the current line. */ + curpos = xftell ( this -> f ); + eolpos = seek_newline_or_eof ( this -> f ) - 1; + gap = eolpos - curpos + 1; + oldlen = this -> len; + + resize_buffer ( this, (int) (this -> len + gap ) ); + xfseek ( this -> f, curpos, SEEK_SET ); + + if ( fgets ( this -> s + oldlen - 1, (int) gap + 1, + this -> f ) == NULL ) + return NULL; + + chop ( this -> s ); /* It may or may not be terminated with '\n' */ + + if ( this -> codepage != CYR_CP_DEFAULT ) + recode_cyrillics ( this -> s, this -> s, CYR_CP_DEFAULT, + this -> codepage ); + return this -> s; +} + +const char* rdr_skip ( reader* this, unsigned long n ) { + + while ( n -- ) + rdr_gets ( this ); + return this -> s; +} + +word_reader* wrr_init ( FILE* f, int codepage ) { + + word_reader* this = NULL; + + assert ( f ); + + this = xmalloc ( sizeof ( word_reader ) ); + memset ( this, 0, sizeof ( word_reader ) ); + + this -> r = rdr_init ( f, 80, codepage ); + + return this; +} + +void wrr_free ( word_reader* this ) { + rdr_free ( this -> r ); + free ( 
this ); +} + +static __inline int wrr_scan_for_letter ( word_reader* this ) { + + while ( this -> cp + && *(this -> cp) + && !cyr_isletter ( *(this -> cp) ) ) + this -> cp ++; + + return this -> cp && *(this -> cp); +} + +static __inline int wrr_scan_for_non_letter ( word_reader* this ) { + + while ( this -> cp && *(this -> cp) && cyr_isletter ( *(this -> cp) ) ) + this -> cp ++; + + return this -> cp && *(this -> cp); +} + +const char* wrr_getw ( word_reader* this ) { + + while ( !wrr_scan_for_letter ( this ) ) + if ( !(this -> cp = rdr_gets ( this -> r )) ) + return NULL; /* EOF or I/O error */ + + this -> w = this -> cp; /* Store start position */ + wrr_scan_for_non_letter ( this ); + this -> len = this -> cp - this -> w; + + return this -> w; +} + +int wrr_get_word_length ( const word_reader* this ) { + return this -> len; +} diff --git a/src/reader.h b/src/reader.h new file mode 100644 index 0000000..f772e81 --- /dev/null +++ b/src/reader.h @@ -0,0 +1,44 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: reader.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +typedef struct { + FILE* f; + char* s; + int len; + int codepage; +} reader; + +typedef struct { + reader* r; + const char* cp; + const char* w; + int len; +} word_reader; + +reader* rdr_init ( FILE* f, const int len, int codepage ); +void rdr_free ( reader* this ); +const char* rdr_gets ( reader* this ); +const char* rdr_skip ( reader* this, unsigned long n ); + +word_reader* wrr_init ( FILE* f, int codepage ); +void wrr_free ( word_reader* this ); +const char* wrr_getw ( word_reader* this ); +int wrr_get_word_length ( const word_reader* this ); diff --git a/src/stamp-h b/src/stamp-h new file mode 100644 index 0000000..7354eae --- /dev/null +++ b/src/stamp-h @@ -0,0 +1 @@ +timestamp diff --git a/src/stamp-h.in b/src/stamp-h.in new file mode 100644 index 0000000..9788f70 --- /dev/null +++ b/src/stamp-h.in @@ -0,0 +1 @@ +timestamp diff --git a/src/tables.c b/src/tables.c new file mode 100644 index 0000000..c9c2908 --- /dev/null +++ b/src/tables.c @@ -0,0 +1,904 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: tables.c,v 1.1.1.1 2000/10/17 01:16:56 vadimp Exp $ + */ + +#include "cyrillic.h" + +int code_table_cp866 [] = { + + 0, /* 0x80 */ + 1, /* 0x81 */ + 2, /* 0x82 */ + 3, /* 0x83 */ + 4, /* 0x84 */ + 5, /* 0x85 */ + 7, /* 0x86 */ + 8, /* 0x87 */ + 9, /* 0x88 */ + 10, /* 0x89 */ + 11, /* 0x8a */ + 12, /* 0x8b */ + 13, /* 0x8c */ + 14, /* 0x8d */ + 15, /* 0x8e */ + 16, /* 0x8f */ + 17, /* 0x90 */ + 18, /* 0x91 */ + 19, /* 0x92 */ + 20, /* 0x93 */ + 21, /* 0x94 */ + 22, /* 0x95 */ + 23, /* 0x96 */ + 24, /* 0x97 */ + 25, /* 0x98 */ + 26, /* 0x99 */ + 27, /* 0x9a */ + 28, /* 0x9b */ + 29, /* 0x9c */ + 30, /* 0x9d */ + 31, /* 0x9e */ + 32, /* 0x9f */ + 33, /* 0xa0 */ + 34, /* 0xa1 */ + 35, /* 0xa2 */ + 36, /* 0xa3 */ + 37, /* 0xa4 */ + 38, /* 0xa5 */ + 40, /* 0xa6 */ + 41, /* 0xa7 */ + 42, /* 0xa8 */ + 43, /* 0xa9 */ + 44, /* 0xaa */ + 45, /* 0xab */ + 46, /* 0xac */ + 47, /* 0xad */ + 48, /* 0xae */ + 49, /* 0xaf */ + CYR_NON_LETTER, /* 0xb0 */ + CYR_NON_LETTER, /* 0xb1 */ + CYR_NON_LETTER, /* 0xb2 */ + CYR_NON_LETTER, /* 0xb3 */ + CYR_NON_LETTER, /* 0xb4 */ + CYR_NON_LETTER, /* 0xb5 */ + CYR_NON_LETTER, /* 0xb6 */ + CYR_NON_LETTER, /* 0xb7 */ + CYR_NON_LETTER, /* 0xb8 */ + CYR_NON_LETTER, /* 0xb9 */ + CYR_NON_LETTER, /* 0xba */ + CYR_NON_LETTER, /* 0xbb */ + CYR_NON_LETTER, /* 0xbc */ + CYR_NON_LETTER, /* 0xbd */ + CYR_NON_LETTER, /* 0xbe */ + CYR_NON_LETTER, /* 0xbf */ + CYR_NON_LETTER, /* 0xc0 */ + CYR_NON_LETTER, /* 0xc1 */ + CYR_NON_LETTER, /* 0xc2 */ + CYR_NON_LETTER, /* 0xc3 */ + CYR_NON_LETTER, /* 0xc4 */ + CYR_NON_LETTER, /* 0xc5 */ + CYR_NON_LETTER, /* 0xc6 */ + CYR_NON_LETTER, /* 0xc7 */ + CYR_NON_LETTER, /* 0xc8 */ + CYR_NON_LETTER, /* 0xc9 */ + CYR_NON_LETTER, /* 0xca */ + CYR_NON_LETTER, /* 0xcb */ + CYR_NON_LETTER, /* 0xcc */ + 
CYR_NON_LETTER, /* 0xcd */ + CYR_NON_LETTER, /* 0xce */ + CYR_NON_LETTER, /* 0xcf */ + CYR_NON_LETTER, /* 0xd0 */ + CYR_NON_LETTER, /* 0xd1 */ + CYR_NON_LETTER, /* 0xd2 */ + CYR_NON_LETTER, /* 0xd3 */ + CYR_NON_LETTER, /* 0xd4 */ + CYR_NON_LETTER, /* 0xd5 */ + CYR_NON_LETTER, /* 0xd6 */ + CYR_NON_LETTER, /* 0xd7 */ + CYR_NON_LETTER, /* 0xd8 */ + CYR_NON_LETTER, /* 0xd9 */ + CYR_NON_LETTER, /* 0xda */ + CYR_NON_LETTER, /* 0xdb */ + CYR_NON_LETTER, /* 0xdc */ + CYR_NON_LETTER, /* 0xdd */ + CYR_NON_LETTER, /* 0xde */ + CYR_NON_LETTER, /* 0xdf */ + 50, /* 0xe0 */ + 51, /* 0xe1 */ + 52, /* 0xe2 */ + 53, /* 0xe3 */ + 54, /* 0xe4 */ + 55, /* 0xe5 */ + 56, /* 0xe6 */ + 57, /* 0xe7 */ + 58, /* 0xe8 */ + 59, /* 0xe9 */ + 60, /* 0xea */ + 61, /* 0xeb */ + 62, /* 0xec */ + 63, /* 0xed */ + 64, /* 0xee */ + 65, /* 0xef */ + 6, /* 0xf0 */ + 39, /* 0xf1 */ + CYR_NON_LETTER, /* 0xf2 */ + CYR_NON_LETTER, /* 0xf3 */ + CYR_NON_LETTER, /* 0xf4 */ + CYR_NON_LETTER, /* 0xf5 */ + CYR_NON_LETTER, /* 0xf6 */ + CYR_NON_LETTER, /* 0xf7 */ + CYR_NON_LETTER, /* 0xf8 */ + CYR_NON_LETTER, /* 0xf9 */ + CYR_NON_LETTER, /* 0xfa */ + CYR_NON_LETTER, /* 0xfb */ + CYR_NON_LETTER, /* 0xfc */ + CYR_NON_LETTER, /* 0xfd */ + CYR_NON_LETTER, /* 0xfe */ + CYR_NON_LETTER, /* 0xff */ + +}; + +int code_table_cp1251 [] = { + + CYR_NON_LETTER, /* 0x80 */ + CYR_NON_LETTER, /* 0x81 */ + CYR_NON_LETTER, /* 0x82 */ + CYR_NON_LETTER, /* 0x83 */ + CYR_NON_LETTER, /* 0x84 */ + CYR_NON_LETTER, /* 0x85 */ + CYR_NON_LETTER, /* 0x86 */ + CYR_NON_LETTER, /* 0x87 */ + CYR_NON_LETTER, /* 0x88 */ + CYR_NON_LETTER, /* 0x89 */ + CYR_NON_LETTER, /* 0x8a */ + CYR_NON_LETTER, /* 0x8b */ + CYR_NON_LETTER, /* 0x8c */ + CYR_NON_LETTER, /* 0x8d */ + CYR_NON_LETTER, /* 0x8e */ + CYR_NON_LETTER, /* 0x8f */ + CYR_NON_LETTER, /* 0x90 */ + CYR_NON_LETTER, /* 0x91 */ + CYR_NON_LETTER, /* 0x92 */ + CYR_NON_LETTER, /* 0x93 */ + CYR_NON_LETTER, /* 0x94 */ + CYR_NON_LETTER, /* 0x95 */ + CYR_NON_LETTER, /* 0x96 */ + CYR_NON_LETTER, /* 0x97 */ + 
CYR_NON_LETTER, /* 0x98 */ + CYR_NON_LETTER, /* 0x99 */ + CYR_NON_LETTER, /* 0x9a */ + CYR_NON_LETTER, /* 0x9b */ + CYR_NON_LETTER, /* 0x9c */ + CYR_NON_LETTER, /* 0x9d */ + CYR_NON_LETTER, /* 0x9e */ + CYR_NON_LETTER, /* 0x9f */ + CYR_NON_LETTER, /* 0xa0 */ + CYR_NON_LETTER, /* 0xa1 */ + CYR_NON_LETTER, /* 0xa2 */ + CYR_NON_LETTER, /* 0xa3 */ + CYR_NON_LETTER, /* 0xa4 */ + CYR_NON_LETTER, /* 0xa5 */ + CYR_NON_LETTER, /* 0xa6 */ + CYR_NON_LETTER, /* 0xa7 */ + 6, /* 0xa8 */ + CYR_NON_LETTER, /* 0xa9 */ + CYR_NON_LETTER, /* 0xaa */ + CYR_NON_LETTER, /* 0xab */ + CYR_NON_LETTER, /* 0xac */ + CYR_NON_LETTER, /* 0xad */ + CYR_NON_LETTER, /* 0xae */ + CYR_NON_LETTER, /* 0xaf */ + CYR_NON_LETTER, /* 0xb0 */ + CYR_NON_LETTER, /* 0xb1 */ + CYR_NON_LETTER, /* 0xb2 */ + CYR_NON_LETTER, /* 0xb3 */ + CYR_NON_LETTER, /* 0xb4 */ + CYR_NON_LETTER, /* 0xb5 */ + CYR_NON_LETTER, /* 0xb6 */ + CYR_NON_LETTER, /* 0xb7 */ + 39, /* 0xb8 */ + CYR_NON_LETTER, /* 0xb9 */ + CYR_NON_LETTER, /* 0xba */ + CYR_NON_LETTER, /* 0xbb */ + CYR_NON_LETTER, /* 0xbc */ + CYR_NON_LETTER, /* 0xbd */ + CYR_NON_LETTER, /* 0xbe */ + CYR_NON_LETTER, /* 0xbf */ + 0, /* 0xc0 */ + 1, /* 0xc1 */ + 2, /* 0xc2 */ + 3, /* 0xc3 */ + 4, /* 0xc4 */ + 5, /* 0xc5 */ + 7, /* 0xc6 */ + 8, /* 0xc7 */ + 9, /* 0xc8 */ + 10, /* 0xc9 */ + 11, /* 0xca */ + 12, /* 0xcb */ + 13, /* 0xcc */ + 14, /* 0xcd */ + 15, /* 0xce */ + 16, /* 0xcf */ + 17, /* 0xd0 */ + 18, /* 0xd1 */ + 19, /* 0xd2 */ + 20, /* 0xd3 */ + 21, /* 0xd4 */ + 22, /* 0xd5 */ + 23, /* 0xd6 */ + 24, /* 0xd7 */ + 25, /* 0xd8 */ + 26, /* 0xd9 */ + 27, /* 0xda */ + 28, /* 0xdb */ + 29, /* 0xdc */ + 30, /* 0xdd */ + 31, /* 0xde */ + 32, /* 0xdf */ + 33, /* 0xe0 */ + 34, /* 0xe1 */ + 35, /* 0xe2 */ + 36, /* 0xe3 */ + 37, /* 0xe4 */ + 38, /* 0xe5 */ + 40, /* 0xe6 */ + 41, /* 0xe7 */ + 42, /* 0xe8 */ + 43, /* 0xe9 */ + 44, /* 0xea */ + 45, /* 0xeb */ + 46, /* 0xec */ + 47, /* 0xed */ + 48, /* 0xee */ + 49, /* 0xef */ + 50, /* 0xf0 */ + 51, /* 0xf1 */ + 52, /* 0xf2 */ + 53, /* 
0xf3 */ + 54, /* 0xf4 */ + 55, /* 0xf5 */ + 56, /* 0xf6 */ + 57, /* 0xf7 */ + 58, /* 0xf8 */ + 59, /* 0xf9 */ + 60, /* 0xfa */ + 61, /* 0xfb */ + 62, /* 0xfc */ + 63, /* 0xfd */ + 64, /* 0xfe */ + 65, /* 0xff */ + +}; + +int code_table_koi8_r [] = { + + CYR_NON_LETTER, /* 0x80 */ + CYR_NON_LETTER, /* 0x81 */ + CYR_NON_LETTER, /* 0x82 */ + CYR_NON_LETTER, /* 0x83 */ + CYR_NON_LETTER, /* 0x84 */ + CYR_NON_LETTER, /* 0x85 */ + CYR_NON_LETTER, /* 0x86 */ + CYR_NON_LETTER, /* 0x87 */ + CYR_NON_LETTER, /* 0x88 */ + CYR_NON_LETTER, /* 0x89 */ + CYR_NON_LETTER, /* 0x8a */ + CYR_NON_LETTER, /* 0x8b */ + CYR_NON_LETTER, /* 0x8c */ + CYR_NON_LETTER, /* 0x8d */ + CYR_NON_LETTER, /* 0x8e */ + CYR_NON_LETTER, /* 0x8f */ + CYR_NON_LETTER, /* 0x90 */ + CYR_NON_LETTER, /* 0x91 */ + CYR_NON_LETTER, /* 0x92 */ + CYR_NON_LETTER, /* 0x93 */ + CYR_NON_LETTER, /* 0x94 */ + CYR_NON_LETTER, /* 0x95 */ + CYR_NON_LETTER, /* 0x96 */ + CYR_NON_LETTER, /* 0x97 */ + CYR_NON_LETTER, /* 0x98 */ + CYR_NON_LETTER, /* 0x99 */ + CYR_NON_LETTER, /* 0x9a */ + CYR_NON_LETTER, /* 0x9b */ + CYR_NON_LETTER, /* 0x9c */ + CYR_NON_LETTER, /* 0x9d */ + CYR_NON_LETTER, /* 0x9e */ + CYR_NON_LETTER, /* 0x9f */ + CYR_NON_LETTER, /* 0xa0 */ + CYR_NON_LETTER, /* 0xa1 */ + CYR_NON_LETTER, /* 0xa2 */ + 39, /* 0xa3 ё */ + CYR_NON_LETTER, /* 0xa4 */ + CYR_NON_LETTER, /* 0xa5 */ + CYR_NON_LETTER, /* 0xa6 */ + CYR_NON_LETTER, /* 0xa7 */ + CYR_NON_LETTER, /* 0xa8 */ + CYR_NON_LETTER, /* 0xa9 */ + CYR_NON_LETTER, /* 0xaa */ + CYR_NON_LETTER, /* 0xab */ + CYR_NON_LETTER, /* 0xac */ + CYR_NON_LETTER, /* 0xad */ + CYR_NON_LETTER, /* 0xae */ + CYR_NON_LETTER, /* 0xaf */ + CYR_NON_LETTER, /* 0xb0 */ + CYR_NON_LETTER, /* 0xb1 */ + CYR_NON_LETTER, /* 0xb2 */ + 6, /* 0xb3 Ё */ + CYR_NON_LETTER, /* 0xb4 */ + CYR_NON_LETTER, /* 0xb5 */ + CYR_NON_LETTER, /* 0xb6 */ + CYR_NON_LETTER, /* 0xb7 */ + CYR_NON_LETTER, /* 0xb8 */ + CYR_NON_LETTER, /* 0xb9 */ + CYR_NON_LETTER, /* 0xba */ + CYR_NON_LETTER, /* 0xbb */ + CYR_NON_LETTER, /* 0xbc */ 
+ CYR_NON_LETTER, /* 0xbd */ + CYR_NON_LETTER, /* 0xbe */ + CYR_NON_LETTER, /* 0xbf */ + 64, /* 0xc0 ю */ + 33, /* 0xc1 а */ + 34, /* 0xc2 б */ + 56, /* 0xc3 ц */ + 37, /* 0xc4 д */ + 38, /* 0xc5 е */ + 54, /* 0xc6 ф */ + 36, /* 0xc7 г */ + 55, /* 0xc8 х */ + 42, /* 0xc9 и */ + 43, /* 0xca й */ + 44, /* 0xcb к */ + 45, /* 0xcc л */ + 46, /* 0xcd м */ + 47, /* 0xce н */ + 48, /* 0xcf о */ + 49, /* 0xd0 п */ + 65, /* 0xd1 я */ + 50, /* 0xd2 р */ + 51, /* 0xd3 с */ + 52, /* 0xd4 т */ + 53, /* 0xd5 у */ + 40, /* 0xd6 ж */ + 35, /* 0xd7 в */ + 62, /* 0xd8 ь */ + 61, /* 0xd9 ы */ + 41, /* 0xda з */ + 58, /* 0xdb ш */ + 63, /* 0xdc э */ + 59, /* 0xdd щ */ + 57, /* 0xde ч */ + 60, /* 0xdf ъ */ + 31, /* 0xe0 Ю */ + 0, /* 0xe1 А */ + 1, /* 0xe2 Б */ + 23, /* 0xe3 Ц */ + 4, /* 0xe4 Д */ + 5, /* 0xe5 Е */ + 21, /* 0xe6 Ф */ + 3, /* 0xe7 Г */ + 22, /* 0xe8 Х */ + 9, /* 0xe9 И */ + 10, /* 0xea Й */ + 11, /* 0xeb К */ + 12, /* 0xec Л */ + 13, /* 0xed М */ + 14, /* 0xee Н */ + 15, /* 0xef О */ + 16, /* 0xf0 П */ + 32, /* 0xf1 Я */ + 17, /* 0xf2 Р */ + 18, /* 0xf3 С */ + 19, /* 0xf4 Т */ + 20, /* 0xf5 У */ + 7, /* 0xf6 Ж */ + 2, /* 0xf7 В */ + 29, /* 0xf8 Ь */ + 28, /* 0xf9 Ы */ + 8, /* 0xfa З */ + 25, /* 0xfb Ш */ + 30, /* 0xfc Э */ + 26, /* 0xfd Щ */ + 24, /* 0xfe Ч */ + 27, /* 0xff Ъ */ +}; + + +int code_table_mac [] = { + + 0, /* 0x80 */ + 1, /* 0x81 */ + 2, /* 0x82 */ + 3, /* 0x83 */ + 4, /* 0x84 */ + 5, /* 0x85 */ + 7, /* 0x86 */ + 8, /* 0x87 */ + 9, /* 0x88 */ + 10, /* 0x89 */ + 11, /* 0x8a */ + 12, /* 0x8b */ + 13, /* 0x8c */ + 14, /* 0x8d */ + 15, /* 0x8e */ + 16, /* 0x8f */ + 17, /* 0x90 */ + 18, /* 0x91 */ + 19, /* 0x92 */ + 20, /* 0x93 */ + 21, /* 0x94 */ + 22, /* 0x95 */ + 23, /* 0x96 */ + 24, /* 0x97 */ + 25, /* 0x98 */ + 26, /* 0x99 */ + 27, /* 0x9a */ + 28, /* 0x9b */ + 29, /* 0x9c */ + 30, /* 0x9d */ + 31, /* 0x9e */ + 32, /* 0x9f */ + CYR_NON_LETTER, /* 0xa0 */ + CYR_NON_LETTER, /* 0xa1 */ + CYR_NON_LETTER, /* 0xa2 */ + CYR_NON_LETTER, /* 0xa3 */ + CYR_NON_LETTER, 
/* 0xa4 */ + CYR_NON_LETTER, /* 0xa5 */ + CYR_NON_LETTER, /* 0xa6 */ + CYR_NON_LETTER, /* 0xa7 */ + CYR_NON_LETTER, /* 0xa8 */ + CYR_NON_LETTER, /* 0xa9 */ + CYR_NON_LETTER, /* 0xaa */ + CYR_NON_LETTER, /* 0xab */ + CYR_NON_LETTER, /* 0xac */ + CYR_NON_LETTER, /* 0xad */ + CYR_NON_LETTER, /* 0xae */ + CYR_NON_LETTER, /* 0xaf */ + CYR_NON_LETTER, /* 0xb0 */ + CYR_NON_LETTER, /* 0xb1 */ + CYR_NON_LETTER, /* 0xb2 */ + CYR_NON_LETTER, /* 0xb3 */ + CYR_NON_LETTER, /* 0xb4 */ + CYR_NON_LETTER, /* 0xb5 */ + CYR_NON_LETTER, /* 0xb6 */ + CYR_NON_LETTER, /* 0xb7 */ + CYR_NON_LETTER, /* 0xb8 */ + CYR_NON_LETTER, /* 0xb9 */ + CYR_NON_LETTER, /* 0xba */ + CYR_NON_LETTER, /* 0xbb */ + CYR_NON_LETTER, /* 0xbc */ + CYR_NON_LETTER, /* 0xbd */ + CYR_NON_LETTER, /* 0xbe */ + CYR_NON_LETTER, /* 0xbf */ + CYR_NON_LETTER, /* 0xc0 */ + CYR_NON_LETTER, /* 0xc1 */ + CYR_NON_LETTER, /* 0xc2 */ + CYR_NON_LETTER, /* 0xc3 */ + CYR_NON_LETTER, /* 0xc4 */ + CYR_NON_LETTER, /* 0xc5 */ + CYR_NON_LETTER, /* 0xc6 */ + CYR_NON_LETTER, /* 0xc7 */ + CYR_NON_LETTER, /* 0xc8 */ + CYR_NON_LETTER, /* 0xc9 */ + CYR_NON_LETTER, /* 0xca */ + CYR_NON_LETTER, /* 0xcb */ + CYR_NON_LETTER, /* 0xcc */ + CYR_NON_LETTER, /* 0xcd */ + CYR_NON_LETTER, /* 0xce */ + CYR_NON_LETTER, /* 0xcf */ + CYR_NON_LETTER, /* 0xd0 */ + CYR_NON_LETTER, /* 0xd1 */ + CYR_NON_LETTER, /* 0xd2 */ + CYR_NON_LETTER, /* 0xd3 */ + CYR_NON_LETTER, /* 0xd4 */ + CYR_NON_LETTER, /* 0xd5 */ + CYR_NON_LETTER, /* 0xd6 */ + CYR_NON_LETTER, /* 0xd7 */ + CYR_NON_LETTER, /* 0xd8 */ + CYR_NON_LETTER, /* 0xd9 */ + CYR_NON_LETTER, /* 0xda */ + CYR_NON_LETTER, /* 0xdb */ + CYR_NON_LETTER, /* 0xdc */ + 6, /* 0xdd */ + 39, /* 0xde */ + 65, /* 0xdf */ + 33, /* 0xe0 */ + 34, /* 0xe1 */ + 35, /* 0xe2 */ + 36, /* 0xe3 */ + 37, /* 0xe4 */ + 38, /* 0xe5 */ + 40, /* 0xe6 */ + 41, /* 0xe7 */ + 42, /* 0xe8 */ + 43, /* 0xe9 */ + 44, /* 0xea */ + 45, /* 0xeb */ + 46, /* 0xec */ + 47, /* 0xed */ + 48, /* 0xee */ + 49, /* 0xef */ + 50, /* 0xf0 */ + 51, /* 0xf1 */ + 52, /* 
0xf2 */ + 53, /* 0xf3 */ + 54, /* 0xf4 */ + 55, /* 0xf5 */ + 56, /* 0xf6 */ + 57, /* 0xf7 */ + 58, /* 0xf8 */ + 59, /* 0xf9 */ + 60, /* 0xfa */ + 61, /* 0xfb */ + 62, /* 0xfc */ + 63, /* 0xfd */ + 64, /* 0xfe */ + CYR_NON_LETTER, /* 0xff */ + +}; + +int code_table_iso_8859_5 [] = { + + CYR_NON_LETTER, /* 0x80 */ + CYR_NON_LETTER, /* 0x81 */ + CYR_NON_LETTER, /* 0x82 */ + CYR_NON_LETTER, /* 0x83 */ + CYR_NON_LETTER, /* 0x84 */ + CYR_NON_LETTER, /* 0x85 */ + CYR_NON_LETTER, /* 0x86 */ + CYR_NON_LETTER, /* 0x87 */ + CYR_NON_LETTER, /* 0x88 */ + CYR_NON_LETTER, /* 0x89 */ + CYR_NON_LETTER, /* 0x8a */ + CYR_NON_LETTER, /* 0x8b */ + CYR_NON_LETTER, /* 0x8c */ + CYR_NON_LETTER, /* 0x8d */ + CYR_NON_LETTER, /* 0x8e */ + CYR_NON_LETTER, /* 0x8f */ + CYR_NON_LETTER, /* 0x90 */ + CYR_NON_LETTER, /* 0x91 */ + CYR_NON_LETTER, /* 0x92 */ + CYR_NON_LETTER, /* 0x93 */ + CYR_NON_LETTER, /* 0x94 */ + CYR_NON_LETTER, /* 0x95 */ + CYR_NON_LETTER, /* 0x96 */ + CYR_NON_LETTER, /* 0x97 */ + CYR_NON_LETTER, /* 0x98 */ + CYR_NON_LETTER, /* 0x99 */ + CYR_NON_LETTER, /* 0x9a */ + CYR_NON_LETTER, /* 0x9b */ + CYR_NON_LETTER, /* 0x9c */ + CYR_NON_LETTER, /* 0x9d */ + CYR_NON_LETTER, /* 0x9e */ + CYR_NON_LETTER, /* 0x9f */ + CYR_NON_LETTER, /* 0xa0 */ + 6, /* 0xa1 */ + CYR_NON_LETTER, /* 0xa2 */ + CYR_NON_LETTER, /* 0xa3 */ + CYR_NON_LETTER, /* 0xa4 */ + CYR_NON_LETTER, /* 0xa5 */ + CYR_NON_LETTER, /* 0xa6 */ + CYR_NON_LETTER, /* 0xa7 */ + CYR_NON_LETTER, /* 0xa8 */ + CYR_NON_LETTER, /* 0xa9 */ + CYR_NON_LETTER, /* 0xaa */ + CYR_NON_LETTER, /* 0xab */ + CYR_NON_LETTER, /* 0xac */ + CYR_NON_LETTER, /* 0xad */ + CYR_NON_LETTER, /* 0xae */ + CYR_NON_LETTER, /* 0xaf */ + 0, /* 0xb0 */ + 1, /* 0xb1 */ + 2, /* 0xb2 */ + 3, /* 0xb3 */ + 4, /* 0xb4 */ + 5, /* 0xb5 */ + 7, /* 0xb6 */ + 8, /* 0xb7 */ + 9, /* 0xb8 */ + 10, /* 0xb9 */ + 11, /* 0xba */ + 12, /* 0xbb */ + 13, /* 0xbc */ + 14, /* 0xbd */ + 15, /* 0xbe */ + 16, /* 0xbf */ + 17, /* 0xc0 */ + 18, /* 0xc1 */ + 19, /* 0xc2 */ + 20, /* 0xc3 */ + 
21, /* 0xc4 */ + 22, /* 0xc5 */ + 23, /* 0xc6 */ + 24, /* 0xc7 */ + 25, /* 0xc8 */ + 26, /* 0xc9 */ + 27, /* 0xca */ + 28, /* 0xcb */ + 29, /* 0xcc */ + 30, /* 0xcd */ + 31, /* 0xce */ + 32, /* 0xcf */ + 33, /* 0xd0 */ + 34, /* 0xd1 */ + 35, /* 0xd2 */ + 36, /* 0xd3 */ + 37, /* 0xd4 */ + 38, /* 0xd5 */ + 40, /* 0xd6 */ + 41, /* 0xd7 */ + 42, /* 0xd8 */ + 43, /* 0xd9 */ + 44, /* 0xda */ + 45, /* 0xdb */ + 46, /* 0xdc */ + 47, /* 0xdd */ + 48, /* 0xde */ + 49, /* 0xdf */ + 50, /* 0xe0 */ + 51, /* 0xe1 */ + 52, /* 0xe2 */ + 53, /* 0xe3 */ + 54, /* 0xe4 */ + 55, /* 0xe5 */ + 56, /* 0xe6 */ + 57, /* 0xe7 */ + 58, /* 0xe8 */ + 59, /* 0xe9 */ + 60, /* 0xea */ + 61, /* 0xeb */ + 62, /* 0xec */ + 63, /* 0xed */ + 64, /* 0xee */ + 65, /* 0xef */ + CYR_NON_LETTER, /* 0xf0 */ + 39, /* 0xf1 */ + CYR_NON_LETTER, /* 0xf2 */ + CYR_NON_LETTER, /* 0xf3 */ + CYR_NON_LETTER, /* 0xf4 */ + CYR_NON_LETTER, /* 0xf5 */ + CYR_NON_LETTER, /* 0xf6 */ + CYR_NON_LETTER, /* 0xf7 */ + CYR_NON_LETTER, /* 0xf8 */ + CYR_NON_LETTER, /* 0xf9 */ + CYR_NON_LETTER, /* 0xfa */ + CYR_NON_LETTER, /* 0xfb */ + CYR_NON_LETTER, /* 0xfc */ + CYR_NON_LETTER, /* 0xfd */ + CYR_NON_LETTER, /* 0xfe */ + CYR_NON_LETTER, /* 0xff */ + +}; + +int code_table_unicode [] = { + CYR_NON_LETTER, /* 0x400 */ + 6, /* 0x401 */ + CYR_NON_LETTER, /* 0x402 */ + CYR_NON_LETTER, /* 0x403 */ + CYR_NON_LETTER, /* 0x404 */ + CYR_NON_LETTER, /* 0x405 */ + CYR_NON_LETTER, /* 0x406 */ + CYR_NON_LETTER, /* 0x407 */ + CYR_NON_LETTER, /* 0x408 */ + CYR_NON_LETTER, /* 0x409 */ + CYR_NON_LETTER, /* 0x40a */ + CYR_NON_LETTER, /* 0x40b */ + CYR_NON_LETTER, /* 0x40c */ + CYR_NON_LETTER, /* 0x40d */ + CYR_NON_LETTER, /* 0x40e */ + CYR_NON_LETTER, /* 0x40f */ + 0, /* 0x410 */ + 1, /* 0x411 */ + 2, /* 0x412 */ + 3, /* 0x413 */ + 4, /* 0x414 */ + 5, /* 0x415 */ + 7, /* 0x416 */ + 8, /* 0x417 */ + 9, /* 0x418 */ + 10, /* 0x419 */ + 11, /* 0x41a */ + 12, /* 0x41b */ + 13, /* 0x41c */ + 14, /* 0x41d */ + 15, /* 0x41e */ + 16, /* 0x41f */ + 17, /* 0x420 */ + 18, /* 0x421 
*/ + 19, /* 0x422 */ + 20, /* 0x423 */ + 21, /* 0x424 */ + 22, /* 0x425 */ + 23, /* 0x426 */ + 24, /* 0x427 */ + 25, /* 0x428 */ + 26, /* 0x429 */ + 27, /* 0x42a */ + 28, /* 0x42b */ + 29, /* 0x42c */ + 30, /* 0x42d */ + 31, /* 0x42e */ + 32, /* 0x42f */ + 33, /* 0x430 */ + 34, /* 0x431 */ + 35, /* 0x432 */ + 36, /* 0x433 */ + 37, /* 0x434 */ + 38, /* 0x435 */ + 40, /* 0x436 */ + 41, /* 0x437 */ + 42, /* 0x438 */ + 43, /* 0x439 */ + 44, /* 0x43a */ + 45, /* 0x43b */ + 46, /* 0x43c */ + 47, /* 0x43d */ + 48, /* 0x43e */ + 49, /* 0x43f */ + 50, /* 0x440 */ + 51, /* 0x441 */ + 52, /* 0x442 */ + 53, /* 0x443 */ + 54, /* 0x444 */ + 55, /* 0x445 */ + 56, /* 0x446 */ + 57, /* 0x447 */ + 58, /* 0x448 */ + 59, /* 0x449 */ + 60, /* 0x44a */ + 61, /* 0x44b */ + 62, /* 0x44c */ + 63, /* 0x44d */ + 64, /* 0x44e */ + 65, /* 0x44f */ + CYR_NON_LETTER, /* 0x450 */ + 39, /* 0x451 */ + CYR_NON_LETTER, /* 0x452 */ + CYR_NON_LETTER, /* 0x453 */ + CYR_NON_LETTER, /* 0x454 */ + CYR_NON_LETTER, /* 0x455 */ + CYR_NON_LETTER, /* 0x456 */ + CYR_NON_LETTER, /* 0x457 */ + CYR_NON_LETTER, /* 0x458 */ + CYR_NON_LETTER, /* 0x459 */ + CYR_NON_LETTER, /* 0x45a */ + CYR_NON_LETTER, /* 0x45b */ + CYR_NON_LETTER, /* 0x45c */ + CYR_NON_LETTER, /* 0x45d */ + CYR_NON_LETTER, /* 0x45e */ + CYR_NON_LETTER, /* 0x45f */ + CYR_NON_LETTER, /* 0x460 */ + CYR_NON_LETTER, /* 0x461 */ + CYR_NON_LETTER, /* 0x462 */ + CYR_NON_LETTER, /* 0x463 */ + CYR_NON_LETTER, /* 0x464 */ + CYR_NON_LETTER, /* 0x465 */ + CYR_NON_LETTER, /* 0x466 */ + CYR_NON_LETTER, /* 0x467 */ + CYR_NON_LETTER, /* 0x468 */ + CYR_NON_LETTER, /* 0x469 */ + CYR_NON_LETTER, /* 0x46a */ + CYR_NON_LETTER, /* 0x46b */ + CYR_NON_LETTER, /* 0x46c */ + CYR_NON_LETTER, /* 0x46d */ + CYR_NON_LETTER, /* 0x46e */ + CYR_NON_LETTER, /* 0x46f */ + CYR_NON_LETTER, /* 0x470 */ + CYR_NON_LETTER, /* 0x471 */ + CYR_NON_LETTER, /* 0x472 */ + CYR_NON_LETTER, /* 0x473 */ + CYR_NON_LETTER, /* 0x474 */ + CYR_NON_LETTER, /* 0x475 */ + CYR_NON_LETTER, /* 0x476 */ + 
CYR_NON_LETTER, /* 0x477 */ + CYR_NON_LETTER, /* 0x478 */ + CYR_NON_LETTER, /* 0x479 */ + 
CYR_NON_LETTER, /* 0x47a */ + CYR_NON_LETTER, /* 0x47b */ + CYR_NON_LETTER, /* 0x47c */ + CYR_NON_LETTER, /* 0x47d */ + CYR_NON_LETTER, /* 0x47e */ + CYR_NON_LETTER, /* 0x47f */ + CYR_NON_LETTER, /* 0x480 */ + CYR_NON_LETTER, /* 0x481 */ + CYR_NON_LETTER, /* 0x482 */ + CYR_NON_LETTER, /* 0x483 */ + CYR_NON_LETTER, /* 0x484 */ + CYR_NON_LETTER, /* 0x485 */ + CYR_NON_LETTER, /* 0x486 */ + CYR_NON_LETTER, /* 0x487 */ + CYR_NON_LETTER, /* 0x488 */ + CYR_NON_LETTER, /* 0x489 */ + CYR_NON_LETTER, /* 0x48a */ + CYR_NON_LETTER, /* 0x48b */ + CYR_NON_LETTER, /* 0x48c */ + CYR_NON_LETTER, /* 0x48d */ + CYR_NON_LETTER, /* 0x48e */ + CYR_NON_LETTER, /* 0x48f */ + +}; + +int ord2chr [] [ 6 ] = { + /* А */ { 0xe1, 0xc0, 0x80, 0x80, 0xb0, 0x0410 }, + /* Б */ { 0xe2, 0xc1, 0x81, 0x81, 0xb1, 0x0411 }, + /* В */ { 0xf7, 0xc2, 0x82, 0x82, 0xb2, 0x0412 }, + /* Г */ { 0xe7, 0xc3, 0x83, 0x83, 0xb3, 0x0413 }, + /* Д */ { 0xe4, 0xc4, 0x84, 0x84, 0xb4, 0x0414 }, + /* Е */ { 0xe5, 0xc5, 0x85, 0x85, 0xb5, 0x0415 }, + /* Ё */ { 0xb3, 0xa8, 0xf0, 0xdd, 0xa1, 0x0401 }, + /* Ж */ { 0xf6, 0xc6, 0x86, 0x86, 0xb6, 0x0416 }, + /* З */ { 0xfa, 0xc7, 0x87, 0x87, 0xb7, 0x0417 }, + /* И */ { 0xe9, 0xc8, 0x88, 0x88, 0xb8, 0x0418 }, + /* Й */ { 0xea, 0xc9, 0x89, 0x89, 0xb9, 0x0419 }, + /* К */ { 0xeb, 0xca, 0x8a, 0x8a, 0xba, 0x041a }, + /* Л */ { 0xec, 0xcb, 0x8b, 0x8b, 0xbb, 0x041b }, + /* М */ { 0xed, 0xcc, 0x8c, 0x8c, 0xbc, 0x041c }, + /* Н */ { 0xee, 0xcd, 0x8d, 0x8d, 0xbd, 0x041d }, + /* О */ { 0xef, 0xce, 0x8e, 0x8e, 0xbe, 0x041e }, + /* П */ { 0xf0, 0xcf, 0x8f, 0x8f, 0xbf, 0x041f }, + /* Р */ { 0xf2, 0xd0, 0x90, 0x90, 0xc0, 0x0420 }, + /* С */ { 0xf3, 0xd1, 0x91, 0x91, 0xc1, 0x0421 }, + /* Т */ { 0xf4, 0xd2, 0x92, 0x92, 0xc2, 0x0422 }, + /* У */ { 0xf5, 0xd3, 0x93, 0x93, 0xc3, 0x0423 }, + /* Ф */ { 0xe6, 0xd4, 0x94, 0x94, 0xc4, 0x0424 }, + /* Х */ { 0xe8, 0xd5, 0x95, 0x95, 0xc5, 0x0425 }, + /* Ц */ { 0xe3, 0xd6, 0x96, 0x96, 0xc6, 0x0426 }, + /* Ч */ { 0xfe, 0xd7, 0x97, 0x97, 0xc7, 0x0427 }, + /* Ш */ { 0xfb, 0xd8, 0x98,
0x98, 0xc8, 0x0428 }, + /* Щ */ { 0xfd, 0xd9, 0x99, 0x99, 0xc9, 0x0429 }, + /* Ъ */ { 0xff, 0xda, 0x9a, 0x9a, 0xca, 0x042a }, + /* Ы */ { 0xf9, 0xdb, 0x9b, 0x9b, 0xcb, 0x042b }, + /* Ь */ { 0xf8, 0xdc, 0x9c, 0x9c, 0xcc, 0x042c }, + /* Э */ { 0xfc, 0xdd, 0x9d, 0x9d, 0xcd, 0x042d }, + /* Ю */ { 0xe0, 0xde, 0x9e, 0x9e, 0xce, 0x042e }, + /* Я */ { 0xf1, 0xdf, 0x9f, 0x9f, 0xcf, 0x042f }, + /* а */ { 0xc1, 0xe0, 0xa0, 0xe0, 0xd0, 0x0430 }, + /* б */ { 0xc2, 0xe1, 0xa1, 0xe1, 0xd1, 0x0431 }, + /* в */ { 0xd7, 0xe2, 0xa2, 0xe2, 0xd2, 0x0432 }, + /* г */ { 0xc7, 0xe3, 0xa3, 0xe3, 0xd3, 0x0433 }, + /* д */ { 0xc4, 0xe4, 0xa4, 0xe4, 0xd4, 0x0434 }, + /* е */ { 0xc5, 0xe5, 0xa5, 0xe5, 0xd5, 0x0435 }, + /* ё */ { 0xa3, 0xb8, 0xf1, 0xde, 0xf1, 0x0451 }, + /* ж */ { 0xd6, 0xe6, 0xa6, 0xe6, 0xd6, 0x0436 }, + /* з */ { 0xda, 0xe7, 0xa7, 0xe7, 0xd7, 0x0437 }, + /* и */ { 0xc9, 0xe8, 0xa8, 0xe8, 0xd8, 0x0438 }, + /* й */ { 0xca, 0xe9, 0xa9, 0xe9, 0xd9, 0x0439 }, + /* к */ { 0xcb, 0xea, 0xaa, 0xea, 0xda, 0x043a }, + /* л */ { 0xcc, 0xeb, 0xab, 0xeb, 0xdb, 0x043b }, + /* м */ { 0xcd, 0xec, 0xac, 0xec, 0xdc, 0x043c }, + /* н */ { 0xce, 0xed, 0xad, 0xed, 0xdd, 0x043d }, + /* о */ { 0xcf, 0xee, 0xae, 0xee, 0xde, 0x043e }, + /* п */ { 0xd0, 0xef, 0xaf, 0xef, 0xdf, 0x043f }, + /* р */ { 0xd2, 0xf0, 0xe0, 0xf0, 0xe0, 0x0440 }, + /* с */ { 0xd3, 0xf1, 0xe1, 0xf1, 0xe1, 0x0441 }, + /* т */ { 0xd4, 0xf2, 0xe2, 0xf2, 0xe2, 0x0442 }, + /* у */ { 0xd5, 0xf3, 0xe3, 0xf3, 0xe3, 0x0443 }, + /* ф */ { 0xc6, 0xf4, 0xe4, 0xf4, 0xe4, 0x0444 }, + /* х */ { 0xc8, 0xf5, 0xe5, 0xf5, 0xe5, 0x0445 }, + /* ц */ { 0xc3, 0xf6, 0xe6, 0xf6, 0xe6, 0x0446 }, + /* ч */ { 0xde, 0xf7, 0xe7, 0xf7, 0xe7, 0x0447 }, + /* ш */ { 0xdb, 0xf8, 0xe8, 0xf8, 0xe8, 0x0448 }, + /* щ */ { 0xdd, 0xf9, 0xe9, 0xf9, 0xe9, 0x0449 }, + /* ъ */ { 0xdf, 0xfa, 0xea, 0xfa, 0xea, 0x044a }, + /* ы */ { 0xd9, 0xfb, 0xeb, 0xfb, 0xeb, 0x044b }, + /* ь */ { 0xd8, 0xfc, 0xec, 0xfc, 0xec, 0x044c }, + /* э */ { 0xdc, 0xfd, 0xed, 0xfd, 0xed, 0x044d }, + 
/* ю */ { 0xc0, 0xfe, 0xee, 0xfe, 0xee, 0x044e }, + /* я */ { 0xd1, 0xff, 0xef, 0xdf, 0xef, 0x044f } +}; diff --git a/src/ui.c b/src/ui.c new file mode 100644 index 0000000..917605e --- /dev/null +++ b/src/ui.c @@ -0,0 +1,254 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: ui.koi8-r,v 1.2 2002/06/29 05:22:14 vadimp Exp $ + */ + +#include +#include +#include +#include +#include +#include + +#include "config.h" +#include "fe.h" +#include "ui.h" +#include "util.h" +#include "wrappers.h" +#include "cyrillic.h" + +static void cleareol ( int ch ) +{ + while ( ch != '\n' ) + if ( (ch = getchar ()) == EOF ) + break; +} + +/* + * General query function with some case folding. 
+ * Returns a character listed in keys or '\0' + */ +int whatkey ( const char* keys ) +{ + int ch = 0; + + while ( 1 ) + switch ( ch = toupper ( getchar () ) ) { + + case 'N': + case 'Н': + case 'н': + if ( strchr ( keys, 'N' ) ) { + ch = 'N'; + goto quit; + } + break; + + case 'Y': + case 'Д': + case 'д': + if ( strchr ( keys, 'Y' ) ) { + ch = 'Y'; + goto quit; + } + break; + + default: + if ( !strchr ( keys, ch ) ) + ch = 0; + goto quit; + + } +quit: + cleareol ( ch ); + return ch; +} + +int ask ( const char *string, const char* keys ) +{ + static int lastkey = 0; + int key = 0; + + if ( string ) + printf (string); + printf ("? (Yes/No/All/Stop/Context/Help) "); + switch ( lastkey ) { + case 'Y': + printf ( "[Yes] " ); + break; + case 'N': + printf ( "[No] " ); + break; + case 'C': + printf ( "[Context] " ); + break; + default: + printf ( "[Help] " ); + } + + return lastkey = + ((key = whatkey ( keys )) == '\n' ? lastkey : key); +} + +void usage ( void ) { + + printf ( + + "Usage: "PACKAGE" [options] file...\n" + "Check Russian writing style.\n\n" + "-l n, --context-size Set size of context to n words\n" + " (default = 15, min = 2)\n" + "-s n, --sensitivity Set sensitivity threshold to n " + "(default = 600)\n" + "-c n, --wordcount-use Set coefficient of using wordcount " + "information to n\n" + " (0..100, 0 = off, default = 50)\n" + "-a, --silent Output into log file without queries\n" + "-d, --dump-wordcount Dump wordcount into log file\n" + "-p, --proper-names Do not exclude proper names\n" + "-r, --resume Resume processing, if possible\n" + "-o path, --output Use path as log file ('fresheye.log' " + "is the default)\n" + "-I cp, --input-codepage Set Cyrillic code page of input file to " + "cp (check\n" + " below for possible values of cp)\n" +#if 0 /* Not implemented yet */ + "-O cp, --output-codepage Set Cyrillic code page of fe's interface " + "and log\n" + " file to cp (check below for " + "possible values of cp)\n" +#endif + "-h, -?, --help Display this
help and exit\n" + "-v, --version Display version information and exit\n\n" + + "cp parameter (used for code page definition) can be one of the " + "following:\n\n" + "koi8-r -- KOI8-R (default on UNIX-compatible platforms)\n" + "cp866 -- MS-DOS CP866 (aka 'alternative', default on Win32 " + "platforms)\n" + "cp1251 -- Windows CP1251\n" + "mac -- Cyrillic encoding used on Apple Macintosh\n" + "iso8859-5 -- ISO 8859-5\n\n" + + "Send suggestions for improvements to Dmitry Kirsanov " + "\n" + "Report bugs to Vadim Penzin \n" + "Please make sure there are words 'Fresh Eye' in the Subject: line\n" + + ); + + exit ( 0 ); + +} + +void version ( void ) { + + printf ( + + PACKAGE" version "VERSION" ("PLATFORM" ["CYR_CP_NAME"])\n" + "Copyright (C) 1999 OnMind Systems.\n" + "Fresh Eye is distributed in the hope that it will be useful,\n" + "but THERE IS ABSOLUTELY NO WARRANTY OF ANY KIND for this software.\n" + "You may redistribute copies of Fresh Eye\n" + "under the terms of the GNU General Public License.\n" + "For more information, see the file named COPYING.\n" + + ); + + exit ( 0 ); +} + +int parse_command_line ( int argc, char* argv [] ) { + + static const char* options = "l:s:c:adpro:I:O:hv?"; + int option_index; + static struct option long_options [] = { + { "context-size", 1, NULL, 'l' }, + { "sensitivity", 1, NULL, 's' }, + { "wordcount-use", 1, NULL, 'c' }, + { "silent", 0, NULL, 'a' }, + { "dump-wordcount", 0, NULL, 'd' }, + { "proper-names", 0, NULL, 'p' }, + { "resume", 0, NULL, 'r' }, + { "output", 1, NULL, 'o' }, + { "input-codepage", 1, NULL, 'I' }, + { "output-codepage", 1, NULL, 'O' }, + { "help", 0, NULL, 'h' }, + { "version", 0, NULL, 'v' }, + { NULL, 0, NULL, 0 } + }; + int ch; + + while ( (ch = getopt_long ( argc, argv, options, long_options, + &option_index )) != -1 ) + switch ( ch ) { + case 'l': + context_size = atol ( optarg ); + break; + + case 's': + sensitivity_threshold = atol ( optarg ); + break; + + case 'c': + wordcount_use_coefficient = 
atol ( optarg ); + break; + + case 'a': + quiet_logging = 1; + break; + + case 'd': + dump_wordcount = 1; + break; + + case 'p': + exclude_proper_names = 0; + break; + + case 'r': + resume_processing = 1; + break; + + case 'o': + log_path = xstrdup ( optarg ); + break; + + case 'I': + input_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case 'O': + output_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case '?': + case 'h': + usage (); + + case 'v': + version (); + + } + + return optind; +} + diff --git a/src/ui.h b/src/ui.h new file mode 100644 index 0000000..5198f06 --- /dev/null +++ b/src/ui.h @@ -0,0 +1,26 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: ui.h,v 1.3 2002/06/29 05:22:14 vadimp Exp $ + */ + +int whatkey ( const char* keys ); +int ask ( const char *string, const char* keys ); +void usage ( void ); +void version ( void ); +int parse_command_line ( int argc, char* argv [] ); diff --git a/src/ui.koi8-r b/src/ui.koi8-r new file mode 100644 index 0000000..16971d9 --- /dev/null +++ b/src/ui.koi8-r @@ -0,0 +1,254 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: ui.koi8-r,v 1.2 2002/06/29 05:22:14 vadimp Exp $ + */ + +#include +#include +#include +#include +#include +#include + +#include "config.h" +#include "fe.h" +#include "ui.h" +#include "util.h" +#include "wrappers.h" +#include "cyrillic.h" + +static void cleareol ( int ch ) +{ + while ( ch != '\n' ) + if ( (ch = getchar ()) == EOF ) + break; +} + +/* + * General query function with some case folding. 
+ * Returns a character listed in keys or '\0' + */ +int whatkey ( const char* keys ) +{ + int ch = 0; + + while ( 1 ) + switch ( ch = toupper ( getchar () ) ) { + + case 'N': + case 'Н': + case 'н': + if ( strchr ( keys, 'N' ) ) { + ch = 'N'; + goto quit; + } + break; + + case 'Y': + case 'Д': + case 'д': + if ( strchr ( keys, 'Y' ) ) { + ch = 'Y'; + goto quit; + } + break; + + default: + if ( !strchr ( keys, ch ) ) + ch = 0; + goto quit; + + } +quit: + cleareol ( ch ); + return ch; +} + +int ask ( const char *string, const char* keys ) +{ + static int lastkey = 0; + int key = 0; + + if ( string ) + printf (string); + printf ("? (Yes/No/All/Stop/Context/Help) "); + switch ( lastkey ) { + case 'Y': + printf ( "[Yes] " ); + break; + case 'N': + printf ( "[No] " ); + break; + case 'C': + printf ( "[Context] " ); + break; + default: + printf ( "[Help] " ); + } + + return lastkey = + ((key = whatkey ( keys )) == '\n' ? lastkey : key); +} + +void usage ( void ) { + + printf ( + + "Usage: "PACKAGE" [options] file...\n" + "Check Russian writing style.\n\n" + "-l n, --context-size Set size of context to n words\n" + " (default = 15, min = 2)\n" + "-s n, --sensitivity Set sensitivity threshold to n " + "(default = 600)\n" + "-c n, --wordcount-use Set coefficient of using wordcount " + "information to n\n" + " (0..100, 0 = off, default = 50)\n" + "-a, --silent Output into log file without queries\n" + "-d, --dump-wordcount Dump wordcount into log file\n" + "-p, --proper-names Do not exclude proper names\n" + "-r, --resume Resume processing, if possible\n" + "-o path, --output Use path as log file ('fresheye.log' " + "is the default)\n" + "-I cp, --input-codepage Set Cyrillic code page of input file to " + "cp (check\n" + " below for possible values of cp)\n" +#if 0 /* Not implemented yet */ + "-O cp, --output-codepage Set Cyrillic code page of fe's interface " + "and log\n" + " file to cp (check below for " + "possible values of cp)\n" +#endif + "-h, -?, --help Display this 
help and exit\n" + "-v, --version Display version information and exit\n\n" + + "cp parameter (used for code page definition) can be one of the " + "following:\n\n" + "koi8-r -- KOI8-R (default on UNIX-compatible platforms)\n" + "cp866 -- MS-DOS CP866 (aka 'alternative', default on Win32 " + "platforms)\n" + "cp1251 -- Windows CP1251\n" + "mac -- Cyrillic encoding used on Apple Macintosh\n" + "iso8859-5 -- ISO 8859-5\n\n" + + "Send suggestions for improvements to Dmitry Kirsanov " + "\n" + "Report bugs to Vadim Penzin \n" + "Please make sure there are words 'Fresh Eye' in the Subject: line\n" + + ); + + exit ( 0 ); + +} + +void version ( void ) { + + printf ( + + PACKAGE" version "VERSION" ("PLATFORM" ["CYR_CP_NAME"])\n" + "Copyright (C) 1999 OnMind Systems.\n" + "Fresh Eye is distributed in the hope that it will be useful,\n" + "but THERE IS ABSOLUTELY NO WARRANTY OF ANY KIND for this software.\n" + "You may redistribute copies of Fresh Eye\n" + "under the terms of the GNU General Public License.\n" + "For more information, see the file named COPYING.\n" + + ); + + exit ( 0 ); +} + +int parse_command_line ( int argc, char* argv [] ) { + + static const char* options = "l:s:c:adpro:I:O:hv?"; + int option_index; + static struct option long_options [] = { + { "context-size", 1, NULL, 'l' }, + { "sensitivity", 1, NULL, 's' }, + { "wordcount-use", 1, NULL, 'c' }, + { "silent", 0, NULL, 'a' }, + { "dump-wordcount", 0, NULL, 'd' }, + { "proper-names", 0, NULL, 'p' }, + { "resume", 0, NULL, 'r' }, + { "output", 1, NULL, 'o' }, + { "input-codepage", 1, NULL, 'I' }, + { "output-codepage", 1, NULL, 'O' }, + { "help", 0, NULL, 'h' }, + { "version", 0, NULL, 'v' }, + { NULL, 0, NULL, 0 } + }; + int ch; + + while ( (ch = getopt_long ( argc, argv, options, long_options, + &option_index )) != -1 ) + switch ( ch ) { + case 'l': + context_size = atol ( optarg ); + break; + + case 's': + sensitivity_threshold = atol ( optarg ); + break; + + case 'c': + wordcount_use_coefficient = 
atol ( optarg ); + break; + + case 'a': + quiet_logging = 1; + break; + + case 'd': + dump_wordcount = 1; + break; + + case 'p': + exclude_proper_names = 0; + break; + + case 'r': + resume_processing = 1; + break; + + case 'o': + log_path = xstrdup ( optarg ); + break; + + case 'I': + input_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case 'O': + output_codepage = + cyr_codepage_by_name ( optarg ); + break; + + case '?': + case 'h': + usage (); + + case 'v': + version (); + + } + + return optind; +} + diff --git a/src/util.c b/src/util.c new file mode 100644 index 0000000..960f2c1 --- /dev/null +++ b/src/util.c @@ -0,0 +1,139 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: util.c,v 1.3 2002/06/21 00:53:13 vadimp Exp $ + */ + +#include +#include +#include +#include +#include + +#include "config.h" +#include "util.h" +#include "wrappers.h" +#include "cyrillic.h" + +void fatal_error ( const char* message, const int use_errno ) { + + assert ( message ); + + if ( use_errno ) + fprintf ( stderr, PACKAGE": %s: %s\n", message, + strerror ( errno ) ); + else + fprintf ( stderr, "fe: %s\n", message ); + + exit ( -1 ); +} + +__inline int lnum (const char *c) { /* letter number in the alphabet */ + return cyr_ord ( *c ) % CYR_LETTER_COUNT; +} + +char* strndup ( const char* s, size_t n ) { + + char* dst = xmalloc ( n + 1 ); + char* cp = dst; + + while ( n -- && *s ) + *cp ++ = *s ++; + *cp = '\0'; + + return dst; +} + +__inline char* unify_word ( char* s ) { + + char* cp = s; + + while ( *cp ) { + *cp = cyr_downc ( *cp ); + cp ++; + + } + + return s; +} + +/* + * Converts src from native cyrillic encoding into "logical" one and places the + * result into dst. Logical encoding starts from 1 -- first capital letter of + * the Russian alphabet -- and ends at 64 -- last small letter of the Russian + * alphabet. + * Returns dst. + * Obviously, you can't use this function if src contains non-cyrillic + * characters. 
+ */ + +char* convert_to_logical ( char* dst, const char* src ) { + + char* p = dst; + + while ( *src ) + *p ++ = lnum ( src ++ ) + 1; + *p = '\0'; + + return dst; +} + +char* convert_to_physical ( char* dst, const char* src ) { + + char* p = dst; + + while ( *src ) + *p ++ = cyr_chr ( *src ++ - 1 + CYR_LETTER_COUNT ); + *p = '\0'; + + return dst; +} + +char* strccpy ( char* dst, const char* src, char ch ) { + + char* cp = dst; + + while ( *src && *cp ) { + if ( *src == ch ) + *cp = ch; + src ++; + cp ++; + } + + return dst; +} + +char* recode_cyrillics ( char* dst, const char* src, + int dst_codepage, int src_codepage ) { + + char* cp = dst; + + while ( *src ) { + if ( !cyr_isletter_ex ( *src, src_codepage ) ) + *cp = *src; + else { + int ord = cyr_ord_ex ( *src, src_codepage ); + *cp = cyr_chr_ex ( ord, dst_codepage ); + } + src ++; + cp ++; + } + *cp = '\0'; + + return dst; +} diff --git a/src/util.h b/src/util.h new file mode 100644 index 0000000..4f54b8f --- /dev/null +++ b/src/util.h @@ -0,0 +1,40 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: util.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +void fatal_error ( const char* message, const int use_errno ); +__inline int lnum (const char *c); +char* strndup ( const char* s, size_t n ); +char* unify_word ( char* s ); +char* convert_to_logical ( char* dst, const char* src ); +char* convert_to_physical ( char* dst, const char* src ); + +/* + * Copies from src to dst only characters equal to ch while leaving the rest of + * characters in dst intact. + */ +char* strccpy ( char* dst, const char* src, char ch ); + +/* + * Recodes string from one Cyrillic codepage to another + * Returns dst + */ +char* recode_cyrillics ( char* dst, const char* src, + int dst_codepage, int src_codepage ); diff --git a/src/wrappers.c b/src/wrappers.c new file mode 100644 index 0000000..a5a63a4 --- /dev/null +++ b/src/wrappers.c @@ -0,0 +1,121 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: wrappers.c,v 1.1.1.1 2000/10/17 01:16:59 vadimp Exp $ + */ + +#include +#include +#include +#include +#include + +#include "wrappers.h" +#include "util.h" + +void* xrealloc ( void* block, size_t size ) { + + void* p = realloc ( block, size ); + + if ( p == NULL ) + fatal_error ( "memory allocation error", 0 ); + + return p; +} + +void* xmalloc ( size_t size ) { + + void* p = malloc ( size ); + + if ( p == NULL ) + fatal_error ( "memory allocation error", 0 ); + + return p; +} + +long int xftell ( FILE* f ) { + + long int pos = ftell ( f ); + + if ( pos == -1L ) + fatal_error ( "ftell () failed", 1 ); + + return pos; +} + +int xfseek ( FILE *f, long offset, int whence ) { + + if ( fseek ( f, offset, whence ) == -1 ) + fatal_error ( "fseek () failed", 1 ); + + return 0; +} + +FILE* xfopen ( const char* path, const char* mode ) { + + const char* mode_name = "undefined access mode"; + char* message = NULL; + + FILE* fp = fopen ( path, mode ); + + if ( fp != NULL ) + return fp; + + message = xmalloc ( strlen ( path ) + 80 ); + + switch ( tolower ( *mode ) ) { + case 'r': + mode_name = "reading"; + break; + + case 'a': + case 'w': + mode_name = "writing"; + break; + + default: + ; /* does nothing */ + } + + sprintf ( message, "cannot open '%s' for %s", + path, mode_name ); + fatal_error ( message, 1 ); + + /* the following is very unlikely to happen, but who knows ... 
*/ + free ( message ); + + return NULL; +} + +int xfclose ( FILE* stream ) { + + if ( fclose ( stream ) == EOF ) + fatal_error ( "fclose () failed", 1 ); + + return 0; +} + +char* xstrdup ( const char* s ) { + + char* tmp = strdup ( s ); + + if ( tmp == NULL ) + fatal_error ( "memory allocation error", 0 ); + + return tmp; +} diff --git a/src/wrappers.h b/src/wrappers.h new file mode 100644 index 0000000..8c7a229 --- /dev/null +++ b/src/wrappers.h @@ -0,0 +1,60 @@ +/* + * Fresh Eye, a program for Russian writing style checking + * Copyright (C) 1999 OnMind Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * $Id: wrappers.h,v 1.2 2002/06/07 03:45:54 vadimp Exp $ + */ + +/* + * realloc (3) wrapper. On errors it prints error message and calls exit (3). + */ +void* xrealloc ( void* block, size_t size ); + +/* + * malloc (3) wrapper. On errors it prints error message and calls exit (3). + */ +void* xmalloc ( size_t size ); + +/* + * ftell (3) wrapper. On system errors it prints error message and calls + * exit (3). + */ +long int xftell ( FILE* f ); + +/* + * fseek (3) wrapper. On system errors it prints error message and calls + * exit (3). + */ +int xfseek ( FILE *f, long offset, int whence ); + +/* + * fopen (3) wrapper. On system errors it prints error message and calls + * exit (3). 
+ */ +FILE* xfopen ( const char* path, const char* mode ); + +/* + * fclose (3) wrapper. On system errors it prints error message and calls + * exit (3). + */ +int xfclose ( FILE* stream ); + +/* + * strdup (3) wrapper. On memory allocation errors it prints error message and + * calls exit (3). + */ +char* xstrdup ( const char* s ); diff --git a/win32/Makefile.am b/win32/Makefile.am new file mode 100644 index 0000000..6a5521b --- /dev/null +++ b/win32/Makefile.am @@ -0,0 +1,3 @@ +# $Id: Makefile.am,v 1.2 2002/06/08 04:05:21 vadimp Exp $ + +EXTRA_DIST = fe.dsp ce.dsp config.dsp recode.dsp fe.dsw README diff --git a/win32/ce.dsp b/win32/ce.dsp new file mode 100644 index 0000000..5db91a3 --- /dev/null +++ b/win32/ce.dsp @@ -0,0 +1,148 @@ +# Microsoft Developer Studio Project File - Name="ce" - Package Owner=<4> +# Microsoft Developer Studio Generated Build File, Format Version 6.00 +# ** DO NOT EDIT ** + +# TARGTYPE "Win32 (x86) Console Application" 0x0103 + +CFG=ce - Win32 Debug +!MESSAGE This is not a valid makefile. To build this project using NMAKE, +!MESSAGE use the Export Makefile command and run +!MESSAGE +!MESSAGE NMAKE /f "ce.mak". +!MESSAGE +!MESSAGE You can specify a configuration when running NMAKE +!MESSAGE by defining the macro CFG on the command line. 
For example: +!MESSAGE +!MESSAGE NMAKE /f "ce.mak" CFG="ce - Win32 Debug" +!MESSAGE +!MESSAGE Possible choices for configuration are: +!MESSAGE +!MESSAGE "ce - Win32 Release" (based on "Win32 (x86) Console Application") +!MESSAGE "ce - Win32 Debug" (based on "Win32 (x86) Console Application") +!MESSAGE + +# Begin Project +# PROP AllowPerConfigDependencies 0 +# PROP Scc_ProjName "" +# PROP Scc_LocalPath "" +CPP=cl.exe +RSC=rc.exe + +!IF "$(CFG)" == "ce - Win32 Release" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 0 +# PROP BASE Output_Dir "Release" +# PROP BASE Intermediate_Dir "Release" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 0 +# PROP Output_Dir "Release" +# PROP Intermediate_Dir "Release" +# PROP Target_Dir "" +# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c +# ADD CPP /nologo /W3 /GX /O2 /I "..\src" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c +# ADD BASE RSC /l 0x409 /d "NDEBUG" +# ADD RSC /l 0x409 /d "NDEBUG" +BSC32=bscmake.exe +# ADD BASE BSC32 /nologo +# ADD BSC32 /nologo +LINK32=link.exe +# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 +# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 + +!ELSEIF "$(CFG)" == "ce - Win32 Debug" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 1 +# PROP BASE Output_Dir "ce___Win32_Debug" +# PROP BASE Intermediate_Dir "ce___Win32_Debug" +# PROP 
BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 1 +# PROP Output_Dir "Debug" +# PROP Intermediate_Dir "Debug" +# PROP Target_Dir "" +# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c +# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /I "..\src" /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c +# ADD BASE RSC /l 0x409 /d "_DEBUG" +# ADD RSC /l 0x409 /d "_DEBUG" +BSC32=bscmake.exe +# ADD BASE BSC32 /nologo +# ADD BSC32 /nologo +LINK32=link.exe +# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept +# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept + +!ENDIF + +# Begin Target + +# Name "ce - Win32 Release" +# Name "ce - Win32 Debug" +# Begin Group "Source Files" + +# PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat" +# Begin Source File + +SOURCE=..\src\ce.c +# End Source File +# Begin Source File + +SOURCE=..\src\cyrillic.c +# End Source File +# Begin Source File + +SOURCE=..\src\getopt.c +# End Source File +# Begin Source File + +SOURCE=..\src\getopt1.c +# End Source File +# Begin Source File + +SOURCE=..\src\tables.c +# End Source File +# Begin Source File + +SOURCE=..\src\util.c +# End Source File +# Begin Source File + +SOURCE=..\src\wrappers.c +# End Source File +# End Group +# Begin Group "Header Files" + +# PROP Default_Filter "h;hpp;hxx;hm;inl" +# Begin Source File + 
+SOURCE=..\src\config.h +# End Source File +# Begin Source File + +SOURCE=..\src\cyrillic.h +# End Source File +# Begin Source File + +SOURCE=..\src\fe.h +# End Source File +# Begin Source File + +SOURCE=..\src\getopt.h +# End Source File +# Begin Source File + +SOURCE=..\src\util.h +# End Source File +# Begin Source File + +SOURCE=..\src\wrappers.h +# End Source File +# End Group +# Begin Group "Resource Files" + +# PROP Default_Filter "ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe" +# End Group +# End Target +# End Project diff --git a/win32/config.dsp b/win32/config.dsp new file mode 100644 index 0000000..55c69d8 --- /dev/null +++ b/win32/config.dsp @@ -0,0 +1,92 @@ +# Microsoft Developer Studio Project File - Name="config" - Package Owner=<4> +# Microsoft Developer Studio Generated Build File, Format Version 6.00 +# ** DO NOT EDIT ** + +# TARGTYPE "Win32 (x86) Generic Project" 0x010a + +CFG=config - Win32 Debug +!MESSAGE This is not a valid makefile. To build this project using NMAKE, +!MESSAGE use the Export Makefile command and run +!MESSAGE +!MESSAGE NMAKE /f "config.mak". +!MESSAGE +!MESSAGE You can specify a configuration when running NMAKE +!MESSAGE by defining the macro CFG on the command line. 
For example: +!MESSAGE +!MESSAGE NMAKE /f "config.mak" CFG="config - Win32 Debug" +!MESSAGE +!MESSAGE Possible choices for configuration are: +!MESSAGE +!MESSAGE "config - Win32 Release" (based on "Win32 (x86) Generic Project") +!MESSAGE "config - Win32 Debug" (based on "Win32 (x86) Generic Project") +!MESSAGE + +# Begin Project +# PROP AllowPerConfigDependencies 0 +# PROP Scc_ProjName "" +# PROP Scc_LocalPath "" +MTL=midl.exe + +!IF "$(CFG)" == "config - Win32 Release" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 0 +# PROP BASE Output_Dir "Release" +# PROP BASE Intermediate_Dir "Release" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 0 +# PROP Output_Dir "Release" +# PROP Intermediate_Dir "Release" +# PROP Target_Dir "" + +!ELSEIF "$(CFG)" == "config - Win32 Debug" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 1 +# PROP BASE Output_Dir "Debug" +# PROP BASE Intermediate_Dir "Debug" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 1 +# PROP Output_Dir "Debug" +# PROP Intermediate_Dir "Debug" +# PROP Target_Dir "" + +!ENDIF + +# Begin Target + +# Name "config - Win32 Release" +# Name "config - Win32 Debug" +# Begin Source File + +SOURCE="..\src\config-win32.h" + +!IF "$(CFG)" == "config - Win32 Release" + +# Begin Custom Build +InputDir=\Documents and Settings\vadimp\My Documents\src\fe\src +InputPath="..\src\config-win32.h" + +"$(InputDir)\config.h" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + copy $(InputPath) "$(InputDir)\config.h" + +# End Custom Build + +!ELSEIF "$(CFG)" == "config - Win32 Debug" + +# Begin Custom Build +InputDir=\Documents and Settings\vadimp\My Documents\src\fe\src +InputPath="..\src\config-win32.h" + +"$(InputDir)\config.h" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + copy "$(InputPath)" "$(InputDir)\config.h" + +# End Custom Build + +!ENDIF + +# End Source File +# End Target +# End Project diff --git a/win32/fe.dsp b/win32/fe.dsp new file mode 100644 index 0000000..f9e0f59 --- 
/dev/null +++ b/win32/fe.dsp @@ -0,0 +1,188 @@ +# Microsoft Developer Studio Project File - Name="fe" - Package Owner=<4> +# Microsoft Developer Studio Generated Build File, Format Version 6.00 +# ** DO NOT EDIT ** + +# TARGTYPE "Win32 (x86) Console Application" 0x0103 + +CFG=fe - Win32 Debug +!MESSAGE This is not a valid makefile. To build this project using NMAKE, +!MESSAGE use the Export Makefile command and run +!MESSAGE +!MESSAGE NMAKE /f "fe.mak". +!MESSAGE +!MESSAGE You can specify a configuration when running NMAKE +!MESSAGE by defining the macro CFG on the command line. For example: +!MESSAGE +!MESSAGE NMAKE /f "fe.mak" CFG="fe - Win32 Debug" +!MESSAGE +!MESSAGE Possible choices for configuration are: +!MESSAGE +!MESSAGE "fe - Win32 Release" (based on "Win32 (x86) Console Application") +!MESSAGE "fe - Win32 Debug" (based on "Win32 (x86) Console Application") +!MESSAGE + +# Begin Project +# PROP AllowPerConfigDependencies 0 +# PROP Scc_ProjName "" +# PROP Scc_LocalPath "" +CPP=cl.exe +RSC=rc.exe + +!IF "$(CFG)" == "fe - Win32 Release" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 0 +# PROP BASE Output_Dir "Release" +# PROP BASE Intermediate_Dir "Release" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 0 +# PROP Output_Dir "Release" +# PROP Intermediate_Dir "Release" +# PROP Target_Dir "" +# ADD BASE CPP /nologo /W3 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c +# ADD CPP /nologo /W3 /GX /O2 /I "..\src" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /c +# ADD BASE RSC /l 0x409 /d "NDEBUG" +# ADD RSC /l 0x409 /d "NDEBUG" +BSC32=bscmake.exe +# ADD BASE BSC32 /nologo +# ADD BSC32 /nologo +LINK32=link.exe +# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib 
odbccp32.lib /nologo /subsystem:console /machine:I386 +# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /machine:I386 + +!ELSEIF "$(CFG)" == "fe - Win32 Debug" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 1 +# PROP BASE Output_Dir "Debug" +# PROP BASE Intermediate_Dir "Debug" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 1 +# PROP Output_Dir "Debug" +# PROP Intermediate_Dir "Debug" +# PROP Target_Dir "" +# ADD BASE CPP /nologo /W3 /Gm /GX /ZI /Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c +# ADD CPP /nologo /W3 /Gm /GX /ZI /Od /I "..\src" /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /YX /FD /GZ /c +# ADD BASE RSC /l 0x409 /d "_DEBUG" +# ADD RSC /l 0x409 /d "_DEBUG" +BSC32=bscmake.exe +# ADD BASE BSC32 /nologo +# ADD BSC32 /nologo +LINK32=link.exe +# ADD BASE LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept +# ADD LINK32 kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /nologo /subsystem:console /debug /machine:I386 /pdbtype:sept + +!ENDIF + +# Begin Target + +# Name "fe - Win32 Release" +# Name "fe - Win32 Debug" +# Begin Group "Source Files" + +# PROP Default_Filter "cpp;c;cxx;rc;def;r;odl;idl;hpj;bat" +# 
Begin Source File + +SOURCE=..\src\avl.c +# End Source File +# Begin Source File + +SOURCE=..\src\avl_low.c +# End Source File +# Begin Source File + +SOURCE=..\src\context.c +# End Source File +# Begin Source File + +SOURCE=..\src\cyrillic.c +# End Source File +# Begin Source File + +SOURCE=..\src\fe.c +# End Source File +# Begin Source File + +SOURCE=..\src\getopt.c +# End Source File +# Begin Source File + +SOURCE=..\src\getopt1.c +# End Source File +# Begin Source File + +SOURCE=..\src\lingtbl.c +# End Source File +# Begin Source File + +SOURCE=..\src\reader.c +# End Source File +# Begin Source File + +SOURCE=..\src\tables.c +# End Source File +# Begin Source File + +SOURCE=..\src\ui.c +# End Source File +# Begin Source File + +SOURCE=..\src\util.c +# End Source File +# Begin Source File + +SOURCE=..\src\wrappers.c +# End Source File +# End Group +# Begin Group "Header Files" + +# PROP Default_Filter "h;hpp;hxx;hm;inl" +# Begin Source File + +SOURCE=..\src\avl.h +# End Source File +# Begin Source File + +SOURCE=..\src\config.h +# End Source File +# Begin Source File + +SOURCE=..\src\context.h +# End Source File +# Begin Source File + +SOURCE=..\src\cyrillic.h +# End Source File +# Begin Source File + +SOURCE=..\src\fe.h +# End Source File +# Begin Source File + +SOURCE=..\src\getopt.h +# End Source File +# Begin Source File + +SOURCE=..\src\reader.h +# End Source File +# Begin Source File + +SOURCE=..\src\ui.h +# End Source File +# Begin Source File + +SOURCE=..\src\util.h +# End Source File +# Begin Source File + +SOURCE=..\src\wrappers.h +# End Source File +# End Group +# Begin Group "Resource Files" + +# PROP Default_Filter "ico;cur;bmp;dlg;rc2;rct;bin;rgs;gif;jpg;jpeg;jpe" +# End Group +# End Target +# End Project diff --git a/win32/fe.dsw b/win32/fe.dsw new file mode 100644 index 0000000..0a982b3 --- /dev/null +++ b/win32/fe.dsw @@ -0,0 +1,77 @@ +Microsoft Developer Studio Workspace File, Format Version 6.00 +# WARNING: DO NOT EDIT OR DELETE THIS WORKSPACE 
FILE! + +############################################################################### + +Project: "ce"=.\ce.dsp - Package Owner=<4> + +Package=<5> +{{{ +}}} + +Package=<4> +{{{ + Begin Project Dependency + Project_Dep_Name config + End Project Dependency +}}} + +############################################################################### + +Project: "config"=.\config.dsp - Package Owner=<4> + +Package=<5> +{{{ +}}} + +Package=<4> +{{{ +}}} + +############################################################################### + +Project: "fe"=.\fe.dsp - Package Owner=<4> + +Package=<5> +{{{ +}}} + +Package=<4> +{{{ + Begin Project Dependency + Project_Dep_Name config + End Project Dependency + Begin Project Dependency + Project_Dep_Name recode + End Project Dependency +}}} + +############################################################################### + +Project: "recode"=.\recode.dsp - Package Owner=<4> + +Package=<5> +{{{ +}}} + +Package=<4> +{{{ + Begin Project Dependency + Project_Dep_Name ce + End Project Dependency +}}} + +############################################################################### + +Global: + +Package=<5> +{{{ +}}} + +Package=<3> +{{{ +}}} + +############################################################################### + diff --git a/win32/readme b/win32/readme new file mode 100644 index 0000000..2f23f90 --- /dev/null +++ b/win32/readme @@ -0,0 +1,74 @@ +This file describes how to build Fresh Eye on a Win32 platform. + +At the time of this writing, we support two ways of building Fresh Eye +on a Win32 platform: +- Using Cygwin. +- Using Microsoft Developer Studio. + +0. Building Fresh Eye using Cygwin + +Prerequisites: + The latest distribution of Cygwin, which is freely available at + the following address: http://www.cygwin.com/ + +The build process on Cygwin does not differ much from that on any other +UNIX-like platform, as described in the INSTALL file.
A typical installation sequence +would be: + ./configure --enable-encoding=cp866 + make + make install + +However, there are some Cygwin-specific details. Executables built using +Cygwin depend on cygwin1.dll, the Cygwin POSIX layer. Besides adding a +dependency that complicates distribution (one must supply cygwin1.dll +with every Cygwin-made executable), the POSIX layer makes binaries +run considerably slower, which is not acceptable for Fresh Eye. In +addition, executables produced using Cygwin are covered by the GPL. +While this has no effect on Fresh Eye, which is itself distributed +under the GPL, it matters in a proprietary environment. + +Fresh Eye avoids the dependency on the Cygwin POSIX layer by using the +GCC option `-mno-cygwin', which disables the POSIX layer and creates object +files that depend on the Microsoft C Runtime Library shipped with Windows. +To enable this option, pass `--without-cygwin' to `configure': + + ./configure --without-cygwin --enable-encoding=cp866 + +and then proceed as usual. Since Cygwin does not detect the debug version of +the Microsoft C Runtime Library, configuring with `--without-cygwin' +automatically disables debugging and profiling, even if they were +requested using `--enable-debug' and `--enable-profile'. + +1. Building Fresh Eye using Microsoft Developer Studio + +Prerequisites: + Microsoft Developer Studio 6.0 with Service Pack 5 applied. + +(This does not mean that you must apply SP5 to Developer Studio; +it is simply the version we used to build and test Fresh Eye on Win32.) + +If you want Fresh Eye to use an encoding other than CP-866 by default, you +will need to modify src\config-win32.h: change the value of CYR_CP_DEFAULT +from CYR_CP_866 to a value listed in the first enum in +src\cyrillic.h. Also, do not forget to change the value of CYR_CP_NAME. +Note that this also changes the encoding of the messages issued by +Fresh Eye at run time, as well as the encoding of its log file.
+ +To build Fresh Eye, please complete the following steps: + + 0. Open the workspace file fe.dsw in Developer Studio. + 1. Change the active configuration to "fe - Win32 Debug" for + building with debugging information or "fe - Win32 Release" + for building without debugging information + (Build -> Set Active Configuration). + 2. Run a build (Build -> Build). + +The executable fe.exe will appear in win32\Debug for builds with +debugging information or in win32\Release for builds without debugging +information. + +If you think that you discovered a bug, please report it to +Vadim Penzin . Please include the words +'Fresh Eye' in the Subject: line. + +$Id: README,v 1.4 2002/06/29 06:49:25 vadimp Exp $ diff --git a/win32/recode.dsp b/win32/recode.dsp new file mode 100644 index 0000000..92bb070 --- /dev/null +++ b/win32/recode.dsp @@ -0,0 +1,166 @@ +# Microsoft Developer Studio Project File - Name="recode" - Package Owner=<4> +# Microsoft Developer Studio Generated Build File, Format Version 6.00 +# ** DO NOT EDIT ** + +# TARGTYPE "Win32 (x86) Generic Project" 0x010a + +CFG=recode - Win32 Debug +!MESSAGE This is not a valid makefile. To build this project using NMAKE, +!MESSAGE use the Export Makefile command and run +!MESSAGE +!MESSAGE NMAKE /f "recode.mak". +!MESSAGE +!MESSAGE You can specify a configuration when running NMAKE +!MESSAGE by defining the macro CFG on the command line.
For example: +!MESSAGE +!MESSAGE NMAKE /f "recode.mak" CFG="recode - Win32 Debug" +!MESSAGE +!MESSAGE Possible choices for configuration are: +!MESSAGE +!MESSAGE "recode - Win32 Release" (based on "Win32 (x86) Generic Project") +!MESSAGE "recode - Win32 Debug" (based on "Win32 (x86) Generic Project") +!MESSAGE + +# Begin Project +# PROP AllowPerConfigDependencies 0 +# PROP Scc_ProjName "" +# PROP Scc_LocalPath "" +MTL=midl.exe + +!IF "$(CFG)" == "recode - Win32 Release" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 0 +# PROP BASE Output_Dir "Release" +# PROP BASE Intermediate_Dir "Release" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 0 +# PROP Output_Dir "Release" +# PROP Intermediate_Dir "Release" +# PROP Target_Dir "" + +!ELSEIF "$(CFG)" == "recode - Win32 Debug" + +# PROP BASE Use_MFC 0 +# PROP BASE Use_Debug_Libraries 1 +# PROP BASE Output_Dir "recode___Win32_Debug" +# PROP BASE Intermediate_Dir "recode___Win32_Debug" +# PROP BASE Target_Dir "" +# PROP Use_MFC 0 +# PROP Use_Debug_Libraries 1 +# PROP Output_Dir "Debug" +# PROP Intermediate_Dir "Debug" +# PROP Target_Dir "" + +!ENDIF + +# Begin Target + +# Name "recode - Win32 Release" +# Name "recode - Win32 Debug" +# Begin Group "Source files" + +# PROP Default_Filter ".koi8-r" +# Begin Source File + +SOURCE="..\src\fe.koi8-r" + +!IF "$(CFG)" == "recode - Win32 Release" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Release +InputPath=..\src\fe.koi8-r +InputName=fe + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > "$(InputDir)\$(InputName)".c + +# End Custom Build + +!ELSEIF "$(CFG)" == "recode - Win32 Debug" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Debug +InputPath=..\src\fe.koi8-r +InputName=fe + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > 
"$(InputDir)\$(InputName)".c + +# End Custom Build + +!ENDIF + +# End Source File +# Begin Source File + +SOURCE="..\src\lingtbl.koi8-r" + +!IF "$(CFG)" == "recode - Win32 Release" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Release +InputPath=..\src\lingtbl.koi8-r +InputName=lingtbl + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > "$(InputDir)\$(InputName)".c + +# End Custom Build + +!ELSEIF "$(CFG)" == "recode - Win32 Debug" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Debug +InputPath=..\src\lingtbl.koi8-r +InputName=lingtbl + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > "$(InputDir)\$(InputName)".c + +# End Custom Build + +!ENDIF + +# End Source File +# Begin Source File + +SOURCE="..\src\ui.koi8-r" + +!IF "$(CFG)" == "recode - Win32 Release" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Release +InputPath=..\src\ui.koi8-r +InputName=ui + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > "$(InputDir)\$(InputName)".c + +# End Custom Build + +!ELSEIF "$(CFG)" == "recode - Win32 Debug" + +# Begin Custom Build +InputDir=\cygwin\home\vadimp\src\fe\src +OutDir=.\Debug +InputPath=..\src\ui.koi8-r +InputName=ui + +"$(InputDir)\$(InputName).c" : $(SOURCE) "$(INTDIR)" "$(OUTDIR)" + "$(OutDir)\ce" -i koi8-r < "$(InputDir)\$(InputName).koi8-r" > "$(InputDir)\$(InputName)".c + +# End Custom Build + +!ENDIF + +# End Source File +# End Group +# End Target +# End Project