Testing text2tif on Windows with Cygwin #2

Open
amitdo opened this issue Mar 20, 2016 · 18 comments

@amitdo
Owner

amitdo commented Mar 20, 2016

Someone needs to test it...

@amitdo
Owner Author

amitdo commented Mar 21, 2016

Needed to build Leptonica and also download the development packages for Pango, Cairo, etc., with C++ bindings.

The dependencies are the same as Tesseract's training tools (Tesseract itself is not needed).

When you post output, please select the output block with the mouse or keyboard and then press the 'insert code' button above the comment's text editing area.

@Shreeshrii
Contributor

Thanks Amit for the tip regarding 'insert code'.

There is one error.

In file included from /usr/include/stdlib.h:11:0,
                 from ./training/pango_font_info.cpp:30:
/usr/include/string.h:76:7: error: conflicting declaration of ‘char* strcasestr(const char*, const char*)’ with ‘C’ linkage
 char *_EXFUN(strcasestr,(const char *, const char *));
       ^

@Shreeshrii
Contributor

Compiled ok.

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif
Text file missing!
!FLAGS_text.empty():Error:Assert failed:in file ./training/text2image.cpp, line 427
Segmentation fault (core dumped)

@amitdo
Owner Author

amitdo commented Mar 22, 2016

I pushed a new commit, please check that it did not break anything.

@Shreeshrii
Contributor

compiled ok.

Please see Issue #5

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --list_available_fonts

(process:8744): Pango-CRITICAL **: pango_font_description_set_size: assertion 'size >= 0' failed
  0: 8514fix

(process:8744): Pango-CRITICAL **: pango_font_description_set_size: assertion 'size >= 0' failed
  1: 8514fix Bold

It does list the fonts, but the Pango messages are interleaved with the font entries.
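
A minimal sketch of one way to get a clean list out of that interleaved output. It assumes the font entries keep the `  N: name` shape shown above; `clean_font_list` is a hypothetical helper name, not part of text2tif.

```shell
# Keep only the numbered font entries ("  0: 8514fix") and print the bare
# font name. Everything else, including the Pango-CRITICAL lines and the
# blank lines between entries, is dropped.
# clean_font_list is a hypothetical helper, not something text2tif provides.
clean_font_list() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*:[[:space:]]*//p'
}

# Usage (assumes text2tif is in the current directory; 2>&1 is there so the
# filter works whether the Pango messages go to stdout or stderr):
#   ./text2tif --fonts_dir= --list_available_fonts 2>&1 | clean_font_list
```

This only hides the messages in the saved list; it does not address whatever makes Pango emit them.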

@amitdo
Owner Author

amitdo commented Mar 22, 2016

  1. Did these messages appear with the previous commit?
  2. Do these messages appear with Tesseract?

@Shreeshrii
Contributor

  1. I had not tested this with the previous commit. If you let me know the commands to roll back and compile again, I can test that.
  2. Tested with text2image (Tesseract) just now. Yes, these messages appear there too. Both text2image and text2tif show the same number of fonts.

I think I had also installed the Pango debug info on Cygwin; possibly that is producing the extra messages.

@Shreeshrii
Contributor

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --list_available_fonts
FcInitiReinitialize failed!!
Segmentation fault (core dumped)

@Shreeshrii
Contributor

$ ./text2tif --list_available_fonts --fonts_dir=

fonts-list.txt

@amitdo
Owner Author

amitdo commented Mar 22, 2016

Because these messages also appear when you run Tesseract, retesting the previous commit is not needed.

@Shreeshrii
Contributor

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/ara/ara.training_text --font FreeSerif --outputbase ara.FreeSerif.exp0
Could not find font named FreeSerif. Pango suggested font DejaVu Serif
Please correct --font arg.:Error:Assert failed:in file ./training/text2image.cpp, line 437
Segmentation fault (core dumped)
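
Since an unknown `--font` value ends in an assert and a core dump, it may help to check the name against the font list first. The sketch below assumes the `  N: name` output format of `--list_available_fonts` seen earlier in this thread; `font_listed` is a hypothetical helper name.

```shell
# font_listed NAME: read --list_available_fonts output on stdin and succeed
# only if NAME matches a listed font exactly (whole line, fixed string).
# font_listed is a hypothetical helper, not part of text2tif.
font_listed() {
    sed -n 's/^[[:space:]]*[0-9][0-9]*:[[:space:]]*//p' | grep -Fxq -- "$1"
}

# Usage (assumes text2tif is in the current directory):
#   if ./text2tif --fonts_dir= --list_available_fonts 2>/dev/null | font_listed "FreeSerif"; then
#       echo "font found"
#   else
#       echo "font not listed, pick another --font" >&2
#   fi
```

The match is exact on purpose, so `Arial` does not pass just because `Arial Bold` is listed.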

@Shreeshrii
Contributor

It works if all the options are given correctly.

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --font Kokila  --outputbase san.Kokila.exp0
Rendered page 0 to file san.Kokila.exp0.tif
Rendered page 1 to file san.Kokila.exp0.tif
Rendered page 2 to file san.Kokila.exp0.tif
Rendered page 3 to file san.Kokila.exp0.tif
Rendered page 4 to file san.Kokila.exp0.tif
Rendered page 5 to file san.Kokila.exp0.tif
Rendered page 6 to file san.Kokila.exp0.tif
Rendered page 7 to file san.Kokila.exp0.tif
Rendered page 8 to file san.Kokila.exp0.tif
Rendered page 9 to file san.Kokila.exp0.tif
Rtl = 0 ,vertical=0

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/eng/eng.training_text --font Arial --outputbase eng.Arial.exp0
Rendered page 0 to file eng.Arial.exp0.tif
Rendered page 1 to file eng.Arial.exp0.tif
Rtl = 0 ,vertical=0

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/ara/ara.training_text --font Arial  --outputbase ara.Arial.exp0
Rendered page 0 to file ara.Arial.exp0.tif
Rendered page 1 to file ara.Arial.exp0.tif
Rtl = 1 ,vertical=0

@Shreeshrii
Contributor

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fontconfig_refresh_cache
Text file missing!
!FLAGS_text.empty():Error:Assert failed:in file ./training/text2image.cpp, line 427
Segmentation fault (core dumped)

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --fontconfig_refresh_cache
Output file missing!
!FLAGS_outputbase.empty():Error:Assert failed:in file ./training/text2image.cpp, line 428
Segmentation fault (core dumped)
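
Both runs above fail on an assert because `--text` or `--outputbase` is missing. As a rough sketch, a wrapper script could check for the two flags before calling text2tif and print a plain message instead. `has_flag` and `check_render_args` are hypothetical helper names, and the check only looks for the flag's presence, not for a non-empty value.

```shell
# has_flag NAME ARGS...: succeed if --NAME or --NAME=value appears in ARGS.
has_flag() {
    name=$1
    shift
    for arg in "$@"; do
        case $arg in
            "--$name" | "--$name="*) return 0 ;;
        esac
    done
    return 1
}

# check_render_args ARGS...: report every required flag that is missing and
# return non-zero if any is. Only --text and --outputbase are checked, since
# those are the two asserts shown in the output above.
check_render_args() {
    rc=0
    has_flag text "$@" || { echo "missing --text" >&2; rc=1; }
    has_flag outputbase "$@" || { echo "missing --outputbase" >&2; rc=1; }
    return "$rc"
}

# Usage (assumes text2tif is in the current directory):
#   check_render_args "$@" && ./text2tif "$@"
```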

@Shreeshrii
Contributor

Please see Issue #6

@Shreeshrii
Contributor

ra@Shree ~/tesseract-ocr/text2tif
$ ./text2tif --fonts_dir= --text ../langdata/san/san.training_text --fontconfig_refresh_cache --outputbase san.Kokila.exp0
Stripped 2226 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 0 to file san.Kokila.exp0.tif
Stripped 2148 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 1 to file san.Kokila.exp0.tif
Stripped 2173 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 2 to file san.Kokila.exp0.tif
Stripped 1844 unrenderable words
Rendered page 3 to file san.Kokila.exp0.tif
Stripped 2603 unrenderable words
Rendered page 4 to file san.Kokila.exp0.tif
Stripped 1760 unrenderable words
Error in boxaGetExtent: boxa not defined
Error in boxaAddBox: box not defined
Rendered page 5 to file san.Kokila.exp0.tif
Rtl = 0 ,vertical=0

If no font is specified, the default font, Arial, is used. If Arial does not cover the script, the tif file will be blank.

@Shreeshrii
Contributor

To find the available fonts for a particular script or language, e.g. ta for Tamil:

$ fc-list :lang=ta -f "%{file}\n%{family}\n%{style}\n\n"

/usr/share/fonts/win-fonts/Nirmala.ttf
Nirmala UI
Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta

/usr/share/fonts/unifont/unifont.ttf
Unifont
Medium

/usr/share/fonts/win-fonts/NirmalaS.ttf
Nirmala UI,Nirmala UI Semilight
Semilight,Normal,obyčejné,Standard,Κανονικά,Regular,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta

/usr/share/fonts/win-fonts/NirmalaB.ttf
Nirmala UI
Bold,Negreta,tučné,fed,Fett,Έντονα,Negrita,Lihavoitu,Gras,Félkövér,Grassetto,Vet,Halvfet,Pogrubiony,Negrito,Полужирный,Fet,Kalın,Krepko,Lodia

/usr/share/fonts/lohit-tamil/Lohit-Tamil.ttf
Lohit Tamil
Regular

/usr/share/fonts/lohit-tamil-classical/Lohit-Tamil-Classical.ttf
Lohit Tamil Classical
Regular

@amitdo
Owner Author

amitdo commented Mar 23, 2016

fc-list :lang=en -f "%{family[0]} %{style[0]}\n" | sort -u > en-fonts-list

@Shreeshrii
Contributor

We cannot use ALL the fonts listed for a particular language, as some of them may not render it correctly, especially for Devanagari and similar scripts.

However, such a list can be useful for fixing the language-specific.sh file so that it lists only available fonts.

Sample of incorrect rendering for devanagari:

[Image: san exp-1 unifont_medium]
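
One possible way to narrow a per-language font list down to what is actually installed is a whole-line intersection of two plain text files. This is a sketch only: the file names and the `usable_fonts` helper are hypothetical, and both files are assumed to hold one font name per line in the same spelling.

```shell
# usable_fonts WANTED AVAILABLE: print the lines of WANTED that also appear,
# as whole lines, in AVAILABLE. Output keeps the order of WANTED.
# usable_fonts is a hypothetical helper; both arguments are text files with
# one font name per line.
usable_fonts() {
    grep -Fxf "$2" -- "$1"
}

# Usage (file names are placeholders):
#   usable_fonts wanted-fonts.txt available-fonts.txt > usable-fonts.txt
```

This only answers "is the font installed". Fonts that are installed but render the script incorrectly still have to be removed from the wanted list by hand.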

amitdo added the Cygwin label Mar 31, 2016