Merge branch 'release/alpha/master'

coderarjob · Mar 24, 2022 · a93f7c3
2 parents f4de79b + 1265df3
commit a93f7c3
Showing 22 changed files with 1,336 additions and 664 deletions.
diff --git a/Manifest.txt b/Manifest.txt
@@ -0,0 +1,3 @@
+Main-Class: coderarjob.kpdfsync.poc.Main
+Class-Path: pdfclown.jar
+
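For reference, the two manifest attributes above can be read back with the JDK's `java.util.jar.Manifest` class. The sketch below parses the same text from a string instead of a jar; the class name `ManifestPeek` and its helper are made up for illustration and are not part of kpdfsync.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only; not part of kpdfsync.
class ManifestPeek {
    // Same content as the Manifest.txt added in this commit.
    static final String MANIFEST_TEXT =
        "Main-Class: coderarjob.kpdfsync.poc.Main\n" +
        "Class-Path: pdfclown.jar\n" +
        "\n";

    // Parses manifest text and returns its main-section attributes.
    static Attributes mainAttributes(String text) {
        try {
            Manifest manifest = new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Attributes attrs = mainAttributes(MANIFEST_TEXT);
        System.out.println("Main-Class: " + attrs.getValue(Attributes.Name.MAIN_CLASS));
        System.out.println("Class-Path: " + attrs.getValue(Attributes.Name.CLASS_PATH));
    }
}
```

Relative `Class-Path` entries are resolved against the location of the jar that carries the manifest, so `pdfclown.jar` has to sit next to `kpdfsync.jar` at run time. That matches what `pack.sh` does when it places both files in `dist/`.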
diff --git a/README.md b/README.md
@@ -1,16 +1,15 @@
 ## About kpdfsync
 
-![Screenshot](/docs/images/screenshot.png)
+![Screenshot](/docs/images/screenshot_alpha.png)
 
-If you use Kindle to read PDF books or documents, you might have seen that the highlights or notes
-made on the Kindle are not saved on the PDF file itself. This means, if you take the file and
-read on another device, you will not see those highlights and notes make on the Kindle.
+If you use Kindle to read PDF books or documents, you might have seen that the highlights and notes
+made on the Kindle are not saved on the PDF file itself. This means that if you take the PDF file
+from your Kindle and read it on another device, you will not see those highlights and notes there.
 
-This software tries to provide a solution. The basis on which this solution stands is the
-Clippings.txt file on your Kindle.
-
-This is the file, where Kindle saves the page numbers of all the highlights and notes done on the
-Kindle device.
+This software tries to provide a solution. The basis is the Clippings.txt file on your Kindle.
+Kindle saves the page numbers and content of the highlights and notes in the text file. So in
+theory, one can read the Clippings file and reapply the highlights and notes on the PDF separately.
+This software automates the process.
 
 Currently it is in development, so not all the features work or are even present. Here is the rough
 roadmap.
@@ -20,11 +19,14 @@ roadmap.
 - [X] Parsing the Clippings.txt file
 - [X] Search for the highlighted text in a page of the PDF file.
 - [X] Annotate highlight and notes on the PDF file.
-- [X] Graphical User Interface testing.
-- [X] Comments to notes mapping. This is required, because the clippings text file does not provide
-  information which can used to determine which comments are related to which note on a single
-  page.
-- [ ] Debug loggings
+- [X] Graphical User Interface (GUI) testing.
+- [X] Highlights to notes mapping. This is required because the clippings text file does not
+  provide information which can be used to determine which notes are related to which highlight on
+  a single page. Where a page contains a single note and a single highlight, the pair is created
+  automatically; where there is more than one note, these associations can be created manually
+  by the user.
+- [X] GUI finalizing for the Alpha release.
+- [X] Debug logging
 - [ ] **Alpha Release**
 
 ----
@@ -34,6 +36,10 @@ roadmap.
 - [ ] Finalizing and optimizing the Graphical User Interface.
 - [ ] **Beta Release**
 
+## Requirements
+- JRE 1.8 or higher
+- Linux, Mac, Windows
+
 ## 3rd-party License
 
 * PDF Clown library is used to read and highlight on PDF files. PDF Clown library is covered under

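The automatic pairing rule mentioned in the roadmap (a page with a single highlight and a single note gets paired without user input) could look roughly like the sketch below. The types `Clip` and `Kind` and the method `autoPair` are hypothetical, and the "exactly one of each per page" condition is an assumption drawn from the README wording, not from the project's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; kpdfsync's real classes may differ.
class AutoPairSketch {
    enum Kind { HIGHLIGHT, NOTE }

    static final class Clip {
        final Kind kind;
        final int page;
        final String text;

        Clip(Kind kind, int page, String text) {
            this.kind = kind;
            this.page = page;
            this.text = text;
        }
    }

    // Returns page -> {highlight text, note text} for every page that has
    // exactly one highlight and exactly one note. Other pages are left
    // unpaired so that the user can associate them manually.
    static Map<Integer, String[]> autoPair(List<Clip> clips) {
        Map<Integer, List<Clip>> highlights = new LinkedHashMap<>();
        Map<Integer, List<Clip>> notes = new LinkedHashMap<>();
        for (Clip c : clips) {
            Map<Integer, List<Clip>> target =
                (c.kind == Kind.HIGHLIGHT) ? highlights : notes;
            target.computeIfAbsent(c.page, p -> new ArrayList<>()).add(c);
        }

        Map<Integer, String[]> pairs = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<Clip>> e : highlights.entrySet()) {
            List<Clip> pageNotes = notes.get(e.getKey());
            if (e.getValue().size() == 1 && pageNotes != null && pageNotes.size() == 1) {
                pairs.put(e.getKey(),
                          new String[] { e.getValue().get(0).text, pageNotes.get(0).text });
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Clip> clips = new ArrayList<>();
        clips.add(new Clip(Kind.HIGHLIGHT, 3, "highlight on page 3"));
        clips.add(new Clip(Kind.NOTE, 3, "note on page 3"));
        clips.add(new Clip(Kind.HIGHLIGHT, 7, "highlight on page 7"));
        clips.add(new Clip(Kind.NOTE, 7, "first note on page 7"));
        clips.add(new Clip(Kind.NOTE, 7, "second note on page 7"));

        for (Map.Entry<Integer, String[]> e : autoPair(clips).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()[0] + " / " + e.getValue()[1]);
        }
    }
}
```

In the sample data only page 3 qualifies; page 7 has two notes, so it is left for manual association.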
diff --git a/TODO b/TODO
@@ -1,36 +1,46 @@
 Kpdfsync                                                                               THINGS-TO-DO
 ---------------------------------------------------------------------------------------------------
 
+# Alpha Release
 # TASKS                                                                 Estimated       Actual
-[X] String comparison algorithm, that can analyze the degree of match. 
+[X] String comparison algorithm, that can analyze the degree of match.
     So that minor differences between the pattern and the read text 
     from pdf files are handled.
-
 [X] Use PDFClown library to highlight the text which matches the most
     with the highlighted text from My Clippings file.
-
 [X] Parse the 'My Clippings.txt' file.
-
 [-] Gui POC
+[X] Manual and automatic creation of associations between highlights
+    and notes.
+[ ] Use grid layout for displaying and creating page mappings.
+    (Not done, in favor of below)
+[X] Use custom renderer in list box to show highlight note mappings.
+[X] A separate dialog window for selection of notes for a highlight.
+[X] Logging
 
-[ ] Finalize GUI
-
+# Beta Release
+# TASKS                                                                 Estimated       Actual
 [ ] Optimization and cleanup objects.
-
+[ ] Lib - Use Iterator instead of Enumeration. (Not sure)
+[ ] GUI - Status bar showing last error or success message.
+[ ] Lib - parseLine function can be protected. It is public now.
+[ ] Lib - matching BOM bytes can be put inside a method in the
+          ByteOrderMarkTypes enum. It is now separate, in the
+          ByteOrderMark file.
 # BUGS:
-[ ] The string matching algo is too simple, and gives wrong match 
-    percentage, if the strings being compared differ in the number 
-    of non-whitespace characters. The two indexes get out of sync 
-    at the first mismatch and never recover. 
+[ ] The string matching algo is too simple, and gives wrong match
+    percentage, if the strings being compared differ in the number
+    of non-whitespace characters. The two indexes get out of sync
+    at the first mismatch and never recover.
     Example:
     PDF text      = 123 56 789
     Clipping text = 123 456 789
     % match       = 3/8          (Wrong)
     % match       = 7/8          (What is expected)
 
-[ ] Related to the above bug, we are highlighting more characters - 
-    by that many characters as the diffence in the number of 
-    characters, between the text read from the PDF and the pattern 
+[ ] Related to the above bug, we are highlighting more characters -
+    by that many characters as the difference in the number of
+    characters, between the text read from the PDF and the pattern
     read from the clippings file.
     The algorithm matches character by character, the pattern and the
     text from the pdf. The matching and thus the highlighting is as 
@@ -46,7 +56,37 @@ Kpdfsync
                                   highlighting)
 
 [ ] For some PDF files, org.pdfclown.tools.TextExtractor.extract() is returning null.
-    This is seen with the Concrete Mathematics original PDF file.
+    This is seen with the Concrete Mathematics original PDF file. May be a TrueType font issue.
+    Here is the stack trace:
+    java.lang.NullPointerException
+            at java.base/java.util.Hashtable.put(Hashtable.java:476)
+            at org.pdfclown.documents.contents.fonts.PfbParser.parse(PfbParser.java:99)
+            at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:96)
+            at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:141)
+            at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
+            at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
+            at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
+            at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
+            at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
+            at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
+            at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
+            at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
+            at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
+            at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
+            at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
+            at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
+            at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
+            at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
+            at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:817)
+            at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
+            at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
+            at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
+            at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
+            at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
+            at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
+            at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
+            at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
+            at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
 
 [ ] Highlight is not visible on the output PDF file. This was seen on the Concrete Mathematics
     cropped PDF file.
@@ -62,3 +102,74 @@ Kpdfsync
     5. Begin highlighting.
 
     When this exception occurs, it is around the 73% mark.
+
+[ ] EOFException at org.pdfclown.tools.TextExtractor.extract() method. This is seen on
+    'the_evolution_of_operating_system_cropped.pdf' file. Could also be a font issue.
+    Here is the stack trace
+java.lang.RuntimeException: java.io.EOFException
+        at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:703)
+        at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
+        at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
+        at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
+        at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
+        at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
+        at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
+        at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
+        at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
+        at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
+        at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
+        at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
+        at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
+        at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
+        at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
+        at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
+        at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
+        at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
+        at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
+        at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
+        at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
+        at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
+        at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
+        at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
+        at java.base/java.lang.Thread.run(Thread.java:833)
+Caused by: java.io.EOFException
+        at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
+        at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
+        at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
+        at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
+        ... 27 more
+:: Cause #1
+java.io.EOFException
+        at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
+        at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
+        at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
+        at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
+        at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
+        at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
+        at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
+        at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
+        at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
+        at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
+        at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
+        at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
+        at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
+        at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
+        at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
+        at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
+        at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
+        at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
+        at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
+        at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
+        at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
+        at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
+        at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
+        at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
+        at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
+        at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
+        at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
+        at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
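On the string matching bug listed in the TODO above: one way to keep the two indexes from drifting apart after the first mismatch is to score the match with a longest common subsequence (LCS) over the non-whitespace characters. This is a different technique from the index-by-index comparison the TODO describes, and the sketch below is not the matcher kpdfsync ships; the class and method names are hypothetical.

```java
// Hypothetical sketch; not the pattern matcher used by kpdfsync.
class LcsMatchSketch {
    // Removes all whitespace so that only content characters are compared.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Classic dynamic-programming LCS length, O(a.length * b.length).
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Fraction of the pattern's non-whitespace characters that also appear,
    // in the same order, in the text read from the PDF.
    static double matchRatio(String pdfText, String pattern) {
        String text = stripWhitespace(pdfText);
        String pat = stripWhitespace(pattern);
        if (pat.isEmpty()) {
            return 0.0;
        }
        return (double) lcsLength(text, pat) / pat.length();
    }

    public static void main(String[] args) {
        // The example from the TODO: one character is missing in the PDF text.
        System.out.println(matchRatio("123 56 789", "123 456 789"));
    }
}
```

For the TODO's example, the LCS of `12356789` and `123456789` has length 8, so this sketch reports 8 of the 9 pattern characters as matched. That is not the same figure as the 7/8 the TODO expects, because the two count numerator and denominator differently, but in both cases a single missing character no longer collapses the score.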
diff --git a/build.sh b/build.sh
@@ -10,23 +10,34 @@ find src -name "*.java" -exec sed -i s/\ \*$//g {} \; || exit
 
 export CLASSPATH="lib/pdfclown.jar:$BIN_DIR"
 
+JDK_VER_TARGET=8
+
 # Build AJL
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/ajl/file/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/ajl/file/*.java || exit
 
 # Build Pattern Matcher
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/pm/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/kpdfsync/lib/pm/*.java || exit
 
 # Build Annotator
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/annotator/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/kpdfsync/lib/annotator/*.java || exit
 
 # Build Kindle Clippings File Parser
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/clipparser/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/kpdfsync/lib/clipparser/*.java || exit
 
 # Build kpdfsync library
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/kpdfsync/lib/*.java || exit
 
 # Build POC
-javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/poc/*.java || exit
+javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
+      src/coderarjob/kpdfsync/poc/*.java || exit
+
+# Copy resources
+cp -r src/coderarjob/kpdfsync/poc/res "$BIN_DIR/coderarjob/kpdfsync/poc" || exit
 
 # Generate tags file
 ctags --recurse ./src || exit

diff --git a/docs/images/screenshot_alpha.png b/docs/images/screenshot_alpha.png
diff --git a/pack.sh b/pack.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+echo :: Building packages
+
+rm -rf dist
+mkdir dist
+
+pushd build/classes
+
+# Create kpdfsync.jar
+# BasicParser is what that can change. So it is packaged separately.
+jar cfm kpdfsync.jar ../../Manifest.txt \
+                     coderarjob 
+popd
+
+mv build/classes/kpdfsync.jar ./dist/
+cp lib/pdfclown.jar ./dist/
+
+
+echo :: Building packages completed
+
diff --git a/src/coderarjob/kpdfsync/poc/HighlightNotePairListRenderer.java b/src/coderarjob/kpdfsync/poc/HighlightNotePairListRenderer.java
@@ -0,0 +1,51 @@
+package coderarjob.kpdfsync.poc;
+
+import javax.swing.*;
+import java.awt.Component;
+import java.awt.BorderLayout;
+import java.awt.Color;
+
+public class HighlightNotePairListRenderer extends JPanel implements ListCellRenderer<HighlightNotePair>
+{
+
+  private JLabel highlightLabel;
+  private JLabel pairedNoteLabel;
+  private Color alternateColor;
+
+  public HighlightNotePairListRenderer()
+  {
+    this.setLayout (new BorderLayout());
+
+    highlightLabel = new JLabel();
+    String iconResourceName = "/coderarjob/kpdfsync/poc/res/highlighter.png";
+    highlightLabel.setIcon (new ImageIcon (getClass().getResource (iconResourceName)));
+    highlightLabel.setForeground (Color.BLACK);
+    highlightLabel.setOpaque (false);
+    this.add (highlightLabel, BorderLayout.PAGE_START);
+
+    pairedNoteLabel = new JLabel();
+    pairedNoteLabel.setForeground (Color.DARK_GRAY);
+    pairedNoteLabel.setOpaque (false);
+    this.add (pairedNoteLabel, BorderLayout.PAGE_END);
+
+    alternateColor = new Color (237, 244, 249);
+    this.setBorder (BorderFactory.createEmptyBorder (5, 2, 5, 0));
+  }
+
+  public Component getListCellRendererComponent(JList<? extends HighlightNotePair> list,
+                                                HighlightNotePair value, int index,
+                                                boolean isSelected, boolean cellHasFocus)
+  {
+    highlightLabel.setText (value.getHighlightText());
+    pairedNoteLabel.setText (value.getNoteText());
+
+    Color normalBackgroundColor = (index % 2 == 0) ? alternateColor : list.getBackground();
+
+    if (isSelected)
+      this.setBackground (list.getSelectionBackground());
+    else
+      this.setBackground (normalBackgroundColor);
+
+    return this;
+  }
+}
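The background choice made in `getListCellRendererComponent` above can be isolated into a pure function, which makes the alternating-row rule testable without constructing a `JList`. The names `RowColorSketch` and `rowBackground` are hypothetical; the rule itself (selection wins, otherwise even rows get the alternate color) is taken from the renderer in this diff.

```java
import java.awt.Color;

// Hypothetical helper for illustration; the renderer in this commit
// performs the same decision inline.
class RowColorSketch {
    // Selected rows use the selection color. Otherwise even-indexed rows
    // use the alternate color and odd-indexed rows the list's own background.
    static Color rowBackground(int index, boolean isSelected,
                               Color alternate, Color normal, Color selection) {
        if (isSelected) {
            return selection;
        }
        return (index % 2 == 0) ? alternate : normal;
    }

    public static void main(String[] args) {
        Color alternate = new Color(237, 244, 249); // same value as alternateColor above
        Color normal = Color.WHITE;                 // stand-in for list.getBackground()
        Color selection = Color.BLUE;               // stand-in for list.getSelectionBackground()

        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + rowBackground(i, i == 2, alternate, normal, selection));
        }
    }
}
```

The background set on the panel is only visible because `JPanel` is opaque by default while both labels are explicitly made non-opaque, so the row color shows through behind the text.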