Merge branch 'release/alpha/master'
coderarjob committed Mar 24, 2022
2 parents f4de79b + 1265df3 commit a93f7c3
Showing 22 changed files with 1,336 additions and 664 deletions.
3 changes: 3 additions & 0 deletions Manifest.txt
@@ -0,0 +1,3 @@
Main-Class: coderarjob.kpdfsync.poc.Main
Class-Path: pdfclown.jar

34 changes: 20 additions & 14 deletions README.md
@@ -1,16 +1,15 @@
## About kpdfsync

![Screenshot](/docs/images/screenshot.png)
![Screenshot](/docs/images/screenshot_alpha.png)

If you use Kindle to read PDF books or documents, you might have seen that the highlights or notes
made on the Kindle are not saved on the PDF file itself. This means, if you take the file and
read it on another device, you will not see those highlights and notes made on the Kindle.
If you use Kindle to read PDF books or documents, you might have seen that the highlights and notes
made on the Kindle are not saved in the PDF file itself. This means that if you take the PDF file
from your Kindle and read it on another device, you will not see those highlights and notes there.

This software tries to provide a solution. The basis on which this solution stands is the
Clippings.txt file on your Kindle.

This is the file where Kindle saves the page numbers of all the highlights and notes made on the
Kindle device.
This software tries to provide a solution. The basis is the Clippings.txt file on your Kindle.
Kindle saves the page numbers and content of the highlights and notes in this text file. So in
theory, one can read the Clippings file and reapply the highlights and notes to the PDF separately.
This software automates the process.

It is currently in development, so not all features work or are even present. Here is the rough
roadmap.
@@ -20,11 +19,14 @@ roadmap.
- [X] Parsing the Clippings.txt file
- [X] Search for the highlighted text in a page of the PDF file.
- [X] Annotate highlight and notes on the PDF file.
- [X] Graphical User Interface testing.
- [X] Comments to notes mapping. This is required because the clippings text file does not provide
information which can be used to determine which comments are related to which note on a single
page.
- [ ] Debug logging
- [X] Graphical User Interface (GUI) testing.
- [X] Highlights to notes mapping. This is required because the clippings text file does not
provide information which can be used to determine which notes are related to which highlight on a
single page. When a page contains a single note and a single highlight, the pair is created
automatically; when there is more than one note, the associations can be created
manually by the user.
- [X] GUI finalizing for the Alpha release.
- [X] Debug logging
- [ ] **Alpha Release**
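
The highlights to notes mapping item above can be illustrated with a minimal sketch. It assumes the simplest reading of the rule: a page with exactly one highlight and exactly one note is paired automatically, and every other case is left unpaired for the user to resolve. The names `PairingSketch`, `Pair` and `pairForPage` are hypothetical and do not come from the kpdfsync sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the kpdfsync sources.
final class PairingSketch {

    /** One highlight with an optional note attached to it. */
    static final class Pair {
        final String highlight;
        final Optional<String> note;

        Pair(String highlight, Optional<String> note) {
            this.highlight = highlight;
            this.note = note;
        }
    }

    /**
     * Pairs the highlights and notes found on a single page.
     * Only the unambiguous case (exactly one highlight and one note) is
     * paired automatically. In every other case each highlight is returned
     * without a note, so the association can be made manually later.
     */
    static List<Pair> pairForPage(List<String> highlights, List<String> notes) {
        boolean unambiguous = highlights.size() == 1 && notes.size() == 1;
        List<Pair> pairs = new ArrayList<>();
        for (String highlight : highlights) {
            Optional<String> note =
                    unambiguous ? Optional.of(notes.get(0)) : Optional.<String>empty();
            pairs.add(new Pair(highlight, note));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Pair> auto = pairForPage(Arrays.asList("one highlight"), Arrays.asList("one note"));
        System.out.println(auto.get(0).note.isPresent());

        List<Pair> manual = pairForPage(Arrays.asList("h1", "h2"), Arrays.asList("n1", "n2"));
        System.out.println(manual.get(0).note.isPresent());
    }
}
```

Returning an empty `Optional` rather than guessing keeps the ambiguous pages visible to the user, which matches the manual association step described in the roadmap.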

----
@@ -34,6 +36,10 @@ roadmap.
- [ ] Finalizing and optimizing the Graphical User Interface.
- [ ] **Beta Release**

## Requirements
- JRE 1.8 or higher
- Linux, Mac, Windows

## 3rd-party License

* PDF Clown library is used to read and highlight on PDF files. PDF Clown library is covered under
141 changes: 126 additions & 15 deletions TODO
@@ -1,36 +1,46 @@
Kpdfsync THINGS-TO-DO
---------------------------------------------------------------------------------------------------

# Alpha Release
# TASKS Estimated Actual
[X] String comparison algorithm, that can analyze the degree of match.
[X] String comparison algorithm, that can analyze the degree of match.
So that minor differences between the pattern and the read text
from pdf files are handled.

[X] Use PDFClown library to highlight the text which matches the most
with the highlighted text from My Clippings file.

[X] Parse the 'My Clippings.txt' file.

[-] Gui POC
[X] Manual and automatic creation of associations between highlights
and notes.
[ ] Use grid layout for displaying and creating page mappings.
(Not done, in favor of below)
[X] Use custom renderer in list box to show highlight-note mappings.
[X] A separate dialog window for selection of notes for a highlight.
[X] Logging

[ ] Finalize GUI

# Beta Release
# TASKS Estimated Actual
[ ] Optimization and cleanup objects.

[ ] Lib - Use Iterator instead of Enumeration. (Not sure)
[ ] GUI - Status bar showing last error or success message.
[ ] Lib - parseLine function can be protected. It is public now.
[ ] Lib - matching BOM bytes can be put inside a method in the
ByteOrderMarkTypes enum. It is now separate, in the
ByteOrderMark file.
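
A minimal sketch of what the last task could look like, with the byte comparison living on the enum itself. The enum name `BomSketch` and the methods `matches` and `detect` are hypothetical; the real ByteOrderMarkTypes enum is not shown in this commit, so its constants and API may differ. Only the UTF-8 and UTF-16 signatures are covered here.

```java
import java.util.Arrays;

// Hypothetical sketch: names are illustrative, not the project's API.
enum BomSketch {
    UTF8(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}),
    UTF16_BE(new byte[] {(byte) 0xFE, (byte) 0xFF}),
    UTF16_LE(new byte[] {(byte) 0xFF, (byte) 0xFE});

    private final byte[] signature;

    BomSketch(byte[] signature) {
        this.signature = signature;
    }

    /** Returns true when the given bytes start with this BOM's signature. */
    boolean matches(byte[] header) {
        if (header.length < signature.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(header, signature.length), signature);
    }

    /** Returns the first BOM whose signature prefixes the header, or null if none does. */
    static BomSketch detect(byte[] header) {
        for (BomSketch bom : values()) {
            if (bom.matches(header)) {
                return bom;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'A'};
        System.out.println(detect(header));
    }
}
```

Keeping the signature next to the constant means a new encoding needs only one new enum entry, with no separate matching code to update.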
# BUGS:
[ ] The string matching algo is too simple, and gives wrong match
percentage, if the strings being compared differ in the number
of non-whitespace characters. The two indexes get out of sync
at the first mismatch and never recover.
[ ] The string matching algo is too simple, and gives wrong match
percentage, if the strings being compared differ in the number
of non-whitespace characters. The two indexes get out of sync
at the first mismatch and never recover.
Example:
PDF text = 123 56 789
Clipping text = 123 456 789
% match = 3/8 (Wrong)
% match = 7/8 (What is expected)
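
The desynchronisation in this example can be reproduced with a short sketch. `positionalMatches` is an assumed stand-in for the simple index-by-index comparison described above, not the project's actual matcher. `lcsLength` is one possible alternative (longest common subsequence) that recovers after a missing character. Note that it counts all 8 PDF characters as matched for this input, which differs from the 7/8 figure expected above, so the exact scoring remains a design choice.

```java
// Hypothetical sketch; neither method is taken from the kpdfsync sources.
final class MatchSketch {

    /** Removes all whitespace so that only the compared characters remain. */
    static String strip(String s) {
        return s.replaceAll("\\s+", "");
    }

    /** Naive comparison: same index in both strings, no recovery after a mismatch. */
    static int positionalMatches(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    /** Longest common subsequence length; tolerates inserted or missing characters. */
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = (a.charAt(i - 1) == b.charAt(j - 1))
                        ? dp[i - 1][j - 1] + 1
                        : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String pdf = strip("123 56 789");
        String clip = strip("123 456 789");
        System.out.println(positionalMatches(pdf, clip)); // 3
        System.out.println(lcsLength(pdf, clip));         // 8
    }
}
```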

[ ] Related to the above bug, we are highlighting too many characters:
the excess equals the difference in the number of
characters between the text read from the PDF and the pattern
[ ] Related to the above bug, we are highlighting too many characters:
the excess equals the difference in the number of
characters between the text read from the PDF and the pattern
read from the clippings file.
The algorithm matches character by character, the pattern and the
text from the pdf. The matching and thus the highlighting is as
@@ -46,7 +56,37 @@ Kpdfsync
highlighting)

[ ] For some PDF files, org.pdfclown.tools.TextExtractor.extract() is returning null.
This is seen with the Concrete Mathematics original PDF file.
This is seen with the Concrete Mathematics original PDF file. May be a TrueType font issue.
Here is the stack trace:
java.lang.NullPointerException
at java.base/java.util.Hashtable.put(Hashtable.java:476)
at org.pdfclown.documents.contents.fonts.PfbParser.parse(PfbParser.java:99)
at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:96)
at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:141)
at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:817)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)

[ ] Highlight is not visible on the output PDF file. This was seen on the Concrete Mathematics
cropped PDF file.
@@ -62,3 +102,74 @@ Kpdfsync
5. Begin highlighting.

When this exception occurs, it occurs around the 73% mark.

[ ] EOFException at org.pdfclown.tools.TextExtractor.extract() method. This is seen on
'the_evolution_of_operating_system_cropped.pdf' file. Could also be a font issue.
Here is the stack trace
java.lang.RuntimeException: java.io.EOFException
at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:703)
at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.EOFException
at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
... 27 more
:: Cause #1
java.io.EOFException
at org.pdfclown.bytes.Buffer.readUnsignedShort(Buffer.java:511)
at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:306)
at org.pdfclown.documents.contents.fonts.CffParser$Index.parse(CffParser.java:324)
at org.pdfclown.documents.contents.fonts.CffParser.load(CffParser.java:669)
at org.pdfclown.documents.contents.fonts.CffParser.<init>(CffParser.java:640)
at org.pdfclown.documents.contents.fonts.Type1Font.getNativeEncoding(Type1Font.java:104)
at org.pdfclown.documents.contents.fonts.Type1Font.loadEncoding(Type1Font.java:151)
at org.pdfclown.documents.contents.fonts.SimpleFont.onLoad(SimpleFont.java:118)
at org.pdfclown.documents.contents.fonts.Font.load(Font.java:738)
at org.pdfclown.documents.contents.fonts.Font.<init>(Font.java:351)
at org.pdfclown.documents.contents.fonts.SimpleFont.<init>(SimpleFont.java:62)
at org.pdfclown.documents.contents.fonts.Type1Font.<init>(Type1Font.java:75)
at org.pdfclown.documents.contents.fonts.Font.wrap(Font.java:249)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:72)
at org.pdfclown.documents.contents.FontResources.wrap(FontResources.java:1)
at org.pdfclown.documents.contents.ResourceItems.get(ResourceItems.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getResource(SetFont.java:119)
at org.pdfclown.documents.contents.objects.SetFont.getFont(SetFont.java:83)
at org.pdfclown.documents.contents.objects.SetFont.scan(SetFont.java:97)
at org.pdfclown.documents.contents.ContentScanner.moveNext(ContentScanner.java:1330)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.extract(ContentScanner.java:811)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:777)
at org.pdfclown.documents.contents.ContentScanner$TextWrapper.<init>(ContentScanner.java:770)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.get(ContentScanner.java:690)
at org.pdfclown.documents.contents.ContentScanner$GraphicsObjectWrapper.access$0(ContentScanner.java:682)
at org.pdfclown.documents.contents.ContentScanner.getCurrentWrapper(ContentScanner.java:1154)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:633)
at org.pdfclown.tools.TextExtractor.extract(TextExtractor.java:296)
at coderarjob.kpdfsync.lib.annotator.PdfAnnotatorV1.highlight(PdfAnnotatorV1.java:62)
at coderarjob.kpdfsync.poc.MainFrame$2.run(MainFrame.java:172)
23 changes: 17 additions & 6 deletions build.sh
@@ -10,23 +10,34 @@ find src -name "*.java" -exec sed -i s/\ \*$//g {} \; || exit

export CLASSPATH="lib/pdfclown.jar:$BIN_DIR"

JDK_VER_TARGET=8

# Build AJL
javac -Xlint -d "$BIN_DIR/" src/coderarjob/ajl/file/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/ajl/file/*.java || exit

# Build Pattern Matcher
javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/pm/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/kpdfsync/lib/pm/*.java || exit

# Build Annotator
javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/annotator/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/kpdfsync/lib/annotator/*.java || exit

# Build Kindle Clippings File Parser
javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/clipparser/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/kpdfsync/lib/clipparser/*.java || exit

# Build kpdfsync library
javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/lib/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/kpdfsync/lib/*.java || exit

# Build POC
javac -Xlint -d "$BIN_DIR/" src/coderarjob/kpdfsync/poc/*.java || exit
javac --release $JDK_VER_TARGET -Xlint -d "$BIN_DIR/" \
src/coderarjob/kpdfsync/poc/*.java || exit

# Copy resources
cp -r src/coderarjob/kpdfsync/poc/res "$BIN_DIR/coderarjob/kpdfsync/poc" || exit

# Generate tags file
ctags --recurse ./src || exit
Binary file added docs/images/screenshot_alpha.png
21 changes: 21 additions & 0 deletions pack.sh
@@ -0,0 +1,21 @@
#!/bin/bash

echo :: Building packages

rm -rf dist
mkdir dist

pushd build/classes

# Create kpdfsync.jar
# BasicParser is what that can change. So it is packaged separately.
jar cfm kpdfsync.jar ../../Manifest.txt \
coderarjob
popd

mv build/classes/kpdfsync.jar ./dist/
cp lib/pdfclown.jar ./dist/


echo :: Building packages completed

51 changes: 51 additions & 0 deletions src/coderarjob/kpdfsync/poc/HighlightNotePairListRenderer.java
@@ -0,0 +1,51 @@
package coderarjob.kpdfsync.poc;

import javax.swing.*;
import java.awt.Component;
import java.awt.BorderLayout;
import java.awt.Color;

public class HighlightNotePairListRenderer extends JPanel implements ListCellRenderer<HighlightNotePair>
{

    private JLabel highlightLabel;      // highlighted text, shown with an icon
    private JLabel pairedNoteLabel;     // note paired with the highlight
    private Color alternateColor;       // background for even-indexed rows

    public HighlightNotePairListRenderer()
    {
        this.setLayout (new BorderLayout());

        highlightLabel = new JLabel();
        String iconResourceName = "/coderarjob/kpdfsync/poc/res/highlighter.png";
        highlightLabel.setIcon (new ImageIcon (getClass().getResource (iconResourceName)));
        highlightLabel.setForeground (Color.BLACK);
        highlightLabel.setOpaque (false);
        this.add (highlightLabel, BorderLayout.PAGE_START);

        pairedNoteLabel = new JLabel();
        pairedNoteLabel.setForeground (Color.DARK_GRAY);
        pairedNoteLabel.setOpaque (false);
        this.add (pairedNoteLabel, BorderLayout.PAGE_END);

        alternateColor = new Color (237, 244, 249);
        this.setBorder (BorderFactory.createEmptyBorder (5, 2, 5, 0));
    }

    // The same panel instance is reused for every cell; only its text and
    // background are updated per call.
    @Override
    public Component getListCellRendererComponent(JList<? extends HighlightNotePair> list,
                                                  HighlightNotePair value, int index,
                                                  boolean isSelected, boolean cellHasFocus)
    {
        highlightLabel.setText (value.getHighlightText());
        pairedNoteLabel.setText (value.getNoteText());

        // Even rows get the alternate tint, odd rows keep the list background.
        Color normalBackgroundColor = (index % 2 == 0) ? alternateColor : list.getBackground();

        if (isSelected)
            this.setBackground (list.getSelectionBackground());
        else
            this.setBackground (normalBackgroundColor);

        return this;
    }
}
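
For context, a renderer like the one above is attached to a list with `JList.setCellRenderer`. The sketch below shows the same alternating-row idea in a self-contained form. It uses plain `String` items as a stand-in for `HighlightNotePair`, which is not part of this diff, and the names `StripedListSketch`, `StripedRenderer`, `rowBackground` and `buildList` are hypothetical.

```java
import java.awt.Color;
import java.awt.Component;
import javax.swing.DefaultListModel;
import javax.swing.JLabel;
import javax.swing.JList;
import javax.swing.ListCellRenderer;

// Hypothetical sketch with String items standing in for HighlightNotePair.
final class StripedListSketch {

    static final Color ALTERNATE = new Color(237, 244, 249);

    /** Chooses the row colour: selection wins, otherwise even rows are tinted. */
    static Color rowBackground(int index, boolean isSelected, Color selection, Color normal) {
        if (isSelected) {
            return selection;
        }
        return (index % 2 == 0) ? ALTERNATE : normal;
    }

    /** Minimal renderer: one opaque label reused for every cell. */
    static final class StripedRenderer extends JLabel implements ListCellRenderer<String> {
        private static final long serialVersionUID = 1L;

        StripedRenderer() {
            setOpaque(true); // a JLabel paints its background only when opaque
        }

        @Override
        public Component getListCellRendererComponent(JList<? extends String> list,
                String value, int index, boolean isSelected, boolean cellHasFocus) {
            setText(value);
            setBackground(rowBackground(index, isSelected,
                    list.getSelectionBackground(), list.getBackground()));
            return this;
        }
    }

    /** Builds a list whose cells are drawn by the striped renderer. */
    static JList<String> buildList(String... items) {
        DefaultListModel<String> model = new DefaultListModel<>();
        for (String item : items) {
            model.addElement(item);
        }
        JList<String> list = new JList<>(model);
        list.setCellRenderer(new StripedRenderer());
        return list;
    }
}
```

The list returned by `buildList` would typically be wrapped in a `JScrollPane` and added to a frame on the event dispatch thread. Pulling the colour choice into `rowBackground` keeps that logic testable without creating any Swing components.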