Replies: 2 comments 2 replies
-
Thanks for the detailed write-up. I especially liked that you explained clearly what you hoped to gain from such a ranking system. I am now convinced of the potential of this concept. I think it could actually be quite easy to add to Autocards: the user still creates all the Q&A as before, can estimate how many they want to keep, and then filters them by score. One potential limitation I can imagine: I don't think it's acceptable to require a GPU, or a really large model, just for this task. I think it's worthwhile to spend some effort finding the absolute minimum viable model for the job. Opinions?
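Roughly, something like the sketch below, where the filtering step slots in after generation. The function and field names are hypothetical, not part of the actual Autocards API:

```python
# Hypothetical sketch: generate all Q&A pairs as usual, attach a quality
# score to each, and let the user keep only the top N. `score_fn` is any
# scoring function (acceptability, perplexity, ...).
def rank_and_filter(qa_pairs, score_fn, keep_n):
    """Score every generated card and return the best `keep_n` of them."""
    scored = [(score_fn(qa["question"], qa["answer"]), qa) for qa in qa_pairs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best first
    return [qa for _, qa in scored[:keep_n]]
```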
-
Just an idea; I am still working quite a lot on AnnA, and somewhere in the code I end up with a distance matrix of my Anki cards.

Idea: create a distance matrix between each Q&A pair (not separating Q and A), use it to build a dendrogram (a phylogenetic-style tree), and use the tree to find the best order in which to review the new cards.

Background: in biology it is routine to build phylogenetic trees from distance matrices, for example between species.

End goal: you have a book, you create a lot of cards about it using Autocards, remove those whose score indicates very probable low quality, then sort the cards by the dendrogram, and only then start learning.

Tell me if that's not clear. I intend to play around with phylogenetic visualization of my Anki collection sometime, and it would be interesting to see how well it works.
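A minimal sketch of that ordering idea, assuming the concatenated Q&A pairs are embedded with sentence-transformers (the model name and toy cards below are just illustrations) and clustered hierarchically with SciPy; the dendrogram's leaf order becomes the review order:

```python
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist
from sentence_transformers import SentenceTransformer

# Each "card" is the question and answer concatenated, not separated.
cards = [
    "When did WW2 end? 1945.",
    "When did WW2 start? 1939.",
    "What is transhumanism? A movement advocating human enhancement.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
embeddings = model.encode(cards)

# Condensed pairwise distance matrix between cards.
distances = pdist(embeddings, metric="cosine")

# Hierarchical clustering gives the same kind of tree used for phylogenies;
# the leaf ordering puts related cards next to each other.
tree = linkage(distances, method="average")
review_order = leaves_list(tree)  # card indices in dendrogram order
print([cards[i] for i in review_order])
```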
-
Adding options for ranking (while still retaining the text-order as an option) would be useful. Here's an illustration:
I just finished reading the book A History of the World in 100 Objects (good book btw). I made some highlights of some interesting parts while I was reading, which I plan to extract into notes and cards. But... I haven't gotten around to it yet (a relatable situation I assume). It would be a shame if I never get around to it -- I would really prefer to remember some stuff I read.
It's not particularly important to me what parts of the book I remember; I would be satisfied with having 20 high-quality cards. Let's say I have the patience to read through 200 candidate cards to choose my preferred 20. I don't want to manually edit any cards.
My problem reduces to returning an (ordered) list of cards and ensuring that `precision@200 >= 0.1`.
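To make the target concrete: `precision@200 >= 0.1` just means at least 20 of the first 200 returned cards are keepers. A minimal sketch of the metric (which cards count as keepers is of course a subjective judgment):

```python
def precision_at_k(is_keeper, k):
    """Fraction of the first k returned cards that I would actually keep.

    `is_keeper` is a list of booleans, one per returned card, in ranked order.
    """
    top_k = is_keeper[:k]
    return sum(top_k) / len(top_k)

# 20 keepers anywhere in the first 200 cards satisfies the target:
# precision_at_k(judgments, 200) >= 0.1
```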
How could I use Autocards to solve my issue? Currently the cards are returned in text order. Are the first 200 cards in the text actually the best shot? At the very least, filtering out nonsense questions is a good start. Here are some ideas:

- Score each question's grammatical acceptability with a model finetuned on CoLA, and filter out the worst.
- Rank questions by language-model perplexity, lower meaning more natural.
I tried out these methods, using a RoBERTa model finetuned on CoLA, and using `gpt2` (the 117M parameter version) and `EleutherAI/gpt-neo-1.3B` (a new GPT-3 replication, the largest that fits on my Colab GPU) for perplexity.

Here are some examples from running Autocards on some Wiki pages and including some ranking: WW2 and Transhumanism
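A sketch of how the two scores can be computed with Hugging Face transformers. The exact CoLA checkpoint used above isn't stated; `textattack/roberta-base-CoLA` is one publicly available RoBERTa model finetuned on CoLA, and `gpt2` could be swapped for `EleutherAI/gpt-neo-1.3B` at the cost of more memory:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

cola_name = "textattack/roberta-base-CoLA"  # example CoLA-finetuned checkpoint
cola_tok = AutoTokenizer.from_pretrained(cola_name)
cola_model = AutoModelForSequenceClassification.from_pretrained(cola_name)

lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm_model = AutoModelForCausalLM.from_pretrained("gpt2")

def cola_score(text: str) -> float:
    """Probability that `text` is linguistically acceptable."""
    inputs = cola_tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = cola_model(**inputs).logits
    # Index 1 is assumed to be the "acceptable" class; check the
    # checkpoint's label mapping before relying on this.
    return torch.softmax(logits, dim=-1)[0, 1].item()

def perplexity(text: str) -> float:
    """Language-model perplexity of `text` (lower = more natural)."""
    inputs = lm_tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = lm_model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()
```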
The CoLA scores are generally good.

Here are the bottom 5 cards according to CoLA and to gpt-neo perplexity from both sets (see the linked spreadsheets):

- WW2: bottom 5 by CoLA; bottom 5 by perplexity
- Transhumanism: bottom 5 by CoLA; bottom 5 by perplexity

I encourage anyone to look through the spreadsheets themselves. Here are my impressions:
Given those observations, my first try for the challenge above of getting `precision@200 >= 0.1` would be: filter by `CoLA > 0.9` and order by perplexity ascending. That gives pretty solid results. I'd bet there are better ways of ranking out there (finding a larger dataset than CoLA and training on it with something better than RoBERTa is an obvious route), but for now I think this illustrates the potential.
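Putting that recipe together, assuming the `cola_score` and `perplexity` helpers sketched earlier and cards stored as dicts with a `question` field (the field name is illustrative):

```python
def rank_cards(cards, cola_threshold=0.9):
    """Filter by CoLA acceptability, then order by ascending perplexity."""
    kept = [c for c in cards if cola_score(c["question"]) > cola_threshold]
    return sorted(kept, key=lambda c: perplexity(c["question"]))
```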