Replies: 2 comments 2 replies
-
Thanks for the detailed write-up. I especially liked that you explained clearly what you hoped to gain from such a ranking system. I am now convinced of the potential of this concept. I think it could actually be quite easy to add to Autocards: the user still creates all the Q&A as before, can estimate how many they want to keep, and then filters them by score. One potential limitation I can imagine: I don't think it's acceptable to require a GPU, or a really large model, just for this task. I think it's worthwhile to spend some effort finding the absolute minimum viable model for the job. Opinions?
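Roughly, something like the sketch below, where the filtering step slots in after generation. The function and field names are hypothetical, not part of the actual Autocards API:

```python
# Hypothetical sketch: generate all Q&A pairs as usual, attach a quality
# score to each, and let the user keep only the top N. `score_fn` is any
# scoring function (acceptability, perplexity, ...).
def rank_and_filter(qa_pairs, score_fn, keep_n):
    """Score every generated card and return the best `keep_n` of them."""
    scored = [(score_fn(qa["question"], qa["answer"]), qa) for qa in qa_pairs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best first
    return [qa for _, qa in scored[:keep_n]]
```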
-
Just an idea; I am still working quite a lot on AnnA, and somewhere in the code I end up with a distance matrix of my Anki cards.

Idea: create a distance matrix between each Q&A pair (not separating Q and A), use it to build a dendrogram (a phylogenetic-style tree), and use the tree to find the best order in which to review the new cards.

Background: in biology it is routine to build phylogenetic trees from distance matrices, for example between species.

End goal: you have a book, you create a lot of cards about it using Autocards, remove those whose score indicates very probable low quality, then sort the cards by the dendrogram, and only then start learning.

Tell me if that's not clear. I intend to play around with phylogenetic visualization of my Anki collection sometime, and it would be interesting to see how well it works.
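A minimal sketch of that ordering idea, assuming the concatenated Q&A pairs are embedded with sentence-transformers (the model name and toy cards below are just illustrations) and clustered hierarchically with SciPy; the dendrogram's leaf order becomes the review order:

```python
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist
from sentence_transformers import SentenceTransformer

# Each "card" is the question and answer concatenated, not separated.
cards = [
    "When did WW2 end? 1945.",
    "When did WW2 start? 1939.",
    "What is transhumanism? A movement advocating human enhancement.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # example embedding model
embeddings = model.encode(cards)

# Condensed pairwise distance matrix between cards.
distances = pdist(embeddings, metric="cosine")

# Hierarchical clustering gives the same kind of tree used for phylogenies;
# the leaf ordering puts related cards next to each other.
tree = linkage(distances, method="average")
review_order = leaves_list(tree)  # card indices in dendrogram order
print([cards[i] for i in review_order])
```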
-
Adding options for ranking (while still retaining the text-order as an option) would be useful. Here's an illustration:
I just finished reading the book A History of the World in 100 Objects (good book btw). I made some highlights of some interesting parts while I was reading, which I plan to extract into notes and cards. But... I haven't gotten around to it yet (a relatable situation I assume). It would be a shame if I never get around to it -- I would really prefer to remember some stuff I read.
It's not particularly important to me what parts of the book I remember; I would be satisfied with having 20 high-quality cards. Let's say I have the patience to read through 200 candidate cards to choose my preferred 20. I don't want to manually edit any cards.
My problem reduces to returning an (ordered) list of cards and ensuring that `precision@200 >= 0.1`.
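To make the target concrete: `precision@200 >= 0.1` just means at least 20 of the first 200 returned cards are keepers. A minimal sketch of the metric (which cards count as keepers is of course a subjective judgment):

```python
def precision_at_k(is_keeper, k):
    """Fraction of the first k returned cards that I would actually keep.

    `is_keeper` is a list of booleans, one per returned card, in ranked order.
    """
    top_k = is_keeper[:k]
    return sum(top_k) / len(top_k)

# 20 keepers anywhere in the first 200 cards satisfies the target:
# precision_at_k(judgments, 200) >= 0.1
```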
How could I use Autocards to solve my issue? Currently the cards are returned in text order. Are the first 200 cards in the text actually the best shot? At the very least, filtering out nonsense questions is a good start. Here are some ideas:

- Score each question's grammatical acceptability with a model finetuned on CoLA, and filter out the worst.
- Rank questions by language-model perplexity, lower meaning more natural.
I tried out these methods, using a RoBERTa model finetuned on CoLA, and using `gpt2` (the 117M parameter version) and `EleutherAI/gpt-neo-1.3B` (a new GPT-3 replication, the largest that fits on my Colab GPU) for perplexity.

Here are some examples from running Autocards on some Wiki pages and including some ranking: WW2 and Transhumanism
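A sketch of how the two scores can be computed with Hugging Face transformers. The exact CoLA checkpoint used above isn't stated; `textattack/roberta-base-CoLA` is one publicly available RoBERTa model finetuned on CoLA, and `gpt2` could be swapped for `EleutherAI/gpt-neo-1.3B` at the cost of more memory:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

cola_name = "textattack/roberta-base-CoLA"  # example CoLA-finetuned checkpoint
cola_tok = AutoTokenizer.from_pretrained(cola_name)
cola_model = AutoModelForSequenceClassification.from_pretrained(cola_name)

lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm_model = AutoModelForCausalLM.from_pretrained("gpt2")

def cola_score(text: str) -> float:
    """Probability that `text` is linguistically acceptable."""
    inputs = cola_tok(text, return_tensors="pt")
    with torch.no_grad():
        logits = cola_model(**inputs).logits
    # Index 1 is assumed to be the "acceptable" class; check the
    # checkpoint's label mapping before relying on this.
    return torch.softmax(logits, dim=-1)[0, 1].item()

def perplexity(text: str) -> float:
    """Language-model perplexity of `text` (lower = more natural)."""
    inputs = lm_tok(text, return_tensors="pt")
    with torch.no_grad():
        loss = lm_model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()
```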
The CoLA scores are generally good.

Here are the bottom 5 cards according to CoLA and to gpt-neo perplexity from both sets (see the linked spreadsheets):

- WW2: bottom 5 by CoLA; bottom 5 by perplexity
- Transhumanism: bottom 5 by CoLA; bottom 5 by perplexity

I encourage anyone to look through the spreadsheets themselves. Here are my impressions:
Given those observations, my first try for the challenge above of getting `precision@200 >= 0.1` would be: filter by `CoLA > 0.9` and order by perplexity ascending. That gives pretty solid results. I'd bet there are better ways of ranking out there (finding a larger dataset than CoLA and training on it with something better than RoBERTa is an obvious route), but for now I think this illustrates the potential.
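Putting that recipe together, assuming the `cola_score` and `perplexity` helpers sketched earlier and cards stored as dicts with a `question` field (the field name is illustrative):

```python
def rank_cards(cards, cola_threshold=0.9):
    """Filter by CoLA acceptability, then order by ascending perplexity."""
    kept = [c for c in cards if cola_score(c["question"]) > cola_threshold]
    return sorted(kept, key=lambda c: perplexity(c["question"]))
```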