MRR stands for Mean Reciprocal Rank:

MRR = (1 / |Q|) * sum_{i=1..|Q|} 1 / rank_i

where |Q| is the number of queries and rank_i is the rank position of the first relevant document for the i-th query.
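A minimal sketch of this computation (the function name and the `None`-for-miss convention are my own choices, not from any particular library):

```python
# Toy MRR computation: for each query, take the reciprocal of the
# (1-based) rank of its first relevant document, then average over queries.

def mean_reciprocal_rank(first_relevant_ranks):
    """first_relevant_ranks: rank of the first relevant hit per query,
    or None if no relevant document was retrieved (contributes 0)."""
    total = sum(1.0 / r for r in first_relevant_ranks if r is not None)
    return total / len(first_relevant_ranks)

# Three queries: first relevant hits at ranks 1 and 3, one query with no hit.
print(mean_reciprocal_rank([1, 3, None]))  # (1 + 1/3 + 0) / 3 ≈ 0.444
```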
Both are embeddings, i.e., low-dimensional vectors that represent words. An embedding is called static if it does not change with context once learned, while a dynamic (contextualized) embedding represents a word according to its surrounding context.
Example
In the two sentences “Apple sells phones” and “I eat an apple”, dynamic embeddings represent “apple” differently according to the context, while static embeddings cannot distinguish the semantic difference between the two “apple”s.
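A toy illustration of why a static embedding cannot tell the two senses apart (the lookup table and its values are invented for illustration; a real contextual encoder such as BERT, not shown here, would produce different vectors for the two occurrences):

```python
# A static embedding is just a fixed lookup table: the same word maps
# to the same vector in every sentence, regardless of context.

static_table = {"apple": [0.9, 0.1], "sells": [0.2, 0.7], "eat": [0.3, 0.4]}

def static_embed(word):
    # Case-insensitive lookup; context is ignored entirely.
    return static_table[word.lower()]

v1 = static_embed("Apple")  # from "Apple sells phones" (the company sense)
v2 = static_embed("apple")  # from "I eat an apple" (the fruit sense)
print(v1 == v2)  # True: static embeddings cannot separate the senses
```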
- full name Contextualized Late Interaction over BERT
- paper ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT
- year 2020 (SIGIR'20)
- from Stanford
- GitHub https://github.com/stanford-futuredata/ColBERT
Trade-off between Effectiveness and Efficiency. ColBERT sits at the center of the compromise matrix below.
- The Most Effective, but Least Efficient
Fully dynamic in both representation and interaction, between every word of the query and the document.
- The Compromises
- The Comparison
ColBERT attains MRR similar to BERT's, with 10~100x better latency.
- Hyper-parameters
- encoder: BERT-base-uncased
- FC: in-dim = 768, out-dim = 128
- Query Max Len = 32
- Data Layout (N, S, E), where N is batch size, S is sequence length, E is embedding size
Each query is thus encoded into a (32, 128) matrix (Query Max Len × FC out-dim).
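The late-interaction (MaxSim) scoring over this layout can be sketched as follows; the random tensors below are stand-ins for real BERT outputs passed through the 768 → 128 FC layer, and the specific sequence lengths are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
N, S_q, S_d, E = 2, 32, 180, 128      # batch, query len, doc len, embed dim

Q = rng.normal(size=(N, S_q, E))      # query token embeddings, (N, S, E) layout
D = rng.normal(size=(N, S_d, E))      # document token embeddings

# L2-normalize so a dot product is a cosine similarity.
Q /= np.linalg.norm(Q, axis=-1, keepdims=True)
D /= np.linalg.norm(D, axis=-1, keepdims=True)

# MaxSim: for each query token, take its max similarity over all doc
# tokens, then sum over query tokens: score = sum_i max_j (q_i . d_j)
sim = np.einsum("nqe,nde->nqd", Q, D)  # (N, S_q, S_d) token-pair similarities
scores = sim.max(axis=-1).sum(axis=-1) # (N,) one relevance score per pair
print(scores.shape)  # (2,)
```

Since every cosine similarity is at most 1, a score is bounded above by the query length S_q.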
To reduce retrieval latency, a sparse retriever can do the first-round filtering, and ColBERT then re-ranks the much smaller candidate set produced by that retriever, a typical coarse-to-fine
idea.
However, we need to pay attention to the recall of the sparse retriever: once a true candidate is missed, the re-ranker cannot get it back.
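A minimal sketch of this coarse-to-fine pipeline; the documents, word vectors, and scoring functions are all invented for illustration (term overlap standing in for BM25, a tiny MaxSim-style scorer standing in for ColBERT):

```python
DOCS = {
    "d0": "apple sells phones",
    "d1": "i eat an apple",
    "d2": "oranges are fruit",
}

# Toy 2-d word vectors (assumed values, not learned).
VEC = {"apple": (1.0, 0.0), "phones": (0.8, 0.6), "sells": (0.6, 0.8),
       "i": (0.0, 1.0), "eat": (0.0, 1.0), "an": (0.0, 1.0),
       "oranges": (0.7, 0.7), "are": (0.0, 1.0), "fruit": (0.6, 0.8)}

def sparse_score(query, doc):
    # Stage 1 (coarse): cheap term overlap, a stand-in for BM25.
    return len(set(query.split()) & set(doc.split()))

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def maxsim_score(query, doc):
    # Stage 2 (fine): per query word, max dot product over doc words, summed.
    dvecs = [VEC[w] for w in doc.split()]
    return sum(max(dot(VEC[q], d) for d in dvecs) for q in query.split())

query = "apple phones"
# First-round filtering keeps only the top-2 candidates...
top_k = sorted(DOCS, key=lambda d: sparse_score(query, DOCS[d]), reverse=True)[:2]
# ...so any relevant doc dropped here is gone for good; then re-rank.
reranked = sorted(top_k, key=lambda d: maxsim_score(query, DOCS[d]), reverse=True)
print(reranked[0])  # "d0"
```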
Greatly better latency than all-to-all models, and greatly better MRR
than static models and no-interaction models.
Good MRR and Recall lift, at the expense of increased latency.
Another interesting observation: looking at Recall@1000, the ColBERT re-ranker's recall is bounded by its preceding BM25 retriever. It can only reach 81.4%, which is BM25's Recall@1000, even though ColBERT used as an end-to-end retriever reaches 96.8%, which is much better.
- paper ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
- year 2021 (NAACL'22)
- from Stanford
- GitHub https://github.com/stanford-futuredata/ColBERT
ColBERTv2 is an end-to-end retriever.