jasonleex1995/AI-Paper-Digest


AI Paper Digest

  • Super-brief summaries of AI papers I've read
  • May not perfectly align with authors' claims or intentions
  • Papers I consider important include a detailed summary link, which leads to my blog post

Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
ECCV 2024, arxiv, code
Task: referring, grounding, and reasoning on mobile UI screens

  • Directly adapting MLLMs to UI screens has limitations, since UI screens have more elongated aspect ratios and contain smaller objects of interest than natural images.
  • Incorporate "any resolution" (anyres) on top of Ferret, and then train on a curated dataset.
  • During training, both the decoder and the projection layer are updated while the vision encoder is kept frozen.
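
The anyres idea can be sketched as a crop-box computation. This is a minimal illustration, assuming the screen is cut into two sub-images along its longer side (top/bottom for portrait, left/right for landscape); the function name `split_screen` and the box convention are hypothetical and not taken from the authors' code.

```python
from typing import List, Tuple

# Hypothetical box convention: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def split_screen(width: int, height: int) -> List[Box]:
    """Return two crop boxes that cut the screen along its longer side."""
    if height >= width:
        # Portrait (or square): split into top and bottom halves.
        mid = height // 2
        return [(0, 0, width, mid), (0, mid, width, height)]
    # Landscape: split into left and right halves.
    mid = width // 2
    return [(0, 0, mid, height), (mid, 0, width, height)]
```

In such a setup, each sub-image would be encoded alongside the full screen, so small UI elements occupy more pixels per crop than they would in a single resized image.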

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
ICLR 2024, arxiv, review, code, summary
Task: zero-shot anomaly detection (ZSAD)

  • Previous works use CLIP with object-aware text prompts.
  • Even though the foreground object semantics can be completely different, anomaly patterns remain quite similar.
  • Thus, use CLIP with learnable object-agnostic text prompts.
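
A rough sketch of how a CLIP-style anomaly score could be computed from one "normal" and one "abnormal" text embedding shared across all object classes. The vectors below stand in for encoder outputs, and `anomaly_score` and the `temperature` value are illustrative assumptions rather than the paper's implementation.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def anomaly_score(
    image_feat: Sequence[float],
    normal_text: Sequence[float],
    abnormal_text: Sequence[float],
    temperature: float = 0.07,  # assumed value, for illustration only
) -> float:
    """Softmax probability assigned to the 'abnormal' prompt."""
    s_normal = cosine(image_feat, normal_text) / temperature
    s_abnormal = cosine(image_feat, abnormal_text) / temperature
    # Subtract the max before exponentiating for numerical stability.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)
```

Because the two text embeddings do not depend on the object class, the same pair can score images from unseen categories.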

Tiny and Efficient Model for the Edge Detection Generalization
ICCV 2023 Workshop (Resource Efficient Deep Learning for Computer Vision), arxiv, code
Task: edge detection

  • Propose a simple, efficient, and robust CNN model: Tiny and Efficient Edge Detector (TEED).
  • TEED generates thinner and clearer edge-maps, but requires a paired dataset for training.
  • Two core methods: architecture (edge fusion module) & loss (weighted cross-entropy, tracing loss).
  • Weighted cross-entropy helps to detect as many edges as possible, while tracing loss helps to predict thinner and clearer edge-maps.
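
To illustrate why a weighted cross-entropy encourages detecting as many edges as possible, here is a minimal sketch of a class-balanced binary cross-entropy over a flattened edge map. The balancing scheme (weights taken from the edge and non-edge pixel fractions) is an assumption for illustration; the summary above does not specify TEED's exact weighting.

```python
import math
from typing import Sequence


def weighted_bce(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Class-balanced binary cross-entropy over a flattened, non-empty edge map.

    pred holds edge probabilities in [0, 1]; target holds 0/1 labels.
    """
    n = len(target)
    n_pos = sum(target)
    n_neg = n - n_pos
    # Edge pixels are rare, so they receive the larger weight (assumed scheme).
    w_pos = n_neg / n
    w_neg = n_pos / n
    loss = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if t == 1:
            loss -= w_pos * math.log(p)
        else:
            loss -= w_neg * math.log(1.0 - p)
    return loss
```

With one edge pixel out of four, the single edge pixel contributes as much to the loss as the three background pixels combined, so missing an edge is penalized heavily.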

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
CVPR 2023 Award Candidate, arxiv, website, code, summary
Task: subject-driven image generation

  • Recently developed large T2I diffusion models can generate high-quality and diverse photorealistic images.
  • However, these models lack the ability to mimic the appearance of subjects in a given reference set.
  • Generate novel photorealistic images of the subject contextualized in different scenes via fine-tuning with rare tokens and class-specific prior preservation loss.
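
The fine-tuning objective can be written roughly as a reconstruction term on the subject images plus a weighted prior-preservation term. The notation below follows my recollection of the paper and should be read as a sketch: $x$ is a subject image with prompt $c$ containing the rare token, $x_{\mathrm{pr}}$ is an image generated by the frozen pretrained model from the class prompt $c_{\mathrm{pr}}$, and $\lambda$ weights the prior term.

$$
\mathcal{L} = \mathbb{E}\Big[\, w_t \,\big\lVert \hat{x}_\theta(\alpha_t x + \sigma_t \epsilon,\ c) - x \big\rVert_2^2 \;+\; \lambda\, w_{t'} \,\big\lVert \hat{x}_\theta(\alpha_{t'} x_{\mathrm{pr}} + \sigma_{t'} \epsilon',\ c_{\mathrm{pr}}) - x_{\mathrm{pr}} \big\rVert_2^2 \,\Big]
$$

The second term asks the fine-tuned model to keep reproducing its own class samples, which counteracts drift of the class prior toward the few subject images.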

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
ICLR 2023 Spotlight, arxiv, review, website, code, summary
Task: personalized text-to-image generation

  • Recently, large-scale T2I models have demonstrated an unprecedented capability to reason over natural language descriptions.
  • However, generating a desired target, such as a user-specific concept, through text alone is quite difficult.
  • Training or fine-tuning the T2I model itself for each new concept has several limitations.
  • Generate novel photorealistic images of the subject via optimizing only a single word embedding.
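
The core mechanic, optimizing one embedding vector while every model weight stays frozen, can be shown with a toy example. Here a fixed matrix `W` stands in for the frozen model and a squared error stands in for the diffusion loss; both are illustrative assumptions, and `optimize_embedding` is a hypothetical name.

```python
from typing import List


def matvec(W: List[List[float]], v: List[float]) -> List[float]:
    """Multiply matrix W by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def optimize_embedding(
    W: List[List[float]],
    target: List[float],
    steps: int = 200,
    lr: float = 0.1,
) -> List[float]:
    """Gradient descent on 0.5 * ||W v - target||^2 with respect to v only."""
    v = [0.0] * len(W[0])  # the single trainable embedding
    for _ in range(steps):
        residual = [o - t for o, t in zip(matvec(W, v), target)]
        # Gradient with respect to v is W^T residual; W itself is never updated.
        grad = [
            sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(v))
        ]
        v = [x - lr * g for x, g in zip(v, grad)]
    return v
```

Since only `v` changes, the frozen model keeps its original behavior for every other input, and the learned result is just one small vector per concept.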

Learning to generate line drawings that convey geometry and semantics
CVPR 2022, arxiv, website, code
Task: automatic line drawing generation

  • View line drawing generation as an unsupervised image translation problem, which means training models with unpaired data.
  • Previous works only consider preserving photographic appearance through cycle consistency.
  • Instead, use 4 losses to improve quality: adversarial loss (LSGAN), geometry loss (pseudo depth map), semantic loss (CLIP), appearance loss (cycle consistency).
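
One way to read the four losses is as a single weighted training objective. The weights $\lambda$ below are placeholders, since the summary above does not give their values.

$$
\mathcal{L}_{\mathrm{total}} = \lambda_{\mathrm{adv}}\,\mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{geom}}\,\mathcal{L}_{\mathrm{geom}} + \lambda_{\mathrm{sem}}\,\mathcal{L}_{\mathrm{sem}} + \lambda_{\mathrm{app}}\,\mathcal{L}_{\mathrm{app}}
$$

Here $\mathcal{L}_{\mathrm{adv}}$ is the LSGAN adversarial loss, $\mathcal{L}_{\mathrm{geom}}$ compares pseudo depth maps, $\mathcal{L}_{\mathrm{sem}}$ compares CLIP features, and $\mathcal{L}_{\mathrm{app}}$ is the cycle-consistency loss.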

Generalisation in humans and deep neural networks
NeurIPS 2018, arxiv, review, code, summary
Task: understanding the differences between DNNs and humans

  • Compare the robustness of humans and DNNs (VGG, GoogLeNet, ResNet) on object recognition under 12 different image distortions.
  • The human visual system is more robust than DNNs.
  • DNNs generalize poorly under non-i.i.d. settings.

format

> **paper title**  
> *accept info*, [arxiv](), [review](), [website](), [code](), [summary]()  
> Task:  
> 
> - super-brief summary