- Super-brief summaries of AI papers I've read
- May not perfectly align with the authors' claims or intentions
- Some papers I think are important include detailed summary links, which lead to my blog posts
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
ECCV 2024, arxiv, code
Task: referring, grounding, and reasoning on mobile UI screens
- Directly adapting MLLMs to UI screens has limitations, since UI screens exhibit more elongated aspect ratios and contain smaller objects of interest than natural images.
- Incorporate "any resolution" (anyres) on top of Ferret, then train on a curated dataset.
- During training, both the decoder and the projection layer are updated while the vision encoder is kept frozen.
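The anyres idea can be sketched roughly as follows; the simple 2-way split rule below is a toy illustration only (the actual grid selection in anyres-style pipelines is more involved):

```python
import numpy as np

def anyres_subimages(image):
    """image: (H, W, C) array. Returns the global image plus two halves
    split along the longer axis, so small UI elements keep more pixels."""
    h, w = image.shape[:2]
    if h >= w:                        # portrait screen: split top / bottom
        halves = [image[: h // 2], image[h // 2 :]]
    else:                             # landscape screen: split left / right
        halves = [image[:, : w // 2], image[:, w // 2 :]]
    return [image] + halves

screen = np.zeros((640, 320, 3))      # tall mobile screenshot (toy data)
crops = anyres_subimages(screen)      # 1 global view + 2 sub-images
```

Each crop is then encoded separately, so the elongated screen is not squashed into a single square input.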
AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
ICLR 2024, arxiv, review, code, summary
Task: zero-shot anomaly detection (ZSAD)
- Previous works use CLIP with object-aware text prompts.
- Even though the foreground object semantics can be completely different, anomaly patterns remain quite similar.
- Thus, use CLIP with learnable object-agnostic text prompts.
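A rough sketch of scoring with two object-agnostic prompt embeddings ("normality" vs. "abnormality"); the dimensions, temperature, and random features are illustrative stand-ins, not the paper's code:

```python
import numpy as np

def anomaly_scores(image_feats, normal_emb, abnormal_emb, temperature=0.07):
    """image_feats: (N, D) CLIP features; the two prompt embeddings: (D,) each."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    feats = normalize(image_feats)
    prompts = normalize(np.stack([normal_emb, abnormal_emb]))  # (2, D)
    logits = feats @ prompts.T / temperature                    # (N, 2)
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))   # stable softmax
    probs = exp / exp.sum(axis=-1, keepdims=True)
    return probs[:, 1]  # probability assigned to the "abnormality" prompt

rng = np.random.default_rng(0)
normal_emb = rng.normal(size=8)
abnormal_emb = rng.normal(size=8)
feats = np.stack([normal_emb + 0.1 * rng.normal(size=8),    # a "normal" sample
                  abnormal_emb + 0.1 * rng.normal(size=8)]) # an "abnormal" sample
scores = anomaly_scores(feats, normal_emb, abnormal_emb)
```

Because the prompts are learned without object names, the same pair is reused across object categories.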
Tiny and Efficient Model for the Edge Detection Generalization
ICCV 2023 Workshop (Resource Efficient Deep Learning for Computer Vision), arxiv, code
Task: edge detection
- Propose simple, efficient, and robust CNN model: Tiny and Efficient Edge Detector (TEED).
- TEED generates thinner and clearer edge-maps, but requires a paired dataset for training.
- Two core methods: architecture (edge fusion module) & loss (weighted cross-entropy, tracing loss).
- Weighted cross-entropy helps to detect as many edges as possible, while tracing loss helps to predict thinner and clearer edge-maps.
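A minimal sketch of a class-balanced (weighted) binary cross-entropy of the kind used to counter the edge/non-edge pixel imbalance; the exact weighting scheme and the tracing loss in TEED differ, this only illustrates the idea:

```python
import numpy as np

def weighted_bce(pred, target, eps=1e-7):
    """pred: predicted edge probabilities in (0, 1); target: binary edge map."""
    pred = np.clip(pred, eps, 1 - eps)
    n_pos = target.sum()
    n_neg = target.size - n_pos
    # up-weight the rare positive (edge) pixels, down-weight the background
    w_pos = n_neg / target.size
    w_neg = n_pos / target.size
    loss = -(w_pos * target * np.log(pred)
             + w_neg * (1 - target) * np.log(1 - pred))
    return loss.mean()

target = np.zeros((8, 8))
target[3] = 1.0                        # one row of edge pixels (rare class)
good = np.where(target == 1, 0.9, 0.1) # confident, mostly-correct prediction
bad = np.full_like(target, 0.5)        # uninformative prediction
```

Without the weighting, a model can minimize plain BCE by predicting "no edge" almost everywhere.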
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
CVPR 2023 Award Candidate, arxiv, website, code, summary
Task: subject-driven image generation
- Recently developed large T2I diffusion models can generate high-quality and diverse photorealistic images.
- However, these models lack the ability to mimic the appearance of subjects in a given reference set.
- Generate novel photorealistic images of the subject contextualized in different scenes via fine-tuning with rare tokens and a class-specific prior-preservation loss.
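The prior-preservation objective can be sketched as the usual reconstruction loss on the subject images plus a weighted second term on images the frozen model generated for the class prompt; the plain squared errors and `lambda_` here are illustrative stand-ins for the diffusion losses:

```python
import numpy as np

def prior_preservation_loss(pred_subject, target_subject,
                            pred_prior, target_prior, lambda_=1.0):
    """Subject reconstruction term + weighted class-prior term."""
    subject_term = np.mean((pred_subject - target_subject) ** 2)
    prior_term = np.mean((pred_prior - target_prior) ** 2)
    return subject_term + lambda_ * prior_term
```

The prior term keeps the fine-tuned model from collapsing the whole class (e.g. "dog") onto the one reference subject.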
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
ICLR 2023 Spotlight, arxiv, review, website, code, summary
Task: personalized text-to-image generation
- Recently, large-scale T2I models have demonstrated an unprecedented capability to reason over natural language descriptions.
- However, generating a desired target, such as a user-specific concept, through text is quite difficult.
- Re-training or fine-tuning T2I models has several limitations.
- Generate novel photorealistic images of the subject via optimizing only a single word embedding.
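The core idea — everything frozen except one new word embedding v* — can be sketched with a toy linear stand-in for the frozen model; the matrix, target, and step count below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))        # stands in for the frozen model (never updated)
target = W @ rng.normal(size=8)     # toy features of the user's concept images

v = np.zeros(8)                     # the single learnable word embedding v*
lr = 0.01
for _ in range(2000):
    residual = W @ v - target
    v -= lr * (W.T @ residual)      # gradient of 0.5 * ||W v - target||^2

final_err = np.linalg.norm(W @ v - target)
```

Only the 8 numbers in `v` change; in the actual method the same principle holds with the diffusion model's denoising loss in place of the squared error.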
Learning to generate line drawings that convey geometry and semantics
CVPR 2022, arxiv, website, code
Task: automatic line generation
- View line drawing generation as an unsupervised image translation problem, which means training models with unpaired data.
- Previous works solely consider preserving photographic appearance through cycle consistency.
- Instead, use 4 losses to improve quality: adversarial loss (LSGAN), geometry loss (pseudo depth map), semantic loss (CLIP), appearance loss (cycle consistency).
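A sketch of how the four terms combine; the LSGAN generator term is standard, while the weights and the other three terms (shown only as scalar inputs) are illustrative, not the paper's values:

```python
import numpy as np

def lsgan_g_loss(d_fake):
    """Least-squares GAN generator loss: push D's scores on fakes toward 1."""
    return 0.5 * np.mean((np.asarray(d_fake) - 1.0) ** 2)

def total_loss(adv, geometry, semantic, appearance,
               w_geom=10.0, w_sem=1.0, w_app=10.0):
    """Weighted sum of the adversarial, geometry (depth), semantic (CLIP),
    and appearance (cycle-consistency) objectives; weights are assumptions."""
    return adv + w_geom * geometry + w_sem * semantic + w_app * appearance
```

Geometry and semantic terms are what let the model train on unpaired photo/drawing data without collapsing to pure appearance matching.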
Generalisation in humans and deep neural networks
NeurIPS 2018, arxiv, review, code, summary
Task: understanding the differences between DNNs and humans
- Compare the robustness of humans and DNNs (VGG, GoogLeNet, ResNet) on object recognition under 12 different image distortions.
- The human visual system is more robust than DNNs.
- DNNs generalize poorly under non-i.i.d. settings (e.g., distortion types unseen during training).
format
> **paper title**
> *accept info*, [arxiv](), [review](), [website](), [code](), [summary]()
> Task:
>
> - super-brief summary