SEAGULL

Zewen Chen¹,², Juan Wang¹, Wen Wang³, Sunhan Xun⁴, Hang Xiong³, Yun Zeng⁵, Jian Guo⁴, Shuxun Wang¹,
Chunfeng Yuan¹, Bing Li¹,⁶, Weiming Hu¹,²,⁷

{chenzewen2022, jun_wang}@ia.ac.cn, bli@nlpr.ia.ac.cn

¹ State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA
² School of Artificial Intelligence, University of Chinese Academy of Sciences
³ Beijing Jiaotong University; ⁴ Beijing Union University; ⁵ China University of Petroleum
⁶ PeopleAI Inc., Beijing, China
⁷ School of Information Science and Technology, ShanghaiTech University

TL;DR: We propose a novel network (SEAGULL) and construct two datasets (SEAGULL-100w and SEAGULL-3k) to achieve fine-grained IQA for arbitrary ROIs.

ABSTRACT

Existing Image Quality Assessment (IQA) methods achieve remarkable success in analyzing the quality of an overall image, but few works explore quality analysis for Regions of Interest (ROIs). Quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial for scenarios that focus on region-level quality. This paper proposes a novel network, SEAGULL, which can SEe and Assess ROI quality with GUidance from a Large vision-Language model. SEAGULL combines a vision-language model (VLM), masks generated by the Segment Anything Model (SAM) to specify ROIs, and a carefully designed Mask-based Feature Extractor (MFE) that extracts global and local tokens for the specified ROIs, enabling accurate fine-grained IQA for ROIs. This paper also constructs two ROI-based IQA datasets, SEAGULL-100w and SEAGULL-3k, for training and evaluating ROI-based IQA. SEAGULL-100w comprises about 100w (roughly one million) synthetically distorted images with 33 million ROIs, used for pre-training to improve the model's regional quality perception. SEAGULL-3k contains about 3k ROIs with authentic distortions, used to enhance the model's ability to perceive real-world distortions. After pre-training on SEAGULL-100w and fine-tuning on SEAGULL-3k, SEAGULL shows remarkable performance on fine-grained ROI quality assessment.

Figure: The framework of SEAGULL.

APPLICATIONS

1. Apply to videos for ROI quality monitoring, in combination with a tracking algorithm.

2. Apply to images to select compression-algorithm parameters that preserve the quality of ROIs.
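
The second application can be sketched as a search over encoder settings: try candidates from the most aggressive to the most conservative and keep the first one whose ROI quality stays above a target. Every name below (`select_setting`, `encode`, `score_roi`, the toy helpers, the 0-to-1 score scale, and the threshold) is a hypothetical placeholder and not part of the SEAGULL code; in practice `score_roi` would wrap a region-level quality model such as SEAGULL.

```python
from typing import Callable, Iterable, Optional, TypeVar

Image = TypeVar("Image")
Setting = TypeVar("Setting")


def select_setting(
    image: Image,
    settings: Iterable[Setting],
    encode: Callable[[Image, Setting], Image],
    score_roi: Callable[[Image], float],
    min_score: float,
) -> Optional[Setting]:
    """Return the first setting whose encoded result keeps ROI quality >= min_score.

    `settings` should be ordered from most to least aggressive compression,
    so the first match is also the most compressed acceptable output.
    Returns None when no candidate meets the target.
    """
    for setting in settings:
        compressed = encode(image, setting)
        if score_roi(compressed) >= min_score:
            return setting
    return None


# Toy stand-ins so the sketch runs on its own (assumptions, not real codecs):
# the "image" is just a number, and quality scales linearly with the setting.
def toy_encode(image: float, quality: int) -> float:
    return image * quality / 100


def toy_score_roi(image: float) -> float:
    return image


chosen = select_setting(1.0, [10, 30, 50, 70, 90], toy_encode, toy_score_roi, 0.6)
print(chosen)  # 70
```

Passing the encoder and the scorer in as callables keeps the search independent of any particular codec or quality model, so the same loop could score ROIs frame by frame in the video-monitoring case.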

DEMONSTRATION 🎥

DOWNLOAD

Please raise an issue here if you encounter any problems.

Database

SEAGULL-100w
8,156 reference images
6 distortion types
20 distortion levels
About 980,000 (98w) distorted images
33 million mask-based ROIs

SEAGULL-3k
968 distorted images
3,261 mask-based ROIs
9,783 annotations
Annotations of mixed distortions
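
The dataset figures above are mutually consistent, which a short arithmetic check makes visible. This is only a sketch: it assumes (the source does not say so) that every reference image is rendered with every distortion type at every level, and it reads "w" as the common shorthand for 10,000.

```python
# Figures copied from the dataset summary.
ref_images = 8_156
dist_types = 6
dist_levels = 20
rois_3k = 3_261
annotations_3k = 9_783

# Assumption: every reference image gets every (type, level) combination.
dist_images = ref_images * dist_types * dist_levels
print(dist_images)  # 978720, i.e. about 98 x 10,000 ("98w")

# Average number of annotations per ROI in SEAGULL-3k.
annotations_per_roi = annotations_3k / rois_3k
print(annotations_per_roi)  # 3.0
```

Under that assumption the product matches the reported count of distorted images, and the SEAGULL-3k counts work out to three annotations per ROI on average.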

Code

Paper

arXiv

ACKNOWLEDGEMENT 💌

Osprey and LLaVA-v1.5: This repository is built on them.
RAISE: The distorted images in SEAGULL-100w are constructed from this dataset.
SAM and SEEM: The mask-based ROIs are generated with these two awesome works. SAM is also used to produce the segmentation results in the demo.
TOPIQ: The quality scores and importance scores for ROIs are generated with this great full-reference IQA (FR-IQA) method.