New AI Dataset Pushes Boundaries While Tackling Challenges in Ethics and Precision

by @autoencoder

Too Long; Didn't Read

Researchers at the Mohamed bin Zayed University of AI developed an AI model that can generate text-based conversations tied to specific objects or regions in an image.

Authors:

(1) Hanoona Rasheed, Mohamed bin Zayed University of AI (equally contributing first author);

(2) Muhammad Maaz, Mohamed bin Zayed University of AI (equally contributing first author);

(3) Sahal Shaji, Mohamed bin Zayed University of AI;

(4) Abdelrahman Shaker, Mohamed bin Zayed University of AI;

(5) Salman Khan, Mohamed bin Zayed University of AI and Australian National University;

(6) Hisham Cholakkal, Mohamed bin Zayed University of AI;

(7) Rao M. Anwer, Mohamed bin Zayed University of AI and Aalto University;

(8) Eric Xing, Mohamed bin Zayed University of AI and Carnegie Mellon University;

(9) Ming-Hsuan Yang, University of California - Merced and Google Research;

(10) Fahad S. Khan, Mohamed bin Zayed University of AI and Linköping University.

Editor's Note: This is Part 10 of 10 of a study detailing the development of an AI model that is designed to describe images to users. Read the rest below.


Supplementary Material (Part 1)


Supplementary Material (Part 2)

D. Dataset Visualization

In this section, we provide additional samples from our GranD and GranDf datasets to better illustrate the functionalities they offer. Please see Fig. 14 and Fig. 15.

E. Limitations and Future Work

Our large-scale automated pipeline provides dense labels that are important for pretraining, but these labels still contain some noise. A high-quality, clean dataset could further improve the pretrained representations, although it would come at a significantly higher annotation cost. One potential research direction is to develop a cost-effective annotation pipeline that reduces noise in dense labeling. Another is to expand the GLaMM framework to additional modalities such as video and 3D.

F. Ethics and Societal Impact

Our Grounding-anything Dataset (GranD) uses SAM images in which personal information has been de-identified, with all faces and license plates obscured. To the best of our knowledge, the dataset does not portray any strong biases or discrimination. We urge the responsible use of GranD and GLaMM, promoting research progress while safeguarding privacy.



This paper is available on arXiv under the CC BY 4.0 DEED license.