How to Use GenAI to Classify an Image as a Photo, Screenshot, or Meme

by Raymond Camden, February 7th, 2024


File this under the "I wasn't sure if it would work and it did" category. Recently, a friend on Facebook wondered if there was some way to take a collection of photos and figure out which were 'real' photos versus memes. I thought it could possibly be a good exercise for GenAI and decided to take a shot at it. As usual, I opened up Google's AI Studio and did a few initial tests:


Screenshot from AI Studio

I then removed that image and pasted in more images to test. From what I could see, it worked well enough, so I took the source code from AI Studio and began working.

The Code

First, I grabbed eleven pictures from my collection, aiming for a mix of photos, memes, and screenshots. To make things easier for me, after downloading them I renamed them so it would be quicker to see whether each result was right.


As I mentioned above, AI Studio gave me the code, but I modified it slightly so I could pass a directory of images:


import fs from 'fs/promises';
import 'dotenv/config';

import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';

const MODEL_NAME = "gemini-pro-vision";
const API_KEY = process.env.GOOGLE_AI_KEY;


async function detectPhoto(path) {
  const genAI = new GoogleGenerativeAI(API_KEY);
  const model = genAI.getGenerativeModel({ model: MODEL_NAME });

  const generationConfig = {
    temperature: 0.4,
    topK: 32,
    topP: 1,
    maxOutputTokens: 4096,
  };

  const safetySettings = [
    {
      category: HarmCategory.HARM_CATEGORY_HARASSMENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
    {
      category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
      threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    },
  ];

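  const parts = [
    {text: "Look at the following photo and tell me if it's a photo, a screenshot, or a meme. Answer with just one word.\n"},
    {
      inlineData: {
        // Note: the MIME type is hard-coded, so this assumes every input is a JPEG.
        mimeType: "image/jpeg",
        // readFile can return the file contents already base64-encoded.
        data: await fs.readFile(path, { encoding: "base64" })
      }
    },
    {text: "\n\n"},
  ];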

  const result = await model.generateContent({
    contents: [{ role: "user", parts }],
    generationConfig,
    safetySettings,
  });

  const response = result.response;
  return response.text();
}

const root = './source_for_detector/';
const files = await fs.readdir(root);
for (const file of files) {
  console.log(`Check to see if ${file} is a photo, meme, or screenshot...`);
  const result = await detectPhoto(root + file);
  console.log(result);
}


It worked perfectly!


Terminal output from script
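
Because the files were renamed to make the results easy to eyeball, the same check could be automated. The snippet below is only a sketch, not part of the original script: it assumes each file name starts with its category (for example `meme1.jpg`), which is a hypothetical naming scheme, and the helper names `normalizeLabel`, `expectedLabel`, and `isCorrect` are made up for illustration. It also tidies the model's reply, since a one-word answer can still arrive with different casing, trailing punctuation, or whitespace.

```javascript
// Labels the prompt asks the model to choose between.
const LABELS = ["photo", "screenshot", "meme"];

// Reduce a raw model reply such as " Meme.\n" to one of LABELS,
// or null when the reply is not a single recognizable label.
function normalizeLabel(reply) {
  const word = reply.trim().toLowerCase().replace(/[^a-z]/g, "");
  return LABELS.includes(word) ? word : null;
}

// ASSUMPTION: each file name starts with its category (e.g. "meme1.jpg").
// This naming scheme is hypothetical; adjust it to match your own files.
function expectedLabel(fileName) {
  const lower = fileName.toLowerCase();
  return LABELS.find((label) => lower.startsWith(label)) ?? null;
}

// True when the model's reply matches the label implied by the file name.
function isCorrect(fileName, reply) {
  const expected = expectedLabel(fileName);
  return expected !== null && expected === normalizeLabel(reply);
}

console.log(isCorrect("meme1.jpg", " Meme.\n")); // true
console.log(isCorrect("photo2.jpg", "Screenshot")); // false
```

Inside the loop, something like `isCorrect(file, result)` could then print a match or mismatch next to each file instead of relying on a visual scan of the output.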

If you want a copy of the full script, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/detect_meme_ss
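
One thing to watch if you point the script at your own folder: it sends every file as `image/jpeg`, and screenshots in particular are often saved as PNGs. A small lookup like the one below could pick the MIME type from the file extension instead. This is a sketch under assumptions: `mimeTypeFor` is a hypothetical helper, and the list of formats is what I believe the Gemini API accepts, so check the current documentation before relying on it.

```javascript
// Extension-to-MIME lookup. ASSUMPTION: this list of accepted image
// formats should be verified against the current Gemini API docs.
const MIME_TYPES = new Map([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["webp", "image/webp"],
]);

// Return the MIME type for a file path based on its extension,
// or null when the extension is missing or not in the lookup.
function mimeTypeFor(filePath) {
  const dot = filePath.lastIndexOf(".");
  const ext = dot === -1 ? "" : filePath.slice(dot + 1).toLowerCase();
  return MIME_TYPES.get(ext) ?? null;
}

console.log(mimeTypeFor("./source_for_detector/shot1.PNG")); // image/png
console.log(mimeTypeFor("notes.txt")); // null
```

In `detectPhoto`, the hard-coded value could then become `mimeType: mimeTypeFor(path)`, and the loop could skip any file for which the helper returns `null`, which also keeps stray non-image files in the directory from being sent to the model.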

The Photos

Ok, technically you can just head over to the GitHub repo to see these, but here are the source images. First, the 'regular' photos:


Cat lying on a desk next to a computer mouse


Display case that says 'invisible snake'


Picture from a football game


Two cats on a chair


Next, the screenshots:


Screenshot from Reddit app


Screenshot from walmart.com, Nebulon-B Frigate LEGO


Screenshot from OneNote, a list of shows to watch


And finally, the memes. Enjoy.


Time's Person of the Year - Godzilla


Vote Cobra


Who is Cobra Commander - I mean really...


Brace yourself - winter is coming. The entire thing. All at once. In one weekend.