Last night, I had an interesting thought. Many times I work with images that have vague filenames. For example, screenshot_1_24_12_23.jpg
. Given that there are many APIs out there that can look at an image and provide a summary, what if we could use that to provide a better file name based on the content of the image? Here's what I was able to find.
As always, I began by prototyping in Google AI Studio. I apologize for stating this in basically every post on the topic, but I really want to stress how useful that is for development.
I used a very simple prompt:
Write a one sentence short summary of this image. The sentence
should be no more than five words.
And then did a quick test:
If it's a bit hard to read in the screenshot, the result was:
A Mardi Gras king cake.
Which is absolutely perfect. So, I took the code from this prompt and whipped up the following script:
import fs from 'fs/promises';
import 'dotenv/config';
import slugify from '@sindresorhus/slugify';
import { GoogleGenerativeAI, HarmCategory, HarmBlockThreshold } from '@google/generative-ai';
const MODEL_NAME = "gemini-pro-vision";
const API_KEY = process.env.GOOGLE_AI_KEY;
const SOURCE = './source/';
const OUTPUT = './output/';
async function getImageSummary(path) {
const genAI = new GoogleGenerativeAI(API_KEY);
const model = genAI.getGenerativeModel({ model: MODEL_NAME });
const generationConfig = {
temperature: 0.4,
topK: 32,
topP: 1,
maxOutputTokens: 4096,
};
const safetySettings = [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
];
// ToDo potentially - make mimeType actually check the image type
const parts = [
{text: "Write a one sentence short summary of this image. The sentence should be no more than five words.\n\n"},
{
inlineData: {
mimeType: "image/jpeg",
data: Buffer.from(await fs.readFile(path)).toString("base64")
}
},
{text: "\n"},
];
const result = await model.generateContent({
contents: [{ role: "user", parts }],
generationConfig,
safetySettings,
});
// This assumes a good response. Never assume.
return result.response.candidates[0].content.parts[0].text.trim();
}
const files = await fs.readdir(SOURCE);
for(let file of files) {
console.log(`Processing ${file}`);
let result = await getImageSummary(SOURCE + file);
// again, assumes jpg
let newname = OUTPUT + slugify(result) + '.jpg';
console.log(`Copying to ${newname}`);
await fs.copyFile(SOURCE + file, newname);
}
console.log('Done');
The main part is the getImageSummary
function which is a modified version of the code AI Studio output. As you can see, I do kinda assume .jpg
only, which isn't great, and I don't handle errors at all, but for a quick test, it's fine.
After the function, I basically list the files from a source directory, get the summary, and use the excellent @sindresorhus/slugify
package to translate the sentence into a slug for use in renaming.
Now, one thing to keep in mind is that if you run this script multiple times, you will get multiple results with slightly different summaries. You could wipe the output directory before running perhaps.
So, how well did it work?
I tested with these five images. Below each image, you can see their original filename and how they were renamed.
20240107_120237.jpg to an-nfl-game-between-the-saints-and-the-falcons.jpg
20240108_152420.jpg to cat-interrupts-work-day.jpg
20240113_193655.jpg to a-small-cake-with-yellow-frosting.jpg
20240120_170218.jpg to cat-sits-in-a-box.jpg
20240122_184642.jpg to a-mardi-gras-king-cake.jpg
Honestly, these feel perfect. I will say the "cat-interrupts-work-day" one is... perhaps a bit more funny than practical. I think that would be a fair criticism, but in general, this worked incredibly well.
If you want the full source code with the images as well, you can grab it here: https://github.com/cfjedimaster/ai-testingzone/tree/main/rename_images
Let me know what you think in the comments below.