Have you ever found yourself staring at an image spotted somewhere on the internet and wondering whether it results from human or artificial creative labor? Since the breakthrough in image generation technology, I’ve been captivated by interior designs, nature scenery, fashion photographs, and futuristic cityscapes made with AI that look like professional artwork.
As someone with a creative eye yet no artistic skills, I was excited about the opportunity to transform ideas from my mind into visuals based solely on text input. I tried out several popular text-to-image tools, like Midjourney and DreamStudio on the web and ARTA on mobile. They all provided a good starting point and a straightforward user experience, but the results often fell short of my expectations and differed from what I envisioned, until I learned to tell AI what to draw in the language it understands.
If you’re new to AI-generated art and need assistance streamlining the creative process, the step-by-step guide below will help you compose concise and well-structured text prompts that AI models can effectively interpret and translate into the desired images. At the end, I will share an example of both a bad and a good prompt to illustrate how significantly they can impact the generation output in practice.
First things first, two common prompt-writing mistakes you should avoid when using any text-to-image generator are being too wordy and using complex grammar. In fact, the more words you incorporate and the more complicated your grammar, the more difficult it is for AI to understand your prompt and adequately depict your concept. Therefore, communicate concisely, carefully choosing keywords, and list your ideas explicitly and separately rather than trying to cram them into one sentence.
Ask yourself how you want your image to look overall. Do you see it as a photograph, illustration, painting, sketch, or other artwork type? This initial decision puts you in control of the final result and helps align all the elements with this general framework. For instance, choosing a sketch will typically produce pencil images, while opting for a painting will add colors and likely create a more vintage appearance.
AI image generators require a specific subject for rendering, so you must indicate the main subject of your future image, whether a person, an animal, or a landscape. Stick to one subject per prompt and opt for concrete nouns like girl, dog, train, rose, ocean, temple, etc., for more accurate results. You can most likely render something using abstract keywords like love, peace, or joy, but the results may appear very inconsistent in what they depict.
Now, it’s time to detail the concept with modifiers: adjectives or attributive nouns that refine the meaning of the main subject. Modifiers can completely change the perspective of your image and boost its overall quality, so the more descriptive you are, the better the generated output. Define what the subject looks like, what the subject is doing, and other characteristics essential to representing your concept.
To help AI create the exact atmosphere and surroundings you need, give attention to aspects like lighting (studio, soft, cinematic, volumetric, accent lighting, sunlight, frontlight, backlight, ambient, moonlight, etc.), color theme (pastel, vibrant, neon pink, dark, etc.), environment/scene (in the sky, indoor, underwater, at night, etc.), and background (forest, beach, solid color, nebula, etc.). Additionally, consider the mood and vibe you want to convey. Simple feeling modifiers can set the scene’s atmosphere, whether you opt for positive (energetic, cozy, romantic, satisfaction, etc.) or negative (depressive, loneliness, regret, fear, etc.) ones.
Add a point of view to determine the angle from which you are looking at the image, such as side, overhead, or front. Other essential modifiers for creating photo-like images involve lenses and cameras. Here are a few examples: close-up, macro, lens flare, wide angle, microscopic, bokeh, isometric, depth of field, ultra-wide angle, fisheye lens, and panorama.
Proceed to aesthetic settings and define the overall artistic style. Also, specify the art medium. These aspects can greatly contribute to the value of the generated output, making it more consistent and expressive. Here are some of the common keywords: geometrical art, pop art, watercolor, line drawing, line art, 3D sculpture, crayon drawing, painting, oil painting, pastel drawing, medieval art, photography, digital art, stained glass, cave painting, polaroid, vector art, retro, illustration, comics, editorial fashion photography, Japanese anime style, surreal, travel photography, pencil sketch, steampunk, cartoon, futurism, vaporwave, art deco, interior design, psychedelic, street art.
Consider including artistic influences. Adding the name of a specific artist whose signature artwork aligns with your chosen art genre can enhance the final image. Note, however, that finding the right artist is crucial so as not to mess up the output. Some artists affect the generation only subtly, while others can alter it significantly depending on their distinctive manner. For example, Claude Monet can infuse impressionism into landscapes, while mentioning Salvador Dali adds a surrealistic touch.
Certain words can make a dramatic difference in the outcomes of the generation. Keywords like highly detailed, ultra-realistic, 4K, 8K, UHD, and HDR are game changers for the overall quality of the image. Adding “studio lighting” will introduce appealing textures to the image, while “accent lighting” can draw attention to specific features. Describing the image as “professional” can enhance color contrast and detail, while mentioning “vivid colors” will bring life to your image. Using terms like ultra-wide angle, panoramic, or long shot will help guide the camera.
Packing your prompt with phrases related to artistic techniques and materials (such as trending on artstation, precise line-art, golden ratio, fine art, etc.) and those associated with artistic value (such as most beautiful image ever seen, epic composition, etc.) can also bring sophisticated improvements.
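To make the building blocks above easier to reuse, here is a minimal Python sketch that assembles them into one comma-separated prompt. The class name, field names, and grouping are my own illustrative assumptions, not part of any generator's API; the result is just a text string you would paste into the tool of your choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptParts:
    """Hypothetical container for the building blocks of a prompt."""
    image_type: str                                        # e.g. "photo", "sketch"
    subject: str                                           # one concrete subject
    modifiers: List[str] = field(default_factory=list)     # looks, actions
    atmosphere: List[str] = field(default_factory=list)    # lighting, colors, scene
    viewpoint: List[str] = field(default_factory=list)     # angle, lens, camera
    style: List[str] = field(default_factory=list)         # style, medium, artist
    boosters: List[str] = field(default_factory=list)      # "highly detailed", "4K"

    def build(self) -> str:
        """Join the non-empty parts into one comma-separated prompt."""
        head = f"{self.image_type} of {self.subject}"
        tail = (self.modifiers + self.atmosphere + self.viewpoint
                + self.style + self.boosters)
        keywords = [head] + [k.strip() for k in tail if k.strip()]
        return ", ".join(keywords)


parts = PromptParts(
    image_type="realistic photo portrait",
    subject="a woman looking at the camera",
    modifiers=["red hair", "blue eyes"],
    atmosphere=["pink neon city background"],
    viewpoint=["close-up"],
    style=["neon punk"],
    boosters=["highly detailed"],
)
print(parts.build())
```

Keeping each idea as a separate list item nudges you toward short, explicit keywords instead of one long sentence.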
Negative prompts are parameters that inform the generation model about what you do not want to see in the final image. They help prevent AI from generating specific elements, fix image abnormalities, and improve overall quality. Here are some universal examples: blurry, cropped, cloned face, duplicate, error, ugly, wrong proportions, deformed, fused fingers, extra arms, low quality, low resolution, watermark, and more.
The order of keywords is almost as important as the vocabulary in guiding the AI toward the desired output. Combine keywords to direct the focus of your prompt, and mark the key phrases that should carry the most weight. In tools that support it, you can emphasize essential keywords with a colon and a decimal number, use parentheses for added importance, and use square brackets for less importance.
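As a rough illustration, a negative prompt can be kept as a reusable keyword list and joined into a string next to the main prompt. The helper name and the dictionary keys below are hypothetical placeholders; how a negative prompt is actually passed depends on the generator you use, and some tools do not support one at all.

```python
def build_negative_prompt(unwanted):
    """Join unwanted keywords into one string, dropping blanks and duplicates."""
    seen = []
    for keyword in unwanted:
        cleaned = keyword.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


# Hypothetical request layout: the key names are assumptions for illustration.
request = {
    "prompt": "photo of a red rose, soft lighting, macro",
    "negative_prompt": build_negative_prompt(
        ["blurry", "watermark", "Blurry", "cropped"]
    ),
}
print(request["negative_prompt"])  # blurry, watermark, cropped
```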
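The emphasis markers can be sketched with three small helpers. The notation below follows the description above and resembles what some Stable Diffusion front ends accept; other generators use different syntax or none, so treat the exact format as an assumption and check your tool's documentation. The function names are my own.

```python
def emphasize(keyword: str, weight: float) -> str:
    """Attach an explicit weight, e.g. 'red hair' -> '(red hair:1.3)'."""
    return f"({keyword}:{weight:g})"


def stronger(keyword: str) -> str:
    """Wrap a keyword in parentheses to give it more importance."""
    return f"({keyword})"


def weaker(keyword: str) -> str:
    """Wrap a keyword in square brackets to give it less importance."""
    return f"[{keyword}]"


prompt = ", ".join([
    "portrait of a woman",
    emphasize("red hair", 1.3),
    stronger("neck tattoo"),
    weaker("city background"),
])
print(prompt)
# portrait of a woman, (red hair:1.3), (neck tattoo), [city background]
```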
Typically, the earlier a word appears in the sentence, the more importance it is given. If the AI model didn’t take in some key phrase and failed to depict it in the generated image, try rearranging the order of words, placing the missing element earlier in the prompt.
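Reordering is easy to try systematically. The sketch below moves a missing keyword to the position right after the subject in a comma-separated prompt; the function name and the choice of position are illustrative assumptions, and whether the change helps depends on the model.

```python
def promote(prompt: str, keyword: str, position: int = 1) -> str:
    """Move `keyword` earlier in a comma-separated prompt.

    Position 0 is the first phrase (usually the subject), so the
    default of 1 places the keyword right after it.
    """
    parts = [p.strip() for p in prompt.split(",") if p.strip()]
    if keyword in parts:
        parts.remove(keyword)
        parts.insert(position, keyword)
    return ", ".join(parts)


original = "portrait of a woman, pink neon city background, highly detailed, neck tattoo"
print(promote(original, "neck tattoo"))
# portrait of a woman, neck tattoo, pink neon city background, highly detailed
```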
Bad prompt: portrait of a woman looking directly at the camera, neck tattoo, urban pink background.
Good prompt: head and shoulders realistic photo portrait of a beautiful woman looking at the camera, white skin, pink lips, red hair, blue eyes, black tattoos on her neck, neon punk, highly detailed, pink neon city background.
Creating effective prompts requires both artistic imagination and technical skills. By including the necessary building blocks and following the proper prompt structure, you help the text-to-image generation model understand your vision and produce the best possible output. However, don’t be afraid to experiment; the more you practice, the more adept you become, and the more remarkable your creations will be.