I'm sure you're asking yourself this when looking at this image – I certainly did.
In every reasonable image out there on the internet, Shiva doesn’t have horns. So why do they turn up in this image?
Answering this question needs a bit of a deep-dive into the behind-the-scenes of my work, so I hope you’ll bear with me.
You see, creating images with AI is a strange thing.
Sometimes, you get a great image right away.
Other times, try as you might, it just doesn't work out.
And sometimes, you get something almost-but-not-quite-there.
This image is a case of the last kind.
The basic process of generating images is rather simple
You enter some words (the prompt) describing what you’d like to see in the image and the AI delivers. To refine the image, or give the AI some direction, you can add a whole range of specifiers to the prompt, such as requesting a certain image format, or a higher or lower level of detail, or even to exclude certain elements. You can specify an art style (e.g. “oil painting”) or period (“baroque”) or even a specific artist (“Van Gogh”), and the AI will create the image you’re requesting in that style.
However, because of the way I work with AI – using only unedited original quotes from source texts – this wasn’t an option for me. Similarly, it would have been easy to modify the prompt to explain what I want Shiva to look like, or to add parameters to simply exclude horns. But these options would have required me to modify the prompt, and if at all possible, that is something I don’t want to do.
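For illustration, here is a minimal Python sketch of what adding such specifiers to a prompt amounts to. The `build_prompt` helper is hypothetical and only assembles text; the `--ar` (aspect ratio) and `--no` (exclusion) suffixes are assumed here to follow Midjourney's parameter syntax, so treat their exact spelling as an assumption rather than a reference.

```python
def build_prompt(quote, style=None, aspect_ratio=None, exclude=()):
    """Assemble a prompt string from a quote plus optional specifiers.

    Hypothetical helper for illustration; it sends nothing to any AI.
    """
    parts = [quote.strip()]
    if style:
        # A style, period, or artist is simply more words in the prompt.
        parts.append(f"in the style of {style}")
    text = ", ".join(parts)
    if aspect_ratio:
        # Assumed parameter syntax for the image format.
        text += f" --ar {aspect_ratio}"
    if exclude:
        # Assumed parameter syntax for excluding elements.
        text += " --no " + ", ".join(exclude)
    return text


# The quote on its own, untouched.
plain = build_prompt("Shiva")

# The same quote with a style, a format, and an exclusion added.
styled = build_prompt("Shiva", style="Van Gogh", aspect_ratio="3:2", exclude=["horns"])

print(plain)
print(styled)
```

The first call leaves the quote exactly as it is; the second shows how quickly the prompt stops being just the quote once specifiers are added.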
Shiva as a graffiti, in the style of Brueghel, and painted by Van Gogh
Sure, sometimes it’s fun to create Shiva in the style of Brueghel or Cubism or as a graffiti, but in most cases I like to keep my prompts as simple as possible, just playing with the quote itself.*
* Yes, sometimes I do switch or substitute words, such as “Consciousness” for “it” if it’s needed for context, but that happens maybe 1 in 10 cases. Also, in about half my images I do add some kind of style descriptor to make more colorful or expressive images.
Within my self-imposed limits, there was no way of getting the "horns" off Shiva, and I’m happy to live with them if they aren’t overpowering.
But why are there horns in the first place?
Shiva is an interesting example of the inherent prejudices of AI - prejudices that become embedded in the model during the training process AI goes through to be able to transform text into images.
If you ask for "Lord Shiva", or simply "Shiva", you usually get a stereotypical Shiva image with a mountain in the background, blue-skinned Shiva in the foreground with a crescent moon in his hair (or simply a night-time setting) and holding a trident. Sometimes there’s also a snake around his neck.
But more often than not, Shiva appears with horns. Adding descriptive text to the prompt can then influence the degree to which the horns are present, how vicious or cute they look, and their exact placement.
My guess is that the horns come from the trident or the moon morphing with Shiva, creating horn-like structures. Sometimes it’s the snake that appears to morph, which is noticeable when the horns have scales or snake eyes and tongues…
AI doesn't "understand"
This clearly shows limitations in the AI. Most importantly, it highlights that AI doesn’t really “understand” what it is being asked to do.
The reason for this lies in the way AI was created and how it actually processes language.
In the training process, Midjourney AI is fed with countless images that are tagged with words that describe their content. AI learns to associate certain visual patterns in the images with certain words in the tags. For example, when presented with images of different trees that are tagged with “tree”, it slowly learns the common elements of different trees (a trunk, branches, different leaves, flowers, fruit…).
Of course the AI doesn’t see these visual patterns the way we do. Instead, all images are translated into what is basically numbers and math. Then, it learns to associate the word “tree” with these formulas and coded patterns.
When you type a prompt asking for a tree, AI then does a kind of reverse search to figure out what code goes with your word, generates similar code for a new image, and spits out an image based on that code. Now, for simple, clear prompts (“a tree”) this is easy and gives good results fairly quickly. Any missing information (what kind of tree, where the tree is located, what the weather is like, a photograph or a watercolor painting) is extrapolated by the AI based on what it has learned about trees so far.
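The idea of words tied to numbers can be shown with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the three numbers per "image" are invented features, and averaging them is a stand-in for training. Real image generators are vastly more complex neural networks and do not work this literally.

```python
import random

# Toy "images": each one is a short list of invented numbers
# (say, how much trunk, foliage, and sky the image contains).
training_data = [
    ("tree", [0.9, 0.8, 0.3]),
    ("tree", [0.7, 0.9, 0.5]),
    ("tree", [0.8, 0.7, 0.4]),
    ("moon", [0.0, 0.1, 0.9]),
    ("moon", [0.1, 0.0, 1.0]),
]


def learn(data):
    """Average the numbers seen with each tag (a stand-in for training)."""
    sums, counts = {}, {}
    for tag, features in data:
        acc = sums.setdefault(tag, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[tag] = counts.get(tag, 0) + 1
    return {tag: [v / counts[tag] for v in acc] for tag, acc in sums.items()}


def generate(model, tag, rng, spread=0.05):
    """Produce new numbers close to the learned average for a tag."""
    return [v + rng.uniform(-spread, spread) for v in model[tag]]


model = learn(training_data)
new_tree = generate(model, "tree", random.Random(0))

print(model["tree"])  # what the toy model "knows" about trees
print(new_tree)       # a new, slightly different "tree"
```

Asking this toy model for a word it has never seen a tag for would simply fail with an error, which hints at why rare words are harder than "tree".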
But things get a lot more complicated when there are words involved that aren’t frequently part of tagged images. Or when there are a lot of words involved. Or when the words involved are associated with simple images that have multiple details that are all equally important.
Such as Shiva, who, to be recognizable as Shiva, needs blue skin, a trident, a crescent moon in his hair and some snakes.
Enter Shiva’s horns.
Florentine: Draw Shiva
AI: Okay, let me look in my database. Database, how do I draw Shiva?
Database: He needs blue skin.
AI: Easy, there you go.
D: He needs mountains.
AI: Super-easy, I love the Himalayas.
D: He needs a crescent moon in his hair.
AI: I know moons! Easy.
D: Not in the sky, in his hair!
AI: Oh. It doesn’t go in the sky? It’s in his hair? But why?
D: That’s just how it is. It’s always there. In all the pictures on the internet, Shiva has a crescent moon in his hair.
AI: Can’t I just make the hair moon-like? Or the moon hair-like? Why does it have to be separate?
D: Because…
AI: Oh, just be quiet will you! There. Let’s merge it with that strand of hair and we’re good. Anything else?
D: Yes, he needs a snake…
AI: Snakes…no, not snakes. Don’t like those.
D: He needs a snake around his neck.
F: Are you finished yet? Where’s my image?
D: Hurry up, she’s waiting!
AI: Coming! But, does it have to be around his neck?
D: Hurry!
AI: Can’t it just be part of the hair too?
D: No. Hurry up.
AI: Well, if I have to hurry up, then the snake goes into the hair along with the moon. There. Big moon-snake hair. Done.
D: No, you forgot the most important thing. He needs a trident. And hurry!
AI: Easy, here.
D: No, a trident needs three prongs, it’s not just a stick.
AI: Oh. Oh no. Not one of those three-stick things. I hate them. No-one’s ever satisfied when I create those. Can’t we just do one prong?
D: No, three.
AI: Two?
D: No, three. Hurry up, what’s taking so long?!
AI: But three is so many. I always get confused when there’s many of the same.
D: Three. And it’s really important. Three prongs and a BIG trident.
AI: Okay, I know big. I’ll make it big. Like, real big. Superhuman-big. Shiva-big.
D: Three prongs, remember?
AI: Yes, I know. Three. But I need a BIG trident. Let’s turn Shiva into a trident first.
D: Um, we do still need an actual Shiva, you know? The body and all…
AI: Oh, you’re so annoying. When will you shut up?
D: I’m just doing my job!
AI: Whatever you say. Let’s make the Shiva-trident a bit smaller.
(muttering to self:) I’ll just leave the horns there so you can see Shiva, and you can sort of see the trident.
(out loud:) Finished, I’m done!
F: Th....anks. Oh dear.
Practical Implications
Now, let’s apply this to the quote I used to create the initial image.
Possessing the central Power, replete with the flood of all the feeling-states existing within; Beautiful with the universal joy arising from essence-nature stimulated by innate desire; Beautiful with the nectar of complete creative emission; vibrating eternally -- *that* is the union of Shiva and Shakti. It's called Love. || 1.894-5
~ Mālinīślokavārttika, Abhinavagupta (transl. Christopher Hareesh Wallis & Ben Williams)
In addition to the whole problem with Shiva and His attributes, there are lots of words in this prompt that can’t really be represented in a single, simple image. “Innate desire”, “complete creative emission”, “vibrating eternally”, and so on. When asking AI to create an image using this prompt, it does what it always does: it searches its database, its code, for similar patterns that it was trained on and uses these patterns to come up with a new image.
For words it hasn’t learned, or heard before, it (probably) selects words that are as close as possible to the unknown words. Or it treats them as random noise that influences where in its database it goes searching for patterns.
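The "closest word" guess can be sketched in Python as a nearest-neighbour lookup. This is only an illustration of that guess, not a description of how Midjourney actually handles unfamiliar words: the word vectors below are invented, and cosine similarity is just one common way to measure closeness.

```python
import math

# Hypothetical word vectors, invented purely for illustration.
known_words = {
    "love": [0.9, 0.1, 0.2],
    "flood": [0.1, 0.9, 0.3],
    "joy": [0.8, 0.2, 0.3],
}


def cosine(a, b):
    """Cosine similarity: close to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_known(vector, vocabulary):
    """Return the known word whose vector is most similar to the given one."""
    return max(vocabulary, key=lambda word: cosine(vector, vocabulary[word]))


# Pretend this is the vector of an unfamiliar phrase such as "universal joy".
unfamiliar = [0.75, 0.25, 0.35]
match = closest_known(unfamiliar, known_words)
print(match)
```

With these invented numbers, the unfamiliar phrase lands nearest to "joy", so that is the pattern the toy lookup would fall back on.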
In the whole quote, there are only a few words that have a clearly defined subject associated with them: Shiva, Shakti, flood, joy, arising, and a few others. But many of these words also leave room for interpretation.
How would you draw or paint a picture to represent “love”?
Or “creative emission, vibrating eternally”?
Or “the feeling-states existing within”?
You’ll typically associate something with them and then draw something based on that association, be it a feeling, a memory, a song you heard, etc. AI does the same, in a way, with the database of code being what it can draw on for its associations.
And that’s how it goes about creating Shiva and Shakti and everything else.
The more complex the prompt, the more complicated things get, and the more frequent mess-ups become. And on the other hand, sometimes the AI just homes in on the one (or two, or three) element of the prompt it really understands and ignores the rest.
The AI Artist's Job
Then it’s my job to get it to take that into account too, by adding a few words, or by slightly simplifying the prompt (without losing its content) so the AI is less overwhelmed and can work with more elements.
By now, there are several different versions of Midjourney AI, each with different characteristics and strengths in their output, and each requiring a slightly different approach in prompting. It is possible to use pre-existing images in addition to the text prompt, or to work only by combining images to create new ones.
In order to get Shiva without the horns, one could simply add an image of Shiva and use it in addition to the text quote. Or one could use different versions of Midjourney AI to create the image – typically, newer versions are better at creating “realistic” images and have a better capacity for “understanding” (or rather, properly interpreting) text used in a prompt.
It’s a complex process, as changing too much can totally ruin an image, and changing too little won’t have any effect. Even though it seems to be very simple and straightforward, these small things can make it very challenging and time-consuming to create images with AI.
What you think is an "easy" image can turn out to be a huge mountain for the AI to climb, requiring lots of trial and error, dedication, creativity and study to be successful. And sometimes (more often than you'd like), you fail. Then, your only option is to abandon the image and maybe come back to it later, with new ideas, and try again.
In this way, I believe that AI is, and will always be, a tool that is dependent on the humans using it - and their creativity.