After experimenting with online sources, then running Stable Diffusion locally using Windows Subsystem for Linux, I wanted more, and better, because I knew my machine was capable of much more. So I looked into alternatives and found Automatic1111’s Stable Diffusion variant.
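As an aside, the web UI can also be started with the --api flag, which exposes a small REST API on the same local port, so you can drive it from a script instead of the browser. Here's a rough sketch of what that looks like from Python; the endpoint comes from the project's API, but the prompt, step count, and file names are just example values:

```python
import base64
import requests

# Assumes the web UI was launched with the --api flag and is
# listening on its default address.
URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

payload = {
    "prompt": "A Portrait of Gandalf",
    "steps": 20,
    "width": 512,
    "height": 512,
    "seed": -1,  # -1 = random seed
}

response = requests.post(URL, json=payload)
response.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, img_b64 in enumerate(response.json()["images"]):
    with open(f"gandalf_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```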
The core takeaway here is that the difference is night and day in performance and quality.
Previously, with WSL, I would run batches of prompts and seeds and maybe get a few okayish results. Also, any dimensions larger than the base 512×512 would crash the thing and I'd get nothing, so it definitely was not exploiting the machine's full potential. It also completely bogged my entire rig down while building an image, and the actual processing took maybe 5-10 minutes per image.
It still bogs the machine down, but not nearly as much as it had been. And it takes like 10-20 SECONDS to produce an image. The image quality is also like 1000 times better, though still not without that "AI Art Wonkiness", like these weird square cows.
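Incidentally, that "batches of prompts and seeds" workflow is easy to picture in code. Here's a minimal sketch using Hugging Face's diffusers library, not my actual script; the model name, prompts, and seed values are just example placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (example model; swap in
# whatever checkpoint you actually use) and move it to the GPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = ["A Portrait of Gandalf", "Gandalf battling the Balrog"]
seeds = [101, 202, 303]  # fixed seeds make runs repeatable

# Generate every prompt/seed combination at the base 512x512 size.
for prompt in prompts:
    for seed in seeds:
        generator = torch.Generator("cuda").manual_seed(seed)
        image = pipe(prompt, width=512, height=512,
                     generator=generator).images[0]
        image.save(f"{prompt.replace(' ', '_')}_{seed}.png")
```

The nice part of fixing the seeds is that a good result is reproducible, so you can re-run a favorite seed later with tweaked settings.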
Or these Lord of the Rings based prompts, "A Portrait of Gandalf" and "Gandalf battling the Balrog".
It does still have its weird tics. I did some city scenes of Chicago and New York. New York, maybe 50% of the time, is like, all taxis (which is probably accurate), and Chicago always ends up with several John Hancock buildings.
I also have a few prompts I keep coming back to for testing. One of them is Godzilla, and Kaiju and Evangelion battles in general.
Or these generic Final Fantasy Characters, who all feel like they would fit right into a current era game.
One of the main things I have noticed is that the results with this system are much "fuller". It's not just a random image on a bare backdrop; it does a much better job of filling the entire space. It also handles faces MUCH MUCH better, though it still often has trouble with hands and multiple arms. That happens a lot less frequently though. It's all light years ahead of the old DALL*E Mini results. Every single time I used a prompt that would produce a face on DALL*E Mini, it was a weird Eldritch horror or hidden in shadow. The local instance, and especially the Auto1111 version, produces pretty good faces every time.
Prompt: A Normal Human Hand
It does a lot better at generic faces though. You can add celebrity names to prompts and get really consistent results, but a lot of "celebrity prompts" produce weird caricature versions of celebrities. It also only really recognizes well-known celebrities, and I am pretty sure this has to do with the amount of training data it's been fed. Taylor Swift, for example, comes out pretty spot on, but I am pretty sure she has a huge number of images in the original training set. Sigrid, even "Sigrid Raabe", doesn't result in anything that looks like Sigrid.
Or Harry Potter, which seems to produce some weird, goony-looking Newt Scamanders, but does pretty well with Hermione and Harry. Interestingly, the Harry results tend to look like Daniel Radcliffe, but the Hermione results tend to come off more like what the book version of the character might look like.
The whole process can still be improved upon. This setup will give plenty of larger, nicer images, but they still aren't quite "amazing". The next step was to learn to use better prompts, and so I set out to see what kind of prompts other people used to get that "Extra Pop".
Which will be the topic of Part 4.
Josh Miller aka “Ramen Junkie”. I write about my various hobbies here. Mostly coding, photography, and music. Sometimes I just write about life in general. I also post sometimes about toy collecting and video games at Lameazoid.com.