DALL*E

A Progressive Journey Working With AI Art – Part 5 – Training the AI

I’ve had a bit of a pause on this series for a few reasons, mostly because the process is slow. One of the interesting things you can do with Stable Diffusion is train your own models. The thing is, training models takes time. A LOT of time. I have only trained Embeddings. I believe Hypernetwork training takes even longer, and I am still not entirely sure what the difference is, despite researching it a few times. The results I’ve gotten have been hit and miss, and for reasons I have not entirely pinned down, they seem to have gotten worse over time.

So how does it work? Basically, at least in the Automatic1111 version of SD I’ve been using, you create the Embedding file, along with the prompt you want to use to trigger it. My advice on this: make the trigger something unique. If I train a person, like a celebrity, I will use the full name with an underscore between first and last name, so it is differentiated from anything the base model already knows about that person. I am not famous, but as an example, “Ramen Junkie” would become “Ramen_Junkie”. So when I want to trigger it, I can do something like, “A photograph of ramen_junkie in a forest”.
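As a rough sketch of the naming trick above: this is just string handling, not part of Automatic1111, and the helper names `make_trigger` and `build_prompt` are made up for illustration. Lowercasing the trigger is my own choice to match the example prompt; I have not checked whether embedding names are case sensitive.

```python
import re


def make_trigger(name: str) -> str:
    """Turn a display name into one underscore-joined trigger word."""
    # Collapse each run of whitespace into a single underscore, then lowercase
    # (lowercasing is an assumption, chosen to match the example prompt).
    return re.sub(r"\s+", "_", name.strip()).lower()


def build_prompt(template: str, trigger: str) -> str:
    """Drop the trigger word into a prompt template with a {subject} slot."""
    return template.format(subject=trigger)


trigger = make_trigger("Ramen Junkie")
prompt = build_prompt("A photograph of {subject} in a forest", trigger)
print(prompt)  # A photograph of ramen_junkie in a forest
```

The point of the underscore is only that the result is a single unusual token, so it is less likely to collide with a name the base model already knows.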

This method definitely works.

Some examples. If I use Stable Diffusion with “Lauren Mayberry” from CHVRCHES, I get an image like this:

Which certainly mostly looks like her, but it’s clearly based on some older images. After training an Embedding for “Lauren_Mayberry” using some more recent photos, I can get images like this:

Which are a much better match, especially for how she looks now.

Anyway, after setting up the prompt and embedding file name, you preprocess the images, which mostly involves pointing the system at a folder of images so it can crop them to 512×512. There are some options here. I usually let it create mirrored copies of the images, so it gets more data, and for people, I will use the auto focal point option, which theoretically picks out faces.
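To give a feel for what the cropping step has to work out, here is a minimal sketch of a plain center crop. This is an assumption for illustration only: it is not Automatic1111's code, it ignores the focal point option entirely, and `center_square_box` and `training_image_count` are hypothetical names.

```python
def center_square_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def training_image_count(n_images: int, include_mirrored: bool = True) -> int:
    """Mirroring every image doubles how many samples the trainer sees."""
    return n_images * 2 if include_mirrored else n_images


print(center_square_box(1920, 1080))  # (420, 0, 1500, 1080)
print(training_image_count(25))       # 50
```

The square that comes out of this would then be scaled down to 512×512. A face-aware crop would shift the box toward the detected face instead of always using the center.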

The last step is the actual training. Select the created Embedding from the drop down, enter the folder of the preprocessed images, then hit “Train Embedding”. This takes a LONG time. In my experience, on my pretty beefy machine, it takes 11-12 hours. I almost always leave this to run overnight, because it also puts a pretty heavy load on everything, so anything except basic web browsing or writing is not going to work at all. Definitely not any sort of gaming.

The main drawback of the long run time is that it often fails. I’m not entirely sure WHY it sometimes fails. Sometimes you get bad results, which I can understand, but the failures just leave cryptic error messages, usually involving CUDA. I also believe it sometimes crashes the PC, because occasionally I check on it in the morning and the PC has clearly rebooted (no open windows, Steam/etc all start up). I generally keep my PC up to date, so it’s not a Windows Update problem. Sometimes, if the same data set fails repeatedly, I’ll go through and delete some of the less ideal images, in case there is some issue with the data set.

Speaking of data sets, the number of images needed is not super clear either. I’ve done a few with a dozen images, and I’ve done some with 500 images, just to see what kind of different results I can get. The larger data sets actually seemed to produce worse results. I suspect that with a larger data set, it can’t pull out the nuances it picks up from a smaller number of images. Also, at least one large data set I tried was just a series of still frames from a video, and the results there were ridiculously cursed. My point is mostly that a good middle ground seems to be 20-30 base images, with similar but not identical styles. For people, clear faces help a lot.

I have tried to do training on specific styles but I have not had any luck on that one yet. I’m thinking maybe my data sets on styles are not “regular” enough or something. I may still experiment a bit with this, I’ve only tried a few data sets. For example I tried to train one on the G1 Transformers Cartoon, Floro Dery art style, but it just kept producing random 3D style robots.

For people, I also trained it on myself, which I may use a bit more for examples in a future post. It came out mostly OK, other than AI Art me being a lot skinnier and a lot better dressed. I have no idea why, but every result is wearing a suit. I did not ask for a suit and I don’t think I was wearing a suit in any of the training images. Also, you might look at them and think “the hair is all over”, but I am real bad about fluctuating from “recent haircut” to “desperately needs a haircut” constantly. The hair is almost the MOST accurate part.

Anyway, a few more samples of Stable Diffusion Images built using training data.

A Progressive Journey Through Working With AI Art – Part 4 – Better Prompts

The next step in my journey to better AI Art was better prompts. This has also sort of landed me on just using one complex prompt I found and modifying it as needed, which works very well. I started off by adding more descriptive words to the basic prompts, including camera models, which quite a few people suggested.

  • “In the Style of Manga”
  • “An oil Painting Of”
  • “A Pencil Sketch of”
  • “In the style of [artist]”
  • “Realistic”
  • “Hyper-realistic”
  • “Canon 5D”

This worked better, but I started looking around on the Stable Diffusion subreddits for good prompts to use. I came across the following prompt:

, (humorous illustration, hyperrealistic, big depth of field, colors, night club scenery, 3d octane render, 4k, concept art, hyperdetailed, hyperrealistic, trending on artstation:1.1)

Negatives:
text, b&w, (cartoon, 3d, bad art, poorly drawn, close up, blurry, disfigured, deformed, extra limbs:1.5)

Which I have used and adapted quite a lot. Essentially, everything in front of the first comma is your actual prompt, which is what I had been doing already. Everything after it refines things a lot. You can also change the background by editing the “night club scenery” bit.
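The way this prompt gets reused can be sketched as a small template: the subject goes in front of the first comma, and the scenery is the one piece swapped out inside the style block. This is only string building for pasting into the UI, and `compose_prompt` is a hypothetical helper, not anything built into Stable Diffusion. As far as I understand it, the `(...:1.1)` and `(...:1.5)` parts are Automatic1111's emphasis syntax, where the number scales how much attention the bracketed terms get.

```python
# The style block from the prompt above, with the scenery left as a slot.
STYLE_TEMPLATE = (
    "(humorous illustration, hyperrealistic, big depth of field, colors, "
    "{scenery}, 3d octane render, 4k, concept art, hyperdetailed, "
    "hyperrealistic, trending on artstation:1.1)"
)

# The negative prompt is used unchanged for every subject.
NEGATIVE_PROMPT = (
    "text, b&w, (cartoon, 3d, bad art, poorly drawn, close up, blurry, "
    "disfigured, deformed, extra limbs:1.5)"
)


def compose_prompt(subject: str, scenery: str = "night club scenery") -> str:
    """Put the subject before the first comma, then the style block."""
    return f"{subject}, " + STYLE_TEMPLATE.format(scenery=scenery)


print(compose_prompt("Tracer from Overwatch"))
print(compose_prompt("Godzilla", scenery="city skyline at night"))
print("Negatives:", NEGATIVE_PROMPT)
```

The "city skyline at night" scenery is just an example value; any background description can go in that slot.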

Anyway, the rest of the post is sharing some more pics based on this prompt.

Prompt: “Tracer from Overwatch” +

As normal, really iffy on the hands, but still some neat concepts that could actually be skins in the game.

Prompt: Godzilla +

Prompt: Several different Batman Prompts (Batman Fighting, Batman Overlooking Gotham, Batman Battling Joker)

Prompt: The Joker +

These are some of my favorites so far. I am not a huge Joker Fan really, but they do a REALLY good job of portraying the more modern crazy that is The Joker. I actually left a few off because frankly, they are super creepy, but really are nice.

Prompt: Professor Layton

Again, it has no idea who Layton is, but still seems to do really well with the aesthetic of Layton. Which is kind of odd honestly.

Prompt: An Adorable Pixar Kitten

Feels like Pixar styled art is cheater mode a bit but these came out pretty good as well.

Three prompts with similar results: A Norwegian Landscape, The Lord of the Rings, and Arya Stark.

It’s kind of crazy just how much better the results have gotten from previous attempts, especially just like, 6 months ago or something, when I started playing with this concept using online tools. That said, it also gets old pretty quick, and you end up with a lot of “Weird shit” output, extra limbs, weird proportions, extra elbows, odd faces. I can see how it might be useful to produce some generic banner backdrops and whatnot. I also can see it just getting even better, very rapidly. If hands can be figured out, that would be a real game changer.

A Progressive Journey Through Working With AI Art – Part 3 – Running Natively with Automatic1111

After experimenting with online sources, then running Stable Diffusion locally using Windows Subsystem for Linux, I wanted more, and better, because I knew my machine was capable of much more. So I looked into alternatives and found Automatic1111’s Stable Diffusion variant.

The core takeaway here is that this is like night and day in performance and quality.

Previously, with WSL, I would run batches of prompts and seeds and maybe get a few okish results. Also, any dimensions larger than the base 512×512 would crash the thing and I’d get nothing. Basically, it definitely was not exploiting the full potential here. It also completely bogged my entire rig down while building an image, which took maybe 5-10 minutes for the actual processing.

It still bogs the machine down, but not nearly as much as it had been. And it takes like 10-20 SECONDS to produce an image. The image quality is also like 1000 times better, though still not without that “AI Art wonkiness”, like these weird square cows.


A Progressive Journey Through Working With AI Art – Part 2 – Running Locally

My initial foray into doing AI Art locally involved Stable Diffusion, which can be found here. This path ended up being sort of multi-staged. Initially I set it all up to run off the command line through PowerShell and Python. I ran through several prompts over a few days, off and on, with iffy results (above). At some point I closed it down. When I came back, I couldn’t get it to run again. I couldn’t figure out WHY, and so I gave up and nuked it to start fresh. (The problem was I forgot to use the Python Virtual Environment, DUUUUH).
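For what it's worth, this kind of mistake can be caught with a quick check at the top of a script. The sketch below assumes the environment was made with Python's standard `venv` module (the post doesn't say which tool was used), and `in_virtualenv` is just an illustrative name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter is running inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment's folder, while
    # sys.base_prefix still points at the Python install it was created from.
    return sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("Warning: no virtual environment active, packages may be missing.")
else:
    print(f"Using virtual environment at {sys.prefix}")
```

If the warning shows up, activating the environment first (before launching the script) is the thing to try.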

Part 1 can be found here.

Side note: unfortunately, I don’t know the prompts for many of these, because my initial runs just produced files with numbers for file names.


A Progressive Journey Through Working With AI Art – Part 1 – On the Web

This was originally one huge post, but I decided it would be better to be split up across a few posts.

Is it Dall*E? Dall-E? DALLE? I’m honestly not sure, and I’ll probably jump around a bit on which I use. Whatever it is, it’s AI Generated Art from text based prompts. And it’s pretty neat, and progressively more and more neat.

Initially, there were very few places you could even try it, but there was this great little service called Dall*E Mini, which let you make simple images with a sort of simplified version of the Dall*E 2 Engine. Or something like that. It eventually evolved into crAIyon.com. I don’t really use it much anymore and I’ve misplaced the images I generated with it… somewhere. Aside from that “Ramen Junkie” image above.

I have tried a few other similar systems with mixed results.

This is long and there are lots of images, so feel free to click through to read more…
