Image AI - 3rd Self Portrait Experiment

Third experiment creating a self portrait using an AI image generator.

In my second experiment, I attempted to create self portraits by fine-tuning the Stable Diffusion AI image generator. Using Dreambooth on Google Colab, I trained a token for myself to use in image prompts. (See how to create art from your face for free, here.) You can do this yourself. It is free and easy on an ordinary computer. Watch one of these videos to see how it is done. The beauty is that you don’t need a 24GB GPU or even a 12GB one; you can do all the processing in the cloud on Colab.

Creating a “realsabin” token

I created a token called RealSabin and trained the AI on 20 images I selected. There was little method to the training images; I scrounged up pics I had on my hard drive with very little filtering. My criteria were that they be complete images without too many extraneous shadows and without other people. However, they were all basically snapshots. You can see the results in my post, Generative AI – Self Portrait.

Learnings from last experiment

A lot of the inference images didn’t look like me. So, I wondered if having more images of my features with consistent lighting might help.
The AI was very generous in how it portrayed my apparent age in the last go around. I used all contemporary images in this training set.
The AI was pretty good at headshots with a smile or neutral expression but seemed to mangle the face if there was any emotion. I put together a whole panoply of emotion images for training data. Of course, the AI image generator did not have the text-image pairing, but I wondered if perhaps it could relate the model it had for happy to my smiling images.

I don’t have that solid of an understanding of how the text to image transformation works, so all of these hypotheses and strategies could be stupid.

What happens when you use 200 images to fine tune the AI image generator with Dreambooth and Stable Diffusion?

Used 200 images all shot specifically for this purpose.
All images contemporary, shot within the last 3 days.
Headshots, torso shots, body shots
With glasses and without. Several different pairs of glasses.
All wardrobe was blue, brown or gray. I’m toying with making color one of the dimensions of my personal brand. And the color scheme of light blue and brown is the scheme.
I tried to get the lighting consistent. I was working by myself and I had some trouble with the exposure shooting against a white background and not having a lighting model. I bought a remote after shooting this series, so I won’t have this problem in the future.
You can see the breakdown below in the contact sheet.

AI image generator training images

Contact sheet of training images for Stable Diffusion generative AI.

First results

I started out with the default prompt in the notebook. I configured the inference console to generate 4 images at 512px x 512px.
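For anyone who would rather script these settings than use the notebook's inference console, here is a rough sketch. It assumes the fine-tuned model was saved in a format the Hugging Face diffusers library can load, which may not match what a given Colab notebook produces. The helper names and the model path are hypothetical; the prompt is the default one shown in the caption below.

```python
def build_inference_settings(prompt, num_images=4, size=512):
    """Collect the settings described above as keyword arguments."""
    # diffusers rejects image dimensions that are not divisible by 8.
    if size % 8 != 0:
        raise ValueError("height and width must be divisible by 8")
    return {
        "prompt": prompt,
        "num_images_per_prompt": num_images,
        "height": size,
        "width": size,
    }


def generate(model_dir, settings):
    """Run inference with a fine-tuned model (assumes diffusers and a GPU)."""
    # Imported here so the settings helper works without diffusers installed.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_dir)
    pipe = pipe.to("cuda")  # assumption: a GPU runtime such as Colab
    return pipe(**settings).images  # list of PIL images


settings = build_inference_settings("photo of realsabin person, digital painting")
print(settings)
# images = generate("path/to/fine-tuned-model", settings)  # hypothetical path
```

Keeping the settings in one dictionary makes it easy to rerun the same batch size and resolution with a different prompt.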

Generative AI Portrait — photo of realsabin person, digital painting

Results: A lot better likeness than the last go-around trained with only 20 images. Not sure what the gigantic forehead is all about. In these three images you can see the problem that plagues almost every image. Weird-ass eyes.

The problem of weird-ass eyes.

Many of the AI-generated images from Stable D, MidJourney and DALL-E suffer from weird rendering of eyes (and often hands). I’m sure there is an explanation for this, but Stable D really mangles the eyes on otherwise very accurate likenesses. Another guy is using a tool on Tencent’s ARC site called Face Restoration to correct the eyes. Face Restoration was designed to use AI to restore damaged photographs, and it does a good job of fixing the eyes. I ran most of the Stable D outputs through it and composited in just the eyes. I’ve done this to some of the images I’ll show. (It will be obvious.)
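The compositing step can be pictured as copying only the eye regions from the restored image into the original output. The sketch below is a toy version of that idea, not the workflow actually used here: images are plain nested lists so it runs without any image library, and the function name and eye box coordinates are made up for illustration. With real files you would load both images with an image library and pick the boxes by hand or with a face landmark tool.

```python
def composite_regions(original, restored, boxes):
    """Return a copy of `original` with each box filled from `restored`.

    Images are lists of rows of pixel values. Each box is
    (left, top, right, bottom) with right and bottom exclusive.
    """
    same_shape = len(original) == len(restored) and all(
        len(a) == len(b) for a, b in zip(original, restored)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    result = [list(row) for row in original]  # leave the input untouched
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                result[y][x] = restored[y][x]
    return result


# Toy 6x4 "images": "o" marks original pixels, "r" marks restored pixels.
original = [["o"] * 6 for _ in range(4)]
restored = [["r"] * 6 for _ in range(4)]
eye_boxes = [(1, 1, 2, 2), (4, 1, 5, 2)]  # hypothetical eye locations

fixed = composite_regions(original, restored, eye_boxes)
print("".join(fixed[1]))  # orooro
```

A hard-edged box like this would leave visible seams on a real photo, so a feathered mask is the usual refinement.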

At Studio 54

Who doesn’t wonder what it would have been like to be in the glitterati of New York’s swinging 70s? I prompted the model with some historical context.

Generative AI image of a male party goer and two girls at Studio 54 — closeup photograph of realsabin partying with two girls at Studio 54

The poses and camera angles all evoke the scene pretty well with my likeness.

Movie Key Art

Not bad likenesses at all.

Specific movie key art

In addition to eyes, Stable D has a problem with hands.

Actually in a specific movie

Mixed success here. Some generative AI images were really good. The ones of Fury Road worked out really well. That might have something to do with George Miller’s cinematography. His style is to always keep the subject in the center of the frame. This might have led to a lot of congruence between the model it had for Mad Max and the images with similar composition in the training set I used.

Generative AI image of a still from Fury Road — Realsabin as Mad Max in Fury Road

Gender-bending dating site profile pic

I wanted to see how the fine-tuned image generator did with changing the apparent gender. In the first run, the AI spit out three images of my likeness as a man, even though I specifically prompted with the class woman. On the second run I added more adjectives that tend to be used only with women to see if that influenced the rendering.

beautiful, attractive, feminine, petite, lithe, realsabin woman dating profile photo

I think it did, but it weighted the prompt toward those adjectives and gave my token less weight so the images of women don’t look much like me.

Adding ethnicity to the AI image generator prompt

attractive Irish, freckles, ginger, realsabin female profile photo 55-years-old

Noice.

Annie Leibovitz and Rembrandt

In my last experiment, when I prompted the AI image generator with the name of an artist, I got the artist themselves instead of a work in that artist’s style. Not so in this go-around. On prompts of Leibovitz and Rembrandt, I got a pretty good likeness. On the prompts with multiple artists, I got some salient features of all of the artists mentioned.

Realsabin person happy accidents painting by paul wright

A painting of a smug realsabin person by Henry Asencio, by Paul Wright, by Davide Cambria

Freestyling with prompts on Lexica and MidJourney

Having covered a lot of the prompts that I ran in the first two experiments, I decided to freestyle a bit and see what more involved prompts yielded. The Burning Man photo was one of the best I’ve seen (after I fixed the eyes). The other two images are evocative, but don’t look like me.

a closeup photo of realsabin person as a post apocalyptic god at burning man festival playa, powerful, cinematic, beautifully lit, by artgerm, by craig mullins, by galan pang, 3 d, trending on artstation, octane render, 8 k

original character design, superhero “Voice of the People”, close up portrait of realsabin with spiky hair, mechanical muzzle covering his mouth, intricate technology connecting muzzle to face, hyperdetailed, in the style of luis royo and aleksi briclot and larry elmore and laurie greasely, photorealistic

vampire realsabin with red eyes and red hair in a white woolen turtleneck portrait dnd, painting by gaston bussiere, craig mullins, greg rutkowski, yoji shinkawa

a beautiful warrior realsabin person fighting a battle smeared with charcoal, intense color, full body shot, cinematic lighting, front view in action portrait, photo realistic, high quality, hyper realistic, 8k

realsabin male elegant, ultra highly detailed, digital painting, smooth, sharp focus, artstation, pixiv, art by Ina Wong, Bo Chen, artgerm, rossdraws, sakimichan

realsabin as a warrior in a medieval fantasy, intricate clothing, Lord of the Rings style, 4k, high definition, photorealistic, unreal engine 5, 50mm, f1.8, handsome face, insanely detailed, cinematic lighting, dynamic pose, oil painting, art by artgerm, trending in artstation, art by Greg Rutkowski

Film Noir

The AI didn’t really get mischievous or film noir. Still some good images.

Back to the boring stuff – Corporate headshot

The main use case I was trying to test AI generation for was the creation of headshots for corporate purposes. My fine-tuned model did some pretty good stuff, but not close enough that I could use it and no one would sense it was off.

realsabin male corporate headshot technology, smart, gray hair, 55-years-old

Conclusions

My theory was that more images would yield better likenesses. This proved to be true but not across the board. Perhaps I gave too many images of the same composition and angles. There was not a lot of melding of visual concepts that would satisfy the prompts. I think perhaps all those images led to overfitting without leaving much room for other concepts. For example, look at these two images:

Generative AI image of a mugshot — realsabin mugshot (experiment 2)

The first conveys to me a lot more of the characteristic despair that I see in most mugshots. The second one has very little emotion. It seems like the AI leaned a lot more heavily on the fine-detailed features of my likeness than on the emotional cues of the mugshot style.

The next AI image experiment

For my next experiment, I’d like to see if I can broaden the range of results that the AI can produce by giving it more salient features for the realsabin model. My strategy is to use fewer clinical headshot images and introduce some other types of images. Staging these scenes takes a lot of time and effort, so I’m going to use fewer photos in general.

Training set of 100 images

Headshots: high, low and eye level; range of angles to camera
Torso shots: high, low and eye level; range of angles to camera
Full body shots: range of clothing, including overcoat, suit, sportscoat, polo/chinos, tracksuit, shorts/hoodie, swimsuit, wetsuit, tshirt/shorts, running clothes
Range of emotion shots
Candids and snapshots
Earlier photos with a beard and mustache
Costumes: pirate, calavera (dia de los muertos), kabuki actor, rocker, latin band, wild west, skeleton, batman, stormtrooper, santa
A range of headgear: fedora, beanie, baseball cap, bowler, straw hat, cowboy hat, scarf, watch, gloves, headlamp
Action shots: doing things like working on a computer, writing, reading, cooking, gardening, walking, running, yoga, sleeping, eating
Range of motion shots: jump, squat, forward bend, child’s pose, upward dog, downward dog, staff pose, tree pose, proud warrior 2, savasana

Additional Considerations

I wear glasses most of the time. That’s why I’m going to shoot about half of the images in every set wearing a range of glasses. To expand the range of contexts that the AI will consider, in about half of the images I’m going to replace the background with a variety including: the beach, startup office, urban outside, rural outside, party, nightclub, meeting, suburban street, living room, kitchen, bedroom. I will also shoot 10% of the images shirtless. My thinking is that this will give the AI more freedom to come up with solutions that include the salient features of my likeness and other concepts included in the prompt. Look for the next post in the series.
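The shooting plan above can be turned into a quick shot list to check that the proportions add up. This is only a sketch under assumptions that are not in the plan itself: the 100 images are split evenly across the ten categories, and the glasses, background and shirtless shots are picked at random with a fixed seed. The function and field names are made up for illustration.

```python
import random

CATEGORIES = [
    "headshot", "torso", "full body", "emotion", "candid",
    "beard and mustache", "costume", "headgear", "action", "range of motion",
]
BACKGROUNDS = [
    "beach", "startup office", "urban outside", "rural outside", "party",
    "nightclub", "meeting", "suburban street", "living room", "kitchen",
    "bedroom",
]


def plan_shots(total=100, seed=0):
    """Build a shot list: half of every set with glasses, half of all
    shots with a replaced background, 10% shirtless."""
    rng = random.Random(seed)
    per_category = total // len(CATEGORIES)  # assumption: even split
    shots = []
    for category in CATEGORIES:
        with_glasses = set(rng.sample(range(per_category), per_category // 2))
        for i in range(per_category):
            shots.append({
                "category": category,
                "glasses": i in with_glasses,
                "background": "original",
                "shirtless": False,
            })
    # Replace the background on half of all shots.
    for index in rng.sample(range(len(shots)), len(shots) // 2):
        shots[index]["background"] = rng.choice(BACKGROUNDS)
    # Mark 10% of all shots as shirtless.
    for index in rng.sample(range(len(shots)), len(shots) // 10):
        shots[index]["shirtless"] = True
    return shots


shots = plan_shots()
print(
    len(shots),
    sum(s["glasses"] for s in shots),
    sum(s["background"] != "original" for s in shots),
    sum(s["shirtless"] for s in shots),
)  # 100 50 50 10
```

Random assignment keeps glasses, backgrounds and shirtless shots from always landing on the same images, which a simple alternating pattern would do.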
