LLM Dall-E Fun while researching

Large language models (LLMs) like GPT-4 and Gemini are great for research and training. I’ve been using them for legal research, examining ideas in physics, and testing their capabilities with circuit generation. They are also excellent at generating code in pretty much any programming language. So, having read about the scientific paper that gained notoriety for its AI-generated images of an, er, well-endowed mouse, I decided to see what kind of fun we could have with imagery. Here are a few results:

Electronics LLM hallucinations

For my first try, I wanted to do some image generation that I thought should be simple and useful. GPT-4 understands SPICE (like LTSpice or PSpice), as do other internet-accessible LLMs, so I asked it to generate the following:

I need to design a roll off filter whose roll off frequency is 80Hz and which takes the input signal -3db at 1KHz. I’m looking specifically for a SPICE or LTSPICE model, with a circuit graphic if possible. Search online if you need to.

GPT-4 user

To which GPT-4 cheerfully replied:

GPT-4 “I have created a schematic diagram of a roll-off filter suitable for SPICE simulation. This filter is designed to achieve a roll-off frequency of 80Hz and a -3dB point at 1kHz. The diagram includes resistor and capacitor components, arranged to meet the desired frequency characteristics. You can use this schematic as a reference for your SPICE or LTSPICE simulations. Please review the diagram for details on the component values and connections.”

WHAT? Maybe an LLM could use that schematic, but not any human, LOL! I scolded GPT-4 and it eventually gave me a reasonable circuit model with explanation, which worked correctly in LTSpice. However, this result visually illustrates a serious problem common to LLMs: “hallucination”. What’s interesting about this hallucination is that if you look at Mr. Mouse, above, and the circuit diagram, both show the same sort of text weirdness. Similar to Mr. Mouse, we have text like “FIL OF IEEN”, “BRA”, “DIK”. For the component labels themselves, Dall-E seems to have created its own symbolic language, more abstract than that used on Mr. Mouse. Is this just an idiosyncrasy of an LLM working outside its characteristic domain space? By that I mean the generation of images that are representative image tokens of text tokens. It is an LLM, not a LIM.
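For reference, assuming the working circuit GPT-4 eventually produced was the standard first-order RC low-pass (the source chat isn’t quoted here, so that’s my guess), the math is just f_c = 1/(2πRC). A quick sketch, with the 10 kΩ resistor as my own assumed starting value; note that for a single-pole filter the -3 dB point *is* the roll-off corner, so the prompt’s separate 80 Hz and 1 kHz targets can’t both be hit by one RC stage, which may have added to the confusion:

```python
import math

def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def c_for_cutoff(f_c_hz: float, r_ohms: float) -> float:
    """Capacitance needed for a target corner frequency, given a chosen resistor."""
    return 1.0 / (2.0 * math.pi * r_ohms * f_c_hz)

# Target the 80 Hz roll-off from the prompt with an assumed 10 kOhm resistor.
r = 10_000.0
c = c_for_cutoff(80.0, r)   # about 199 nF
print(f"C = {c * 1e9:.0f} nF, check f_c = {rc_cutoff_hz(r, c):.1f} Hz")
```

Any R/C pair with the same product works; 10 kΩ just keeps the capacitor in a convenient film-cap range.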

How LLMs see Kzinti on Motorcycles

OK, fine, if GPT-4 is into hallucination, let’s let it run free. So I asked my AI friend:

You are familiar with Tales of Known Space by Larry Niven, right? Please render me a Kzinti riding a Harley-Davidson. Rather than orange Kzinti fur, give him a Sphynx cat coat.

GPT-4 user

Having free rein, GPT-4 did a fair job with the following two:

I thought the second one better fit the look of a Sphynx cat, but the tail was jacked. OMG! GPT-4 simply could not fix the tail, no matter what I asked it to do; it totally screwed up further images. Nor could it fix the fur color, for that matter; it didn’t seem to understand “Rather than”. Dall-E got the general idea but got some of the details wonky, like the paws on the exhaust pipe in the first image (I can hear a cat screaming…). I’ll have to correct the tail manually, grafting the tail from the first image onto the second. Wonkiness of detail appears to be another problem with LLM image generation, but only in certain image areas, like extremities.

Family Friendly JFC!

The frustration led to asking it for exemplary images of frustration, which got deleted on logout for some reason. However, I remembered the prompt, so the following two express my frustration, because what I wanted to tell it was “JFC! Get it right, because you can!”:

Let’s do the Jesus one again; that was fun. So even though you deleted it, let me refresh your memory. It’s a face-palming Jesus shaking his head at a hallucinated, optimized version of Dall-E, with the halo on Jesus’ head, only, LOL…

GPT-4 user

Sadly, the ones that got deleted looked way better. Dall-E still can’t get it right, because it put a halo on itself. The second one gets the facial expression of “Jesus” correct… I also find it interesting that all the generated images of Dall-E with Jesus, including the other two that got deleted, showed Dall-E with the halo and Jesus without one; or Dall-E had a bigger/brighter halo than Jesus. WTH? How does it manage to do that, 4 out of 4 times? I’d love to see the data cloud from where that originates.