Experiments with AI, Animation, and New Approaches to Digital Art. Development of the short film 'Maintenance'
August 3, 2023
Preproduction AI Experiments
Digital image production paradigms are undergoing radical change due to the advent and popularisation of generative AI. These (and related) technologies will find their way into every aspect of the (digital) image production pipeline.
Preproduction Storyboards that look like storyboards…
Generative imagery is clearly interesting for exploring concepts, possible shots and angles, or interpretations of a scene, but is there value in generating images that explicitly ‘look’ like pre-production materials (concept style, storyboard style, etc.)?
Massaging specific scenes for concept/storyboard by refined prompts and model selection
The generative platform resists certain descriptions, such as ‘profile’, defaulting instead to a 3/4 view. Prompts that try to specify everything – position, direction, action, environment, expression, colors and so on – quickly become very unwieldy! Using image constraints, or iterating between sketches and generation cycles, becomes a necessary approach (human-in-the-loop).
Prompt Engineering is more complex than it seems…
We must understand the nature of prompts as tokens, and of the interpretive CLIP models that turn them into conditionings. As well as combining the different aspects of a description as text (i.e., manipulating the order and nature of tokens), there are various ways of combining different conditionings – notably: combining, concatenating, or averaging the weights of different prompt parts.
Left (a) = Prompt 1. Left (b) = Prompt 2. Right (a) = image gen with COMBINED conditioning. Right (b) = with CONCATENATED conditioning. Right (c) = WEIGHTED AVERAGE of conditioning.
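As a rough illustration of what concatenating or averaging two prompt conditionings looks like in code – a minimal sketch using the Hugging Face diffusers library rather than the tooling used for these tests; the model name, prompts and 50/50 weighting are assumptions for illustration only:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def encode(prompt: str) -> torch.Tensor:
    # Tokenize the prompt and run the CLIP text encoder to get a conditioning tensor
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to("cuda")
    return pipe.text_encoder(tokens)[0]

cond_a = encode("a lone figure repairing wreckage, profile view, side on")
cond_b = encode("storyboard style, rough pencil lines, high contrast")
uncond = encode("")  # unconditional embedding for classifier-free guidance

# Concatenation: both prompts survive as separate runs of tokens.
concat_cond = torch.cat([cond_a, cond_b], dim=1)
concat_uncond = torch.cat([uncond, uncond], dim=1)  # lengths must match

# Weighted average: a 50/50 blend of the two conditionings.
avg_cond = 0.5 * cond_a + 0.5 * cond_b

# 'Combined' conditioning (sampling against both prompts at once) would need a
# custom denoising loop, so only the other two variants are generated here.
pipe(prompt_embeds=avg_cond, negative_prompt_embeds=uncond,
     num_inference_steps=30).images[0].save("averaged_conditioning.png")
pipe(prompt_embeds=concat_cond, negative_prompt_embeds=concat_uncond,
     num_inference_steps=30).images[0].save("concatenated_conditioning.png")
```

The same operations are typically exposed as nodes in graph-based front ends, which is usually the more practical way to experiment with them.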
Altering prompts can often improve results, but most often other interventions are also needed, such as constraint images via ControlNets, masking, pose estimators and other tools.
From sketch to concept art
Final concept image: an improvised shelter, constructed from spaceship wreckage.
In this example, a quick sketch is used as the starting point in a multi-stage iterative process. The sketch can be transformed into a high-quality concept image through the use of different guidance parameters and different trained models. Elements of the image can be refined separately – the layout, color scheme and content can be controlled independently by using different image guidance constraints.
Sketch → Constrained iteration → Concept image
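A minimal sketch of one pass of this sketch-to-concept loop, using an image-to-image pipeline from the diffusers library; the model, file names, prompt, and strength/guidance values below are placeholders rather than the settings used for these images:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from the hand-drawn sketch (placeholder file name)
init = Image.open("shelter_sketch.png").convert("RGB").resize((768, 512))

prompt = ("improvised shelter built from spaceship wreckage, "
          "desert at dusk, concept art, muted colour palette")

# 'strength' controls how far the result may drift from the input image:
# looser interpretation on the first pass, tighter on later passes so the
# established layout is preserved while detail is refined.
for step, strength in enumerate([0.75, 0.55, 0.35]):
    result = pipe(prompt=prompt, image=init, strength=strength,
                  guidance_scale=7.5).images[0]
    result.save(f"concept_iteration_{step}.png")
    init = result  # feed the output back in as the next constraint image
```

Between passes the human in the loop can also redraw, mask, or swap models, which is where most of the actual design control comes from.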
Pose constraints
A very useful feature for pre-production tasks is the ability to constrain generated images to defined poses.
Pose constraint → Generated 3D style (note leg dislocation) → Generated realistic style
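In code terms, this kind of pose constraint corresponds to running the base model together with a pose-conditioned ControlNet. A hedged sketch with diffusers follows – the model identifiers are publicly available ones, while the pose image file name and prompt are placeholders:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# OpenPose-conditioned ControlNet attached to a base Stable Diffusion model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# The pose constraint: a pre-rendered OpenPose skeleton image (placeholder
# file name; it could also be extracted from a photo with a pose estimator).
pose = Image.open("pose_constraint.png").convert("RGB")

image = pipe(
    prompt="astronaut character, full body, 3D render style, neutral background",
    image=pose,
    num_inference_steps=30,
    controlnet_conditioning_scale=1.0,  # how strongly the pose is enforced
).images[0]
image.save("posed_character.png")
```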
Limitations on discipline understanding
The fine-tuned models available are mostly identified by style, not purpose. This means that it is relatively easy to select a model that aligns with a style (or to train one, given enough source images), but it is more difficult at this stage to create planning images with proper utility for their role in the design process. For example, while the models seem to understand that an expression sheet involves multiple faces, it is very difficult to create prompts that generate an appropriate array of expressions.
Creating facial expression sheets (with multiple fine-tuned models) is challenging – the models may not contain all expressions! Even when specifying the seven cardinal emotions in the prompt, the results lack the variation needed to convey emotion. The fine-tuned model includes too many examples of smiling!

While there is some consistency between the model poses (e.g., left), is this just by chance? The model lacks the understanding that all views are needed (see the repeats on the left) and that consistency in the character depiction is critical for these images to function as useful design reference (on the right, the body shape in the background is decorative, but inconsistent).

Inconsistent structure between versions or views. Multiple thumbs…

Component parts can be generated, but display typical errors (hands remain challenging, but can be refined to remove these errors – see the inpainting sketch below).

Known formats, such as character turnarounds, can be produced somewhat reliably, but these examples lack the strict consistency that may be needed for modelling reference.

The model can be prompted to produce sprite arrays. However, these appear largely ineffective for actual animation: they have captured the form of the artefact, but not its purpose and subsequent function.

Character design iterations. For now, the ability to ‘design’ is completely dominated by the model and the fine-tuning. Interesting artefacts emerge.
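As noted above, hand errors such as extra thumbs can often be repaired with masked regeneration (inpainting) over just the faulty region. A minimal sketch with a diffusers inpainting pipeline; the file names, mask and prompt are placeholders:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Original generation plus a hand-painted mask (white = region to regenerate)
character = Image.open("character_sheet.png").convert("RGB").resize((512, 512))
hand_mask = Image.open("hand_mask.png").convert("RGB").resize((512, 512))

fixed = pipe(
    prompt="a human hand with five fingers, clean character design",
    image=character,
    mask_image=hand_mask,
    num_inference_steps=30,
).images[0]
fixed.save("character_sheet_fixed_hands.png")
```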