Tagnet: a CLIP tags exploration tool

Online demo

CLIP tags

Introduction

CLIP and VQGAN let you generate beautiful images from text. The descriptions of these images need to be more specific than ordinary natural language and are called prompts [1]. For example, instead of “a castle on a hill” a prompt might read “a castle on a hill, Unreal Engine”. My goal with this document and code is to record how to reach the best results using prompts.

On June 1st, 2021, Aran Komatsuzaki tweeted that mentioning “Unreal Engine” in a prompt changes the visual style and quality of the resulting image. Since CLIP was trained on images from the Internet, Unreal Engine can be called one of its many sources of inspiration. Even before that, many people were looking for tags: words that change how CLIP draws things.
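
One way to check whether a tag means something to CLIP is to compare how it scores an image against prompt variants that differ only by that tag. Below is a minimal sketch using the openai/CLIP package; the file name test.png and the prompts are placeholders, not part of this repo.

```python
# Hedged sketch: compare CLIP's similarity between one image and two
# prompts that differ only by a tag.
# Assumes: pip install git+https://github.com/openai/CLIP
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("test.png")).unsqueeze(0).to(device)  # placeholder image
prompts = ["a castle on a hill", "a castle on a hill, Unreal Engine"]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image and each prompt variant.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    sims = (image_features @ text_features.T).squeeze(0)

for prompt, sim in zip(prompts, sims.tolist()):
    print(f"{sim:.4f}  {prompt}")
```

If adding the tag consistently moves the score, CLIP has learned an association for it; this is the same signal VQGAN+CLIP pipelines use to steer generation.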

I’ve experimented with many CLIP prompts using a Discord bot by BoneAmputee and decided to build a list of words I use often.

Then I experimented more, especially with a pencil style, and realized I would need more than one list, because co-occurrences of words form a graph (see the sketch below)! I have also added many prompts by other users, often with some editing and pre-processing to make them more uniform. The prompts directory contains two files with mostly cleaned-up prompt samples.
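
The graph idea can be sketched in a few lines: count how often pairs of tags appear in the same prompt and treat the counts as edge weights. This assumes prompts with comma-separated tags; the actual files in the prompts directory may be formatted differently, and cooccurrence_graph is a hypothetical helper, not part of this repo.

```python
# Hedged sketch: build a tag co-occurrence graph from prompt samples.
from collections import Counter
from itertools import combinations

def cooccurrence_graph(prompt_lines):
    """Count how often pairs of tags appear together in the same prompt."""
    edges = Counter()
    for line in prompt_lines:
        # Split a prompt into tags and normalize them.
        tags = sorted({t.strip().lower() for t in line.split(",") if t.strip()})
        # Each unordered pair of tags in one prompt adds weight to an edge.
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges

if __name__ == "__main__":
    sample = [
        "a fox in a forest, pencil sketch, detailed",
        "a castle on a hill, pencil sketch, Unreal Engine",
    ]
    for (a, b), n in cooccurrence_graph(sample).most_common(3):
        print(f"{a} -- {b}: {n}")
```

The heaviest edges point at tags that tend to travel together, which is exactly why a single flat word list stops being enough.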

Footnotes

[1] arXiv: Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm by Laria Reynolds and Kyle McDonell. The paper talks about the GPT-3 language model, but the same term applies to GPT-2, GPT-3, GPT-J, and CLIP itself.