Multi-Input Image-to-Image Diffusion Model for Font Style Translation
Session Number
Project ID: CMPS 42
Advisor(s)
Dr. Ashwin Mohan, Illinois Mathematics and Science Academy
Discipline
Computer Science
Start Date
17-4-2024 8:55 AM
End Date
17-4-2024 9:10 AM
Abstract
Many attempts have been made to use generative artificial intelligence (neural networks that create new text or images from inputs of the same type) to synthesize individual characters, or entire fonts, from a few sample characters. Previous studies used glyph detection and conjoinment (glyphs being the individual strokes that make up characters) to create new characters, but they fell short when connecting those glyphs into complete characters. A few recent studies have used AI diffusion models to accomplish the same goal. However, these models accepted only Scalable Vector Graphics (SVG) images as input, leaving raster formats such as Portable Network Graphics (PNG) unusable. Additionally, when given single-case fonts, these models could not create characters that resemble lowercase letters, such as the symbol for the Vietnamese dong. Here, we have developed an image-to-image diffusion model, with an architecture based on the one in IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis, that bypasses both the glyph-joining problem and the file-type limitation. Our model conditions on both a style input and a structure input, which allows it to create lowercase characters unhindered. We will present results on our model's accuracy and on how it compares to other models, both diffusion-based and non-diffusion-based.
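To illustrate the kind of multi-input conditioning described above, the following PyTorch sketch shows one way a denoising network could accept both a structure image (which character to draw) and a style reference image (which font to imitate) through channel-wise concatenation. This is a minimal, hypothetical example: the module and variable names, image sizes, and noise-schedule value are assumptions for illustration and are not taken from our model.

# Hypothetical sketch of a denoising step that conditions on both a
# structure image and a style image by channel-wise concatenation.
# All names below (TinyDenoiser, etc.) are illustrative only.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Stand-in for a U-Net; takes three concatenated 1-channel images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),  # predict the noise residual
        )

    def forward(self, noisy, structure, style):
        # Condition on structure (which character) and style (which font)
        # by stacking them with the noisy image along the channel axis.
        x = torch.cat([noisy, structure, style], dim=1)
        return self.net(x)

# One training step under a simple noise-prediction (DDPM-style) objective.
model = TinyDenoiser()
target = torch.rand(4, 1, 64, 64)      # glyph rendered in the target style
structure = torch.rand(4, 1, 64, 64)   # same glyph in a plain reference font
style = torch.rand(4, 1, 64, 64)       # a different glyph in the target style
noise = torch.randn_like(target)
alpha = 0.7                            # placeholder noise-schedule value
noisy = alpha**0.5 * target + (1 - alpha)**0.5 * noise
loss = nn.functional.mse_loss(model(noisy, structure, style), noise)
loss.backward()

In a full model the plain convolutional stand-in would be replaced by a U-Net with timestep embeddings, but the conditioning idea (stacking the structure and style images with the noisy input) is the same.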