Multi-Input Image-to-Image Diffusion Model for Font Style Translation

Session Number

Project ID: CMPS 42

Advisor(s)

Dr. Ashwin Mohan, Illinois Mathematics and Science Academy

Discipline

Computer Science

Start Date

17-4-2024 8:55 AM

End Date

17-4-2024 9:10 AM

Abstract

Many attempts have been made to use generative artificial intelligence (neural networks that create new text or images given inputs of the same type) to synthesize individual characters, or entire fonts, from a small set of example characters. Previous studies detected and conjoined glyphs (the individual strokes that make up characters) to create new characters, but fell short when connecting those glyphs to reproduce complete characters. A few recent studies have used AI diffusion models to accomplish the same goal. However, these models accepted only Scalable Vector Graphics (SVG) input, leaving raster formats such as Portable Network Graphics (PNG) unusable. Additionally, they were unable to generate characters that resemble lowercase letters, such as the symbol for the Vietnamese dong, when given single-case fonts. Here, we have developed an image-to-image diffusion model, with an architecture based on the one in "IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis," that bypasses both the glyph-joining problem and the file-type limitation. Our model conditions on both a style input and a structure input, which lets it generate lowercase characters without these restrictions. We will present results on the accuracy of our model and how it compares with other models, both diffusion and non-diffusion.
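To illustrate the kind of multi-input conditioning the abstract describes, the sketch below shows a toy PyTorch denoiser that concatenates a noisy target glyph with a style reference (a glyph in the target font) and a structure reference (a glyph giving the character's shape), together with a simplified DDPM-style training step. The module names, image sizes, noise schedule, and the tiny convolutional network are illustrative assumptions, not the architecture actually used in this project.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyConditionedDenoiser(nn.Module):
    """Toy denoiser: predicts noise from a noisy target glyph plus
    style and structure conditioning images concatenated on channels."""

    def __init__(self, channels: int = 1, hidden: int = 32):
        super().__init__()
        # 3 * channels inputs: noisy target, style reference, structure reference.
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, noisy, style, structure, t):
        # A real model would also embed the timestep t; omitted here for brevity.
        x = torch.cat([noisy, style, structure], dim=1)
        return self.net(x)


def ddpm_training_step(model, target, style, structure, alphas_cumprod):
    """One simplified DDPM training step: noise the target glyph at a random
    timestep and train the model to predict that noise, conditioned on the
    style and structure images."""
    b = target.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=target.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(target)
    noisy = a_bar.sqrt() * target + (1 - a_bar).sqrt() * noise
    pred = model(noisy, style, structure, t)
    return F.mse_loss(pred, noise)


if __name__ == "__main__":
    # Toy example with 64x64 single-channel (rasterized PNG) glyphs.
    model = TinyConditionedDenoiser()
    betas = torch.linspace(1e-4, 0.02, 1000)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    target = torch.randn(4, 1, 64, 64)      # glyph to reconstruct
    style = torch.randn(4, 1, 64, 64)       # reference glyph in the target font
    structure = torch.randn(4, 1, 64, 64)   # reference glyph giving character shape
    loss = ddpm_training_step(model, target, style, structure, alphas_cumprod)
    loss.backward()
    print(float(loss))

In this sketch the structure input fixes which character is drawn while the style input fixes how it is drawn, which is one plausible way a model could produce lowercase-like shapes even when the style font contains only a single case.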
