posted on 2008-01-01, 00:00authored byLeonid Kontorovich
Abstract: "We define the family of n-gram embeddings from strings over a finite alphabet into the semimodule N[superscript K]. We classify all [xi] [element of] N[superscript K] that are valid images of strings under such embeddings, as well as all [xi] whose inverse image consists of exactly 1 string (we call such [xi] uniquely decodable). We prove that for a fixed alphabet, the set of all strings whose image is uniquely decodable is a regular language."