
A new Apple patent, released Thursday by the United States Patent and Trademark Office, details an improved system for text-to-speech software. Previous efforts have typically used a single static voice to turn words into sound, but Apple now intends to give the voice that reads each message a personality.
The patent describes a system in which the program searches a message's metadata to put together a profile of the sender. It scans the message for details such as the sender's name and email address, along with any profile information in your contacts, and compiles them. It then generates a voice to match the qualities of the sender, which for the most part means gender.
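The profile step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method in Apple's filing: the message and contact layouts, the `SpeakerProfile` class, and the voice identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerProfile:
    name: str
    email: str
    gender: Optional[str] = None  # read from the contact card, if present


# Hypothetical voice identifiers; these are not real Apple voice names.
VOICES = {"female": "voice_f1", "male": "voice_m1"}
DEFAULT_VOICE = "voice_neutral"


def build_profile(message: dict, contacts: dict) -> SpeakerProfile:
    """Combine sender metadata with any matching contact entry."""
    sender = message.get("from", {})
    email = sender.get("email", "").lower()
    contact = contacts.get(email, {})  # contacts are keyed by lowercase email
    return SpeakerProfile(
        name=contact.get("name") or sender.get("name", ""),
        email=email,
        gender=contact.get("gender"),
    )


def select_voice(profile: SpeakerProfile) -> str:
    """Pick a voice for the profile, falling back to a neutral default."""
    return VOICES.get(profile.gender, DEFAULT_VOICE)
```

In this sketch a sender with no matching contact, or a contact with no recorded gender, simply gets the default voice rather than a guess.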
The text of the message is then combined with this metadata to turn it into speech. Each word is converted, either as a whole or phonetically, into the corresponding sounds of the given language. The TTS engine then divides the text into rhythmic units such as phrases, clauses and sentences, and marks them. In some cases, the speech can be created by piecing together pre-recorded voice fragments, including individual sounds, entire words or even sentences. These would be stored either on the mobile device or in an off-site database.
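To make the fragment-based approach concrete, here is a minimal sketch of how a program might plan speech from pre-recorded pieces: it splits text into sentences, reuses a whole-sentence recording when one exists, and otherwise falls back to word recordings or phonetic synthesis. The fragment store, clip identifiers, and function names are assumptions made for illustration and do not come from the patent.

```python
import re

# Hypothetical fragment store: normalized text -> ID of a pre-recorded clip.
FRAGMENTS = {
    "see you soon": "clip_017",
    "hello": "clip_001",
    "see": "clip_002",
    "you": "clip_003",
}


def split_sentences(text):
    """Split text after sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(s):
    """Lowercase and drop punctuation, keeping apostrophes."""
    return re.sub(r"[^\w\s']", "", s).lower().strip()


def plan_sentence(sentence, fragments):
    """Prefer a whole-sentence clip, then word clips, then synthesis."""
    key = normalize(sentence)
    if key in fragments:
        return [("clip", fragments[key])]
    plan = []
    for word in key.split():
        if word in fragments:
            plan.append(("clip", fragments[word]))
        else:
            plan.append(("synthesize", word))  # no recording: build phonetically
    return plan


def plan_speech(text, fragments=FRAGMENTS):
    """Return one playback plan per sentence."""
    return [plan_sentence(s, fragments) for s in split_sentences(text)]
```

For the input "Hello Bob. See you soon!", the first sentence is assembled from a word clip plus one synthesized word, while the second is played from a single stored sentence clip.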
An additional proposal, however, is to let the program record a speaker's voice and analyze it to generate voice data. From the patent filing's description:
For example, the speaker’s voice can be recorded by a recording application running on the device or during a telephone call (with permission). The voice characteristics of the speaker can be obtained using known voice recognition techniques. In this implementation, a speaker profile may not be necessary as the speaker’s name can be directly associated with voice data stored in voice database.
We don’t know whether Apple intends to use this system in the future. As with any patent, the company may simply be sitting on it. Apple already has Siri, which is similar but nowhere near as advanced. We don’t even know whether this could be implemented commercially.