Getting your Mac to Speak Large Texts
First of all, this tutorial will only be useful if you have a Mac. It also assumes some very basic familiarity with the command line (I won’t explain how to change directories with cd). However, it will explain a few Unix and command-line features along the way to help you become more proficient with Unix.
Dependencies
Here are the dependencies used in this article. You may or may not need them, depending on how far you choose to go. Each entry below explains what the dependency is for and how to install it.
Homebrew
Install it from Homebrew’s website.
This is optional, since each dependency below also points to its own website. Homebrew is a package manager for macOS that lets you install programs from the command line. It makes installing programs quite easy, and in this post you will only need it to install the other dependencies.
pandoc
You’ll only need this if your text isn’t already plain text (for example, a .docx file). If you have Homebrew, installing it should be as simple as:
brew install pandoc
Otherwise, install pandoc from pandoc’s website.
ffmpeg
You’ll need this to convert the spoken audio to .mp3. If you installed Homebrew, this should be as simple as:
brew install ffmpeg
Otherwise, install it from ffmpeg’s website.
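Before going further, you may want to check which of these tools are already installed. Here is a minimal sketch that uses the shell’s built-in command -v, which succeeds only when a command is found on your PATH. check_deps is a hypothetical helper name, not something provided by Homebrew, pandoc, or ffmpeg.

```shell
# check_deps: hypothetical helper that reports whether each named command
# is on the PATH. Returns 1 if any of them is missing, 0 otherwise.
check_deps() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# The tools used in this article (say ships with macOS, so it needs no install)
check_deps brew pandoc ffmpeg say || echo 'Install the missing tools before continuing.'
```

Anything reported as missing can be installed with the brew commands above.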
Intro to Say
Now on to the text-to-speech commands. The simplest way to use text to speech (TTS) on a Mac is through the terminal’s say command:
say 'I can say anything'
A lot can be learned from man pages (man is short for manual), so let’s look at say’s man page. To open a command’s man page, type man followed by a space and then the command name. (If a command doesn’t have a man page, you can usually get brief help by typing the command followed by a space and the --help flag.)
man say
Scroll down to the options and look at a few of them. In this article we’ll use the -o flag to write the speech to a file and the -v flag to pick a different voice. When you are done reading the man page, press q to close it.
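As the man page points out, you can list all of the available voices by passing ? to the -v flag. Quote the ? as shown: zsh, the default shell on recent versions of macOS, otherwise treats it as a filename wildcard and may report “no matches found”.
say -v '?'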
Let’s get only the English voices with grep. Here we use the pipe (|), which takes the standard output (stdout) of one command, here say, and feeds it into the standard input (stdin) of the next command, here grep:
say -v '?' | grep en_
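grep prints whole lines. If you only want the voice names, for example to pick one for -v, a small awk filter can cut them out. This is a sketch under two assumptions: that each line printed by the voice listing has the form name, locale, then # and a sample sentence, as it does when you run the command above, and that some voice names contain spaces. voices_for_locale is a hypothetical helper, not part of say.

```shell
# voices_for_locale PREFIX: hypothetical filter that reads voice-listing lines
# on stdin and prints the names whose locale starts with PREFIX (en_, en_GB, ...).
# Assumes each line looks like: "<name>   <locale>   # <sample sentence>".
voices_for_locale() {
  awk -v prefix="$1" '{
    split($0, parts, "#")            # drop the sample sentence after "#"
    n = split(parts[1], f, " ")      # whitespace-separated fields before "#"
    if (n < 2) next                  # skip blank or malformed lines
    locale = f[n]                    # the last field is the locale
    if (index(locale, prefix) != 1) next
    name = f[1]
    for (i = 2; i < n; i++)          # rejoin multi-word names like "Bad News"
      name = name " " f[i]
    print name
  }'
}

# On macOS: say -v '?' | voices_for_locale en_GB
```

Unlike grep en_, this only looks at the locale column, so en_ appearing inside a sample sentence would not count as a match.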
Let’s try Daniel’s voice.
say -v Daniel Hello, my name is Daniel and I am a British-English voice.
Now let’s write the speech to a file instead of the speakers. The usual output file type for the say command is AIFF, which uses the .aiff extension. Let’s save the last example to a file:
say -o daniel.aiff -v Daniel Hello, my name is Daniel and I am a British-English voice.
The open command opens a file or directory in its default application, although you can target a specific application in your Applications folder with the -a flag. Let’s open the sound file in QuickTime Player:
open -a 'QuickTime Player' daniel.aiff
When you’re done listening, you can delete the file with the rm command:
rm daniel.aiff
For a safer alternative to deleting things with rm, open the current directory in Finder with the following command and delete the file from there:
open .
Don’t forget the . in the command above: in Unix and Linux, . stands for the current directory.
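If you’d rather stay in the terminal, rm’s -i flag is another middle ground: it asks for confirmation before deleting each file. Normally you would type y or n at the prompt; this sketch pipes in an n so it runs without waiting for input, and scratch.aiff is just a throwaway example file.

```shell
# Create a throwaway file to practise on (example name only)
echo 'not really audio' > scratch.aiff

# -i makes rm ask before removing the file; answering n keeps it
printf 'n\n' | rm -i scratch.aiff

ls scratch.aiff   # still listed, because we declined
```

Answer y instead and the file is deleted.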
Downloading Higher Quality Voices
Some voices (mainly the Siri voices) can’t be used from the say command, but you can download higher-quality voices through the accessibility options at System Preferences > Accessibility > Speech. Click the System Voice menu and choose Customize to download new voices; some of them offer a checkbox for downloading a higher-quality version.
Piping a Text File to Text To Speech.
Let’s create a simple text file named helloWorld.txt:
echo 'Hello World is a common first program.' > helloWorld.txt
The > sign redirects output into a file. Notice how the arrow points toward the file name and away from the echo command: it sends the standard output of echo into helloWorld.txt.
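One detail worth knowing before you redirect into files you care about: > replaces whatever the file already contained, while >> appends to it. A quick demonstration with a throwaway file (redirect-demo.txt is just an example name):

```shell
# '>' creates the file, or empties it first if it already exists
echo 'first line' > redirect-demo.txt
echo 'second line' > redirect-demo.txt
cat redirect-demo.txt     # prints only: second line

# '>>' appends to the end of the file instead of replacing its contents
echo 'third line' >> redirect-demo.txt
cat redirect-demo.txt     # prints: second line, then third line
```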
To have the say command read helloWorld.txt, we can redirect the file into say’s standard input with the < sign:
say -v Daniel < helloWorld.txt
An alternative to the command above is to pipe from the cat command. Let’s first run cat by itself to see that it writes the file’s contents to stdout:
cat helloWorld.txt
Then we can pipe that into the say command:
cat helloWorld.txt | say -v Daniel
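Because say only exists on macOS, this sketch uses tr (here it upper-cases text) as a stand-in for any command that reads stdin. It shows that redirecting with < and piping from cat hand the command exactly the same input. say also has its own -f flag for reading a file, so say -v Daniel -f helloWorld.txt is a third option.

```shell
# Recreate the example file from above
echo 'Hello World is a common first program.' > helloWorld.txt

# Redirecting the file into stdin...
tr '[:lower:]' '[:upper:]' < helloWorld.txt
# prints: HELLO WORLD IS A COMMON FIRST PROGRAM.

# ...and piping cat's stdout into stdin give the command the same input
cat helloWorld.txt | tr '[:lower:]' '[:upper:]'
# prints: HELLO WORLD IS A COMMON FIRST PROGRAM.
```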
Converting a .docx file to Plain Text for TTS.
Pandoc will come in handy now for converting between document formats. If you don’t already have it, see the Dependencies section at the top.
Let’s take a quick look at pandoc’s man page. Scroll down to the ‘Using pandoc’ section:
man pandoc
That section shows the most basic and versatile way to use pandoc. Its example is similar to the command below, which converts a .docx file to .txt:
pandoc -o temp.txt input.docx
Pandoc infers the file type from the file’s extension. Our input file, input.docx (the argument that comes after all the options), could be replaced with a file of any input type pandoc supports, such as .docx, .md, or .epub. Similarly, the output file (given after the -o flag) can be any file type pandoc can write.
If you look at the output file, you may notice that it has headings like this:
Example Heading
---------------
This helps pandoc keep the document’s structure between formats, but it isn’t ideal for turning the text into speech. We can produce plain text instead of this Markdown-style output with the -t flag, which is short for --to and sets the output format. Consult pandoc’s man page to see all the formats -t accepts; here we’ll use plain:
pandoc -t plain -o temp.txt input.docx
The result is a clean plain-text document that works well for text to speech. Let’s speak that output into an audio file:
say -v Daniel -o temp.aiff < temp.txt
Converting the .aiff to .mp3 with ffmpeg
Pandoc is to text documents what ffmpeg is to audio and video files: with ffmpeg you can convert pretty much any audio or video file into another format.
ffmpeg’s command line is similar to pandoc’s, but the flags are flipped: instead of marking the output with -o, you mark the input with -i, and the output file simply comes last. Because differences like this can make commands confusing, it’s worth consulting the man page, or at least the --help flag, whenever you’re unsure what a flag does.
You can read ffmpeg’s man page if you like, but for brevity’s sake I’ll just show how to take our temp.aiff file as input and write output.mp3:
ffmpeg -i temp.aiff output.mp3
Putting Everything Together
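So let’s put all three steps in sequence, clean up the temporary files, and open QuickTime Player. Each && runs the next command only if the previous one succeeded, and the trailing \ continues the command on the next line. You’ll probably want to paste these lines into a text editor first so you can adjust them before running them.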
pandoc -t plain -o temp.txt input.docx \
&& say -v Daniel -o temp.aiff < temp.txt \
&& ffmpeg -i temp.aiff output.mp3 \
&& rm temp.aiff && rm temp.txt \
&& open -a 'QuickTime Player' output.mp3
One last note: in the code block above, all you need to supply is a document in place of input.docx and, optionally, a different voice in place of Daniel in say -v Daniel. When the command finishes, you’ll have an output.mp3.
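Here’s the same pipeline annotated. Comments can’t go between lines joined with a trailing \ (the backslash joins the comment onto the previous line, so the next line’s leading && becomes a syntax error), so this version ends each line with && instead, which also lets the command continue on the next line. By default zsh, the default macOS shell, doesn’t treat # as a comment at an interactive prompt unless you run setopt interactivecomments, so either run that first or save these lines as a script (for example tts.sh) and run it with bash tts.sh.
# Convert the document to plain text
pandoc -t plain -o temp.txt input.docx &&
# Speak the text into an audio file
say -v Daniel -o temp.aiff < temp.txt &&
# Convert the audio to mp3
ffmpeg -i temp.aiff output.mp3 &&
# Remove the temporary files
rm temp.aiff temp.txt &&
# Open the result in QuickTime Player
open -a 'QuickTime Player' output.mp3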
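If you convert documents often, the pipeline above could be wrapped in a reusable shell function. This is a minimal sketch, not the article’s own method: doc_to_mp3 and mp3_name_for are hypothetical names, the voice defaults to Daniel, temporary files go into a private directory from mktemp so they can’t clash with your own files, and ffmpeg’s -y flag overwrites an existing mp3 instead of prompting.

```shell
# mp3_name_for FILE: print FILE with its extension swapped for .mp3,
# e.g. input.docx -> input.mp3 (strips everything after the last dot).
mp3_name_for() {
  printf '%s\n' "${1%.*}.mp3"
}

# doc_to_mp3 FILE [VOICE]: hypothetical wrapper around the pipeline above.
# Requires pandoc, say (macOS only) and ffmpeg on the PATH.
doc_to_mp3() {
  input=$1
  voice=${2:-Daniel}                 # default to the voice used in this article
  output=$(mp3_name_for "$input")
  tmpdir=$(mktemp -d) || return 1    # private scratch directory for temp files

  pandoc -t plain -o "$tmpdir/temp.txt" "$input" &&
    say -v "$voice" -o "$tmpdir/temp.aiff" < "$tmpdir/temp.txt" &&
    ffmpeg -y -i "$tmpdir/temp.aiff" "$output"   # -y: overwrite an existing mp3
  status=$?                          # exit status of the last command that ran

  rm -rf "$tmpdir"                   # clean up even if a step failed
  return "$status"
}
```

Saved in a file and loaded with source (for example source tts-functions.sh, another hypothetical name), it could be used like this: doc_to_mp3 input.docx Daniel && open -a 'QuickTime Player' input.mp3. Note that mp3_name_for cuts at the last dot, so a file without an extension inside a dotted folder name would get an unexpected output name.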