First of all, this tutorial will only be useful if you have a Mac.
It also assumes some very basic familiarity with the command line (I won't explain how to change directories with the cd command, for example), but this article will explain a few Unix and command line features along the way to help you become more proficient in Unix.
Here are the dependencies used in this article. You may or may not need them, depending on how far you choose to go. Below is what each dependency is for and how to install it.
Homebrew: This is optional, since I also link to other downloads. Homebrew is a package manager for Mac that lets you install programs from the command line. It makes installing programs quite easy, and you will only need it to install the other dependencies in this blog post. Install it from Homebrew's website.
Pandoc: You'll only need this if your text isn't already in plain text. If you have Homebrew, installing it should be as simple as:
brew install pandoc
Otherwise, install pandoc from pandoc's website.
ffmpeg: You'll need this to convert the audio to mp3. If you installed Homebrew, this should be as simple as:
brew install ffmpeg
Otherwise, install ffmpeg from ffmpeg's website.
Now on to using text to speech commands.
The simplest way to use text to speech (TTS) on a Mac is through the terminal's say command:
$ say 'I can say anything'
A lot can be learned from man pages (man is short for manual). Why don't we look at say's man page? To get to a command's man page, you just type man followed by a space and then the command.
(If a command doesn't have a manual page, you can generally get some brief help by typing the command followed by a space and then --help.)
$ man say
Scroll down to the options and look at some of them. For example, we'll use the -o flag for output and the -v flag for choosing different voices. When you are done looking at the manual page, press q to close it.
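As pointed out by the man page, if you want to list all the voices, pass a ? to the -v flag like so (the quotes stop zsh, the default shell on recent versions of macOS, from treating ? as a filename wildcard):
$ say -v '?'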
Let's get only the English voices with grep. Here we will use the pipe (|), which takes the standard output (stdout) of one command, here say, and feeds it into the standard input (stdin) of the next command, here grep:
$ say -v '?' | grep en_
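If you only want the voice names themselves, for example to loop over them later, you can trim each line with sed. This is a sketch written as a bash script, and it assumes every line printed by say -v '?' has the shape shown in the comment (voice name, then a locale such as en_GB, then # and a sample sentence); english_voice_names is a made-up function name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the names of the English voices.
# Assumes each line of `say -v '?'` looks roughly like:
#   Daniel              en_GB    # Hello! My name is Daniel.
english_voice_names() {
  # 1) list every voice, 2) keep lines whose locale starts with en_,
  # 3) delete the locale column and sample sentence, leaving the name
  say -v '?' | grep ' en_' | sed -E 's/ +en_[A-Za-z]+ +#.*$//'
}

# Usage: english_voice_names
```

Voice names can contain spaces, so the sed pattern removes everything from the locale column onwards rather than just taking the first word.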
Let's try Daniel's voice.
$ say -v Daniel Hello, my name is Daniel and I am a British-English voice.
Now we'll try saving the voice to a file. The main output format you can use with the say command has the .aiff extension. Let's output that last command to daniel.aiff:
$ say -o daniel.aiff -v Daniel Hello, my name is Daniel and I am a British-English voice.
The open command opens a file or directory in its default application, although you can target a specific application in your Applications folder with the -a flag. Let's open the sound file in QuickTime Player:
$ open -a 'QuickTime Player' daniel.aiff
Next, if you're done listening to the file, you can remove it with the rm command:
$ rm daniel.aiff
For a safer alternative to deleting things with rm (which skips the Trash, so the file is gone for good), try opening the current directory in Finder with the following command and then deleting the file from the Finder window.
$ open .
Don't forget the . in the above command: in Unix and Linux, . stands for the current directory.
Some voices (mainly the Siri voices) can't be used from the say command, but you can download higher quality voices on a Mac through the accessibility options, located at System Preferences > Accessibility > Speech (on newer versions of macOS, System Settings > Accessibility > Spoken Content). Click on the system voice, then click Customize to download new voices; some of them let you check a box to download a higher quality version.
Let's create a simple text file named helloWorld.txt:
$ echo 'Hello World is a common first program.' > helloWorld.txt
The > sign is used for file redirection. Notice how the arrow points toward the file name and away from the echo command; this is because it redirects the standard output of echo into the file helloWorld.txt.
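To see the difference between overwriting and appending, here's a small bash sketch you can run in any scratch directory. It uses a throwaway file called notes.txt (a made-up name) so it doesn't touch helloWorld.txt; the only new piece is >>, which appends to a file instead of overwriting it.

```shell
#!/usr/bin/env bash
# notes.txt is a hypothetical scratch file for this demo.
# > creates notes.txt, or truncates it first if it already exists.
echo 'First line.' > notes.txt
# >> appends to the end of the file instead of overwriting it.
echo 'Second line.' >> notes.txt
cat notes.txt   # prints both lines
# Using > again throws away what was there and starts over.
echo 'Fresh start.' > notes.txt
cat notes.txt   # prints only: Fresh start.
```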
If we want the say command to use this helloWorld.txt file, we can redirect the file into say's standard input with the < sign:
$ say -v Daniel < helloWorld.txt
An alternative to the above command is to pipe from the cat command. Let's try running cat by itself to see that it outputs the file contents to stdout.
$ cat helloWorld.txt
Then we can pipe that to the say command:
$ cat helloWorld.txt | say -v Daniel
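To convince yourself that < and cat | hand a command exactly the same standard input, you can swap say for a command whose output you can read. This bash sketch uses tr to upper-case its input and a made-up file greeting.txt. The third form, a here-document (<<), is a related shell feature that feeds inline text to stdin without any file; it works the same way in front of say -v Daniel.

```shell
#!/usr/bin/env bash
# tr 'a-z' 'A-Z' reads stdin and writes it back out in upper case.
printf 'hello world\n' > greeting.txt   # hypothetical demo file

# 1) Redirect the file into tr's stdin with <
from_redirect=$(tr 'a-z' 'A-Z' < greeting.txt)

# 2) Let cat print the file to stdout and pipe that into tr's stdin
from_pipe=$(cat greeting.txt | tr 'a-z' 'A-Z')

# 3) A here-document feeds inline text to stdin, no file needed
from_heredoc=$(tr 'a-z' 'A-Z' <<'EOF'
hello world
EOF
)

# prints: HELLO WORLD / HELLO WORLD / HELLO WORLD
echo "$from_redirect / $from_pipe / $from_heredoc"
```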
Pandoc comes in handy now for converting between document formats. If you don't already have it, go back to the dependencies at the top.
Let's take a quick look at the man page for pandoc. Scroll down to the 'Using pandoc' section.
$ man pandoc
The 'Using pandoc' section of the man page shows an example of the most basic and versatile pandoc command.
Pandoc's example is similar to the command below, which converts a .docx file to a .txt file:
$ pandoc -o temp.txt input.docx
Pandoc is able to infer the file type from the file's extension. Our input file here, input.docx, which is the first argument after all the options, could be replaced with a file of any input type pandoc supports. Similarly, the output (signified by the -o flag) can be replaced with a file of any output type pandoc supports.
If you looked at your text output file, you may notice that its headings are marked up with extra symbols. This helps pandoc keep the structure of the document between formats, but it won't be very good for turning the text file into speech. So we can make the output plain text instead of markdown/asciidoc (whatever .txt files are by default) with the -t flag, which is short for --to, the output format. You can consult the man page for pandoc to see all the different formats -t can take. Here we'll use plain, the plain text format:
$ pandoc -t plain -o temp.txt input.docx
If you look at that output, it's a nice plain text document that works well for text to speech. Let's speak that output into an audio file:
$ say -v Daniel -o temp.aiff < temp.txt
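If you don't need temp.txt afterwards, you can skip writing it: when you leave out -o, pandoc writes to stdout, and say reads its text from stdin when you don't give it a message, which is what the < temp.txt form above relies on. Here's a minimal bash sketch of that, wrapped in a hypothetical doc_to_aiff function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: document in, AIFF out, no temp.txt on disk.
# pandoc (no -o) writes plain text to stdout; say reads it from stdin.
# Note: the exit status is say's; run `set -o pipefail` first if you
# also want a pandoc failure to count as a failure.
doc_to_aiff() {
  local input="$1" voice="$2" output="$3"
  pandoc -t plain "$input" | say -v "$voice" -o "$output"
}

# Usage: doc_to_aiff input.docx Daniel temp.aiff
```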
Pandoc is to text documents what ffmpeg is to audio and video files: with ffmpeg you can take pretty much any audio or video file and convert it to another format.
ffmpeg's command line interface is pretty similar to pandoc's, but instead of using -o to mark the output, you use -i to mark the input (the output file simply comes last). Since differences like this can make commands confusing, it's a good idea to consult the man page, or at least use the --help flag, when you're unsure what a command's flags do.
You can read the man page for ffmpeg if you want to, but for brevity's sake I'll just show you how to take our temp.aiff file as input and produce output.mp3 as output:
$ ffmpeg -i temp.aiff output.mp3
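ffmpeg picks an MP3 encoder and default settings from the .mp3 extension, but you can also spell those choices out. This bash sketch assumes your ffmpeg build includes the LAME encoder (libmp3lame), which I believe Homebrew's ffmpeg does; the aiff_to_mp3 function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an AIFF file to MP3 with explicit settings.
# Assumes ffmpeg was built with the libmp3lame encoder.
aiff_to_mp3() {
  local input="$1" output="$2"
  # -y        overwrite the output file without asking
  # -i        marks the input file; the output file is the last argument
  # -codec:a  choose the audio encoder (LAME for MP3)
  # -q:a 2    LAME variable bitrate quality; lower numbers mean higher quality
  ffmpeg -y -i "$input" -codec:a libmp3lame -q:a 2 "$output"
}

# Usage: aiff_to_mp3 temp.aiff output.mp3
```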
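So let's put all three TTS commands in sequence, remove the temporary files, and open QuickTime Player. You'll probably want to type these lines into a text editor first, so that you can edit them beforehand. Each && runs the next command only if the previous one succeeded, and each trailing \ continues the command onto the next line: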
$ pandoc -t plain -o temp.txt input.docx \
&& say -v Daniel -o temp.aiff < temp.txt \
&& ffmpeg -i temp.aiff output.mp3 \
&& rm temp.aiff && rm temp.txt \
&& open -a 'QuickTime Player' output.mp3
One last note: in the above code block, all you need to supply is some sort of document in place of input.docx and, optionally, a different voice in place of Daniel in say -v Daniel. When the command is done running, you'll get an output.mp3 file, which opens in QuickTime Player.
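Here's the above code annotated, in case you forget anything. The && now sits at the end of each line so that comment lines can go between the commands. If you paste it into zsh, the default shell on recent macOS, run setopt interactive_comments first; otherwise zsh won't treat the # lines as comments.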
$ # Convert the document to plain text
pandoc -t plain -o temp.txt input.docx &&
# Speak the document into an audio file
say -v Daniel -o temp.aiff < temp.txt &&
# Convert the audio to mp3
ffmpeg -i temp.aiff output.mp3 &&
# Remove our temp files
rm temp.aiff && rm temp.txt &&
# Open QuickTime Player
open -a 'QuickTime Player' output.mp3
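If you run this pipeline often, it can be wrapped in a reusable function. Here is a sketch, assuming bash (save it in a file and source it), with a hypothetical name, tts_to_mp3. Instead of fixed temp.txt and temp.aiff files in the current directory, it works in a private directory created by mktemp -d, and a trap deletes that directory whenever the function finishes, even if a step fails. The body is wrapped in ( ) rather than { } so it runs in a subshell, which keeps the trap and the variables from leaking into your shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tts_to_mp3 <document> <voice> <output.mp3>
tts_to_mp3() (
  input="$1" voice="$2" output="$3"
  # Private temp directory, so runs never clobber each other's files
  tmp="$(mktemp -d)" || exit 1
  # Runs when this subshell exits, whether the steps succeeded or not
  trap 'rm -rf "$tmp"' EXIT
  # Same three steps as above; each runs only if the previous succeeded.
  # ffmpeg -y overwrites an existing output file without asking.
  pandoc -t plain -o "$tmp/temp.txt" "$input" &&
    say -v "$voice" -o "$tmp/temp.aiff" < "$tmp/temp.txt" &&
    ffmpeg -y -i "$tmp/temp.aiff" "$output"
)
```

Usage: tts_to_mp3 input.docx Daniel output.mp3 && open -a 'QuickTime Player' output.mp3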