
A quick introduction to Talon, a hands-free input system that lets you control your computer with your voice, and to scripting it with Python.

When I started my master’s degree, I had just developed a repetitive strain injury (RSI) that made it painful, if not impossible, to use a keyboard, a mouse or even a phone. Still, I was able to finish my degree without any delay, as well as complete a summer internship at Bekk. I could do this because I had discovered Talon, a hands-free input system that allowed me to use the computer proficiently with only a microphone. These days the pain is not as bad as it used to be and I’m no longer fully reliant on Talon, but when I’m primarily using a keyboard I find myself missing some of the awesome features Talon has to offer. I believe that systems like Talon can be really effective even for people without disabilities, so in this article I will demonstrate how you can use Talon to improve your workflow and significantly increase your wow-factor when pair programming on Zoom 😉

Using Talon

If you want to follow along with the examples, you can install Talon from the official Talon website. Talon does not come with any built-in commands. If you want to be able to control your computer fully using only Talon, I recommend downloading a larger command set such as knausj_talon, but if you’re just playing around you can simply create a .talon file in the user directory (%APPDATA%\Talon\user on Windows and ~/.talon/user on macOS/Linux).
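If you later want a script to find that directory, a small helper can pick the right location. This is a minimal sketch that only knows the two paths listed above; the talon_user_dir name is made up for illustration, and a non-default Talon install location is not handled.

```python
import os
import platform
from pathlib import Path


def talon_user_dir() -> Path:
    """Return the Talon user directory for this OS (hypothetical helper)."""
    if platform.system() == "Windows":
        # %APPDATA%\Talon\user on Windows
        return Path(os.environ["APPDATA"]) / "Talon" / "user"
    # ~/.talon/user on macOS and Linux
    return Path.home() / ".talon" / "user"


if __name__ == "__main__":
    user_dir = talon_user_dir()
    print(user_dir, "exists" if user_dir.is_dir() else "does not exist yet")
```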

Let’s get started with the simplest example:

say hello: "Hello from Talon!"

Saying the phrase "say hello" will now make Talon type "Hello from Talon!".

You can also emulate any keystroke, for example to switch to the last opened window:

# use alt-tab instead on Windows/Linux
switch: key(cmd-tab)

This is not really faster than pressing the key combination yourself, but voice commands are much easier to remember than keyboard shortcuts, and it’s really easy to chain them together to make macros.
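In a .talon file, chaining just means listing several actions under one command. As a rough plain-Python model of what such a macro does (the key function below is a hypothetical stand-in that records key presses instead of sending them), a macro is simply a sequence of steps run in order:

```python
from typing import Callable, List

pressed: List[str] = []


def key(combo: str) -> None:
    """Hypothetical stand-in for Talon's key() action: records instead of pressing."""
    pressed.append(combo)


def macro(*steps: Callable[[], None]) -> Callable[[], None]:
    """Combine several steps into one callable that runs them in order."""
    def run() -> None:
        for step in steps:
            step()
    return run


# e.g. switch to the previous window and paste the clipboard there
switch_and_paste = macro(lambda: key("cmd-tab"), lambda: key("cmd-v"))
switch_and_paste()
print(pressed)  # ['cmd-tab', 'cmd-v']
```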

Voice commands can also capture parts of a spoken phrase to do interesting things with them.

call <word>:
  "{word}()"
  key(left)

The above lets me say, for example, "call print" to produce print() with the cursor inside the parentheses. Now we have what we need to create some useful snippets!

for <word> in <word>: "for {word_1} in {word_2}: "
add to do <phrase>: "// TODO: {phrase}"

You can of course be really productive with just the snippets in your editor, but voice snippets let you take advantage of the naturally high throughput of speech, and they have the benefit of being available everywhere (Notepad, Slack, browsers, etc.).
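When the same capture appears twice in a rule, its variables are numbered in order, as with word_1 and word_2 in the for-loop snippet. A rough plain-Python model of filling such a snippet is ordinary string formatting. The fill_template helper is hypothetical, and it only substitutes plain names, not the expressions discussed next.

```python
def fill_template(template: str, **captures: str) -> str:
    """Substitute captured values into a snippet template (hypothetical helper)."""
    return template.format(**captures)


# "for item in items": two <word> captures, numbered in the order they were spoken
snippet = fill_template("for {word_1} in {word_2}: ", word_1="item", word_2="items")
print(repr(snippet))  # 'for item in items: '

# "add to do fix the parser": a single <phrase> capture keeps its plain name
todo = fill_template("// TODO: {phrase}", phrase="fix the parser")
print(todo)  # // TODO: fix the parser
```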

If this syntax reminds you of Python’s f-strings, then that’s great, because that’s what they are! That means we can write Python expressions inside the braces and have them evaluated. We could, for example, create a spoken calculator with the number capture:

multiply <number> by <number>: "{number_1 * number_2}"

Now saying "multiply two by three" will produce 6 🤯
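For comparison, here is the same expression as a plain Python f-string, assuming the two number captures have already been turned into the integers 2 and 3:

```python
# Values bound for "multiply two by three" (assumed already parsed to ints)
number_1, number_2 = 2, 3

# The braces hold a Python expression, evaluated when the string is built
result = f"{number_1 * number_2}"
print(result)  # 6
```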

Talon can also respond to keyboard events, so the macros you write for Talon can still be used when you’re sitting in an office and don’t want to use a microphone. For example:

# toggle Talon with alt + t
key(alt-t): speech.toggle()

Getting more advanced

You can do a lot with just the built-in scripting language, but things start to get really interesting when you add Python modules. If you’re still following along, go ahead and create a Python file next to the .talon file you created earlier.

import os
from talon import Context, Module, actions, clip

module = Module()
context = Context()

I haven’t explained what modules and contexts are yet, but for now it’s sufficient to know that they are used to control when groups of actions and voice commands are active. This means you can’t accidentally trigger application-specific commands when that application is not open, and you can define actions that work differently depending on the context you’re in, such as which programming language you’re using.
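Talon does this matching for you, but the idea of one action with context-specific implementations can be modelled in plain Python. The sketch below is a simplification, not Talon’s actual mechanism; the comment action and the language names are made up for illustration.

```python
from typing import Callable, Dict

DEFAULT = "default"

# Hypothetical registry: a default implementation plus per-language overrides
comment_impls: Dict[str, Callable[[str], str]] = {
    DEFAULT: lambda text: f"# {text}",
    "javascript": lambda text: f"// {text}",
    "sql": lambda text: f"-- {text}",
}


def comment(text: str, active_language: str) -> str:
    """Pick the implementation for the active context, falling back to the default."""
    impl = comment_impls.get(active_language, comment_impls[DEFAULT])
    return impl(text)


print(comment("fix this", "javascript"))  # // fix this
print(comment("fix this", "python"))      # # fix this
```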

The following example shows how to use Talon’s clipboard API to grab the currently selected text and append it to a file called notes.md in your home directory.

@module.action_class
class Actions:
    def note_selected_text():
        "save the selected text as a bullet point in a file"
        with clip.capture() as clipboard:
            # copy the selection; use ctrl-c on Windows/Linux
            actions.key("cmd-c")
        try:
            text = clipboard.get()
        except clip.NoChange:
            # the clipboard did not change, e.g. because nothing was selected
            return
        message = f"- {text}\n"
        # mode "a" appends, and creates ~/notes.md if it doesn't exist yet
        with open(os.path.expanduser("~/notes.md"), "a") as file:
            file.write(message)

clip.capture() lets us work with the clipboard while making sure it is restored once we are done with it. And just like that, we have a simple note-taking script that can be triggered in any application that supports copy and paste. The action can be invoked from a voice command like so:

make note of that: user.note_selected_text()
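The save-and-restore behaviour of clip.capture() is a common Python pattern. The sketch below illustrates it with contextlib on a fake, in-memory clipboard; FakeClipboard and preserved are hypothetical names, and this is not how Talon implements clip.capture() internally.

```python
from contextlib import contextmanager
from typing import Iterator


class FakeClipboard:
    """Stand-in for the system clipboard, holding a single text value."""

    def __init__(self, text: str) -> None:
        self.text = text


@contextmanager
def preserved(clipboard: FakeClipboard) -> Iterator[FakeClipboard]:
    """Save the clipboard text, hand the clipboard to the caller, then restore it."""
    saved = clipboard.text
    try:
        yield clipboard
    finally:
        # runs even if the body raises, so the user's clipboard is never lost
        clipboard.text = saved


board = FakeClipboard("something the user copied earlier")
with preserved(board) as cb:
    cb.text = "the selected text"  # simulates the copy keystroke
    captured = cb.text

print(captured)    # the selected text
print(board.text)  # something the user copied earlier
```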

Going back to a programming example, let’s revisit our command for invoking Python functions, which has a couple of issues. Not all function names are easy to pronounce, so to address this we can define a Talon list, which associates a spoken form with some output:

module.list("python_functions", desc="python functions")
context.lists["user.python_functions"] = {
    "print": "print",
    "join": ".join",
    "compare files": "cmpfiles",
}

We can then reference this list in a voice command as {user.python_functions}. Another issue is that we used the <word> capture, which only captures a single word, but we want to be able to dictate longer function names and format them in snake_case. One way to do this is to define a custom capture, which we can use to parse the user input and process it however we like:

@module.capture(rule="({user.python_functions}|<phrase>)")
def python_function_name(m) -> str:
    # get the first capture
    text = m[0]
    # make every word lower case, and join them by "_"
    return "_".join((word.lower() for word in text.split()))

Rules are defined like regular expressions, except that they work on whole words instead of characters, so "hello*" matches "hello hello hello", but not "hellooo". The above capture will match any phrase, but if the phrase matches something in the python_functions list, the list’s output is used instead. We can then update our voice command to be:

call <user.python_function_name>:
    "{python_function_name}()"
    key(left)
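Because the capture body is ordinary Python, its logic can be tried out without Talon. This sketch assumes the recognised words arrive as a plain string (in Talon the capture actually receives a match object m), and it checks the list first to mimic the override described above:

```python
from typing import Dict

# Same spoken-form -> output mapping as the user.python_functions list above
PYTHON_FUNCTIONS: Dict[str, str] = {
    "print": "print",
    "join": ".join",
    "compare files": "cmpfiles",
}


def python_function_name(spoken: str) -> str:
    """List entries win; anything else is lower-cased and joined with "_"."""
    if spoken in PYTHON_FUNCTIONS:
        return PYTHON_FUNCTIONS[spoken]
    return "_".join(word.lower() for word in spoken.split())


print(python_function_name("compare files"))  # cmpfiles
print(python_function_name("Get User Name"))  # get_user_name
```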

There’s a lot more fun to be had

The things I presented in this article only scratch the surface of what you can do with Talon. Talon also has (among other things):

  • Mouse control
  • Eye tracking
  • Noise recognition (pop, hiss, etc.)
  • Facial gesture recognition
  • API for file watching
  • Many user scripts for working with different applications
  • Many cool user-defined commands in knausj_talon, such as "google that", which googles the selected text, and "launch"/"focus", which opens or focuses any application you name.

If any of this excites you, feel free to check out the official website, knausj_talon, and join the Slack channel (link on the website).

I can also recommend Emily Shea’s talk Voice Driven Development: Who needs a keyboard anyway?

Conclusion

Talon has a lot of features that I think can be really useful for anyone who likes optimising their workflow. Even if you don’t want to use voice commands in your normal workflow, you can use Talon as a general-purpose scripting environment and create macros that work anywhere. If you need any help getting started, feel free to contact me or anyone else in the Slack 😊

