Whenever someone asked what my job was, I used to tiptoe around the subject. I’m not ashamed of what I do. In fact, I’m the only one of my friends who doesn’t complain about going to work in the morning, and I plan on coming in to work until they tell me I can’t anymore. I tiptoed because for the longest time I didn’t have a great answer to the question I always get: “Why don’t they just use a speech-to-text program?” I’m sure our potential customers have thought the same thing: “Why spend a little extra on having a human do this when I could pay once for a computer program?”
Luckily, being a gigantic nerd is what caused the answer to hit me smack-dab in the face as I sat in traffic one day. I happen to be a fan of those old text-based video games from the ’80s, and something clicked in my brain that made me realize, “Huh. My job kind of works the same way.” Don’t see what old Sierra games and a service primarily catering to fintech professionals have in common? Just roll with me for a second here.
These old games used something called a text parser. The idea was that you would type in the action you wanted to take to progress in the game. The only problem was that the computer didn’t always understand what you were saying. Despite being able to perform many thousands of calculations a second, the computer didn’t necessarily know which meaning of the word “pick” you had in mind. After all, pick could mean to grab, to choose, or to pull at, or it could refer to the thing you hold between your fingers to play the guitar. While you’re pulling your hair out wondering why the computer doesn’t understand that you want to grab the object so you can smash it on the ground and retrieve the item inside, it’s sitting there wondering why you haven’t just typed “Take object” yet. “Pick,” to the computer, meant to choose and nothing else.
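For the curious, here is roughly what that looks like in code. This is a minimal Python sketch under my own assumptions, not how any actual Sierra game was written: the word lists, the action names, and the parse function are all made up for illustration.

```python
# Hypothetical vocabulary: every verb maps to exactly one action.
VERBS = {
    "take": "TAKE",
    "get": "TAKE",
    "pick": "CHOOSE",  # the parser knows only one sense of "pick"
    "smash": "BREAK",
}
NOUNS = {"object", "guitar", "door"}
FILLER = {"the", "a", "an", "up"}  # words the parser simply throws away


def parse(command):
    """Return an (action, noun) pair, or None if the command is not understood."""
    words = [w for w in command.lower().split() if w not in FILLER]
    if len(words) != 2:
        return None
    verb, noun = words
    if verb not in VERBS or noun not in NOUNS:
        return None
    return (VERBS[verb], noun)


print(parse("Take object"))         # the phrasing the game was waiting for
print(parse("Pick up the object"))  # read as "choose", not "grab"
print(parse("Grab the object"))     # "grab" is not in the vocabulary at all
```

The parser is not being stubborn. It has one entry for “pick,” and nothing in it can weigh the other meanings against the situation you are in.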
It’s not the computer’s fault, and despite many an exasperated user insulting the machine’s intelligence, computers are actually very good at what they do. You tell a computer what to do, and it does exactly that. The problem is that computers don’t understand nuance. You’d think that we would have solved this problem by now, but we haven’t. How many times have you asked a digital assistant something and had to rephrase it over and over again until you realized there was a bit of slang in there that wasn’t being understood? What about when you have to ask something in a totally unnatural way because the computer is stuck on every possible meaning of a homonym except the one you want?
Computers are bright, but they don’t understand subtleties, especially in instructions. A human, on the other hand, does. If you say, “Strike that line,” to a human operator, the human knows that you want to get rid of the sentence you just dictated. A computer might think you’re asking it to apply strikethrough, leaving the undesired text in place with a line through it. Likewise, if you stutter a little bit and then say, “Operator, start over,” the human sitting at the terminal knows you mean to get rid of just that one line. The computer may know what a stutter is, but it has never experienced one, so it might get rid of all the text, thinking you want to start completely over.
The other thing is that a computer doesn’t know when it’s about to get something wrong. It does what it thinks it has been told and doesn’t know to make a note questioning its instructions. Mishearing something is never ideal, but a human who hasn’t heard something quite right knows to make a note saying, “This might not be right.” If there’s a bit of static and the computer thinks it hears, “Ice-cold fried chicken in the metro on the Moon,” it types just that. If there’s a bit of static and a human hears that, they think, “Maybe I ought to make a note here.”
Computers are very good at rendering graphics, solving complex math problems, and monitoring statistics in real time, but communication between computers and humans is still hard, and a lot of effort goes into it. There are entire PhDs one can earn in the subject, and despite decades of research in the field, we’re still regularly stumping computers without meaning to by way of slang, homonyms, and figures of speech. In an industry with so much on the line, wouldn’t you rather dictate to someone who knows that when you say, “Start over,” you just mean the last sentence and not the entire document?