Google's Meena: 'Most Human' Chatbot

Researchers in the Google Brain group have been working overtime to develop chatbot software that produces truly human interactions with users. They may have turned a corner with their new Meena end-to-end-trained neural conversational model.

The new model corrects what the researchers see as the "critical flaw" of specialized chatbots -- namely their tendency to say things that are inconsistent with what has been said in the conversation and/or to give responses that are not specific to the current context, because of their lack of "common sense and basic knowledge about the world."

Meena is an open-domain chatbot trained on data mined and filtered from public domain social media conversations for back-and-forth interactions—so-called multi-turn conversations. It was designed to focus on the current context, the Brain group says, to provide that all-important "sensible reply." Unlike closed-domain chatbots, which are limited to responses to keywords or intents to accomplish specific tasks, open-domain chatbots are free to engage in conversations on any topic.

Specifically, Meena is a 2.6B parameter neural network trained on 341GB of text from social media conversations. That's 8.5 times more data than existing state-of-the-art generative models were trained on. The researchers claim it's the largest end-to-end trained model to date, and that it demonstrates that a large end-to-end model can generate almost human-like chat responses in an open-domain setting.

Meena's design is based on Google's Evolved Transformer seq2seq architecture, a Transformer architecture discovered by evolutionary neural architecture search (NAS) to improve perplexity. Google introduced Transformer, a novel neural network architecture based on a self-attention mechanism, several years ago to address the challenges of language understanding.

The Google researchers are also proposing a new human evaluation metric called the Sensibleness and Specificity Average (SSA), which aims to capture key elements of a human-like multi-turn conversation—in other words, making sense and being specific. They define "sensibleness" as common sense, logical coherence, and consistency in a conversation. The "specificity" element refers to things like answering the question "Do you like tennis?" with something less vague than "Yes, it's nice"—say, "Me too, I can't get enough of Roger Federer!"
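The mechanics of SSA are simple: human raters label each chatbot response as sensible or not and as specific or not, and SSA is the average of the two per-response rates. A minimal sketch of that calculation (the ratings below are invented for illustration, not from the paper):

```python
def ssa(labels):
    """Compute the Sensibleness and Specificity Average.

    labels: list of (sensible, specific) boolean pairs,
    one pair per rated chatbot response.
    """
    n = len(labels)
    sensibleness = sum(1 for sensible, _ in labels if sensible) / n
    specificity = sum(1 for _, specific in labels if specific) / n
    # SSA is the simple average of the two rates.
    return (sensibleness + specificity) / 2

# Four hypothetical rated responses: three sensible, two specific.
ratings = [(True, True), (True, False), (True, True), (False, False)]
print(ssa(ratings))  # 0.625
```

With three of four responses judged sensible (75 percent) and two of four judged specific (50 percent), the SSA works out to 62.5 percent.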

"Our experiments show strong correlation between perplexity and SSA," the researchers wrote in a paper ("Towards a Human-like Open-Domain Chatbot") describing the technology. "The fact that the best perplexity end-to-end trained Meena scores high on SSA (72 percent on multi-turn evaluation) suggests that a human-level SSA of 86 percent is potentially within reach if we can better optimize perplexity. Additionally, the full version of Meena (with a filtering mechanism and tuned decoding) scores 79 percent SSA, 23 percent higher in absolute SSA than the existing chatbots we evaluated."

Once Meena masters sensibleness and specificity, the researchers will be moving on to "personality" and "factuality." Also under consideration are safety and bias, which is why the company is not releasing a research demo at this time.

Meena is not yet publicly available, but the Google Brain group is considering a release "in the coming months to help advance research in this area."

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at