If you ask the Google search app “What is the fastest bird on Earth?,” it will answer: the peregrine falcon. “According to YouTube,” it adds, “the peregrine falcon has a maximum recorded airspeed of 389 km/h.”
That’s the correct answer, but it doesn’t come from some master database inside Google. When you ask the question, Google’s search engine pinpoints a YouTube video describing the five fastest birds on the planet and then extracts just the information you asked for. It doesn’t mention the other four birds.
Google answers questions like these with help from deep neural networks, a form of artificial intelligence rapidly remaking not just Google’s search engine but the entire company, along with the other internet giants, from Facebook to Microsoft. Deep neural networks are pattern recognition systems that can learn to perform specific tasks by analyzing huge amounts of data. In this case, they’ve learned to take a long sentence or paragraph from a relevant page on the web and extract the specific information you’re looking for.
These “sentence compression algorithms” just went live on the desktop incarnation of the search engine. They handle a task that’s pretty simple for the human brain but has traditionally been quite difficult for machines. They show how deep learning is advancing the art of natural language understanding, the ability to understand and respond to natural human speech. “You need to use neural networks—or at least that is the only way we have found to do it,” Google research product manager David Orr says of the company’s sentence compression work. “We have to use all of the most advanced technology we have.”
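The task Orr describes can be framed, in its simplest form, as token-level keep/drop labeling. The sketch below is a toy illustration of that framing, with hand-written labels standing in for what a trained neural network would predict; it is not Google’s actual system.

```python
# Toy sketch: sentence compression as token-level keep/drop labeling.
# The labels below are hand-written for illustration; in a real system,
# a trained neural network would predict them.

def compress(tokens, keep_labels):
    """Return the tokens whose label is 1 (keep), joined into a string."""
    return " ".join(t for t, keep in zip(tokens, keep_labels) if keep == 1)

sentence = ("A YouTube video describing the five fastest birds says "
            "the peregrine falcon reaches 389 km/h").split()
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # drop the preamble

print(compress(sentence, labels))  # the peregrine falcon reaches 389 km/h
```

The hard part, of course, is predicting those labels for sentences the system has never seen, which is what the training data is for.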
Not to mention a whole lot of people with advanced degrees. Google trains these neural networks using data handcrafted by a massive team of PhD linguists it calls Pygmalion. In effect, Google’s machines learn how to extract relevant answers from long strings of text by watching humans do it—over and over again. These painstaking efforts show both the power and the limitations of deep learning. To train artificially intelligent systems like this, you need lots and lots of data that’s been sifted by human intelligence. That kind of data doesn’t come easy—or cheap. And the need for it isn’t going away anytime soon.
Silver Data and Gold Data
To train Google’s artificial Q&A brain, Orr and company also use old news stories, where machines start to see how headlines serve as short summaries of the longer articles that follow. But for now, the company still needs its team of PhD linguists. They not only demonstrate sentence compression but actually label parts of speech in ways that help neural nets understand how human language works. Spanning about 100 PhD linguists across the globe, the Pygmalion team produces what Orr calls “the gold data,” while the news stories are the “silver.” The silver data is still useful, because there’s so much of it. But the gold data is essential. Linne Ha, who oversees Pygmalion, says the team will continue to grow in the years to come.
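As a rough illustration of the two data flavors, here is what a “silver” headline-derived training pair and a “gold” linguist-labeled example might look like. The field names, sample text, and exact format are invented for this sketch; only the part-of-speech tags follow a real convention (Universal Dependencies).

```python
# Sketch: building a "silver" training pair from a news article, using the
# headline as a stand-in for a human-written compression of the first
# sentence. All names and text here are invented for illustration.

def silver_pair(headline, article):
    """Pair an article's first sentence with its headline as the target."""
    first_sentence = article.split(". ")[0]
    return {"source": first_sentence, "target": headline}

pair = silver_pair(
    "Falcon clocks 389 km/h in record dive",
    "A peregrine falcon was recorded diving at 389 km/h on Tuesday. "
    "Researchers said it was the fastest dive ever logged.",
)

# A "gold" example, by contrast, carries token-level labels assigned by
# human linguists (tags follow the Universal Dependencies convention):
gold = {
    "tokens": ["Google", "answers", "questions", "with", "neural", "networks"],
    "pos":    ["PROPN",  "VERB",    "NOUN",      "ADP",  "ADJ",    "NOUN"],
}
```

Silver pairs can be generated by the million with no human effort, which is why they are still useful; the gold examples each cost a linguist’s time.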
This kind of human-assisted AI is called “supervised learning,” and today, it’s just how neural networks operate. Sometimes, companies can crowdsource this work—or it just happens organically. People across the internet have already tagged millions of cats in cat photos, for instance, so that makes it easy to train a neural net that recognizes cats. But in other cases, researchers have no choice but to label the data on their own.
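Supervised learning in its most stripped-down form is a classifier fit to hand-labeled examples. The sketch below trains a simple perceptron on a toy “is it a cat?” dataset with invented features; real systems use deep networks and millions of examples, but the dependence on human-provided labels is the same.

```python
# Minimal sketch of supervised learning: a perceptron fit to a handful of
# hand-labeled examples. Features and labels are invented for illustration.

def train_perceptron(examples, epochs=20, lr=1.0):
    """examples: list of (feature_vector, label) pairs, labels in {0, 1}."""
    n_features = len(examples[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # 0 when correct; +/-1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Hand-labeled data: [has_whiskers, has_pointed_ears] -> is_cat
data = [([1, 1], 1), ([1, 0], 0), ([0, 1], 0), ([0, 0], 0)]
w, b = train_perceptron(data)
print(predict(w, b, [1, 1]))  # 1
```

Every row in `data` is a label someone had to supply, which is exactly the bottleneck the Pygmalion team exists to address.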
Chris Nicholson, the founder of a deep learning startup called Skymind, says that in the long term, this kind of hand-labeling doesn’t scale. “It’s not the future,” he says. “It’s incredibly boring work. I can’t think of anything I would less want to do with my PhD.” The limitations are even more apparent when you consider that the system won’t really work unless Google employs linguists across all languages. Right now, Orr says, the team spans between 20 and 30 languages. But the hope is that companies like Google can eventually move to a more automated form of AI called “unsupervised learning.”
This is when machines can learn from unlabeled data: massive amounts of digital information culled from the internet and other sources. Work in this area is already underway at places like Google, Facebook, and OpenAI, the machine learning startup founded by Elon Musk. But that is still a long way off. Today, AI still needs a Pygmalion.
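Unsupervised learning, by contrast, finds structure in data without any labels at all. A classic example is k-means clustering, sketched minimally below on toy 2-D points; it is meant only to illustrate the label-free setting, not the specific methods those labs are pursuing.

```python
# Minimal sketch of unsupervised learning: k-means clustering groups
# unlabeled points by proximity, with no human labels involved.
# Naive initialization and toy data, purely for illustration.

def kmeans(points, k, iters=10):
    centroids = points[:k]  # naive init: first k points
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
centroids, clusters = kmeans(points, k=2)
print(clusters[0])  # [(0.0, 0.0), (0.1, 0.2)]
```

Note that no one told the algorithm which points belong together; the grouping emerges from the data itself, which is the promise of the unsupervised approach.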