This could lead to the next big breakthrough in common sense AI

by Karen Hao
November 8, 2020
in News, Tech

AI models that can parse both language and visual input also have very practical uses. If we want to build robotic assistants, for example, they need computer vision to navigate the world and language to communicate about it to humans.

But combining both types of AI is easier said than done. It isn’t as simple as stapling together an existing language model with an existing object recognition system. It requires training a new model from scratch with a data set that includes text and images, otherwise known as a visual-language data set.

The most common approach for curating such a data set is to compile a collection of images with descriptive captions. A picture of an orange cat sitting in an open suitcase, for example, would be captioned “An orange cat sits in the suitcase ready to be packed.” This differs from typical image data sets, which would label the same picture with only one noun, like “cat.” A visual-language data set can therefore teach an AI model not just how to recognize objects but how they relate to and act on one another, using verbs and prepositions.
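The contrast between the two kinds of data sets can be made concrete with a minimal sketch. The field names and file names below are illustrative, not the MS COCO schema:

```python
# A typical image data set pairs each image with a single noun label.
image_dataset_entry = {
    "image": "cat_in_suitcase.jpg",
    "label": "cat",  # only says what the object is
}

# A visual-language data set pairs the same image with a full caption,
# which also encodes relations between objects.
visual_language_entry = {
    "image": "cat_in_suitcase.jpg",
    "caption": "An orange cat sits in the suitcase ready to be packed.",
}

# The caption carries relational words ("sits", "in") that a bare
# label cannot express.
relation_words = [
    w for w in visual_language_entry["caption"].lower().split()
    if w in {"sits", "in", "on", "under"}
]
print(relation_words)  # → ['sits', 'in']
```

The extra relational vocabulary in the caption is exactly what lets the model learn verbs and prepositions, not just object categories.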

But you can see why this data curation process would take forever. This is why the visual-language data sets that exist are so puny. A popular text-only data set like English Wikipedia (which indeed includes nearly all the English-language Wikipedia entries) might contain nearly 3 billion words. A visual-language data set like Microsoft Common Objects in Context, or MS COCO, contains only 7 million. It’s simply not enough data to train an AI model for anything useful.

“Vokenization” gets around this problem, using unsupervised learning methods to scale the tiny amount of data in MS COCO to the size of English Wikipedia. The resultant visual-language model outperforms state-of-the-art models in some of the hardest tests used to evaluate AI language comprehension today.

“You don’t beat state of the art on these tests by just trying a little bit,” says Thomas Wolf, the cofounder and chief science officer of the natural-language processing startup Hugging Face, who was not part of the research. “This is not a toy test. This is why this is super exciting.”

From tokens to vokens

Let’s first sort out some terminology. What on earth is a “voken”?

In AI speak, the words that are used to train language models are known as tokens. So the UNC researchers decided to call the image associated with each token in their visual-language model a voken. Vokenizer is what they call the algorithm that finds vokens for each token, and vokenization is what they call the whole process.

The point of this isn’t just to show how much AI researchers love making up words. (They really do.) It also helps break down the basic idea behind vokenization. Instead of starting with an image data set and manually writing sentences to serve as captions—a very slow process—the UNC researchers started with a language data set and used unsupervised learning to match each word with a relevant image (more on this later). This is a highly scalable process.
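The scalability claim can be sketched in a few lines: once you have some way to find an image for a word, vokenizing text is just a per-token lookup over an arbitrarily large corpus. The `image_index` dictionary below is a made-up stand-in; the researchers' actual matcher is learned, not a lookup table:

```python
# Sketch: vokenizing a sentence is a per-token operation, so it scales
# to any text corpus without manual captioning. The image index here is
# a hypothetical toy; the real vokenizer matches via learned embeddings.

def vokenize_sentence(sentence, image_index):
    tokens = sentence.lower().split()
    # pair each token with a relevant image (its "voken"), if one is found
    return [(t, image_index.get(t)) for t in tokens]

image_index = {"cat": "cat_01.jpg", "suitcase": "suitcase_07.jpg"}
print(vokenize_sentence("The cat sits in the suitcase", image_index))
```

Because nothing in this loop requires a human in it, the same procedure that processes one sentence can process all of English Wikipedia.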

This unsupervised learning technique is ultimately the paper’s core contribution. How do you actually find a relevant image for each word?

Vokenization

Let’s go back for a moment to GPT-3. GPT-3 is part of a family of language models known as transformers, which represented a major breakthrough in applying unsupervised learning to natural-language processing when the first one was introduced in 2017. Transformers learn the patterns of human language by observing how words are used in context and then creating a mathematical representation of each word, known as a “word embedding,” based on that context. The embedding for the word “cat” might show, for example, that it is frequently used around the words “meow” and “orange” but less often around the words “bark” or “blue.”
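The intuition behind context-based embeddings can be illustrated with raw co-occurrence counts on a tiny made-up corpus. Real transformers learn dense vectors rather than counts, so this is only a sketch of the idea that “cat” ends up associated with “meow” rather than “bark”:

```python
# Sketch: count which words appear near a target word. Transformers learn
# dense "word embeddings" instead of raw counts, but the signal is the
# same: context reveals meaning.

from collections import Counter

corpus = [
    "the orange cat began to meow",
    "the cat likes to meow at night",
    "the dog began to bark loudly",
]

def context_counts(target, sentences, window=3):
    counts = Counter()
    for sent in sentences:
        words = sent.split()
        for i, w in enumerate(words):
            if w == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(words[lo:i] + words[i + 1:hi])
    return counts

cat_ctx = context_counts("cat", corpus)
print(cat_ctx["meow"], cat_ctx["bark"])  # "cat" co-occurs with "meow", never "bark"
```

A learned embedding compresses these kinds of contextual statistics into a vector, which is what lets “cat” sit near “meow” in the model’s representation space.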

This is how transformers approximate the meanings of words, and how GPT-3 can write such human-like sentences. It relies in part on these embeddings to tell it how to assemble words into sentences, and sentences into paragraphs.

There’s a parallel technique that can also be used for images. Instead of scanning text for word usage patterns, it scans images for visual patterns. It tabulates how often a cat, say, appears on a bed versus on a tree, and creates a “cat” embedding with this contextual information.

The insight of the UNC researchers was that they should use both embedding techniques on MS COCO. They converted the images into visual embeddings and the captions into word embeddings. What’s really neat about these embeddings is that they can then be graphed in a three-dimensional space, and you can literally see how they are related to one another. Visual embeddings that are closely related to word embeddings will appear closer in the graph. In other words, the visual cat embedding should (in theory) overlap with the text-based cat embedding. Pretty cool.

You can see where this is going. Once the embeddings are all graphed and compared and related to one another, it’s easy to start matching images (vokens) with words (tokens). And remember, because the images and words are matched based on their embeddings, they’re also matched based on context. This is useful when one word can have totally different meanings. The technique successfully handles that by finding different vokens for each instance of the word.
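Once tokens and images share an embedding space, vokenization reduces to nearest-neighbor search, and contextualized token embeddings are what let the same word get different vokens in different sentences. All the vectors and file names below are invented for illustration; a real vokenizer learns them from MS COCO:

```python
# Sketch: match each (contextualized) token embedding to the closest
# image embedding by cosine similarity. Vectors here are hand-picked
# toys, not learned values.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# The same word "bat" gets different vectors depending on its context.
token_embeddings = {
    ("bat", "animal context"): [0.9, 0.1],
    ("bat", "baseball context"): [0.1, 0.9],
}

# Candidate voken (image) embeddings from a hypothetical image set.
voken_embeddings = {
    "photo_of_flying_bat.jpg": [0.8, 0.2],
    "photo_of_baseball_bat.jpg": [0.2, 0.8],
}

def vokenize(token_vec):
    # pick the image whose embedding is most similar to the token's
    return max(voken_embeddings,
               key=lambda img: cosine(token_vec, voken_embeddings[img]))

for (word, ctx), vec in token_embeddings.items():
    print(word, ctx, "->", vokenize(vec))
```

Because the match is computed from the contextual vector rather than the surface word, the animal “bat” and the baseball “bat” land on different images, which is the disambiguation behavior described above.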


© 2020 www.newsican.com – Premium news & magazine Web site; Designed By SL Creates
