Visiolinguistic Attention Learning (VAL)
That Online Boutique, But Smarter: A summary of Amazon's Visiolinguistic Attention Learning
Why search for “a cotton dress shirt with button-down collar, breast pockets, barrel cuffs, scooped hem, and tortoise shell buttons in grey” when a photo and the words “that shirt, but grey” will do the trick? A new network understands the image-text combo.