Need a wardrobe upgrade? You could ask the fashion mavens at Netflix’s Queer Eye — or you could use a new neural network.
What’s new: Yen-Liang Lin, Son Tran, and Larry S. Davis at Amazon propose Category-based Subspace Attention Network (CSA-Net), which predicts and retrieves garments and accessories that complement one another. (This is the third of three papers presented by Amazon at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). We covered the others in previous issues.)
Key insight: Suppose you have several items that go together and want one more to complete the ensemble. Past approaches such as SCE-Net can find compatible outfits by scoring pairs of garments or accessories, but Amazon’s catalogue is too vast to compare every pair of items in it. CSA-Net retrieves items by learning a vector description of each item and finding nearby items. The network adjusts its representation based on the categories already selected. For instance, given a shirt and shoes, it can find a matching handbag or hat.
How it works: The researchers trained CSA-Net by providing outfits to complete, sets of candidate items, and labels that identify compatible candidates. CSA-Net learned to place outfits and compatible items nearby in the feature space while placing incompatible items farther away.
- A convolutional neural network learns features from an image of a garment or accessory.
- An attention mechanism modifies the features to place different types of items that go together — matching shirts and pants, matching pants and shoes — in distinct subspaces, or portions of the feature space.
- Presented with several items that make up an incomplete outfit, CSA-Net predicts the missing item by comparing each candidate with every item in the outfit separately. Say you have a hat, pants, and shoes, and you want a top. The system measures how well each top goes with your hat, then with your pants, and so on. It settles on the top that’s nearest to the other items overall.
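The steps above can be illustrated with a toy example. What follows is a minimal sketch, not the authors' implementation: the feature vectors, the two subspace masks, and the `PAIR_LOGITS` table are invented stand-ins for values that CSA-Net would learn, and averaging the pairwise distances is an assumed way of combining them. The function names are hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into attention weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def subspace_embedding(features, masks, logits):
    """Blend the subspace masks by attention weight, then apply the blend to the features."""
    weights = softmax(logits)
    return [
        f * sum(w * mask[i] for w, mask in zip(weights, masks))
        for i, f in enumerate(features)
    ]

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_candidates(outfit, candidates, masks, pair_logits):
    """Sort candidates by mean distance to the outfit items (assumed aggregation)."""
    scored = []
    for name, category, features in candidates:
        total = 0.0
        for _, item_category, item_features in outfit:
            # The attention logits depend on the pair of categories being compared.
            logits = pair_logits[(item_category, category)]
            total += distance(
                subspace_embedding(item_features, masks, logits),
                subspace_embedding(features, masks, logits),
            )
        scored.append((total / len(outfit), name))
    return [name for _, name in sorted(scored)]

# Toy data: every number here is invented for illustration.
MASKS = [[1.0, 0.0], [0.0, 1.0]]          # two subspaces over 2-D features
PAIR_LOGITS = {
    ("hat", "top"): [-5.0, 5.0],          # hat-top matching attends to dimension 1
    ("pants", "top"): [5.0, -5.0],        # pants-top matching attends to dimension 0
}
outfit = [("hat", "hat", [0.2, 0.9]), ("pants", "pants", [0.8, 0.1])]
candidates = [("top_A", "top", [0.8, 0.9]), ("top_B", "top", [0.2, 0.1])]

ranked = rank_candidates(outfit, candidates, MASKS, PAIR_LOGITS)
print(ranked)  # ['top_A', 'top_B']
```

In this toy setup, both tops are equally far from the outfit on average under plain Euclidean distance (0.7 each). Only the category-dependent weighting separates them, which is the intuition behind giving each pairing of item types its own subspace.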
Results: The researchers evaluated CSA-Net on the Polyvore-Outfit dataset of fashion items and labels that detail their compatibility. Given an incomplete outfit of four items, CSA-Net predicted the correct fifth piece 59.26 percent of the time, compared with 53.67 percent achieved by the previous state of the art. It also outperformed the previous state of the art in predicting whether a pair of garments is compatible, achieving a higher area under the curve (the probability that a randomly chosen compatible pair receives a higher score than a randomly chosen incompatible pair).
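For readers unfamiliar with area under the curve, here is a small sketch of how it can be computed for a compatibility scorer. It counts, over all compatible/incompatible pairings, how often the compatible one receives the higher score. The labels and scores are made up for illustration and are not taken from the paper; `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairings where the positive scores higher.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# 1 = compatible pair, 0 = incompatible pair (invented example data).
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.2, 0.7]

auc = pairwise_auc(labels, scores)
print(round(auc, 4))  # 0.8333
```

A score of 1.0 would mean every compatible pair outranks every incompatible one, while 0.5 is what random scoring would achieve on average.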
Why it matters: The universe of fashion items and accessories is immense and complex, which makes it hard to match items within a single feature space. CSA-Net makes the task more tractable by dividing the feature space into subspaces, each tuned to compatibility between particular types of items.
We’re thinking: Leave it to machine learning engineers to build technology that liberates them from having to decide which shirt goes with what pants and shoes.