Objects that sound

Unofficial implementation of the paper "Objects that Sound" by Arandjelović et al.

The work learns intra-modal and cross-modal embeddings from audio and video by training on the audio-visual correspondence (AVC) task: predicting whether a video frame and an audio clip come from the same video. Due to resource constraints, we could only train on a skewed subset of videos. For more details, check out the GitHub source code and README.
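A minimal sketch of the cross-modal embedding idea, assuming the paper's setup of two subnetworks mapping each modality into a shared unit-norm embedding space, with the Euclidean distance between embeddings acting as the correspondence signal (the linear encoders, input dimensions, and feature shapes here are illustrative placeholders, not the actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 128  # shared embedding size (assumed for illustration)


def l2_normalize(x):
    # Project onto the unit sphere so distances are comparable across modalities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


class LinearEncoder:
    """Stand-in for a conv subnetwork: projects raw features to the shared space."""

    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)

    def __call__(self, x):
        return l2_normalize(x @ self.W)


# One encoder per modality (hypothetical input sizes).
vision_enc = LinearEncoder(512, EMBED_DIM, rng)  # frame features -> embedding
audio_enc = LinearEncoder(257, EMBED_DIM, rng)   # spectrogram features -> embedding

frame = rng.standard_normal(512)
spec = rng.standard_normal(257)

v = vision_enc(frame)
a = audio_enc(spec)

# Correspondence score: Euclidean distance between the two unit-norm embeddings.
# After training, a small distance would mean "this frame and audio match".
dist = np.linalg.norm(v - a)
print(f"cross-modal distance: {dist:.3f}")
```

Because both embeddings live in the same normalized space, the same distance can also compare audio to audio or image to image, which is what makes intra-modal and cross-modal retrieval possible with a single pair of encoders.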

The GitHub source code is here.