Spectrograms and beyond!

It’s a pleasure to start the year (even though it’s already February) by celebrating the publication of a new paper from our lab. This time, it’s “Beyond Spectrograms: Rethinking Audio Classification from EnCodec’s Latent Space”. In this work, we ask the following question: what if the latent representation of a neural audio codec were used in an audio classification pipeline? What we’ve discovered will surprise you. Take a look at the paper and tell us what you think.
This work is one of the results of the thesis that Jorge Perianez-Pascual, a member of our lab, is working on. Interestingly, the idea of using EnCodec’s latent space for classification initially seemed rather unconventional, but our results demonstrate its effectiveness. By going beyond traditional spectrogram-based methods, we open up new possibilities for more efficient and accurate audio classification.
This achievement would not have been possible without the collaboration of Álvaro Rubio Largo and Laura Escobar Encinas, two colleagues from the University of Extremadura who contributed their multidisciplinary expertise throughout development and experimentation, as well as in writing the paper.
Beyond Spectrograms is one of the outcomes of musicgenia, a project funded by Grant CPP2021-008491 from MICIU/AEI/10.13039/501100011033 and by the European Union through NextGenerationEU/PRTR. The main goal of musicgenia is to develop a cloud-based platform that offers AI-generated music as a service for content creators and media, both online (live music generation) and offline (pre-recorded music generation). The direct benefits of this platform include: (1) royalty-free music, (2) original music, (3) ease of finding suitable music for each piece of content and (4) streaming music, with a flexible consumption model where you pay per second rather than per song.