Improving Generalization of Deep Convolutional Neural Networks for Acoustic Scene Classification
Abstract
In recent years, deep learning has become one of the most popular machine
learning techniques for a wide variety of complex problems. One such
task is to mimic the human auditory system by classifying audio recordings
according to the location they were recorded in. This work focuses mainly on
the Acoustic Scene Classification task proposed by the IEEE DCASE Challenge.
The dataset for Acoustic Scene Classification consists of recordings
from distinct recording locations. The aim of the challenge is to classify an
unseen test set of recordings. In the 2016 challenge, the training and test
sets did not differ significantly. In the 2017 challenge, however, the test set originated from a different distribution, implying a strong need for generalization.
In the course of this work, the initial implementation, a Deep
Convolutional Neural Network written in Lasagne for the DCASE 2016 challenge
submission, was re-implemented in Keras. An extension of the
Adam optimizer (AMSGrad) was investigated for its effect on generalization.
Other submissions to the DCASE 2017 challenge suggest that different
types of spectrograms might be key for better generalization. Therefore, experiments utilizing different kinds of spectrograms were conducted. Furthermore, different interpolation algorithms were used for data augmentation, with some of them yielding significant improvements in classification accuracy and generalization.
For different spectrogram dimensions, slight adjustments to the
network architecture also resulted in a performance gain. To better understand
what different models "see" and what they focus on, their filters and
activations were visualized and compared. Finally, the adjustments which led to better generalization on the dataset of the DCASE 2016
challenge were tested on the dataset of the DCASE 2017 challenge, leading to
an improvement over all submissions to the DCASE 2017 challenge from the
Institute of Computational Perception.
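
For readers unfamiliar with the AMSGrad extension mentioned above, the following is a minimal sketch of its update rule for a single scalar parameter. It follows the commonly published formulation without bias correction and is not taken from the implementation used in this work. The function names `amsgrad_step` and `minimize_quadratic`, the hyperparameter values, and the toy objective are illustrative assumptions. The only difference from plain Adam is that the second-moment estimate used for scaling is a running maximum, so the effective step size cannot grow again after a large gradient has been seen.

```python
import math


def amsgrad_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update for a scalar parameter (sketch, no bias correction)."""
    # Exponential moving averages of the gradient and the squared gradient,
    # exactly as in Adam.
    m = beta1 * state["m"] + (1.0 - beta1) * grad
    v = beta2 * state["v"] + (1.0 - beta2) * grad * grad
    # AMSGrad: keep the largest second-moment estimate seen so far.
    v_hat = max(state["v_hat"], v)
    new_state = {"m": m, "v": v, "v_hat": v_hat}
    new_theta = theta - lr * m / (math.sqrt(v_hat) + eps)
    return new_theta, new_state


def minimize_quadratic(steps=2000, lr=0.05):
    """Toy usage (hypothetical): minimize f(x) = (x - 3)^2 starting at x = 0."""
    x = 0.0
    state = {"m": 0.0, "v": 0.0, "v_hat": 0.0}
    for _ in range(steps):
        grad = 2.0 * (x - 3.0)  # derivative of (x - 3)^2
        x, state = amsgrad_step(x, grad, state, lr=lr)
    return x
```

In Keras, the same behaviour is usually requested through the `amsgrad` argument of the Adam optimizer rather than written by hand. The abstract does not state how the experiments were configured.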