I have written articles on classifying different types of images, such as:
- Iris genus classification | DeepCognition | Azure ML Studio (towardsdatascience.com)
- How good is "your" selfie? (medium.com)
- How To Make AI That Classifies Dog Breeds | Deep Learning Studio (hackernoon.com)
Or generating stories using RNNs…
- Generate stories using RNNs | pure Mathematics with code (hackernoon.com)
So I thought of writing an article which explains how to classify different sounds using AI.
In this article, we'll see how to prepare a dataset for sound classification and how to use it for our deep learning model.
Dataset:
We are going to use the dataset from the Urban Sound Classification Challenge. It consists of 8700+ sound excerpts from 10 different sources, in .wav format.
Sources of Sound:
- air conditioner
- car horn
- children playing
- dog bark
- drilling
- engine idling
- gun shot
- jackhammer
- siren
- street music
The size of this dataset is around 5.6 GB, which made me a bit reluctant to train a model on it. So I wrote a Python script that can be used to shrink the dataset. The dataset contains around 600 excerpts of sound from each source; I reduced that to 170 per source, bringing the dataset down to around 1.1 GB. You can download the script from my repo.
Manik9/Urban-Sound - Urban Sound classification using Neural Nets (github.com)
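The actual script lives in the repo above; the sketch below is a hypothetical reconstruction of the idea, assuming a train.csv index with ID and Class columns and a Train/ folder of .wav files. The column and folder names are assumptions, not confirmed by the repo.

```python
# Hypothetical sketch of a dataset-reduction script.
# Assumes train.csv with "ID" and "Class" columns and .wav files in Train/.
import os
import pandas as pd

EXCERPTS_PER_CLASS = 170  # keep only 170 excerpts per sound source

train = pd.read_csv("train.csv")

# Keep the first 170 rows of each class; everything else gets dropped.
reduced = train.groupby("Class").head(EXCERPTS_PER_CLASS)
dropped = train[~train.index.isin(reduced.index)]

# Delete the .wav files we no longer need to reclaim disk space.
for file_id in dropped["ID"]:
    path = os.path.join("Train", f"{file_id}.wav")
    if os.path.exists(path):
        os.remove(path)

reduced.to_csv("train_reduced.csv", index=False)
```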
Platform to train the model
For training a program that requires high computational power, such as a deep learning model, I prefer to use Deep Learning Studio's (DLS) Jupyter notebooks. It provides Amazon deep learning instances with GPUs that can be used to train the model.
Check it out here.
Deep Cognition (deepcognition.ai)
Upload Dataset
Upload the "Urban Sound" dataset to the datasets folder.
Start DLS's Jupyter notebooks.
Audio processing
In the case of images, we generally pass pixel values to our model. For audio too, we need to pass numerical values that represent the audio.
- librosa is a Python library that can be used for audio pre-processing.
- line 1: train.csv contains the location of each sound clip and its label.
- line 7: the filename of a particular clip.
- line 8: 'x' is the audio time series and 's' is the sampling rate, i.e. the rate at which the clip is read by librosa.
- line 9: MFCCs (Mel-frequency cepstral coefficients) collectively represent the short-term power spectrum of a sound (source: Wikipedia); they are the numerical features we extract for each clip.
- line 17: the 'parser' function is applied to each row of the 'train' DataFrame, and the results are stored in 'temp'.
- After all of the above steps, we have a DataFrame 'temp' that represents each of our clips (row by row) with numerical values.
- Finally, the output labels are converted into one-hot encoding, as shown in the sketch below.
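The line numbers in the notes above refer to the original embedded gist, which isn't reproduced here. Below is a minimal sketch of what that feature-extraction step might look like, assuming pandas, librosa, and Keras; the ID/Class column names and the choice of 40 time-averaged MFCCs are assumptions, though 40 features per example matches the model input described next.

```python
import os
import librosa
import numpy as np
import pandas as pd
from keras.utils import to_categorical

# train.csv holds the location (ID) and label (Class) of each clip.
train = pd.read_csv("train.csv")

def parser(row):
    # Filename of a particular clip (column names are assumptions).
    file_name = os.path.join("Train", str(row.ID) + ".wav")
    # x: audio time series, s: sampling rate used by librosa to read the clip.
    x, s = librosa.load(file_name)
    # 40 MFCCs averaged over time -> one 1x40 feature vector per clip.
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=s, n_mfcc=40).T, axis=0)
    return pd.Series([mfccs, row.Class])

# Apply 'parser' to each row of 'train'; the results land in 'temp'.
temp = train.apply(parser, axis=1)
temp.columns = ["feature", "label"]

X = np.array(temp.feature.tolist())   # shape: (num_examples, 40)

# One-hot encode the labels: each label becomes a 1x10 indicator vector.
codes, class_names = pd.factorize(temp.label)
y = to_categorical(codes)
```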
Model Architecture
Each example contains 40 features, i.e. a 1x40 vector, so 1150 examples form a 1150x40 matrix. The transpose of this is passed to the model.
Architecture of our model
We get a 1x10 output that represents the score for each class.
The architecture shown above is replicated in the code below.
- line 26 refers to the number of different classes of sound (10).
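The original code embed (to which line 26 refers) isn't reproduced here. Below is a minimal Keras sketch consistent with the description above: a 40-feature input and a 1x10 softmax output. The hidden-layer sizes and dropout rates are illustrative assumptions, not the author's exact architecture.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

num_labels = 10  # number of different sound classes (the "line 26" value)

model = Sequential()
# Hidden-layer sizes are illustrative; each example is a 1x40 MFCC vector.
model.add(Dense(256, input_shape=(40,), activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.5))
# Final layer: 1x10 scores, one per class, normalized by softmax.
model.add(Dense(num_labels, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
```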
Training
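The original training code also isn't reproduced; here is a minimal sketch, assuming the model and features from the earlier sketches, scikit-learn for a hypothetical 70/30 train-validation split, and the 100 epochs mentioned below.

```python
from sklearn.model_selection import train_test_split

# Hold out a validation set; the 70/30 split ratio is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

model.fit(X_train, y_train,
          batch_size=32,
          epochs=100,  # the article trains for 100 epochs
          validation_data=(X_val, y_val))
```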
After 100 epochs:
Training Results
Our trained model obtained an accuracy of 83.04% on the validation set, which is quite good given that we reduced the dataset to about a third of its original size. We can still improve the accuracy of this model by using CNNs; we'll see that in another article.
Thanks for Reading!
If you liked this article, please clap for it and share it. Follow me on LinkedIn and Medium.
Manik Soni | LinkedIn (www.linkedin.com)
Manik Soni | Medium (medium.com)