I have written articles on classifying different types of images, such as:
- Iris genus classification | DeepCognition | Azure ML Studio (towardsdatascience.com)
- How good is "your" selfie? (medium.com)
- How To Make AI That Classifies Dog Breeds | Deep Learning Studio (hackernoon.com)
Or generating stories using RNNs…
- Generate stories using RNNs | pure Mathematics with code (hackernoon.com)
So I thought of writing an article which explains how to classify different sounds using AI.
In this article, we'll see how to prepare a dataset for sound classification and how to use it for our deep learning model.
Dataset:
We are going to use the dataset from the Urban Sound Classification Challenge. It consists of 8700+ sound excerpts from 10 different sources, in .wav format.
Sources of Sound:
- air conditioner
- car horn
- children playing
- dog bark
- drilling
- engine idling
- gun shot
- jackhammer
- siren
- street music
The size of this dataset is around 5.6 GB, which made me a bit reluctant to train a model on it. So I wrote a Python script that can be used to shrink the dataset. The dataset contains around 600 excerpts of sound from each source; I reduced that to 170 per source, bringing the dataset down to around 1.1 GB. You can download the script from my repo.
Manik9/Urban-Sound - Urban Sound classification using Neural Nets (github.com)
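The actual script lives in the repo above; the sketch below is a hypothetical reconstruction of the idea, assuming a train.csv index with ID and Class columns and a Train/ folder of .wav files. The column and folder names are assumptions, not confirmed by the repo.

```python
# Hypothetical sketch of a dataset-reduction script.
# Assumes train.csv with "ID" and "Class" columns and .wav files in Train/.
import os
import pandas as pd

EXCERPTS_PER_CLASS = 170  # keep only 170 excerpts per sound source

train = pd.read_csv("train.csv")

# Keep the first 170 rows of each class; everything else gets dropped.
reduced = train.groupby("Class").head(EXCERPTS_PER_CLASS)
dropped = train[~train.index.isin(reduced.index)]

# Delete the .wav files we no longer need to reclaim disk space.
for file_id in dropped["ID"]:
    path = os.path.join("Train", f"{file_id}.wav")
    if os.path.exists(path):
        os.remove(path)

reduced.to_csv("train_reduced.csv", index=False)
```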
Platform to train the model
For training a program that requires high computational power, such as a deep learning model, I prefer to use Deep Learning Studio's (DLS) Jupyter notebooks. It provides Amazon deep learning instances with GPUs that can be used to train the model.
Check it out here.
Deep Cognition (deepcognition.ai)
Upload Dataset
Upload the "Urban Sound" dataset to the datasets folder.
Start DLS's Jupyter notebooks.
Audio processing
In the case of images, we generally pass pixel values to our model. For audio too, we need to pass numerical values that represent the audio.
- librosa is a Python library that can be used for audio pre-processing.
- line 1: train.csv contains the location of each sound clip and its label.
- line 7: the filename of a particular clip.
- line 8: 'x' is the audio time series and 's' is the sampling rate, i.e. the rate at which the clip is read by librosa.
- line 9: MFCCs (Mel-frequency cepstral coefficients) collectively represent the short-term power spectrum of a sound (source: Wikipedia); they are the numerical features we extract for each clip.
- line 17: the 'parser' function is applied to each row of the 'train' DataFrame, and the results are stored in 'temp'.
- After all of the above steps, we have a DataFrame 'temp' that represents each of our clips (row by row) with numerical values.
- Finally, the output labels are converted into one-hot encoding, as shown in the sketch below.
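The line numbers in the notes above refer to the original embedded gist, which isn't reproduced here. Below is a minimal sketch of what that feature-extraction step might look like, assuming pandas, librosa, and Keras; the ID/Class column names and the choice of 40 time-averaged MFCCs are assumptions, though 40 features per example matches the model input described next.

```python
import os
import librosa
import numpy as np
import pandas as pd
from keras.utils import to_categorical

# train.csv holds the location (ID) and label (Class) of each clip.
train = pd.read_csv("train.csv")

def parser(row):
    # Filename of a particular clip (column names are assumptions).
    file_name = os.path.join("Train", str(row.ID) + ".wav")
    # x: audio time series, s: sampling rate used by librosa to read the clip.
    x, s = librosa.load(file_name)
    # 40 MFCCs averaged over time -> one 1x40 feature vector per clip.
    mfccs = np.mean(librosa.feature.mfcc(y=x, sr=s, n_mfcc=40).T, axis=0)
    return pd.Series([mfccs, row.Class])

# Apply 'parser' to each row of 'train'; the results land in 'temp'.
temp = train.apply(parser, axis=1)
temp.columns = ["feature", "label"]

X = np.array(temp.feature.tolist())   # shape: (num_examples, 40)

# One-hot encode the labels: each label becomes a 1x10 indicator vector.
codes, class_names = pd.factorize(temp.label)
y = to_categorical(codes)
```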
Model Architecture
Each example contains 40 features, i.e. a 1x40 vector, so 1150 examples form a 1150x40 matrix. The transpose of this is passed to the model.
Architecture of our model
We get a 1x10 output that represents the score for each class.
The architecture shown above is replicated in the code below.
- line 26 refers to the number of different classes of sound (10).
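The original code embed (to which line 26 refers) isn't reproduced here. Below is a minimal Keras sketch consistent with the description above: a 40-feature input and a 1x10 softmax output. The hidden-layer sizes and dropout rates are illustrative assumptions, not the author's exact architecture.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

num_labels = 10  # number of different sound classes (the "line 26" value)

model = Sequential()
# Hidden-layer sizes are illustrative; each example is a 1x40 MFCC vector.
model.add(Dense(256, input_shape=(40,), activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(256, activation="relu"))
model.add(Dropout(0.5))
# Final layer: 1x10 scores, one per class, normalized by softmax.
model.add(Dense(num_labels, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
```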
Training
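The original training code also isn't reproduced; here is a minimal sketch, assuming the model and features from the earlier sketches, scikit-learn for a hypothetical 70/30 train-validation split, and the 100 epochs mentioned below.

```python
from sklearn.model_selection import train_test_split

# Hold out a validation set; the 70/30 split ratio is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=42)

model.fit(X_train, y_train,
          batch_size=32,
          epochs=100,  # the article trains for 100 epochs
          validation_data=(X_val, y_val))
```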
After 100 epochs:
Training Results
Our trained model obtained an accuracy of 83.04% on the validation set, which is quite good given that we reduced the dataset to about a third of its original size. We can still improve the accuracy of this model by using CNNs; we'll see that in another article.
Thanks for Reading!
If you liked this article, please clap for it and share it. Follow me on LinkedIn and Medium.
Manik Soni | LinkedIn (www.linkedin.com)
Manik Soni | Medium (medium.com)