I have written articles where we can classify different types of images such as:
Iris genus classification|DeepCognition| Azure ML studio_Kingdom:Plantae Clade:Angiosperms Order:Asparagales Family:Iridaceae Subfamily:Iridoideae Tribe:Irideae Genus:Iris_towardsdatascience.com
How good is ‘your’ selfie?_Check out how good is your selfie here!_medium.com
How To Make AI That Classifies Dog Breeds,DeepLearningStudio_Love dogs?_hackernoon.com
Or.. like generating stories using RNNs…
Generate stories using RNNs |pure Mathematics with code|:_Hi reader!_hackernoon.com
So I thought of writing an article which explains how to classify different sounds using AI.
In this article, we’ll see how to prepare a dataset for sound classification and how to use it for our Deep Learning model.
DATA SET:
We are going to use dataset from Urban Sound Classification Challenge. This dataset consist of 8700+ excerpts of sounds from 10 different sources in ‘.wav’ format.
Sources of Sound:
The size of this dataset is around 5.6GB which made me a bit reluctant to train a model on it. So I have written a python script which can be used to decrease the size of dataset. Basically, the dataset contains around 600 excerpts of sound from each source. I have reduced them to 170, making dataset to around 1.1GB. You can download the script from my repo.
Manik9/Urban-Sound_Urban-Sound - Urban Sound classification using Neural Nets_github.com
Platform to train the program
Training a program which requires high computational power such as Deep Learning model, I prefer to use Deep Learning Studio’s(DLS) jupyter notebooks. It provides Amazon Deep Learning Instances with GPU which can be used to train the model.
Check it out here.
Home_We would like to invite you to join Deep Cognition's team at Booth# 1035 at the GPU Technology Conference March 26-29…_deepcognition.ai
Upload Dataset
Upload the ‘Urban sound’ dataset in datasets folder
Start DLS’s jupyter notebooks
Audio processing
In case of images, we generally pass the values of pixels to our model. In case of Audio too, we need to pass some sort of numerical values which represents our audio.
•librosa is a library in python which can be used for audio pre-processing.
source: Wikipedia
Model Architecture
Each example contains 40 columns i.e(1x40). So 1150 example contains 1150x40. Transpose of this is passed to the model.
Architecture of our model
We get a 1x10 output which represents score of each class
Training
After 100 epochs:
Training Results
Our trained model obtained an accuracy of 83.04% on validation set, which is quite good as we even reduced the size of dataset to (1/3)rd. We can still improve the accuracy of this model by using CNNs. We’ll see that in another article.
Thanks for Reading 😄
If you have liked this article do 👏 and share this. Follow me on LinkedIn and Medium
Manik Soni - Machine Learning Intern - HEAD Infotech India Pvt ltd - Ace2three.com | LinkedIn_View Manik Soni's profile on LinkedIn, the world's largest professional community. Manik has 3 jobs listed on their…_www.linkedin.com
Manik Soni - Medium_Read writing from Manik Soni on Medium. Machine Learning Researcher. Every day, Manik Soni and thousands of other…_medium.com