In this series we will discuss a truly exciting natural language processing topic: using deep learning techniques to summarize text. The code for this series is open source and comes in Jupyter notebook format, so it can run on Google Colab without the need for a powerful GPU. All the data is open source as well, and you don’t have to download it locally, because you can connect Google Colab to Google Drive and put your data directly onto Google Drive. Read this blog to learn more about using Google Colab with Google Drive.
To summarize text you have 2 main approaches (I truly like how it is explained in this blog)
2. Abstractive method, which is building a neural network that truly works out the relation between the input and the output rather than merely copying words. This series will go through this method; think of it like a pen.
This series is made for whoever feels excited to learn the power of building a deep network that is capable of
Hence the name seq2seq: a sequence of inputs to a sequence of outputs, which is the main algorithm used here.
This series will go into detail on how to
Much research has been done over the last couple of years. I am currently researching these new approaches, and in this series we will go through some of them.
This series implements its code on Google Colab, so there is no need for a powerful computer to try these ideas. I am currently working on converting the most recent research into Google Colab notebooks, so that researchers can try it out without powerful GPUs. All the data can also be used without downloading it, as we will use Google Drive with Google Colab. Read this blog to learn more about how you can work within the Google ecosystem for deep learning.
All the code will be available on this GitHub repo, which contains modifications of some open source implementations of text summarization.
This research mainly includes
This is a crucial implementation, as it is the cornerstone of all recent research. For now, I have collected different approaches that implement this concept.
2. Another implementation that I have found truly interesting combines creating new sentences for the summary with copying from the source input. This method is called pointer generator; here is my modification, in a Google Colab notebook, of the original implementation.
3. Other implementations, which I am still researching, use reinforcement learning together with deep learning.
This series is built to be easily understandable for any newbie like myself, as you might be the one who introduces the next architecture that becomes the new standard for text summarization, so let's begin!
The following is a quick overview of the series. I hope you enjoy it.
We will be using Google Colab for our work, which lets us use its free GPU time to build our network (this blog will give you even more insight into the free ecosystem for your deep learning project).
You have 2 main options for setting up your Google Colab notebook
You can find the details on how to do this in this blog.
Having your code on Google Colab enables you to
You can find out how to connect to Google Drive in this blog.
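As a rough illustration of the Drive connection, here is a minimal sketch. It assumes the standard `google.colab` helper that is available inside a Colab runtime; the function name `mount_drive` and the `summarization_data` folder are hypothetical names used only for this example, and the `MyDrive` folder name may differ in older runtimes.

```python
import os

def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive inside Colab; return the mount point, or None elsewhere."""
    try:
        # This module is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return None
    drive.mount(mount_point)  # asks for authorization in the notebook
    return mount_point

root = mount_drive()
if root is None:
    print("Not running in Colab; read the data from local disk instead.")
else:
    # "summarization_data" is a hypothetical folder name for illustration.
    data_dir = os.path.join(root, "MyDrive", "summarization_data")
    print("Data directory:", data_dir)
```

Outside Colab the function simply returns `None`, so the same notebook cell can be run locally without changes.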
Since our task is an NLP task, we need a way to represent words. There are 2 main approaches to this that we will discuss,
For this task we will use a dataset of news articles and their headlines; the most popular one is the CNN/Daily Mail dataset. The news body is used as the input to our model, while the headline is used as the target summary output.
These datasets can easily be found online. We will use 2 main approaches for working with these datasets
Here I will briefly talk about the models that will be included, if GOD wills, in the coming series. I hope you enjoy it.
To implement this task, researchers use a deep learning model that consists of 2 parts: an encoder, which understands the input and represents it in an internal representation, and a decoder, which receives that representation and generates the output.
The main deep learning network used for these 2 parts is an LSTM, which stands for long short-term memory and is a modification of the RNN.
In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism; more on this later.
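To give a feel for what the attention mechanism in the decoder does, here is a minimal sketch in plain Python. It is a simplification under stated assumptions: it uses dot-product scoring rather than whatever scoring function a particular implementation in this series uses, the encoder states are hand-written toy vectors instead of LSTM outputs, and the names `softmax` and `attend` are made up for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state (dot product,
    an assumption of this sketch) and build the weighted context vector."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy values: three input positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
print(weights, context)
```

The weights tell the decoder which input positions to focus on at the current step, and the context vector is the matching weighted average of the encoder states.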
But researchers found 2 main problems with the above implementation, as discussed in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks. The authors have a truly amazing blog that you need to see.
These problems are
This research builds on these 2 main problems and tries to fix them. I have modified their repo to work inside a Jupyter notebook on Google Colab.
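The core idea of mixing generation with copying can be sketched in a few lines. In the pointer-generator paper, a generation probability `p_gen` weights the vocabulary distribution, and the remaining `1 - p_gen` is spread over the source words according to the attention weights. The sketch below is a toy illustration only: the function name `final_distribution` and all numbers are made up for this example, and in a real model `p_gen`, the vocabulary distribution, and the attention weights all come from the network.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source."""
    # Generation part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: each source token receives (1 - p_gen) * its attention weight.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

# Toy values: "messi" and "match" are not in the vocabulary at all.
vocab_dist = {"the": 0.5, "team": 0.3, "won": 0.2}
source_tokens = ["messi", "won", "the", "match"]
attention = [0.6, 0.2, 0.1, 0.1]

dist = final_distribution(0.5, vocab_dist, attention, source_tokens)
print(dist)
```

Note how a source word that is missing from the vocabulary can still receive probability through the copy part, which is what lets the model reproduce rare names from the input.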
C. Using Reinforcement learning with deep learning
I am still researching this work, but it is truly interesting research. It is about combining two fields together; it actually uses the pointer generator in its work (like in implementation B) and uses the same preprocessed version of the data.
This is the research; it uses this repo for its code.
They are actually trying to fix 2 main problems with the cornerstone implementation, which are
I am currently working on implementing this approach in a Jupyter notebook, so if GOD wills it, you will see more updates on this in the near future.
To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply count the words shared between the generated summary and the reference summary; the more the better. Most of the above approaches score between 32 and 38 on ROUGE.
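To make the word-overlap idea concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It assumes simple lowercased whitespace tokenization and a single reference summary; published scores are computed with the official tooling, which also handles stemming and other ROUGE variants, so treat this only as an illustration. The function name `rouge_1` is chosen for this example.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F1 between two summaries."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(p, r, f)
```

In this toy pair, five of the six words are shared, so precision, recall and F1 all come out to 5/6.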
I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don’t forget to check out the code for these blogs.
In the coming blogs, if GOD wills it, I will go through the details of building the cornerstone implementation, which all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and apply the preprocessing manually.
In later blogs, if GOD wills it, we will go through modern approaches, such as how to create a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.