Transfer learning aims to make use of valuable knowledge from a source domain to improve model performance in a target domain.
Why do we need Transfer Learning for NLP?
In NLP applications, especially when we do not have a large enough dataset for solving a task (called the target task T), we would like to transfer knowledge from another task S to avoid overfitting and to improve performance on T.
Two Scenarios
Transferring knowledge to a semantically similar/same task but with a different dataset.
- Source task (S): a large dataset for binary sentiment classification
- Target task (T): a small dataset for binary sentiment classification
Transferring knowledge to a task that is semantically different but shares the same neural network architecture, so that neural parameters can be transferred (a minimal sketch of such a shared architecture follows this list).
- Source task (S): a large dataset for binary sentiment classification
- Target task (T): a small dataset for 6-way question classification (e.g., location, time, and number)
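To make the second scenario concrete, here is a minimal sketch assuming a small PyTorch LSTM classifier; the layer names and sizes are illustrative choices, not taken from the original work. Every layer except the final classifier has the same shape for both tasks, so those parameters are the ones that can be transferred.

```python
import torch
import torch.nn as nn

class SentenceClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # These layers have the same shape for both tasks, so their weights are transferable.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Only this layer depends on the task's label set (2-way vs. 6-way).
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)
        _, (hidden, _) = self.encoder(embedded)
        return self.classifier(hidden[-1])

# Source task S: binary sentiment classification; target task T: 6-way question classification.
source_model = SentenceClassifier(vocab_size=10_000, embed_dim=100, hidden_dim=128, num_classes=2)
target_model = SentenceClassifier(vocab_size=10_000, embed_dim=100, hidden_dim=128, num_classes=6)
```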
Transfer Methods
Parameter initialization (INIT)
The INIT approach first trains the network on S, and then directly uses the tuned parameters to initialize the network for T. After transfer, we may either fix (freeze) the transferred parameters in the target domain or fine-tune them on T's data.
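A minimal sketch of INIT, assuming PyTorch models like the one above; the helper names `transfer_parameters` and `freeze_layers` are hypothetical. Parameters are copied wherever name and shape match (the task-specific output layer is skipped automatically), after which the copied layers can be frozen or fine-tuned on T.

```python
import torch

def transfer_parameters(source_model: torch.nn.Module, target_model: torch.nn.Module) -> None:
    """Copy parameters from source to target wherever name and shape agree."""
    source_state = source_model.state_dict()
    target_state = target_model.state_dict()
    for name, tensor in source_state.items():
        # The shape check skips task-specific layers (e.g. a 2-way vs. 6-way classifier).
        if name in target_state and target_state[name].shape == tensor.shape:
            target_state[name] = tensor.clone()
    target_model.load_state_dict(target_state)

def freeze_layers(model: torch.nn.Module, prefixes: tuple = ("embedding", "encoder")) -> None:
    """Fix (freeze) the transferred layers so only the remaining ones are trained on T."""
    for name, param in model.named_parameters():
        if name.startswith(prefixes):
            param.requires_grad = False

# Usage: transfer, then either freeze the copied layers or fine-tune everything on T.
# transfer_parameters(source_model, target_model)
# freeze_layers(target_model)   # omit this call to fine-tune all parameters instead
```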
Multi-task learning (MULT)
MULT, on the other hand, trains on samples from both domains simultaneously.
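A minimal sketch of one MULT update, assuming a single PyTorch model, one batch from each domain, and an interpolation weight `lam`; these names are illustrative assumptions. The two losses are combined with the weight so either domain can dominate the objective; when the tasks are semantically different, each domain would use its own output layer on top of the shared layers.

```python
import torch

def multi_task_step(model, batch_S, batch_T, optimizer, loss_fn, lam: float = 0.5) -> float:
    """One MULT update: a weighted combination of the source and target losses."""
    inputs_S, labels_S = batch_S
    inputs_T, labels_T = batch_T
    optimizer.zero_grad()
    loss = (lam * loss_fn(model(inputs_T), labels_T)
            + (1.0 - lam) * loss_fn(model(inputs_S), labels_S))
    loss.backward()
    optimizer.step()
    return loss.item()
```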
Combination (MULT+INIT)
We first pretrain on the source domain S for parameter initialization, and then train on S and T simultaneously.
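A minimal sketch of the combination, again with illustrative names and assuming the two datasets share the same label set (as in the first scenario): stage 1 is ordinary training on S (the INIT pretraining), and stage 2 continues from those parameters with the weighted joint objective from MULT.

```python
import torch

def mult_plus_init(model, loader_S, loader_T, loss_fn, lam: float = 0.5,
                   pretrain_epochs: int = 1, joint_epochs: int = 1) -> None:
    optimizer = torch.optim.Adam(model.parameters())

    # Stage 1 (INIT): ordinary supervised training on the source domain only.
    for _ in range(pretrain_epochs):
        for inputs, labels in loader_S:
            optimizer.zero_grad()
            loss_fn(model(inputs), labels).backward()
            optimizer.step()

    # Stage 2 (MULT): continue training the pretrained parameters on S and T together.
    for _ in range(joint_epochs):
        for (inputs_S, labels_S), (inputs_T, labels_T) in zip(loader_S, loader_T):
            optimizer.zero_grad()
            loss = (lam * loss_fn(model(inputs_T), labels_T)
                    + (1.0 - lam) * loss_fn(model(inputs_S), labels_S))
            loss.backward()
            optimizer.step()
```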
Model Performance with INIT, MULT, and MULT+INIT
- Transfer learning of semantically equivalent tasks appears to be successful.
- There is little to no improvement for semantically different tasks.
Conclusion
How well neural transfer learning works in NLP depends largely on how semantically similar the source and target datasets are.