Yesterday a friend called to tell me that his "CEO" had emailed him asking for his personal financial details, with a call to action on a shortened URL. He was about to click the link but, by chance, checked the sender's domain name first. He had nearly been tricked by hackers. I told him it was a standard case of CEO Fraud.
CEO Fraud is a spear-phishing email attack in which the attacker impersonates the CEO and tricks staff into transferring money, sending confidential HR information, or revealing other sensitive information.
Email impersonation is an old fraud technique, yet people still get tricked by it. In principle, an email defender system would not be needed if everyone checked the attachment format, the sender's email domain, and the links inside each mail (for phishing links). People who get one mail per day can run these sanity checks by hand, but for those who get hundreds or thousands of emails per day, the checking becomes tedious. You can be right 100 times, but a single mistake or wrong click can still cost you dearly.
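The sender-domain check my friend did by hand can be sketched in a few lines. This is only a minimal illustration: the function names, the list of trusted domains, and the list of executive names are assumptions made up for this example, and a real system would need fuzzier name matching.

```python
from email.utils import parseaddr


def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of the address in a From header."""
    _display_name, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower()


def is_impersonation_suspect(
    from_header: str,
    trusted_domains: set[str],   # assumption: domains your company really mails from
    executive_names: set[str],   # assumption: lower-cased names worth protecting
) -> bool:
    """Flag mail that claims an executive's name but comes from an untrusted domain."""
    display_name, _address = parseaddr(from_header)
    claims_executive = display_name.strip().lower() in executive_names
    return claims_executive and sender_domain(from_header) not in trusted_domains


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    trusted = {"example.com"}
    executives = {"jane doe"}
    print(is_impersonation_suspect("Jane Doe <jane.doe@examp1e.com>", trusted, executives))
    print(is_impersonation_suspect("Jane Doe <jane.doe@example.com>", trusted, executives))
```

The first call is flagged because the display name matches an executive while the domain is a look-alike; the second is not, because the domain is on the trusted list.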
A phishing link is a malicious website address designed to steal personal, financial, or account information. Phishing links may also initiate malware downloads or browser-based script attacks.
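Link inspection can be automated in a similar spirit. The sketch below pulls URLs out of the mail text and flags two simple warning signs: a URL-shortener host and a raw IP address in place of a domain. The shortener list is a small illustrative sample, not an exhaustive one, and the function names are my own for this example.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>()]+", re.IGNORECASE)

# Assumption: a tiny sample of shortener hosts; a real list would be much longer.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl"}

IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def extract_urls(text: str) -> list[str]:
    """Return every http(s) URL found in the mail text."""
    return URL_PATTERN.findall(text)


def link_warnings(text: str) -> list[str]:
    """Return human-readable reasons why links in the text look suspicious."""
    warnings = []
    for url in extract_urls(text):
        host = (urlparse(url).hostname or "").lower()
        if host in SHORTENER_HOSTS:
            warnings.append(f"shortened URL hides the destination: {url}")
        elif IPV4_PATTERN.fullmatch(host):
            warnings.append(f"raw IP address instead of a domain: {url}")
    return warnings


if __name__ == "__main__":
    mail_text = "Please update your details at https://bit.ly/abc123 today"
    for warning in link_warnings(mail_text):
        print(warning)
```

These warnings are only heuristics; they work best as extra features and as the "reason" text shown to the user, not as a verdict on their own.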
The Defender system is an alert system that reads emails from the client's mailbox and shows the severity of the suspected fraud, along with the reason why a mail was marked as fraud. The model returns a probability of fraud, and based on your business needs you can tune the probability threshold above which an alert is raised.
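To make the probability-to-alert step concrete, here is a minimal sketch. The `Alert` class, the `build_alert` function, the two severity levels, and the default thresholds of 0.5 and 0.9 are all assumptions chosen for illustration; the right values depend on your business needs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Alert:
    severity: str                      # "medium" or "high" in this sketch
    probability: float                 # model's fraud probability
    reasons: list[str] = field(default_factory=list)


def build_alert(
    probability: float,
    reasons: list[str],
    alert_threshold: float = 0.5,      # assumption: tune to your tolerance for false alarms
    high_threshold: float = 0.9,       # assumption: above this the alert is "high"
) -> Optional[Alert]:
    """Return an Alert when the fraud probability crosses the threshold, else None."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < alert_threshold:
        return None
    severity = "high" if probability >= high_threshold else "medium"
    return Alert(severity=severity, probability=probability, reasons=list(reasons))


if __name__ == "__main__":
    print(build_alert(0.95, ["sender domain does not match the company domain"]))
    print(build_alert(0.30, []))
```

Lowering `alert_threshold` raises more alerts and misses fewer frauds; raising it does the opposite.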
Email Fraud is a problem where false negatives must be kept low: a missed fraudulent mail costs far more than a false alarm. So we prioritise recall over precision and optimise for a high-recall model.
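One way to act on this is to pick the alert threshold from a labelled validation set so that recall meets a target. The sketch below is plain Python with made-up labels and scores; the function names and the toy data are assumptions for illustration only.

```python
def precision_recall(labels, scores, threshold):
    """Compute (precision, recall) when mails with score >= threshold are flagged."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_recall(labels, scores, target_recall):
    """Return the highest threshold whose recall meets the target.

    The highest such threshold flags the fewest mails while still
    catching the required share of fraudulent ones.
    """
    if 1 not in labels:
        raise ValueError("need at least one fraudulent example")
    for threshold in sorted(set(scores), reverse=True):
        _precision, recall = precision_recall(labels, scores, threshold)
        if recall >= target_recall:
            return threshold
    raise ValueError("target_recall must be at most 1.0")


if __name__ == "__main__":
    # Toy validation data, invented for this example (1 = fraud, 0 = legitimate).
    labels = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.6, 0.2, 0.7, 0.3, 0.1]
    threshold = threshold_for_recall(labels, scores, target_recall=1.0)
    print(threshold, precision_recall(labels, scores, threshold))
```

On this toy data, catching every fraudulent mail forces the threshold down to the lowest fraud score, and precision drops as a result. That is the trade-off being accepted.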
To read mail, use the Gmail API.
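The Gmail API returns each message as a JSON resource whose `payload` holds a list of headers and a body encoded as base64url. The sketch below only parses such a resource into the fields the model needs. It leaves out the OAuth setup and the actual fetch, which with the official Python client would look something like `service.users().messages().get(userId="me", id=message_id, format="full").execute()`. The helper names are my own.

```python
import base64


def header_value(message: dict, name: str) -> str:
    """Look up a header (case-insensitive) in a Gmail API message resource."""
    for header in message.get("payload", {}).get("headers", []):
        if header.get("name", "").lower() == name.lower():
            return header.get("value", "")
    return ""


def plain_text_body(part: dict) -> str:
    """Depth-first search for the first text/plain part and decode it."""
    if part.get("mimeType") == "text/plain":
        data = part.get("body", {}).get("data", "")
        padded = data + "=" * (-len(data) % 4)  # restore padding if it was stripped
        return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")
    for child in part.get("parts", []):
        text = plain_text_body(child)
        if text:
            return text
    return ""


def summarize(message: dict) -> dict:
    """Collect the fields a fraud model would typically look at."""
    return {
        "from": header_value(message, "From"),
        "subject": header_value(message, "Subject"),
        "body": plain_text_body(message.get("payload", {})),
    }


if __name__ == "__main__":
    # Hand-built sample in the shape of a Gmail message resource.
    sample = {
        "payload": {
            "mimeType": "text/plain",
            "headers": [
                {"name": "From", "value": "CEO <ceo@examp1e.com>"},
                {"name": "Subject", "value": "Urgent request"},
            ],
            "body": {"data": base64.urlsafe_b64encode(b"Send me your bank details").decode()},
        }
    }
    print(summarize(sample))
```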
The modeling part over the mail text is a standard text classification and mail tagging task. We shall build a probabilistic model that predicts whether a mail is fraudulent, using the mail text together with other information.
If you want a reference for text classification code, check out my previous Newsletter (the Google Colab code link is inside the Newsletter).
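To show what "a probabilistic model over mail text" can look like, here is a tiny multinomial Naive Bayes classifier in plain Python. This is a stand-in I picked because it fits in a few lines, not necessarily the model used in Defender. The class name and the four training mails are invented for this example, and a real model would need far more data plus the non-text signals.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFraudModel:
    """Multinomial Naive Bayes with Laplace smoothing (labels: 1 = fraud, 0 = legitimate)."""

    def fit(self, texts, labels):
        if set(labels) != {0, 1}:
            raise ValueError("training data must contain both classes")
        self.doc_counts = Counter(labels)
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        """Log prior plus smoothed log likelihood of the tokens under one class."""
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token in self.vocabulary:  # words never seen in training are ignored
                log_prob += math.log((self.word_counts[label][token] + 1) / denominator)
        return log_prob

    def predict_proba(self, text: str) -> float:
        """Return the probability that the mail is fraudulent."""
        tokens = tokenize(text)
        log_fraud = self._log_score(tokens, 1)
        log_legit = self._log_score(tokens, 0)
        largest = max(log_fraud, log_legit)  # subtract the max for numerical stability
        fraud = math.exp(log_fraud - largest)
        legit = math.exp(log_legit - largest)
        return fraud / (fraud + legit)


if __name__ == "__main__":
    # Toy training mails, invented for illustration.
    texts = [
        "urgent wire transfer needed send bank details now",
        "share your salary account details urgent request from ceo",
        "team lunch on friday at noon",
        "please review the attached quarterly report",
    ]
    labels = [1, 1, 0, 0]
    model = NaiveBayesFraudModel().fit(texts, labels)
    print(model.predict_proba("urgent send bank details"))
```

The output of `predict_proba` is the fraud probability that the alerting step compares against its threshold.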
Let us design the architecture of the product, Defender. Below is a high-level design of the model/pipeline.
With the above framework in place, we can raise an alert on the client's mailbox whenever we suspect that an incoming mail is fraudulent.
Above is a sample alert that can be raised to a user.
CEO Fraud (email impersonation), phishing URLs, and malicious attachments are very common these days. A system like the above (Defender) can protect you, or at least cut the probability of you being trapped by a potential fraud by something like 99% (even sanitizer does not kill 100% of germs 😂 pun intended).
My intent is to share more about when, why, and how data science can help with real-world problems, but there is always more to it. So rack your brains on the problem, and if you have more ideas, I would love to read them. Please share them in the comments section.
I hope you learned something new from this post. If you liked it, hit 👏, subscribe to my newsletter, and share this with others. Stay tuned for the next one!
Thanks to Founder’s Book for Sponsoring this Newsletter.
Connect, Follow or Endorse me on LinkedIn if you found this read useful. If you are building an AI or a data product or service, you are invited to become a sponsor of one of the future newsletter editions. Feel free to reach out to [email protected] for more details on sponsorships.
AUTHOR: https://www.linkedin.com/in/shaurya-uppal/
Newsletter: https://www.linkedin.com/newsletters/problem-solving-data-science-6874965456701198336/
This story was first published here.