I’ve always been a supporter of well-curated newsletters. They give me an
opportunity to get a good overview of what happened in the fields I follow within the span of a day, a week or a month. However, not all newsletters fit this category. Some don’t think twice before selling email addresses to third parties — and within the blink of an eye your mailbox can easily get flooded with messages that you never requested. Others may sign your address up for other services or newsletters as well, and often they don’t offer much granularity to configure which communications you want to receive.
Even in the best-case scenario, the most privacy-savvy user may still think twice before signing up for a newsletter — you’re giving your personal email address to someone you don’t necessarily trust, implicitly saying “yes, this is my address and I’m interested in this subject”. Additionally, most newsletters spice up their URLs with tracking parameters, so they can easily measure user engagement — something you may not necessarily be happy with.
Moreover, the customization junkie may also have a valid use case for a more finely tuned selection of content in their newsletter: you may want to group some sources together into the same daily/weekly email, be interested only in a particular subset of the subjects covered by a newsletter and filter out those that aren’t relevant, or customize the style of the digest that gets delivered.
Finally, a fully automated way to deliver newsletters through five lines of code and the tuning of a couple of parameters is nirvana for many companies of every size out there.
Those who have read my articles in the past may know that I’m an avid consumer of RSS feeds. Despite being a 21-year-old technology, they do their job very well when it comes to delivering the information that matters without all the noise and the trackers, and, being simple XML documents, they are very easy to integrate with.
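To illustrate just how little machinery an RSS integration needs, the Python standard library alone is enough to pull the items out of a feed document. This is only a sketch to make the point — the feed content below is a made-up example, not a real source — and Platypush handles all of this for us later in the article:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 document
feed = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First article</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>"""

def parse_titles(xml_text: str) -> list:
    """Return the titles of all the items in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [item.findtext('title') for item in root.iter('item')]

print(parse_titles(feed))  # prints ['First article', 'Second article']
```

That's the whole "protocol": fetch an XML document, walk its `item` elements.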
However, in spite of all the effort I put into keeping up-to-date with all my sources, a lot of potentially interesting content inevitably slips through — and that’s where newsletters step in, as they filter and group together all the content
that was generated in a given time frame and periodically deliver it to
your inbox.
My ideal solution would combine the best aspects of both worlds: the flexibility of an RSS subscription, a powerful way of filtering and aggregating content and sources, and the full package delivered at my door in whichever format I like (HTML, PDF, MOBI…).
In this article I’m going to show how to achieve this goal with a few tools. Let’s cover the required steps one by one.
Those who have already read my previous articles may have heard of Platypush — the automation platform I’ve been building over the past few years. For those who aren’t familiar with it, I’d advise reading my first Medium post, which illustrates some of its capabilities and the paradigm behind it.
We’ll be using the http.poll backend, configured with one or more RssUpdates objects, to poll our RSS sources at regular intervals and create the digests, and either the mail.smtp plugin or the google.mail plugin to send the digests to our email.

You can install Platypush on any device where you want to run your logic — a RaspberryPi, an old laptop, a cloud node, and so on. We will install the base package with the rss module. Optionally, you can install it with the pdf module as well (if you want to export your digests to PDF too) or the google module (if you want to send the newsletter from a GMail address instead of an SMTP server).

The first option is to install the latest stable version through pip:

pip install 'platypush[rss]'
# Or
pip install 'platypush[rss,pdf,google]'
The other option is to install the latest git version:
git clone https://github.com/BlackLight/platypush.git
cd platypush
pip install '.[rss]'
# Or
pip install '.[rss,pdf,google]'
Once the software is installed, create the configuration file ~/.config/platypush/config.yaml, if it doesn't exist already, and add the configuration for the RSS monitor:

# Generic HTTP endpoint monitor
backend.http.poll:
    requests:
        # Add a new RSS feed to the pool
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://www.technologyreview.com/feed/  # URL of the RSS feed
          title: MIT Technology Review  # Title of the feed (shown in the head of the digest)
          poll_seconds: 86400   # How often we should monitor this source (24*60*60 secs = once a day)
          digest_format: html   # Format of the digest (html or pdf)
You can also add more sources to the http.poll requests object, each with its own configuration. Also, you can customize the style of your digest by passing some valid CSS through these configuration attributes:

# Style of the body element
body_style: 'font-size: 20px; font-family: "Merriweather", Georgia, "Times New Roman", Times, serif'
# Style of the main title
title_style: 'margin-top: 30px'
# Style of the subtitle
subtitle_style: 'margin-top: 10px; page-break-after: always'
# Style of the article titles
article_title_style: 'font-size: 1.6em; margin-top: 1em; padding-top: 1em; border-top: 1px solid #999'
# Style of the article links
article_link_style: 'color: #555; text-decoration: none; border-bottom: 1px dotted; font-size: 0.8em'
# Style of the article content
article_content_style: 'font-size: 0.8em'
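Putting the pieces together, a multi-source setup is just a longer requests list. In this sketch the second feed is a purely hypothetical placeholder, shown only to illustrate that each source carries its own schedule and style:

```yaml
backend.http.poll:
    requests:
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://www.technologyreview.com/feed/
          title: MIT Technology Review
          poll_seconds: 86400
          digest_format: html

        # Hypothetical second source with its own schedule and format
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://example.com/feed.xml
          title: Example Feed
          poll_seconds: 604800  # Weekly digest
          digest_format: pdf
```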
The digest_format attribute determines the output format of your digest - you may want to choose html if you want to deliver a summary of the articles in a newsletter, or pdf if you want instead to deliver the full content of each item as an attachment. Bonus point: since a Kindle can receive PDFs by email once you have configured its delivery address, this mechanism also allows you to deliver the full digest of your RSS feeds straight to your Kindle's email address.
The RssUpdates object also provides native integration with the Mercury Parser API to automatically scrape the content of a web page - I covered some of these concepts in my past article on how to parse RSS feeds and send the PDF digest to your e-reader.

The same mechanism works well for newsletters too. If you want to parse the full content of the items as well, all you have to do is configure the http.webpage Platypush plugin. Since the Mercury API doesn't provide a Python binding, this requires a couple of JavaScript dependencies:
# Install Node and NPM, e.g. on Debian:
apt-get install nodejs npm
# Install the Mercury Parser API
npm install [-g] @postlight/mercury-parser
# Make sure that the Platypush PDF module dependencies
# are installed if you plan to do HTML->PDF conversion
pip install 'platypush[pdf]'
Then, if you want to parse the full content of the items and generate a PDF digest out of them, change your http.poll configuration to something like this:

backend.http.poll:
    requests:
        - type: platypush.backend.http.request.rss.RssUpdates
          url: https://www.technologyreview.com/feed/
          title: MIT Technology Review
          poll_seconds: 86400
          digest_format: pdf     # PDF digest format
          extract_content: True  # Extract the full content of the items
WARNING: extracting the full content of the articles in an RSS feed has two limitations — a practical one and a legal one.
When new content is published on a subscribed RSS feed, Platypush will generate a NewFeedEvent and create a copy of the digest under ~/.local/share/platypush/feeds/cache/{date:time}_{feed-title}.[html|pdf]. The NewFeedEvent, in particular, is the link you need to create your custom logic that sends an email to a list of addresses whenever new content is available.

First, configure the Platypush mail plugin you prefer. When it comes to sending emails, you primarily have two options:
mail.smtp plugin — if you want to send emails directly through an SMTP server. Platypush configuration:

mail.smtp:
    username: [email protected]
    password: your-pass
    server: smtp.gmail.com
    port: 465
    ssl: True
google.mail plugin — if you want to use the native GMail API to send emails. If that is the case, then first make sure that you have the dependencies for the Platypush Google module installed:

pip install 'platypush[google]'
In this case you’ll also have to create a project on the Google Developers console, download the OAuth credentials secret to ~/.credentials/google/client_secret.json, and then generate the credentials:

python -m platypush.plugins.google.credentials \
    "https://www.googleapis.com/auth/gmail.modify" \
    ~/.credentials/google/client_secret.json \
    --noauth_local_webserver
At this point the GMail delivery is ready to be used by your Platypush automation.
Now that both the RSS parsing logic and the mail integration are in place, we can glue them together through the NewFeedEvent. The newly advised way to configure events in Platypush is through native Python scripts - the custom YAML-based syntax for events and procedures was becoming too cumbersome to maintain and write (although it’s still supported), and I feel like going back to a clean and simple Python API may be a better option.
Create and initialize the Platypush scripts directory, if it doesn’t exist already:
mkdir -p ~/.config/platypush/scripts
cd ~/.config/platypush/scripts
touch __init__.py # Initialize the root Python module
Then, create a new hook on NewFeedEvent:

$EDITOR rss_news.py
import os
from typing import List

from platypush.event.hook import hook
from platypush.message.event.http.rss import NewFeedEvent
from platypush.utils import run

# Path to your mailing list - a text file with one address per line
maillist = os.path.expanduser('~/.mail.list')


def get_addresses() -> List[str]:
    # Skip empty lines and comments
    with open(maillist, 'r') as f:
        return [addr.strip() for addr in f.readlines()
                if addr.strip() and not addr.strip().startswith('#')]


# This hook matches:
# - event_type=NewFeedEvent
# - digest_format='html'
# - source_title='MIT Technology Review'
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='mail.smtp.send',
            from_='[email protected]',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read(),
            body_type='html')
If you opted for the native GMail plugin you may want to go for:
@hook(NewFeedEvent, digest_format='html', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # The digest output file is stored in event.args['digest_filename']
    with open(event.args['digest_filename'], 'r') as f:
        run(action='google.mail.compose',
            sender='[email protected]',
            to=get_addresses(),
            subject=f'{event.args.get("source_title")} feed digest',
            body=f.read())
If instead you want to send the digest in PDF format as an attachment:
@hook(NewFeedEvent, digest_format='pdf', source_title='MIT Technology Review')
def send_mit_rss_feed_digest(event: NewFeedEvent, **_):
    # mail.smtp plugin case
    # (keep only the call that matches your configured mail plugin)
    run(action='mail.smtp.send',
        from_='[email protected]',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        attachments=[event.args['digest_filename']])

    # google.mail case
    run(action='google.mail.compose',
        sender='[email protected]',
        to=get_addresses(),
        subject=f'{event.args.get("source_title")} feed digest',
        body='',
        files=[event.args['digest_filename']])
Finally, create your ~/.mail.list file, with one destination email address per line, and start Platypush either from the command line or as a service. You should receive your email with the first batch of articles shortly after startup, and you'll receive more items whenever a new batch is available after the configured poll_seconds period.

Previously published at https://medium.com/@automationguru/how-to-automatically-deliver-customized-newsletters-from-rss-feeds-with-platypush-8c540a557fa
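If you go for the service route, a minimal systemd user unit can do the trick. This is only a sketch, under the assumption that the platypush executable is available in your PATH; adjust it to your own setup:

```ini
# ~/.config/systemd/user/platypush.service
[Unit]
Description=Platypush automation service
After=network.target

[Service]
ExecStart=/usr/bin/env platypush
Restart=on-failure

[Install]
WantedBy=default.target
```

You can then enable and start it with systemctl --user enable --now platypush.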