Imagine reading something, and never losing track of that information.
Motivation: I usually find it difficult to remember what I read as time passes. As shown in the graph above, memory retention drops exponentially after the first few days. I try to take thorough notes and look them over regularly, but I usually need a trigger event to revisit them. This is super unsustainable, and I’m sure it’s the case for most people. Wouldn’t it be great if you could revisit your highlights more regularly? As the graph above also shows, the more you review something you’ve learned, the more it becomes a part of you.
I searched the internet for a passive way to re-read notes and found readwise.io — a service that emails you your highlights every day from various sources. Since I have been learning about Python’s object-oriented features and software architecture design patterns (and mostly forgetting them), I decided to put those skills to use and build a DIY version of the service for myself. Together, we’re going to build this application using Python (and its object-oriented features). The application will make sure that anything you read (and highlight) gets presented to you on a regular basis, so you never forget the material. Through spaced repetition, you can instill those notes in yourself.
Things this app does:
- Reads your highlights and notes from a JSON file
- Picks a few entries at random each day
- Formats them into an email body
- Emails them to you on a schedule
Let’s get started
We’re going to need data. This is the most manual step of the entire process. I use PDF Expert to read PDFs, and it has a feature to export all annotations. I simply put these into an Excel document, which I then convert to JSON (using a generic Excel-to-JSON service on the internet). See the sample JSON below; each block represents a highlight/note.
# JSON data
{
  "Sheet1": [
    {
      "date_added": "May 12, 8:59 AM, by Ankush Garg",
      "source": "Book",
      "title": "Fundamentals of Software Architecture",
      "chapter": "N/A",
      "note": "N/A",
      "highlight": "The microkernel architecture style is a relatively simple monolithic architecture consisting of two architecture components: a core system and plug-in components.",
      "page_number": "Page 165",
      "has_been_chosen_before": "0",
      "id": "48"
    },
    {
      "date_added": "Apr 12, 10:50 AM, by Ankush Garg",
      "source": "Book",
      "title": "Genetic Algorithms with Python",
      "chapter": "Chapter 4: Combinatorial Optimization - Search problems and combinatorial optimization",
      "note": "N/A",
      "highlight": "A search algorithm is focused on solving a problem through methodic evaluation of states and state transitions, aiming to find a path from the initial state to a desirable final (or goal) state. Typically, there is a cost or gain involved in every state transition, and the objective of the corresponding search algorithm is to find a path that minimizes the cost or maximizes the gain. Since the optimal path is one of many possible ones, this kind of search is related to combinatorial optimization, a topic that involves finding an optimal object from a finite, yet often extremely large, set of possible objects.",
      "page_number": "Page 109",
      "has_been_chosen_before": "0",
      "id": "21"
    }
  ]
}
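If you’d rather skip the online converter, pandas can do the Excel-to-JSON step locally. This is only a minimal sketch, assuming pandas (with openpyxl) is installed, the workbook is called highlights.xlsx (a made-up name), and its columns match the keys shown above:

# Convert the Excel export to the JSON layout shown above
import json

import pandas as pd

df = pd.read_excel('highlights.xlsx', sheet_name='Sheet1', dtype=str)  # keep everything as strings
with open('data.json', 'w') as f:
    json.dump({'Sheet1': df.to_dict(orient='records')}, f, indent=2)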
Folder structure: I’ll be using PyCharm to build this app. Let’s create empty .py files in a project directory, as shown in the image below. Feel free to put these files in any folder you prefer. The main thing we’re going for is that each of these services relies on the others for its inputs and outputs: each one takes data, transforms it, and then does something with it.
A very reasonable question at this point is why I decided to create four separate scripts for simply reading in the data, selecting some entries, and emailing them to a specified email account. The reason is MODULARITY. I want each of these services to do exactly what it’s designed to do, and nothing more. In the future, if I want to swap functionality out, I can do that easily because there’s minimal dependency between services. I’ll give an example: database.py currently reads the data file locally, but in the future, as the dataset grows, it may pull data stored in S3. Without modularity, accommodating that change would require a massive overhaul throughout the application; with a separate service that has minimal dependencies, big pieces of functionality can be swapped at will.
Let’s walk through each of the service files:
1. database.py
import json

# Ended up using http://beautifytools.com/excel-to-json-converter.php to convert Excel to Json
# URL where data is stored - local on my computer for now
url = '/Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/data/data.json'


def read_json_data():
    with open(url) as json_file:
        response = json.load(json_file)
    return response
The database file is simple: it loads the locally stored data via the read_json_data function. We now have access to the data in our application.
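This is also where the modularity pays off: if the data moves to S3 later, only database.py has to change, and the rest of the app keeps calling read_json_data. Here’s a rough sketch of what that could look like, assuming boto3 is installed, AWS credentials are configured, and using a hypothetical bucket and key:

# Hypothetical S3-backed version of database.py; the bucket and key names are made up
import json

import boto3

BUCKET = 'my-highlights-bucket'   # hypothetical bucket name
KEY = 'data/data.json'            # hypothetical object key


def read_json_data():
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    return json.loads(obj['Body'].read())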
2. selector_service.py
# This script reads in the data from the database service and selects highlights
import numpy as np

from database import read_json_data


def increment_has_chosen_before(item):
    count_now = int(item['has_been_chosen_before'])
    item['has_been_chosen_before'] = count_now + 1


class SelectorService:
    def __init__(self):
        self.raw_response = read_json_data()  # Read in JSON data
        self.sampled_object = None
        self.sheet_name_to_sample_by = 'Sheet1'
        self.num_of_entries_to_sample = 3  # Number of entries to select

    def select_random_entries(self):
        # Randomly choose entries from the dataset
        self.sampled_object = np.random.choice(self.raw_response[self.sheet_name_to_sample_by],
                                                self.num_of_entries_to_sample)
        # For each selection, increment the field "has_been_chosen_before"
        # In the future, selection probability could favor notes that haven't been chosen yet
        for note in self.sampled_object:
            increment_has_chosen_before(note)
        return self.sampled_object
SelectorService relies on read_json_data, as we saw above, and stores the returned response in self.raw_response. Three entries are selected at random in select_random_entries and stored in self.sampled_object. We have sampled the entries now and are ready to parse that content.
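If you want to sanity-check the selector on its own, you could append a small check like this to selector_service.py; it assumes data.json exists at the path set in database.py:

# Quick sanity check: print which entries were drawn and their updated counters
if __name__ == '__main__':
    picks = SelectorService().select_random_entries()
    for note in picks:
        print(note['id'], '|', note['title'], '| chosen', note['has_been_chosen_before'], 'time(s)')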
3. parse_content.py
from selector_service import SelectorService


class ContentParser:
    def __init__(self):
        self.sample_entries = SelectorService().select_random_entries()
        self.content = None

    def parse_selected_entries(self):
        content = ''
        for item_index in range(len(self.sample_entries)):
            item = "DATE-ADDED: " + self.sample_entries[item_index]['date_added']
            content = content + item + "\n"
            item = "HIGHLIGHT: " + self.sample_entries[item_index]['highlight']
            content = content + item + "\n"
            item = "TITLE: " + self.sample_entries[item_index]['title']
            content = content + item + "\n"
            item = "CHAPTER: " + self.sample_entries[item_index]['chapter']
            content = content + item + "\n"
            item = "SOURCE: " + self.sample_entries[item_index]['source']
            content = content + item + "\n"
            item = "PAGE-NUMBER: " + self.sample_entries[item_index]['page_number']
            content = content + item + "\n" + "------------" + "\n"
        self.content = content
        return self.content
The ContentParser class takes in the random entries, stores them as the class attribute self.sample_entries, and formats them for emailing with the parse_selected_entries method. parse_selected_entries simply formats the content into the email body to be sent out in the next step. It looks complicated, but text formatting is all that’s happening. The parsed content can now be emailed.
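If you’d like to preview the formatted body without sending anything, a quick check like this works (run from the project folder, with data.json in place):

# Preview the formatted email body without sending anything
from parse_content import ContentParser

print(ContentParser().parse_selected_entries())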
4. mail_service.py
# This service emails whatever it gets back from Content Parser
from parse_content import ContentParser
import smtplib
from email.message import EmailMessage


class MailerService:
    def __init__(self):
        self.msg = EmailMessage()
        self.content = ContentParser().parse_selected_entries()

    def define_email_parameters(self):
        self.msg['Subject'] = 'Your Highlights and Notes for today'
        self.msg['From'] = "[email protected]"  # your email
        self.msg['To'] = ["[email protected]"]  # recipient email

    def send_email(self):
        self.msg.set_content(self.content)
        with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
            smtp.login("[email protected]", 'password')  # email account used for sending the email
            smtp.send_message(self.msg)
        return True

    def run_mailer(self):
        self.define_email_parameters()
        self.send_email()


def run_job():
    composed_email = MailerService()
    composed_email.run_mailer()


run_job()
MailerService takes the parsed content from ContentParser and stores it as the class attribute self.content. define_email_parameters sets email parameters such as the subject, sender, and recipient, and send_email sends the message. Both methods are triggered by run_mailer, and the entire application is run by the run_job function at the very bottom. This sends out an email to the specified account. This is what the email looks like:
Sample Email
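One caveat before scheduling this: the Gmail password is hardcoded in send_email. If your account uses two-factor authentication you’ll likely need an app password anyway, and either way it’s safer to pull credentials from environment variables. Here’s a minimal sketch of what send_email could look like instead; GMAIL_ADDRESS and GMAIL_APP_PASSWORD are names I made up, so use whatever you prefer (and remember to set them in the crontab environment too, since cron doesn’t load your shell profile):

# Drop-in replacement for MailerService.send_email that avoids hardcoded credentials.
# GMAIL_ADDRESS and GMAIL_APP_PASSWORD are hypothetical environment variable names.
import os
import smtplib


def send_email(self):
    self.msg.set_content(self.content)
    with smtplib.SMTP_SSL('smtp.gmail.com', 465) as smtp:
        smtp.login(os.environ['GMAIL_ADDRESS'], os.environ['GMAIL_APP_PASSWORD'])
        smtp.send_message(self.msg)
    return True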
Congrats, you’ve made it this far!! One last thing is to run mail_service.py on a schedule. Let’s use crontab for that. Cron is a long-running process that executes commands at specific dates and times and can be used to schedule recurring tasks. In your crontab, add the following line with your own absolute paths:
0 19 * * * /Users/ankushgarg/.pyenv/shims/python /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/mail_service.py >> /Users/ankushgarg/Desktop/email-reading-highlights/notes-email-sender/cron.log 2>&1
This runs the script every day at 7 PM local time (PST in my case). Check out https://crontab.guru/ for help coming up with a schedule in cron format.
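If you’d rather stay in Python than deal with cron, the third-party schedule package is one alternative. A rough sketch (pip install schedule), assuming you first move the run_job() call in mail_service.py under an if __name__ == '__main__': guard so importing it doesn’t immediately send an email:

# Pure-Python alternative to cron using the third-party "schedule" package
import time

import schedule

from mail_service import run_job

schedule.every().day.at("19:00").do(run_job)  # fire the mailer at 7 PM every day

while True:
    schedule.run_pending()
    time.sleep(60)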
You’re done! My call to action is for you to make it better. One idea to enhance this project and make it yours: use the has_been_chosen_before attribute to make the selection smarter. Currently the sampling happens randomly, with replacement; you could change it so that has_been_chosen_before probabilistically informs which highlight gets included next (a rough sketch follows below). Once you have the structure down, there’s so much you can do. If you do decide to enhance this app, reach out and let me know so I can get some ideas for improvement as well. If anything is unclear, let me know and I’d be happy to clarify.
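Here’s one rough sketch of that weighting idea; it assumes the counters are kept up to date and passes explicit probabilities to np.random.choice (sampling without replacement this time):

# Weight selection by has_been_chosen_before: less-seen notes get picked more often
import numpy as np


def select_weighted_entries(notes, num_to_sample=3):
    counts = np.array([int(note['has_been_chosen_before']) for note in notes])
    weights = 1.0 / (1.0 + counts)            # fewer past picks -> larger weight
    probabilities = weights / weights.sum()   # np.random.choice needs probabilities that sum to 1
    return list(np.random.choice(notes, num_to_sample, replace=False, p=probabilities))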
Cheers!