Processing 350K Requests Per Month via Three Free ETA Services Instead of One Paid Google Service

by Mad Devs, July 11th, 2020

Too Long; Didn't Read

Andrew, CTO at Mad Devs, explains how to avoid spending even a penny by using three ETA (estimated time of arrival) services instead of one. GoDee is a start-up project that offers online seat booking on buses. The Google API provides 10,000 free queries per month, after which every 1,000 queries cost $20. The Pifia micro-service uses the Google Distance Matrix API to calculate the bus’s approximate arrival time, aka ETA.


This is a story about how to avoid spending even a penny by using three ETA (estimated time of arrival) services instead of one. Everything is based on my personal experience working as a back-end developer on the GoDee project. GoDee is a start-up project that offers online seat booking on buses. You can find more information about this project here.

Prehistory

GoDee is a public transportation service. A GoDee bus is more convenient than the motorbikes common in Southeast Asia and cheaper than a taxi. The app-based system lets users find a suitable route, select the time, book a seat, and pay for the ride online. One of GoDee's problems is traffic jams, which severely impact the user experience. Users get tired of waiting and annoyed at having to guess when the bus will arrive. So, to make commuting more convenient, GoDee needed a service to calculate the bus’s approximate arrival time, aka ETA.

Developing ETA from scratch would take at least a year. So, to speed up the process, GoDee decided to use the Google Distance Matrix API. Later they developed their own Pifia micro-service.

Problems

Over time, the business grew, and the user base increased. We ran into a problem: the number of requests to the Google Distance Matrix API kept growing.

Why is this a problem?

Because every request costs money. The Google API provides 10,000 free queries per month, after which every 1,000 queries cost $20. At that time, we had about 150,000 requests per month.
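To put a number on that, here is a minimal sketch of the bill implied by the figures above. It assumes billing is strictly proportional beyond the free tier, and the function name monthlyCostUSD is made up for this example. Real invoices may round or tier differently.

```go
package main

import "fmt"

// monthlyCostUSD estimates the monthly bill for a metered API using the
// pricing quoted in this article: freeQuota requests are free, and every
// further 1,000 requests cost pricePerThousand dollars. Proportional
// billing is an assumption of this sketch.
func monthlyCostUSD(requests, freeQuota int, pricePerThousand float64) float64 {
	billable := requests - freeQuota
	if billable <= 0 {
		return 0
	}
	return float64(billable) / 1000 * pricePerThousand
}

func main() {
	// 150,000 requests, 10,000 of them free, $20 per extra 1,000.
	fmt.Printf("$%.2f\n", monthlyCostUSD(150000, 10000, 20)) // prints $2800.00
}
```

Under these assumptions, 150,000 requests per month would cost about $2,800.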

My mentor was very dissatisfied with that and said that the system should cache each ETA for 30 minutes. At that time, the system sent a request to the Google API every 3 seconds to get fresh data. That was not efficient: minibuses were often stuck in traffic, so the distance changed only about once every ten minutes. There was another nuance. When five users ask for information about the same bus, that is the same request five times. The cache solved this type of problem.

// newCache builds an ETA cache. pf is the upstream call that is made
// on a cache miss.
func newCache(cfg config.GdmCacheConfig,
	pf func(from, to geometry.Coordinate) (durationDistancePair, error)) *Cache {
	res := Cache{
		cacheItems:                  make(map[string]gdmCacheItem),
		ttlSec:                      cfg.CacheItemTTLSec,
		invalidatePeriodSec:         cfg.InvalidationPeriodSec,
		pfGetP2PDurationAndDistance: pf,
	}
	return &res
}

// get returns the cached item for the exact (from, to) pair or, failing that,
// an item with the same destination whose origin is no farther than 10.0
// from the requested origin (in the units used by c.geom).
func (c *Cache) get(from, to geometry.Coordinate) (gdmCacheItem, bool) {
	c.mut.RLock()
	defer c.mut.RUnlock()

	keyStr := geometry.EncodeRawCoordinates([]geometry.Coordinate{from, to})
	if val, exist := c.cacheItems[keyStr]; exist {
		return val, true
	}

	// No exact match: look for a nearby origin heading to the same destination.
	p1 := geometry.Coordinate2Point(from)
	for _, v := range c.cacheItems {
		if v.to != to {
			continue
		}
		p2 := geometry.Coordinate2Point(v.from)
		if c.geom.DistancePointToPoint(p1, p2) > 10.0 {
			continue
		}
		return v, true
	}

	return gdmCacheItem{}, false
}

// set fetches the (from, to) pair from the upstream service and stores it,
// unless another caller has already cached it.
func (c *Cache) set(from, to geometry.Coordinate) (gdmCacheItem, error) {
	keyStr := geometry.EncodeRawCoordinates([]geometry.Coordinate{from, to})

	c.mut.Lock()
	defer c.mut.Unlock()
	if v, ex := c.cacheItems[keyStr]; ex {
		return v, nil
	}

	// The upstream request runs while the write lock is held.
	resp, err := c.pfGetP2PDurationAndDistance(from, to)
	if err != nil {
		return gdmCacheItem{}, err
	}

	neuItem := gdmCacheItem{
		from: from,
		to:   to,
		data: durationDistancePair{
			dur:            resp.dur,
			distanceMeters: resp.distanceMeters,
		},
		invalidationTime: time.Now().Add(time.Duration(c.ttlSec) * time.Second),
	}

	c.cacheItems[keyStr] = neuItem
	return neuItem, nil
}

// invalidate removes every item whose invalidation time has passed.
func (c *Cache) invalidate() {
	c.mut.Lock()
	defer c.mut.Unlock()
	toDelete := make([]string, 0, len(c.cacheItems))
	for k, v := range c.cacheItems {
		if time.Now().Before(v.invalidationTime) {
			continue
		}
		toDelete = append(toDelete, k)
	}
	for _, td := range toDelete {
		delete(c.cacheItems, td)
	}
}

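// run drops expired items once per invalidation period. It never returns,
// so it should be started in its own goroutine.
func (c *Cache) run() {
	ticker := time.NewTicker(time.Duration(c.invalidatePeriodSec) * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		c.invalidate()
	}
}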
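To illustrate the "five users, one bus" case, here is a deliberately simplified, self-contained sketch. It is not the Cache above: etaCache, cachedETA, getOrFetch, simulateRiders, and the key string are hypothetical names, the key is a plain string, and there is no nearby-origin lookup or background invalidation. It only shows that concurrent riders asking about the same bus trigger a single upstream call within the TTL.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cachedETA is one stored answer together with its expiry time.
type cachedETA struct {
	eta     time.Duration
	expires time.Time
}

// etaCache is a hypothetical, simplified TTL cache keyed by a string.
type etaCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	items map[string]cachedETA
	fetch func(key string) (time.Duration, error) // upstream ETA call
}

func newETACache(ttl time.Duration, fetch func(string) (time.Duration, error)) *etaCache {
	return &etaCache{ttl: ttl, items: make(map[string]cachedETA), fetch: fetch}
}

// getOrFetch returns a fresh cached ETA, or calls upstream once and stores the result.
func (c *etaCache) getOrFetch(key string) (time.Duration, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if it, ok := c.items[key]; ok && time.Now().Before(it.expires) {
		return it.eta, nil
	}
	eta, err := c.fetch(key)
	if err != nil {
		return 0, err
	}
	c.items[key] = cachedETA{eta: eta, expires: time.Now().Add(c.ttl)}
	return eta, nil
}

// simulateRiders lets n riders ask about the same bus concurrently and
// returns how many upstream calls were made.
func simulateRiders(n int) int {
	calls := 0
	cache := newETACache(30*time.Minute, func(string) (time.Duration, error) {
		calls++ // safe: fetch only runs while the cache mutex is held
		return 12 * time.Minute, nil
	})
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.getOrFetch("bus-42->stop-7")
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println(simulateRiders(5)) // prints 1
}
```

Holding the lock during the upstream call keeps the sketch short and guarantees a single fetch per key, at the price of serializing requests for different keys.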

Alternative services

The cache worked, but not for long: GoDee grew even further and faced the same problem, because the number of queries had increased again.

It was decided to replace the Google API with OSRM. Basically, OSRM is a service that builds routes and calculates ETA (this is a rough but short description; if you need details, here is the link).

The Open Source Routing Machine or OSRM is a C++ implementation of a high-performance routing engine for the shortest paths in road networks.
Wikipedia.

OSRM has one problem: it builds routes and calculates ETA without taking traffic into account. To solve this, I started looking for services that provide traffic information for a specified part of the city. HERE Traffic provided the data I needed. After a little study of the documentation, I wrote a small piece of code that fetches traffic information every 30 minutes. To upload the traffic information to OSRM, I wrote a small script that runs the command ./osrm-contract data.osrm --segment-speed-file updates.csv (more details here).

Math time: one request to HERE every half hour is two requests per hour, which is 48 requests per day (24 * 2 = 48), about 1,488 per month (48 * 31 = 1,488), and 17,520 per year (48 * 365 = 17,520). Yes, the free requests from HERE would be enough for about 15 years.

// everything that these structures mean is described here:
// https://developer.here.com/documentation/traffic/dev_guide/topics/common-acronyms.html
type hereResponse struct {
	RWS []rws `json:"RWS"`
}

type rws struct {
	RW []rw `json:"RW"`
}

type rw struct {
	FIS []fis `json:"FIS"`
}

type fis struct {
	FI []fi `json:"FI"`
}

type fi struct {
	TMC tmc  `json:"TMC"`
	CF  []cf `json:"CF"`
}

type tmc struct {
	PC int     `json:"PC"`
	DE string  `json:"DE"`
	QD string  `json:"QD"`
	LE float64 `json:"LE"`
}

type cf struct {
	TY string  `json:"TY"`
	SP float32 `json:"SP"`
	SU float64 `json:"SU"`
	FF float64 `json:"FF"`
	JF float64 `json:"JF"`
	CN float64 `json:"CN"`
}

type geocodingResponse struct {
	Response response `json:"Response"`
}

type response struct {
	View []view `json:"View"`
}

type view struct {
	Result []result `json:"Result"`
}

type result struct {
	MatchLevel string   `json:"MatchLevel"`
	Location   location `json:"Location"`
}

type location struct {
	DisplayPosition position `json:"DisplayPosition"`
}

type position struct {
	Latitude  float64 `json:"Latitude"`
	Longitude float64 `json:"Longitude"`
}

type osmInfo struct {
	Waypoints []waypoints `json:"waypoints"`
	Code      string      `json:"code"`
}

type waypoints struct {
	Nodes    []int     `json:"nodes"`
	Hint     string    `json:"hint"`
	Distance float64   `json:"distance"`
	Name     string    `json:"name"`
	Location []float64 `json:"location"`
}

type osmDataTraffic struct {
	FromOSMID int
	ToOSMID   int
	TubeSpeed float64
	EdgeRate  float64
}

// CreateTrafficData creates a CSV file containing traffic information.
func CreateTrafficData(h config.TrafficConfig) error {
	osm := make([]osmDataTraffic, 0)

	x, y := mercator(h.Lan, h.Lon, h.MapZoom)
	quadKey := tileXYToQuadKey(x, y, h.MapZoom)

	trafficInfo, err := getTrafficDataToHereService(quadKey, h.APIKey)
	if err != nil {
		return err
	}
	// Guard against an empty response before indexing into it.
	if len(trafficInfo.RWS) == 0 {
		return errors.New("HERE traffic response has no RWS entries")
	}

	for _, t := range trafficInfo.RWS[0].RW {
		if len(t.FIS) == 0 {
			continue
		}
		for j := 0; j < len(t.FIS[0].FI)-1; j++ {
			flow := t.FIS[0].FI[j]
			if len(flow.CF) == 0 {
				continue
			}

			position, err := getCoordinateByStreetName(flow.TMC.DE, h.APIKey)
			if err != nil {
				logrus.Error(err)
				continue
			}

			osmID, err := requestToGetNodesOSMID(position.Latitude, position.Longitude, h.OSMRAddr)
			if err != nil {
				logrus.Error(err)
				continue
			}
			// Two node IDs are needed to describe a segment.
			if len(osmID) < 2 {
				logrus.Errorf("expected 2 OSM nodes, got %d", len(osmID))
				continue
			}

			osm = append(osm, osmDataTraffic{
				FromOSMID: osmID[0],
				ToOSMID:   osmID[1],
				TubeSpeed: 0,
				EdgeRate:  flow.CF[0].SU,
			})
		}
	}
	return createCSVFile(osm)
}

// http://mathworld.wolfram.com/MercatorProjection.html
func mercator(lan, lon float64, z int64) (float64, float64) {
	latRad := lan * math.Pi / 180
	n := math.Pow(2, float64(z))
	xTile := n * ((lon + 180) / 360)
	yTile := n * (1 - (math.Log(math.Tan(latRad)+1/math.Cos(latRad)) / math.Pi)) / 2
	return xTile, yTile
}

// http://mathworld.wolfram.com/MercatorProjection.html
func tileXYToQuadKey(xTile, yTile float64, z int64) string {
	quadKey := ""
	for i := uint(z); i > 0; i-- {
		var digit = 0
		mask := 1 << (i - 1)
		if (int(xTile) & mask) != 0 {
			digit++
		}
		if (int(yTile) & mask) != 0 {
			digit = digit + 2
		}
		quadKey += fmt.Sprintf("%d", digit)
	}
	return quadKey
}

// requestToGetNodesOSMID - function for getting osm id by coordinates
func requestToGetNodesOSMID(lan, lon float64, osrmAddr string) ([]int, error) {

	osm := osmInfo{}

	// OSRM expects the longitude first and then the latitude
	// WARN only Ho Chi Minh
	url := fmt.Sprintf("http://%s/nearest/v1/driving/%v,%v", osrmAddr, lon, lan)

	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("Status code %d", resp.StatusCode)
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}

	err = json.Unmarshal(body, &osm)
	if err != nil {
		return nil, err
	}

	if len(osm.Waypoints) == 0 {
		return nil, fmt.Errorf("Nodes are empty, lan: %v, lon: %v", lan, lon)
	}

	return osm.Waypoints[0].Nodes, nil
}

// https://developer.here.com/documentation/geocoder/dev_guide/topics/quick-start-geocode.html

// getCoordinateByStreetName - function for getting the coordinates by street name
func getCoordinateByStreetName(streetName, apiKey string) (position, error) {

	streetName += " Ho Chi Minh"
	url := fmt.Sprintf("https://geocoder.ls.hereapi.com/6.2/geocode.json?apiKey=%s&searchtext=", apiKey)

	gr := geocodingResponse{}

	// Join the words of the street name with "+" for the searchtext parameter.
	streetNames := strings.Split(streetName, " ")
	for _, s := range streetNames {
		url += s + "+"
	}
	resp, err := http.Get(url)
	if err != nil {
		return position{}, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return position{}, fmt.Errorf("Status code %d", resp.StatusCode)
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return position{}, err
	}

	err = json.Unmarshal(body, &gr)
	if err != nil {
		return position{}, err
	}
	if len(gr.Response.View) == 0 {
		return position{}, errors.New("View response empty")
	}
	// Only a street-level match is accepted.
	for _, g := range gr.Response.View[0].Result {
		if g.MatchLevel == "street" {
			return g.Location.DisplayPosition, nil
		}
	}
	return position{}, fmt.Errorf("street: %s not found", streetName)
}


// getTrafficDataToHereService requests traffic flow data for a map tile from HERE.
func getTrafficDataToHereService(quadKey, apiKey string) (hereResponse, error) {
	rw := hereResponse{}

	url := fmt.Sprintf("https://traffic.ls.hereapi.com/traffic/6.2/flow.json?quadkey=%s&apiKey=%s", quadKey, apiKey)

	resp, err := http.Get(url)
	if err != nil {
		return rw, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return rw, fmt.Errorf("Status code %d", resp.StatusCode)
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return rw, err
	}

	err = json.Unmarshal(body, &rw)
	if err != nil {
		return rw, err
	}

	return rw, nil
}


func createCSVFile(data []osmDataTraffic) error {

	if err := os.Remove("./traffic/result.csv"); err != nil {
		logrus.Error(err)
	}

	file, err := os.Create("./traffic/result.csv")
	if err != nil {
		return err
	}

	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()

	for _, value := range data {
		str := createArrayStringByOSMInfo(value)

		err := writer.Write(str)
		if err != nil {
			logrus.Error(err)
		}
	}
	return nil
}

func createArrayStringByOSMInfo(data osmDataTraffic) []string {
	var str []string

	str = append(str, fmt.Sprintf("%v", data.FromOSMID))
	str = append(str, fmt.Sprintf("%v", data.ToOSMID))
	str = append(str, fmt.Sprintf("%v", data.TubeSpeed))
	str = append(str, fmt.Sprintf("%v", data.EdgeRate))

	return str
}
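To make the tile math concrete, the standalone sketch below repeats the formulas of mercator and tileXYToQuadKey from the listing above and runs them on two inputs that are easy to check by hand. The names latLonToTile and quadKey are made up for this example, and the inputs were chosen for illustration and are not taken from the project.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// latLonToTile mirrors mercator above: it returns fractional tile
// coordinates for a latitude/longitude pair at zoom level z.
func latLonToTile(lat, lon float64, z int) (float64, float64) {
	latRad := lat * math.Pi / 180
	n := math.Pow(2, float64(z))
	x := n * (lon + 180) / 360
	y := n * (1 - math.Log(math.Tan(latRad)+1/math.Cos(latRad))/math.Pi) / 2
	return x, y
}

// quadKey mirrors tileXYToQuadKey above: it emits one base-4 digit per
// zoom level, combining one bit of x (worth 1) and one bit of y (worth 2).
func quadKey(x, y float64, z int) string {
	var sb strings.Builder
	for i := z; i > 0; i-- {
		mask := 1 << (i - 1)
		digit := 0
		if int(x)&mask != 0 {
			digit++
		}
		if int(y)&mask != 0 {
			digit += 2
		}
		sb.WriteByte(byte('0' + digit))
	}
	return sb.String()
}

func main() {
	// Tile x=3 (binary 011), y=5 (binary 101) at zoom 3.
	fmt.Println(quadKey(3, 5, 3)) // prints 213

	// Latitude 0, longitude 0 at zoom 1 lands on tile (1, 1).
	x, y := latLonToTile(0, 0, 1)
	fmt.Println(x, y, quadKey(x, y, 1)) // prints 1 1 3
}
```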

Preliminary tests showed that the service works perfectly, but there is a problem: HERE returns traffic information as “gibberish”, and the data does not match the OSRM format. To make the information fit, you need another HERE service for geocoding, plus OSRM (for getting points on the map). This is approximately 450,000 requests per month. Later, OSRM was abandoned because the number of requests exceeded the free limit. We didn’t give up: we enabled the HERE Distance Matrix API and temporarily removed the Google Distance Matrix API. The logic with HERE is simple: we send the coordinates of point A and point B and get the bus arrival time.

type response struct {
	Response matrixResponse `json:"response"`
}

type matrixResponse struct {
	Route    []matrixRoute `json:"route"`
}

type matrixRoute struct {
	Summary summary `json:"summary"`
}

type summary struct {
	Distance    int `json:"distance"`
	TrafficTime int `json:"trafficTime"`
}

// HereDistanceETA asks HERE for the traffic-aware travel time and distance
// between two points. The endpoints and the API key are passed in explicitly
// so that the function has no hidden dependencies; from and to are assumed
// to be geometry.Coordinate values, as in the cache code.
func HereDistanceETA(from, to geometry.Coordinate, hereAPIKey string) (durationDistancePair, error) {
	hereResp := response{}
	query := fmt.Sprintf("&waypoint%v=geo!%v,%v", 0, from.Lat, from.Lon)
	query += fmt.Sprintf("&waypoint%v=geo!%v,%v", 1, to.Lat, to.Lon)
	query += "&mode=fastest;car;traffic:enabled"

	url := fmt.Sprintf("https://route.ls.hereapi.com/routing/7.2/calculateroute.json?apiKey=%s", hereAPIKey)
	url += query
	resp, err := http.Get(url)
	if err != nil {
		logrus.WithFields(logrus.Fields{
			"url":   url,
			"error": err,
		}).Error("Get here response failed")
		return durationDistancePair{}, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return durationDistancePair{}, fmt.Errorf("Here service, status code %d", resp.StatusCode)
	}

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return durationDistancePair{}, err
	}

	err = json.Unmarshal(body, &hereResp)
	if err != nil {
		return durationDistancePair{}, err
	}

	if len(hereResp.Response.Route) == 0 {
		return durationDistancePair{}, errors.New("Matrix response empty")
	}
	// trafficTime is used as the ETA, converted from seconds.
	res := durationDistancePair{
		dur:            time.Duration(hereResp.Response.Route[0].Summary.TrafficTime) * time.Second,
		distanceMeters: hereResp.Response.Route[0].Summary.Distance,
	}
	return res, nil
}

After we installed everything on the test server and started checking, we received the first feedback from the testers. They said that ETA showed the wrong time. We started looking for the problem and checked the logs (we used Datadog for logs); both the logs and the tests showed that everything worked perfectly. We asked the testers to describe the problem in a little more detail, and it turned out that if the car sits in traffic for 15 minutes, ETA keeps showing the same time. We decided that this was because of the cache, which stores the original time and does not update it for 30 minutes.

We kept investigating. First we checked the data in the web version of the HERE service (which is called HERE WeGo): everything worked fine, and we received the same ETA. We also checked Google Maps and found no discrepancy there either. The services themselves show this ETA. We explained everything to the testers and the business, and they accepted it.

Our team lead suggested connecting another ETA service, bringing the Google API back as a backup option, and writing the logic for switching between services (the switch is needed when the requests exceed a service's free quota).

The code works the following way:

val := getCount() // number of requests already used
if getMax() <= val { // free-request limit of the current service is reached
	newService := switchService(s) // switch to the next service
	return newService(from, to) // hand the request over to the new service
}

We found the Mapbox service, connected it, and it worked. As a result, our ETA had:

HERE: 250,000 free requests per month
Google: 10,000 free requests per month
Mapbox: 100,000 free requests per month
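The switching idea and the quotas above can be put together in a small, self-contained sketch. Everything in it is hypothetical: the provider and router types, the pick and totalFree methods, and in particular the order in which services are tried (Google last, as the backup). It is not safe for concurrent use, and resetting the counters each month is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// provider is a hypothetical ETA backend with a monthly free quota.
type provider struct {
	name      string
	freeQuota int
	used      int
}

// router sends each request to the first provider that still has free quota.
type router struct {
	providers []*provider
}

var errQuotaExhausted = errors.New("all free quotas are exhausted")

// pick returns the provider for the next request and counts that request.
func (r *router) pick() (*provider, error) {
	for _, p := range r.providers {
		if p.used < p.freeQuota {
			p.used++
			return p, nil
		}
	}
	return nil, errQuotaExhausted
}

// totalFree is the combined monthly free quota of all providers.
func (r *router) totalFree() int {
	total := 0
	for _, p := range r.providers {
		total += p.freeQuota
	}
	return total
}

// newRouter uses the quotas listed above; the order is an assumption.
func newRouter() *router {
	return &router{providers: []*provider{
		{name: "HERE", freeQuota: 250000},
		{name: "Mapbox", freeQuota: 100000},
		{name: "Google", freeQuota: 10000},
	}}
}

func main() {
	r := newRouter()
	fmt.Println(r.totalFree()) // prints 360000

	// Simulate a month with 350,000 requests.
	for i := 0; i < 350000; i++ {
		if _, err := r.pick(); err != nil {
			fmt.Println(err)
			return
		}
	}
	for _, p := range r.providers {
		fmt.Println(p.name, p.used)
	}
	// prints:
	// HERE 250000
	// Mapbox 100000
	// Google 0
}
```

With these quotas the three services together offer 360,000 free requests per month, which is enough for the 350K in the title.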

Conclusion

Always look for alternatives. Sometimes the business doesn't want to pay for a service and turns it down. As a developer who has worked hard on the service, you should still bring the task to real use. This article describes how we connected more services to get ETA for free because the business did not want to pay for the service.

P.S. As a developer, I believe that if a tool is good and does its job well, it is fine to pay for it (or to find open-source projects :D).

Previously published at https://blog.maddevs.io/how-to-make-three-paid-eta-services-one-free-6edc6affface