DSPs resolve one of the biggest challenges of programmatic advertising. The publisher wants to charge for each ad being shown, while the advertiser wants to pay only under certain conditions—like purchases.
A typical price for 1,000 ad impressions is $1 to $10, while purchases occur in only 10 to 100 out of every million ads shown. This mismatch is resolved by computing a bid for each ad auction, and that is what a DSP does.
Plainly speaking, a DSP computes bids and buys impressions with its own money, and advertisers pay the DSP when purchases happen.
You can measure the success of a DSP as a business in terms of margin: money gained minus money spent. If a DSP buys impressions for less than it earns by selling conversions to the advertiser, it is profitable and successful. If not, it is losing money.
The gap between showing an impression and learning whether it led to a conversion is usually about one day. The feedback loop for a DSP is therefore very short: within a day, it is clear whether it is operating at a positive margin.
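To make the margin idea concrete, here is a minimal sketch in Python. All numbers are hypothetical and chosen only to fall inside the ranges mentioned above; the function name `margin` is mine, not an industry standard.

```python
def margin(revenue_from_advertisers: float, spend_on_impressions: float) -> float:
    """Money the DSP received from advertisers minus money it paid for impressions."""
    return revenue_from_advertisers - spend_on_impressions


# Hypothetical day: 1,000,000 impressions bought at a $2 CPM,
# which led to 50 purchases that the advertiser pays $50 each for.
impressions = 1_000_000
cpm = 2.0          # price per 1,000 impressions, in dollars (assumed)
purchases = 50     # 50 purchases per million impressions (assumed)
cpa = 50.0         # advertiser payout per purchase, in dollars (assumed)

spend = impressions / 1000 * cpm
revenue = purchases * cpa
daily_margin = margin(revenue, spend)
print(f"spend=${spend:.2f} revenue=${revenue:.2f} margin=${daily_margin:.2f}")
```

With these assumed numbers the DSP spends $2,000 and earns $2,500, so the day ends with a positive margin of $500.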
In our oversimplified view, the bid is computed based on the predicted purchase probability:
bid = E[advertiser_payout] = CPA * p(purchase | user, ad, page)
Here p() is the probability that the user makes a purchase after seeing a particular ad on a particular page, and CPA is how much the advertiser is willing to pay the DSP for each purchase generated by ads.
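The formula can be sketched in a few lines of Python. The CPA and probability below are hypothetical values; the helper names `bid_per_impression` and `to_cpm` are illustrative only.

```python
def bid_per_impression(cpa: float, p_purchase: float) -> float:
    """Expected advertiser payout for one impression: CPA * p(purchase | user, ad, page)."""
    if not 0.0 <= p_purchase <= 1.0:
        raise ValueError("p_purchase must be a probability in [0, 1]")
    return cpa * p_purchase


def to_cpm(bid: float) -> float:
    """Convert a per-impression bid into a price per 1,000 impressions."""
    return bid * 1000


cpa = 50.0                 # assumed payout per purchase, in dollars
p = 50 / 1_000_000         # assumed: 50 purchases per million impressions
bid = bid_per_impression(cpa, p)
print(f"bid per impression: ${bid:.4f}, as CPM: ${to_cpm(bid):.2f}")
```

Expressed as a CPM, this bid is about $2.50, which sits inside the typical $1 to $10 price range for 1,000 impressions.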
Predicting the purchase probability for a user+ad+page combination is called ad ranking. At first glance, this is a textbook supervised machine learning problem:
training dataset:
consists of impressions
for each impression, we record ML features computed for the user, ad, page, and their combinations
as a training label, we set 1 if the purchase happened due to this ad and 0 otherwise
ML model:
Training loss:
Indeed, ad ranking can be a very good example for a lot of foundational ML techniques. However, in real life, it implies many challenges:
There are also several ML techniques that can be used:
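The supervised setup described above (one row per impression, a 0/1 purchase label) can be sketched as follows. This is only an illustration under my own assumptions: the text does not prescribe a model or a loss here, so I use logistic regression trained with log loss as a stand-in, written in plain Python. The two features and the toy data are hypothetical, and real datasets have far fewer positive labels.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features) -> float:
    """Predicted purchase probability for one impression."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def log_loss(label: int, p: float, eps: float = 1e-12) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_log_loss(weights, bias, rows, labels) -> float:
    losses = [log_loss(y, predict(weights, bias, x)) for x, y in zip(rows, labels)]
    return sum(losses) / len(losses)


def train(rows, labels, epochs: int = 200, lr: float = 0.1, seed: int = 0):
    """Stochastic gradient descent on log loss (assumed model: logistic regression)."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            error = predict(weights, bias, rows[i]) - labels[i]  # d(loss)/d(logit)
            for j, x in enumerate(rows[i]):
                weights[j] -= lr * error * x
            bias -= lr * error
    return weights, bias


# Hypothetical features per impression:
# [user_visited_shop_recently, ad_matches_page_topic]
rows = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
labels = [1, 0, 0, 0, 1, 0]  # 1 if a purchase was attributed to the impression

loss_before = mean_log_loss([0.0, 0.0], 0.0, rows, labels)
weights, bias = train(rows, labels)
loss_after = mean_log_loss(weights, bias, rows, labels)
print(f"log loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The predicted probability from `predict` is what would be multiplied by the CPA in the bid formula.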
There are other mathematical problems that DSPs have to solve — I will mention them here and describe them in more detail in later chapters.
For example, an advertiser sometimes provides a list of its existing clients to a DSP and instructs the DSP to bid on “similar” users. In this case, the DSP has to build machine learning models that compute the similarity between each potential user and the list of desired users. This is called lookalike targeting.
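One simple way to picture lookalike scoring is to compare each candidate user with the "average" user from the advertiser's list. The sketch below assumes every user is already represented by a numeric feature vector and uses cosine similarity to the centroid of the seed list; production systems typically use learned models, so treat this purely as an illustration with hypothetical data.

```python
import math


def centroid(vectors):
    """Component-wise mean of the seed users' feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookalike_scores(seed_vectors, candidates):
    """Score each candidate user by similarity to the centroid of the seed list."""
    center = centroid(seed_vectors)
    return {user_id: cosine_similarity(vec, center) for user_id, vec in candidates.items()}


# Hypothetical feature vectors (for example, interest scores per category).
seed_users = [[1.0, 0.0, 1.0], [0.8, 0.2, 1.0]]
candidates = {"user_a": [0.9, 0.1, 0.9], "user_b": [0.0, 1.0, 0.0]}

scores = lookalike_scores(seed_users, candidates)
for user_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(user_id, round(score, 3))
```

A DSP could then bid only on users whose score exceeds some threshold, or feed the score into the bid as an extra signal.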
In reality, a DSP’s bid formula is more complicated. First of all, that’s because a DSP has to solve a constrained optimization problem. For example: with a given budget of $X and within a given time of Y days, a DSP must bring as many purchases as possible while keeping the click-through rate on the ads above Z%.
Bidding that has to meet such requirements is called pacing, because the main restriction after budget is often time. Under such conditions, exhausting the budget before the predefined end date, or failing to spend it by that date, is a mistake.
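A very simple pacing rule can be sketched as a feedback controller: compare actual spend with an even spend schedule and nudge a bid multiplier up or down. This is a minimal sketch under my own assumptions (a linear schedule, a fixed 10% adjustment step, clamping bounds); it ignores the click-rate constraint and is not how any particular DSP implements pacing.

```python
def target_spend(budget: float, total_days: float, elapsed_days: float) -> float:
    """Spend expected so far under an even (linear) schedule."""
    fraction = min(max(elapsed_days / total_days, 0.0), 1.0)
    return budget * fraction


def update_pacing_multiplier(
    multiplier: float,
    actual_spend: float,
    budget: float,
    total_days: float,
    elapsed_days: float,
    step: float = 0.1,   # assumed adjustment size
    lo: float = 0.1,     # assumed lower bound on the multiplier
    hi: float = 2.0,     # assumed upper bound on the multiplier
) -> float:
    """Lower bids when ahead of schedule, raise them when behind."""
    target = target_spend(budget, total_days, elapsed_days)
    if actual_spend > target:
        multiplier *= 1 - step
    elif actual_spend < target:
        multiplier *= 1 + step
    return min(max(multiplier, lo), hi)


# Hypothetical campaign: $1,000 over 10 days, checked at the end of day 5.
ahead = update_pacing_multiplier(1.0, actual_spend=600.0, budget=1000.0,
                                 total_days=10, elapsed_days=5)
behind = update_pacing_multiplier(1.0, actual_spend=400.0, budget=1000.0,
                                  total_days=10, elapsed_days=5)
print(f"ahead of schedule -> {ahead:.2f}, behind schedule -> {behind:.2f}")
```

The final bid would then be the multiplier times the base bid, so an overspending campaign bids less aggressively until it is back on schedule.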
Finally, a DSP uses countermeasures against the revenue maximization techniques of the supply side. For example, against reserve price optimization, changes in auction mechanics, and some other tools, a DSP will use bid shading. Bid shading means computing the minimum bid that would still win the auction, so that the impression is bought at the lowest possible price.
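Bid shading can be illustrated with a small sketch. I assume a first-price auction (the winner pays its own bid) and that the DSP has a sample of past prices that were needed to win similar auctions; both are assumptions, and in practice those prices are only partially observed. The sketch picks the bid that maximizes expected surplus, that is, (value minus bid) times the estimated probability of winning.

```python
def win_probability(bid: float, past_winning_prices) -> float:
    """Share of past auctions that this bid would have won."""
    wins = sum(1 for price in past_winning_prices if bid >= price)
    return wins / len(past_winning_prices)


def shaded_bid(value: float, past_winning_prices) -> float:
    """Bid at or below `value` that maximizes expected surplus in a first-price auction.

    Returns 0.0 (do not bid) when no candidate bid yields a positive surplus.
    """
    candidates = sorted({price for price in past_winning_prices if price <= value})
    best_bid, best_surplus = 0.0, 0.0
    for bid in candidates:
        surplus = (value - bid) * win_probability(bid, past_winning_prices)
        if surplus > best_surplus:
            best_bid, best_surplus = bid, surplus
    return best_bid


# Hypothetical data: the impression is worth $2.50 to the DSP (CPA * p, as a CPM).
history = [1.0, 1.2, 1.5, 2.0, 3.0]
value = 2.5
bid = shaded_bid(value, history)
print(f"unshaded bid: ${value:.2f}, shaded bid: ${bid:.2f}")
```

Here the shaded bid is $1.50 rather than the full $2.50: bidding higher would win more often, but each win would leave less surplus.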
Another example: to counter bid request duplication by the supply side, a DSP will use bid caching or a deduplication solution.
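A deduplication step can be sketched as a small cache keyed by what identifies "the same" ad opportunity. The key (user ID plus placement ID) and the one-second window are my assumptions for illustration; a real system would also need to evict old keys to bound memory.

```python
class BidRequestDeduplicator:
    """Flags bid requests that repeat the same (user, placement) within a time window."""

    def __init__(self, window_seconds: float = 1.0):
        self.window_seconds = window_seconds
        self._first_seen = {}  # (user_id, placement_id) -> timestamp of first sighting

    def is_duplicate(self, user_id: str, placement_id: str, timestamp: float) -> bool:
        key = (user_id, placement_id)
        first_seen = self._first_seen.get(key)
        if first_seen is not None and timestamp - first_seen <= self.window_seconds:
            return True
        # New opportunity (or the window has expired): start a fresh window.
        self._first_seen[key] = timestamp
        return False


dedup = BidRequestDeduplicator(window_seconds=1.0)
requests = [
    ("u1", "homepage_top", 0.0),
    ("u1", "homepage_top", 0.3),   # same opportunity resent via another path
    ("u2", "homepage_top", 0.4),
    ("u1", "homepage_top", 5.0),   # a new page view later on
]
flags = [dedup.is_duplicate(*request) for request in requests]
print(flags)
```

The DSP would bid once on the first request and skip, or reuse its earlier bid for, the requests flagged as duplicates.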
Advertisers fund the whole online advertising ecosystem and take the most risk. Most advertisers’ problems are business problems rather than computational ones. I have outlined three major challenges below.
One big challenge for the advertiser is the misalignment between the goals formally configured for optimization in online advertising campaigns and the advertiser’s real desired business outcomes.
All the complicated ML and mathematical machinery described above optimizes for particular online events: for example, the user clicking on the ad, filling out a test drive form, enrolling in a free trial membership, or even buying something in the online shop.
However, an advertiser’s real business goals might be based on different events: a user making the most purchases within the next year, a user buying a car, or a user converting a trial membership into a paid membership.
There are solid business reasons for this misalignment, but as a result, all the complicated, precise optimization within the DSP and the wider ecosystem ends up optimizing for a slightly wrong thing.
It is up to the advertiser to figure out how to set up online advertising campaigns that optimize for the wrong thing and yet still bring the right results.
The second challenge is measuring advertising effectiveness. This is especially difficult for larger and better-known advertisers. There are three aspects to consider:
It is a complicated mathematical task to understand how much each marketing channel, including users’ natural intent, contributed to sales (that is, what the incremental effect of each advertising channel was) and how to redistribute advertising budgets in the future to maximize that effect.
The third challenge for the advertiser is uncertainty. DSPs often spend the advertisers’ money while offering only vague guarantees of bringing “as many purchases as possible” for the budget. Advertisers have to rely on their intuition when allocating the advertising budget and setting expectations about the result.
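The idea of an incremental effect can be illustrated with the simplest possible estimate. I assume here a randomized holdout: one group of users is eligible to see the ads and a comparable group is deliberately not shown them. This setup is my assumption for illustration and covers only a single channel; the numbers are hypothetical.

```python
def conversion_rate(conversions: int, users: int) -> float:
    return conversions / users


def incremental_conversions(
    exposed_conversions: int,
    exposed_users: int,
    holdout_conversions: int,
    holdout_users: int,
) -> float:
    """Conversions in the exposed group beyond what the holdout rate predicts.

    The holdout group stands in for users' natural intent: purchases that
    would have happened without the ads.
    """
    baseline_rate = conversion_rate(holdout_conversions, holdout_users)
    expected_without_ads = baseline_rate * exposed_users
    return exposed_conversions - expected_without_ads


# Hypothetical test: 100,000 exposed users with 120 purchases,
# 10,000 holdout users with 9 purchases.
extra = incremental_conversions(120, 100_000, 9, 10_000)
print(f"incremental purchases attributed to ads: {extra:.1f} out of 120")
```

In this example only about 30 of the 120 purchases are incremental; the remaining 90 would be expected even without the ads.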
It is also the advertiser's job to try various ways to run advertising campaigns, try out new ad images, and continuously learn what works best.
DSPs solve the most technically and mathematically challenging problems in the whole ecosystem, and they are in a continuous arms race not only against the supply side, but also against other DSPs - so a DSP has to evolve quickly.
But the feedback loop is very short, and the mathematical setup is rigorous enough, so it is relatively easy to measure the efficiency of each innovation and iterate.
In some ways, a DSP’s business is close to stock trading or betting. Technical solutions are always evolving and improving, and modern ideas in technology and machine learning are most definitely tried out in online advertising.
Moreover, a DSP arguably creates the most additional value for the ecosystem by de facto solving the allocation problem: deciding which users get to see which ads.
Optimal ad allocation means that the user sees the most relevant ads and enjoys buying the advertised products, advertisers get the most clients for the least advertising spend, and publishers get the highest payout for the same users and ad inventory.
Like a very experienced salesman, a DSP decides how to approach each particular user and what kind of product to show them.
The advertiser’s business deals with more risk and uncertainty and with longer feedback loops. Moreover, the advertiser stands between two worlds: the world of physical goods manufacturing and money movement, and the abstract mathematical world of online ad reporting numbers.
It is the advertiser who has to deal with ML systems that might optimize well, but for the wrong goal and in an unstable and unexplainable way - or do not optimize well but have a lot of tweaks to try.
And we should not forget that the entire online advertising ecosystem is built on the advertiser's capacity to manufacture goods and is funded by the advertiser’s marketing budgets.
Now that we have looked at all the participants of the online advertising ecosystem on a general scale, in the next article, I will focus on the common terminology used in online advertising and typical performance metrics in the industry. With that, we will finish the introduction series and will start zooming in on particular topics.