The Programmatic Advertising Ecosystem: Demand-side Challenges

by Oleg Tishutin, October 17th, 2023

Too Long; Didn't Read

This article describes the ML challenges of demand-side businesses in the programmatic advertising ecosystem.

For DSP:
- ads ranking
- pacing
- lookalike targeting
- bid shading
For advertiser:
- optimization goal misalignment
- measuring advertising effectiveness
- dealing with uncertainty

DSP

DSPs resolve one of the biggest challenges of programmatic advertising. The publisher wants to charge for each ad being shown, while the advertiser wants to pay only under certain conditions—like purchases.

A typical price for 1,000 ad impressions is $1 to $10, while purchases occur in only 10 to 100 cases per million ads shown. This gap is bridged by computing bids in ad auctions, and this is what a DSP does.

Plainly speaking, a DSP computes bids and buys impressions with its own money, and advertisers pay the DSP when purchases occur.

You can measure the success of a DSP as a business in terms of margin: money gained minus money spent. If a DSP is buying impressions for less than it earns by selling conversions to the advertiser, it is profitable and successful. If not, it is making losses.
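As a rough illustration of the margin arithmetic, here is a minimal sketch that takes margin as revenue from the advertiser minus the cost of impressions, so a positive value means profit. The function name `dsp_margin` and the sample figures (a $2 CPM and a $50 CPA) are assumptions made up for this example; only the general price ranges come from the text.

```python
def dsp_margin(conversions, cpa, impressions, avg_cpm):
    """Revenue from paid conversions minus the cost of bought impressions."""
    revenue = conversions * cpa            # what the advertiser pays the DSP
    cost = impressions / 1000 * avg_cpm    # what the DSP pays for impressions
    return revenue - cost


# Hypothetical day: 1,000,000 impressions at a $2 CPM, 50 purchases at a $50 CPA.
margin = dsp_margin(conversions=50, cpa=50.0, impressions=1_000_000, avg_cpm=2.0)
print(margin)  # 500.0
```

With the same traffic but only 30 purchases, the margin would turn negative, which is why the purchase prediction described next matters so much.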

The time gap between showing an impression and receiving (or not receiving) a conversion is usually about 1 day. So, the feedback loop for a DSP is very quick, and within one day, it is clear whether it is operating with a positive margin.

In our oversimplified view, the bid is computed based on the predicted purchase probability:

bid = E[advertiser_payout] = CPA * p(purchase | user, ad, page)

Where p() is the probability of the user making a purchase after seeing a particular ad on a particular page, and CPA is how much the advertiser is willing to pay the DSP for each purchase generated by ads.
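To make the formula concrete, the sketch below computes a bid for a single impression and converts it to a price per 1,000 impressions. The function name `compute_bid` and the sample numbers (a $50 CPA and 50 purchases per million impressions) are assumptions chosen for illustration.

```python
def compute_bid(cpa, p_purchase):
    """Expected advertiser payout for one impression: CPA * p(purchase)."""
    if not 0.0 <= p_purchase <= 1.0:
        raise ValueError("p_purchase must be a probability between 0 and 1")
    return cpa * p_purchase


# Hypothetical inputs: $50 per purchase, 50 purchases per million impressions.
bid = compute_bid(cpa=50.0, p_purchase=50 / 1_000_000)
bid_cpm = bid * 1000  # the same bid expressed per 1,000 impressions
print(f"{bid_cpm:.2f}")  # 2.50
```

Under these assumed numbers the bid lands inside the $1 to $10 range per 1,000 impressions quoted earlier, which shows how tiny purchase probabilities translate into ordinary impression prices.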

Predicting purchase probability for the user+ad+page combination is called Ads ranking. At first glance, this is a textbook supervised machine learning problem:

training dataset:
- consists of impressions
- for each impression, we record ML features computed for the user, ad, page, and their combinations
- as a training label, we set 1 if the purchase happened due to this ad and 0 otherwise
ML model:
- any model from logistic regression to gradient-boosted decision trees to neural networks: try them and see which works best
Training loss:
- the model should minimize the negative log-likelihood (log loss) on the training dataset
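The setup above can be sketched with a tiny logistic regression trained by stochastic gradient descent on the log loss. This is a toy under stated assumptions, not a production ranker: the two features are synthetic, the purchase rate is made unrealistically high so the example stays small, and all names (`train`, `log_loss`, `rows`, `labels`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted purchase probability for one impression's feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def log_loss(weights, bias, rows, labels):
    """Average negative log-likelihood of the labels under the model."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def train(rows, labels, epochs=50, lr=0.1):
    """Plain stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y  # d(loss)/d(logit)
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Synthetic impressions: label 1 means a purchase followed the ad.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(500)]
labels = [1 if rng.random() < sigmoid(3.0 * x[0] - 2.0) else 0 for x in rows]

loss_before = log_loss([0.0, 0.0], 0.0, rows, labels)  # untrained model
weights, bias = train(rows, labels)
loss_after = log_loss(weights, bias, rows, labels)
print(loss_before, loss_after)
```

The output of `predict` is what would feed the `p(purchase | user, ad, page)` term of the bid formula.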

Indeed, ad ranking is a very good example for many foundational ML techniques. However, in real life, it poses many challenges:

Ads ranking has to be very precise, quick, and cost-efficient.

Data from hundreds of millions of users and millions of ads has to be processed, stored, and retrieved.

A lot of ads (millions per second) need to be ranked in real-time.

There are also several ML techniques that can be used:

Traditional supervised learning can be augmented with unsupervised learning, for example, clustering, to generate features.

Reinforcement learning can be used to break feedback loops and improve exploration of the models.

Transfer learning is used to extrapolate predictions to users who opted out of online tracking and did not provide their data for ads models’ training.

There are other mathematical problems that DSPs have to solve — I will mention them here and describe them in more detail in later chapters.

For example, sometimes an advertiser provides a list of its existing clients to DSP and instructs DSP to bid on “similar” users. In this case, a DSP has to build machine learning models that compute the similarity between each potential user and the list of desired users. This is called Lookalike targeting.
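One simple way to picture lookalike scoring is to compare each candidate user with the "average" seed user. The sketch below does this with cosine similarity to the centroid of the advertiser's client list. This is only one possible approach, chosen here for brevity; the feature vectors and the names `lookalike_scores`, `seed_users`, and `candidates` are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of the seed users' feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def lookalike_scores(seed_users, candidates):
    """Score each candidate by similarity to the average seed user."""
    center = centroid(seed_users)
    return {uid: cosine_similarity(vec, center) for uid, vec in candidates.items()}


# Hypothetical 3-dimensional user features (e.g. interest scores).
seed_users = [[1.0, 0.9, 0.0], [0.8, 1.0, 0.1]]
candidates = {"user_a": [0.9, 1.0, 0.0], "user_b": [0.0, 0.1, 1.0]}
scores = lookalike_scores(seed_users, candidates)
print(sorted(scores, key=scores.get, reverse=True))  # ['user_a', 'user_b']
```

A DSP would then bid only on users whose score exceeds some threshold, or feed the score into the ranking model as a feature.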

In reality, a DSP’s bid formula is more complicated, first of all because a DSP has to solve a constrained optimization problem. For example: with a budget of $X and a time window of Y days, a DSP must bring as many purchases as possible while keeping the click-through rate on ads above Z%.

Bidding that has to meet such requirements is called pacing, because the main restriction after budget is often time. Under such conditions, spending the budget earlier or later than the predefined date is a mistake.
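A very simple form of pacing can be sketched as a multiplier applied to every bid: if the campaign has spent more than an even schedule would allow, bids are scaled down, and if it has spent less, they are scaled up. The linear schedule, the clamping bounds, and the name `pacing_multiplier` are assumptions for this illustration, and the sketch ignores any additional constraints such as a minimum click rate.

```python
def pacing_multiplier(budget, spent, elapsed_fraction, min_mult=0.1, max_mult=2.0):
    """Bid multiplier that nudges spend toward an even (linear) schedule."""
    if not 0.0 < elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be in (0, 1]")
    if spent >= budget:
        return 0.0       # budget exhausted: stop bidding
    if spent <= 0.0:
        return max_mult  # nothing spent yet: bid as aggressively as allowed
    target_spend = budget * elapsed_fraction
    # Ratio above 1 means under-spending, below 1 means over-spending.
    return min(max(target_spend / spent, min_mult), max_mult)


# Halfway through a $1,000 campaign:
print(pacing_multiplier(1000.0, 600.0, 0.5))  # overspent, so bids are scaled down
print(pacing_multiplier(1000.0, 250.0, 0.5))  # 2.0 (underspent, capped at max_mult)
```

The final bid would then be the multiplier times the value-based bid from the formula above.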

Finally, a DSP uses countermeasures against the supply side’s revenue maximization techniques. For example, against reserve price optimization, changes in auction mechanics, and some other tools, a DSP will use bid shading. Bid shading means computing the minimum bid that would still win the auction, so the impression is bought at the lowest possible price.
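One common way to frame bid shading in a first-price auction is to pick the bid that maximizes expected surplus, that is, (value - bid) times the estimated probability of winning at that bid. The sketch below estimates the win probability from a list of past winning prices. The function name `shade_bid` and the sample prices are hypothetical, and real systems would use a learned win-rate model rather than a raw history.

```python
def shade_bid(value, past_winning_prices):
    """Return the bid <= value that maximizes (value - bid) * P(win | bid).

    P(win | bid) is estimated as the share of past auctions whose winning
    price was at or below the bid. Because that estimate only changes at
    observed prices, it is enough to try those prices as candidate bids.
    """
    n = len(past_winning_prices)
    best_bid, best_surplus = 0.0, 0.0
    for candidate in sorted(set(past_winning_prices)):
        if candidate > value:
            break  # never bid above what the impression is worth
        win_prob = sum(1 for p in past_winning_prices if p <= candidate) / n
        surplus = (value - candidate) * win_prob
        if surplus > best_surplus:
            best_bid, best_surplus = candidate, surplus
    return best_bid


# Hypothetical history of winning prices (per 1,000 impressions) for similar inventory.
history = [1.0, 1.5, 2.0, 2.0, 3.0, 4.5]
print(shade_bid(5.0, history))  # 2.0
```

Here the impression is worth $5.00 to the DSP, but bidding $2.00 wins two thirds of the comparable auctions while keeping $3.00 of surplus each time, which beats both lower and higher bids.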

Another example: to counter bid request duplication by the supply side, a DSP will use bid caching or a deduplication solution.

Advertisers

Advertisers fund the whole online advertising ecosystem and take the most risk. Most advertisers’ problems are business problems rather than computational ones. I have outlined three major challenges below.

One big challenge for the advertiser is the misalignment between the optimization goals formally set up in online advertising campaigns and the advertiser’s real desired business outcomes.

All the complicated ML and mathematical machinery described above optimizes for particular online events: for example, the user clicking on the ad, the user filling out a test drive form, the user enrolling in a free trial membership, or even the user buying something in the online shop.

However, an advertiser’s real business goals might be based on different events: a user making the most purchases within the next year, a user buying a car, or a user converting a trial membership into a paid membership.

There are solid business reasons for this misalignment, but as a result, all the complicated, precise optimization within the DSP and the whole ecosystem ends up optimizing for slightly the wrong thing.

It is up to the advertiser to figure out how to set up online advertising campaigns that optimize for the wrong thing, yet bring the right results.

The second challenge is measuring advertising effectiveness. This is especially difficult for larger and better-known advertisers. There are three aspects to consider:

An advertiser runs ad campaigns through multiple channels, such as search engines, social networks, news websites, TV, radio, and billboards.

People might have some intention to buy advertiser’s goods even without seeing ads.

Sometimes, seeing too many ads for some products may make people buy less.

It is a complicated mathematical task to understand how much each marketing channel, and users’ natural intent, contributed to the sales (that is, what the incremental effect of each advertising channel was) and how to redistribute the advertising budgets in the future to maximize the effect.
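To give a feel for what "incremental effect" means, here is a minimal sketch for the simplest possible case: a single channel measured with a randomly chosen holdout group that sees no ads. The holdout design, the function name `incremental_effect`, and all numbers are assumptions for illustration; the multi-channel problem described above is considerably harder.

```python
def incremental_effect(exposed_users, exposed_purchases,
                       holdout_users, holdout_purchases):
    """Estimate purchases caused by ads, using a no-ads holdout as the baseline."""
    exposed_rate = exposed_purchases / exposed_users
    baseline_rate = holdout_purchases / holdout_users  # natural intent, no ads
    incremental_purchases = (exposed_rate - baseline_rate) * exposed_users
    if baseline_rate > 0:
        relative_lift = (exposed_rate - baseline_rate) / baseline_rate
    else:
        relative_lift = None  # lift is undefined without baseline purchases
    return incremental_purchases, relative_lift


# Hypothetical test: 100,000 users per group, 1,200 vs 1,000 purchases.
extra, lift = incremental_effect(100_000, 1_200, 100_000, 1_000)
print(round(extra), round(lift, 2))  # 200 0.2
```

In this made-up case only about 200 of the 1,200 purchases are attributable to the ads; the rest would likely have happened anyway.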

Another challenge for the advertiser is uncertainty. DSPs often use the advertisers’ money, offering them vague guarantees of bringing “as many as possible” purchases for the budget. Advertisers have to rely on their intuition when allocating the advertising budget and setting expectations about the result.

It is also the advertiser's job to try various ways to run advertising campaigns, try out new ad images, and continuously learn what works best.

Discussion

DSPs solve the most technically and mathematically challenging problems in the whole ecosystem, and they are in a continuous arms race not only against the supply side, but also against other DSPs - so a DSP has to evolve quickly.

But the feedback loop is very short, and the mathematical setup is rigorous enough, so it is relatively easy to measure the efficiency of each innovation and iterate.

In some ways, a DSP’s business is close to stock trading or betting. Technical solutions are always evolving and improving, and most modern ideas in technology and machine learning are tried out in online advertising.

Moreover, the DSP arguably creates the most additional value for the ecosystem by de facto solving the allocation problem: deciding which users get to see which ads.

Optimal ad allocation means that the user sees the most relevant ads and enjoys buying the advertised products, advertisers get the most clients for the least advertising spend, and publishers get the highest payout for the same users and ad inventory.

Like a very experienced salesman, a DSP decides how to approach each particular user and what kind of product to show them.

The advertiser’s business deals with more risk and uncertainty and longer feedback loops. Moreover, the advertiser stands between two worlds: the world of physical goods manufacturing and money movement, and the world of abstract mathematics and online ads reporting numbers.

It is the advertiser who has to deal with ML systems that might optimize well, but for the wrong goal and in an unstable and unexplainable way - or do not optimize well but have a lot of tweaks to try.

And we should not forget that the entire online advertising ecosystem is built on the advertiser's capacity to manufacture goods and is funded by the advertiser’s marketing budgets.

To be Continued

Now that we have looked at all the participants of the online advertising ecosystem on a general scale, in the next article, I will focus on the common terminology used in online advertising and typical performance metrics in the industry. With that, we will finish the introduction series and will start zooming in on particular topics.