Antenna SQL Tutorial: Custom Attribution Model

👍

This is Part 2 of the Antenna SQL Tutorial. In Part 1 we calculated custom Reach and Frequency metrics.

Beeswax Reporting has built in attribution models today, but based on your advertising strategy you may seek a more custom solution for assigning credit for a conversion across line items.

With Antenna, all the log data is available and it’s possible to develop a more complicated attribution model. We’ll build a rudimentary time-decay model to show how this could work.

Time-decay attribution models split credit across multiple touch-points, but more favorably weigh credit towards more recent touchpoints. For the purposes of example, we won’t differentiate between view-through and click-through conversions and weigh them equally. Thus, the major factor in building our time-decay model will be distance from the conversion. To represent this, we’ll use a formula where an impression’s credit for attributing a conversion halves every 7 days.

This formula is expressed as 2^(-x/7) where x is the number of days prior to the conversion event. For example, 2^-1 = 2^(-7/7) = 0.5 and 2^-2 = 2^(-14/7) = 0.25. We can use these as weightings to determine which impressions should receive more or less credit.

We’ll begin by pulling the time between a conversion and all of the associated user’s impressions. Our end goal will be to see conversion by line item, so we’ll also pull the line item’s ID as well. I’m going to use the WITH keyword to create a Common Table Expression, so I can use these results later in the query -- this is a reusable subquery you can refer to by name in your query (the same result could be achieved with a subquery). If you’d like to learn more about CTEs, you can read more here.

WITH time_decay_diffs AS (
SELECT
	conversion_id,
	imps.line_item_id,
	DATEDIFF('DAY', convs.rx_time_ltz, imps.bid_time_utc) AS days_diff
FROM shared_conversions_view AS convs
JOIN shared_impression_details_view AS imps
  ON convs.user_id = imps.user_id
  -- This part is less intuitive, but we only want to join impressions that happened before the conversion otherwise it didn't really have an effect.
  AND convs.rx_time_ltz > imps.bid_time_utc
WHERE
  -- We're going to look at all conversions that occurred in the last week.
  convs.rx_time_ltz >= CURRENT_DATE() - 7
  -- And we're going to set the lookback window for a conversion as any impression that happened at most 30 days before the Conversion.
  AND imps.bid_time_utc >= TO_DATE(convs.rx_time_ltz) - 30)

We now have the number of days between an impression and conversion for every conversion that occurred. Note that this is expressed as a negative number, so it’s already set to be plugged into our formula later. It’s also worth noting that for a given conversion (conversion_id), multiple impressions can receive credit. This is fundamental to our model, so you may see multiple rows per conversion_id here.

Now we’re going to use that day difference value to assign a time decay score to each impression using the formula we previously outlined.

To find out what % of a conversion an impression receives credit for using our model, we’re going to take each time_decay_score’s weight and find out how it compares across all the time_decay_scores for a conversion. To do this we’ll use a window function to sum all the values across a conversion. You can find more information on window functions here.

SELECT
    line_item_id,
    conversion_id,
    POWER(2, days_diff / 7) AS time_decay_score,
    POWER(2, days_diff / 7)/(SUM(time_decay_score) OVER(PARTITION BY conversion_id)) AS pct_of_conversion
FROM time_decay_diffs

We’re close! The last step is adding up all the partial conversions to find out how many conversions each line item can get credit for. Alternatively, we could’ve taken this percentage per conversion and multiplied it by the conversion value to determine a dollar amount each impression receives credit for generating.

SELECT
	line_item_id, 
	SUM(pct_of_conversion) AS weighted_conversions
FROM
(SELECT
    line_item_id,
    conversion_id,
    POWER(2, days_diff / 7) AS time_decay_score,
    POWER(2, days_diff / 7)/(SUM(time_decay_score) OVER(PARTITION BY conversion_id)) AS pct_of_conversion
FROM time_decay_diffs)
GROUP BY line_item_id