Data Feeds

One of the most important considerations in working with an RTB bidder is designing the data pipeline by which auction, bid, and win events are processed and analyzed.

Unlike with DSPs, the Beeswax Bidder-as-a-Service technology gives you access to the full, unfiltered stream of RTB events, in much the same way as if you were building a bidder from scratch.

Streaming vs Batch

The first consideration in designing your data pipeline is whether you prefer to receive the data in batch form or as a continuous stream. For very high-volume data, such as auction logs, Beeswax only supports batch delivery. For win logs (impressions), Beeswax supports both methods, and there are pros and cons to each:

Pipeline Method | Description | Pros | Cons
Batch | Hourly or daily files of data placed in an S3 bucket. | Fairly easy to ingest, fault tolerant | Delay in utilizing data. Also may include many files written per hour.
Stream | Near real-time data in JSON or protobuf format sent over http or to AWS Kinesis | Use data as fast as you can process it | Higher cost and complexity to support data ingestion

Data De-Duplication

Beeswax Data Infrastructure uses an "at least once delivery" design pattern to ensure all events are eventually delivered to customers. In certain scenarios this may mean that duplicative data is sent in logs to customers.

We always recommend de-duplicating your log-level data on auction_id, or on conversion_id in the case of conversion logs.
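As an illustration of the recommendation above, here is a minimal Python sketch that drops repeated records by key. It assumes each log record has already been parsed into a dict; the function name `dedupe` is hypothetical, and the in-memory set is only suitable for small volumes (a production pipeline would more likely de-duplicate in the data warehouse).

```python
from typing import Dict, Iterable, Iterator


def dedupe(rows: Iterable[Dict[str, str]], key: str = "auction_id") -> Iterator[Dict[str, str]]:
    """Yield only the first record seen for each value of `key`.

    Hypothetical helper: keeps every key in memory, which is an
    assumption that only holds for modest data volumes.
    """
    seen = set()
    for row in rows:
        value = row[key]
        if value in seen:
            continue  # duplicate caused by at-least-once delivery
        seen.add(value)
        yield row
```

For conversion logs, the same helper could be called with `key="conversion_id"`.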

Data Definitions

Column definitions, protobuf mapping, field lists and a data dictionary can be found in this publicly accessible directory on Github: Beeswax Log File Header Definitions.

Beeswax makes multiple types of data available from Stinger, as described in the chart below. Depending on your use case you may need some or all of this data. Because some of this data can be quite large, additional fees may apply (contact your Account Manager for more information).

A more comprehensive description of these log types and implementation details can be found in this publicly-accessible Readme on Github: Beeswax Log Summary.

Data Type | Description | Batch Field Manifest | Column Definitions and Protobuf Mapping
Auctions | The auction request from the exchange, normalized to OpenRTB fields. | auction_log_headers.csv |
Bids | The bids returned from the Bidding Agent to the exchange, whether the auction was won or not. | bid_log_headers.csv |
Conversions | The conversions recorded by Beeswax | conversion_log_headers.csv |
Attributed Conversions | The conversions recorded by Beeswax, attributed back to an auction | attributed_conversion_log_headers.csv |
IP Attributed Conversions | The IP conversions recorded by Beeswax, attributed back to an auction | attributed_ip_conversion_log_headers.csv |
Losses | Loss logs provided by a limited number of exchanges (Google) | bid_response_feedback_logs.csv |
Wins | The winning auctions (impressions), clicks, and events (video plays, etc) | win_log_headers.csv | ad_log.proto
Segments | 1st party segment available on the auction | segment_log_headers.csv |
Ghost Wins | The predicted winning auctions (Ghost Impressions) that have resulted from a Ghost Bid | ghost_win_log_headers.csv |
Ghost Attributed Conversions | The conversion events that have been attributed back to a Ghost Impression | ghost_attributed_conversion_log_headers.csv |
Ghost IP Attributed Conversions | The IP conversion events that have been attributed back to a Ghost Impression | ghost_attributed_ip_conversion_log_headers.csv |

Schema Changes

Our rapid development cycle and the needs of our customers mean we often add new fields with little forewarning. In order to deliver value as quickly as possible, we do not release log-level changes on a set release schedule and new fields may be added at any time.

As a result, we recommend that your ingestion pipelines be set up to handle the addition of new columns at any time, to avoid disruption of service.

That said, we will make all efforts to inform customers of any breaking changes to the logic of existing fields. Similarly, the ordinal positions of fields and header names will not change. Deprecated fields will not be removed in order to preserve column positions.

Our documentation of new fields is typically updated on the day of release, and within 2-3 business days at most.
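One way to tolerate new columns is to map values to header names and read only the fields you need. The Python sketch below is an illustration under stated assumptions, not a Beeswax-provided utility: `select_columns` is a hypothetical helper, `header` is the list of column names you loaded from the relevant header definition file, and `campaign_id` in the usage note is a made-up column name. It relies on the guarantee above that existing ordinal positions do not change, so any values beyond the known header are treated as newly added columns and ignored.

```python
import csv
from typing import Dict, Iterable, Iterator, List, Optional


def select_columns(
    lines: Iterable[str], header: List[str], wanted: List[str]
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield only the `wanted` fields from each CSV line.

    Hypothetical helper. Values past the end of `header` (columns added
    after your header list was loaded) are dropped by zip(); a wanted
    column that is not in `header` comes back as None.
    """
    for row in csv.reader(lines):
        record = dict(zip(header, row))
        yield {name: record.get(name) for name in wanted}
```

For example, `select_columns(open_file, ["auction_id", "campaign_id"], ["auction_id"])` keeps working unchanged when a third column starts appearing in the file.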

CSV Formatting Notes

When emitting logs in .csv file format, Beeswax escapes certain special characters. We mostly follow the RFC-4180 standard for CSV files, with a few small deviations from the specification. Most notably, in Win, Attributed Conversion and Loss Logs we use \ as the escape character for embedded double quotes (in contrast to escaping a double quote with an additional double quote character) and for embedded commas. Most standard CSV parsers allow the escape character to be configured.

Additionally, Win Logs, Attributed Conversion Logs and Loss Logs always enclose fields in double quotes, while all other log types only enclose a field in double quotes when its value contains a comma.
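As a rough sketch of how the backslash escaping described above could be handled, Python's standard `csv` module accepts an `escapechar` and lets you turn off doubled-quote handling. The wrapper name `read_escaped_csv` and the sample line are illustrative assumptions, not real log data.

```python
import csv
import io
from typing import Iterator, List, TextIO


def read_escaped_csv(stream: TextIO) -> Iterator[List[str]]:
    """Parse CSV where \\ escapes embedded quotes and commas.

    Hypothetical wrapper around csv.reader; doublequote=False means a
    doubled quote is not treated as an escaped quote.
    """
    return csv.reader(stream, quotechar='"', escapechar="\\", doublequote=False)


# Illustrative line only: three quoted fields with an escaped quote and comma.
sample = io.StringIO('"a1","he said \\"hi\\"","x\\,y"\n')
rows = list(read_escaped_csv(sample))
```

Log types that follow the standard quoting rules can be read with the parser defaults instead.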

