Data Feeds
One of the most important considerations in working with an RTB bidder is designing the data pipeline by which auction, bid, and win events are processed and analyzed.
Unlike with a typical DSP, the Beeswax Bidder-as-a-Service technology gives you access to the full, unfiltered stream of RTB events, much as you would have if you were building a bidder from scratch.
Streaming vs Batch
The first consideration in designing your data pipeline is whether you prefer to receive the data in batch form or as a continuous stream. For very high-volume data, such as auction logs, Beeswax supports only batch delivery. For win logs (impressions), Beeswax supports both methods, and there are pros and cons to each:
Pipeline Method | Description | Pros | Cons |
---|---|---|---|
Batch | Hourly or daily files of data placed in an S3 bucket. | Fairly easy to ingest, fault tolerant | Delay in utilizing data. Also may include many files written per hour. |
Stream | Near real-time data in JSON or protobuf format sent over HTTP or to AWS Kinesis | Use data as fast as you can process it | Higher cost and complexity to support data ingestion |
Data De-Duplication
Beeswax Data Infrastructure uses an "at least once delivery" design pattern to ensure all events are eventually delivered to customers. In certain scenarios this may mean that duplicate records are sent in logs to customers.
We always recommend de-duplicating your log-level data on auction_id, or on conversion_id in the case of conversion logs.
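The de-duplication recommended above can be sketched in a few lines of Python. This is a minimal illustration, not a Beeswax-provided tool: it assumes rows have already been parsed into dictionaries, keeps the first row seen for each key, and holds all keys in memory, so it only suits a bounded batch. The `dedupe_rows` helper and the `bid_price` sample column are hypothetical; only `auction_id` and `conversion_id` come from the text above.

```python
import csv
import io


def dedupe_rows(rows, key="auction_id"):
    """Yield the first row seen for each key value and skip later duplicates.

    Pass key="conversion_id" for conversion logs.
    """
    seen = set()
    for row in rows:
        value = row[key]
        if value in seen:
            continue  # at-least-once delivery produced a repeat; drop it
        seen.add(value)
        yield row


# Hypothetical sample data: the third row is a redelivered copy of the first.
sample = io.StringIO(
    "auction_id,bid_price\n"
    "a1,1.20\n"
    "a2,0.80\n"
    "a1,1.20\n"
)
unique = list(dedupe_rows(csv.DictReader(sample)))
```

For volumes that do not fit in memory, the same idea is usually expressed as a distinct-by-key step in the warehouse or stream processor that ingests the logs.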
Data Definitions
Column definitions, protobuf mapping, field lists and a data dictionary can be found in this publicly accessible directory on GitHub: Beeswax Log File Header Definitions.
Beeswax makes multiple types of data available from Stinger, as described in the chart below. Depending on your use case, you may need some or all of this data. Because some of this data can be quite large, additional fees may apply (contact your Account Manager for more information).
A more comprehensive description of these log types and implementation details can be found in this publicly accessible Readme on GitHub: Beeswax Log Summary.
Data Type | Description | Batch Field Manifest | Column Definitions and Protobuf Mapping |
---|---|---|---|
Auctions | The auction request from the exchange, normalized to OpenRTB fields. | auction_log_headers.csv | |
Bids | The bids returned from the Bidding Agent to the exchange, whether the auction was won or not. | bid_log_headers.csv | |
Conversions | The conversions recorded by Beeswax | conversion_log_headers.csv | |
Attributed Conversions | The conversions recorded by Beeswax, attributed back to an auction | attributed_conversion_log_headers.csv | |
IP Attributed Conversions | The IP conversions recorded by Beeswax, attributed back to an auction | attributed_ip_conversion_log_headers.csv | |
Losses | Loss logs provided by a limited number of exchanges (Google) | bid_response_feedback_logs.csv | |
Wins | The winning auctions (impressions), clicks, and events (video plays, etc) | win_log_headers.csv | ad_log.proto |
Segments | First-party segments available on the auction | segment_log_headers.csv | |
Ghost Wins | The predicted winning auctions (Ghost Impressions) that have resulted from a Ghost Bid | ghost_win_log_headers.csv | |
Ghost Attributed Conversions | The conversion events that have been attributed back to a Ghost Impression | ghost_attributed_conversion_log_headers.csv | |
Ghost IP Attributed Conversions | The IP conversion events that have been attributed back to a Ghost Impression | ghost_attributed_ip_conversion_log_headers.csv | |
Schema Changes
Our rapid development cycle and the needs of our customers mean we often add new fields with little forewarning. In order to deliver value as quickly as possible, we do not release log-level changes on a set release schedule and new fields may be added at any time.
As a result, we recommend that your ingestion pipelines be set up to handle the addition of new columns at any time, to avoid disruption of service.
That said, we will make all efforts to inform customers of any breaking changes to the logic of existing fields. Similarly, the ordinal positions of fields and header names will not change. Deprecated fields will not be removed in order to preserve column positions.
Our documentation of new fields is typically updated on the day of release, and within 2-3 business days at most.
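One way to make an ingestion pipeline tolerant of new columns is to map each row onto the columns you already know by position and ignore anything extra. The sketch below assumes that new fields are appended after the existing ones (which is consistent with ordinal positions staying fixed, but is an assumption here) and that the files carry no header row. The column names in `KNOWN_COLUMNS` and the `project_row` helper are hypothetical; in practice the list would come from the relevant header manifest.

```python
import csv
import io

# Hypothetical subset of columns known when the pipeline was written.
KNOWN_COLUMNS = ["auction_id", "bid_time", "win_price"]


def project_row(row, known_columns=KNOWN_COLUMNS):
    """Map a positional row onto known column names, ignoring extra trailing fields."""
    if len(row) < len(known_columns):
        raise ValueError(
            f"expected at least {len(known_columns)} fields, got {len(row)}"
        )
    # zip stops at the shorter sequence, so unknown trailing columns are dropped.
    return dict(zip(known_columns, row))


# A file written before a schema change, and one written after a column was added.
old_file = io.StringIO("a1,2024-01-01 00:00:00,1.5\n")
new_file = io.StringIO("a2,2024-01-01 00:05:00,2.0,some_new_value\n")

old_rows = [project_row(r) for r in csv.reader(old_file)]
new_rows = [project_row(r) for r in csv.reader(new_file)]
```

Rejecting rows that are shorter than expected, while accepting longer ones, keeps truncated files from being ingested silently.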
CSV Formatting Notes
When emitting logs in .csv file format, Beeswax escapes certain special characters. While we mostly follow the RFC 4180 standard for CSV files, there are some small deviations from the specification. Most notably, in Win, Attributed Conversion and Loss Logs we use \ as the escape character for embedded double quote and comma characters (in contrast to escaping a double quote with an additional double quote character). Most standard CSV parsers allow the escape character to be configured.
Additionally, Win Logs, Attributed Conversion Logs and Loss Logs always enclose fields in double quotes, while all other log types enclose a field in double quotes only when its value contains a comma.
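As an illustration of adjusting the escape character, the following sketch configures Python's standard `csv` module to treat \ as the escape character and to stop interpreting a doubled double quote as an escaped quote. The `read_escaped_csv` helper and the sample line are hypothetical and only mimic the escaping described above.

```python
import csv
import io


def read_escaped_csv(fileobj):
    """Return a reader for CSV data that escapes quotes and commas with a backslash."""
    return csv.reader(
        fileobj,
        quotechar='"',
        escapechar="\\",    # backslash escapes the character that follows it
        doublequote=False,  # do not treat "" as an escaped double quote
    )


# Hypothetical line: every field is quoted, with an escaped quote and an escaped comma.
sample = io.StringIO('"a1","He said \\"hi\\"","x\\,y"\n')
rows = list(read_escaped_csv(sample))
```

The same reader settings also accept the log types that quote fields only when needed, since quoting is detected per field.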