Data Collection Part 2 - Back-End

Context

Our customers often ask what data Kameleoon collects, and for what purposes. Given the data privacy frameworks currently in place in Europe (mainly the GDPR) and those starting to appear in the United States, it is important to understand precisely which elements of your visitors' data are saved in our data stores. Developers also sometimes wonder where exactly these data are stored and how they are retrieved. This article serves both as a reference listing all collected data and as a guide providing technical details about the underlying implementations and databases. We focus here on the back end, where data are stored for the long term and retrieved to produce analytics reports or, in some cases, to synchronize data with the front end for activation purposes.

List of collected data

The following data are collected and stored on our back-end servers for each visit made by an individual visitor to your website:

visitor code (unique Kameleoon identifier for this visitor);
visit number (if several visits were made by the same visitor);
device type (mobile, tablet or desktop);
operating system (Windows, Mac OS X, Linux...);
name and version of the browser;
screen size;
window size;
time zone of the browser;
language of the browser;
original referrer (acquisition channel);
number of pages viewed;
time spent on the website;
time of the beginning and end of the visit;
number of opened tabs;
whether an adblocker is activated;
list of conversions (clicks, transactions, etc.);
list of personalizations and A/B experiments seen by the visitor;
current weather conditions (if the corresponding targeting condition is activated): temperature, wind, rain...;
time of sunset (present if a weather targeting condition has been activated, as it is required for some weather targeting criteria);
weather forecast (if the corresponding targeting condition is activated): temperature, wind, rain...;
geolocation (only if the corresponding targeting condition is activated or if a weather targeting condition has been activated, as these targeting criteria require geolocation data);
internal search history (only if activated);
products seen (only if activated);
title and URL of pages visited;
custom data;
external segmentation data (obtained from a third-party DMP or CRM).
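
To make the shape of this per-visit record more concrete, here is a minimal sketch in Python. It is only an illustration: the class name, the field names, the types and the helper function are all assumptions made for this example and do not reflect Kameleoon's actual storage schema. Fields that the list above marks as collected only when a feature or targeting condition is activated are modeled as optional.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


@dataclass
class VisitRecord:
    """Hypothetical model of one visit; all names here are illustrative."""
    visitor_code: str                 # unique Kameleoon identifier for the visitor
    visit_number: int
    device_type: str                  # "mobile", "tablet" or "desktop"
    operating_system: str
    browser_name: str
    browser_version: str
    screen_size: Tuple[int, int]      # (width, height) in pixels
    window_size: Tuple[int, int]
    time_zone: str
    language: str
    original_referrer: str            # acquisition channel
    pages_viewed: int
    visit_start: datetime
    visit_end: datetime
    opened_tabs: int
    adblocker: bool
    conversions: List[str] = field(default_factory=list)
    experiments_seen: List[str] = field(default_factory=list)
    pages: List[Tuple[str, str]] = field(default_factory=list)  # (title, URL)
    custom_data: Dict[str, str] = field(default_factory=dict)
    external_segments: Dict[str, str] = field(default_factory=dict)
    # Collected only if the corresponding feature or targeting condition is activated.
    weather: Optional[Dict[str, float]] = None
    sunset_time: Optional[datetime] = None
    weather_forecast: Optional[Dict[str, float]] = None
    geolocation: Optional[Dict[str, str]] = None
    search_history: Optional[List[str]] = None
    products_seen: Optional[List[str]] = None

    @property
    def time_spent_seconds(self) -> float:
        """Time spent on the website, derived here from the visit boundaries."""
        return (self.visit_end - self.visit_start).total_seconds()


OPTIONAL_FIELDS = (
    "weather", "sunset_time", "weather_forecast",
    "geolocation", "search_history", "products_seen",
)


def optional_fields_present(record: VisitRecord) -> List[str]:
    """Return the names of the conditionally collected fields that are populated."""
    return [name for name in OPTIONAL_FIELDS if getattr(record, name) is not None]


# Example visit with no optional feature activated (all values are made up).
example = VisitRecord(
    visitor_code="abc123", visit_number=2, device_type="desktop",
    operating_system="Linux", browser_name="Firefox", browser_version="120.0",
    screen_size=(1920, 1080), window_size=(1280, 720),
    time_zone="Europe/Paris", language="fr", original_referrer="search",
    pages_viewed=3,
    visit_start=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    visit_end=datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    opened_tabs=1, adblocker=False,
)
```

A helper such as optional_fields_present can be handy in a privacy review, since it shows at a glance which of the conditionally collected items are actually populated for a given visit.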

List of used databases and frameworks

Currently, we use the following NoSQL databases and technologies in our data flow architecture:

Hadoop Distributed File System (HDFS), along with Spark;
Cassandra;
Elasticsearch;
Kafka.