    Self-Hosting and On-Premises Platform

    Introduction

    Kameleoon is usually offered as SaaS (Software as a Service). This means that customer data (experiments, personalizations, accounts...) is hosted on a common platform. About 60 physical servers (as of 2019) support all the functional features of our software, such as script hosting, data collection, storage, and creation of analytical reports. All customers' data is stored on shared servers, and the separation between customers is only logical: our platform prevents a customer from accessing another customer's data, but these restrictions are implemented in our application's code rather than through physical isolation.

    Some customers wish to use our software in self-hosting / on-premises mode, usually for performance, data confidentiality or security reasons. Kameleoon fully supports the on-premises model and offers three options to self-host our software. The first allows customers to host the critical Kameleoon application file (and optionally, public resources such as images) on their own servers or CDN rather than on Kameleoon's CDN. It is a quick and easy option (2-3 days). The second consists of setting up a dedicated data storage cluster and is of medium difficulty (1-2 weeks). The last is the full on-premises configuration of Kameleoon, where absolutely everything runs on dedicated servers. Setup time varies with the customer's specific requirements but is usually in the range of 1-2 months.

    Application file & public resources self-hosting

    The simplest option to consider when using Kameleoon in on-premises mode is to self-host the application file. The Kameleoon application file can safely reside either on the Kameleoon CDN (default, SaaS setup) or on your own servers or CDN. The required configuration option can be set in the Kameleoon back-office, in the websites setup section. Three values are possible: no self-hosting at all, self-hosting of the application file only, and full self-hosting of both the application file and other public resources (images).

    Application file self-hosting

    If you wish to self-host the Kameleoon application file, two steps are required:

    1/ You need to know the URL where the application file will be hosted on your side. Then you need to provide this URL in the installation tag (this results in a slightly modified installation tag compared with the default one).

    For instance, let's say you are using the JavaScript File (Asynchronous Loading with Anti-Flicker) implementation method. The Kameleoon application file is by default hosted on //SITE_CODE.kameleoon.eu/kameleoon.js. You just need to change this URL in the installation tag, replacing it with your own URL (such as https://www.customerdomain.com/resources/scripts/kameleoon.js).

    2/ You need to implement synchronization between the file hosted on your servers / CDN and the original file generated by the Kameleoon platform. This is mandatory because the application file is not static: its contents change every time an experiment or personalization changes status on our platform, and also when some configuration changes are made. For instance, if you start a new experiment, or pause or stop a running one, the contents of the application file will change.

    The appropriate way to perform this synchronization depends on your exact setup. CDNs have their own interface to configure this, which is CDN-dependent and out of the scope of this article. For standard web hosting on your own HTTP server such as nginx or Apache, we recommend a simple cron job that runs a wget command to retrieve the appropriate file. We recommend running this job every 5 minutes.

    Once these two steps are completed, you are ready to use the Kameleoon platform with a self-hosted application file.
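
    To confirm that both steps work together, one option is to compare the file served from your own URL with the original generated by the platform. The sketch below is only an illustration: the function names and the --run switch are hypothetical, the two URLs are the placeholders used elsewhere in this article, and it assumes the origin returns byte-identical content on successive requests. A mismatch shortly after an experiment changes status is expected, since the copy can lag by up to one synchronization interval.

```shell
#!/bin/sh
# Hypothetical check, not part of Kameleoon's tooling.
# Replace SITE_CODE and the self-hosted URL with your own values.
ORIGIN_URL="https://SITE_CODE.direct.kameleoon.eu/kameleoon.js"
SELF_HOSTED_URL="https://www.customerdomain.com/resources/scripts/kameleoon.js"

# Succeeds when two local files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# Downloads both copies and compares them.
# Returns 0 if identical, 1 if different, 2 if a download failed.
check_sync() {
    workdir="$(mktemp -d)" || return 2
    result=2
    if wget -q -T 30 -t 3 -O "$workdir/origin.js" "$1" &&
       wget -q -T 30 -t 3 -O "$workdir/hosted.js" "$2"; then
        if same_content "$workdir/origin.js" "$workdir/hosted.js"; then
            result=0
        else
            result=1
        fi
    fi
    rm -rf "$workdir"
    return "$result"
}

# Only contact the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    check_sync "$ORIGIN_URL" "$SELF_HOSTED_URL"
fi
```

    Run with the --run argument, the script exits with 0 when the two copies match, 1 when they differ, and 2 when one of the downloads fails.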

    The following example provides a ready-to-use installation tag and synchronization commands. Copying and pasting them, with your own site code, URL and destination path substituted, is usually enough to get it working.

    Example: Instructions for self-hosting of the Kameleoon application file

    <script type="text/javascript">
        // Duration in milliseconds to wait while the Kameleoon application file is loaded
        var kameleoonLoadingTimeout = 1000;
    
        var kameleoonQueue = kameleoonQueue || [];
        var kameleoonStartLoadTime = new Date().getTime();
        if (! document.getElementById("kameleoonLoadingStyleSheet") && ! window.kameleoonDisplayPageTimeOut)
        {
            var kameleoonS = document.getElementsByTagName("script")[0];
            var kameleoonCc = "* { visibility: hidden !important; background-image: none !important; }";
            var kameleoonStn = document.createElement("style");
            kameleoonStn.type = "text/css";
            kameleoonStn.id = "kameleoonLoadingStyleSheet";
            if (kameleoonStn.styleSheet)
            {
                kameleoonStn.styleSheet.cssText = kameleoonCc;
            }
            else
            {
                kameleoonStn.appendChild(document.createTextNode(kameleoonCc));
            }
            kameleoonS.parentNode.insertBefore(kameleoonStn, kameleoonS);
            window.kameleoonDisplayPage = function(fromEngine)
            {
                if (!fromEngine)
                {   
                    window.kameleoonTimeout = true;
                }
                if (kameleoonStn.parentNode)
                {
                    kameleoonStn.parentNode.removeChild(kameleoonStn);
                }
            };
            window.kameleoonDisplayPageTimeOut = window.setTimeout(window.kameleoonDisplayPage, kameleoonLoadingTimeout);
        }
    </script>
    <script type="text/javascript" src="//www.customerdomain.com/resources/scripts/kameleoon.js" async="true"></script>
    

    In the integration snippet, we changed the source of the script to our own URL: //www.customerdomain.com/resources/scripts/kameleoon.js. Below are working examples of synchronization commands.

    # wget command
    
    wget https://SITE_CODE.direct.kameleoon.eu/kameleoon.js -O /var/www/html/resources/scripts/kameleoon.js -b -T 30 -t 3
    
    # cron entry
    
    */5 * * * * wget https://SITE_CODE.direct.kameleoon.eu/kameleoon.js -O /var/www/html/resources/scripts/kameleoon.js -b -T 30 -t 3
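
    The commands above write straight into the served file, so a failed or interrupted download can leave an empty or partial kameleoon.js in place. A more defensive variant, sketched below, downloads to a temporary file in the same directory and only renames it over the live file once the download has succeeded and the result is non-empty. This is an illustration rather than an official script: the function names and the --run switch are hypothetical, and it uses wget's -q (quiet) option instead of -b so that the exit status can be checked.

```shell
#!/bin/sh
# Sketch of a safer sync job; not part of Kameleoon's tooling.
# Replace SITE_CODE and the destination path with your own values.
SOURCE_URL="https://SITE_CODE.direct.kameleoon.eu/kameleoon.js"
DEST="/var/www/html/resources/scripts/kameleoon.js"

# Moves a downloaded file into place only if it is non-empty.
# A rename within one directory is atomic, so the web server never
# serves a half-written file.
install_if_valid() {
    tmp_file="$1"
    dest_file="$2"
    if [ -s "$tmp_file" ]; then
        # mktemp creates files readable by the owner only; make the
        # file readable by the web server before publishing it.
        chmod 644 "$tmp_file"
        mv -f "$tmp_file" "$dest_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}

# Downloads the application file next to its destination, then installs it.
sync_application_file() {
    url="$1"
    dest_file="$2"
    tmp_file="$(mktemp "${dest_file}.XXXXXX")" || return 1
    if wget -q -T 30 -t 3 -O "$tmp_file" "$url"; then
        install_if_valid "$tmp_file" "$dest_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}

# Only contact the network when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    sync_application_file "$SOURCE_URL" "$DEST"
fi
```

    Assuming the script is saved as /usr/local/bin/kameleoon-sync.sh (a hypothetical path) and made executable, the cron entry becomes: */5 * * * * /usr/local/bin/kameleoon-sync.sh --run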
    

    If you use cross-domain tracking, note that an additional static iFrame (https://www.customerdomain.com/path/to/kameleoon-iframe.html) must also be self-hosted. This is already the case in the standard (SaaS) setup for this implementation method, and is documented in the Advanced Integration Guide.

    Images self-hosting

    In addition to the application file, images uploaded via the Kameleoon platform can also be self-hosted. If this option is chosen, the generated URLs for uploaded images will use your own server / CDN and not our main domain. The standard URL path for uploaded images is SITE_CODE.kameleoon.eu/images/ and resources there are served by a Content Delivery Network (the same one as the application file). With images self-hosting, URLs will use the domain of the application file URL entered in the back-office. If you entered https://server.mydomain.com/path/resources/kameleoon.js, the generated path will thus be server.mydomain.com/images/.
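
    The rule described above (keep the domain of the application file URL, then append /images/) can be sketched as a small shell function. The platform generates these URLs itself; the function below, whose name images_base_path is hypothetical, only illustrates which path to expect so that you can prepare your server or CDN accordingly.

```shell
#!/bin/sh
# Hypothetical helper, not part of Kameleoon's tooling.
# Prints the images base path derived from an application file URL.
images_base_path() {
    url="$1"
    rest="${url#*://}"   # drop a scheme such as "https://" if present
    rest="${rest#//}"    # drop the leading "//" of a protocol-relative URL
    host="${rest%%/*}"   # keep everything before the first "/"
    printf '%s/images/\n' "$host"
}
```

    For the URL used in this section, images_base_path "https://server.mydomain.com/path/resources/kameleoon.js" prints server.mydomain.com/images/.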

    Of course, for images self-hosting to work, a synchronization mechanism must also be used. This is outside the scope of this article, as it is much more complex than synchronizing a single file as in the previous section: several files have to be considered, and the exact names (and thus URLs) of uploaded images cannot be known in advance.

    We recommend using images self-hosting only via a CDN, which usually has its own built-in replication mechanism; writing your own could be time-consuming. The exact configuration depends on your CDN, but the general idea is to configure the CDN to fetch resources from the SITE_CODE.direct.kameleoon.eu origin domain.

    Dedicated clusters for Data Storage

    Using a separate cluster for data storage means that data collected for visitors on your website will no longer be stored with data from other Kameleoon users: it will be physically separated. It will reside on dedicated, separate servers. This offers the following advantages:

    The setup of a dedicated cluster for data storage consists of installing several open-source database systems. We use 4 main technologies. Two (HDFS and Elasticsearch) are mandatory for any setup. The other two (Cassandra and Redis) depend on your use of Kameleoon: they are not needed if you only have the A/B Testing module, but they are required if you use the Personalization module.

    1. Hadoop File System (mandatory). All the data collection events are stored in HDFS. From this raw data, we can rebuild visits, which are then used in other scalable databases. As a result, HDFS serves as the main datastore / backup system / source of truth.

    2. Elasticsearch (mandatory). Elasticsearch is the indexing engine we use to create all analytical reports on our platform. With knowledge of the data model we use, you can also run your own custom queries for advanced analysis and reports.

    3. Cassandra (required for personalization and / or cross-device history reconciliation). Cassandra is used for various tasks, for instance computation of the Machine Learning models or cross-device history reconciliation.

    4. Redis (required for personalization and / or cross-device history reconciliation). Data that must be accessed dynamically by visitors on your website (for instance, number of views in a product page over the last hour) is saved in Redis.

    Here are the exact server requirements for the dedicated data cluster option.

    | Component | Version | Minimal number of servers | Optimal number of servers | Recommended amount of RAM | Server storage type | Remarks |
    |---|---|---|---|---|---|---|
    | Hadoop File System | 2.9.1 | 2 | 2 | 32 GB | Spinning disks with large capacity (8TB or more) | Replication of data is crucial, so 2 servers needed |
    | Elasticsearch | 6.5 | 1 | 2 | 64 GB | SSD recommended | |
    | Cassandra | 3.11.1 | 1 | 2 | 32 GB | SSD mandatory | |
    | Redis | 4.0.11 | 1 | 2 | 64 GB | Does not matter | |

    We recommend using the latest version of the CentOS Linux distribution for all components.

    Full On-Premises model (separated back-office, data collection and storage cluster, and application file hosting)

    To host the entirety of the Kameleoon platform, you need self-hosting of the application file, a dedicated data storage cluster, a dedicated data collection pipeline and a separate back-office. In this scenario, ALL the components and functionality of the Kameleoon platform are hosted in your own IT ecosystem. This allows for custom security policies. For instance, it would be quite common in this case to set up a VPN with access restricted to corporate workstations.

    For the data collection pipeline, we use Kafka. The back-office application itself runs on a Tomcat JEE server. It relies on several other standalone Java applications that communicate through an instance of ActiveMQ. We use MySQL as the relational database for the back-office. Finally, nginx is required as a high-performance HTTP server to collect data events sent by browsers (beacon HTTP calls).

    Here are the exact server requirements for the dedicated data pipeline and back-office. In addition to that, you need to provide servers for the storage cluster detailed in the previous section, and either a CDN or hosting server (first section).

    | Component | Version | Minimal number of servers | Optimal number of servers | Recommended amount of RAM | Server storage type | Remarks |
    |---|---|---|---|---|---|---|
    | JDK / Tomcat / ActiveMQ | 1.8 / 8.0.47 / 5.14.5 | 1 | 1 | 32 GB | SSD recommended | Tomcat JEE server and standalone Java applications are collocated |
    | MySQL | 5.5.56 | 1 | 1 | 32 GB | SSD recommended | |
    | nginx | 1.17.3 | 1 | 2 | 32 GB | Spinning disks | A proprietary Java log parsing application will also be installed on these nodes |
    | Kafka | 1.1 | 2 | 2 | 32 GB | Spinning disks with large capacity (8TB or more) | Confluent distribution - version 4.1 |

    We recommend using the latest version of the CentOS Linux distribution for all components. The back-office application is provided as a WAR file, which has to be deployed on the Tomcat server. Other Java modules (standalone applications) are provided as JAR files.

    FAQ

    Q. Is it possible to encrypt data on dedicated data storage clusters?

    A. Yes, we can encrypt the partitions where the data will be stored. This option implies an additional setup cost.