Setting up Matomo Auto-Archiving in Docker

October 15, 2020 - 5 min read

TL;DR

Scroll to the end of the page and copy the full Dockerfile from there.

The base idea

Docker containers provide most of what a regular Linux install offers, including cron. The idea is therefore to take advantage of cron and add our own cron tasks inside the container. To add cron tasks we need to build our own container image using a Dockerfile. The required steps depend on which base image you are using: Alpine (clearly labeled in the tag) or Debian (the default, and all non-Alpine images).

Right now your Dockerfile should look something like this. Feel free to use any variant found on the Matomo Docker Hub page; I will tell you what to watch out for.

FROM matomo:fpm-alpine
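
If you are unsure which variant you are on, the FROM line is the place to check. As a rough sketch, these are the three kinds of tags this article distinguishes between. The tag names are assumed to match what is currently listed on the Matomo Docker Hub page, so verify them there, and keep exactly one FROM line active:

```dockerfile
# Alpine-based, PHP-FPM (the variant used throughout this article)
FROM matomo:fpm-alpine

# Debian-based, PHP-FPM
# FROM matomo:fpm

# Debian-based, Apache (the regular version of the image)
# FROM matomo:apache
```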

Installing Cron (Debian-based images only)

If you are using a Debian-based image, you will need to install the cron package manually first. Because the package lists are cleared from the image by default, you need to update your package sources before you can download cron.

Therefore you will need to add the following lines:

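# Install Cron (update and install in one layer so the package lists are never stale)
RUN apt-get update \
    && apt-get install -y --no-install-recommends cron \
    && rm -rf /var/lib/apt/lists/*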

Adding our Crontab entries

As I didn’t want to rely on external files, I chose to add the tasks using commands. The instruction used might look strange at first, so let’s break it down:

RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

Using the >> syntax we can redirect the output of a command into a file, appending it to whatever is already there. As we always want the same output, we simply use the echo command to emit a fixed string. The value after the chevrons is the path of the file to which the output is appended.

If you are using Debian, use /etc/crontab instead of /etc/crontabs/root as the file path. Since the file path no longer tells cron which user to run the task as, that information has to go into the cron entry itself, right after the schedule. Make sure to apply both changes to the other entries as well! The result will look like this:

RUN echo "*/5 * * * * root /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontab
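
A variation worth considering on Debian: because /etc/crontab has a user column, the job does not have to run as root. The sketch below assumes that the Matomo files in your image are owned by www-data (the usual web server user on Debian). Check the ownership in your own image before switching, as this is not part of the setup described in this article.

```dockerfile
# Same archive job, but executed as www-data instead of root
# (assumption: the files under /var/www/html belong to www-data)
RUN echo "*/5 * * * * www-data /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontab
```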

The string we append consists of two parts (three on Debian, where the user name sits in between): the cron schedule expression (e.g. */5 * * * * - this one executes every 5 minutes) and the command that should be executed on that schedule (e.g. php console scheduled-tasks:run). As cron expressions can be very confusing at first, I recommend you play around with a generator like crontab guru.
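
To make the schedule part more concrete, here is a small sketch with a few common expressions written as comments, followed by one example entry. It uses the Alpine file path, and the hourly schedule is only an illustration, not a recommendation:

```dockerfile
# Field order: minute  hour  day-of-month  month  day-of-week  command
#
#   */5 * * * *    every 5 minutes
#   0 * * * *      at minute 0 of every hour
#   30 3 * * *     every day at 03:30
#
# Example: run the archiver once per hour instead of every 5 minutes
RUN echo "0 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root
```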

Running the archiving script on a schedule

The Matomo archiving process is started by having PHP execute the script located at /var/www/html/console (no file extension required) with the core:archive command. In practice this looks like this:

$ /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/

Of course you’ll have to adjust the --url argument to match the URL of your Matomo installation. Also note that I used the full path to the php executable, because cron runs with a minimal environment and I want to be sure the binary is found.
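
If you would rather not hard-code the URL, a build argument is one option. This is only a sketch: MATOMO_URL is a hypothetical argument name chosen for this example, and the default value is the placeholder URL used throughout this article.

```dockerfile
FROM matomo:fpm-alpine

# MATOMO_URL is a hypothetical build argument; override it with
#   docker build --build-arg MATOMO_URL=https://your-matomo.example/ .
ARG MATOMO_URL=https://analytics.example.com/

# The shell expands ${MATOMO_URL} at build time, so the crontab contains the final URL
RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=${MATOMO_URL}" >> /etc/crontabs/root
```

The URL is baked into the image at build time, so changing it later means rebuilding the image.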

Putting together what we just learned, we get the following instruction. When the container image is built, it adds a crontab entry that has PHP execute the Matomo console script every 5 minutes and start the archiving process.

# Run archive script every 5 minutes
RUN echo "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root
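
To catch a typo in the file path early, a build-time check can be placed right after the instruction above. A minimal sketch, assuming the Alpine path used here: grep -q exits with a non-zero status when nothing matches, which aborts the build.

```dockerfile
# Fail the build if the archive entry did not end up in root's crontab
RUN grep -q "core:archive" /etc/crontabs/root
```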

Running all scheduled tasks on a schedule

Apart from the archiving command, the console script also offers a command to run all scheduled tasks. Those include, for example, sending emails.

# Run scheduled tasks every 20 minutes
RUN echo "*/20 * * * * /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontabs/root
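
If you prefer fewer image layers, both entries can be appended by a single instruction. This is just an alternative way of writing the same two lines, sketched for the Alpine path:

```dockerfile
# Append both entries in one layer; printf writes each argument on its own line
RUN printf '%s\n' \
    "*/5 * * * * /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" \
    "*/20 * * * * /usr/local/bin/php /var/www/html/console scheduled-tasks:run" \
    >> /etc/crontabs/root
```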

Custom start command

Finally, we need to add a custom start command, because the default one does not start the cron daemon. Our command first starts the cron daemon, which moves itself into the background, and then starts the main PHP process, which keeps running in the foreground.

# Start Cron and PHP
CMD crond && php-fpm

Important: The array syntax doesn’t work here, because it is not run through a shell and && would not be interpreted, so stick to the one shown above!
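
If you prefer the array form anyway, one way around this is to start the shell yourself. The following is a sketch for the Alpine FPM image. The exec in front of php-fpm is an optional addition that is not part of the command above; it replaces the shell with php-fpm so that stop signals reach PHP directly.

```dockerfile
# Array (exec) form: invoke a shell explicitly so that "&&" is interpreted.
# "exec" makes php-fpm take over the shell's process instead of running as its child.
CMD ["sh", "-c", "crond && exec php-fpm"]
```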

If you use a Debian base image, you’ll need to replace crond with /etc/init.d/cron start. If you use the Apache variant (aka the regular version of the image), make sure to replace php-fpm with apache2-foreground. Keep in mind that both of these changes may apply to you!

Final Dockerfile

Combining all of the previous commands, plus some extra spaces so that everything is aligned, we end up with the following Dockerfile. Of course it will look slightly different if you use a different base image.

FROM matomo:fpm-alpine

# Run archive script every 5 minutes
RUN echo "*/5     *       *       *       *       /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontabs/root

# Run scheduled tasks every 20 minutes
RUN echo "*/20    *       *       *       *       /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontabs/root

# Start Cron and PHP
CMD crond && php-fpm
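
For comparison, here is how the same file might look on the Debian-based Apache image once all the adjustments mentioned in this article are applied. Treat it as a sketch: it assumes the matomo:apache tag, and the apt-get cleanup flags are a common convention rather than a requirement.

```dockerfile
FROM matomo:apache

# Install Cron (not included in the Debian-based images)
RUN apt-get update \
    && apt-get install -y --no-install-recommends cron \
    && rm -rf /var/lib/apt/lists/*

# Run archive script every 5 minutes (note the extra "root" user column)
RUN echo "*/5 * * * * root /usr/local/bin/php /var/www/html/console core:archive --url=https://analytics.example.com/" >> /etc/crontab

# Run scheduled tasks every 20 minutes
RUN echo "*/20 * * * * root /usr/local/bin/php /var/www/html/console scheduled-tasks:run" >> /etc/crontab

# Start Cron and Apache
CMD /etc/init.d/cron start && apache2-foreground
```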

Now build this image and replace the official image with it. You should have no problems doing so!

All that’s left to do is to go to your Matomo dashboard > General settings > Archiving settings and disable browser-based archiving. In addition, you might want to adjust how often reports are archived. If that value is higher than the interval at which the cron job runs, it becomes the limiting factor, no matter how often the cron job fires.

Screenshot of Archiving settings
