8.4 KiB
regina - website analytics
Ruling Empress Generating In-depth Nginx Analytics
Description
regina
is a python program that generates analytics for a static webpage serverd with nginx.
regina
is easy to deploy and privacy respecting:
- it collects the data from the nginx logs: no javascript/changes to your website required
- data is stored on your webserver in a sqlite database, nothing goes to any cloud
It parses the log and stores the important data in an sqlite database. It can then create an analytics html page that has lots of useful plots and numbers.
Capabilities
Statistics
regina
can generate the following statistics:
- visitor count history
- request count history
- referrer ranking (from which site people visit)
- route ranking (accessed files)
- browser ranking
- platform ranking (operating systems)
- city ranking (where your site visitors are from)
- country ranking
- mobile visitor percentage
- detect if a visitor is likely to be human or a bot
All of those plots and numbers can be generated for the last x days (you can set x yourself) and for all times.
Visualization
regina
can use the data above to generate a static analytics page in a single html file.
The visitor and ranking histories are included as plots.
You can view an example page here
If that is not enough for you, you can write your own script and use data exported by regina or access the database directly.
Getting started
Dependencies
- nginx: You need a nginx webserver that outputs the access log in the
combined
format, which is the default - sqlite >= 3.37
- python >= 3.10
- python-matplotlib
Installation
You can install regina with python-pip:
git clone https://github.com/MatthiasQuintern/regina.git
cd regina
python3 -m pip install .
You can also install it system-wide using sudo python3 -m pip install .
If you also want to install the man-page and the zsh completion script:
sudo cp regina.1.man /usr/share/man/man1/regina.1
sudo gzip /usr/share/man/man1/regina.1
sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina
sudo chmod +x /usr/local/share/zsh/site-functions/_regina
Configuration
The following instructions assume you have an nginx webserver configured for a website like this, with /www
as root (/
):
/www
|-- resources
| |-- image.jpg
|-- index.html
By default, nginx will generate logs in the combined
format with the name access.log
in /var/log/nginx/
and rotate them daily.
Copy the default configuration and template from the git directory to a directory of your choice, in this case ~/.config/regina
If you did clone the git repo, the files should be in /usr/local/lib/python3.11/site-packages/regina/package-data/
.
mkdir ~/.config/regina
cp regina/package-data/default.cfg ~/.config/regina/regina.cfg
cp regina/package-data/template.html ~/.config/regina/template.html
Now edit the configuration to fit your needs. For our example:
[regina]
server_name = my_server.com
access_log = /var/log/nginx/access.log.1
...
[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img
[plot-generation]
img_out_dir = /www/analytics/img
Most defaults should be fine. The default configuration should also be documented well enough for you to know what do do. It is strongly recommended to only use absolute paths.
Now you fill collect the data from the nginx log specified as access_log
in the configuration into the database specified at the database
location (or ~/.local/share/regina/my-server.com.db
if left blank):
regina --config ~/.config/regina/regina.cfg --collect
To visualize the data, run:
regina --config ~/.config/regina/regina.cfg --visualize
This will generate plots and statistics and replace all variables in template_html
and output the result to html_out_path
.
If html_out_path
is in your webroot, you should now be able to access the generated site.
In our example, /www
will look like this:
/www
|-- analytics
| |-- analytics.html
| |-- img
| |-- ranking_referer_total.svg
| |-- ranking_referer_last_x_days.svg
| ...
|-- resources
| |-- image.jpg
|-- index.html
Automation
You will probably run regina
once per day, after nginx
has filled the daily access log. The easiest way to that is using a cronjob.
Run crontab -e
and enter:
10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize
This assumes, you installed regina
system-wide.
Now the regina
command will be run every day, ten minutes after midnight.
After each day, rotates the logs, so access.log
becomes access.log.1
.
Since regina
is run after the log rotation, you will probably want to run it on access.log.1
.
Logfile permissions
By default, nginx
logs are -rw-r----- root root
so you can not access them as user.
You could either run regina as root, which I strongly do not recommend or make a root-cronjob that changes ownership of the log after midnight.
Run sudo crontab -e
and enter:
9 0 * * * chown your-username /var/log/nginx/access.log.1
This will make you the owner of the log 9 minutes after midnight, just before regina
needs read access.
GeoIP
regina
can show you from which country or city a visitor is from, but you will need an ip2location database.
You can acquire such a database for free at ip2location.com (and probably some other sites as well!).
After creating create an account you can download several different databases in different formats.
For regina
, download the IP-COUNTRY-REGION-CITY
for IPv4 as csv.
To configure regina to use the GeoIP database, edit get_visitor_location
and get_cities_for_contries
in section data-collection
.
By default, regina
only tells you which country a user is from.
Append the two-letter country codes for countries you are interested in to the get_cities_for_contries
option.
After that, add the GeoIP-data into your database:
regina --config regina.cfg --update-geoip path-to-csv
Depending on how many countries you specified, this might take a long time. You can delete the csv
afterwards.
Customization
Generated html
The generated file does not need to be an html. The template can be any text file.
regina
will only replace certain words starting with a %
.
You can see all supported variables and their values by running --visualize
with debug_level = 1
.
Data export
If you want to further process the data generated by regina, you can export the data by setting the data_out_dir
in the data-export
section.
The data can be exported as csv
or pkl
.
If you choose pkl
as file type, all rankings will be exported as python type list[tuple[int, str]]
.
Database
You can of course work directly with the database, as long as it is not altered. Editing, adding or deleting entries might make the database incompatible with regina, so only do that if you know what you are doing. Just querying entries will be fine though.
Troubleshooting
General
If you are having problems, try setting the debug_level
in section debug
of the configuration file to a non-zero value.
sqlite3.OperationalError: near "STRICT": syntax error
Your sqlite3 version is probably too old. Check with sqlite3 --version
. regina
requires 3.37 or higher.
Hotfix: Remove all STRICT
s from <python-dir>/site-packages/regina/sql/create_db.sql
.
Changelog
1.1 (2023-05-17)
- Improved database format:
- put referrer, browser and platform in own table to reduce size of the database
- route groups now part of visualization, not data collection
- Data visualization now uses more sql for improved performance
- Refactored codebase
- Bug fixes
- Changed setup.py to pyproject.toml
1.0 (2022-12-14)
- Initial release
Copyright
Copyright © 2022 Matthias Quintern. License GPLv3+: GNU GPL version 3 https://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.