% REGINA(1) regina 1.1 % Matthias Quintern % April 2022
NAME
regina - Ruling Empress Generating In-depth Nginx Analytics (obviously)
SYNOPSIS
| regina --config CONFIG_FILE [OPTION...]
DESCRIPTION
Regina is an analytics tool for nginx. It collects information from the nginx access.log and stores it in a sqlite3 database. Regina supports several data visualization configurations and can generate an admin-analytics page from an html template file.
Command line options
- -h, --help
- Show the possible command line arguments
- -c, --config config-file
- Retrieve settings from the config-file
- --access-log log-file
- Overrides the access_log from the configuration
- --collect
- Collect information from the access_log and store it in the database
- --visualize
- Visualize the data from the database
- --update-geoip geoip-db
- Recreate the geoip part of the database from the geoip-db csv. The csv must have this form: lower, upper, country-code, country-name, region, city
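The six-column csv form expected by --update-geoip can be illustrated with a short Python sketch. This is not regina's own loader: the GeoIPRow type, the read_geoip_rows function and the sample rows are hypothetical, and the sketch assumes that lower and upper are stored as integers, as in the ip2location csv files.

```python
import csv
import io
from typing import NamedTuple


class GeoIPRow(NamedTuple):
    lower: int          # first address of the range as an integer (assumption)
    upper: int          # last address of the range as an integer (assumption)
    country_code: str
    country_name: str
    region: str
    city: str


def read_geoip_rows(text: str) -> list[GeoIPRow]:
    """Parse csv text with six columns per row into GeoIPRow tuples."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if len(record) != 6:
            raise ValueError(f"expected 6 columns, got {len(record)}: {record}")
        lower, upper, code, name, region, city = record
        rows.append(GeoIPRow(int(lower), int(upper), code, name, region, city))
    return rows


# Made-up sample rows, for illustration only
sample = (
    '"16777216","16777471","US","United States of America","California","Los Angeles"\n'
    '"16777472","16778239","CN","China","Fujian","Fuzhou"\n'
)
rows = read_geoip_rows(sample)
print(rows[0].country_code, rows[0].city)  # US Los Angeles
```

A row with the wrong number of columns raises a ValueError, which is a cheap way to check a downloaded csv before handing it to regina.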
GETTING STARTED
Dependencies
- nginx: You need an nginx webserver that outputs the access log in the combined format, which is the default
- Python 3.10
- Python/matplotlib
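The Python side of these dependencies can be checked before installing. The following is a minimal sketch that treats Python 3.10 as a minimum version (an assumption, the list above only names the version); missing_dependencies is a hypothetical helper and not part of regina.

```python
import importlib.util
import sys

# Assumption: 3.10 is the minimum supported version
REQUIRED_PYTHON = (3, 10)


def missing_dependencies(modules=("matplotlib",)) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} or newer is required"
        )
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_dependencies():
        print(problem)
```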
Installation
You can install regina with python-pip:
git clone https://github.com/MatthiasQuintern/regina.git
cd regina
python3 -m pip install .
You can also install it system-wide using sudo python3 -m pip install .
If you also want to install the man-page and the zsh completion script:
sudo cp regina.1.man /usr/share/man/man1/regina.1
sudo gzip /usr/share/man/man1/regina.1
sudo cp regina/package-data/_regina.compdef.zsh /usr/share/zsh/site-functions/_regina
sudo chmod +x /usr/share/zsh/site-functions/_regina
Configuration
The following instructions assume you have an nginx webserver configured for a website like this, with /www as root (/):
/www
|-- resources
| |-- image.jpg
|-- index.html
By default, nginx will generate logs in the combined format with the name access.log in /var/log/nginx/ and rotate them daily.
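For orientation, a line in the combined format consists of the remote address, the remote user, the local time, the request line, the status code, the number of body bytes sent, the referer and the user agent. The sketch below parses one such line with a regular expression. It is not regina's parser: COMBINED and parse_line are hypothetical names, the sample line is made up, and quotes escaped inside a field are not handled.

```python
import re
from datetime import datetime

# Field names follow the nginx variables used by the combined format
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields, or None if the line is not in the combined format."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["body_bytes_sent"] = int(entry["body_bytes_sent"])
    # Example timestamp: 10/Apr/2022:13:55:36 +0200 (month names need an English/C locale)
    entry["time_local"] = datetime.strptime(entry["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


# Made-up sample line, for illustration only
sample = (
    '203.0.113.7 - - [10/Apr/2022:13:55:36 +0200] '
    '"GET /resources/image.jpg HTTP/1.1" 200 5120 '
    '"https://my_server.com/index.html" "Mozilla/5.0 (X11; Linux x86_64)"'
)
entry = parse_line(sample)
print(entry["remote_addr"], entry["request"], entry["status"])
```

If your nginx uses a custom log_format, lines will not match this pattern and parse_line returns None.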
Copy the default configuration and template from the git directory to a directory of your choice, in this case ~/.config/regina. If you did not clone the git repo, the files should be in /usr/local/lib/python3.11/site-packages/regina/package-data/.
mkdir ~/.config/regina
cp regina/package-data/default.cfg ~/.config/regina/regina.cfg
cp regina/package-data/template.html ~/.config/regina/template.html
Now edit the configuration to fit your needs. For our example:
[regina]
server_name = my_server.com
access_log = /var/log/nginx/access.log.1
...
[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img
[plot-generation]
img_out_dir = /www/analytics/img
Most defaults should be fine. The default configuration should also be documented well enough for you to know what to do. It is strongly recommended to only use absolute paths.
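The recommendation to use absolute paths can be checked mechanically. The sketch below assumes the configuration is plain INI syntax that Python's configparser can read, which may not hold for every option in the real file. relative_paths and PATH_OPTIONS are hypothetical and only cover the options shown in the example above.

```python
import configparser
import os.path

# The example configuration from above, without the elided part
EXAMPLE = """
[regina]
server_name = my_server.com
access_log = /var/log/nginx/access.log.1

[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img

[plot-generation]
img_out_dir = /www/analytics/img
"""

# Options that hold filesystem paths. img_location is left out on the
# assumption that it is a URL path rather than a filesystem path.
PATH_OPTIONS = [
    ("regina", "access_log"),
    ("html-generation", "html_out_path"),
    ("plot-generation", "img_out_dir"),
]


def relative_paths(config_text: str) -> list[str]:
    """Return the path options whose value is set but not absolute."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    found = []
    for section, option in PATH_OPTIONS:
        value = parser.get(section, option, fallback="")
        if value and not os.path.isabs(value):
            found.append(f"[{section}] {option} = {value}")
    return found


print(relative_paths(EXAMPLE))
```

An empty list means every path option that is set uses an absolute path.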
Now you can collect the data from the nginx log specified as access_log in the configuration into the database at the location given by database (or ~/.local/share/regina/my-server.com.db if left blank):
regina --config ~/.config/regina/regina.cfg --collect
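To confirm that the collect step wrote something, you can inspect the sqlite3 file with Python's standard library. The sketch below makes no assumption about regina's table names: it only lists whichever tables exist. list_tables is a hypothetical helper, and the demo uses an in-memory database with a made-up table instead of a real regina database.

```python
import sqlite3


def list_tables(connection: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database, sorted by name."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cursor]


# Demo on an in-memory database with a made-up table
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE example (id INTEGER PRIMARY KEY)")
print(list_tables(demo))  # ['example']
demo.close()
```

For a real check, pass sqlite3.connect() the path of your database file instead of ":memory:".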
To visualize the data, run:
regina --config ~/.config/regina/regina.cfg --visualize
This will generate plots and statistics, replace all variables in template_html, and output the result to html_out_path.
If html_out_path is in your webroot, you should now be able to access the generated site.
In our example, /www will look like this:
/www
|-- analytics
| |-- analytics.html
| |-- img
| |-- ranking_referer_total.svg
| |-- ranking_referer_last_x_days.svg
| ...
|-- resources
| |-- image.jpg
|-- index.html
Automation
You will probably run regina once per day, after nginx has filled the daily access log. The easiest way to do that is using a cronjob.
Run crontab -e and enter:
10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize
This assumes you installed regina system-wide.
Now the regina command will be run every day, ten minutes after midnight.
After each day, nginx rotates the logs, so access.log becomes access.log.1.
Since regina is run after the log rotation, you will probably want to run it on access.log.1.
Logfile permissions
By default, nginx logs are -rw-r----- root root, so you cannot access them as a regular user.
You could either run regina as root, which I strongly do not recommend, or make a root-cronjob that changes ownership of the log after midnight.
Run sudo crontab -e and enter:
9 0 * * * chown your-username /var/log/nginx/access.log.1
This will make you the owner of the log 9 minutes after midnight, just before regina needs read access.
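Whether the ownership change worked can be checked as the user that runs regina. The following sketch reports the mode, the owner and whether the current user can read the file. describe_log_access is a hypothetical helper, and the log path is the one from the example configuration.

```python
import os
import stat


def describe_log_access(path: str) -> str:
    """Summarize mode, owner uid and readability of a file for the current user."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return f"{path}: does not exist"
    except PermissionError:
        return f"{path}: parent directory is not accessible"
    mode = stat.filemode(info.st_mode)       # e.g. -rw-r-----
    readable = os.access(path, os.R_OK)      # checked for the current user
    return f"{path}: {mode} uid={info.st_uid} readable={'yes' if readable else 'no'}"


if __name__ == "__main__":
    print(describe_log_access("/var/log/nginx/access.log.1"))
```

If the output ends in readable=no after the root cronjob has run, check that the username in the chown line matches the user whose crontab runs regina.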
GeoIP
regina can show you which country or city a visitor is from, but you will need an ip2location database.
You can acquire such a database for free at ip2location.com (and probably some other sites as well!).
After creating an account, you can download several different databases in different formats.
For regina, download the IP-COUNTRY-REGION-CITY for IPv4 as csv.
By default, regina only tells you which country a user is from.
To see the individual cities for countries, append the two-letter country code to the get_cities_for_contries option in the data-collection section in the config file.
After that, load the GeoIP-data into your database:
regina --config regina.conf --update-geoip path-to-csv
Depending on how many countries you specified, this might take a long time. You can delete the csv afterwards.
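To illustrate what range-based GeoIP data is used for, the sketch below maps an IPv4 address to a country code with a binary search over sorted ranges. It is not how regina queries its database: RANGES, LOWERS and lookup_country are hypothetical, the two ranges and their country codes are made up, and storing addresses as integers is an assumption based on the ip2location csv layout.

```python
import bisect
import ipaddress
from typing import Optional

# Made-up ranges (lower, upper, country_code), sorted by lower and non-overlapping
RANGES = [
    (167772160, 184549375, "ZZ"),    # 10.0.0.0 - 10.255.255.255
    (3232235520, 3232301055, "YY"),  # 192.168.0.0 - 192.168.255.255
]
LOWERS = [r[0] for r in RANGES]


def lookup_country(ip: str) -> Optional[str]:
    """Return the country code of the range containing ip, or None."""
    value = int(ipaddress.IPv4Address(ip))
    # Index of the last range whose lower bound is <= value
    index = bisect.bisect_right(LOWERS, value) - 1
    if index < 0:
        return None
    lower, upper, code = RANGES[index]
    return code if lower <= value <= upper else None


print(lookup_country("10.1.2.3"))  # ZZ
print(lookup_country("8.8.8.8"))   # None
```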
CHANGELOG
1.1
- Improved database format:
- put referrer, browser and platform in own table to reduce size of the database
- route groups now part of visualization, not data collection
- Data visualization now uses more sql for improved performance
- Refactored codebase
- Bug fixes
- Changed setup.py to pyproject.toml
1.0
- Initial release
COPYRIGHT
Copyright © 2022 Matthias Quintern. License GPLv3+: GNU GPL version 3 https://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.