From f3ea29757063b326c4d036010ee2ce5e63d2e26c Mon Sep 17 00:00:00 2001 From: "matthias@arch" Date: Wed, 17 May 2023 18:02:21 +0200 Subject: [PATCH] Updated readme --- README.md | 191 ++++++++++++++++++++++++++++----- regina.1.man | 295 +++++++++++++++++++++++++++++++++++++++++++++------ regina.1.md | 48 +++++++-- 3 files changed, 465 insertions(+), 69 deletions(-) diff --git a/README.md b/README.md index 7d52b51..b526390 100644 --- a/README.md +++ b/README.md @@ -1,32 +1,50 @@ # regina - nginx analytics tool **R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously) -## Overview -Regina is an analytics tool for nginx. -It collects information from the nginx access.log and stores it in a sqlite3 database. -Regina supports several data visualization configurations and can generate an admin-analytics page from an html template file. +# regina - website analytics +**R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics -## Command line options -**-h**, **--help** -: Show the the possible command line arguments +## Description +`regina` is a **python** program that generates ***analytics*** for a static webpage served with **nginx**. +`regina` is easy to deploy and privacy respecting: + - it collects the data from the nginx logs: no javascript/changes to your website required + - data is stored on your device in a **sqlite** database, nothing goes to any cloud +It parses the log and **stores** the important data in an *sqlite* database. +It can then create an analytics html page that has lots of useful **plots** and **numbers**. 
-**-c**, **--config** config-file -: Retrieve settings from the config-file +## Capabilities +### Statistics +`regina` can generate the following statistics: -**--access-log** log-file -: Overrides the access_log from the configuration + - visitor count history + - request count history + - referrer ranking *(from which site people visit)* + - route ranking *(accessed files)* + - browser ranking + - platform ranking *(operating systems)* + - city ranking *(where your site visitors are from)* + - country ranking + - mobile visitor percentage + - detection of whether a visitor is likely to be human or a bot -**--collect** -: Collect information from the access_log and store them in the databse +All of those plots and numbers can be generated for the **last x days** (you can set *x* yourself) and for **all time**. -**--visualize** -: Visualize the data from the database +### Visualization +`regina` can use the data above to generate a static analytics page in a single html file. +The visitor and ranking histories are included as plots. +You can view an example page [here](https://quintern.xyz/en/software/regina-example.html). +If that is not enough for you, you can write your own script and use data exported by regina or access the database directly. -**--update-geoip** geoip-db -: Recreate the geoip part of the database from the geoip-db csv. 
The csv must have this form: lower, upper, country-code, country-name, region, city +# Getting started -# Installation with pip -You can also install regina with python-pip: +## Dependencies +- **nginx**: You need an nginx webserver that outputs the access log in the `combined` format, which is the default +- **sqlite >= 3.37** +- **python >= 3.10** +- **python-matplotlib** + +## Installation +You can install regina with python-pip: ```shell git clone https://github.com/MatthiasQuintern/regina.git cd regina @@ -36,13 +54,138 @@ You can also install it system-wide using `sudo python3 -m pip install .` If you also want to install the man-page and the zsh completion script: ```shell -sudo cp regina.1.man /usr/share/man/man1/regina.1 -sudo gzip /usr/share/man/man1/regina.1 -sudo cp _regina.compdef.zsh /usr/share/zsh/site-functions/_regina -sudo chmod +x /usr/share/zsh/site-functions/_regina + sudo cp regina.1.man /usr/share/man/man1/regina.1 + sudo gzip /usr/share/man/man1/regina.1 + sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina + sudo chmod +x /usr/local/share/zsh/site-functions/_regina ``` -## 1.0 +## Configuration +The following instructions assume you have an nginx webserver configured for a website like this, with `/www` as root (`/`): +``` + /www + |-- resources + | |-- image.jpg + |-- index.html +``` +By default, nginx will generate logs in the `combined` format with the name `access.log` in `/var/log/nginx/` and rotate them daily. + +Copy the default configuration and template from the git directory to a directory of your choice, in this case `~/.config/regina`. +If you did not clone the git repo, the files should be in `/usr/local/lib/python3.11/site-packages/regina/package-data/`. +```shell + mkdir -p ~/.config/regina + cp regina/package-data/default.cfg ~/.config/regina/regina.cfg + cp regina/package-data/template.html ~/.config/regina/template.html +``` +Now edit the configuration to fit your needs. 
+For our example: +``` + [regina] + server_name = my_server.com + access_log = /var/log/nginx/access.log.1 + ... + [html-generation] + html_out_path = /www/analytics/analytics.html + img_location = /img + + [plot-generation] + img_out_dir = /www/analytics/img +``` +Most defaults should be fine. The default configuration should also be documented well enough for you to know what to do. +It is strongly recommended to only use absolute paths. + +Now you can collect the data from the nginx log specified as `access_log` in the configuration into the database specified at the `database` location (or `~/.local/share/regina/my-server.com.db` if left blank): +``` + regina --config ~/.config/regina/regina.cfg --collect +``` + +To visualize the data, run: +``` + regina --config ~/.config/regina/regina.cfg --visualize +``` +This will generate plots and statistics, replace all variables in `template_html`, and write the result to `html_out_path`. +If `html_out_path` is in your webroot, you should now be able to access the generated site. +In our example, `/www` will look like this: +``` + /www + |-- analytics + | |-- analytics.html + | |-- img + | |-- ranking_referer_total.svg + | |-- ranking_referer_last_x_days.svg + | ... + |-- resources + | |-- image.jpg + |-- index.html +``` + +### Automation +You will probably run `regina` once per day, after `nginx` has filled the daily access log. The easiest way to do that is using a *cronjob*. +Run `crontab -e` and enter: +`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize` +This assumes you installed `regina` system-wide. +Now the `regina` command will be run every day, ten minutes after midnight. +After each day, the logs are rotated, so `access.log` becomes `access.log.1`. +Since `regina` is run after the log rotation, you will probably want to run it on `access.log.1`. + +#### Logfile permissions +By default, `nginx` logs are `-rw-r----- root root`, so you cannot access them as a regular user. 
+You could either run regina as root, which I **strongly do not recommend**, or make a root-cronjob that changes ownership of the log after midnight. +Run `sudo crontab -e` and enter: +`9 0 * * * chown your-username /var/log/nginx/access.log.1` +This will make you the owner of the log 9 minutes after midnight, just before `regina` needs read access. + + +## GeoIP +`regina` can show you which country or city a visitor is from, but you will need an *ip2location* database. +You can acquire such a database for free at [ip2location.com](https://lite.ip2location.com/) (and probably some other sites as well!). +After creating an account, you can download several different databases in different formats. +For `regina`, download the `IP-COUNTRY-REGION-CITY` for IPv4 as *csv*. + +To configure regina to use the GeoIP database, edit `get_visitor_location` and `get_cities_for_contries` in section `data-collection`. +By default, `regina` only tells you which country a user is from. +Append the two-letter country codes for countries you are interested in to the `get_cities_for_contries` option. +After that, add the GeoIP data to your database: +``` + regina --config regina.cfg --update-geoip path-to-csv +``` +Depending on how many countries you specified, this might take a long time. You can delete the `csv` afterwards. + + +# CUSTOMIZATION +## Generated html +The generated file does not need to be an html file. The template can be any text file. +`regina` will only replace certain words starting with a `%`. +You can see all supported variables and their values by running `--visualize` with `debug_level = 1`. + +## Data export +If you want to further process the data generated by regina, you can export the data by setting the `data_out_dir` in the `data-export` section. +The data can be exported as `csv` or `pkl`. +If you choose `pkl` as filetype, all rankings will be exported as python type `list[tuple[int, str]]`. 
+ +## Database +You can of course work directly with the database, as long as it is not altered. +Editing, adding or deleting entries might make the database incompatible with regina, so only do that if you know what you are doing. +Just querying entries will be fine though. + +# TROUBLESHOOTING +## General +If you are having problems, try setting the `debug_level` in section `debug` of the configuration file to a non-zero value. + +## sqlite3.OperationalError: near "STRICT": syntax error +Your sqlite3 version is probably too old. Check with `sqlite3 --version`. `regina` requires 3.37 or higher. +Hotfix: Remove all `STRICT`s from `/site-packages/regina/sql/create_db.sql`. + +# Changelog +## 1.1 (2023-05-17) +- Improved database format: + - put referrer, browser and platform in their own tables to reduce the size of the database + - route groups now part of visualization, not data collection +- Data visualization now uses more sql for improved performance +- Refactored codebase +- Bug fixes +- Changed setup.py to pyproject.toml +## 1.0 (2022-12-14) - Initial release # Copyright diff --git a/regina.1.man b/regina.1.man index 39a261e..0a89a38 100644 --- a/regina.1.man +++ b/regina.1.man @@ -1,4 +1,4 @@ -.\" Automatically generated by Pandoc 2.19.2 +.\" Automatically generated by Pandoc 3.0.1 .\" .\" Define V font for inline verbatim, using C font in formats .\" that render this, and otherwise B font. @@ -14,23 +14,28 @@ . ftr VB CB . ftr VBI CBI .\} -.TH "NICOLE" "1" "April 2022" "nicole 2.0" "" +.TH "REGINA" "1" "May 2023" "regina 1.1" "" .hy .SH NAME .PP -\f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating +regina - \f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating \f[B]I\f[R]n-depth \f[B]N\f[R]ginx \f[B]A\f[R]nalytics (obviously) -Regina is an analytics tool for nginx. +.SS Description +.PP +\f[V]regina\f[R] is a \f[B]python\f[R] program that generates +\f[B]\f[BI]analytics\f[B]\f[R] for a static webpage served with +\f[B]nginx\f[R]. 
+\f[V]regina\f[R] is easy to deploy and privacy respecting: it collects +the data from the nginx logs (no javascript/changes to your website +required) and stores it on your device in a \f[B]sqlite\f[R] database; +nothing goes to any cloud. It parses the log and \f[B]stores\f[R] the +important data in an \f[I]sqlite\f[R] database. +It can then create an analytics html page that has lots of useful +\f[B]plots\f[R] and \f[B]numbers\f[R]. .SH SYNOPSIS .PP \f[B]regina\f[R] \[em]-config CONFIG_FILE [OPTION\&...] -.SH DESCRIPTION -.PP -It collects information from the nginx access.log and stores it in a -sqlite3 database. -Regina supports several data visualization configurations and can -generate an admin-analytics page from an html template file. -.SS Command line options +.SH COMMAND LINE OPTIONS .TP \f[B]-h\f[R], \f[B]\[em]-help\f[R] Show the the possible command line arguments @@ -51,24 +56,20 @@ Visualize the data from the database Recreate the geoip part of the database from the geoip-db csv. The csv must have this form: lower, upper, country-code, country-name, region, city -.SH INSTALLATION AND UPDATING +.SH GETTING STARTED +.SS Dependencies +.IP \[bu] 2 +\f[B]nginx\f[R]: You need an nginx webserver that outputs the access log +in the \f[V]combined\f[R] format, which is the default +.IP \[bu] 2 +\f[B]sqlite >= 3.37\f[R] +.IP \[bu] 2 +\f[B]python >= 3.10\f[R] +.IP \[bu] 2 +\f[B]python-matplotlib\f[R] +.SS Installation .PP -To update regina, simply follow the installation instructions. -.SS pacman (Arch Linux) -.PP -Installing regina using the Arch Build System also installs the man-page -and a zsh completion script, if you have zsh installed. 
-.IP -.nf -\f[C] -git clone https://github.com/MatthiasQuintern/regina.git -cd regina -makepkg -si -\f[R] -.fi -.SS pip -.PP -You can also install regina with python-pip: +You can install regina with python-pip: .IP .nf \f[C] @@ -85,19 +86,245 @@ If you also want to install the man-page and the zsh completion script: .IP .nf \f[C] -sudo cp regina.1.man /usr/share/man/man1/regina.1 -sudo gzip /usr/share/man/man1/regina.1 -sudo cp _regina.compdef.zsh /usr/share/zsh/site-functions/_regina -sudo chmod +x /usr/share/zsh/site-functions/_regina + sudo cp regina.1.man /usr/share/man/man1/regina.1 + sudo gzip /usr/share/man/man1/regina.1 + sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina + sudo chmod +x /usr/local/share/zsh/site-functions/_regina \f[R] .fi +.SS Configuration +.PP +The following instructions assume you have an nginx webserver configured +for a website like this, with \f[V]/www\f[R] as root (\f[V]/\f[R]): +.IP +.nf +\f[C] + /www + |-- resources + | |-- image.jpg + |-- index.html +\f[R] +.fi +.PP +By default, nginx will generate logs in the \f[V]combined\f[R] format +with the name \f[V]access.log\f[R] in \f[V]/var/log/nginx/\f[R] and +rotate them daily. +.PP +Copy the default configuration and template from the git directory to a +directory of your choice, in this case \f[V]\[ti]/.config/regina\f[R]. If +you did not clone the git repo, the files should be in +\f[V]/usr/local/lib/python3.11/site-packages/regina/package-data/\f[R]. +.IP +.nf +\f[C] + mkdir -p \[ti]/.config/regina + cp regina/package-data/default.cfg \[ti]/.config/regina/regina.cfg + cp regina/package-data/template.html \[ti]/.config/regina/template.html +\f[R] +.fi +.PP +Now edit the configuration to fit your needs. +For our example: +.IP +.nf +\f[C] + [regina] + server_name = my_server.com + access_log = /var/log/nginx/access.log.1 + ... 
+ [html-generation] + html_out_path = /www/analytics/analytics.html + img_location = /img + + [plot-generation] + img_out_dir = /www/analytics/img +\f[R] +.fi +.PP +Most defaults should be fine. +The default configuration should also be documented well enough for you +to know what to do. +It is strongly recommended to only use absolute paths. +.PP +Now you can collect the data from the nginx log specified as +\f[V]access_log\f[R] in the configuration into the database specified at +the \f[V]database\f[R] location (or +\f[V]\[ti]/.local/share/regina/my-server.com.db\f[R] if left blank): +.IP +.nf +\f[C] + regina --config \[ti]/.config/regina/regina.cfg --collect +\f[R] +.fi +.PP +To visualize the data, run: +.IP +.nf +\f[C] + regina --config \[ti]/.config/regina/regina.cfg --visualize +\f[R] +.fi +.PP +This will generate plots and statistics, replace all variables in +\f[V]template_html\f[R], and write the result to +\f[V]html_out_path\f[R]. +If \f[V]html_out_path\f[R] is in your webroot, you should now be able to +access the generated site. +.PD 0 +.P +.PD +In our example, \f[V]/www\f[R] will look like this: +.IP +.nf +\f[C] + /www + |-- analytics + | |-- analytics.html + | |-- img + | |-- ranking_referer_total.svg + | |-- ranking_referer_last_x_days.svg + | ... + |-- resources + | |-- image.jpg + |-- index.html +\f[R] +.fi +.SS Automation +.PP +You will probably run \f[V]regina\f[R] once per day, after +\f[V]nginx\f[R] has filled the daily access log. +The easiest way to do that is using a \f[I]cronjob\f[R]. +Run \f[V]crontab -e\f[R] and enter: +\f[V]10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize\f[R] +This assumes you installed \f[V]regina\f[R] system-wide. +.PD 0 +.P +.PD +Now the \f[V]regina\f[R] command will be run every day, ten minutes +after midnight. +After each day, the logs are rotated, so \f[V]access.log\f[R] becomes +\f[V]access.log.1\f[R]. 
+Since \f[V]regina\f[R] is run after the log rotation, you will probably +want to run it on \f[V]access.log.1\f[R]. +.SS Logfile permissions +.PP +By default, \f[V]nginx\f[R] logs are \f[V]-rw-r----- root root\f[R], so +you cannot access them as a regular user. +You could either run regina as root, which I \f[B]strongly do not +recommend\f[R], or make a root-cronjob that changes ownership of the log +after midnight. +Run \f[V]sudo crontab -e\f[R] and enter: +\f[V]9 0 * * * chown your-username /var/log/nginx/access.log.1\f[R] +This will make you the owner of the log 9 minutes after midnight, just +before \f[V]regina\f[R] needs read access. +.SS GeoIP +.PP +\f[V]regina\f[R] can show you which country or city a visitor is +from, but you will need an \f[I]ip2location\f[R] database. +You can acquire such a database for free at +ip2location.com (https://lite.ip2location.com/) (and probably some other +sites as well!). +After creating an account, you can download several different +databases in different formats. +.PD 0 +.P +.PD +For \f[V]regina\f[R], download the \f[V]IP-COUNTRY-REGION-CITY\f[R] for +IPv4 as \f[I]csv\f[R]. +.PP +To configure regina to use the GeoIP database, edit +\f[V]get_visitor_location\f[R] and \f[V]get_cities_for_contries\f[R] in +section \f[V]data-collection\f[R]. +.PD 0 +.P +.PD +By default, \f[V]regina\f[R] only tells you which country a user is +from. +Append the two-letter country codes for countries you are interested in +to the \f[V]get_cities_for_contries\f[R] option. +.PD 0 +.P +.PD +After that, add the GeoIP data to your database: +.IP +.nf +\f[C] + regina --config regina.cfg --update-geoip path-to-csv +\f[R] +.fi +.PP +Depending on how many countries you specified, this might take a long +time. +You can delete the \f[V]csv\f[R] afterwards. +.SH CUSTOMIZATION +.SS Generated html +.PP +The generated file does not need to be an html file. +The template can be any text file. 
+.PD 0 +.P +.PD +\f[V]regina\f[R] will only replace certain words starting with a +\f[V]%\f[R]. +You can see all supported variables and their values by running +\f[V]--visualize\f[R] with \f[V]debug_level = 1\f[R]. +.SS Data export +.PP +If you want to further process the data generated by regina, you can +export the data by setting the \f[V]data_out_dir\f[R] in the +\f[V]data-export\f[R] section. +The data can be exported as \f[V]csv\f[R] or \f[V]pkl\f[R]. +.PD 0 +.P +.PD +If you choose \f[V]pkl\f[R] as filetype, all rankings will be exported +as python type \f[V]list[tuple[int, str]]\f[R]. +.SS Database +.PP +You can of course work directly with the database, as long as it is not +altered. +Editing, adding or deleting entries might make the database incompatible +with regina, so only do that if you know what you are doing. +Just querying entries will be fine though. +.SH TROUBLESHOOTING +.SS General +.PP +If you are having problems, try setting the \f[V]debug_level\f[R] in +section \f[V]debug\f[R] of the configuration file to a non-zero value. +.SS sqlite3.OperationalError: near \[lq]STRICT\[rq]: syntax error +.PP +Your sqlite3 version is probably too old. +Check with \f[V]sqlite3 --version\f[R]. +\f[V]regina\f[R] requires 3.37 or higher. +.PD 0 +.P +.PD +Hotfix: Remove all \f[V]STRICT\f[R]s from +\f[V]/site-packages/regina/sql/create_db.sql\f[R]. .SH CHANGELOG -.SS 1.0 +.SS 1.1 +.IP \[bu] 2 +Improved database format: +.RS 2 +.IP \[bu] 2 +put referrer, browser and platform in their own tables to reduce the size of the +database +.IP \[bu] 2 +route groups now part of visualization, not data collection +.RE +.IP \[bu] 2 +Data visualization now uses more sql for improved performance +.IP \[bu] 2 +Refactored codebase +.IP \[bu] 2 +Bug fixes +.IP \[bu] 2 +Changed setup.py to pyproject.toml ## 1.0 .IP \[bu] 2 Initial release .SH COPYRIGHT .PP -Copyright \[co] 2022 Matthias Quintern. +Copyright © 2022 Matthias Quintern. License GPLv3+: GNU GPL version 3 . 
.PD 0 .P diff --git a/regina.1.md b/regina.1.md index 77eb34e..0c72fb1 100644 --- a/regina.1.md +++ b/regina.1.md @@ -1,19 +1,45 @@ % REGINA(1) regina 1.1 % Matthias Quintern -% April 2022 +% May 2023 # NAME regina - **R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously) +## Description +`regina` is a **python** program that generates ***analytics*** for a static webpage served with **nginx**. +`regina` is easy to deploy and privacy respecting: + - it collects the data from the nginx logs: no javascript/changes to your website required + - data is stored on your device in a **sqlite** database, nothing goes to any cloud +It parses the log and **stores** the important data in an *sqlite* database. +It can then create an analytics html page that has lots of useful **plots** and **numbers**. + + + + + + + + + + + + + + + + + + + + + + + + # SYNOPSIS | **regina** --config CONFIG_FILE [OPTION...] -# DESCRIPTION -Regina is an analytics tool for nginx. -It collects information from the nginx access.log and stores it in a sqlite3 database. -Regina supports several data visualization configurations and can generate an admin-analytics page from an html template file. - -## Command line options +# COMMAND LINE OPTIONS **-h**, **--help** : Show the the possible command line arguments @@ -37,8 +63,8 @@ Regina supports several data visualization configurations and can generate an ad ## Dependencies - **nginx**: You need a nginx webserver that outputs the access log in the `combined` format, which is the default - **sqlite >= 3.37** -- **Python >= 3.10** -- **Python/matplotlib** +- **python >= 3.10** +- **python-matplotlib** ## Installation You can install regina with python-pip: @@ -119,7 +145,7 @@ In our example, `/www` will look like this: ### Automation You will probably run `regina` once per day, after `nginx` has filled the daily access log. The easiest way to that is using a *cronjob*. 
Run `crontab -e` and enter: -`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.conf --collect --visualize` +`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize` This assumes, you installed `regina` system-wide. Now the `regina` command will be run every day, ten minutes after midnight. After each day, rotates the logs, so `access.log` becomes `access.log.1`. @@ -144,7 +170,7 @@ By default, `regina` only tells you which country a user is from. Append the two-letter country codes for countries you are interested in to the `get_cities_for_contries` option. After that, add the GeoIP-data into your database: ``` - regina --config regina.conf --update-geoip path-to-csv + regina --config regina.cfg --update-geoip path-to-csv ``` Depending on how many countries you specified, this might take a long time. You can delete the `csv` afterwards.
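
Reviewer note on `--update-geoip`: the option description in this patch says the csv must have the form `lower, upper, country-code, country-name, region, city`. Since the import can take a long time, it may help to sanity-check the file first. The following is only a minimal sketch under that assumption: `check_geoip_csv` is a hypothetical helper that is not part of regina, the two sample rows are made up for illustration, and regina's own parser may accept or reject rows differently.

```python
import csv
import io


def check_geoip_csv(lines, max_errors=5):
    """Return (row_number, message) pairs for rows that do not look like
    lower, upper, country-code, country-name, region, city.

    Hypothetical helper, not part of regina.
    """
    errors = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) != 6:
            errors.append((row_number, f"expected 6 fields, got {len(row)}"))
        elif not (row[0].strip().isdigit() and row[1].strip().isdigit()):
            errors.append((row_number, "lower/upper are not integers"))
        elif int(row[0]) > int(row[1]):
            errors.append((row_number, "lower is greater than upper"))
        if len(errors) >= max_errors:
            break  # stop early, a broken file usually fails on many rows
    return errors


# Made-up sample: the first row has all six fields, the second is truncated.
sample = io.StringIO(
    '"16777216","16777471","US","United States of America","California","Los Angeles"\n'
    '"16777472","16778239","CN","China"\n'
)
problems = check_geoip_csv(sample)
print(problems)
```

For a real file, passing an open file object should work the same way, for example `with open(path, newline="") as f: check_geoip_csv(f)`, because `csv.reader` accepts any iterable of lines.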