Updated readme
parent
281c766cbd
commit
f3ea297570
191
README.md
@ -1,32 +1,50 @@
|
|||||||
# regina - nginx analytics tool
|
# regina - nginx analytics tool
|
||||||
**R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously)
|
**R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously)
|
||||||
|
|
||||||
## Overview
|
# regina - website analytics
|
||||||
Regina is an analytics tool for nginx.
|
**R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics
|
||||||
It collects information from the nginx access.log and stores it in a sqlite3 database.
|
|
||||||
Regina supports several data visualization configurations and can generate an admin-analytics page from an html template file.
|
|
||||||
|
|
||||||
## Command line options
|
## Description
|
||||||
**-h**, **--help**
|
`regina` is a **python** <!-- ![python-logo](/resources/img/logos/python.svg "snek make analytics go brr") --> program that generates ***analytics*** for a static webpage served with **nginx**.
|
||||||
: Show the possible command line arguments
|
`regina` is easy to deploy and privacy respecting:
|
||||||
|
- it collects the data from the nginx logs: no javascript/changes to your website required
|
||||||
|
- data is stored on your device in a **sqlite** database, nothing goes to any cloud
|
||||||
|
It parses the log and **stores** the important data in an *sqlite* <!-- ![sqlite-logo](/resources/img/logos/sqlite.svg) --> database.
|
||||||
|
It can then create an analytics html page that has lots of useful **plots** and **numbers**.
|
||||||
|
|
||||||
**-c**, **--config** config-file
|
## Capabilities
|
||||||
: Retrieve settings from the config-file
|
### Statistics
|
||||||
|
`regina` can generate the following statistics:
|
||||||
|
|
||||||
**--access-log** log-file
|
- visitor count history
|
||||||
: Overrides the access_log from the configuration
|
- request count history
|
||||||
|
- referrer ranking *(from which site people visit)*
|
||||||
|
- route ranking *(accessed files)*
|
||||||
|
- browser ranking
|
||||||
|
- platform ranking *(operating systems)*
|
||||||
|
- city ranking *(where your site visitors are from)*
|
||||||
|
- country ranking
|
||||||
|
- mobile visitor percentage
|
||||||
|
- detect if a visitor is likely to be human or a bot
|
||||||
|
|
||||||
**--collect**
|
All of those plots and numbers can be generated for the **last x days** (you can set *x* yourself) and for **all time**.
|
||||||
: Collect information from the access_log and store it in the database
|
|
||||||
|
|
||||||
**--visualize**
|
### Visualization
|
||||||
: Visualize the data from the database
|
`regina` can use the data above to generate a static analytics page in a single html file.
|
||||||
|
The visitor and ranking histories are included as plots.
|
||||||
|
You can view an example page [here](https://quintern.xyz/en/software/regina-example.html)
|
||||||
|
If that is not enough for you, you can write your own script and use data exported by regina or access the database directly.
|
||||||
|
|
||||||
**--update-geoip** geoip-db
|
# Getting started
|
||||||
: Recreate the geoip part of the database from the geoip-db csv. The csv must have this form: lower, upper, country-code, country-name, region, city
|
|
||||||
|
|
||||||
# Installation with pip
|
## Dependencies
|
||||||
You can also install regina with python-pip:
|
- **nginx**: You need an nginx webserver that outputs the access log in the `combined` format, which is the default
|
||||||
|
- **sqlite >= 3.37**
|
||||||
|
- **python >= 3.10**
|
||||||
|
- **python-matplotlib**
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
You can install regina with python-pip:
|
||||||
```shell
|
```shell
|
||||||
git clone https://github.com/MatthiasQuintern/regina.git
|
git clone https://github.com/MatthiasQuintern/regina.git
|
||||||
cd regina
|
cd regina
|
||||||
@ -36,13 +54,138 @@ You can also install it system-wide using `sudo python3 -m pip install .`
|
|||||||
|
|
||||||
If you also want to install the man-page and the zsh completion script:
|
If you also want to install the man-page and the zsh completion script:
|
||||||
```shell
|
```shell
|
||||||
sudo cp regina.1.man /usr/share/man/man1/regina.1
|
sudo cp regina.1.man /usr/share/man/man1/regina.1
|
||||||
sudo gzip /usr/share/man/man1/regina.1
|
sudo gzip /usr/share/man/man1/regina.1
|
||||||
sudo cp _regina.compdef.zsh /usr/share/zsh/site-functions/_regina
|
sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina
|
||||||
sudo chmod +x /usr/share/zsh/site-functions/_regina
|
sudo chmod +x /usr/share/zsh/site-functions/_regina
|
||||||
```
|
```
|
||||||
|
|
||||||
## 1.0
|
## Configuration
|
||||||
|
The following instructions assume you have an nginx webserver configured for a website like this, with `/www` as root (`/`):
|
||||||
|
```
|
||||||
|
/www
|
||||||
|
|-- resources
|
||||||
|
| |-- image.jpg
|
||||||
|
|-- index.html
|
||||||
|
```
|
||||||
|
By default, nginx will generate logs in the `combined` format with the name `access.log` in `/var/log/nginx/` and rotate them daily.
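To make the `combined` format concrete, here is a minimal sketch of how one log line could be split into its fields. It is not `regina`'s own parser; the regular expression, the function name `parse_combined` and the sample line are assumptions made for illustration.

```python
import re
from typing import Optional

# nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[dict[str, str]]:
    """Return the fields of one access-log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line (documentation IP range, made-up user agent)
example = (
    '203.0.113.7 - - [17/May/2023:00:00:01 +0200] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"'
)
fields = parse_combined(example)
print(fields)
```

If your nginx uses a custom `log_format`, a pattern like this will not match, which is why the default format is listed as a dependency.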
|
||||||
|
|
||||||
|
Copy the default configuration and template from the git directory to a directory of your choice, in this case `~/.config/regina`
|
||||||
|
If you did clone the git repo, the files should be in `/usr/local/lib/python3.11/site-packages/regina/package-data/`.
|
||||||
|
```shell
|
||||||
|
mkdir ~/.config/regina
|
||||||
|
cp regina/package-data/default.cfg ~/.config/regina/regina.cfg
|
||||||
|
cp regina/package-data/template.html ~/.config/regina/template.html
|
||||||
|
```
|
||||||
|
Now edit the configuration to fit your needs.
|
||||||
|
For our example:
|
||||||
|
```
|
||||||
|
[regina]
|
||||||
|
server_name = my_server.com
|
||||||
|
access_log = /var/log/nginx/access.log.1
|
||||||
|
...
|
||||||
|
[html-generation]
|
||||||
|
html_out_path = /www/analytics/analytics.html
|
||||||
|
img_location = /img
|
||||||
|
|
||||||
|
[plot-generation]
|
||||||
|
img_out_dir = /www/analytics/img
|
||||||
|
```
|
||||||
|
Most defaults should be fine. The default configuration should also be documented well enough for you to know what to do.
|
||||||
|
It is strongly recommended to only use absolute paths.
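A quick way to check the absolute-path recommendation is a small script like the sketch below. It assumes the configuration can be read with Python's `configparser`, which is not confirmed here, and it only looks at the three path options from the example; the helper name `relative_paths` and the sample text are hypothetical.

```python
import configparser
from pathlib import PurePosixPath

# Shortened sample configuration; img_out_dir is relative on purpose
SAMPLE = """
[regina]
server_name = my_server.com
access_log = /var/log/nginx/access.log.1

[html-generation]
html_out_path = /www/analytics/analytics.html

[plot-generation]
img_out_dir = analytics/img
"""

# Option names taken from the example configuration above
PATH_OPTIONS = [
    ("regina", "access_log"),
    ("html-generation", "html_out_path"),
    ("plot-generation", "img_out_dir"),
]


def relative_paths(cfg_text: str) -> list[str]:
    """Return 'section.option = value' for every checked path that is not absolute."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    found = []
    for section, option in PATH_OPTIONS:
        value = parser.get(section, option, fallback="")
        if value and not PurePosixPath(value).is_absolute():
            found.append(f"{section}.{option} = {value}")
    return found


problems = relative_paths(SAMPLE)
print(problems)
```

`img_location` is left out of the check because in the example it is the path used inside the generated page, not a location on disk.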
|
||||||
|
|
||||||
|
Now you will collect the data from the nginx log specified as `access_log` in the configuration into the database specified at the `database` location (or `~/.local/share/regina/my-server.com.db` if left blank):
|
||||||
|
```
|
||||||
|
regina --config ~/.config/regina/regina.cfg --collect
|
||||||
|
```
|
||||||
|
|
||||||
|
To visualize the data, run:
|
||||||
|
```
|
||||||
|
regina --config ~/.config/regina/regina.cfg --visualize
|
||||||
|
```
|
||||||
|
This will generate plots and statistics and replace all variables in `template_html` and output the result to `html_out_path`.
|
||||||
|
If `html_out_path` is in your webroot, you should now be able to access the generated site.
|
||||||
|
In our example, `/www` will look like this:
|
||||||
|
```
|
||||||
|
/www
|
||||||
|
|-- analytics
|
||||||
|
| |-- analytics.html
|
||||||
|
| |-- img
|
||||||
|
| |-- ranking_referer_total.svg
|
||||||
|
| |-- ranking_referer_last_x_days.svg
|
||||||
|
| ...
|
||||||
|
|-- resources
|
||||||
|
| |-- image.jpg
|
||||||
|
|-- index.html
|
||||||
|
```
|
||||||
|
|
||||||
|
### Automation
|
||||||
|
You will probably run `regina` once per day, after `nginx` has filled the daily access log. The easiest way to do that is using a *cronjob*.
|
||||||
|
Run `crontab -e` and enter:
|
||||||
|
`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize`
|
||||||
|
This assumes you installed `regina` system-wide.
|
||||||
|
Now the `regina` command will be run every day, ten minutes after midnight.
|
||||||
|
After each day, the logs are rotated, so `access.log` becomes `access.log.1`.
|
||||||
|
Since `regina` is run after the log rotation, you will probably want to run it on `access.log.1`.
|
||||||
|
|
||||||
|
#### Logfile permissions
|
||||||
|
By default, `nginx` logs are `-rw-r----- root root`, so you cannot access them as a regular user.
|
||||||
|
You could either run regina as root, which I **strongly do not recommend**, or make a root cronjob that changes ownership of the log after midnight.
|
||||||
|
Run `sudo crontab -e` and enter:
|
||||||
|
`9 0 * * * chown your-username /var/log/nginx/access.log.1`
|
||||||
|
This will make you the owner of the log 9 minutes after midnight, just before `regina` needs read access.
|
||||||
|
|
||||||
|
|
||||||
|
## GeoIP
|
||||||
|
`regina` can show you which country or city a visitor is from, but you will need an *ip2location* database.
|
||||||
|
You can acquire such a database for free at [ip2location.com](https://lite.ip2location.com/) (and probably some other sites as well!).
|
||||||
|
After creating an account you can download several different databases in different formats.
|
||||||
|
For `regina`, download the `IP-COUNTRY-REGION-CITY` for IPv4 as *csv*.
|
||||||
|
|
||||||
|
To configure regina to use the GeoIP database, edit `get_visitor_location` and `get_cities_for_contries` in section `data-collection`.
|
||||||
|
By default, `regina` only tells you which country a user is from.
|
||||||
|
Append the two-letter country codes for countries you are interested in to the `get_cities_for_contries` option.
|
||||||
|
After that, add the GeoIP-data into your database:
|
||||||
|
```
|
||||||
|
regina --config regina.cfg --update-geoip path-to-csv
|
||||||
|
```
|
||||||
|
Depending on how many countries you specified, this might take a long time. You can delete the `csv` afterwards.
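For orientation, the sketch below shows how a csv in the form `lower, upper, country-code, country-name, region, city` can be used to look up an address. It assumes that `lower` and `upper` are integer-encoded IPv4 addresses, as in the IP2Location LITE csv files; the sample rows are invented and the function names are hypothetical. It is not how `regina` stores the data internally.

```python
import bisect
import csv
import io
import ipaddress

# Two invented rows: 1.0.0.0-1.0.0.255 and 1.0.1.0-1.0.3.255
SAMPLE_CSV = (
    '"16777216","16777471","AA","Country A","Region A","City A"\n'
    '"16777472","16778239","BB","Country B","Region B","City B"\n'
)


def load_ranges(csv_text: str) -> list[tuple]:
    """Parse the csv into a list of rows sorted by the lower bound."""
    rows = []
    for lower, upper, code, country, region, city in csv.reader(io.StringIO(csv_text)):
        rows.append((int(lower), int(upper), code, country, region, city))
    rows.sort()
    return rows


def lookup(rows: list[tuple], ip: str):
    """Return the row whose [lower, upper] range contains ip, or None."""
    number = int(ipaddress.IPv4Address(ip))
    lowers = [row[0] for row in rows]
    # Last row whose lower bound is <= number
    index = bisect.bisect_right(lowers, number) - 1
    if index >= 0 and rows[index][1] >= number:
        return rows[index]
    return None


ranges = load_ranges(SAMPLE_CSV)
hit = lookup(ranges, "1.0.2.7")
miss = lookup(ranges, "9.9.9.9")
print(hit, miss)
```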
|
||||||
|
|
||||||
|
|
||||||
|
# CUSTOMIZATION
|
||||||
|
## Generated html
|
||||||
|
The generated file does not need to be an HTML file. The template can be any text file.
|
||||||
|
`regina` will only replace certain words starting with a `%`.
|
||||||
|
You can see all supported variables and their values by running `--visualize` with `debug_level = 1`.
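The replacement idea can be pictured with the following sketch. It only illustrates "replace known words that start with `%` and leave everything else alone"; the exact matching rules of `regina` may differ, and the variable name `visitor_count` is made up for the example.

```python
import re


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace %name with values[name]; unknown names are left untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return re.sub(r"%(\w+)", substitute, template)


page = fill_template(
    "<p>%visitor_count visitors, 100% static, %unknown stays</p>",
    {"visitor_count": "42"},
)
print(page)
```

Because unknown words are kept as they are, the same approach works for any text template, not only HTML.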
|
||||||
|
|
||||||
|
## Data export
|
||||||
|
If you want to further process the data generated by regina, you can export the data by setting the `data_out_dir` in the `data-export` section.
|
||||||
|
The data can be exported as `csv` or `pkl`.
|
||||||
|
If you choose `pkl` as filetype, all rankings will be exported as python type `list[tuple[int, str]]`.
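A script that consumes such an export might look like the sketch below. The file name `ranking_demo.pkl` and the sample values are invented so that the example runs on its own; with real data you would point `load_ranking` at a file in your `data_out_dir`. Only unpickle files you created yourself, since loading a pickle can execute code.

```python
import pickle
import tempfile
from pathlib import Path


def load_ranking(path: Path) -> list[tuple[int, str]]:
    """Load a ranking exported as list[tuple[int, str]] (count, name)."""
    with path.open("rb") as file:
        return pickle.load(file)


def top(ranking: list[tuple[int, str]], n: int = 3) -> list[tuple[int, str]]:
    """Return the n entries with the highest count."""
    return sorted(ranking, reverse=True)[:n]


with tempfile.TemporaryDirectory() as directory:
    # Stand-in for a file written by regina; the name is hypothetical
    demo_file = Path(directory) / "ranking_demo.pkl"
    demo_file.write_bytes(
        pickle.dumps([(3, "/index.html"), (10, "/"), (1, "/about.html")])
    )
    ranking = load_ranking(demo_file)
    best = top(ranking, 2)

print(best)
```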
|
||||||
|
|
||||||
|
## Database
|
||||||
|
You can of course work directly with the database, as long as it is not altered.
|
||||||
|
Editing, adding or deleting entries might make the database incompatible with regina, so only do that if you know what you are doing.
|
||||||
|
Just querying entries will be fine though.
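One way to make sure a script can only query is to open the database read-only. The sketch below uses SQLite's `mode=ro` URI parameter through Python's `sqlite3` module and lists the tables it finds. It creates a throwaway database with a made-up table name, because the actual table layout of `regina` is not described here.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite database so that writes are rejected."""
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def table_names(connection: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


with tempfile.TemporaryDirectory() as directory:
    # Throwaway database standing in for the real one
    demo_db = Path(directory) / "demo.db"
    writer = sqlite3.connect(demo_db)
    writer.execute("CREATE TABLE demo_table (id INTEGER)")
    writer.commit()
    writer.close()

    reader = open_read_only(demo_db)
    tables = table_names(reader)
    try:
        reader.execute("INSERT INTO demo_table VALUES (1)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True
    reader.close()

print(tables, write_blocked)
```

For the real database, pass the path from the `database` option (or the default location under `~/.local/share/regina/`) to `open_read_only`.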
|
||||||
|
|
||||||
|
# TROUBLESHOOTING
|
||||||
|
## General
|
||||||
|
If you are having problems, try setting the `debug_level` in section `debug` of the configuration file to a non-zero value.
|
||||||
|
|
||||||
|
## sqlite3.OperationalError: near "STRICT": syntax error
|
||||||
|
Your sqlite3 version is probably too old. Check with `sqlite3 --version`. `regina` requires 3.37 or higher.
|
||||||
|
Hotfix: Remove all `STRICT`s from `<python-dir>/site-packages/regina/sql/create_db.sql`.
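Note that the SQLite library linked into Python can differ from the `sqlite3` command line tool, and it is the linked one that matters for a Python program. A small check, offered as a sketch with a hypothetical helper name `supports_strict`:

```python
import sqlite3

# STRICT tables were introduced in SQLite 3.37.0
REQUIRED = (3, 37, 0)


def supports_strict(version_info=sqlite3.sqlite_version_info) -> bool:
    """True if the given SQLite version understands STRICT tables."""
    return tuple(version_info) >= REQUIRED


status = "ok" if supports_strict() else "too old for STRICT tables"
print(f"SQLite linked into Python: {sqlite3.sqlite_version} ({status})")
```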
|
||||||
|
|
||||||
|
# Changelog
|
||||||
|
## 1.1 (2023-05-17)
|
||||||
|
- Improved database format:
|
||||||
|
- put referrer, browser and platform in their own tables to reduce the size of the database
|
||||||
|
- route groups now part of visualization, not data collection
|
||||||
|
- Data visualization now uses more sql for improved performance
|
||||||
|
- Refactored codebase
|
||||||
|
- Bug fixes
|
||||||
|
- Changed setup.py to pyproject.toml
|
||||||
|
## 1.0 (2022-12-14)
|
||||||
- Initial release
|
- Initial release
|
||||||
|
|
||||||
# Copyright
|
# Copyright
|
||||||
|
295
regina.1.man
@ -1,4 +1,4 @@
|
|||||||
.\" Automatically generated by Pandoc 2.19.2
|
.\" Automatically generated by Pandoc 3.0.1
|
||||||
.\"
|
.\"
|
||||||
.\" Define V font for inline verbatim, using C font in formats
|
.\" Define V font for inline verbatim, using C font in formats
|
||||||
.\" that render this, and otherwise B font.
|
.\" that render this, and otherwise B font.
|
||||||
@ -14,23 +14,28 @@
|
|||||||
. ftr VB CB
|
. ftr VB CB
|
||||||
. ftr VBI CBI
|
. ftr VBI CBI
|
||||||
.\}
|
.\}
|
||||||
.TH "NICOLE" "1" "April 2022" "nicole 2.0" ""
|
.TH "REGINA" "1" "May 2023" "regina 1.1" ""
|
||||||
.hy
|
.hy
|
||||||
.SH NAME
|
.SH NAME
|
||||||
.PP
|
.PP
|
||||||
\f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating
|
regina - \f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating
|
||||||
\f[B]I\f[R]n-depth \f[B]N\f[R]ginx \f[B]A\f[R]nalytics (obviously)
|
\f[B]I\f[R]n-depth \f[B]N\f[R]ginx \f[B]A\f[R]nalytics (obviously)
|
||||||
Regina is an analytics tool for nginx.
|
.SS Description
|
||||||
|
.PP
|
||||||
|
\f[V]regina\f[R] is a \f[B]python\f[R] program that generates
|
||||||
|
\f[B]\f[BI]analytics\f[B]\f[R] for a static webpage served with
|
||||||
|
\f[B]nginx\f[R].
|
||||||
|
\f[V]regina\f[R] is easy to deploy and privacy respecting: - it collects
|
||||||
|
the data from the nginx logs: no javascript/changes to your website
|
||||||
|
required - data is stored on your device in a \f[B]sqlite\f[R] database,
|
||||||
|
nothing goes to any cloud It parses the log and \f[B]stores\f[R] the
|
||||||
|
important data in an \f[I]sqlite\f[R] database.
|
||||||
|
It can then create an analytics html page that has lots of useful
|
||||||
|
\f[B]plots\f[R] and \f[B]numbers\f[R].
|
||||||
.SH SYNOPSIS
|
.SH SYNOPSIS
|
||||||
.PP
|
.PP
|
||||||
\f[B]regina\f[R] \[em]-config CONFIG_FILE [OPTION\&...]
|
\f[B]regina\f[R] \[em]-config CONFIG_FILE [OPTION\&...]
|
||||||
.SH DESCRIPTION
|
.SH COMMAND LINE OPTIONS
|
||||||
.PP
|
|
||||||
It collects information from the nginx access.log and stores it in a
|
|
||||||
sqlite3 database.
|
|
||||||
Regina supports several data visualization configurations and can
|
|
||||||
generate an admin-analytics page from an html template file.
|
|
||||||
.SS Command line options
|
|
||||||
.TP
|
.TP
|
||||||
\f[B]-h\f[R], \f[B]\[em]-help\f[R]
|
\f[B]-h\f[R], \f[B]\[em]-help\f[R]
|
||||||
Show the possible command line arguments
|
Show the possible command line arguments
|
||||||
@ -51,24 +56,20 @@ Visualize the data from the database
|
|||||||
Recreate the geoip part of the database from the geoip-db csv.
|
Recreate the geoip part of the database from the geoip-db csv.
|
||||||
The csv must have this form: lower, upper, country-code, country-name,
|
The csv must have this form: lower, upper, country-code, country-name,
|
||||||
region, city
|
region, city
|
||||||
.SH INSTALLATION AND UPDATING
|
.SH GETTING STARTED
|
||||||
|
.SS Dependencies
|
||||||
|
.IP \[bu] 2
|
||||||
|
\f[B]nginx\f[R]: You need an nginx webserver that outputs the access log
|
||||||
|
in the \f[V]combined\f[R] format, which is the default
|
||||||
|
.IP \[bu] 2
|
||||||
|
\f[B]sqlite >= 3.37\f[R]
|
||||||
|
.IP \[bu] 2
|
||||||
|
\f[B]python >= 3.10\f[R]
|
||||||
|
.IP \[bu] 2
|
||||||
|
\f[B]python-matplotlib\f[R]
|
||||||
|
.SS Installation
|
||||||
.PP
|
.PP
|
||||||
To update regina, simply follow the installation instructions.
|
You can install regina with python-pip:
|
||||||
.SS pacman (Arch Linux)
|
|
||||||
.PP
|
|
||||||
Installing regina using the Arch Build System also installs the man-page
|
|
||||||
and a zsh completion script, if you have zsh installed.
|
|
||||||
.IP
|
|
||||||
.nf
|
|
||||||
\f[C]
|
|
||||||
git clone https://github.com/MatthiasQuintern/regina.git
|
|
||||||
cd regina
|
|
||||||
makepkg -si
|
|
||||||
\f[R]
|
|
||||||
.fi
|
|
||||||
.SS pip
|
|
||||||
.PP
|
|
||||||
You can also install regina with python-pip:
|
|
||||||
.IP
|
.IP
|
||||||
.nf
|
.nf
|
||||||
\f[C]
|
\f[C]
|
||||||
@ -85,19 +86,245 @@ If you also want to install the man-page and the zsh completion script:
|
|||||||
.IP
|
.IP
|
||||||
.nf
|
.nf
|
||||||
\f[C]
|
\f[C]
|
||||||
sudo cp regina.1.man /usr/share/man/man1/regina.1
|
sudo cp regina.1.man /usr/share/man/man1/regina.1
|
||||||
sudo gzip /usr/share/man/man1/regina.1
|
sudo gzip /usr/share/man/man1/regina.1
|
||||||
sudo cp _regina.compdef.zsh /usr/share/zsh/site-functions/_regina
|
sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina
|
||||||
sudo chmod +x /usr/share/zsh/site-functions/_regina
|
sudo chmod +x /usr/share/zsh/site-functions/_regina
|
||||||
\f[R]
|
\f[R]
|
||||||
.fi
|
.fi
|
||||||
|
.SS Configuration
|
||||||
|
.PP
|
||||||
|
The following instructions assume you have an nginx webserver configured
|
||||||
|
for a website like this, with \f[V]/www\f[R] as root (\f[V]/\f[R]):
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
/www
|
||||||
|
|---- resources
|
||||||
|
| |---- image.jpg
|
||||||
|
|---- index.html
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
By default, nginx will generate logs in the \f[V]combined\f[R] format
|
||||||
|
with the name \f[V]access.log\f[R] in \f[V]/var/log/nginx/\f[R] and
|
||||||
|
rotate them daily.
|
||||||
|
.PP
|
||||||
|
Copy the default configuration and template from the git directory to a
|
||||||
|
directory of your choice, in this case \f[V]\[ti]/.config/regina\f[R] If
|
||||||
|
you did clone the git repo, the files should be in
|
||||||
|
\f[V]/usr/local/lib/python3.11/site-packages/regina/package-data/\f[R].
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
mkdir \[ti]/.config/regina
|
||||||
|
cp regina/package-data/default.cfg \[ti]/.config/regina/regina.cfg
|
||||||
|
cp regina/package-data/template.html \[ti]/.config/regina/template.html
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
Now edit the configuration to fit your needs.
|
||||||
|
For our example:
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
[regina]
|
||||||
|
server_name = my_server.com
|
||||||
|
access_log = /var/log/nginx/access.log.1
|
||||||
|
...
|
||||||
|
[html-generation]
|
||||||
|
html_out_path = /www/analytics/analytics.html
|
||||||
|
img_location = /img
|
||||||
|
|
||||||
|
[plot-generation]
|
||||||
|
img_out_dir = /www/analytics/img
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
Most defaults should be fine.
|
||||||
|
The default configuration should also be documented well enough for you
|
||||||
|
to know what to do.
|
||||||
|
It is strongly recommended to only use absolute paths.
|
||||||
|
.PP
|
||||||
|
Now you will collect the data from the nginx log specified as
|
||||||
|
\f[V]access_log\f[R] in the configuration into the database specified at
|
||||||
|
the \f[V]database\f[R] location (or
|
||||||
|
\f[V]\[ti]/.local/share/regina/my-server.com.db\f[R] if left blank):
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
regina --config \[ti]/.config/regina/regina.cfg --collect
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
To visualize the data, run:
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
regina --config \[ti]/.config/regina/regina.cfg --visualize
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
This will generate plots and statistics and replace all variables in
|
||||||
|
\f[V]template_html\f[R] and output the result to
|
||||||
|
\f[V]html_out_path\f[R].
|
||||||
|
If \f[V]html_out_path\f[R] is in your webroot, you should now be able to
|
||||||
|
access the generated site.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
In our example, \f[V]/www\f[R] will look like this:
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
/www
|
||||||
|
|---- analytics
|
||||||
|
| |---- analytics.html
|
||||||
|
| |---- img
|
||||||
|
| |---- ranking_referer_total.svg
|
||||||
|
| |---- ranking_referer_last_x_days.svg
|
||||||
|
| ...
|
||||||
|
|---- resources
|
||||||
|
| |---- image.jpg
|
||||||
|
|---- index.html
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.SS Automation
|
||||||
|
.PP
|
||||||
|
You will probably run \f[V]regina\f[R] once per day, after
|
||||||
|
\f[V]nginx\f[R] has filled the daily access log.
|
||||||
|
The easiest way to do that is using a \f[I]cronjob\f[R].
|
||||||
|
Run \f[V]crontab -e\f[R] and enter:
|
||||||
|
\f[V]10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize\f[R]
|
||||||
|
This assumes you installed \f[V]regina\f[R] system-wide.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
Now the \f[V]regina\f[R] command will be run every day, ten minutes
|
||||||
|
after midnight.
|
||||||
|
After each day, the logs are rotated, so \f[V]access.log\f[R] becomes
|
||||||
|
\f[V]access.log.1\f[R].
|
||||||
|
Since \f[V]regina\f[R] is run after the log rotation, you will probably
|
||||||
|
want to run it on \f[V]access.log.1\f[R].
|
||||||
|
.SS Logfile permissions
|
||||||
|
.PP
|
||||||
|
By default, \f[V]nginx\f[R] logs are \f[V]-rw-r----- root root\f[R] so
|
||||||
|
you cannot access them as a regular user.
|
||||||
|
You could either run regina as root, which I \f[B]strongly do not
|
||||||
|
recommend\f[R] or make a root-cronjob that changes ownership of the log
|
||||||
|
after midnight.
|
||||||
|
Run \f[V]sudo crontab -e\f[R] and enter:
|
||||||
|
\f[V]9 0 * * * chown your-username /var/log/nginx/access.log.1\f[R]
|
||||||
|
This will make you the owner of the log 9 minutes after midnight, just
|
||||||
|
before \f[V]regina\f[R] needs read access.
|
||||||
|
.SS GeoIP
|
||||||
|
.PP
|
||||||
|
\f[V]regina\f[R] can show you from which country or city a visitor is
|
||||||
|
from, but you will need an \f[I]ip2location\f[R] database.
|
||||||
|
You can acquire such a database for free at
|
||||||
|
ip2location.com (https://lite.ip2location.com/) (and probably some other
|
||||||
|
sites as well!).
|
||||||
|
After creating an account you can download several different
|
||||||
|
databases in different formats.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
For \f[V]regina\f[R], download the \f[V]IP-COUNTRY-REGION-CITY\f[R] for
|
||||||
|
IPv4 as \f[I]csv\f[R].
|
||||||
|
.PP
|
||||||
|
To configure regina to use the GeoIP database, edit
|
||||||
|
\f[V]get_visitor_location\f[R] and \f[V]get_cities_for_contries\f[R] in
|
||||||
|
section \f[V]data-collection\f[R].
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
By default, \f[V]regina\f[R] only tells you which country a user is
|
||||||
|
from.
|
||||||
|
Append the two-letter country codes for countries you are interested in
|
||||||
|
to the \f[V]get_cities_for_contries\f[R] option.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
After that, add the GeoIP-data into your database:
|
||||||
|
.IP
|
||||||
|
.nf
|
||||||
|
\f[C]
|
||||||
|
regina --config regina.cfg --update-geoip path-to-csv
|
||||||
|
\f[R]
|
||||||
|
.fi
|
||||||
|
.PP
|
||||||
|
Depending on how many countries you specified, this might take a long
|
||||||
|
time.
|
||||||
|
You can delete the \f[V]csv\f[R] afterwards.
|
||||||
|
.SH CUSTOMIZATION
|
||||||
|
.SS Generated html
|
||||||
|
.PP
|
||||||
|
The generated file does not need to be an HTML file.
|
||||||
|
The template can be any text file.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
\f[V]regina\f[R] will only replace certain words starting with a
|
||||||
|
\f[V]%\f[R].
|
||||||
|
You can see all supported variables and their values by running
|
||||||
|
\f[V]--visualize\f[R] with \f[V]debug_level = 1\f[R].
|
||||||
|
.SS Data export
|
||||||
|
.PP
|
||||||
|
If you want to further process the data generated by regina, you can
|
||||||
|
export the data by setting the \f[V]data_out_dir\f[R] in the
|
||||||
|
\f[V]data-export\f[R] section.
|
||||||
|
The data can be exported as \f[V]csv\f[R] or \f[V]pkl\f[R].
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
If you choose \f[V]pkl\f[R] as filetype, all rankings will be exported
|
||||||
|
as python type \f[V]list[tuple[int, str]]\f[R].
|
||||||
|
.SS Database
|
||||||
|
.PP
|
||||||
|
You can of course work directly with the database, as long as it is not
|
||||||
|
altered.
|
||||||
|
Editing, adding or deleting entries might make the database incompatible
|
||||||
|
with regina, so only do that if you know what you are doing.
|
||||||
|
Just querying entries will be fine though.
|
||||||
|
.SH TROUBLESHOOTING
|
||||||
|
.SS General
|
||||||
|
.PP
|
||||||
|
If you are having problems, try setting the \f[V]debug_level\f[R] in
|
||||||
|
section \f[V]debug\f[R] of the configuration file to a non-zero value.
|
||||||
|
.SS sqlite3.OperationalError: near \[lq]STRICT\[rq]: syntax error
|
||||||
|
.PP
|
||||||
|
Your sqlite3 version is probably too old.
|
||||||
|
Check with \f[V]sqlite3 --version\f[R].
|
||||||
|
\f[V]regina\f[R] requires 3.37 or higher.
|
||||||
|
.PD 0
|
||||||
|
.P
|
||||||
|
.PD
|
||||||
|
Hotfix: Remove all \f[V]STRICT\f[R]s from
|
||||||
|
\f[V]<python-dir>/site-packages/regina/sql/create_db.sql\f[R].
|
||||||
.SH CHANGELOG
|
.SH CHANGELOG
|
||||||
.SS 1.0
|
.SS 1.1
|
||||||
|
.IP \[bu] 2
|
||||||
|
Improved database format:
|
||||||
|
.RS 2
|
||||||
|
.IP \[bu] 2
|
||||||
|
put referrer, browser and platform in their own tables to reduce the size of the
|
||||||
|
database
|
||||||
|
.IP \[bu] 2
|
||||||
|
route groups now part of visualization, not data collection
|
||||||
|
.RE
|
||||||
|
.IP \[bu] 2
|
||||||
|
Data visualization now uses more sql for improved performance
|
||||||
|
.IP \[bu] 2
|
||||||
|
Refactored codebase
|
||||||
|
.IP \[bu] 2
|
||||||
|
Bug fixes
|
||||||
|
.IP \[bu] 2
|
||||||
|
Changed setup.py to pyproject.toml
.SS 1.0
|
||||||
.IP \[bu] 2
|
.IP \[bu] 2
|
||||||
Initial release
|
Initial release
|
||||||
.SH COPYRIGHT
|
.SH COPYRIGHT
|
||||||
.PP
|
.PP
|
||||||
Copyright \[co] 2022 Matthias Quintern.
|
Copyright © 2022 Matthias Quintern.
|
||||||
License GPLv3+: GNU GPL version 3 <https://gnu.org/licenses/gpl.html>.
|
License GPLv3+: GNU GPL version 3 <https://gnu.org/licenses/gpl.html>.
|
||||||
.PD 0
|
.PD 0
|
||||||
.P
|
.P
|
||||||
|
48
regina.1.md
@ -1,19 +1,45 @@
|
|||||||
% REGINA(1) regina 1.1
|
% REGINA(1) regina 1.1
|
||||||
% Matthias Quintern
|
% Matthias Quintern
|
||||||
% April 2022
|
% May 2023
|
||||||
|
|
||||||
# NAME
|
# NAME
|
||||||
regina - **R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously)
|
regina - **R**uling **E**mpress **G**enerating **I**n-depth **N**ginx **A**nalytics (obviously)
|
||||||
|
|
||||||
|
## Description
|
||||||
|
`regina` is a **python** <!-- ![python-logo](/resources/img/logos/python.svg "snek make analytics go brr") --> program that generates ***analytics*** for a static webpage served with **nginx**.
|
||||||
|
`regina` is easy to deploy and privacy respecting:
|
||||||
|
- it collects the data from the nginx logs: no javascript/changes to your website required
|
||||||
|
- data is stored on your device in a **sqlite** database, nothing goes to any cloud
|
||||||
|
It parses the log and **stores** the important data in an *sqlite* <!-- ![sqlite-logo](/resources/img/logos/sqlite.svg) --> database.
|
||||||
|
It can then create an analytics html page that has lots of useful **plots** and **numbers**.
|
||||||
|
|
||||||
|
<!-- ## Capabilities -->
|
||||||
|
<!-- ### Statistics -->
|
||||||
|
<!-- `regina` can generate the following statistics: -->
|
||||||
|
|
||||||
|
<!-- - visitor count history -->
|
||||||
|
<!-- - request count history -->
|
||||||
|
<!-- - referrer ranking *(from which site people visit)* -->
|
||||||
|
<!-- - route ranking *(accessed files)* -->
|
||||||
|
<!-- - browser ranking -->
|
||||||
|
<!-- - platform ranking *(operating systems)* -->
|
||||||
|
<!-- - city ranking *(where your site visitors are from)* -->
|
||||||
|
<!-- - country ranking -->
|
||||||
|
<!-- - mobile visitor percentage -->
|
||||||
|
<!-- - detect if a visitor is likely to be human or a bot -->
|
||||||
|
|
||||||
|
<!-- All of those plots and numbers can be generated for the **last x days** (you can set *x* yourself) and for **all times**. -->
|
||||||
|
|
||||||
|
<!-- ### Visualization -->
|
||||||
|
<!-- `regina` can use the data above to generate a static analytics page in a single html file. -->
|
||||||
|
<!-- The visitor and ranking histories are included as plots. -->
|
||||||
|
<!-- You can view an example page [here](https://quintern.xyz/en/software/regina-example.html) -->
|
||||||
|
<!-- If that is not enough for you, you can write your own script and use data exported by regina or access the database directly. -->
|
||||||
|
|
||||||
# SYNOPSIS
|
# SYNOPSIS
|
||||||
| **regina** --config CONFIG_FILE [OPTION...]
|
| **regina** --config CONFIG_FILE [OPTION...]
|
||||||
|
|
||||||
# DESCRIPTION
|
# COMMAND LINE OPTIONS
|
||||||
Regina is an analytics tool for nginx.
|
|
||||||
It collects information from the nginx access.log and stores it in a sqlite3 database.
|
|
||||||
Regina supports several data visualization configurations and can generate an admin-analytics page from an html template file.
|
|
||||||
|
|
||||||
## Command line options
|
|
||||||
**-h**, **--help**
|
**-h**, **--help**
|
||||||
: Show the possible command line arguments
|
: Show the possible command line arguments
|
||||||
|
|
||||||
@ -37,8 +63,8 @@ Regina supports several data visualization configurations and can generate an ad
|
|||||||
## Dependencies
|
## Dependencies
|
||||||
- **nginx**: You need an nginx webserver that outputs the access log in the `combined` format, which is the default
|
- **nginx**: You need an nginx webserver that outputs the access log in the `combined` format, which is the default
|
||||||
- **sqlite >= 3.37**
|
- **sqlite >= 3.37**
|
||||||
- **Python >= 3.10**
|
- **python >= 3.10**
|
||||||
- **Python/matplotlib**
|
- **python-matplotlib**
|
||||||
|
|
||||||
## Installation
|
## Installation
|
||||||
You can install regina with python-pip:
|
You can install regina with python-pip:
|
||||||
@ -119,7 +145,7 @@ In our example, `/www` will look like this:
|
|||||||
### Automation
|
### Automation
|
||||||
You will probably run `regina` once per day, after `nginx` has filled the daily access log. The easiest way to do that is using a *cronjob*.
|
You will probably run `regina` once per day, after `nginx` has filled the daily access log. The easiest way to do that is using a *cronjob*.
|
||||||
Run `crontab -e` and enter:
|
Run `crontab -e` and enter:
|
||||||
`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.conf --collect --visualize`
|
`10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize`
|
||||||
This assumes you installed `regina` system-wide.
|
This assumes you installed `regina` system-wide.
|
||||||
Now the `regina` command will be run every day, ten minutes after midnight.
|
Now the `regina` command will be run every day, ten minutes after midnight.
|
||||||
After each day, the logs are rotated, so `access.log` becomes `access.log.1`.
|
After each day, the logs are rotated, so `access.log` becomes `access.log.1`.
|
||||||
@ -144,7 +170,7 @@ By default, `regina` only tells you which country a user is from.
|
|||||||
Append the two-letter country codes for countries you are interested in to the `get_cities_for_contries` option.
|
Append the two-letter country codes for countries you are interested in to the `get_cities_for_contries` option.
|
||||||
After that, add the GeoIP-data into your database:
|
After that, add the GeoIP-data into your database:
|
||||||
```
|
```
|
||||||
regina --config regina.conf --update-geoip path-to-csv
|
regina --config regina.cfg --update-geoip path-to-csv
|
||||||
```
|
```
|
||||||
Depending on how many countries you specified, this might take a long time. You can delete the `csv` afterwards.
|
Depending on how many countries you specified, this might take a long time. You can delete the `csv` afterwards.
|
||||||
|
|
||||||
|