.\" Automatically generated by Pandoc 3.0.1
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "REGINA" "1" "May 2023" "regina 1.1" ""
.hy
.SH NAME
.PP
regina - \f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating
\f[B]I\f[R]n-depth \f[B]N\f[R]ginx \f[B]A\f[R]nalytics (obviously)
.SS Description
.PP
\f[V]regina\f[R] is a \f[B]python\f[R] program that generates
\f[BI]analytics\f[R] for a static webpage served with \f[B]nginx\f[R].
\f[V]regina\f[R] is easy to deploy and privacy-respecting:
.IP \[bu] 2
it collects the data from the nginx logs: no javascript or other changes
to your website required
.IP \[bu] 2
data is stored on your device in a \f[B]sqlite\f[R] database; nothing
goes to any cloud
.PP
It parses the log and \f[B]stores\f[R] the important data in an
\f[I]sqlite\f[R] database.
It can then create an analytics html page that has lots of useful
\f[B]plots\f[R] and \f[B]numbers\f[R].
.SH SYNOPSIS
.PP
\f[B]regina\f[R] \-\-config CONFIG_FILE [OPTION\&...]
.SH COMMAND LINE OPTIONS
.TP
\f[B]\-h\f[R], \f[B]\-\-help\f[R]
Show the possible command line arguments
.TP
\f[B]\-c\f[R], \f[B]\-\-config\f[R] config-file
Retrieve settings from the config-file
.TP
\f[B]\-\-access-log\f[R] log-file
Override the access_log from the configuration
.TP
\f[B]\-\-collect\f[R]
Collect information from the access_log and store it in the database
.TP
\f[B]\-\-visualize\f[R]
Visualize the data from the database
.TP
\f[B]\-\-update-geoip\f[R] geoip-db
Recreate the geoip part of the database from the geoip-db csv.
The csv must have this form: lower, upper, country-code, country-name,
region, city
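.PP
The expected csv layout can be illustrated with a short Python sketch.
It assumes, as in the ip2location LITE csv files, that \f[V]lower\f[R]
and \f[V]upper\f[R] are IPv4 addresses written as integers.
The sample row and the \f[V]parse_geoip_rows\f[R] helper are made up for
illustration and are not part of \f[V]regina\f[R].
.IP
.nf
\f[C]
```python
import csv
import io
import ipaddress

# Made-up sample row in the documented column order:
# lower, upper, country-code, country-name, region, city
SAMPLE = '"16777216","16777471","AU","Australia","Queensland","Brisbane"'

def parse_geoip_rows(lines):
    """Yield one (lower, upper, code, country, region, city) tuple per csv row."""
    for row in csv.reader(lines):
        lower, upper, code, country, region, city = row
        # lower/upper are assumed to be IPv4 addresses stored as integers
        yield int(lower), int(upper), code, country, region, city

rows = list(parse_geoip_rows(io.StringIO(SAMPLE)))
lower, upper = rows[0][0], rows[0][1]
print(ipaddress.IPv4Address(lower), "-", ipaddress.IPv4Address(upper))
```
\f[R]
.fi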
.SH GETTING STARTED
.SS Dependencies
.IP \[bu] 2
\f[B]nginx\f[R]: You need an nginx webserver that outputs the access log
in the \f[V]combined\f[R] format, which is the default
.IP \[bu] 2
\f[B]sqlite >= 3.37\f[R]
.IP \[bu] 2
\f[B]python >= 3.10\f[R]
.IP \[bu] 2
\f[B]python-matplotlib\f[R]
.SS Installation
.PP
You can install regina with python-pip:
.IP
.nf
\f[C]
git clone https://github.com/MatthiasQuintern/regina.git
cd regina
python3 -m pip install .
\f[R]
.fi
.PP
You can also install it system-wide using
\f[V]sudo python3 -m pip install .\f[R]
.PP
If you also want to install the man page and the zsh completion script:
.IP
.nf
\f[C]
sudo cp regina.1.man /usr/share/man/man1/regina.1
sudo gzip /usr/share/man/man1/regina.1
sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina
sudo chmod +x /usr/local/share/zsh/site-functions/_regina
\f[R]
.fi
.SS Configuration
.PP
The following instructions assume you have an nginx webserver configured
for a website like this, with \f[V]/www\f[R] as root (\f[V]/\f[R]):
.IP
.nf
\f[C]
/www
|---- resources
|   |---- image.jpg
|---- index.html
\f[R]
.fi
.PP
By default, nginx will write its log in the \f[V]combined\f[R] format
to \f[V]access.log\f[R] in \f[V]/var/log/nginx/\f[R], and the log is
usually rotated daily (typically by logrotate).
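.PP
To make the \f[V]combined\f[R] format concrete, here is a minimal Python
sketch that splits one such line into its fields.
It is not \f[V]regina\f[R]'s parser, and the sample line is made up.
Splitting on double quotes relies on nginx escaping double quotes inside
logged values, which it does by default.
.IP
.nf
\f[C]
```python
from datetime import datetime

# nginx defines the combined format as:
# '$remote_addr - $remote_user [$time_local] "$request" $status '
# '$body_bytes_sent "$http_referer" "$http_user_agent"'
LINE = ('203.0.113.7 - - [10/May/2023:13:55:36 +0200] "GET /index.html HTTP/1.1" '
        '200 2326 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"')

def parse_combined(line):
    """Split one combined-format line into a dict (sketch, not regina's parser)."""
    # The three quoted fields are request, referer and user agent
    parts = line.split('"')
    if len(parts) != 7:
        raise ValueError("not a combined-format line")
    head = parts[0].split()
    status, body_bytes = parts[2].split()
    return {
        "remote_addr": head[0],
        "remote_user": head[2],
        "time": datetime.strptime(" ".join(head[3:5]).strip("[]"),
                                  "%d/%b/%Y:%H:%M:%S %z"),
        "request": parts[1],
        "status": int(status),
        "body_bytes_sent": int(body_bytes),
        "referer": parts[3],
        "user_agent": parts[5],
    }

entry = parse_combined(LINE)
print(entry["request"], entry["status"])
```
\f[R]
.fi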
.PP
Copy the default configuration and template from the git directory to a
directory of your choice, in this case \f[V]\[ti]/.config/regina\f[R].
If you did not clone the git repo, the files should be in
\f[V]/usr/local/lib/python3.11/site-packages/regina/package-data/\f[R]
(adjust \f[V]python3.11\f[R] to your Python version).
.IP
.nf
\f[C]
mkdir \-p \[ti]/.config/regina
cp regina/package-data/default.cfg \[ti]/.config/regina/regina.cfg
cp regina/package-data/template.html \[ti]/.config/regina/template.html
\f[R]
.fi
.PP
Now edit the configuration to fit your needs.
For our example:
.IP
.nf
\f[C]
[regina]
server_name = my-server.com
access_log = /var/log/nginx/access.log.1
\&...
[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img

[plot-generation]
img_out_dir = /www/analytics/img
\f[R]
.fi
.PP
Most defaults should be fine.
The default configuration should also be documented well enough for you
to know what to do.
It is strongly recommended to only use absolute paths.
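.PP
Relative paths are easy to miss, especially since \f[V]regina\f[R] may
later run from a cronjob with a different working directory.
The following Python sketch reads a configuration like the example above
with \f[V]configparser\f[R] and lists the path options that are not
absolute.
It assumes the file is plain INI syntax; the options it checks are a
hypothetical selection, and \f[V]img_location\f[R] is left out because
it appears to be a location on the website rather than on disk.
.IP
.nf
\f[C]
```python
import configparser
import os

# Trimmed copy of the example configuration (the "..." line is omitted)
EXAMPLE_CFG = """
[regina]
server_name = my-server.com
access_log = /var/log/nginx/access.log.1

[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img

[plot-generation]
img_out_dir = /www/analytics/img
"""

# Hypothetical selection of options that hold filesystem paths
PATH_OPTIONS = [
    ("regina", "access_log"),
    ("html-generation", "html_out_path"),
    ("plot-generation", "img_out_dir"),
]

def relative_paths(cfg_text):
    """Return (section, option, value) for every set path option that is not absolute."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    bad = []
    for section, option in PATH_OPTIONS:
        value = cfg.get(section, option, fallback="")
        if value and not os.path.isabs(value):
            bad.append((section, option, value))
    return bad

print(relative_paths(EXAMPLE_CFG))
```
\f[R]
.fi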
.PP
Now collect the data from the nginx log specified as
\f[V]access_log\f[R] in the configuration into the database specified at
the \f[V]database\f[R] location (or
\f[V]\[ti]/.local/share/regina/my-server.com.db\f[R] if left blank):
.IP
.nf
\f[C]
regina \-\-config \[ti]/.config/regina/regina.cfg \-\-collect
\f[R]
.fi
.PP
To visualize the data, run:
.IP
.nf
\f[C]
regina \-\-config \[ti]/.config/regina/regina.cfg \-\-visualize
\f[R]
.fi
.PP
This will generate plots and statistics, replace all variables in
\f[V]template_html\f[R] and write the result to
\f[V]html_out_path\f[R].
If \f[V]html_out_path\f[R] is in your webroot, you should now be able to
access the generated site.
.PD 0
.P
.PD
In our example, \f[V]/www\f[R] will look like this:
.IP
.nf
\f[C]
/www
|---- analytics
|   |---- analytics.html
|   |---- img
|   |   |---- ranking_referer_total.svg
|   |   |---- ranking_referer_last_x_days.svg
|   |   ...
|---- resources
|   |---- image.jpg
|---- index.html
\f[R]
.fi
.SS Automation
.PP
You will probably run \f[V]regina\f[R] once per day, after
\f[V]nginx\f[R] has filled the daily access log.
The easiest way to do that is using a \f[I]cronjob\f[R].
Run \f[V]crontab -e\f[R] and enter:
.IP
.nf
\f[C]
10 0 * * * /usr/bin/regina \-\-config /home/myuser/.config/regina/regina.cfg \-\-collect \-\-visualize
\f[R]
.fi
.PP
This assumes you installed \f[V]regina\f[R] system-wide.
.PD 0
.P
.PD
Now the \f[V]regina\f[R] command will be run every day, ten minutes
after midnight.
After each day, the logs are rotated, so \f[V]access.log\f[R] becomes
\f[V]access.log.1\f[R].
Since \f[V]regina\f[R] is run after the log rotation, you will probably
want to run it on \f[V]access.log.1\f[R].
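.PP
The cron setup relies on timing: the log has to be rotated before
\f[V]regina\f[R] reads it.
As a minimal illustration of when a daily entry of the form
\f[V]minute hour * * *\f[R] fires, the following Python sketch computes
its next run time.
It ignores the other cron fields and daylight saving time, and
\f[V]next_daily_run\f[R] is a hypothetical helper.
.IP
.nf
\f[C]
```python
from datetime import datetime, timedelta

def next_daily_run(minute, hour, now):
    """Next time a cron entry "minute hour * * *" fires strictly after `now`."""
    run = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if run <= now:
        # Today's slot has already passed, so the entry fires tomorrow
        run += timedelta(days=1)
    return run

now = datetime(2023, 5, 10, 0, 9, 30)       # 00:09:30
regina_run = next_daily_run(10, 0, now)     # the "10 0 * * *" entry
print(regina_run)
```
\f[R]
.fi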
.SS Logfile permissions
.PP
By default, \f[V]nginx\f[R] logs are \f[V]-rw-r----- root root\f[R], so
you cannot access them as a regular user.
You could either run regina as root, which I \f[B]strongly do not
recommend\f[R], or make a root-cronjob that changes ownership of the log
after midnight.
Run \f[V]sudo crontab -e\f[R] and enter:
.IP
.nf
\f[C]
9 0 * * * chown your-username /var/log/nginx/access.log.1
\f[R]
.fi
.PP
This will make you the owner of the log 9 minutes after midnight, just
before \f[V]regina\f[R] needs read access.
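.PP
To check whether the permissions are what \f[V]regina\f[R] needs, the
following Python sketch prints the \f[V]ls\f[R]-style mode string of a
file and whether the current user may read it.
It runs on a temporary file with mode \f[V]0640\f[R]
(\f[V]-rw-r-----\f[R]) instead of the real log, so it is safe to try;
pointing the hypothetical \f[V]describe\f[R] helper at
\f[V]/var/log/nginx/access.log.1\f[R] inspects the actual log.
.IP
.nf
\f[C]
```python
import os
import stat
import tempfile

def describe(path):
    """Return the ls-style mode string of `path` and whether we can read it."""
    mode = stat.filemode(os.stat(path).st_mode)
    # os.access checks the real user and group IDs against the permission bits
    readable = os.access(path, os.R_OK)
    return mode, readable

with tempfile.NamedTemporaryFile() as tmp:
    os.chmod(tmp.name, 0o640)   # same bits as -rw-r-----
    result = describe(tmp.name)
print(result)
```
\f[R]
.fi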
.SS GeoIP
.PP
\f[V]regina\f[R] can show you which country or city a visitor is from,
but you will need an \f[I]ip2location\f[R] database.
You can acquire such a database for free at
ip2location.com (https://lite.ip2location.com/) (and probably some other
sites as well!).
After creating an account, you can download several different databases
in different formats.
.PD 0
.P
.PD
For \f[V]regina\f[R], download the \f[V]IP-COUNTRY-REGION-CITY\f[R]
database for IPv4 as \f[I]csv\f[R].
.PP
To configure regina to use the GeoIP database, edit
\f[V]get_visitor_location\f[R] and \f[V]get_cities_for_contries\f[R] in
section \f[V]data-collection\f[R].
.PD 0
.P
.PD
By default, \f[V]regina\f[R] only tells you which country a user is
from.
Append the two-letter country codes for countries you are interested in
to the \f[V]get_cities_for_contries\f[R] option.
.PD 0
.P
.PD
After that, add the GeoIP data to your database:
.IP
.nf
\f[C]
regina \-\-config regina.cfg \-\-update-geoip path-to-csv
\f[R]
.fi
.PP
Depending on how many countries you specified, this might take a long
time.
You can delete the \f[V]csv\f[R] afterwards.
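.PP
Because every csv row describes a range of addresses, looking up a
visitor amounts to finding the range that contains the address.
The following Python sketch does this with a binary search over sorted,
non-overlapping ranges.
It only illustrates the idea and is not necessarily how
\f[V]regina\f[R] queries its database; the ranges and country codes in it
are made up.
.IP
.nf
\f[C]
```python
import bisect
import ipaddress

# Made-up, sorted, non-overlapping ranges: (lower, upper, country code)
RANGES = [
    (16777216, 16777471, "AU"),  # 1.0.0.0 - 1.0.0.255
    (16777472, 16778239, "CN"),  # 1.0.1.0 - 1.0.3.255
]
LOWERS = [lower for lower, _, _ in RANGES]

def country_for(ip):
    """Return the country code of the range containing `ip`, or None."""
    n = int(ipaddress.IPv4Address(ip))
    # Index of the last range whose lower bound is <= n
    i = bisect.bisect_right(LOWERS, n) - 1
    if i >= 0 and n <= RANGES[i][1]:
        return RANGES[i][2]
    return None

print(country_for("1.0.2.1"), country_for("8.8.8.8"))
```
\f[R]
.fi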
.SH CUSTOMIZATION
.SS Generated html
.PP
The generated file does not need to be an html file.
The template can be any text file.
.PD 0
.P
.PD
\f[V]regina\f[R] will only replace certain words starting with a
\f[V]%\f[R].
You can see all supported variables and their values by running
\f[V]\-\-visualize\f[R] with \f[V]debug_level = 1\f[R].
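.PP
The substitution can be pictured with the following Python sketch.
The variable names in it are hypothetical (run with
\f[V]debug_level = 1\f[R] to see the real ones).
Replacing longer names first, so that a variable whose name is a prefix
of another one cannot clobber it, is a choice of this sketch and not
necessarily what \f[V]regina\f[R] does.
.IP
.nf
\f[C]
```python
def fill_template(template, variables):
    """Replace each %name in `template` with str(variables[name])."""
    # Longer names first, so %visitors does not eat the start of %visitors_total
    for name in sorted(variables, key=len, reverse=True):
        template = template.replace("%" + name, str(variables[name]))
    return template

# Hypothetical variable names and values
page = fill_template("Visitors today: %visitors, total: %visitors_total",
                     {"visitors": 12, "visitors_total": 345})
print(page)
```
\f[R]
.fi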
.SS Data export
.PP
If you want to further process the data generated by regina, you can
export the data by setting the \f[V]data_out_dir\f[R] in the
\f[V]data-export\f[R] section.
The data can be exported as \f[V]csv\f[R] or \f[V]pkl\f[R].
.PD 0
.P
.PD
If you choose \f[V]pkl\f[R] as filetype, all rankings will be exported
as python type \f[V]list[tuple[int, str]]\f[R].
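.PP
Such a file can be read back with Python's \f[V]pickle\f[R] module.
Since unpickling can execute arbitrary code, only load files you
exported yourself.
The sketch below writes a stand-in ranking to a temporary file, because
the actual file names in \f[V]data_out_dir\f[R] are not listed here; it
also assumes that the integer is a count and the string is the ranked
item.
.IP
.nf
\f[C]
```python
import pickle
import tempfile
from pathlib import Path

def load_ranking(path):
    """Load a ranking pickled as list[tuple[int, str]] and check its shape."""
    with open(path, "rb") as f:
        ranking = pickle.load(f)
    if not all(isinstance(n, int) and isinstance(s, str) for n, s in ranking):
        raise TypeError(f"{path} does not contain list[tuple[int, str]]")
    return ranking

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "example_ranking.pkl"   # hypothetical file name
    path.write_bytes(pickle.dumps([(42, "/index.html"), (7, "/about.html")]))
    ranking = load_ranking(path)

# Highest count first
top = sorted(ranking, reverse=True)[0]
print(top)
```
\f[R]
.fi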
.SS Database
.PP
You can of course work directly with the database, as long as it is not
altered.
Editing, adding or deleting entries might make the database incompatible
with regina, so only do that if you know what you are doing.
Just querying entries will be fine though.
.SH TROUBLESHOOTING
.SS General
.PP
If you are having problems, try setting the \f[V]debug_level\f[R] in
section \f[V]debug\f[R] of the configuration file to a non-zero value.
.SS sqlite3.OperationalError: near \[lq]STRICT\[rq]: syntax error
.PP
Your sqlite3 version is probably too old.
Check with \f[V]sqlite3 \-\-version\f[R].
\f[V]regina\f[R] requires 3.37 or higher.
.PD 0
.P
.PD
Hotfix: Remove all \f[V]STRICT\f[R]s from
\f[V]<python-dir>/site-packages/regina/sql/create_db.sql\f[R].
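.PP
Python's \f[V]sqlite3\f[R] module may be linked against a different
sqlite library than the one behind the \f[V]sqlite3\f[R] command line
tool, so the version that matters is the one Python reports.
The following Python sketch checks the versions as Python sees them and
tries to create a \f[V]STRICT\f[R] table in an in-memory database, which
fails with the error above on sqlite versions older than 3.37.
The helper names are hypothetical.
.IP
.nf
\f[C]
```python
import sqlite3
import sys

def version_problems():
    """List the versions that are older than what regina requires."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"python {sys.version.split()[0]} < 3.10")
    if sqlite3.sqlite_version_info < (3, 37, 0):
        problems.append(f"sqlite {sqlite3.sqlite_version} < 3.37")
    return problems

def supports_strict():
    """Create a STRICT table in memory; older sqlite raises OperationalError."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t (x INTEGER) STRICT")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        con.close()

print("sqlite library used by Python:", sqlite3.sqlite_version)
print("problems:", version_problems(), "STRICT supported:", supports_strict())
```
\f[R]
.fi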
.SH CHANGELOG
.SS 1.1
.IP \[bu] 2
Improved database format:
.RS 2
.IP \[bu] 2
put referrer, browser and platform in their own tables to reduce the
size of the database
.IP \[bu] 2
route groups are now part of visualization, not data collection
.RE
.IP \[bu] 2
Data visualization now uses more sql for improved performance
.IP \[bu] 2
Refactored codebase
.IP \[bu] 2
Bug fixes
.IP \[bu] 2
Changed setup.py to pyproject.toml
.SS 1.0
.IP \[bu] 2
Initial release
.SH COPYRIGHT
.PP
Copyright © 2022 Matthias Quintern.
License GPLv3+: GNU GPL version 3 <https://gnu.org/licenses/gpl.html>.
.PD 0
.P
.PD
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
.SH AUTHORS
Matthias Quintern.