
.\" Automatically generated by Pandoc 3.0.1
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "REGINA" "1" "May 2023" "regina 1.1" ""
.hy
.SH NAME
.PP
regina - \f[B]R\f[R]uling \f[B]E\f[R]mpress \f[B]G\f[R]enerating
\f[B]I\f[R]n-depth \f[B]N\f[R]ginx \f[B]A\f[R]nalytics (obviously)
.SS Description
.PP
\f[V]regina\f[R] is a \f[B]python\f[R] program that generates
\f[BI]analytics\f[R] for a static webpage served with \f[B]nginx\f[R].
\f[V]regina\f[R] is easy to deploy and privacy-respecting:
.IP \[bu] 2
it collects the data from the nginx logs: no javascript or other changes
to your website are required
.IP \[bu] 2
data is stored on your device in a \f[B]sqlite\f[R] database, nothing
goes to any cloud
.PP
It parses the log and \f[B]stores\f[R] the important data in an
\f[I]sqlite\f[R] database.
It can then create an analytics html page that has lots of useful
\f[B]plots\f[R] and \f[B]numbers\f[R].
.SH SYNOPSIS
.PP
\f[B]regina\f[R] --config CONFIG_FILE [OPTION\&...]
.SH COMMAND LINE OPTIONS
.TP
\f[B]-h\f[R], \f[B]--help\f[R]
Show the possible command line arguments
.TP
\f[B]-c\f[R], \f[B]--config\f[R] config-file
Retrieve settings from the config-file
.TP
\f[B]--access-log\f[R] log-file
Override the access_log from the configuration
.TP
\f[B]--collect\f[R]
Collect information from the access_log and store it in the database
.TP
\f[B]--visualize\f[R]
Visualize the data from the database
.TP
\f[B]--update-geoip\f[R] geoip-db
Recreate the geoip part of the database from the geoip-db csv.
The csv must have these columns: lower, upper, country-code,
country-name, region, city
.SH GETTING STARTED
.SS Dependencies
.IP \[bu] 2
\f[B]nginx\f[R]: You need an nginx webserver that outputs the access log
in the \f[V]combined\f[R] format, which is the default
.IP \[bu] 2
\f[B]sqlite >= 3.37\f[R]
.IP \[bu] 2
\f[B]python >= 3.10\f[R]
.IP \[bu] 2
\f[B]python-matplotlib\f[R]
.SS Installation
.PP
You can install regina with python-pip:
.IP
.nf
\f[C]
git clone https://github.com/MatthiasQuintern/regina.git
cd regina
python3 -m pip install .
\f[R]
.fi
.PP
You can also install it system-wide using
\f[V]sudo python3 -m pip install .\f[R]
.PP
If you also want to install the man-page and the zsh completion script:
.IP
.nf
\f[C]
sudo cp regina.1.man /usr/share/man/man1/regina.1
sudo gzip /usr/share/man/man1/regina.1
sudo cp regina/package-data/_regina.compdef.zsh /usr/local/share/zsh/site-functions/_regina
sudo chmod +x /usr/local/share/zsh/site-functions/_regina
\f[R]
.fi
.SS Configuration
.PP
The following instructions assume you have an nginx webserver configured
for a website like this, with \f[V]/www\f[R] as root (\f[V]/\f[R]):
.IP
.nf
\f[C]
/www
|---- resources
| |---- image.jpg
|---- index.html
\f[R]
.fi
.PP
By default, nginx will generate logs in the \f[V]combined\f[R] format
with the name \f[V]access.log\f[R] in \f[V]/var/log/nginx/\f[R] and
rotate them daily.
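.PP
For reference, nginx defines the \f[V]combined\f[R] format as
\f[V]$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"\f[R].
The following python sketch shows how such a line splits into its
fields.
It is an illustration only, not the parser \f[V]regina\f[R] uses; the
function name \f[V]parse_combined\f[R] is hypothetical, and it assumes
that the quoted fields contain no literal double quote (nginx escapes
them by default).
.IP
.nf
\f[C]
def parse_combined(line):
    # Splitting on '"' gives: prefix, request, status and size,
    # referer, a single space, user agent, and an empty remainder
    parts = line.rstrip().split('"')
    if len(parts) < 7:
        raise ValueError("not a combined-format line")
    prefix, request, middle, referer = parts[0], parts[1], parts[2], parts[3]
    user_agent = parts[5]
    # $time_local is enclosed in square brackets
    time_local = prefix[prefix.index("[") + 1:prefix.index("]")]
    status, body_bytes_sent = middle.split()
    return {
        "remote_addr": prefix.split()[0],
        "time_local": time_local,
        "request": request,
        "status": int(status),
        "body_bytes_sent": int(body_bytes_sent),
        "http_referer": referer,
        "http_user_agent": user_agent,
    }


line = ('203.0.113.7 - - [17/May/2023:10:00:00 +0200] '
        '"GET /index.html HTTP/1.1" 200 1234 "https://example.com/" "Mozilla/5.0"')
print(parse_combined(line)["request"])  # GET /index.html HTTP/1.1
\f[R]
.fi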
.PP
Copy the default configuration and template from the git directory to a
directory of your choice, in this case \f[V]\[ti]/.config/regina\f[R].
If you no longer have the cloned repository, the installed copies should
be in \f[V]/usr/local/lib/python3.11/site-packages/regina/package-data/\f[R]
(the exact path depends on your python version and installation).
.IP
.nf
\f[C]
mkdir \[ti]/.config/regina
cp regina/package-data/default.cfg \[ti]/.config/regina/regina.cfg
cp regina/package-data/template.html \[ti]/.config/regina/template.html
\f[R]
.fi
.PP
Now edit the configuration to fit your needs.
For our example:
.IP
.nf
\f[C]
[regina]
server_name = my_server.com
access_log = /var/log/nginx/access.log.1
...
[html-generation]
html_out_path = /www/analytics/analytics.html
img_location = /img
[plot-generation]
img_out_dir = /www/analytics/img
\f[R]
.fi
.PP
Most defaults should be fine.
The default configuration should also be documented well enough for you
to know what to do.
It is strongly recommended to only use absolute paths.
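.PP
If your configuration is plain INI syntax, as the example above
suggests, a short python sketch can flag options that are not absolute
paths.
The helper \f[V]relative_paths\f[R] and the choice of which options hold
filesystem paths are assumptions for illustration;
\f[V]img_location\f[R] is left out because it appears to be a location
relative to the web root rather than a filesystem path.
.IP
.nf
\f[C]
import configparser
import os

# Assumed mapping of sections to options that hold filesystem paths
PATH_OPTIONS = {
    "regina": ["access_log"],
    "html-generation": ["html_out_path"],
    "plot-generation": ["img_out_dir"],
}


def relative_paths(config_path):
    # interpolation=None keeps a literal % in a value from being special
    config = configparser.ConfigParser(interpolation=None)
    with open(config_path) as f:
        config.read_file(f)
    problems = []
    for section, options in PATH_OPTIONS.items():
        for option in options:
            value = config.get(section, option, fallback="")
            # Blank or missing values are skipped
            if value and not os.path.isabs(value):
                problems.append(f"{section}.{option} = {value}")
    return problems
\f[R]
.fi
.PP
Called with the path of your configuration file, it returns the
offending options, or an empty list if every checked option is an
absolute path.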
.PP
Now you can collect the data from the nginx log specified as
\f[V]access_log\f[R] in the configuration into the database specified at
the \f[V]database\f[R] location (or
\f[V]\[ti]/.local/share/regina/my-server.com.db\f[R] if left blank):
.IP
.nf
\f[C]
regina --config \[ti]/.config/regina/regina.cfg --collect
\f[R]
.fi
.PP
To visualize the data, run:
.IP
.nf
\f[C]
regina --config \[ti]/.config/regina/regina.cfg --visualize
\f[R]
.fi
.PP
This will generate plots and statistics and replace all variables in
\f[V]template_html\f[R] and output the result to
\f[V]html_out_path\f[R].
If \f[V]html_out_path\f[R] is in your webroot, you should now be able to
access the generated site.
.PD 0
.P
.PD
In our example, \f[V]/www\f[R] will look like this:
.IP
.nf
\f[C]
/www
|---- analytics
| |---- analytics.html
| |---- img
| |---- ranking_referer_total.svg
| |---- ranking_referer_last_x_days.svg
| ...
|---- resources
| |---- image.jpg
|---- index.html
\f[R]
.fi
.SS Automation
.PP
You will probably run \f[V]regina\f[R] once per day, after
\f[V]nginx\f[R] has filled the daily access log.
The easiest way to do that is with a \f[I]cronjob\f[R].
Run \f[V]crontab -e\f[R] and enter:
\f[V]10 0 * * * /usr/bin/regina --config /home/myuser/.config/regina/regina.cfg --collect --visualize\f[R]
This assumes you installed \f[V]regina\f[R] system-wide; adjust the path
if \f[V]which regina\f[R] reports a different location.
.PD 0
.P
.PD
Now the \f[V]regina\f[R] command will be run every day, ten minutes
after midnight.
After each day, the logs are rotated, so \f[V]access.log\f[R] becomes
\f[V]access.log.1\f[R].
Since \f[V]regina\f[R] is run after the log rotation, you will probably
want to run it on \f[V]access.log.1\f[R].
.SS Logfile permissions
.PP
By default, \f[V]nginx\f[R] logs are \f[V]-rw-r----- root root\f[R], so
you cannot read them as a regular user.
You could either run regina as root, which I \f[B]strongly do not
recommend\f[R], or make a root-cronjob that changes ownership of the log
after midnight.
Run \f[V]sudo crontab -e\f[R] and enter:
\f[V]9 0 * * * chown your-username /var/log/nginx/access.log.1\f[R]
This will make you the owner of the log 9 minutes after midnight, just
before \f[V]regina\f[R] needs read access.
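.PP
To check that the ownership change worked, a small python sketch like
the following (the helper \f[V]describe_access\f[R] is hypothetical, not
part of \f[V]regina\f[R]) reports the owner and mode of the log and
whether the current user can read it.
.IP
.nf
\f[C]
import os
import pwd
import stat


def describe_access(path):
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no entry in the password database
        owner = str(st.st_uid)
    mode = stat.filemode(st.st_mode)      # e.g. "-rw-r-----"
    readable = os.access(path, os.R_OK)   # checked with the real uid/gid
    return owner, mode, readable
\f[R]
.fi
.PP
Run it as the user that runs the \f[V]regina\f[R] cronjob, for example
on \f[V]/var/log/nginx/access.log.1\f[R]; the last value should be
\f[V]True\f[R].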
.SS GeoIP
.PP
\f[V]regina\f[R] can show you which country or city a visitor is from,
but you will need an \f[I]ip2location\f[R] database.
You can acquire such a database for free at
ip2location.com (https://lite.ip2location.com/) (and probably some other
sites as well!).
After creating an account, you can download several different databases
in different formats.
.PD 0
.P
.PD
For \f[V]regina\f[R], download the \f[V]IP-COUNTRY-REGION-CITY\f[R]
database for IPv4 as \f[I]csv\f[R].
.PP
To configure regina to use the GeoIP database, edit
\f[V]get_visitor_location\f[R] and \f[V]get_cities_for_contries\f[R] in
section \f[V]data-collection\f[R].
.PD 0
.P
.PD
By default, \f[V]regina\f[R] only tells you which country a user is
from.
Append the two-letter country codes for countries you are interested in
to the \f[V]get_cities_for_contries\f[R] option.
.PD 0
.P
.PD
After that, add the GeoIP-data into your database:
.IP
.nf
\f[C]
regina --config regina.cfg --update-geoip path-to-csv
\f[R]
.fi
.PP
Depending on how many countries you specified, this might take a long
time.
You can delete the \f[V]csv\f[R] afterwards.
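.PP
The \f[I]lower\f[R] and \f[I]upper\f[R] columns describe address ranges,
so finding the row for a visitor is a range lookup.
The python sketch below illustrates that idea; it is not how
\f[V]regina\f[R] stores or queries the data.
It assumes that, as in the ip2location LITE csv files, the bounds are
IPv4 addresses written as integers and the ranges do not overlap;
\f[V]load_ranges\f[R] and \f[V]lookup\f[R] are hypothetical names, and
the sample row uses a documentation address range with made-up location
values.
.IP
.nf
\f[C]
import bisect
import csv
import ipaddress


def load_ranges(lines):
    # Columns: lower, upper, country-code, country-name, region, city
    rows = sorted(
        (int(r[0]), int(r[1]), r[2], r[3], r[4], r[5]) for r in csv.reader(lines)
    )
    lowers = [row[0] for row in rows]
    return lowers, rows


def lookup(ip, lowers, rows):
    n = int(ipaddress.IPv4Address(ip))
    # Index of the last range whose lower bound is <= n
    i = bisect.bisect_right(lowers, n) - 1
    if i >= 0 and rows[i][0] <= n <= rows[i][1]:
        return rows[i][2:]  # (country-code, country-name, region, city)
    return None


lowers, rows = load_ranges([
    '"3221225984","3221226239","ZZ","Example","Example Region","Example City"',
])
print(lookup("192.0.2.10", lowers, rows))
\f[R]
.fi
.PP
A binary search over the sorted lower bounds keeps each lookup
logarithmic in the number of rows, which matters for a csv with many
ranges.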
.SH CUSTOMIZATION
.SS Generated html
.PP
The generated file does not need to be an html file.
The template can be any text file.
.PD 0
.P
.PD
\f[V]regina\f[R] will only replace certain words starting with a
\f[V]%\f[R].
You can see all supported variables and their values by running
\f[V]--visualize\f[R] with \f[V]debug_level = 1\f[R].
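.PP
The substitution itself can be pictured with the python sketch below.
It is not the code \f[V]regina\f[R] uses: the function
\f[V]fill_template\f[R] and the variable names in \f[V]values\f[R] are
hypothetical (use the \f[V]debug_level = 1\f[R] output for the real
ones).
Only known names are replaced, so other words containing a
\f[V]%\f[R] stay untouched.
.IP
.nf
\f[C]
import re

# Hypothetical variables and values, for illustration only
values = {"server_name": "my_server.com", "visitor_count": "42"}


def fill_template(text, values):
    def replace(match):
        # Fall back to the original text for unknown names
        return values.get(match.group(1), match.group(0))
    return re.sub("%([A-Za-z0-9_]+)", replace, text)


print(fill_template("Visitors on %server_name: %visitor_count (100%)", values))
# Visitors on my_server.com: 42 (100%)
\f[R]
.fi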
.SS Data export
.PP
If you want to further process the data generated by regina, you can
export the data by setting the \f[V]data_out_dir\f[R] in the
\f[V]data-export\f[R] section.
The data can be exported as \f[V]csv\f[R] or \f[V]pkl\f[R].
.PD 0
.P
.PD
If you choose \f[V]pkl\f[R] as filetype, all rankings will be exported
as python type \f[V]list[tuple[int, str]]\f[R].
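.PP
As an illustration, an exported ranking could be read back with a few
lines of python.
\f[V]top_entries\f[R] is a hypothetical helper, the names of the
exported files are not listed here, and the sketch assumes the
\f[V]int\f[R] in each tuple is the count.
Only unpickle files you trust, because loading a pickle can execute
arbitrary code.
.IP
.nf
\f[C]
import pickle


def top_entries(pkl_path, n=10):
    # Each exported ranking is a list[tuple[int, str]]
    with open(pkl_path, "rb") as f:
        ranking = pickle.load(f)
    # Sort by the integer (assumed to be the count), highest first,
    # in case the file is not already ordered
    return sorted(ranking, key=lambda entry: entry[0], reverse=True)[:n]
\f[R]
.fi
.PP
For example, \f[V]top_entries(path, n=5)\f[R] returns the five tuples
with the highest integers from the ranking stored at \f[V]path\f[R].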
.SS Database
.PP
You can of course work directly with the database, as long as it is not
altered.
Editing, adding or deleting entries might make the database incompatible
with regina, so only do that if you know what you are doing.
Just querying entries will be fine though.
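.PP
A safe way to explore the database is to open it read-only.
The python sketch below (\f[V]list_tables\f[R] is a hypothetical helper)
uses SQLite's \f[V]mode=ro\f[R] URI parameter so that no statement can
modify the file, and lists the tables, whose names are not documented
here.
.IP
.nf
\f[C]
import sqlite3
from pathlib import Path


def list_tables(db_path):
    # as_uri() needs an absolute path; ?mode=ro opens the file read-only
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
\f[R]
.fi
.PP
Pass it the path of your database, for example the default location
under \f[V]\[ti]/.local/share/regina/\f[R].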
.SH TROUBLESHOOTING
.SS General
.PP
If you are having problems, try setting the \f[V]debug_level\f[R] in
section \f[V]debug\f[R] of the configuration file to a non-zero value.
.SS sqlite3.OperationalError: near \[dq]STRICT\[dq]: syntax error
.PP
Your sqlite3 version is probably too old.
Check with \f[V]sqlite3 --version\f[R].
\f[V]regina\f[R] requires 3.37 or higher.
.PD 0
.P
.PD
Hotfix: Remove all \f[V]STRICT\f[R]s from
\f[V]<python-dir>/site-packages/regina/sql/create_db.sql\f[R].
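.PP
Because \f[V]regina\f[R] talks to SQLite through python's
\f[V]sqlite3\f[R] module, the version that matters is the SQLite library
python is linked against, which can differ from what the
\f[V]sqlite3\f[R] command line tool reports.
A quick python check (the helper name \f[V]sqlite_ok\f[R] is
hypothetical):
.IP
.nf
\f[C]
import sqlite3

# STRICT tables were introduced in SQLite 3.37.0
REQUIRED = (3, 37, 0)


def sqlite_ok():
    # sqlite_version_info describes the linked SQLite library,
    # not the version of the python module itself
    return sqlite3.sqlite_version_info >= REQUIRED


print("SQLite", sqlite3.sqlite_version, "ok" if sqlite_ok() else "is too old for regina")
\f[R]
.fi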
.SH CHANGELOG
.SS 1.1
.IP \[bu] 2
Improved database format:
.RS 2
.IP \[bu] 2
put referrer, browser and platform in their own tables to reduce the
size of the database
.IP \[bu] 2
route groups now part of visualization, not data collection
.RE
.IP \[bu] 2
Data visualization now uses more sql for improved performance
.IP \[bu] 2
Refactored codebase
.IP \[bu] 2
Bug fixes
.IP \[bu] 2
Changed setup.py to pyproject.toml
.SS 1.0
.IP \[bu] 2
Initial release
.SH COPYRIGHT
.PP
Copyright © 2022 Matthias Quintern.
License GPLv3+: GNU GPL version 3 <https://gnu.org/licenses/gpl.html>.
.PD 0
.P
.PD
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
.SH AUTHORS
Matthias Quintern.