HOWTO Run Your Invenio Installation
This HOWTO guide intends to give you ideas on how to run your CDS Invenio installation and how to take care of its normal day-to-day operation.
Many tasks that manipulate the bibliographic record database can be set to run periodically. For example, we want the indexing engine to scan periodically for newly arrived documents and index them as soon as they enter the system. It is the role of the BibSched system to take care of task scheduling and task execution.
Periodical tasks (such as regular metadata indexing) as well as one-time tasks (such as a batch upload of a newly acquired metadata file) are not executed straight away but are stored in the BibSched task queue. The BibSched daemon looks periodically at the queue and launches tasks according to their order or their scheduled runtime. You can consider BibSched to be a kind of cron daemon for bibliographic tasks.
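For instance, a one-time batch upload is queued in BibSched rather than executed immediately. A sketch, where the MARCXML file path is a hypothetical example:

```shell
# Queue a one-time insert task; BibSched will run it when its turn comes.
# /tmp/new-records.xml is a hypothetical MARCXML file.
$ bibupload -i /tmp/new-records.xml
```

You can then watch the task's progress in the BibSched monitor described below.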
This means that after having installed Invenio you will want to have the BibSched daemon running permanently. To launch BibSched daemon, do:
$ bibsched start
To set up the indexing, ranking, sorting, formatting, and collection cache updating daemons to run periodically with a sleeping period of, say, 5 minutes:
$ bibindex -f50000 -s5m
$ bibreformat -oHB -s5m
$ webcoll -v0 -s5m
$ bibrank -f50000 -s5m
$ bibsort -s5m
Note that if you are using the virtual index facility, e.g. for the global index, then you should schedule those indexes separately:
$ bibindex -w global -f50000 -s5m
It is imperative to have the above tasks run permanently in your BibSched queue so that newly submitted documents will be processed automatically.
You may also want to set up some periodical housekeeping tasks:
$ bibrank -f50000 -R -wwrd -s14d -LSunday
$ bibsort -R -s7d -L 'Sunday 01:00-05:00'
$ inveniogc -a -s7d -L 'Sunday 01:00-05:00'
$ dbdump -s20h -L '22:00-06:00' -o/opt2/mysql-backups -n10
Please consult the sections below for more details about these housekeeping tasks.
You can also set up the batch uploader daemon to run periodically, looking for new documents or metadata files to upload:
$ batchuploader --documents -s20m
$ batchuploader --metadata -s20m
Additionally, you might want to automatically generate sitemap.xml files for your installation. For this, just schedule:

$ bibexport -w sitemap -s1d

You will then need to add a line such as:

Sitemap: https://iu.tind.io/sitemap-index.xml.gz

to your robots.txt file.
If you are using the WebLinkback module, you may want to run some of the following tasklets:
sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbackupdaterdeleteurlsonblacklist \
    -T bst_weblinkback_updater \
    -a "mode=1" \
    -u admin -s1d -L '22:00-05:00'

sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbackupdaternewpages \
    -T bst_weblinkback_updater \
    -a "mode=2" \
    -u admin -s1d -L '22:00-05:00'

sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbackupdateroldpages \
    -T bst_weblinkback_updater \
    -a "mode=3" \
    -u admin -s7d -L '22:00-05:00'

sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbackupdatermanuallysettitles \
    -T bst_weblinkback_updater \
    -a "mode=4" \
    -u admin -s7d -L '22:00-05:00'

sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbackupdaterdetectbrokenlinkbacks \
    -T bst_weblinkback_updater \
    -a "mode=5" \
    -u admin -s7d -L 'Sunday 01:00-05:00'

sudo -u www-data /opt/invenio/bin/bibtasklet \
    -N weblinkbacknotifications \
    -T bst_weblinkback_updater \
    -a "mode=6" \
    -u admin -s1d
Note that the BibSched automated daemon stops as soon as one of its tasks ends with an error. You will be informed by email about the incident. Nevertheless, it is a good idea to inspect the BibSched queue from time to time anyway, say several times per day, to see what is going on. This can be done by running the BibSched command-line monitor:
$ bibsched
The monitor will permit you to stop/start the automated mode, to delete tasks that were wrongly submitted, to run some of the tasks manually, etc. Note also that the BibSched daemon writes its log and error files, both about its own operation and about the operation of its tasks, to the /opt/invenio/var/log directory.
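When a task fails, its own log and err files are the first place to look. A small sketch, assuming the default /opt/invenio prefix and per-task files named after the task (the exact file names depend on your installation):

```shell
# Show the most recently modified BibSched task error files, newest first.
# The /opt/invenio prefix and the bibsched_task_* naming are assumptions
# based on the default installation layout.
$ ls -lt /opt/invenio/var/log/bibsched_task_*.err 2>/dev/null | head -5
```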
Invenio users may set up automatic email alerts so that they are notified about documents of interest, either daily, weekly, or monthly. It is the job of the alert engine to do this. The alert engine has to be run every day:
$ alertengine
You may want to set up an external cron job to call alertengine each day, for example:
$ crontab -l
# invenio: periodically restart Apache:
59 23 * * * /usr/sbin/apachectl restart
# invenio: run alert engine:
30 14 * * * /usr/bin/sudo -u apache /opt/invenio/bin/alertengine
When you add new records to the system, the word frequency ranking weights of old records are not recalculated by default, in order to speed up the insertion of new records. This may slightly affect the precision of word similarity searches. It is therefore advisable to run bibrank explicitly in recalculating mode once in a while, during a relatively quiet site operation day, by doing:

$ bibrank -R -w wrd -s 14d -L Sunday

You may want to do this either (i) periodically, say once per month (see the previous section), or (ii) depending on the frequency of new additions to the record database, say when its size grows by 2-3 percent.
It is advised to rebalance the sorting buckets from time to time. In order to speed up the insertion of new records, the sorting buckets are not recalculated; new records are simply appended to the end of the corresponding bucket. This can create differences in bucket sizes, which may have a small impact on sorting speed.
The rebalancing might be run weekly or even daily:

$ bibsort -R -s 7d -L 'Sunday 01:00-05:00'
The tool inveniogc provides a garbage collector for the database, temporary files, and the like. Various temporary log and err files are created in the /opt/invenio/var/log and /opt/invenio/var/tmp directories, and it is good to clean them up from time to time. The same command can be used to clean those files too, via:
$ inveniogc -s 7d -d -L 'Sunday 01:00-05:00'
The inveniogc tool can run other cleaning actions; please refer to its help (inveniogc --help) for more details.
Note that in the above section "Setting up periodical daemon tasks", we set up inveniogc with the argument -a, meaning that it runs all possible cleaning actions. Please modify this if it is not what you want.
You can launch a bibsched task called dbdump in order to take regular snapshots of your database content into SQL dump files. For example, to back up the database content into the /opt2/mysql-backups directory every night, keeping at most the 10 latest copies of the backup file, you would launch:
$ dbdump -s 20h -L '22:00-06:00' -o /opt2/mysql-backups -n 10
This will create files named like invenio-dbdump-2009-03-10_22:10:28.sql in this folder.
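Since the YYYY-MM-DD_HH:MM:SS timestamp in the file name sorts chronologically, picking the most recent dump for a restore needs only a plain lexicographic sort. A self-contained sketch, using a scratch directory in place of /opt2/mysql-backups:

```shell
# Create a scratch directory standing in for /opt2/mysql-backups,
# with two dump files named after the dbdump convention.
dir=$(mktemp -d)
touch "$dir/invenio-dbdump-2009-03-09_22:10:11.sql" \
      "$dir/invenio-dbdump-2009-03-10_22:10:28.sql"

# The timestamp sorts chronologically, so the lexicographically
# last name is the newest dump.
latest=$(ls "$dir"/invenio-dbdump-*.sql | sort | tail -1)
echo "$latest"   # the 2009-03-10 file

rm -rf "$dir"
```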
Note that you may use Invenio-independent MySQL backup tools such as mysqldump, but these generally lock all tables during backup for consistency, so your site might become inaccessible during the backup because the user session table is locked as well. The dbdump tool does not lock all tables, so the site remains accessible to users while the dump files are being created. The dump files are nevertheless consistent with respect to the data, since dbdump runs via bibsched, which does not allow any other important bibliographic task to run during the backup.
To load a dump file produced by dbdump into a running Invenio instance later, you can use:
$ bibsched stop
$ cat /opt2/mysql-backups/invenio-dbdump-2009-03-10_22\:10\:28.sql | dbexec
$ bibsched start
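After the load completes, a quick sanity check is to count the bibliographic records through the same dbexec tool. A sketch, assuming the standard Invenio schema where bibrec is the main record table:

```shell
# Count records after the restore to confirm the dump loaded.
# bibrec is assumed to be the main bibliographic record table.
$ echo "SELECT COUNT(*) FROM bibrec;" | dbexec
```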