Recoll: Difference between revisions
Latest revision as of 14:15, 21 March 2021
Recoll is a very fast and lightweight file indexer.
Overview
Favorite in TuxRadar.com's review:
- The index can be built manually.
- Additional filters can be installed to support more file types.
- Supports multiple indexes.
Personal review:
- (+) Quite fast query results
- (+) Can index IMAP Maildir directly, even attachments!
- (+) Also returns files matching similar keywords (stemming).
- (-) Indexing hangs on some .zip files (using 5G+ of memory and a lot of swapping). Workaround: removed *.jar, *.zip, *.tgz, *.gz and *.tar from the list of files to index.
- (-) Low relevance, aggravated by the point above. Files with the search keywords in the filename, in the title, or at the beginning of the file are not returned first. Stemming makes it even worse.
- Fix: According to the documentation, stemming can be disabled per keyword by capitalizing it (for example, floor will find flooring and floored, but Floor will only return floor). Stemming can also be disabled from the menu.
- (-) No way to narrow down the results to PDF only or DOC only (both are considered text).
- Fix: Make a new query and add ext:pdf to restrict the results to PDFs.
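The archive exclusion mentioned in the review could also be expressed in the index configuration file instead of the GUI. The following is only a sketch under assumptions: it assumes the configuration lives in `$RECOLL_CONFDIR` or `~/.recoll`, and that the installed Recoll version understands the additive `skippedNames+` variable (check the manual of your version). `add_skipped_names` is a hypothetical helper name, not part of Recoll.

```shell
#!/bin/sh
# Sketch: append archive patterns to Recoll's list of skipped file names.
# Assumption: the installed Recoll supports the "skippedNames+" variable.

# add_skipped_names CONF_FILE PATTERN...   (hypothetical helper)
# Appends one "skippedNames+ = ..." line unless such a line already exists.
add_skipped_names() {
    conf_file=$1
    shift
    # Leave the file alone if an additive skippedNames line is already there.
    if [ -f "$conf_file" ] && grep -q '^skippedNames+' "$conf_file"; then
        return 0
    fi
    printf 'skippedNames+ = %s\n' "$*" >> "$conf_file"
}

# Example call (commented out so that sourcing this file changes nothing):
# add_skipped_names "${RECOLL_CONFDIR:-$HOME/.recoll}/recoll.conf" \
#     '*.jar' '*.zip' '*.tgz' '*.gz' '*.tar'
```

The patterns are quoted so that the shell does not expand them against the current directory before they reach the configuration file.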
Install
Ubuntu
The version in the Ubuntu universe repository is old. Add the backport PPAs from launchpad.net as follows:
# This is not necessary anymore...
# gpg --keyserver keyserver.ubuntu.com --recv 9DA85604
# gpg --export --armor 9DA85604 | sudo apt-key add -
# gpg --keyserver keyserver.ubuntu.com --recv A0735AD0
# gpg --export --armor A0735AD0 | sudo apt-key add -
sudo add-apt-repository ppa:xapian-backports/ppa
# Old repository
# sudo add-apt-repository ppa:recoll-backports/ppa
# New repository
sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
sudo apt-get update
sudo apt-get install recoll
# Install recommended packages
sudo apt-get install antiword catdoc ghostscript libimage-exiftool-perl poppler-utils \
unrtf python-mutagen xsltproc untex pstotext python-chm
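After installing the recommended packages, it can be useful to confirm that the helper programs are actually on the `PATH`. The sketch below is one possible check; `missing_helpers` is a hypothetical function name, and the command names in the example call (`gs` for ghostscript, `exiftool` for libimage-exiftool-perl, `pdftotext` for poppler-utils) are my assumption of what those packages provide. The Python libraries (python-mutagen, python-chm) are not commands and are not covered.

```shell
#!/bin/sh
# missing_helpers CMD...   (hypothetical helper)
# Prints every command that cannot be found on PATH, one per line.
missing_helpers() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example call (commented out; prints nothing when all helpers are present):
# missing_helpers antiword catdoc gs exiftool pdftotext unrtf xsltproc untex pstotext
```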
Debian
I had indexing issues (deadlocks) with the version in Debian Buster (1.24.3), so I installed a newer version from the upstream repository:
# Get signature key
gpg --keyserver pool.sks-keyservers.net --recv-key F8E3347256922A8AE767605B7808CE96D38B9201
gpg --export '7808CE96D38B9201' | sudo apt-key add -
# Old key:
# gpg --keyserver hkp://pool.sks-keyservers.net:80 --recv-key 7808CE96D38B9201
# gpg --export '7808CE96D38B9201' | sudo apt-key add -
# Add repository
cat << EOF | sudo tee /etc/apt/sources.list.d/recoll.list
deb https://www.lesbonscomptes.com/recoll/debian/ buster main
deb-src https://www.lesbonscomptes.com/recoll/debian/ buster main
EOF
# Preferences
cat << EOF | sudo tee /etc/apt/preferences.d/recoll
Package: *
Pin: origin "www.lesbonscomptes.com"
Pin-Priority: 350
EOF
# Update and install
sudo apt update
sudo apt install recoll
# Install recommended packages, for better indexing
# These were already installed: antiword ghostscript libimage-exiftool-perl poppler-utils unrtf pstotext
sudo apt install catdoc python-chm python-mutagen untex xsltproc
# Install packages suggested by 'recoll'
sudo apt install xapian-tools python3-genshi python-recoll python3-recoll untex wv
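To check that the upstream package really replaced the Buster one, the installed version can be compared with 1.24.3. This is a minimal sketch, assuming GNU `sort` with the `-V` option; `version_gt` is a hypothetical helper. On Debian itself, `dpkg --compare-versions` is the more precise tool for package version ordering, and `apt-cache policy recoll` shows which repository and pin priority were selected.

```shell
#!/bin/sh
# version_gt A B   (hypothetical helper)
# Succeeds when version A is strictly newer than version B.
# Assumption: GNU sort with -V (version sort) is available.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Only query the package database when dpkg-query exists.
if command -v dpkg-query >/dev/null 2>&1; then
    installed=$(dpkg-query -W -f='${Version}' recoll 2>/dev/null || true)
    if [ -n "$installed" ] && version_gt "$installed" "1.24.3"; then
        echo "recoll $installed is newer than the Buster package"
    fi
fi
```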
Usage
Some examples of queries (see [1]):
"foo bar"
foo bar ext:pdf
oracle filename:CRYPTO
oracle filename:*CRYPTO* # idem
filename:photo size>1M
author:"john doe"
Beatles OR Lennon Live OR Unplugged -potatoes
dir:recoll dir:src -dir:utils -dir:common
dir:recoll OR dir:src
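Queries like these can also be assembled in a script. The sketch below combines two tricks from the review: capitalizing a term to switch off stemming, and adding `ext:` to restrict the file type. `exact_term` and `with_ext` are hypothetical helper names. The commented-out last line assumes that `recoll -t` runs the query on the command line instead of opening the GUI; verify this against your installed version.

```shell
#!/bin/sh
# exact_term WORD   (hypothetical helper)
# Capitalizes the first letter so that Recoll does not stem the term.
exact_term() {
    first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}

# with_ext QUERY EXT   (hypothetical helper)
# Appends an ext: clause to restrict results to one file extension.
with_ext() {
    printf '%s ext:%s\n' "$1" "$2"
}

query=$(with_ext "$(exact_term floor)" pdf)
echo "$query"    # prints: Floor ext:pdf

# Assumption: "recoll -t" runs the query on the command line instead of the GUI.
# recoll -t "$query"
```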