Recoll

Recoll is a very fast and lightweight file indexer.

Overview

Favorite in TuxRadar.com's review:

  • The index can be built manually.
  • Additional filters can be installed to support more file types; multiple indexes are supported.

Personal review:

  • (+) Quite fast query results
  • (+) Can index IMAP Maildir directly, even attachments!
  • (+) Also returns files matching similar keywords (stemming).
  • (-) Indexing hangs on some .zip files (using 5G+ of memory and a lot of swapping). Workaround: removed *.jar, *.zip, *.tgz, *.gz, *.tar from the list of files to index.
  • (-) Low relevance, aggravated by the point above. Files with the search keywords in the filename, in the title, or at the beginning of the file are not returned first. Stemming makes it even worse.
    • Fix: According to the documentation, stemming can be disabled per keyword by capitalizing it (floor will find flooring and floored, but Floor will only return floor). Stemming can also be disabled from the menu.
  • (-) No way to narrow the results down to PDF only or DOC only (both are considered text).
    • Fix: Make a new query and add ext:pdf to restrict it to PDFs.
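
The archive exclusion from the review can also be set in the configuration file instead of the GUI. The following is a minimal sketch, assuming the index configuration lives in ~/.recoll/recoll.conf (or under $RECOLL_CONFDIR) and that the skippedNames parameter holds the file name patterns to exclude. The helper name skip_archives is made up for this example. Setting skippedNames may replace the built-in default list rather than extend it, so check the manual for your version before relying on it.

```shell
# Hypothetical helper: append a skippedNames line to a Recoll config file.
# Argument: path of the config file (default: ~/.recoll/recoll.conf,
# or $RECOLL_CONFDIR/recoll.conf if that variable is set).
skip_archives() {
    conf="${1:-${RECOLL_CONFDIR:-$HOME/.recoll}/recoll.conf}"
    mkdir -p "$(dirname "$conf")"
    # Leave the file alone if it already sets skippedNames,
    # so that repeated calls do not add duplicate entries.
    if grep -q '^skippedNames' "$conf" 2>/dev/null; then
        echo "skippedNames already set in $conf" >&2
        return 0
    fi
    # Same patterns as removed in the review above
    echo 'skippedNames = *.jar *.zip *.tgz *.gz *.tar' >> "$conf"
}

# Usage (edits the real configuration, so it is left commented out):
# skip_archives
# recollindex        # update the index with the new settings
```

The index has to be updated afterwards (for example with recollindex) before the change has any effect.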

Install

Ubuntu

The version in the Ubuntu universe repository is outdated. Add the backport PPAs from launchpad.net as follows:

# Importing the keys manually is no longer necessary:
# gpg --keyserver keyserver.ubuntu.com --recv 9DA85604
# gpg --export --armor 9DA85604 | sudo apt-key add -
# gpg --keyserver keyserver.ubuntu.com --recv A0735AD0
# gpg --export --armor  A0735AD0 | sudo apt-key add -

sudo add-apt-repository ppa:xapian-backports/ppa

# Old repository
# sudo add-apt-repository ppa:recoll-backports/ppa

# New repository
sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on

sudo apt-get update
sudo apt-get install recoll

# Install recommended packages
sudo apt-get install antiword catdoc ghostscript libimage-exiftool-perl poppler-utils \
                     unrtf python-mutagen xsltproc untex pstotext python-chm

Debian

I had indexing issues (deadlocks) with the version in Debian Buster (1.24.3), so I install a newer version from the upstream repository at www.lesbonscomptes.com instead.

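# Get signature key (pool.sks-keyservers.net has since been shut down; if the
# request fails, try another keyserver such as keyserver.ubuntu.com)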
gpg --keyserver pool.sks-keyservers.net --recv-key F8E3347256922A8AE767605B7808CE96D38B9201
gpg --export '7808CE96D38B9201' | sudo apt-key add -
# Old key:
# gpg --keyserver hkp://pool.sks-keyservers.net:80 --recv-key 7808CE96D38B9201
# gpg --export '7808CE96D38B9201' | sudo apt-key add -
# Add repository
cat << EOF | sudo tee /etc/apt/sources.list.d/recoll.list
deb https://www.lesbonscomptes.com/recoll/debian/ buster main
deb-src https://www.lesbonscomptes.com/recoll/debian/ buster main
EOF
# Preferences
cat << EOF | sudo tee /etc/apt/preferences.d/recoll
Package: *
Pin: origin "www.lesbonscomptes.com"
Pin-Priority: 350
EOF
# Update and install
sudo apt update
sudo apt install recoll

# Install recommended packages, for better indexing
# These were already installed: antiword ghostscript libimage-exiftool-perl poppler-utils unrtf pstotext
sudo apt install catdoc python-chm python-mutagen untex xsltproc

# Install packages suggested by 'recoll'
sudo apt install xapian-tools python3-genshi python-recoll python3-recoll wv
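
To confirm that the pinned repository really supplied something newer than Buster's 1.24.3, the installed version can be compared against it. This is a rough sketch: the function names version_gt and check_recoll_version are made up, the comparison relies on GNU sort -V rather than Debian's own version ordering, and the Debian revision suffix is simply stripped. Running apt-cache policy recoll additionally shows which repository and pin priority the candidate version comes from.

```shell
# True if version $1 is strictly newer than version $2 (needs GNU sort -V).
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical check: is the installed recoll newer than Buster's 1.24.3?
check_recoll_version() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found, is this a Debian-like system?" >&2
        return 2
    fi
    # Strip the Debian revision ("1.26.3-1" becomes "1.26.3") before comparing
    installed=$(dpkg-query -W -f='${Version}' recoll 2>/dev/null | sed 's/-[^-]*$//')
    if [ -z "$installed" ]; then
        echo "recoll is not installed" >&2
        return 2
    fi
    if version_gt "$installed" 1.24.3; then
        echo "recoll $installed is newer than 1.24.3"
    else
        echo "recoll $installed is not newer than 1.24.3"
        return 1
    fi
}

# Usage:
# check_recoll_version
# apt-cache policy recoll
```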

Usage

Some examples of queries (see [1]):

"foo bar"
foo bar ext:pdf
oracle filename:CRYPTO
oracle filename:*CRYPTO*                # same as the previous query
filename:photo size>1M
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
dir:recoll dir:src -dir:utils -dir:common
dir:recoll OR dir:src
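
The two fixes from the personal review (capitalized terms to disable stemming, ext: to restrict the file type) can be combined in a small wrapper for terminal use. This is a sketch under assumptions: the function names capitalize, build_query and run_query are made up, and it assumes that either the recollq program or the text mode of the GUI binary (recoll -t) is available on the system.

```shell
# Capitalize the first letter of a word ("floor" becomes "Floor"),
# which disables stemming for that term in the Recoll query language.
capitalize() {
    first=$(printf '%s' "$1" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$1" | cut -c2-)
    printf '%s%s' "$first" "$rest"
}

# Build a query string from an extension (may be empty) and search terms.
# Every term is capitalized; a non-empty extension adds an ext: clause.
build_query() {
    ext=$1
    shift
    q=""
    for term in "$@"; do
        q="$q$(capitalize "$term") "
    done
    if [ -n "$ext" ]; then
        q="${q}ext:$ext"
    fi
    # Drop the trailing space left over when no ext: clause was added
    printf '%s\n' "${q% }"
}

# Run a query string with whichever command-line front end is installed.
run_query() {
    if command -v recollq >/dev/null 2>&1; then
        recollq "$1"
    elif command -v recoll >/dev/null 2>&1; then
        recoll -t -q "$1"
    else
        echo "recoll is not installed; the query would be: $1" >&2
        return 1
    fi
}

# Usage:
# run_query "$(build_query pdf floor plan)"
```

Here build_query pdf floor plan produces the query string Floor Plan ext:pdf, which matches only the exact words and only PDF files.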