
steffenfritz/FileTrove


License: AGPL v3

VERSION: v1.0.0-BETA.1

NOTE: Because BETA.1 introduced YARA-X and builds are not yet automated, you have to use release v1.0.0-BETA-16 (without YARA support) or build FileTrove yourself; see below for instructions.

About

FileTrove indexes files and creates metadata from them.

This single-binary application walks a directory tree and identifies the type of every regular file with siegfried, which gives you the

  • MIME type
  • PRONOM identifier
  • Format version
  • Identification proof and note
  • Filename extension

os.Stat() gives you the

  • File size
  • File creation time
  • File modification time
  • File access time
  • The same values for directories

Furthermore it creates and calculates

For the NSRL check, a 4.0 GB BoltDB database is needed, which FileTrove can download during installation.

You can also create your own database for the NSRL check. All you need is a text file with SHA1 hashes, one per line, and the tool admftrove from this repository. admftrove can also add your own hashes to an existing database.

All results are written to a SQLite database and can be exported to TSV files.

How to install

  1. Download a release from https://github.com/steffenfritz/FileTrove/releases or compile from source by running task build (https://taskfile.dev) in cmd/ftrove.

  2. Copy the binary to the directory where you want to install ftrove. (The downloaded file name has a suffix, which is omitted in the rest of this documentation.)

  3. Run ./ftrove --install . (mind the trailing period)

    a) If you don't already have an NSRL database, you have to download it. Please be patient; the file is about 4.0 GB.

    b) If you already have an NSRL database, copy or move it to the "db" directory that ftrove just created.

  4. You are ready to go!

A word on YARA

The YARA module needs the YARA-X C library, which is not part of FileTrove and is not yet installed during FileTrove's installation. You have to install or build it for your platform. More information can be found here: https://virustotal.github.io/yara-x/docs/api/c/c-/#building-the-c-library

An example YARA rule file can be found in the testdata/yara directory of this repository.

If a rule matches a file, the rule name, the session UUID, and the file UUID are written to the table yara.

The YARA rule file itself is not stored in FileTrove's database.

To compile FileTrove with YARA-X support

  1. Install Golang: https://go.dev/doc/install
  2. Install Task build tool: https://taskfile.dev
  3. Install the YARA-X C library: https://virustotal.github.io/yara-x/docs/api/c/c-/#building-the-c-library
  4. Clone this repository into your Go workspace (e.g. /home/user/go/src): git clone https://github.com/steffenfritz/FileTrove.git
  5. Change into the cmd/ftrove directory of the clone, e.g. cd /home/user/go/src/FileTrove/cmd/ftrove
  6. Start build: task build

How to run

./ftrove -h lists all flags that ftrove understands.

A run with only the required flags looks like this:

./ftrove -i $DIRECTORY

where $DIRECTORY is the directory you want to use as the starting point. FileTrove walks this directory recursively.

How to see the results

You can export the results with ./ftrove -t $UUID, where $UUID is the session ID. Every indexing run gets its own session ID. ./ftrove -l lists all sessions.

Example:

  1. ./ftrove -l
  2. ./ftrove -t 926be141-ab75-4106-8236-34edfcf102f2

This creates several TSV files that you can open with Excel, Numbers, or your preferred text editor.

You can also query the database directly with SQL, using the SQLite command-line shell or a GUI such as sqlitebrowser (https://sqlitebrowser.org/). Sqliteviz (https://sqliteviz.com/app/#/) is also a neat tool for visualizing the data.

Background

FileTrove is the successor of filedriller and is based on my iPRES 2021 paper "Marrying siegfried and the National Software Reference Library".