A quick and scrappy CLI tool for automating archival of historical documents.
robo_archiver bulk-processes periodical issues, using their file names and MARC record, into the CSV structure used by the Arizona Memory Project.
Apparently, a lot of archival work is still done by hand, and many of the fields are redundant in context. The aim of this program is to streamline the process for archivists.
The program does not currently attempt to extract information (such as volume no., issue no., publisher, languages, or contributors) from the contents of the file.
This could perhaps be accomplished with help from a PDF-to-text utility and an AI API call or two, but for now I reckon the time needed to review the results would outweigh the cost of entering that information manually.
Run the application with -h or --help for a full list of commands.
All files for the same periodical should begin with the same name and end with their date in partial yyyy-mm-dd format (year only, year and month, or the full date).
Example files
An_Arizona_Desert-ation_1967.pdf
An_Arizona_Desert-ation_1967-04.pdf
An_Arizona_Desert-ation_1967-06-20.pdf
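To make the naming convention concrete, here is a minimal sketch of how a file name could be split into a periodical name and a partial date. It is not the tool's actual implementation: the names PartialDate and split_name_and_date are hypothetical, and it assumes the date is separated from the name by the last underscore, as in the example files above.

```rust
use std::path::Path;

/// A partial publication date: the year is required, month and day are optional.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct PartialDate {
    pub year: u16,
    pub month: Option<u8>,
    pub day: Option<u8>,
}

/// Parses an optional two-part date component and checks that it is within 1..=max.
/// Returns None if the component is present but invalid.
fn parse_component(part: Option<&str>, max: u8) -> Option<Option<u8>> {
    match part {
        None => Some(None),
        Some(text) => {
            let value = text.parse::<u8>().ok()?;
            if (1..=max).contains(&value) {
                Some(Some(value))
            } else {
                None
            }
        }
    }
}

/// Splits a file name such as `An_Arizona_Desert-ation_1967-04.pdf` into the
/// periodical name and its trailing partial date.
/// Assumption: the date follows the last underscore in the file stem.
pub fn split_name_and_date(file_name: &str) -> Option<(String, PartialDate)> {
    // Drop the extension, if there is one.
    let stem = Path::new(file_name).file_stem()?.to_str()?;
    // Everything after the last underscore is treated as the date.
    let (name, date) = stem.rsplit_once('_')?;

    let mut parts = date.split('-');
    let year_text = parts.next()?;
    if year_text.len() != 4 || !year_text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year = year_text.parse::<u16>().ok()?;
    let month = parse_component(parts.next(), 12)?;
    let day = parse_component(parts.next(), 31)?;
    // Reject anything longer than yyyy-mm-dd.
    if parts.next().is_some() {
        return None;
    }

    Some((name.to_string(), PartialDate { year, month, day }))
}
```

Under these assumptions, split_name_and_date("An_Arizona_Desert-ation_1967-04.pdf") yields the name An_Arizona_Desert-ation with year 1967, month 4, and no day. The sketch only range-checks the day (1 to 31) and does not validate it against the month.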
MARC data and the call number are obtained from the asla catalogue and pasted in when prompted.
Example MARC
Tag Ind. Subfields
001 ocn893691141
003 OCoLC
005 20141024031649.0
008 141024u19uuuuuuazumr 0 0eng d
035 $a(Sirsi) o893691141
035 $a(OCoLC)893691141
040 $aAZP$cAZP
049 $aAZPF
245 03 $aAn Arizona desert-ation.
246 13 $aArizona desertation.
246 13 $aArizona desert ation.
246 13 $aArizona dissertation.
260 $aPhoenix, Ariz. :$bDesert Sunshine Exposure Tests.
300 $billustrations ;$c28 cm.
336 $atext$btxt$2rdacontent
337 $aunmediated$bn$2rdamedia
338 $avolume$bnc$2rdacarrier
500 $a"C.R. Caryl, director."
588 $aDescription based on: April 1967 ; title from caption
610 20 $aDesert Sunshine Exposure Tests (Phoenix, Ariz.)
650 0 $aSolar radiation$xEnvironmental effects$xTesting.
650 0 $aMaterials$xTesting.
700 1 $aCaryl, C. R.
710 2 $aDesert Sunshine Exposure Tests (Phoenix, Ariz.)
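For readers curious how a pasted record like the one above might be broken down, the following is a rough sketch of a line parser. It is not taken from the tool's source: MarcField and parse_marc_line are hypothetical names, and the sketch assumes each line starts with a three-digit tag, that indicators (if any) precede the first $, and that $ never appears inside subfield data.

```rust
/// One line of a pasted MARC display. (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum MarcField {
    /// A field without subfields, e.g. `001 ocn893691141`.
    Control { tag: String, value: String },
    /// A field with optional indicators and `$`-delimited subfields.
    Data {
        tag: String,
        indicators: String,
        subfields: Vec<(char, String)>,
    },
}

/// Parses a single line such as `245 03 $aAn Arizona desert-ation.`.
/// Returns None for lines that do not start with a three-digit tag
/// (for example the `Tag Ind. Subfields` header).
pub fn parse_marc_line(line: &str) -> Option<MarcField> {
    let (tag, rest) = line.trim().split_once(char::is_whitespace)?;
    if tag.len() != 3 || !tag.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let rest = rest.trim_start();

    match rest.find('$') {
        // No subfield marker: treat the remainder as a single value.
        None => Some(MarcField::Control {
            tag: tag.to_string(),
            value: rest.to_string(),
        }),
        Some(pos) => {
            // Text before the first `$` holds the indicators (possibly empty).
            let (indicators, subfield_text) = rest.split_at(pos);
            let subfields = subfield_text
                .split('$')
                .skip(1) // the piece before the first `$` is always empty
                .filter_map(|chunk| {
                    let mut chars = chunk.chars();
                    let code = chars.next()?; // subfield code, e.g. 'a'
                    Some((code, chars.as_str().to_string()))
                })
                .collect();
            Some(MarcField::Data {
                tag: tag.to_string(),
                indicators: indicators.trim().to_string(),
                subfields,
            })
        }
    }
}
```

With this sketch, the 260 line from the example splits into subfield a (Phoenix, Ariz. :) and subfield b (Desert Sunshine Exposure Tests.), while the 001 line becomes a control field whose value is ocn893691141. Blank indicator positions are not preserved, since the pasted display collapses them.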
On macOS you may need to run xcode-select --install to be able to compile macros.
- clone the repository
- run cargo build --release
Add <path_to_clone_directory>\target\release to your PATH, or alias it in your PowerShell $PROFILE.
Create an alias in .zshrc:
alias robo_archiver=<path_to_clone_directory>/target/release/robo_archiver
Alternatively, you may download and extract the latest release, store it somewhere safe, and alias its path instead.