Advanced topics
Paperless offers a couple of features that automate certain tasks and make your life easier.
Hooking into the consumption process
Sometimes you may want to do something arbitrary whenever a document is consumed. Rather than try to predict what you may want to do, Paperless lets you execute scripts of your own choosing just before or after a document is consumed, using a couple of simple hooks.
Just write a script, put it somewhere that Paperless can read & execute, and then put the path to that script in paperless.conf or docker-compose.env with the variable name of either PAPERLESS_PRE_CONSUME_SCRIPT or PAPERLESS_POST_CONSUME_SCRIPT.
Important
These scripts are executed in a blocking process, which means that if a script takes a long time to run, it can significantly slow down your document consumption flow. If you want things to run asynchronously, you’ll have to fork the process in your script and exit.
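As a minimal sketch of the fork-and-exit approach described above, a hook written in Python could start the slow work as a detached child process and return right away. The helper name launch_detached and the placeholder job are hypothetical, and start_new_session only applies on POSIX systems.

```python
#!/usr/bin/env python3
"""Hook sketch: hand slow work to a detached child so the consumer is not blocked."""
import subprocess
import sys


def launch_detached(command):
    """Start `command` in its own session and return its PID without waiting."""
    process = subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the caller's session
    )
    return process.pid


if __name__ == "__main__":
    # Placeholder for a long-running job (hypothetical); replace with real work.
    slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    launch_detached(slow_job)
    # The hook returns here while the child keeps running in the background.
```

Because the child's output is discarded in this sketch, a real job would need to write its own log file.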
Pre-consumption script
Executed after the consumer sees a new document in the consumption folder, but before any processing of the document is performed. This script can access the following environment variable:
DOCUMENT_SOURCE_PATH
A simple but common example would be a script like this:
/usr/local/bin/ocr-pdf
#!/usr/bin/env bash
# Quote the path so that file names containing spaces are passed as one argument.
pdf2pdfocr.py -i "${DOCUMENT_SOURCE_PATH}"
/etc/paperless.conf
...
PAPERLESS_PRE_CONSUME_SCRIPT="/usr/local/bin/ocr-pdf"
...
This will pass the path to the document about to be consumed to /usr/local/bin/ocr-pdf, which will in turn call pdf2pdfocr.py on your document. pdf2pdfocr.py will then overwrite the file with an OCR’d version and exit, at which point the consumption process will begin with the newly modified file.
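The same hook could also be written in Python. The sketch below builds the pdf2pdfocr.py command from DOCUMENT_SOURCE_PATH as an argument list, which keeps paths containing spaces intact. The helper name build_ocr_command is hypothetical, and pdf2pdfocr.py is assumed to be on the PATH, as in the shell example.

```python
#!/usr/bin/env python3
"""Pre-consumption hook sketch: run pdf2pdfocr.py on the incoming document."""
import os
import subprocess


def build_ocr_command(environ):
    """Return the argument list for pdf2pdfocr.py, or None if the path is unset."""
    source_path = environ.get("DOCUMENT_SOURCE_PATH")
    if not source_path:
        return None
    # An argument list avoids shell word splitting on spaces in the file name.
    return ["pdf2pdfocr.py", "-i", source_path]


if __name__ == "__main__":
    command = build_ocr_command(os.environ)
    if command is not None:
        # check=True raises CalledProcessError if the OCR tool exits non-zero.
        subprocess.run(command, check=True)
```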
Post-consumption script
Executed after the consumer has successfully processed a document and has moved it into paperless. It receives the following environment variables:
DOCUMENT_ID
DOCUMENT_FILE_NAME
DOCUMENT_CREATED
DOCUMENT_MODIFIED
DOCUMENT_ADDED
DOCUMENT_SOURCE_PATH
DOCUMENT_ARCHIVE_PATH
DOCUMENT_THUMBNAIL_PATH
DOCUMENT_DOWNLOAD_URL
DOCUMENT_THUMBNAIL_URL
DOCUMENT_CORRESPONDENT
DOCUMENT_TAGS
The script can be in any language, but for a simple shell script example, you can take a look at post-consumption-example.sh in this project.
The post consumption script cannot cancel the consumption process.
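For a hook written in Python instead of shell, one possible starting point is to gather the variables listed above into a dictionary. This is only a sketch: collect_document_info is a hypothetical helper, and printing JSON stands in for whatever the hook should really do.

```python
#!/usr/bin/env python3
"""Post-consumption hook sketch: gather the DOCUMENT_* variables into a dict."""
import json
import os

# Variable names as listed in the section above.
POST_CONSUME_VARIABLES = (
    "DOCUMENT_ID",
    "DOCUMENT_FILE_NAME",
    "DOCUMENT_CREATED",
    "DOCUMENT_MODIFIED",
    "DOCUMENT_ADDED",
    "DOCUMENT_SOURCE_PATH",
    "DOCUMENT_ARCHIVE_PATH",
    "DOCUMENT_THUMBNAIL_PATH",
    "DOCUMENT_DOWNLOAD_URL",
    "DOCUMENT_THUMBNAIL_URL",
    "DOCUMENT_CORRESPONDENT",
    "DOCUMENT_TAGS",
)


def collect_document_info(environ):
    """Return the listed variables that are present in `environ`."""
    return {name: environ[name] for name in POST_CONSUME_VARIABLES if name in environ}


if __name__ == "__main__":
    # Print the collected values as JSON; a real hook might notify or archive instead.
    print(json.dumps(collect_document_info(os.environ), indent=2))
```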
Docker
Assume you have /home/foo/paperless-ngx/scripts/post-consumption-example.sh. You can pass that script into the consumer container via a host mount in your docker-compose.yml.
...
consumer:
  ...
  volumes:
    ...
    - /home/foo/paperless-ngx/scripts:/path/in/container/scripts/
...
Example (docker-compose.yml): - /home/foo/paperless-ngx/scripts:/usr/src/paperless/scripts
This in turn requires the variable PAPERLESS_POST_CONSUME_SCRIPT in docker-compose.env to point to /path/in/container/scripts/post-consumption-example.sh.
Example (docker-compose.env): PAPERLESS_POST_CONSUME_SCRIPT=/usr/src/paperless/scripts/post-consumption-example.sh
Troubleshooting:
Monitor the docker-compose log
cd ~/paperless-ngx; docker-compose logs -f
Check your script’s permissions, e.g. in case of a permission error
sudo chmod 755 post-consumption-example.sh
Pipe your script’s output to a log file, e.g.
echo "${DOCUMENT_ID}" | tee --append /usr/src/paperless/scripts/post-consumption-example.log
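The permission check and the logging step above could be sketched in Python as follows. Both helper names are hypothetical. The check only inspects the owner's permission bits (the ones chmod 755 sets); it does not verify which user actually runs the script inside the container.

```python
import os
import stat


def has_owner_read_execute(script_path):
    """Return True if the file is regular and its owner may read and execute it."""
    mode = os.stat(script_path).st_mode
    required = stat.S_IRUSR | stat.S_IXUSR
    return stat.S_ISREG(mode) and (mode & required) == required


def append_log(log_path, message):
    """Append one line to the log file, creating the file if needed."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(message + "\n")
```

Inside a hook, append_log could be called with os.environ["DOCUMENT_ID"] to mirror the tee command shown above.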
File name handling
By default, paperless stores your documents in the media directory and renames them using the identifier which it has assigned to each document. You will end up getting files like 0000123.pdf in your media directory. This isn’t necessarily a bad thing, because you normally don’t have to access these files manually. However, if you wish to name your files differently, you can do that by adjusting the PAPERLESS_FILENAME_FORMAT configuration option.
This variable allows you to configure the filename (folders are allowed) using placeholders. For example, configuring this to
PAPERLESS_FILENAME_FORMAT={created_year}/{correspondent}/{title}
will create a directory structure as follows:
2019/
  My bank/
    Statement January.pdf
    Statement February.pdf
2020/
  My bank/
    Statement January.pdf
    Letter.pdf
    Letter_01.pdf
  Shoe store/
    My new shoes.pdf
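To illustrate how the placeholders expand into the paths above, here is a small Python sketch that uses str.format as a stand-in. It is not how Paperless itself implements the feature; render_filename is a hypothetical helper and the field values are sample data taken from the tree above.

```python
def render_filename(filename_format, fields, extension=".pdf"):
    """Fill the placeholders in `filename_format` and append the extension."""
    return filename_format.format(**fields) + extension


path = render_filename(
    "{created_year}/{correspondent}/{title}",
    {"created_year": "2019", "correspondent": "My bank", "title": "Statement January"},
)
print(path)  # 2019/My bank/Statement January.pdf
```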
Danger
Do not manually move your files in the media folder. Paperless remembers the last filename a document was stored as. If you do rename a file, paperless will report your files as missing and won’t be able to find them.
Paperless provides the following placeholders within filenames:
{asn}: The archive serial number of the document, or “none”.
{correspondent}: The name of the correspondent, or “none”.
{document_type}: The name of the document type, or “none”.
{tag_list}: A comma separated list of all tags assigned to the document.
{title}: The title of the document.
{created}: The full date (ISO format) the document was created.
{created_year}: Year created only.
{created_month}: Month created only (number 01-12).
{created_day}: Day created only (number 01-31).
{added}: The full date (ISO format) the document was added to paperless.
{added_year}: Year added only.
{added_month}: Month added only (number 01-12).
{added_day}: Day added only (number 01-31).
Paperless will try to conserve the information from your database as much as possible. However, some characters that you can use in document titles and correspondent names (such as : \ / and a couple more) are not allowed in filenames and will be replaced with dashes.
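The replacement described above can be pictured with a short Python sketch. It covers only the three characters named in the text; the complete set that Paperless replaces is not listed here, so FORBIDDEN_CHARACTERS is an incomplete assumption and replace_forbidden is a hypothetical helper.

```python
# Only the three characters named in the text; the real set is larger.
FORBIDDEN_CHARACTERS = ":\\/"


def replace_forbidden(value):
    """Replace every forbidden character in `value` with a dash."""
    return "".join("-" if char in FORBIDDEN_CHARACTERS else char for char in value)


print(replace_forbidden("Invoice: 2020/01"))  # Invoice- 2020-01
```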
If paperless detects that two documents share the same filename, paperless will automatically append _01, _02, etc. to the filename. This happens if all the placeholders in a filename evaluate to the same value.
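As a rough sketch of the suffix scheme just described (not the actual Paperless code), the following Python function picks the first free _NN variant of a name. The function name deduplicate and the taken set are hypothetical.

```python
import os


def deduplicate(filename, taken):
    """Return `filename`, or the first `_NN` variant not already in `taken`."""
    if filename not in taken:
        return filename
    stem, extension = os.path.splitext(filename)
    counter = 1
    while True:
        candidate = f"{stem}_{counter:02d}{extension}"
        if candidate not in taken:
            return candidate
        counter += 1


existing = {"2020/My bank/Letter.pdf"}
print(deduplicate("2020/My bank/Letter.pdf", existing))  # 2020/My bank/Letter_01.pdf
```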
Hint
You can affect how empty placeholders are treated by changing the following setting to true.
PAPERLESS_FILENAME_FORMAT_REMOVE_NONE=True
Doing this results in all empty placeholders resolving to “” instead of “none” as stated above. Spaces before empty placeholders are removed as well, and empty directories are omitted.
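The three effects of this setting (empty placeholders vanish, spaces before them are dropped, empty directories are omitted) can be mimicked in a few lines of Python. This is an illustration of the described behaviour under simplifying assumptions, not the Paperless implementation; render_without_none is a hypothetical helper.

```python
import re


def render_without_none(filename_format, fields):
    """Render the format, dropping empty placeholders and empty directories."""
    for name, value in fields.items():
        if not value:
            # Remove the placeholder together with any spaces directly before it.
            pattern = r" *\{" + re.escape(name) + r"\}"
            filename_format = re.sub(pattern, "", filename_format)
    rendered = filename_format.format(**fields)
    # Skip path segments that ended up empty.
    return "/".join(part for part in rendered.split("/") if part)


fields = {"correspondent": None, "title": "Letter", "asn": None}
print(render_without_none("{correspondent}/{title} {asn}", fields))  # Letter
```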
Hint
Paperless checks the filename of a document whenever it is saved. Therefore, after altering this setting, you need to invoke the document renamer to update the filenames of your existing documents and move them.
Warning
Make absolutely sure you get the spelling of the placeholders right, or else paperless will use the default naming scheme instead.
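One way to catch a misspelled placeholder before changing the setting is to compare the format string against the list of placeholders given above. The sketch below does this with Python's string.Formatter; unknown_placeholders is a hypothetical helper and is not part of Paperless.

```python
from string import Formatter

# Placeholder names as listed in this section.
KNOWN_PLACEHOLDERS = {
    "asn", "correspondent", "document_type", "tag_list", "title",
    "created", "created_year", "created_month", "created_day",
    "added", "added_year", "added_month", "added_day",
}


def unknown_placeholders(filename_format):
    """Return the placeholder names in the format that are not in the known list."""
    names = [
        field
        for _, field, _, _ in Formatter().parse(filename_format)
        if field is not None
    ]
    return [name for name in names if name not in KNOWN_PLACEHOLDERS]


print(unknown_placeholders("{created_year}/{corespondent}/{title}"))  # ['corespondent']
```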
Caution
As of now, you could totally tell paperless to store your files anywhere outside the media directory by setting
PAPERLESS_FILENAME_FORMAT=../../my/custom/location/{title}
However, keep in mind that inside docker, if files get stored outside of the predefined volumes, they will be lost after a restart of paperless.
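A rough way to spot such a format is to normalise its literal path and see whether it climbs above its starting directory. The following Python sketch is an assumption-laden check on the format string only (it ignores what the placeholders expand to); leaves_media_directory is a hypothetical helper.

```python
import posixpath


def leaves_media_directory(filename_format):
    """Return True if the format's literal path points outside its base directory."""
    normalized = posixpath.normpath(filename_format)
    return (
        normalized == ".."
        or normalized.startswith("../")
        or posixpath.isabs(normalized)
    )


print(leaves_media_directory("../../my/custom/location/{title}"))  # True
print(leaves_media_directory("{created_year}/{title}"))  # False
```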
Storage paths
One of the best things in Paperless is that you can not only access the documents via the web interface, but also via the file system.
When a single storage layout is not sufficient for your use case, storage paths come to the rescue. Storage paths allow you to configure more precisely where each document is stored in the file system.
Each storage path is a PAPERLESS_FILENAME_FORMAT and follows the rules described above.
Each document is assigned a storage path using the matching algorithms described above, but the assignment can be overridden at any time.
For example, you could define the following two storage paths:
Normal communications are put into a folder structure sorted by year/correspondent
Communications with insurance companies are stored in a flat structure with longer file names, but containing the full date of the correspondence.
By Year = {created_year}/{correspondent}/{title}
Insurances = Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title}
If you then map these storage paths to the documents, you might get the following result. For simplicity, By Year defines the same structure as in the previous example.
2019/                                   # By Year
  My bank/
    Statement January.pdf
    Statement February.pdf

Insurances/                             # Insurances
  Healthcare 123/
    2022-01-01 Statement January.pdf
    2022-02-02 Letter.pdf
    2022-02-03 Letter.pdf
  Dental 456/
    2021-12-01 New Conditions.pdf
Hint
Defining a storage path is optional. If no storage path is defined for a document, the global PAPERLESS_FILENAME_FORMAT is applied.
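The fallback rule can be summarised in a short Python sketch: use the document's storage path format if it has one, otherwise the global format. The names below are hypothetical, GLOBAL_FILENAME_FORMAT merely stands in for the value of PAPERLESS_FILENAME_FORMAT, and str.format is used as a simplified stand-in for the real placeholder expansion.

```python
# Stand-in for the value of PAPERLESS_FILENAME_FORMAT (sample value from above).
GLOBAL_FILENAME_FORMAT = "{created_year}/{correspondent}/{title}"

# The "Insurances" storage path from the example above.
INSURANCES = "Insurances/{correspondent}/{created_year}-{created_month}-{created_day} {title}"


def effective_format(storage_path_format, global_format=GLOBAL_FILENAME_FORMAT):
    """Return the storage path format if one is set, else the global format."""
    return storage_path_format if storage_path_format else global_format


document = {
    "correspondent": "Healthcare 123",
    "created_year": "2022",
    "created_month": "01",
    "created_day": "01",
    "title": "Statement January",
}

print(effective_format(INSURANCES).format(**document))
# Insurances/Healthcare 123/2022-01-01 Statement January
print(effective_format(None).format(**document))
# 2022/Healthcare 123/Statement January
```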
Caution
If you adjust the format of an existing storage path, old documents don’t get relocated automatically. You need to run the document renamer to adjust their paths.