Will mailbag use EA-PDF? #8

Closed
gwiedeman opened this issue Jun 29, 2021 · 0 comments
Labels
Derivatives Writing data to derivative formats, like PDFs or WARCs

Comments

@gwiedeman
Collaborator

The Mailbag specification will be agnostic on which type of PDF is included. It merely requires implementations to document the type of PDF, the tool used to generate it, and the version of the tool.

However, the Mailbag Python tool will generate PDF files. Two principles that the Mailbag project has thought a lot about are parsimony and maintenance. We wish to create a specification and tool that are as simple as possible, which we feel is very much in the spirit of BagIt. To that end, we must rely on dependencies that are widely implemented and thus likely to be sustainable. We haven’t completed our examination of possible dependencies, but it does not look like there is yet a Python implementation for generating EA-PDF documents. If that changes, we would look into compliance with EA-PDF and pursue it if it would not add much complexity or maintenance risk to the tool. In the interest of sustainability, we hope to make dependencies, such as the ones used to create PDFs and WARCs, modular, so that it would be easy to change the tools we use over time.
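
To illustrate what modular derivative dependencies could look like, here is a minimal sketch of a writer registry. It is not the Mailbag tool's actual design: the names `register`, `write_derivative`, and `placeholder_pdf_writer` are hypothetical, and the PDF writer is a stand-in that calls no real PDF library.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a derivative format name to the function
# that produces it. None of these names come from the Mailbag codebase.
DerivativeWriter = Callable[[str], bytes]
_WRITERS: Dict[str, DerivativeWriter] = {}


def register(fmt: str) -> Callable[[DerivativeWriter], DerivativeWriter]:
    """Return a decorator that registers a writer for the given format."""
    def decorator(func: DerivativeWriter) -> DerivativeWriter:
        _WRITERS[fmt] = func  # a later registration replaces an earlier one
        return func
    return decorator


@register("pdf")
def placeholder_pdf_writer(message_text: str) -> bytes:
    # Stand-in only: a real backend would call an actual PDF tool here.
    return b"PDF-PLACEHOLDER:" + message_text.encode("utf-8")


def write_derivative(fmt: str, message_text: str) -> bytes:
    """Look up the writer registered for fmt and run it."""
    try:
        writer = _WRITERS[fmt]
    except KeyError:
        raise ValueError(f"no writer registered for {fmt!r}") from None
    return writer(message_text)
```

With this shape, swapping the PDF tool means registering a different function under `"pdf"`; callers of `write_derivative` would not need to change.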

However, we also feel that one of the major affordances of mailbags is that they maintain email both in structured formats as data and in the form of PDF documents. Thus, it might be easier and more sustainable for downstream implementations working with mailbags in the future to parse email headers from an MBOX or EML file than from an EA-PDF document, which is likely to be more challenging to work with.
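
As a rough illustration of why structured formats are convenient for downstream tools, the sketch below reads headers from EML-style text using only Python's standard-library `email` package. The sample message, the `read_headers` helper, and the choice of headers are made up for the example; they are not part of the Mailbag specification.

```python
from email import message_from_string, policy

# Made-up sample message in EML (RFC 5322) form, for illustration only.
RAW_EML = """\
From: Alice Example <alice@example.org>
To: Bob Example <bob@example.org>
Subject: Quarterly report
Date: Tue, 29 Jun 2021 10:15:00 -0400
Message-ID: <1234@example.org>

Body text.
"""


def read_headers(raw, wanted=("From", "To", "Subject", "Date", "Message-ID")):
    """Parse raw EML text and return the requested headers as plain strings."""
    msg = message_from_string(raw, policy=policy.default)
    # msg[name] is None when the header is absent, so missing headers are skipped.
    return {name: str(msg[name]) for name in wanted if msg[name] is not None}


headers = read_headers(RAW_EML)
for name, value in headers.items():
    print(f"{name}: {value}")
```

For MBOX files, the standard-library `mailbox.mbox` class yields message objects that support the same header lookups.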

@gwiedeman gwiedeman added the Derivatives Writing data to derivative formats, like PDFs or WARCs label Aug 16, 2021