Use of original filenames #4
Comments
We've found that any type of file system metadata that we would like to retain, including file names, is far easier to preserve when copied to a separate text document like the proposed original_filenames.txt file. Is original_filenames.txt in the proposed spec yet? I'm curious whether it would be better to have a complete listing of all attachments, with optional original filenames, like an attachments.csv file with columns for mailbag-message-id, original-filename, mailbag-attachment-id.
Thanks for the feedback! I finally got to ask the advisory board about this, and while we think keeping original attachment filenames when possible is important to our user personas, we agree that the current We're now planning to replace
If possible, it might be nice to generalize the columns to something like "original-attachment" and "packaged-attachment" or similar, as that might allow for migrated attachments during packaging as per this comment.
Addressed in the mailbagit tool by UAlbanyArchives/mailbagit#187.
In drafting the specification, we discovered multiple places where there is the potential for cross-platform filename issues. This is one downside to relying on filesystems for structure, as not all strings are valid file or directory names.
During the packaging of a mailbag, all messages are assigned a new Mailbag-Message-ID that must be filename-safe. These IDs can be UUIDs or simply sequential numbers, and are used as filenames for new derivative files, such as when PDFs are created from an MBOX file. Unfortunately, we don’t feel that we can use the Message-ID field that’s usually included in emails, as it may not be filesystem-safe.
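As a rough illustration of the two ID schemes mentioned above, here is a minimal Python sketch. The function names (`uuid_ids`, `sequential_ids`, `derivative_name`) are made up for this example and are not taken from the specification or the mailbagit tool.

```python
import uuid
from itertools import count


def uuid_ids():
    """Yield random UUID strings (hex digits and hyphens only, so filename-safe)."""
    while True:
        yield str(uuid.uuid4())


def sequential_ids(start=1):
    """Yield sequential numbers as strings: "1", "2", "3", ..."""
    for n in count(start):
        yield str(n)


def derivative_name(mailbag_message_id, extension):
    """Build a derivative filename from an ID, e.g. a PDF made from an MBOX message."""
    return f"{mailbag_message_id}.{extension}"
```

Either generator produces IDs that avoid the characters that make a raw Message-ID header risky as a filename, such as angle brackets or slashes.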
The issue is that users may be packaging mailbags from EML files, or even legacy use cases with PDF files, and these files already have filenames for individual messages. While they should be coming from the same filesystem and thus be safe to use for derivative files as well, comments from the working meeting suggested that it may be simpler to use Mailbag-Message-ID even in these cases. Interestingly, commenters suggested that even filenames for EML files were not originally created by the user sending the messages. Still, we think Nicholas Garza and Gary Richardon in particular would be alarmed if these filenames were overridden during the creation of a mailbag. So we plan to keep the original filenames whenever possible.
Perhaps a bigger issue is for attachments. We plan to keep the original filenames for attachments, but files embedded within MBOX or EML files may not have been created in the same filesystem being used to package a mailbag. We still think it’s important to keep the original names here, but for cases when an attachment filename is invalid, it is now required to rename the file using the Mailbag-Message-ID and document changes in an original_filenames.txt file.
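To make the renaming rule concrete, here is a hedged Python sketch. Everything in it is an assumption for illustration: the validity rules (a conservative set based on Windows restrictions), the `<Mailbag-Message-ID>-<index>` naming pattern, and the tab-separated line format for original_filenames.txt are not defined in this issue, and the function names are hypothetical.

```python
import re

# Assumed rule set: characters rejected on Windows, plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
# Device names reserved on Windows, with or without an extension.
_RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_portable_filename(name):
    """Return True if the name passes the assumed cross-platform checks."""
    if not name or name in {".", ".."}:
        return False
    if _INVALID_CHARS.search(name):
        return False
    if name[-1] in " .":  # trailing space or dot is a problem on Windows
        return False
    if name.split(".")[0].upper() in _RESERVED:
        return False
    return len(name.encode("utf-8")) <= 255


def package_attachment_name(original_name, mailbag_message_id, index, renames):
    """Keep a valid original name; otherwise rename and record the change."""
    if is_portable_filename(original_name):
        return original_name
    # Carry over the extension only when it is plain alphanumeric.
    match = re.search(r"\.([A-Za-z0-9]{1,10})$", original_name)
    suffix = f".{match.group(1)}" if match else ""
    new_name = f"{mailbag_message_id}-{index}{suffix}"
    renames.append((new_name, original_name))
    return new_name


def format_rename_lines(renames):
    """One line per rename (assumed format): new name, a tab, escaped original name."""
    return [
        f"{new}\t{original.encode('unicode_escape').decode('ascii')}"
        for new, original in renames
    ]
```

With these assumptions, an attachment named `report: final?.pdf` on message `42` would be packaged as `42-2.pdf` (if it is the second attachment), while `notes.txt` would keep its name. The lines from `format_rename_lines` could then be written to original_filenames.txt.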