Use of mailbag.csv #5
Comments
Is there a way, in the mailbag.csv file or elsewhere, to indicate a one-to-many relationship among derivatives, for example when one or a few PST files are converted into EML?
Thank you for your comment. Currently no, and the challenge of documenting this type of relationship is one of the main reasons the Advisory board was hesitant about including multiple email accounts per #8. That said, multiple PSTs would not necessarily mean multiple accounts, so we definitely need to discuss this more. I could see multiple exports from the same account over time being a common use case.
Currently, Office 365's email export tool cuts PST files off at around 10 GB. While that's liable to change over time, in my experience our recent email account exports have been 1-3 PST files each, and they will continue to grow. Allowing for email accounts that comprise multiple PSTs will make the specification more widely applicable and scalable, whether that's handled in mailbag.csv, in the subfolder structure, or left to the user to document elsewhere.
As of version 0.3, the specification now supports multiple email accounts, including multiple PST files. Thanks for your feedback! I'll close this, but feel free to reopen if the changes don't address your use case.
The choice of using a CSV tag file to serialize message-level information was also questioned during the working meeting. CSVs can introduce errors since they can be written using a variety of different delimiters and dialects. Large numbers of rows may also create issues, as different tools have limits, often around 1 million rows. We had some useful discussions about using JSON or another serialization that did not have these issues, but concluded that CSVs were more useful for the Nicholas Garza, Teresa Burns, and Gary Richardson personas, since they are likely to be more comfortable opening and reading a CSV file using spreadsheet software than a JSON file. A suggestion from the working meeting was to break up the CSV into multiple files after a certain number of rows, much like WARC files, so we decided to split the file after 100,000 rows.
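As a rough illustration of the row-based split described above, here is a minimal Python sketch. It assumes each split file repeats the header row and uses a hypothetical `mailbag-N.csv` naming pattern; neither detail is taken from the specification, so treat both as placeholders.

```python
import csv
import io
from itertools import islice


def split_rows(rows, max_rows=100_000):
    """Yield lists of at most max_rows rows each."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, max_rows))
        if not chunk:
            return
        yield chunk


def write_split_csv(header, rows, max_rows=100_000):
    """Return {filename: csv_text}, one entry per chunk.

    The header is repeated in every chunk, and the file names are
    hypothetical; check the specification for the real convention.
    """
    files = {}
    for index, chunk in enumerate(split_rows(rows, max_rows), start=1):
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer)
        writer.writerow(header)
        writer.writerows(chunk)
        files[f"mailbag-{index}.csv"] = buffer.getvalue()
    return files
```

Building the files in memory keeps the sketch self-contained; a real tool would more likely stream each chunk straight to disk.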
We also discussed how the specification’s requirement of a separate mailbag.csv tag file is one of the few major costs in meeting the specification over a generic BagIt bag. In reconsidering this, we realized that the reason this CSV file was required was that it pointed to where messages were within the payload directory and also acted as a lookup between the Message-ID and filename-safe Mailbag-Message-ID fields. We had originally required message header information in the mailbag.csv as well, but we’ve decided that this should be optional. Feedback from the working meeting also suggested including a column for attachments, so we added an integer field for the number of attachments.
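To make the lookup role of mailbag.csv concrete, the following Python sketch reads a small CSV and maps each Message-ID to its Mailbag-Message-ID, location, and attachment count. The `Message-Path` and `Attachments` column names and the sample rows are assumptions made up for illustration; only `Message-ID` and `Mailbag-Message-ID` are named in the discussion above.

```python
import csv
import io

# Illustrative sample only; the column names "Message-Path" and
# "Attachments" and all values are hypothetical.
SAMPLE = """Mailbag-Message-ID,Message-ID,Message-Path,Attachments
1,<abc@example.org>,Inbox,2
2,<def@example.org>,Inbox/Projects,0
"""


def load_lookup(csv_text):
    """Map each Message-ID to its filename-safe ID, path, and attachment count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lookup = {}
    for row in reader:
        lookup[row["Message-ID"]] = {
            "mailbag_message_id": row["Mailbag-Message-ID"],
            "path": row["Message-Path"],
            # The attachment count is stored as an integer field.
            "attachments": int(row["Attachments"]),
        }
    return lookup


lookup = load_lookup(SAMPLE)
```

With a structure like this, a tool could resolve a message's original Message-ID to its file within the payload directory without parsing the message itself.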