About Metadata


What is metadata?

Metadata is "data about data". For example, when you take a photograph using a phone or a digital camera, the make and model of the camera, the date and time the photo was taken, the exposure time and so on will normally be stored within the JPG file in a format known as EXIF. When you view the photo on your phone or computer you will normally see just the photo itself at first, but you will usually be able to select a "file information" or "properties" option (the exact name depends on the app you are using) which displays that metadata.
EXIF data is only one example of metadata. Other file types may have metadata embedded in other forms. For example, Word documents contain metadata recording the date and time the document was created, the date and time it was last changed, and the identity of the user who made that change.
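As an illustration of embedded metadata, the Python sketch below reads the "who and when" properties of a Word document in the newer .docx format. It assumes, as is the case for Office Open XML files, that the document is a ZIP package containing a `docProps/core.xml` part. The function name `docx_core_properties` is hypothetical, and older binary .doc files store their properties differently and are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML core-properties part
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def docx_core_properties(path):
    """Return the embedded creator/modifier and created/modified dates of a .docx."""
    # A .docx file is a ZIP package; the core metadata lives in docProps/core.xml
    with zipfile.ZipFile(path) as pkg:
        try:
            xml_bytes = pkg.read("docProps/core.xml")
        except KeyError:
            return {}  # some packages have no core-properties part
    root = ET.fromstring(xml_bytes)

    def text(tag):
        element = root.find(tag, NS)
        return element.text if element is not None else None

    return {
        "creator": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "created": text("dcterms:created"),
        "modified": text("dcterms:modified"),
    }
```

For example, `docx_core_properties("report.docx")` (a hypothetical file) would return a dictionary whose `modified` and `last_modified_by` entries correspond to the date the document was last changed and the user who changed it. Because these values are stored inside the file, they travel with every copy of it.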
Exactly what counts as metadata for any particular kind of file is not rigidly defined. Metadata is essentially data which is not included when the file is printed out or when an image copy (e.g. a PDF copy) is made from it, but what is printed out (or included in a PDF copy) depends on the app used to do the printing or copying, and so may vary to some degree with the app used and the print/copy options selected.
For example, you may have noticed that the format of a printed email varies depending on the app used to print it. Attachments may be listed at the top above the text or at the bottom below it, and their sizes may or may not be stated. The date may always be given in day-month-year format, or may be shown as just "Today", "Yesterday", "Monday" etc., with day-month-year format used only if the email is more than seven days old. The time may be given in hours and minutes only, or with the seconds as well, and a time zone may or may not be stated. These variations exist because the defined format of emails (MIME) does not specify how the data stored in that format is to be presented, so presentation varies between apps. The sender, recipient(s) and date/time of an email are normally thought of as "metadata". It could be argued, however, that because such obviously important items (including at least the hours and minutes of the time, if not always the seconds) are invariably included whenever an email is printed out, whatever the app used and albeit in varying formats, they are not strictly metadata if metadata is thought of as data which is not ordinarily included in a printout.
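To illustrate that an email stores a single date/time value and leaves its presentation to the app, here is a minimal sketch using Python's standard `email` package on a made-up message (the addresses, subject and time are invented). The two `strftime` formats stand in for two hypothetical apps' display choices.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message: the header block, a blank line, then the body.
RAW = """From: alice@example.com
To: bob@example.com
Subject: Site visit
Date: Fri, 31 Jul 2020 12:43:19 +0100

Photos attached.
"""

msg = message_from_string(RAW)
sent = parsedate_to_datetime(msg["Date"])  # one stored value, time zone included

# Two of the many ways an app might choose to present that same value
print(sent.strftime("%d/%m/%Y %H:%M"))        # 31/07/2020 12:43
print(sent.strftime("%d %B %Y %H:%M:%S %z"))  # 31 July 2020 12:43:19 +0100
```

The stored header keeps the seconds and the time zone offset; whether a printout shows them is purely the displaying app's choice.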
Although there is some imprecision about what counts as metadata for any given file type, in the legal context the word is used to emphasise that when parties to a legal dispute are ordered to disclose certain documents, or categories of documents, to the other side by providing copies, the copies provided should normally be "native copies", i.e. copies of the data in the documents in their original form, including all embedded metadata, and not simply "printed" images which will not show the embedded metadata.

Embedded metadata and file system metadata

The types of metadata referred to so far are embedded data specific to different types of file: EXIF data is used in JPG photos, MIME data in emails, and Word documents use a different format again for embedded metadata. Some files, for example a simple TXT file, contain no embedded file-type-specific metadata. But all files, of whatever type, have a filename and "date created", "date modified" and "date last accessed" metadata, which are maintained automatically by the file system of the device they are on, in the file system's own records, rather than being embedded in each file's contents.

File system metadata

Files on a computer or other device have a filename and have a "date created" and a "date modified" which are maintained by the file system (although referred to as dates, these items of metadata include the time as well as the date). The rationale for these two dates is that the "date modified" records when the data in the file was last updated, while the "date created" records when that copy of the file was created on a particular disk (drive/card). So, for example, if you create a spreadsheet on your computer's drive on 1 July it will initially have a "date created" of 1 July and a "date modified" of 1 July. If you update the spreadsheet on 6 July, the "date created" will still be 1 July but the "date modified" will be 6 July. If you then copy that file to a USB drive on 20 July, the file on the USB drive should have a "date modified" of 6 July and a "date created" of 20 July. If, instead of being copied, the file is moved, it will disappear from the original disk and the file on the target disk will keep the same "date created" (1 July) and "date modified" (6 July). Because the filename is part of the file system's directory, not part of the data in the file itself, changing the filename will not cause the "date modified" or "date created" to change.
That is how it is supposed to work but whether the theory is achieved in practice depends on what you are using to do the copy or move operation. Some systems faithfully preserve the "date modified" when copying or moving by giving the copy the same "date modified" value as the original but there is nothing to stop an app being designed to write out the file copy as a completely new file - containing exactly the same contents, and having the same filename, as the original but with a "date modified" and "date created" which are both the current date/time (i.e. the date/time that the copy operation is carried out - 20 July in the above example).
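The difference between a copy that preserves "date modified" and one written out as a completely new file can be seen in Python's standard `shutil` module, used here purely as an example of two copy routines (it is not what Windows Explorer or any other particular app uses internally). In this sketch, `compare_copies` is a hypothetical helper that back-dates a temporary file to 6 July 2020 and then copies it both ways.

```python
import os
import shutil
import tempfile
from datetime import datetime, timezone


def compare_copies():
    """Return the "date modified" of an original file and of two copies of it."""
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "original.txt")
    with open(src, "w") as f:
        f.write("same contents")

    # Back-date the original, as if it was last modified on 6 July 2020
    july6 = datetime(2020, 7, 6, 13, 0, tzinfo=timezone.utc).timestamp()
    os.utime(src, (july6, july6))

    # shutil.copy copies the data and permission bits only, so the copy
    # behaves like a new file whose date modified is the time of copying.
    plain = shutil.copy(src, os.path.join(folder, "copy.txt"))
    # shutil.copy2 also copies the access/modification times (via copystat),
    # so the copy keeps the original's date modified.
    faithful = shutil.copy2(src, os.path.join(folder, "copy2.txt"))

    return {
        "original": os.stat(src).st_mtime,
        "copy": os.stat(plain).st_mtime,
        "copy2": os.stat(faithful).st_mtime,
    }


if __name__ == "__main__":
    for name, mtime in compare_copies().items():
        print(name, datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat())
```

Running it should show the `copy2` result matching the original and the `copy` result showing the time the script was run. Neither routine is guaranteed to copy every item of metadata; the `shutil` documentation itself warns of this.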
As well as "date created" and "date modified" metadata, most file systems maintain "last accessed" date/time metadata for each file. As the name suggests, events which cause the "last accessed" date/time to be updated include not only events which change the data in the file but also many events which simply read it. But to be of any value a file system has to be selective about what counts as an "access": if simply displaying the "last accessed" date/time were itself considered an "access", the "last accessed" date/time when displayed would always just show the current date and time!
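The three file system timestamps can also be inspected programmatically. This Python sketch (the helper name `fs_times` is hypothetical) reads them with `os.stat`, which consults the file system's records rather than the file's contents. A genuine "date created" is exposed only on some platforms, so the sketch returns `None` where it is unavailable. The demonstration at the end also checks the earlier point that renaming a file leaves its "date modified" unchanged. How eagerly "last accessed" is updated depends on the platform and its configuration.

```python
import os
import tempfile
from datetime import datetime, timezone


def fs_times(path):
    """Return a file's file system timestamps as UTC datetimes."""
    st = os.stat(path)  # reads the file system's records, not the file's data

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    # A true creation time is platform-dependent: st_birthtime exists on
    # macOS/BSD and (from Python 3.12) on Windows, but typically not on Linux,
    # where st_ctime means "metadata last changed", not "created".
    born = getattr(st, "st_birthtime", None)
    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        "created": to_utc(born) if born is not None else None,
    }


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    old_name = os.path.join(folder, "spreadsheet.csv")
    with open(old_name, "w") as f:
        f.write("a,b\n1,2\n")

    before = fs_times(old_name)["modified"]
    new_name = os.path.join(folder, "budget.csv")
    os.rename(old_name, new_name)  # changes the directory entry only
    after = fs_times(new_name)["modified"]
    print(before == after)  # the rename leaves "date modified" unchanged
```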
Published international standards used for transmitting files from one device to another (e.g. over the internet) typically include fields for the filename, "date created", "date modified" and "last accessed". But even apps which faithfully fill in these fields when sending (not all do) will often ignore the three date fields when receiving, and simply save the files on the target system with all three set to the current date/time. Practice varies between apps, but there appears to be a general trend for apps not to "trust" a received "date modified" value, perhaps because the app does not know its origin. The same concern might also apply to the "date created" and "last accessed" values, but with those there is a further dilemma: unlike a copy or move operation from one disk to another on a single device, the receiving system has no way of knowing whether the ultimate intention was a move or a copy.
For example, the Outlook app used on Windows systems recreates none of the "date created", "date modified" or "last accessed" date/times when files attached to emails are saved, simply setting all three to the current date/time, though it does, by default, save using the original filename. Similarly, not only does the built-in ZIP facility in Windows not capture the "date created" and "last accessed", but the built-in UNZIP facility in Windows ignores the "date modified" as well as the "date created" and "last accessed" when UNZIPing files, though again it does save the UNZIPed files using their original filenames (only the filename itself, not a full pathname). Many cloud storage providers do not preserve the file system "date modified" (nor "date created" or "last accessed") when files are uploaded to the cloud, though a few may preserve "date modified".
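Python's own `zipfile` module behaves like the UNZIP facility described above in one respect: `ZipFile.extract` does not restore the stored "date modified" of the files it extracts. A minimal sketch of restoring it afterwards follows. `extract_with_dates` is a hypothetical helper, and it assumes the ZIP was made on a computer in the same time zone, because the basic ZIP header records only a local modification date/time, with no time zone and two-second resolution.

```python
import os
import time
import zipfile


def extract_with_dates(zip_path, dest):
    """Extract all entries from a ZIP file, then set each extracted file's
    "date modified" back to the value stored in the ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            out_path = zf.extract(info, dest)  # written with the current time
            if info.is_dir():
                continue
            # date_time is (year, month, day, hour, minute, second) in local
            # time; mktime interprets it in this computer's time zone.
            mtime = time.mktime(info.date_time + (0, 0, -1))
            # The basic header has no separate "last accessed" value, so the
            # access time is set to the same value as "date modified".
            os.utime(out_path, (mtime, mtime))
```

For example, `extract_with_dates("evidence.zip", "restored")` (hypothetical names) would recreate each file under its stored name with its recorded "date modified"; its "date created" will still be the time of extraction.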
Virtually all file systems organise files in a hierarchy of folders and subfolders, so for each file there is, as well as the filename itself (e.g. IMG_20200731_124319.jpg), also a full pathname (e.g. C:\Users\Smith\Pictures\Landscaping the garden\IMG_20200731_124319.jpg). Generally it will only be the filename, not the full pathname, which is captured in a ZIP file.

Preserving copies of documents if there are (or may be) legal proceedings   

If there are, or may be, legal proceedings, it is important to save copies of all files which may possibly be relevant, preserving both the file contents and the metadata. Preserving the file contents, including embedded metadata, should present little difficulty, but preserving all file system metadata can be challenging if the methods available are themselves liable to change some of it. For example, if you ZIP a file (i.e. make a compressed copy of one or more files in ZIP format), most file systems will count the ZIP app's opening and reading of the files as an "access", and consequently the "last accessed" date/time (of each source file and of the compressed copy of it in the ZIP file) will be changed from whatever it was before to the date and time of the ZIP operation.

The above is an example where a process intended to preserve data in a copy has actually changed an item of file system metadata in the source file. Sometimes the process of saving a copy changes an item of file system metadata in the copy but not in the source. For example, you may connect a phone to a Windows computer using a cable so that the files on the phone (e.g. JPG photos) can be accessed using Windows File Explorer, in which the phone will typically be displayed as an additional disk drive. The ZIP app on the Windows computer might be able to ZIP directly from the phone's drive, but often it only works on the computer's own drives. In that case, to create a ZIP file containing files from the phone, you must first use Windows Explorer to copy the phone's files to one of the computer's own drives and, as explained above, a copy operation, whilst preserving the "date modified", gives the copy a new "date created".
So a ZIP file made in these circumstances from the files on a cable-connected phone would not contain the original file system "date created" metadata. However, the "date created" value of the source files on the phone would not itself be changed by the operation, and could (if necessary in future) be displayed by connecting the phone to a Windows computer again and using the computer's display facilities such as Windows Explorer.


Although preserving as much file system metadata as possible can require some care, it is important to do so for a number of reasons, such as:

  • Although most file types have embedded metadata equivalent to "date created" and "date modified" not all do.

  • Although some file managers (such as Windows Explorer) will show the embedded metadata equivalent of "date modified" when listing certain file types, none will do so for all file types. The file system "date modified", on the other hand, can easily be listed for all files irrespective of type.

  • Document management systems such as Bundledocs generally use the file system "date modified" to initially set the document date when a file is loaded. For certain file types they might use embedded metadata (Bundledocs does this for emails, for example) but for most file types if any metadata is used it is likely to be the file system metadata.

  • The filename chosen by the creator of a computer file can shed light on what the data in a file was intended to refer to.

  • If a JPG photo is sent as an email or text attachment its embedded metadata may be stripped off. If no original with intact embedded metadata is available the filename may shed light on when it was taken if the filenames used by the phone or camera it was taken on include the date and time taken, as many do.       
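Where a filename follows the `IMG_YYYYMMDD_HHMMSS` pattern of the example filename IMG_20200731_124319.jpg, the date and time can be recovered mechanically. The sketch below assumes that pattern only; other phones and cameras use different conventions, no time zone is recorded, and the result is only as reliable as the device's clock. `date_from_filename` is a hypothetical helper.

```python
import re
from datetime import datetime

# Eight digits (YYYYMMDD), an underscore, six digits (HHMMSS), not embedded
# in a longer run of digits. This is one common convention, not a standard.
PATTERN = re.compile(r"(?<!\d)(\d{8})_(\d{6})(?!\d)")


def date_from_filename(name):
    """Return the date/time encoded in a photo filename, or None."""
    match = PATTERN.search(name)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # digits present but not a valid date/time
```

For example, `date_from_filename("IMG_20200731_124319.jpg")` returns `datetime(2020, 7, 31, 12, 43, 19)`, while a name with no such pattern, such as `holiday.jpg`, returns `None`.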


Different courts and tribunals have different rules about the disclosure of documents including metadata, and, to a degree, a particular court/tribunal might give different directions depending on the type of case and the potential significance of particular data. But when copies are being taken at the earliest stage - i.e. when the possibility of legal proceedings arises even before they are actually commenced - the usual approach is to

  • Take copies in ZIP format in which the original file contents (including embedded metadata), original filename and original file system "date modified" are preserved for each file - this should always be possible

  • Include in the ZIP file the original file system "date created" if this is possible, as it normally will be for files on a computer but perhaps not for files on all portable devices

  • Retain the portable devices - this should be done as a matter of course anyway but is of additional importance because even if the ZIP file captures the original "date created" as well as the "date modified" and filename, it will not capture the full file pathname.   
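A minimal Python sketch of taking such a ZIP copy follows, using a hypothetical helper `make_preservation_zip` and a hypothetical `manifest.csv` added to the archive. The manifest is an optional extra, not part of the approach described above: before zipping, it records each file's full pathname and its file system dates, including "date created" and "last accessed" where the platform exposes them (`os.stat` gives no creation time on most Linux systems, so that column is left blank there). Dates in the manifest are in UTC.

```python
import csv
import io
import os
import zipfile
from datetime import datetime, timezone

FIELDS = ["filename", "full_path", "modified_utc", "created_utc", "accessed_utc"]


def _iso(ts):
    """Format a POSIX timestamp as ISO 8601 UTC, or '' if unavailable."""
    if ts is None:
        return ""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def make_preservation_zip(paths, zip_path, manifest_name="manifest.csv"):
    """ZIP the given files under their original filenames and add a CSV
    manifest of file system metadata captured before zipping."""
    names = [os.path.basename(p) for p in paths]
    if len(set(names)) != len(names) or manifest_name in names:
        raise ValueError("duplicate filename; zip these files separately")

    # 1. Snapshot metadata first: os.stat does not read the file's data, so
    #    "last accessed" is recorded before the ZIP step reads each file.
    rows = []
    for p in paths:
        st = os.stat(p)
        rows.append({
            "filename": os.path.basename(p),
            "full_path": os.path.abspath(p),
            "modified_utc": _iso(st.st_mtime),
            "created_utc": _iso(getattr(st, "st_birthtime", None)),
            "accessed_utc": _iso(st.st_atime),
        })

    manifest = io.StringIO()
    writer = csv.DictWriter(manifest, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

    # 2. Write the ZIP. ZipFile.write stores each file's contents (compressed,
    #    restored byte-for-byte on extraction) plus its date modified as a
    #    local time with two-second resolution, under the bare filename.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=os.path.basename(p))
        zf.writestr(manifest_name, manifest.getvalue())
```

Recording the full pathname in the manifest does not replace retaining the device itself; it only documents where each file was at the time the copy was made.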


Disclosing Native Copies of documents

At the Disclosure of Documents stage of litigation it is usual for copies of documents to be provided in "native format". This means that the data in the file copies provided must be identical to, and in the same format as, the data in the original files, including embedded metadata. It does not necessarily mean that the copies have the original filenames (typically each copy will be named with a description of the file and a document number corresponding to a number on the accompanying disclosure list), still less the original pathnames. Also, the "date created", "date modified" and "date accessed" of the copies will be the date/time they were downloaded from whatever system (e.g. Bundledocs) you are using to manage the Disclosure of Documents process. Because of these limitations on the routine provision of native copies, it is generally accepted that the recipient can, if they wish, call for a copy of a specific document with its original filename. This might be done, for example, if a JPG photo is provided which has no EXIF data because it was taken on a very old digital camera: the filename might provide a clue as to the date/time it was taken. Where a copy of a file with its original filename is sought, you can unZIP the relevant ZIP file you made earlier and provide a copy of the file with the original filename. In some cases the recipient may want the full file pathname (e.g. to see how files were grouped together), and in that case it may be necessary to go back to the original device for the information.

Disclaimer

The information on this page about specific computer techniques is provided for information purposes only. Every reasonable effort has been made to ensure that the information is accurate and up to date at the time it was written but no responsibility for its accuracy, or for any consequences of relying on it, is assumed by me. You should satisfy yourself, before using any of the techniques, software or services described, that the techniques are appropriate for your purposes and that the software or service is reliable.

This page was last updated in August 2020.