About Metadata

What is metadata?

Metadata means "data about data". For example, the date and time that an email message was sent, who it was sent to, and who sent it are all metadata: they provide context for, and information about, the text of the message.

Some metadata is readily apparent. The date and time that an email message was sent, who it was sent to, and who sent it, for example, will be shown whenever the email is displayed or printed. Some metadata, for example the last printed date of a Word document, is "hidden" in the sense that it is less well known, only displayed if you tap on certain options, and not so easy to print.

An example in more detail - EXIF metadata

JPG photos also have hidden data known as EXIF (“Exchangeable Image File Format”) data. This includes items such as the exposure time, focal length, and  ISO number, as well as, of course, the date and time that the image was taken.  Regarding the date and time of the image, there are, as well as DateTime, two other EXIF date/time fields: DateTimeOriginal and DateTimeDigitized.

As explained in the 2010 report by the Metadata Working Group, DateTimeOriginal is the date/time that the camera's image sensor captured the image, DateTimeDigitized is the date/time that the file containing a digital representation of that image in JPG format was created on the phone's storage (or SD card in the case of a digital camera), and DateTime is the date/time that the JPG file was last updated. For modern cameras and phones the difference in time between DateTimeOriginal and DateTimeDigitized is negligible, so they will normally be the same down to the second and, provided the JPG has not been changed (e.g. by being rotated by a photo editor), DateTime will be the same as DateTimeOriginal and DateTimeDigitized.
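As a rough illustration of the comparison just described, the following Python sketch parses the three fields and checks whether they agree. It assumes the EXIF values have already been read out of the photo by some viewer or tool and are available as text in the standard EXIF "YYYY:MM:DD HH:MM:SS" layout; the function names and the sample values are hypothetical.

```python
from datetime import datetime

# EXIF stores these three fields as text in the form "YYYY:MM:DD HH:MM:SS".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_exif_datetime(value: str) -> datetime:
    """Convert an EXIF date/time string into a datetime object."""
    return datetime.strptime(value, EXIF_FORMAT)


def looks_unmodified(tags: dict) -> bool:
    """Return True if DateTime equals DateTimeOriginal and DateTimeDigitized."""
    original = parse_exif_datetime(tags["DateTimeOriginal"])
    digitized = parse_exif_datetime(tags["DateTimeDigitized"])
    updated = parse_exif_datetime(tags["DateTime"])
    return original == digitized == updated


# Hypothetical values for a photo straight from the camera.
sample_unedited = {
    "DateTimeOriginal": "2023:07:01 12:55:04",
    "DateTimeDigitized": "2023:07:01 12:55:04",
    "DateTime": "2023:07:01 12:55:04",
}

# Hypothetical values for the same photo after being rotated in an editor.
sample_rotated = dict(sample_unedited, DateTime="2023:07:03 09:10:00")

print(looks_unmodified(sample_unedited))
print(looks_unmodified(sample_rotated))
```

A mismatch does not show what was changed, only that DateTime was updated after the image was captured.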

In addition to those three date/time fields there are (if geo-tagging is switched on) GPSDateStamp and GPSTimeStamp, which contain the GMT (UTC) date and time of the GPS satellite signal used for the last GPS fix before the photo was taken. In order to save battery life, if a phone detects that it is stationary (because two consecutive GPS fixes are the same) it will stop doing GPS fixes for a period unless and until it detects significant movement (the device can tell from the accelerometer when it has started to move). Consequently the last GPS fix may have been many seconds, or even minutes, before the photo was taken, but GPSDateStamp/GPSTimeStamp can nevertheless provide useful confirmation that the device's internal clock, which is used when writing DateTimeOriginal/DateTimeDigitized/DateTime, is not wildly wrong.

A phone or digital camera uses its internal clock when writing DateTimeOriginal, DateTimeDigitized, and DateTime. The internal clock of the device can be set to synchronise automatically - with the network signal from the nearest tower in the case of a phone or with the GPS signal in the case of a digital camera - but it can alternatively be set manually. If the internal clock has been set manually it could in theory have been set to the wrong time and/or wrong date. This is more likely to happen with a digital camera than with a phone (because an incorrect date/time on a phone would affect everything - the home screen, emails, and the date shown for SMS messages, for example - and the user would be likely to immediately notice and correct it). Unlike DateTimeOriginal/DateTimeDigitized/DateTime, the GPSDateStamp and GPSTimeStamp fields are not dependent on the device clock - they always show the GMT date/time independently provided by GPS satellites - so if GPSDateStamp/GPSTimeStamp show the same date and approximately the same time as DateTimeOriginal/DateTimeDigitized/DateTime (making allowance for time zone) that provides additional confidence that  DateTimeOriginal/DateTimeDigitized/DateTime are accurate, at least to within a few minutes.
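The cross-check between the GPS time and the device clock can be sketched in Python as follows. This is only an illustration under stated assumptions: the EXIF values are assumed to have been extracted already, the time zone offset of the device is assumed to be known and is supplied by hand, and the five-minute tolerance and the function names are hypothetical choices rather than any standard.

```python
from datetime import datetime, timedelta, timezone


def gps_utc(date_stamp: str, time_stamp: tuple) -> datetime:
    """Combine GPSDateStamp ("YYYY:MM:DD") and GPSTimeStamp (hours, minutes, seconds)."""
    year, month, day = (int(part) for part in date_stamp.split(":"))
    hours, minutes, seconds = (float(part) for part in time_stamp)
    midnight = datetime(year, month, day, tzinfo=timezone.utc)
    return midnight + timedelta(hours=hours, minutes=minutes, seconds=seconds)


def clock_agrees(date_time_original: str, utc_offset_hours: float,
                 gps_time: datetime,
                 tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the device clock time is within the tolerance of the GPS time."""
    local = datetime.strptime(date_time_original, "%Y:%m:%d %H:%M:%S")
    # Attach the assumed time zone so the two values can be compared directly.
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return abs(local - gps_time) <= tolerance


# Hypothetical photo taken in British Summer Time (one hour ahead of GMT).
fix = gps_utc("2023:07:01", (12, 54, 30))
print(clock_agrees("2023:07:01 13:55:04", 1, fix))   # clock close to the GPS time
print(clock_agrees("2023:07:02 13:55:04", 1, fix))   # clock a day out
```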

If geo-tagging is on then additional items of GPS metadata will also be present. GPSLatitude (together with GPSLatitudeRef showing either North or South) and GPSLongitude (together with GPSLongitudeRef showing either East or West) will show the approximate location of the camera at the time the photo was taken. The accuracy of the GPS position varies (buildings and terrain may interfere with GPS satellite signals), and there is an additional GPS field, GPSHPositioningError, which gives an estimate of the horizontal positioning error in metres.
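EXIF stores latitude and longitude as three fractions (degrees, minutes, seconds) plus the separate Ref letter, whereas mapping tools usually expect a single signed decimal number. The conversion can be sketched as below; the function name and the sample coordinates are hypothetical, and the values are assumed to have been extracted from the photo already.

```python
from fractions import Fraction


def to_decimal_degrees(dms: tuple, ref: str) -> float:
    """Convert ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den)) to decimal degrees.

    ref is the GPSLatitudeRef or GPSLongitudeRef letter: "N", "S", "E" or "W".
    South and West are returned as negative numbers.
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):
        value = -value
    return float(value)


# Hypothetical position: 51 degrees 30 minutes North, 0 degrees 7 minutes 30 seconds West.
latitude = to_decimal_degrees(((51, 1), (30, 1), (0, 1)), "N")
longitude = to_decimal_degrees(((0, 1), (7, 1), (30, 1)), "W")
print(latitude, longitude)
```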

GPSImgDirection is a field which indicates the direction the camera was pointing when the photo was taken, relative to north (GPSImgDirectionRef identifies whether True or Magnetic north). For example if GPSImgDirection is 186660/2074 that is 90 degrees which is due east. Although the letters GPS appear in this tag it is usually set using data from the device's magnetometer (compass). A GPS signal only needs to be used in the calculation of GPSImgDirection if GPSImgDirectionRef is True (in which case the position of the camera or phone needs to be determined as part of calculating the magnetic declination). 
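The arithmetic in the example above (186660 divided by 2074 is exactly 90) can be checked with a short Python sketch, which also turns the angle into the nearest of eight compass points. The function names and the eight-point rounding are illustrative assumptions, not part of the EXIF standard.

```python
from fractions import Fraction

# Eight compass points, clockwise from north, each covering 45 degrees.
COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def direction_degrees(numerator: int, denominator: int) -> float:
    """Convert the GPSImgDirection fraction into degrees clockwise from north."""
    return float(Fraction(numerator, denominator))


def compass_point(degrees: float) -> str:
    """Return the nearest of the eight compass points for an angle in degrees."""
    index = round(degrees / 45) % 8
    return COMPASS_POINTS[index]


angle = direction_degrees(186660, 2074)
print(angle, compass_point(angle))
```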

Metadata "prints" for eBundles

Civil legal proceedings usually end with a final hearing before a judge for which an eBundle (bookmarked PDF) of documents is produced to be seen by the judge and referred to by the parties and their barristers. The eBundle contains two dimensional pages - just like a book. It is possible for arrangements to be made for items of evidence which are not two dimensional pages - such as a video - to be considered if necessary but, for the most part, two dimensional pages in an eBundle are used. This means that if either party wishes to rely on "hidden" metadata they will need to take steps to make a "print" or "screenshot" image showing both data and metadata. For example when you view a JPG photo on your phone or computer you would normally initially see the photo image only, but by tapping a "file information" or "properties" option you can then see some of the metadata, including EXIF metadata. A "print" image of the photo would  normally show only the image so if a party wishes to rely on metadata (e.g. to establish when the photo was taken) they have to make a PDF "print" using a suitable viewer program to ensure that the metadata which is to be relied on, as well as the data (the image), both appear in the "print" image or screenshot which will eventually be part of the eBundle.

To allow parties to decide whether they need to rely on metadata (and, if so, to produce a "print" image or screenshot showing both data and metadata for inclusion in the eventual eBundle), the rules of civil courts and other civil tribunals typically require native copies of documents to be sent by each side to the other at the Disclosure of Documents stage, which is normally part way through the legal proceedings. A "native copy" means an exact copy of the file in the same internal format - e.g. a JPG copy of a JPG, or a Word copy of a Word document - which will include all its metadata or, at least, all embedded metadata.

Embedded metadata and file-system metadata

The examples of metadata mentioned so far are examples of embedded metadata specific to the file type - EXIF data in JPG photos, MIME data in emails, and XML data in a Word document - but all files of whatever type will also have file-system metadata, which is maintained by the file system (e.g. Windows NTFS or Android Ext4). The file-system metadata maintained for each file will include a "filename": a series of characters used to uniquely identify the file within a folder on the storage device, often ending with an abbreviation indicating the file type, e.g. standardtermsV2.2.pdf.

Most file systems allow files to be grouped together in folders, with the possibility of sub-folders, so that there is a "pathname" consisting of folder and sub-folder names and ending in the filename. In addition to that, the file system will, of course, record the size of each file. What other file-system metadata there is depends on the file system concerned. On most file systems there is a "date/time last modified" and there may also be a "date/time created". Where there is a "date/time created" you would think that "date/time last modified" would always be the same as, or later than, "date/time created", but this is not always the case. If a file is renamed, or is moved from one folder to another on the same storage medium, the "date/time last modified" and "date/time created" are usually unchanged. But if a file is moved or copied to a different storage medium, in many systems the "date/time last modified" of the file on the new storage medium is unchanged but the "date/time created" is the date/time of the move or copy operation itself (reflecting the date/time that the file was first created on that new storage medium), so that "date/time created" is later than "date/time last modified". On some other systems, however, the file on the new storage medium would have both "date/time created" and "date/time last modified" set to the date/time of the move or copy.
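The two copying behaviours described above can be seen in miniature with Python's own standard copy functions: shutil.copy writes a new file whose "date/time last modified" is the time of the copy, whereas shutil.copy2 also tries to carry the original "date/time last modified" across. The sketch below is an illustration only; it works within a temporary folder rather than across storage media, the file contents and the 1 July 2023 timestamp are hypothetical, and other copying tools and operating systems may behave differently.

```python
import os
import shutil
import tempfile

# 12:55:04 GMT on 1 July 2023, expressed as seconds since 1 January 1970.
ORIGINAL_MODIFIED = 1688216104


def modified_times_after_copy() -> tuple:
    """Copy one file two ways; return (source, plain copy, copy2) modified times."""
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, "standardtermsV2.2.pdf")
        with open(source, "wb") as handle:
            handle.write(b"example contents")
        # Set the source file's last accessed and last modified times to the past.
        os.utime(source, (ORIGINAL_MODIFIED, ORIGINAL_MODIFIED))

        plain = shutil.copy(source, os.path.join(folder, "plain_copy.pdf"))
        kept = shutil.copy2(source, os.path.join(folder, "kept_copy.pdf"))

        return (os.stat(source).st_mtime,
                os.stat(plain).st_mtime,
                os.stat(kept).st_mtime)


source_time, plain_time, kept_time = modified_times_after_copy()
print("source:", source_time)
print("plain copy:", plain_time)   # time of the copy operation
print("copy2:", kept_time)         # carried over from the source
```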

Some file systems also have a "date/time last accessed". As the name suggests, events which will cause the file-system "date/time last accessed" to be updated include not only events which change the data in the file but also many events in which the data is simply read. But in order to be of value a file system has to be selective about what is to count as an "access" - if simply displaying the "date/time last accessed" was itself considered to be an "access" then the "date/time last accessed" when displayed would always just show the current date and time! Exactly what counts as an access may vary,  to a degree, between different file systems which maintain this item of metadata.  

The usefulness of file-system metadata 

File-system metadata is updated automatically by the file system itself rather than by any app. This can provide a degree of confirmation of the accuracy of the date/time in embedded metadata. For example, the embedded metadata in a Word document may indicate that it was last updated at 12.55.04 on 1 July 2023. If the file-system "date/time last modified" metadata also shows that it was last modified at 12.55.04 on 1 July 2023, that provides additional confirmation.

However, precisely because file-system metadata is updated automatically, various events may cause it to change when the embedded metadata does not. For example, if there is litigation, documents may have been loaded by a party into a document management system (such as Bundledocs), and that document management system will have been used to generate a set of copies of the files, together with a list of them giving a concise document description and date, ready to be sent to the other side at the Disclosure of Documents stage. The data in each native copy generated by the document management system - including embedded metadata - will be identical to the original, but because file-system metadata is maintained irrespective of what app is being used (and for these purposes the document management system itself is regarded as just another app), the file-system "date/time last modified" of each generated copy (and the "date/time created" if present) will simply reflect the date/time that the copy was generated by the document management system, not the original date/time of the document.

So, to summarise the point: if the file-system "date/time last modified" is the same as the date/time in embedded metadata, that is confirmation of the embedded metadata date/time. If the file-system date/time is before the date/time in embedded metadata, that would suggest that the embedded metadata date/time has been wrongly set by an app. But a file-system "date/time last modified" which is later than the date/time in embedded metadata does not mean that the embedded metadata is wrong, as there may be a perfectly valid reason - such as the use of a document management system, or because the file has been copied from one device to another.
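The three-way summary above can be written down as a small Python function. This is only a sketch of the reasoning, not a forensic test: the function name and the wording of the results are hypothetical, and both date/times are assumed to have been converted to the same time zone already.

```python
from datetime import datetime


def compare_timestamps(embedded: datetime, file_system: datetime) -> str:
    """Classify a file-system "last modified" time against an embedded date/time."""
    if file_system == embedded:
        return "confirms the embedded date/time"
    if file_system < embedded:
        return "suggests the embedded date/time was wrongly set"
    # Later file-system time: copying or a document management system may explain it.
    return "inconclusive"


embedded_time = datetime(2023, 7, 1, 12, 55, 4)
print(compare_timestamps(embedded_time, datetime(2023, 7, 1, 12, 55, 4)))
print(compare_timestamps(embedded_time, datetime(2023, 6, 30, 9, 0, 0)))
print(compare_timestamps(embedded_time, datetime(2023, 9, 15, 16, 30, 0)))
```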

Preserving copies of documents with metadata if there are (or may be) legal proceedings  

Although it can depend on the particular tribunal and on any case-specific directions made, the general approach in civil litigation is that there is no expectation that the copies routinely used during the litigation process should have original file-system metadata but, in the event of a query about any specific document, it should be possible for the original document to be made available for inspection. For example, if a JPG photo is provided during litigation which has no embedded EXIF data (because it was taken on a very old digital camera, for example) then the original filename might provide a clue as to the date/time taken (if the filename is in yyyyMMdd_hhmmss format). To allow the original file to be inspected if that should become necessary at any point, a copy can be taken at the outset designed to preserve, so far as possible, file-system metadata. So if you made a ZIP or RAR archive of the JPG file at the outset you could, on request, extract a copy of the JPG with its original filename. Depending on the programs used to create the archive and subsequently extract files from it, the file system under which the extract program in particular is running, and the history of the file, the "date/time last modified" of a file extracted from the archive might be the original date/time, or it might be a later date/time (if the file had been copied from one device to another before the archive was created), or it might simply be set to the date/time of the extract operation itself. But the filename of the file as extracted should be the original filename (or, at least, the filename as it was at the time the archive was created).
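Reading a date/time out of a filename in yyyyMMdd_hhmmss format can be sketched in Python as follows. The function name and sample filenames are hypothetical, and a match is only a clue: it shows what the filename says, not when the photo was actually taken.

```python
import re
from datetime import datetime

# Eight digits, an underscore, then six digits, anywhere in the filename.
STAMP_PATTERN = re.compile(r"(\d{8}_\d{6})")


def datetime_from_filename(filename: str):
    """Return the date/time embedded in the filename, or None if there is none."""
    match = STAMP_PATTERN.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        # The digits were present but do not form a valid date and time.
        return None


print(datetime_from_filename("IMG_20230701_125504.jpg"))
print(datetime_from_filename("holiday.jpg"))
```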

If the file system records "date/time last accessed" for each file, it may count the opening and reading of the data in the source file when creating the archive as an "access". Consequently the file-system "date/time last accessed" of each source file itself, and the "date/time last accessed" for the copy of the file stored within the archive (if the archive records "date/time last accessed"), may be changed from whatever it was before to the date/time the archive was created. If the file system records "date/time created", that will not be changed in the source file by the creation of the archive, but "date/time created" may or may not be stored within the archive and, if it is not stored in the archive, the "date/time created" field of the file when extracted will be set to the date/time of extraction. These things may be unavoidable (when creating an ordinary ZIP or RAR archive), but the archive should be created in such a way that as much file-system metadata as possible is preserved - by, for example, carrying out the operation to create the archive on the original files rather than on copies.
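What an ordinary ZIP archive keeps can be illustrated with Python's standard zipfile module. In the sketch below, which uses a temporary folder and a hypothetical file, the archive stores the filename and a "date/time last modified" for the entry, but the file extracted by this particular module is given the date/time of extraction. Other archive programs may behave differently, which is the point made above about results depending on the programs used.

```python
import os
import tempfile
import zipfile

# 12:55:04 GMT on 1 July 2023, expressed as seconds since 1 January 1970.
ORIGINAL_MODIFIED = 1688216104
PHOTO_NAME = "20230701_125504.jpg"


def archive_and_inspect() -> tuple:
    """Archive one file, then report what the archive kept and what extraction gives."""
    with tempfile.TemporaryDirectory() as folder:
        source = os.path.join(folder, PHOTO_NAME)
        with open(source, "wb") as handle:
            handle.write(b"not a real photo")
        os.utime(source, (ORIGINAL_MODIFIED, ORIGINAL_MODIFIED))

        archive = os.path.join(folder, "evidence.zip")
        with zipfile.ZipFile(archive, "w") as bundle:
            bundle.write(source, arcname=PHOTO_NAME)

        with zipfile.ZipFile(archive) as bundle:
            entry = bundle.getinfo(PHOTO_NAME)
            stored_name = entry.filename
            stored_time = entry.date_time  # (year, month, day, hour, minute, second), local time
            extracted = bundle.extract(entry, os.path.join(folder, "out"))
            extracted_modified = os.stat(extracted).st_mtime

        return stored_name, stored_time, os.path.basename(extracted), extracted_modified


name, stored, extracted_name, extracted_time = archive_and_inspect()
print("filename stored in archive:", name)
print("date/time stored in archive:", stored)
print("filename after extraction:", extracted_name)
print("last modified after extraction:", extracted_time)
```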

Different civil courts and other tribunals may have slightly different rules about the disclosure of documents including metadata and, to a degree, a particular tribunal might give different directions depending on the type of case and the potential significance of particular data. In extreme cases the High Court may make an "imaging order" at an early stage (a compulsory order requiring a party to actual or impending litigation to allow solicitors for another party, accompanied by a neutral "supervising solicitor" and neutral computer expert, to enter premises where computers and other electronic devices belonging to that party are located and take a complete image of each storage medium, the image to be held securely by the computer expert subject to future directions of the court). But, absent such an order, when copies are being taken at the outset - i.e. when the possibility of legal proceedings first arises, even before the form of, and forum for, the litigation is considered - the usual approach is to make ZIP or RAR archive copies in which at least the original file contents (including embedded metadata) and the filename are preserved for each file.


Disclaimer

This information page is designed to be used only by clients of John Antell who have entered into an agreement for the provision of legal services. The information in it is necessarily of a general nature and will not be applicable to every case: it is intended to be used only in conjunction with more specific advice to the individual client about the individual case. This information page should not be used by, or relied on, by anyone else.

The information on this page about specific computer techniques is provided for information purposes only. Every reasonable effort has been made to ensure that the information is accurate and up to date at the time it was written but no responsibility for its accuracy, or for any consequences of relying on it, is assumed by me. You should satisfy yourself, before using any of the techniques, software or services described, that the techniques are appropriate for your purposes and that the software or service is reliable. The features and behaviour of different apps may vary with different versions. Take technical advice as appropriate. 

This page was last updated in December 2023.