Tuesday, January 25, 2022

Image Metadata: The Metadata Removal Problem

This is part 6 of my series on metadata for scanned pictures.

Part 1: The Scanned Image Metadata Project

Part 2: Standards, Guidelines, and ExifTool

Part 3: Dealing with Timestamps

Part 4: My Approach

Part 5: Viewing What I Wrote

Part 6: The Metadata Removal Problem (this post)

Part 7: Thoughts after 4000+ Scans


 

When I embarked on this project, I knew it'd be a challenge to figure out how to put metadata into image files. I expected that some programs would be better than others at showing the metadata I'd put in. But I didn't realize I'd have to contend with programs that silently strip metadata when you ask them to do something completely different. Caroline Guntur's blog post opened my eyes:

Many cloud platforms and social media sites will not upload, or retain the [metadata] in your photos. Some will even strip the information completely upon download.

So I can upload an image file with metadata, but the uploaded file might not have it. Or I can download a file with metadata, but the downloaded file might not have it. Ouch!

I shouldn't have been surprised. Especially on social media sites, photo metadata has acquired a reputation as a security and privacy risk. The GPS coordinates for where a photo was taken (typically included in the metadata by cell phones) have drawn particular attention. Some sites have responded by removing most or all metadata from uploaded images (sometimes while keeping it for their own use). That has drawn the ire of many photographers, who have been understandably unhappy about having, among other things, their embedded copyright notices removed from their pictures.

It got me to wondering: if uploading and downloading images may affect their metadata, what about other ways of moving files around? Is email safe? Texting? I decided to do some poking around.

I looked into two basic scenarios:

  • Upload/Download: Is metadata maintained in image files that are uploaded to a web site or cloud service and then downloaded? This scenario covers social media sites like Facebook, Instagram, and Twitter, as well as cloud storage platforms from Google, Apple, Amazon, etc.

  • Point-to-point Communication: Is metadata maintained in images sent via email, texting, or instant messaging (e.g., WhatsApp and Facebook Messenger)? And what about AirDrop, Apple's close-range wireless mechanism for transferring files from one device to another?

Upload/Download Scenarios

IPTC is not just the name of a metadata standard. It's also the abbreviation for the organization that created it: the International Press Telecommunications Council. Among its activities is looking out for the intellectual property rights of its members. One of the ways it does that is by checking how well a variety of web sites adhere to the IPTC's request that metadata in uploaded image files be left intact. Every three years since 2013, the IPTC has tested a variety of sites to see whether they retain four fields the IPTC considers particularly important: Caption/description, Creator, Copyright Notice, and Credit Line ("the 4Cs"). The latest results (from 2019) cover 16 sites and are here. I encourage you to read the report (it's not long), but the highlights are that "good" sites (i.e., those retaining the 4Cs) include Flickr, Google Photos and Drive, Dropbox, and Microsoft OneDrive. The "bad" sites (i.e., those not retaining the 4Cs) include Instagram, Facebook, and Twitter.

The IPTC's test results are interesting, but they're silent regarding the retention of the two timestamps I care about ("when taken" and "when scanned"), and they have nothing to say about Apple's iCloud, which I think is a serious omission. I decided to do some testing of my own.

It's useful to distinguish sites whose primary purpose is storage and accessibility from those whose primary purpose is sharing. Google Photos and Apple iCloud Photos, for example, push themselves as services that let you securely store your photos (and videos) in the cloud and have them accessible from all your devices. They support sharing photos with others, but that's not their primary purpose. You could easily make use of these services without ever sharing anything.

In contrast, the primary reason to upload photos to social media services like Facebook, Instagram, and Twitter is to share them with others. The purpose of uploading photographs is for other people to see them.

Sites for Storage and Accessibility

I uploaded an image file to the following services, then I downloaded it and checked to see if the Exif, IPTC, and XMP copies of the four fields I use (description, copyright, "when taken", and "when scanned") remained intact. My findings were consistent, both with one another and with the results of the IPTC's testing:

  • Google Photos: All my metadata was preserved.
  • iCloud Photos: All my metadata was preserved.
  • Google Drive: All my metadata was preserved.
  • iCloud Drive: All my metadata was preserved.
  • Microsoft OneDrive: All my metadata was preserved.
  • CrashPlan for Small Business: All my metadata was preserved.

This is reassuring. Storing an image file in cloud storage is unlikely to change its metadata. This is good news for those of us who believe in cloud-based backups.

My experiments were based on the default behavior for these sites, and I suspect that's the case for the IPTC's, too. According to Consumer Reports, Flickr can be configured to omit metadata when images are downloaded, and it's possible that the same is true of other storage and accessibility sites. However, anybody who configures a site to omit metadata in downloaded images is hardly in a position to complain if images downloaded from that site lack metadata.
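
For anyone who wants to automate an upload/download check like the one above, here is a minimal Python sketch. It assumes ExifTool is installed and on the PATH, and it relies on ExifTool's -j (JSON output) and -G (group names) options. The tag names in FIELDS are illustrative assumptions about where the four fields might live in Exif, IPTC, and XMP. They are not necessarily the tags used in this series, so adjust them to match what you actually wrote.

```python
import json
import subprocess

# Assumed ExifTool "Group:Tag" names for the four fields. These are
# illustrative; substitute the tags you actually write to your files.
FIELDS = {
    "description": ["EXIF:ImageDescription", "IPTC:Caption-Abstract", "XMP:Description"],
    "copyright": ["EXIF:Copyright", "IPTC:CopyrightNotice", "XMP:Rights"],
    "when taken": ["EXIF:DateTimeOriginal", "IPTC:DateCreated", "XMP:DateCreated"],
    "when scanned": ["EXIF:CreateDate", "IPTC:DigitalCreationDate", "XMP:CreateDate"],
}

def read_tags(path):
    """Return a dict of tags for one file by running `exiftool -j -G`."""
    out = subprocess.run(
        ["exiftool", "-j", "-G", path],
        capture_output=True, text=True, check=True,
    ).stdout
    # ExifTool emits a JSON array with one object per file.
    return json.loads(out)[0]

def lost_or_changed(before, after):
    """List the tracked (field, tag) pairs whose value differs after a round trip."""
    problems = []
    for field, tags in FIELDS.items():
        for tag in tags:
            # Only tags that were present in the original are checked.
            if tag in before and before[tag] != after.get(tag):
                problems.append((field, tag))
    return problems
```

With hypothetical file names, lost_or_changed(read_tags("original.jpg"), read_tags("downloaded.jpg")) returns an empty list when every tracked tag came back unchanged. Keeping the comparison separate from the ExifTool call means the comparison logic can be exercised with plain dictionaries.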

Sites for Sharing

Social media sites such as Facebook and Twitter are perhaps the best known sharing-oriented web sites, but the umbrella over such sites is broader than that. It also covers dating sites (e.g., Tinder and eHarmony) and sites for selling things (e.g., eBay and craigslist).

I didn't test how these sites handle image metadata, because others (e.g., Consumer Reports and Kaspersky, in addition to the IPTC) have covered this ground better than I could. They've all come to the same conclusion: social media and other sharing-based sites typically remove metadata from uploaded photographs.

Social media and other sharing-based sites are a poor choice if you want to share not just pictures, but also their metadata.

Point-to-Point Communication

The point-to-point communication mechanisms I considered are email, texting and instant-messaging, and Apple's AirDrop. I did little experimentation of my own, because this terrain has also been well explored by others.

On the email front, the consensus is that image files sent via email retain their metadata. I did a few simple tests, and my results showed the same: metadata was preserved.

Email can contain images either inline (i.e., displayed in the message itself) or as attachments. In 2020, Craig Ball published a blog post describing how inline images in email appeared to have no metadata, while attached images did. His investigation revealed that the inline images he received did, in fact, contain all the metadata in the images that had been sent, but the metadata somehow got stripped during the process of saving an inline image as an independent file. The blog post went on to explain how to work around the problem.

To see if I could reproduce his results, I emailed an image to myself twice, once as an attachment and once as an inline image. In both cases, I was able to see the metadata without any trouble. However, the email client I used was Thunderbird, whereas Ball used Gmail and Outlook. That could explain why we experienced different behaviors.

It's comforting that Ball's conclusion aligns with the consensus that images sent via email retain their metadata. At the same time, it's disturbing that extracting an inline image from a message may cause its metadata to be removed. Sigh.
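
If the metadata really is present in the message and only gets lost when the mail client saves the picture, one way around the client is to pull the image bytes out of the raw message yourself. The sketch below is not necessarily the workaround Ball describes. It is a minimal illustration using Python's standard email package, and it assumes you have the raw message (e.g., a saved .eml file) available as bytes. The fallback name "inline-image" is a placeholder of my choosing.

```python
from email import message_from_bytes, policy

def extract_images(raw_message):
    """Return (filename, bytes) pairs for every image part in a raw message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    images = []
    for part in msg.walk():
        # Inline and attached images are both MIME parts of type image/*.
        if part.get_content_maintype() == "image":
            name = part.get_filename() or "inline-image"
            # decode=True undoes the transfer encoding (typically base64),
            # yielding the image bytes exactly as they were sent.
            images.append((name, part.get_payload(decode=True)))
    return images
```

Writing each returned byte string to a file in binary mode gives you the image as transmitted, so whatever metadata survived the trip is still there to inspect.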

But that's email. These days, more photos are probably sent by text or instant message. How does image metadata fare when communicated in those ways?

On the instant-messaging front, things are clear. I didn't run any tests myself, because the net community speaks with a single voice:

  • WhatsApp removes image metadata.
  • Facebook Messenger removes image metadata.
  • Signal removes image metadata.
  • Telegram removes image metadata.

There are ways to work around this behavior (e.g., by sending photos as documents), but the fact remains that these instant-messaging services redact photo metadata as a matter of policy.

When we shift from instant messaging to good, old-fashioned, ordinary texting, the air is fogged by the fact that smartphones typically obscure whether you're engaging in good, old-fashioned, ordinary texting. Users of the Messages app on Apple devices, for example, typically communicate with one another via iMessage. iMessage is an internet-based protocol that is quite different from the cell phone system's SMS/MMS technologies (which underlie good, old-fashioned texting). iMessage works only between Apple devices and only when an internet connection is available, so for texting to or from non-Apple devices or when internet access is lacking, the Messages app employs SMS/MMS. The protocol used for a particular sent message is indicated in Messages by the bubble color (blue for iMessage, green for SMS/MMS), but all incoming messages look the same (grey bubble), regardless of whether they were transmitted using iMessage or SMS/MMS.

This means that a text message sent or received using Messages might be a "normal" text (conveyed via SMS/MMS), but it might be an iMessage text, depending on whether the other party (or parties) in the conversation were using Apple devices and whether an internet connection was available. My understanding is that a similar bifurcation exists on Android devices, where the Google Messages app may send and receive messages using either RCS or SMS/MMS, depending on the capabilities of the parties' devices and those of their service providers.

The effect of texting on image metadata appears to be:

  • Photos sent using the iMessage protocol retain their metadata. This is both the wisdom of the net and my personal experience. Photos texted between Apple devices arrive with their metadata intact (unless the lack of an internet connection causes Messages to fall back on SMS/MMS).
  • Photos sent using the RCS protocol retain their metadata. It's harder to find information about RCS than iMessage, but the sources I consulted (e.g., here and here) agree on this point. Photos texted between devices running Android should arrive with their metadata intact (provided both sender and recipient(s) are using RCS).
  • Photos sent using SMS/MMS may retain their metadata. This is the scenario that applies to texts between different kinds of devices (e.g., between iOS and Android devices). Most (but not all) Internet sources I consulted said that MMS strips metadata. My favorite overview of the situation is by Dr. Neal Krawetz. His summary is that "the entire delivery process for texted pictures is just one bad handling process after another." I lack the expertise to evaluate the accuracy of his analysis, but it looks quite plausible, and it would explain the varying behavioral descriptions I found elsewhere on the internet. I feel confident in stating that photos transmitted via SMS/MMS might retain their metadata.

Stepping back from the details, we can say that instant messaging apps scrub metadata from photos, and photos sent by text may or may not have their metadata scrubbed. Texting photos between Apple devices is a good bet as regards metadata retention, but it's important to make sure that both sender and receiver see blue bubbles in the Messages app.

The final point-to-point communication mechanism I looked at is Apple's AirDrop. I'd always thought of AirDrop as simply a way to wirelessly copy a file from one Apple device to another, but that's not quite right. A standard file copy entails copying a sequence of bytes from one place to another. What the bytes represent (e.g., a document, an image, the state of a game) is immaterial. The copying program doesn't care what the bytes are for. It just copies them.

Copying an image file in that manner would copy the file's metadata, because the copying program wouldn't care that it's an image file. It would simply copy the bytes, just like it would with a document or a game state, etc. But that's not how AirDrop behaves. By default, metadata is removed from pictures that are AirDropped. This can be overridden by enabling the "All Photos Data" option, but it's a non-sticky setting, so it has to be explicitly enabled each time AirDrop is used to copy images from one device to another.

AirDrop's "strip metadata by default" behavior makes it less convenient and less reliable for sharing photos with metadata than a simple file-copying program would be.
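
A quick way to tell whether a transfer behaved like a plain byte copy is to compare a hash of the file before and after. The Python sketch below uses only the standard library, and the function names are mine. If the hashes match, the received file is byte-for-byte identical to the original, so its metadata must be intact. If they differ, something in the file changed, though the hash alone can't say whether that something was the metadata or the image data.

```python
import hashlib

def sha256_of(path):
    """Hash a file's bytes in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_exact_copy(original, received):
    """True if the two files are byte-for-byte identical."""
    return sha256_of(original) == sha256_of(received)
```

When is_exact_copy reports a difference, comparing the files' metadata with a tool like ExifTool is the next step for finding out what was lost.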

Conclusion

Once you get metadata into an image file, you don't want to accidentally lose it, either for yourself or for those with whom you want to share it. The safest things you can do with image files (from the perspective of metadata retention) are to upload them to sites designed for storage and accessibility (as opposed to sharing) and to send them via email. The worst things you can do (again, from the perspective of metadata retention) are to upload them to sharing-oriented sites (e.g., social networks) or to text them using instant-messaging services.
