Visual inspection asks whether an image looks manipulated — a shadow pointing the wrong way, edges that don’t align, unnatural lighting on a face. Human experts do this, and it catches careless edits. It misses careful ones.
The gap between a detectable edit and an undetectable one has narrowed considerably. Modern editing software has made sophisticated manipulation genuinely difficult to spot — generative AI tools can remove objects, reconstruct backgrounds, and alter facial features without leaving a visible seam. What previously required specialist skill and hours of careful retouching now takes seconds. Graphic designers and content creators have never been more empowered, and the same tools that serve legitimate creative work serve those who would rather the manipulation go unnoticed. Visual forensics — asking whether an image looks edited — is losing that arms race.
Forensic analysis asks something different: does this file’s internal structure match what the device or software that created it should have produced? Every camera model, every firmware version, every editing application generates a predictable signature in the files it produces. Editing disrupts or replaces part of that signature. The evidence is in the file structure — independent of how the image looks.
This article builds on earlier work
If you’ve read “Your camera’s fingerprint survives EXIF stripping — here’s how”, the re-encoding signal in the next section will be familiar ground. That article covers the compression mechanism in full detail. Here the focus is broader: re-encoding is one signal among several, and together they answer a more precise question than “was this file processed?”. They answer “does this file match what the claimed device and workflow should have produced?”
The re-encoding signal — JPEG and WebP
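JPEG uses DCT (Discrete Cosine Transform) block compression. The image is divided into 8×8 pixel blocks; each block is transformed into frequency components, quantised against a quantisation table (stored in the file’s DQT segment), and entropy-coded using Huffman tables (stored in the DHT segment). Camera firmware typically hardcodes both at the hardware level: fixed quantisation tables per model, and Huffman tables matching the example tables in Annex K of the JPEG standard. Software re-encoders commonly apply their own quantisation tables and can compute optimised Huffman tables per image. The two patterns are measurably different, and the tables are written into every JPEG file.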
WebP lossy works differently. It is based on VP8 video codec encoding and uses predictive coding as its primary method — the encoder models each block from its neighbours and records only the difference. DCT is applied to the residual blocks where that prediction falls short, not to the whole image. Entropy coding uses context-adaptive arithmetic coding rather than Huffman. The re-encoding signal for WebP follows the same underlying logic — camera firmware produces consistent parameter sets, software re-encoders produce different ones — but WebP remains a small fraction of files processed compared to JPEG, and the forensic baseline is still developing. It is an emerging area of analysis rather than an established one.
For JPEG, the re-encoding signal is well-established: if a file’s DQT or Huffman tables carry the signature of software rather than camera firmware, it was re-encoded after capture — even with no visible pixel changes and no remaining metadata. The mechanism is covered in full in the DQT and Huffman article.
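As a rough illustration of where that evidence lives, the sketch below walks the header segments of a JPEG and collects its DQT and DHT tables. The marker codes come from the JPEG standard; the function names are invented for this example. Only the Annex K DC luminance table is compared here, which is an assumption made to keep the sketch short. A real baseline would hold per-model tables, and matching one standard table is not proof of camera origin by itself.

```python
import struct

# Annex K example DC luminance Huffman table:
# 16 code-length counts followed by the 12 symbol values.
ANNEX_K_DC_LUMA = bytes([0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]) + bytes(range(12))

def iter_segments(data):
    """Yield (marker, payload) for each header segment up to Start of Scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            return
        pos += 2 + length

def read_tables(data):
    """Collect quantisation tables (DQT) and Huffman tables (DHT) by id."""
    dqt, dht = {}, {}
    for marker, payload in iter_segments(data):
        i = 0
        if marker == 0xDB:  # DQT: one or more tables per segment
            while i < len(payload):
                precision, table_id = payload[i] >> 4, payload[i] & 0x0F
                size = 128 if precision else 64
                dqt[table_id] = payload[i + 1:i + 1 + size]
                i += 1 + size
        elif marker == 0xC4:  # DHT: one or more tables per segment
            while i < len(payload):
                table_class, table_id = payload[i] >> 4, payload[i] & 0x0F
                n_symbols = sum(payload[i + 1:i + 17])
                dht[(table_class, table_id)] = payload[i + 1:i + 17 + n_symbols]
                i += 17 + n_symbols
    return dqt, dht

def uses_annex_k_dc_luma(dht):
    """True when the DC luminance table (class 0, id 0) is the Annex K one."""
    return dht.get((0, 0)) == ANNEX_K_DC_LUMA
```

Calling `read_tables(open(path, "rb").read())` on a file returns both table sets; the quantisation tables can then be compared against whatever per-model reference set is available.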
PNG uses lossless compression and has no equivalent re-encoding signal. For PNG files, the analysis relies entirely on metadata.
Manufacturer metadata baseline — when the expected fingerprint changes
Camera firmware doesn’t write arbitrary EXIF — it produces a defined, predictable output for every image. A Canon R6 running a specific firmware version generates characteristic EXIF field values, GPS coordinate precision, timestamp format, and MakerNote structure. A Nikon Z6, a Sony A7, a Samsung Galaxy — each produces a different but internally consistent fingerprint. These are not just descriptive labels; they are the expected output of a known manufacturing process.
When a file deviates from that expected profile — carrying quantisation tables inconsistent with the claimed make and model, GPS precision that doesn’t match the device’s known output, MakerNote fields absent that should always be present — each deviation is a finding. The file doesn’t match the device it claims to have come from.
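A minimal sketch of that comparison follows. The profile is entirely hypothetical: the make, model, required fields, firmware prefix, and GPS precision are invented for illustration and describe no real camera. It also assumes the metadata has already been read into a plain dictionary, with GPS latitude held as a decimal string.

```python
# Hypothetical profile: every value here is invented for illustration.
EXAMPLE_PROFILE = {
    "Make": "ExampleCam",
    "Model": "EX-1",
    "required_fields": {"MakerNote", "LensModel", "DateTimeOriginal"},
    "software_prefix": "EX-1 Firmware",
    "gps_decimal_places": 4,
}

def profile_findings(meta, profile):
    """Return one finding per deviation from the claimed device's profile."""
    if (meta.get("Make"), meta.get("Model")) != (profile["Make"], profile["Model"]):
        return ["file does not claim this device; profile not applicable"]

    findings = []

    # Fields the device is expected to write on every capture.
    for name in sorted(profile["required_fields"] - set(meta)):
        findings.append(f"missing field expected from this device: {name}")

    # The Software tag should name the firmware, not an editing application.
    software = meta.get("Software", "")
    if software and not software.startswith(profile["software_prefix"]):
        findings.append(f"unexpected Software tag: {software!r}")

    # GPS precision, assuming latitude arrives as a decimal string.
    latitude = meta.get("GPSLatitude")
    if latitude is not None:
        places = len(str(latitude).partition(".")[2])
        if places != profile["gps_decimal_places"]:
            findings.append(f"GPS precision is {places} decimal places, "
                            f"expected {profile['gps_decimal_places']}")
    return findings
```

Each finding is reported separately rather than folded into a single verdict, which mirrors the point above: a deviation is something to explain, not a conclusion.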
This applies across formats. WebP embeds EXIF and XMP chunks with the same device-identifying fields. MP4 and MKV containers carry creation timestamps, encoder application strings, and per-stream codec parameters — all written by the device or software that produced the file. A video claiming to come straight from a phone but carrying an FFmpeg encoder string in the container header was post-processed.
Timestamp contradictions — where DateTimeOriginal, DateTime, GPS timestamp, and software tags disagree — are one expression of this. The stronger framing is: the file doesn’t fit the expected profile of the device it claims to have come from.
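The timestamp case can be sketched in a few lines. This assumes the tags have been read into a dictionary of EXIF-formatted strings, and that the GPS date and time have been pre-joined under a hypothetical `GPSDateTime` key. Because GPS time is UTC and the camera clock is local, the sketch only checks that the two differ by a whole multiple of 15 minutes, which is an assumption about time zone offsets and not a rule taken from this article. The tolerances are arbitrary.

```python
from datetime import datetime, timedelta

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"

def timestamp_findings(tags, tolerance=timedelta(seconds=2), gps_slack=60):
    """Return findings where the file's own timestamps disagree."""
    findings = []
    parse = lambda key: datetime.strptime(tags[key], EXIF_FORMAT)

    if "DateTimeOriginal" in tags and "DateTime" in tags:
        captured, modified = parse("DateTimeOriginal"), parse("DateTime")
        if modified - captured > tolerance:
            findings.append("DateTime is later than DateTimeOriginal: "
                            "file was written after capture")
        elif captured - modified > tolerance:
            findings.append("DateTime is earlier than DateTimeOriginal")

    if "DateTimeOriginal" in tags and "GPSDateTime" in tags:
        # GPS time is UTC; local time should differ by a time zone offset,
        # assumed here to be a multiple of 15 minutes (900 seconds).
        delta = abs((parse("DateTimeOriginal") - parse("GPSDateTime")).total_seconds())
        remainder = delta % 900
        if min(remainder, 900 - remainder) > gps_slack:
            findings.append("GPS time and capture time differ by an amount "
                            "no time zone offset explains")
    return findings
```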
Metadata fields can be modified or fabricated by anyone with the right tool. This is why the compression-structure signals, both the re-encoding signal above and the inconsistency signals in the next section, carry more forensic weight: they are embedded in the mathematical encoding of the file itself, not in a text header.
Compression inconsistencies — ELA and clone detection
For JPEG and WebP lossy files, localised edits leave statistical traces in the compression structure. JPEG divides images into 8×8 pixel blocks; each block is encoded based on the frequency content of the original scene. When a region is replaced — via inpainting, cloning, or copy-paste from an external source — the compression texture of that region differs from the surrounding camera-captured content.
Error Level Analysis (ELA) is an algorithmic technique that detects this — not a visual inspection, but a computation. The image is re-compressed at a known quality level; the pixel-level residual between the original and re-compressed versions exposes regions with inconsistent compression history. Regions that were edited or inserted respond differently to re-compression than regions that were captured intact. The output is a map of those inconsistencies.
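The computation described above can be sketched with the Pillow imaging library. This is a minimal illustration, not the pipeline’s implementation: the quality level of 90 is an arbitrary choice, and `region_error` is a helper invented here to put a number on one rectangle of the map.

```python
import io
from PIL import Image, ImageChops

def error_level_map(img, quality=90):
    """Re-compress the image as JPEG and return the per-pixel residual."""
    rgb = img.convert("RGB")
    buffer = io.BytesIO()
    rgb.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")
    # Absolute difference per channel between the input and its re-compression.
    return ImageChops.difference(rgb, recompressed)

def region_error(ela_map, box):
    """Mean residual (0-255) inside box = (left, upper, right, lower)."""
    histogram = ela_map.crop(box).convert("L").histogram()
    total = sum(histogram)
    return sum(level * count for level, count in enumerate(histogram)) / total
```

For a file on disk, `error_level_map(Image.open(path))` gives the map. Comparing `region_error` across regions shows where the residual departs from the rest of the image, though busy textures and sharp edges also produce higher residuals, so a raised value is a prompt for closer inspection and not a verdict.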
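Clone map analysis searches for repeated pixel regions within the same image, a signal that content was copied from one area and pasted over another. Content pasted in from an external source has no duplicate inside the image to match, so it falls to ELA and the other signals.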
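The idea behind searching for repeated regions can be shown with exact block matching on a small grayscale grid. This is a deliberately simplified sketch: it only finds bit-identical blocks, which holds for lossless data, whereas practical detectors need matching that tolerates re-compression. The function name and the 4-pixel block size are choices made for this example.

```python
from collections import defaultdict

def find_cloned_blocks(pixels, block=4):
    """Find pairs of identical, non-overlapping block x block regions.

    pixels is a 2D list of grayscale values. Returns a list of
    ((y1, x1), (y2, x2)) top-left corners, in scan order.
    """
    height, width = len(pixels), len(pixels[0])
    seen = defaultdict(list)
    for y in range(height - block + 1):
        for x in range(width - block + 1):
            key = tuple(tuple(pixels[y + dy][x:x + block]) for dy in range(block))
            # Flat regions (sky, walls) repeat naturally; skip them.
            if len({value for row in key for value in row}) == 1:
                continue
            seen[key].append((y, x))

    pairs = []
    for positions in seen.values():
        for i, first in enumerate(positions):
            for second in positions[i + 1:]:
                # Keep only pairs whose blocks do not overlap.
                if abs(first[0] - second[0]) >= block or abs(first[1] - second[1]) >= block:
                    pairs.append((first, second))
    return pairs
```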
PNG and lossless WebP have no equivalent compression inconsistency signals. They are lossless formats and don’t degrade in the way that makes ELA meaningful. Those formats rely on metadata signals instead.
The EXIF thumbnail — wherever EXIF travels
JPEG files embed a low-resolution thumbnail in the EXIF block, written at the moment of capture. Editing software typically updates the main image and often does not regenerate the thumbnail. A face removed from the main image may still be present in the thumbnail. An object cropped from the frame persists in the thumbnail, nested inside the EXIF block’s second image directory (IFD1), while the main image shows the edited version. Setting the two side by side is a direct comparison between what was captured and what was submitted.
EXIF is not a JPEG-exclusive structure. WebP stores EXIF in a dedicated chunk inside its RIFF container — a formally supported part of the WebP specification. PNG can carry EXIF in its metadata blocks, written by tools like Lightroom, Photoshop, and ImageMagick. When a JPEG with an embedded thumbnail is converted to WebP or PNG and the conversion tool preserves the EXIF block, the thumbnail travels with it. If the main image is subsequently edited without regenerating the thumbnail, the mismatch is forensically identical to the JPEG case.
Many conversion tools strip EXIF from WebP and PNG by default, so the signal is less consistently present in those formats than in JPEG — but when EXIF does survive the conversion, it carries the same evidential weight.
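Whatever container the EXIF block arrives in, the thumbnail sits in the same place inside it. The sketch below assumes the TIFF-formatted EXIF payload has already been pulled out of its container (for JPEG, the bytes after the `Exif\0\0` prefix of the APP1 segment) and follows the standard layout: the first image directory (IFD0) links to a second one (IFD1), whose tags 0x0201 and 0x0202 give the thumbnail’s offset and length. The function name is invented, and it assumes both tags are stored as 4-byte LONG values.

```python
import struct

def extract_exif_thumbnail(tiff):
    """Return the embedded JPEG thumbnail from an EXIF payload, or None."""
    if tiff[:2] == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif tiff[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("not a TIFF-formatted EXIF payload")
    magic, ifd0_offset = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    def read_ifd(offset):
        """Return ({tag: (type, count, value)}, offset of the next IFD)."""
        (count,) = struct.unpack(endian + "H", tiff[offset:offset + 2])
        entries = {}
        for i in range(count):
            start = offset + 2 + 12 * i
            tag, kind, n, value = struct.unpack(endian + "HHII", tiff[start:start + 12])
            entries[tag] = (kind, n, value)
        end = offset + 2 + 12 * count
        (next_offset,) = struct.unpack(endian + "I", tiff[end:end + 4])
        return entries, next_offset

    _, ifd1_offset = read_ifd(ifd0_offset)
    if ifd1_offset == 0:
        return None  # no second directory, so no thumbnail
    entries, _ = read_ifd(ifd1_offset)
    start, length = entries.get(0x0201), entries.get(0x0202)
    # Assumption: both tags are LONG (type 4) with the value stored inline.
    if not start or not length or start[0] != 4 or length[0] != 4:
        return None
    thumbnail = tiff[start[2]:start[2] + length[2]]
    return thumbnail if thumbnail[:2] == b"\xff\xd8" else None
```

The returned bytes are a complete small JPEG. Decoding it and comparing it with a downscaled copy of the main image is the step that exposes a mismatch; that comparison is not shown here.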
What the combination of signals gives you
No single signal is conclusive on its own. Re-encoding means the file was processed — not what was changed. A deviation from the manufacturer baseline means something doesn’t fit the expected profile — not that it was forged. An ELA anomaly has multiple possible explanations.
Three consistent signals in the same file — re-encoding evidence, a deviation from the claimed device’s expected output, and an ELA anomaly in a specific region — narrow the explanation space substantially.
These four signals were chosen for this article because they are the most legible: they apply to the formats most people encounter (JPEG, WebP, PNG), and the evidence they produce can be described without a signal processing background. The full analysis pipeline runs more than thirty individual checks. Others include: statistical distribution tests on DCT coefficient blocks (Benford’s law on first-digit frequencies), progressive JPEG encoding detection (a signal that rarely appears in camera-original files), chroma subsampling signatures (4:4:4 typically indicates a software re-save), audio sample rate anomalies in video (consumer cameras typically record at 48 kHz, so 44.1 kHz suggests post-processing), XMP namespace fingerprinting (Lightroom, Photoshop, Darktable, and a dozen other tools each leave a distinct namespace signature in the file), block grid misalignment, and noise level consistency across image regions. A later article in this series will cover the complete signal set: what each check looks for and what a positive finding means in practice.
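Two of the checks listed above, progressive encoding and chroma subsampling, can be read straight from a JPEG’s frame header. The sketch below is illustrative only: the function name is invented, it handles the three most common frame types, and it reports what the file contains without judging it. Whether a given value is anomalous depends on the baseline for the claimed device.

```python
import struct

# Start of Frame markers and the encoding process each one declares.
SOF_MARKERS = {0xC0: "baseline", 0xC1: "extended sequential", 0xC2: "progressive"}

def frame_info(data):
    """Return (process, subsampling) from the first frame header, or None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            payload = data[pos + 4:pos + 2 + length]
            n_components = payload[5]
            # Each component is 3 bytes: id, sampling factors (H << 4 | V), table id.
            factors = [(payload[7 + 3 * i] >> 4, payload[7 + 3 * i] & 0x0F)
                       for i in range(n_components)]
            return SOF_MARKERS[marker], describe_subsampling(factors)
        if marker == 0xDA:  # reached image data without finding a frame header
            break
        pos += 2 + length
    return None

def describe_subsampling(factors):
    """Map per-component sampling factors to the usual J:a:b label."""
    if len(set(factors)) == 1:
        return "4:4:4"
    luma, chroma = factors[0], set(factors[1:])
    if chroma == {(1, 1)}:
        return {(2, 2): "4:2:0", (2, 1): "4:2:2"}.get(luma, "other")
    return "other"
```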
AI-assisted editing sits in the most challenging part of this space. When generative AI inpaints a region (removing a person, reconstructing a background, altering a face), the filled area carries different statistical properties from the surrounding camera-captured content. ELA flags that inconsistency the same way it would for a traditional clone or paste. What the compression analysis cannot yet reliably determine is whether the edit was made by a human retoucher or by a generative AI model. Identifying the AI generation signature within a localised region, and so distinguishing AI fill from manual editing, is a distinct and harder problem. It is also where forensic research is moving fastest.
snapWONDERS is building dedicated AI generation detection into the analysis pipeline. The initial work targets wholly AI-generated images and video — the foundation that makes localised AI modification detection possible next.
snapWONDERS runs the full signal set across formats — DQT and Huffman table fingerprinting on JPEG, encoding parameter analysis on WebP lossy, ELA and clone map on lossy formats, EXIF thumbnail extraction wherever EXIF is present, metadata consistency against manufacturer profiles across all formats — as a single pass on every uploaded file. Each finding is reported independently. The combined output is the forensic picture.
Run any image through it. If you have already stripped the metadata and assume the file is clean — upload it and see what the compression structure and device profile signals show.
What the image looks like after editing and what the file records after editing are two different things.
Kenneth Springer is the founder of snapWONDERS and the developer of Vaultify. The forensic checks described in this article — Huffman re-encoding detection, quantisation table fingerprinting, ELA, clone map analysis, EXIF thumbnail extraction, and metadata consistency scoring against manufacturer profiles — run automatically on every file analysed through the platform. snapWONDERS forensic analysis — no account required.

