Technical Information

C2PA Technical Overview

The C2PA (Coalition for Content Provenance and Authenticity) specification defines a technical framework for creating and verifying content provenance. This page provides a technical overview of how C2PA works and how our tool interacts with C2PA metadata.

What is Content Provenance?

Content provenance is like a digital birth certificate for media. It tells you where a piece of content came from, who created it, when it was made, and what changes have been made to it along the way. Think of it as a trustworthy history of a digital file.

In today's world where AI can generate realistic images and videos, knowing the origin and history of digital content is more important than ever. Content provenance helps us distinguish between:

  • Human-created content
  • AI-generated content
  • Content that has been edited or manipulated
  • Original, unaltered content

C2PA Manifests and Assertions

The C2PA information comprises a series of statements that cover areas such as asset creation, edit actions, capture device details, bindings to content, and many other subjects. These statements, called assertions, make up the provenance of a given asset. They act as trust signals that a person can use to judge how trustworthy the asset is.

Assertions are wrapped up with additional information into a digitally signed entity called a claim. This claim is digitally signed by the signer, producing the claim signature.

The assertions, the claim, and the claim signature are all bound together into a verifiable unit called a C2PA Manifest by a hardware or software component called a claim generator. The set of C2PA Manifests, as stored in the asset's Content Credential, represents its provenance data.

Structure of a C2PA Manifest

C2PA Manifest Structure
C2PA Manifest
├── Assertion Store
│   ├── Assertion 1 (e.g., Creation Info)
│   ├── Assertion 2 (e.g., Edit Actions)
│   ├── Assertion 3 (e.g., Device Info)
│   └── ...
├── Claim (references each assertion by hashed URI)
├── Claim Signature
└── Credential Store (optional)

What's in a C2PA Manifest?

A C2PA manifest can contain various types of information, including:

  • Creator Information: Who created the content (a person, organization, or AI system)
  • Creation Time: When the content was created
  • Tool Information: What software or hardware was used to create the content
  • Edit History: What changes have been made to the content
  • Thumbnails: Small preview images of the original or edited content
  • AI Indicators: Whether the content was generated by AI
  • Cryptographic Hashes: Digital fingerprints that can detect if the content has been altered

All this information is organized into "assertions" - statements about the content that can be verified. These assertions are then digitally signed to ensure they haven't been tampered with.

PNG Chunks and C2PA

In PNG files, C2PA data is stored in specific chunks. A chunk in a PNG file is a data structure that contains:

  • Length (4 bytes): The length of the data field
  • Chunk Type (4 bytes): A 4-character ASCII name
  • Chunk Data (variable length): The actual data
  • CRC (4 bytes): A checksum for error detection

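C2PA data is stored in a chunk of type "caBX", which is the chunk type the C2PA specification defines for PNG. Chunk types such as "C2PA", "C2CI", or "C2CS" are not defined by the specification and are listed here in case a file carries them. Our tool allows you to identify and remove any of these chunks if needed.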
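The chunk layout described above (length, type, data, CRC) can be walked with a few lines of Python. This is a minimal sketch, not the tool's actual code: `iter_chunks` and `make_chunk` are hypothetical helper names, and the sample file is a synthetic 1x1 PNG with a placeholder caBX payload rather than a real manifest.

```python
import struct
import zlib
from typing import Iterator, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png: bytes) -> Iterator[Tuple[str, bytes, bool]]:
    """Yield (chunk_type, data, crc_ok) for each chunk in a PNG byte string."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    offset = 8
    while offset < len(png):
        if offset + 12 > len(png):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", png[offset:offset + 4])
        end = offset + 12 + length
        if end > len(png):
            raise ValueError("truncated chunk body")
        ctype = png[offset + 4:offset + 8]
        data = png[offset + 8:offset + 8 + length]
        (stored_crc,) = struct.unpack(">I", png[end - 4:end])
        # The CRC covers the chunk type and data, not the length field.
        crc_ok = zlib.crc32(ctype + data) == stored_crc
        yield ctype.decode("ascii"), data, crc_ok
        offset = end


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one chunk with a correct length and CRC (used for the sample only)."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


# Synthetic 1x1 greyscale PNG with a placeholder caBX chunk.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"caBX", b"placeholder manifest bytes")
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

chunk_types = [ctype for ctype, _, _ in iter_chunks(sample)]
print(chunk_types)
```

Each chunk occupies 12 bytes plus its data length, which is why the loop advances by `12 + length`.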

Common C2PA Chunk Types

  • caBX: Content Authenticity Binary eXtension - Contains C2PA manifest data
  • C2PA: Main C2PA data chunk
  • C2CI: C2PA Content Information
  • C2CS: C2PA Content Signature

The caBX Chunk in Detail

The caBX chunk is particularly important as it contains the C2PA Manifest Store. According to the C2PA specification, this chunk is designed to be ancillary (non-critical), private, and not safe for copying. This is reflected in its name:

  • 'c' (lowercase): Indicates an ancillary (non-critical) chunk
  • 'a' (lowercase): Indicates a private chunk (not officially registered)
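  • 'B' (uppercase): The reserved bit, which must be uppercase in current versions of the PNG specification
  • 'X' (uppercase): Indicates "not safe for copying" (an editor that does not recognize the chunk should drop it when it modifies the image data)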
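The four property bits can be read directly from the case of each letter. The sketch below assumes only the PNG naming convention described above; `chunk_properties` is a hypothetical helper name.

```python
def chunk_properties(chunk_type: str) -> dict:
    """Decode the PNG property bits encoded in the case of each letter."""
    if len(chunk_type) != 4 or not chunk_type.isascii() or not chunk_type.isalpha():
        raise ValueError("chunk type must be four ASCII letters")
    first, second, third, fourth = chunk_type
    return {
        "ancillary": first.islower(),        # lowercase: decoders may ignore it
        "private": second.islower(),         # lowercase: not a registered public chunk
        "reserved_bit_ok": third.isupper(),  # must be uppercase
        "safe_to_copy": fourth.islower(),    # uppercase: unsafe to copy
    }


cabx = chunk_properties("caBX")
ihdr = chunk_properties("IHDR")
print(cabx)
print(ihdr)
```

Comparing the two results shows the contrast: IHDR is critical and public, while caBX is ancillary and private, and neither is marked safe to copy.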

Inside the caBX chunk, data is formatted following the C2PA Manifest Store structure, which is based on JUMBF (JPEG Universal Metadata Box Format). The Manifest Store is essentially a superbox with the label c2pa and a specific UUID, containing one or more C2PA manifests.

Each manifest includes information about the image's provenance: a set of assertions about the image, details of the provenance claim, and a digital signature that ensures integrity. Optionally, a manifest may contain a Credential Store with credentials (e.g., X.509 certificates) of the signer.
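To make the superbox idea concrete, the sketch below reads the outer box header and the description box that carries the UUID and label. It is a simplified illustration under stated assumptions: it assumes the common JUMBF layout (a `jumb` box whose first child is a `jumd` description box with a 16-byte UUID, a toggles byte, and an optional null-terminated label), it ignores extended box lengths, and the UUID in the sample is a placeholder, not the real Manifest Store identifier. `parse_superbox_header` and `make_box` are hypothetical names.

```python
import struct
import uuid


def parse_superbox_header(buf: bytes) -> dict:
    """Read the outer 'jumb' box header and the 'jumd' description box inside it."""
    box_len, box_type = struct.unpack(">I4s", buf[:8])
    if box_type != b"jumb":
        raise ValueError("not a JUMBF superbox")
    desc_len, desc_type = struct.unpack(">I4s", buf[8:16])
    if desc_type != b"jumd":
        raise ValueError("expected a description box first")
    desc = buf[16:8 + desc_len]           # payload of the description box
    content_type = uuid.UUID(bytes=desc[:16])
    toggles = desc[16]
    label = None
    if toggles & 0x02:                    # assumed: bit 1 means a label follows
        end = desc.index(b"\x00", 17)
        label = desc[17:end].decode("utf-8")
    return {"length": box_len, "content_type": content_type, "label": label}


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 4-byte length and 4-byte type (sample data only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


PLACEHOLDER_UUID = uuid.UUID(int=0)       # stand-in, not the real C2PA UUID
jumd = make_box(b"jumd", PLACEHOLDER_UUID.bytes + bytes([0x03]) + b"c2pa\x00")
superbox = make_box(b"jumb", jumd)

info = parse_superbox_header(superbox)
print(info["label"], info["length"])
```

A real Manifest Store nests further superboxes (one per manifest) after the description box; walking them repeats the same length-and-type pattern.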

Information in C2PA Manifests

A C2PA manifest can contain various types of information, such as:

  • The tool that generated the image
  • Creation date and time
  • Thumbnails of the original or edited image
  • Edits performed on the image
  • Cryptographic hashes to detect alterations
  • Information about the creator or editor

These assertions are typically encoded in JSON or CBOR (Concise Binary Object Representation) within the JUMBF structure. The digital signature uses COSE (CBOR Object Signing and Encryption) and supports algorithms such as RSA-PSS, ECDSA over the P-256, P-384, and P-521 curves, and Ed25519.

How C2PA Data is Stored in Different File Formats

C2PA has defined methods for embedding provenance data in various file formats:

  • PNG: Uses a custom chunk called caBX
  • JPEG: Uses an APP11 marker segment with JUMBF-formatted data
  • TIFF: Uses a dedicated tag in an IFD (Image File Directory)
  • MP4/Video: Uses a UUID box with a specific identifier
  • SVG: Uses a metadata element with base64-encoded data

This flexibility allows C2PA to work across many different types of media, providing a consistent way to verify content regardless of format.

How Our Tool Works

Our C2PA Studio analyzes and manipulates the PNG file structure at a binary level. The process has five steps:

  1. File Reading: When you upload a PNG file, our tool reads it as a binary array buffer.
  2. PNG Validation: We verify the PNG signature to ensure the file is a valid PNG image.
  3. Chunk Analysis: The tool scans through the file, identifying all chunks and their types.
  4. Selective Removal: Based on your input, specific chunks (like "caBX") are identified and removed.
  5. File Reconstruction: A new PNG file is created without the removed chunks, preserving all other data.
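The steps above can be sketched in Python as follows. This is an illustration of the technique, not the tool's actual implementation: `strip_chunks` and `make_chunk` are hypothetical names, and the sample input is a synthetic 1x1 PNG with a placeholder caBX payload.

```python
import struct
import zlib
from typing import Iterable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def strip_chunks(png: bytes, remove: Iterable[str] = ("caBX",)) -> bytes:
    """Return a copy of the PNG without the chunk types listed in `remove`."""
    if not png.startswith(PNG_SIGNATURE):              # step 2: PNG validation
        raise ValueError("not a PNG file")
    unwanted = {name.encode("ascii") for name in remove}
    out = bytearray(PNG_SIGNATURE)
    offset = len(PNG_SIGNATURE)
    while offset < len(png):                           # step 3: chunk analysis
        if offset + 12 > len(png):
            raise ValueError("truncated chunk header")
        (length,) = struct.unpack(">I", png[offset:offset + 4])
        end = offset + 12 + length
        if end > len(png):
            raise ValueError("truncated chunk body")
        ctype = png[offset + 4:offset + 8]
        if ctype not in unwanted:                      # step 4: selective removal
            out += png[offset:end]                     # copied as-is, CRC included
        offset = end
    return bytes(out)                                  # step 5: reconstruction


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Build one chunk with a correct length and CRC (sample data only)."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


payload = b"placeholder manifest bytes"
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"caBX", payload)
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

stripped = strip_chunks(sample)
print(len(sample), len(stripped))
```

Because the remaining chunks are copied byte for byte, their CRCs stay valid and no image data is re-encoded.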

This process allows you to selectively remove C2PA metadata while keeping the image itself intact. It's particularly useful for:

  • Removing AI-generated content markers
  • Managing privacy by controlling what provenance data is attached to your images
  • Troubleshooting issues with C2PA implementations
  • Educational purposes to understand how content provenance works

Validating C2PA Signatures

A key aspect of C2PA is the digital signature that ensures the integrity of the manifest. This signature is generated with the signer's private key and allows anyone to verify that the provenance information hasn't been altered since it was signed.

When validating a C2PA manifest, several checks are performed:

  • Verification that the claim signature is valid using the signer's public key
  • Confirmation that each hashed URI in the claim matches the assertion it references
  • Validation of data hashes to ensure content integrity

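If the signature is valid and all hashes match, the manifest and the content it describes have not been altered since signing. This shows integrity, not truthfulness: whether to trust the claims still depends on whether you trust the signer identified by the certificate. This validation process is crucial for establishing trust in the provenance information.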
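The hash comparison at the heart of these checks can be illustrated with the standard library. This is a deliberately simplified sketch: a real validator hashes specific byte ranges defined by the specification and also verifies the COSE signature, neither of which is shown here. `hash_matches` is a hypothetical helper, and the assertion bytes are made-up sample data.

```python
import base64
import hashlib
import hmac


def hash_matches(data: bytes, expected_b64: str, alg: str = "sha256") -> bool:
    """Recompute a digest over `data` and compare it with the recorded value."""
    digest = hashlib.new(alg, data).digest()
    # compare_digest avoids leaking where the two values first differ.
    return hmac.compare_digest(digest, base64.b64decode(expected_b64))


# Sample data: the hash is recorded at signing time...
assertion_bytes = b'{"example": "assertion"}'
recorded = base64.b64encode(hashlib.sha256(assertion_bytes).digest()).decode("ascii")

# ...and recomputed at validation time.
untouched_ok = hash_matches(assertion_bytes, recorded)
tampered_ok = hash_matches(assertion_bytes + b" ", recorded)
print(untouched_ok, tampered_ok)
```

Even a one-byte change produces a different digest, which is what lets a validator detect tampering.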

Understanding JSON Representation of C2PA Data

When C2PA data is decoded, it's often represented as a JSON structure. Here's a simplified example of what this might look like:

Example C2PA JSON
{
  "active_manifest": "urn:uuid:5a08e472-974c-422e-b38a-d6c7326481a5",
  "manifests": {
    "urn:uuid:5a08e472-974c-422e-b38a-d6c7326481a5": {
      "claim_generator": "AI_Image_Generator/1.0",
      "format": "image/png",
      "assertions": [
        {
          "label": "stds.schema-org.CreativeWork",
          "data": {
            "@context": "http://schema.org/",
            "@type": "CreativeWork",
            "author": [
              { "@type": "Person", "name": "AI System" }
            ]
          }
        },
        {
          "label": "c2pa.hash.data",
          "data": { "hash": "base64_encoded_hash_value" }
        }
      ],
      "signature_info": {
        "issuer": "Example Certificate Authority",
        "time": "2023-06-15T14:30:00Z"
      }
    }
  }
}

In this example:

  • active_manifest identifies the manifest that applies to the asset in its current state
  • claim_generator shows what software created the manifest (here, a hypothetical AI image generator)
  • assertions contains statements about the content, including creator information and cryptographic hashes
  • signature_info provides details about who signed the manifest and when

This structured format makes it possible for both humans and machines to understand the provenance of digital content.
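As a short illustration of machine readability, the sketch below loads a trimmed copy of the example above and follows the active_manifest pointer. The field names come from the example; `active_manifest` and `assertion_labels` are hypothetical helper names, and real decoder output may contain additional fields.

```python
import json

report_json = """
{
  "active_manifest": "urn:uuid:5a08e472-974c-422e-b38a-d6c7326481a5",
  "manifests": {
    "urn:uuid:5a08e472-974c-422e-b38a-d6c7326481a5": {
      "claim_generator": "AI_Image_Generator/1.0",
      "assertions": [
        {"label": "stds.schema-org.CreativeWork", "data": {}},
        {"label": "c2pa.hash.data", "data": {"hash": "base64_encoded_hash_value"}}
      ],
      "signature_info": {"issuer": "Example Certificate Authority"}
    }
  }
}
"""


def active_manifest(report: dict) -> dict:
    """Look up the manifest that the active_manifest field points to."""
    return report["manifests"][report["active_manifest"]]


def assertion_labels(manifest: dict) -> list:
    """List the label of every assertion in a manifest."""
    return [a["label"] for a in manifest.get("assertions", [])]


report = json.loads(report_json)
manifest = active_manifest(report)
labels = assertion_labels(manifest)
print(manifest["claim_generator"], labels)
```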

Technical Specifications

The C2PA specification is available as a series of documents that detail the technical implementation of content provenance. These documents cover:

  • Data structures and formats
  • Cryptographic requirements
  • Embedding methods for different file formats
  • Validation procedures
  • Trust models and certification

For developers and technical users interested in implementing C2PA in their own applications, the full specification is available on the C2PA website.

Decoding C2PA Data

Decoding C2PA data involves several steps:

  1. Locating the caBX chunk in the PNG file
  2. Extracting the binary content of the chunk
  3. Interpreting the content according to the C2PA/JUMBF structure
  4. Converting the data to a readable JSON representation
  5. Optionally verifying the digital signature

This process can be complex due to the nested binary structures and cryptographic elements involved. Fortunately, open-source libraries recommended by C2PA simplify this task, such as the C2PA SDK, which provides bindings for several programming languages, including Python.

When decoded, C2PA data typically reveals a structured JSON object containing information about the active manifest, all manifests found in the file, and details about each manifest including its assertions, signature information, and validation status.

Why Content Provenance Matters

In an era of sophisticated AI image generation and deepfakes, content provenance is becoming increasingly important:

  • For Journalists: Helps verify the authenticity of images and videos used in reporting
  • For Creators: Protects their work and establishes proper attribution
  • For Consumers: Provides transparency about the content they're viewing
  • For Platforms: Helps identify and potentially limit the spread of manipulated media
  • For Society: Builds trust in digital media by making it harder to spread misinformation

By providing a standardized way to trace the origin and modification history of digital content, C2PA aims to restore trust in online media in an age where distinguishing between authentic and manipulated content is increasingly difficult.

Future Developments

Our tool is continuously evolving to support more features related to C2PA metadata:

  • Detailed visualization of C2PA manifests and assertions
  • Support for more file formats beyond PNG
  • Adding and modifying C2PA metadata (not just removal)
  • Batch processing capabilities
  • Integration with content creation workflows

Stay tuned for updates as we continue to enhance our C2PA Studio to meet the evolving needs of content creators and consumers in the digital provenance ecosystem.