SELECT * FROM image.png

Learn how png-db uses steganography and the flexible PNG file format to embed and retrieve JSON data within image files.
PNG as a Database (PNGaaDB?)
Try it at https://pngdb.jonaylor.com
A lot of people have heard of steganography, but idk how many have actually tried using it or building something with it. That’s certainly true for me: while I did tons of CTFs in high school and college, I never really knew how it all worked, and thus png-db. This project hides JSON data in obscure corners of PNG files and exposes it via a SQL-esque query syntax. It’s very fun.
To understand how an image can store data, one must first be familiar with the structure of a Portable Network Graphics (PNG) file. A PNG is not a single block of data but a series of distinct data segments known as "chunks".
+------------+
| PNG Header |
| (8 bytes) |
+------------+
| Length | (4 bytes)
| Type | (4 bytes) "IHDR"
| Data | (13 bytes) Image Header data
| CRC | (4 bytes)
+------------+
| Length | (4 bytes)
| Type | (4 bytes) "zTXt" for Schema
| Data | (variable) Compressed JSON Schema
| CRC | (4 bytes)
+------------+
| Length | (4 bytes)
| Type | (4 bytes) "zTXt" for Row 1
| Data | (variable) Compressed JSON Row Data
| CRC | (4 bytes)
+------------+
| ... |
+------------+
| Length | (4 bytes)
| Type | (4 bytes) "IDAT" (Pixel Data)
| Data | (variable) Compressed Image Pixels
| CRC | (4 bytes)
+------------+
| ... |
+------------+
| Length | (4 bytes)
| Type | (4 bytes) "IEND"
| Data | (0 bytes)
| CRC | (4 bytes)
+------------+
Every valid PNG file is required to contain three specific types of chunks. The file has to begin with an IHDR (Image Header) chunk, which provides foundational metadata like the image's dimensions. Following the header, the file contains one or more IDAT chunks, which hold the compressed pixel data of the image. The file then must be terminated by an IEND chunk, which signals the end of the data stream.
Beyond these required components, the PNG specification allows for a variety of optional, ancillary chunks that can store a wide range of information. For example, the tEXt chunk stores simple, uncompressed textual information as a keyword-value pair. The zTXt chunk is a compressed alternative for larger blocks of text, using zlib to compress the data string. These chunks are where metadata like Title or Author are typically stored. Although the PNG specification predefines keywords like these, writers may use any keyword they like and image viewers are free to ignore text chunks entirely, which creates an avenue for data embedding.
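To make the tEXt/zTXt difference concrete, here is a minimal Python sketch of what goes in the data field of a zTXt chunk: the keyword, a NUL separator, a one-byte compression method (0 means zlib), and then the compressed text. The function names `encode_ztxt` and `decode_ztxt` are made up for illustration and are not taken from png-db.

```python
import zlib


def encode_ztxt(keyword: str, text: str) -> bytes:
    """Build the data field of a zTXt chunk (not the whole chunk)."""
    key = keyword.encode("latin-1")
    if not 1 <= len(key) <= 79:
        raise ValueError("PNG keywords must be 1 to 79 bytes long")
    # Layout: keyword, NUL separator, compression method 0, zlib stream.
    return key + b"\x00" + b"\x00" + zlib.compress(text.encode("latin-1"))


def decode_ztxt(data: bytes) -> tuple[str, str]:
    """Split a zTXt data field back into (keyword, text)."""
    key, rest = data.split(b"\x00", 1)
    method, compressed = rest[0], rest[1:]
    if method != 0:
        raise ValueError(f"unknown compression method {method}")
    return key.decode("latin-1"), zlib.decompress(compressed).decode("latin-1")
```

A tEXt data field is the same thing minus the method byte and the compression: keyword, NUL, raw text.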
The structure of these chunks is a key enabler for using PNGs as a data format. Each chunk includes a four-byte length field (counting only the data bytes), a four-byte chunk type code, the chunk's data, and a four-byte Cyclic Redundancy Check (CRC) computed over the type and data. This organized format provides a predictable container for custom data. Decoders are expected to skip ancillary chunks they do not recognize, and some tooling goes further: Java's ImageIO PNG metadata format, for instance, has an UnknownChunks element for carrying chunks that the PNG specification does not define, further highlighting the format's flexibility.
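The length/type/data/CRC layout is simple enough to walk by hand. The following is a rough sketch, using only Python's standard library, of framing a chunk and iterating over the chunks in a byte string. `make_chunk` and `iter_chunks` are hypothetical helper names, not png-db's API.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG header


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Frame data as length + type + data + CRC (all big-endian)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF  # CRC covers type + data
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk, verifying each CRC on the way."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(chunk_type + data) & 0xFFFFFFFF != crc:
            raise ValueError(f"bad CRC in {chunk_type!r} chunk")
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
```

Something like `for chunk_type, data in iter_chunks(open("image.png", "rb").read())` would list every chunk in a real file. The walker does not check that the chunk sequence forms a valid image, only that each chunk is well-framed.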
The following table provides a quick reference for some of the most important PNG chunks:
Chunk Type | Description | Critical/Ancillary |
---|---|---|
IHDR | The image header, containing dimensions, bit depth, color type, and compression method. | Critical |
IDAT | Contains the actual compressed pixel data. There can be multiple IDAT chunks. | Critical |
IEND | Marks the end of the PNG file. | Critical |
PLTE | The palette chunk. Required for indexed-color images, optional for truecolor ones. | Critical |
tEXt | Stores uncompressed textual metadata (keyword-value pairs). | Ancillary |
zTXt | Stores compressed textual metadata (keyword-value pairs). | Ancillary |
bKGD | Defines the default background color for the image. | Ancillary |
cHRM | Defines the primary chromaticities and white point used for color calibration. | Ancillary |
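Whether a chunk is critical is not stored in a lookup table anywhere; it is encoded in the letter case of the type code itself (bit 5 of each byte). A small sketch of reading those property bits follows, with `chunk_properties` being a name invented here for illustration:

```python
def chunk_properties(chunk_type: bytes) -> dict:
    """Decode the property bits carried by the case of each type letter."""
    if len(chunk_type) != 4:
        raise ValueError("chunk types are exactly four bytes")
    return {
        # lowercase first letter: ancillary (safe to ignore); uppercase: critical
        "ancillary": bool(chunk_type[0] & 0x20),
        # lowercase second letter: private, application-defined chunk
        "private": bool(chunk_type[1] & 0x20),
        # lowercase fourth letter: editors may copy the chunk without understanding it
        "safe_to_copy": bool(chunk_type[3] & 0x20),
    }
```

This is why a decoder that has never heard of a given lowercase-first chunk type can still skip it safely.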
Steganography Use Cases
Embedding data in a PNG is a form of steganography, the practice of concealing a message within another message. Steganography often involves modifying the raw pixel data itself to hide information, an approach that can be brittle (re-encoding or resizing the image can destroy the payload) and may visibly corrupt the image. This project’s approach, however, uses the PNG file format's extensible chunk structure to store data without altering the visual content of the image. The data is openly contained in custom or designated chunks.
How It Works
png-db hacks the PNG format's inherent flexibility to store structured data within a single file. While other steganographic methods manipulate pixel data, here we’re using the PNG's zTXt (compressed textual data) chunks to store all of its information. The image itself is simply a placeholder, a grid of black pixels, which serves as a canvas for the embedded data (which is boring but I made this in an afternoon so meh).
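To show how little is needed for "a grid of black pixels plus a zTXt chunk", here is a hedged sketch that assembles such a file from scratch with Python's standard library. It is not png-db's actual code: the function name `black_png_with_json`, the 8-bit grayscale color type, and the example payload are all assumptions made for illustration.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Frame data as length + type + data + CRC."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def black_png_with_json(width: int, height: int, keyword: str, obj) -> bytes:
    """Return a black grayscale PNG carrying obj as JSON in one zTXt chunk."""
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, no interlace -> 13 bytes.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is a filter byte (0 = none) followed by `width` black pixels.
    scanline = b"\x00" + b"\x00" * width
    pixels = zlib.compress(scanline * height)
    # json.dumps escapes non-ASCII by default, so Latin-1 encoding is safe.
    text = json.dumps(obj).encode("latin-1")
    ztxt = keyword.encode("latin-1") + b"\x00" + b"\x00" + zlib.compress(text)
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"zTXt", ztxt)
            + chunk(b"IDAT", pixels) + chunk(b"IEND", b""))
```

Writing the returned bytes to a file with `open("db.png", "wb")` should give an image that opens as a plain black rectangle while the JSON rides along in the metadata.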
The project organizes its data into a schema and individual data rows. When the database is saved to a PNG file, the schema is serialized into a JSON object and written to a single zTXt chunk with the keyword "schema". Each data row is also serialized as a JSON object and is written to its own separate zTXt chunk. The keyword for each data row chunk is generated based on its coordinates, in the format "row_x_y". This method allows the database to be reconstructed simply by reading and parsing these specific chunks from the PNG file. When a user queries the database, the project parses a simple WHERE clause and performs an in-memory linear scan of all the data rows. It then compares the values of the coordinates or the JSON fields within each row to the conditions specified in the query, returning only the rows that match.
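The keyword scheme and the linear scan can be sketched in a few lines of Python. This is an illustration under assumptions, not png-db's real parser: it takes the text chunks as an already-decoded dict of keyword to JSON string, supports only a single `field op value` condition, and the names `row_keyword`, `load_rows`, and `where` are hypothetical.

```python
import json
import operator
import re

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       ">": operator.gt, "<=": operator.le, ">=": operator.ge}


def row_keyword(x: int, y: int) -> str:
    """Keyword for the zTXt chunk holding the row at (x, y)."""
    return f"row_{x}_{y}"


def load_rows(text_chunks: dict[str, str]) -> list[dict]:
    """Rebuild rows from keyword -> JSON text, skipping non-row chunks."""
    rows = []
    for keyword, text in text_chunks.items():
        m = re.fullmatch(r"row_(\d+)_(\d+)", keyword)
        if m:
            # Expose the coordinates next to the JSON fields (an assumption).
            rows.append({"x": int(m[1]), "y": int(m[2]), **json.loads(text)})
    return rows


def where(rows: list[dict], clause: str) -> list[dict]:
    """Linear scan: keep rows where `field op value` holds."""
    m = re.fullmatch(r"\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*", clause)
    if not m:
        raise ValueError(f"cannot parse WHERE clause: {clause!r}")
    field, op, literal = m.groups()
    value = json.loads(literal)  # literals use JSON syntax: 30, "Ada", true
    compare = OPS[op]

    def matches(row: dict) -> bool:
        try:
            return field in row and compare(row[field], value)
        except TypeError:  # e.g. comparing a string field with a number
            return False

    return [row for row in rows if matches(row)]
```

A linear scan is O(n) per query, which is perfectly fine for a database that lives inside a meme.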
Conclusion
png-db is a simple, focused tool that demonstrates the potential of using the PNG file format as a data container. I suppose next I’ll build a meme page where all the memes have hidden datasets contained in the image metadata then pretend I’ve found a super secret CIA chat website.
Try it at https://pngdb.jonaylor.com