Reverse Engineering File Format From Scratch

File formats are usually well documented, often with open-source parsers available, but that's not always the case. In this article we will look at a case study of reverse engineering the After Effects project file. This serves a dual purpose: demonstrating how to work out the structure of an undocumented format, and showing that it’s not as scary as one may think. With the knowledge gained we should be able to build a file parser and extract information.

The first thing I did was open the file in a hex editor to get a general feel for it. I also ran the “file” tool to learn what I could about the file format.
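If no hex editor is at hand, the same first look can be scripted. The sketch below is a minimal hex dump helper, not a tool used in the original investigation; the function name `hexdump` and the file name `project.aep` in the note after it are made up for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex columns and printable ASCII, one row per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." so ASCII tags stand out.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny in-memory sample; a real run would read the start of the file instead.
    print(hexdump(b"head\x00\x57\x00\x01nhed\x00\x00\x00\x00"))
```

For a real file, something like `hexdump(open("project.aep", "rb").read(256))` shows the first 256 bytes, which is usually enough to spot a magic number and the first few ASCII tags.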

In this case, it turned out to be a big-endian RIFF file. While learning more about the format, I also wanted to better understand how these files are used inside After Effects (AE). Since AE is pretty huge, I didn’t want to reverse engineer (RE) the binaries just yet, and hoped to skip that step altogether. AE offers “Save As > Save a Copy As XML”, which caught my attention since it meant the file structure can be represented in a more human-readable way. It also helped me make heads or tails of the ASCII strings I saw in the hex editor - note the tag names that appear in both files, such as “svap”, “head” and “nhed”:

<svap bdata="072b8e06"/>
<head bdata="00570001072b8e06800000000000000100000001"/>
<nhed bdata="0000000000000005000101001e100200000004e41708720000000000fffffffe"/>
<adfr bdata="40e7700000000000"/>

Noticing this, I quickly set aside researching the RIFF container and focused on building a parser for the XML (since it looked less scary than the binary data in the original .aep files). Using a reference for the XML format, I learned how single tags, children, attributes, arrays, etc. are represented. I made the parser recurse through the document until it hit something it couldn't handle. Then I worked out the problem, fixed it, and iterated - and quite quickly the parser could get through all the nodes and data in the file.
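The recursive walk described above can be sketched with Python's standard `xml.etree.ElementTree` instead of a hand-written parser, which is a swap made here for brevity. The `<Project>` wrapper element and the name `walk` are assumptions for illustration; only the `svap`, `head` and `adfr` lines come from the export shown earlier.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0, out=None):
    """Recursively collect (depth, tag, decoded bdata or None) for every node."""
    if out is None:
        out = []
    # Drop a "{namespace}" prefix if the export declares one.
    tag = element.tag.rsplit("}", 1)[-1]
    raw = bytes.fromhex(element.attrib["bdata"]) if "bdata" in element.attrib else None
    out.append((depth, tag, raw))
    for child in element:
        walk(child, depth + 1, out)
    return out


# Hypothetical wrapper element around tags taken from the article's sample.
SAMPLE = """<Project>
  <svap bdata="072b8e06"/>
  <head bdata="00570001072b8e06800000000000000100000001"/>
  <adfr bdata="40e7700000000000"/>
</Project>"""

nodes = walk(ET.fromstring(SAMPLE))
for depth, tag, raw in nodes:
    size = f"{len(raw)} bytes" if raw is not None else "no bdata"
    print("  " * depth + f"{tag}: {size}")
```

Keeping the tree depth alongside each tag makes it easy to print an indented outline and see which tags are containers and which carry data.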

Since the parser could already understand some project properties described as XML tags, I managed to extract the composition name (a composition is roughly a grouping of layers in AE). From there I could work on understanding the meaning of individual tags and the data they contain - most of the data appeared to be stored in the bdata attribute as hex-encoded binary.

The easiest way to go about it is to start with an empty project, to minimize the amount of data, and export it. Then add a layer whose meaning we know, export again, and diff the old and new exports. It's also worth doing the same with several empty projects to find out where variable data (such as timestamps) is stored.
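The export-and-diff loop can be automated. The following is a rough sketch, assuming both exports are available as XML strings; the helper names, the `<Project>` wrapper and the changed `adfr` value in the second export are all made up for the demonstration.

```python
import xml.etree.ElementTree as ET


def collect_bdata(element, path="", out=None):
    """Map a positional path such as "/adfr[1]" to each node's bdata string."""
    if out is None:
        out = {}
    for index, child in enumerate(element):
        tag = child.tag.rsplit("}", 1)[-1]  # ignore any XML namespace prefix
        child_path = f"{path}/{tag}[{index}]"
        if "bdata" in child.attrib:
            out[child_path] = child.attrib["bdata"]
        collect_bdata(child, child_path, out)
    return out


def diff_exports(old_xml, new_xml):
    """Return {path: (old_bdata, new_bdata)} for every path whose bdata differs."""
    old = collect_bdata(ET.fromstring(old_xml))
    new = collect_bdata(ET.fromstring(new_xml))
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


# Two tiny stand-ins for "before" and "after" exports (second adfr value is invented).
OLD = '<Project><svap bdata="072b8e06"/><adfr bdata="40e7700000000000"/></Project>'
NEW = '<Project><svap bdata="072b8e06"/><adfr bdata="40e5888000000000"/></Project>'

changes = diff_exports(OLD, NEW)
for path, (before, after) in changes.items():
    print(path, before, "->", after)
```

Positional paths are a simplification: when a new node is inserted in the middle of a parent, every later sibling shifts and shows up as changed. A plain text diff of the two exports is a useful cross-check in that situation.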

As for the binary data, we can try to interpret it in different formats, such as integers, floats and/or text, and see what makes the most sense. Once we focus on one area of the file, we should be able to work out which layer it affects and which of that layer's “effect” and “settings” values it holds.
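Trying several interpretations of the same bytes is easy to script. The helper below is a minimal sketch (the name `interpretations` is made up) that decodes a bdata hex string as big- and little-endian integers, as a big-endian float when the length fits, and as ASCII. Applied to the `adfr` value from the export above, the 8 bytes read as a big-endian double give exactly 48000.0, which looks like it could be an audio sample rate, though that reading is only a guess.

```python
import struct


def interpretations(hex_string):
    """Decode a bdata hex string in several candidate formats."""
    raw = bytes.fromhex(hex_string)
    guesses = {
        "uint_be": int.from_bytes(raw, "big"),
        "uint_le": int.from_bytes(raw, "little"),
    }
    # Floats only make sense when the length matches the type's size.
    if len(raw) == 4:
        guesses["float32_be"] = struct.unpack(">f", raw)[0]
    if len(raw) == 8:
        guesses["float64_be"] = struct.unpack(">d", raw)[0]
    # Printable ASCII view, with "." standing in for everything else.
    guesses["ascii"] = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return guesses


for name, value in interpretations("40e7700000000000").items():
    print(f"{name:>11}: {value}")
```

A round, plausible value in one interpretation and nonsense in the others is usually a good hint about the underlying type. Longer values such as `head` are more likely to be several fields packed together, so they need to be split up before this kind of guessing helps.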

We can keep researching the differences and guessing the types until we're satisfied we know enough. It’s also important to ask yourself how you would implement the details of the AE format - perhaps its authors used a similar approach. Finally, you can search the program's process memory for strings or data that are visible in the program's UI - this may reveal structures stored nearby, such as a class instance that represents a given type of data. You can even search for a bdata value itself and look around its memory area to learn more.
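Once a memory dump has been captured (how to obtain one depends on the platform and is not covered here), searching it for a known value is straightforward. This is a small sketch under that assumption; `find_all` is a made-up helper and the dump below is synthetic.

```python
def find_all(haystack: bytes, needle: bytes, context: int = 8):
    """Return (offset, surrounding bytes) for every occurrence of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    hits = []
    start = haystack.find(needle)
    while start != -1:
        lo = max(0, start - context)
        hi = min(len(haystack), start + len(needle) + context)
        hits.append((start, haystack[lo:hi]))
        # Continue one byte further on so overlapping matches are found too.
        start = haystack.find(needle, start + 1)
    return hits


# Synthetic "dump" containing the adfr value twice, surrounded by filler bytes.
pattern = bytes.fromhex("40e7700000000000")
dump = b"\x00\x00" + pattern + b"\xff" * 4 + pattern

for offset, nearby in find_all(dump, pattern, context=2):
    print(f"0x{offset:08x}  {nearby.hex()}")
```

The bytes around each hit are the interesting part: neighbouring fields that change together with the value are good candidates for belonging to the same structure.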
Reviewed by Vaishno Chaitanya on August 22, 2019
