Wednesday, January 31, 2018

Practical Exercise - Image Carving II - Python

In the last post we looked at how we can manually carve a JPEG image out of free 'space'. That is good to know, and OK to do if we have one or two images, but if we had thousands to carve it could take some time. We would normally turn to some sort of image recovery software, but could we write our own?

Part of the reason for this blog is to demonstrate some Hex Ninja skills, both by doing things manually and by writing simple scripts to automate some of these tasks.

The general process goes something like this:
1. First we find the artifact we are looking for.
2. Understand the layout of the artifact.
3. Manually try to carve out the artifact and make sure it works for all cases.
4. Write a script to automate the process.
5. Test the script and make sure it works.

The last blog post covered steps 1-3; this post will cover steps 4-5.

So the language we will be using is Python. It is very easy to program in and is my 'goto' language at the moment for getting something up and running fast.

Available from 

There are two versions available 2.7 and 3.6. See to check out the differences between them.

I mainly use 2.7 because there are more code libraries for it and more support for debugging on sites like StackOverflow, but we can test the script on both and see if it works. Eventually I will move to Python 3.

So download Python 2.7 for your OS (Mac/Windows/Linux) and follow the install instructions.

To make sure everything has installed OK, go to a command prompt and type python.

Hopefully you see something similar to the above screenshot. The output should tell you what version you are using (2.7.12), whether it is 32 or 64 bit, and then show the Python command prompt >>>

In the tradition of programming languages, your first exercise is to print Hello World to the screen.
Python makes this very simple, type print ("Hello World") and you should see output like below.

To get back to the normal command prompt hit Ctrl-Z and Enter (that is for Windows; on Mac/Linux use Ctrl-D).

There are two main ways of using python.
1. From the Python command prompt, where we can type python commands directly. This is good for doing simple testing of instructions.
2. Running a python script, where we write the python commands in an editor, save the file with the extension py and then execute it by typing python followed by the script's filename at the command prompt.

We will mainly be using the second technique. We can use a basic text editor such as Notepad. My favourite editor is PyCharm from JetBrains.
It has code highlighting and code completion, finds errors, and lets you run your code from within the editor, but there are a plethora of editors. They can be a bit daunting to use initially but are well worth it if you intend to code a lot. For simplicity we will just use a text editor.

So now we are ready to start coding.
But before we start coding let's think about what we want to achieve.
1. We want to load a file.
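2. We want to search the file for the JPEG start of frame header "FFD8FFE0" and the end of frame "FFD9". (Strictly, the JPEG standard calls FFD8 the Start of Image marker and FFD9 the End of Image marker, with FFE0 being the APP0 marker that follows in JFIF files, but we will stick with start/end of frame here.)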
3. We then want to save the data between these markers to a file. Simples!

As we want to keep the code simple, we won't be doing any error checking. In a real production program there is a lot of error checking (making sure the file exists, the data is in the correct format, etc.) and it can make the code confusing to look at, so we will just be doing the bare basics.

The first thing we add to our script is a line telling Python which modules we will be using. We will be using re (Regular Expressions) to do fast searches, so we need to tell Python to load that module using the import instruction.

We then hardcode the Start/End of Frame tags we will be searching for: FFD8FFE0 and FFD9. The format may look a little strange, but basically it is a hex byte string, i.e. each hex byte is preceded by \x. We do this because the file we read in will be in that format, so it is easier to search for the tags in the same format.

import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'
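
To make the \x notation a little less mysterious, here is a small sketch showing that the hex text we would see in a hex editor and the byte string above are the same four bytes. The variable names sof_hex_text and sof_bytes are just made up for this illustration.

```python
import binascii

# The tag as we would read it in a hex editor (plain hex text)
sof_hex_text = "FFD8FFE0"

# The same tag converted into a Python byte string
sof_bytes = binascii.unhexlify(sof_hex_text)

print(sof_bytes == b'\xFF\xD8\xFF\xE0')   # True
print(len(sof_bytes))                      # 4 (four bytes, not eight characters)
```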

Next we want to read in the file we want to search through. We could pass in the filename as an argument, but as we are trying to keep things simple we will hardcode the filename into our code. We use the open command with the name of the file we are carving from. We will use the data file Carve1.bin from the previous blog.

We use the 'rb' format, indicating we want to read 'r' a binary 'b' file. The open command returns a reference to our file, called a file object, which we name file_obj. Next we read the whole file into a variable called data. Don't try this with a massive file; we will show how to read in big files in later posts. We then close the file, which releases the reference to it so other programs can access it. Also make sure the file Carve1.bin is in the same directory as the python script, otherwise we have to add path information to the filename.
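
As a minimal sketch, the steps just described might look like this. The function name read_file_bytes is only an illustrative choice so the snippet can be reused and tested; file_obj and data are the names used in the text.

```python
def read_file_bytes(filename):
    # 'rb' = read ('r') in binary ('b') mode, so we get the raw bytes back
    file_obj = open(filename, 'rb')
    data = file_obj.read()   # read the whole file into memory in one go
    file_obj.close()         # release the file so other programs can access it
    return data
```

In our script this would be called as data = read_file_bytes('Carve1.bin'), or the three lines in the middle can simply be used on their own with the filename hardcoded.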


This all seems pretty straightforward.

Now we have our data loaded in memory we can perform our search. This is where we use the re module. Basically we want to get a list of all the offsets in the data where we find our tags. The following commands return lists of these offsets.

SOF_list=[match.start() for match in re.finditer(re.escape(JPEG_SOF),data)]
EOF_list=[match.start() for match in re.finditer(re.escape(JPEG_EOF),data)]

If we run the script so far we can check what we have found.

>>> SOF_list
>>> EOF_list

So we have found the SOF tag at byte offset 4696 and the EOF tag at 11747.
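
If you don't have Carve1.bin to hand, the same two search lines can be tried on a small made-up buffer. The test data below is invented purely for illustration; the offsets it prints are for this fake buffer, not for the real file.

```python
import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

# Made-up buffer: 3 junk bytes, a fake "image", then 2 more junk bytes
data = b'\x00\x01\x02' + JPEG_SOF + b'fake image data' + JPEG_EOF + b'\x03\x04'

# re.escape makes sure no byte in a tag is treated as a regex special character
SOF_list = [match.start() for match in re.finditer(re.escape(JPEG_SOF), data)]
EOF_list = [match.start() for match in re.finditer(re.escape(JPEG_EOF), data)]

print(SOF_list)   # [3]
print(EOF_list)   # [22]
```

Each match object returned by re.finditer knows where it was found, and match.start() gives us that byte offset.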

Now all that is left for us to do is to get the data between these offsets and save it to a file. We will write the code assuming there could be more hits, so we can loop through all of them and carve all the images in one go.

So we need a counter variable, which we will call i, to step through the EOF_list. We then use a for loop to go through the SOF_list. Next we want to get the JPEG image data out of the hex byte string we read in from the file. We can do this simply with subdata=data[start:end]. Note that a Python slice stops just before end, so end needs to be the EOF offset plus 2 if the two bytes of the FFD9 tag are to be included. So now we have the data, we just need to save it to a file. As before, I like to include the start offset and end offset in the name of the file, which we build into a variable called carve_filename.

Now we just open that file with the 'wb' (write binary) format and write the data to it. We update i with i=i+1 so that it references the next EOF_list offset. And we do a print statement to give some feedback to the user.

i=0
for SOF in SOF_list:
    # ... slice out subdata, build carve_filename and write the file here, as described above ...
    i=i+1
    print ("Found an image and carving it to "+carve_filename)
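
Putting the pieces together, the carving loop might look something like the sketch below. The function name carve_images, the prefix parameter, the exact filename layout and the check for running out of end tags are all assumptions made for illustration, not necessarily what the original script does. It also simply pairs the first start tag with the first end tag, the second with the second and so on, which is fine for our simple test file but would need more thought on messy real-world data.

```python
import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

def carve_images(data, prefix='Carve1'):
    """Carve every SOF..EOF span out of data; return the filenames written."""
    SOF_list = [m.start() for m in re.finditer(re.escape(JPEG_SOF), data)]
    EOF_list = [m.start() for m in re.finditer(re.escape(JPEG_EOF), data)]
    written = []
    i = 0
    for SOF in SOF_list:
        # Assumption: stop if there is no end tag left to pair with this start tag
        if i >= len(EOF_list):
            break
        EOF = EOF_list[i]
        # +2 so the slice includes the two bytes of the FFD9 tag itself
        subdata = data[SOF:EOF + 2]
        # Hypothetical naming scheme: prefix, start offset, end offset
        carve_filename = prefix + '_' + str(SOF) + '_' + str(EOF) + '.jpg'
        carve_obj = open(carve_filename, 'wb')
        carve_obj.write(subdata)
        carve_obj.close()
        written.append(carve_filename)
        i = i + 1
        print("Found an image and carving it to " + carve_filename)
    return written
```

Called as carve_images(data) after reading in Carve1.bin, it writes one .jpg file per start tag into the current directory.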

So that should do it. We can now save this file with a .py extension and run it.

Great, it works, so let's check the carved file.
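
Besides opening the carved file in an image viewer, a quick sanity check is to confirm that it starts and ends with our two tags. This is only a rough sketch, and looks_like_carved_jpeg is a made-up helper name; passing this check does not prove the image is intact, only that we cut in the right places.

```python
JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

def looks_like_carved_jpeg(blob):
    # A carved image should begin with the start tag and finish with the end tag
    return blob.startswith(JPEG_SOF) and blob.endswith(JPEG_EOF)

print(looks_like_carved_jpeg(JPEG_SOF + b'fake image data' + JPEG_EOF))   # True
print(looks_like_carved_jpeg(b'not an image'))                            # False
```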

And we are done. A 17 line image carver!