Wednesday, January 31, 2018

Practical Exercise - Image Carving II - Python



In the last post we looked at how we can manually carve out a JPEG image from free 'space'. Good to know and OK to do if we have one or two, but if we had thousands to carve... it could take some time. We would then use some sort of image recovery software, but could we write our own?

Part of the reason for this blog was to demonstrate some Hex Ninja skills both manually and how we can write some simple scripts to automate some of these tasks.

The general process goes something like this:
1. First we find the artifact we are looking for.
2. Understand the layout of the artifact.
3. Manually try and carve out the artifact and make sure it works for all cases.
4. Write a script to automate the process.
5. Test the script and make sure it works.

The last blog post covered steps 1-3, this post will cover steps 4-5.

So the language we will be using is Python. It is very easy to program in and is my 'goto' language at the moment for getting something up and running fast.

Available from https://www.python.org/downloads/ 

There are two versions available 2.7 and 3.6. See https://wiki.python.org/moin/Python2orPython3 to check out the differences between them.

I mainly use 2.7 because there are more code libraries and more support for debugging on sites like StackOverflow, but we can test it on both and see if it works. Eventually I will move to Python 3.

So download Python 2.7 for your OS (Mac/Windows/Linux) and follow the install instructions.


To make sure everything has installed OK, go to a command prompt and type python.


Hopefully you see something similar to the above screenshot. The output should tell you what version you are using (2.7.12), whether it is 32 or 64 bit, and give you a Python command prompt >>>

In the tradition of programming languages, your first exercise is to print Hello World to the screen.
Python makes this very simple: type print ("Hello World") and you should see output like below.



To get back to the normal command prompt hit Ctrl-Z and Enter (on Windows; on Mac/Linux use Ctrl-D).

There are two main ways of using python.
1. From the Python command prompt where we can type python commands direct. This is good for doing simple testing of instructions.
2. Running a Python script, where we write the Python commands in an editor, save the file with the extension .py and then execute it by typing python yourscript.py at the command prompt.

We will mainly use the second technique. We can use a basic text editor such as Notepad. My favourite editor is PyCharm from JetBrains https://www.jetbrains.com/pycharm/
It has code highlighting and code completion, finds errors and lets you run your code from within the editor, but there are a plethora of editors. They can be a bit daunting to use initially but are well worth it if you intend to code a lot. For simplicity we will just use a text editor.

So now we are ready to start coding.
But before we start coding let's think about what we want to achieve.
1. We want to load a file.
2. We want to search the file for the JPEG start of file header "FFD8FFE0" and the end of file marker "FFD9".
3. We then want to save the data between these markers to a file. Simples!

As we want to keep the code simple, we won't be doing any error checking. In a real production program, there is a lot of error checking making sure the file exists, the data is in the correct format etc etc and it can make looking at the code confusing, so we will just be doing the bare basics.

The first thing we add to our script is to tell Python what modules we will be using. We will be using re (Regular Expressions) to do fast searches, so we need to tell Python to load that module using the import instruction.

We then hardcode the start/end of file tags we will be searching for, FFD8FFE0 and FFD9. The format may look a little strange but it is simply a hex byte string, i.e. each hex byte is preceded with \x. We do this because the file we read in will be in that format, so it is easier to search for these tags in the same format.

import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

Next we want to read in the file we want to search through. We could pass in the filename as an argument, but as we are trying to keep it simple we will hardcode the filename into our code. We use the open command with the name of the file we are carving from. We will use the data file Carve1.bin from the previous blog. https://github.com/thehexninja/BlogDownloads/blob/master/Carve1.bin

We use the 'rb' mode, indicating we want to read 'r' a binary 'b' file. The open command returns a reference to our file, called a file object, which we name file_obj. Next we read the whole file into a variable called data. Don't try this with a massive file. We will show how to read in big files in later posts. We then close the file, which releases the reference to it so other programs can access it. Also make sure the file Carve1.bin is in the same directory as the Python script, otherwise we have to add path information to the filename.

file_obj=open('Carve1.bin','rb')
data=file_obj.read()
file_obj.close()

This seems all pretty straightforward.
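
As an aside, here is a slightly more defensive way to do the same read, sketched with Python's with statement, which closes the file for us even if something goes wrong part way through. The helper name read_whole_file is made up for illustration, and the demo writes a small throwaway file rather than assuming Carve1.bin is sitting in the current directory.

```python
import os
import tempfile

def read_whole_file(path):
    """Read an entire file into memory as bytes (fine for small files only)."""
    with open(path, 'rb') as file_obj:   # 'rb' = read, binary
        return file_obj.read()           # the file is closed when the block exits

# Demo on a throwaway file so the sketch runs anywhere.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(demo_path, 'wb') as f:
    f.write(b'\x00\x01\xFF\xD8\xFF\xE0')

data = read_whole_file(demo_path)
print(len(data))
```

For the carver we would call read_whole_file('Carve1.bin') instead of the demo file.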

Now we have our data loaded in memory we can perform our search. This is where we use the re module. Basically we want to get a list of all the offsets in the data where we find our tags. The following commands return lists of these offsets.

SOF_list=[match.start() for match in re.finditer(re.escape(JPEG_SOF),data)]
EOF_list=[match.start() for match in re.finditer(re.escape(JPEG_EOF),data)]

If we run the script so far we can check what we have found.


>>> SOF_list
[4696]
>>> EOF_list
[11747]

So we have found the SOF tag at byte offset 4696 and the EOF tag at 11747.
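
If you want to convince yourself how the search behaves before pointing it at real data, the sketch below runs the same re.finditer idea over a tiny made-up buffer. The helper name find_offsets and the sample bytes are invented for the demo. The last two lines convert the Carve1.bin offsets above back to hex, so they can be compared with what we saw in the hex editor in the previous post.

```python
import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

def find_offsets(tag, data):
    """Return the start offset of every occurrence of tag in data."""
    return [m.start() for m in re.finditer(re.escape(tag), data)]

# Made-up buffer: 5 filler bytes, a fake 'image', then 3 more filler bytes.
sample = b'\x00' * 5 + JPEG_SOF + b'picture data' + JPEG_EOF + b'\x00' * 3

sof_hits = find_offsets(JPEG_SOF, sample)
eof_hits = find_offsets(JPEG_EOF, sample)
print(sof_hits)   # [5]
print(eof_hits)   # [21]

# The offsets found in Carve1.bin, shown in hex.
print(hex(4696))        # 0x1258, the start of the FFD8FFE0 header
print(hex(11747 + 1))   # 0x2de4, the D9 byte sits one past the FF of FFD9
```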

Now all that is left for us to do is to get the data between these offsets and save it to a file. We will write the code assuming there could be more hits, so we can loop through them all and carve all the images in one go.

So we need a counter variable, which we will call i, to step through the EOF list. We then use a for loop to go through the SOF_list. We then want to get the JPEG image data from the byte string we read in from the file. We can do it simply with a slice, subdata=data[start:end], adding 2 to the end offset so both bytes of FFD9 are included. So now we have the data, we just need to save it to a file. As before, I like to include the start offset and end offset in the name of the file. We do this with
carve_filename="Carve1_"+str(SOF)+"_"+str(EOF_list[i])+".jpg"

Now we just open that file with the 'wb' (write binary) mode, write the data and close it. We update i with i=i+1 to then reference the next EOF_list offset. And we do a print statement to give some feedback to the user.

i=0
for SOF in SOF_list:
    subdata=data[SOF:EOF_list[i]+2]
    carve_filename="Carve1_"+str(SOF)+"_"+str(EOF_list[i])+".jpg"
    carve_obj=open(carve_filename,'wb')
    carve_obj.write(subdata)
    carve_obj.close()
    i=i+1
    print ("Found an image and carving it to "+carve_filename)

So that should do it. We can now save this file, call it jpeg_carve.py, and run it.


Great, it works... so let's check the carved file.


And we are done. A 17 line image carver!
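
One thing the simple loop takes on trust is that the first start tag belongs with the first end tag, the second with the second, and so on. A stray FFD9 earlier in the data would throw that pairing off. As a possible next step (a sketch only, not the script above), each start tag can instead be paired with the first end tag found after it. The function names pair_markers and carve are made up for illustration.

```python
import re

JPEG_SOF = b'\xFF\xD8\xFF\xE0'
JPEG_EOF = b'\xFF\xD9'

def pair_markers(data):
    """Pair each start tag with the first end tag found after it.

    Returns a list of (start, end) offsets, where end is the offset of the
    FF byte of FFD9. Start tags with no later end tag are skipped.
    """
    sof_list = [m.start() for m in re.finditer(re.escape(JPEG_SOF), data)]
    pairs = []
    for sof in sof_list:
        end = data.find(JPEG_EOF, sof + len(JPEG_SOF))
        if end != -1:
            pairs.append((sof, end))
    return pairs

def carve(data, prefix):
    """Write one .jpg per pair and return the filenames written."""
    names = []
    for sof, eof in pair_markers(data):
        name = prefix + "_" + str(sof) + "_" + str(eof) + ".jpg"
        with open(name, 'wb') as out:
            out.write(data[sof:eof + 2])   # +2 keeps both bytes of FFD9
        names.append(name)
    return names
```

Calling carve(data, "Carve1") on the data read from Carve1.bin should write the same kind of file as the script above. This still does nothing about thumbnails or fragmentation.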

Sunday, December 31, 2017

Practical Exercise - Image Carving

So who's ready to carve?

Or as Gordon would say " Let's Carve or F#!K OFF "

In the last post we talked about some simple carving of a JPEG image file using a hex editor.

Before we get too carried away we should practice a couple of simple carves of images from 'unallocated'. What do I mean by 'unallocated', I hear you ask? Well...

There are a couple of approaches to carving and recovering files from file systems.

Firstly there is the "File System" approach. That is, we use the filesystem's knowledge of where the deleted file was to begin our journey of recovery.

For example, when a file is deleted in a FAT32 filesystem, the directory entry has the first byte of the entry overwritten with 'E5'. The directory entry still contains the filename (minus the first character), the filesize and the first cluster number. These can be vital to assist in the recovery process.

For a valid file we could look up the cluster number in the FAT table and find all the fragments as each FAT entry points to the next cluster number.

However when a file is deleted the FAT table entries are zeroed so we cannot trace the file fragments. We will go through a worked example of this later.
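
To make the directory entry idea a bit more concrete, here is a minimal sketch that pulls those fields out of a single 32-byte short-name directory entry. It assumes the standard FAT layout (name in bytes 0-10, first cluster high word at offset 20, low word at offset 26, file size at offset 28, all little-endian) and ignores long filename entries. The function name parse_dir_entry and the example values are made up.

```python
import struct

def parse_dir_entry(entry):
    """Pull the fields useful for recovery out of one 32-byte FAT entry."""
    first = entry[0:1]
    name = entry[0:8].rstrip(b' ')
    ext = entry[8:11].rstrip(b' ')
    cluster_hi, = struct.unpack('<H', entry[20:22])
    cluster_lo, = struct.unpack('<H', entry[26:28])
    size, = struct.unpack('<I', entry[28:32])
    deleted = (first == b'\xE5')
    return {
        'deleted': deleted,
        'name': name[1:] if deleted else name,   # first character is lost on delete
        'ext': ext,
        'first_cluster': (cluster_hi << 16) | cluster_lo,
        'size': size,
    }

# Hand-built entry for a deleted 'NINJA.JPG' (hypothetical cluster and size).
raw = bytearray(32)
raw[0:11] = b'\xE5INJA   JPG'
raw[26:28] = struct.pack('<H', 5)
raw[28:32] = struct.pack('<I', 7053)

info = parse_dir_entry(bytes(raw))
print(info)
```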

The second technique for file recovery is to ignore the filesystem and treat the disk as one big block of data. We can either do this on the whole disk image or we can just export the unallocated portion of the disk. We can then use our knowledge of what type of file we are trying to recover to attempt to find the file/s in question.

So let's start with three simple image carves.

1. JPEG: Deleted, no thumbnails, not overwritten, unfragmented in free unallocated space.

2. JPEG: Deleted, no thumbnails, not overwritten, unfragmented in full unallocated space.

3. JPEG: Deleted, no thumbnails, not overwritten, fragmented in unallocated space.


Carve 1

Download the bin file from the GitHub

In a hex editor search for FFD8FFE0.


We find a search hit at 0x1258


Select the beginning of the block at the start of the JPEG at 0x1258. Now we search for the end of the file with the hex FFD9.


The D9 of FFD9 at the end of the file is at offset 0x2DE4. We select this as the end of the block. Copy the block out to a new file. In the filename I like to include 3 things: the file I am carving from, the start offset and the end offset. So let's call it Carve1_1258_2DE4.jpg and voilà...
Carve1_1258_2DE4.jpg

Carve 2

Download the bin file from the GitHub

Again we search for FFD8FFE0.


We find it at offset 0x13B6. In this second example we see that it is embedded in other data (other deleted or allocated files); this is more typical of what we might see.
Again we search for FFD9 for the end of file marker. It is at 0x2360. We select the block and copy it out into a new file. Carve2_13B6_2360.jpg.

Carve2_13B6_2360.jpg

This seems simple enough: just a search for the start and end, and we have carved two deleted files of the Hex Ninja.

Carve 3

Download the bin file from the GitHub

Opening this unallocated blob we see something interesting...

For those who like to commonly hexinate files, it looks like an OLE Compound File (OLECF), the format used by Word, PowerPoint and Excel from 1997-2003. These files have a distinct 8 byte header D0CF11E0A1B11AE1. For more info have a look at http://www.forensicswiki.org/wiki/OLE_Compound_File
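
Spotting a header like that by eye is easy to mirror in code. The sketch below simply compares the first 8 bytes of a blob against the OLECF header quoted above. The function name looks_like_olecf and the two test blobs are made up for the demo.

```python
OLECF_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'

def looks_like_olecf(data):
    """True if the data starts with the 8 byte OLE Compound File header."""
    return data[:8] == OLECF_MAGIC

ole_blob = OLECF_MAGIC + b'\x00' * 504          # fake first sector of an OLE file
jpeg_blob = b'\xFF\xD8\xFF\xE0' + b'\x00' * 508  # fake first sector of a JPEG

print(looks_like_olecf(ole_blob))    # True
print(looks_like_olecf(jpeg_blob))   # False
```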

So this example looks like there is another file in the unallocated space. But we will concentrate on the JPEG we are searching for. So we search for FFD8FFE0 as before.


Interesting to note that it is on a nice byte boundary of 0x2000 ie  8192 bytes or 16 sectors of 512 bytes. This will be important later but let's move on to carving the JPEG. Search for FFD9. We find it at 0x4424. We save it as Carve3_2000_4424.jpg.

Carve3_2000_4424.jpg
Huh, this doesn't seem right. The first part looks like the devilishly handsome you know who!! But what happened to the rest. So let's look back at our file we carved out. If we scroll up from the bottom we see some weird stuff. We see some references to a directory structure "theme/theme/themeManager.xml" ...



That stuff should not be in our JPEG. So here is our Aha moment... no not 'Take on me' Aha more like a 'that's interesting' Aha.
Aha - Take On Me (1985)
https://www.youtube.com/watch?v=djV11Xbc914
 
We saw the first part of unallocated was an OLE file, then we found our JPEG, but it looks like maybe we have some of the OLE file mixed in with our JPEG, causing it to not decode properly.

So now what could be happening? Perhaps FRAGMENTATION!!!

What is this fragmentation sorcery you speak of?

Well let's back up a bit first.

So for a new filesystem out of the box, we have a nice clean storage device. A new file would be stored in sequential blocks on the disk. We store a file in logical blocks called clusters. Each cluster is made up of a number of sectors (512 bytes), the smallest traditional hard disk unit. The cluster size is arbitrary, and the cluster is the smallest unit the operating system can address. For example in a FAT32 filesystem a cluster may be 4 sectors (2048 bytes) or 8 sectors (4096 bytes) etc.

So why isn't this fixed? 

Well, mainly because of a trade-off. If the cluster size is too big we can waste a lot of space, e.g. if our cluster is 32 kBytes and our file is 100 bytes we are wasting nearly 32 kBytes (slack space). But if we make each cluster really small, say 1 sector, the table needed to address all these clusters (the FAT) starts to take up a noticeable share of our storage, e.g. a 2TB disk using 1 sector per cluster would need 16GB of FAT to store all the cluster addresses, and there are 2 FATs on the disk for redundancy.
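
The numbers in that trade-off are easy to check. The sketch below is just the arithmetic, assuming 512 byte sectors, 4 byte FAT entries, one entry per cluster, and 2TB meaning 2 x 2^40 bytes. It is an illustration of the sums rather than a statement of what FAT32 actually allows. The function names are made up.

```python
SECTOR = 512

def slack_bytes(file_size, cluster_size):
    """Bytes left unused in the last cluster of a file."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder

def fat_size_bytes(disk_bytes, cluster_size, entry_bytes=4):
    """Approximate size of ONE FAT, assuming one entry per cluster."""
    return (disk_bytes // cluster_size) * entry_bytes

# A 100 byte file in a 32 kByte cluster.
print(slack_bytes(100, 32 * 1024))                  # 32668

# A 2TB disk with 1 sector per cluster, in GB (2**30 bytes).
print(fat_size_bytes(2 * 2**40, SECTOR) // 2**30)   # 16
```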

So when we have many files and we delete some, create some new files and delete some more files, our disk becomes fragmented. When we go to save a file we have lots of gaps in our disk from the files that have been deleted, and the operating system would like to reuse them. The FAT file system stores the chain of cluster numbers for each file, e.g. 202,203,207,412,902 could be the non-sequential cluster numbers for a 5 cluster file. This is fine for an allocated file, but what happens when the file is deleted? The directory entry has the first byte overwritten with E5 and still stores the first cluster number, but the FAT entries are overwritten with zeros.
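
Here is a toy model of that chain, using a Python dict in place of a real FAT. The END_OF_CHAIN value is a typical FAT32 end marker, and the function name follow_chain is made up. It shows why an allocated file can be traced and a deleted one cannot.

```python
END_OF_CHAIN = 0x0FFFFFFF   # a typical FAT32 end-of-chain value
FREE = 0

def follow_chain(fat, first_cluster):
    """Walk a FAT (modelled as a dict) from the first cluster of a file."""
    chain = []
    cluster = first_cluster
    # Stop at a free entry, the end marker, or a loop back to a seen cluster.
    while cluster not in (FREE, END_OF_CHAIN) and cluster not in chain:
        chain.append(cluster)
        cluster = fat.get(cluster, FREE)
    return chain

# Allocated file: each entry points to the next cluster.
fat = {202: 203, 203: 207, 207: 412, 412: 902, 902: END_OF_CHAIN}
print(follow_chain(fat, 202))           # [202, 203, 207, 412, 902]

# Deleted file: the entries are zeroed, so only the first cluster
# (still recorded in the directory entry) is known.
deleted_fat = dict((c, FREE) for c in fat)
print(follow_chain(deleted_fat, 202))   # [202]
```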

This is OK for a deleted file that has sequential cluster numbers but for a typical file with non-sequential cluster numbers we are.... well... stuffed! 
The things we can use to our advantage are the cluster size and the type of file we are searching for. Knowing the cluster size is good as we only need to look at cluster boundaries for the file we are searching for. The file type is useful as we know what we are looking at. A text file and a ZIP file look very different in hex.
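
As a rough sketch of the cluster boundary idea: instead of searching every byte for a header, we only test offsets that are multiples of the cluster size. This assumes the blob we are searching starts on a cluster boundary, which may not hold for every export. The function name and the made-up blob are for illustration only.

```python
JPEG_SOF = b'\xFF\xD8\xFF\xE0'

def hits_on_cluster_boundaries(data, tag, cluster_size):
    """Return offsets that are multiples of cluster_size and start with tag."""
    hits = []
    for offset in range(0, len(data), cluster_size):
        if data[offset:offset + len(tag)] == tag:
            hits.append(offset)
    return hits

# Made-up 'unallocated' blob of three 0x1000 byte clusters. A real header sits
# at the start of the third cluster, and a stray copy of the tag sits in the
# middle of the first cluster (a false hit a plain search would report).
cluster = 0x1000
blob = bytearray(3 * cluster)
blob[0x0123:0x0127] = JPEG_SOF
blob[0x2000:0x2004] = JPEG_SOF

found = hits_on_cluster_boundaries(bytes(blob), JPEG_SOF, cluster)
print([hex(h) for h in found])   # ['0x2000']
```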

Now back to our file. If we have a look at the highlighted section in our carved file, we remember that our OLE file was 0x2000 bytes long. That could be a clue to our cluster size: 0x2000 is 8192 bytes or 16 sectors. So our cluster size may be 16 sectors, or a fraction of this, maybe 8 or 4.

So looking back through our data in Carve3.bin, if we step forward in multiples of 0x2000 bytes on the assumption of a cluster size of 0x2000, the second cluster looks strange. Prior to 0x8000 is a run of all zeros, which is not normal for a sequential run of a JPEG, which usually has high entropy data.
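
Judging 'high entropy' by eye works, but it can also be put into numbers. The sketch below computes the Shannon entropy of each cluster sized block, in bits per byte: a run of zeros scores 0 and well mixed data such as compressed JPEG content scores close to 8. The function names are made up, and the two test blocks stand in for real data.

```python
import math
from collections import Counter

def entropy(block):
    """Shannon entropy of a block of bytes, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    counts = Counter(bytearray(block))
    total = float(len(block))
    return sum((n / total) * math.log(total / n, 2) for n in counts.values())

def entropy_per_cluster(data, cluster_size):
    """Return a list of (offset, entropy) for each cluster sized block."""
    return [(off, entropy(data[off:off + cluster_size]))
            for off in range(0, len(data), cluster_size)]

zeros = b'\x00' * 0x1000                      # a cluster of all zeros
mixed = bytes(bytearray(range(256))) * 16     # every byte value, equally often

results = entropy_per_cluster(zeros + mixed, 0x1000)
for off, h in results:
    print(hex(off), round(h, 2))
```

On Carve3.bin we would expect the JPEG clusters to score high and the run of zeros to score low, although the exact values depend on the data.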



So let's try half the cluster size, 0x1000 or 4096 bytes (8 sectors). Searching for FFD8FFE0, we found the start of the JPEG at 0x2000. If we then step forward one 'trial cluster' of 0x1000, we find there is no continuity of the high entropy data we would normally see in the data part of a JPEG. So our initial assumption of a cluster size of 0x2000 was wrong. So let's move forward with a cluster size of 0x1000.


If we move forward another cluster, from 0x3000 to 0x4000, we see some nice data that has high entropy again.
So it looks like our assumption of a cluster size of 0x1000 might be correct. If we move forward another cluster of 0x1000 we see we are not in JPEG type high entropy data anymore.
So maybe the JPEG finishes in this last cluster, i.e. the one starting at 0x4000. So let's search forward from 0x4000 looking for FFD9, and we find a hit at 0x4424.
So let's try making up a file from:
0x2000 to 0x3000 and
0x4000 to 0x4424
If we combine those parts we have a file Carve3_2000_3000_4000_4424.jpg. In a hex editor we simply copy the first part, 0x2000 to 0x3000, to a file, then we copy 0x4000 to 0x4424 and append that to our file. Now let's check the results.
Carve3_2000_3000_4000_4424.jpg
Wow, that looks good if I do say so myself.... and my best profile too!
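
The copy and append we just did by hand is a one-liner once the fragment offsets are known. Here is a minimal sketch with a made-up helper join_fragments that glues together a list of (start, end) ranges, where end is exclusive like a normal Python slice. The demo blob is invented. For Carve3.bin the ranges would be along the lines of 0x2000 to 0x3000 and 0x4000 to one byte past the D9 of the FFD9 hit.

```python
def join_fragments(data, ranges):
    """Concatenate data[start:end] for each (start, end) pair, in order.

    end is exclusive, as in a normal Python slice.
    """
    return b''.join(data[start:end] for start, end in ranges)

# Made-up blob: fragment A, one foreign cluster, then fragment B.
cluster = 0x1000
blob = b'A' * cluster + b'x' * cluster + b'B' * 0x426

rebuilt = join_fragments(blob, [(0x0000, 0x1000), (0x2000, 0x2426)])
print(len(rebuilt))   # 5158
```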

So that was quite a hexinating journey. So what did we cover? Carving a sequential JPEG from unallocated space right up to a fragmented carve. Good work. What you have learnt is the basis of every file recovery.

Until the next post TheHexNinja says:

Seasons Greetings All
Prosperous New Year Awaits 
Drink and Be Merry


Tuesday, July 26, 2016

Hex Editors Phoaar

The Hex Editor

OK, so our basic tool on this journey is the humble hex editor. But all is not so simple.  There are a plethora of hex editors available. Basically we want to be able to highlight an area of interest, save.... view...save.. copy...paste.. cut..repeat....

The basic features you will be using a lot of are
  • Search: bytes in hex, locate, count, index, export address
  • Goto: both absolute and relative address.
  • Select: nice if they are right click 'start', right click 'end'
  • Cut, Copy, Insert Paste, Overwrite Paste
  • Hex/Decimal: be able to switch between these easily
You will be doing these functions a lot! So choose a hex editor that can do those functions easily or with shortcuts.

My favourite hex editors are (no affiliations or endorsements):

Paid:

WinHex - Xways
http://www.winhex.com/winhex/

Super fast, simple to use. All you really need for basic hex carving.
The basic personal license is ~$60 and well worth it.
For basic carving I really like the 'right click - beginning of block', 'right click - end of block', Edit - Copy Block into new file - voilà.

WinHex Screenshot


HexWorkshop
http://www.hexworkshop.com/
Hex Workshop Screenshot
I like the coloured byte window....purrdy..., it is nice to help identify periodic patterns and you can pick up small changes in the data as you scroll through a file etc
License is $89.95
Copying and cutting blocks of data is a little cumbersome as you need to specify start address and either size or end address. Not a show stopper, but it does slow the Hex Ninja down when he has his flow on.


010 Editor
http://www.sweetscape.com/010editor/
A bit more expensive but I like this one a lot for more complex operations and analysis
$129.95 or $49.95 for personal use
Has scripting capabilities and some nice file templates for parsing file structures

010 Editor Screenshot



Free Editors:

Notepad++ with the Hex Editor Plugin
https://notepad-plus-plus.org/
Good if you like to keep the programming, hex editing all in one place.

Hxd
http://mh-nexus.de/en/hxd/
Nice interface and has Mac version as well.

Although forensic tools have the ability to show the hex, the features are pretty limited (except for XWAYS -WinHex)


Example
So.... What daily functions does Hex Ninja like to do in a hex editor?

The number one thing I do is check whether a given file is intact, corrupted etc. By opening a file in a hex editor we get to see what it really looks like and not what the file extension is labelling it as.

So open as many files as you can so you get to see the basic structure they have. If you first focus on JPG, PNG, MP4/MOV, AVI, DOC and PDF, you will be across most filetypes you want to recover, rebuild etc.
You will get so used to their structure and tags that you can recognise them in a stream of hex.



...there's way too much information to decode the Matrix. You get used to it, though. Your brain does the translating. I don't even see the code. All I see is blonde, brunette, redhead. Hey uh, you want a drink? -Cypher



For example the most common file the Hex Ninja sees is the common JPG or, more correctly, the JPEG File Interchange Format (JFIF). JPG is the file extension; JFIF is the file container it is stored in. Let's hexinate a typical JPEG.

Hex View of JPEG

To do any basic carving we need to find the start of a file and the end of the file OR an embedded size so we can find the end. Let's take a quick look under the hood.

The basic structure in JFIF is a sequence of marker segments. Starting with FF followed by a byte defining the marker type. Depending on the marker there can be embedded data and nested marker segments. 

See https://en.wikipedia.org/wiki/JPEG for a basic overview or https://www.w3.org/Graphics/JPEG/itu-t81.pdf if you want to dig deeper.

The first 2 bytes 0xFFD8 indicate a 'Start Of Image' (SOI). 
If we just searched for the two bytes 0xFFD8 on a disk or 'unallocated space' we would produce too many false hits. Generally the longer and more specific the search term, the fewer false hits we will get. Two bytes is a little short, so we will see what follows that we could use in a search term.

The next two bytes 0xFFE0 indicate a 'JFIF APP0 marker segment', which has embedded data such as the text 'JFIF'. While 0xFFD8FFE0 is generally common across all cameras/phones, I have seen a couple of cameras that didn't put the APP0 first but put APP1 first, i.e. 0xFFD8FFE1, but that is rare so let's keep it simple.
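
If four bytes still give too many false hits, the text 'JFIF' inside the APP0 segment can be used to tighten the check. The sketch below assumes the usual JFIF layout: FFD8, FFE0, a 2 byte big-endian segment length (at least 16), then the identifier 'JFIF' followed by a zero byte. The function name looks_like_jfif and the sample headers are made up for the demo.

```python
import struct

def looks_like_jfif(data, offset=0):
    """Check for FFD8 FFE0, a sane APP0 length and the JFIF identifier."""
    header = data[offset:offset + 11]
    if len(header) < 11:
        return False
    if header[0:4] != b'\xFF\xD8\xFF\xE0':
        return False
    seg_len, = struct.unpack('>H', header[4:6])   # big-endian APP0 length
    return seg_len >= 16 and header[6:11] == b'JFIF\x00'

# A typical looking JFIF start, and an APP1 (Exif) start for comparison.
jfif_start = b'\xFF\xD8\xFF\xE0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00'
exif_start = b'\xFF\xD8\xFF\xE1\x00\x10Exif\x00\x00'

print(looks_like_jfif(jfif_start))   # True
print(looks_like_jfif(exif_start))   # False
```

A camera that writes APP1 first would be missed by this check, which is the trade-off for fewer false hits.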

Next we need to look for an embedded size or embedded file marker. 

Unfortunately there is no embedded size in the JFIF. We could technically decode the image as we carve to find the end, but that is a bit more intense, so let's start with finding the end. So we need to be looking for an end of file marker. In the JFIF specification it is End Of Image (EOI) 0xFFD9.... Really.. a two byte marker! That can lead to a lot of false positives. Why didn't they make it an 8 byte marker? Even 4 or 6 bytes would be better!

There are a couple of issues we should be aware of so we can try and avoid false positives in a search and carve: 

1. There can be embedded thumbnail/s inside the JFIF file that have the same SOI and EOI markers. Yep good thinking JPEG working group! We can generally avoid this by ignoring the EOI if it occurs too soon after the SOI. We can also carve out the thumbnails in a more thorough carve to be done in later blogs. 
2. If the end of the file has been overwritten we may not find the EOI marker until the end of another image. We can avoid this by limiting how far we search for the EOI after the SOI. 
3. The image data may be fragmented. That is, cluster size blocks of the data can be intermingled with  other files. Generally we do not know the location or sequence of the clusters. We will practise these in a later blog post.  

The marker 0xFFD9 should not occur in the file unless it is the EOI (of the main image or thumbnails), ie we should not find it in the compressed image data (OK JPEG working group, at least you thought of that).  
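
The first two precautions in the list above can be sketched as a pair of limits on the end marker search: a minimum size, so an EOI that turns up too soon (probably a thumbnail) is skipped, and a maximum size, so the search does not run on into some other file. The function name find_eoi and the thresholds in the demo are invented; sensible real values would depend on the camera and the data.

```python
JPEG_SOI = b'\xFF\xD8\xFF\xE0'
JPEG_EOI = b'\xFF\xD9'

def find_eoi(data, soi, min_size=0, max_size=None):
    """Return the offset just past the chosen EOI marker, or None.

    min_size skips EOI hits too close to the SOI (likely a thumbnail).
    max_size stops the search running on into some other file.
    """
    search_from = soi + max(min_size, len(JPEG_SOI))
    limit = len(data) if max_size is None else min(len(data), soi + max_size)
    hit = data.find(JPEG_EOI, search_from, limit)
    return None if hit == -1 else hit + len(JPEG_EOI)

# Made-up 'image' with an embedded thumbnail that has its own SOI and EOI.
sample = (JPEG_SOI + b'hdr' +
          JPEG_SOI + b'thumb' + JPEG_EOI +
          b'main image data' + JPEG_EOI)

print(find_eoi(sample, 0))                            # 18, stops at the thumbnail EOI
print(find_eoi(sample, 0, min_size=20))               # 35, the real end of the image
print(find_eoi(sample, 0, min_size=20, max_size=30))  # None, gave up before the EOI
```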

Now back to our simple carve. We locate the 0xFFD9 indicating the end of the file.

JFIF EOI Marker 0xFFD9
Summary:

So if we found what looked to be a JPEG in unallocated space or embedded in another file we can carve it out using the simple technique:
1. Search 0xFFD8FFE0, mark the first byte as the start of the block.
2. Search 0xFFD9, mark the last byte as the end of the block.
3. Copy the block into a new file, save it with a .jpg extension and you will have a carved JPEG.


Until the next post TheHexNinja says:

Bamboo bends in wind
Ninja watches you alone
POISON DART IN BACK 

Wednesday, January 13, 2016

Workflow




The first post is going to be a quick overview of my normal workflow and what tools I use. 

Firstly, welcome! Thanks for dropping by. Hopefully you will find something useful. If you want something explained in more detail,  add a comment or send me an email. Happy to help.


Now... when I say tools, I don't mean 'point and click'. I am not against them but usually if I am looking at it in hex, it is due to automated tools not extracting the data I need. It is also harder to explain how they work. 

You can choose what tools you like but the main thing is that you are comfortable with them and can use them quickly.

Workflow

The basic workflow goes like this:

1. What is this? 
     A big blob of data with juicy stuff inside. Excited? Me too. Look at all that HEX! Gigabytes of it! 



2. What are we looking for?
     Pictures, videos, documents, SMS, MMS, chat, SQLite databases, web searches etc
     Knowing what we are looking for will give us information such as headers footers, tags that we can search for.

3. What am I looking in?
     Is this a file, a copy of a micro SD Card, a Hard Disk Drive DD image, a raw NAND chip dump.
     This will help us know if the data is contiguous, what the sector,page,block sizes are, if the data needs to be reordered. It will help us to know if the filesystem is FAT32, NTFS, EXT4 etc



4. Is that data active or deleted?
    If the data is active, then we can use a filesystem approach to find it (that is not really a topic for here but more details later).
    If the data has been deleted, how long ago was it deleted? How big is the disk/memory, how full is the disk, how much was it used since the data was deleted?



5. Let's try looking manually
    The reason we are looking manually is usually due to fragmentation, incomplete file finalisation or partial overwriting. Using our hex editor we search for tags/headers/footers to try and identify similar patterns or file structures.
    Can we try and 'carve' out a file that can be viewed? Is the data fragmented, has it been partially overwritten? Do we need to build a new file? Do we have similar files from the same device?


 


6. Now let's automate this.
    Once we have done this manually we can now write a script to automate this process. Sometimes we are only looking for one file or piece of data but often it will be many or we will get a similar job again so it is worth putting in the few minutes to script a semi-reusable solution.




It would be nice to say this is the last step but there is a continuous loop between step 5 and 6. As we automate it, a new case breaks it, we adjust and automate ....

I am code language agnostic and have programmed in languages such as c64 basic, Fortran, Spice, various database 'languages', C, C++,VB, java, Matlab, assembler and Python.

I currently like to use Python due to simplicity, readability, support (where would I be without stackoverflow.com), rapid development, price (Gratis is good), licensing, cross platform support, easy GUI support and easy deployment. (We can package it up as an exe if we need to distribute it stand alone. This saves the 'oh it's missing a module!' or 'how do I run it?' dilemma that turns a lot of people off from running code. Setting this up simply is planned for about post 8, so stay tuned.)

I am also OS agnostic, PC, Mac, Linux, DSP on embedded ARM.. bring it on.

The code in the coming blogs will be using Python but as it is almost pseudo code, you can convert it to your language of choice. I am not up for a debate of which language is best.

The code is written to be understood. I am not here to show off how I reduced 8 lines of code into 1 that no one can understand except Dr Smarty McSmarty, or how using a different instruction or module runs 13.6% faster. We can optimise later if we need to. Let's just get something working quickly so our brains can think about the problem and not be bogged down in syntax issues.

OK so let's get started!

Tools:


1. FTK Imager (free and simple to get 'forensic' copies of data like SD cards or Hard Disk Drives etc.)

2. Hex Editor (The next post will go over which ones I use and like)

3. Python (usually use 2.7 due to code base and support out there but also am tinkering with 3)

And that's it! The results I have been able to get from these simple tools have surpassed anything commercial I have used and the difference is I get to understand it too. Which makes the next job/project easier... well, I keep telling myself that.


Until the next post TheHexNinja says:

Gentle deer drinks dew
Forest awakens new day
NINJA STAR TO NECK