Cache File Format (Documentation)
12 posts • Page 1 of 1
March 14th, 2008, 9:47 am
I am trying to write a forensics tool (similar to Mozilla Cache View) that will let me extract files (.jpg, .html, .css, etc.) from the Firefox disk cache. FF 2.0 is what I'm really concentrating on for now. I know the history.dat file format has been changed to SQLite in the 3 beta, but I assume the cache files have stayed the same. The problem is that I can't figure out exactly how to parse these files (_CACHE_MAP_, _CACHE_001_/002/003, and all the ones named after the hashes). I have found two resources that almost give me what I need to pull out the files, but I need a little more information about what the bits and bytes mean in each of these files. I would really just like to read the documentation on how these cache entries are built, but the closest I can get to that information is the source code itself.
If anyone knows where the documentation on how the cache entries are built is, that would help me out tremendously! Other than that, these are the two resources I have come up with after 3 or 4 days of Google searching:
1) http://people.mozilla.com/~chofmann/l10 ... CacheMap.h The comments in the source code kind of point me in the right direction.
2) http://www.securityfocus.com/infocus/1832 This seems close to what I need to be able to do, but it doesn't have quite enough detail for me to implement it in code.
I would appreciate any additional resources you can provide on this matter. Thanks!
March 14th, 2008, 12:37 pm
Moving to Mozilla Development
March 14th, 2008, 4:13 pm
Why do you need to write an extension to do that when you can access files in the cache using "about:cache?device=disk"? Click on the item of interest, then click it again when the new window opens. You can drag an image out of the window and drop it into a directory to save it.
March 16th, 2008, 4:17 pm
It is not an extension that I want to write. It is a completely external tool. I basically want to be able to implement about:cache?device=disk in my program without ever opening up Firefox...and I want to move ALL of these files to a directory where they can be saved long-term. So if someone clears the cache later, I still have access to these files and would be able to present them in court as evidence. Does that clarify the question a little?
Thank you
March 17th, 2008, 6:46 am
I wonder why you need info other than the source. The cache code isn't difficult to read compared with the rest of the files in the tree.
March 17th, 2008, 7:47 am
I was trying to avoid using the source if there was other documentation available. I didn't have the source downloaded and I have never done any Firefox development, so if I could just read the documentation instead of downloading and reading through the source code, that would be orders of magnitude easier for me. It doesn't appear that there is documentation anywhere else, so I will have to figure it out from the source.
Thanks for all the help anyway. Oh, one last question: is mozilla/netwerk/cache/src where I should be looking for this code? That's the only real cache info I can find in the source tree.
March 17th, 2008, 1:23 pm
You can access the source online via http://lxr.mozilla.org/
(e.g. http://lxr.mozilla.org/mozilla1.8/sourc ... cheMap.cpp) So there is no need to download the full source.
June 11th, 2008, 1:38 pm
Hi Treydx,
I was searching for exactly the same thing as you... unfortunately, I didn't find any documentation. Did you have any luck with anything other than the source code? Murilo
June 11th, 2008, 4:31 pm
Hey murilo123,
I did have some luck. For the most part, I was able to follow along with the securityfocus article to get all the info I needed. The hardest part was that the article had some errors in it. The first error was pretty obvious: it said to left shift where you need to right shift the bits (or vice versa). The next problem I found is that the bucket size is not static like the article says; it's actually based on the number of records in the map. I also think the article gives the wrong block size for each of the _CACHE_00x_ files, but that is well commented in the source via the link someone else posted in this thread.
I still have most of my source code in Python if you need some more help. I finished up this project a few months ago, so I don't really remember everything else off the top of my head. I do remember having to skip a header in each file, and I could never figure out what was in the part I had to skip. I think I ended up having to skip ~276 bytes in the map and 4096 bytes in each of the cache block files. Let me know if you need some more help. If you know Python, I can just send my code to you (I'm no Python pro, but it works with most of my test cases). Oh, one more thing: I never took the time to merge the metadata and data from the cache files (or do extensive extension parsing, just jpg, gif, and png).
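For anyone following along, here is a minimal sketch of the location-word decoding described above. It is written for Python 3 and is not the tool's actual code. The selector mask value is an assumption (the thread never shows the constant the code calls FILESELECTORMASK), the other masks, block sizes, and the 4096-byte block-file header are taken from the code posted in this thread, and the function names and the example value are made up for illustration.

```python
# Masks and shifts mirror the ones used in this thread's readrecord().
SELECTOR_MASK = 0x30000000          # ASSUMPTION: bits 28-29 select the file
EXTRA_BLOCKS_MASK = 0x03000000      # bits 24-25: number of blocks minus one
START_BLOCK_MASK = 0x00FFFFFF       # low 24 bits: first block of the entry
BLOCK_SIZES = (0, 256, 1024, 4096)  # indexed by selector; 0 = separate file
BLOCK_FILE_HEADER = 4096            # bytes skipped at the start of _CACHE_00x_


def decode_location(location):
    """Split a 32-bit location word into (selector, start_block, num_blocks)."""
    selector = (location & SELECTOR_MASK) >> 28  # right shift, not left
    num_blocks = ((location & EXTRA_BLOCKS_MASK) >> 24) + 1
    start_block = location & START_BLOCK_MASK
    return selector, start_block, num_blocks


def byte_range(location):
    """Return (offset, length) within a block file, or None for a separate file."""
    selector, start_block, num_blocks = decode_location(location)
    if selector == 0:
        return None  # data lives in its own file named after the hash
    size = BLOCK_SIZES[selector]
    return BLOCK_FILE_HEADER + start_block * size, num_blocks * size


example = 0x91000005  # made-up location word, not from a real cache
print(decode_location(example))
print(byte_range(example))
```

With the made-up word above, the entry would sit in the 256-byte block file, starting at block 5 and spanning two blocks.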
June 12th, 2008, 5:50 am
So, did you analyse the source code to find it? No documentation?
I would appreciate it if you could send the code... You can use murilotito -at- gmail dot com. Thanks!
June 12th, 2008, 5:36 pm
I think this is the bulk of the code that will help you. No, I never really found any documentation. There are two or three helpful comments in the source in a few different files if you need more info, but like I said, I'm no Python pro.
import os
import struct
import types

#NOTE: CACHE_MAP_FILENAME, FILESELECTORMASK, CACHE_BLOCKS, INPATH, OUTFOLDER
#and issig() are used below but defined elsewhere in the program (not shown).

#FILE SELECTORS
# - 0 = separate file on disk
# - 1 = 256 byte block file
# - 2 = 1k block file
# - 3 = 4k block file

def readmap(path):
    """readmap(path)
    Takes a directory or cachemap file as input and parses it in order to
    read all of the cached files that are stored in a Mozilla or Netscape
    format (including FireFox). Exports all files to a directory located
    in XXX.
    """
    verbose = False

    #Change a directory to append the default file name
    if os.path.isdir(path):
        path = os.path.join(path, CACHE_MAP_FILENAME)

    #Open the file pointed to by path
    if os.path.isfile(path):
        if verbose:
            print "Reading Cache Map from: %s" % path
        try:
            #Open file to read in binary format
            mapfile = open(path, 'rb')
        except IOError:
            print "Cache Map could not be opened! Exiting..."
            return False
    else:
        print "Cache Map not found! Exiting..."
        return False

    #Begin parsing Cache Map file
    try:
        #Read, Seek, Open, Etc all cause IOErrors
        header = mapfile.read(20)
    except IOError:
        print "Error reading header\n"
        return False

    #Cache version
    #Size of cache in bytes
    #Number of entries stored in cache
    #Dirty flag
    #Number of records
    ver, datasize, entrycount, isdirty, recordcount = struct.unpack(">5I", header)
    if verbose:
        print "Version: %d" % ver
        print "Datasize: %d" % datasize
        print "EntryCount: %d" % entrycount
        print "IsDirty: %d" % isdirty
        print "RecordCount: %d" % recordcount

    #Read the eviction rank array (32 buckets)
    try:
        erank = mapfile.read(4*32)
    except IOError:
        print "Error reading Eviction Ranks!\n"
        return False
    eranks = struct.unpack(">32I", erank) #highest eviction rank of each bucket
    if verbose:
        print "Eviction ranks: ", eranks

    #Read the BucketUsage array (32 buckets)
    try:
        bu = mapfile.read(4*32)
    except IOError:
        print "Error reading the Bucket Usage!\n"
        return False
    #Number of used entries in each bucket
    bucketusage = struct.unpack(">32I", bu)
    if verbose:
        print "Entries Used in each Bucket: ", bucketusage

    #Sanity check... should be 0x114 (276) bytes into file
    whereami = mapfile.tell()
    assert whereami == 276, "Where is end of header? %d\n" % whereami

    #Now read BUCKETS
    #32 buckets in file. recordcount/32 records per bucket.
    #4 (32b) ints per record
    #Record is: Hash Number, Eviction Rank, Data Location, Metadata Location

    #Number of records in a bucket
    bucketsize = recordcount/32
    recordlist = [ ]

    #32 buckets
    for i in range(32):
        #Read one bucket
        try:
            #bucketsize records, 4 values/rec, 4 bytes/value
            start = mapfile.tell()
            bucket = mapfile.read(bucketsize*4*4)
            next = mapfile.tell()
        except IOError:
            print "Error Reading Bucket!\n"
            return False
        if (start==next):
            print "Could not read the bucket!"
            return False
        fmt_string = ">" + str(bucketsize*4) + "I"
        b = struct.unpack(fmt_string, bucket)
        if verbose:
            print b
        numentries = bucketusage[i]
        while (numentries > 0):
            a = readbucket(b, numentries)
            recordlist.append(a)
            numentries = numentries - 1

    if verbose:
        print recordlist

    try:
        whereami = mapfile.tell()
        mapfile.seek(0,2) #move to EOF
        #sanity check, I better be at EOF
        assert whereami == mapfile.tell(), "Where is the EOF? %d,%d\n" % (whereami, mapfile.tell())
    except IOError:
        print "ERROR! Cache map file read error.\n"
        return False

    #Close the file
    if not mapfile.closed:
        mapfile.close()
    return recordlist

def readbucket(bucket, entry):
    """Given a bucket of records, read the record with index of 'entry'"""
    index = 4*(entry-1)
    return (bucket[index:index+4])

def noL(a):
    """Remove L suffix from int/long numbers"""
    if(type(a)!=types.StringType):
        return a
    if(a[-1:] == "L" or a[-1:] == "l"):
        return a[0:-1]
    return a

def noX(a):
    """Remove 0x prefix from hex numbers"""
    if(type(a)!=types.StringType):
        return a
    if(a[0:2] == "0x" or a[0:2] == "0X"):
        return a[2:]
    return a

def readrecord(record):
    """readrecord(record) -> record_dictionary
    Function takes in a 4-word record entry and parses the data out of it.
    Returns a dictionary of the record data.
    Keys:
      hash -> hex identifier
      erank -> eviction rank
      dataloc -> hex value that returns other data properties
      metadataloc -> same as above except for meta data
      datablock -> which data block file (or separate file) data is in (e.g. 1)
      metablock -> same .... except for meta data
      datafile -> the name of the data block file (_CACHE_001_)
      metafile -> the name of the meta data block file
      datastartblock -> block the data starts on in the data file
      metastartblock -> same .... meta data
      datanumblocks -> number of blocks the data spans in the data file
      metanumblocks -> same
      datablocksize -> size of a block in the file (1: 256, 2: 1024, 3: 4096)
      metablocksize -> same
    """
    rec = { }
    hashnumber = noL(noX(hex(record[0])))
    evictionrank = int(record[1])
    datalocation = int(record[2])
    metadatalocation = int(record[3])
    rec["hash"] = hashnumber
    rec["erank"] = evictionrank
    rec["dataloc"] = datalocation
    rec["metadataloc"] = metadatalocation

    #Calculate data/metadata locations/files
    #Reference: http://www.securityfocus.com/infocus/1832 is wrong here.
    # left shift does not make sense!!!
    whichdatablockfile = noL((datalocation & FILESELECTORMASK) >> 28)
    whichmetablockfile = noL((metadatalocation & FILESELECTORMASK) >> 28)
    rec["datablock"] = int(whichdatablockfile)
    rec["metablock"] = int(whichmetablockfile)
    datafile = CACHE_BLOCKS[whichdatablockfile]
    metafile = CACHE_BLOCKS[whichmetablockfile]

    #if index is 0, data is stored in a separate file
    #filename: <hashnumber><type><generationnumber>
    # hex conversions - remove first 2 (0x); add 00 to front; get last 2
    #NOTE: the [-3:-1] slice relies on hex() returning a trailing "L" (long)
    if (datafile == ""):
        gen = ("000" + noX(hex(datalocation & 0xFF)))[-3:-1]
        datafile = noL(str(hashnumber)) + "d" + str(gen)
    if (metafile == ""):
        gen = ("000" + noX(hex(metadatalocation & 0xFF)))[-3:-1]
        metafile = noL(str(hashnumber)) + "m" + str(gen)
    rec["datafile"] = datafile
    rec["metafile"] = metafile

    #calculate start block
    datastartblock = int(datalocation & 0xFFFFFF)
    metastartblock = int(metadatalocation & 0xFFFFFF)
    rec["datastartblock"] = datastartblock
    rec["metastartblock"] = metastartblock
    datanumblocks = int((datalocation & 0x03000000) >> 24)
    metanumblocks = int((metadatalocation & 0x03000000) >> 24)
    rec["datanumblocks"] = datanumblocks
    rec["metanumblocks"] = metanumblocks

    #BLOCKSIZE = (0, 256, 512, 1024) #another error in reference?
    BLOCKSIZE = (0, 256, 1024, 4096)
    datablocksize = BLOCKSIZE[whichdatablockfile]
    metablocksize = BLOCKSIZE[whichmetablockfile]
    rec["datablocksize"] = datablocksize
    rec["metablocksize"] = metablocksize
    return rec

def getdata(rec):
    """given a record, pull the binary data out of the cache files"""
    verbose = False

    #Open the data file
    path = os.path.join(INPATH, rec["datafile"])
    if verbose:
        print "Path to Data file: %s" % path
    try:
        df = open(path, "rb")
    except IOError:
        if verbose: #Too many errors: fail silently
            print "Error opening data cache file: %s" % rec["datafile"]
            print " - ", rec
        return False

    #Open the metadata file
    path = os.path.join(INPATH, rec["metafile"])
    if verbose:
        print "Path to Meta Data File: %s" % path
    try:
        mf = open(path, "rb")
    except IOError:
        if verbose:
            print "Error opening meta cache file: %s" % rec["metafile"]
            print " - ", rec
        return False

    #Read Data
    if (rec["datablock"] != 0):
        bsize = int(rec["datablocksize"])
        start = int(rec["datastartblock"])
        numbl = int(rec["datanumblocks"]) + 1
        try:
            #Skip past header
            df.seek(4096, 0)
            #seek to block (1=relative mode)
            df.seek(bsize * start, 1)
            #read data (number of blocks * block size)
            data = df.read(bsize * numbl)
        except IOError:
            print "Error reading in data file!"
            print " - ", rec
            return False
    #Standalone file (i.e. not _CACHE_00X_)
    elif (rec["datablock"] == 0):
        try:
            #Read entire file... I am not sure if there are headers
            data = df.read()
        except IOError:
            print "Error reading standalone data file!"
            print " - ", rec
            return False
    else:
        print "Unknown data block reading file!"
        print " - ", rec
        return False
    df.close()

    #Read Metadata
    if (rec["metablock"] != 0):
        bsize = int(rec["metablocksize"])
        start = int(rec["metastartblock"])
        numbl = int(rec["metanumblocks"]) + 1
        try:
            #Skip header
            mf.seek(4096, 0)
            #Seek to starting block (1 = relative seek mode)
            mf.seek(bsize * start, 1)
            #Read the data
            meta = mf.read(bsize * numbl)
        except IOError:
            print "Error reading in meta data file!"
            print " - ", rec
            return False
    #Standalone file
    elif(rec["metablock"] == 0):
        try:
            #Read entire metadata file
            meta = mf.read()
        except IOError:
            if verbose:
                print "Error reading standalone metadata file!"
                print " - ", rec
            return False
    else:
        print "Unknown meta block reading file!"
        print " - ", rec
        return False
    mf.close()

    #Common signatures
    extension = "" #<- default
    #Guess at signatures (Could use foremost.config???)
    try:
        if(issig(data[:6], "474946383761") or issig(data[:6], "474946383961")):
            extension = ".gif"
        if(issig(data[:11], "FFD8FFE0XXXX4A46494600")):
            extension = ".jpg"
        if(issig(data[:8], "89504E470D0A1A0A")):
            extension = ".png"
        if(issig(data[:4], "00000100")):
            extension = ".ico"
        if(issig(data[:3], "1F8B08")):
            extension = ".gz"
        #three-byte signatures need three bytes of data (was data[:2])
        if(issig(data[:3], "435753") or issig(data[:3],"464C56") or issig(data[:3],"465753")):
            extension = ".swf"
        if(issig(data[:3], "494433")):
            extension = ".mp3"
        if(issig(data[:4], "504B0304")):
            extension = ".zip"
        #if(issig(data[:8], "D0CF11E0A1B11AE1")):
        #    extension = ".doc" #<- Really any microsoft office document!
    except ValueError:
        pass

    #Write the data file out
    OUTFILENAME = "cf" + str(rec["hash"]) + extension
    outplace = os.path.join(OUTFOLDER, OUTFILENAME)
    #print "OUTPLACE: ", outplace
    outfile = open(outplace, "wb") #write in binary
    outfile.write(data)
    outfile.close()

    #Write the meta data file out
    #Todo: Merge this into the data files??
    OUTFILENAME = "cf" + str(rec["hash"]) + ".meta"
    outplace = os.path.join(OUTFOLDER, OUTFILENAME)
    outfile2 = open(outplace, "wb")
    outfile2.write(meta)
    outfile2.close()
    return True
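The code above calls issig() but the helper itself was not posted. The following is only a guess at what such a helper might do, written for Python 3 rather than the Python 2 used above: it compares the first bytes of the data against a hex signature string and treats each "XX" pair as a wildcard byte, which is how the JPEG signature in getdata() appears to use it. The behavior on short input (returning False) is an assumption.

```python
def issig(data, signature):
    """Hypothetical stand-in for the missing helper.

    data is a bytes object, signature a hex string such as "89504E47".
    Each "XX" pair in the signature matches any single byte.
    """
    nbytes = len(signature) // 2
    if len(data) < nbytes:
        return False  # ASSUMPTION: too-short data simply does not match
    for i in range(nbytes):
        pair = signature[2 * i:2 * i + 2]
        if pair.upper() == "XX":
            continue  # wildcard byte, accept anything
        if data[i] != int(pair, 16):
            return False
    return True


# JFIF header with two arbitrary length bytes where the signature has XXXX
jpeg_start = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
print(issig(jpeg_start, "FFD8FFE0XXXX4A46494600"))
print(issig(b"GIF89a", "474946383761"))
```

A malformed signature string (anything that is neither hex nor "XX") makes int() raise ValueError, which would fit the except ValueError clause in getdata(), though that link is a guess too.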