OctetView - A tutorial
OctetView is by far the easiest way to take artifacts apart.
The basic idea is that the artifact is cut into non-overlapping objects, each of which covers one or more octets in the artifact.
…and then the OctetView class more or less takes care of the rest.
The objects can be examined before they are either discarded or inserted into the interpretation of the artifact.
Let's take an example:
from autoarchaeologist.base import octetview as ov

class CBM900LOut(ov.OctetView):
    ''' CBM900 L.out binary format '''

    def __init__(self, this):
        super().__init__(this)
        header = LdHeader(self, 0)
        if header.l_magic.val != 0o407 or header.l_flag.val != 0x10:
            return
        header.insert()
        self.add_interpretation()
We are writing an examiner for the CBM900 “l.out” object file format, but first we have to find out if the artifact is one.
After we have initialized the OctetView parent class, we create an object starting at the first octet in the artifact.
The l.out files start out with this structure:
struct ldheader {
    int     l_magic;        /* Magic number */
    int     l_flag;         /* Flags */
    int     l_machine;      /* Type of target machine */
    vaddr_t l_entry;        /* Entrypoint */
    size_t  l_ssize[NLSEG]; /* Segment sizes */
};
But our view is the actual storage layout of the structure, on this
particular hardware, using that specific C-compiler, so we define
our LdHeader class like this:
class LdHeader(ov.Struct):
    def __init__(self, tree, lo):
        super().__init__(
            tree,
            lo,
            l_magic_=ov.Le16,
            l_flag_=ov.Le16,
            l_machine_=ov.Le16,
            l_entry_=ov.Le32,
            l_ssize_=ov.Array(9, ov.Le32, vertical=True),
            pad__=2,
            vertical=True,
        )
tree is the OctetView we are working in, aka self in
the CBM900LOut class.
lo is the address where this data structure lives.
The names of the next five arguments each end in a single underscore, so each of them defines a field in the structure by specifying which class to instantiate for that field.
If we run the snippet above we get an interpretation which looks like this:
0x000…030 LdHeader {
0x000…030 l_magic = 0x0107 // @0x0
0x000…030 l_flag = 0x0010 // @0x2
0x000…030 l_machine = 0x0004 // @0x4
0x000…030 l_entry = 0x00000030 // @0x6
0x000…030 l_ssize = [ // @0xa
0x000…030 [0x0]: 0x000000be
0x000…030 [0x1]: 0x00000000
0x000…030 [0x2]: 0x00000000
0x000…030 [0x3]: 0x00000000
0x000…030 [0x4]: 0x00000000
0x000…030 [0x5]: 0x00000000
0x000…030 [0x6]: 0x00000000
0x000…030 [0x7]: 0x0000009a
0x000…030 [0x8]: 0x0000004e
0x000…030 ]
0x000…030 }
0x030…0ee ab f1 2f […] 00 a9 fb ┆ /[…] ┆
[…]
The pad__=2 field is missing because field arguments
which end in two underscores are not rendered.
The rest of the artifact is default-hexdumped, because we have not created any objects which cover that part of it.
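As a cross-check of the layout, independent of OctetView, the same field sizes can be written as a format string for Python's standard struct module. This is only a sketch: the helper name parse_ldheader and the sample octets (rebuilt from the values in the interpretation above) are assumptions made for illustration.

```python
import struct

# Little-endian, no alignment padding: three 16-bit words, one 32-bit word,
# nine 32-bit words and two pad octets, mirroring the LdHeader fields.
LDHEADER = struct.Struct("<HHHI9I2x")

def parse_ldheader(octets):
    '''Decode the header fields from the start of octets (hypothetical helper)'''
    fields = LDHEADER.unpack_from(octets, 0)
    return {
        "l_magic": fields[0],
        "l_flag": fields[1],
        "l_machine": fields[2],
        "l_entry": fields[3],
        "l_ssize": list(fields[4:13]),
    }

# Sample header rebuilt from the values shown in the interpretation.
sample = LDHEADER.pack(
    0o407, 0x10, 0x4, 0x30,
    0xbe, 0, 0, 0, 0, 0, 0, 0x9a, 0x4e,
)
header = parse_ldheader(sample)
print(hex(LDHEADER.size), header)
```

The structure size comes out as 0x30, which agrees with the 0x000…030 range in the interpretation, and the field sizes put l_entry at offset 0x6 and l_ssize at offset 0xa.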
If we had not specified vertical=True to ov.Array
the members of the array would all be on a single line,
and likewise, without vertical=True the entire LdHeader
would be rendered on a single line.
Having structures and arrays horizontal while a data format is
reverse engineered makes it possible to grep -r all instances
of a struct in the entire excavation, to try to glean what this or
that field can contain and might mean.
Naked Structs
In normal structs the field attributes (i.e. foo.field)
are the field objects.
In practice most fields are plain numbers, and it is a bit of bother
to write foo.field.val to get their numerical value.
In “Naked structs”, made so with the optional argument naked=True,
the field attribute will be field.val if the added field has
that attribute, so that the numeric value is available with foo.field.
Note that this snapshots struct.field.val so later modifications to it will
not be reflected in struct.field.
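The snapshot behaviour can be illustrated with a toy model in plain Python. The classes Field and ToyStruct below are hypothetical stand-ins written for this illustration; they are not ov.Struct and do not claim to mirror how it is implemented.

```python
class Field:
    '''Hypothetical stand-in for a field object with a .val attribute'''

    def __init__(self, val):
        self.val = val

class ToyStruct:
    '''Hypothetical stand-in for a struct, not ov.Struct itself'''

    def __init__(self, naked=False, **fields):
        for name, field in fields.items():
            if naked and hasattr(field, "val"):
                # Naked: copy the number once, at construction time
                setattr(self, name, field.val)
            else:
                # Normal: keep a reference to the field object
                setattr(self, name, field)

count = Field(3)
plain = ToyStruct(count=count)
naked = ToyStruct(naked=True, count=count)

count.val = 7       # modify the field after both structs were built
print(plain.count.val, naked.count)
```

In this model the normal struct sees the new value through the field object, while the naked struct still holds the number that was copied when it was built.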
Variable Structs
Variable structures are created like this:
class Something(ov.Struct):
    def __init__(self, tree, lo):
        super().__init__(
            tree,
            lo,
            width_=ov.Be24,
            name_=ov.Text(5),
            more=True,
        )
        if self.width.val < (1<<8):
            self.add_field("payload", ov.Octet)
        elif self.width.val < (1<<16):
            self.add_field("payload", ov.Be16)
        elif self.width.val < (1<<24):
            self.add_field("payload", ov.Be24)
        else:
            print("Somethings wrong", self)
            exit(2)
        self.done()
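Read as a data layout, the example describes a record whose last field changes size depending on an earlier field. The following sketch decodes the same layout from raw octets without OctetView. The function name parse_something, the ASCII decoding of the name and the returned dictionary are assumptions made for illustration.

```python
def parse_something(octets, lo=0):
    '''Decode one variable-size record starting at offset lo (hypothetical helper)'''
    width = int.from_bytes(octets[lo:lo + 3], "big")    # like width_=ov.Be24
    name = octets[lo + 3:lo + 8].decode("ascii")        # like name_=ov.Text(5), ASCII assumed
    # The payload size depends on the value of the width field.
    # A 24-bit value is always below 1<<24, so three cases suffice here.
    if width < (1 << 8):
        size = 1
    elif width < (1 << 16):
        size = 2
    else:
        size = 3
    hi = lo + 8 + size
    payload = int.from_bytes(octets[lo + 8:hi], "big")
    return {"width": width, "name": name, "payload": payload, "hi": hi}

small = parse_something(bytes([0, 0, 5]) + b"hello" + bytes([0x2a]))
large = parse_something(bytes([0, 1, 0]) + b"world" + bytes([0x12, 0x34]))
print(small, large)
```

The first record carries a one-octet payload and ends at offset 9, the second a two-octet payload and ends at offset 10.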
Field classes
Field classes should be subclasses of ov.Octets.
ov.Struct is one too, so yes: Structs can be nested.
OctetView comes with a lot of handy subclasses already, and most of them do what you expect:
Octets - some number of octets
Hidden - rendered as “Hidden”, no matter how small or big
Opaque - rendered as “class-name[0x%x]”
HexOctets - rendered as hex string without spaces
Dump - octets but rendered with hex+text
This - an artifact
Text - strings
Array - Arrays of some field class
Octet - a single octet value
Le16, Le24, Le32, Le64 - Little endian integers
Be16, Be24, Be32, Be64 - Big endian integers
L2301, L1032 - Confused endian double word integers
ov.Array is a factory which returns a class; in the LdHeader
example above it is used for an array of 9 little-endian 32-bit
numbers.
All the elements of an array have the same class, but they need not
have the same size.
ov.Text is a factory which returns a class for a string of
a given length.
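For readers unfamiliar with the naming of the integer classes, the Le and Be prefixes refer to byte order and the number to the width in bits. The sketch below shows what that means using only Python's built-in int.from_bytes. The four octets are assumed to be the start of the l.out header from the example, derived from the l_magic and l_flag values shown there.

```python
# First four octets of the example header: l_magic = 0x0107 and
# l_flag = 0x0010, both stored little-endian (derived, not read from a file).
raw = bytes([0x07, 0x01, 0x10, 0x00])

le16 = int.from_bytes(raw[0:2], "little")   # least significant octet first
be16 = int.from_bytes(raw[0:2], "big")      # most significant octet first
be24 = int.from_bytes(raw[0:3], "big")      # three octets, big-endian
le32 = int.from_bytes(raw[0:4], "little")   # four octets, little-endian

print(hex(le16), hex(be16), hex(be24), hex(le32))
```

Only the little-endian 16-bit reading yields the magic number 0x107 (0o407) which the examiner tests for.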
Field classes must have a render() method which is responsible for
how they will appear in the interpretation, so for instance a RC4000
timestamp can be defined like this:
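import time

class ShortClock(ov.Be24):
    def render(self):
        if self.val == 0:
            yield " "
        else:
            # Scale the 24-bit value: shift left 19 bits, 100 microsecond units
            ut = (self.val << 19) * 100e-6
            # Shift the epoch by 366 + 365 days to get UNIX time
            t0 = (366+365)*24*60*60
            yield time.strftime(
                "%Y-%m-%dT%H:%M",
                time.gmtime(ut - t0)
            )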
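The arithmetic in that render() method can be tried out on its own. The sketch below restates it with datetime and integer microseconds; the function name short_clock_to_datetime is an assumption. Subtracting 366 + 365 days from the UNIX epoch places the zero point at 1968-01-01, which follows from the numbers in the example rather than from any RC4000 documentation.

```python
from datetime import datetime, timedelta

# 1970-01-01 minus (366 + 365) days
EPOCH = datetime(1970, 1, 1) - timedelta(days=366 + 365)

def short_clock_to_datetime(word):
    '''Convert a 24-bit short clock value to a datetime (hypothetical helper)'''
    # The value is shifted left 19 bits and counts units of 100 microseconds
    return EPOCH + timedelta(microseconds=(word << 19) * 100)

step = short_clock_to_datetime(1) - short_clock_to_datetime(0)
print(EPOCH.isoformat(), step.total_seconds())
print(short_clock_to_datetime(0x400000).strftime("%Y-%m-%dT%H:%M"))
```

One step of the 24-bit value corresponds to 2^19 units of 100 microseconds, a little over 52 seconds, which fits the minute resolution of the format string used in render().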
Syntactic Sugar
There are two levels of syntactic sugar available on top of ov.Struct.
The first level of syntactic sugar is this:
class CDef():
    pointer = ov.Le32
    char = ov.Octet
    short = ov.Le16
    int = ov.Le32
    long = ov.Le64
    uid_t = ov.Le16
    gid_t = ov.Le16
    daddr_t = ov.Le32

class Inode(ov.Struct):
    TYPES = CDef()
    FIELDS = [
        ( "di_mode", "short"),
        ( "di_nlink", "short"),
        ( "di_uid", "uid_t"),
        ( "di_gid", "gid_t"),
        […]
        ( "di_dbx", "daddr_t", 12),
        […]
    ]
As the example indicates, this allows common UNIX structures to be “fleshed out” with platform specific variable types.
The type classes should be able to impose any alignment or padding they require, but this has not been tested in practice yet.
The advantage of using this form is that subclasses can easily edit the field list, for instance to insert or delete fields.
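Because the field list is an ordinary Python list of tuples, such edits need nothing beyond list handling. The sketch below works on a plain list so that it runs without OctetView. The helper names and the di_flags field are hypothetical, and a real subclass would presumably assign the result to its own FIELDS.

```python
BASE_FIELDS = [
    ("di_mode", "short"),
    ("di_nlink", "short"),
    ("di_uid", "uid_t"),
    ("di_gid", "gid_t"),
]

def with_field_after(fields, after, new_field):
    '''Return a copy of fields with new_field inserted after the named field'''
    out = []
    for field in fields:
        out.append(field)
        if field[0] == after:
            out.append(new_field)
    return out

def without_field(fields, name):
    '''Return a copy of fields with the named field removed'''
    return [field for field in fields if field[0] != name]

# di_flags is a made-up field, used only to show an insertion
variant = with_field_after(BASE_FIELDS, "di_nlink", ("di_flags", "short"))
variant = without_field(variant, "di_gid")
print(variant)
```

Both helpers return new lists, so the base list stays untouched; that matters because a class attribute like FIELDS is shared with every subclass that does not override it.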
The second level of syntactic sugar makes such edits harder, but it is really convenient:
class Inode(ov.Struct):
    TYPES = CDef()
    FIELDS = ov.cstruct_to_fields('''
        short di_mode;
        short di_nlink;
        uid_t di_uid;
        gid_t di_gid;
        […]
        daddr_t di_dbx[12];
        […]
    ''')
(Pointer syntax and multidimensional arrays are not yet supported.)
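To show how the two notations relate, here is a toy converter from C-style declarations to the tuple form used in the first FIELDS example. It is a sketch written for this tutorial and not the implementation of ov.cstruct_to_fields; the function name toy_cstruct_to_fields and its error handling are assumptions.

```python
import re

# One declaration: a type name, a field name and an optional [count]
DECL = re.compile(r"^(\w+)\s+(\w+)(?:\[(\d+)\])?$")

def toy_cstruct_to_fields(text):
    '''Turn "type name;" declarations into (name, type[, count]) tuples'''
    fields = []
    for decl in text.split(";"):
        decl = decl.strip()
        if not decl:
            continue
        match = DECL.match(decl)
        if match is None:
            # Pointers and multidimensional arrays end up here
            raise ValueError("unsupported declaration: " + repr(decl))
        ctype, name, count = match.groups()
        if count is None:
            fields.append((name, ctype))
        else:
            fields.append((name, ctype, int(count)))
    return fields

fields = toy_cstruct_to_fields('''
    short di_mode;
    uid_t di_uid;
    daddr_t di_dbx[12];
''')
print(fields)
```

The result has the same shape as the hand-written list: two-element tuples for scalar fields and a third element holding the count for arrays.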
When octets are too big
If octets are too big for the job, OctetView has a sibling called
BitView, which can do the exact same things with 8 times
higher resolution, but is much more than 8 times slower.