.NET Information center: Hacking Mono.Cecil

In the last post we took a look at how to manually remove invalid opcodes from an obfuscated assembly. We did this by decompiling the assembly, replacing each invalid opcode with the nop opcode and then recompiling. We used this manual method because Mono.Cecil crashed at the sight of some of the invalid opcodes. In this post we take a look at a tiny "hack" to Mono.Cecil which allows us to do the same thing in an automated manner.

Read more: Paul Mason

Recap: What needs fixing?

Please note: An assumption is made in this article that all
invalid opcodes are single byte opcodes; this example does not cater for
invalid double byte opcodes.

Well, to work out what needs fixing, we'll first write some code
that we'll use to break Mono.Cecil (and for testing):

using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

//Load the assembly
var assembly = AssemblyFactory.GetAssembly(
    @"D:\temp\Obfuscated\SimpleLibrary.dll");

//Go through each type in the assembly
foreach (TypeDefinition type in assembly.MainModule.Types)
{
    //Go through each method
    foreach (MethodDefinition def in type.Methods)
    {
        //Check the body
        if (def.HasBody)
        {
            //Get the CIL worker
            CilWorker worker = def.Body.CilWorker;

            //Chuck the bad instructions in here to avoid modifying the
            //collection while we are enumerating it
            List<Instruction> instructionsToFix = new List<Instruction>();

            //Go through each instruction
            foreach (Instruction instr in def.Body.Instructions)
            {
                //TODO: Somehow figure out if it is one to fix and add it to be fixed
            }

            //Go through the ones to fix and replace each with a nop
            foreach (Instruction instr in instructionsToFix)
            {
                Instruction newInstr = worker.Create(OpCodes.Nop);
                worker.Replace(instr, newInstr);
            }
        }
    }
}

//Save the assembly
AssemblyFactory.SaveAssembly(assembly,
    @"D:\temp\Obfuscated\SimpleLibrary.new.dll");

This is some pretty basic code which simply goes through each type
and each method inside an assembly and replaces all invalid opcodes with
a nop.

When we run this code using the default version of Mono.Cecil we
unfortunately come across an error:

[Screenshot: Mono.Cecil didn't like an opcode]

Now we know what we're fixing!

Getting the source

First of all, we need to get the source for Mono.Cecil to start
working with it. Rather than get the entire Mono system, I decided to
just check out the project that I needed via SVN:

svn co svn://anonsvn.mono-project.com/source/trunk/mcs/class/Mono.Cecil

Unfortunately the project won't compile by itself due to the .snk
file being located in a directory one up from Mono.Cecil. For this
example I simply turned off assembly signing to get this compiling,
however please feel free to download the .snk file and place it in the
appropriate location to have a fully signed version of Mono.Cecil.

Hacking Mono.Cecil

Now that we've got the source and it's compiling, let's hack it. From
the screenshot you'll see that the error comes from the CodeReader
class on line 207 (in my copy anyway). Taking a look in the code around
that line we see the following opcode lookup and switch statement:

if (cursor == 0xfe)
    op = OpCodes.TwoBytesOpCode [br.ReadByte ()];
else
    op = OpCodes.OneByteOpCode [cursor];

Instruction instr = new Instruction ((int) offset, op);
switch (op.OperandType) {
case OperandType.InlineNone :
    break;
...
case OperandType.InlineTok :
    MetadataToken token = new MetadataToken (br.ReadInt32 ());
    switch (token.TokenType) {
...
    default:
        throw new ReflectionException ("Wrong token: " + token);
    }
    break;
}

That's our error message alright, and it seems to be happening
because it is going into OperandType.InlineTok. Hmmm... well, ideally
we'd like to go into InlineNone, since an invalid opcode has no
subsequent operand. As you can see, the OperandType comes from the
variable op, which is set by the lines:

if (cursor == 0xfe)
    op = OpCodes.TwoBytesOpCode [br.ReadByte ()];
else
    op = OpCodes.OneByteOpCode [cursor];

Well, since we're only working with one byte opcodes in this
example, let's concentrate on that. The OpCodes.OneByteOpCode variable
is actually an array which places each opcode at the position in the
array given by its byte code representation; for example: index 0 = 0x00
= nop, index 1 = 0x01 = break ... etc. In one
of our previous articles, we placed several invalid opcode bytes
throughout the code, all within a certain subset: 0xbe, 0xc0, 0xc1...
etc. Therefore, our invalid opcodes should be at the corresponding
indexes of OneByteOpCode; i.e. 190, 192, 193... etc.
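To make the indexing idea concrete, here is a minimal sketch of a byte-indexed lookup table. It is not Mono.Cecil's code: the class OpCodeTableSketch, its string table and the IsUnassigned helper are hypothetical names, and only three opcodes are filled in for illustration, so most slots show as unassigned even where real CIL defines an opcode.

```csharp
using System;

// Hypothetical, simplified stand-in for a one-byte opcode table.
// Not Mono.Cecil's real types; only three entries are filled in.
static class OpCodeTableSketch
{
    // One slot per possible byte value; slots never assigned stay null.
    public static readonly string[] OneByteNames = BuildTable();

    static string[] BuildTable()
    {
        var table = new string[256];
        table[0x00] = "nop";
        table[0x01] = "break";
        table[0x2A] = "ret";
        return table;
    }

    // True when this (partial) table has nothing stored for the byte.
    public static bool IsUnassigned(byte value)
    {
        return OneByteNames[value] == null;
    }

    static void Main()
    {
        // 0xBE, 0xC0 and 0xC1 are the invalid bytes mentioned in the article.
        foreach (byte b in new byte[] { 0x00, 0x2A, 0xBE, 0xC0, 0xC1 })
        {
            Console.WriteLine("index {0} = 0x{0:X2} = {1}",
                b, OneByteNames[b] ?? "(unassigned)");
        }
    }
}
```

The point of the sketch is only that the byte read from the method body is used directly as the array index, so an invalid byte such as 0xbe still lands on a slot; what matters is what that slot contains.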

Still following? Essentially to solve this problem we need to see
what opcodes are being defined at these indexes in Mono.Cecil at
runtime. Well, as we all know, a struct is never null, therefore the
object at each of those "unused" opcode indexes is an empty struct (i.e.
all fields left at their default values). Due to the way that the
Mono.Cecil OpCode object works, this gives us a confusing result stating
that the size of the OpCode is two bytes - even though it is in the one
byte array (check out the OpCode.Size property to see why).
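The effect can be reproduced with a cut-down model. This is a sketch under assumptions, not the library's source: SketchOpCode is a hypothetical struct, and the Size rule (a first byte of 0xFF means a one byte opcode) is inferred from the 0xff argument passed to the OpCode constructor later in this article.

```csharp
using System;

// Hypothetical, cut-down model of an opcode struct; the real
// Mono.Cecil type has more fields.
struct SketchOpCode
{
    public readonly byte Op1;
    public readonly byte Op2;

    public SketchOpCode(byte op1, byte op2)
    {
        Op1 = op1;
        Op2 = op2;
    }

    // Assumed convention: a first byte of 0xFF marks a one-byte opcode.
    public int Size
    {
        get { return Op1 == 0xFF ? 1 : 2; }
    }
}

static class DefaultStructDemo
{
    public static SketchOpCode[] BuildTable()
    {
        var table = new SketchOpCode[256];
        table[0x00] = new SketchOpCode(0xFF, 0x00); // nop, explicitly initialised
        return table;
    }

    static void Main()
    {
        SketchOpCode[] table = BuildTable();
        Console.WriteLine(table[0x00].Size); // initialised slot: prints 1
        Console.WriteLine(table[0xBE].Size); // never assigned, all zeroes: prints 2
    }
}
```

Because array elements of a struct type start out zeroed, the untouched slot has Op1 == 0 and so reports a size of 2, which matches the confusing result described above.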

No wonder it causes problems! So how do we fix this? Well, for a
start we should initialise the array inside the OpCodes class to avoid
this issue:

static OpCodes()
{
    //Start from index 1 to avoid nop
    for (int i = 1; i < OneByteOpCode.Length; i++)
    {
        //An empty Op2 marks a slot that was never initialised,
        //unless the opcode really is an arglist
        if (OneByteOpCode[i].Op2 == 0x00 && OneByteOpCode[i].Code != Code.Arglist)
        {
            OneByteOpCode[i] = new OpCode(0xff, (byte) i,
                Code.Unused, FlowControl.Next, OpCodeType.Primitive,
                OperandType.InlineNone, StackBehaviour.Pop0,
                StackBehaviour.Push0);
        }
    }
}

Basically we are looking for all OpCodes that haven't been
initialised properly; that is, those with Op2 = 0x00. We have to be
careful however: both Nop and Arglist use an empty Op2 correctly,
therefore we intentionally skip these ones. Now, if you copied and
pasted this into your code it will complain about the variable
Code.Unused. To make things cleaner I simply added a new member to the
Code enum so that identification of invalid OpCodes is nice and easy.
The reason I use the word "unused" is really so that it is in line with
how ILDASM sees an invalid OpCode.
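The same fill-in pass can be tried outside Mono.Cecil on a toy table. Everything here is hypothetical and for illustration only: SketchCode, SketchOp and FillUnusedSketch are made-up names, and the table holds just three real opcodes, so far more slots end up marked as Unused than would in the real library.

```csharp
using System;

// Hypothetical stand-ins for illustration only; not Mono.Cecil's declarations.
// Unused plays the role of the member added to the Code enum.
enum SketchCode { Nop, Break, Ret, Arglist, Unused }

struct SketchOp
{
    public byte Op1;
    public byte Op2;
    public SketchCode Code;

    public SketchOp(byte op1, byte op2, SketchCode code)
    {
        Op1 = op1;
        Op2 = op2;
        Code = code;
    }
}

static class FillUnusedSketch
{
    public static SketchOp[] BuildPatchedTable()
    {
        var table = new SketchOp[256];
        table[0x00] = new SketchOp(0xFF, 0x00, SketchCode.Nop);
        table[0x01] = new SketchOp(0xFF, 0x01, SketchCode.Break);
        table[0x2A] = new SketchOp(0xFF, 0x2A, SketchCode.Ret);

        // Start at 1: index 0 is nop, which legitimately has Op2 == 0.
        for (int i = 1; i < table.Length; i++)
        {
            // A zero Op2 anywhere else means the slot was never filled in.
            if (table[i].Op2 == 0x00 && table[i].Code != SketchCode.Arglist)
                table[i] = new SketchOp(0xFF, (byte) i, SketchCode.Unused);
        }
        return table;
    }

    static void Main()
    {
        SketchOp[] table = BuildPatchedTable();
        Console.WriteLine(table[0xBE].Code); // a patched slot
        Console.WriteLine(table[0x2A].Code); // a slot that was already valid
    }
}
```

After the pass every previously empty slot carries its own byte value in Op2 and a first byte of 0xFF, so in this model it would be treated as a one byte opcode with no operand.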

Before we finish hacking Mono.Cecil, there is one more "aesthetic"
change that I thought I'd make. Technically, the change above fixes the
issue for us; however, being the pedantic guy that I am, I also wanted to
fix the "ToString()" method so that it'd display "unused" instead of
"arglist" when an invalid OpCode is present. Well, it actually isn't a
hard aesthetic fix to make. Simply find the Name property in the OpCode
class, and use the following:

public string Name {
    get {
        int index = (Size == 1) ? Op2 : (Op2 + 256);
        return OpCodeNames.names [index] ?? "unused";
    }
}
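To see why the old output was "arglist" and why the fallback helps, here is a small sketch of the same index calculation against a made-up names table. NameFallbackSketch and its 512-entry array are assumptions for illustration; only the index formula and the ?? "unused" fallback come from the property above.

```csharp
using System;

static class NameFallbackSketch
{
    // Hypothetical names table: 256 one-byte slots followed by 256
    // two-byte slots. Only two entries are filled in for illustration.
    static readonly string[] Names = BuildNames();

    static string[] BuildNames()
    {
        var names = new string[512];
        names[0x00] = "nop";
        names[256 + 0x00] = "arglist"; // arglist is encoded as 0xFE 0x00
        return names;
    }

    // Same index calculation as the Name property, with a fallback
    // for slots that have no name.
    public static string NameOf(int size, byte op2)
    {
        int index = (size == 1) ? op2 : (op2 + 256);
        return Names[index] ?? "unused";
    }

    static void Main()
    {
        Console.WriteLine(NameOf(2, 0x00)); // empty slot before the fix: arglist
        Console.WriteLine(NameOf(1, 0xBE)); // patched slot: unused
        Console.WriteLine(NameOf(1, 0x00)); // nop
    }
}
```

An uninitialised slot reports a size of 2 with Op2 = 0, which maps to the first two-byte entry, hence "arglist". Once the slot is patched to a one byte opcode with its own byte value, the lookup finds no name and falls back to "unused".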

Now to test it all...

Testing our results

As you'll remember, I declared a new enum member: Code.Unused.
It comes into use when we rewrite our testing program:

using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

//Load the assembly
var assembly = AssemblyFactory.GetAssembly(
    @"D:\temp\Obfuscated\SimpleLibrary.dll");

//Go through each type in the assembly
foreach (TypeDefinition type in assembly.MainModule.Types)
{
    //Go through each method
    foreach (MethodDefinition def in type.Methods)
    {
        //Check the body
        if (def.HasBody)
        {
            //Get the CIL worker
            CilWorker worker = def.Body.CilWorker;

            //Chuck the bad instructions in here to avoid modifying the
            //collection while we are enumerating it
            List<Instruction> instructionsToFix = new List<Instruction>();

            //Go through each instruction
            foreach (Instruction instr in def.Body.Instructions)
            {
                //Mark each invalid opcode for replacement
                if (instr.OpCode.Code == Code.Unused)
                    instructionsToFix.Add(instr);
            }

            //Go through the ones to fix and replace each with a nop
            foreach (Instruction instr in instructionsToFix)
            {
                Instruction newInstr = worker.Create(OpCodes.Nop);
                worker.Replace(instr, newInstr);
            }
        }
    }
}

//Save the assembly
AssemblyFactory.SaveAssembly(assembly,
    @"D:\temp\Obfuscated\SimpleLibrary.new.dll");

We use Code.Unused to test for an invalid opcode to replace.
What are the results? Well, Reflector can now decompile the code as per
usual (again):

[Screenshot: Reflector now works ok again]

Conclusion

This week we took a look at "fixing" the problem with Mono.Cecil when
we reached an invalid OpCode. Essentially, fixing the problem in
Mono.Cecil involved:

- Creating a new enum member Code.Unused so that we can identify
  invalid opcodes.
- Initialising the unused positions of the static array
  OpCodes.OneByteOpCode with our invalid opcodes. This helped provide us
  with accurate opcode descriptions in unused positions.
- (Optional) Changing OpCode.Name to return an accurate friendly name
  for invalid opcodes.

Once Mono.Cecil could handle these OpCodes, we had no problem
whatsoever writing an automated tool to "fix" the assembly for us. It
certainly doesn't take much to reverse some of the "value added"
obfuscation techniques, does it!?

Next time

Well, that's all for this week. If you have any
questions/suggestions/notes, then please let me know. Not sure what the
next article will be about yet, however I'll be sure to make it
something interesting (perhaps tamper proofing?). What are your
thoughts?

Sunday, January 31, 2010

Hacking Mono.Cecil - allowing invalid opcodes