Software licenses are often crackable. Deák Ferenc presents a technique for tackling this problem.
From the early days of commercial computer software, malicious programmers, also known as crackers, have been nettling the authors of that software by bypassing the clever licensing mechanisms built into it, thus causing financial damage to the companies providing the software.
This trend has not changed in recent years: the cleverer the routines the programmers write, the more time is spent by crackers in invalidating the newly created routines, and in the end the crackers always succeed. For companies to be able to keep up with the constant pressure provided by the cracking community, they would need to constantly change their licensing and identification algorithms, but in practice this is not a feasible way to deal with the problem.
An entire industry has evolved around software protection and licensing technologies, where renowned companies offer advanced (and expensive) solutions to tackle this problem. The protection schemes range from using various resources such as hardware dongles, to network activation, from unique license keys to using complex encryption of personalized data – the list is long.
This article provides a short introduction to illustrate a very simple and naive licensing algorithm’s internal workings. We will show how to bypass it in an almost real life scenario, and finally present a software based approach to mitigate the real problem by hiding the license checking code in a layer of obfuscated operations generated by the C++ template metaprogramming framework, which will make the life of the person wanting to crack the application a little bit harder. Certainly, if they are well determined, the code will still be cracked at some point, but at least we’ll make it harder for them.
A naive licensing algorithm
The naive licensing algorithm is a very simple implementation that checks the validity of a license associated with the name of the user who purchased the associated software. It is not an industrial strength algorithm: it only has demonstrative power, while trying to provide insight into the actual responsibilities of a real licensing algorithm.
Since the license checking code is usually shipped with the software product in compiled form, I’ll include both the C++ code of the licensing algorithm and the generated code (in Intel x86 assembly), since the latter is what crackers will see after a successful disassembly of the executable. In order not to pollute precious page space with unintelligible binary code, I will restrict myself to the relevant bits of the code that naively determines whether a supplied license is valid or not.
Listing 1 is the source code of the licensing algorithm.
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
bool check_license(const char* user, const char* users_license)
{
    std::string license;
    size_t ll = strlen(users_license);
    size_t l = strlen(user), lic_ctr = 0;
    int add = 0;
    for (size_t i = 0; i < ll; i++)
        if (users_license[i] != '-')
            license += users_license[i];
    while (lic_ctr < license.length() )
    {
        size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l) current += user[i ++];
        current += add;
        add++;
        if (license[lic_ctr] != letters[current % sizeof letters])
            return false;
        lic_ctr++;
    }
    return true;
}
Listing 1
The license which this method validates comes in the form ABCD-EFGH-IJKL-MNOP, and there is an associated generate_license method which is presented as an appendix to this article.
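The appendix version of generate_license is not reproduced in this section, so the following is only a minimal sketch of what a matching generator could look like, assuming it simply mirrors the check in Listing 1. The name make_license_sketch and the key_len parameter are hypothetical. The sketch also makes one property of Listing 1 visible: sizeof letters is 27 (it counts the terminating '\0'), so an index of 26 selects the terminator, and no key can be written as a C string for such a position.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// The check from Listing 1, reproduced so that the sketch is self-contained.
bool check_license(const char* user, const char* users_license)
{
    std::string license;
    std::size_t ll = std::strlen(users_license);
    std::size_t l = std::strlen(user), lic_ctr = 0;
    int add = 0;
    for (std::size_t i = 0; i < ll; i++)
        if (users_license[i] != '-')
            license += users_license[i];
    while (lic_ctr < license.length()) {
        std::size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l) current += user[i++];
        current += add;
        add++;
        if (license[lic_ctr] != letters[current % sizeof letters])
            return false;
        lic_ctr++;
    }
    return true;
}

// Hypothetical generator (NOT the appendix version): it recomputes, for
// every key position, the character that check_license expects.
// Returns an empty string when no key can be written for this user.
std::string make_license_sketch(const char* user, std::size_t key_len = 16)
{
    const std::size_t l = std::strlen(user);
    std::string key;
    if (l == 0) return key;                      // the check would divide by zero
    for (std::size_t pos = 0; pos < key_len; ++pos) {
        int current = 0;
        for (std::size_t i = pos % l; i < l; ++i) current += user[i];
        current += static_cast<int>(pos);        // 'add' equals the position
        const char c = letters[current % sizeof letters];
        if (c == '\0') return std::string();     // index 26 is the terminator
        if (pos != 0 && pos % 4 == 0) key += '-';
        key += c;
    }
    return key;
}
```

Feeding the result of make_license_sketch(name) back into check_license(name, ...) should succeed for any name that never hits index 26, which is exactly why a key generator is so easy to write for this scheme.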
Also, the naivety of this method is easily exposed by using the very proper name of check_license, which immediately reveals to the would-be attacker where to look for the code checking the ... license. If you want to make it harder for the attacker to identify the license checking method, I’d recommend either using some irrelevant names or just stripping all symbols from the executable as part of the release process.
The interesting part is the binary code of the method obtained via compilation of the corresponding C++ code (see Listing 2), which we obtained by compiling it with Microsoft Visual C++ 2015. I have compiled it in Release mode (with Debug information included for educational purposes) but it is intentionally not the Debug version, since we would hardly ship the debug version of the code to our customers.
        if (license[lic_ctr] != letters[current % sizeof letters])
00FC15E4 lea ecx,[license]
00FC15E7 cmovae ecx,dword ptr [license]
00FC15EB xor edx,edx
00FC15ED push 1Bh
00FC15EF pop esi
00FC15F0 div eax,esi
00FC15F2 mov eax,dword ptr [lic_ctr]
00FC15F5 mov al,byte ptr [ecx+eax]
00FC15F8 cmp al,byte ptr [edx+0FC42A4h]
00FC15FE jne check_license+0DEh (0FC1625h)
            return false;
        lic_ctr++;
00FC1600 mov eax,dword ptr [lic_ctr]
00FC1603 mov ecx,dword ptr [add]
00FC1606 inc eax
00FC1607 mov dword ptr [lic_ctr],eax
00FC160A cmp eax,dword ptr [ebp-18h]
00FC160D jb check_license+7Fh (0FC15C6h)
    }
    return true;
00FC160F mov bl,1
00FC1611 push 0
00FC1613 push 1
00FC1615 lea ecx,[license]
00FC1618 call std::basic_string<char,std::char_traits<char>, std::allocator<char> >::_Tidy (0FC1944h)
00FC161D mov al,bl
}
00FC161F call _EH_epilog3_GS (0FC2F7Ch)
00FC1624 ret
00FC1625 xor bl,bl
00FC1627 jmp check_license+0CAh (0FC1611h)
Listing 2
I have also used the built-in debugger of the VS IDE to visualize the generated code next to the source, which facilitates a better understanding of the relation between the two of them.
Let’s analyze it for a few moments. The essence of the validity checking happens at address 00FC15F8 where the comparison cmp al, byte ptr [edx+0FC42A4h] takes place (for those wondering, edx gets its value as being the remainder of the division at 00FC15F0).
At this stage, the value of the al register is already initialized with the value of license[lic_ctr] and that is what is compared to the expected character. If it does not match, the code jumps to 0FC1625h where the bl register is zeroed out (xor bl, bl) and from there the jump goes backward to 0FC1611h to leave the method with the ret instruction found at 00FC1624. Otherwise the loop continues.
The most common way of returning a value from a method call is to place the value in the eax register and let the calling code handle it, so before returning from the method the value of al is populated with the value of the bl register (via mov al, bl found at 00FC161D).
Please remember that if the check discussed before did not succeed, the value of the bl register was 0, whereas bl was initialized to 1 (via mov bl,1 at 00FC160F) if the entire loop was successfully completed.
From the perspective of an attacker, the only thing that needs to be done is to replace the binary sequence of xor bl,bl with the binary code of mov bl,1 in the executable. Since luckily these two are the same length (2 bytes), the crack is ready to be published within a few seconds.
Moreover, due to the simplicity of the implementation of the algorithm, a highly skilled cracker could easily create a key-generator for the application, which would be an even worse scenario as the cracker doesn’t have to modify the executable. This means that further safety steps, such as integrity checks of the application, would all be executed correctly, but there would be a publicly available key-generator which could be used by anyone to generate a license-key without ever paying for it, or malicious salesmen could generate counterfeit licenses which they could sell to unsuspecting customers.
Now let’s look at our C++ obfuscating framework.
The C++ obfuscating framework
The C++ obfuscating framework provides a simple macro-based mechanism, combined with advanced C++ template metaprogramming techniques for relevant methods and control structures, to replace the basic C++ control structures and statements with highly obfuscated code, which makes reverse engineering the product a complicated procedure.
By using the framework, reverse engineering the license checking algorithm presented in the previous section would prove to be a highly challenging task due to the sheer amount of extra code generated by the framework engine.
The framework has adopted a familiar, BASIC-like, syntax to make the switch from real C++ source code to the macro language of the framework as easy and painless as possible.
Functionality of the framework
The role of the obfuscating framework is to generate extra code, while providing functionality which is expected by the user, with as few syntax changes to the language as possible.
The following functionalities are provided by the framework:
- wrap all values into a valueholder class, thus hiding them from immediate access
- provide a BASIC-like syntax for the basic C++ control structures (if, for, while, ...)
- generate extra code to achieve complex code, making it harder to understand
- randomize constant values in order to hide the information.
Debugging with the framework
Like every developer who has been there, we know that debugging complex and highly templated C++ code can sometimes be a nightmare. In order to avoid this nightmare while using the framework, we decided to implement a debugging mode.
To activate the debugging mode of the framework, define the
OBF_DEBUG
identifier before including the obfuscation header file. Please see the specific control structures for how the debugging mode alters the behaviour of the macro.
Using the framework
The basic usage of the framework boils down to including the header file providing the obfuscating functionality
#include "instr.h"
then using the macro pair
OBF_BEGIN
and
OBF_END
as delimiters of the code sequences that will be using obfuscated expressions.
For a more under-the-hood view of the framework, the
OBF_BEGIN
and
OBF_END
macros declare a
try
-
catch
block, which has support for returning values from the current obfuscated code sequence, and also provides support for basic control flow modifications through the CONTINUE and BREAK macros, which emulate continue and break.
Behind the scenes: OBF_BEGIN and OBF_END
OBF_BEGIN
expands to:
#define OBF_BEGIN \
    try { obf::next_step __crv = obf::next_step::ns_done; \
    std::shared_ptr<obf::base_rvholder> __rvlocal;
and
OBF_END
becomes:
#define OBF_END } \
    catch(std::shared_ptr<obf::base_rvholder>& r) { return *r; } \
    catch (...) { throw; }
In order to support ‘returning’ a value from the current obfuscated block, we need a special variable
__rvlocal
. At later stages, this value will be populated with meaningful values as a result of executing the code of the
RETURN
macro (which will ‘throw’ a value with a type of
std::shared_ptr<obf::base_rvholder>
). The
OBF_END
will catch this specific value and handle it appropriately, while all other values thrown will be re-thrown in order not to disturb the client code’s exception handling.
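The mechanism can be tried out in isolation. The following is a minimal sketch of the same idea without any macros, assuming a hypothetical int_holder type in place of obf::base_rvholder and a hypothetical function name guarded: the ‘return’ is a throw of a shared pointer, the matching catch turns it into a real return, and every other exception passes through untouched.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for obf::base_rvholder, holding just an int.
struct int_holder {
    explicit int_holder(int v) : value(v) {}
    int value;
};

int guarded(int n)
{
    try {                                            // role of OBF_BEGIN
        std::shared_ptr<int_holder> rvlocal;         // role of __rvlocal
        if (n < 0)
            throw std::invalid_argument("negative"); // client code exception
        rvlocal.reset(new int_holder(n * 2));
        throw rvlocal;                               // role of RETURN(n * 2)
    }
    catch (std::shared_ptr<int_holder>& r) {         // role of OBF_END
        return r->value;                             // the actual return
    }
    catch (...) {
        throw;                                       // everything else passes through
    }
}
```

A call such as guarded(21) comes back through the first handler, while guarded(-1) still surfaces std::invalid_argument to the caller.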
Value and numerical wrappers
To achieve an extra layer of obfuscation, integral numeric literals can be wrapped in the macro N() and integral numeric variables (int, long, ...) can be wrapped in the macro V(), which obfuscates the calculations performed on them. The V() value wrapper can also wrap individual array elements (x[2]), but not whole arrays (x), and it cannot wrap class instances, because the macro expands to a reference holder object.
The implementation of the wrappers uses the link time random number generator provided by [ Andrivet ] and the values are obfuscated by performing various operations to hide the original value.
And here is an example for using the value and variable wrappers:
int a, b = N(6); V(a) = N(1);
After executing the statements above, the value of a will be 1.
The value wrappers implement a limited set of operations which you can use to change the value of the wrapped variable. These are the compound assignment operators:
+=
,
-=
,
*=
,
/=
,
%=
,
<<=
,
>>=
,
&=
,
|=
and
^=
, and the prefix and postfix forms of
--
and
++
. All of the binary operators (
+
,
-
,
*
,
/
,
%
,
&
,
|
,
<<
and
>>
) are also implemented, so you can write
V(a) + N(1)
or
V(a) - V(b)
.
Also, the assignment operator to a specific type and from a different value wrapper is implemented, together with the comparison operators.
As the name implies, the value wrappers will wrap values by offering a behaviour similar to the usage of simple values, so be aware that variables which are
const
values can be wrapped into the
V()
wrapper but, as with real const variables, you cannot assign to them. So for example the following code will not compile:
const char* t = "ABC"; if( V(t[1]) == 'B') { V( t[1] ) = 'D'; }
And the following
char* t = "ABC"; if( V(t[1]) == 'B') { V( t[1] ) = 'D'; }
will be undefined behaviour because the compiler will most likely place the string "ABC" in a read-only memory area (although I would expect your compiler to choke heavily on this expression, since converting a string literal to char* is no longer valid in modern C++). To work with this kind of data, always use char[] instead of char*.
Behind the scenes of the implementation of numeric wrapping
The
N
macro is defined as follows:
#define N(a) (obf::Num<decltype(a), \
    obf::MetaRandom<__COUNTER__, 4096>::value ^ a>().get() ^ \
    obf::MetaRandom<__COUNTER__ - 1, 4096>::value)
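As a first step, let’s consider that due to the implementation of [Andrivet] and the (non-standard, but widely supported) __COUNTER__ macro, which yields the next integer every time it is expanded, the following two expressions will have the same value when they appear one after the other, as they do in N():
obf::MetaRandom<__COUNTER__, 4096>::value
obf::MetaRandom<__COUNTER__ - 1, 4096>::value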
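The pairing of the two __COUNTER__ uses can be verified at compile time. The following is a small sketch, assuming a hypothetical FakeRandom template as a deterministic stand-in for obf::MetaRandom (the real generator comes from [Andrivet] and is not reproduced here); SAME_KEY_TWICE and MASKED are hypothetical macro names as well.

```cpp
// Hypothetical stand-in for obf::MetaRandom: a deterministic compile-time
// value derived from N, reduced modulo M.
template <int N, int M>
struct FakeRandom {
    static constexpr int value = (N * 7919 + 13) % M;
};

// The first __COUNTER__ expands to some k, the second one to k + 1,
// so "__COUNTER__ - 1" names k again.
#define SAME_KEY_TWICE() \
    (FakeRandom<__COUNTER__, 4096>::value == \
     FakeRandom<__COUNTER__ - 1, 4096>::value)

static_assert(SAME_KEY_TWICE(), "both uses resolve to the same key");
static_assert(SAME_KEY_TWICE(), "and so does the next pair");

// Same shape as N(), minus the Num class: mask the literal with a key,
// then unmask it with the very same key.
#define MASKED(a) \
    ((FakeRandom<__COUNTER__, 4096>::value ^ (a)) ^ \
     FakeRandom<__COUNTER__ - 1, 4096>::value)

constexpr int roundtrip = MASKED(42);
```

Because both halves of each pair use the same key, the two xor operations cancel out and the original literal is recovered.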
Now, taking the
obf::Num
class into view, we have Listing 3, where the recursion of the templates is terminated by the specializations in Listing 4.
template<typename T, T n> class Num final
{
public:
    enum { value = ( (n & 0x01) | ( Num < T , (n >> 1)>::value << 1) ) };
    Num() : v(0)
    {
        v = value ^ MetaRandom<32, 4096>::value;
    }
    T get() const
    {
        volatile T x = v ^ MetaRandom<32, 4096>::value;
        return x;
    }
private:
    volatile T v;
};
Listing 3
struct ObfZero { enum {value = 0}; };
struct ObfOne { enum {value = 1}; };
#define OBF_ZERO(t) template <> struct Num<t,0> final : public ObfZero { t v = value; };
#define OBF_ONE(t) template <> struct Num<t,1> final : public ObfOne { t v = value; };
#define OBF_TYPE(t) OBF_ZERO(t) OBF_ONE(t)
OBF_TYPE(int)
// And for all other integral types
Listing 4
The Num class tries to add some protection by adding some extra xor operations to the use of a simple number, thus turning a simple numeric assignment into several steps of assembly code (Visual Studio 2015 generated the code in Listing 5 in Release With Debug Info mode).
    int n;
    OBF_BEGIN
        n = N(42);
002A5F74 mov dword ptr [ebp-4],0
002A5F7B mov dword ptr [ebp-4],78Ch
002A5F82 mov eax,dword ptr [ebp-4]
002A5F85 xor eax,0E8Fh
002A5F8A mov dword ptr [ebp-4],eax
002A5F8D mov eax,dword ptr [ebp-4]
002A5F90 xor eax,929h
    OBF_END
Listing 5
However, please note the several
volatile
variables ... which are required to circumvent today’s extremely clever optimizing compilers. If we remove the
volatile
from the variables, the compiler is clever enough to guess the value I wanted to obfuscate, so ... there goes the obfuscation.
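The role of volatile can be seen in a stripped-down variant of the idea. The sketch below is an assumption-laden simplification, not the framework’s Num class: MaskedNum is a hypothetical name, there is no template recursion, and a fixed key replaces MetaRandom.

```cpp
// Hypothetical, simplified variant of the Num idea.
template <typename T, T n, T key>
class MaskedNum final {
public:
    MaskedNum() : v(n ^ key) {}      // only the masked value is stored
    T get() const
    {
        volatile T x = v ^ key;      // unmasked at run time
        return x;
    }
private:
    volatile T v;                    // volatile accesses must really be performed
};

int forty_two()
{
    return MaskedNum<int, 42, 0x5A5>().get();
}
```

Since reads and writes of volatile objects count as observable behaviour, the compiler has to emit the store, the load and the xor. Removing both volatile qualifiers would allow it to fold the whole function into returning the constant 42.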
Behind the scenes of the implementation of variable wrapping
When we are not building the code in debugging mode, the macro
V
expands to the following C++ nightmare:
#define MAX_BOGUS_IMPLEMENTATIONS 3
#define V(a) ([&]() { \
    obf::extra_chooser<std::remove_reference<decltype(a)>::type, \
        obf::MetaRandom<__COUNTER__, MAX_BOGUS_IMPLEMENTATIONS>::value>::type \
        _JOIN(_ec_,__COUNTER__)(a); \
    return obf::stream_helper(); }() << a)
So let’s dissect it in order to understand the underlying operations.
The value wrappers add an extra obfuscation layer to the values they wrap, by performing an extra addition, an extra subtraction or an extra xor operation on the value itself. This is picked randomly when compilation happens by the
extra_chooser
class, which looks like this:
template <typename T, int N> class extra_chooser { using type = basic_extra; };
and is helped by the following constructs:
#define DEFINE_EXTRA(N,implementer) template <typename T> \
    struct extra_chooser<T,N> { using type = implementer<T>; }
DEFINE_EXTRA(0, extra_xor);
DEFINE_EXTRA(1, extra_substraction);
DEFINE_EXTRA(2, extra_addition);
which select the classes implementing the extra operations. These in turn look like Listing 6; the extra addition and subtraction classes are very similar.
template <class T> class extra_xor final : public basic_extra
{
public:
    extra_xor(T& a) : v(a)
    {
        volatile T lv = MetaRandom<__COUNTER__, 4096>::value;
        v ^= lv;
    }
    virtual ~extra_xor()
    {
        volatile T lv = MetaRandom<__COUNTER__ - 1, 4096>::value;
        v ^= lv;
    }
private:
    volatile T& v;
};
Listing 6
The next thing we observe is that an object of this kind (extra bogus operation chooser) is defined in a lambda function for the variable we are wrapping. The variable name for this is determined by
_JOIN(_ec_,__COUNTER__)(a)
, where
_JOIN
is just a simple joiner macro:
#define _JOIN(a,b) a##b
Upon creation and destruction of this
extra_chooser
object, the value of the wrapped variable will remain unchanged; however, extra code will be generated by the compiler (thanks to the numerous
volatile
modifiers found in the extra operation classes, otherwise the compiler would ‘cheat’ again and just ‘skip’ our obfuscation). This is actually an extensible interface, so you can use it to define your own class for bogus operations using the
DEFINE_EXTRA
macro (and increase the
MAX_BOGUS_IMPLEMENTATIONS
as required).
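The ‘scramble in the constructor, restore in the destructor’ pattern can be sketched on its own. The class below is a hypothetical simplification (xor_guard is not a framework name, and the key is passed in at run time instead of coming from MetaRandom), meant only to show that the wrapped variable ends up unchanged.

```cpp
// Hypothetical, simplified bogus operation: the constructor xors the
// referenced variable with a key, the destructor xors it back.
template <class T>
class xor_guard final {
public:
    xor_guard(T& a, T key) : v(a), k(key) { v = v ^ k; }
    ~xor_guard() { v = v ^ k; }
private:
    volatile T& v;   // volatile: both xor operations are really emitted
    volatile T k;
};

int scrambled_then_restored(int value, int& seen_inside)
{
    {
        xor_guard<int> guard(value, 0x3C3);
        seen_inside = value;         // scrambled while the guard is alive
    }                                // destructor restores the value here
    return value;
}
```

Inside the guarded scope the variable holds value ^ 0x3C3, and once the guard is destroyed the original value is back, so the only lasting effect is the extra generated code.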
Now, back to the lambda because it plays an important role. The lambda returns an object of type
obf::stream_helper()
, which is basically an empty class (
class stream_helper {};
), but the role of the lambda is still not done. As we can see in the macro, the lambda is executed and into its result (the
obf::stream_helper()
object) we stream the parameter of the macro (
<< a
). This gives control to the following operator:
template <typename T>
refholder<T> operator << (stream_helper, T& a)
{
    return refholder<T>(a);
}
providing us with a controversial class,
refholder
(Listing 7).
template <typename T>
class refholder final
{
public:
    refholder() = delete;
    refholder(T& pv) : v(pv) {}
    refholder(T&&) = delete;
    ~refholder() = default;
    refholder<T>& operator = (const T& ov) { v = ov; return *this; }
    refholder<T>& operator = (const refholder<T>& ov) { v = ov.v; return *this; }
    bool operator == (const T& ov) { return !(v ^ ov); }
    bool operator != (const T& ov) { return !operator ==(ov); }
    COMPARISON_OPERATOR(>=)
    COMPARISON_OPERATOR(<=)
    COMPARISON_OPERATOR(>)
    COMPARISON_OPERATOR(<)
    operator T() {return v;}
    refholder<T>& operator++() { ++ v; return *this; }
    refholder<T>& operator--() { -- v; return *this; }
    refholder<T> operator++(int) { refholder<T> rv(*this); operator ++(); return rv; }
    refholder<T> operator--(int) { refholder<T> rv(*this); operator --(); return rv; }
    COMP_ASSIGNMENT_OPERATOR(+)
    COMP_ASSIGNMENT_OPERATOR(-)
    COMP_ASSIGNMENT_OPERATOR(*)
    COMP_ASSIGNMENT_OPERATOR(/)
    COMP_ASSIGNMENT_OPERATOR(%)
    COMP_ASSIGNMENT_OPERATOR(<<)
    COMP_ASSIGNMENT_OPERATOR(>>)
    COMP_ASSIGNMENT_OPERATOR(&)
    COMP_ASSIGNMENT_OPERATOR(|)
    COMP_ASSIGNMENT_OPERATOR(^)
private:
    volatile T& v;
};
Listing 7
This class supports the basic operations you can perform on a variable, either via member operators (defined explicitly or through the COMP_ASSIGNMENT_OPERATOR macro) or via the DEFINE_BINARY_OPERATOR macro, which defines binary operators for
refholder
classes. In cases when the variable wrapping is done on constant variables, there are specializations of this template class for constant
T
s. There are various arguments against the construct of storing references as class members [
Stackoverflow
]; however, I consider this situation to be a reasonably safe one which can be exploited for this specific reason. So, here (Listing 8) comes a piece of generated assembly code for a very simple expression.
    int n;
    OBF_BEGIN
        V(n) = N(42);
00048466 mov dword ptr [ebp-8],0
0004846D mov dword ptr [ebp-8],97Ch
00048474 push esi
00048475 mov esi,dword ptr [ebp-8]
00048478 mov dword ptr [ebp-8],48Bh
0004847F xor esi,0DC4h
00048485 mov eax,dword ptr [ebp-8]
00048488 add eax,dword ptr [n]
0004848B mov dword ptr [n],eax
0004848E mov dword ptr [ebp-8],48Bh
00048495 mov eax,dword ptr [ebp-8]
00048498 sub dword ptr [n],eax
0004849B lea eax,[n]
0004849E push eax
0004849F push dword ptr [ebp-8]
000484A2 lea eax,[ebp-0Ch]
000484A5 push eax
000484A6 call obf::operator<<<int> (0414C9h)
000484AB add esp,0Ch
000484AE xor esi,492h
000484B4 mov eax,dword ptr [eax]
000484B6 mov dword ptr [eax],esi
    OBF_END
Listing 8
The sheer amount of extra code generated for a simple assignment is overwhelming.
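Stripped of the bogus operations, the path from V(n) to an assignable wrapper can be sketched in a few lines. Everything below is a hypothetical reduction for illustration: mini_refholder keeps only assignment and conversion, and MINI_V is not the framework’s macro.

```cpp
// Hypothetical, stripped-down versions of the helper classes.
struct stream_helper {};

template <typename T>
class mini_refholder final {
public:
    explicit mini_refholder(T& pv) : v(pv) {}
    mini_refholder& operator=(const T& ov) { v = ov; return *this; }
    operator T() const { return v; }
private:
    volatile T& v;               // refers to the wrapped variable
};

// Streaming a variable into the helper yields a reference holder for it.
template <typename T>
mini_refholder<T> operator<<(stream_helper, T& a)
{
    return mini_refholder<T>(a);
}

// Same shape as V(): an immediately invoked lambda returns the helper,
// and the variable is streamed into the result.
#define MINI_V(a) ([&]() { return stream_helper(); }() << a)

int assign_through_wrapper()
{
    int n = 0;
    MINI_V(n) = 42;              // writes through the reference holder
    return MINI_V(n) + 1;        // reads through the conversion operator
}
```

The temporary wrapper only lives for the duration of the full expression, which is the reason storing a reference member is tolerable in this particular case.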
Control structures of the framework
The basic control structures which are familiar from C++ are made available for immediate use by the developers by means of macros, which expand into complex templated code.
They are meant to provide the same functionality as the standard C++ keyword they are emulating, and if the framework is compiled in DEBUG mode, most of them actually expand to the C++ control structure itself.
Decision making
When there is a need in the application to take a decision based on the value of a specific expression, the obfuscated framework offers the familiar
if
-
then
-
else
statement for the developers in the form of the
IF
-
ELSE
-
ENDIF
construct.
The IF statement
For checking the true-ness of an expression the framework offers the
IF
macro which has the following form:
IF (expression)
....statements
ELSE
....other statements
ENDIF
where the
ELSE
is not mandatory, but the
ENDIF
is, since it indicates the end of the
IF
block’s statements.
And here is an example for the usage of the
IF
macro.
IF( V(a) == N(9) )
    V(b) = a + N(5);
ELSE
    V(a) = N(9);
    V(b) = a + b;
ENDIF
Due to the way the
IF
macro is defined, it is not necessary to create a new scope between the
IF
and
ENDIF
; it is automatically defined and all variables declared in the statements between
IF
and
ENDIF
are destroyed when the block ends.
Since the evaluation of the
expression
is bound to the execution of a hidden (well, at least from the outer world) lambda, unfortunately it is not possible to declare variables in the
expression
so the following:
IF( int x = some_function() )
is not valid, and will yield a compiler error. This is partially intentional, since it gives that extra layer of obfuscation required to hide the operations done on a variable in a nameless lambda somewhere deep in the code.
In cases when debugging mode is active, the
IF
-
ELSE
-
ENDIF
macros are defined to expand to the following statements:
#define IF(x) if(x) {
#define ELSE } else {
#define ENDIF }
Implementation of the IF construct
The
IF
macro expands to the following:
#define IF(x) { \
    std::shared_ptr<obf::base_rvholder> __rvlocal; \
    obf::if_wrapper(( [&]()->bool{ return (x); })).set_then( [&]() {
the
ELSE
macro expands to:
#define ELSE return __crv;}).set_else( [&]() {
and the
ENDIF
will give:
#define ENDIF return __crv;}).run(); }
So to wrap it all up, the following code:
IF( n == 42) n = 43; ELSE n = 44; ENDIF
will expand to Listing 9.
{
    std::shared_ptr<obf::base_rvholder> __rvlocal;
    obf::if_wrapper( ([&]()->bool { return (n == 42); }) )
        .set_then( [&]() { n = 43; return __crv; })
        .set_else( [&]() { n = 44; return __crv; })
        .run();
}
Listing 9
Now let’s examine the
if_wrapper
class (Listing 10).
class if_wrapper final
{
public:
    template<class T>
    if_wrapper(T lambda) { condition.reset(new bool_functor<T>(lambda)); }
    void run()
    {
        if(condition->run()) { if(thens) { thens->run(); } }
        else { if(elses) { elses->run(); } }
    }
    ~if_wrapper() noexcept = default;
    template<class T>
    if_wrapper& set_then(T lambda)
    {
        thens.reset(new next_step_functor<T>(lambda));
        return *this;
    }
    template<class T>
    if_wrapper& set_else(T lambda)
    {
        elses.reset(new next_step_functor<T>(lambda));
        return *this;
    }
private:
    std::unique_ptr<bool_functor_base> condition;
    std::unique_ptr<next_step_functor_base> thens;
    std::unique_ptr<next_step_functor_base> elses;
};
Listing 10
It is very clear why we needed the lambda created by the
IF
macro
(([&]()->bool { return (n == 42); }))
: we needed to create an object of type
class bool_functor
from it, which will give us the true-ness of the if condition. The bool functor class looks like Listing 11, where the important part is the
bool run()
– which in fact runs the condition and returns its true-ness.
struct bool_functor_base
{
    virtual bool run() = 0;
};
template <class T>
struct bool_functor final : public bool_functor_base
{
    bool_functor(T r) : runner(r) {}
    virtual bool run() {return runner();}
private:
    T runner;
};
Listing 11
The two branches of the
if
are represented by the member variables
std::unique_ptr<next_step_functor_base> thens;
std::unique_ptr<next_step_functor_base> elses;
and they behave very similarly to the conditional.
The run() method of the if_wrapper class first checks the condition and then, depending on its result and on the presence of the then and else branches, executes the required operations.
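The overall shape of the construct (condition and branches stored as callables, then dispatched by run()) can be reproduced in a compact form. The sketch below is a hypothetical simplification that uses std::function instead of the framework’s functor classes and ignores the next_step return value; mini_if and pick are illustrative names.

```cpp
#include <functional>
#include <utility>

// Hypothetical simplification of if_wrapper.
class mini_if final {
public:
    explicit mini_if(std::function<bool()> c) : condition(std::move(c)) {}
    mini_if& set_then(std::function<void()> f) { thens = std::move(f); return *this; }
    mini_if& set_else(std::function<void()> f) { elses = std::move(f); return *this; }
    void run()
    {
        if (condition()) { if (thens) thens(); }   // then branch, if present
        else             { if (elses) elses(); }   // else branch, if present
    }
private:
    std::function<bool()> condition;
    std::function<void()> thens;
    std::function<void()> elses;
};

// Hand-written equivalent of: IF(n == 42) n = 43; ELSE n = 44; ENDIF
int pick(int n)
{
    mini_if([&]() { return n == 42; })
        .set_then([&]() { n = 43; })
        .set_else([&]() { n = 44; })
        .run();
    return n;
}
```

The chained calls mirror the way the IF, ELSE and ENDIF macros open and close the lambdas around the user’s statements.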
Support for looping
There are times when every application needs to iterate over a set of values, so I tried to re-implement the basic loop structures used in C++: the
for
loop, the
while
and the
do
-
while
have been reincarnated in the framework.
The FOR statement
The macro provided to imitate the
for
statement is:
FOR(initializer, condition, incrementer)
.... statements
ENDFOR
Please note that, since FOR is a macro, its three parts are separated by , (comma), not by the traditional ; used in standard C++ for loops. Do not forget to enclose your initializer, condition and incrementer in parentheses if they are expressions which contain a , (comma).
The
FOR
loops should be ended with an
ENDFOR
statement to signal the end of the structure. Here is a simple example for the
FOR
loop.
FOR(V(a) = N(0), V(a) < N(10), V(a) += 1)
    std::cout << V(a) << std::endl;
ENDFOR
The same restriction concerning the variable declaration in the
initializer
as in the case of the
IF
applies for the
FOR
macro too, so it is not valid to write:
FOR(int x=0, x<10, x++)
and the reasons are again the same as presented above.
In a debugging session, the
FOR
-
ENDFOR
macros expand to the following:
#define FOR(init,cond,inc) for(init;cond;inc) {
#define ENDFOR }
The WHILE loop
The macro provided as replacement for the
while
is:
WHILE(condition)
....statements
ENDWHILE
The
WHILE
loop has the same characteristics as the
IF
construct and behaves the same way as you would expect from a well-mannered
while
statement: it checks the condition at the top, and executes the statements repeatedly as long as the given condition is true. Here is an example for
WHILE
:
V(a) = 1;
WHILE( V(a) < N(10) )
    std::cout << "IN:" << a << std::endl;
    V(a) += N(1);
ENDWHILE
Unfortunately the
WHILE
loop also has the same restrictions as the
IF
: you cannot declare a variable in its condition.
If compiled in debugging mode, the
WHILE
evaluates to:
#define WHILE(x) while(x) {
#define ENDWHILE }
The REPEAT-AS_LONG_AS construct posing as do-while
Due to the complexity of the solution, the familiar
do
-
while
construct of the C++ language had to be renamed a bit, since the
WHILE
‘keyword’ was already taken for the benefit of the
while
loop, so I created the
REPEAT
-
AS_LONG_AS
keywords to achieve this goal.
This is the syntax of the
REPEAT
-
AS_LONG_AS
construct:
REPEAT
....statements
AS_LONG_AS( expression )
This will execute the statements at least once and then, depending on the value of the
expression
, either will continue the execution, or will stop and exit the loop. If the expression is
true
, it will continue the execution from the beginning of the loop; if the expression is
false
, execution will stop and the loop will be exited.
And here is an example:
REPEAT
    std::cout << a << std::endl;
    ++ V(a);
AS_LONG_AS( V(a) != N(12) )
When debugging, the
REPEAT
-
AS_LONG_AS
construct expands to the following:
#define REPEAT do {
#define AS_LONG_AS(x) } while (x);
Implementation of the looping constructs
The logic and design of the looping constructs are very similar to each other. They behave very similarly to
IF
and each of them uses the same building blocks. There are the wrapper classes (
for_wrapper
,
repeat_wrapper
,
while_wrapper
), each of them with their functors for verifying the condition, and the steps to be executed.
The implementation of the run() method in each wrapper class follows the logic of the keyword it tries to emulate, with the exception that the commands are wrapped into a
try
-
catch
to enable
BREAK
and
CONTINUE
to function properly. Let’s see for example the
run()
of the
for
wrapper:
void run()
{
    for( initializer->run(); condition->run(); increment->run())
    {
        try
        {
            next_step c = body->run();
        }
        catch(next_step& c)
        {
            if(c == next_step::ns_break) break;
            if(c == next_step::ns_continue) continue;
        }
    }
}
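To see the exception-driven BREAK and CONTINUE at work outside the framework, here is a minimal sketch under simplifying assumptions: std::function replaces the functor classes, the enumeration is redeclared locally, and run_for and visited are hypothetical names.

```cpp
#include <functional>
#include <vector>

enum class next_step { ns_break, ns_continue, ns_done };

// Hypothetical simplification of for_wrapper::run().
void run_for(const std::function<void()>& initializer,
             const std::function<bool()>& condition,
             const std::function<void()>& increment,
             const std::function<void()>& body)
{
    for (initializer(); condition(); increment()) {
        try {
            body();
        } catch (next_step& c) {
            if (c == next_step::ns_break) break;        // leave the loop
            if (c == next_step::ns_continue) continue;  // go to the increment
        }
    }
}

std::vector<int> visited()
{
    std::vector<int> seen;
    int a = 0;
    run_for([&]() { a = 0; },
            [&]() { return a < 10; },
            [&]() { ++a; },
            [&]() {
                if (a == 2) throw next_step::ns_continue;  // role of CONTINUE
                if (a == 4) throw next_step::ns_break;     // role of BREAK
                seen.push_back(a);
            });
    return seen;
}
```

In this sketch the body records 0, 1 and 3: the iteration for 2 is skipped and the loop ends when the counter reaches 4.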
Altering the control flow of the application
Sometimes there is a need to alter the execution flow of a loop. C++ supports this operation by providing the
continue
and
break
statements. The framework offers the
CONTINUE
and
BREAK
macros to achieve this goal.
The CONTINUE statement
The
CONTINUE
statement will skip all statements that follow it in the body of the loop, thus altering the flow of the application.
Here is an example for the
CONTINUE
used in a
FOR
loop:
FOR(a = 0, a < 5, a++)
    std::cout << "counter before=" << a << std::endl;
    IF(a == 2)
        CONTINUE
    ENDIF
    std::cout << "counter after=" << a << std::endl;
ENDFOR
and the equivalent
WHILE
loop:
a = 0;
WHILE(a < 5)
    std::cout << "counter before=" << a << std::endl;
    IF(a == 2)
        a++;
        CONTINUE
    ENDIF
    std::cout << "counter after=" << a << std::endl;
    a++;
ENDWHILE
Neither of these should print out the
counter after=2
text.
The BREAK statement
The
BREAK
statement terminates the loop statement it resides in and transfers execution to the statement immediately following the loop.
Here is an example for the
BREAK
statement used in a
FOR
loop:
FOR(a = 0, a < 10, a++)
    std::cout << "counter=" << a << std::endl;
    IF(a == 1)
        BREAK
    ENDIF
ENDFOR
This loop will print
counter=0
and
counter=1
then it will leave the body of the loop, continuing the execution after the
ENDFOR
.
The RETURN statement
As expected, the
RETURN
statement ends the execution of the current function and returns the specified value to the caller. Here is an example of returning 42 from a function:
int some_fun() { OBF_BEGIN RETURN(42) OBF_END }
With the introduction of
RETURN
, an important issue arose: the obfuscation framework does not support the use of
void
functions, so the following code will not compile:
void void_test(int& a) { OBF_BEGIN IF(V(a) == 42) V(a) = 43; ENDIF OBF_END }
This is a seemingly annoying limitation, but it can easily be worked around by changing the return type of the function to any non-void type. The reason is that the
RETURN
macro and the underlying C++ constructs should handle a wide variety of returnable types in a manner which can be handled easily by the programmer without causing confusion.
Implementation of CONTINUE, BREAK and RETURN
These keywords give the following when not compiled in debug mode:
#define BREAK __crv = obf::next_step::ns_break; throw __crv;
#define CONTINUE __crv = obf::next_step::ns_continue; throw __crv;
#define RETURN(x) __rvlocal.reset(new obf::rvholder \
    <std::remove_reference<decltype(x)>::type>(x,x)); throw __rvlocal;
BREAK
and
CONTINUE
offer no surprises in the implementation: as anticipated in the section on the looping constructs, they throw a specific value, which is then caught inside the loop implementation and handled accordingly.
However, RETURN is a different kind of beast.
It initializes __rvlocal (the local return value) to the returned value and then throws it for the catch found in the OBF_END macro, which in turn handles it correctly.
As you can see, the x macro parameter appears three times in RETURN: once inside decltype, which does not evaluate its operand, and twice as a constructor argument, where it is evaluated. To avoid unwanted behaviour in your application, do not pass expressions with side effects, such as RETURN(x++);, which will increment your variable more than once and can lead to undefined behaviour.
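A quick way to see how often a macro argument is really evaluated is to count calls. The sketch below uses a hypothetical MAKE_PAIR_TWICE macro (not part of the framework) that names its argument inside decltype and twice more in a constructor call, mirroring the shape of RETURN. The decltype operand is unevaluated, so only the two plain occurrences run the side effect.

```cpp
#include <cassert>
#include <utility>

// Hypothetical macro for illustration only: `x` appears inside decltype
// (unevaluated) and twice as a constructor argument (evaluated).
#define MAKE_PAIR_TWICE(x) \
    std::pair<decltype(x), decltype(x)>((x), (x))

// Returns how many times the macro argument was actually evaluated.
inline int count_evaluations()
{
    int calls = 0;
    auto next = [&calls]() { return ++calls; };  // side effect we can count
    auto p = MAKE_PAIR_TWICE(next());
    (void)p;  // the pair itself is not interesting here
    return calls;
}
```

Using a function call as the argument keeps the sketch well defined; with a bare x++ the two modifications would not be ordered with respect to each other in older language standards.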
The rvholder class has the body shown in Listing 12.
struct base_rvholder
{
    virtual ~base_rvholder() = default;

    template<class T>
    operator T () const
    {
        return *reinterpret_cast<const T*>(get());
    }

    template<class T>
    bool operator == (const T& o) const
    {
        return o == operator T ();
    }

    template<class T>
    bool equals(const T& o) const
    {
        return o == *reinterpret_cast<const T*>(get());
    }

    virtual const void* get() const = 0;
};

template<class T>
class rvholder : public base_rvholder
{
public:
    rvholder(T t, T c) : base_rvholder(), v(t), check(c) {}
    ~rvholder() = default;

    virtual const void* get() const override
    {
        return reinterpret_cast<const void*>(&v);
    }

private:
    T v;
    T check;
};
Listing 12
As you can see, there is a redundant equals method in the base class. During development of the framework, the Visual Studio compiler constantly crashed with an internal error in the implementation of the CASE construct, and it always reported the error in operator == of the base class. To make it work, I added the extra equals member.
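To make the RETURN mechanism more tangible, here is a reduced sketch of a type-erased holder being thrown and caught inside one function. The base_holder and holder types are hypothetical simplifications of the classes in Listing 12, and the try/catch stands in for what OBF_BEGIN and OBF_END are assumed to provide; the real macros are not reproduced here.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, reduced version of the holder hierarchy from Listing 12.
struct base_holder {
    virtual ~base_holder() = default;
    virtual const void* get() const = 0;

    // Reads the stored value back; the caller must name the stored type.
    template <class T>
    T as() const { return *static_cast<const T*>(get()); }
};

template <class T>
struct holder : base_holder {
    explicit holder(T v) : value(v) {}
    const void* get() const override { return &value; }
    T value;
};

// A function whose "return" travels through an exception.
inline int some_fun()
{
    std::shared_ptr<base_holder> rv;        // plays the role of __rvlocal
    try {
        rv.reset(new holder<int>(42));      // what a RETURN(42) would store
        throw rv;                           // jump to the end of the function
    } catch (std::shared_ptr<base_holder>& r) {
        return r->as<int>();                // unwrap for the real return
    }
    return 0;                               // not reached
}
```

Because the catch site must convert the erased value back to the function's return type, a function returning void has nothing to convert to, which is consistent with the restriction described for RETURN.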
The CASE statement
When programming in C++, the switch-case statement comes in handy when there is a need to avoid long chains of if statements. The obfuscation framework provides a similar construct, although not exactly a functional and syntactical copy of the original switch-case construct.
Here is the CASE statement:
CASE (<variable>)
    WHEN(<value>) [OR WHEN(<other_value>)] DO
        ....statements
        ....[BREAK]
    DONE
    [DEFAULT
        ....statements
    DONE]
ENDCASE
The functionality is very similar to the well-known switch-case construct. The main differences are:
- It is possible to use non-numeric, non-constant values (variables and strings) for the WHEN, because the whole CASE statement is wrapped in a templated, lambda-based construct that is well hidden from the outside world. Be careful with this extra feature when using the debugging mode of the library, because there the CASE and WHEN macros expand to the standard switch and case keywords, which only accept integral constant expressions as labels.
- It is possible to have multiple conditions for a WHEN label, joined together with OR.
The fall-through behaviour of the switch construct, familiar to C++ programmers, was kept, so you need to put in a BREAK statement if you want execution to stop after entering a branch.
Listing 13 is an example of the CASE statement.
std::string something = "D";
std::string something_else = "D";
CASE (something)
    WHEN("A") OR WHEN("B") DO
        std::cout << "Hurra, something is "
                  << something << std::endl;
        BREAK;
    DONE
    WHEN("C") DO
        std::cout << "Too bad, something is "
                  << something << std::endl;
        BREAK;
    DONE
    WHEN(something_else) DO
        std::cout << "Interesting, something is "
                  << something_else << std::endl;
        BREAK;
    DONE
    DEFAULT
        std::cout << "something is neither A, B or C,"
                     " but:" << something << std::endl;
    DONE
ENDCASE
Listing 13
When the framework is used in debugging mode, the macros expand to the following statements:
#define CASE(a) switch (a) {
#define ENDCASE }
#define WHEN(c) case c:
#define DO {
#define DONE }
#define OR
#define DEFAULT default:
Implementation of the CASE construct
Certainly, the most complex of all constructs is the CASE one. Just the number of macros supporting it is huge:
#define CASE(a) try { \
    std::shared_ptr<obf::base_rvholder> __rvlocal;\
    auto __avholder = a; \
    obf::case_wrapper<std::remove_reference \
    <decltype(a)>::type>(a).
#define ENDCASE run(); } \
    catch(obf::next_step& cv) {}
#define WHEN(c)\
    add_entry(obf::branch<std::remove_reference\
    <decltype(__avholder)>::type> \
    ( [&,__avholder]() -> \
    std::remove_reference<decltype(__avholder)>\
    ::type { \
    std::remove_reference<decltype(__avholder)>\
    ::type __c = (c); return __c;} )).
#define DO add_entry( obf::body([&](){
#define DONE return \
    obf::next_step::ns_continue;})).
#define OR join().
#define DEFAULT add_default(obf::body([&](){
Let’s dive into it.
The case_wrapper name should already be familiar from the various wrappers, but for CASE the real workhorse is the case_wrapper_base class. The case_wrapper class is necessary in order to make CASE selection on const or non-const objects possible, so the case_wrapper classes just derive from case_wrapper_base and specialize on the constness of the CASE expression. Please note that the CASE macro also evaluates its a parameter more than once, so writing CASE(x++) will increment x more than once, with surprising results.
The case_wrapper_base class looks like Listing 14.
template <class CT>
class case_wrapper_base
{
public:
    explicit case_wrapper_base(const CT& v)
        : check(v), default_step(nullptr) {}

    case_wrapper_base& add_entry(const case_instruction& lambda_holder)
    {
        steps.push_back(&lambda_holder);
        return *this;
    }

    case_wrapper_base& add_default(const case_instruction& lambda_holder)
    {
        default_step = &lambda_holder;
        return *this;
    }

    case_wrapper_base& join()
    {
        return *this;
    }

    void run() const; // body extracted from here,
                      // see later in the article for the
                      // description of it

private:
    std::vector<const case_instruction*> steps;
    const CT check;
    const case_instruction* default_step;
};
Listing 14
The const CT check; member is the expression that is being checked against the various case branches. Please note the add_entry and add_default methods, together with the join() method, which all return the object itself and so allow chaining of method calls on the same object. The std::vector<const case_instruction*> steps; member is a cumulative container for all the branch condition expressions and bodies (the code executed in a branch). This leads to more complex code at a later stage; however, it was necessary to keep the two in the same container in order to stay as close as possible to the way the C++ case works.
The inner mechanism of CASE depends on the following classes:
- The obf::case_instruction class, which acts as a base class for:
  - the obf::branch and
  - the obf::body classes.
The obf::branch class is the class which gets instantiated by the WHEN macro in a call to the add_entry method of the case_wrapper object created by CASE. Its role is to act as the condition chooser, and it looks like Listing 15.
template<class CT>
class branch final : public case_instruction
{
public:
    template<class T>
    branch(T lambda)
    {
        condition.reset(new any_functor<T>(lambda));
    }

    bool equals(const base_rvholder& rv, CT lv) const
    {
        return rv.equals(lv);
    }

    virtual next_step execute(const base_rvholder& against) const override
    {
        CT retv;
        condition->run(const_cast<void*>
            (reinterpret_cast<const void*>(&retv)));
        return equals(against, retv) ? next_step::ns_done
                                     : next_step::ns_continue;
    }

private:
    std::unique_ptr<any_functor_base> condition;
};
Listing 15
The WHEN macro has a somewhat confusing lambda declaration, which captures the local __avholder by value. This is again because different compilers would not compile the same source code in the same way: some of them bluntly declined to compile what the others had already digested, and that is why this ugly solution came into existence.
The code that is executed upon entering a branch (including the default branch) is created by the DO and the DEFAULT macros. They both create an instance of the obf::body class: DO adds it to the steps of the case wrapper class, and DEFAULT calls the add_default member in order to specify a default branch. The obf::body class is much simpler, just a few lines (see Listing 16).
class body final : public case_instruction
{
public:
    template<class T>
    body(T lambda)
    {
        instructions.reset(new next_step_functor<T>(lambda));
    }

    virtual next_step execute(const base_rvholder&) const override
    {
        return instructions->run();
    }

private:
    std::unique_ptr<next_step_functor_base> instructions;
};
Listing 16
The most interesting (and longest) part of the case implementation is the run() method, presented in Listing 17 in a somewhat stripped manner (I have removed all the security checks in order to have presentable code, considering its length).
void run() const
{
    auto it = steps.begin();
    while(it != steps.end())
    {
        next_step enter = (*it)->execute(rvholder<CT>(check,check));
        if(enter == next_step::ns_continue)
        {
            ++it;
        }
        else
        {
            // test for the end first, so that the end iterator
            // is never dereferenced
            while(it != steps.end()
                  && !dynamic_cast<const body*>(*it))
            {
                ++it;
            }
            // found the first body.
            while(it != steps.end())
            {
                if(dynamic_cast<const body*>(*it))
                {
                    (*it)->execute(rvholder<CT>(check,check));
                }
                ++it;
            }
        }
    }
    if(default_step)
    {
        default_step->execute(rvholder<CT>(check,check));
    }
}
Listing 17
As a first step, the code looks for the first branch which satisfies the condition (if (*it)->execute(rvholder<CT>(check,check)); returns next_step::ns_done, it has found a branch satisfying the check). In this case it skips the remaining conditions of this branch and starts executing the code of all the obf::body objects that follow. If a BREAK statement is issued while executing the bodies, the code throws, and the catch in ENDCASE (catch(obf::next_step& cv)) swallows the exception and returns execution to the normal flow.
As a last resort, if we have a default_step and we are still in the body of run() (no-one issued a BREAK), it is executed as well.
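The dispatch rules described above (find the first matching condition, fall through all following bodies, run the default unless a BREAK was thrown) can be modelled in a few lines of ordinary C++. This is a simplified sketch under assumptions, not the framework's implementation: mini_case, step, break_signal and the when/then/otherwise method names are all invented for illustration, and strings are the only supported value type.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct break_signal {};  // thrown by a body to emulate BREAK

// One entry of the shared container: either a condition or a body.
struct step {
    bool is_body;
    std::function<bool(const std::string&)> matches;  // used for conditions
    std::function<void()> action;                     // used for bodies
};

class mini_case {
public:
    explicit mini_case(std::string v) : check(std::move(v)) {}

    // Each method returns *this, so calls can be chained like the macros do.
    mini_case& when(std::string value) {
        steps.push_back(step{false,
            [value](const std::string& c) { return c == value; }, nullptr});
        return *this;
    }
    mini_case& then(std::function<void()> body) {
        steps.push_back(step{true, nullptr, std::move(body)});
        return *this;
    }
    mini_case& otherwise(std::function<void()> body) {
        default_step = std::move(body);
        return *this;
    }

    void run() const {
        try {
            auto it = steps.begin();
            // Skip forward to the first condition that matches.
            while (it != steps.end() && (it->is_body || !it->matches(check)))
                ++it;
            // Fall through: execute every body from there on.
            for (; it != steps.end(); ++it)
                if (it->is_body) it->action();
            // No BREAK so far, so the default body runs as well.
            if (default_step) default_step();
        } catch (const break_signal&) {
            // BREAK: abandon the remaining bodies and the default.
        }
    }

private:
    std::vector<step> steps;
    std::string check;
    std::function<void()> default_step;
};
```

Keeping conditions and bodies in one ordered container is what makes the fall-through possible: once a match is found, the remaining conditions are simply stepped over while every later body still runs.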
And with this we have presented the entire framework, together with implementation details, and now we are ready to catch up with our initial goal.
The naive licensing algorithm revisited
Now that we are aware of a library that offers code obfuscation without too many headaches from our side (at least, this was the intention of the author) let’s re-consider the implementation of the naive licensing algorithm using these new terms (see Listing 18).
bool check_license1(const char* user, const char* users_license)
{
    OBF_BEGIN
    std::string license;
    size_t ll = strlen(users_license);
    size_t l = strlen(user), lic_ctr = N(0);
    size_t add = N(0), i = N(0);

    FOR (V(i) = N(0), V(i) < V(ll), V(i)++)
        IF ( V(users_license[i]) != N(45) )
            license += users_license[i];
        ENDIF
    ENDFOR

    WHILE (V(lic_ctr) < license.length() )
        size_t i = lic_ctr;
        V(i) %= l;
        int current = 0;
        WHILE(V(i) < V(l) )
            V(current) += user[V(i)++];
        ENDWHILE
        V(current) += V(add);
        ++V(add);
        IF ( (license [lic_ctr] != letters[current % sizeof letters]) )
            RETURN(false);
        ENDIF
        lic_ctr++;
    ENDWHILE

    RETURN (true);
    OBF_END
}
Listing 18
Indeed, it looks a little bit more ‘obfuscated’ than the original source, but after compilation it adds a great layer of extra code around the standard logic, and the generated binary is much more cumbersome to understand than the one ‘before’ the obfuscation. And due to the sheer size of the generated assembly code, we simply omit publishing it here.
Disadvantages of the framework
Those who dislike the usage of CAPITAL letters in code may find the framework to be annoying. As presented in [ Wakely14 ] this almost feels like the code is shouting at you. However, for this particular use case, I intentionally made it like this because of the need to have familiar words that a developer can instantly connect with (because the lower case words are already keywords), and also to subscribe to the C++ rule that macros should be upper case.
This brings us back to the swampy area of C++ and macros. There are several voices whispering loudly that macros have no place in C++ code, and there are several voices echoing back that macros, if used wisely, can help C++ code just as they help good old-style C. I personally have nothing against the wise use of macros; indeed, they were very helpful while developing this framework.
Last but not least, the numeric value wrappers do not work with floating point numbers. This is due to the fact that extensive binary operations are used on the number to obfuscate its value and this would be impossible to accomplish with floating point values.
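To illustrate why bitwise obfuscation ties the wrappers to integral types, here is a minimal sketch of a value stored under an XOR mask. The masked_int class and its key parameter are hypothetical and far simpler than whatever the framework actually does; the point is only that operators such as ^ are defined for integers and not for float or double.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper: keeps value ^ key in memory instead of the value.
template <class T>
class masked_int {
    static_assert(std::is_integral<T>::value,
                  "bitwise masking needs an integral type");
public:
    masked_int(T value, T key)
        : stored(static_cast<T>(value ^ key)), key(key) {}

    // XOR with the same key restores the original value.
    T get() const { return static_cast<T>(stored ^ key); }

    // What an observer inspecting memory would see.
    T raw() const { return stored; }

private:
    T stored;
    T key;
};
```

Instantiating masked_int<double> fails at compile time through the static_assert, which is one plausible way a library could surface this limitation early.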
Some requirements
The code is written with ‘older’ compilers in mind, so not all the latest and greatest features of C++14 and C++17 are used. Clang version 3.4.1 happily compiles the source code, as does g++ 4.8.2. Visual Studio 2015 also compiles the code.
Unit testing is done using the Boost Unit test framework. The build system for the unit tests is CMake and there is support for code coverage (the last two were tested only under Linux).
License and getting the framework
The library is a header-only library, released under the MIT license. You can get it from https://github.com/fritzone/obfy
Conclusion
History has shown us that if a piece of software is crackable, it will be cracked. How soon that happens depends only on the dedication, time and effort invested by the software cracker. There is no Swiss army knife when it comes to protecting your software against malicious interference, because from the moment it left your build server and was downloaded, the software was out of your hands and entered an uncontrollable environment. The only sensible thing you can do to protect your intellectual property is to make it as hard to crack as possible. This little framework provides a few ways of achieving this goal, and by making it open source, freely available and modifiable by the developer community, we can only hope this will give it an advantage by allowing everyone to tailor it to suit their needs best.
Appendix: the license generating algorithm
As promised, Listing 19 is the naive license generating algorithm. Any further improvements to it are more than welcome.
static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

std::string generate_license(const char* user)
{
    // an empty name would make the modulo below divide by zero
    if(!user || !*user) return "";

    // the license will contain only the characters above
    // 16 chars + 0
    char result[17] = { 0 };
    size_t l = strlen(user), lic_ctr = 0;
    int add = 0;
    while (lic_ctr < 16)
    {
        size_t i = lic_ctr;
        i %= l;
        int current = 0;
        while (i < l)
        {
            current += user[i];
            i++;
        }
        current += add;
        add++;
        // note: sizeof letters is 27, it counts the terminating 0
        result[lic_ctr] = letters[current % sizeof letters];
        lic_ctr++;
    }
    return std::string(result);
}
Listing 19
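As a usage illustration, the generator can be paired with a plain, non-obfuscated checker. The sketch below repeats the generator in compact form (with a guard against an empty name, which is an addition of this sketch) and adds a hypothetical is_valid_license helper that strips the '-' separators, as the obfuscated check does with character code 45, and compares against a freshly generated license. Note that sizeof letters counts the terminating zero, so a generated license can in principle be cut short by an embedded zero character; comparing against the regenerated string handles that case consistently.

```cpp
#include <cassert>
#include <cstring>
#include <string>

static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Compact restatement of the generator; the empty-name guard is an
// assumption added for this sketch.
std::string generate_license(const char* user)
{
    if (!user || !*user) return "";
    char result[17] = { 0 };                 // 16 chars + terminating 0
    const std::size_t l = std::strlen(user);
    int add = 0;
    for (std::size_t lic_ctr = 0; lic_ctr < 16; ++lic_ctr) {
        int current = 0;
        for (std::size_t i = lic_ctr % l; i < l; ++i) {
            current += user[i];              // sum the tail of the name
        }
        current += add++;
        result[lic_ctr] = letters[current % sizeof letters];
    }
    return std::string(result);
}

// Hypothetical helper: removes '-' separators, then compares with a
// freshly generated license for the same user.
bool is_valid_license(const char* user, const std::string& users_license)
{
    std::string stripped;
    for (char c : users_license) {
        if (c != '-') stripped += c;
    }
    return !stripped.empty() && stripped == generate_license(user);
}
```

A checker written this way is exactly the kind of code a cracker can locate and patch easily, which is the motivation for hiding the comparison inside the obfuscated constructs instead.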
References
[Andrivet] Random Generator by Sebastien Andrivet https://github.com/andrivet/ADVobfuscator
[Stackoverflow] http://stackoverflow.com/questions/12387239/reference-member-variables-as-class-members
[Wakely14] ‘Stop the Constant Shouting’ Overload 121 June 2014, Jonathan Wakely