Software Reverse Engineering (SRE) is like solving a puzzle where the rules keep shifting. Just when you think you’ve figured it out, obfuscation techniques throw you a curveball — and as tools get better at analyzing code, the obfuscation evolves to stay one step ahead.
One such obfuscation technique is Mixed Boolean-Arithmetic (MBA) obfuscation. You will often find it in things like:
- DRM (digital rights management),
- Anti-cheat systems, and
- Malware trying to stay hidden.
The idea? Make the code look so tangled that reversing it becomes a nightmare.
But what exactly is MBA obfuscation? And why is it such a pain to reverse? In this post, we will
- Break down what MBA expressions are (with examples),
- Explain why they are hard to simplify,
- And tools that try to make sense of them.
🧮 What is an MBA Expression?
MBA Expression mixes two world of mathematics, arithmetic operations (+, -, *, /) and boolean operations (&, |, ^, ~).
📑 Types of MBA Expression
1. Linear MBA Expression:
A Linear MBA expression is defined as a sum of terms, where each term () is composed of
- a constant (i.e. an n-bit integer)
- multiplied by a bitwise boolean expression ,
- where are n-bit variables.
Example:
2. Polynomial MBA Expression:
A Polynomial MBA expression are defined as a sum of terms, where each term () is composed of
- a constant (i.e. an n-bit integer)
- multiplied by a product of serveral bitwise boolean expressions ,
- where are n-bit variables.
Example:
🧪 MBA obfuscation
Mixed Boolean-Arithmetic (MBA) obfuscation replaces simple expression with mathematically equivalent, but more complex expressions that combine arithmetic and boolean operators while preserving the original semantics of the code
In simple terms, it hides easy stuff in messy math 🤯
💡 Example: A simple expression z = x + y
(32 bit variable) rewritten using linear MBA obfuscation
z = (y ^ x ^ x) + 3161188250 * (~x & (x | x)) + ((x | y) & x & x);
To verify the above expression equivalence with an SMT solver like Z3
from z3 import *
x = BitVec('x', 32)
y = BitVec('y', 32)
E_obf = (y ^ x ^ x) + 3161188250 * (~x & (x | x)) + ((x | y) & x & x)
E_simp = x + y
s = Solver()
s.add(E_obf != E_simp)
if s.check() == unsat:
print("Proved: E_obf == E_simp for all x, y")
else:
print(s.model())
✅ Output:
Proved: E_obf == E_simp for all x, y
🧩 Why is MBA Obfuscation So Hard to Reverse or Simplify?
Untangling MBA logic isn’t just tedious—it’s like solving a riddle written in three different languages at once. Here’s why it’s such a pain:
1. There’s No One True Simplifier
Some expressions are logically equal—but look nothing alike. You might simplify something using Z3, and it spits out one version. Try a different tool, and you’ll get something totally different. No guarantees you’ve landed on the simplest form. Sometimes, there isn’t even one.
2. Too Many Ways to Obfuscate
Even simple expression like x + y can get tangled up fast. Once you dig in, you realize there are endless ways to rewrite even the simple expressions. Here are a few technique:
-
Add useless noise: Things like x ^ x, x - x, or x & ~x do nothing at runtime but clutter up the code, tripping up analysis tools.
-
Use Opaque predicates: These are conditions that look legit but always return true or false, just dead code to confuse readers.
-
Shuffle terms: Thanks to math properties like commutativity and associativity, expressions get rearranged constantly, making pattern matching a headache.
-
Swap for worse: Replace a simple logic with a bigger and complex one that means the same thing.
-
Bitwise Chaos: Shifts, masks, XORs, and ANDs layered just right, they turn small expressions into unreadable puzzles.
3. Arithmetic ≠ Boolean
These two don’t always play nice. In arithmetic operation, you can distribute multiplication over addition (ex: a * (b + c) = a * b + a * c). But in Boolean, Not the same story. Try distributing an AND over an XOR and you’ll end up in a logic mess. Once they are mixed, simplification rules start breaking down.
4. Nonlinear Logic Is a Trap
Linear terms (like x + y) are annoying. Nonlinear ones (like , ) are downright brutal. Add Boolean operation on top and you are staring at a math puzzle that doesn’t want to be solved. These kinds of expressions are where simplifiers struggle and timeout.
5. Intent Gets Buried
Let’s say you simpified the expression. Great. You now know what it does. But good luck figuring out why it’s doing it. MBA doesn’t just hide logic — it buries meaning. Every transformation adds distance from the original purpose.
🛠️ Tools & Techniques for MBA De-obfuscation
Despite the challenges, several approaches have been developed to tackle MBA obfuscation:
SMT Solvers: Tools like Z3 and CVC4/5 can verify the equivalence of expressions by encoding them into logical formulas.
Symbolic Execution: Frameworks such as Triton and angr execute code with symbolic values, allowing for the tracking and simplification of expressions.
Program Synthesis: Techniques like those used in msynth and QSynth attempt to generate simpler programs that are functionally equivalent to obfuscated ones.
Pattern-Based Simplification: This involves identifying and applying known simplification patterns to reduce complexity.
Machine Learning: Models are trained on pairs of obfuscated and simplified expressions to learn deobfuscation patterns.
Data Flow Analysis: Tracking how data moves through a program can help identify and simplify obfuscated expressions.
🏁 CTF: Obfuscation Overload - The MBA Hash Challenge
Challenge: Reverse the MBA-obfuscated hash function to find a valid flag.
Goal: Understand the complexity of Mixed Boolean-Arithmetic (MBA) obfuscation—not to reveal the actual flag. If you find this CTF challenge difficult, consider real-world binaries where MBA obfuscation is applied across all functions—those are far more challenging to analyze.
Source Code
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#define FLAG_LEN 16
uint32_t obfuscated_mba_hash(const char *s) {
uint32_t h = 0xDEADBEEF;
for (int i = 0; i < FLAG_LEN; ++i) {
uint8_t c = s[i];
uint32_t t0 = 3875104330 * c + 2547036704 * (c | c) +
4294967206 * ~c + 2239147560 * (c ^ c) + 2167793469 * (c &
c);
uint32_t t1 = (308858953 * (c ^ c) + 1563666166 * c +
1379745440 * (c | c) + 1351555691 * (c & c)) * (3493294226 *
(c ^ c) + 819311776 * c + 1514020646 * (c | c) + 1640531535
* ~c + 3602166409 * (c & c));
uint32_t x = 3266272309 * ~t1 + 2311799488 * (t1 & c) + 1549144248
* t0 + 2858200778 * (t1 ^ t0) + 3435014962 * (t1 ^
t1) + 3835126537 * t1 + 4157021067 * (t0 | t1) +
1085716777 * (t1 & t0) + 1473626007 * (c | t1) +
1031582540 * (t0 | t0) + 3745048122 * (t0 & t0) +
175361011 * (t0 & t1) + 4272557055 * (t1 | t0) +
3849610050 * (t1 | c) + 592236179 * c + 433684625 * ~t0
+ 193461342 * (c | c) + 3076011376 * (c & c) + 595010362 *
~c + 3011436569 * (c & t1);
uint32_t v = (x + (c << (i % 5)));
h ^= ((v + (h << 5)) ^ (h >> 3)) ^ 0x1337BEEF;
}
return h;
}
int check_hash(uint32_t h) {
uint32_t x = 4091969007 * (~h & ~h) + 2778434176 * (h & h |
h) + 2831939379 * (h & (h | h)) + 971271889 * ((h ^ h) & h &
h) + 547429691 * ~(h | h) + 2338545138 * ((h ^ h) & ~h) +
3493026804 * (h & h & h & h) + 3543606811 * (h ^ h) +
898826796 * (~h | h) + 4273979207 * h + 2758797169 * ((h ^
h) & (h ^ h)) + 1806369029 * (h & h & h) + 4242380752 * (h |
h | h) + 2393138535 * (h | h | h ^ h) + 63644436 * (h & h |
~h) + 4104548139 * ((h | h) ^ (h | h));
uint32_t t = 578424293 * ~(h ^ h) + 2313091916 * (x & h & x) +
708776350 * (x ^ h ^ x & x) + 4178562083 * (~x |
~x) + 3091343853 * x + 884618817 * (x ^ h ^ x &
h) + 632913412 * ((h | x) ^ h ^ h) + 3838769443 * ~(h |
h) + 3646808033 * (x & h & h) + 2177279251 * (x & h &
(x ^ h)) + 148839815 * ((x | x) ^ h ^ h) + 90262380
* h + 2122257031 * (h ^ h ^ x ^ h) + 3355362634 * (h ^
x ^ (x | h)) + 3694908750 * (h ^ x & x) +
458366090 * ~(h & x) + 3830779979 * (h ^ h ^ ~x) +
1729427681 * (h & h ^ h ^ x);
return ((~t & (t - 1)) >> 31) & 1;
}
void secret_function(const char *input) {
if (strlen(input) != FLAG_LEN) {
printf("❌ Invalid flag length\n");
return;
}
uint32_t h = obfuscated_mba_hash(input);
if (check_hash(h)) {
printf("✅ Solved! CTF{%s}\n", input);
} else {
printf("❌ Wrong flag!\n");
}
}
int main() {
char buf[128];
printf("Enter the flag: ");
scanf("%127s", buf);
secret_function(buf);
return 0;
}
Compile
gcc -g mba_ctf.c -o mba_ctf
Flag Submission
If you are able to find the flag, please share the flag, hash, and your logic to generate the hash in the blog comments. The actual flag will be revealed later
🔚 Final Thoughts
MBA obfuscation is like digital camouflage in the world of software reverse engineering. It doesn’t stop the program from running — it stops you from understanding what it’s really doing. By blending arithmetic with Boolean logic, it creates a tangle that’s deliberately hard to unwind — a quiet but effective defense against analysis.
Getting past it takes patience. It’s not a clean puzzle with a single solution. It’s more like peeling an onion — layer after layer, often messy, sometimes painful. And yet … that’s the work, isn’t it?
You don’t have to break every piece of obfuscation. You just need to go deep enough to find what matters.
In the end, it’s part automation, part instinct — and a lot of persistence.
🤝 Thanks
You made it to the end. Thanks for reading — and good luck finding the flag 🏁. Looking forward to your submissions!