Bit-twiddling vs. Logic in Modern C++
Overview
Why if (a & b) is not a free lunch
The folklore
“Branching is slow, so replace && with & (or || with |) and the optimizer will thank you.”
That advice shows up in code reviews, on Reddit threads, and even in seasoned code bases. The intuition is simple:
- && may short-circuit, so the compiler must generate a conditional branch.
- & always evaluates both operands, so the resulting instruction stream can be branch-free.
Therefore, the bitwise form “must” be faster—right?
We are going to stress-test that claim from four angles:
- Language rules (what the Standard guarantees).
- What compilers actually emit in 2025.
- Micro-architecture (branch prediction, cache, ILP).
- Human factors—maintainability, bugs, and intent.
The result is a nuanced picture: sometimes & wins, often it makes no difference, and in many real programs it is a liability.
Ground truth: what the Standard says [1]
- Built-in && must evaluate left → right and must not evaluate the right operand when the left is false; built-in || likewise skips the right operand when the left is true. This is sequenced, not negotiable. [2]
- Overloaded operator&& / operator|| are just function calls: no short-circuiting.
- Bitwise & / | always evaluate both operands (unsequenced relative to each other) and produce an integral value.
Key takeaway: Behavioural semantics are fixed. The optimizer can re-arrange side-effect-free instructions, but it may not “speculatively” evaluate an operand if doing so would create an observable difference in the abstract machine.
Cppreference summarises the rule crisply: “Built-in && and || perform short-circuit evaluation; bitwise logic operators do not.” [3]
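The difference is easy to observe directly. The sketch below uses a hypothetical probe function, rhs(), whose only job is a visible side effect (a global call counter), and reports how often it runs under each operator; all names are illustrative, not from any library.

```cpp
// Hypothetical probe: counts how often the right-hand operand is evaluated.
static int g_rhs_calls = 0;

bool rhs() {
    ++g_rhs_calls;  // observable side effect
    return true;
}

// Built-in &&: rhs() is skipped whenever lhs is false.
int calls_with_logical_and(bool lhs) {
    g_rhs_calls = 0;
    [[maybe_unused]] bool r = lhs && rhs();
    return g_rhs_calls;
}

// Bitwise &: both operands are always evaluated; the bools promote to int
// and the int result converts back to bool on assignment.
int calls_with_bitwise_and(bool lhs) {
    g_rhs_calls = 0;
    [[maybe_unused]] bool r = lhs & rhs();
    return g_rhs_calls;
}
```

With a false left operand, calls_with_logical_and returns 0 while calls_with_bitwise_and returns 1: the side effect happens only with &.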
“But my disassembler shows identical code!”
That is not a myth. For side-effect-free operands the optimiser is allowed to collapse:
```cpp
bool fast = (x & y);  // bitwise (x and y are bool)
bool safe = (x && y); // logical
```
into the same branch-free sequence because it can prove that:
- x and y are bool scalars already materialised in registers,
- evaluating y unconditionally is free and cannot fault,
- the observable result (fast / safe) is identical.
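To check this on your own toolchain, paste a pair of trivial functions like the ones below into Compiler Explorer and compare the assembly. The sketch assumes bool parameters; the function names are hypothetical, and the exact instructions depend on compiler, flags and target, so no particular output is implied here.

```cpp
// Both parameters are plain bools with no side effects and nothing that can
// fault, so the as-if rule allows the compiler to emit the same branch-free
// code for both functions.
bool logical_and(bool x, bool y) { return x && y; }

// x & y promotes both bools to int (0 or 1); the result converts back to bool.
bool bitwise_and(bool x, bool y) { return x & y; }
```

If you change the parameters to int, the two functions are no longer equivalent (1 & 2 is 0, while 1 && 2 is true), so the comparison only makes sense for genuine bools.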
A nice real-world benchmark shows exactly that: Clang 15 with -O3 produced nearly identical inner loops for & and && scans over a 160 M-element array. [4]
What happened to short-circuiting? It is still honoured semantically, but the optimiser recognised that evaluating the second operand eagerly does not change program behaviour, so it emitted branch-free code that happens to read both variables. The standard’s “as-if” rule is satisfied.
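The as-if argument only works when eager evaluation is harmless. A null-pointer guard is the classic counter-example: in the sketch below (positive() is a hypothetical helper), the dereference is safe only because && skips it, so the compiler cannot simply evaluate both sides.

```cpp
// The right operand is only safe to evaluate after the left one succeeds,
// so the compiler must preserve short-circuiting here.
bool positive(const int* p) {
    return p != nullptr && *p > 0;  // *p is evaluated only when p is non-null
}

// Do NOT write it as (p != nullptr) & (*p > 0): that always dereferences p,
// which is undefined behaviour when p is null.
```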
The hidden foot-guns of if (a & b)
Category | Risk when replacing && with & |
---|---|
Side effects | b++ now always increments, even when a is false. |
Operator precedence | & binds more weakly than == but more tightly than &&; many subtle bugs stem from forgotten parentheses (x & 1 == 0 means x & (1 == 0)). |
Integral promotion | & works on integers. If a and b are bool, the result is int (0 or 1), not bool. If they are other integers, a & b can be 0 even when both are non-zero (1 & 2 == 0), and comparing the result with true (which promotes to 1) fails whenever the result is anything other than 1. |
Overloaded operators | A custom operator& might exist and change semantics silently. |
Readability / intent | Future maintainers will assume “bit mask” semantics, not “logical test”. |
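The precedence and promotion rows are easiest to see as compile-time checks. The flag values and function names below are hypothetical; each static_assert records what the expression actually evaluates to, so the block documents the pitfalls simply by compiling. Most compilers will warn (-Wparentheses) on the deliberately buggy line, which is exactly the point.

```cpp
// Hypothetical single-bit flags, for illustration only.
constexpr unsigned kRead  = 1u;  // bit 0
constexpr unsigned kWrite = 2u;  // bit 1

// Integral promotion: both values are non-zero, yet their bitwise AND is 0.
constexpr bool logical_both(unsigned a, unsigned b) { return a && b; }
constexpr bool bitwise_both(unsigned a, unsigned b) { return (a & b) != 0; }
static_assert(logical_both(kRead, kWrite));   // 1 && 2 -> true
static_assert(!bitwise_both(kRead, kWrite));  // 1 & 2  -> 0

// Precedence: == binds tighter than &, so this parses as x & (1 == 0),
// i.e. x & 0, which is always 0.
constexpr bool is_even_buggy(unsigned x) { return x & 1 == 0; }
constexpr bool is_even(unsigned x) { return (x & 1) == 0; }
static_assert(!is_even_buggy(4));  // wrong answer: 4 is even
static_assert(is_even(4));

// Comparing an integral result with true compares it with 1.
constexpr bool bit2_eq_true(unsigned flags) { return (flags & 4u) == true; }
static_assert(!bit2_eq_true(4u));  // 4 == 1 is false although bit 2 is set
```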
Performance anatomy
- Branch prediction: Modern CPUs predict the conditional jump (e.g. jne) generated by && with >95 % accuracy for stable data patterns. A misprediction costs ~14 cycles, but mispredictions are only frequent when the data is random or adversarial.
- Instruction-level parallelism: Eager evaluation can sometimes reduce ILP, because both operands must be ready before the combined compare can retire.
- Cache traffic: If b is a pointer dereference, forcing the read every time may hurt cache residency, so short-circuiting can be a win.
- Speculative loads: An out-of-order core may fetch b early anyway; if the load turns out to be unnecessary, its cost is often hidden.
- GPU / SIMD kernels: Here branch divergence is lethal; replacing && with & (or *) makes sense, but you should express it explicitly in a data-parallel style (e.g. select(mask, …)), not with C++ control flow.
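For the data-parallel case, the usual scalar C++ shape is to turn each predicate into 0 or 1 and accumulate it, with no if in the loop body. The sketch below (count_both_above is a hypothetical name) does exactly that; whether a given compiler actually vectorises it depends on flags and target, so treat it as the form that gives the vectoriser a chance, not a guarantee.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts indices where both a[i] and b[i] exceed `threshold`, using no
// control flow in the loop body: each comparison yields a bool, & combines
// them, and the 0/1 result is added to the counter. Only the common prefix
// of the two vectors is scanned.
std::size_t count_both_above(const std::vector<std::uint8_t>& a,
                             const std::vector<std::uint8_t>& b,
                             std::uint8_t threshold) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i) {
        hits += static_cast<std::size_t>((a[i] > threshold) & (b[i] > threshold));
    }
    return hits;
}
```

Applied to the benchmark's arrays with a threshold of 200, this computes the same hits value as the if-based loop, since both test a[i] > 200 and b[i] > 200.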
Measuring instead of guessing
Below is a minimal benchmark you can paste into Compiler Explorer or run with perf stat; change the MODE macro (or compile with -DMODE=0) to flip operators.
```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

#ifndef MODE // 0 = bitwise &, 1 = logical &&
#define MODE 1
#endif

int main() {
    constexpr std::size_t N = 128 * 1024 * 1024;  // elements per array
    std::vector<std::uint8_t> a(N), b(N);

    // Fixed seed so every run sees the same random bytes.
    std::mt19937 rng(0);
    std::uniform_int_distribution<int> dist(0, 255);
    for (std::size_t i = 0; i < N; ++i) {
        a[i] = static_cast<std::uint8_t>(dist(rng));
        b[i] = static_cast<std::uint8_t>(dist(rng));
    }

    std::size_t hits = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < N; ++i) {
#if MODE
        if ((a[i] > 200) && (b[i] > 200))  // logical
#else
        if ((a[i] > 200) & (b[i] > 200))   // bitwise
#endif
            ++hits;
    }
    auto t1 = std::chrono::steady_clock::now();

    // Printing hits keeps the loop from being optimised away.
    std::cout << "hits=" << hits << " "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s\n";
}
```
On an Ice Lake i7 at -O3 the difference is typically < 1 %, completely drowned by measurement noise unless you pin the benchmark to a core and flush caches between runs.
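Because a single timed pass is so noisy at this scale, one cheap mitigation is to repeat the measurement and keep the fastest run. The helper below is a hypothetical sketch, not part of any benchmarking library; it complements pinning and perf stat rather than replacing them, and the timed work must produce an observable result (like hits in the benchmark) or the optimiser may delete it.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `work` `repetitions` times and returns the fastest wall-clock time
// in seconds. Taking the minimum discards one-off disturbances such as
// first-touch page faults or a context switch. Returns +infinity if
// repetitions <= 0.
template <class F>
double best_of_seconds(F&& work, int repetitions) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repetitions; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

For example, wrap the benchmark's timed loop in a lambda that resets and recomputes hits, and pass it with a repetition count such as 5.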
When does bitwise pay off?
- Inside tight, branch-averse GPU shaders or SIMD loops, where every warp must execute the same instruction stream.
- When computing a compound predicate that you will reuse later: uint32_t m = flags & (READY | ENABLED | ERROR); bool ok = (m == (READY | ENABLED)); if (ok) … (READY and ENABLED set, ERROR clear).
- For bit-mask idioms (if (flags & FLAG_WRITE)), but that is not a boolean logic replacement; the intent is genuinely “test bit k”.
Even in those cases document your intent with a comment; the form is unusual enough that reviewers will ask.
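A compound flag predicate has to be written as one mask-and-compare: chaining flags & READY & ENABLED on distinct single-bit flags always yields 0, because READY & ENABLED is itself 0. In the sketch below the bit values and helper names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical single-bit flags, for illustration only.
constexpr std::uint32_t READY   = 1u << 0;
constexpr std::uint32_t ENABLED = 1u << 1;
constexpr std::uint32_t ERROR   = 1u << 2;

// Compound predicate: READY and ENABLED set, ERROR clear. One mask and one
// compare, no branches, and the result can be stored and reused.
constexpr bool is_runnable(std::uint32_t flags) {
    return (flags & (READY | ENABLED | ERROR)) == (READY | ENABLED);
}

// Plain bit test: the "test bit k" idiom that & is meant for.
constexpr bool has_flag(std::uint32_t flags, std::uint32_t flag) {
    return (flags & flag) != 0;
}

static_assert(is_runnable(READY | ENABLED));
static_assert(!is_runnable(READY | ENABLED | ERROR));
// Why the chained form fails: distinct single-bit flags share no bits.
static_assert((READY & ENABLED) == 0);
```

Bits outside the mask (for example an unrelated bit 5) do not affect is_runnable, which is usually what a status check wants.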
Debunking common myths
Myth | Reality |
---|---|
“Compilers can ignore the short-circuit rule if b has no side effects.” | They can re-order or speculate as long as the abstract machine observes the same effects and evaluating b cannot fault. Sequenced side effects (I/O, volatile access, atomic<>) still forbid premature evaluation. |
“& is branch-free, therefore faster.” | Only if the branch is unpredictable and the second operand is in cache. Otherwise && is at worst equal, sometimes better. |
“Logical operators expand to bigger machine code.” | Modern optimisers fold simple boolean logic into flag registers; the size difference is mostly the conditional jump. |
“Using & avoids the register dependency chain.” | Not necessarily: both forms need both values before retirement unless the compiler transforms the predicate into a single compare-and-branch. |
Concurrency and atomics
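For std::atomic<bool>, every load is an atomic operation that takes part in inter-thread ordering (see [intro.races] in the Standard), and in practice compilers treat atomic loads conservatively and do not speculate them past the short-circuit. Rely on && to avoid spurious loads across threads. Replacing it with & always performs the second load, and because the operands of & are unsequenced, that load may even happen before the first one; this breaks acquire/release publication patterns and can lengthen the critical section or change the behaviour of lock-free algorithms.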
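The classic case is flag-then-payload publication. In the sketch below the variable and function names are hypothetical; publish() would run on a producer thread and consume_if_ready() on a consumer thread.

```cpp
#include <atomic>

// Hypothetical publication pair, for illustration only.
std::atomic<int>  payload{0};
std::atomic<bool> published{false};

// Producer: write the payload, then publish it with a release store.
void publish(int value) {
    payload.store(value, std::memory_order_relaxed);
    published.store(true, std::memory_order_release);
}

// Consumer: && sequences the payload load after the acquire load of
// `published` and skips it entirely while the flag is false. Once the flag
// reads true, the acquire synchronises with the release store, so the
// payload load cannot return a stale value.
bool consume_if_ready(int expected) {
    return published.load(std::memory_order_acquire) &&
           payload.load(std::memory_order_relaxed) == expected;
}

// Anti-pattern: with &, the two loads are unsequenced, so the payload may be
// read before the flag and observe a stale value even when the flag is true.
//   published.load(std::memory_order_acquire) &
//   (payload.load(std::memory_order_relaxed) == expected)
```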
Style, intent, and code-review heuristics
- Default to && / || unless you have a measured reason.
- Document any deliberate use of & on booleans (// branch-free predicate for GPU).
- Isolate tricks behind helper functions or constexpr lambdas so the intent is explicit.
- Write tests that capture side-effect expectations. Code written with & can come to depend on the right operand always running (say, a ++b); a later refactor back to && then silently skips the increment, and only a test will notice.
A quick guideline table
Situation | Recommended operator |
---|---|
Plain boolean predicate in CPU code | && / \|\| |
Bit-mask test (flags & FLAG) | & / \| |
GPU kernel, shader, or wide SIMD | & / \| (with a comment) |
Mixed types or potential side effects | Stick to && / \|\| |
Overloaded user-defined types | Prefer functions (all_of, any_of) |
Conclusion
The performance delta between && and & has mostly evaporated on today’s optimising compilers and superscalar CPUs. What remains is a semantic delta that can, and regularly does, bite maintainers.
Unless you have a profiler trace, an ISA you know intimately, and operands guaranteed side-effect-free, treat if (a & b) for what it is: a micro-optimisation gamble that trades clarity and correctness for at best marginal speed gains.
Opt for the code that communicates intent first. Let the optimiser prove you wrong later.
TL;DR – Use && unless you can prove that & is faster and harmless. The compiler usually does the proving for you.
- C++ Standard (ISO/IEC 14882:2023), §7.6.14 and §7.6.15: sequencing rules for the logical operators.
- Stack Overflow: “C++ compiler optimizations and short-circuit evaluation”.
- WordsAndButtons: “Challenge your performance intuition with C++ operators”.