Bit-twiddling vs. Logic in Modern C++

Overview

Why if (a & b) is not a free lunch


The folklore

“Branching is slow, so replace && with & (or || with |) and the optimizer will thank you.”

That advice shows up in code reviews, in Reddit threads, and even in seasoned code bases. The intuition is simple:

  • && must short-circuit, so the compiler may have to emit a conditional branch.
  • & always evaluates both operands, so the resulting instruction stream can be branch-free.

Therefore, the bitwise form “must” be faster—right?

We are going to stress-test that claim from four angles:

  1. Language rules (what the Standard guarantees).
  2. What compilers actually emit in 2025.
  3. Micro-architecture (branch prediction, cache, ILP).
  4. Human factors—maintainability, bugs, and intent.

The result is a nuanced picture: sometimes & wins, often it makes no difference, and in many real programs it is a liability.


Ground truth: what the Standard says [1]

  • Built-in && must evaluate left → right and must not evaluate the right operand when the left is false (for ||, when the left is true). This sequencing is mandatory, not negotiable. [2]
  • Overloaded operator&&/|| are just function calls—no short-circuiting.
  • Bitwise &/| always evaluate both operands and produce an integral value.
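
To see the guarantee in action, here is a minimal demonstration (noisy() is a hypothetical helper added for illustration):

#include <iostream>

bool noisy() { std::cout << "evaluated\n"; return true; }

int main() {
    bool a = false;
    if (a && noisy()) {}   // prints nothing: the right operand is skipped
    if (a &  noisy()) {}   // prints "evaluated": & never short-circuits
}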

Key takeaway: Behavioural semantics are fixed. The optimizer can re-arrange side-effect-free instructions, but it may not “speculatively” evaluate an operand if doing so would create an observable difference in the abstract machine.

Cppreference summarises the rule crisply: “Built-in && and || perform short-circuit evaluation; bitwise logic operators do not.” [3]


“But my disassembler shows identical code!”

That is not a myth. When x and y are plain bools with no side effects, the optimiser is allowed to collapse:

bool fast = (x & y);      // bitwise
bool safe = (x && y);     // logical

into the same branch-free sequence because it can prove that:

  • x and y are scalars already materialised in registers,
  • reading y eagerly has no observable effect,
  • the observable result (fast/safe) is identical.

(The equivalence holds only for bool operands. For wider integers, 1 & 2 is 0 while 1 && 2 is true, so the two forms are not even the same predicate.)

A nice real-world benchmark shows exactly that: Clang 15 with -O3 produced nearly identical inner loops for & and && scans over a 160 M-element array. [4]

What happened to short-circuiting? It is still honoured semantically, but the optimiser recognised that evaluating the second operand eagerly does not change program behaviour, so it emitted branch-free code that happens to read both variables. The standard’s “as-if” rule is satisfied.
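
As a concrete sketch, a predicate like the one below is typically lowered to two flag-setting comparisons combined with an and, with no conditional jump at -O2 and above (the exact output varies by compiler and target):

bool both_positive(int x, int y) {
    return (x > 0) && (y > 0);   // often emitted as setcc/setcc/and: branch-free
}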


The hidden foot-guns of if (a & b)

  • Side effects: b++ now always executes, even when a is false.
  • Operator precedence: & binds more weakly than == but more tightly than &&; many subtle bugs stem from forgotten parentheses.
  • Integral promotion: & works on promoted integers. Even when a and b are bool, the result is int, not bool; with wider operands, (a & b) == true compares against 1 and can be false even though both operands are non-zero.
  • Overloaded operators: a custom operator& might exist and change semantics silently.
  • Readability / intent: future maintainers will assume “bit mask” semantics, not “logical test”.
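
The precedence row deserves a concrete illustration; the values here are hypothetical:

#include <cstdint>
#include <iostream>

int main() {
    std::uint32_t x = 0b1010, mask = 0b1000, FLAG = 0b1000;
    if (x & mask == FLAG)        // parses as x & (mask == FLAG), i.e. x & 1
        std::cout << "never prints for this x\n";
    if ((x & mask) == FLAG)      // the intended bit test
        std::cout << "prints\n";
}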

Performance anatomy

  1. Branch prediction: Modern CPUs predict the conditional jump generated by && with >95 % accuracy for stable data patterns. A misprediction costs ~14 cycles, but that penalty is paid frequently only when the data is adversarial.

  2. Instruction-level parallelism: Eager evaluation can sometimes reduce ILP because both operands must be ready before the fused compare can retire.

  3. Cache traffic: If b is a pointer dereference, forcing the read every time may hurt cache residency. Short-circuiting can be a win.

  4. Speculative loads: An out-of-order core may fetch b early anyway; if it turns out the load was unnecessary the cost is often hidden.

  5. GPU / SIMD kernels: Here branch divergence is lethal; replacing && with & (or *) makes sense—but you should express it explicitly in a data-parallel style (e.g. select(mask, …)), not with C++ control flow. A scalar sketch follows this list.
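
Here is what that data-parallel style looks like on the CPU: a scalar sketch whose straight-line body auto-vectorises readily (count_hits and the 200 threshold are illustrative, mirroring the benchmark below):

#include <cstddef>
#include <cstdint>

std::size_t count_hits(const std::uint8_t* a, const std::uint8_t* b, std::size_t n) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < n; ++i)
        hits += static_cast<std::size_t>((a[i] > 200) & (b[i] > 200));  // no branch in the loop body
    return hits;
}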


Measuring instead of guessing

Below is a minimal benchmark you can paste into Compiler Explorer or run with perf stat—modify the MODE macro to flip operators.

#include <cstddef>
#include <cstdint>
#include <vector>
#include <random>
#include <chrono>
#include <iostream>

#ifndef MODE               // 0 = bitwise &, 1 = logical &&
#define MODE 1
#endif

int main() {
    constexpr size_t N = 128 * 1024 * 1024;
    std::vector<uint8_t> a(N), b(N);
    std::mt19937 rng(0);
    std::uniform_int_distribution<int> dist(0, 255);
    for (size_t i = 0; i < N; ++i) {
        a[i] = dist(rng);
        b[i] = dist(rng);
    }

    size_t hits = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < N; ++i) {
#if MODE
        if ((a[i] > 200) && (b[i] > 200))  // logical
#else
        if ((a[i] > 200) &  (b[i] > 200))  // bitwise
#endif
            ++hits;
    }
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "hits=" << hits << "  "
              << std::chrono::duration<double>(t1 - t0).count()
              << " s\n";
}

On an Ice Lake i7 at -O3 the difference is typically < 1 %, completely drowned by measurement noise unless you pin the benchmark and flush caches between runs.
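
To watch the branch predictor actually lose, make the first predicate a coin flip. A sketch of the adversarial variant, replacing the fill loop above (coin is a new distribution introduced for illustration):

    std::uniform_int_distribution<int> coin(0, 1);
    for (size_t i = 0; i < N; ++i) {
        a[i] = coin(rng) ? 255 : 0;   // 50/50 per element: unpredictable first test
        b[i] = dist(rng);
    }

With this fill the && version mispredicts its first branch about half the time, which is exactly the regime where the branch-free & form has a chance to pull ahead.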


When does bitwise pay off?

  • Inside tight, branch-averse GPU shaders or SIMD loops, where every warp must execute the same instruction stream.

  • When computing a compound predicate that you will reuse later:

    const bool m = ((flags & READY) != 0)
                 &  ((flags & ENABLED) != 0)
                 &  ((flags & ERROR) == 0);   // branch-free, reusable predicate
    if (m) { /* ... */ }

  • For bit-mask idioms (if (flags & FLAG_WRITE))—but that is not a boolean logic replacement; the intent is genuinely “test bit k”.

Even in those cases, document your intent with a comment; the form is unusual enough that reviewers will ask.


Debunking common myths

  • Myth: “Compilers can ignore the short-circuit rule if b has no side effects.”
    Reality: they can re-order or speculate as long as the abstract machine observes the same effects. Sequenced side effects (I/O, volatile access, atomic<>) still forbid premature evaluation.

  • Myth: “& is branch-free, therefore faster.”
    Reality: only if the branch is unpredictable and the second operand is in cache. Otherwise && is at worst equal, sometimes better.

  • Myth: “Logical operators expand to bigger machine code.”
    Reality: modern optimisers fold simple boolean logic into flag registers; the size difference is mostly the conditional jump.

  • Myth: “Using & avoids the register dependency chain.”
    Reality: not necessarily; both forms need both values before retirement unless the compiler transforms the predicate into a single compare-and-branch.

Concurrency and atomics

For std::atomic<bool> the value read is a side effect visible to other threads (§6.9.2.1 ¶7). A conforming compiler must not merge or speculate away the short-circuit. Rely on && to avoid spurious loads across threads; replacing it with & forces the load on every evaluation, which can lengthen a critical section or even change the behaviour of lock-free algorithms.
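
A minimal sketch of the difference (initialized and config_ok are hypothetical names):

#include <atomic>

std::atomic<bool> initialized{false};
bool config_ok = false;

bool ready() {
    // && skips the atomic load entirely when config_ok is false;
    // with & every call would perform the acquire load.
    return config_ok && initialized.load(std::memory_order_acquire);
}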


Style, intent, and code-review heuristics

  1. Default to &&/|| unless you have a measured reason.
  2. Document any deliberate use of & on booleans (// branch-free predicate for GPU).
  3. Isolate tricks behind helper functions or constexpr lambdas so the intent is explicit (a sketch follows this list).
  4. Write tests that capture side-effect expectations: swapping && for & silently changes how often a side-effecting operand such as b++ executes.
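
A minimal sketch of such a helper (both() is a hypothetical name):

// Branch-free conjunction with the intent spelled out at the call site.
[[nodiscard]] constexpr bool both(bool a, bool b) noexcept {
    return static_cast<bool>(a & b);   // deliberate: evaluates both operands
}

static_assert(both(true, true) && !both(true, false));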

A quick guideline table

  • Plain boolean predicate in CPU code → && / ||
  • Bit-mask test (flags & FLAG) → & / |
  • GPU kernel, shader, or wide SIMD → & / | (with a comment)
  • Mixed types or potential side effects → stick to && / ||
  • Overloaded user-defined types → prefer named functions (all_of, any_of)
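
For that last row, a named algorithm states the intent directly; a small sketch:

#include <algorithm>
#include <array>

bool all_checks_pass(const std::array<bool, 3>& checks) {
    return std::all_of(checks.begin(), checks.end(),
                       [](bool b) { return b; });   // clear intent, no operator tricks
}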

Conclusion

The performance delta between && and & has mostly evaporated on today’s optimising compilers and superscalar CPUs. What remains is a semantic delta that can—and regularly does—bite maintainers.

Unless you have a profiler trace, an ISA you know intimately, and operands guaranteed side-effect-free, treat if (a & b) for what it is: a micro-optimisation gamble that trades clarity and correctness for at best marginal speed gains.

Opt for the code that communicates intent first. Let the optimiser prove you wrong later.


TL;DR – Use && unless you can prove that & is faster and harmless. The compiler usually does the proving for you.