background

Detecting Arm64 Virtual Machines By CPU Cache

1 days ago in Atlanta.

title: Detecting Arm64 Virtual Machines By CPU Cache
date: 2025-07-21
location: Atlanta
tags: 
  - re
  - tech
cover: https://lms.d.zhan.com/zhanlms/addon_homework/2025/07/5486275687ed2d770926/small.png
wideCover: https://lms.d.zhan.com/zhanlms/addon_homework/2025/07/9260415687ed2f761a94/wide.png

Introduction

VM detection, aka Anti-VM or Anti-Sandbox, is commonly used in mobile software protection to prevevnt reverse engineering and debugging. On Android, some widely used methods including:

  • checking the ABI list for detecting Android x86
  • locating libhoudini.so
  • checking CPU models, CPUID

This post provides a method to detect Arm64 VMs by testing the CPU cache miss rate.

Basic Idea

The majority of Arm64 VMs are based on QEMU, which is a popular open-source emulator. QEMU's primary focus is on functional emulation and speed, instead of microarchitectural simulation. Therefore, by default, QEMU does not accurately emulate CPU cache.

A simple way to detect QEMU is to measure the cache miss rate. From my experiments, real Arm64 devices can achieve 0% i-cache miss rate on deliberate code. While QEMU consistently reaching 100% cache miss.

Implementation

This C implementation tests the instruction cache (often referred as i-cache) miss rate over 100 iterations. Compile the code with aarch64-linux-gnu-gcc and test with qemu-aarch64 -L /usr/aarch64-linux-gnu ./main.

#include <stdio.h>
#include <sys/mman.h>

int foo() { return 0; }

int bar() { return 1; }

typedef int (*func_ptr)();

int main(int argc, char *argv[]) {
  // Alloc RWX
  func_ptr ptr = mmap(0x0, 0x5000, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

  __builtin_memcpy(ptr, foo, 8); // Copy foo
  __builtin___clear_cache((char *)ptr, (char *)ptr + 8);

  int x = 0;
  for (int i = 0; i < 100; i++) {
    x += ptr();
  }

  __builtin_memcpy(ptr, bar, 8); // Copy bar
  for (int i = 0; i < 100; i++) {
    x += ptr();
  }

  printf("Cache misses: %d out of 100\n", x);

  return 0;
}

The program first copies foo function to a newly mapped RWX memory region. Then iterate 100 times to warm up the cache. After that, it copies bar function to the same memory region.

At this point, both emulator and real devices should have memory like this:

// @ptr - content of bar
mov w0, #1
ret

However, due to CPU cache, real devices will still have the foo function in the cache, while QEMU will not. So in the second loop, real devices will execute foo 100 times, while QEMU will execute bar 100 times.

Results

On Ampere Altra and Google Pixel 7a, the cache miss rate is 0 out of 100.

On QEMU, Unicorn, Qiling (QEMU based), the cache miss rate is 100 out of 100.

References