Large Language Models (LLMs) are increasingly integrated into AI workflows and agents to streamline a wide range of tasks. In this blog post, we introduce an approach for using LLMs for automated patch diff analysis.
TL;DR
Patch diffing is great for finding what changed between two versions of a binary, but the volume on typical patch days is high and manual triage costs a lot of time. The approach shown here runs a binary diff, extracts the relevant changes, and lets an LLM score and summarize their security relevance so a researcher can focus on the promising parts first.
Check out diffalayze!
Agents, Agents everywhere
No one working with LLMs these days can ignore the growing role of workflows and agents.
Agents everywhere
With the right tools, many day-to-day tasks can be streamlined and automated using LLMs. We wondered whether this concept could help automate vulnerability discovery and exploitability analysis in binary code. That led to the idea of building an AI-driven workflow that analyzes binary patch diffs and evaluates the changes for potential security implications.
The goal is to automatically highlight interesting or potentially vulnerable code, so security researchers can focus their time where it matters, instead of spending hours or even days on manual reverse engineering.
So without further ado, let’s dive in.
Patch Diffing
Patch diffing is a powerful technique for identifying code changes and understanding what a patch actually affects. While diffing with access to source code is relatively straightforward, it becomes much more challenging when working with native binaries.
Fortunately, several tools exist to address this problem and support binary-level diffing, for example BinDiff, Diaphora, or ghidriff. ghidriff is a great open-source tool developed by clearbluejar, which combines multiple techniques to detect and visualize code differences in binaries.
With such tools, security researchers can efficiently identify the root causes of vulnerabilities, discover changes that may introduce new bugs, or detect so-called silent patches (fixes for security issues that were not publicly disclosed). This knowledge feeds into exploit development, patch effectiveness reviews, and targeted reverse engineering.
However, there is an obvious bottleneck: finding the needle in the haystack. On a typical Microsoft Patch Tuesday, thousands of lines of code (LOC) change across many binaries. Manually skimming all diffs is time-consuming, and this is where LLMs can help with triage.
The Tool: diffalayze
So we wrote a simple tool that automates patch diffing of binaries with ghidriff and implemented a custom AI workflow for LLM-based triage of the resulting diffs.
diffalayze sample usage
High-level overview
diffalayze works in a fairly straightforward way.
The idea is to define targets using a so-called `fetch_script.py`.
This script is responsible for downloading two versions of a binary or patch, verifying whether the versions have changed since the last run, and returning the corresponding files.
Once the binaries are ready, a Docker-based instance of ghidriff takes over, does its magic and generates several diff files. These diffs are then further processed into markdown-formatted outputs, making them easier to analyze with LLMs.
Next, the AI agent is triggered and runs a scatter-gather pipeline: it maps over the diff chunks to produce per-chunk analyses, then reduces them into a consolidated report using the selected back end and language model, such as OpenAI GPT-5. The result is a markdown report that includes a triage summary, an explanation of the patch's purpose, and an analysis of any fixed or newly introduced vulnerabilities.
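The scatter-gather step can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not diffalayze's actual implementation: `split_into_chunks`, `analyze_diff`, the prompts, and the chunk size are all hypothetical, and the `llm` callable stands in for whichever back end is configured.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(markdown_diff: str, max_chars: int = 8000) -> List[str]:
    """Group lines into chunks of roughly max_chars characters.

    A single line longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in markdown_diff.splitlines(keepends=True):
        if current and size + len(line) > max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def analyze_diff(markdown_diff: str, llm: Callable[[str], str], workers: int = 4) -> str:
    """Map an LLM over the diff chunks, then reduce the results into one report."""
    chunks = split_into_chunks(markdown_diff)

    # Scatter (map): one analysis per chunk, run concurrently.
    def analyze_chunk(chunk: str) -> str:
        # Placeholder prompt, not the one used by the tool.
        return llm("Analyze this diff chunk for security relevance:\n" + chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(analyze_chunk, chunks))

    # Gather (reduce): consolidate the per-chunk analyses into one report.
    combined = "\n\n".join(partials)
    return llm("Consolidate these per-chunk analyses into one report:\n" + combined)
```

Passing the model as a plain callable keeps the pipeline independent of the back end, so the same map and reduce steps work whether the prompt goes to a hosted API or a local model.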
The LLM assigns a heuristic security score and severity level (NONE to CRITICAL) based on patterns it detects in the diff and its training data. If a defined threshold is met, a user-defined action such as triggering a script or sending a notification is executed.
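The threshold check can be sketched like this. Only NONE, HIGH, and CRITICAL are named in this post, so the intermediate levels LOW and MEDIUM are an assumption, as are the function names; how diffalayze actually invokes the trigger command (for example, which arguments it passes) is not shown here.

```python
import subprocess
from typing import List, Optional

# Ordered from least to most severe. LOW and MEDIUM are assumed intermediate levels.
SEVERITY_LEVELS = ["NONE", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def meets_threshold(level: str, threshold: str) -> bool:
    """Return True if level is at least as severe as threshold."""
    return SEVERITY_LEVELS.index(level.upper()) >= SEVERITY_LEVELS.index(threshold.upper())


def maybe_trigger(level: str, threshold: str, command: Optional[List[str]]) -> bool:
    """Run the user-defined command if the threshold is met; return True if it ran."""
    if command is None or not meets_threshold(level, threshold):
        return False
    # check=False: a failing notification script should not abort the analysis run.
    subprocess.run(command, check=False)
    return True
```

With a threshold of HIGH, a target rated CRITICAL would run the command, while one rated MEDIUM would be skipped.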
To sum it up, the following illustrates a high-level overview of this process:
Process overview
Targets
In our analysis, we focused on Windows kernel drivers such as `mrxsmb.sys`.
To automatically check for new versions and download the driver binaries, we made use of the excellent Winbindex project.
To streamline this process, we developed a helper script that can download any Windows binary directly using the Winbindex project database.
This script can then be integrated into a `fetch_script.py`, like in the following example:
```python
import urllib.request
from pathlib import Path

from utils import winbindexer

# Target definition: which binary to track and for which Windows build.
dbfile = "mrxsmb.sys.json.gz"
filename = "mrxsmb.sys"
windows_version = "11-24H2"

SCRIPT_DIR = Path(__file__).parent
# Remembers the URL of the newest version seen during the last run.
tracking_file = SCRIPT_DIR / "version.log"


def download_file(url: str, dest: Path):
    try:
        urllib.request.urlretrieve(url, dest)
    except Exception as e:
        raise RuntimeError(f"[!] Download error: {e}") from e


def check_and_download():
    try:
        winbindexer.ensure_winbindex_repo()
        results = winbindexer.get_latest_symbol_urls(filename, dbfile, windows_version)
        if len(results) < 2:
            raise ValueError("[!] Could not find two versions")

        new_version_url = results[0]["url"]
        old_version_url = results[1]["url"]

        # Skip the download if the newest version was already processed.
        if tracking_file.exists():
            last_known = tracking_file.read_text(encoding="utf-8").strip()
            if last_known == new_version_url:
                return False

        old_path = SCRIPT_DIR / f"old.{filename}"
        new_path = SCRIPT_DIR / f"new.{filename}"
        download_file(old_version_url, old_path)
        download_file(new_version_url, new_path)

        tracking_file.write_text(new_version_url + "\n", encoding="utf-8")
        return str(old_path), str(new_path)
    except (FileNotFoundError, ValueError, RuntimeError) as e:
        print(f"[!] Error: {e}")
        return "", ""
```
This example is invoked by diffalayze to check whether a new version of `mrxsmb.sys` is available for a specific Windows build (e.g. 11-24H2) and, if so, downloads the old and new versions to the target directory.
It can also serve as a template for fetching other Windows binaries.
The true strength of diffalayze becomes apparent when analyzing multiple targets in parallel. In our tests, for instance, we specified up to 32 binaries, which were processed simultaneously:
Running diffalayze with 32 targets
Bonus: We observed the best results when analyzing older Windows Long-Term Servicing Branch (LTSB) builds such as version 1607. Patches for these versions often contain only essential security fixes, which leads to cleaner diff results and reduces the amount of irrelevant changes the LLM has to deal with.
What about other targets?
In principle, any binary can be analyzed using diffalayze.
What is needed is a custom `fetch_script.py` that handles the preparation steps for the binary to be analyzed.
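As a rough illustration, a fetch script for a non-Windows target might look like the sketch below. Everything here is an assumption: the URLs and file names are placeholders, change detection uses a SHA-256 hash instead of a version URL, and the return contract (`False` when nothing changed, a tuple of paths on success, two empty strings on error) is inferred from the `mrxsmb.sys` example rather than from a documented interface. The `workdir` and `fetch` parameters exist only to make the sketch easy to try out.

```python
import hashlib
import urllib.request
from pathlib import Path

# Hypothetical placeholders: replace with the real download locations.
OLD_URL = "https://example.com/releases/1.0/target.bin"
NEW_URL = "https://example.com/releases/1.1/target.bin"


def check_and_download(workdir: Path = Path("."), fetch=urllib.request.urlretrieve):
    """Download both versions unless the newest one was already processed."""
    tracking_file = workdir / "version.log"
    old_path = workdir / "old.target.bin"
    new_path = workdir / "new.target.bin"
    try:
        # Fetch the newest build first and fingerprint it.
        fetch(NEW_URL, new_path)
        digest = hashlib.sha256(new_path.read_bytes()).hexdigest()

        # Nothing to do if this exact build was seen during the last run.
        if tracking_file.exists():
            if tracking_file.read_text(encoding="utf-8").strip() == digest:
                return False

        fetch(OLD_URL, old_path)
        tracking_file.write_text(digest + "\n", encoding="utf-8")
        return str(old_path), str(new_path)
    except OSError as e:
        # urllib's URLError and HTTPError are subclasses of OSError.
        print(f"[!] Error: {e}")
        return "", ""
```

Hashing the downloaded file is useful when the vendor publishes new builds under a stable URL, where comparing URLs alone would never detect a change.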
Demonstration
As described above, we used diffalayze to analyze Windows kernel drivers such as `mrxsmb.sys`.
The following example shows how diffalayze was used to analyze this binary.
diffalayze can be executed as follows:
python3 diffalayze.py all -f -a -lv -lb anthropic -lm claude-opus-4-1 -llt HIGH -ltc ../notify.sh
Parameter explanation:
- `all`: all targets within the `targets` directory will be covered
- `-f` (`--force`): forces a diff, even if no new version of the binary can be found
- `-a` (`--analyze`): enables LLM analysis after diffing
- `-lv` (`--llm-verbose`): verbose output for LLM analysis
- `-lb` (`--llm-backend`): LLM backend, e.g. anthropic, openai, ollama
- `-lm` (`--llm-model`): language model, e.g. claude-opus-4-1 or gpt-5
- `-llt` (`--llm-level-threshold`): severity threshold for triggering the external script
- `-ltc` (`--llm-trigger-cmd`): script to be executed if the specified threshold is met
The output of the execution looks like this:
Sample run
Since an external script was specified to trigger when the severity level is greater than or equal to `HIGH`, and the LLM actually rated two targets as `CRITICAL`, the script was executed and notifications were sent:
Notification about target evaluation
The final analysis report looks as follows:
Sample Result

# Summary
- Highest risk addressed: A feature-gated fix in RxCeEncryptData adds overflow-checked size calculations to prevent a potential integer overflow leading to kernel heap overflow during SMB encryption (CWE-190 → CWE-122). If the feature is disabled, the unsafe path remains.
- Important hardening: MRxSmbCreateSrvCall introduces a feature-gated rejection of credential-marshal–like server names to prevent authentication confusion (CWE-20, CWE-287), input normalization, explicit availability checks, safe allocations, improved telemetry, and deterministic cleanup to reduce stale state and memory risks (CWE-664).
- Residual risk: Both primary mitigations are gated by feature flags; deployments with these features disabled retain prior behaviors.

# Detailed Findings
- RxCeEncryptData (encryption path)
  - Feature-gated overflow-checked sizing
    - Replaces unchecked 32-bit additions for header/payload sizing with RtlULongAdd checks when the feature is enabled. Allocation now uses the validated size.
    - Prevents under-allocation followed by copying/encryption of param_4 bytes.
  - Type correction and parameter handling
    - Changes length parameter type from int* to uint* to align with expected ULONG semantics and avoids signed-length pitfalls (CWE-681).
    - Fixes RtlCopyMdlToBuffer “bytes copied” out-parameter to a dedicated local variable, avoiding clobbering the length parameter.
- MRxSmbCreateSrvCall (connection setup path)
  - Early reject and telemetry
    - Reorders an early 0x400-flag check to log and return a specific NTSTATUS immediately; improves clarity without changing security posture.
  - Sub-redirector start and pre-claim
    - Explicitly starts sub-redirector on sentinel and pre-claims the server call to reduce inconsistent states before connection.
  - Input normalization
    - Trims a leading backslash from the server name before further processing (CWE-20).
  - Feature-gated credential-marshaling rejection
    - If enabled, calls CredUnmarshalTargetInfo on the server name and rejects anything recognized as marshaled data with STATUS_INVALID_PARAMETER (0xc000000d). Blocks cred-marshal–like inputs from reaching authentication paths (CWE-20, CWE-287).
  - Availability, allocation, and bounded copies
    - Queries server availability, constructs server entries explicitly, allocates buffers sized to UNICODE_STRING lengths, and uses RtlCopyUnicodeString for bounded copy (reduces CWE-119/120 exposure).
  - Cleanup on failure
    - Resets state, dereferences server entries, and frees allocations on error returns (CWE-664).

# Exploitability & Impact
- Integer overflow → heap overflow in RxCeEncryptData (most critical)
  - Before: Unchecked 32-bit additions (param_4 + constants) could wrap, causing ExAllocatePoolWithTag to under-allocate while RtlCopyMdlToBuffer/SmbCryptoEncrypt operate on param_4 bytes, enabling a kernel pool overflow.
  - After: With feature enabled, RtlULongAdd validates both additions and aborts on overflow.
  - Exploitability: Depends on upstream bounds for param_4. If param_4 can be attacker-influenced and large, pre-patch code is at risk. Post-patch safety depends on the feature flag being enabled.
- Credential-marshal–like server names in MRxSmbCreateSrvCall
  - Risk: Server names that resemble marshaled target info could lead to authentication confusion or misuse if interpreted by credential APIs.
  - Mitigation: When enabled, CredUnmarshalTargetInfo is used as a gate; recognized marshaled forms are rejected with STATUS_INVALID_PARAMETER.
  - Exploitability: Realistic where attackers influence UNC/DFS names; effectiveness depends on feature enablement.
- Resource lifetime and cleanup
  - Risk: Without explicit teardown on failures, stale pointers/leaks could accumulate, raising UAF/inconsistency risks under error churn.
  - Mitigation: Explicit state zeroing, dereference, and free paths reduce lifetime-related issues. Primarily stability/DoS hardening.
- Lesser issues
  - Type correction and out-parameter fix in RxCeEncryptData reduce logic/aliasing risks; low direct exploitability.
  - Telemetry changes improve diagnostics; no evident sensitive data exposure in shown snippets.

Residual/new risks:
- Feature flag dependency: Both primary mitigations are gated by EvaluateCurrentState over feature descriptors. If disabled in some configurations, the original risks persist.
- Large but bounded allocations from input-sized server names could contribute to memory pressure (low DoS risk).
- Partial normalization trims only a single leading backslash; unlikely to be security-relevant given the cred-marshal check’s purpose.

# Next Analysis Steps for Reproduction
- Validate feature-flag behavior
  - Identify and toggle the relevant feature descriptors:
    - RxCeEncryptData sizing checks: g_Feature_2962494776_56195954_FeatureDescriptorDetails.
    - MRxSmbCreateSrvCall cred-marshal gate: g_Feature_2181565755_56614078_FeatureDescriptorDetails.
  - Test both enabled and disabled states to confirm code paths and outcomes.
- Reproduce and verify the integer overflow fix (RxCeEncryptData)
  - With the feature disabled, drive RxCeEncryptData with a large param_4 value that would cause 32-bit addition overflow in param_4 + 0x34 or +0x84; observe allocation size vs. copy length behavior.
  - With the feature enabled, confirm RtlULongAdd returns an error and the function exits with the documented failure code (-0x3ffffbd5) instead of proceeding.
  - Confirm IoAllocateMdl uses the corrected unsigned length and that cleanup zeroes the length on failure.
- Exercise the credential-marshal rejection (MRxSmbCreateSrvCall)
  - Provide a server name buffer that CredUnmarshalTargetInfo recognizes as marshaled target info.
  - With the feature enabled, expect immediate failure with STATUS_INVALID_PARAMETER and WPP telemetry reflecting the status.
  - With the feature disabled, confirm the request proceeds past this point.
  - Verify input normalization by supplying a name with a leading backslash and checking the adjusted pointer/length used for the marshal check.
- Validate failure-path cleanup (MRxSmbCreateSrvCall)
  - Induce allocation failures (e.g., force ExAllocatePoolWithTag for the server name or phase context to fail) and verify:
    - State at lVar3 + 0x20 is zeroed.
    - Server entry is dereferenced.
    - Phase context buffer is freed.
    - Correct NTSTATUS is returned and telemetry logs the status.
- Sanity checks on bounded copies
  - For server name allocation/copy, provide varying UNICODE_STRING lengths (including zero and maximum typical sizes) and verify allocations match the source length and that RtlCopyUnicodeString writes within bounds.

These steps will confirm the mitigations, surface any lingering feature-gating gaps, and validate robustness of the new error paths.

---

## Security Relevance Evaluation

**Level:** CRITICAL
**Score:** 85
**Summary:** The report identifies a feature-gated fix for an integer overflow in RxCeEncryptData that could cause a kernel heap overflow during SMB encryption, and adds hardening in MRxSmbCreateSrvCall to reject credential-marshal-like server names. Because both mitigations are behind feature flags, prior vulnerable behavior can persist if the features are disabled.
Actual diff

```diff
--- MRxSmbCreateSrvCall
+++ MRxSmbCreateSrvCall
@@ -1,16 +1,162 @@
 uint MRxSmbCreateSrvCall(undefined8 param_1,longlong param_2)
 
 {
-  uint uVar1;
+  short *psVar1;
+  short sVar2;
+  longlong lVar3;
+  longlong lVar4;
+  ushort *puVar5;
+  undefined4 uVar6;
+  undefined4 uVar7;
+  undefined4 uVar8;
+  undefined8 uVar9;
+  bool bVar10;
+  uint uVar11;
+  int iVar12;
+  undefined7 extraout_var;
+  longlong lVar13;
+  short *_Dst;
+  short *psVar14;
+  short *psVar15;
+  short *local_res8;
+  ushort local_38;
+  ushort uStack_36;
+  ushort uStack_34;
+  ushort uStack_32;
+  short *psStack_30;
 
-  if ((*(uint *)(*(longlong *)(param_2 + 0x20) + 0x78) & 0x400) == 0) {
-    uVar1 = SmbCeCreateSrvCall(param_2);
-    return uVar1;
+  if ((*(uint *)(*(longlong *)(param_2 + 0x20) + 0x78) & 0x400) != 0) {
+    if ((Microsoft_Windows_SMBClientEnableBits & 1) != 0) {
+      Template_qqq(param_1,&CreateSrvCallError,*(longlong *)(param_2 + 0x20) + 0x18c,0xc00000be);
+    }
+    return 0xc00000be;
   }
+  lVar3 = *(longlong *)(param_2 + 0x108);
+  _Dst = (short *)0x0;
+  lVar4 = *(longlong *)(param_2 + 0x20);
+  if (*(longlong *)(lVar3 + 0xd8) == 0xffffffff) {
+    StartSubRedirectorForDialect(MRxSmbDeviceObject,1);
+  }
+  uVar11 = SubRdrPreClaimSrvCall(lVar3,param_2);
+  psVar15 = _Dst;
+  psVar14 = _Dst;
+  if (uVar11 == 0xc0000016) {
+    puVar5 = *(ushort **)(lVar3 + 0x40);
+    local_38 = *puVar5;
+    uStack_36 = puVar5[1];
+    uStack_34 = puVar5[2];
+    uStack_32 = puVar5[3];
+    psStack_30 = *(short **)(puVar5 + 4);
+    if ((1 < local_38) && (*psStack_30 == 0x5c)) {
+      psStack_30 = psStack_30 + 1;
+      local_38 = local_38 - 2;
+      uStack_36 = uStack_36 - 2;
+    }
+    bVar10 = EvaluateCurrentState(&g_Feature_2181565755_56614078_FeatureDescriptorDetails);
+    if ((int)CONCAT71(extraout_var,bVar10) != 0) {
+      iVar12 = CredUnmarshalTargetInfo(psStack_30,local_38,0,0);
+      uVar11 = 0xc000000d;
+      if (iVar12 != -0x3ffffff3) {
+        if ((((undefined8 **)WPP_GLOBAL_Control != &WPP_GLOBAL_Control) &&
+            ((*(uint *)((longlong)WPP_GLOBAL_Control + 0x2c) & 1) != 0)) &&
+           (*(char *)((longlong)WPP_GLOBAL_Control + 0x29) != '\0')) {
+          WPP_SF_Z(WPP_GLOBAL_Control[3],0xb,&WPP_2876989b72e03b5952b18ed47e9e9657_Traceguids,
+                   &local_38);
+        }
+        goto LAB_0;
+      }
+    }
+    uVar11 = SmbCeQueryServerAvailability(&local_38,1);
+    if (-1 < (int)uVar11) {
+      local_res8 = (short *)0x0;
+      uVar11 = SmbCeConstructServerEntry(lVar3,(longlong *)&local_res8);
+      psVar15 = local_res8;
+      psVar14 = (short *)0x0;
+      if (uVar11 == 0) {
+        uVar9 = *(undefined8 *)(param_2 + 0x58);
+        lVar13 = *(longlong *)(param_2 + 0x70);
+        *(undefined8 *)(local_res8 + 0x70) = *(undefined8 *)(param_2 + 0x50);
+        *(undefined8 *)(local_res8 + 0x74) = uVar9;
+        uVar6 = *(undefined4 *)(param_2 + 100);
+        uVar7 = *(undefined4 *)(param_2 + 0x68);
+        uVar8 = *(undefined4 *)(param_2 + 0x6c);
+        *(undefined4 *)(local_res8 + 0x78) = *(undefined4 *)(param_2 + 0x60);
+        *(undefined4 *)(local_res8 + 0x7a) = uVar6;
+        *(undefined4 *)(local_res8 + 0x7c) = uVar7;
+        *(undefined4 *)(local_res8 + 0x7e) = uVar8;
+        *(undefined8 *)(local_res8 + 0x80) = *(undefined8 *)(param_2 + 0x70);
+        if (lVar13 != 0) {
+          RxReferenceCredential();
+        }
+        psVar14 = _Dst;
+        if (*(longlong *)(psVar15 + 0x80) != 0) {
+          psVar14 = *(short **)(*(longlong *)(psVar15 + 0x80) + 0x30);
+        }
+        psVar15[0x141] = 0;
+        psVar1 = psVar15 + 0x140;
+        *psVar1 = 0;
+        psVar15[0x144] = 0;
+        psVar15[0x145] = 0;
+        psVar15[0x146] = 0;
+        psVar15[0x147] = 0;
+        if ((psVar14 != (short *)0x0) && (*psVar14 != 0)) {
+          lVar13 = ExAllocatePoolWithTag(0x200,*psVar14,0x734d6d53);
+          *(longlong *)(psVar15 + 0x144) = lVar13;
+          if (lVar13 == 0) {
+            uVar11 = 0xc000009a;
+            goto LAB_0;
+          }
+          sVar2 = *psVar14;
+          psVar15[0x141] = sVar2;
+          *psVar1 = sVar2;
+          RtlCopyUnicodeString(psVar1,psVar14);
+        }
+        _Dst = (short *)ExAllocatePoolWithTag(0x200,0x48,0x734d6d53);
+        if (_Dst == (short *)0x0) {
+          uVar11 = 0xc000009a;
+          goto LAB_0;
+        }
+        memset(_Dst,0,0x48);
+        *(longlong *)(_Dst + 8) = param_2;
+        *(code **)_Dst = SmbCeCompleteSrvCallConstructionPhase2;
+        *(short **)(_Dst + 0x10) = _Dst + 0x18;
+        *(short **)(_Dst + 0xc) = psVar15;
+        _Dst[0x14] = 0;
+        _Dst[0x15] = 0;
+        _Dst[0x16] = 0;
+        _Dst[0x17] = 0;
+        uVar11 = SmbCepEstablishServerConnection(psVar15,_Dst,_Dst + 0x18);
+        psVar14 = _Dst;
+      }
+    }
+  }
+  else if ((((undefined8 **)WPP_GLOBAL_Control != &WPP_GLOBAL_Control) &&
+           ((*(uint *)((longlong)WPP_GLOBAL_Control + 0x2c) & 0x40) != 0)) &&
+          (1 < *(byte *)((longlong)WPP_GLOBAL_Control + 0x29))) {
+    WPP_SF_qZ(WPP_GLOBAL_Control[3],10,&WPP_2876989b72e03b5952b18ed47e9e9657_Traceguids,lVar3,
+              *(undefined2 **)(lVar3 + 0x40));
+  }
+  _Dst = psVar14;
+  if (-1 < (int)uVar11) {
+    return uVar11;
+  }
+LAB_0:
   if ((Microsoft_Windows_SMBClientEnableBits & 1) != 0) {
-    Template_qqq(param_1,&CreateSrvCallError,*(longlong *)(param_2 + 0x20) + 0x18c,0xc00000be);
+    Template_qqq(WPP_GLOBAL_Control,&CreateSrvCallError,lVar4 + 0x18c,uVar11);
   }
-  return 0xc00000be;
+  if ((((undefined8 **)WPP_GLOBAL_Control != &WPP_GLOBAL_Control) &&
+      ((*(uint *)((longlong)WPP_GLOBAL_Control + 0x2c) & 1) != 0)) &&
+     (*(char *)((longlong)WPP_GLOBAL_Control + 0x29) != '\0')) {
+    WPP_SF_qL(WPP_GLOBAL_Control[3],0xc,&WPP_2876989b72e03b5952b18ed47e9e9657_Traceguids,lVar3);
+  }
+  *(undefined8 *)(lVar3 + 0x20) = 0;
+  if (psVar15 != (short *)0x0) {
+    SmbCeDereferenceServerEntryEx((longlong)psVar15,'\0');
+  }
+  if (_Dst != (short *)0x0) {
+    ExFreePoolWithTag(_Dst,0);
+  }
+  return uVar11;
 }

--- RxCeEncryptData
+++ RxCeEncryptData
@@ -1,44 +1,57 @@
 int RxCeEncryptData(undefined8 param_1,undefined8 *param_2,undefined8 param_3,ULONG param_4,
-                   longlong *param_5,int *param_6)
+                   longlong *param_5,uint *param_6)
 
 {
   undefined4 *puVar1;
   PUCHAR pUVar2;
-  int *piVar3;
+  bool bVar3;
   int iVar4;
-  longlong lVar5;
+  undefined7 extraout_var;
+  undefined8 uVar5;
   longlong lVar6;
+  longlong lVar7;
+  uint local_38 [2];
+  undefined1 local_30 [8];
 
-  piVar3 = param_6;
-  *param_6 = param_4 + 0x34;
-  lVar5 = ExAllocatePoolWithTag(0x200,param_4 + 0x84,0x66426d53);
-  if (lVar5 == 0) {
+  bVar3 = EvaluateCurrentState(&g_Feature_2962494776_56195954_FeatureDescriptorDetails);
+  if ((int)CONCAT71(extraout_var,bVar3) == 0) {
+    *param_6 = param_4 + 0x34;
+    local_38[0] = param_4 + 0x84;
+  }
+  else {
+    uVar5 = RtlULongAdd(0x34,param_4,param_6);
+    if (((int)uVar5 < 0) || (uVar5 = RtlULongAdd(0x50,*param_6,local_38), (int)uVar5 < 0)) {
+      return -0x3ffffbd5;
+    }
+  }
+  lVar6 = ExAllocatePoolWithTag(0x200,local_38[0],0x66426d53);
+  if (lVar6 == 0) {
     iVar4 = -0x3fffff66;
   }
   else {
-    puVar1 = (undefined4 *)(lVar5 + 0x50);
-    *(undefined8 *)(lVar5 + 0x7c) = param_1;
-    pUVar2 = (PUCHAR)(lVar5 + 0x84);
+    puVar1 = (undefined4 *)(lVar6 + 0x50);
+    *(undefined8 *)(lVar6 + 0x7c) = param_1;
+    pUVar2 = (PUCHAR)(lVar6 + 0x84);
     *puVar1 = 0x424d53fd;
-    *(ULONG *)(lVar5 + 0x74) = param_4;
-    *(undefined4 *)(lVar5 + 0x78) = 0x10000;
-    RtlCopyMdlToBuffer(param_3,0,pUVar2,param_4,&param_6);
+    *(ULONG *)(lVar6 + 0x74) = param_4;
+    *(undefined4 *)(lVar6 + 0x78) = 0x10000;
+    RtlCopyMdlToBuffer(param_3,0,pUVar2,param_4,local_30);
     iVar4 = SmbCryptoEncrypt(param_2,(longlong)puVar1,pUVar2,param_4,pUVar2);
     if (-1 < iVar4) {
-      lVar6 = IoAllocateMdl(puVar1,*piVar3,0,0,0);
-      *param_5 = lVar6;
-      if (lVar6 != 0) {
-        MmBuildMdlForNonPagedPool(lVar6);
+      lVar7 = IoAllocateMdl(puVar1,*param_6,0,0,0);
+      *param_5 = lVar7;
+      if (lVar7 != 0) {
+        MmBuildMdlForNonPagedPool(lVar7);
         *(ushort *)(*param_5 + 10) = *(ushort *)(*param_5 + 10) | 0x1000;
         return 0;
       }
       iVar4 = -0x3fffff66;
     }
-    ExFreePoolWithTag(lVar5,0x66426d53);
+    ExFreePoolWithTag(lVar6,0x66426d53);
   }
   *param_5 = 0;
-  *piVar3 = 0;
+  *param_6 = 0;
   return iVar4;
 }
```
So now the fun part begins, where we verify the results and dig deeper into the binary …
Case Study: Integer Wraparound and Heap-based Buffer Overflow
In the example above, the report highlighted a potential integer overflow/wraparound in `RxCeEncryptData` (SMBv3 with encryption) that later leads to a heap-based buffer overflow.
So let’s analyze the patch manually.
Looking at the older version of `mrxsmb.sys`, we quickly spotted the wraparound.
It originates from the fourth argument passed in by the caller:
Integer wraparound
Now, let’s take a look at this bug in action:
The image below illustrates a call to `RxCeEncryptData` during an SMBv3 encryption process, where a payload length of `0xFFFFFF80` is passed in the `R9` register.
Call to RxCeEncryptData
The integer wraparound occurs (`0xFFFFFF80` + `0x84` = `0x00000004`), and the resulting value is used as the second parameter (`RDX`) to `ExAllocatePoolWithTag`:
Memory allocation
`RtlCopyMdlToBuffer` then attempts to copy `0xFFFFFF80` bytes (`R9`) into the previously allocated small buffer:
RtlCopyMdlToBuffer
Finally, the expected page fault occurs.
System error
Blue Screen of Death (BSOD)
Great, our automated approach flagged the bug. In the newer version, we quickly identified the fix:
Fixed integer wraparound
Ultimately, the issue was fixed and we assume this was the root cause of CVE-2025-32718.
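The arithmetic behind the bug and its fix can be reproduced in a few lines of Python. This is only a model of the 32-bit behaviour: `add_u32_wrapping` and `add_u32_checked` are illustrative names, and returning `None` on overflow stands in for the error status that a checked helper such as `RtlULongAdd` reports. The constants `0x34`, `0x50`, and `0x84` are taken from the decompiled diff.

```python
from typing import Optional

ULONG_MAX = 0xFFFFFFFF


def add_u32_wrapping(a: int, b: int) -> int:
    """Unchecked 32-bit addition: the result silently wraps modulo 2**32."""
    return (a + b) & ULONG_MAX


def add_u32_checked(a: int, b: int) -> Optional[int]:
    """Checked 32-bit addition: None signals that the sum does not fit."""
    total = a + b
    return total if total <= ULONG_MAX else None


payload_len = 0xFFFFFF80

# Old code path: the allocation size wraps to a tiny value.
print(hex(add_u32_wrapping(payload_len, 0x84)))  # 0x4

# Patched code path: two checked additions, as in the new RxCeEncryptData.
header_len = add_u32_checked(0x34, payload_len)
print(hex(header_len))  # 0xffffffb4, still fits in 32 bits
alloc_len = add_u32_checked(0x50, header_len)
print(alloc_len)  # None: the overflow is detected before allocating
```

Note that with this payload length the first addition still fits. It is the second one, which yields the allocation size, that overflows and makes the patched function bail out.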
Caveats
While successfully identifying fixed bugs, particularly memory-related ones, LLMs also tend to hallucinate and generate false positives. This occurs especially when the model attempts to detect new vulnerabilities that may have been introduced by the patch. Although this is a common limitation of AI-based static code analysis, it becomes even more pronounced when the submitted code lacks sufficient context.
Another issue is the reliability of decompiled output. The tool consumes Ghidra pseudocode, which can be inaccurate. Common pitfalls include wrong return values, signed versus unsigned mismatches that flip comparisons, and misinterpreted structures or bitfields.
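To illustrate the signed versus unsigned pitfall, the sketch below interprets the same 32-bit pattern both ways. The helper name `as_int32` and the bound `0x1000` are made up for the example; the point is only that a length check gives opposite answers depending on which type the decompiler assigned.

```python
import struct


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed two's-complement integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


length = 0xFFFFFF80
limit = 0x1000  # hypothetical upper bound for a length check

# Unsigned view: the length is huge and the check rejects it.
print(length <= limit)  # False

# Signed view: the same bits read as -128, so the check passes.
print(as_int32(length) <= limit)  # True
```

If the pseudocode shows `int` where the binary actually performs an unsigned comparison (or the other way round), both a human reader and an LLM can draw the wrong conclusion about whether a bounds check is effective.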
Regarding model selection, we observed significantly better results, including more detail, clearer explanations with code references, and fewer false positives, when using advanced reasoning models such as o3, GPT-5 (Thinking) or Opus 4.1.
Conclusion
We successfully employed an automated patch-diffing approach in combination with an AI-driven workflow to identify silent fixes and known vulnerabilities or bugs.
At the time of publication, this method had been used on 32 Windows driver targets across two Patch Tuesdays. Apart from various false positives and occasional hallucinations, the tool did not uncover any previously unknown vulnerabilities introduced by the patches themselves.
This once again underscores that relying solely on LLMs without proper validation has its limits.
Nevertheless, achieving this objective seems well within reach in the near future. With growing experience and a broader set of results, we will be able to fine-tune prompts, supply additional code context where needed, and potentially implement dedicated validation mechanisms.
Check out diffalayze and share your feedback with us!