Kunal Mukherjee

Kunal Mukherjee, Ph.D

I am a Postdoctoral Research Associate in the Department of Computer Science at Virginia Tech (Blacksburg, VA), working closely with Dr. Murat Kantarcıoğlu. I received my M.S. and Ph.D. in Computer Science from The University of Texas at Dallas.

🔍 My current research focuses on the adversarial robustness of graph learning and the governed, forensic use of LLMs in system provenance and security operations. I develop agentic, retrieval-augmented pipelines that transform threat reports into actionable signals, and I study how to make GNNs and LLMs trustworthy under real-world attacks and constraints.

Agentic RAG for Threat Intel: Designed a retrieval-augmented pipeline where an LLM-based agent parses threat reports and extracts indicators of compromise (IOCs) and TTPs for downstream detection and hunt workflows.
Adversarial Attacks on GNNs: Building realistic attack generators and evaluation suites for domains beyond system logs, including blockchain and social networks, to stress-test link/node classification and detection tasks.
LLM-Guided Forensics & Governance: Developing methods for LLM-assisted provenance forensics (triage, explanation, evidence tracing) alongside policy and guardrails for reliable, auditable agent behavior in security settings.
Provenance-Centered Detection: Advancing PIDS pipelines that fuse graph-structured system activity with LLM reasoning and GNN inference for robust anomaly detection and interpretable investigations.

Prior work (Ph.D. thesis): I expanded the scope of provenance-based intrusion detection systems (PIDS) to IoT settings, validated their robustness via an adversarial attack generation framework, and improved explainability for GNN-based PIDS. I also collaborated with industry partners (e.g., A*STAR Institute for Infocomm Research, Acronis, Inc., and Guardora) and applied GNNs to increase relevance and diversity in recommendation systems, including building an explanation framework for link prediction during my time at Zillow Group.

I am currently on the academic job market for Fall '26 and am also open to applied scientist or researcher positions. Please contact me (kunmukh at gmail.com) if you would like to discuss potential collaborations. Please find my research statement, teaching statement, and community statement.

Security Research

Degree: Doctorate
City: Blacksburg, Virginia
Phone: +1 812-550-3890

Email: [email protected]
Website: www.kunmukh.com
Specialty: Graph Learning, Graph Privacy, Explainable AI, Malware and APT Threat Analysis

Research Interests

Anomaly Detection and Malware Classification
Explainable ML
Adversarial ML
Differential Privacy

Security and Governance in LLMs
Recommendation Systems
Graph Learning: Data Mining and Pattern Recognition
System/IoT/Network Security

Projects

Realizable Adversarial System Action Generation Framework

Automated, data-driven framework that generates adversarial real-world attacks capable of evading ML-based IDS, [USENIX '23].

LLM-driven Forensics and Intrusion Detection Agents

Built an LLM-based agent that mines and interprets threat reports, then queries provenance data sources to investigate reported attacks, [arXiv '25, Demo].

Adversarial Robustness of Graph Neural Networks

Designing realistic adversarial attacks on GNNs across domains (e.g., blockchain, social media, and system provenance) to evaluate and strengthen robustness of graph learning models.

Explainability Framework for GNN-based Intrusion Detectors

Ground truth–aware explanation framework for enhancing the interpretability of GNN-based IDS, [KDD '25, arXiv '23].

Privacy-Preserving Federated Intrusion Detection for IoT

Federated IDS framework customized for IoT-specific constraints, integrating differential privacy to detect evasive attacks effectively, [ACNS '24].

Differential Privacy for System Provenance Graphs

Developed a differential privacy framework for heterogeneous provenance graphs, balancing privacy guarantees with detection accuracy, [ACNS '25].

Dissertation

IoT Integration, Adversarial Attacks, and Threat Explanations in Provenance-Based Intrusion Detection Systems

Kunal Mukherjee.

UTD Press. May, 2025.

System provenance analysis has become the predominant approach for defending against sophisticated attackers. System provenance analysis captures causal and informational flow dependencies by correlating telemetry data across key system resources such as processes, files, and network sockets. These dependencies are efficiently represented as system provenance graphs, which are directed, heterogeneous, and multi-attributed. These system provenance graphs can be used by Provenance-based Intrusion Detection Systems (PIDSs) to train adaptive behavioral Machine Learning (ML) models for intrusion detection tasks. PIDSs can effectively thwart Advanced Persistent Threat (APT) actors and Fileless Malware writers since they can measure the program behavioral deviations. Graph Neural Networks (GNNs) are the de facto standard for learning from graphs. Consequently, GNN-based PIDS can detect zero-day and mimicry attacks by measuring deviations in program behavior.

Despite their undeniable advantages, modern PIDSs still face several open problems: (1) current system provenance analysis techniques are designed primarily for resource-rich environments, leaving IoT ecosystems vulnerable; (2) the resilience of PIDS against dedicated adversaries has not been fully examined; (3) GNN-based PIDS operate as black-box models, lacking transparency in their detection decisions.

This dissertation addresses these three key challenges in system provenance analysis: extending provenance analysis to IoT environments, improving robustness against adversarial attacks, and enhancing the explainability of GNN-based PIDS.

First, we introduce ProvIoT, a federated edge-cloud security framework that brings PIDSs to resource-constrained IoT devices. ProvIoT leverages federated learning to minimize network and computational overhead while maintaining high accuracy in detecting stealthy attacks, even in diverse real-world environments.

Next, we present ProvNinja, an adversarial testing framework designed to evaluate the robustness of PIDSs against realistic evasive attacks. ProvNinja generates adversarial attack variants that closely mimic benign system behaviors, allowing it to effectively test the resilience of State-of-The-Art (SOTA) PIDSs. Our experiments reveal vulnerabilities in current security models, leading to reduced detection rates in realistic attack scenarios.

Finally, we develop ProvExplainer, an explainability framework for GNN-based PIDSs to provide interpretable, security-focused explanations. ProvExplainer projects the GNN’s decision boundaries onto the interpretable surrogate model’s feature space (e.g., discriminative subgraph patterns). By integrating with SOTA GNN explainers, ProvExplainer improves both precision and recall in explaining detections of stealthy attacks (i.e., APT campaigns and Fileless malware), offering a transparent and verifiable tool for security operations.

Together, these contributions offer scalable, robust, and explainable security solutions for increasingly interconnected and vulnerable digital infrastructure.

Paper Slides

Publications

Evading Provenance-Based ML Detectors with Adversarial System Actions

Kunal Mukherjee, Joshua Wiedemeier, Tianhao Wang, James Wei, Feng Chen, Muhyun Kim, Murat Kantarcioglu, and Kangkook Jee.

In Proceedings of USENIX Security. Aug, 2023.

Artifacts evaluated and badges awarded: Available, Functional, Reproducible.

Paper Slides Artifacts Code

ProvIoT: Detecting Stealthy Attacks in IoT through Federated Edge-Cloud Security

Kunal Mukherjee, Joshua Wiedemeier, Qi Wang, Junpei Kamimura, John Junghwan Rhee, James Wei, Zhichun Li, Xiao Yu, Lu-An Tang, Jiaping Gui, Kangkook Jee.

In Proceedings of the 22nd International Conference on Applied Cryptography and Network Security. March, 2024.

Internet of Things (IoT) devices have increased drastically in complexity and prevalence within the last decade. Alongside the proliferation of IoT devices and applications, attacks targeting them have gained popularity. Recent large-scale attacks such as Mirai and VPNFilter highlight the lack of comprehensive defenses for IoT devices. Existing security solutions are inadequate against skilled adversaries with sophisticated and stealthy attacks against IoT devices. Powerful provenance-based intrusion detection systems have been successfully deployed in resource-rich servers and desktops to identify advanced stealthy attacks. However, IoT devices lack the memory, storage, and computing resources to directly apply these provenance analysis techniques on the device.

This paper presents ProvIoT, a novel federated edge-cloud security framework that enables on-device syscall-level behavioral anomaly detection in IoT devices. ProvIoT applies federated learning techniques to overcome data and privacy limitations while minimizing network overhead. Infrequent on-device training of the local model requires less than 10% CPU overhead; syncing with the global models requires sending and receiving ∼2MB over the network. During normal offline operation, ProvIoT periodically incurs less than 10% CPU overhead and less than 65MB memory usage for data summarization and anomaly detection. Our evaluation using heterogeneous real-world IoT applications shows that ProvIoT detects fileless malware and stealthy APT attacks with an average F1 score of 0.97, confirming its effectiveness. ProvIoT is a step towards extending provenance analysis to resource-constrained IoT devices, beginning with well-resourced IoT devices such as the Raspberry Pi, Jetson Nano, and Google TPU.
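The edge-cloud synchronization step can be illustrated with a minimal sketch of federated averaging. This is an assumption for illustration only: the abstract does not state which aggregation rule ProvIoT uses, and the function name `federated_average`, the `Weights` type, and the layer name `"w"` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: layer name -> flat list of parameters.
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine client models into a global model.

    Each update is (local_weights, num_local_samples). Clients with more
    local data contribute proportionally more to the global model.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    merged: Weights = {name: [0.0] * len(vals) for name, vals in first_weights.items()}
    for weights, n in updates:
        for name, vals in weights.items():
            for i, v in enumerate(vals):
                merged[name][i] += v * n / total
    return merged


if __name__ == "__main__":
    # Two hypothetical devices; the second has three times as much local data.
    device_a = ({"w": [1.0, 2.0]}, 1)
    device_b = ({"w": [3.0, 4.0]}, 3)
    print(federated_average([device_a, device_b]))
```

Only model parameters and sample counts leave each device in this sketch, which is the property that lets a federated design limit both network traffic and exposure of raw on-device telemetry.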

Paper Slides Code

ProvDP: Differential Privacy for Provenance Dataset

Kunal Mukherjee, Jonathan Yu, Partha De, Dinil Mon Divakaran

In Proceedings of the 23rd International Conference on Applied Cryptography and Network Security. June, 2025.

Paper Slides Code

Z-REx: GNN-based Recommendation Explanation using Human-interpretable Language

Kunal Mukherjee, Zachary Harrison, Saeid Balaneshin

(Oral) KDD Workshop on ML on Graphs in the Era of Generative AI (MLoG-GenAI@KDD). August, 2025.

Paper Slides Poster

Robust Explanation of GNN-based IDS for System Provenance with Graph Structural Features

Kunal Mukherjee, Joshua Wiedemeier, Tianhao Wang, Muhyun Kim, Feng Chen, Murat Kantarcioglu, Kangkook Jee.

arXiv. Jun, 2023. (submitted to NDSS 2026).

Advanced cyber threats (e.g., Fileless Malware and Advanced Persistent Threat (APT)) have driven the adoption of provenance-based security solutions. These solutions employ Machine Learning (ML) models for behavioral modeling and critical security tasks such as malware and anomaly detection. However, the opacity of ML-based security models limits their broader adoption, as the lack of transparency in their decision-making processes restricts explainability and verifiability. We tailored our solution towards Graph Neural Network (GNN)-based security solutions since recent studies employ GNNs to comprehensively digest system provenance graphs for security critical tasks.

To enhance the explainability of GNN-based security models, we introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model. PROVEXPLAINER’s interpretable feature space consists of discriminant subgraph patterns and graph structural features which can be directly mapped to the system provenance problem space, making the explanations human understandable. Considering both the subgraph patterns and graph structural features gives PROVEXPLAINER the unique advantage of providing explanations that are sensitive to both local and global contexts.

By considering prominent GNN architectures (e.g., GAT and GraphSAGE) for anomaly detection tasks, we show how PROVEXPLAINER outperforms the current state-of-the-art (SOTA) GNN explainers to deliver security domain and instance-specific explanations. We measure the explanation quality using the fidelity+/fidelity− metrics as used by traditional GNN explanation literature, and we incorporate precision/recall metrics that measure the accuracy of the explanation against the ground truth. On real-world Fileless Malware and APT datasets, PROVEXPLAINER achieves up to 29% / 27% / 25% higher fidelity+, precision and recall (where higher values are better), and 12% lower fidelity− (where lower values are better) when compared against SOTA GNN explainers.
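The metrics named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's evaluation code: it uses one common label-change definition of fidelity from the GNN explanation literature, treats an explanation as a set of edges, and the function names `explanation_precision_recall` and `fidelity` are hypothetical.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def explanation_precision_recall(
    explained: Iterable[Edge], ground_truth: Iterable[Edge]
) -> Tuple[float, float]:
    """Compare edges flagged by an explainer with ground-truth attack edges."""
    explained_set = set(explained)
    truth_set = set(ground_truth)
    hits = len(explained_set & truth_set)
    precision = hits / len(explained_set) if explained_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    return precision, recall


def fidelity(original_preds: Sequence[int], masked_preds: Sequence[int]) -> float:
    """Fraction of graphs whose predicted label changes after masking.

    fidelity+: masked_preds come from graphs with the explanation REMOVED
               (a good explanation changes many predictions, so higher is better).
    fidelity-: masked_preds come from graphs reduced to ONLY the explanation
               (a good explanation preserves predictions, so lower is better).
    """
    if len(original_preds) != len(masked_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("need at least one prediction")
    changed = sum(1 for o, m in zip(original_preds, masked_preds) if o != m)
    return changed / len(original_preds)


if __name__ == "__main__":
    # Hypothetical provenance edges (process -> resource).
    flagged = [("a", "b"), ("b", "c"), ("c", "d")]
    truth = [("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")]
    print(explanation_precision_recall(flagged, truth))
    print(fidelity([1, 1, 0, 0], [0, 1, 0, 1]))
```

Fidelity only measures agreement with the model's own behavior, which is why a ground-truth comparison such as precision/recall is a useful complement when labeled attack traces are available.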

Paper

ProvSEEK: LLM-Powered Threat Intelligence Extraction and Correlation Framework

Kunal Mukherjee, Murat Kantarcioglu.

arXiv. Sept, 2025. (submitted to USENIX 2026).

System provenance provides a rich forensic trail for analyzing stealthy cyberattacks such as Advanced Persistent Threat (APT) campaigns and fileless malware. However, traditional detection pipelines rely heavily on recognized patterns and lack end-to-end automated mechanisms to integrate external intelligence or reason iteratively about complex attack traces. Therefore, analysts are required to craft ad-hoc queries, correlate disparate evidence, and iteratively reconstruct attack narratives. These approaches often suffer from scalability bottlenecks, limited integration of external threat intelligence, and a lack of automated reasoning support for complex, multi-stage attack campaigns.

We introduce PROVSEEK, an LLM-powered agentic framework for automated provenance-driven forensic analysis and threat intelligence extraction. PROVSEEK employs specialized toolchains to dynamically retrieve relevant context by generating precise, context-aware queries that fuse a vectorized threat report knowledge base with data from system provenance databases. The framework resolves provenance queries, orchestrates multiple role-specific agents to mitigate hallucinations, and synthesizes structured, ground-truth verifiable forensic summaries. By combining agent orchestration with Retrieval-Augmented Generation (RAG) and chain-of-thought (CoT) reasoning, PROVSEEK enables adaptive multi-step analysis that iteratively refines hypotheses, verifies supporting evidence, and produces scalable, interpretable forensic explanations of attack behaviors. By combining provenance data with agentic reasoning, PROVSEEK establishes a new paradigm for grounded agentic forensics to investigate APTs.

We conduct a comprehensive evaluation on publicly available DARPA datasets, demonstrating that PROVSEEK outperforms retrieval-based methods for the intelligence extraction task, achieving a 34% improvement in contextual precision/recall; and for the threat detection task, PROVSEEK achieves 22%/29% higher precision/recall compared to both a baseline agentic AI approach and State-Of-The-Art (SOTA) Provenance-based Intrusion Detection System (PIDS).

Paper Code Slides Demo

ProvCreator: Synthesizing Complex Heterogeneous Graphs with Node and Edge Attributes

Tianhao Wang, Simon Klancher, Kunal Mukherjee, Joshua Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee.

arXiv. Jul, 2025. (submitted to NeurIPS 2025).

Paper Code

Resume

Summary

Kunal Mukherjee

Postdoctoral Research Associate at Virginia Tech working with Dr. Murat Kantarcıoğlu on adversarial robustness of Graph Neural Networks (GNNs) and agentic AI for system provenance forensics. His research bridges graph learning and large language models (LLMs) to develop scalable, interpretable, and privacy-preserving solutions for cybersecurity. He designs frameworks that generate realistic adversarial attacks on GNNs, builds RAG-based agent pipelines for automated threat intelligence extraction, and explores LLM-guided forensic analysis and governance for trustworthy adoption of AI in security operations.

His broader expertise spans Adversarial ML, Explainable ML, Anomaly Detection, and Privacy-preserving Generative AI, with a strong record of publishing at top-tier venues (e.g., USENIX Security, ACNS, KDD). He has also collaborated with industry leaders (e.g., Zillow Group) to apply GNNs to recommendation systems, resulting in impactful research and a patent filing.

Education

Doctorate and M.S., Computer Science

Aug 2019 - May 2025

University of Texas at Dallas, Dallas, TX

Dissertation: IoT Integration, Adversarial Attacks, and Threat Explanations in Provenance-Based Intrusion Detection Systems
Advisors: Dr. Kangkook Jee and Dr. Murat Kantarcioglu
Qualification Exams: Machine Learning, Algorithms, and Database

Bachelor of Science, Computer Engineering

Aug 2016 - Jun 2019

University of Evansville, Evansville, IN

Senior Thesis: Location Dependent Cryptosystem
Advisors: the late Dr. Dick Blandford and Dr. Donald Roberts
Minors: Computer Science and Engineering Management

Professional Experience

Postdoctoral Research Associate

08/2025 - Present

Department of Computer Science, Virginia Tech, Blacksburg, VA

Conducting research under Dr. Murat Kantarcıoğlu on adversarial robustness of GNNs and LLM adoption in system provenance.
Implementing realistic adversarial attacks on GNNs across domains such as blockchain and social networks to evaluate detection robustness.
Exploring LLM-guided forensics and AI governance for reliable adoption of agentic AI in cybersecurity investigations.

Applied Scientist Intern

05/2024 - 12/2024

Zillow Group, Inc., Dallas, TX

Designed a GNN-based recommendation system, yielding a 40x increase in nDCG and a 60x boost in diversity.
Engineered a novel explainability framework for recommendations to improve transparency and accountability.
Work resulted in an oral research paper at KDD '25 (MLoG-GenAI workshop) and a patent application.

Hardware Research Intern

05/2017 - 07/2019

Ciholas, Inc., Evansville, IN

Designed a proprietary quaternion-based sensor fusion model to accurately extrapolate device orientation.
Deployed in production after extensive regression testing, generating $5M in revenue.
