NSF Award: AICyberLake - Phuong Cao @ NCSA UIUC

Illustration derived from the Art of HPC exhibit in ACM/IEEE Supercomputing '24.

Overview

The AICyberLake project curates a security data lake by sourcing cyberattacks from the DeltaAI system at NCSA and its peer supercomputing centers. The data includes Zeek network cryptographic metadata, graphics processing unit (GPU) interconnect vulnerabilities, and ground truth incident reports. The resulting data lake provides a real-time, anonymized stream of attack attempts to vetted research teams for evaluating their agentic AI-based detection models against unseen adversaries.

The data lake is in development and is expected to become generally available in the second half of 2026. An interest form is available at https://go.illinois.edu/aicyberlake-interest-form. Please fill out the form or contact us if you have questions about accessing or using our resources. Instructions for accessing the data lake will follow soon.

Welcome to the official website for the "Live Evaluations of Real-World Security Data Lake from National Cyberinfrastructure" project, also known as AICyberLake. This project is funded by the National Science Foundation (NSF)'s Cybersecurity Innovation for Cyberinfrastructure (CICI) program and supported by the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign (UIUC).

Artificial Intelligence (AI)-driven cyberattack detection is essential for safeguarding the U.S. supercomputing infrastructure. Research in AI relies on this national supercomputing infrastructure, but this critical resource is vulnerable to cyberattacks. Securing this infrastructure requires an extensive understanding of historical security incidents, providing a longitudinal perspective on trends, seasonality, and the evolution of cyberattacks. Without this historical context and insight into emerging AI workloads, the research community is left to react to threats rather than preempt them. Such future threats include AI-driven malware, quantum-resistant vulnerabilities, and machine learning model supply chain backdoor attacks, all of which leave scientific breakthroughs vulnerable.

Mailing list

https://lists.illinois.edu/lists/info/aicyberlake

Objectives

The AICyberLake project aims to:

Reinforce public trust in running AI workloads within cyberinfrastructure.
Provide traces of existing and potentially novel attacks.
Educate the next generation of the cybersecurity AI research workforce.

The AICyberLake team will work with research teams to analyze attacks targeting U.S. supercomputing infrastructure. The team will also provide an application programming interface (API) that informs the broader community and contributes attack metadata to policymakers such as the National Institute of Standards and Technology (NIST).

People

Phuong Cao

Role: Principal Investigator (PI)

Affiliation: National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Email: pcao3@illinois.edu

Ravishankar Iyer

Role: Co-Principal Investigator (Co-PI)

Affiliation: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign

Data and Resources

We are committed to open science and will make our data and resources publicly available where appropriate and feasible. The security data lake curated by this project will provide a real-time, anonymized stream of attack attempts to vetted research teams. Only data related to open-science systems, e.g., NSF-funded, will be available for research.
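The stream format of the data lake has not been published yet. As a minimal sketch only, the Python example below assumes that records resemble Zeek's standard tab-separated `ssl.log` layout and tallies the TLS versions observed. The sample log content, the function names `parse_zeek_tsv` and `tls_version_counts`, and the chosen fields are illustrative assumptions, not part of the AICyberLake interface.

```python
from collections import Counter

# Hypothetical sample in Zeek's tab-separated log layout.
# All values are made up; addresses come from documentation-only ranges.
SAMPLE_SSL_LOG = "\n".join([
    "#separator \\x09",
    "#path\tssl",
    "#fields\tts\tuid\tid.orig_h\tid.resp_h\tversion\tcipher",
    "1700000000.0\tC1\t192.0.2.10\t198.51.100.5\tTLSv13\tTLS_AES_128_GCM_SHA256",
    "1700000001.0\tC2\t192.0.2.11\t198.51.100.5\tTLSv12\tTLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "1700000002.0\tC3\t192.0.2.12\t198.51.100.6\tTLSv13\tTLS_AES_256_GCM_SHA384",
    "#close\t2024-01-01-00-00-00",
])


def parse_zeek_tsv(text):
    """Parse a Zeek TSV log into a list of dicts keyed by the #fields header."""
    fields = []
    records = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            # The first token is the "#fields" marker itself; the rest are column names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Data row: pair each value with its column name.
            records.append(dict(zip(fields, line.split("\t"))))
    return records


def tls_version_counts(records):
    """Count how often each TLS version appears; '-' marks a missing value."""
    return Counter(r.get("version", "-") for r in records)


if __name__ == "__main__":
    records = parse_zeek_tsv(SAMPLE_SSL_LOG)
    for version, count in tls_version_counts(records).most_common():
        print(version, count)
```

A research team could adapt a parser of this kind to whatever schema the data lake eventually documents.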

Accessing Data Lake

An interest form is available at https://go.illinois.edu/aicyberlake-interest-form. Please fill out the form or contact us if you have questions about accessing or using our resources. Instructions for accessing the data lake will follow soon.

Documents Needed for Accessing the Data Lake

We want to make onboarding onto our data lake as smooth as possible. To do this, we balance usability and security. A team requesting access to the data lake should prepare the following documents for both the team lead (PI) and team members.
1. (REQUIRED) NSF Biographical Sketch, certified using SciENcv (see PAPPG Chapter II.D.2.h(i)). This requirement is waived for PI/Senior Personnel with at least one active NSF award.
2. (REQUIRED) Core IRB Training, Privacy and Confidentiality – SBE (ID 505)
3. (OPTIONAL) Research Security training

Data Catalog

To be described.

Computing Resources

While AICyberLake provides security data, accelerated computing resources (GPUs) can be requested through other programs such as the following.

Program/Resource	Description	Link
NCSA Jupyter	Provides no-cost NVIDIA A100 GPUs for Illinois researchers	https://jupyter.ncsa.illinois.edu
SPHERE	Security and Privacy Heterogeneous Environment for Reproducible Experimentation	https://sphere-project.net/
FABRIC testbed	Infrastructure to explore impactful new ideas that are impossible or impractical with the current Internet	https://portal.fabric-testbed.net/
Chameleon Cloud	Accelerated computing resources	https://www.chameleoncloud.org/
National Research Platform	Accelerated computing resources	https://nrp.ai/
NAIRR Pilot	Accelerated computing resources	https://nairrpilot.org/
NSF ACCESS	Accelerated computing resources	https://access-ci.org/
DOE INCITE	Accelerated computing resources	https://doeleadershipcomputing.org/
NERSC	Accelerated computing resources	https://www.nersc.gov/

News and Events

Stay updated on the latest news and events related to the AICyberLake project:

August 1, 2025: Project officially begins!
More news and events will be posted here soon.

Publications

A list of publications resulting from this project will be posted here as they become available. Please check back for updates.

Related Publications

Sample related publications that use resiliency data (broadly, security and reliability data) from NCSA and its partners are listed below.

Authors	Title	Year	Full Citation
Cui, Shengkun, Archit Patke, Ziheng Chen, Aditya Ranjan, Hung Nguyen, Phuong Cao, Brett Bode et al.	Characterizing Modern GPU Resilience and Impact in HPC Systems: A Case Study of A100 GPUs.	2025	Cui, Shengkun, Archit Patke, Ziheng Chen, Aditya Ranjan, Hung Nguyen, Phuong Cao, Brett Bode et al. "Characterizing Modern GPU Resilience and Impact in HPC Systems: A Case Study of A100 GPUs." In 2025 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), pp. 1-6. IEEE, 2025.
Sowa, Jakub, Bach Hoang, Advaith Yeluru, Steven Qie, Anita Nikolich, Ravishankar Iyer, and Phuong Cao.	Post-quantum cryptography (pqc) network instrument: Measuring pqc adoption rates and identifying migration pathways.	2024	Sowa, Jakub, Bach Hoang, Advaith Yeluru, Steven Qie, Anita Nikolich, Ravishankar Iyer, and Phuong Cao. "Post-quantum cryptography (pqc) network instrument: Measuring pqc adoption rates and identifying migration pathways." In 2024 IEEE International Conference on Quantum Computing and Engineering (QCE), vol. 1, pp. 1835-1846. IEEE, 2024.
Yang, Limin, Zhi Chen, Chenkai Wang, Zhenning Zhang, Sushruth Booma, Phuong Cao, Constantin Adam et al.	True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center.	2024	Yang, Limin, Zhi Chen, Chenkai Wang, Zhenning Zhang, Sushruth Booma, Phuong Cao, Constantin Adam et al. "True attacks, attack attempts, or benign triggers? an empirical measurement of network alerts in a security operations center." In 33rd USENIX Security Symposium (USENIX Security 24), pp. 1525-1542. 2024.
Tay, Vanessa, Xinran Li, Daisuke Mashima, Bennet Ng, Phuong Cao, Zbigniew Kalbarczyk, and Ravishankar K. Iyer.	Taxonomy of fingerprinting techniques for evaluation of smart grid honeypot realism.	2023	Tay, Vanessa, Xinran Li, Daisuke Mashima, Bennet Ng, Phuong Cao, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. "Taxonomy of fingerprinting techniques for evaluation of smart grid honeypot realism." In 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), pp. 1-7. IEEE, 2023.
Chung, Keywhan, Phuong Cao, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer.	StealthML: data-driven malware for stealthy data exfiltration.	2023	Chung, Keywhan, Phuong Cao, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer. "StealthML: data-driven malware for stealthy data exfiltration." In 2023 IEEE International Conference on Cyber Security and Resilience (CSR), pp. 16-21. IEEE, 2023.
Basney, Jim, Phuong Cao, and Terry Fleury.	Investigating root causes of authentication failures using a saml and oidc observatory.	2020	Basney, Jim, Phuong Cao, and Terry Fleury. "Investigating root causes of authentication failures using a saml and oidc observatory." In 2020 IEEE 6th International Conference on Dependability in Sensor, Cloud and Big Data Systems and Application (DependSys), pp. 119-126. IEEE, 2020.
Cao, Phuong M., Yuming Wu, Subho S. Banerjee, Justin Azoff, Alex Withers, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer.	CAUDIT: Continuous auditing of SSH servers to mitigate Brute-Force attacks.	2019	Cao, Phuong M., Yuming Wu, Subho S. Banerjee, Justin Azoff, Alex Withers, Zbigniew T. Kalbarczyk, and Ravishankar K. Iyer. "CAUDIT: Continuous auditing of SSH servers to mitigate Brute-Force attacks." In 16th USENIX symposium on networked systems design and implementation (NSDI 19), pp. 667-682. 2019.

Contact Us

Mailing list

https://lists.illinois.edu/lists/info/aicyberlake

Interest form

https://go.illinois.edu/aicyberlake-interest-form

For general inquiries about the project, please contact:

Principal Investigator: Phuong Cao - pcao3@illinois.edu

Recipient Sponsored Research Office:
University of Illinois at Urbana-Champaign
506 S WRIGHT ST
URBANA, IL US 61801-3620
Phone: (217) 333-2187

Partners

We are partnering with SPHERE, FABRIC testbed, SDSC, and NIST.

This material is based upon work supported by the National Science Foundation under Grant No. 2530738.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Additional support for this project is provided by the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign.