The Lucy Family Institute for Data & Society at the University of Notre Dame is a multidisciplinary research center focused on advancing data science and artificial intelligence for social good. Through the iTREDS (Interdisciplinary Training and Research in Ethical Data Science) program, we integrate technical expertise in data science with ethical, social, and policy considerations. Our capabilities include applied data science, machine learning, natural language processing, data visualization, and responsible AI. We specialize in translating complex data into actionable insights while ensuring that solutions are equitable, transparent, and socially responsible. The program emphasizes experiential learning through industry partnerships, research, and applied projects.
EmpowerHER: Community-Driven Breast Cancer Digital Health Platform
Problem: Women in Northeastern Indiana face significant structural and informational barriers to breast cancer awareness, early detection, and care navigation.
Scope: Led an interdisciplinary, community-engaged project that evolved from a localized mobile app concept into a scalable, web-based platform. The project incorporated surveys, focus groups, and qualitative analysis to ensure culturally responsive design and accessibility.
Deliverables: Developed a research-informed web prototype with integrated chatbot functionality, providing reliable educational resources, screening guidance, and health system navigation support. The solution emphasized ethical AI and public health integration to improve accessibility and community impact.
Probe-Guided Machine Unlearning for Responsible AI Systems
Problem: Organizations face increasing regulatory and ethical pressure (e.g., GDPR “right to be forgotten”) to remove harmful or sensitive data from AI systems, while existing methods remain vulnerable to adversarial attacks.
Scope: Led the development of a novel framework addressing machine unlearning challenges by defining the technical approach, aligning it with real-world compliance needs, and ensuring robustness against adversarial threats.
Deliverables: Designed the PARP (Probe-Guided Adversarial Robust Probing) framework, which uses adversarially trained probes to identify and neutralize harmful data representations within models. Delivered a scalable, cost-effective solution that enhances privacy, improves AI safety, and reduces the need for full model retraining.
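The PARP framework itself is not described in implementation detail here; as an illustrative sketch of the general probe-guided idea, one can train a linear probe to detect a sensitive attribute in a model's hidden states and then project that direction out of the representations. All names and the synthetic data below are hypothetical, not part of PARP.

```python
import numpy as np

def fit_linear_probe(H, y):
    """Least-squares linear probe mapping hidden states H (n, d) to labels y (n,)."""
    H1 = np.hstack([H, np.ones((H.shape[0], 1))])  # append a bias column
    w, *_ = np.linalg.lstsq(H1, y, rcond=None)
    return w[:-1]  # probe direction (bias dropped)

def erase_direction(H, w):
    """Remove the component of each hidden state along the probe direction."""
    u = w / np.linalg.norm(w)
    return H - np.outer(H @ u, u)

# Synthetic demo: a "sensitive" attribute linearly encoded in random states.
rng = np.random.default_rng(0)
d = 16
concept = rng.normal(size=d)
H = rng.normal(size=(200, d))
y = (H @ concept > 0).astype(float)

w = fit_linear_probe(H, y)
H_clean = erase_direction(H, w)
# After erasure, the probe direction carries essentially no signal:
print(np.abs(H_clean @ (w / np.linalg.norm(w))).max())
```

An adversarially robust variant would iterate this loop, retraining probes against the cleaned representations until no probe recovers the attribute.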
Benchmarking Large Language Models as Coding Assistants
Problem: While LLMs are increasingly used as coding assistants, existing benchmarks are limited: they focus primarily on correctness in narrow settings (e.g., Python, LeetCode-style problems) and overlook code efficiency and real-world applicability.
Scope: Led the design of a comprehensive benchmarking framework that evaluates LLM performance on real-world coding tasks across multiple programming languages. Defined evaluation metrics emphasizing code efficiency, scalability, and practical development use cases.
Deliverables: Developed a scalable benchmarking tool using GitHub-sourced coding problems, enabling comparative analysis of LLMs across diverse environments. Delivered a framework that improves model transparency and helps industry partners identify the most effective models for specific development tasks.
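The benchmarking tool described above is an internal deliverable; a minimal sketch of the core harness idea, scoring candidate implementations on both correctness and runtime over shared test cases, might look like the following. The two "model outputs" are hypothetical stand-ins, not real LLM generations.

```python
import time

def run_benchmark(solutions, test_cases):
    """Score candidate solutions on pass rate and wall-clock runtime."""
    results = {}
    for name, fn in solutions.items():
        passed = 0
        start = time.perf_counter()
        for args, expected in test_cases:
            try:
                if fn(*args) == expected:
                    passed += 1
            except Exception:
                pass  # a crashing candidate simply fails that case
        elapsed = time.perf_counter() - start
        results[name] = {"pass_rate": passed / len(test_cases),
                         "seconds": elapsed}
    return results

# Hypothetical candidate implementations of the same task.
def sum_linear(n):
    return sum(range(n + 1))

def sum_closed(n):
    return n * (n + 1) // 2

cases = [((10,), 55), ((100,), 5050), ((10**5,), 10**5 * (10**5 + 1) // 2)]
print(run_benchmark({"model_a": sum_linear, "model_b": sum_closed}, cases))
```

A production framework would additionally sandbox execution and source its problems and test suites from real repositories, as the GitHub-based tool above does.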