Data Skeptic

40 Episodes
Subscribe

By: Kyle Polich

The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Unveiling Graph Datasets
Yesterday at 11:15 PM


Network Manipulation
04/30/2025

In this episode we talk with Manita Pote, a PhD student at Indiana University Bloomington, specializing in online trust and safety, with a focus on detecting coordinated manipulation campaigns on social media. 

Key insights include how coordinated reply attacks target influential figures like journalists and politicians, how machine learning models can detect these inauthentic campaigns using structural and behavioral features, and how deletion patterns reveal efforts to evade moderation or manipulate engagement metrics.

Follow our guest

X/Twitter
Google Scholar

Papers in focus
Coordinated Reply Attacks in Influence Operations: C...


The Small World Hypothesis
04/21/2025

Kyle discusses the history and proof for the small world hypothesis.


Thinking in Networks
04/12/2025

Kyle asks Asaf questions about the new network science course he is now teaching.  The conversation delves into topics such as contact tracing, tools for analyzing networks, example use cases, and the importance of thinking in networks.


Fraud Networks
04/01/2025

In this episode we talk with Bavo DC Campo, a data scientist and statistician, who shares his expertise on the intersection of actuarial science, fraud detection, and social network analytics.

Together we will learn how to use graphs to fight against insurance fraud by uncovering hidden connections between fraudulent claims and bad actors.

Key insights include how social network analytics can detect fraud rings by mapping relationships between policyholders, claims, and service providers, and how the BiRank algorithm, inspired by Google’s PageRank, helps rank suspicious claims based on network structure.

Bavo will al...


Criminal Networks
03/17/2025

In this episode we talk with Justin Wang Ngai Yeung, a PhD candidate at the Network Science Institute at Northeastern University in London, who explores how network science helps uncover criminal networks.

Justin is also a member of the organizing committee of the satellite conference dealing with criminal networks at the network science conference in The Netherlands in June 2025.

Listeners will learn how graph-based models assist law enforcement in analyzing missing data, identifying key figures in criminal organizations, and improving intervention strategies.

Key insights include the challenges of incomplete and inaccurate data in...


Graph Bugs
03/10/2025

In this episode today’s guest is Celine Wüst, a master’s student at ETH Zurich specializing in secure and reliable systems, shares her work on automated software testing for graph databases. Celine shows how fuzzing—the process of automatically generating complex queries—helps uncover hidden bugs in graph database management systems like Neo4j, FalconDB, and Apache AGE.

Key insights include how state-aware query generation can detect critical issues like buffer overflows and crashes, the challenges of debugging complex database behaviors, and the importance of security-focused software testing.

We'll also find out which Graph DB...


Organizational Network Analysis
03/03/2025

In this episode, Gabriel Petrescu, an organizational network analyst, discusses how network science can provide deep insights into organizational structures using OrgXO, a tool that maps companies as networks rather than rigid hierarchies. Listeners will learn how analyzing workplace collaboration networks can reveal hidden influencers, organizational bottlenecks, and engagement levels, offering a data-driven approach to improving effectiveness and resilience.

Key insights include how companies can identify overburdened employees, address silos between departments, and detect vulnerabilities where too few individuals hold critical knowledge. Real-life applications range from mergers and acquisitions, where network analysis helps assess company dynamics before...


Organizational Networks
02/25/2025

Is it better to have your work team fully connected or sparsely connected?


In this episode we'll try to answer this question and more with our guest Hiroki Sayama, a SUNY Distinguished Professor and director of the Center for Complex Systems at Binghamton University.


Hiroki delves into the applications of network science in organizational structures and innovation dynamics by showing his recent work of extracting network structures from organizational charts to enable insights into decision-making and performance, He'll also cover how network connectivity impacts team creativity and innovation.


Key insights...


Networks of the Mind
02/18/2025

A man goes into a bar… This is the beginning of a riddle that our guest, Yoed Kennet, an assistant professor at the Technion's Faculty of Data and Decision Sciences, uses to measure creativity in subjects.

In our talk, Yoed speaks about how to combine cognitive science and network science to explore the complexities and decode the mysteries of the human mind.

The listeners will learn how network science provides tools to map and analyze human memory, revealing how problem-solving and creativity emerge from changes in semantic memory structures.

Key insights include the ro...


LLMs and Graphs Synergy
02/10/2025

In this episode, Garima Agrawal, a senior researcher and AI consultant, brings her years of experience in data science and artificial intelligence. Listeners will learn about the evolving role of knowledge graphs in augmenting large language models (LLMs) for domain-specific tasks and how these tools can mitigate issues like hallucination in AI systems.

Key insights include how LLMs can leverage knowledge graphs to improve accuracy by integrating domain expertise, reducing hallucinations, and enabling better reasoning.

Real-life applications discussed range from enhancing customer support systems with efficient FAQ retrieval to creating smarter AI-driven decision-making pipelines.

<...


A Network of Networks
02/04/2025

In this episode, Bnaya Gross, a Fulbright postdoctoral fellow at the Center for Complex Network Research at Northwestern University, explores the transformative applications of network science in fields ranging from infrastructure to medicine, by studying the interactions between networks ("a network of networks").

Listeners will learn how interdependent networks provide a framework for understanding cascading failures, such as power outages, and how these insights transfer to physical systems like superconducting materials and biological networks.

Key takeaways include understanding how dependencies between networks can amplify vulnerabilities, applying these principles to create resilient infrastructure systems, and using...


Auditing LLMs and Twitter
01/29/2025

Our guests, Erwan Le Merrer and Gilles Tredan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms.

In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices.

Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models...


Fraud Detection with Graphs
01/22/2025

In this episode, Ĺ imon MandlĂ­k, a PhD candidate at the Czech Technical University will talk with us about leveraging machine learning and graph-based techniques for cybersecurity applications.

We'll learn how graphs are used to detect malicious activity in networks, such as identifying harmful domains and executable files by analyzing their relationships within vast datasets.

This will include the use of hierarchical multi-instance learning (HML) to represent JSON-based network activity as graphs and the advantages of analyzing connections between entities (like clients, domains etc.).

Our guest shows that while other graph methods (such as...


Optimizing Supply Chains with GNN
01/15/2025

Thibaut Vidal, a professor at Polytechnique Montreal, specializes in leveraging advanced algorithms and machine learning to optimize supply chain operations.
In this episode, listeners will learn how graph-based approaches can transform supply chains by enabling more efficient routing, districting, and decision-making in complex logistical networks.

Key insights include the application of Graph Neural Networks to predict delivery costs, with potential to improve districting strategies for companies like UPS or Amazon and overcoming limitations of traditional heuristic methods.

Thibaut’s work underscores the potential for GNN to reduce costs, enhance operational efficiency, and provide better wo...


The Mystery Behind Large Graphs
01/10/2025

Our guest in this episode is David Tench, a Grace Hopper postdoctoral fellow at Lawrence Berkeley National Labs, who specializes in scalable graph algorithms and compression techniques to tackle massive datasets.


In this episode, we will learn how his techniques enable real-time analysis of large datasets, such as particle tracking in physics experiments or social network analysis, by reducing storage requirements while preserving critical structural properties.

David also challenges the common belief that giant graphs are sparse by pointing to a potential bias: Maybe because of the challenges that exist in analyzing large dense...


Customizing a Graph Solution
12/16/2024

In this episode, Dave Bechberger, principal Graph Architect at AWS and author of "Graph Databases in Action", brings deep insights into the field of graph databases and their applications.


Together we delve into specific scenarios in which Graph Databases provide unique solutions, such as in the fraud industry, and learn how to optimize our DB for questions around connections, such as "How are these entities related?" or "What patterns of interaction indicate anomalies?"

This discussion sheds light on when organizations should consider adopting graph databases, particularly for cases that require scalable analysis of highly...


Graph Transformations
12/09/2024

In this episode, Adam Machowczyk, a PhD student at the University of Leicester, specializes in graph rewriting and its intersection with machine learning, particularly Graph Neural Networks.

Adam explains how graph rewriting provides a formalized method to modify graphs using rule-based transformations, allowing for tasks like graph completion, attribute prediction, and structural evolution.

Bridging the worlds of graph rewriting and machine learning, Adam's work aspire to  open new possibilities for creating adaptive, scalable models capable of solving challenges that traditional methods struggle with, such as handling heterogeneous graphs or incorporating incremental updates efficiently.

R...


Networks for AB Testing
11/25/2024

In this episode, the data scientist Wentao Su shares his experience in AB testing on social media platforms like LinkedIn and TikTok.

We talk about how network science can enhance AB testing by accounting for complex social interactions, especially in environments where users are both viewers and content creators. These interactions might cause a "spillover effect" meaning a possible influence across experimental groups, which can distort results.

To mitigate this effect, our guest presents heuristics and algorithms they developed ("one-degree label propagation”) to allow for good results on big data with minimal running time and so...


Lessons from eGamer Networks
11/18/2024

Alex Bisberg, a PhD candidate at the University of Southern California, specializes in network science and game analytics, with a focus on understanding social and competitive success in multiplayer online games.

In this episode, listeners can expect to learn from a network perspective about players interactions and patterns of behavior. Through his research on games, Alex sheds light on how network analysis and statistical tests might explain positive contagious behaviors, such as generosity, and explore the dynamics of collaboration and competition in gaming environments. These insights offer valuable lessons not only for game developers in enhancing player ex...


Github Collaboration Network
11/11/2024

In this episode we discuss the GitHub Collaboration Network with Behnaz Moradi-Jamei, assistant professor at James Madison University.  As a network scientist, Behnaz created and analyzed a network of about 700,000 contributors to Github's repository.  The network of collaborators on GitHub was created by identifying developers (nodes) and linking them with edges based on shared contributions to the same repositories. This means that if two developers contributed to the same project, an edge (connection) was formed between them, representing a collaborative relationship network consisting of 32 million such connections.
By using algorithms for Community Detection, Behnaz's analysis reveals insights into ho...


Github Collaboration Network
11/11/2024

In this episode we discuss the GitHub Collaboration Network with Behnaz Moradi-Jamei, assistant professor at James Madison University.  As a network scientist, Behnaz created and analyzed a network of about 700,000 contributors to Github's repository.  The network of collaborators on GitHub was created by identifying developers (nodes) and linking them with edges based on shared contributions to the same repositories. This means that if two developers contributed to the same project, an edge (connection) was formed between them, representing a collaborative relationship network consisting of 32 million such connections.
By using algorithms for Community Detection, Behnaz's analysis reveals insights into ho...


Graphs and ML for Robotics
11/04/2024

We are joined by Abhishek Paudel, a PhD Student at George Mason University with a research focus on robotics, machine learning, and planning under uncertainty, using graph-based methods to enhance robot behavior. He explains how graph-based approaches can model environments, capture spatial relationships, and provide a framework for integrating multiple levels of planning and decision-making.


Graphs for HPC and LLMs
10/29/2024

We are joined by Maciej Besta, a senior researcher of sparse graph computations and large language models at the Scalable Parallel Computing Lab (SPCL). In this episode, we explore the intersection of graph theory and high-performance computing (HPC), Graph Neural Networks (GNNs) and LLMs.


Graph Databases and AI
10/21/2024

In this episode, we sit down with Yuanyuan Tian, a principal scientist manager at Microsoft Gray Systems Lab, to discuss the evolving role of graph databases in various industries such as fraud detection in finance and insurance, security, healthcare, and supply chain optimization. 


Network Analysis in Practice
10/14/2024

Our new season "Graphs and Networks" begins here!  We are joined by new co-host Asaf Shapira, a network analysis consultant and the podcaster of NETfrix – the network science podcast. Kyle and Asaf discuss ideas to cover in the season and explore Asaf's work in the field.


Animal Intelligence Final Exam
10/07/2024

Join us for our capstone episode on the Animal Intelligence season. We recap what we loved, what we learned, and things we wish we had gotten to spend more time on. This is a great episode to see how the podcast is produced. Now that the season is ending, our current co-host, Becky, is moving to emeritus status. In this last installment we got to spend a little more time getting to know Becky and where her work will take her after this. Did Data Skeptic inspire her to learn more about machine learning? Tune in and find out. 


Process Mining with LLMs
09/24/2024

David Obembe, a recent University of Tartu graduate, discussed his Masters thesis on integrating LLMs with process mining tools. He explained how process mining uses event logs to create maps that identify inefficiencies in business processes. David shared his research on LLMs' potential to enhance process mining, including experiments evaluating their performance and future improvements using Retrieval Augmented Generation (RAG).


Open Animal Tracks
09/17/2024

Our guest today is Risa Shinoda, a PhD student at Kyoto University Agricultural Systems Engineering Lab, where she applies computer vision techniques.

She talked about the OpenAnimalTracks dataset and what it was used for. The dataset helps researchers predict animal footprint. She also discussed how she built a model for predicting tracks of animals. She shared the algorithms used and the accuracy they achieved. She also discussed further improvement opportunities for the model.


Bird Distribution Modeling with Satbird
09/10/2024

This episode features an interview with Mélisande Teng, a PhD candidate at Université de Montréal. Her research lies in the intersection of remote sensing and computer vision for biodiversity monitoring.


Ant Encounters
08/26/2024

In this interview with author Deborah Gordon, Kyle asks questions about the mechanisms at work in an ant colony and what ants might teach us about how to build artificial intelligence. Ants are surprisingly adaptive creatures whose behavior emerges from their complex interactions. Aspects of network theory and the statistical nature of ant behavior are just some of the interesting details you'll get in this episode.

 


Computing Toolbox
08/19/2024

This season it’s become clear that computing skills are vital for working in the natural sciences. In this episode, we were fortunate to speak with Madlen Wilmes, co-author of the book "Computing Skills for Biologists: A Toolbox". We discussed the book and why it’s a great resource for students and teachers. In addition to the book, Madlen shared her experience and advice on transitioning from academia to an industry career and how data analytic skills transfer to jobs that your professionals might not always consider. Join us and learn more about the book and careers using transferable skil...


Biodiversity Monitoring
08/14/2024

In this episode, we talked shop with Hager Radi about her biodiversity monitoring work. While biodiversity modeling may sound simple, count organisms and mark their location, there is a lot more to it than that! Incomplete and biased data can make estimations hard. There are also many species with very few observations in the wild. Using machine learning and remote sensing data, scientists can build models that predict species distributions with limited data. Listen in and hear about Hager’s work tackling these challenges and the tools she has built.


Hacking the Colony
08/08/2024

Today, Ashay Aswale and Tony Lopez shared their work on swarm robotics and what they have learned from ants. Robotic swarms must solve the same problems that eusocial insects do. What if your pheromone trail goes cold? What if you’re getting bad information from a bad-actor within the swarm? Answering these questions can help tackle serious robotic challenges. For example, a swarm of robots can lose a few members to accidents and malfunctions, but a large robot cannot. Additionally, a swarm could be host to many castes like an ant colony. Specialization with redundancy built in seems like a...


Primate Poses
07/31/2024

During this season we have talked with researchers working to utilize machine learning for behavioral observations. In previous episodes, you have heard about the software people like Richard use, but you haven’t heard much from scientists modifying and using these tools for specific research cases. PhD student, Richard Vogg, is working with multi-camera set-ups to track lemurs and macaques solving puzzle boxes in the wild. His work is part of a larger movement to automate behavioral analyses of video data. Listen in and learn why this tech is useful and why multi-camera setups are a good idea for mo...


Generating 3D Animals with YouDream
07/23/2024

Generative AI can struggle to create realistic animals and 2D representations often have mistakes like extra limbs and tails. If 2D wasn’t hard enough, there are researchers working on generative 3D models. 3D models present an extra challenge because there is paucity of training datasets.In this episode, PhD students Sandeep and Oindrila walked us through their work on creating 3D animals using 2D data. Join us to learn about their pipelines, quality control, tie in with iNaturalist, and how this tech could streamline FX pipelines.


Weird Communication
07/15/2024

Today, we sat down with Dr. Ignacio Escalante Meza to learn about opiliones and treehoppers. Opiliones, known as “daddy long legs” in the US, are understudied arachnids known for their tenacious locomotor behavior, sociality, and chemical communication. Treehoppers communicate through the stems of plants using vibrations. They can signal danger, attract mates, and communicate with their offspring. Join us to learn how researchers turn their vibrations into sound waves and study what they have to say.


Reducing the Impact of Ship Noise on Marine Mammals
07/01/2024

Human shipping operations have increased significantly in the past few decades.  While that means international trade and cheap goods for humans, it also means the ocean has experienced an increase in noise pollution.  This has a measurable negative impact on marine mammals and other aquatic life.  Could mathematics be the solution?  This interview explores how optimization techniques can guide voyage optimization in a way that handles multiple optimization objectives including fuel cost and sound reduction.


Analysis of Unstructured Data
06/28/2024

Robbie Moon from the Georgia Tech Scheller College of Business joins us to discuss the analysis of unstructured data and the application of NLP methodologies towards financial data.


iNaturalist
06/24/2024

Have you ever participated in citizen science? Do you want to? One of the most popular platforms for crowdsourcing biodiversity data is iNaturalist. In addition to being a great science tool, the iNaturalist app can help you identify the organisms you encounter every day. We talked to Executive Director Scott Laurie about how scientists use iNaturalist. We also got to discuss what makes iNaturalist’s AI species recognition so good, and how citizen scientists are constantly providing high-quality training data. Listen in and learn how this fun-to-use tool works, where it's headed, and how you can get involved.