Gov R Conference
As the Government & Public Sector R Conference celebrates its 7th year, we are excited to bring together top-tier data scientists once again, fostering the exchange of groundbreaking research and presentations both in Washington D.C. and virtually worldwide on October 29th & 30th! The talks will highlight work in government, defense, the public sector and non-governmental organizations.
Workshops: October 28 | Location: Georgetown University
Conference: October 29-30 | Location: Georgetown University
Monday, Oct 28
08:00 AM - 08:50 AM
Registration & Breakfast
08:50 AM - 05:00 PM
Workshop: Better Development Practices with Large Language Models (LLMs)
Abigail Haddad & Benjy Braun
In recent years, data scientists have increasingly adopted best practices from software engineering to improve code quality and project management. These practices are ideal candidates for leveraging Large Language Models (LLMs), as they are well-documented online and often involve tasks performed infrequently enough that memorization is impractical. This workshop will guide you through key software development practices tailored for data science. Participants will learn how to use LLMs to enhance their documentation, version control, and other essential tasks. The goal is to produce code that's easier to run, build upon, and understand, ultimately leading to more efficient and reproducible data science projects.
Workshop Highlights:
- Writing Cleaner, More Readable Code: Learn techniques to improve code readability and maintainability, with LLMs assisting in generating clearer syntax and structure.
- Improving Documentation: Discover how LLMs can help create comprehensive and understandable documentation, making your projects easier to use and collaborate on.
- Using Git for Version Control: Gain proficiency in using git for version control, with LLMs offering support in managing branches, resolving conflicts, and maintaining a clean commit history.
- Docker/Virtual Environments: Understand the benefits of containerization and virtual environments in development, and how LLMs can assist in setting up and managing these environments.
- Debugging and Error Handling: Learn effective debugging techniques and use LLMs to interpret error messages and suggest fixes.
Participants will engage in practical examples using either Python or R, exploring how LLMs can be integrated into their development processes. While LLMs do not produce perfect code instantly, they are invaluable for iterative development, particularly in data pipelines and analyses. We will practice effective prompting strategies to guide LLMs towards better solutions and explore their ability to interpret error messages and suggest fixes in data science contexts.
Prerequisites:
- Comfortable writing a function in either Python or R
- Laptop with Python or R and git installed
By the end of this workshop, participants will have a toolkit of best practices and the skills to utilize LLMs for enhancing their development workflows, leading to more efficient and error-resistant coding practices. This workshop is ideal for developers, data scientists, and analysts looking to integrate advanced AI tools into their everyday coding routines.
(In-Person & Virtual Ticket Options Available)
08:50 AM - 05:00 PM
Workshop: Dashboards and CRUD Apps: Managing Data For Your Organization
Maxine Drake
This class focuses on working with your organization’s data from data collection to data management to data visualization. We will learn how to build a dashboard with Shiny, including dynamic calendars perfect for large-scale event tracking. We will also build a CRUD (create, read, update, delete) application that allows users to manage data themselves. In addition to these technical skills, we will cover concepts such as multi-tiered architectures, modularizing code, clear data visualizations, and managing user permissions in your Shiny apps.
(In-Person & Virtual Ticket Options Available)
Tuesday, Oct 29
08:00 AM - 08:50 AM
Registration & Breakfast
08:50 AM - 09:00 AM
Opening Remarks
09:00 AM - 09:20 AM
Incorporating Local LLMs into Your Workflow
David Meza
Head of Analytics – Human Capital, Branch Chief People Analytics @ NASA
In this talk we will walk through how to add local LLMs to your workflow. We will demonstrate how to install local LLMs on your laptop and, using Positron, Posit’s new IDE, add an extension to help with your development. Then we will create chatbots in Shiny (R) and Streamlit (Python) to ask questions of your data.
09:25 AM - 09:45 AM
Document Detective Work: Harnessing NLP in R to Create a Concept of Operations for a Large Organization
Selen Stromgren, Danielle Larese & Evgeny Kiselev
U.S. Food and Drug Administration
09:50 AM - 10:10 AM
Finding Your Next Federal Data Job
Abigail Haddad
Machine learning engineer -- AI Corps @ Department of Homeland Security
Hiring data scientists and other data/AI workers is a priority for the federal government, but that doesn't mean we make the application process easy. I'm going to talk about how to search for data jobs and write a resume that gives the hiring folks the information they need to accurately assess you. I'll also talk about the new DHS AI Corps, why I'm there, what's worked about our hiring process, and what I think you should be doing if you're on the hiring side in order to attract and hire great candidates.
10:10 AM - 10:40 AM
Break & Networking
10:40 AM - 11:00 AM
What's Your Vector, Victor?: Navigating Your Way through the FAA Order JO 7110.65 with RAG (Because GPS Doesn't Work Here)
Mary Gibbs
Senior Applied Scientist @ Relativity
The FAA Order JO 7110.65 is a key document for air traffic control (ATC) operations in the U.S., outlining the phraseology and procedures necessary for maintaining the efficiency and safety of the National Airspace System (NAS). With over 700 pages and updates about every six months, navigating this document can be difficult, something I’ve experienced firsthand. Large language models (LLMs) like GPT-4o often struggle to provide accurate or up-to-date responses when interpreting its content. This highlights the advantage of Retrieval-Augmented Generation (RAG), which combines document retrieval with LLMs to deliver responses that are both contextually relevant and anchored in the latest version of the document. In this talk, I’ll demonstrate why LLMs on their own fall short, offer a high-level overview of RAG, and guide you through building a RAG system using the FAA Order JO 7110.65 that incorporates various modalities, including images, tables, and text. Also, I’ll share valuable insights from working with a subject matter expert, emphasizing the critical role of human expertise in evaluating and enhancing RAG systems.
11:05 AM - 11:25 AM
Mapping Ever Larger Data with PostGIS, DuckDB, GeoArrow and
Jared P. Lander
Chief Data Scientist @ Lander Analytics
The volume of spatial data available to analyze is getting larger and larger every year. Fortunately, the tools used to analyze these data are improving at a faster pace. During this talk we will look at four key aspects of the geospatial pipeline. We start with storing the data efficiently using Postgres with the PostGIS and TimescaleDB extensions installed for smart partitioning. Then we perform various spatial queries using the DuckDB query engine while the data are still in Postgres. After that we use DuckDB to quickly extract the data from Postgres into GeoArrow to enable columnar operations. Finally, we visualize large scale data with the high performance library, including filtering and aggregating data on the fly with Arquero. All those steps together make for a high performance geo workflow on large data.
11:30 AM - 11:50 AM
Wrangling Data with DuckDB
Will Angel
Data Analytics Capability Lead @ Excella
Learn how to accelerate data processing in your R code with DuckDB, a fast, open source, in-process analytical database! This talk will provide an overview of DuckDB and Duckplyr, explore when and how you can speed up your data processing with DuckDB, and benchmark the performance improvements you should expect compared to other popular data processing methods in R! This talk will also briefly explore the 'shrinking size' of big data and make the argument that you may not need to adopt distributed processing technologies to scale your data.
11:50 AM - 01:00 PM
Lunch & Networking
01:00 PM - 01:20 PM
01:25 PM - 01:45 PM
R and Python - A Love Story
Marck Vaisman
Sr. Cloud Solutions Architect @ Microsoft
In my life as a data professional, I started with and mastered R (I even ran R on Hadoop back in the 2010-2012 timeframe). Over time, especially at work, I have been using Python more and more and really trying to understand its mental model and best programming practices. There are many things that drive me bananas. This talk is NOT about making an argument for which one is better, or trash talking about either (well, maybe a little for fun's sake). It's about the gripes I've had when trying to do tasks in Python that are, in my opinion, much easier to do in R, and how to effectively love both and use them together.
01:45 PM - 02:15 PM
Break & Networking
02:15 PM - 02:35 PM
02:40 PM - 03:00 PM
What is the Best Data Format for Your Shiny Project?
Richard Schwinn
Financial Analyst @ U.S. Securities and Exchange Commission
What is the best data file format for your Shiny project? I compare the performance of several popular file formats and explore their pros and cons. Choosing the optimal file format is more complex than simply prioritizing speed or file size. You will learn how additional factors, such as system architecture, CPU availability, and more, can impact performance.
03:05 PM - 03:25 PM
03:25 PM - 03:55 PM
Break & Networking
03:55 PM - 04:15 PM
Serving Your Own Local LLM for Internal, Secure GenAI
Travis Knoche
Senior Data Scientist @ Lander Analytics
You’re likely hearing a lot about generative AI and large language models (LLMs) and how they can assist in day-to-day data science and analysis, but have you considered running one that is free, self-hosted, and customizable? In this talk, you’ll see how you can serve your own local LLM for internal GenAI use, leveraging Ollama alongside R packages like chattr and httr2. We’ll walk through the process of deploying an LLM on your own infrastructure, giving you control over data privacy and security. By the end, you’ll have a clear understanding of how to set up and interact with an LLM using these tools, and how your organization can benefit from internal AI solutions without relying on external cloud services.
04:20 PM - 04:40 PM
Everything You Never Wanted to Know About Auth
Alex Gold
Director of Solutions Engineering @ Posit
You use auth constantly -- to log into your computer, access Instagram, and pull data into R. But unless you've been forced to, you've probably never thought about how that auth works or the concerns your organization might have about auth. In this talk, you'll learn about how different auth technologies work and the ways your organization might manage auth.
04:40 PM - 04:50 PM
Closing Remarks
05:00 PM - 07:00 PM
Happy Hour at Clubhouse
Hosted by Data Science DC
Come socialize and network with fellow data scientists, analysts, software engineers, and other data enthusiasts. Chat about your latest project, your job search, or what you're learning about. Clubhouse: Beer, Billiards & Cocktails is located at 1070 Wisconsin Ave NW, Washington, DC 20007.
Wednesday, Oct 30
09:00 AM - 09:50 AM
Registration & Breakfast
09:50 AM - 10:00 AM
Opening Remarks
10:00 AM - 10:20 AM
Using Visual Perception to Find Patterns in Data and Drive Insight
Alex Gurvich
Senior Graphics Designer & Data Visualization Specialist @ NASA's Science Visualization Studio
At the Data Visualization Society’s 2024 Outlier conference, communications and cognitive science researcher Dr. Steven Franconeri gave a keynote presentation on how the human brain is a super-charged pattern recognition machine. He demonstrated through interactive exercises how using principles of visual perception can reveal patterns in data almost instantly, while ignoring them can make them almost impossible to find. But my story begins earlier, almost 10 years ago, when I first saw a version of that presentation, which changed the way I think about data and the course of my life, inspiring me to become a data visualizer. In this talk, I'll summarize Dr. Franconeri's work and guide you through what I’ve learned about the core principles of using visual perception in data visualization.
10:25 AM - 10:45 AM
Quarto, AI, and the Art of Getting Your Life Back
Tyler Morgan-Wall
Research Staff @ Institute for Defense Analyses
Tired of endless server issues and maintenance headaches? Want to reclaim your time for coding, writing, and creating? Join me as I share my journey of switching from the server-based headaches of WordPress to Quarto, with a little help from AI. In this talk, I’ll describe the simple trick I used to convert an existing WordPress blog, complete with custom scripts, styles, and beautiful 3D dataviz content, into a slick Quarto site. I'll then demonstrate some lesser-known features of Quarto to automate deploying a website entirely from a Quarto project file. Finally, I’ll show you how I used AI to customize and style my new Quarto site, and provide several useful strategies to employ if you decide to get some help from AI on your own Quarto journey.
10:45 AM - 11:15 AM
Break & Networking
11:15 AM - 11:35 AM
App-solutely Fabulous: A Data Scientist's Guide to Choosing Python Web Tools Wisely
Alan Feder
Staff LLM Data Scientist @ Magnifi
Shiny has long been the go-to for data scientists creating web applications in R, but Python offers a plethora of alternatives that can be overwhelming to choose from. This talk will demonstrate the creation of a single application using three popular Python tools: PyShiny, Streamlit, and Gradio. We'll explore the strengths and weaknesses of each framework through a live demonstration. This comparison will provide you with a practical roadmap for selecting the ideal Python web app framework to suit your specific data science projects.
11:40 AM - 12:00 PM
The Role of R in Census Bureau Data Reporting
Jessica Klein
Data Scientist @ United States Census Bureau
The Census Bureau collects vast amounts of data on America's population, places, and economy. In this presentation, we will highlight how R has been adopted by staff to enhance data analysis processes and manage these large datasets. We'll discuss how R supports data work across the Bureau, the growing R user community, and ongoing training efforts to develop staff skills. We'll also showcase several innovative use cases of R from different departments. Finally, we will look ahead to the future of R at the Census Bureau, exploring upcoming initiatives to further integrate R into our data science operations.
12:05 PM - 12:25 PM
An Introduction to Estimation and Comparison of Discrete Variate Time Processes
Rachel Gidaro
Assistant Professor @ United States Military Academy
This talk delves into the parameter estimation of discrete time series, focusing on the comparison between integer-autoregressive processes and traditional Gaussian autoregressive models. We will explore the theoretical underpinnings of these discrete models, emphasizing their relevance in various applications. This talk will largely be an overview of the relevant background in time series analysis, setting the stage for the integer-autoregressive processes.
12:25 PM - 01:35 PM
Lunch & Networking
01:35 PM - 01:55 PM
Who Are Your Consumers? Understanding Selection Bias Into Government Programs
Travis Riddle
Senior Research Fellow @ Consumer Financial Protection Bureau
Many government programs and initiatives serve an unknown subset of the population. There are many reasons for this: individuals may need to take action to enroll, or an unknown portion of the population may be eligible. The lack of insight into how people select into or out of government programs leads to difficulties in targeting those individuals who could benefit from them and complications in using data from government programs to generalize about broader trends. In this talk, I will describe a project undertaken at the CFPB whose goal is to understand selection into our complaint program. We use a case-control survey methodology to understand this selection bias along several dimensions. I will describe our findings and provide some guidance for those looking for cost-effective ways to understand how people select into their programs.
02:00 PM - 02:20 PM
Detecting Automotive Quality & Safety Issues from Consumer Complaints
Tommy Jones
CEO @ Foundation
70% of the cars on the road are between 6 and 14 years old. Most aren't equipped with sensors allowing manufacturers, regulators, and the public to see when a vehicle platform develops a quality issue, but owners do complain about their cars online. Tommy will tell you about the process of detecting automotive quality issues from consumer complaints, demonstrate the application Foundation is building to unlock these latent issues for automotive companies, regulators, and the public, and showcase the tech stack they used to pull it all together.
02:25 PM - 02:45 PM
SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data?
Brittany Long & Princess Onyiri
Bloomberg Law
In 2021, the US Securities and Exchange Commission (SEC) approved a rule that requires companies listed on Nasdaq to have, or publicly disclose why they don’t have, at least two diverse board members. However, there is a lack of uniformity in the way companies file the Board Diversity Matrix requirement in their proxy statements or 10-K filings, which presents a series of challenges when trying to pull and analyze the data. Bloomberg Law will discuss methodologies that, with a little finessing, uncovered certain data trends in corporate board diversity (or lack thereof) despite the SEC’s requirement.
02:45 PM - 03:15 PM
Break & Networking
03:15 PM - 03:35 PM
Defending Your Data: When Best Practices Don’t Apply
Frederick Thayer
Data Scientist @ NAVAIR Proposal Analysis Team
Data analytics best practices are important guidelines to follow whenever possible, but what do you do when you cannot apply them? This talk will cover how to determine when they don’t apply, what to do in that case, and how to explain and defend your analysis to stakeholders.
03:40 PM - 04:00 PM
Making Things Difficult: The Role of Disfluency in Science Communication
Laura Gast
Data Science & Analytics Manager @ USO
This talk explores how disfluency, both in font choice and in speech, impacts memory retention, comprehension, and decision-making. We'll examine research showing how introducing slight difficulty in reading or speech can improve recall and encourage deeper cognitive processing. By understanding these effects, you can make more informed choices when crafting both written and spoken messages to maximize audience engagement and understanding.
04:00 PM - 04:10 PM
Closing Remarks
Better Development Practices with Large Language Models (LLMs)
Hosted by Abigail Haddad & Benjy Braun
Monday, Oct 28 | 9:00am - 5:00pm
Dashboards and CRUD Apps: Managing Data For Your Organization
Hosted by Maxine Drake
Monday, Oct 28 | 9:00am - 5:00pm
Tyler Morgan-Wall
Research Staff
Institute for Defense Analyses
Talk: Quarto, AI, and the Art of Getting Your Life Back
Selen Stromgren
Associate Director
U.S. Food and Drug Administration
Talk: Document Detective Work: Harnessing NLP in R to Create a Concept of Operations for a Large Organization (Joint Talk with Danielle & Evgeny)
Alex Gold
Director of Solutions Engineering
Talk: Everything You Never Wanted to Know About Auth
Abigail Haddad
Machine learning engineer -- AI Corps
Department of Homeland Security
Talk: Finding Your Next Federal Data Job
Princess Onyiri
Senior Data Scientist
Bloomberg Law
Talk: SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data? (Joint Talk with Brittany Long)
David Meza
Head of Analytics – Human Capital, Branch Chief People Analytics
Talk: Incorporating Local LLMs into Your Workflow
Rachel Gidaro
Assistant Professor
United States Military Academy
Talk: An Introduction to Estimation and Comparison of Discrete Variate Time Processes
Jared P. Lander
Chief Data Scientist
Lander Analytics
Talk: Mapping Ever Larger Data with PostGIS, DuckDB, GeoArrow and
Jessica Klein
Data Scientist
United States Census Bureau
Talk: The Role of R in Census Bureau Data Reporting
Frederick Thayer
Data Scientist
NAVAIR Proposal Analysis Team
Talk: Defending Your Data: When Best Practices Don’t Apply
Laura Gast
Data Science & Analytics Manager
Talk: Making Things Difficult: The Role of Disfluency in Science Communication
Brittany Long
Assistant Team Lead, Data & Surveys
Bloomberg Law
Talk: SEC Board Diversity Requirements: Are NASDAQ Companies Disclosing Their Data? (Joint Talk with Princess Onyiri)
Danielle Larese
Chemist/Scientific Coordinator
U.S. Food and Drug Administration
Talk: Document Detective Work: Harnessing NLP in R to Create a Concept of Operations for a Large Organization (Joint Talk with Selen & Evgeny)
Alex Gurvich
Senior Graphics Designer & Data Visualization Specialist
NASA's Science Visualization Studio
Talk: Using Visual Perception to Find Patterns in Data and Drive Insight
Travis Riddle
Senior Research Fellow
Consumer Financial Protection Bureau
Talk: Who Are Your Consumers? Understanding Selection Bias Into Government Programs
Mary Gibbs
Senior Applied Scientist
Talk: What's Your Vector, Victor?: Navigating Your Way through the FAA Order JO 7110.65 with RAG (Because GPS Doesn't Work Here)
Richard Schwinn
Financial Analyst
U.S. Securities and Exchange Commission
Talk: What is the Best Data Format for Your Shiny Project?
Tommy Jones
Talk: Detecting Automotive Quality & Safety Issues from Consumer Complaints
Alan Feder
Staff LLM Data Scientist
Talk: App-solutely Fabulous: A Data Scientist's Guide to Choosing Python Web Tools Wisely
Evgeny Kiselev
Chemist/Scientific Coordinator
U.S. Food and Drug Administration
Talk: Document Detective Work: Harnessing NLP in R to Create a Concept of Operations for a Large Organization (Joint Talk with Selen & Danielle)
Travis Knoche
Senior Data Scientist
Lander Analytics
Talk: Serving Your Own Local LLM for Internal, Secure GenAI
More speakers coming soon!
Speakers are subject to change.
If your organization is interested in being an event sponsor, please contact us at