
Road to KubeCon NA 2024: Alexa Griffith

In this episode of The Landscape, we spoke with Alexa Griffith, a software engineer at Bloomberg and an active contributor to Envoy AI Gateway. Alexa shared insights into this exciting CNCF project, which is designed to simplify and optimize communication between microservices and external clients, especially for AI workloads. Envoy AI Gateway builds on Envoy, a graduated CNCF project.

Envoy AI Gateway enhances API management with features like traffic routing, authentication, token-based rate limiting, and monitoring, making it a valuable tool for cloud-native architectures. By integrating with service meshes like Istio and offering dynamic configuration, the project is tackling unique challenges posed by large language models (LLMs).

What you will learn in this episode:

  • Unified APIs for AI Workloads: How Envoy AI Gateway simplifies access to LLMs by standardizing APIs and credential management.
  • Token-Based Rate Limiting: A feature tailored to optimize LLM efficiency and control costs.
  • AI on Kubernetes: Alexa’s recommendations for starting with tools like KServe and Knative to deploy AI workloads.
  • The Role of the Community: How contributors can shape the development of this evolving project.
  • Future of Envoy AI Gateway: Upcoming features and opportunities to get involved in the early stages.

This episode is sponsored by OVHcloud.


Transcript

Alexa Griffith:
Hi, my name is Alexa Griffith, and I’m a software engineer at Bloomberg.

Bart:
You just delivered a keynote—what was it about?

Alexa:
The keynote focused on Envoy AI Gateway, a CNCF open-source project. It addresses several unique challenges associated with large language models (LLMs). For instance, LLM providers often have different access routes and credential management methods. Envoy AI Gateway unifies and simplifies these processes, providing a single API for developers, whether they’re working with on-premise LLMs or cloud-hosted ones.

Another standout feature is token-based rate limiting. This is particularly useful for LLMs, helping manage costs and improving efficiency by tuning models and controlling token usage. These capabilities are crucial for making AI workloads more accessible and cost-effective.

Bart:
AI is a big topic right now. For those looking to explore AI on Kubernetes, where should they start?

Alexa:
There are great tools to get started with. For instance, Ollama is an accessible way to experiment with LLMs locally. If you’re looking for more advanced setups, CNCF tools like Knative or KServe allow you to deploy models with minimal YAML configuration. These tools simplify the process of running models locally or in a Kubernetes environment.
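To give a sense of the minimal YAML Alexa describes, here is a sketch of a basic KServe InferenceService, based on KServe's documented quickstart pattern; the model name, format, and storage URI are illustrative examples, not specifics from the episode:

```yaml
# A minimal KServe InferenceService: KServe pulls the model from
# storageUri and exposes it behind an autoscaled HTTP prediction endpoint.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      # Example model artifact location; replace with your own.
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```

Applying this with `kubectl apply -f` on a cluster where KServe is installed is enough to get a servable model endpoint, which is the "minimal configuration" appeal Alexa mentions.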

Bart:
What other CNCF projects are you watching?

Alexa:
Definitely Envoy, of course. I’m also keeping an eye on KServe and Knative, which are fantastic for deploying and scaling machine learning workloads.

Bart:
What’s next for Envoy AI Gateway?

Alexa:
The project is still in its early stages, so there’s a lot of room for growth. The team is actively seeking input from the community to understand user needs and guide development. It’s a great time to get involved and help shape the project.

Bart:
And what’s next for you?

Alexa:
I’m excited to continue contributing to Envoy AI Gateway and see how it evolves. Maybe by the next conference, we’ll have even more to share!

Bart:
If people want to reach out to you, what’s the best way to do that?

Alexa:
You can find me on Twitter as @LexaL (L-X-L). I also host a podcast called Alexis Info, available on Spotify, Apple Podcasts, and other platforms. For professional connections, feel free to reach out on LinkedIn—just search for Alexa Griffith. I’d love to connect with anyone passionate about this space!

Bart:
Thank you so much, Alexa. It was great speaking with you!

Alexa:
Thank you! Take care.