August 11-12, 2026
Seoul, South Korea
Note: The schedule is subject to change.

The Sched app lets you build your schedule, but it is not a substitute for event registration. You must be registered for Open Source Summit Korea 2026 to participate in the sessions. If you have not registered and would like to join us, please purchase a registration on the event registration page.

This schedule is displayed in Korea Standard Time (KST, UTC+9).
Wednesday August 12, 2026 16:35 - 17:05 KST
Building AI agents is easier than ever with open source tools, but ensuring their reliability in production remains a major challenge. Unlike traditional software, AI agents are non-deterministic, so simple pass/fail testing is insufficient.
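
Because an agent can return different answers to the same input, a single pass/fail check says little about its reliability. A minimal sketch, assuming a hypothetical `flaky_agent` stand-in that answers correctly about 80% of the time, scores the agent by its pass rate over repeated trials instead:

```python
import random
from typing import Callable


def flaky_agent(question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a non-deterministic agent (assumed ~80% correct)."""
    return "Seoul" if rng.random() < 0.8 else "Busan"


def pass_rate(agent: Callable[[str, random.Random], str], question: str,
              expected: str, trials: int = 100, seed: int = 0) -> float:
    """Run the same input many times and return the fraction of correct answers."""
    rng = random.Random(seed)  # seeded so the evaluation itself is reproducible
    passes = sum(agent(question, rng) == expected for _ in range(trials))
    return passes / trials


if __name__ == "__main__":
    rate = pass_rate(flaky_agent, "Where is the summit held?", "Seoul")
    print(f"pass rate: {rate:.2f}")
```

A pass rate (or a threshold on it) turns a flaky yes/no into a measurable quantity that can be tracked as the agent changes.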

This talk introduces a practical approach to evaluation and observability for AI agents, combining open source tools such as TruLens with agent architectures inspired by AgentGPT.

We will demonstrate how to instrument agent workflows, capture execution traces, and implement evaluation metrics such as faithfulness, tool selection accuracy, and answer relevancy. Attendees will also see how to visualize agent behavior and identify failure points across retrieval, reasoning, and generation layers using a lightweight dashboard.
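
One way to picture the instrumentation and metrics described above is the following sketch. It is not the TruLens API or the talk's reference implementation: `Span`, `Trace`, `instrument`, `failures_by_layer`, `tool_selection_accuracy`, and the toy `choose_tool` and `generate` functions are all hypothetical names. A decorator records each call as a span tagged with its layer, an exact-match metric scores tool selection, and failed spans are counted per layer to locate failure points:

```python
import functools
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Span:
    """One recorded step of an agent run (hypothetical schema)."""
    name: str
    layer: str  # e.g. "retrieval", "reasoning", "generation"
    inputs: tuple
    output: Any
    duration_s: float
    error: Optional[str] = None


@dataclass
class Trace:
    spans: List[Span] = field(default_factory=list)

    def instrument(self, layer: str) -> Callable:
        """Decorator that records every call to the wrapped function as a Span."""
        def wrap(fn: Callable) -> Callable:
            @functools.wraps(fn)
            def inner(*args: Any) -> Any:
                start = time.perf_counter()
                try:
                    result = fn(*args)
                except Exception as exc:
                    # Record the failure, then re-raise so the caller still sees it.
                    self.spans.append(Span(fn.__name__, layer, args, None,
                                           time.perf_counter() - start, repr(exc)))
                    raise
                self.spans.append(Span(fn.__name__, layer, args, result,
                                       time.perf_counter() - start))
                return result
            return inner
        return wrap

    def failures_by_layer(self) -> Counter:
        """Count failed spans per layer to see where the agent breaks."""
        return Counter(s.layer for s in self.spans if s.error is not None)


def tool_selection_accuracy(chosen: List[str], expected: List[str]) -> float:
    """Fraction of test cases where the agent picked the expected tool."""
    if len(chosen) != len(expected) or not expected:
        raise ValueError("need equal-length, non-empty lists")
    return sum(c == e for c, e in zip(chosen, expected)) / len(expected)


trace = Trace()


@trace.instrument("reasoning")
def choose_tool(query: str) -> str:
    # Toy routing rule standing in for an LLM's tool choice.
    return "search" if "where" in query.lower() else "calculator"


@trace.instrument("generation")
def generate(query: str, context: str) -> str:
    if not context:
        raise ValueError("no context retrieved")
    return f"Answer based on: {context}"


if __name__ == "__main__":
    queries = ["Where is the summit?", "What is 2 + 2?", "Find the speaker's bio"]
    expected = ["search", "calculator", "search"]
    chosen = [choose_tool(q) for q in queries]  # third query is misrouted
    print("tool selection accuracy:", tool_selection_accuracy(chosen, expected))
    try:
        generate(queries[0], "")  # empty context simulates a retrieval miss
    except ValueError:
        pass
    print("failures by layer:", dict(trace.failures_by_layer()))
```

Metrics such as faithfulness and answer relevancy are typically scored by an LLM-based judge rather than exact match, so they are left out of this sketch.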

Finally, we will show how to build a feedback loop that iteratively improves agent performance, and share a reference implementation on GitHub that can be reused with different agent frameworks.
Speaker

Sho Tanaka

Lead Developer Advocate, Snowflake
Sho is a Lead Developer Advocate at Snowflake, focused on AI/ML and data engineering. He previously worked at Google (gTech), delivering ML and data solutions across Japan, APAC, and globally. He is a Google Developer Expert (AI/ML) and a co-founder of the MLOps community in Japan, where he has...
Location: Rose
