By: Ihor Lukyanenko, Eloy Duran, Sachin Jain, and Naresh Sirvi
The new Microsoft Teams chat and channels experience became generally available at the beginning of May 2025 (in case you missed it, here's the link to the announcement). Millions of Teams customers use chats and channels daily and trust us with their mission-critical collaboration workflows. They expect Teams functionality to always work and the experience to feel snappy and crafted. When building the new chat and channels experience, we wanted to further raise the bar set by the launch of the new Teams, which was twice as fast and used half the system resources. In this post we share the key highlights of our journey and the improvements we see in some of the key scenarios.
Defining and Measuring Success: A Key to Continuous Improvement
Defining and measuring success is crucial for achieving desired outcomes in any project. Success metrics provide a clear framework for evaluating progress, identifying areas for enhancement, and making informed decisions. In Teams, we follow a rigorous rollout process where product and engineering crews dogfood new changes before gradually making them available to broader audiences. This involves several progressions: from internal teams, to preview customers, and finally to general availability, where each progression forms a natural checkpoint. For the new chat and channel experience we defined clear exit criteria for each checkpoint, based on performance telemetry of critical scenarios (e.g. switching into chats, managing sections, filtering the chat list, etc). Early signals and clear goals allowed us to prioritize improvements in areas at risk of not meeting targets. The next sections dive deeper into a few such areas.
17% faster switches into Chat area
Switching into Chat from other areas in the Teams app, such as Activity or Calendar, is a core scenario that users perform hundreds of millions of times per day. After optimizing conversation list performance over a six-month period, we saw a 17% decrease in its overall latency at the 95th percentile.
We achieved this by addressing some of the legacy patterns and approaches that did not keep pace with the evolution of our tech stack. GraphQL and React are designed to work seamlessly together, enabling UI and data requirements to be composed in such a way that all data for a given page is available at once, allowing the UI to render in a single pass. However, engineers often brought their REST-based understanding of data fetching and coupled it with specific UI sections. This led to many smaller queries being fired, causing staggered rendering and unnecessary costs on the main thread, as well as additional overhead for each query. GraphQL models often ended up as thin proxies to REST endpoints, while bloated UI components handled all the business logic, transforming service data into UI-ready data directly on the main thread.
To enable the new combined chat and channels experience, we rebuilt the left rail from the ground up. This provided an opportunity to address performance bottlenecks and reduce rendering to a single pass, freeing up the main thread and avoiding waste of CPU cycles. We designed a new GraphQL schema aligned with Teams UI, allowing us to fetch all necessary data in a single request. Additionally, we introduced the Relay GraphQL client, providing a framework with built-in guardrails that enabled product teams to succeed at scale. Our new thin React components, focused purely on rendering and designed for composability, made it easier to iterate quickly as product requirements evolved. We also moved transformation business logic from the UI to GraphQL field resolvers, shifting work off the main thread and reducing unnecessary computations in React components, further streamlining rendering performance. To help engineers across Teams and beyond adopt these best practices, we started publishing them at aka.ms/learn-graphql.
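The contrast between the two fetching styles can be sketched with a toy transport. This is a hypothetical illustration, not Teams code: `queryService`, the field names, and the counters are invented, and each call to the mock service stands in for one round trip and one staggered render.

```typescript
// Hypothetical sketch, not Teams production code: contrasting per-section
// "REST-style" narrow queries with one broad query composed for the page.
let roundTrips = 0;

// Mock transport: every call represents one round trip to the service.
function queryService(fields: string[]): Record<string, string> {
  roundTrips++;
  return Object.fromEntries(fields.map((f) => [f, `value-of-${f}`]));
}

// Waterfall: each UI section fires its own narrow query as it mounts,
// so the list renders in several staggered passes.
function loadChatListWaterfall(): Record<string, string> {
  return {
    ...queryService(["chatTitle"]),
    ...queryService(["lastMessagePreview"]),
    ...queryService(["unreadCount"]),
  };
}

// Broad query: the page declares all data requirements up front in one
// request, so the UI can render in a single pass.
function loadChatListBroad(): Record<string, string> {
  return queryService(["chatTitle", "lastMessagePreview", "unreadCount"]);
}
```

Both paths return the same data; the difference is three round trips versus one, and three render-triggering responses versus one.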
[Diagram: slow request waterfalls vs. one fast broad query]
40% faster chat revisits
Another top interaction in Teams is switching back and forth between different chats within the Chat area itself. When a user switches into a chat they've recently opened, it's considered a revisit. Given how often users do this, we invested time in optimizing the experience to make sure it's not just performant but also feels responsive and crafted. As a result, chat revisits became up to 40% faster across all percentiles.
One of the biggest opportunities we found was removing nested state updates: updates applied during the render or commit phase. These updates interrupted the render cycle and forced side effects to flush before the render could complete, causing additional synchronous renders. This blocked the main thread and delayed the browser repaint, meaning chat content wasn't appearing as quickly as it should.
To fix this, we tracked down the sources of these updates, removed them, and optimized for a single render-to-paint pass. This ensured React rendered everything in one go before handing it off to the browser, reducing delays and improving responsiveness.
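The cost of a nested update can be shown with a framework-agnostic toy renderer (invented names, not React internals): writing to state during render schedules an extra synchronous pass before anything can be painted, while deriving the value inline does not.

```typescript
// Toy renderer sketch (hypothetical, not React internals): a state update
// issued during render forces an extra synchronous render pass.
let renderCount = 0;

// Anti-pattern: the component copies a derived value into state while
// rendering, which schedules another render before the browser can paint.
function renderWithNestedUpdate(messages: string[]): number {
  let state: number | null = null;
  const render = (): number => {
    renderCount++;
    const derived = messages.length;
    if (state !== derived) {
      state = derived; // state update during render...
      return render(); // ...forces an immediate synchronous re-render
    }
    return state;
  };
  return render();
}

// Fix: derive the value inline in a single pass; there is no state to sync.
function renderSinglePass(messages: string[]): number {
  renderCount++;
  return messages.length;
}
```

Both functions produce the same value, but the nested-update version always pays for two render passes where one would do.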
Now that rendering was significantly faster, a new issue emerged: layout shifts. With a clean rendering pipeline, effects were flushed after the paint instead of before, meaning any state updates triggered by these effects could cause UI elements such as pinned messages or banners to shift unexpectedly. We chased down each of these instances. Some were easy fixes, while others required us to rethink how certain components were structured. The key was ensuring the right updates happened at the right time, without relying on effects to fix UI inconsistencies.
Key Learnings
- Avoid nested updates. Nested state updates can block the main thread; avoid them, especially during large render cycles.
- Rendering shouldn’t depend on effects. If a UI update requires an effect to fix it after rendering, there is likely a deeper design flaw.
- Avoid syncing props to state via effects. If you are copying props into state and updating them later, rethink the approach since it often leads to unnecessary re-renders.
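The third learning can be sketched with a hypothetical chat-filter component modeled as plain functions (all names invented): instead of mirroring props in state and re-syncing the copy in an effect, compute the derived list directly from props on each render.

```typescript
// Sketch (hypothetical component modeled as plain functions): deriving
// filtered results from props each render instead of mirroring props in state.
type Props = { chats: string[]; filter: string };

let extraRenders = 0;

// Anti-pattern shape: keep a stateful copy and re-sync it in an effect after
// every prop change; each sync triggers one extra render.
function effectSyncedVisibleChats(props: Props, staleCopy: string[]): string[] {
  const fresh = props.chats.filter((c) => c.includes(props.filter));
  if (JSON.stringify(staleCopy) !== JSON.stringify(fresh)) {
    extraRenders++; // the effect observes the mismatch and calls setState
  }
  return fresh;
}

// Fix: no copy at all; compute directly from props during render.
function derivedVisibleChats(props: Props): string[] {
  return props.chats.filter((c) => c.includes(props.filter));
}
```

The derived version always returns fresh results in a single render; the effect-synced version is eventually consistent but pays for an extra render on every prop change.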
State-of-the-art regression prevention
Building a performant feature is only the first step. Next, and arguably a harder one, is to ensure that it stays that way over time. To accomplish this, we use a multi-level defense system consisting of performance gates, telemetry monitoring and alerting, as well as regular profiling sessions.
Gates
We introduced performance gates as part of build validation to detect potential latency, responsiveness, and memory utilization issues in core scenarios before every code check-in. For each pull request, the gates run multiple iterations of the scenario and verify that performance values do not degrade compared to an established baseline. On top of that, we instrumented our gates to verify additional metrics, such as the number of executed queries, total render updates, and nested render updates, by running a React profiling build. Gates help us identify and flag pull requests with performance issues, and prevent such changes from merging until the issues are addressed. Even more importantly, they help us learn which code patterns can be taxing on performance. One such example was setting up Relay live resolvers for selecting a conversation in the chat or channel list. We found that the live resolver subscription was triggering a React component update for every conversation item. Gates flagged this, and we optimized the code so that updates were limited to the current and newly selected conversations.
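A gate check of this shape can be sketched as follows. The metric names, tolerance band, and pass/fail rules are invented for illustration; the real gates run many iterations and track more signals.

```typescript
// Illustrative sketch (metric names and thresholds invented): a build-gate
// check that fails a pull request when a scenario regresses past its baseline.
type Metrics = {
  p95LatencyMs: number;
  renderUpdates: number;
  nestedRenderUpdates: number;
};

function gatePasses(baseline: Metrics, current: Metrics, tolerance = 0.05): boolean {
  // Latency may drift within the tolerance band to absorb run-to-run noise;
  // render-update counts are deterministic, so they must not grow at all.
  const latencyOk = current.p95LatencyMs <= baseline.p95LatencyMs * (1 + tolerance);
  const updatesOk = current.renderUpdates <= baseline.renderUpdates;
  const nestedOk = current.nestedRenderUpdates <= baseline.nestedRenderUpdates;
  return latencyOk && updatesOk && nestedOk;
}
```

Counting render updates alongside latency is what catches cases like the live resolver example above, where latency alone might stay within noise while the update count visibly jumps.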
Telemetry monitoring and alerting
Our second line of defense is monitoring live telemetry as new changes roll out. Any increase in latency beyond a given threshold triggers an alert to the owning team and is immediately investigated by the on-call crews. On top of that, new builds and feature flags go through a 50/50 experiment before they become generally available, ensuring that core metrics within the treatment group do not regress compared to the control group.
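A minimal sketch of such an alerting rule, assuming invented numbers: compute a latency percentile over a window of telemetry samples and page the on-call crew when it exceeds the baseline by more than an allowed margin.

```typescript
// Sketch (threshold and margin invented): deciding whether a window of
// latency telemetry should trigger an alert against a known baseline.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: index of the smallest value covering p percent.
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[idx];
}

function shouldAlert(samples: number[], baselineP95: number, maxIncrease = 0.1): boolean {
  return percentile(samples, 95) > baselineP95 * (1 + maxIncrease);
}
```

Alerting on a high percentile rather than the mean keeps the signal focused on the slowest user experiences, which is where regressions tend to surface first.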
Profiling sessions
The third fundamental pillar of our framework has been regular profiling sessions. Our team looked at key user scenarios to uncover new opportunities for optimization. These sessions focused on detailed analysis of web, React, and native ETW traces. Often, we looked at traces from low-end devices, where performance opportunities tended to be even more prominent. Through this process we discovered multiple areas of improvement, both within and outside the new conversation list. One such example was the costly object spread operator ("..." in JavaScript). We found that spreading large objects in hot paths significantly impacted performance, so we removed those patterns to enhance responsiveness.
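The spread cost is easy to underestimate, because the two shapes below are logically equivalent. This is a hypothetical sketch (the state shape and function names are invented): spreading copies every key of the state object on each update, while a targeted assignment touches one entry, at the cost of mutating in place, which only works where callers tolerate it.

```typescript
// Sketch (invented shapes): spreading a large object in a hot path copies
// all of its keys per update; a targeted write touches only one entry.
type Conversations = Record<string, { unread: number }>;

// Spread on each update: O(total keys) copied per call.
function markUnreadSpread(state: Conversations, id: string): Conversations {
  return { ...state, [id]: { unread: state[id].unread + 1 } };
}

// Targeted update: O(1) per call; assumes callers tolerate in-place mutation
// (e.g. behind a store that handles its own change notification).
function markUnreadInPlace(state: Conversations, id: string): Conversations {
  state[id] = { unread: state[id].unread + 1 };
  return state;
}
```

With a conversation list of thousands of entries updated on every incoming message, the per-update copy in the first shape adds up quickly on the main thread.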
Final Thoughts
As the new chat and channel experience rolls out, it enables Teams users to stay on top of what matters and organize the way they work. In this new chapter of effective collaboration, performance will remain paramount. Our rigorous multi-faceted approach will allow us to keep raising the bar and delight users with stellar fundamentals. We have seen firsthand the difference that building upon the appropriate patterns can make. We will further drive innovation and excellence by fostering a culture that encourages questioning the status quo and making long-term design decisions with performance as a key consideration. Last but not least, we will keep sharing our learnings within Teams and beyond, as well as apply them in our future initiatives.
Updated May 28, 2025
Version 3.0
Microsoft Teams Blog