How we replaced Ingress-NGINX at Stack Overflow
With the retirement of Ingress-NGINX, we faced the challenge of finding an alternative traffic routing solution for our Kubernetes infrastructure.

The November 2025 announcement of the impending retirement of Ingress-NGINX took us by surprise. The tool had been pivotal for routing traffic in our Kubernetes environment ever since we adopted it. We had considered transitioning to the Gateway API for its enhanced features, but with Ingress-NGINX performing adequately, we were hesitant to shift focus. The retirement deadline forced our hand, pushing a new traffic routing solution to the top of our development agenda.
Crafting a transition strategy
Faced with an overwhelming set of options, we had to narrow our choices before diving into implementation and testing. We wanted to leverage the Gateway API for its new capabilities and finer role separation, while remaining flexible enough to fall back to another Ingress controller if needed. Time was short, so we balanced thoroughness with urgency and established criteria that whittled our focus down to three Gateway API implementations and two Ingress controllers.
Gateway API Candidates
- NGINX Gateway Fabric
- Traefik
- Istio
Ingress Options
- F5 NGINX Ingress Controller
- Traefik
The implementation had to be fully conformant according to the Gateway API's list of approved implementations, ensuring a reliable foundation for our evaluations. Since we operate across GCP and Azure, we could also exclude cloud-specific solutions. The v1.4 feature matrix and third-party benchmarks further informed our tightened focus on the implementations above.
Regrettably, HAProxy didn't make the cut because its conformance status was outdated at the time, despite its reliable history in our data center before we moved to GKE. As of this writing, it has since achieved fully conformant status.
In terms of fallback Ingress solutions, Traefik briefly appealed due to its compatibility with NGINX annotations, but we discovered that many of our existing annotations weren't supported, diminishing its viability. F5 NGINX Ingress Controller initially appeared dependable, but its advanced routing requires custom resources, an incompatibility that would complicate integration with other controller types. We therefore ruled out switching to another Ingress solution early on.
Analyzing usage patterns
To shape our testing scenarios, we exported all Ingress objects from our production clusters into YAML files, then used Claude to categorize them by use case. The majority of our routing was fairly straightforward, which let us narrow our testing to roughly six distinct scenarios plus two scalability benchmarks.
Constructing a testing environment
Our testing infrastructure primarily used HTTPBin as a backend. It's an exceptional resource for anything HTTP-related, enabling in-depth inspection of request and response headers. For instance, one of our use cases requires dynamically adjusting the Host header for specific traffic; HTTPBin's /headers endpoint returns the request headers as a JSON response, which allowed us to build test cases that validated header modifications effectively.
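As a rough sketch of that pattern, the HTTPRoute below rewrites the upstream hostname and sends traffic to HTTPBin; the Gateway name, hostnames, and Service name are illustrative, not our actual configuration. A test then only has to request /headers and assert that the JSON body shows the rewritten Host.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httpbin-host-rewrite
spec:
  parentRefs:
    - name: test-gateway                    # illustrative Gateway
  hostnames:
    - "httpbin.test.example"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /headers
      filters:
        - type: URLRewrite
          urlRewrite:
            hostname: internal.test.example # Host header seen by the backend
      backendRefs:
        - name: httpbin                     # HTTPBin Service in the same namespace
          port: 80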
The second backend was a simple Go web server deployed at perf., designed to respond quickly to high request volumes. We used it to assess gateway performance under heavy load, simulating increased latency to examine how well each implementation handled surges in connections and requests. HTTPBin has a similar endpoint, but skepticism about its performance under stress led us to use the Go server for the intensive tests instead. Our test environment was structured around these two backends.
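For context, a deployment for the perf backend might look like the sketch below; the image name and the RESPONSE_DELAY_MS latency knob are hypothetical stand-ins, not our actual setup.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: perf-backend
spec:
  replicas: 4
  selector:
    matchLabels:
      app: perf-backend
  template:
    metadata:
      labels:
        app: perf-backend
    spec:
      containers:
        - name: server
          image: example.com/perf-backend:latest # hypothetical image
          env:
            - name: RESPONSE_DELAY_MS            # hypothetical knob for simulated latency
              value: "150"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: perf-backend
spec:
  selector:
    app: perf-backend
  ports:
    - port: 80
      targetPort: 8080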

Key findings from testing
Setting up the three Gateway API implementations was fairly straightforward, though I did hit one minor frustration: Traefik requires an "entrypoint," which sets up a TCP listener. Neglect it, and Gateway creation fails with errors, which somewhat undermines the abstraction the Gateway API is meant to provide.
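For the unfamiliar, entrypoints live in Traefik's static configuration; a minimal sketch looks like this (port choices illustrative):

# traefik.yml (static configuration): each entryPoint is a TCP listener,
# and a Gateway listener must map onto one of them.
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"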
All tested implementations competently managed our use cases. Istio supported the most Gateway API features, while Traefik supported the fewest. We quickly realized that some Gateway API features, impressive in theory, lacked the depth our practical applications needed. For example, the header modification filter in HTTPRoute only supports static values, which compelled us to fall back on custom extension points for dynamic requirements. All implementations proved sufficiently flexible there, but Istio's filter syntax was noticeably more complex than Traefik's or NGINX's.
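To illustrate the limitation, a filter like the one below (names illustrative) can only set a header to a fixed string; deriving the value from the request isn't expressible in the core API.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: static-header-example
spec:
  parentRefs:
    - name: test-gateway
  rules:
    - filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            set:
              - name: X-Environment
                value: production # static string only; dynamic values need vendor extensions
      backendRefs:
        - name: httpbin
          port: 80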
In some scenarios, behavior differed across implementations enough to require adaptations in our applications. Currently, we use ngx_http_auth_request_module to forward requests to an authentication service, but Istio's external authorization behaves differently in ways that could complicate integration. These integration challenges were significant hurdles that slowed the migration process.
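For reference, external authorization in Istio is wired up roughly like this: an extension provider registered in the mesh config, then an AuthorizationPolicy with a CUSTOM action. The provider and path names below are illustrative, not our configuration.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ext-authz
  namespace: istio-system  # attach at the gateway; namespace illustrative
spec:
  action: CUSTOM
  provider:
    name: my-ext-authz     # must match an extensionProviders entry in meshConfig
  rules:
    - to:
        - operation:
            paths: ["/protected/*"]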
Performance benchmarks
I’ve always found performance analysis to be an area where it’s easy to get lost in the details. With limited time and resources, we aimed for simplicity and practicality. At larger scales, significant disparities might emerge among these implementations, but our immediate goal was to confirm they could meet our current scalability demands.
Our scalability testing focused on two factors: ensuring each gateway could manage multiples of our daily traffic, and evaluating performance with customer-specific ingress configurations loaded. We set an ambitious target of 10,000 requests per second (RPS) to leave headroom well beyond typical operational levels.
Our testing environment ran four gateway replicas, each on a dedicated e2-standard-4 compute node in GCP. The test client ran K6 on an Azure instance optimized for high connection volumes, specifically a Standard_DC8as_cc_v5.
Initial RPS benchmarks
We simulated backend latencies of 0, 150, and 350 ms, and all implementations performed admirably across these conditions. The results were fairly consistent across the board. Here’s how they stacked up during the 150 ms latency test:
Traefik Performance
http_req_duration..............: avg=188.27ms min=176ms med=180.22ms max=820.56ms p(90)=195.11ms p(95)=219.31ms
NGINX Performance
http_req_duration..............: avg=205.34ms min=176.15ms med=183.13ms max=1.83s p(90)=244.14ms p(95)=297.59ms
Istio Performance
http_req_duration..............: avg=186.73ms min=176.03ms med=180.52ms max=3.48s p(90)=194.34ms p(95)=216.32ms
Route Creation Benchmark
We initially aimed to create 5,000 HTTPRoutes with each implementation, but later discovered practical limitations well below that number. Third-party benchmarks had highlighted issues with Traefik and NGINX around route handling, so we replicated a test: generate 5,000 HTTPRoutes, each with a single path rule, then run concurrent requests to verify that every route resolved correctly.
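Each generated route had the rough shape below; the naming scheme and backend are illustrative stand-ins for what our test harness produced.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: route-0042            # one of 5,000 generated routes
spec:
  parentRefs:
    - name: test-gateway
  rules:
    - matches:
        - path:
            type: Exact
            value: /routes/0042 # unique path per route
      backendRefs:
        - name: perf-backend
          port: 80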
While all implementations managed the initial route creation, Traefik converged so slowly that the test eventually timed out, highlighting inefficiencies in the process:
=== RUN TestRoutedPaths
=== RUN TestRoutedPaths/gw-nginx
gateway_test.go:443: 5000/5000 paths pending, retrying in 5s
gateway_test.go:443: 105/5000 paths pending, retrying in 5s
gateway_test.go:443: 105/5000 paths pending, retrying in 5s
gateway_test.go:443: 87/5000 paths pending, retrying in 5s
gateway_test.go:335: applied 5000 HTTPRoutes
gateway_test.go:443: 3/5000 paths pending, retrying in 5s
gateway_test.go:431: all 5000 routes converged in 42.047s
=== RUN TestRoutedPaths/gw-istio
gateway_test.go:443: 5000/5000 paths pending, retrying in 5s
gateway_test.go:443: 1167/5000 paths pending, retrying in 5s
gateway_test.go:443: 93/5000 paths pending, retrying in 5s
gateway_test.go:443: 93/5000 paths pending, retrying in 5s
gateway_test.go:443: 32/5000 paths pending, retrying in 5s
gateway_test.go:443: 32/5000 paths pending, retrying in 5s
gateway_test.go:335: applied 5000 HTTPRoutes
gateway_test.go:431: all 5000 routes converged in 41.981s
=== RUN TestRoutedPaths/gw-traefik
gateway_test.go:443: 5000/5000 paths pending, retrying in 5s
gateway_test.go:443: 4939/5000 paths pending, retrying in 5s
gateway_test.go:443: 4921/5000 paths pending, retrying in 5s
gateway_test.go:335: applied 5000 HTTPRoutes
gateway_test.go:443: 4871/5000 paths pending, retrying in 5s
gateway_test.go:443: 4856/5000 paths pending, retrying in 5s
gateway_test.go:443: 4833/5000 paths pending, retrying in 5s
gateway_test.go:443: 4827/5000 paths pending, retrying in 5s
gateway_test.go:443: 4823/5000 paths pending, retrying in 5s
gateway_test.go:443: 4820/5000 paths pending, retrying in 5s
gateway_test.go:443: 4816/5000 paths pending, retrying in 5s
gateway_test.go:443: 4811/5000 paths pending, retrying in 5s
...
--- FAIL: TestRoutedPaths (441.53s)
--- PASS: TestRoutedPaths/gw-nginx (56.68s)
--- PASS: TestRoutedPaths/gw-istio (79.96s)
--- FAIL: TestRoutedPaths/gw-traefik (304.88s)
--- FAIL: TestRoutedPaths (410.94s)
--- PASS: TestRoutedPaths/gw-nginx (60.82s)
--- PASS: TestRoutedPaths/gw-istio (46.98s)
--- FAIL: TestRoutedPaths/gw-traefik (303.13s)
Traffic management during updates
Once we loaded 5,000 routes, we realized we had set an overly ambitious target. When we re-ran the K6 benchmark aiming for 10k RPS through the gateway, it failed: latency climbed and concurrent requests piled up. We settled on a more reasonable 1,000 routes for functional performance validation and re-ran the tests with successful outcomes.
Testing route updates while traffic was flowing proved revealing. Istio and Traefik handled it without significant drawbacks, whereas NGINX showed latency spikes whenever an HTTPRoute was modified, especially with 1,000 routes loaded. As the chart below illustrates, response times rose noticeably during updates.
NGINX latency during 10k RPS updates with 150ms simulated latency

Final Decision
After thoroughly evaluating all three implementations and their performance characteristics, we chose Istio. Its reliability and consistent performance throughout our tests stood out as key advantages. All options had their merits, but Istio’s advanced feature set suggested the most room to grow as our needs evolve.
The migration will commence in the upcoming weeks. I’ll be documenting any notable findings and challenges along the way in a follow-up article.