This is definitely useful but I'm not sure it is up to date/complete. For example, the spreadsheet says ingress-nginx does not have authentication support, but according to the docs it has support for basic, client cert, external basic, and external oauth. But this info is easy to miss because it is hidden in the "examples" section of the docs.

https://kubernetes.github.io/ingress-nginx/examples/auth/bas...
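
For reference, the linked basic-auth example boils down to a couple of annotations on the Ingress object. A sketch, assuming a reasonably current ingress-nginx (the "basic-auth" Secret holds an htpasswd file created separately; the host and service names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: protected-app
      annotations:
        # Require HTTP basic auth, using the htpasswd data in the "basic-auth" Secret.
        nginx.ingress.kubernetes.io/auth-type: basic
        nginx.ingress.kubernetes.io/auth-secret: basic-auth
        nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
    spec:
      ingressClassName: nginx
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80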

It also says that the nginx ingress controller doesn't have a CRD; as far as I know, that's not true.

There are two ingress controllers that use nginx -- ingress-nginx (maintained by the k8s project) and nginx-ingress (maintained by nginx). The one on the chart is ingress-nginx (confusingly called Nginx Ingress), and the one with CRDs (VirtualServer, etc.) is nginx's nginx-ingress.

Naming. Hard problem in computer science.

This is a nice summary. It would be helpful if it also included the deployment model, e.g. DaemonSet vs. ReplicaSet.

With the plethora of options, what's missing for me is which ones perform well under heavy load. It's painful to find out after the fact, since each ingress controller deploys in different (and sometimes incompatible) ways.

For example, with nginx-ingress there are gotchas under heavy load. nginx-ingress doesn't support SSL session caching on the upstream (nginx<->your pod). This is a deficiency in the lua-balancer implementation. You can tune keep-alive requests on the upstream, but it isn't always enough. That 50% CPU savings from SSL resumption is costly to lose at times.

This has bitten me when a client-side connection burst requires a connection burst in nginx<->service. Upstream services then burn a lot of CPU negotiating SSL, to the detriment of request processing. That in turn causes more nginx connections to open up due to slower request processing, and can cause health checks to fail. There just aren't enough tuning parameters to control how hard nginx hits your upstream pods.
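
For what it's worth, the tuning that does exist lives in the controller's ConfigMap; a rough sketch of the upstream keep-alive knobs (the key names assume a reasonably recent ingress-nginx, the values are illustrative, and note there is no equivalent setting for upstream TLS session reuse):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller
      namespace: ingress-nginx
    data:
      # Idle keep-alive connections each nginx worker keeps open to upstreams.
      upstream-keepalive-connections: "320"
      # Requests allowed over a single upstream keep-alive connection.
      upstream-keepalive-requests: "10000"
      # Seconds an idle upstream keep-alive connection stays open.
      upstream-keepalive-timeout: "60"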

This is excellent. One thing is missing - support for Proxy Protocol.

It is the only real, standards-compliant way in which you can preserve client information (IP address, etc.) while it's moving inside a Kubernetes cluster. Not all ingresses have support for injecting it. Most ingresses can read it (assuming a cloud-based load balancer has inserted it already).

We moved to HAProxy for this reason.

Huh? I don't think I get this.

Is Proxy Protocol different from just adding client information as an extra header?

At least that's what we normally do: we make sure that client information like the client IP is added as header fields, which can easily be used in monitoring and so on.

I think they're referring to this protocol: http://www.haproxy.org/download/1.8/doc/proxy-protocol.txt

It wraps a TCP connection (not just HTTP) and forwards it, and preserves the originating IP information as a small header on the proxied connection.

It's supported beyond HAProxy, too; e.g., AWS ELBs support it. I've used it when forwarding TLS connections but not wanting to lose the remote IP information (and not wanting to decrypt on the ELB, which is required to add, e.g., an HTTP X-Forwarded-For header).

Your header is your custom way.

If you add your header and the next hop is Apache or some other software, for example, it won't know what you did.

Proxy protocol is a standard. Every piece of software that supports proxy protocol reads and writes it in the same way. And it isn't limited to software you run yourself: all the clouds (AWS, GCP, Azure) support it, and so do third-party services like Cloudflare - https://developers.cloudflare.com/spectrum/proxy-protocol

Different software expects different headers for HTTP, and some software isn't HTTP-based at all. Proxy protocol addresses these situations.
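
As a concrete example of what injecting and reading it looks like inside a cluster: with ingress-nginx behind an AWS ELB, you ask the cloud load balancer to prepend the PROXY header and tell nginx to expect it. A sketch, assuming ingress-nginx's ConfigMap key and the in-tree AWS Service annotation (names vary for other controllers and clouds):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ingress-nginx-controller
      namespace: ingress-nginx
    data:
      # nginx parses a leading "PROXY TCP4 <client-ip> <dest-ip> <ports>" line
      # on each connection instead of treating it as application data.
      use-proxy-protocol: "true"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx-controller
      namespace: ingress-nginx
      annotations:
        # Ask the AWS load balancer to write the PROXY protocol header.
        service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: ingress-nginx
      ports:
      - name: https
        port: 443
        targetPort: https
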
Ingress in K8S is too complex. We need some sane defaults. It seems like there are more decisions to be made upfront than are really necessary.

We are setting up our cluster and we ended up going with Traefik, and I just published an interview with our architect where he explained why he chose Traefik. Excuse the plug, but it's here:

https://blog.earthly.dev/building-on-kubernetes-ingress/#kub...

The short version is that he finds it easier to set up than Nginx. I think learning curves are an important metric that must be considered as well.

The things I hate about k8s:
* opinionated, and promoting opinionated software
* CNI: thou shalt not NAT
* service mesh: NAT
* east-west vs. north-south as an explicit design statement
* cloud integration, or it's no better than roll-your-own with 'legacy' approaches implemented by a competent SA/SE (this is the k8s design goal anyway: mirror what a competent SA could do)
* Go

Ingress Controllers in K8S can and should be simplified. For most cases you'll need a controller, but setting one up can be daunting given the various options, most of which are under constant change and have uneven documentation.

This is why things like K3s bundle Traefik to save you the pain, but really this should be the standard. It should be swappable (like it is in K3s), but come with something already available.

Updated - Thanks for the reminder - it's ingress controller, not ingress.

I think you're confusing Ingresses and Ingress Controllers? Traefik/HAProxy/Nginx act as Ingress Controllers, not Ingresses. Ingresses are the objects created by users to define an intent of L7 traffic forwarding/ingress. Or maybe I'm misparsing your post.

Anyway - the reason K3s can come with an Ingress Controller OOTB is purely a result of the fact that K8s has made it possible by making the architecture pluggable. Nearly every K8s-a-a-S solution comes with its own built-in flavour of an Ingress Controller (and other controllers, too). K3s is just such an implementation. K8s could come with its own standard implementation of an Ingress Controller, but considering how much the implementation varies between deployments, it makes sense they have opted out of that.

Comparing K8s to K8s distributions (like K3s) is somewhat missing the point of what K8s is - a framework to build your own implementation upon, with whatever fits your deployment.

I get that it's pluggable, but it's yet another hurdle to jump to get started on K8s. So maybe don't include it by default, but make it much simpler to add in and use. They already have one that they support; it should be more plug and play, with simple steps to route your app, knowing that this will be the case for most uses.

Most of the time when you 'get started on K8s', you get started by using a cluster that's set up by a provider of some sort - be it a cloud provider, a platform team in your company, or some prebundled bare metal Kubernetes distribution like K3s. In these cases, you will most likely already be provided with, or at least recommended, an Ingress Controller.

Setting up K8s clusters from scratch on bare metal (or even worse, VMs) is generally not something you should be doing unless you really know what you're doing and have a good reason to. Think: going 'Linux from Scratch' instead of installing Ubuntu. K3s is not an alternative to K8s; it really is just a distribution for bare metal deployments. So are Rancher, OpenShift, etc.

I would argue that an ingress is not a barrier to getting started with k8s (even with kubeadm on bare metal). You don't NEED an L7 router to get started. Much more likely, you need a LoadBalancer. An ingress is only going to help route traffic that is already in your cluster.

If you're deploying to almost any cloud provider, they have an implementation for you. If you're on bare metal, you have to make that choice yourself.

Even if you're hosting yourself you don't technically need a load balancer; you could use host ports or node ports and it would work, for some definition of work. I have in the past set up an ingress controller to listen on host ports of every node, and it works well, if a little janky.
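
That host-port setup is roughly a DaemonSet whose controller pods bind 80/443 directly on every node, so DNS can point straight at the nodes with no LoadBalancer in front. A sketch (the image and names are placeholders for whatever controller you run):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ingress-controller
      namespace: ingress
    spec:
      selector:
        matchLabels:
          app: ingress-controller
      template:
        metadata:
          labels:
            app: ingress-controller
        spec:
          containers:
          - name: controller
            image: example/ingress-controller:latest  # placeholder image
            ports:
            # hostPort exposes these container ports on each node's own IP.
            - containerPort: 80
              hostPort: 80
            - containerPort: 443
              hostPort: 443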

I'm surprised you're complaining about this, which is an easy thing to replace, and not about your choice of network plugin, which can mean having to reprovision the cluster (or similar) if you want to replace it with something different.

Is k3s really just a distribution of Kubernetes? I thought it was a reimplementation that implements the core API specs?

It's a fork-ish. Mostly the same codebase, some things ripped out, some things simplified, others just rearranged. One of the most notable changes is the API server being made backend-agnostic, so you can run it against anything from a SPOFy SQLite to a normal etcd.

And to me, even if it was a full rewrite: if it quacks like a K8s, walks like a K8s, and passes K8s conformance tests, it's a K8s :). I don't require a distribution to take upstream k8s binaries as is and just orchestrate them with a configuration management system, I think a lot of value comes from slightly moving things around and changing them to fit your vision. Better that than a murder of shell scripts.

> And to me, even if it was a full rewrite: if it quacks like a K8s, walks like a K8s, and passes K8s conformance tests, it's a K8s :)

I get it, but it's often helpful to distinguish between interfaces and implementations. If you want to say that the term "kubernetes" refers to the API interface, such that it can be implemented by any of a number of implementations, that's fine, but we should have a separate term for the default implementation.

> I don't require a distribution to take upstream k8s binaries as is and just orchestrate them with a configuration management system, I think a lot of value comes from slightly moving things around and changing them to fit your vision. Better that than a murder of shell scripts.

Distributions are nice because you often don't want to have to build out your own logging, monitoring, cert-management, secret-encryption, ingress controller, load balancer controller, volume controllers, object storage, etc, etc, etc every time you stand up a cluster. Basically for the same reasons we have Linux distributions (most people don't want to have to roll their own Linux from scratch every time), Kubernetes distributions would be nice. None of this requires "a murder of shell scripts" (but I like that collective noun) nor does it prevent you from swapping the distro-standard components for your own.

In general I wish there were more Kubernetes "distros" that came pre-installed with logging, monitoring, storage (block and object), ingress controllers, load balancers, cert management, secret encryption, etc. The things that every installation needs, but which everyone has to roll themselves.

HAProxy has a dashboard the last time I checked, which is today.

Thanks for the effort! A very nice overview, which makes choosing between load balancer implementations when looking for a specific feature a lot easier. Somehow tables like these are hard to find when you actually need them; good to know this one exists.

I am seeing more and more users wanting to implement end-to-end connectivity from GW to service mesh. Particularly when it comes to Kong, we have done all the heavy lifting in Kong[1] + Kuma[2] (the latter a CNCF project) to do that.

Typically we want to create a service mesh overlay across our applications and their services - to secure and observe the underlying service traffic - and still expose a subset of those via an API GW (and via an Ingress Controller) at the edge, to either mobile applications or an ecosystem of partners (where a sidecar pattern model is not feasible).

With Kuma and its "gateway" data plane proxy mode, this can be easily achieved via the Kong Ingress Controller, which is mentioned in this spreadsheet.

Disclaimer: I am a maintainer of both Kong and Kuma.

[1] - https://github.com/Kong/kong

[2] - https://github.com/kumahq/kuma

Would you recommend taking this to production as an Istio replacement? What are the downsides?

I couldn't find a good comparison like the one in OP about ingress controllers.

This setup has been running in production in mission critical use-cases already, and unlike Istio, it provides fully automated multi-zone support across both Kubernetes and VM environments on both the CP and DP.

If you have questions, you can always reach out at https://kuma.io/community

Ingress is a big disaster and is probably the first thing people switching to Kubernetes encounter.

The large underlying problem is that the Ingress controller is the place where people need to do a lot of very important things, and the API doesn't specify a compatible way to do those things. Even something as simple as routing ingress:/api/v1/(.*)$ to a backend api-server-v1:/($1) isn't specified. Nginx has its own proprietary way to do it. Traefik has its own proprietary way to do it. Every reverse proxy has a way to do this, because it's a common demand. But to do this in Kubernetes, you will have to hope that there is some magical annotation that does it for you (different between every Ingress controller, so you can never switch), or come up with some workaround.
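
To make that concrete, here is roughly what the rewrite looks like with ingress-nginx's annotations (a sketch; the regex/capture-group handling is nginx-specific, Traefik wants its own Middleware object instead, and the two are not interchangeable):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: api-v1
      annotations:
        # ingress-nginx only: treat the path as a regex and rewrite
        # /api/v1/<rest> to /<rest> before proxying to the backend.
        nginx.ingress.kubernetes.io/use-regex: "true"
        nginx.ingress.kubernetes.io/rewrite-target: /$1
    spec:
      ingressClassName: nginx
      rules:
      - http:
          paths:
          - path: /api/v1/(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: api-server-v1
                port:
                  number: 80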

Composing route tables is another problem (which order do the routing rules get evaluated in?), and Ingress again punts. Some controllers pick date-of-admission on the Ingress resource, meaning that you'll never be able to recreate your configuration again. (Do you store resource application date in your gitops repo? Didn't think so.) Some controllers don't even define an order! The API truly fails at even medium-complexity operations. (It's good, I guess, for deploying hello-app in minikube. But everything is good at running hello-app on your workstation.)

Then there are deeper features that are simply not implemented, and seriously hurt the ecosystem in general. One big feature that apps need is authentication and authorization handled at the ingress controller level. If that was reliable, then apps wouldn't have to bundle Dex or roll their own non-single-sign-on. Cluster administrators are forced to configure that every time, and users are forced to sign in 100 times a day. But the promise of containerization was that you'd never have to worry about that again -- the environment would provide crucial services like authentication and the developer just had to worry about writing their app to that API. The result, of course, is a lot of horrifying workarounds (pay a billion dollars a month to Auth0 and use oauth-proxy, etc.). (I wrote my own at my last job and boy was it wonderful. I'm verrrrry slowly writing an open-source successor, but I'm not even going to link it because it's in such an early stage. Meanwhile, I continue to suffer from not having this every single day.)
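
For what it's worth, the closest ingress-nginx gets today is its external-auth annotations, which let you bolt something like oauth2-proxy in front of an app. A sketch (the hostnames and backend are hypothetical), which is of course exactly the kind of controller-specific annotation being complained about here:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: protected-app
      annotations:
        # Each request is first sent to auth-url; a 2xx lets it through,
        # a 401/403 redirects the browser to auth-signin.
        nginx.ingress.kubernetes.io/auth-url: "https://oauth.example.com/oauth2/auth"
        nginx.ingress.kubernetes.io/auth-signin: "https://oauth.example.com/oauth2/start?rd=$escaped_request_uri"
    spec:
      ingressClassName: nginx
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80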

It's not just auth; it's really all cross-cutting concerns. Every ingress controller handles metrics differently. ingress-nginx has some basic Prometheus metrics and can start Zipkin traces. Ambassador and Istio can do more (statsd, opencensus, opentracing plugins), but only with their own configuration layer on top of raw Envoy configuration (and you often have to build your own container to get the drivers). The result is that something that's pretty easy to do is nearly impossible for all but the most dedicated users. The promise of containerization basically failed; if you really look hard enough, you'll see that you're no better off than with nginx sitting in front of your PHP app. At least you can edit nginx.conf in that situation.

My personal opinion is to not use it. I just use an Envoy front proxy and an xDS server that listens to the Kubernetes API server to set up backends (github.com/jrockway/ekglue). Adding the backends to the configuration automatically saves a lot of configuration toil, but I still write the route table manually so it can do exactly what I want. It doesn't have to be this way, but it is. So many people are turned off of Kubernetes because the first thing they have to do is find an Ingress controller. In the best case, they decide they don't need one. In the worst case, they end up locked into a proprietary hell. It makes me very sad.

This is a good read, though I don't agree 100% with everything. I also ignore ingress via NodePort and external provisioning, and only use Istio where there are service dependencies and it makes sense. This approach then becomes cumbersome, requiring (as you note) additional middleware and code.

The document has (section 12) "Developer Portal", which is good, but I'd suggest making this a more prominent item, perhaps even breaking it up into "Documentation", "Examples", "Primary Support Channel (GitHub/SO)", etc.

I recently tried every single K8s IC, one by one, painfully. The biggest challenge was documentation; even something as simple as an example was missing for many of them. They would have examples for one use case, but not for each use case. It was incredibly frustrating.

What would it look like to use just normal IP addressing and DNS instead of proxies, NAT and ambiguous RFC 1918 addresses? Have a bunch of public API endpoints exposed in DNS and rotate new ones in/out at new names and addresses. Then a fallback proxy for v4 clients that switches on the Host header.

Proxies seem to add a lot of complexity and indirection (not to mention inefficiency).

Slow to scale/drain will make your utilization go into the toilet, since you'll have to overprovision by a lot. There are many other issues with DNS load balancing.

Scaling up won't be a problem (new DNS names will just work). As for draining, it does slow down the draining some, but how many scenarios really need to retire endpoints at a quicker cadence than DNS can manage? You can set the TTL to 10 minutes.

We pull unhealthy VMs and containers from rotation in real time. Serving a bunch of 500-series errors to clients because your application server took a nosedive isn't great, even with a 300-second TTL.

We don't use NAT. When people say RFC1918 addresses aren't routable they mean you can't advertise them on the public Internet. You can totally have 10/8 switched and routed internally. Even if you don't want to mess with that, it's entirely possible to set a proxy to proxy from its public IP to another public IP, but why waste the IP space, especially in v4? I can serve 80 applications, each with half a dozen or more backend systems, from about six proxy servers. The proxy isn't running the application and connecting to the RDBMS, so each one can handle way, way more traffic than even an efficient application.

Using a proxy that's layer 7 aware also lets you do things like scale different parts of a URI tree individually. If you're not proxying, every copy of app.example.com serves everything under https://app.example.com/ even if it's one of 23 (about the most A records that will fit in UDP DNS) servers doing it. With a proxy, I can decide that https://app.example.com/freestuff/ is too busy and spikes the backends, and that the load of that and the rest of the stuff under / is too hard to scale for properly when taken together. So I just tell my proxies that everything for https://app.example.com/freestuff goes to server set A and the rest of stuff in / goes to server set B. Then the different performance and different demand for the two can be studied, improved, monitored, and reported separately at the VM or container (or even bare metal backend) level.

I can also throw memcached or Redis into the mix and do site-wide limiting on an IP that's scanning or attacking my entire laundry list of applications. Even without those I can rate-limit what each proxy will accept from a single visitor globally or per backend type.

How often does it happen that working containers suddenly fail like that, though, where it's critical to shave off the minutes?

I appreciate that proxying does give you slightly better reaction time, but this has to be weighed against the costs and outages caused by the complexity of the proxying.

Re your app.example.com scenario: I was not thinking of serving the static assets and frontend code from the individual services, but rather having, e.g., your SPA know about the different endpoints and serving those resources from their own service, which would be stable.

Re the memcached angle - the static serving can have caching by whatever mechanism and the API endpoints can also use memcached all they want, with at least as good control granularity as in the proxy case. But if you were thinking about a caching reverse proxy, I think that would be more prone to poisoning the cache with error replies than having the endpoints do their caching individually.

Again, think of all the failure-prone machinery you can elide in this scenario.

Reasonable rules for something like HAProxy are not at all prone to errors. Ours aren't even applied by hand. It only has to be made right when the automation module is updated, and that gets tested well before it goes to production.

Varnish rules are also not that difficult, but I said nothing of caching reverse proxy. I mentioned Memcached specifically in the context of federating traffic counts across multiple non-caching proxies.

Whether an application uses Memcached or not is irrelevant to the proxies.

Re the v4 space question: v4 from the public internet was covered in my original comment (a fallback proxy switched on the Host header). Internally, using v4 10.x networks for internal systems is fraught with problems, as you'll inevitably end up trying to connect to other 10.x ambiguously addressed networks and get bitten by the ambiguity in security, monitoring, configuration, etc.

End-to-end addressing with globally unique addresses is just a good idea; it's a big reason the internet model won over competing networking technologies.

It's not inevitable. Our core routers aren't even the same routers as our edge routers. The edge routers won't touch a 10/8 in any direction. The core routers only allow them on internal interfaces.

I'm not sure what you mean by fallback proxies without proxying being involved. Could you elaborate and clarify?

It will not, in fact, just work, because many clients cache old entries for a long time, so you won't be able to solve your load-spreading problem quickly, at least in the general case.

In browsers at least this should be a thing of the past - Chrome and Firefox are both documented to internally cache for only 1 minute. You might of course have non-browser clients that don't respect DNS TTL, which might be a problem if you're trying to introduce this into existing old systems.

And then Chrome and Firefox will query your ISP resolver, which caches for a day just because - oops.

Those ISP resolver caches do respect the DNS TTL per the spec; otherwise all kinds of stuff breaks.

If you're concerned that this would still be an issue, you could easily do some measurements. There's no incentive for ISPs to break this, given the "my internet is broken" UX and how tiny DNS traffic is.

I really liked Traefik when I used it last. It seemed straightforward to use in both docker-compose and Kubernetes environments, allowing me to mess around with settings locally before I deployed to the cluster.

I am not a K8S expert, but why is it that (most?) cloud load balancers don't act as ingress controllers?

I'd say that's primarily because those load balancers (something like a Network Load Balancer in AWS, for example) are not integrated into the k8s network stack. Cloud load balancers are absolutely an important part of delivering network traffic to your cluster, but the ingress controllers have to "know" a bit more about cluster network topology to serve the purpose they serve. That is, taking incoming traffic from node(s) at a common port and sending it on to a particular service in the cluster.

I was thinking about AWS's Application Load Balancer, which already has rules similar to ingress resources in K8S. I think the LB in DigitalOcean has something similar as well. Having to run (multiple) ingress controller instances in your cluster seems like a waste of resources. Also, if ingress-controller-specific functionality is used, are ingress resources even the right abstraction?

AWS ALB is exposed as Ingress through alb-ingress-controller. NLB is exposed through Service objects, which correspond to L4 load balancers (with Type=LoadBalancer set).
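
Roughly, that ALB-per-Ingress shape looks like this (a sketch; the annotation and ingress class assume the AWS load balancer controller is installed, and the values are illustrative):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: web
      annotations:
        # The AWS controller provisions an internet-facing ALB for this Ingress
        # and translates its rules into ALB listener rules.
        alb.ingress.kubernetes.io/scheme: internet-facing
    spec:
      ingressClassName: alb
      rules:
      - http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80
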
In addition to the great sibling comment, another reason is typically cost: if I had to provision a new cloud load balancer for every review environment in GitLab, that'd be quite pricey (and much, much slower to deploy).

How do you typically access your K8S cluster when it's not fronted by an LB and it has multiple nodes?

At least in AWS, it will by default instantiate an application LB for every ingress (so at least one per namespace).

With an internal ingress controller, one can combine all ingresses into, e.g., a single nginx service. That service can then be fronted by a single network LB for the whole cluster.
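
Concretely, that consolidation is just one Service of type LoadBalancer in front of the controller, and every Ingress in the cluster rides on that one cloud LB. A sketch, assuming ingress-nginx on AWS (the NLB annotation and selector are the usual ones but may differ per install):

    apiVersion: v1
    kind: Service
    metadata:
      name: ingress-nginx-controller
      namespace: ingress-nginx
      annotations:
        # One cloud NLB for the whole cluster; all Ingress traffic flows through it.
        service.beta.kubernetes.io/aws-load-balancer-type: nlb
    spec:
      type: LoadBalancer
      selector:
        app.kubernetes.io/name: ingress-nginx
      ports:
      - name: http
        port: 80
        targetPort: http
      - name: https
        port: 443
        targetPort: https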

On a similar note about load balancers: if I have a VM with multiple IPs, how can I bind a service to a single IP? It seems like a service using a NodePort listens on that port on all IPs; however, I haven't found a load balancer implementation that lets me do this either.

On the Service object side, the method is to set loadBalancerIP in the spec.

The implementing side for Services then has to do the appropriate operations to either get you that IP or fail creation of the Service. On GCP, if you set loadBalancerIP to the address of a persistent, unattached public IP you own, it will automatically attach to it. I expect similar behaviour on AWS and others.
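
In other words, something like this (the address is a placeholder for a reserved static IP; note that the loadBalancerIP field is deprecated in newer Kubernetes releases in favour of cloud-specific annotations):

    apiVersion: v1
    kind: Service
    metadata:
      name: app
    spec:
      type: LoadBalancer
      # Ask the cloud provider to bind the LB to this pre-reserved address
      # instead of allocating a new one.
      loadBalancerIP: 203.0.113.10
      selector:
        app: app
      ports:
      - port: 443
        targetPort: 8443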

Ingress resources are used to declare the expected input traffic of your cluster.

How you route that traffic to your cluster is another problem.