Monday, October 30, 2023

Transparent Mode Proxies

I have a fairly complicated home network setup, which should come as a surprise to absolutely nobody.  I recently dealt with an issue that had been bugging me for ages, but first some background.  I want to be able to connect to my home systems from pretty much anywhere, and ssh is the obvious tool for that.  Occasionally I've been somewhere that blocks outgoing connections to port 22 (the ssh port) but still lets me connect to port 443 (the https port).  So if I run ssh on port 443 instead, that gets around the problem.  But I also want to have a web server on port 443.  Fortunately there's this neat little tool called 'sslh' that can sit and listen for connections on a port, and when a client connects and sends its first message, it works out what protocol the client is using and forwards the connection to the appropriate program.  So now I have multiple services running on the one port that should never be blocked.

But there's a problem.

The server logs for ssh and apache show the source of the connections as being the local system, which is technically correct since they are coming from sslh on my local system.  I could merge the logs from sslh into the logs for apache and ssh, but that would be a pain.  What I really want is for the original source IP to show up in the logs for the applications as if there weren't a proxy in the middle.

Wishful thinking, right?  But apparently some really smart people wished for it, so they made it happen.

There are instructions for how to make this work for sslh, and if you follow them exactly, and if you're lucky, it does work.  I say you have to be lucky because there are some subtle issues that you'll hit if you think you're smarter than the instructions or try to do things a little differently.  Which is exactly the sort of thing I'm obviously going to do.

The way sslh works is that it accepts a connection and then looks at the first message the client sends, which it uses to decide which application to forward the connection to.  Fortunately network protocols tend to expect the client to send the first message, so for things like ssh, ssl, and http, sslh will know what to do.  (However, some protocols like imap have the server send the first message upon establishing a connection, so sslh can only service one such protocol, by defaulting to it after a timeout.)
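To make the "look at the first message" idea concrete, here is a minimal sketch of that kind of check.  This is not sslh's actual probing code; the enum, the function name guess_protocol, and the short list of HTTP methods are all made up for illustration.  It relies only on well-known properties of the protocols: an ssh client opens with an "SSH-" identification string, a TLS connection opens with a handshake record (type byte 0x16, major version 3), and an HTTP request begins with a method name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical labels for this sketch; sslh has its own list of probes. */
enum proto { PROTO_UNKNOWN, PROTO_SSH, PROTO_TLS, PROTO_HTTP };

/* Guess the protocol from the first bytes a client sent. */
enum proto guess_protocol(const unsigned char *buf, size_t len)
{
    /* A small, deliberately incomplete list of HTTP methods. */
    static const char *methods[] = { "GET ", "POST ", "HEAD ", "PUT ", "DELETE ", "OPTIONS " };
    size_t i;

    /* ssh clients open with an identification string such as "SSH-2.0-..." */
    if (len >= 4 && memcmp(buf, "SSH-", 4) == 0)
        return PROTO_SSH;

    /* TLS opens with a handshake record: type 0x16, major version 3 */
    if (len >= 2 && buf[0] == 0x16 && buf[1] == 0x03)
        return PROTO_TLS;

    /* HTTP requests begin with a method name followed by a space */
    for (i = 0; i < sizeof(methods) / sizeof(methods[0]); i++) {
        size_t n = strlen(methods[i]);
        if (len >= n && memcmp(buf, methods[i], n) == 0)
            return PROTO_HTTP;
    }

    /* Nothing matched, or the client has not sent anything yet. */
    return PROTO_UNKNOWN;
}
```

A server-first protocol like imap would leave the buffer empty, so a check like this returns PROTO_UNKNOWN for it, which is why a timeout-based default is needed for that case.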

So my real setup is more complicated than what I described (which should come as no surprise).  The issue I hit was with connections using ssl (or really tls as the newer versions have been renamed).  If I am going to have multiple services using ssl encryption, then I need to decrypt the incoming connection and then use sslh to multiplex to different applications.  This is done by using the program stunnel.  And it also has transparent proxy support.

So an incoming connection comes in on port 443.  First sslh gets it and sees what protocol it's using.  If it's SSL/TLS, it sends it to stunnel, which then sends it back to sslh, which finally sends it on to apache, ssh, imap, or whatever else I have hidden behind that port.

Support for transparent proxying is included with both sslh and stunnel, so I'm good, right?

Nope.

I can get it working with one of them.  I can get it working with sslh going to stunnel and on to apache.  But if I have stunnel going to sslh, it breaks badly.

Why is that?

Well here's the problem.  A quick search brings up instructions on how to make the transparent proxy work, but while they give you the formula, they don't explain how it actually works.  And without understanding the reasoning behind the instructions, you're stuck if something goes wrong or if you want to try some creative variation on the same concept.

So I decided to figure out what's actually happening, and here's the technical meat of my post:

There are two parts to making this work.  The first is that the transparent proxy has to send outgoing packets that appear to be from the original host, not from itself.  The second is that the network layer of the operating system has to know to route the return packets back to the proxy application, even though they'll be addressed to some other system.

To send packets with the original source IP, the transparent proxy has to do two things between creating the socket and connecting to the target.  First, it has to enable transparent mode, which requires either running as root or having the cap_net_raw (or cap_net_admin) capability.  This is done with code like:
  int transparent=1;
  /* allow this socket to use a source address that isn't local */
  res = setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &transparent, sizeof(transparent));
And then it has to say where the packets are to appear to originate from:
  /* fd_from is the connection accepted from the client */
  getpeername(fd_from, from.ai_addr, &from.ai_addrlen);
  res = bind(fd, from.ai_addr, from.ai_addrlen);
Normally you only bind a socket to an address like that when listening for incoming connections, but that's how you tell the kernel what address you're sending from in transparent mode.
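Put together, the two steps might look something like the sketch below.  It is a minimal illustration, not code from sslh or stunnel: the function name connect_as_peer and its parameters are hypothetical, it assumes Linux and IPv4 only, and it leaves out the return-path routing described next.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Some older libc headers lack this constant; 19 is the Linux value. */
#ifndef IP_TRANSPARENT
#define IP_TRANSPARENT 19
#endif

/*
 * Hypothetical helper: connect to `target` using the address of the peer
 * of `fd_from` (the connection the proxy accepted) as the source address.
 * Returns the new socket, or -1 with errno set.
 */
int connect_as_peer(int fd_from, const struct sockaddr *target, socklen_t target_len)
{
    struct sockaddr_storage peer;
    socklen_t peer_len = sizeof(peer);
    int one = 1;
    int saved;
    int fd;

    /* Who connected to us?  That is the address we want to appear as. */
    if (getpeername(fd_from, (struct sockaddr *)&peer, &peer_len) < 0)
        return -1;
    if (peer.ss_family != AF_INET) {      /* this sketch handles IPv4 only */
        errno = EAFNOSUPPORT;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Step 1: transparent mode (fails with EPERM without privileges). */
    if (setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &one, sizeof(one)) < 0)
        goto fail;

    /* Step 2: bind to the client's IP and port, which are normally not local. */
    if (bind(fd, (struct sockaddr *)&peer, peer_len) < 0)
        goto fail;

    if (connect(fd, target, target_len) < 0)
        goto fail;

    return fd;

fail:
    saved = errno;                        /* close() may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
}
```

Note that the bind() here claims the client's port as well as its IP address.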

The other part of this is to have the operating system route packets for your application back to you.  I won't go into the details here, but there are two approaches I've seen.  One is to run your proxy as a specific user, with a firewall rule that flags all packets from that user and further rules that send the return packets to a different routing table, which tells them to go to the local machine.  The other, and the one I prefer, is to connect to a local IP address other than 127.0.0.1, such as 127.255.0.1.  Really, anything under 127.x.x.x works besides all zeros or all 255.  (I have a separate post about localhost being 127.0.0.1/8 instead of 127.0.0.1/32, giving you a ton of addresses to use for things like this.)

Note that the above assumes that the target of the transparent proxy is on the same system.  It's probably possible to create firewall rules that will make this work with proxying for a separate internal server, but I haven't explored that, as I haven't needed it yet.

Of course, what would be really nice is to just tell the networking layer to check all local bindings before forwarding packets, but there's no option for that.  I think that would make a nasty mix of code layers in the routing code, so I can understand why the fancy firewall rules are required.


So now that I understand how it's supposed to work, why didn't it work for me with my complicated setup?

The problem was the bind() call.  That's not just binding an IP address, it's also binding a port.  What does that mean?  Every network socket is bound to a combination of an IP address and a port.  This is done explicitly for any service listening for incoming connections.  For outgoing connections, it's done implicitly with the local IP and some high numbered port that is automatically assigned.  But in transparent proxy mode, that outgoing connection binding is controlled directly by the program.  And if you have multiple layers of transparent proxies, you get into trouble.  You can't normally have two sockets on the same machine bound to the same IP address and port.  In transparent mode, you're using the original IP and port, so the first hop has already claimed that pair and the second hop's bind() fails.
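The conflict is easy to reproduce without any proxy at all.  The sketch below (the function name second_bind_errno is made up) binds one TCP socket to a loopback address and a kernel-chosen port, then tries to bind a second socket to the very same pair.  It uses 127.0.0.1 so it needs no privileges, but the same rule applies to a transparently bound client address.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind two TCP sockets to the same 127.0.0.1:port pair.
 * Returns the errno from the second bind() (0 if it unexpectedly
 * succeeded), or -1 if the setup itself failed.
 */
int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int result = -1;
    int a = socket(AF_INET, SOCK_STREAM, 0);
    int b = socket(AF_INET, SOCK_STREAM, 0);

    if (a < 0 || b < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a free port */

    /* First socket: bind, then ask which port the kernel chose. */
    if (bind(a, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(a, (struct sockaddr *)&addr, &len) < 0)
        goto out;

    /* Second socket: same IP, same port. */
    result = (bind(b, (struct sockaddr *)&addr, sizeof(addr)) < 0) ? errno : 0;

out:
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return result;
}
```

On Linux the second bind() should fail with EADDRINUSE, which is the same error a second transparent hop runs into.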

Except it does if stunnel is the second hop.  How does it do this?  It gets a bind error, but then retries with a new port.  This means the end application sees a different origin port number, but the IP address is right.  And in most cases, that port number isn't logged or meaningful, so it doesn't matter.  It does break the 'ident' protocol, but I don't think anyone uses that anymore; certainly not for connections from outside a firewall.
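A retry of that kind can be sketched in a few lines.  This is neither stunnel's code nor the sslh patch, just an illustration of the idea under the assumption of IPv4; the name bind_with_port_fallback is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: bind `fd` to the given IPv4 address and port.  If
 * that exact pair is already in use, keep the IP address but let the
 * kernel choose a free port instead.  Returns 0 on success, -1 with
 * errno set on failure.
 */
int bind_with_port_fallback(int fd, const struct sockaddr_in *wanted)
{
    struct sockaddr_in addr = *wanted;

    /* First try the exact IP and port that were asked for. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return 0;
    if (errno != EADDRINUSE)
        return -1;                        /* some other failure: do not mask it */

    /* Same IP, but port 0 means "any free port". */
    addr.sin_port = 0;
    return bind(fd, (struct sockaddr *)&addr, sizeof(addr));
}
```

The design choice is to retry only on EADDRINUSE, so that a permissions problem or a bad address still shows up as an error instead of being silently papered over.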

So to make it work for me, once I understood what was really happening and why it was failing, I put the same retry on bind failures, using a new port, into sslh.  And since it's open source, I sent my patch to the developer of the program, and it will be in the next release.  That's the power of open source.
