Monday, October 30, 2023

Apache as a Proxy

I have a number of different devices and services running on my home network with web interfaces, and I would like to be able to access them all from anywhere.  Some of these are smart devices, some are software packages, but it doesn't really matter.  I want to be able to access them all by connecting to a web page on the server running on my firewall (or wherever port 443 is forwarded).

To start with, I did an inventory of all the devices and servers on my network.  The tool 'nmap' is ideal for this.  Something like this works:

nmap 192.168.0.1-254

My network is a little more complicated, but I just run the equivalent command on each subnet.  For any device showing something on port 80 or 443, I connect with a web browser and see what's there.  For other ports that might be web pages, I can construct a URL by adding 'http://' or 'https://' to the front and adding ':8443' or whatever the port is.  If the web browser brings up anything interesting, I note that, too.

For the first pass, I create a web page that simply has a link to each service of interest on my network.  Of course, this will only work from inside my network, but at least I now have a page listing everything that I want to connect to.  Some of these will be http, others https, often with bad certificates (like my printer).  That's fine, I'll eventually deal with that using my proxy.

There are two categories of applications that I want to proxy.

The simple case is a web application that runs on a different web server (either the same system on a different port or a different system; it doesn't matter) but is already served from a subdirectory.  I just want to forward anything going to that subdirectory on the main web server to the application.

The complicated case is typically a device that assumes it is its own web server, so all of its links are relative to the root of the server.  I want to proxy it from a subdirectory on my main server, and this means modifying all the links that are passed through.  Sometimes this is easy, but sometimes it's quite difficult.

For the simple case, I'll use an application called "mythweb" as an example.  This is running on an internal server at http://192.168.0.5/mythweb/.  I've set it up in my /etc/hosts file as myth.mydomain.com, so I don't have to reference the IP address.  My Apache installation has my server defined in /etc/apache2/vhosts.d/mydomain.conf, and I'll be adding entries to that file.  So for mythweb, it's really quite simple:

  <Location /mythweb/>
        ProxyPass http://myth.mydomain.com/mythweb/
        ProxyPassReverse http://myth.mydomain.com/mythweb/
        # mythweb sets a global cookie; change path from / to /mythweb/
        ProxyPassReverseCookiePath "/" "/mythweb/"
  </Location>

That's it.  Since I'm going to be defining multiple proxies as virtual subdirectories, I'm defining each as a <Location>.  The "ProxyPass" line says where to send requests for anything under this path.  The "ProxyPassReverse" line rewrites the URLs in any redirects the internal server sends back so that they point at the proxy instead.  Finally, this application was setting a global cookie, so the "ProxyPassReverseCookiePath" directive rewrites the path on any cookies it sets from / to the subdirectory, so they aren't sent to any of the other applications.
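For comparison, here is a minimal sketch of the same mapping written at the virtual-host level instead of inside a <Location> block.  It reuses the hostname from above and is only meant to show what the <Location> wrapper is doing.

```apache
# Two-argument form: local path first, then the backend URL.
ProxyPass        "/mythweb/" "http://myth.mydomain.com/mythweb/"
ProxyPassReverse "/mythweb/" "http://myth.mydomain.com/mythweb/"

# Arguments are internal-path, then public-path.
# Outside a <Location> this is no longer scoped to /mythweb/,
# so it would also touch cookies from other proxied applications.
ProxyPassReverseCookiePath "/" "/mythweb/"
```

That scoping difference is one reason to prefer the <Location> form when several applications are proxied from the same virtual host.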

Doing a proxy for something that isn't already packaged in a subdirectory can get much more complicated, as you have to tell Apache how to modify the files being served as well as the URLs being requested.  This potentially means modifying .js, .css, and .html files, and what has to be done is different for each application.

Perhaps my simplest example is my NAS management.  I'm running XigmaNAS.

    <Location /nas/>
        ProxyPass http://nas.mydomain.com/
        ProxyPassReverse http://nas.mydomain.com/
        ProxyPassReverse /
        ProxyHTMLEnable On
        ProxyHTMLExtended On
        RequestHeader    unset  Accept-Encoding
        ProxyHTMLURLMap ^/ /nas/ R
        ProxyHTMLURLMap http://[w.]*mydomain[^/]*/ /nas/ R
    </Location>

This starts out the same, but there's a good bit more.  Now we need another ProxyPassReverse line for '/' so that redirects to root-relative paths are pointed at the location (/nas/).  Since we're modifying the contents of the files, we strip the "Accept-Encoding" request header so the NAS sends its responses uncompressed.  I use ProxyHTMLURLMap to do regular expression substitutions (the "R" flag) on links in HTML files.  The "extended" option also tries to modify JavaScript and CSS embedded within HTML files.  The two map lines rewrite any root-relative links to the subdirectory, as well as any full links.  Fortunately this program doesn't use any separate CSS or JS files with links in them that need to be modified.

However, the ProxyHTMLURLMap directive needs some setup to define what gets treated as a URL.  I've seen some references to including "proxy_html.conf," but my Apache didn't have that file, so I put these lines directly into my config file.  Note that these are not specific to a location, so I put them before the <Location> blocks:

    ProxyHTMLEvents onclick ondblclick onmousedown onmouseup onmouseover onmousemove onmouseout onkeypress onkeydown onkeyup onfocus onblur onload onunload onsubmit onreset onselect onchange
    ProxyHTMLLinks  a          href
    ProxyHTMLLinks  area       href
    ProxyHTMLLinks  link       href
    ProxyHTMLLinks  img        src longdesc usemap
    ProxyHTMLLinks  object     classid codebase data usemap
    ProxyHTMLLinks  q          cite
    ProxyHTMLLinks  blockquote cite
    ProxyHTMLLinks  ins        cite
    ProxyHTMLLinks  del        cite
    ProxyHTMLLinks  form       action
    ProxyHTMLLinks  input      src usemap
    ProxyHTMLLinks  head       profile
    ProxyHTMLLinks  base       href
    ProxyHTMLLinks  script     src for
    ProxyHTMLLinks  iframe     src

For my managed network switch, I also had to make some additional modifications to the HTML files to get it to work.  In the process, I double-modified some URLs, so the last rule undoes that:

        AddOutputFilterByType SUBSTITUTE text/html
        Substitute s|action="/|action="/switch/|n
        Substitute s|="/"|="/switch/"|n
        Substitute s|location.href="/|location.href="/switch/|n
        Substitute s|"/switch/switch/|"/switch/|n

So how did I figure that out?  It's an iterative process of loading it through the proxy, looking at what files are being requested, and figuring out where the browser got the wrong information when something wasn't translated correctly.  This isn't too hard if you open up the browser's developer tools (Control-Shift-I in both Chrome and Firefox) and watch the network requests.  There you can see every request and response as seen by the web browser.  You can compare connecting directly and through the proxy.

From here it keeps getting more complicated, but it boils down to creating Substitute rules for other content types.  I had to watch the Content-Type headers carefully in the console, as in one application I had to modify application/javascript, while in another it was application/x-javascript.
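As a rough sketch of what that looks like, the block below attaches the substitution filter to script and stylesheet responses as well.  The two patterns are hypothetical; the real ones depend entirely on what the device's files contain.

```apache
<Location /switch/>
    # Ask the device for uncompressed responses so the text can be edited
    RequestHeader unset Accept-Encoding

    # Run the substitution filter on scripts and stylesheets, too
    AddOutputFilterByType SUBSTITUTE application/javascript application/x-javascript text/css

    # Hypothetical fixed-string patterns (the n flag); find the real ones in the console
    Substitute s|url(/|url(/switch/|n
    Substitute s|fetch("/|fetch("/switch/|n
</Location>
```

Every Substitute rule in the block is applied to every content type the filter is attached to, so it's worth keeping the patterns specific.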

Another issue that I've encountered is proxying an internal device that uses https with a bad certificate.  I have my own legitimate certificate, and I just want to ignore the internal one, and it turns out that's easy to do.  I just put the following into my config file (before the <Location> blocks):

    # Enable SSL proxy and ignore local certificate errors
    SSLProxyEngine On
    SSLProxyVerify none
    # Ignore certificate error (Apache comments must be on their own line)
    SSLProxyCheckPeerCN off
    SSLProxyCheckPeerName off
    SSLProxyCheckPeerExpire off

With that I can set a proxy to https:// instead of http:// and everything else just works.
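For example, a printer with a self-signed certificate might be set up like the sketch below.  The /printer/ path and the printer.mydomain.com hostname are placeholders, not names from my config.

```apache
<Location /printer/>
    # Same as the plain-HTTP case, but the backend URL uses https://
    ProxyPass https://printer.mydomain.com/
    ProxyPassReverse https://printer.mydomain.com/
</Location>
```

A device that serves everything from its root would most likely still need the URL mapping lines shown for the NAS.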

What about security?  Many of the things I'm proxying already have password protection, but some don't.  Fortunately it's easy to add a password inside any <Location> block:

        # Authentication
        AuthType Basic
        AuthName "Password"
        AuthUserFile /var/www/localhost/accounts/webfrontend
        require valid-user

That's just the same as you would do for a <Directory> block if you weren't doing a proxy.  Note that my server only runs on https, or I wouldn't use "basic" authentication.
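Put together with the proxy lines, a protected location might look like this sketch.  The /device/ path and hostname are placeholders; the password file is the one named above.

```apache
<Location /device/>
    ProxyPass http://device.mydomain.com/
    ProxyPassReverse http://device.mydomain.com/

    # Apache checks the password before the request is passed to the device
    AuthType Basic
    AuthName "Password"
    AuthUserFile /var/www/localhost/accounts/webfrontend
    Require valid-user
</Location>
```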

Another problem is doing a proxy for something that uses websockets.  I hit this with my security camera, and the following does the trick:

        # See: https://httpd.apache.org/docs/2.4/mod/mod_proxy_wstunnel.html
        RewriteEngine on
        RewriteCond %{HTTP:Upgrade} websocket [NC]
        RewriteCond %{HTTP:Connection} upgrade [NC]
        RewriteRule ^/?(.*) "ws://camera.mydomain.com/$1" [P,L]
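On Apache 2.4.47 or later there is an alternative that avoids the rewrite rules: the upgrade parameter on ProxyPass.  This is a sketch only, and the /camera/ path is an assumption.

```apache
<Location /camera/>
    # Needs Apache 2.4.47+: mod_proxy_http tunnels the connection
    # when the client asks to upgrade to WebSocket.
    ProxyPass http://camera.mydomain.com/ upgrade=websocket
    ProxyPassReverse http://camera.mydomain.com/
</Location>
```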

Unfortunately I haven't succeeded in proxying everything.  One device fails to load all the pages even though it appears Apache is modifying all the links correctly.  What I have been able to do in that case is have Apache listen on a separate port with a new entry in /etc/apache2/vhosts.d/, and that vhost is a straight proxy for the device without any link rewriting to push it into a subdirectory.  If you're having trouble getting something to work with the rewrites, that's a good first step.
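A straight proxy on its own port might look roughly like the sketch below.  The port number, the device hostname, and the certificate paths are all placeholders.

```apache
# Hypothetical port; it also has to be opened or forwarded on the firewall
Listen 8444

<VirtualHost *:8444>
    ServerName mydomain.com

    # Reuse the main site's certificate (placeholder paths)
    SSLEngine on
    SSLCertificateFile    /path/to/mydomain.crt
    SSLCertificateKeyFile /path/to/mydomain.key

    # Straight pass-through: the device stays at the root, so no link rewriting
    ProxyPass        / http://device.mydomain.com/
    ProxyPassReverse / http://device.mydomain.com/
</VirtualHost>
```

Because the device still sees itself at the root of the server, none of the ProxyHTMLURLMap or Substitute rules are needed here.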

The Apache proxy feature is very powerful.  It's great to be able to take all my different devices and put them in a single interface and appear as if they are all on the same server.  Unfortunately this can be very complicated in some cases.  If developers would avoid absolute links, especially in CSS and JavaScript, it would make proxying much easier.  It would also be nice if there were some community database of proxy recipes for devices and web applications.  You would think there would be a wiki for this with pages for each device or application that someone had done a proxy for.
