Clangd and envkernel.sh

Over the last week I’ve been working on a Linux driver for the RTL8723CS wifi chip (the one used in the Pinephone). This involves a lot of reading and writing code, as well as kernel builds. After a bit of fiddling and bugfixing to get them to work, envkernel.sh and clangd make this easier and faster for me.

Clangd is an LSP (language server protocol) server for C (and C++ and Objective-C, but I don’t use those). It can tell your editor where there are errors in your code, offer autocompletion, refactoring, jump to definitions, and other helpful things. The catch: if your code does anything more complex than #includeing a standard header file, you need to tell clangd how it’s supposed to be built, using a compile_commands.json file.
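For illustration, an entry in compile_commands.json looks roughly like this (the path and flags here are made up):

[
  {
    "directory": "/home/user/src/linux",
    "command": "gcc -Wall -c -o drivers/net/wireless/foo.o drivers/net/wireless/foo.c",
    "file": "drivers/net/wireless/foo.c"
  }
]

One such entry per compiled file tells clangd the working directory, the exact compiler invocation, and the source file it applies to.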

The Linux kernel definitely falls under “more complex”. For a common library you might use a tool like Bear to generate that file. The kernel has a helpful scripts/clang-tools/gen_compile_commands.py script right in the source tree, but it needs to analyze kernel build output. So far I had been building my kernel using pmbootstrap build --src (which produces a postmarketOS kernel package), so I didn’t have that at hand.

envkernel.sh promised to help there, and to make my builds smoother in general: sourcing it sets up a few aliases (e.g. for make), so the usual Linux build process (make menuconfig, then make) runs in the chroot, with the cross-compilers pmbootstrap already provides. Essentially no extra setup if you already use pmbootstrap.
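For illustration, a build session looks roughly like this (the checkout path is an assumption based on my setup; envkernel.sh lives in the helpers/ directory of the pmbootstrap source, and you source it from inside the kernel tree):

cd ~/src/linux
source ~/src/pmbootstrap/helpers/envkernel.sh
make menuconfig
make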

Setting up envkernel.sh

Sourcing envkernel.sh isn’t a particular challenge; it just bugged me that the documentation said you need to have it in a full pmbootstrap source tree. I had the Debian package installed already and didn’t want to switch. There was already an open issue marked “help wanted”, and the fixes looked simple enough, so I made an MR. Yes, that means I ended up working right from the pmbootstrap source anyway. 😸

Kernel build

Kernel config and make just worked as described in the postmarketOS wiki, nothing special there.

Compile Commands

I made a mistake here: I assumed I’d have to run gen_compile_commands.py in the chroot too, using the run-script alias provided by envkernel.sh. Turns out it works without that, but I tried run-script scripts/clang-tools/gen_compile_commands.py first and wondered why I got no output and a compile_commands.json file containing exactly [].

I did the obvious thing: The script has a --logging parameter, so I added --logging DEBUG and got, to my surprise, exactly the same result: no output, and an empty list. Digging into it I realized that the run-script alias quietly dropped any parameters to the script being called. The fix I came up with (another MR) isn’t super nice because POSIX shell printf lacks %q, but works for my needs.

After I got debug output from gen_compile_commands.py I understood I needed to tell it where to find the actual build output. The make alias doesn’t place it right next to the sources (which is the default), but instead into a .output/ subdirectory. So after building your kernel in the envkernel.sh environment, you need to run:

./scripts/clang-tools/gen_compile_commands.py -d .output/

This should give you a compile_commands.json file, with one catch: in a few places, paths in it may contain /mnt/linux/ (the mount point of the source tree inside the chroot) instead of the actual location of your source tree. That’s easily fixed with sed, e.g.:

sed -i -e "s,/mnt/linux/,$(realpath .)/,g" compile_commands.json

Clangd config

At least on my system clangd didn’t like some GCC compiler flags used in the kernel build, and it needs to be told about cross-compilation. Also, clangd somehow likes to assume C headers are C++ (no, I don’t want to #include <iostream>, thanks). I fixed those errors with a .clangd config:

CompileFlags:
  Compiler: clang
  Add:
    - -xc  # C only, no C++
    - --target=aarch64
  Remove:
    - -fno-allow-store-data-races
    - -fconserve-stack
    - -march=*
    - -mabi=*

After telling my editor to restart clangd, it worked. Well, it took a few iterations, but eventually it did. If you replicate this, I recommend removing as few options as possible.

Kernel packaging fix

pmbootstrap build has a --envkernel option, which packages a kernel built using envkernel.sh as the matching postmarketOS kernel package. The package name depends on your device; with a Pinephone my workflow is:

  • Do some programming.
  • make
  • pmbootstrap build --envkernel linux-postmarketos-allwinner
  • pmbootstrap sideload linux-postmarketos-allwinner

The sideload command just installs the freshly built package on the connected phone, so I’m ready to test.

The problem is that there’s a bug in pmbootstrap which breaks pmbootstrap build --envkernel for kernel packages that use the downstreamkernel_package helper in their APKBUILD. That includes linux-postmarketos-allwinner, so I had to fix that too. That’s the bugfix you actually need if you want to use this workflow with an affected kernel package; the other two are just a fun story.

Summary

If you want to build a kernel with envkernel.sh, create a compile_commands.json for clangd from the build, and package the result, you need:

  • pmbootstrap (with envkernel.sh, so currently from Git).
  • Build your kernel as the envkernel.sh documentation says.
  • ./scripts/clang-tools/gen_compile_commands.py -d .output/
  • Replace /mnt/linux/ with the location of your actual kernel tree in compile_commands.json, e.g. using sed.
  • To build a postmarketOS package of the kernel you might need this fix for pmbootstrap if your kernel package uses downstreamkernel_package (likely).

Happy hacking! 😸

IPv6 Multicast Routing for Syncthing discovery

By default, Syncthing instances send IPv4 broadcast and IPv6 multicast discovery packets into the local network to find each other. I have my router set up so wired and wireless interfaces are connected by routing, not bridging, so broadcasts can’t reach the other segment. With IPv6 multicast packets it should be possible to set up routing so they do, but unfortunately that doesn’t work out of the box. I got it to work, and here are the problems I had to solve.

Multicast group

The multicast group Syncthing uses by default is ff12::8384. The problem is that this is a link-local group (the 2 in the last four bits of the first group is the scope field and marks it as link-local, see the IPv6 Multicast Address Space Registry). Because of that the kernel picks a link-local source address (fe80:…) for multicast packets, which means they can’t be routed. Switching to a site-local group like ff15::8384 in the advanced config fixes that; the source address will then be a regular unicast address. Mind that all Syncthing instances that should find each other via multicast must use the same group.
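For reference, the setting in question is localAnnounceMCAddr in the advanced configuration; in config.xml the changed value would look like this (21027 is the default discovery port):

<localAnnounceMCAddr>[ff15::8384]:21027</localAnnounceMCAddr>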

Multicast routing

The router’s kernel must be configured to route the selected multicast group, which is the job of a multicast routing daemon; SMCRoute is available in OpenWrt. The following config in /etc/smcroute.conf enables routing packets for the multicast group from interface br-lan to br-wlan (but not the other way):

mgroup from br-lan group ff15::8384
mroute from br-lan group ff15::8384 to br-wlan

And of course the firewall must allow the packets to pass.
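In plain ip6tables terms that would be a forward rule roughly like this sketch (interface names as in the smcroute config above; on OpenWrt you’d express the same thing in the firewall configuration):

ip6tables -A FORWARD -i br-lan -o br-wlan -d ff15::8384 -p udp --dport 21027 -j ACCEPT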

Hop limit

The “hop limit” field in an IPv6 header is decreased at each router, and when it reaches 0 the packet is dropped (even if it would otherwise be routed). Syncthing sets the hop limit to 1, which effectively means “no routing”, without any option to configure it.

As a workaround, it’s possible to manipulate the hop limit via firewall rules. BIG WARNING: If you do this, be extremely careful not to create a loop where the hop limit gets increased again and again and the packets circle in the network indefinitely. The iptables and kernel documentation strongly discourage hop limit manipulation because of this danger, and I agree: This is deep in “understand exactly what you’re doing” territory. ⚠️

What I do to avoid that problem and other unwanted effects:

  • Do the manipulation on the system that runs Syncthing, never the router.
  • Put the rule in an output chain (iptables) or hook (nftables), so it applies only to newly created packets, never routed ones.
  • Be as specific as reasonably possible about which packets to match. Here I’m matching on interface, multicast (destination) address and destination port.
  • Keep the hop limit as low as possible. I need the packets to pass through exactly one router, so I’m using 2.

The manipulation can be set up using nftables or legacy iptables.

Nftables

Including setting up the table and chain:

nft add table ip6 mangle
nft add chain ip6 mangle output '{type filter hook output priority mangle;}'
nft add rule ip6 mangle output ip6 daddr ff15::8384 oifname eth0 udp dport 21027 counter packets 0 bytes 0 ip6 hoplimit set 2

The resulting rule should look similar to this in nft list ruleset:

table ip6 mangle {
    chain output {
        type filter hook output priority mangle; policy accept;
        ip6 daddr ff15::8384 oifname "eth0" udp dport 21027 counter packets 0 bytes 0 ip6 hoplimit set 2
    }
}

The counters aren’t strictly necessary of course, but they’re nice to see.

Iptables

The equivalent in iptables (though different internally):

ip6tables -t mangle -A OUTPUT -o eth0 -d ff15::8384 -p udp --dport 21027 -j HL --hl-set 2

Testing

The easiest way to check whether multicast routing works is to enable the “beacon” debug logging facility in Syncthing on a system that should receive the routed multicast packets, then watch the log for packets from instances on the other side of the router.
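The facility can be switched on in the web GUI (Actions → Logs → Debugging Facilities) or, for a quick test, via the STTRACE environment variable when starting Syncthing:

STTRACE=beacon syncthing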

Should I use this?

Discovery by multicast routing is very nice if you have a local network with routing; otherwise it doesn’t really matter. The setup described above works, but has some disadvantages:

  • Changing the hop limit in the firewall is rather hackish. It works, but it’s not nice. I think Syncthing should definitely add an option for this.
  • Changing the multicast group means your instances won’t find ones using the default Syncthing multicast group, at least not via IPv6 multicast. Depending on which devices you synchronize with, this may or may not matter.

I still haven’t decided if I’ll keep running it this way, but it was definitely fun to figure out! 😸

Configure a Firefox web extension from Selenium

I’ve been wanting to add automated tests for Referer Modifier for a while, and now I finally got around to implementing some using Selenium (which lets you remote-control a browser). One tricky question to solve was: How do I automatically configure the freshly installed Firefox add-on?

Selenium has interfaces to open a page in the browser, find elements, click on them, and so on. It also has a way to install an add-on; in my Python unittest code it looks like this:

    def setUp(self):
        self.browser = webdriver.Firefox(options=self.options)
        self.browser.install_addon(str(self.addon_path), temporary=True)

The self.addon_path is the path to the ZIP archive containing the add-on; setting temporary=True is necessary because Firefox refuses to install unsigned add-ons permanently. But how do I configure the settings I want to test?

Semi-obvious answer: also automate the options page. Which leads to the next question: How do I open the options page? If you use Firefox with add-ons you may have noticed that add-on configuration pages have URLs like this:

moz-extension://[some UUID goes here]/page.html

Hm, okay, but what’s the right UUID? It’s definitely not the add-on ID set in the web extension manifest. As it turns out, the UUIDs are randomly generated on each computer to make them hard to guess, and a mapping from add-on IDs to the local UUIDs is stored in the extensions.webextensions.uuids preference.
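The value of that preference is simply a JSON object mapping add-on IDs to the local UUIDs, something like this (both IDs made up):

{"addon@example.com": "3c3899c9-5d9e-4b63-b657-74cc73c7a165"}

Next question: How do I get that mapping?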

Answer: Not at all, it seems Selenium has no way to access Firefox preferences once the browser is running. But it has something else: It lets you set up a Firefox profile (including preferences) before starting the browser… If I can’t read the mapping, maybe I can just provide my own? 🤔

    @classmethod
    def setUpClass(cls):
        cls.ext_dir = Path(sys.argv[0]).parent
        with open(cls.ext_dir / 'manifest.json') as fh:
            manifest = json.load(fh)

        cls.addon_path = (cls.ext_dir /
                          f'referer-mod-{manifest["version"]}.zip').resolve()
        addon_id = manifest["browser_specific_settings"]["gecko"]["id"]
        addon_dyn_id = str(uuid.uuid4())
        cls.config_url = f'moz-extension://{addon_dyn_id}/options.html'
        print(f'Dynamic ID: {addon_dyn_id}')

        profile = webdriver.FirefoxProfile()
        # Pre-seed the dynamic addon ID so we can find the options page
        profile.set_preference('extensions.webextensions.uuids',
                               json.dumps({addon_id: addon_dyn_id}))
        # [...]
        cls.options = FirefoxOptions()
        cls.options.profile = profile

And it works! What this code does is:

  1. Read the manifest file of the add-on.
  2. Retrieve the static add-on ID from the manifest.
  3. Create a random UUID.
  4. Create a new Firefox profile and store the mapping from the add-on ID to the UUID as JSON in the extensions.webextensions.uuids preference.

The resulting cls.options is what’s used as the options parameter in the first snippet above. The pre-configured mapping is used when the add-on is installed, and because my test knows the UUID it generated (still randomly!) it can later load the config_url generated above, and import the configuration I want to test:

        test_config = (self.ext_dir / 'test_config.json').resolve()
        self.browser.get(self.config_url)
        import_file = self.browser.find_element_by_id('import_file')
        import_file.send_keys(str(test_config))
        import_button = self.browser.find_element_by_id('import_button')
        import_button.click()

And done! Of course if your add-on doesn’t support configuration import, or you actually want to test the add-on UI (good idea!) a few more actions will be necessary. If you want to see the full test, look at test.py in the Referer Modifier repository.

Testing Python log output

Imagine you have a module and want to make sure everything’s working, so you test. For functions the idea is usually pretty simple: provide some inputs, see if the output looks as expected. But what if the function is also supposed to write some log messages, and you want to check that they look the way they should? I was asking myself that question on the weekend and found a solution that I think is fun: adding a custom log handler!

Maybe you’ve already used the logging module? Logging messages is pretty simple, for example:

import logging
logger = logging.getLogger(__name__)

def add(*vargs):
    logger.debug('Calculating the sum of %s', vargs)
    res = sum(vargs)
    logger.debug('Result is %d', res)
    return res

Let’s assume you have that in a file add.py, and are going to write tests in a separate file. A simple unittest-based functionality test might look like this:

import add
import unittest

class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add.add(1, 2, 3), 6)

But what if you want to check if the debug messages are logged as expected? I realized you can tap into the logging framework for that! The logging module uses handlers to send log messages wherever they are supposed to go, and you can add as many as you want, so why not add one that sends output to the test?

First you’re going to need a logger for the target module:

        logger = logging.getLogger('add')
        logger.setLevel(logging.DEBUG)

Setting the level of the logger is necessary to ensure you really get all messages (unless that has been set elsewhere already), but keep in mind that that’s a process-wide setting. In a simple unit test like here that’s not going to cause trouble. Now that we have the logger we need to add a handler.

Also I want to have the input parameters and expected result in variables, so I can use them later for comparing with the log messages:

        params = (1, 2, 3)
        expected_sum = 6

Option one: A temporary log file

        with tempfile.SpooledTemporaryFile(mode='w+') as log:
            handler = logging.StreamHandler(stream=log)
            logger.addHandler(handler)
            try:
                self.assertEqual(add.add(*params), expected_sum)
            finally:
                logger.removeHandler(handler)
            log.seek(0)
            logdata = log.read()

The logging.StreamHandler can write to all sorts of streams, its default is sys.stderr. I’m using a tempfile.SpooledTemporaryFile as the target because it is automatically cleaned up as soon as it is closed, and the amount of log data will be small, so it makes sense to keep it in memory.

The try/finally block around the function I’m testing ensures the handler is always removed after the function call, even in case of an exception (including those from failed assertions).

In the end you just have to read the file and check if the output looks like it should.

        lines = logdata.splitlines()
        self.assertEqual(lines[0], f'Calculating the sum of {params!s}')
        self.assertEqual(lines[1], f'Result is {expected_sum}')

This also shows the disadvantages of this method: You end up with a wall of text that you have to parse. With two lines it’s not too bad, but with a lot of output it may get messy.

You can remedy that somewhat by attaching a Formatter to the handler, which as the name indicates lets you format the log messages, including adding some metadata.
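For example, prefixing each line with the log level (a minimal sketch):

        formatter = logging.Formatter('%(levelname)s: %(message)s')
        handler.setFormatter(formatter)

With that, the handler writes “DEBUG: Result is 6” instead of just “Result is 6”.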

Option two: A message queue

        q = queue.SimpleQueue()
        handler = logging.handlers.QueueHandler(q)
        logger.addHandler(handler)
        try:
            self.assertEqual(add.add(*params), expected_sum)
        finally:
            logger.removeHandler(handler)

This code is a bit shorter, because there’s no file to open. Instead the log messages are added to the queue, and I can retrieve them message by message:

        self.assertEqual(q.get_nowait().getMessage(),
                         f'Calculating the sum of {params!s}')
        self.assertEqual(q.get_nowait().getMessage(),
                         f'Result is {expected_sum}')

This has two advantages:

  1. I always get complete messages, no need to worry about splitting lines and newlines in messages.
  2. The objects in the queue are not strings, they are LogRecord objects, which hold all the metadata of the log message. Though in this example I’m just like “give me the message” and that’s it.

Conclusion

Turns out the Python logging module is easier to use than I had thought when I started figuring this out, and is fun to play with. Of course with more complex tests this kind of analysis might get more complex, too: You might not want to look at every message (maybe a Filter helps?), or you might not be sure which order they arrive in.
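For example, a Filter that keeps only the messages you care about could be as simple as this sketch (the “Result” prefix is just made up for illustration):

class ResultFilter(logging.Filter):
    def filter(self, record):
        # Only pass records whose message starts with "Result"
        return record.getMessage().startswith('Result')

handler.addFilter(ResultFilter())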

Have fun coding, and if you want you can find my full example code on GitHub.

Fun with Javascript, or: Setting document.referrer

It’s a bit of an odd thing for me to say, but yesterday I had some fun playing with Javascript. I maintain a little Firefox add-on called “Referer Modifier” to modify the HTTP Referer header, and since Web Extensions became the standard for browser extensions, Javascript is simply the language of choice for that.

In case you don’t know: when you click a link, by default most browsers will send the URL of the page containing the link to the site you’re going to, as the “Referer” (the spelling is a typo that made it into the specification). There are some uses for this, for example if you get to this blog via a link I can see that in the statistics, but it can also be a privacy issue. That’s why I wrote an add-on that can change, or simply remove, that information.

A while ago someone asked if the add-on would also support changing the Javascript document.referrer property (oh hey, spelled correctly there!). It’s basically the same thing, except instead of being sent to the server it’s available to scripts running on the site, with the same possible privacy issues. At the time I didn’t have time to look into it, but later someone else was kind enough to investigate and leave a comment: Firefox used to make document.referrer match the HTTP Referer, but a bug introduced in version 69 stopped that. The comment also included a link to how another add-on called “Smart Referer” solved the problem.

So yesterday I set out to find my own solution. From looking at the Smart Referer code I knew I’d need to add a content script to Referer Modifier, code that would run in the context of each loaded site. But I didn’t like their approach of getting configuration to the content script, which runs in a limited environment: They generated the script (as text) at runtime, and included the configuration in that text as JSON. I guess it works, but it seems overly complicated.

Looking around MDN (highly recommended for any kind of web stuff!) I found that there’s a messaging mechanism specifically for communication between different parts of an add-on: runtime.sendMessage(). It’s perfect for my purposes: The content script can send a message to the background script of the extension containing the URL of the current page and the pre-set document.referrer, and the response will tell it how to overwrite the latter. That way there’s no need to generate code, and I could just write it as a (pretty short) script file that Firefox will load when a page is loaded. Also the code that checks the configuration is literally the same, no unnecessary complexity there!
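To illustrate the idea, the exchange might look roughly like this sketch (the message format is made up, and applyReferrer()/checkConfig() are placeholders, not the actual Referer Modifier functions):

// Content script: ask the background script what to do with this page
browser.runtime.sendMessage({url: window.location.href,
                             referrer: document.referrer})
    .then(response => applyReferrer(response));

// Background script: answer queries using the existing config checks
browser.runtime.onMessage.addListener(
    (message, sender) => Promise.resolve(checkConfig(message)));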

In the existing background script I just had to add a listener for incoming messages, and write a function to send the responses. It took a little while, though, to figure out why I’d get “invalid URL” errors when trying to parse the default document.referrer while it was unset. Turns out that while I was checking for null, “unset” actually means an empty string in this case.

That was a fun little exercise, and here’s the patch with everything put together if you’re interested. 😉

So, about THAT GnuTLS session ticket bug (CVE-2020-13777)

It’s been almost two weeks since I discovered the bug in GnuTLS that is now known as CVE-2020-13777. The report is issue #1011 on the GnuTLS bug tracker. Here I want to talk a bit about how I discovered the bug, and share some thoughts on its impact.

Session resumption for mod_gnutls

I was working on session resumption support for proxy connections in mod_gnutls. To test I had a setup with two Apache servers with mod_gnutls:

  1. a front-end server, which should cache session tickets and try to resume sessions, and
  2. a back-end server which should issue and use the session tickets.

After session resumption worked as it should, I restarted the back-end server to invalidate the session ticket the front-end server had cached. On server stop mod_gnutls wipes the primary key used to protect session tickets from memory, so the freshly started server should have had no way to decrypt the old ticket. But somehow both sides still reported successful session resumption. That absolutely shouldn’t happen, so I started investigating.

At first I worried about a bug in mod_gnutls: Maybe the log output about resumption wasn’t correct? Or, way worse, was the key not used correctly? Over time I ruled those out, and instead started to suspect a bug in GnuTLS.

Testing GnuTLS itself

To check that I tested gnutls-serv, a relatively simple TLS server included with GnuTLS.

As the client I had to use OpenSSL’s s_client because, unlike gnutls-cli, it offers a way to cache sessions to disk and load them again. I needed that because my test required swapping out the server between the initial session and the resumption attempt, and gnutls-cli’s --resume flag reconnects far too quickly to do that by hand.

This brought me to the steps I described in the bug report to reproduce the problem:

  1. Start a server with valid credentials: gnutls-serv --x509keyfile=authority/server/secret.key --x509certfile=authority/server/x509.pem
  2. Connect to the server and store resumption data: openssl s_client -connect localhost:5556 -CAfile authority/x509.pem -verify_return_error -sess_out session.cache
  3. Stop the server started in step 1.
  4. Start a server with bogus credentials at the same address and port (imagine a real attacker redirecting connections only if the client is attempting resumption): gnutls-serv --x509keyfile=rogueca/mitm/secret.key --x509certfile=rogueca/mitm/x509.pem
  5. Connect again, using the stored resumption data: openssl s_client -connect localhost:5556 -CAfile authority/x509.pem -verify_return_error -sess_in session.cache

That worked flawlessly on the first try. At that point it sank in that I had a serious security issue on my hands, one that allowed man-in-the-middle (MITM) attacks on GnuTLS servers, so I did a round of cycling to clear my mind.

After a few more tests I wrote up the bug report. It was clearly too bad to sit on until I had all the details; what I had was already more than bad enough to require an immediate fix. A quick attempt to reproduce the problem with TLS 1.2 had failed (looking back I must’ve made some mistake in the commands), so I guessed there was some difference in how GnuTLS handled tickets for TLS 1.3 and 1.2.

It gets worse

Daiki Ueno commented on my report:

“Looking at the code path, ticket encryption key and decryption key are all-zero, until the first rotation happens. In TLS 1.3, that can only bypass the authentication, but in TLS 1.2, it may allow attackers to recover the previous conversations.”

With that hint I was able to find the relevant GnuTLS code pretty quickly, and indeed there was no sign of different ticket encryption for the different TLS versions. So I tried to reproduce the problem with TLS 1.2 again, and this time it worked. With a bit of fiddling I was able to get the server to write the key to disk during the resumed session, and decrypt the previous, initial session. Mind that that was just the easiest way to do it; the same data can be recovered from captured network traffic alone.

Allowing MITM with TLS 1.3 is already really bad, but effectively unprotected tickets with TLS 1.2 are way worse, because it means it’s possible to decrypt past recorded sessions. If the NSA has such a TLS connection from a year ago stored somewhere? Whoops, they can decrypt it now.

This is because of design flaws in TLS 1.2:

  • Resumed sessions use exactly the same secret keys as the initial session. So if you get access to the secrets at any point (like from an improperly protected ticket here), you can decrypt all the sessions.
  • Session tickets are sent from the server to the client right before starting to use the negotiated encryption. For this bug that means TLS 1.2 sessions are vulnerable as soon as the server sends a ticket, not only on resumption.

TLS 1.3 mostly avoids those issues by doing a fresh Diffie-Hellman exchange during resumption (simply put, it negotiates new keys, and just keeps authentication in place – which allowed the MITM here), and sending the ticket to the client encrypted. During session resumption the client naturally has to send the ticket in the clear.

I wrote “mostly” above, because there is one part where TLS 1.3 should be vulnerable to passive decryption, that I haven’t seen discussed much yet: Early data. I say “should” because I haven’t tested it, but based on what I know about how early data works it should be. Early data is sent with the request for session resumption to speed up communication (hence the name), so it cannot be protected by the new Diffie-Hellman exchange yet and uses key material from the ticket, similar to what TLS 1.2 does for the whole session. “Early data” tends to be small (if it is used at all, mod_gnutls does not support it), but if that small piece of data contains, say, an authentication token… Not good.

Consequences

First off, big thanks to the GnuTLS team for handling the bug report and fix well!

Secondly, this demonstrates why the design flaws in TLS 1.2 I described above are so bad: if a ticket is compromised, all associated sessions are. And these aren’t the only flaws; even though most can be avoided with careful implementation, don’t rely on TLS 1.2 (much less anything older) any more if you can avoid it! From the client side that’s hard to enforce (for example, way too many websites still offer only TLS 1.2), but seriously consider it if you’re operating a TLS server. Although I understand that may be difficult too: all modern browsers support TLS 1.3, but I have no idea how many legacy devices, like phones and IoT devices that might not have seen updates in a long time, might still need your servers.

After this it gets murky. Should you still use session tickets with TLS 1.3? I think it’s reasonable to, assuming of course that the implementation is sound. Should you use early data? I’m leaning towards no, and have no plans to support it with mod_gnutls, because it doesn’t offer forward secrecy in case of ticket compromise (unlike the rest of TLS 1.3), and I doubt the slight speedup is worth the complexity.

The issue with session ticket key rotation

A particularly interesting case is the session ticket key rotation as implemented in GnuTLS. While looking at the code that caused this bug I learned that the rotation doesn’t in fact rotate the primary key (as generated using gnutls_session_ticket_key_generate()) based on time, it only derives the key used for a particular session from it in a time-based manner. That means it does not protect against an attacker who is able to steal the primary key from server memory, which I consider to be the main risk for tickets: if you can steal that key, you can decrypt all tickets, and at the very least mount MITM attacks.

Some have called the rotation scheme entirely useless because of that. I’m not yet sure if I agree with that, but I do want to implement a rotation of the primary key in mod_gnutls to protect against server compromise. However, I also see that it is extremely difficult for a library like GnuTLS to offer a key rotation mechanism that reliably works for all possible use cases. For example with mod_gnutls I’d have to synchronize a truly random new key across multiple server processes. Maybe clarifying the documentation on exactly what is and isn’t rotated is the best GnuTLS can do on the library side.

NSA Admins and Privacy

While reading through an Intercept article on the NSA’s XKEYSCORE program (simply put: a search engine for data captured by the NSA), I came across this gem:

When systems administrators log into XKEYSCORE servers to configure them, they appear to use a shared account, under the name “oper.” Adams notes, “That means that changes made by an administrator cannot be logged.” If one administrator does something malicious on an XKEYSCORE server using the “oper” user, it’s possible that the digital trail of what was done wouldn’t lead back to the administrator, since multiple operators use the account.

Behind the Curtain, The Intercept, 2015-07-02

It’s common knowledge that shared accounts are extremely bad practice from a security standpoint. It’s difficult to revoke access for a specific person without causing a fuss for everyone else, or to attribute actions to a specific person. And that’s exactly why I would want a shared administrator account if I wanted to avoid responsibility. “Someone ran an illegal query? Wasn’t me, and you can’t prove otherwise!” See, NSA admins know about privacy… They’re just selective about who should have it. 🙄

New in Java 8: Catching Integer Overflows

I’ve recently discovered a nice new feature in Java 8: methods to properly handle integer overflows. Consider the following example:

public class OverflowTest
{
	public static void main(String[] args)
	{
		int a = Integer.MAX_VALUE;
		int b = 1;

		int c = a + b;
		System.out.println(a + " + " + b + " = " + c);
	}
}

When you compile and run it, this is the result:

$ javac OverflowTest.java
$ java OverflowTest
2147483647 + 1 = -2147483648

Quite obviously, this can’t be mathematically right. The problem occurs because an int has a fixed size of 4 bytes. When that is too small to store a number, the value overflows and wraps around from the largest to the smallest possible value (or vice versa for a number below the minimum).

If you wanted to catch possible overflows in previous Java versions, you had to write your own checks. In Java 8, java.lang.Math offers new methods that will take care of that for you.

public class OverflowTest
{
	public static void main(String[] args)
	{
		int a = Integer.MAX_VALUE;
		int b = 1;

		int c = Math.addExact(a, b);
		System.out.println(a + " + " + b + " = " + c);
	}
}
$ javac OverflowTest.java
$ java OverflowTest
Exception in thread "main" java.lang.ArithmeticException: integer overflow
	at java.lang.Math.addExact(Math.java:790)
	at OverflowTest.main(OverflowTest.java:8)

If an overflow occurs, Math.addExact(int, int) throws an ArithmeticException, which you can catch and handle. Similar methods exist for other operations and for the long type. What to do in case of an exception depends on your application, and may be quite complicated. This post is just about mentioning these shiny new methods. 😉
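A couple of the related methods, just as a quick sketch:

long p = Math.multiplyExact(1_000_000L, 1_000_000L); // fine, 10^12 fits into a long
int i = Math.toIntExact(p); // throws ArithmeticException, 10^12 doesn't fit an int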

Anyway, if I want to print the correct result in my example, I can simply fall back to long. At 8 bytes, it can definitely store the result of adding two ints. Note that I have to cast at least one summand to long before adding them, or the intermediate result would still be an int and overflow.

public class OverflowTest
{
	public static void main(String[] args)
	{
		int a = Integer.MAX_VALUE;
		int b = Integer.MAX_VALUE;

		try
		{
			int c = Math.addExact(a, b);
			System.out.println(a + " + " + b + " = " + c);
		}
		catch (ArithmeticException ex)
		{
			System.err.println("int is too small, falling back to long.");
			long c = (long) a + (long) b;
			System.out.println(a + " + " + b + " = " + c);
		}
	}
}
$ javac OverflowTest.java
$ java OverflowTest
int is too small, falling back to long.
2147483647 + 2147483647 = 4294967294

If you need a way to handle numbers of (almost) unlimited length, take a look at java.math.BigInteger. Also note that catching possible overflows influences performance, so using *Exact everywhere instead of simple operators is probably a bad idea. Anyway, this is not one of the biggest news in Java 8 (lambda expressions!), but I think it’s neat, and also about the right scope for a quick blog post. 😀

Peculiar Ethernet Timings

Have you ever tried sending one Ethernet packet every 78 microseconds? If not, what would you expect to happen? Actually, I did that kind of experiment (and many others) last year in my graduation thesis “Development of a Scalable and Distributed System for Precise Performance Analysis of Communication Networks“, which is now published. For the thesis I developed a system called the Lightweight Universal Network Analyzer (LUNA), which can generate packets at precise times and record their arrival times, among other things. When I tested it on different hardware, I got some surprising results, as you can see in the figure below.

[Figure: IAT distributions from tests with 78 μs IST (Figure 8.12 in the thesis)]

The diagram shows packet inter-arrival times (IAT) on the x-axis. I had configured the packet source to send a packet every 78 microseconds, and the IAT measurement shows at which intervals they actually arrived. The y-axis shows how frequently a certain IAT occurred; note that it has a logarithmic scale. The differently colored curves are from different measurements:

  • The measurement for the red curve was done between two hosts equipped with Realtek RTL8111/8168B Gigabit Ethernet controllers,
  • the cyan one between two hosts with Intel Gigabit Ethernet controllers (82567LF and 82579LM, to be precise),
  • and the dark blue one via the loopback interface on one of the hosts for reference.

The hosts were sufficiently similar in processing power (for details, see chapter 8 of the thesis).

The loopback measurement looks as expected, with a strong peak at 78 µs IAT and packets distributed around it. In both measurements with real hardware some packets were transmitted in rapid succession, probably after some of them had been stalled. The really interesting thing, however, is the different behavior at and above the intended IAT. The measurement with Intel hardware led to a peak around 78 µs, although a much wider one than on loopback. With the Realtek cards almost no packets arrived at the intended interval; instead there is a very wide peak around approximately 250 µs. All three measurements showed average IATs of 77 µs, though.

If you now think that the Intel hardware followed the timing pattern created by the software much better, well, it’s not that easy. Yes, the distribution looks more like the one I wanted, but the maximum deviation from the intended inter-arrival time was actually much larger. For the red curve, representing the measurement with Realtek hardware, the rightmost data point (328 µs IAT) in the graph is indeed the maximum deviation. The largest IAT recorded in the Intel measurement, however, was 1922 µs. These outliers are not shown in the figure because otherwise the peaks would be very difficult to distinguish. You can find the detailed numbers in Table 8.8 of the thesis.

The hardware for this experiment was essentially just what was available at the lab. 😉 Nonetheless the results show that networking hardware can have an impressive impact on the timing behavior of packet transmissions. I’d really like to see some studies on other devices! It may also be interesting to check to what extent the differences are caused by the hardware itself, and what influence the drivers have on the results.