Force the allocation of an elasticsearch index on restore

Today, I had occasion to delete an elasticsearch index and restore it from a snapshot. Normally this is a straightforward process, but this particular cluster has been in a yellow state for a while because all of its nodes are up against the low-disk watermark and many of its replica shards are thus sitting unallocated. We’re working on a fix for that, but in the meantime I needed to restore a small index urgently.

I only discovered this would be a problem after I tried and failed to restore the snapshot. I figured that a small index should be able to slip between the cracks, especially if I had deleted something of a similar size immediately beforehand. But the shard allocator had other ideas, and all the restored shards (both primary and replica) went straight into an unallocated state and sat there unblinking.
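A quick way to see which primaries are stuck is to filter the `_cat/shards` listing. This is a minimal sketch, not part of the original fix. `unassigned_primaries` and the index name `my-index` are hypothetical placeholders, and the parsing assumes the default `_cat/shards` column order (index, shard, prirep, state, docs, store, ip, node).

```shell
# Hypothetical helper: read `_cat/shards` output on stdin and print
# "<index> <shard>" for each UNASSIGNED primary whose index name matches
# the extended regex in $1. Assumes the default column order:
#   index shard prirep state docs store ip node
unassigned_primaries() {
  awk -v pat="$1" '$1 ~ pat && $3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }'
}

# Usage (assumes Elasticsearch on localhost:9200; index name is a placeholder):
#   curl -s "http://localhost:9200/_cat/shards" | unassigned_primaries '^my-index$'
```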

I found some old indexes that I could safely delete to free up a little space, but the shard allocator started work on the long-pending replica shards rather than the (surely more important!) primaries of my fresh restore. Even after setting the index priority to 1000 on the restored index, it still preferred to allocate old replicas.
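For reference, index priority is an ordinary index setting (`index.priority`) that can be changed through the `_settings` API. Here is a minimal sketch, assuming you pass in the cluster's base URL. `set_index_priority` is a hypothetical wrapper name, and the `Content-Type` header is required on Elasticsearch 6.0 and later.

```shell
# Hypothetical wrapper around the index settings API: raise the recovery
# priority of one index. Arguments: base URL, index name, priority.
set_index_priority() {
  es_url="$1"
  index="$2"
  priority="$3"
  curl -s -X PUT "$es_url/$index/_settings" \
    -H 'Content-Type: application/json' \
    -d "{\"index.priority\": $priority}"
}

# Usage (index name is a placeholder):
#   set_index_priority http://localhost:9200 my-index 1000
```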

I ended up forcing the allocation by hand, combining techniques from several sources I found online. The trick is to get a list of the shard numbers for the offending index and call the reroute command “allocate_empty_primary” on each primary shard, which forces it into an allocated (but empty) state. Once they are allocated, we can then retry the restore from snapshot.

Defining BAD_INDEX and TARGET_NODE appropriately, we incant:

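# Force an empty primary onto $TARGET_NODE for every primary shard of $BAD_INDEX.
# The Content-Type header is required by Elasticsearch 6.0 and later.
curl -q -s "http://localhost:9200/_cat/shards" | egrep "$BAD_INDEX" | \
  while read -r index shard type state; do
    if [ "$type" = "p" ]; then
      curl -X POST "http://localhost:9200/_cluster/reroute" -H 'Content-Type: application/json' -d "{\"commands\": [ { \"allocate_empty_primary\": { \"index\": \"$index\", \"shard\": $shard, \"node\": \"$TARGET_NODE\", \"accept_data_loss\": true } } ] }"
    fi
  done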


This produced an ungodly amount of output, as the shard allocator proceeded to restructure its entire work queue. But the offending index had indeed been allocated with a higher priority than the old replicas, and a repeat attempt at restoring from snapshot worked.


/var/log/ksymoops modprobe error flood with a Xen kernel

I have a small virtual server on Linode that I use for various public-facing things such as serving web pages, a small Debian repository, email etc. Unfortunately it has a bad habit of filling up /var/log – I did say it was small.

Today I noticed that a lot of space was being used up in `/var/log/ksymoops` by entries logged every few seconds, all of the form:

xen:~# tail -f /var/log/ksymoops/20180502.log
20180502 183837 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183837 probe ended
20180502 183842 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183842 probe ended
20180502 183842 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183842 probe ended
20180502 183842 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183842 probe ended
20180502 183842 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183842 probe ended
20180502 183847 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183847 probe ended
20180502 183847 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183847 probe ended
20180502 183847 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183847 probe ended
20180502 183847 start /sbin/modprobe -q -- net-pf-0 safemode=1
20180502 183847 probe ended

What’s more, these log files are not subject to logrotate by default, so I have them going back years, for as long as the VM has existed. While this is somewhat concerning, it does allow me to work out what happened.

The ksymoops logfiles are relatively sparse (a few lines every month or so) up until the date last year that I dist-upgraded the server from Debian 8 to 9. Then the dam broke and it has been throwing a batch of four errors like the above every five seconds since, day and night. Something somewhere is calling modprobe in a very tight loop, and it appears to be related to a package upgrade.
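Because there is one log file per day, counting the lines in each file makes it easy to see when the flood started. This is a small sketch that assumes the `YYYYMMDD.log` naming shown above. The function name is my own.

```shell
# Print "<YYYYMMDD> <line count>" for each daily ksymoops log, oldest first.
# The glob expands in sorted order, and YYYYMMDD names sort chronologically.
ksymoops_daily_counts() {
  dir="${1:-/var/log/ksymoops}"
  for f in "$dir"/*.log; do
    [ -e "$f" ] || continue   # no matching files: the glob stays unexpanded
    printf '%s %s\n' "$(basename "$f" .log)" "$(wc -l < "$f" | tr -d ' ')"
  done
}

# Usage: look for the day the counts jump
#   ksymoops_daily_counts | less
```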

I managed to track down the offending process by replacing /sbin/modprobe with a script that called `sleep 100000` and running `ps axf`. It turns out that it had been called directly from one of the kworkers. If all four kworkers (2xHT) are doing the same thing, that would explain the pattern of errors.
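The trick can be sketched as follows. The helper name and the `.real` backup suffix are my own. The original is moved aside rather than overwritten, because `/sbin/modprobe` is often a symlink (to `kmod`, for example), and writing through it would clobber the real binary.

```shell
# Hypothetical helper: swap a modprobe binary for a stub that just sleeps,
# so whatever invokes it stays visible in `ps axf` long enough to identify.
# The original is moved to "$1.real" (not overwritten), which is safe even
# if it is a symlink.
install_sleeping_stub() {
  target="$1"
  mv "$target" "$target.real"
  cat > "$target" <<'EOF'
#!/bin/sh
# Temporary stand-in: block instead of loading anything.
sleep 100000
EOF
  chmod 755 "$target"
}

# Usage (as root; undo with: mv /sbin/modprobe.real /sbin/modprobe):
#   install_sleeping_stub /sbin/modprobe
#   ps axf
```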

Remember that this is on Linode, whose VMs run under Xen paravirtualisation with non-modular kernels. This means that modprobe, lsmod etc. have no effect. In fact, they crash out with the following error if you try to use them:

xen:~# lsmod
Module Size Used by Not tainted
lsmod: QM_MODULES: Function not implemented

This is only to be expected, as the VM’s modutils won’t talk to the Xen para-v kernel. But why is a para-v kernel worker triggering modprobe, when failure is guaranteed?

Luckily, I don’t have to care.

ksymoops is non-optional, but it can only write its logfiles if `/var/log/ksymoops` exists. The solution is simple.

rm -rf /var/log/ksymoops

Peace and quiet.

A universal layout for grid-symmetrical keyboards

In a previous post, I mentioned the Model 01. One thing that slightly worried me about it was the keyboard layout of the prototype: it was yet another symmetric-grid arrangement similar to, but distinct from, each of the other Kinesis/Maltron layouts already out there. We are of course promised a fully programmable model, which means I can fix it up just the way I like, but it would be really nice if modern keyboards would pick a standard and stick to it, not least so that the keycap legends might actually be useful. 😉

So it was with great interest that I saw they are requesting feedback before finalizing the key layout. This post is my contribution.

Keymaps, arrangements and layouts

In the following I use “keymap” for an OS-level mapping of “physical” scancodes to “logical” code points, and “arrangement” for the physical location of the keys that generate particular scancodes. When taken together, these form a “layout”, which is normally represented by the labels on the physical keys. One can usually change the keymap in the OS (e.g. via a menu in the taskbar) so that the layout no longer matches the key labels. On some keyboards (such as the Kinesis Advantage), one can effectively change the physical arrangement in firmware so that the same OS keymap produces a different layout.

Alternative keyboard arrangements

Most keyboards manufactured today follow the pc104 (USA) or pc105 (everywhere else) paradigm, which derives from the original Sholes typewriter and was set out in the 1980s with IBM’s Enhanced Keyboard (the Model M). Many keyboards since then have either added extra keys (e.g. multimedia controls) or left out a few (e.g. laptops), but the basic arrangement remains the same. This is notably asymmetrical, with the keys located along staggered diagonals and the right hand (for touch typists) having significantly more keys to cover than the left.

By contrast, in the Maltron keyboard and its various grid-symmetrical derivatives, this basic arrangement has been changed so that the left and right hands are equal and the staggered diagonals have been made vertical to match the natural movement of the fingers. This has required a number of changes to the pc105 layout to rebalance the key arrangement. In most such grid-symmetrical keyboards, a selection of keys (usually modifiers) has been moved under the thumbs and the number of keys under the fingers reduced to (typically) two 6-by-4 grids, sometimes with extra keys under the “z” row and/or between the hands. There are exactly 48 keys on a pc105 keyboard that produce printable, non-whitespace characters, so they would fit perfectly into 2x6x4 if all the whitespace and modifier keys could be relocated elsewhere, e.g. under the thumbs. However, the shift keys in particular have often been kept under the little fingers in grid keyboards for user familiarity, requiring further compromises elsewhere. Unfortunately these compromises are almost always made with only us_ascii in mind.

When keymaps go bad

The default Kinesis Advantage arrangement (for example) is tolerable under a pc105 us_ascii keymap, but is nasty under alternative keymaps, as it differs too much from the traditional 105-key arrangement and so breaks too many assumptions that most keymaps are designed around (e.g. that “[” is to the immediate right of “p”). It leaves the shift, tab and caps-lock keys inside the core 2x6x4 grid and moves four printables to a row below “z” along with the cursor keys. It also rearranges some of the remaining symbol keys to slightly non-standard positions. This is where the trouble starts.

Kinesis Advantage default arrangement + us_ascii keymap:

   = 1 2 3 4 5    6 7 8 9 0 -
 tab q w e r t    y u i o p \
 cap a s d f g    h j k l ; '
 lsh z x c v b    n m , . / rsh
     ` § <-->      ^--v [ ]

(NB the key I’ve labelled “§” is the “international key” that is normally located to the left of “z” outside the USA and produces a variety of symbols depending on your exact keymap)

It’s only if you speak English that these innocent rearrangements are mere “symbol keys”. In most other language keymaps, one or more of these scancodes maps to an accented letter. And if the key you were expecting to be at the right of “p” disappears to somewhere below “.” your touch typing is screwed.

Consider for example what happens if we ask the OS to apply the us_dvorak keymap instead of us_ascii:

   ] 1 2 3 4 5    6 7 8 9 0 [
 tab ' , . p y    f g c r l \
 cap a o e u i    d h t n s -
 lsh ; q j k x    b m w v z rsh
     ` § <-->      ^--v / =

A Dvorak touch typist (particularly a programmer!) expects “/” to be the key to the right of “l”. Kinesis try to get around this by recommending that users avoid the OS-supplied Dvorak keymap and instead use a hotkey to switch the keyboard’s firmware to a custom Dvorak arrangement that Kinesis provide. This, however, has just as many oddities as their QWERTY arrangement:

   = 1 2 3 4 5    6 7 8 9 0 -
 tab ' , . p y    f g c r l /
 cap a o e u i    d h t n s \
 lsh ; q j k x    b m w v z rsh
     ` § <-->      ^--v [ ]

Sorry, but the key to the right of “s” in Dvorak should be “-”. For touch-typing, this is even worse than losing “/”!

Dvorak is bad enough, but in other language keymaps the scancodes just to the right of “0” and “p” are even more vital. In a Scandinavian keymap, the key to the right of “p” should be “Å”. In Italian, this should be “è”, and in German it should be “Ü”. And now look at how many accented letters the Hungarian layout requires.

Most other symmetric-grid arrangements make similar errors. Here’s Maltron with us_ascii:

   1 2 3 4 5 6    7 8 9 0 [ ]
   ` q w e r t    y u i o p \
   § a s d f g    h j k l ; '
     z x c v b    n m , . / 
           -        =

Again, the key to the right of “p” has gone walkies, as have the keys to the right of “0”, which have disappeared into the extra row. On the bright side, the keys to the left of “q” and “a” are used relatively sensibly.

One commonality between all these imperfect arrangements seems to be a desire to keep the square brackets “[]” together on the keyboard. But for touch-typists, particularly those who speak a language other than English, it is much more important that “[” is to the right of “p” and above “'”. But all is not lost!

A universal keyboard arrangement

The following 2x6x4 arrangement minimizes the pain across a wide selection of standard European language keymaps:

    1 2 3 4 5 6   7 8 9 0 - = 
    ` q w e r t   y u i o p [
    \ a s d f g   h j k l ; '
    § z x c v b   n m , . / ]

This only relocates three keys when compared with pc105: “`”, “]” and “\”. The first is only moved by one position and the second by two, but the third unfortunately has to move the whole way to the opposite side of the keyboard, as we just don’t have enough spare keys under the right hand to do otherwise. Considering, though, that this is the key most inaccessible to touch-typists on a pc105 keyboard, its new position could be considered an improvement (and for UK keymap users, its placement beside the “international key” is entirely sensible!). Other than these three changes, no surprises lie in store for touch typists, and the only ones likely to find a letter (rather than a symbol) at the wrong side of the keyboard entirely are the Hungarians.


Note that the positions of “-” and “=” have been preserved by left-shifting the number row (as per Maltron). This is not as far-fetched as it seems: old-school touch typists were taught to hit “5” and “6” with their left index finger etc., because on pc105 the number row sits about three-quarters of a key width to the left of the home row due to row stagger. The placement of the numbers is also particularly suited to keyboards that have the function keys in an embedded layer, as F1 can be trivially mapped onto 1, F2 onto 2 etc. without running out of keys for F11 and F12.

Also note that “[]” are not totally disassociated — they remain close together and symmetrically arranged around the home row little finger position. The slight inconvenience for us_ascii users here should be weighed against the vast increase in usability for non-English keymap users.

(BTW it is relatively easy to apply this arrangement to existing programmable keyboards such as the Kinesis Advantage, although in that particular case it is easier to keep the number row in the position that matches the labels and live with “=” being in an odd position).

A plea to

You don’t have to use the exact arrangement I suggest above, but if you don’t then please take into account that not everyone uses us_ascii, and being able to change keymaps in the OS and touch type is a desirable feature for many people, particularly those who work in more than one language.

And maybe if we do find a key arrangement that works acceptably for everyone, it could become a de facto standard for grid-symmetrical keyboards — and everyone will ask for a “ layout” in the future…!

Indistinguishability Obfuscation, or how I learned to stop worrying and love DRM

There’s a lot of hype going around about indistinguishability obfuscation (IO) at the moment.

Lots of excitable talk of “perfect security” and other stuff. One of the possible applications is supposedly quantum-resistant public-key crypto. But if you read up on it, it’s actually a way of making code resistant to decompilation. So instead of creating a secret number that’s processed by a well-known algorithm, you create a secret algorithm that you can ask other people to run without fear of it being reverse engineered. So the “public-key” crypto is really shared-secret crypto with the secret sealed inside an obfuscated algorithm.

In other words, it’s bulletproof DRM. deCSS is even referenced (obliquely) in one of the articles as a use case.

Of course, this makes it in principle impossible to test code for malicious behaviour. You could insert a latent trojan into it and never be discovered, and it removes one of the most important security features of security software – auditability of the algorithm. For example, someone could write a rot13 algorithm and call it “encryption” and the only way to (dis)prove it would be to run a statistical analysis on the ciphertext.

So the question becomes – why would anyone allow IO programs to run on their systems? Virus scanners would be useless in principle. Performance, even in the most optimistic case, would be dreadful. And it doesn’t do anything for the end user that can’t be achieved by traditional crypto (barring the development of a quantum factoriser, and even that is not yet certain). No, the only people who gain are the ones who want to prevent the next deCSS.

warning: connect to Milter service unix:/var/run/opendkim/opendkim.sock: No such file or directory

I currently run a postfix mailserver and have souped it up to use all the latest security features (see Hamzah Khan’s blog for a good tutorial). One thing that had been bothering me though was the appearance of the above milter connection failures in the logs – even though these seemed to fail gracefully it was a worrying sign that something was Just Not Right.

After a lot of trial and error, it seems that the culprit is my postfix chroot jail. I had originally attempted to compensate for this by defining “Socket /var/spool/postfix/var/run/opendkim/opendkim.sock” in /etc/opendkim.conf, but even so, postfix was throwing errors (and no, putting the socket in the standard location doesn’t work – I tried that!). It turns out that postfix sometimes attempts to connect to the socket from inside the jail, and sometimes from outside. The solution is to create a soft link in the standard location pointing to the real socket inside the jail.
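Here is a sketch of the soft-link fix, using the two socket paths mentioned above. The helper name and its optional root-prefix argument are my own additions, so it can be tried out in a scratch directory first.

```shell
# Hypothetical helper: make the standard socket path a symlink to the socket
# opendkim actually creates inside the postfix chroot. $1 is an optional
# root prefix (empty on a real system) so it can be tried out safely.
link_opendkim_socket() {
  root="${1:-}"
  jail_sock="/var/spool/postfix/var/run/opendkim/opendkim.sock"
  mkdir -p "$root/var/run/opendkim"
  ln -sfn "$jail_sock" "$root/var/run/opendkim/opendkim.sock"
}

# Usage (as root):
#   link_opendkim_socket
```

On Debian, `/var/run` points at the tmpfs `/run`, so a link created this way will disappear at reboot and would need recreating, for example from a startup script.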

Of course I could have reconfigured it to bind to a localhost port instead, but the soft link was less work.

The Model 01 vs the Kinesis Advantage

Several years ago I took the plunge and bought a Kinesis Advantage (after even more years of lusting) and it’s still my favourite keyboard (so far) and the one that I have used exclusively in work ever since (every new work colleague at some point says “how do you type on THAT…?” in a tone of voice somewhere between suspicion and awe). Even so, I have found over time that it does have its issues. The tiny escape key is probably the most annoying, as well as the general rubbery crapness of the entire function key row. I find it very easy to accidentally trigger the embedded keypad, which is not as immediately noticeable as you might think. The number keys are also surprisingly difficult to type on without changing hand position.

I also found that I needed to do a LOT of remapping to get keys in a fully ergonomic position (the shift keys are by default under the pinkies rather than the thumbs, for example). And if I remove it from power for too long it forgets and I need to do the whole dance again (I have the steps stuck to the underneath in case I forget). It would be nice if this was scriptable, or unnecessary.

I did once disassemble an IBM model M with the intent of hotwiring it into a more sensible arrangement, but was put off by the difficulty of performing surgery on the underlying membrane circuitry. I think the bits are still in a box under the sofa on my mum’s landing…

The Model 01 is almost exactly the keyboard I envisioned at the time but didn’t have the patience to see through. It’s good to see someone finally build what I was dreaming of all those years ago (and after seeing how much work went into it, I’m sort of relieved it wasn’t me!). They are currently making tons of money on Kickstarter, so it looks like full steam ahead. They plan to have the first shipments next year, and I plan to have one in my home office soon thereafter (the Microsoft Natural currently in there just can’t compete with the Kinesis).

In other cool keyboard news, it looks like the makers of the dinky TextBlade Bluetooth keyboard are nearly ready to start shipping. I may have one of those on preorder too…

Openvpn “WARNING: Bad encapsulated packet length from peer”


I run a VPN from my Linode VM for various reasons, the most important of which is so that I and other family members can submit email over SMTP without having to worry about braindead networks that block outgoing port 587 for makey-uppey “security” reasons. Since my brother and I both have jobs that entail connecting to random corporate wireless networks, this is critical.

The problem was that I was running openvpn over the standard port 1194, which is also blocked by many networks, including my own employer’s. Openvpn (in TCP mode) can tunnel through an HTTP proxy using the proxy’s CONNECT method, so I configured squid on the server’s port 8080 to forward connections to localhost:1194 and told the laptop openvpn client to use myserver:8080 as a proxy.

This worked well for my employer’s network, but did not agree with the guest wireless network of one of my clients, which had absolutely no problem with port 1194 but used its own transparent proxy that doesn’t play nice with daisy-chained proxies. I kept having to comment and uncomment the proxy directive in my laptop’s openvpn.conf and restart, depending on location.

So I decided to do it the proper way, by connecting directly to openvpn on port 8080. My employer’s network would allow this through directly, and the client’s network should route through its transparent proxy without complaint. I don’t want to turn off port 1194 though, as this would rudely nobble all my brother’s devices, so I configured the server’s iptables to redirect incoming port 8080 to 1194. I could then remove the proxy config from the laptop, change its connecting port to 8080 and restart the vpn client.
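Forwarding one local port to another on the same host is usually done with the `REDIRECT` target in the `nat` table’s `PREROUTING` chain. This is a minimal sketch, assuming openvpn listens on TCP (as the proxy setup requires) and the rule should apply on all interfaces. The function name is hypothetical. Because NAT happens before filtering, any existing `INPUT` rule that accepts 1194 also covers the redirected traffic.

```shell
# Hypothetical helper: send incoming TCP connections for port $1 (default 8080)
# to the local openvpn port $2 (default 1194). Needs root.
redirect_openvpn_port() {
  from_port="${1:-8080}"
  to_port="${2:-1194}"
  iptables -t nat -A PREROUTING -p tcp --dport "$from_port" \
    -j REDIRECT --to-ports "$to_port"
}

# Usage (as root):
#   redirect_openvpn_port 8080 1194
```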

Problem solved! Except then I started getting the following error in my server logs:

Apr 28 13:02:43 xxx ovpn-server[13110]: WARNING: Bad encapsulated packet length from peer (17231), which must be > 0 and <= 1560 -- please ensure that --tun-mtu or --link-mtu is equal on both peers -- this condition could also indicate a possible active attack on the TCP link -- [Attempting restart...]

It turned out this was being generated by another client which had also been configured to use the proxy, but which had slipped my mind. The error stems from the client connecting to an openvpn port directly but sending requests formatted for a web proxy. Not sure why it shows up as an MTU error, but changing the other client config to match the laptop solved it.
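A likely explanation for the MTU-style message is that, over TCP, openvpn prefixes every packet with a 2-byte big-endian length field. If the misconfigured client opened with an HTTP `CONNECT` request (an assumption about what it sent), the server would read the ASCII bytes “C” (0x43 = 67) and “O” (0x4F = 79) as a length: 67 × 256 + 79 = 17231, which is exactly the value in the log above. This sketch reproduces the arithmetic. The function name and the host in the usage line are placeholders.

```shell
# Interpret the first two bytes of a string the way openvpn's TCP framing
# would: as a 16-bit big-endian packet length. ASCII input assumed.
tcp_frame_length() {
  first=$(printf '%s' "$1" | cut -c1)
  second=$(printf '%s' "$1" | cut -c2)
  # printf '%d' "'X" yields the character code of X
  echo $(( $(printf '%d' "'$first") * 256 + $(printf '%d' "'$second") ))
}

# Usage ("myserver" is a placeholder host):
#   tcp_frame_length "CONNECT myserver:8080 HTTP/1.0"   # prints 17231
```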