
Substring matching in Erlang

I can’t remember if I’ve ever seen it in the official documentation and I can’t really be bothered to search for it, so I’ll just put it here regardless.

There’s this article on the internets that’s trying to expose Erlang’s weak points (sorry, not giving the link; it doesn’t really offer anything original and most of its points aren’t even valid), and it mentions that string processing is one of the things Erlang doesn’t do very well at all. Now, you have to keep in mind that strings in Erlang are actually linked lists, so you have to write your algorithms accordingly. The only O(1) operations you have are “take first element” and “add first element”; concatenation and array-like access are O(n), so if your code fetches individual characters by index a lot, it’ll be really sluggish. In practice that means mostly relying on recursion that walks the list from the head. So I was sort of meditating on this and then I got a small idea.
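To make the “walk it from the head” point concrete, here is a minimal sketch. The module name str_walk and both functions are made up for illustration; they only use head/tail matching and prepending, the two cheap operations mentioned above.

```erlang
-module(str_walk).
-export([count/2, upcase/1]).

%% Count how many times Char occurs in String, one head at a time.
count(Char, String) -> count(Char, String, 0).

count(_Char, [], N) -> N;
count(Char, [Char | Rest], N) -> count(Char, Rest, N + 1);
count(Char, [_ | Rest], N) -> count(Char, Rest, N).

%% Upper-case ASCII letters only. Each character is prepended to an
%% accumulator (O(1)), and the result is reversed once at the end
%% instead of being appended to with ++ on every step.
upcase(String) -> upcase(String, []).

upcase([], Acc) -> lists:reverse(Acc);
upcase([C | Rest], Acc) when C >= $a, C =< $z -> upcase(Rest, [C - 32 | Acc]);
upcase([C | Rest], Acc) -> upcase(Rest, [C | Acc]).
```

Both functions touch every character exactly once, so they stay linear no matter how long the string is.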

When you match a list, it is possible to specify an arbitrary (but fixed) number of elements at the front: [a, b, c | Rest] = [a, b, c, d, e]. So I thought, perhaps there is a similar laconic way to match a substring, et voilà: "abc" ++ Rest = "abcde". I imagine it could be used in Yaws servlets that work with deep URLs, something like Rails’ /controller/action/id.
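As a rough sketch of the deep-URL idea, the same pattern can sit directly in function clause heads. The module route_demo, the paths and the return tuples below are all hypothetical; this is not Yaws API, just plain pattern matching on a path string.

```erlang
-module(route_demo).
-export([route/1]).

%% Hypothetical dispatcher: the literal prefixes are matched in the
%% clause heads, and whatever follows the prefix is bound to a variable.
%% Clauses are tried top to bottom, so the longer prefix comes first.
route("/users/show/" ++ Id) -> {users, show, Id};
route("/users/" ++ Action)  -> {users, Action};
route(_Other)               -> not_found.
```

Calling route_demo:route("/users/show/42") picks the first clause and binds Id to "42"; anything that doesn’t start with "/users/" falls through to the last clause.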

There are limitations though. For instance, for some reason I can’t quite figure out at this hour, you cannot use the "++" syntax for matching ordinary lists: [a, b, c] ++ Rest = [a, b, c, d, e]. fails with “* 1: illegal pattern”, although there shouldn’t be any apparent difference in internal representation. Also, the head string must be a literal; you cannot pre-bind it to a variable. (Otherwise it could be rather handy for building regular automata or something trie-like, I imagine.) But at least it’s there and it could be useful in some cases.
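When the prefix only exists in a variable, one possible workaround is to check and strip it by hand with the standard lists module. This is a sketch, not a replacement for the pattern syntax; prefix_demo and strip_prefix/2 are names invented for this example.

```erlang
-module(prefix_demo).
-export([strip_prefix/2]).

%% Prefix is an ordinary bound variable here, so "Prefix ++ Rest"
%% cannot be used as a pattern; check for it and strip it explicitly.
strip_prefix(Prefix, String) ->
    case lists:prefix(Prefix, String) of
        true  -> {ok, lists:nthtail(length(Prefix), String)};
        false -> nomatch
    end.
```

It walks the prefix more than once, so it is less elegant than the literal pattern, but it works for any list, including lists of atoms.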

Emergency epmd recovery

Suppose you’re running an Erlang application (ejabberd, for instance). It’s been up for months and months but then you try to use a remote control script (like ejabberdctl) and it fails, probably saying things like “nodedown”, indicating it cannot communicate with the Erlang node; yet the application itself is apparently running just fine. Running out of ideas, you run epmd -names and, to your horror, it shows an empty list.

Every Erlang node uses a high port for communication, and epmd’s task is to map symbolic names like ejabberd to port numbers. (A bit like what DNS does for IP addresses, yes.) Without this information nodes cannot find each other. It’s quite interesting how epmd could just lose it, but right now it’s more important to find out how you can restore it without restarting your application. (Oh, the beautiful uptime!)
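The name-to-port table can also be inspected from inside Erlang rather than with epmd -names. A small sketch, assuming epmd runs on the local host; the module epmd_lookup and its functions are hypothetical wrappers around net_adm:names/0.

```erlang
-module(epmd_lookup).
-export([registered/0, registered/1]).

%% Ask the local epmd which node names it knows and on which ports.
%% Returns a list of {Name, Port} pairs with Name as a string.
registered() ->
    case net_adm:names() of
        {ok, Names}      -> Names;
        {error, _Reason} -> []      % e.g. epmd is not running at all
    end.

%% Look up the port registered for one name, or 'undefined' if epmd
%% has no entry for it (which is the situation described above).
registered(Name) ->
    proplists:get_value(Name, registered()).
```

In the broken state described here, epmd_lookup:registered("ejabberd") would come back as undefined even though the node itself is still running.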

There are bits of information on the net that say epmd will gladly register any unused name with any port that has an Erlang node attached when it’s told to do so; we just need a way to send the right packet. Turns out there’s the erl_epmd:register_node/2 function that does exactly what we need. Use netstat to find the port your orphan node is listening on, then start an anonymous Erlang node:

$ erl

It must be anonymous because you can only register once, and we’ll need to do it by hand instead.

> erl_epmd:start().

Since the node is anonymous, we must start the gen_server ourselves.

> erl_epmd:register_node(ejabberd, 23456).

…assuming these are the node name and the port you need. Voila! Check the epmd -names and rejo…

But not quite yet. The thing about epmd is, although it registers whatever it’s told without asking any questions, it will also keep track of the connection that issued the registration request. The moment you disconnect your anonymous node, the restored registration will be gone again. We need to find a way to sneak into the application node and take control from there.

Start one more node, this time with a name:

$ erl -sname repair

Ping the application node to establish the connection:

repair@hostname> net_adm:ping(ejabberd@hostname).

The node will ask epmd “who’s ejabberd?” and receive the port we gave it a moment ago, so all is fine. Now, the thing about Erlang connections is that once they’re established by whatever means, including net_adm:ping/1, you don’t need epmd to talk to that node anymore. So now you can turn to the first anonymous node and kill it:

> halt().

This will, among other things, break the connection to epmd and make the name ejabberd available for re-registration. Note that the connection we made from the second node is still very much alive; it’s about time we used it:

repair@hostname> rpc:call(ejabberd@hostname, erl_epmd, register_node, [ejabberd, 23456]).

Note that this is essentially the same call to erl_epmd:register_node/2 as before, but this time we do it on behalf of the remote node using rpc:call. This means that the node name is now associated with its rightful owner once again! You can stop the second node now:

repair@hostname> halt().

Run epmd -names once more to make sure, and go on with your business.