MV

Friday, July 17, 2009

Clustering ejabberd nodes using mnesia databases

A while ago, I was asked if I could set up a cluster of 2 ejabberd nodes (ejabberd is a Jabber/XMPP instant messaging server), using the default internal mnesia database. I thought this would be a nice challenge that shouldn't take too much time... NOT!

I use 2 servers, both running OpenSolaris, on which I have installed a very nice framework, PyMonkey (see www.pymonkey.org and www.qshell.org), used among other things for building compute and storage clouds. The framework is installed in "/opt", and I need to be root to work in it. I installed the ejabberd (v. 2.0.5) and Erlang (v. R12B-5) packages in this framework (available for Solaris only at the time of writing). Although I work in a specific framework, this information is also valid for the default packages on Linux, Windows, or (Open)Solaris; the main difference might be the location of the mnesia database.

So I started my journey by googling for information about installing ejabberd nodes. I quickly arrived at the official ejabberd guide, which is clear and a good starting point (1). I soon found out that setting up one node is pretty straightforward; setting up nodes in a cluster, however, is a different story. Further on in the official guide I even found a chapter on clustering (2). However, what is described in that chapter is far from sufficient to get a working cluster of ejabberd nodes.

Some more googling led me to various blogs (3) and internet forums (4)(5), some more helpful than others, but each of them got me a little further, and I finally got my nodes clustered thanks to a colleague who gave me the final hint.
The big issue was the location of the cookie (.erlang.cookie) on the new ejabberd node. Erlang nodes use this cookie to authenticate each other, so it must be identical on all nodes in the cluster; otherwise the nodes won't be able to communicate with each other. Some mailing-list posts (6) also pointed to ".hosts.erlang" as the cause of my issue, but in my test case I didn't need that file.

To save you a lot of time, I decided to share my procedure on my blog.

Here is my setup:
* OpenSolaris ejabberd1
* OpenSolaris ejabberd2
* Both running the PyMonkey framework (/opt/qbase3). Download the PyMonkey sandbox from http://confluence.qlayer.com/display/SOFT/Download
* Each machine can ping the other by host name. You may have to add the host names to /etc/hosts for this.
On both machines:
* the ejabberd package is installed in /opt/qbase3/apps/ejabberd-2.0.5, referred to below as $EJABBERD
* the Erlang/OTP package is installed in /opt/qbase3/apps/erlang-R12B-5, referred to below as $ERLANG
* I am logged on as root

Set up First Node (on ejabberd1)
1. Create backups of some critical files (just in case...):
* ejabberd.cfg in $EJABBERD/etc/ejabberd
* ejabberdctl in $EJABBERD/sbin/
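If you prefer to script the backups, here is a minimal shell sketch. The helper name backup_configs and the timestamp suffix are my own choices, not part of ejabberd; it assumes both files live where described above.

```shell
# Back up ejabberd.cfg and ejabberdctl before editing them (step 1).
# backup_configs is a hypothetical helper; $1 is the $EJABBERD directory.
backup_configs() {
    base="$1"
    stamp=$(date +%Y%m%d%H%M%S)   # suffix, so repeated runs don't overwrite older backups
    for f in "$base/etc/ejabberd/ejabberd.cfg" "$base/sbin/ejabberdctl"; do
        # -p keeps mode and timestamps, so the ejabberdctl copy stays executable
        cp -p "$f" "$f.$stamp.bak" || return 1
    done
}

# Example (my setup):
# backup_configs /opt/qbase3/apps/ejabberd-2.0.5
```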

2. Modify ejabberd.cfg
In the section SERVED HOSTNAMES there is a line "{hosts, ["localhost"]}.". (!) I'm careful not to remove any periods at the end of lines.
I replace localhost with the domain served by my first ejabberd node (e.g. myexample.com), as shown in the example just above that line in the file.

In the section ACCESS CONTROL LISTS there are 2 commented-out examples of users who will have admin rights on the ejabberd node, for example "%%{acl, admin, {user, "aleksey", "localhost"}}."
I remove the %-signs (which mark comments) and then replace "aleksey" and "localhost" with my username (e.g. tdewolf) and my domain (the one defined as served hostname).
Remark:
If I want to create multiple admin users, I have to add a line for each admin user.

However, I still need to register these users with the ejabberd node once it is up and running (see step 6).
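The same two edits can be scripted with sed. This is a sketch that assumes the two lines look exactly like the defaults quoted above; configure_cfg is a hypothetical helper name, and the domain and user name must not contain "/" or "&", because sed would interpret them.

```shell
# configure_cfg CFG DOMAIN ADMIN  (hypothetical helper)
# Sets the served host and enables one admin ACL in ejabberd.cfg.
# Assumes the default lines quoted above; DOMAIN and ADMIN must not
# contain "/" or "&", which sed would interpret.
configure_cfg() {
    cfg="$1" domain="$2" admin="$3"
    tmp="$cfg.tmp.$$"
    sed -e "s/^{hosts, \[\"localhost\"]}\.\$/{hosts, [\"$domain\"]}./" \
        -e "s/^%%{acl, admin, {user, \"aleksey\", \"localhost\"}}\.\$/{acl, admin, {user, \"$admin\", \"$domain\"}}./" \
        "$cfg" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$cfg"     # write back in place, keeping the file's owner and mode
    rm -f "$tmp"
}

# Example: configure_cfg $EJABBERD/etc/ejabberd/ejabberd.cfg myexample.com tdewolf
```

For more than one admin, the extra acl lines still have to be added by hand, since the sketch only rewrites the "aleksey" example.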

3. Modify ejabberdctl
At approximately line 12, I find the line "HOST=localhost", where I set the host part of the Erlang node name of my ejabberd node (the node name is ejabberd@HOST). I use the server's hostname, ejabberd1.
Keep in mind that this host is not the same as the served host (domain) in the ejabberd.cfg file!
In the last line of that section ("EJABBERD_DB=$ROOTDIR/var/lib/ejabberd/db/$NODE"), approximately line 18, I change $NODE to $ERLANG_NODE, so that the database directory gets the clearer name of the full node name (ejabberd@ejabberd1).
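Again as a hedged sketch, both ejabberdctl changes can be made with sed. configure_ctl is a hypothetical helper, and it assumes the HOST and EJABBERD_DB lines look exactly as quoted above (no leading whitespace). On the second node you would pass ejabberd2 instead of ejabberd1.

```shell
# configure_ctl CTL HOST  (hypothetical helper)
# 1. HOST=localhost -> HOST=<server hostname>
# 2. .../db/$NODE   -> .../db/$ERLANG_NODE in the EJABBERD_DB line
configure_ctl() {
    ctl="$1" host="$2"
    tmp="$ctl.tmp.$$"
    sed -e "s/^HOST=localhost\$/HOST=$host/" \
        -e 's|^\(EJABBERD_DB=.*/db/\)\$NODE$|\1$ERLANG_NODE|' \
        "$ctl" > "$tmp" || { rm -f "$tmp"; return 1; }
    cat "$tmp" > "$ctl"     # writing in place keeps ejabberdctl executable
    rm -f "$tmp"
}

# First node:  configure_ctl ./sbin/ejabberdctl ejabberd1
# Second node: configure_ctl ./sbin/ejabberdctl ejabberd2
```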

4. Since I'm working in a sandbox, I also need to set the environment variable LD_LIBRARY_PATH to point to /opt/qbase3/lib.
Remark: I could skip this step if I used the default ejabberd and erlang packages.
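A sketch for this step, assuming a POSIX shell: it prepends the sandbox library directory instead of overwriting LD_LIBRARY_PATH, so other entries survive. QBASE_LIB is a variable name I introduce for illustration. Run (or source) it in the same shell from which you start ejabberdctl.

```shell
# Prepend the sandbox libraries to LD_LIBRARY_PATH (QBASE_LIB is an
# illustrative variable name) without dropping existing entries.
QBASE_LIB=/opt/qbase3/lib
case ":${LD_LIBRARY_PATH:-}:" in
    *":$QBASE_LIB:"*) ;;   # already present: leave it alone
    *) LD_LIBRARY_PATH="$QBASE_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export LD_LIBRARY_PATH
```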

5. Ok, now I'm ready to launch my first ejabberd node.
From $EJABBERD, I enter the following command: ./sbin/ejabberdctl start
The node should now be running, so I check its status with: ./sbin/ejabberdctl status
It returns:
Node ejabberd@ejabberd1 is started. Status: started
ejabberd is running
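Because the node is started detached, "ejabberdctl start" may return before the node is fully up, so it can be handy to poll the status a few times. A minimal sketch, assuming the status output shown above; wait_for_ejabberd is a hypothetical helper and the one-second interval is arbitrary.

```shell
# wait_for_ejabberd CTL [TRIES]  (hypothetical helper)
# Polls "CTL status" once per second until it reports "ejabberd is running".
wait_for_ejabberd() {
    ctl="$1" tries="${2:-10}"
    while [ "$tries" -gt 0 ]; do
        if "$ctl" status 2>/dev/null | grep -q 'ejabberd is running'; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then sleep 1; fi
    done
    return 1
}

# Example, from $EJABBERD:
# ./sbin/ejabberdctl start && wait_for_ejabberd ./sbin/ejabberdctl
```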

6. I now register the admin users that I defined in step 2, using the command: ./sbin/ejabberdctl register user host password
where:
* user: the user name as defined in ejabberd.cfg
* host: the served host (domain) as defined in ejabberd.cfg, e.g. myexample.com
* password: a password of your choice
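With several admin users, the register command can be run in a loop. A sketch, assuming one "user password" pair per line on standard input; register_admins is a hypothetical name, and the stdin redirect keeps ejabberdctl from consuming the rest of the list.

```shell
# register_admins CTL HOST  (hypothetical helper)
# Reads "user password" pairs from stdin and registers each user on HOST.
register_admins() {
    ctl="$1" host="$2"
    while read -r user password; do
        [ -n "$user" ] || continue          # skip empty lines
        # </dev/null stops the command from reading the rest of our input
        "$ctl" register "$user" "$host" "$password" </dev/null || return 1
    done
}
```

For example: printf 'tdewolf secret1\n' | register_admins ./sbin/ejabberdctl myexample.com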

With these credentials you can log on to ejabberd's WebAdmin at http://<IP of the node>:5280/admin/. Use the full Jabber ID to log on, for example tdewolf@myexample.com.
See the official guide for more information about the WebAdmin (7). I leave the WebAdmin session open on the Nodes page, so I can easily see the new ejabberd node appear later.

I have an ejabberd node up and running and I can even access the WebAdmin of it, w00t. That was easy, wasn't it?

Once the ejabberd node has started, the following file and directory have been created:
* $EJABBERD/var/lib/ejabberd/.erlang.cookie
* $EJABBERD/var/lib/ejabberd/db/ejabberd@ejabberd1 (named after the node name, whose host part is defined in ejabberdctl)
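A quick check that both were created can be scripted too. This sketch assumes the database directory is named after the full node name as configured in ejabberdctl (e.g. ejabberd@ejabberd1); check_node_files is a hypothetical helper.

```shell
# check_node_files EJABBERD NODE  (hypothetical helper)
check_node_files() {
    spool="$1/var/lib/ejabberd"
    node="$2"
    if [ ! -f "$spool/.erlang.cookie" ]; then
        echo "missing $spool/.erlang.cookie" >&2; return 1
    fi
    if [ ! -d "$spool/db/$node" ]; then
        echo "missing $spool/db/$node" >&2; return 1
    fi
    echo "ok: cookie and database directory for $node found"
}

# Example: check_node_files /opt/qbase3/apps/ejabberd-2.0.5 ejabberd@ejabberd1
```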


Set up Second Node (on ejabberd2)

1. I copy the necessary files from my first node to my second node.
I copy $EJABBERD/var/lib/ejabberd/.erlang.cookie from the first node to /root and to $EJABBERD/var/lib/ejabberd on this new node. When Erlang is started directly with erl (step 4), it reads the cookie from $HOME/.erlang.cookie, and since I need to be root to work in my sandbox, my $HOME is /root.

I also copy $EJABBERD/etc/ejabberd/ejabberd.cfg from the first node to the same location on this new node. As a result, this new ejabberd node serves the same domain and has the same admin users.
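Erlang refuses to use a cookie file that is accessible by group or others, so after copying I make it readable by the owner only. A minimal sketch; install_cookie is a hypothetical helper and the paths are from my setup. To fetch the file from ejabberd1 first, something like scp root@ejabberd1:/opt/qbase3/apps/ejabberd-2.0.5/var/lib/ejabberd/.erlang.cookie /tmp/cookie works.

```shell
# install_cookie SRC DEST...  (hypothetical helper)
# Copies the cookie from node 1 to every location Erlang may read it from.
install_cookie() {
    src="$1"; shift
    for dest in "$@"; do
        cp -f "$src" "$dest" || return 1   # -f also replaces an existing read-only copy
        chmod 400 "$dest" || return 1      # owner-only access, as Erlang requires
    done
}

# Example on ejabberd2:
# install_cookie /tmp/cookie /root/.erlang.cookie \
#     /opt/qbase3/apps/ejabberd-2.0.5/var/lib/ejabberd/.erlang.cookie
```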

2. Modify ejabberdctl
At approximately line 12, I find the line "HOST=localhost", where I set the host part of the Erlang node name of my ejabberd node (the node name is ejabberd@HOST). I use the server's hostname, ejabberd2.
Keep in mind that this host is not the same as the served host (domain) in the ejabberd.cfg file!
In the last line of that section ("EJABBERD_DB=$ROOTDIR/var/lib/ejabberd/db/$NODE"), approximately line 18, I change $NODE to $ERLANG_NODE, so that the database directory gets the clearer name of the full node name (ejabberd@ejabberd2).

3. Since I'm working in a sandbox, I also need to set the environment variable LD_LIBRARY_PATH to point to /opt/qbase3/lib.
Remark: I could skip this step if I used the default ejabberd and erlang packages.

4. Now I'm ready to join my new ejabberd node to the cluster with the first node. From $ERLANG/bin I enter the following very long command.
(!) Mind ALL the quotes. The backslashes at the ends of the lines only continue the command on the next line; omit them if you type the command on a single line:

./erl -sname ejabberd@ejabberd2 \
-mnesia dir '"/opt/qbase3/apps/ejabberd-2.0.5/var/lib/ejabberd/db/"' \
-mnesia extra_db_nodes "['ejabberd@ejabberd1']" \
-s mnesia

where:
* -sname ejabberd@ejabberd2 is the name of the new ejabberd node
* -mnesia dir '"..."' is the location of the database on the new node, by default in var/lib/ejabberd/db/. The outer quotes are single quotes; the inner pair are double quotes!
* -mnesia extra_db_nodes "['...']" is the list of nodes to form the cluster with; mind again the single and double quotes!
* -s mnesia starts mnesia when the Erlang node starts
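If you script this step, the quoting is easiest to get right with a small wrapper. A sketch under these assumptions: join_cluster is a hypothetical name, ERL defaults to erl on the PATH (set it to $ERLANG/bin/erl), and the values are the ones from my setup. It shows that "\"$dbdir\"" passes the double quotes on to erl, just like '"..."' does in the command above.

```shell
# join_cluster NODE DBDIR PEER  (hypothetical helper)
# Starts an Erlang shell that joins PEER's mnesia cluster, as in step 4.
join_cluster() {
    node="$1" dbdir="$2" peer="$3"
    # erl receives "<dbdir>" with the double quotes, i.e. an Erlang string,
    # and ['<peer>'] as an Erlang list containing the atom of the peer node.
    "${ERL:-erl}" -sname "$node" \
        -mnesia dir "\"$dbdir\"" \
        -mnesia extra_db_nodes "['$peer']" \
        -s mnesia
}
```

For example: ERL=/opt/qbase3/apps/erlang-R12B-5/bin/erl, then join_cluster ejabberd@ejabberd2 /opt/qbase3/apps/ejabberd-2.0.5/var/lib/ejabberd/db/ ejabberd@ejabberd1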

5. After executing the above command, I end up in an Erlang shell: (ejabberd@ejabberd2)1>
To see whether my two nodes are in a cluster, I enter the command "mnesia:info()." (once again, mind the period at the end of the command!) and scroll up to find the line "running db nodes", which should list both nodes.
Double-check: I see the second ejabberd node appear under Running Nodes in the WebAdmin of my first ejabberd node, w00t again!

***Tip:
If you find only one ejabberd node under running db nodes and the other one under stopped db nodes, something went wrong.
Most likely your .erlang.cookie is in the wrong place. In your Erlang session, execute "erlang:get_cookie()." and note the result between the quotes, e.g. IXHDCSATUADBDTKVTOFC. Then look for the .erlang.cookie file containing that value and overwrite it with the version from ejabberd node 1.
***
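The lookup-and-overwrite can be scripted as well. A sketch, assuming you pass the value returned by erlang:get_cookie(), a copy of node 1's cookie, and the directories to search; replace_cookie is a hypothetical helper.

```shell
# replace_cookie WRONG_VALUE GOOD_COOKIE_FILE SEARCH_DIR...  (hypothetical helper)
replace_cookie() {
    wrong="$1" good="$2"; shift 2
    find "$@" -type f -name .erlang.cookie 2>/dev/null |
    while IFS= read -r f; do
        # only touch cookie files holding the value erlang:get_cookie() reported
        if [ "$(cat "$f")" = "$wrong" ]; then
            cp -f "$good" "$f" && chmod 400 "$f" && echo "replaced $f"
        fi
    done
}

# Example: replace_cookie IXHDCSATUADBDTKVTOFC /tmp/cookie /root /opt/qbase3
```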

6. Synchronize databases
I need to synchronize the database of my new node with the database of my first node. In the Erlang shell, I enter the following command, which turns the new node's copy of the schema into a disc copy:
mnesia:change_table_copy_type(schema, node(), disc_copies).
I quit the Erlang shell with the command "q().". The second ejabberd node is now listed under Stopped Nodes, as I can check by refreshing the Nodes page in the first node's WebAdmin.

7. Now that I know that my two nodes can work in a cluster, I need to change $EJABBERD/sbin/ejabberdctl so that it starts the node with the same parameters as the command in step 4.
In the section "# start server", I insert a new line after the line "-mnesia dir "\"$EJABBERD_DB\"" \". I add the following content to this new line (without the surrounding quotes): "-mnesia extra_db_nodes "['ejabberd@ejabberd1']" -s mnesia \"
These are the last two parameters of the erl command in step 4; the trailing backslash continues the command on the next line.

***Tip:
If you want to run the node in "live" (interactive) mode, make the same change in the section "# start interactive server".
***
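The edit can also be done with awk. In this sketch (add_extra_db_nodes is a hypothetical helper) the new line is inserted after every line that contains both "-mnesia dir" and "EJABBERD_DB", reusing its indentation, so it also covers the "# start interactive server" section, assuming that section contains the same line. It does nothing if the file already mentions extra_db_nodes.

```shell
# add_extra_db_nodes CTL PEER  (hypothetical helper)
add_extra_db_nodes() {
    ctl="$1"
    # The trailing backslash continues the erl command on the next line.
    line="-mnesia extra_db_nodes \"['$2']\" -s mnesia \\"
    if grep -q 'extra_db_nodes' "$ctl"; then
        return 0                                  # already patched
    fi
    ADD_LINE="$line" awk '
        { print }
        index($0, "-mnesia dir") && index($0, "EJABBERD_DB") {
            match($0, /^[ \t]*/)                  # copy the indentation
            print substr($0, 1, RLENGTH) ENVIRON["ADD_LINE"]
        }
    ' "$ctl" > "$ctl.tmp.$$" || { rm -f "$ctl.tmp.$$"; return 1; }
    cat "$ctl.tmp.$$" > "$ctl"                    # keeps the file executable
    rm -f "$ctl.tmp.$$"
}

# Second node: add_extra_db_nodes ./sbin/ejabberdctl ejabberd@ejabberd1
```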

8. Now it's time to start my second node in the normal way. From $EJABBERD I enter the command:
./sbin/ejabberdctl start

In the WebAdmin of the first node, I see my second node appear under Running Nodes, which means that both ejabberd nodes are running and active in one cluster.
To get the data written to both databases, I change the storage type of the database tables on the second node (Nodes > select node > Database), e.g. from remote copy to a local copy type, and click Submit at the bottom of the page to apply my changes.

For more information about the parameters and functions, I recommend the official ejabberd guide (1) and the official Erlang site (8).

A good tutorial for a Linux setup can be found on sysmonblog (9).

(1) http://www.process-one.net/en/ejabberd/guide_en
(2) http://www.process-one.net/en/ejabberd/guide_en#htoc79
(3) http://dev.esl.eu/blog/2008/09/30/set-up-clustering-in-ejabberd/
(4) http://www.nabble.com/problems-in-clustering-ejabberd-via-tutorial-on-official-site-td22475081.html
(5) http://www.trapexit.org/forum/viewtopic.php?p=43930
(6) http://lists.jabber.ru/pipermail/ejabberd/2005-March/000883.html
(7) http://www.process-one.net/en/ejabberd/guide_en#webadmin
(8) http://erlang.org/
(9) http://sysmonblog.co.uk/2008/06/ot-installing-ejabberd-on-debian-ubuntu.html

7 comments:

  1. This post is awesome!
    Can you do this for me on my server?
    How much will it cost?
    How can we talk, by IM or email?
    Thanks
    Fran

  2. Hi Anonymous,
    you can contact me via tdewolf@gmail.com. My IM is tdwmons@hotmail.com, but please refer to this post or your IM invite will be ignored.

  3. Hi,
    This blog is a life saver, thanks a lot. I have a question for you: instead of modifying ejabberdctl, is there another way to set the startup synchronization parameters? I see an ERL_OPTIONS option in
    /etc/ejabberd/ejabberdctl.cfg which can be used to alter the startup of the ejabberd server. When I set something like ERL_OPTIONS=" -mnesia extra_db_nodes "['ejabberd@jabberd1']" -s mnesia " this crashes the ejabberd server.

    Any thoughts on this ?

    thanks a lot in advance.

    1. Really late, but to help those who come next...
      You want to escape the inner quotes, so :

      ERL_OPTIONS=" -mnesia extra_db_nodes \"['ejabberd@jabberd1']\" -s mnesia"

  4. Hey thanks a lot :) !

    I couldn't get it right until I finally followed your instructions :)

    It worked like a charm on Debian squeeze, BTW :)

    --
    Felix

    1. Actually,

      I did run into a problem, which I'm describing in this post on the official mailing list: http://lists.jabber.ru/pipermail/ejabberd/2012-May/007500.html

      And here is the solution I found, to get a cluster that works completely properly: http://lists.jabber.ru/pipermail/ejabberd/2012-May/007516.html

      Here is the relevant part of the thread, for anyone who's following the above guide:

      There is something that this guide does not mention, which seems pretty
      evident in retrospect, but since it was not explicitly mentioned, I didn't
      think of doing it.

      The guide instructed me to modify the ejabberdctl script on node #2 so that
      it has the following extra parameter:

      -mnesia extra_db_nodes "['ejabberd@ejabberd1']"

      After doing this, I saw node #2 coming up in the web admin interface, so I
      thought everything was ok, but it was not ok, because I also needed to add
      this parameter to node #1's startup script (while mentioning that the
      extra_db_node is node #2, and not itself, obviously).

      Doing this made the clustering actually happen for real. After this, the
      status command outputted correct values on both nodes, and I could bring
      down any one node, and the other would keep serving chat messages. (The
      connected clients would be disconnected and reconnect automatically to the
      remaining alive node, but that is all right.)

      Now, having different configurations embedded in the ejabberdctl script of
      each node is not really convenient, so I tried having a uniform script
      instead, where all nodes mention all nodes (including themselves) in the
      extra_db_nodes parameter, and that seems to work correctly, so I'm planning
      to leave it this way.

      Anyway, thanks a lot for posting this guide. I'm not sure I would have succeeded without it. In any case, it certainly would have taken me a lot more time!

      --
      Felix

    2. Where exactly, and how exactly, do you put
      -mnesia extra_db_nodes "['ejabberd@ejabberd1']"

      Just at the end of the file? Exactly like that, with the dash in front?
