2008/07/24

WMU-6500FS - Deluge torrent (part I)



Info sources: [Home page]

Overview

The deluge torrent client is first real X application (and the first application implemented in c++) I am trying to build on the WMU box. Thus it is by far the hardest puzzle I had to solve so far. I learned a lot along the way and so I would like to share it step by step with you.
Whole build process is so complicated for me, that I have not finished it yet (Update: it is finished now 8-) ) and so I am going to organize this post as series of three (hopefully there will be no need for more ;) parts. Also the deluge version was changing during my mission so the series will reflect these changes. It is not because I want to force readers (possibly trying to build the deluge as well) to go the exact same path, the point is to show all the problems (along with solutions or workarounds) I have experienced, since they can emerge in another situations as well.

This part contains the initial build process and deployment. Next part will deal with the local and remote debugging and (hopefully) the last one will describe the solution of the problems found during the debugging sessions. I am also going to prepare a post with build sequence of libraries the deluge directly or indirectly depends on.

Compilation:

I have started with the 0.5.9.1 version (the newest one at the time).
dev# wget http://download.deluge-torrent.org/source/0.5.9.1/deluge-0.5.9.1.tar.gz
dev# tar xzvf deluge-0.5.9.1.tar.gz
dev# export CFLAGS=-I/mnt/C/sys/include/boost-1_35
dev# python setup.py build
...
Error 1
libtorrent/src/session_impl.cpp:829:   instantiated from here
libtorrent/include/libtorrent/socket.hpp:181: error: `IPV6_V6ONLY' undeclared
It seems the uClibc (at least 0.9.27 version) lacks the IPV6_V6ONLY definition. See for example this message for more info about this (missing) define.
As a quick and dirty fix I just define the identifier locally:
dev# nano libtorrent/include/libtorrent/socket.hpp
... and add following (the actual value got from debian /usr/include/bits/in.h file) just after the include section:
#define IPV6_V6ONLY             26

Static linkage:

Now we can try the build again:
dev# python setup.py build
...
/usr/bin/ld: cannot find -lboost_filesystem-mt
Static linker is unable to find the boost_filesystem-mt library. At first we have to make sure that linker flags LDFLAGS are pointing to proper directory (sys/lib):
dev# echo $LDFLAGS

dev# export LDFLAGS=-L/mnt/C/sys/lib
... but the problem still persists. The problem is that the boost library names are more specific (contain boost version and compiler version). In README Additional notes is the following:
1) On some distributions, boost libraries are renamed to have "-mt" at the end (boost_thread_mt instead of boost_thread, for example), the "mt" indicating "multithreaded". In some cases it appears the distros lack symlinks to connect things. The solution is to either add symlinks from the short names to those with "-mt", or to alter setup.py to look for the "-mt" versions.
I do not know whether there should be libboost_*-mt.so symlinks by default, maybe I have not configured boost library properly, as a quick fix I decided to create the links manually:
dev# cd /mnt/C/sys/lib
dev# for i in `ls ./libboost_*-gcc33-mt-1_35.so.1.35.0`; do ln -s $i ${i/%-gcc33-mt-1_35.so.1.35.0/-mt.so}; done
Now we have another link error:
dev# python setup.py build
...
can't resolve symbol 'get_nprocs'
It seems that uClibc does not implement get_nprocs function. Following text from this forum describes the workaround:
I grep'ed the boost source tree for get_nprocs and found the only instance of it in thread.cpp. I found out that this function returns the number of processors and is really not needed in my case. So instead of having "return get_nprocs()", I changed it to "return 1". While I think this should not break anything, I thought I should still tell you.
So in file boost_1_35_0/libs/thread/src/pthread/thread.cpp I have replaced
    return get_nprocs();
with:
    return 1;
in short:
dev# cd /usr/local/src/boost_1_35_0
dev# sed -i 's/return get_nprocs();/return 1;/g' libs/thread/src/pthread/thread.cpp
... and rebuilt the boost library.
dev# make
...
...updating 3 targets...
...
...updated 3 targets...
dev# make install
...
...updating 6 targets...
common.copy /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0
ln-UNIX /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so
common.copy /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so.1.35.0
ln-UNIX /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so
common.hard-link /mnt/C/sys/lib/libboost_thread-gcc33-mt.so
common.hard-link /mnt/C/sys/lib/libboost_wave-gcc33-mt.so
...updated 6 targets...
... now the build finished successfully, we are ready to install:
dev# python setup.py install --prefix=/mnt/C/sys/X11
In order to correctly deploy on the box, one additional hack was needed (touch everything containing deluge in path so it is included in the filopack archive):
dev# find /mnt/C/sys -path "*deluge*" -type f -exec touch {} \;
We have the first step done. Do not know yet what is ahead of us ;-)

Dynamic linkage:

Now we can try to run the application:
dev# deluge
...
/usr/bin/python: can't resolve symbol '_ZN5boost6system19get_system_categoryEv'
It seems there is some symbol expected by deluge app but missing in the boost libraries.
With the nm command we find out whether the symbol is in any boost library:
dev# nm -A -D /mnt/C/sys/lib/libboost_* | grep "T _ZN5boost6system19get_system_categoryEv"
/mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so.1.35.0:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-gcc33-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEv
Make sure that the symbol is needed from the deluge_core library:
dev# nm /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so | grep _ZN5boost6system19get_system_categoryEv
         U _ZN5boost6system19get_system_categoryEv
We can test whether all the libraries the deluge_core depends on are properly found by the dynamic linker:
dev# ldd /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so
        libboost_filesystem-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        libboost_date_time-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_date_time-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        libboost_thread-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        ...
        libboost_system-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        ...
I am not sure what exactly is going on here, I would be glad if anybody shed some light on it, as a workaround the LD_PRELOAD works for me:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
Traceback (most recent call last):
  File "/usr/bin/deluge", line 46, in 
    import deluge._dbus as dbus
  File "/mnt/C/sys/lib/python2.5/site-packages/deluge/_dbus.py", line 32, in 
    from dbus import Interface, SessionBus, version
ImportError: No module named dbus
We have to install dbus before we proceed (see this post)...
Now we can try it again:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
process 16385: D-Bus library appears to be incorrectly set up; failed to read machine uuid: Failed to open 

"/mnt/C/sys/X11//var/lib/dbus/machine-id": No such file or directory
See the manual page for dbus-uuidgen to correct this issue.
  D-Bus not compiled with backtrace support so unable to print a backtrace
Aborted
So far so good, following command seem to solve it:
dev# sys/X11/bin/dbus-uuidgen --ensure
... but then there is another problem:

Crashes:

#  LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session

(deluge:16391): GLib-GIO-CRITICAL **: g_unix_mount_is_system_internal: assertion `mount_entry != NULL' failed
Segmentation fault
The g_unix_mount_is_system_internal function (in glib-2.17.2/gio/gunixmounts.c file) asserted since the given mount_entry was NULL.
gboolean
g_unix_mount_is_system_internal (GUnixMountEntry *mount_entry)
{
  g_return_val_if_fail (mount_entry != NULL, FALSE);

  return mount_entry->is_system_internal;
}
Look at the places where the function is called. We find the g_unix_mount_guess_should_display function in the same file.
gboolean
g_unix_mount_guess_should_display (GUnixMountEntry *mount_entry)
{
  const char *mount_path;

  /* Never display internal mountpoints */
  if (g_unix_mount_is_system_internal (mount_entry))
    return FALSE;

  /* Only display things in /media (which are generally user mountable)
     and home dir (fuse stuff) */
  mount_path = mount_entry->mount_path;
It is apperent that this function calls the g_unix_mount_is_system_internal it asserts but returns FALSE and so the g_unix_mount_guess_should_display immediatelly crashes.
Now we can look at the caller's code, to the _g_unix_mount_new function in the glib-2.17.2/gio/gunixmount.c:
GUnixMount *
_g_unix_mount_new (GVolumeMonitor  *volume_monitor,
                   GUnixMountEntry *mount_entry,
                   GUnixVolume     *volume)
{
  GUnixMount *mount;

  /* No volume for mount: Ignore internal things */
  if (volume == NULL && !g_unix_mount_guess_should_display (mount_entry) )
    return NULL;
  ...
I do not know what are the exact contract of those functions, from the code it seems that all three functions share the precondition that the mount_entry cannot be NULL.
I decided not to go up on the call stack and look for the actual cause of the NULL mount_entry, it can very well be the fact that I am running on the chroot-ed environment. As a quick fix I modify this function as follows:
dev# nano /usr/local/src/glib-2.17.2/gio/gunixmount.c
  if (volume == NULL && ( !mount_entry || !g_unix_mount_guess_should_display (mount_entry) ) )
    return NULL;
dev# make 
dev# make install
Ok, let's try it again:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 /mnt/C/sys/X11/bin/deluge
no existing Deluge session
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in 
    deluge.wizard.WizardGTK()
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__
    self.window.set_icon(deluge.common.get_logo(18))
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo
    size, size)
gobject.GError: Unrecognized image file format
When we look to the common.py we see there are two icon variants: svg and png. When we change .svg to .png everything works. You can open the file in a text editor:
dev# nano /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
        return gtk.gdk.pixbuf_new_from_file_at_size(get_pixmap("deluge.png"), \
            size, size)
... or simply call the following command:
dev# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
Now we have made a progress, when we launch deluge the we have to setup the preferences.

But when we proceed, there is another error:
...
no existing Deluge session
Starting new Deluge session...
deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG.
Applying preferences
Scanning plugin dir /mnt/C/sys/X11/share/deluge/plugins
Initialising plugin TorrentPeers
...
Initialising plugin SpeedLimiter
Applying preferences
Starting DHT...
Aborted
When we use strace tool to monitor system calls we get the following:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge
...
close(11)                               = 0
write(1, "Applying preferences\n", 21Applying preferences
)  = 21
write(1, "Starting DHT...\n", 16Starting DHT...
)       = 16
open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(31503, SIGABRT)                    = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
Process 31503 detached
It seems the abort could have some relation with missing dht.state file. Let's try to debug it:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python
GNU gdb 6.3
...
(gdb) run /mnt/C/sys/X11/bin/deluge
Starting program: /mnt/C/sys/bin/python /mnt/C/sys/X11/bin/deluge
[Thread debugging using libthread_db enabled]
[New Thread 1024 (LWP 7763)]
Could not open /proc/7763/status
It seems it is not so easy to run gdb on chroot-ed system, I have tried mess up with manual /proc "dummy" file creation but had no success.
Another possibility is to debug directly on the box. Before that we have to upload it.

Deployment to the box:


After the upload routine some more problems emerged:
box# deluge
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 43, in 
    import deluge
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/__init__.py", line 34, in 
ImportError: No module named pygtk
Ok, I forgot to deploy pygtk. Done.
box# deluge
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 45, in 
    import deluge.core
  File "/usr/lib/python2.5/site-packages/deluge/core.py", line 59, in 
    import deluge_core
ImportError: File not found
This one is more interesting, we have to find out what is going on:
box# strace python
...
import deluge.deluge_core
...
open("./libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/mnt/C/sys/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/mnt/C/sys/X11/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
close(3)                                = 0
write(2, "Traceback (most recent call last"..., 35Traceback (most recent call last):
) = 35
...
Ok, it seems we forgot to deploy the boost libraries. Done.
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session

(deluge:19324): libglade-WARNING **: could not find glade file '/usr/share/deluge/glade/wizard.glade'
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in 
    deluge.wizard.WizardGTK()
  File "/usr/lib/python2.5/site-packages/deluge/wizard.py", line 50, in __init__
    , domain='deluge')
RuntimeError: could not create GladeXML object
We have to create a symlink in /usr/share (since it is just memory filesystem, in order to make it permanent we have to make symlink anytime the box boots up).
For now just call following command, later on we can modify an config file (e.g. /mnt/C/sys/etc/rc-local).
box# ln -s /mnt/C/sys/share/deluge /usr/share/deluge
When we try to launch it again, we get known "Unrecognized image file format" message.
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in 
    deluge.wizard.WizardGTK()
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__
    self.window.set_icon(deluge.common.get_logo(18))
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo
    size, size)
gobject.GError: Unrecognized image file format
Stream editor helps us to fix it:
box# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
Now when run it under strace again, and it seems that we got on the box# system as far as we were on the dev# system:
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge
...
close(11)                               = 0
write(1, "Applying preferences\n", 21Applying preferences
)  = 21
write(1, "Starting DHT...\n", 16Starting DHT...
)       = 16
open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(31503, SIGABRT)                    = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
Process 31503 detached

Local debugging:

Now we are ready to try the gdb debugger:
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python
...
(gdb) run /mnt/C/sys/X11/bin/deluge
Starting program: /bin/python /mnt/C/sys/X11/bin/deluge
...
no existing Deluge session
Starting new Deluge session...
deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG.

Program received signal SIG32, Real-time event 32.
0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
(gdb) Quit
(gdb) where
#0  0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
#1  0x4019cb9b in sigsuspend () from /lib/libc.so.0
#2  0x40139cf9 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#3  0x401396bd in pthread_create () from /lib/libpthread.so.0
#4  0x416c9f26 in boost::thread::start_thread () from /mnt/C/sys/lib//libboost_thread-gcc33-mt-1_35.so.1.35.0
#5  0x415d68c8 in disk_io_thread (this=0x821acc8, block_size=-4) at thread.hpp:151
...
Sig32 event is not really what we are looking for (some info about it can be seen in this discussion), we wait for the ABORT signal and so can continue:
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
(gdb) cont
Continuing.
...
Applying preferences
Scanning plugin dir /usr/share/deluge/plugins
...
Starting DHT...

Program received signal SIGABRT, Aborted.
0x4019b1c4 in kill () from /lib/libc.so.0
So let's look where the problem is:
(gdb) where
#0  0x4019b1c4 in kill () from /lib/libc.so.0
#1  0x4013af69 in pthread_kill () from /lib/libpthread.so.0
#2  0x4013b37f in raise () from /lib/libpthread.so.0
#3  0x40194cc7 in abort () from /lib/libc.so.0
#4  0x4027140f in __cxa_call_unexpected () from /mnt/C/sys/lib//libstdc++.so.5
#5  0x40271439 in std::terminate () from /mnt/C/sys/lib//libstdc++.so.5
#6  0x40271560 in __cxa_throw () from /mnt/C/sys/lib//libstdc++.so.5
#7  0x41638218 in libtorrent::bdecode<std::istream_iterator<char, char, 
std::char_traits<char>, int> > (end= {<std::iterator<std::input_iterator_tag, char, int, char const*, char 
const&>> = {<No data fields>}, _M_stream = 0x8d188, _M_value = -124 
'\204', _M_ok = 88}) at bencode.hpp:332
#8  0x41632d47 in torrent_start_DHT (self=0x0, args=0x41a6d44c) at basic_ios.h:110
#9  0x4005d122 in PyCFunction_Call (func=0x412712ac, arg=0x41a6d44c, kw=0x6) at Objects/methodobject.c:108
#10 0x400a2661 in call_function (pp_stack=0xbfffec5c, oparg=0) at Python/ceval.c:3573
...
So we can see an exception thrown in libtorrent::bdecode which causes the immediate process termination. As far as I know it can be caused either by not-catching the exception or by a double throw. Here and here are examples when one experiences the same.

This is enough for today, we shall look at it in the next part. Since the local debugging is reaaaallyyyyy sloooow and several times I have even run out of swap space, in next part we also look at the remote debugging.

No comments: