Overview
The deluge torrent client is first real X application (and the first application implemented in c++) I am trying to build on the WMU box. Thus it is by far the hardest puzzle I had to solve so far. I learned a lot along the way and so I would like to share it step by step with you.Whole build process is so complicated for me, that I have not finished it yet (Update: it is finished now 8-) ) and so I am going to organize this post as series of three (hopefully there will be no need for more ;) parts. Also the deluge version was changing during my mission so the series will reflect these changes. It is not because I want to force readers (possibly trying to build the deluge as well) to go the exact same path, the point is to show all the problems (along with solutions or workarounds) I have experienced, since they can emerge in another situations as well.
This part contains the initial build process and deployment. Next part will deal with the local and remote debugging and (hopefully) the last one will describe the solution of the problems found during the debugging sessions. I am also going to prepare a post with build sequence of libraries the deluge directly or indirectly depends on.
Compilation:
I have started with the 0.5.9.1 version (the newest one at the time).dev# wget http://download.deluge-torrent.org/source/0.5.9.1/deluge-0.5.9.1.tar.gz dev# tar xzvf deluge-0.5.9.1.tar.gz dev# export CFLAGS=-I/mnt/C/sys/include/boost-1_35 dev# python setup.py build ... Error 1 libtorrent/src/session_impl.cpp:829: instantiated from here libtorrent/include/libtorrent/socket.hpp:181: error: `IPV6_V6ONLY' undeclaredIt seems the uClibc (at least 0.9.27 version) lacks the IPV6_V6ONLY definition. See for example this message for more info about this (missing) define.
As a quick and dirty fix I just define the identifier locally:
dev# nano libtorrent/include/libtorrent/socket.hpp... and add following (the actual value got from debian /usr/include/bits/in.h file) just after the include section:
#define IPV6_V6ONLY 26
Static linkage:
Now we can try the build again:dev# python setup.py build ... /usr/bin/ld: cannot find -lboost_filesystem-mtStatic linker is unable to find the boost_filesystem-mt library. At first we have to make sure that linker flags LDFLAGS are pointing to proper directory (sys/lib):
dev# echo $LDFLAGS dev# export LDFLAGS=-L/mnt/C/sys/lib... but the problem still persists. The problem is that the boost library names are more specific (contain boost version and compiler version). In README Additional notes is the following:
1) On some distributions, boost libraries are renamed to have "-mt" at the end (boost_thread_mt instead of boost_thread, for example), the "mt" indicating "multithreaded". In some cases it appears the distros lack symlinks to connect things. The solution is to either add symlinks from the short names to those with "-mt", or to alter setup.py to look for the "-mt" versions.I do not know whether there should be libboost_*-mt.so symlinks by default, maybe I have not configured boost library properly, as a quick fix I decided to create the links manually:
dev# cd /mnt/C/sys/lib dev# for i in `ls ./libboost_*-gcc33-mt-1_35.so.1.35.0`; do ln -s $i ${i/%-gcc33-mt-1_35.so.1.35.0/-mt.so}; doneNow we have another link error:
dev# python setup.py build ... can't resolve symbol 'get_nprocs'It seems that uClibc does not implement get_nprocs function. Following text from this forum describes the workaround:
I grep'ed the boost source tree for get_nprocs and found the only instance of it in thread.cpp. I found out that this function returns the number of processors and is really not needed in my case. So instead of having "return get_nprocs()", I changed it to "return 1". While I think this should not break anything, I thought I should still tell you.So in file boost_1_35_0/libs/thread/src/pthread/thread.cpp I have replaced
return get_nprocs();with:
return 1;in short:
dev# cd /usr/local/src/boost_1_35_0 dev# sed -i 's/return get_nprocs();/return 1;/g' libs/thread/src/pthread/thread.cpp... and rebuilt the boost library.
dev# make ... ...updating 3 targets... ... ...updated 3 targets... dev# make install ... ...updating 6 targets... common.copy /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0 ln-UNIX /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so common.copy /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so.1.35.0 ln-UNIX /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so common.hard-link /mnt/C/sys/lib/libboost_thread-gcc33-mt.so common.hard-link /mnt/C/sys/lib/libboost_wave-gcc33-mt.so ...updated 6 targets...... now the build finished successfully, we are ready to install:
dev# python setup.py install --prefix=/mnt/C/sys/X11In order to correctly deploy on the box, one additional hack was needed (touch everything containing deluge in path so it is included in the filopack archive):
dev# find /mnt/C/sys -path "*deluge*" -type f -exec touch {} \;We have the first step done. Do not know yet what is ahead of us ;-)
Dynamic linkage:
Now we can try to run the application:dev# deluge ... /usr/bin/python: can't resolve symbol '_ZN5boost6system19get_system_categoryEv'It seems there is some symbol expected by deluge app but missing in the boost libraries.
With the nm command we find out whether the symbol is in any boost library:
dev# nm -A -D /mnt/C/sys/lib/libboost_* | grep "T _ZN5boost6system19get_system_categoryEv" /mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so:00001d28 T _ZN5boost6system19get_system_categoryEv /mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so.1.35.0:00001d28 T _ZN5boost6system19get_system_categoryEv /mnt/C/sys/lib/libboost_system-gcc33-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEv /mnt/C/sys/lib/libboost_system-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEvMake sure that the symbol is needed from the deluge_core library:
dev# nm /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so | grep _ZN5boost6system19get_system_categoryEv U _ZN5boost6system19get_system_categoryEvWe can test whether all the libraries the deluge_core depends on are properly found by the dynamic linker:
dev# ldd /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so libboost_filesystem-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0 (0x00000000) libboost_date_time-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_date_time-gcc33-mt-1_35.so.1.35.0 (0x00000000) libboost_thread-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0 (0x00000000) ... libboost_system-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 (0x00000000) ...I am not sure what exactly is going on here, I would be glad if anybody shed some light on it, as a workaround the LD_PRELOAD works for me:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge Traceback (most recent call last): File "/usr/bin/deluge", line 46, inWe have to install dbus before we proceed (see this post)...import deluge._dbus as dbus File "/mnt/C/sys/lib/python2.5/site-packages/deluge/_dbus.py", line 32, in from dbus import Interface, SessionBus, version ImportError: No module named dbus
Now we can try it again:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge process 16385: D-Bus library appears to be incorrectly set up; failed to read machine uuid: Failed to open "/mnt/C/sys/X11//var/lib/dbus/machine-id": No such file or directory See the manual page for dbus-uuidgen to correct this issue. D-Bus not compiled with backtrace support so unable to print a backtrace AbortedSo far so good, following command seem to solve it:
dev# sys/X11/bin/dbus-uuidgen --ensure... but then there is another problem:
Crashes:
# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge no existing Deluge session (deluge:16391): GLib-GIO-CRITICAL **: g_unix_mount_is_system_internal: assertion `mount_entry != NULL' failed Segmentation faultThe g_unix_mount_is_system_internal function (in glib-2.17.2/gio/gunixmounts.c file) asserted since the given mount_entry was NULL.
gboolean g_unix_mount_is_system_internal (GUnixMountEntry *mount_entry) { g_return_val_if_fail (mount_entry != NULL, FALSE); return mount_entry->is_system_internal; }Look at the places where the function is called. We find the g_unix_mount_guess_should_display function in the same file.
gboolean g_unix_mount_guess_should_display (GUnixMountEntry *mount_entry) { const char *mount_path; /* Never display internal mountpoints */ if (g_unix_mount_is_system_internal (mount_entry)) return FALSE; /* Only display things in /media (which are generally user mountable) and home dir (fuse stuff) */ mount_path = mount_entry->mount_path;It is apperent that this function calls the g_unix_mount_is_system_internal it asserts but returns FALSE and so the g_unix_mount_guess_should_display immediatelly crashes.
Now we can look at the caller's code, to the _g_unix_mount_new function in the glib-2.17.2/gio/gunixmount.c:
GUnixMount * _g_unix_mount_new (GVolumeMonitor *volume_monitor, GUnixMountEntry *mount_entry, GUnixVolume *volume) { GUnixMount *mount; /* No volume for mount: Ignore internal things */ if (volume == NULL && !g_unix_mount_guess_should_display (mount_entry) ) return NULL; ...I do not know what are the exact contract of those functions, from the code it seems that all three functions share the precondition that the mount_entry cannot be NULL.
I decided not to go up on the call stack and look for the actual cause of the NULL mount_entry, it can very well be the fact that I am running on the chroot-ed environment. As a quick fix I modify this function as follows:
dev# nano /usr/local/src/glib-2.17.2/gio/gunixmount.c if (volume == NULL && ( !mount_entry || !g_unix_mount_guess_should_display (mount_entry) ) ) return NULL; dev# make dev# make install
Ok, let's try it again: dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 /mnt/C/sys/X11/bin/deluge no existing Deluge session Traceback (most recent call last): File "/mnt/C/sys/X11/bin/deluge", line 130, inWhen we look to the common.py we see there are two icon variants: svg and png. When we change .svg to .png everything works. You can open the file in a text editor:deluge.wizard.WizardGTK() File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__ self.window.set_icon(deluge.common.get_logo(18)) File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo size, size) gobject.GError: Unrecognized image file format
dev# nano /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py return gtk.gdk.pixbuf_new_from_file_at_size(get_pixmap("deluge.png"), \ size, size)... or simply call the following command:
dev# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.pyNow we have made a progress, when we launch deluge the we have to setup the preferences.
But when we proceed, there is another error:
... no existing Deluge session Starting new Deluge session... deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG. Applying preferences Scanning plugin dir /mnt/C/sys/X11/share/deluge/plugins Initialising plugin TorrentPeers ... Initialising plugin SpeedLimiter Applying preferences Starting DHT... AbortedWhen we use strace tool to monitor system calls we get the following:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge ... close(11) = 0 write(1, "Applying preferences\n", 21Applying preferences ) = 21 write(1, "Starting DHT...\n", 16Starting DHT... ) = 16 open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory) rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 kill(31503, SIGABRT) = 0 --- SIGABRT (Aborted) @ 0 (0) --- +++ killed by SIGABRT +++ Process 31503 detachedIt seems the abort could have some relation with missing dht.state file. Let's try to debug it:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python GNU gdb 6.3 ... (gdb) run /mnt/C/sys/X11/bin/deluge Starting program: /mnt/C/sys/bin/python /mnt/C/sys/X11/bin/deluge [Thread debugging using libthread_db enabled] [New Thread 1024 (LWP 7763)] Could not open /proc/7763/statusIt seems it is not so easy to run gdb on chroot-ed system, I have tried mess up with manual /proc "dummy" file creation but had no success.
Another possibility is to debug directly on the box. Before that we have to upload it.
Deployment to the box:
After the upload routine some more problems emerged:
box# deluge Traceback (most recent call last): File "/mnt/C/sys/X11/bin/deluge", line 43, inOk, I forgot to deploy pygtk. Done.import deluge File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/__init__.py", line 34, in ImportError: No module named pygtk
box# deluge Traceback (most recent call last): File "/mnt/C/sys/X11/bin/deluge", line 45, inThis one is more interesting, we have to find out what is going on:import deluge.core File "/usr/lib/python2.5/site-packages/deluge/core.py", line 59, in import deluge_core ImportError: File not found
box# strace python ... import deluge.deluge_core ... open("./libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/mnt/C/sys/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/mnt/C/sys/X11/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) open("/usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory) close(3) = 0 write(2, "Traceback (most recent call last"..., 35Traceback (most recent call last): ) = 35 ...Ok, it seems we forgot to deploy the boost libraries. Done.
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge no existing Deluge session (deluge:19324): libglade-WARNING **: could not find glade file '/usr/share/deluge/glade/wizard.glade' Traceback (most recent call last): File "/mnt/C/sys/X11/bin/deluge", line 130, inWe have to create a symlink in /usr/share (since it is just memory filesystem, in order to make it permanent we have to make symlink anytime the box boots up).deluge.wizard.WizardGTK() File "/usr/lib/python2.5/site-packages/deluge/wizard.py", line 50, in __init__ , domain='deluge') RuntimeError: could not create GladeXML object
For now just call following command, later on we can modify an config file (e.g. /mnt/C/sys/etc/rc-local).
box# ln -s /mnt/C/sys/share/deluge /usr/share/delugeWhen we try to launch it again, we get known "Unrecognized image file format" message.
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge no existing Deluge session Traceback (most recent call last): File "/mnt/C/sys/X11/bin/deluge", line 130, inStream editor helps us to fix it:deluge.wizard.WizardGTK() File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__ self.window.set_icon(deluge.common.get_logo(18)) File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo size, size) gobject.GError: Unrecognized image file format
box# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.pyNow when run it under strace again, and it seems that we got on the box# system as far as we were on the dev# system:
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge ... close(11) = 0 write(1, "Applying preferences\n", 21Applying preferences ) = 21 write(1, "Starting DHT...\n", 16Starting DHT... ) = 16 open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory) rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 kill(31503, SIGABRT) = 0 --- SIGABRT (Aborted) @ 0 (0) --- +++ killed by SIGABRT +++ Process 31503 detached
Local debugging:
Now we are ready to try the gdb debugger:box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python ... (gdb) run /mnt/C/sys/X11/bin/deluge Starting program: /bin/python /mnt/C/sys/X11/bin/deluge ... no existing Deluge session Starting new Deluge session... deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG. Program received signal SIG32, Real-time event 32. 0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0 (gdb) Quit (gdb) where #0 0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0 #1 0x4019cb9b in sigsuspend () from /lib/libc.so.0 #2 0x40139cf9 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0 #3 0x401396bd in pthread_create () from /lib/libpthread.so.0 #4 0x416c9f26 in boost::thread::start_thread () from /mnt/C/sys/lib//libboost_thread-gcc33-mt-1_35.so.1.35.0 #5 0x415d68c8 in disk_io_thread (this=0x821acc8, block_size=-4) at thread.hpp:151 ...Sig32 event is not really what we are looking for (some info about it can be seen in this discussion), we wait for the ABORT signal and so can continue:
(gdb) cont Continuing. Program received signal SIG32, Real-time event 32. 0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0 (gdb) cont Continuing. ... Applying preferences Scanning plugin dir /usr/share/deluge/plugins ... Starting DHT... Program received signal SIGABRT, Aborted. 0x4019b1c4 in kill () from /lib/libc.so.0So let's look where the problem is:
(gdb) where #0 0x4019b1c4 in kill () from /lib/libc.so.0 #1 0x4013af69 in pthread_kill () from /lib/libpthread.so.0 #2 0x4013b37f in raise () from /lib/libpthread.so.0 #3 0x40194cc7 in abort () from /lib/libc.so.0 #4 0x4027140f in __cxa_call_unexpected () from /mnt/C/sys/lib//libstdc++.so.5 #5 0x40271439 in std::terminate () from /mnt/C/sys/lib//libstdc++.so.5 #6 0x40271560 in __cxa_throw () from /mnt/C/sys/lib//libstdc++.so.5 #7 0x41638218 in libtorrent::bdecode<std::istream_iterator<char, char, std::char_traits<char>, int> > (end= {<std::iterator<std::input_iterator_tag, char, int, char const*, char const&>> = {<No data fields>}, _M_stream = 0x8d188, _M_value = -124 '\204', _M_ok = 88}) at bencode.hpp:332 #8 0x41632d47 in torrent_start_DHT (self=0x0, args=0x41a6d44c) at basic_ios.h:110 #9 0x4005d122 in PyCFunction_Call (func=0x412712ac, arg=0x41a6d44c, kw=0x6) at Objects/methodobject.c:108 #10 0x400a2661 in call_function (pp_stack=0xbfffec5c, oparg=0) at Python/ceval.c:3573 ...So we can see an exception thrown in libtorrent::bdecode which causes the immediate process termination. As far as I know it can be caused either by not-catching the exception or by a double throw. Here and here are examples when one experiences the same.
This is enough for today, we shall look at it in the next part. Since the local debugging is reaaaallyyyyy sloooow and several times I have even run out of swap space, in next part we also look at the remote debugging.
No comments:
Post a Comment