2008/07/28

WMU-6500FS - strace 4.5.15

strace is a system call tracer, i.e. a debugging tool which prints out a trace of all the system calls made by another process/program.
Build result: [binary] [files list]

Info sources: [Man page] [Project page]

Build sequence:
dev# wget "http://downloads.sourceforge.net/strace/strace-4.5.15.tar.bz2?modtime=1168972008&big_mirror=0&filesize=455607"
dev# tar xjvf strace-4.5.15.tar.bz2
dev# cd strace-4.5.15
dev# ./configure --prefix=/mnt/C/sys
dev# make
dev# make install


2008/07/26

WMU-6500FS - Deluge torrent (part II)

Overview

In the previous part we started building the deluge torrent ...

We ended up with a curious error: the process was aborted after an exception was thrown in libtorrent::bdecode. For some reason the exception is thrown but not caught, which causes immediate process termination.

Screenshot

Here is a screenshot of the remote debugging in action:

Uncaught exception

When we look at the strace output:
...
close(11)                               = 0
write(1, "Applying preferences\n", 21Applying preferences
)  = 21
write(1, "Starting DHT...\n", 16Starting DHT...
)       = 16
open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(31503, SIGABRT)                    = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
Process 31503 detached
Call stack at the moment of termination:
...
Program received signal SIG32, Real-time event 32.
0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
(gdb) Quit
(gdb) where
#0  0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
#1  0x4019cb9b in sigsuspend () from /lib/libc.so.0
#2  0x40139cf9 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#3  0x401396bd in pthread_create () from /lib/libpthread.so.0
#4  0x416c9f26 in boost::thread::start_thread () from /mnt/C/sys/lib//libboost_thread-gcc33-mt-1_35.so.1.35.0
#5  0x415d68c8 in disk_io_thread (this=0x821acc8, block_size=-4) at thread.hpp:151
...
And the corresponding source file, src/deluge_core.cpp:
    boost::filesystem::path tempPath(DHT_path, empty_name_check);
    boost::filesystem::ifstream DHT_state_file(tempPath, std::ios_base::binary);
    DHT_state_file.unsetf(std::ios_base::skipws);

    entry DHT_state;
    try
    {
(1)     DHT_state = bdecode(std::istream_iterator<char>(DHT_state_file),
            std::istream_iterator<char>());
        M_ses->start_dht(DHT_state);
        //        printf("DHT state recovered.\r\n");

        //        // Print out the state data from the FILE (not the session!)
        //        printf("Number of DHT peers in recovered state: %ld\r\n", count_DHT_peers(DHT_state));

    }
(2) catch (std::exception&)
    {
        printf("No DHT file to resume\r\n");
        M_ses->start_dht();
    }
... it seems obvious what happened. The application tried to open the dht.state file and, since there was no such file, an exception was thrown while decoding the unopened stream (source code line marked as (1)). This exception is nevertheless not caught by the catch block (source code line marked as (2)). I do not know why; there is no apparent reason (the exception definitely derives from std::exception). The fact is that the application is aborted because of it.

Uncaught exception - workaround

For now I decided on the following workaround: handle the missing file explicitly and so bypass the exception:
    boost::filesystem::path tempPath(DHT_path, empty_name_check);
    boost::filesystem::ifstream DHT_state_file(tempPath, std::ios_base::binary);
    DHT_state_file.unsetf(std::ios_base::skipws);

    entry DHT_state;
    if ( !exists( tempPath ) )
    {
        printf("No DHT file to resume\r\n");
        M_ses->start_dht();
    }
    else try
    {
        DHT_state = bdecode(std::istream_iterator<char>(DHT_state_file),
            std::istream_iterator<char>());
        M_ses->start_dht(DHT_state);
        //        printf("DHT state recovered.\r\n");

        //        // Print out the state data from the FILE (not the session!)
        //        printf("Number of DHT peers in recovered state: %ld\r\n", count_DHT_peers(DHT_state));

    }
    catch (std::exception&)
    {
        printf("No DHT file to resume\r\n");
        M_ses->start_dht();
    }
The unified diff looks as follows:
dev# diff -u src/deluge_core.cpp.old src/deluge_core.cpp
--- src/deluge_core.cpp.old     2008-06-04 20:03:49.000000000 -0600
+++ src/deluge_core.cpp 2008-07-02 16:32:01.000000000 -0600
@@ -1889,7 +1889,12 @@
     DHT_state_file.unsetf(std::ios_base::skipws);

     entry DHT_state;
-    try
+    if ( !exists( tempPath ) )
+    {
+        printf("No DHT file to resume\r\n");
+        M_ses->start_dht();
+    }
+    else try
     {
         DHT_state = bdecode(std::istream_iterator<char>(DHT_state_file),
             std::istream_iterator<char>());
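The idea behind the workaround (test the precondition up front instead of relying on a throw) can be sketched in isolation. This is only an illustration using the standard library; read_state_file is a made-up helper, not part of deluge or libtorrent:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file into `out`.
// Returns false (without throwing) when the file cannot be opened.
bool read_state_file(const std::string& path, std::string& out)
{
    std::ifstream in(path.c_str(), std::ios_base::binary);
    if (!in.is_open())
        return false;                   // missing file: caller falls back to defaults

    in.unsetf(std::ios_base::skipws);   // keep whitespace bytes, as in the listing above
    out.assign(std::istream_iterator<char>(in), std::istream_iterator<char>());
    return true;
}
```

The caller decides what "no file" means (here it would be starting the DHT without saved state), so no exception has to cross any function boundary.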
With this workaround we are able to successfully start the application. But when we add a torrent file, there is another (but similar) problem: another exception is thrown, which leads to process termination. Since local debugging is really slow on the box, before we proceed to solve this problem, let's look at how to debug remotely.

Remote debugging

For remote debugging we will use gdbserver. The following relevant information is from the GDB remote debugging manual:
gdbserver is a control program for Unix-like systems, which allows you to connect your program 
with a remote GDB via target remote - but without linking in the usual debugging stub.
...
The `host:2345' argument means that gdbserver is to expect a TCP connection from machine `host' 
to local TCP port 2345. (Currently, the `host' part is ignored.)
...
On some targets, gdbserver can also attach to running programs. This is accomplished via the 
--attach argument.
...
On the GDB host machine, you need an unstripped copy of your program, since GDB needs symbols 
and debugging information. Start up GDB as usual, using the name of the local copy of your 
program as the first argument. ... After that, use target remote to establish communications 
with gdbserver.
gdbserver is ideal for us since, thanks to the deployment process, we have identical copies of the debugged program on both the box# and the dev# build system.

So on the box we can run the deluge
box# deluge &
[1] 29193
... and attach to it with the gdbserver:
box# gdbserver colinux:2345 --attach 29193
Attached; pid = 29193
Listening on port 2345
or alternatively we can have gdbserver start deluge right from the beginning:
box# gdbserver colinux:2345 `which python` `which deluge`
Process /mnt/C/sys/bin/python created; pid = 29235
Listening on port 2345
Now we can connect to gdbserver with gdb running on the build machine (PC). Since deluge is just a Python script, the executable we will debug is in fact the Python interpreter:
dev# gdb `which python`
...
(gdb) target remote 192.168.1.104:2345
(gdb) cont
Continuing.
[New Thread 1024]
...
Ok, that's it.

Note: sometimes, when we interrupt the debugging process, deluge is not properly killed and an instance remains. When we start deluge again, the output looks like this:
...
create proxy object
create iface
send to iface

Child exited with retcode = 0

Child exited with status 0
GDBserver exiting
In such a situation we have to find the hanging processes and kill them:
box# ps ax | grep deluge
29429 root      14136 S     1.7  /mnt/C/sys/bin/python /mnt/C/sys/bin/deluge
29440 root      14136 S     0.0  /mnt/C/sys/bin/python /mnt/C/sys/bin/deluge
29482 root        324 S     0.0  grep deluge
box# kill -9 29429 29440

Uncaught exceptions - looking for a cause

So as I already said, the workaround pushed us forward, but another exception is thrown later on, when we open a torrent file:
(gdb) where
...
#81 0xbfffca60 in ?? ()
#82 0x4027140f in __cxa_call_unexpected () from /mnt/C/sys/lib//libstdc++.so.5
#83 0x4027140f in __cxa_call_unexpected () from /mnt/C/sys/lib//libstdc++.so.5
#84 0x40271439 in std::terminate () from /mnt/C/sys/lib//libstdc++.so.5
#85 0x40271560 in __cxa_throw () from /mnt/C/sys/lib//libstdc++.so.5
#86 0x414d0d39 in libtorrent::entry::operator[] (this=0x821a4e0, key=0x41644ef4 "url-list") at entry.hpp:77
#87 0x415a5bc8 in libtorrent::torrent_info::read_torrent_info (this=0xbfffd090, torrent_file=@0xbfffcee0)
    at libtorrent/src/torrent_info.cpp:519
#88 0x4159e59e in torrent_info (this=0xbfffd090, torrent_file=@0xbfffcee0) at libtorrent/src/torrent_info.cpp:236
#89 0x41629068 in internal_get_torrent_info (torrent_name=@0x6) at src/deluge_core.cpp:264
#90 0x4162d0e7 in torrent_dump_file_info (self=0x0, args=0x41a7ab2c) at stl_alloc.h:652
#91 0x4005d122 in PyCFunction_Call (func=0x4127122c, arg=0x41a7ab2c, kw=0x6) at Objects/methodobject.c:108
...
When we look at the stack frame 87 and torrent_file structure:
(gdb) frame 87
#87 0x415a5bc8 in libtorrent::torrent_info::read_torrent_info (this=0xbfffd090, torrent_file=@0xbfffcee0)
    at libtorrent/src/torrent_info.cpp:519
519                             entry const& url_seeds = torrent_file["url-list"];

(gdb) list
514                     catch (type_error) {}
515
516                     // if there are any url-seeds, extract them
517                     try
518                     {
519                             entry const& url_seeds = torrent_file["url-list"];
520                             if (url_seeds.type() == entry::string_t)
521                             {
522                                     m_url_seeds.push_back(url_seeds.string());
523                             }

(gdb) print torrent_file
$1 = (const libtorrent::entry &) @0xbfffcee0: {m_type = dictionary_t, {data = "øv!\b\004\000\000\000ògB@",
    dummy_aligner = 17316280056}}
... and at the src/deluge-torrent-0.5.9.3/libtorrent/src/entry.cpp file:
#ifndef BOOST_NO_EXCEPTIONS
        const entry& entry::operator[](char const* key) const
        {
                dictionary_type::const_iterator i = dict().find(key);
                if (i == dict().end()) throw type_error(
                        (std::string("key not found: ") + key).c_str());
                return i->second;
        }

        const entry& entry::operator[](std::string const& key) const
        {
                return (*this)[key.c_str()];
        }
#endif
... we can see that torrent_file is a Python-like dictionary: when one tries to access a value by key and there is no such entry, an exception is thrown. As in the previous case, this exception is not caught either. It is time to stop with workarounds and look at the exception issue in finer detail.
This discussion contains an elegant test for exception catching. It seems that I am not alone in having this sort of problem:
> So, the throw+catch are not hooking up properly under uClibc.
> This is uClibc's problem.

What version of uClibc?  gcc?  binutils?  How did you build your
toolchain?  Was gcc built with --enable-sjlj-exceptions?  What
kernel version are you using?
...
> part3:
> comment on gdb #5: __cxa_throw ()
> >From the backtrace above it looks that somethings goes wrong during stack 
> unwinding - does exception catching work for other C++ programs ?

It does if gcc was built properly...

    http://codepoet.org/throw1.cpp
    http://codepoet.org/throw2.cpp
So let's make the same test:
dev# cd /usr/local/src/
dev# mkdir test_exceptions
dev# cd test_exceptions/
dev# wget http://codepoet.org/throw1.cpp
dev# cat throw1.cpp
#include <features.h>
#include "iostream"
int main(void)
{
    try {
        throw("This is an exception");
    }
    catch (...) {
        std::cout << "caught an exception\n";
    }
    return(0);
}
dev# g++ -Wall -O2 throw1.cpp -o throw1
dev# ldd throw1
        libstdc++.so.5 => /lib/libstdc++.so.5 (0xb7edf000)
        libm.so.0 => /lib/libm.so.0 (0xb7ed1000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7ec9000)
        libc.so.0 => /lib/libc.so.0 (0xb7e35000)
        ld-uClibc.so.0 => /lib/ld-uClibc.so.0 (0xb7f91000)
dev# ./throw1
Aborted
So even in such a trivial case the exception is not caught and process is aborted.
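A slightly extended check can also tell whether a typed handler is matched across a function call. This is only a sketch, not one of the codepoet.org files, and typed_catch_works is a made-up name. On a toolchain with broken unwinding I would expect it to abort just like throw1 instead of returning:

```cpp
#include <stdexcept>
#include <string>

// Throws from a separate function so that the stack really has to be unwound.
static void thrower()
{
    throw std::runtime_error("This is an exception");
}

// Returns true only when the typed handler receives the exception intact.
bool typed_catch_works()
{
    try {
        thrower();
    }
    catch (const std::runtime_error& e) {
        return std::string(e.what()) == "This is an exception";
    }
    catch (...) {
        return false;   // caught, but not by the typed handler
    }
    return false;       // nothing was thrown at all
}
```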
Now let's look at our gcc configuration:
dev# gcc -v
Using built-in specs.
Configured with: /opt/buildroot/toolchain_build_i386/gcc-3.3.6/configure --prefix=/usr --build=i386-pc-linux-gnu --host=i386-linux-uclibc 
--target=i386-linux-uclibc --enable-languages=c,c++,objc --enable-shared --with-gxx-include-dir=/usr/include/c++ --disable-__cxa_atexit 
--enable-target-optspace --with-gnu-ld --disable-nls --enable-multilib : 
(reconfigured) /opt/buildroot/toolchain_build_i386/gcc-3.3.6/configure --prefix=/usr --build=i386-pc-linux-gnu --host=i386-linux-uclibc 
--target=i386-linux-uclibc --enable-languages=c,c++,objc --enable-shared --with-gxx-include-dir=/usr/include/c++ --disable-__cxa_atexit 
--enable-target-optspace --with-gnu-ld --disable-nls --enable-multilib : 
(reconfigured) /home/joker/CR/opt/buildroot/toolchain_build_i386/gcc-3.3.6/configure --prefix=/usr --build=i386-pc-linux-gnu 
--host=i386-linux-uclibc --target=i386-linux-uclibc --enable-languages=c,c++,objc --enable-shared --with-gxx-include-dir=/usr/include/c++ 
--disable-__cxa_atexit --enable-target-optspace --with-gnu-ld --disable-nls --enable-multilib
Thread model: posix
gcc version 3.3.6
... it seems that it is not configured with --enable-sjlj-exceptions.
This message named "libgcc_s.so compatibility between 3.3 and 3.4 (sjlj/dwarf2 exceptions)" describes a problem with the change of default exception model. It seems that the gcc change management (so versioning) did not go very well at that point:
On m68k-linux and parisc-linux between 3.3 and 3.4 the default exception model changed from sjlj based exceptions to dw2 based exceptions. Unfortunately at this time the soversion number of the shared libgcc was not bumped. ... I didn't look at other architectures, if the distinction is needed as well.
According to this message, rebuilding GCC with --enable-sjlj-exceptions could be a solution. The problem is that I did not prepare the uClibc build environment myself (I just used the one prebuilt by JoKeR) and so I have no experience with building GCC.
I did not want to start such a big task, so I decided to give another workaround a shot:

Disable exceptions

The code in the src/deluge-torrent-0.5.9.3/libtorrent/src/entry.cpp file inspired this workaround:
#ifndef BOOST_NO_EXCEPTIONS
        const entry& entry::operator[](char const* key) const
        {
                dictionary_type::const_iterator i = dict().find(key);
                if (i == dict().end()) throw type_error(
                        (std::string("key not found: ") + key).c_str());
                return i->second;
        }

        const entry& entry::operator[](std::string const& key) const
        {
                return (*this)[key.c_str()];
        }
#endif
It seems that the libtorrent code is prepared for cases when there is no exception support on the platform (as is the case for some embedded systems, for example). So why not try to disable the boost exceptions?
dev# cd /usr/local/src/boost_1_35_0
dev# nano boost/config/user.hpp
append the following at the end of the file:
#define BOOST_NO_EXCEPTIONS
... and now re-build the boost libraries:
dev# make
...
libs/serialization/src/xml_woarchive.cpp:58:   instantiated from here
boost/archive/iterators/wchar_from_mb.hpp:119: error: `dataflow_exception'
   undeclared in namespace `boost::archive::iterators'
...
...failed updating 9 targets...
...skipped 2 targets...
Not all Boost libraries built properly.
It seems that not all the libraries are compatible with this define. Anyway, let's proceed (finish the boost installation and restart the deluge build process):
dev# make install
...
dev# cd ../deluge-torrent-0.5.9.3
dev# python setup.py clean
dev# python setup.py build
...
libtorrent/src/metadata_transfer.cpp: In member function `virtual bool
   libtorrent::<unnamed>::metadata_peer_plugin::on_extension_handshake(const
   libtorrent::entry&)':
libtorrent/src/metadata_transfer.cpp:275: error: passing `const
   libtorrent::entry' as `this' argument of `libtorrent::entry&
   libtorrent::entry::operator[](const char*)' discards qualifiers
error: command 'gcc' failed with exit status 1
In libtorrent/src/metadata_transfer.cpp there is the following code:
    virtual bool on_extension_handshake(entry const& h)
    {
            entry const& messages = h["m"];
            if (entry const* index = messages.find_key("LT_metadata"))
            {
                    m_message_index = int(index->integer());
                    return true;
            }
            else
            {
                    m_message_index = 0;
                    return false;
            }
    }
... in libtorrent/include/libtorrent/entry.hpp there are the following map accessors:
                entry& operator[](char const* key);
                entry& operator[](std::string const& key);
#ifndef BOOST_NO_EXCEPTIONS
                const entry& operator[](char const* key) const;
                const entry& operator[](std::string const& key) const;
#endif
                entry* find_key(char const* key);
                entry const* find_key(char const* key) const;
                entry* find_key(std::string const& key);
The solution was to change the operator[] access to a find_key() method call:
    virtual bool on_extension_handshake(entry const& h)
    {
//          entry const& messages = h["m"];
            entry const* messages( h.find_key("m") );
            if ( messages )
            {
                if (entry const* index = messages->find_key("LT_metadata"))
                {
                    m_message_index = int(index->integer());
                    return true;
                }
            }
//          else
            {
                    m_message_index = 0;
                    return false;
            }
    }
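The difference between the two access styles can be shown with a standalone sketch. Dict, at_or_throw and read_message_index are stand-ins built on std::map, not libtorrent's entry class, and the code assumes a C++11 compiler (for nullptr):

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for a bencoded dictionary; not libtorrent's entry class.
typedef std::map<std::string, long> Dict;

// Throwing accessor, analogous to the const operator[].
long at_or_throw(const Dict& d, const std::string& key)
{
    Dict::const_iterator i = d.find(key);
    if (i == d.end()) throw std::runtime_error("key not found: " + key);
    return i->second;
}

// Non-throwing accessor, analogous to find_key(): null means "absent".
const long* find_key(const Dict& d, const std::string& key)
{
    Dict::const_iterator i = d.find(key);
    return i == d.end() ? nullptr : &i->second;
}

// Mirrors the rewritten handshake: the lookup is checked, nothing throws.
bool read_message_index(const Dict& handshake, int& index_out)
{
    if (const long* index = find_key(handshake, "LT_metadata")) {
        index_out = static_cast<int>(*index);
        return true;
    }
    index_out = 0;
    return false;
}
```

With the pointer-returning accessor a missing key is an ordinary branch, so the code keeps working on a platform where a throw ends in abort.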
There are a lot of such places in the code; it seems that the library's support for the BOOST_NO_EXCEPTIONS directive is not fully implemented yet:
dev# python setup.py build
...
libtorrent/src/ut_pex.cpp: In member function `virtual bool
   libtorrent::<unnamed>::ut_pex_peer_plugin::on_extension_handshake(const
   libtorrent::entry&)':
libtorrent/src/ut_pex.cpp:201: error: passing `const libtorrent::entry' as
   `this' argument of `libtorrent::entry& libtorrent::entry::operator[](const
   char*)' discards qualifiers
error: command 'gcc' failed with exit status 1
I still did not feel ready to rebuild GCC. In desperation I looked at the deluge site and noticed that there is a new version out there - 1.0.0_RC3. It promised a change in my miserable situation.

Deluge new version - Disable exceptions

So let's upgrade:
dev# cd /usr/local/src/
dev# wget http://download.deluge-torrent.org/source/0.9.03/deluge-0.9.03.tar.gz
dev# tar xzvf deluge-0.9.03.tar.gz
dev# cd deluge-0.9.03
dev# python setup.py build
...
In file included from /mnt/C/sys/include/boost-1_35/boost/thread.hpp:17,
                 from libtorrent/include/libtorrent/storage.hpp:44,
                 from libtorrent/include/libtorrent/peer_connection.hpp:63,
                 from libtorrent/src/peer_connection.cpp:41:
/mnt/C/sys/include/boost-1_35/boost/thread/recursive_mutex.hpp:18:2: #error "Boost threads unavailable on this platform"
...
We already have a solution for this:
dev# export CFLAGS="-pthread $CFLAGS "
Then there is another build error:
dev# python setup.py build
...
libtorrent/src/torrent_handle.cpp: In member function `const
   libtorrent::torrent_info& libtorrent::torrent_handle::get_torrent_info()
   const':
libtorrent/src/torrent_handle.cpp:503: error: no matching function for call to
   `libtorrent::torrent_info::torrent_info()'
libtorrent/include/libtorrent/torrent_info.hpp:86: error: candidates are:
   libtorrent::torrent_info::torrent_info(const libtorrent::torrent_info&)
libtorrent/include/libtorrent/torrent_info.hpp:130: error:
   libtorrent::torrent_info::torrent_info(const libtorrent::entry&)
libtorrent/include/libtorrent/torrent_info.hpp:92: error:
   libtorrent::torrent_info::torrent_info(const boost::filesystem::path&)
libtorrent/include/libtorrent/torrent_info.hpp:91: error:
   libtorrent::torrent_info::torrent_info(const char*, int)
libtorrent/include/libtorrent/torrent_info.hpp:90: error:
   libtorrent::torrent_info::torrent_info(const libtorrent::lazy_entry&)
libtorrent/include/libtorrent/torrent_info.hpp:89: error:
   libtorrent::torrent_info::torrent_info(const libtorrent::sha1_hash&)
libtorrent/src/torrent_handle.cpp: In member function `libtorrent::entry
   libtorrent::torrent_handle::write_resume_data() const':
libtorrent/src/torrent_handle.cpp:533: error: return-statement with no value,
   in function declared with a non-void return type
error: command 'gcc' failed with exit status 1
When we look at libtorrent/src/torrent_handle.cpp:
        torrent_info const& torrent_handle::get_torrent_info() const
        {
                INVARIANT_CHECK;
#ifdef BOOST_NO_EXCEPTIONS
                const static torrent_info empty;
#endif
                boost::shared_ptr<torrent> t = m_torrent.lock();
        ...
... it is obvious that torrent_info has no default constructor, but when BOOST_NO_EXCEPTIONS is defined one is used. So I tried to add one:
libtorrent/include/libtorrent/torrent_info.hpp:
#ifdef BOOST_NO_EXCEPTIONS
    torrent_info();
#endif
    torrent_info(sha1_hash const& info_hash);
    torrent_info(lazy_entry const& torrent_file);
    torrent_info(char const* buffer, int size);
    torrent_info(fs::path const& filename);
    ~torrent_info();
libtorrent/src/torrent_info.cpp:
#ifdef BOOST_NO_EXCEPTIONS
        torrent_info::torrent_info()
                : m_creation_date(pt::second_clock::universal_time())
                , m_multifile(false)
                , m_private(false)
                , m_info_section_size(0)
                , m_piece_hashes(0)
        {}
#endif
The next build error and its solution were as follows:
dev# python setup.py build
...
libtorrent/src/torrent_handle.cpp: In member function `libtorrent::entry
   libtorrent::torrent_handle::write_resume_data() const':
libtorrent/src/torrent_handle.cpp:533: error: return-statement with no value,
   in function declared with a non-void return type
error: command 'gcc' failed with exit status 1
Let's look at the libtorrent/src/torrent_handle.cpp file:
        entry torrent_handle::write_resume_data() const
        {
                INVARIANT_CHECK;

                entry ret(entry::dictionary_t);
                TORRENT_FORWARD(write_resume_data(ret));
                t->filesystem().write_resume_data(ret);

                return ret;
        }
When we look at the definition of the macro in the same file:
#ifdef BOOST_NO_EXCEPTIONS

#define TORRENT_FORWARD(call) \
        boost::shared_ptr<torrent> t = m_torrent.lock(); \
        if (!t) return; \
        session_impl::mutex_t::scoped_lock l(t->session().m_mutex); \
        t->call
...
#else

#define TORRENT_FORWARD(call) \
        boost::shared_ptr<torrent> t = m_torrent.lock(); \
        if (!t) throw_invalid_handle(); \
        session_impl::mutex_t::scoped_lock l(t->session().m_mutex); \
        t->call
...
... we see that this version of the macro can be used only in a function returning void. The solution is to use another variant of the macro:
        TORRENT_FORWARD_RETURN2(write_resume_data(ret), entry() );
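To show why a value-returning variant is needed, here is a minimal sketch of such a macro. FORWARD_OR, Target and Handle are made up for illustration (this is not libtorrent's definition), and std::shared_ptr from C++11 stands in for boost::shared_ptr:

```cpp
#include <memory>
#include <string>

// Hypothetical macro: lock the weak pointer, return `fallback` if the
// target is gone, otherwise forward the call to the target.
#define FORWARD_OR(call, fallback) \
        std::shared_ptr<Target> t = m_target.lock(); \
        if (!t) return fallback; \
        t->call

struct Target
{
    void append_name(std::string& out) const { out += "resume-data"; }
};

struct Handle
{
    std::weak_ptr<Target> m_target;

    // Non-void function: the early exit has to produce a value,
    // which a plain "return;" inside the macro cannot do.
    std::string describe() const
    {
        std::string ret;
        FORWARD_OR(append_name(ret), std::string());
        return ret;
    }
};
```

A Handle whose target has expired returns the fallback (an empty string here); a live one returns whatever the forwarded call produced.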
Let's proceed:
dev# python setup.py build
...
libtorrent/src/memdebug.cpp:34:22: execinfo.h: No such file or directory
libtorrent/src/memdebug.cpp: In constructor `memdebug::memdebug()':
libtorrent/src/memdebug.cpp:61: error: `__malloc_hook' undeclared (first use
   this function)
libtorrent/src/memdebug.cpp:61: error: (Each undeclared identifier is reported
   only once for each function it appears in.)
libtorrent/src/memdebug.cpp:62: error: `__free_hook' undeclared (first use this
   function)
libtorrent/src/memdebug.cpp: In static member function `static void*
   memdebug::my_malloc_hook(unsigned int, const void*)':
libtorrent/src/memdebug.cpp:133: error: `backtrace' undeclared (first use this
   function)
libtorrent/src/memdebug.cpp:144: error: `backtrace_symbols' undeclared (first
   use this function)
error: command 'gcc' failed with exit status 1
This kind of error is not connected with BOOST_NO_EXCEPTIONS. It is due to the missing backtrace support in uClibc.
Fortunately it seems that memdebug.cpp is used just for memory debugging (detecting memory leaks and the like); the file is self-contained and nothing outside depends on it. Let's simply disable it (wrap the file content in #if 0 or clear it):
#if 0
#if defined __linux__ && defined __GNUC__
...
#endif
#endif
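A less drastic alternative to #if 0 would be to guard only the backtrace-dependent code. The sketch below assumes that uClibc defines __UCLIBC__ (and, for compatibility, __GLIBC__ as well) and that this uClibc build has no execinfo.h, as on this box. HAVE_BACKTRACE and captured_frames are made-up names:

```cpp
#include <cstdlib>   // pulls in the C library's feature macros

// glibc provides <execinfo.h>; uClibc has to be excluded explicitly
// because it also defines __GLIBC__.
#if defined(__GLIBC__) && !defined(__UCLIBC__)
#include <execinfo.h>
#define HAVE_BACKTRACE 1
#else
#define HAVE_BACKTRACE 0
#endif

// Hypothetical helper: number of stack frames captured, or 0 when the
// C library offers no backtrace().
int captured_frames()
{
#if HAVE_BACKTRACE
    void* frames[16];
    return backtrace(frames, 16);
#else
    return 0;
#endif
}
```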
Here is another BOOST_NO_EXCEPTIONS related build error:
dev# python setup.py build
...
libtorrent/src/http_connection.cpp:208:   instantiated from here
libtorrent/include/libtorrent/variant_stream.hpp:226: error: no matching
   function for call to `libtorrent::ssl_stream<libtorrent::socket_type>::close
   ()'
libtorrent/include/libtorrent/ssl_stream.hpp:158: error: candidates are: void
   libtorrent::ssl_stream<Stream>::close(boost::system::error_code&) [with
   Stream = libtorrent::socket_type]
error: command 'gcc' failed with exit status 1
As we see in libtorrent/include/libtorrent/ssl_stream.hpp, the ssl_stream class contains two overloads of the close() method; the parameterless one is disabled when there are no exceptions:
#ifndef BOOST_NO_EXCEPTIONS
        void close()
        {
                m_sock.next_layer().close();
        }
#endif

        void close(error_code& ec)
        {
                m_sock.next_layer().close(ec);
        }
In libtorrent/include/libtorrent/variant_stream.hpp there are two kinds of close visitor class: close_visitor (used with exceptions) and close_visitor_ec (used with error codes), but the "exception" variant is not disabled. Let's try to fix it:
  struct close_visitor_ec
    : boost::static_visitor<>
  {
      close_visitor_ec(error_code& ec_)
        : ec(ec_)
      {}

      template <class T>
      void operator()(T* p) const
      { p->close(ec); }

      void operator()(boost::blank) const {}

      error_code& ec;
  };

#ifndef BOOST_NO_EXCEPTIONS
  struct close_visitor
    : boost::static_visitor<>
  {
      template <class T>
      void operator()(T* p) const
      { p->close(); }

      void operator()(boost::blank) const {}
  };
#endif
When we try to rebuild, we can see the place where the inappropriate visitor is called:
dev# python setup.py build
...
libtorrent/include/libtorrent/variant_stream.hpp:665: error: `close_visitor'
   undeclared in namespace `libtorrent::aux'
error: command 'gcc' failed with exit status 1
We have to fix libtorrent/include/libtorrent/variant_stream.hpp somehow. There are basically two possibilities: (a) disable the parameterless version, or (b) (maybe less correctly) use the error code variant and drop the error_code.
... as a quick fix I decided to use the latter method:
    void close(error_code& ec)
    {
        if (!instantiated()) return;
        boost::apply_visitor(
            aux::close_visitor_ec(ec), m_variant
        );
    }

// #ifndef BOOST_NO_EXCEPTIONS   // (1st - more correct - method)
    void close()
    {
//        if (!instantiated()) return;
//        boost::apply_visitor(aux::close_visitor(), m_variant);
        error_code ec;              // 2nd - simpler - method
        close( ec );
    }
// #endif                        // (1st - more correct - method) 
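The second method boils down to a parameterless overload that forwards to the error-code overload and ignores the result. A self-contained sketch of that pattern follows; Stream is a made-up class, and std::error_code (C++11) is used in place of the error_code type from the listing:

```cpp
#include <system_error>

// Stand-in socket whose close() reports failure through an error code.
class Stream
{
public:
    explicit Stream(bool open) : m_open(open) {}

    // Error-code variant: never throws.
    void close(std::error_code& ec)
    {
        if (!m_open) {
            ec = std::make_error_code(std::errc::bad_file_descriptor);
            return;
        }
        m_open = false;
        ec.clear();
    }

    // Convenience variant: forwards to the one above and drops the error.
    void close()
    {
        std::error_code ignored;
        close(ignored);
    }

    bool is_open() const { return m_open; }

private:
    bool m_open;
};
```

The price of this shortcut is that a failed close goes unnoticed by the caller, which is why it is the less correct of the two methods.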
Another build problem (with an already known solution) followed: undeclared IPV6_V6ONLY. So I modified libtorrent/include/libtorrent/socket.hpp and added the dummy IPV6_V6ONLY definition.

Then there was a bunch of errors caused by the dictionary's lack of a const operator[] when BOOST_NO_EXCEPTIONS is defined (similar to the previous version). I decided to solve it with a helper function called get_subentry_const implementing the missing functionality:
The libtorrent/src/kademlia/dht_tracker.cpp file:
        libtorrent::entry const& get_subentry_const( libtorrent::entry const& e, char const* key )
        {
#ifndef BOOST_NO_EXCEPTIONS
            return e[key];
#else
            libtorrent::entry const* const s( e.find_key( key ) );
            if (!s) throw std::runtime_error("subentry not found");
            return *s;
#endif
        }
If the key is missing, this function throws an exception, which in turn causes process termination. The question is whether a missing key in the client code is really exceptional or whether it occurs in normal program flow... An alternative would be to return a reference to a global empty instance of the entry class.
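The "global empty instance" alternative could look roughly like this. It is a sketch on a std::map of strings rather than on libtorrent's entry class, and get_or_empty is a made-up name:

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> Dict;   // stand-in for a dictionary entry

// Returns the value stored under `key`, or a reference to one shared empty
// string when the key is absent. Nothing is thrown.
const std::string& get_or_empty(const Dict& d, const std::string& key)
{
    static const std::string empty;          // lives for the whole program
    Dict::const_iterator i = d.find(key);
    return i == d.end() ? empty : i->second;
}
```

The drawback is that the caller can no longer distinguish "key missing" from "key present but empty", so this only fits places where both cases are handled the same way.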

Now the client code has to be modified as follows:
//     std::string const& id = r["id"].string();
       std::string const& id = get_subentry_const( r, "id" ).string();
With all these changes the compilation completed successfully, but I got the following link errors:
dev# python setup.py build
...
`.L7753' referenced in section `.rodata' of build/temp.linux-i686-2.5/./libtorrent/src/create_torrent.o: defined in discarded section `.gnu.linkonce.t._ZN5boost9date_time23gregorian_calendar_baseINS0_19year_month_day_baseINS_9gregorian9greg_yearENS3_10greg_monthENS3_8greg_dayEEEmE16end_of_month_dayES4_S5_' of build/temp.linux-i686-2.5/./libtorrent/src/create_torrent.o
`.L7740' referenced in section `.rodata' of build/temp.linux-i686-2.5/./libtorrent/src/create_torrent.o: defined in discarded section `.gnu.linkonce.t._ZN5boost9date_time23gregorian_calendar_baseINS0_19year_month_day_baseINS_9gregorian9greg_yearENS3_10greg_monthENS3_8greg_dayEEEmE16end_of_month_dayES4_S5_' of build/temp.linux-i686-2.5/./libtorrent/src/create_torrent.o
...
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1
As you can see in this bug report, it seems to be a known compiler bug.
There was one promising hint in this discussion:
While trying to build some C++ code with g++-3.3.6 we encountered error
messages  similar to those mentioned in earlier comments in this discussion:

  `xxx' referenced in section `.rodata' of somefile.o: defined 
  in discarded section `.gnu.linkonce.t._zzz' of something.o

  ...

We were able to eliminate these error messages and successfuly compile our code
with g++-3.3.6 by using the '-frepo' option in our g++ compiles.

I understand that this may not be the ultimate solution to the bug(s) being
discussed here, but since we ended up here when searching for this particular
error message, I'm adding this comment in case it helps other people.
Nevertheless when I tried the suggested workaround:
dev# export CFLAGS="$CFLAGS -frepo"
... the result was exactly the same.

As you could see, I tried really hard to avoid re-configuring and re-compiling the compiler, but in the end it seems inevitable.
But it is enough for now, I feel really tired ;-)
We shall look at that in the next part.

Note: I have created a ticket on libtorrent Trac; it contains the diff with all the changes I have done.


2008/07/24

WMU-6500FS - Deluge torrent (part I)

Info sources: [Home page]

Overview

The deluge torrent client is the first real X application (and the first application implemented in C++) I am trying to build on the WMU box. Thus it is by far the hardest puzzle I have had to solve so far. I learned a lot along the way, so I would like to share it with you step by step.
The whole build process is so complicated for me that I have not finished it yet (Update: it is finished now 8-) ), so I am going to organize this post as a series of three parts (hopefully there will be no need for more ;). The deluge version also changed during my mission, so the series will reflect these changes. This is not because I want to force readers (possibly trying to build deluge as well) down the exact same path; the point is to show all the problems I have experienced (along with solutions or workarounds), since they can emerge in other situations as well.

This part contains the initial build process and deployment. Next part will deal with the local and remote debugging and (hopefully) the last one will describe the solution of the problems found during the debugging sessions. I am also going to prepare a post with build sequence of libraries the deluge directly or indirectly depends on.

Compilation:

I have started with the 0.5.9.1 version (the newest one at the time).
dev# wget http://download.deluge-torrent.org/source/0.5.9.1/deluge-0.5.9.1.tar.gz
dev# tar xzvf deluge-0.5.9.1.tar.gz
dev# export CFLAGS=-I/mnt/C/sys/include/boost-1_35
dev# python setup.py build
...
Error 1
libtorrent/src/session_impl.cpp:829:   instantiated from here
libtorrent/include/libtorrent/socket.hpp:181: error: `IPV6_V6ONLY' undeclared
It seems that uClibc (at least version 0.9.27) lacks the IPV6_V6ONLY definition. See for example this message for more info about this (missing) define.
As a quick and dirty fix I just define the identifier locally:
dev# nano libtorrent/include/libtorrent/socket.hpp
... and add the following (the actual value is taken from the Debian /usr/include/bits/in.h file) just after the include section:
#define IPV6_V6ONLY             26

Static linkage:

Now we can try the build again:
dev# python setup.py build
...
/usr/bin/ld: cannot find -lboost_filesystem-mt
The static linker is unable to find the boost_filesystem-mt library. First we have to make sure that the linker flags (LDFLAGS) point to the proper directory (sys/lib):
dev# echo $LDFLAGS

dev# export LDFLAGS=-L/mnt/C/sys/lib
... but the problem still persists. The reason is that the boost library names are more specific (they contain the boost version and the compiler version). The Additional notes section of the README says the following:
1) On some distributions, boost libraries are renamed to have "-mt" at the end (boost_thread_mt instead of boost_thread, for example), the "mt" indicating "multithreaded". In some cases it appears the distros lack symlinks to connect things. The solution is to either add symlinks from the short names to those with "-mt", or to alter setup.py to look for the "-mt" versions.
I do not know whether there should be libboost_*-mt.so symlinks by default; maybe I have not configured the boost library properly. As a quick fix I decided to create the links manually:
dev# cd /mnt/C/sys/lib
dev# for i in `ls ./libboost_*-gcc33-mt-1_35.so.1.35.0`; do ln -s $i ${i/%-gcc33-mt-1_35.so.1.35.0/-mt.so}; done
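The same renaming can be sketched in Python, which may be handy when the available shell lacks the ${i/%...} substitution. This is only a sketch: the function name link_short_names and its default suffixes are hypothetical, taken from the file names above, and links that already exist are left untouched.

```python
import os


def link_short_names(lib_dir,
                     long_suffix="-gcc33-mt-1_35.so.1.35.0",
                     short_suffix="-mt.so"):
    """Create libboost_*-mt.so symlinks next to the fully versioned libraries.

    Returns the names of the links that were created.
    """
    created = []
    for name in sorted(os.listdir(lib_dir)):
        if not (name.startswith("libboost_") and name.endswith(long_suffix)):
            continue
        short_name = name[:-len(long_suffix)] + short_suffix
        link_path = os.path.join(lib_dir, short_name)
        if os.path.lexists(link_path):
            continue  # keep whatever file or link is already there
        os.symlink(name, link_path)  # relative target inside the same directory
        created.append(short_name)
    return created
```

Calling link_short_names("/mnt/C/sys/lib") should then give the same libboost_*-mt.so links as the shell loop.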
Now we have another link error:
dev# python setup.py build
...
can't resolve symbol 'get_nprocs'
It seems that uClibc does not implement the get_nprocs function. The following text from this forum describes the workaround:
I grep'ed the boost source tree for get_nprocs and found the only instance of it in thread.cpp. I found out that this function returns the number of processors and is really not needed in my case. So instead of having "return get_nprocs()", I changed it to "return 1". While I think this should not break anything, I thought I should still tell you.
So in the file boost_1_35_0/libs/thread/src/pthread/thread.cpp I have replaced
    return get_nprocs();
with:
    return 1;
In short:
dev# cd /usr/local/src/boost_1_35_0
dev# sed -i 's/return get_nprocs();/return 1;/g' libs/thread/src/pthread/thread.cpp
... and rebuilt the boost library.
dev# make
...
...updating 3 targets...
...
...updated 3 targets...
dev# make install
...
...updating 6 targets...
common.copy /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0
ln-UNIX /mnt/C/sys/lib/libboost_thread-gcc33-mt-1_35.so
common.copy /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so.1.35.0
ln-UNIX /mnt/C/sys/lib/libboost_wave-gcc33-mt-1_35.so
common.hard-link /mnt/C/sys/lib/libboost_thread-gcc33-mt.so
common.hard-link /mnt/C/sys/lib/libboost_wave-gcc33-mt.so
...updated 6 targets...
... now the build finished successfully, we are ready to install:
dev# python setup.py install --prefix=/mnt/C/sys/X11
In order to deploy correctly on the box, one additional hack was needed (touch every file with deluge in its path so that it is included in the filopack archive):
dev# find /mnt/C/sys -path "*deluge*" -type f -exec touch {} \;
We have the first step done. We do not know yet what is ahead of us ;-)

Dynamic linkage:

Now we can try to run the application:
dev# deluge
...
/usr/bin/python: can't resolve symbol '_ZN5boost6system19get_system_categoryEv'
It seems there is a symbol expected by the deluge app which cannot be resolved at load time.
With the nm command we can find out whether the symbol is defined in any boost library:
dev# nm -A -D /mnt/C/sys/lib/libboost_* | grep "T _ZN5boost6system19get_system_categoryEv"
/mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-gcc33-mt-1_35.so.1.35.0:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-gcc33-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEv
/mnt/C/sys/lib/libboost_system-mt.so:00001d28 T _ZN5boost6system19get_system_categoryEv
Next we make sure that the symbol is really required by the deluge_core library:
dev# nm /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so | grep _ZN5boost6system19get_system_categoryEv
         U _ZN5boost6system19get_system_categoryEv
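When there are many libraries to check, the nm output can be filtered with a few lines of Python instead of by eye. The sketch below assumes the `nm -A -D` line layout shown above (file, address, type letter, name); the function name defining_libraries is hypothetical.

```python
def defining_libraries(nm_output, symbol):
    """Return the files in `nm -A` output that define `symbol` in the text section."""
    found = []
    for line in nm_output.splitlines():
        # A defined symbol looks like "<file>:<address> T <name>"; an undefined
        # one has no address: "<file>:         U <name>".
        path, sep, rest = line.rpartition(":")
        if not sep:
            continue
        fields = rest.split()
        if len(fields) == 3 and fields[1] == "T" and fields[2] == symbol:
            found.append(path)
    return found
```

Feeding it the combined output of the two nm commands above would list the four libboost_system files and skip deluge_core.so, where the symbol is only marked U (undefined).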
We can test whether all the libraries the deluge_core depends on are properly found by the dynamic linker:
dev# ldd /mnt/C/sys/lib/python2.5/site-packages/deluge/deluge_core.so
        libboost_filesystem-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        libboost_date_time-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_date_time-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        libboost_thread-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_thread-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        ...
        libboost_system-gcc33-mt-1_35.so.1.35.0 => /usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 (0x00000000)
        ...
I am not sure what exactly is going on here; I would be glad if anybody could shed some light on it. As a workaround, LD_PRELOAD works for me:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
Traceback (most recent call last):
  File "/usr/bin/deluge", line 46, in <module>
    import deluge._dbus as dbus
  File "/mnt/C/sys/lib/python2.5/site-packages/deluge/_dbus.py", line 32, in <module>
    from dbus import Interface, SessionBus, version
ImportError: No module named dbus
We have to install dbus before we proceed (see this post)...
Now we can try it again:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
process 16385: D-Bus library appears to be incorrectly set up; failed to read machine uuid: Failed to open "/mnt/C/sys/X11//var/lib/dbus/machine-id": No such file or directory
See the manual page for dbus-uuidgen to correct this issue.
  D-Bus not compiled with backtrace support so unable to print a backtrace
Aborted
So far so good; the following command seems to solve it:
dev# sys/X11/bin/dbus-uuidgen --ensure
... but then there is another problem:

Crashes:

dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session

(deluge:16391): GLib-GIO-CRITICAL **: g_unix_mount_is_system_internal: assertion `mount_entry != NULL' failed
Segmentation fault
The g_unix_mount_is_system_internal function (in the glib-2.17.2/gio/gunixmounts.c file) asserted because the given mount_entry was NULL.
gboolean
g_unix_mount_is_system_internal (GUnixMountEntry *mount_entry)
{
  g_return_val_if_fail (mount_entry != NULL, FALSE);

  return mount_entry->is_system_internal;
}
Looking at the places where the function is called, we find the g_unix_mount_guess_should_display function in the same file:
gboolean
g_unix_mount_guess_should_display (GUnixMountEntry *mount_entry)
{
  const char *mount_path;

  /* Never display internal mountpoints */
  if (g_unix_mount_is_system_internal (mount_entry))
    return FALSE;

  /* Only display things in /media (which are generally user mountable)
     and home dir (fuse stuff) */
  mount_path = mount_entry->mount_path;
It is apparent that this function calls g_unix_mount_is_system_internal, which reports the failed assertion but merely returns FALSE. g_unix_mount_guess_should_display therefore carries on, dereferences the NULL mount_entry (mount_entry->mount_path) and crashes immediately.
Now we can look at the caller's code, the _g_unix_mount_new function in glib-2.17.2/gio/gunixmount.c:
GUnixMount *
_g_unix_mount_new (GVolumeMonitor  *volume_monitor,
                   GUnixMountEntry *mount_entry,
                   GUnixVolume     *volume)
{
  GUnixMount *mount;

  /* No volume for mount: Ignore internal things */
  if (volume == NULL && !g_unix_mount_guess_should_display (mount_entry) )
    return NULL;
  ...
I do not know the exact contract of those functions; from the code it seems that all three share the precondition that mount_entry must not be NULL.
I decided not to go up the call stack and look for the actual cause of the NULL mount_entry; it may very well be caused by the fact that I am running in a chroot-ed environment. As a quick fix I modified the _g_unix_mount_new function as follows:
dev# nano /usr/local/src/glib-2.17.2/gio/gunixmount.c
  if (volume == NULL && ( !mount_entry || !g_unix_mount_guess_should_display (mount_entry) ) )
    return NULL;
dev# make 
dev# make install
Ok, let's try it again:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 /mnt/C/sys/X11/bin/deluge
no existing Deluge session
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in <module>
    deluge.wizard.WizardGTK()
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__
    self.window.set_icon(deluge.common.get_logo(18))
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo
    size, size)
gobject.GError: Unrecognized image file format
When we look into common.py we see there are two icon variants: svg and png. When we change .svg to .png everything works. You can either edit the file in a text editor:
dev# nano /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
        return gtk.gdk.pixbuf_new_from_file_at_size(get_pixmap("deluge.png"), \
            size, size)
... or simply call the following command:
dev# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
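Instead of hard-coding the png variant, the lookup could try the svg icon first and fall back to the png one only when loading fails. The following is a minimal sketch of that idea and not the actual deluge code: load_first_supported is a hypothetical helper, the loader argument stands in for a wrapper around gtk.gdk.pixbuf_new_from_file_at_size, and the syntax is that of a current Python rather than 2.5.

```python
def load_first_supported(paths, loader):
    """Return loader(path) for the first path the loader accepts.

    If every candidate fails, the last error is raised again.
    """
    last_error = None
    for path in paths:
        try:
            return loader(path)
        except Exception as error:  # the real code would catch gobject.GError
            last_error = error
    if last_error is None:
        raise ValueError("no candidate paths given")
    raise last_error
```

In common.py this would roughly correspond to passing get_pixmap("deluge.svg") and get_pixmap("deluge.png") as the candidates, so the same file keeps working on systems with svg support.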
Now we have made some progress: when we launch deluge, we have to set up the preferences.

But when we proceed, there is another error:
...
no existing Deluge session
Starting new Deluge session...
deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG.
Applying preferences
Scanning plugin dir /mnt/C/sys/X11/share/deluge/plugins
Initialising plugin TorrentPeers
...
Initialising plugin SpeedLimiter
Applying preferences
Starting DHT...
Aborted
When we use the strace tool to monitor the system calls we get the following:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge
...
close(11)                               = 0
write(1, "Applying preferences\n", 21Applying preferences
)  = 21
write(1, "Starting DHT...\n", 16Starting DHT...
)       = 16
open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(31503, SIGABRT)                    = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
Process 31503 detached
It seems the abort could be related to the missing dht.state file. Let's try to debug it:
dev# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python
GNU gdb 6.3
...
(gdb) run /mnt/C/sys/X11/bin/deluge
Starting program: /mnt/C/sys/bin/python /mnt/C/sys/X11/bin/deluge
[Thread debugging using libthread_db enabled]
[New Thread 1024 (LWP 7763)]
Could not open /proc/7763/status
It seems it is not so easy to run gdb in a chroot-ed system; I tried messing around with manually created "dummy" /proc files but had no success.
Another possibility is to debug directly on the box. Before that we have to upload the build to it.

Deployment to the box:


After the upload routine some more problems emerged:
box# deluge
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 43, in <module>
    import deluge
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/__init__.py", line 34, in <module>
ImportError: No module named pygtk
Ok, I forgot to deploy pygtk. Done.
box# deluge
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 45, in <module>
    import deluge.core
  File "/usr/lib/python2.5/site-packages/deluge/core.py", line 59, in <module>
    import deluge_core
ImportError: File not found
This one is more interesting, we have to find out what is going on:
box# strace python
...
import deluge.deluge_core
...
open("./libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/mnt/C/sys/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/mnt/C/sys/X11/lib//libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/libboost_filesystem-gcc33-mt-1_35.so.1.35.0", O_RDONLY) = -1 ENOENT (No such file or directory)
close(3)                                = 0
write(2, "Traceback (most recent call last"..., 35Traceback (most recent call last):
) = 35
...
Ok, it seems we forgot to deploy the boost libraries. Done.
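A missing library like this can be caught before starting the application by checking the directories seen in the trace above. This is a minimal sketch which assumes the loader simply tries each directory in order (the real uClibc loader follows more rules than that); first_existing is a hypothetical helper.

```python
import os


def first_existing(library, search_dirs):
    """Return the first path where `library` exists, or None if every lookup fails."""
    for directory in search_dirs:
        candidate = os.path.join(directory, library)
        if os.path.exists(candidate):
            return candidate
    return None
```

Running it on the box for each boost library name with the directories from the trace (/mnt/C/sys/lib, /mnt/C/sys/X11/lib, /lib, /usr/lib) shows which libraries still have to be deployed.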
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session

(deluge:19324): libglade-WARNING **: could not find glade file '/usr/share/deluge/glade/wizard.glade'
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in <module>
    deluge.wizard.WizardGTK()
  File "/usr/lib/python2.5/site-packages/deluge/wizard.py", line 50, in __init__
    , domain='deluge')
RuntimeError: could not create GladeXML object
We have to create a symlink in /usr/share. Since it is just a memory filesystem, the symlink has to be re-created every time the box boots up in order to make it permanent.
For now we just call the following command; later on we can modify a config file (e.g. /mnt/C/sys/etc/rc-local).
box# ln -s /mnt/C/sys/share/deluge /usr/share/deluge
When we try to launch it again, we get the already known "Unrecognized image file format" message.
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 deluge
no existing Deluge session
Traceback (most recent call last):
  File "/mnt/C/sys/X11/bin/deluge", line 130, in <module>
    deluge.wizard.WizardGTK()
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/wizard.py", line 58, in __init__
    self.window.set_icon(deluge.common.get_logo(18))
  File "/mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py", line 154, in get_logo
    size, size)
gobject.GError: Unrecognized image file format
The stream editor helps us to fix it:
box# sed -i 's/deluge.svg/deluge.png/g' /mnt/C/sys/X11/lib/python2.5/site-packages/deluge/common.py
Now we run it under strace again, and it seems that we have got as far on the box# system as we were on the dev# system:
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 strace deluge
...
close(11)                               = 0
write(1, "Applying preferences\n", 21Applying preferences
)  = 21
write(1, "Starting DHT...\n", 16Starting DHT...
)       = 16
open("/root/.config/deluge/dht.state", O_RDONLY) = -1 ENOENT (No such file or directory)
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
kill(31503, SIGABRT)                    = 0
--- SIGABRT (Aborted) @ 0 (0) ---
+++ killed by SIGABRT +++
Process 31503 detached
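To pick the failing calls out of a long trace, the strace output can be filtered with a short script. The sketch below assumes the single-line `name(args) = -1 ERRNO (text)` layout seen above; calls whose lines are interleaved with program output (like the write lines) are simply skipped. The names FAILED_CALL and failed_calls are hypothetical.

```python
import re

# Matches lines such as:
#   open("/some/file", O_RDONLY) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(r'^(\w+)\((.*)\)\s+= -1 (E[A-Z0-9]+) \((.*)\)$')


def failed_calls(strace_text):
    """Return (syscall, errno name, arguments) for every call that returned -1."""
    failures = []
    for line in strace_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append((match.group(1), match.group(3), match.group(2)))
    return failures
```

Applied to the trace above it reports only the open of dht.state failing with ENOENT, the last failed call before the SIGABRT.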

Local debugging:

Now we are ready to try the gdb debugger:
box# LD_PRELOAD=/usr/lib/libboost_system-gcc33-mt-1_35.so.1.35.0 gdb python
...
(gdb) run /mnt/C/sys/X11/bin/deluge
Starting program: /bin/python /mnt/C/sys/X11/bin/deluge
...
no existing Deluge session
Starting new Deluge session...
deluge_core; using libtorrent 0.13.0.0. Compiled with NDEBUG.

Program received signal SIG32, Real-time event 32.
0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
(gdb) Quit
(gdb) where
#0  0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
#1  0x4019cb9b in sigsuspend () from /lib/libc.so.0
#2  0x40139cf9 in __pthread_wait_for_restart_signal () from /lib/libpthread.so.0
#3  0x401396bd in pthread_create () from /lib/libpthread.so.0
#4  0x416c9f26 in boost::thread::start_thread () from /mnt/C/sys/lib//libboost_thread-gcc33-mt-1_35.so.1.35.0
#5  0x415d68c8 in disk_io_thread (this=0x821acc8, block_size=-4) at thread.hpp:151
...
The SIG32 event is not really what we are looking for (some info about it can be found in this discussion); we are waiting for the SIGABRT signal, so we can continue:
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x4019cb64 in __rt_sigsuspend () from /lib/libc.so.0
(gdb) cont
Continuing.
...
Applying preferences
Scanning plugin dir /usr/share/deluge/plugins
...
Starting DHT...

Program received signal SIGABRT, Aborted.
0x4019b1c4 in kill () from /lib/libc.so.0
So let's look where the problem is:
(gdb) where
#0  0x4019b1c4 in kill () from /lib/libc.so.0
#1  0x4013af69 in pthread_kill () from /lib/libpthread.so.0
#2  0x4013b37f in raise () from /lib/libpthread.so.0
#3  0x40194cc7 in abort () from /lib/libc.so.0
#4  0x4027140f in __cxa_call_unexpected () from /mnt/C/sys/lib//libstdc++.so.5
#5  0x40271439 in std::terminate () from /mnt/C/sys/lib//libstdc++.so.5
#6  0x40271560 in __cxa_throw () from /mnt/C/sys/lib//libstdc++.so.5
#7  0x41638218 in libtorrent::bdecode<std::istream_iterator<char, char, 
std::char_traits<char>, int> > (end= {<std::iterator<std::input_iterator_tag, char, int, char const*, char 
const&>> = {<No data fields>}, _M_stream = 0x8d188, _M_value = -124 
'\204', _M_ok = 88}) at bencode.hpp:332
#8  0x41632d47 in torrent_start_DHT (self=0x0, args=0x41a6d44c) at basic_ios.h:110
#9  0x4005d122 in PyCFunction_Call (func=0x412712ac, arg=0x41a6d44c, kw=0x6) at Objects/methodobject.c:108
#10 0x400a2661 in call_function (pp_stack=0xbfffec5c, oparg=0) at Python/ceval.c:3573
...
So we can see that an exception thrown in libtorrent::bdecode causes the immediate termination of the process. As far as I know, this can be caused either by the exception not being caught or by a double throw. Here and here are examples of others experiencing the same.

This is enough for today; we shall look at it in the next part. Since local debugging is reaaaallyyyyy sloooow and I have even run out of swap space several times, in the next part we will also look at remote debugging.


2008/07/09

WMU-6500FS - Mesa 7.0.3

The following image demonstrates a Mesa 3D application running on the box. As can be seen from the FPS, there is no HW acceleration :-)



Build results (library): [binary] [files list]
Build results (demos): [binary] [files list]
Info sources: [Home page] [MesaLib dependencies]
Related info: [Xming notes] [Remote desktop for linux] [Compiling Mesa fbdev/DRI drivers without X installed]

For now we try to compile the stand-alone/Xlib mode (no Hardware acceleration necessary ;-)

Dependencies: [file 4.24] [Xorg libraries]

First we have to download and extract the libraries and related stuff:
dev# cd /usr/local/src
dev# wget "http://downloads.sourceforge.net/mesa3d/MesaLib-7.0.3.tar.bz2?modtime=1207336858&big_mirror=0"
dev# tar xjvf MesaLib-7.0.3.tar.bz2
dev# wget "http://downloads.sourceforge.net/mesa3d/MesaDemos-7.0.3.tar.bz2?modtime=1207336927&big_mirror=0"
dev# tar xjvf MesaDemos-7.0.3.tar.bz2
dev# wget "http://downloads.sourceforge.net/mesa3d/MesaGLUT-7.0.3.tar.bz2?modtime=1207336986&big_mirror=0"
dev# tar xjvf MesaGLUT-7.0.3.tar.bz2 
dev# cd Mesa-7.0.3
Now it is necessary to modify the configuration files. We have to replace the following two paths in configs/linux:
dev# nano configs/linux

#X11_INCLUDES = -I/usr/X11R6/include
X11_INCLUDES = -I/mnt/C/sys/X11/include

#EXTRA_LIB_PATH = -L/usr/X11R6/lib
EXTRA_LIB_PATH = -L/mnt/C/sys/X11/lib
We also have to alter the installation paths in configs/default:
dev# nano configs/default

#INSTALL_DIR = /usr/local
INSTALL_DIR = /mnt/C/sys/X11

#DRI_DRIVER_INSTALL_DIR = /usr/X11R6/lib/modules/dri
DRI_DRIVER_INSTALL_DIR = /mnt/C/sys/X11/lib/modules/dri
Now we are ready to build:
dev# make linux-x86
...
../../lib/libGL.so: undefined reference to `posix_memalign'
collect2: ld returned 1 exit status
According to this forum topic, posix_memalign is currently not implemented in uClibc, so we have to disable it.
So let's go back to the configuration file:
dev# nano configs/linux

DEFINES = -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE \
        -D_BSD_SOURCE -D_GNU_SOURCE \
        -DPTHREADS -DUSE_XSHM
# -DHAVE_POSIX_MEMALIGN
Now we try the build...
dev# make linux-x86
... and it completed successfully.

We are ready to start testing some demo apps; we try the notorious gears application:
dev# /mnt/C/sys/bin/mesa-demos/gears
If you get an error message similar to this:
Error: Can't open display:
... you probably did not configure the DISPLAY environment variable properly. For a hint on how to do that see the X client/server configuration.

Now that we have the X client/server properly configured and the gears test has passed with no problems, we are ready to deploy on the box.
Curiously, on the box we observe the following crash:
box# /mnt/C/sys/bin/mesa-demos/gears
Illegal instruction
To get to the heart of the matter we have to dig a little deeper; however, we need a tool for it. Let's try JoKeR's build of gdb. For now we are going to debug directly on the box; later on we will look at remote debugging.
box# gdb /mnt/C/sys/bin/mesa-demos/gears
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-linux-uclibc"...Using host libthread_db library "/mnt/C/sys/lib/libthread_db.so.1".

(gdb) run
Starting program: /tmp/mnt/C/sys/bin/mesa-demos/gears
warning: Cannot initialize thread debugging library: generic error


Program received signal SIGILL, Illegal instruction.
_mesa_x86_cpuid () at x86/common_x86_asm.S:80
80      x86/common_x86_asm.S: No such file or directory.
        in x86/common_x86_asm.S
Current language:  auto; currently asm
(gdb) backtrace
#0  _mesa_x86_cpuid () at x86/common_x86_asm.S:80
#1  0x403053f0 in ?? () from /mnt/C/sys/lib/libGL.so.1
#2  0x4031f980 in quad_tab () from /mnt/C/sys/lib/libGL.so.1
#3  0x4021c24a in _mesa_init_all_x86_transform_asm () at x86/common_x86.c:180
#4  0x4016a17d in _math_init_transformation () at math/m_xform.c:214
#5  0x4016a19a in _math_init () at math/m_xform.c:227
#6  0x400e8674 in one_time_init (ctx=0x8056b38) at main/context.c:368
#7  0x400e75e1 in _mesa_initialize_context (ctx=0x8056b38, visual=0x8054688,
    share_list=0x0, driverFunctions=0xbffff6c0, driverContext=0x0) at main/context.c:1067
#8  0x4026f92b in XMesaCreateContext (v=0x8054688, share_list=0x0)
    at drivers/x11/xm_api.c:1504
#9  0x4026c234 in Fake_glXCreateContext (dpy=0x804cfa0, visinfo=0x8056900,
    share_list=0x0, direct=1) at drivers/x11/fakeglx.c:1401
#10 0x4026a46d in glXCreateContext (dpy=0x804cfa0, visinfo=0x8056900, shareList=0x0,
    direct=1) at drivers/x11/glxapi.c:188
#11 0x40020d22 in __glutCreateWindow (parent=0x0, x=0, y=0, width=300,
    height=1073998032, gameMode=0) at glut_win.c:609
#12 0x40020df3 in glutCreateWindow (title=0x804ae60 "Gears") at glut_win.c:731
#13 0x0804902e in main (argc=1, argv=0xbffffbc4) at gears.c:370
(gdb)
Ok, it seems promising, now we can look at the file in question:
dev# less /usr/local/src/Mesa-7.0.3/src/mesa/x86/common_x86_asm.S
... or we can look for example here on the web. At line 80 we see the CPUID instruction:
...
76 :  MOV_L (REGOFF(4, ESP), EAX) /* cpuid op */
77 :  PUSH_L (EDI)
78 :  PUSH_L (EBX)
79 :  
80 :  CPUID
81 :  
82 :  MOV_L (REGOFF(16, ESP), EDI) /* *eax */
83 :  MOV_L (EAX, REGIND(EDI))
...
In the CPUID instruction description we can see what exactly this instruction means and when it was introduced.
We can find out the exact info about the CPU in the box as follows:
box# cat /proc/cpuinfo
processor       : 0
vendor_id       : CyrixInstead
cpu family      : 4
model           : 1
model name      : Cx486SLC
stepping        : unknown
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : no
fpu_exception   : no
cpuid level     : -1
wp              : yes
flags           :
bogomips        : 44.33
In the document Application Note 112, Cyrix CPU Detection Guide, I have found the following info:
The Cx486SLC, Cx486DLC, Cx486SRx2, Cx486DRx2, Cx486S, Cx486DX, Cx486DX2, and 5x86 processors do not recognize the CPUID instruction and must be identified using the DIR0 and DIR1 register.
So the cause of the problem is found. Certain instructions are not supported on the box, so we cannot be sure that code built on the PC will run there.
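Such a check can be automated before a build is deployed. The following is a minimal sketch which assumes a single-processor /proc/cpuinfo laid out as above and assumes that a negative "cpuid level" means the kernel found no CPUID support; parse_cpuinfo and has_cpuid are hypothetical helper names.

```python
def parse_cpuinfo(text):
    """Turn "key : value" lines of /proc/cpuinfo into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def has_cpuid(info):
    """Return True when the reported cpuid level is zero or higher."""
    try:
        return int(info.get("cpuid level", "-1")) >= 0
    except ValueError:
        return False
```

On the box, has_cpuid(parse_cpuinfo(open("/proc/cpuinfo").read())) is False for the output shown above, which hints that code relying on CPUID should not be deployed there.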
In the configuration file configs/linux-x86 we can see the following ASM-related switches:
ASM_FLAGS = -DUSE_X86_ASM -DUSE_MMX_ASM -DUSE_3DNOW_ASM -DUSE_SSE_ASM
In order not to use the ASM implementations we can use the linux configuration instead of linux-x86:
dev# make realclean
dev# make linux
Everything seems to be working now :-)
