Hi all,
My MPI + C++ program, which runs on an IBM BladeCenter JS21 machine (Linux Advanced Server 3), has always worked fine. Recently, however, it has started to behave strangely: if I reboot the BladeCenter cluster beforehand, the program hangs for an unusually long time at the beginning of execution. Subsequent executions are normal again. That is, the problem appears only on the first execution after rebooting the cluster. In the abnormal case the hang lasts about 2 minutes; in the normal case the same step takes less than 5 seconds.
The gdb backtrace is listed below. In the abnormal case the program is stuck at the following location. Whenever I repeated the gdb command 'where' during those 2 minutes, I always got the same frames, #0 through #7. This problem has had me stuck for a long time. Could anyone help me out? Thanks in advance.
0x0fc3ca58 in __read_nocancel () from /lib/tls/libpthread.so.0
(gdb) where
#0  0x0fc3ca58 in __read_nocancel () from /lib/tls/libpthread.so.0
#1  0x0f05176c in std::__basic_file<char>::xsgetn () from /usr/lib/libstdc++.so.6
#2  0x0eff7048 in std::basic_filebuf<char, std::char_traits<char> >::underflow () from /usr/lib/libstdc++.so.6
#3  0x0f02a990 in std::basic_streambuf<char, std::char_traits<char> >::uflow () from /usr/lib/libstdc++.so.6
#4  0x0f02a900 in std::basic_streambuf<char, std::char_traits<char> >::xsgetn () from /usr/lib/libstdc++.so.6
#5  0x0eff691c in std::basic_filebuf<char, std::char_traits<char> >::xsgetn () from /usr/lib/libstdc++.so.6
#6  0x0efffc6c in std::istream::read () from /usr/lib/libstdc++.so.6
#7  0x10044ef4 in main (argc=1, argv=0xffffe5d4) at Discover_Test_5.cpp:1063
(gdb) n
Single stepping until exit from function __read_nocancel,
which has no line number information.
[Switching to Thread 4160532512 (LWP 2856)]
0x0f05176c in std::__basic_file<char>::xsgetn () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSt12__basic_fileIcE6xsgetnEPci,
which has no line number information.
0x0eff7048 in std::basic_filebuf<char, std::char_traits<char> >::underflow () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSt13basic_filebufIcSt11char_traitsIcEE9underflowEv,
which has no line number information.
0x0f02a990 in std::basic_streambuf<char, std::char_traits<char> >::uflow () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSt15basic_streambufIcSt11char_traitsIcEE5uflowEv,
which has no line number information.
0x0f02a900 in std::basic_streambuf<char, std::char_traits<char> >::xsgetn () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSt15basic_streambufIcSt11char_traitsIcEE6xsgetnEPci,
which has no line number information.
0x0eff691c in std::basic_filebuf<char, std::char_traits<char> >::xsgetn () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSt13basic_filebufIcSt11char_traitsIcEE6xsgetnEPci,
which has no line number information.
0x0efffc6c in std::istream::read () from /usr/lib/libstdc++.so.6
(gdb) n
Single stepping until exit from function _ZNSi4readEPci,
which has no line number information.
main (argc=1, argv=0xffffe5d4) at Discover_Test_5.cpp:1066
1066        test_Sum += tempItemsetTwoforWrite->Count;
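
Since the backtrace shows the time being spent inside std::istream::read, it may help to time that call on its own, outside the MPI program. Below is a minimal sketch of such a timing wrapper. It is not the code from Discover_Test_5.cpp: the names ReadTiming and timed_read, and the path and byte count passed to it, are made up for illustration, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Result of one timed read: bytes actually read and elapsed wall-clock time.
// bytes_read is -1 if the file could not be opened.
struct ReadTiming {
    std::streamsize bytes_read;
    double seconds;
};

// Hypothetical helper: reads up to `bytes` bytes from `path` with
// istream::read and measures how long that single call takes.
ReadTiming timed_read(const std::string& path, std::size_t bytes) {
    assert(bytes > 0);
    std::vector<char> buffer(bytes);

    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) {
        return ReadTiming{-1, 0.0};
    }

    auto start = std::chrono::steady_clock::now();
    in.read(buffer.data(), static_cast<std::streamsize>(bytes));
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    // gcount() reports how many bytes the last unformatted read extracted.
    return ReadTiming{in.gcount(), elapsed.count()};
}
```

Calling timed_read on the same input file once right after a reboot and once more afterwards, and comparing the two `seconds` values, would show whether the delay is confined to the first read of the file or comes from somewhere else in the program.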