System Freeze

A cluster may freeze during:

  • Node startup or initialization: Refer to Node Startup Exception - Process Stuck for solutions.
  • System runtime:
    • Check the machines' resource metrics for anomalies to determine whether the cluster load is too high, which can slow the system's response.
    • Collect stack traces and provide them to DolphinDB technical support for further troubleshooting. So that technical support can accurately analyze how the stacks change over time, collect stack traces at least twice, 3 to 5 minutes apart.
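
For the load check in the first runtime bullet, a quick snapshot of each machine can be a useful starting point. The following is a minimal sketch, assuming a Linux host; load_snapshot is a hypothetical helper name, not a DolphinDB tool, and it only reads standard /proc files.

```shell
#!/bin/bash
# Sketch only: load_snapshot is a hypothetical helper, not a DolphinDB tool.
# It reads standard Linux /proc files, so it assumes a Linux host.
load_snapshot() {
    # Number of CPU cores available to this shell.
    echo "cores: $(nproc)"
    # 1-, 5-, and 15-minute load averages.
    echo "loadavg: $(cut -d ' ' -f 1-3 /proc/loadavg)"
    # Total memory, available memory, and free swap.
    # Do not fail the snapshot if a field is missing on an older kernel.
    grep -E '^(MemTotal|MemAvailable|SwapFree):' /proc/meminfo || true
}

load_snapshot
```

A load average that stays well above the core count, or very little available memory, may indicate that the machine itself is overloaded rather than that the DolphinDB process is stuck.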

Stack traces can be collected with either of two tools: pstack or gdb.

Method 1: pstack

Install pstack, then run the following shell script on each machine in the cluster to collect stack traces:
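#!/bin/bash

# Create the output directory; -p avoids an error if it already exists.
mkdir -p /root/output/

# PIDs of the data node and controller processes running on this machine.
dpid=$(ps -ef | grep "mode datanode" | grep -v grep | awk '{print $2}')
cpid=$(ps -ef | grep "mode controller" | grep -v grep | awk '{print $2}')

# Dump the stacks of every data node process.
for i in $dpid
do
    # DolphinDB server directory; adjust the path to match your installation.
    cd /ddb/software/server
    pstack "$i" > /root/output/pstack_dnode_${i}.log
done

# Dump the stacks of every controller process.
for i in $cpid
do
    cd /ddb/software/server
    pstack "$i" > /root/output/pstack_ctrl_${i}.log
done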

Then, send the generated stack traces in the /root/output directory to DolphinDB technical support for further troubleshooting.
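
Bundling the logs into a single compressed file can make them easier to send. The following is a minimal sketch, assuming tar and gzip are installed; archive_traces is a hypothetical helper name, and the default archive name is only an illustration.

```shell
#!/bin/bash
# Sketch only: archive_traces is a hypothetical helper name.
# Usage: archive_traces <log_dir> [archive_path]
archive_traces() {
    local log_dir=$1
    # Default archive name includes the host name and the current time.
    local archive=${2:-stack_traces_$(uname -n)_$(date +%Y%m%d_%H%M%S).tar.gz}
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$log_dir")" "$(basename "$log_dir")" || return 1
    # Print the archive path so the caller knows which file to send.
    echo "$archive"
}

# Example: archive_traces /root/output
```

Including the host name in the archive name helps keep files from different machines in the cluster apart.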

Method 2: gdb

Use gdb to collect stack traces by running the following shell script on each machine in the cluster:
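#!/bin/bash

# Create the output directory; -p avoids an error if it already exists.
mkdir -p /root/output/

# PIDs of the data node and controller processes running on this machine.
dpid=$(ps -ef | grep "mode datanode" | grep -v grep | awk '{print $2}')
cpid=$(ps -ef | grep "mode controller" | grep -v grep | awk '{print $2}')

# Attach gdb to each data node process in batch mode and log a backtrace of every thread.
for i in $dpid
do
    # DolphinDB server directory; adjust the path to match your installation.
    cd /home/dolphindb/server
    gdb --eval-command "set logging file /root/output/pstack_dnode_$i.log" --eval-command "set logging on" --eval-command "thread apply all bt" --batch --pid "$i"
done

# Do the same for each controller process.
for i in $cpid
do
    cd /home/dolphindb/server
    gdb --eval-command "set logging file /root/output/pstack_ctl_$i.log" --eval-command "set logging on" --eval-command "thread apply all bt" --batch --pid "$i"
done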
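
Each script above captures a single point in time and reuses the same file names, so running it again overwrites or mixes the earlier output. Because at least two collections 3 to 5 minutes apart are recommended, one option is to write every round to its own timestamped file. The following is a minimal sketch; collect_rounds, its arguments, and the file naming are hypothetical, and the dump command is assumed to take a PID as its only argument, as pstack does.

```shell
#!/bin/bash
# Sketch only: collect_rounds is a hypothetical helper name.
# Runs a stack-dump command against one PID several times and writes each
# round to its own timestamped file, so later rounds never overwrite earlier ones.
# Usage: collect_rounds <dump_cmd> <pid> <out_dir> [rounds] [interval_seconds]
collect_rounds() {
    local dump_cmd=$1 pid=$2 out_dir=$3 rounds=${4:-2} interval=${5:-180}
    local n ts
    mkdir -p "$out_dir" || return 1
    for ((n = 1; n <= rounds; n++)); do
        ts=$(date +%Y%m%d_%H%M%S)
        # Capture both stdout and stderr of the dump command for this round.
        "$dump_cmd" "$pid" > "$out_dir/stack_${pid}_round${n}_${ts}.log" 2>&1
        # Wait between rounds, but not after the last one.
        if ((n < rounds)); then
            sleep "$interval"
        fi
    done
}

# Example (assumes pstack is installed and 12345 is a data node PID):
#   collect_rounds pstack 12345 /root/output 2 180
```

To use gdb instead, the dump command could be a small wrapper script that accepts a PID and runs gdb in batch mode with "thread apply all bt".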