Building a Self-Healing PHP Crawler System with Bash Watchdog

Script created with the assistance of DeepSeek AI

When running multiple PHP crawler instances, you need reliability. If a script detects an infinite loop and exits, you want it automatically restarted. Here’s how to create a robust watchdog system using bash scripting.

The complete, ready-to-use script appears at the bottom of this post; the sections below walk through it piece by piece.

Let’s build watchdog.sh for monitoring 4 PHP crawler instances:

#!/bin/bash

# Configuration section
SCREEN_SESSIONS=("crawl_sess_01" "crawl_sess_02" "crawl_sess_03" "crawl_sess_04")
CRAWL_SCRIPT="/path/to/your/crawl.php"
LOG_FILE="/path/to/watchdog.log"

Line-by-line explanation:

  • #!/bin/bash – The shebang tells the system to run this script with bash
  • SCREEN_SESSIONS array contains names for our 4 screen sessions
  • CRAWL_SCRIPT points to your PHP crawler file
  • LOG_FILE defines where monitoring logs are saved

Logging Function

log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

This function prefixes each message with a timestamp and appends it to the log file. $1 is the first argument passed to the function.
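
For example, calling it with a message string produces a timestamped line in the log (the timestamp shown is illustrative):

log_message "Watchdog heartbeat OK"

# Appends to $LOG_FILE:
# 2024-05-01 12:30:45 - Watchdog heartbeat OK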

Session Management Functions

screen_session_exists() {
    screen -list | grep -q "[0-9]\+\.$1[[:space:]]"
}

This checks if a screen session exists:

  • screen -list shows all active sessions
  • grep -q searches quietly, producing no output and returning only an exit status
  • [0-9]\+\.$1[[:space:]] matches a PID, a dot, and our session name, with trailing whitespace so that crawl_sess_01 can't accidentally match a longer name such as crawl_sess_011
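
To see why anchoring matters, here is typical screen -list output (the PIDs are illustrative and the socket path varies by system):

There are screens on:
        12345.crawl_sess_01     (Detached)
        12346.crawl_sess_02     (Detached)
2 Sockets in /run/screen/S-youruser.

For crawl_sess_01, the pattern matches "12345.crawl_sess_01" plus the whitespace that follows it.
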
start_session() {
    local session_name="$1"
    log_message "Starting session: $session_name"
    screen -S "$session_name" -d -m php "$CRAWL_SCRIPT"
    sleep 2
}

The start_session function:

  • local session_name="$1" captures the session name parameter
  • screen -S "$session_name" -d -m creates a detached screen session
  • php "$CRAWL_SCRIPT" runs your PHP script within the screen
  • sleep 2 prevents resource contention during mass startups
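
To verify a session actually started, you can attach to it and watch the crawler's output directly; press Ctrl-A then D to detach again without stopping it:

screen -r crawl_sess_01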

Initialization and Monitoring Loop

echo "=== Watchdog Started $(date) ===" > "$LOG_FILE"
log_message "Initializing crawler sessions..."

for session in "${SCREEN_SESSIONS[@]}"; do
    if screen_session_exists "$session"; then
        log_message "✓ $session is running"
    else
        log_message "✗ $session not found - starting..."
        start_session "$session"
    fi
done

The initialization:

  • > "$LOG_FILE" (a single > redirection) truncates the log file on each watchdog startup, so every run begins with a fresh log
  • "${SCREEN_SESSIONS[@]}" expands to all array elements
  • The loop checks each session, starting missing ones
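
After a fresh start, the log might look like this (timestamps are illustrative):

=== Watchdog Started Wed May  1 12:30:40 UTC 2024 ===
2024-05-01 12:30:40 - Initializing crawler sessions...
2024-05-01 12:30:40 - ✓ crawl_sess_01 is running
2024-05-01 12:30:40 - ✗ crawl_sess_02 not found - starting...
2024-05-01 12:30:40 - Starting session: crawl_sess_02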

The Heartbeat Monitor

log_message "Beginning monitoring loop..."
while true; do
    for session in "${SCREEN_SESSIONS[@]}"; do
        if ! screen_session_exists "$session"; then
            log_message "⚠️  $session died - restarting..."
            start_session "$session"
        fi
    done
    sleep 45
done

The infinite loop (while true) continuously:

  1. Checks all 4 sessions every 45 seconds
  2. Restarts any that have stopped
  3. Uses sleep 45 to prevent excessive CPU usage
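
You can test the recovery path by hand: kill one crawler session, then watch the log; within 45 seconds the watchdog should restart it.

screen -S crawl_sess_02 -X quit
tail -f /path/to/watchdog.log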

Here’s the full, ready-to-use watchdog script that monitors and restarts 4 PHP crawler instances:

#!/bin/bash

# ========================
# CONFIGURATION SECTION
# ========================

# Array of screen session names for your crawler instances
SCREEN_SESSIONS=("crawl_sess_01" "crawl_sess_02" "crawl_sess_03" "crawl_sess_04")

# Path to your PHP crawler script
CRAWL_SCRIPT="/path/to/your/crawl.php"

# Log file path for monitoring activities
LOG_FILE="/path/to/watchdog.log"

# ========================
# FUNCTION DEFINITIONS
# ========================

# Log messages with timestamps
log_message() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

# Check if a screen session exists
screen_session_exists() {
    screen -list | grep -q "[0-9]\+\.$1[[:space:]]"
}

# Start a screen session with PHP crawler
start_session() {
    local session_name="$1"
    log_message "Starting session: $session_name"
    screen -S "$session_name" -d -m php "$CRAWL_SCRIPT"
    sleep 2  # Brief delay to allow proper initialization
}

# ========================
# MAIN SCRIPT EXECUTION
# ========================

# Initialize log file
echo "=== Watchdog Started $(date) ===" > "$LOG_FILE"
log_message "Initializing crawler sessions..."

# Initial startup: Check and start any missing sessions
for session in "${SCREEN_SESSIONS[@]}"; do
    if screen_session_exists "$session"; then
        log_message "✓ $session is running"
    else
        log_message "✗ $session not found - starting..."
        start_session "$session"
    fi
done

log_message "Initial startup complete. Beginning monitoring loop..."

# ========================
# MONITORING LOOP
# ========================

# Infinite monitoring loop
while true; do
    for session in "${SCREEN_SESSIONS[@]}"; do
        if ! screen_session_exists "$session"; then
            log_message "⚠️  $session died - restarting..."
            start_session "$session"
        fi
    done
    
    # Wait before next check (adjust timing as needed)
    sleep 45
done

Deployment Commands

# Make script executable
chmod +x watchdog.sh

Test Run:

./watchdog.sh

Deploy in Background:

screen -S watchdog -d -m ./watchdog.sh
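
Stop the Watchdog (this kills only the watchdog's own screen session; the crawler sessions keep running):

screen -S watchdog -X quit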

Monitor Logs:

tail -f /path/to/watchdog.log

Script Customization Tips

  • Adjust Monitoring Frequency: Change sleep 45 to sleep 30 for more frequent checks
  • Add More Instances: Expand the SCREEN_SESSIONS array with additional session names
  • Include Email Alerts: Add a mail command in the restart branch for notifications (see the sketch after this list)
  • Add Resource Checks: Incorporate CPU/memory monitoring before restarting processes
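
As a sketch of the email-alert idea, the restart branch of the monitoring loop could be extended like this. It assumes a configured MTA and the mail utility (e.g., from mailutils); the recipient address is a placeholder:

if ! screen_session_exists "$session"; then
    log_message "⚠️  $session died - restarting..."
    start_session "$session"
    # Assumption: `mail` is installed and a local MTA can deliver the message
    echo "$session was restarted at $(date)" | mail -s "Watchdog restarted $session" admin@example.com
fi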

This complete script provides reliable, self-healing process monitoring with nothing beyond bash and GNU screen: no extra daemons, no external packages!

Congratulations! You’ve just built a professional-grade, self-healing monitoring system for your PHP applications!

By implementing this bash watchdog script, you’ve taken a major step toward server automation and reliability. No longer will you need to manually check whether your scripts are running—your system now watches itself.

This approach isn’t just limited to PHP crawlers. You can adapt this watchdog to monitor any long-running process: Python data pipelines, Node.js services, backup scripts, or even custom CLI tools. The logic remains the same; only the command changes.
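
For example, pointing the watchdog at a hypothetical Python pipeline only requires changing the command inside start_session (the script path below is a placeholder):

screen -S "$session_name" -d -m python3 /path/to/pipeline.py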

Key Achievements:
✅ Automated monitoring and recovery
✅ Detailed logging for auditing and debugging
✅ Lightweight: only bash and GNU screen required
✅ Easily scalable to monitor dozens of processes

Want to take this further? Consider adding email alerts on restart, integrating with systemd for even more robustness, or building a dashboard to visualize process health.
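
If you try the systemd route, a minimal unit file sketch could look like this (the unit name and paths are placeholders; Restart=always makes systemd revive the watchdog itself if it ever dies):

[Unit]
Description=PHP crawler watchdog
After=network.target

[Service]
ExecStart=/path/to/watchdog.sh
Restart=always

[Install]
WantedBy=multi-user.target

Install it as /etc/systemd/system/crawler-watchdog.service, then start it with systemctl enable --now crawler-watchdog.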

For more command-line tutorials and DevOps insights, visit IT-INDIA.com.
