Health API Reference

Monitor the health and status of the O1 backend service. The Health API reports overall service availability, the state of each dependent service, and basic resource metrics.

Endpoints Overview

Method	Endpoint	Description
GET	`/health`	Check service health

Health Check

Check Service Health

GET /health

Response:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "0.1.0",
  "uptime": "15d 6h 30m 45s",
  "services": {
    "database": "healthy",
    "websocket": "healthy",
    "monitoring": "healthy"
  }
}
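A client will usually parse this payload before acting on it. The Python sketch below shows one way to do so, assuming only the fields shown in the response above. `HealthReport`, `parse_health`, and `parse_uptime` are illustrative names, not part of the O1 API, and the uptime format is inferred from the single example value.

```python
import json
import re
from dataclasses import dataclass
from datetime import timedelta

# Matches uptime strings such as "15d 6h 30m 45s" (format inferred from the example).
_UPTIME_RE = re.compile(r"^(?:(\d+)d)?\s*(?:(\d+)h)?\s*(?:(\d+)m)?\s*(?:(\d+)s)?$")


def parse_uptime(text: str) -> timedelta:
    """Convert an uptime string like '15d 6h 30m 45s' into a timedelta."""
    match = _UPTIME_RE.match(text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised uptime format: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


@dataclass
class HealthReport:
    """Hypothetical container for the documented top-level fields."""
    status: str
    version: str
    uptime: timedelta
    services: dict


def parse_health(body: str) -> HealthReport:
    """Parse the JSON body of GET /health into a HealthReport."""
    data = json.loads(body)
    return HealthReport(
        status=data["status"],
        version=data["version"],
        uptime=parse_uptime(data["uptime"]),
        services=dict(data.get("services", {})),
    )
```

Converting `uptime` to a `timedelta` makes it easy to spot recent restarts, for example by comparing it against a minimum expected uptime.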

Response Details

Healthy Status

When all services are operating normally:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "0.1.0",
  "uptime": "15d 6h 30m 45s",
  "services": {
    "database": "healthy",
    "websocket": "healthy",
    "monitoring": "healthy",
    "scheduler": "healthy"
  },
  "metrics": {
    "activeConnections": 15,
    "memoryUsage": "45.2MB",
    "cpuUsage": "12.5%"
  }
}
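The `metrics` values for memory and CPU are formatted strings rather than numbers, so a monitoring script has to convert them before comparing against thresholds. The following sketch assumes the formats shown above (`"45.2MB"`, `"12.5%"`) and assumes 1024-based units, which the source does not specify. The function names and the threshold defaults are hypothetical.

```python
import re

# Assumed binary multipliers; the API reference does not state the unit base.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_memory(text: str) -> float:
    """Convert a string such as '45.2MB' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised memory value: {text!r}")
    return float(match.group(1)) * _UNITS[match.group(2)]


def parse_percent(text: str) -> float:
    """Convert a string such as '12.5%' into the float 12.5."""
    match = re.fullmatch(r"\s*([\d.]+)\s*%\s*", text)
    if match is None:
        raise ValueError(f"unrecognised percentage: {text!r}")
    return float(match.group(1))


def over_limits(metrics: dict, max_cpu_percent: float = 80.0,
                max_memory_bytes: float = 512 * 1024 ** 2) -> list:
    """Return the names of the metrics that exceed the given (illustrative) limits."""
    breached = []
    if parse_percent(metrics["cpuUsage"]) > max_cpu_percent:
        breached.append("cpuUsage")
    if parse_memory(metrics["memoryUsage"]) > max_memory_bytes:
        breached.append("memoryUsage")
    return breached
```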

Degraded Status

When some services are experiencing issues:

{
  "status": "degraded",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "0.1.0",
  "uptime": "15d 6h 30m 45s",
  "services": {
    "database": "healthy",
    "websocket": "healthy",
    "monitoring": "unhealthy",
    "scheduler": "healthy"
  },
  "issues": [
    {
      "service": "monitoring",
      "status": "unhealthy",
      "message": "Metrics collection service not responding",
      "impact": "Metrics and monitoring functionality degraded"
    }
  ],
  "metrics": {
    "activeConnections": 15,
    "memoryUsage": "45.2MB",
    "cpuUsage": "12.5%"
  }
}

Unhealthy Status

When critical services are down:

{
  "status": "unhealthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "0.1.0",
  "uptime": "15d 6h 30m 45s",
  "services": {
    "database": "unhealthy",
    "websocket": "healthy",
    "monitoring": "unhealthy",
    "scheduler": "unhealthy"
  },
  "issues": [
    {
      "service": "database",
      "status": "unhealthy",
      "message": "Database connection failed",
      "impact": "All data operations unavailable"
    },
    {
      "service": "scheduler",
      "status": "unhealthy",
      "message": "Scheduler service not running",
      "impact": "Scheduled jobs not executing"
    }
  ],
  "recommendations": [
    "Check database connectivity",
    "Restart scheduler service",
    "Review application logs for errors"
  ]
}
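In the unhealthy example above, `monitoring` is marked `unhealthy` under `services` but has no entry in `issues`, so a consumer may prefer to treat the `services` map as the complete picture. The sketch below illustrates that approach; the function names are hypothetical and the logic assumes only the fields shown in the examples.

```python
from typing import Dict, List


def failing_services(report: Dict) -> List[str]:
    """Names of services whose status is anything other than 'healthy'."""
    services = report.get("services", {})
    return sorted(name for name, status in services.items() if status != "healthy")


def unexplained_failures(report: Dict) -> List[str]:
    """Failing services that have no matching entry in the 'issues' list."""
    explained = {issue.get("service") for issue in report.get("issues", [])}
    return [name for name in failing_services(report) if name not in explained]


def summarize(report: Dict) -> List[str]:
    """Build human-readable lines from a health report for logs or alerts."""
    lines = [f"overall: {report.get('status', 'unknown')}"]
    for issue in report.get("issues", []):
        lines.append(f"{issue['service']}: {issue['message']} ({issue['impact']})")
    for name in unexplained_failures(report):
        lines.append(f"{name}: no details reported")
    for recommendation in report.get("recommendations", []):
        lines.append(f"action: {recommendation}")
    return lines
```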

Service Status Codes

Status	Description
`healthy`	Service operating normally
`degraded`	Service experiencing minor issues
`unhealthy`	Service unavailable or experiencing critical issues
`unknown`	Service status cannot be determined
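Client code can model the four status values in the table above as an enumeration. In the sketch below, any unrecognised string falls back to `unknown`. The severity ordering used by `worst` is an assumption made for illustration; the reference does not rank the statuses.

```python
from enum import Enum


class ServiceStatus(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"

    @classmethod
    def _missing_(cls, value):
        # Any value outside the documented set is treated as unknown.
        return cls.UNKNOWN


# Assumed ordering (least to most severe); not defined by the API reference.
_SEVERITY = {
    ServiceStatus.HEALTHY: 0,
    ServiceStatus.UNKNOWN: 1,
    ServiceStatus.DEGRADED: 2,
    ServiceStatus.UNHEALTHY: 3,
}


def worst(statuses):
    """Return the most severe status in an iterable of status strings."""
    parsed = [ServiceStatus(s) for s in statuses]
    return max(parsed, key=_SEVERITY.__getitem__, default=ServiceStatus.UNKNOWN)
```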

Monitoring Integration

External Health Checks

Integrate with external monitoring systems:

# Simple health check: -f makes curl exit non-zero if the server responds with HTTP 400 or above
curl -f http://localhost:3000/health

# Get detailed health information (-s hides the progress meter when piping)
curl -s http://localhost:3000/health | jq .

# Check a specific service
curl -s http://localhost:3000/health | jq '.services.database'
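The same check can be scripted without curl or jq. The following Python sketch uses only the standard library and reports `unknown` whenever the endpoint cannot be reached or returns something that is not a JSON object. The function names and the 5-second default timeout are illustrative choices, not values taken from the O1 service.

```python
import json
import urllib.error
import urllib.request


def interpret(body: bytes) -> str:
    """Return the reported status, or 'unknown' if the payload is not usable."""
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return "unknown"
    if not isinstance(data, dict):
        return "unknown"
    return str(data.get("status", "unknown"))


def check(url: str = "http://localhost:3000/health", timeout: float = 5.0) -> str:
    """Fetch the health endpoint and return its status string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret(response.read())
    except urllib.error.HTTPError as err:
        # A non-2xx response may still carry a JSON body worth reading.
        return interpret(err.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar transport errors.
        return "unknown"


if __name__ == "__main__":
    status = check()
    print(status)
    raise SystemExit(0 if status == "healthy" else 1)
```

Run as a script, this exits with 0 only when the reported status is `healthy`, which makes it usable from cron jobs or other schedulers.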

Automated Monitoring

Set up automated health checks:

# Prometheus configuration
scrape_configs:
  - job_name: 'o1-health'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/health'
    params:
      format: ['prometheus']
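The scrape configuration above relies on `/health` returning Prometheus text exposition format when called with `format=prometheus`. If that output is not available in a given deployment, a small exporter can translate the JSON payload instead. The sketch below shows the translation step only; the metric name `o1_service_up` and the function name are hypothetical.

```python
def to_prometheus(report: dict) -> str:
    """Render the 'services' map of a health report as Prometheus gauge samples."""
    lines = [
        "# HELP o1_service_up 1 if the service reports healthy, otherwise 0.",
        "# TYPE o1_service_up gauge",
    ]
    # Sort by name so the output is stable between scrapes.
    for name, status in sorted(report.get("services", {}).items()):
        value = 1 if status == "healthy" else 0
        lines.append(f'o1_service_up{{service="{name}"}} {value}')
    return "\n".join(lines) + "\n"
```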

Error Responses

Service Unavailable

When the health check endpoint itself is unavailable:

{
  "status": "unhealthy",
  "error": "Service unavailable",
  "message": "Health check endpoint not accessible",
  "timestamp": "2024-01-15T10:30:00Z"
}

Best Practices

Health Check Configuration

Set appropriate timeout values for health checks
Implement circuit breaker patterns
Monitor health check response times
Set up alerting for unhealthy states
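As an illustration of the circuit breaker recommendation, the sketch below stops calling a failing health endpoint after a number of consecutive failures and allows a trial call once a cool-down period has passed. The class name, thresholds, and injectable clock are assumptions chosen for this example, not part of the O1 service.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: closed, open, and half-open after a cool-down."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a call may be attempted now."""
        if self._opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_timeout

    def record_success(self) -> None:
        """Close the circuit and clear the failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) the circuit at the threshold."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

A caller would check `allow()` before each health request, then call `record_success()` or `record_failure()` depending on the outcome.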

Service Monitoring

Monitor individual service health
Track service dependencies
Implement graceful degradation
Set up automated recovery procedures

Performance Monitoring

Monitor response times and throughput
Track resource utilization
Set performance baselines
Implement capacity planning
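One simple way to set a performance baseline is to record health check response times and compare new measurements against a percentile of that history. The sketch below uses the nearest-rank percentile method; the function names and the 1.5x tolerance factor are illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers (0 < pct <= 100)."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def exceeds_baseline(current_ms, baseline_ms, pct=95, factor=1.5):
    """True if current_ms is more than `factor` times the baseline percentile."""
    return current_ms > factor * percentile(baseline_ms, pct)
```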

Integration Examples

Load Balancer Health Check

# Nginx configuration
upstream o1_backend {
  server localhost:3000;
}

server {
  location /health {
    proxy_pass http://o1_backend/health;
    proxy_next_upstream error timeout invalid_header;
    proxy_connect_timeout 2s;
    proxy_read_timeout 5s;
  }
}

Container Health Check

# Dockerfile health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1

Kubernetes Health Check

# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: o1-backend
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5

Troubleshooting

Common Issues

Service Not Starting

# Check service logs
docker-compose logs backend

# Verify dependencies
curl http://localhost:3000/health

# Check database connectivity
psql -h localhost -U o1 -d o1

High Resource Usage

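# Check memory and CPU usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f 'node.*o1')"

# Count connections on port 3000
netstat -an | grep -c ':3000'

# Investigate memory leaks: start with the inspector enabled, then attach
# Chrome DevTools (chrome://inspect) and compare heap snapshots
node --inspect server.js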

Connection Issues

# Check port availability
netstat -tulpn | grep :3000

# Test connectivity
telnet localhost 3000

# Check firewall rules (requires root privileges)
sudo ufw status

For detailed schema definitions, refer to the OpenAPI specification published alongside the backend service.
