Logging, Monitoring, and Detection

Security assessments find vulnerabilities. They don’t detect exploitation. Logging and monitoring do that. Yet scope documents often focus entirely on finding vulnerabilities and ignore whether the organization can detect attacks when they happen.

The Detection Gap

A typical engagement:

Scope: Test for prompt injection, data leakage, and unauthorized access.
Deliverable: Report of findings with severity ratings.
Missing: Can the organization detect these attacks in production?

Finding a vulnerability is useful. Knowing whether attackers are actively exploiting it is critical. If logs don’t capture the relevant events, detection is impossible.

What Should Be Logged

Most applications log something. The question is whether logs capture security-relevant events.

Input Logging

@app.route('/chat', methods=['POST'])
def chat():
    message = request.json['message']
    
    # No logging
    response = model_api.call(message)
    return response

No record of user inputs. If an attack succeeds, no evidence exists. Can’t investigate. Can’t determine scope. Can’t identify attacker.

Basic logging:

@app.route('/chat', methods=['POST'])
def chat():
    message = request.json['message']
    user_id = get_authenticated_user()
    
    logger.info(f"User {user_id} sent message: {message}")
    
    response = model_api.call(message)
    return response

Now inputs are logged. Can search logs for suspicious patterns. But is this sufficient?

Output Logging

response = model_api.call(message)

# No output logging
return response

Model response isn’t logged. If the model leaks data, no record exists. Can’t audit what information was disclosed.

With output logging:

response = model_api.call(message)

logger.info(f"Model response to user {user_id}: {response}")

return response

Now both inputs and outputs are logged. Can correlate requests with responses. Can audit what data was disclosed.

But: logging all outputs might create privacy issues. Model responses might contain PII. Logs become sensitive data requiring protection.
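One way to reduce that exposure is to redact obvious PII before the response reaches the log. The following is a minimal sketch, not a complete PII detector: `redact_pii` and both regular expressions are illustrative assumptions.

```python
import re

# Illustrative patterns only (assumption); real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace matches of the known patterns with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = SSN_RE.sub("[SSN]", text)
    return text

# Redact before logging, so the raw value never reaches log storage.
redacted = redact_pii("Contact alice@example.com, SSN 123-45-6789.")
```

Redaction trades audit detail for privacy: the log still shows that a disclosure happened, but not the exact value disclosed.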

Function Call Logging

def execute_function(function_name, parameters):
    # No logging
    result = functions[function_name](**parameters)
    return result

No record of function executions. If unauthorized function calls occur, no evidence.

With logging:

def execute_function(user_id, function_name, parameters):
    logger.info(f"User {user_id} called {function_name} with {parameters}")
    
    result = functions[function_name](**parameters)
    
    logger.info(f"Function {function_name} returned: {result}")
    
    return result

Function calls and results are logged. Can audit who called what. Can detect unauthorized function access.
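Detecting unauthorized access requires logging denied calls as well as successful ones. A minimal sketch, assuming a per-user allow-list; the `functions` registry, `allowed_functions` mapping, and user names are hypothetical placeholders.

```python
import logging

logger = logging.getLogger("function_audit")

# Hypothetical registry and per-user allow-list, for illustration only.
functions = {"get_weather": lambda city: f"Sunny in {city}"}
allowed_functions = {"alice": {"get_weather"}}

def execute_function(user_id, function_name, parameters):
    """Run a registered function, logging both allowed and denied calls."""
    if function_name not in allowed_functions.get(user_id, set()):
        # Denied attempts are the events a detection rule would key on.
        logger.warning("DENIED user=%s function=%s", user_id, function_name)
        raise PermissionError(f"{user_id} may not call {function_name}")

    logger.info("CALL user=%s function=%s params=%r", user_id, function_name, parameters)
    result = functions[function_name](**parameters)
    logger.info("RESULT function=%s result=%r", function_name, result)
    return result
```

Logging the denial before raising means the attempt is recorded even if the caller swallows the exception.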

Error Logging

try:
    response = model_api.call(message)
except APIError as e:
    # No error logging
    return "An error occurred"

Errors aren’t logged. Can’t determine error frequency. Can’t identify patterns. Can’t detect attacks that trigger errors.

With error logging:

try:
    response = model_api.call(message)
except APIError as e:
    logger.error(f"API error for user {user_id}: {str(e)}")
    logger.error(f"Input that caused error: {message}")
    return "An error occurred"

Errors are logged with context. Can analyze error patterns. Can detect attacks causing systematic errors.

Authentication and Authorization Events

def check_permission(user_id, resource):
    has_permission = db.query(
        "SELECT * FROM permissions WHERE user_id = ? AND resource = ?",
        user_id, resource
    )
    return has_permission

# No logging

Permission checks happen silently. No audit trail of access attempts.

With logging:

def check_permission(user_id, resource):
    rows = db.query(
        "SELECT * FROM permissions WHERE user_id = ? AND resource = ?",
        user_id, resource
    )
    # Assumes db.query returns a sequence of rows; log granted/denied, not raw rows
    has_permission = bool(rows)
    
    logger.info(f"User {user_id} permission check for {resource}: {has_permission}")
    
    return has_permission

Permission checks are logged. Can detect unauthorized access attempts. Can audit who accessed what.

Log Retention and Storage

Logs need to be retained long enough to be useful.

Insufficient Retention

# Logs kept for 7 days
log_retention_days = 7

Attack happens on day 1. Goes unnoticed. Discovered on day 10. Logs from the attack are gone.

Need longer retention:

# Security logs kept for 90 days
security_log_retention_days = 90

# Audit logs kept for 1 year
audit_log_retention_days = 365

Retention period depends on detection time and compliance requirements.
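The arithmetic behind the example is simple enough to check directly. A small sketch, assuming a log stays available while its age is at most the retention period; `logs_still_available` is a hypothetical helper and the dates are made up.

```python
from datetime import date, timedelta

def logs_still_available(event_day: date, discovery_day: date, retention_days: int) -> bool:
    """True if a log written on event_day has not yet expired on discovery_day."""
    return discovery_day - event_day <= timedelta(days=retention_days)

attack = date(2024, 1, 1)                 # "day 1"
discovered = attack + timedelta(days=9)   # "day 10"

seven_day = logs_still_available(attack, discovered, 7)     # False: evidence is gone
ninety_day = logs_still_available(attack, discovered, 90)   # True: evidence survives
```

The retention period has to exceed the realistic time to detection, not the hoped-for one.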

Log Storage Security

# Logs written to file
with open('/var/log/app.log', 'a') as f:
    f.write(f"{timestamp} {message}\n")

If file permissions allow write access to unauthorized users, logs can be tampered with or deleted. Read access alone is enough to expose their contents.

Secure storage:

# Logs sent to centralized logging system
log_client.send({
    "timestamp": timestamp,
    "level": level,
    "message": message,
    "user_id": user_id
})

Centralized logging systems provide:

  • Access controls
  • Immutability (append-only)
  • Encryption at rest
  • Retention management
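Where a logging platform’s immutability can’t be verified directly, tamper evidence can be approximated by chaining entries with hashes. This is an illustrative sketch of that idea, not how any particular logging product works; `append_entry` and `verify_chain` are hypothetical names.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append_entry(chain: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    chain.append(entry)
    return entry

def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this detects edits to existing entries but not truncation of the newest ones, so the latest hash still has to be stored somewhere the attacker can’t reach.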

Log Encryption

# Logs contain PII
logger.info(f"User {email} logged in")
logger.info(f"Query result: {user_data}")

If logs contain PII and aren’t encrypted, anyone with log access can view sensitive data.

Scope should specify: “Verify logs containing PII are encrypted at rest and in transit.”

Monitoring and Alerting

Logs are useful only if monitored. Many organizations collect logs but don’t actively monitor them.

Pattern Detection

# Monitoring for prompt injection patterns
suspicious_patterns = [
    "ignore previous instructions",
    "disregard",
    "system prompt",
    "TRIGGER"
]

def check_input(message):
    for pattern in suspicious_patterns:
        if pattern.lower() in message.lower():
            alert_security_team(f"Suspicious pattern detected: {pattern}")

Simple pattern matching catches obvious attacks. But sophisticated attacks evade simple patterns.
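A trivial evasion shows the gap: extra whitespace or a zero-width character defeats a plain substring check. The sketch below normalizes input before matching. The pattern list and function names are illustrative assumptions, and normalization only raises the bar; it does not stop paraphrased or encoded attacks.

```python
import re
import unicodedata

SUSPICIOUS_PATTERNS = ["ignore previous instructions", "system prompt"]

def normalize(message: str) -> str:
    """Fold compatibility characters, drop zero-width characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", message)
    text = re.sub(r"[\u200b\u200c\u200d\u2060\ufeff]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.casefold().strip()

def naive_match(message: str) -> list:
    """Plain lowercase substring check, as in the simple detector."""
    lowered = message.lower()
    return [p for p in SUSPICIOUS_PATTERNS if p in lowered]

def normalized_match(message: str) -> list:
    """Same check, applied to the normalized message."""
    cleaned = normalize(message)
    return [p for p in SUSPICIOUS_PATTERNS if p in cleaned]

# Extra spaces, a zero-width space, and a newline inside the phrase.
evasive = "Ignore   previous\u200b\ninstructions"
```

Here `naive_match(evasive)` finds nothing, while `normalized_match(evasive)` flags the phrase.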

Behavioral Anomalies

# Monitor for unusual behavior
def monitor_user_behavior(user_id):
    recent_requests = get_recent_requests(user_id, time_window=3600)
    
    if len(recent_requests) > 100:
        alert(f"User {user_id} made {len(recent_requests)} requests in 1 hour")
    
    error_rate = calculate_error_rate(recent_requests)
    if error_rate > 0.5:
        alert(f"User {user_id} has {error_rate:.0%} error rate")

Detects anomalies like:

  • Excessive request volume
  • High error rates
  • Unusual access patterns
  • Request spikes
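Request-volume checks are often implemented as a sliding window rather than by re-querying the log store on every request. A minimal in-memory sketch, assuming timestamps in seconds; `RequestRateMonitor` is a hypothetical class, and a production version would need shared state across processes.

```python
from collections import defaultdict, deque

class RequestRateMonitor:
    """Tracks per-user request timestamps within a sliding time window."""

    def __init__(self, max_requests=100, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = defaultdict(deque)

    def record(self, user_id, now):
        """Record a request at time `now`; return True if the user exceeds the limit."""
        window = self._timestamps[user_id]
        window.append(now)
        # Drop timestamps that have fallen out of the window.
        while window and window[0] <= now - self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests
```

Passing `now` explicitly keeps the monitor deterministic, which makes the detection rule itself testable.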

Function Call Monitoring

def monitor_function_calls():
    recent_calls = get_recent_function_calls(time_window=3600)
    
    # Check for repeated admin function calls
    admin_calls = [c for c in recent_calls if c['function'] in admin_functions]
    if len(admin_calls) > 10:
        alert("Unusual number of admin function calls")
    
    # Check for function call chains
    for user_id in set(c['user_id'] for c in recent_calls):
        user_calls = [c for c in recent_calls if c['user_id'] == user_id]
        if len(user_calls) > 20:
            alert(f"User {user_id} made {len(user_calls)} function calls")

Monitors for:

  • Unauthorized function access attempts
  • Excessive function usage
  • Suspicious function call patterns

Model Behavior Monitoring

def monitor_model_responses():
    recent_responses = get_recent_responses(time_window=3600)
    
    # Check for refusal rate changes
    refusal_rate = calculate_refusal_rate(recent_responses)
    baseline_refusal_rate = 0.02
    
    if refusal_rate < baseline_refusal_rate * 0.5:
        alert("Model refusal rate dropped significantly")
    
    # Check for unusual response patterns
    avg_response_length = calculate_avg_length(recent_responses)
    baseline_avg_length = 500  # placeholder value; measure from normal traffic
    if avg_response_length > baseline_avg_length * 2:
        alert("Model generating unusually long responses")

Detects:

  • Jailbreak success (lower refusal rates)
  • Data extraction attempts (longer responses)
  • Model behavior changes
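The refusal-rate check depends on how a refusal is recognized. The sketch below uses a crude phrase list as a stand-in. The marker phrases, `min_samples`, and function names are assumptions; a real deployment would more likely use a classifier or a structured refusal signal from the model API.

```python
# Assumed marker phrases; actual refusal wording varies by model.
REFUSAL_MARKERS = ("i can't help with", "i cannot help with", "i'm unable to")

def calculate_refusal_rate(responses):
    """Fraction of responses containing any refusal marker."""
    if not responses:
        return 0.0
    refusals = sum(
        1 for r in responses
        if any(marker in r.lower() for marker in REFUSAL_MARKERS)
    )
    return refusals / len(responses)

def refusal_rate_dropped(responses, baseline_rate, factor=0.5, min_samples=50):
    """True if the observed rate is below baseline_rate * factor.

    Returns False when there are too few samples to judge.
    """
    if len(responses) < min_samples:
        return False
    return calculate_refusal_rate(responses) < baseline_rate * factor
```

With a baseline as low as 2%, small samples will often show zero refusals by chance, which is why the sketch refuses to alert below `min_samples`.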

Incident Investigation

When attacks are detected, logs enable investigation.

Timeline Reconstruction

def investigate_incident(user_id, timestamp):
    # Get all events around incident time
    events = db.query("""
        SELECT * FROM logs 
        WHERE user_id = ? 
        AND timestamp BETWEEN ? AND ?
        ORDER BY timestamp
    """, user_id, timestamp - 3600, timestamp + 3600)
    
    return events

Comprehensive logs allow reconstructing what happened:

  • What inputs were sent
  • What outputs were generated
  • What functions were called
  • What errors occurred
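A runnable version of the timeline query, using an in-memory SQLite database. The table layout and sample rows are assumptions made for illustration; the query logic mirrors `investigate_incident` above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (user_id TEXT, timestamp INTEGER, event_type TEXT, details TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        ("u1", 5000, "input", "ignore previous instructions"),
        ("u2", 5001, "input", "hello"),
        ("u1", 5002, "output", "[model response]"),
        ("u1", 20000, "input", "unrelated later request"),
    ],
)

def investigate_incident(conn, user_id, timestamp, window=3600):
    """Return one user's events within +/- window seconds, oldest first."""
    cursor = conn.execute(
        """
        SELECT timestamp, event_type, details
        FROM logs
        WHERE user_id = ?
          AND timestamp BETWEEN ? AND ?
        ORDER BY timestamp
        """,
        (user_id, timestamp - window, timestamp + window),
    )
    return cursor.fetchall()

timeline = investigate_incident(conn, "u1", 5000)
```

The result contains only the two `u1` events near the incident; the other user’s request and the much later request are excluded.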

Attacker Identification

def identify_attacker(suspicious_pattern):
    # Find all users who used this pattern
    users = db.query("""
        SELECT DISTINCT user_id 
        FROM logs 
        WHERE message LIKE ?
    """, f"%{suspicious_pattern}%")
    
    # Correlate with other indicators
    for user_id in users:
        user_activity = analyze_user_activity(user_id)
        if is_suspicious(user_activity):
            report_attacker(user_id, user_activity)

Logs enable identifying attackers and their techniques.

Impact Assessment

def assess_impact(incident_timestamp, attacker_user_id):
    # What data was potentially exposed?
    responses = db.query("""
        SELECT response 
        FROM logs 
        WHERE timestamp >= ? 
        AND user_id = ?
    """, incident_timestamp, attacker_user_id)
    
    # How many access attempts by the attacker were denied?
    unauthorized_access = db.query("""
        SELECT * FROM logs 
        WHERE user_id = ? 
        AND action = 'permission_denied'
    """, attacker_user_id)
    
    return {
        "exposed_data": analyze_responses(responses),
        "unauthorized_attempts": len(unauthorized_access)
    }

Logs determine breach scope and impact.

What Scope Should Specify

Security assessments should verify logging and monitoring capabilities.

Log Coverage

Verify the following events are logged:
- All user inputs to model
- All model responses
- All function calls with parameters and results
- All authentication and authorization events
- All errors and exceptions
- All configuration changes

Log Content

Verify logs include:
- Timestamp (UTC, millisecond precision)
- User identifier
- Session identifier  
- Request identifier (for correlation)
- IP address
- Event type
- Event details
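The fields listed above map naturally onto one structured record per event. A minimal sketch that emits JSON; `build_log_record` is a hypothetical helper and the field names are assumptions chosen to match the list.

```python
import json
import uuid
from datetime import datetime, timezone

def build_log_record(user_id, session_id, ip_address, event_type, details, request_id=None):
    """Build one structured log record with the fields needed for correlation."""
    return {
        # UTC timestamp with millisecond precision
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "user_id": user_id,
        "session_id": session_id,
        # Generate a request ID if the caller did not supply one
        "request_id": request_id or str(uuid.uuid4()),
        "ip_address": ip_address,
        "event_type": event_type,
        "details": details,
    }

record = build_log_record(
    user_id="u1",
    session_id="s1",
    ip_address="203.0.113.7",
    event_type="model_input",
    details={"message": "hi"},
)
log_line = json.dumps(record)
```

One JSON object per line keeps records machine-searchable, which is what makes the completeness checks in an assessment practical.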

Log Security

Verify:
- Logs stored in centralized system with access controls
- Logs encrypted at rest and in transit
- Log retention meets compliance requirements (90 days minimum)
- Logs are immutable (append-only)
- Log access is audited

Monitoring Capabilities

Verify monitoring for:
- Suspicious input patterns (prompt injection indicators)
- Anomalous user behavior (request volume, error rates)
- Unauthorized function access attempts
- Model behavior changes (refusal rate, response length)
- Automated alerting for critical events

Incident Response

Verify:
- Incident response procedures exist
- Procedures include log analysis steps
- Procedures define timeline reconstruction methods
- Procedures specify escalation criteria
- Procedures are tested regularly

Testing Logging and Monitoring

Scope should include active testing of detection capabilities.

Detection Testing

Test whether monitoring detects:

1. Prompt injection attempts:
   - Send known prompt injection patterns
   - Verify detection and alerting
   - Response time: < 5 minutes

2. Data extraction attempts:
   - Attempt to extract system prompt
   - Attempt to extract training data
   - Verify detection and alerting

3. Function abuse:
   - Attempt unauthorized function calls
   - Attempt function call chains
   - Verify detection and alerting

4. Volume attacks:
   - Send excessive requests
   - Verify rate limiting triggers
   - Verify alerting occurs

Success criteria: 
- 100% detection of critical attacks
- < 5 minute alert latency
- < 1% false positive rate
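The success criteria can be computed directly from the test results. A sketch, assuming each test case is recorded as a dict with `is_attack`, `alerted`, and `latency_seconds`; that record format and the function names are assumptions.

```python
def summarize_detection_test(results):
    """Compute detection rate, false positive rate, and worst alert latency."""
    attacks = [r for r in results if r["is_attack"]]
    benign = [r for r in results if not r["is_attack"]]
    detected = [r for r in attacks if r["alerted"]]
    false_positives = [r for r in benign if r["alerted"]]
    return {
        "detection_rate": len(detected) / len(attacks) if attacks else None,
        "false_positive_rate": len(false_positives) / len(benign) if benign else None,
        "max_latency_seconds": max((r["latency_seconds"] for r in detected), default=None),
    }

def meets_criteria(summary, max_latency=300, max_fp_rate=0.01):
    """Apply the criteria: 100% detection, < 5 minute latency, < 1% false positives."""
    return (
        summary["detection_rate"] == 1.0
        and summary["max_latency_seconds"] is not None
        and summary["max_latency_seconds"] < max_latency
        and summary["false_positive_rate"] is not None
        and summary["false_positive_rate"] < max_fp_rate
    )

# Made-up test results for illustration.
results = [
    {"is_attack": True, "alerted": True, "latency_seconds": 120},
    {"is_attack": True, "alerted": True, "latency_seconds": 240},
    {"is_attack": False, "alerted": True, "latency_seconds": 30},
    {"is_attack": False, "alerted": False, "latency_seconds": None},
    {"is_attack": False, "alerted": False, "latency_seconds": None},
    {"is_attack": False, "alerted": False, "latency_seconds": None},
]
summary = summarize_detection_test(results)
```

In this made-up run every attack is detected within five minutes, but one of four benign requests triggers an alert, so the false positive criterion fails.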

Log Completeness Testing

Verify logs capture required information:

1. Perform test attack
2. Check logs for evidence
3. Verify all expected fields present
4. Verify log correlation possible
5. Verify timeline reconstruction works

Missing fields indicate logging gaps.

Alert Testing

Test alerting mechanisms:

1. Trigger detection rule
2. Verify alert generated
3. Verify alert reaches security team
4. Measure alert latency
5. Verify alert contains actionable information

If alerts don’t reach the right people, detection is useless.

Common Logging Failures

Logging Sensitive Data

# Logs API keys
logger.info(f"Calling API with key: {api_key}")

# Logs passwords
logger.info(f"User {user_id} changed password to: {new_password}")

Logs should not contain secrets. If attackers gain log access, they gain secrets.
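The primary fix is to never pass secrets to the logger. As a safety net, a filter from Python’s standard `logging` module can scrub obvious secret-looking values before a record is emitted. The regular expression and the names `redact` and `RedactSecretsFilter` are illustrative assumptions and will miss secrets that don’t follow a `label: value` shape.

```python
import logging
import re

# Matches e.g. "key: sk-123", "password to: hunter2", "token=abc" (assumed shapes).
SECRET_RE = re.compile(
    r"(key|password|token|secret)([^:=\n]{0,20}[:=]\s*)(\S+)",
    re.IGNORECASE,
)

def redact(text: str) -> str:
    """Replace the value following a secret-looking label with a placeholder."""
    return SECRET_RE.sub(r"\1\2[REDACTED]", text)

class RedactSecretsFilter(logging.Filter):
    """Scrubs the fully formatted message before any handler sees it."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True

logger = logging.getLogger("app")
logger.addFilter(RedactSecretsFilter())
```

A filter attached to a logger applies only to records logged through that logger, so in practice it is often attached to handlers instead.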

Insufficient Context

# Not useful
logger.info("Error occurred")

# Useful
logger.error(f"API call failed for user {user_id} with message '{message}': {error_details}")

Logs need context to be actionable.

No Correlation IDs

# Request 1
logger.info("User sent message")
logger.info("Model responded")

# Request 2
logger.info("User sent message")
logger.info("Model responded")

Without correlation IDs, can’t determine which response belongs to which request.

Better:

request_id = generate_request_id()

logger.info(f"[{request_id}] User {user_id} sent message")
logger.info(f"[{request_id}] Model responded")

Correlation IDs enable request tracking.
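Prefixing every message by hand is easy to forget. One common approach is to hold the ID in a context variable and let a logging filter attach it to every record. A sketch using only the standard library; `request_id_var`, `RequestIdFilter`, and `start_request` are hypothetical names.

```python
import contextvars
import logging
import uuid

# Holds the current request's ID; "-" when no request is active.
request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Copies the current request ID onto every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True

def start_request():
    """Generate a new request ID and make it current for this context."""
    request_id = uuid.uuid4().hex
    request_id_var.set(request_id)
    return request_id

logger = logging.getLogger("chat")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

After `start_request()` is called at the top of a handler, every `logger.info(...)` in that context carries the same ID without the caller having to pass it along.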

Logging After Failure

try:
    response = model_api.call(message)
    logger.info("Successfully called model")
except:
    # No logging
    pass

The only log statement runs after a successful call. If the call raises, the exception is swallowed silently and no record of the failure exists.

Log before operations:

logger.info(f"Calling model with message: {message}")

try:
    response = model_api.call(message)
    logger.info("Model call succeeded")
except Exception as e:
    logger.error(f"Model call failed: {e}")

Scope Language for Logging and Monitoring

Bad scope:

Test AI system security.

Doesn’t address detection capabilities.

Better scope:

Test AI system security and detection including:

1. Verify logging coverage:
   - User inputs logged
   - Model outputs logged
   - Function calls logged
   - Errors logged
   - Authentication events logged

2. Verify log security:
   - Centralized storage with access controls
   - Encryption at rest and in transit
   - 90-day retention
   - Immutable logs

3. Test monitoring capabilities:
   - Detect prompt injection attempts
   - Detect excessive request volume
   - Detect unauthorized function access
   - Alert latency < 5 minutes

4. Test incident response:
   - Reconstruct attack timeline from logs
   - Identify attacker from log analysis
   - Assess impact from logged data

5. Provide:
   - Detection test results (what was detected, what wasn't)
   - Log analysis samples showing investigation process
   - Monitoring coverage gaps
   - Recommended additional monitoring rules

Conclusion

Finding vulnerabilities matters. Detecting exploitation matters more. Organizations that can’t detect attacks in progress can’t respond to them.

Effective AI security requires:

  • Comprehensive logging of security events
  • Secure log storage with appropriate retention
  • Active monitoring and alerting
  • Incident investigation capabilities
  • Regular testing of detection mechanisms

Scope documents that focus only on vulnerability finding without addressing detection leave organizations blind to active exploitation. Testing should determine both whether vulnerabilities exist and whether their exploitation would be detected.

