Concept #104Mediumextended-ai-concepts

How to keep AI secured? What are the security aspects of AI?

#gen-ai#security

Answer

How to Keep AI Secured — Security Aspects of AI

AI systems introduce new attack surfaces. Here are the key security domains and how to address them.

Security Threat Landscape

ThreatDescriptionImpact
Prompt injectionOverride AI instructions via malicious inputHigh
Data poisoningCorrupt training/RAG dataHigh
Model theftExtract model weights or knowledgeMedium
PII leakageAI reveals sensitive user dataHigh
Hallucination abuseExploit AI-generated misinformationMedium
Insecure tool useAgent takes unauthorized actionsCritical
Over-permissioned agentsAgent can access more than neededHigh

1. Input Validation and Sanitization

python
def validate_and_sanitize_input(user_input: str) -> tuple[bool, str]:
    # Length check
    if len(user_input) > 10000:
        return False, "Input too long"

    # Injection pattern check
    injection_patterns = [
        "ignore previous", "disregard instructions",
        "you are now", "act as", "jailbreak"
    ]
    lower_input = user_input.lower()
    if any(p in lower_input for p in injection_patterns):
        return False, "Suspicious input detected"

    # PII detection (basic)
    import re
    if re.search(r'\b\d{3}-\d{2}-\d{4}\b', user_input):  # SSN pattern
        return False, "Personal information not allowed"

    return True, user_input

2. Output Filtering (Guardrails)

python
from anthropic import Anthropic

client = Anthropic()

BLOCKED_CONTENT = ["confidential", "system prompt", "internal instructions"]

def safe_generate(user_message: str, system_prompt: str) -> str:
    response = client.messages.create(
        model="claude-opus-4-6",
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}]
    )
    output = response.content[0].text

    # Check output doesn't leak sensitive info
    for term in BLOCKED_CONTENT:
        if term in output.lower():
            return "I cannot provide that information."
    return output

3. Principle of Least Privilege for Agents

python
# Bad: agent has unrestricted tool access
agent = Agent(tools=["read_file", "write_file", "delete_file", "execute_command"])

# Good: limit to what's actually needed
agent = Agent(
    tools=["read_file"],         # Only read, not write/delete
    allowed_paths=["/data/docs"],  # Only specific directories
    allowed_domains=["api.company.com"],  # Only approved APIs
    max_iterations=10,           # Prevent infinite loops
    require_human_approval=["delete", "send_email"]  # Human-in-the-loop for destructive ops
)

4. Audit Logging

python
import logging
from datetime import datetime

security_logger = logging.getLogger("ai_security")

def logged_ai_call(user_id: str, user_message: str, response: str):
    security_logger.info({
        "timestamp": datetime.utcnow().isoformat(),
        "user_id": user_id,
        "input_hash": hash(user_message),  # Hash, not raw input (privacy)
        "input_length": len(user_message),
        "output_length": len(response),
        "flagged": detect_injection(user_message)
    })

5. RAG Security

python
# Secure document retrieval
def secure_rag(query: str, user_role: str) -> str:
    # Filter docs by user's access level
    accessible_docs = vector_db.query(
        query=query,
        filter={"access_level": {"$lte": user_role_to_level(user_role)}}
    )

    # Clearly mark retrieved content as data, not instructions
    context = f"<retrieved_documents>\n{accessible_docs}\n</retrieved_documents>"
    return llm.invoke(f"Answer based on documents only:\n{context}\n\nQ: {query}")

Security Checklist

  • Validate and sanitize all user inputs
  • Filter outputs for sensitive data leakage
  • Apply least-privilege to agent tool access
  • Require human approval for destructive operations
  • Audit log all AI interactions
  • Separate system instructions from user-provided data
  • Regularly red-team your AI system
  • Monitor for unusual query patterns
  • Use content moderation APIs for user-generated content
  • Encrypt data in transit and at rest