How to keep AI secured? What are the security aspects of AI?
#gen-ai#security
Answer
How to Keep AI Secured — Security Aspects of AI
AI systems introduce new attack surfaces. Here are the key security domains and how to address them.
Security Threat Landscape
| Threat | Description | Impact |
|---|---|---|
| Prompt injection | Override AI instructions via malicious input | High |
| Data poisoning | Corrupt training/RAG data | High |
| Model theft | Extract model weights or knowledge | Medium |
| PII leakage | AI reveals sensitive user data | High |
| Hallucination abuse | Exploit AI-generated misinformation | Medium |
| Insecure tool use | Agent takes unauthorized actions | Critical |
| Over-permissioned agents | Agent can access more than needed | High |
1. Input Validation and Sanitization
pythondef validate_and_sanitize_input(user_input: str) -> tuple[bool, str]: # Length check if len(user_input) > 10000: return False, "Input too long" # Injection pattern check injection_patterns = [ "ignore previous", "disregard instructions", "you are now", "act as", "jailbreak" ] lower_input = user_input.lower() if any(p in lower_input for p in injection_patterns): return False, "Suspicious input detected" # PII detection (basic) import re if re.search(r'\b\d{3}-\d{2}-\d{4}\b', user_input): # SSN pattern return False, "Personal information not allowed" return True, user_input
2. Output Filtering (Guardrails)
pythonfrom anthropic import Anthropic client = Anthropic() BLOCKED_CONTENT = ["confidential", "system prompt", "internal instructions"] def safe_generate(user_message: str, system_prompt: str) -> str: response = client.messages.create( model="claude-opus-4-6", system=system_prompt, messages=[{"role": "user", "content": user_message}] ) output = response.content[0].text # Check output doesn't leak sensitive info for term in BLOCKED_CONTENT: if term in output.lower(): return "I cannot provide that information." return output
3. Principle of Least Privilege for Agents
python# Bad: agent has unrestricted tool access agent = Agent(tools=["read_file", "write_file", "delete_file", "execute_command"]) # Good: limit to what's actually needed agent = Agent( tools=["read_file"], # Only read, not write/delete allowed_paths=["/data/docs"], # Only specific directories allowed_domains=["api.company.com"], # Only approved APIs max_iterations=10, # Prevent infinite loops require_human_approval=["delete", "send_email"] # Human-in-the-loop for destructive ops )
4. Audit Logging
pythonimport logging from datetime import datetime security_logger = logging.getLogger("ai_security") def logged_ai_call(user_id: str, user_message: str, response: str): security_logger.info({ "timestamp": datetime.utcnow().isoformat(), "user_id": user_id, "input_hash": hash(user_message), # Hash, not raw input (privacy) "input_length": len(user_message), "output_length": len(response), "flagged": detect_injection(user_message) })
5. RAG Security
python# Secure document retrieval def secure_rag(query: str, user_role: str) -> str: # Filter docs by user's access level accessible_docs = vector_db.query( query=query, filter={"access_level": {"$lte": user_role_to_level(user_role)}} ) # Clearly mark retrieved content as data, not instructions context = f"<retrieved_documents>\n{accessible_docs}\n</retrieved_documents>" return llm.invoke(f"Answer based on documents only:\n{context}\n\nQ: {query}")
Security Checklist
- Validate and sanitize all user inputs
- Filter outputs for sensitive data leakage
- Apply least-privilege to agent tool access
- Require human approval for destructive operations
- Audit log all AI interactions
- Separate system instructions from user-provided data
- Regularly red-team your AI system
- Monitor for unusual query patterns
- Use content moderation APIs for user-generated content
- Encrypt data in transit and at rest