What is a plagiarism checker?

Question

Accepted Answer

## What is a Plagiarism Checker?

A **plagiarism checker** is a tool that detects whether text is copied or closely derived from existing sources. It compares submitted content against databases of web pages, academic papers, and other documents.

### How Plagiarism Checkers Work

```
Submitted text
      ↓
1. Fingerprinting: Break into n-grams (overlapping word sequences)
      ↓
2. Search: Compare against large database (web, academic papers, student submissions)
      ↓
3. Match detection: Find similar or identical passages
      ↓
4. Similarity report: Show % match and highlight sources
```
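The four steps above can be sketched in a few lines. This is a minimal illustration, not how any particular commercial tool works: the function names (`fingerprints`, `build_index`, `similarity_report`), the truncated SHA-1 hashes, the trigram size, and the two-document corpus are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def fingerprints(text: str, n: int = 3) -> set[str]:
    """Step 1: hash every overlapping n-word sequence of the text."""
    words = text.lower().split()
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def build_index(corpus: dict[str, str], n: int = 3) -> dict[str, set[str]]:
    """Map each fingerprint to the ids of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for fp in fingerprints(text, n):
            index[fp].add(doc_id)
    return index


def similarity_report(submitted: str, index: dict[str, set[str]], n: int = 3) -> dict[str, float]:
    """Steps 2-4: look up fingerprints, count hits per source, report the share matched."""
    fps = fingerprints(submitted, n)
    hits: dict[str, int] = defaultdict(int)
    for fp in fps:
        for doc_id in index.get(fp, ()):
            hits[doc_id] += 1
    # Share of the submission's fingerprints found in each source document
    return {doc_id: count / len(fps) for doc_id, count in hits.items()}


# Hypothetical two-document "database"
corpus = {
    "doc_a": "the quick brown fox jumps over the lazy dog",
    "doc_b": "an entirely unrelated sentence about databases",
}
index = build_index(corpus)
report = similarity_report("the quick brown fox sleeps all day", index)
print(report)  # {'doc_a': 0.4}
```

The inverted index is the design choice that matters here: the submission is looked up fingerprint by fingerprint, so the cost depends on the length of the submission rather than on the number of documents in the database.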

### Technical Implementation (Basic)

```python
from difflib import SequenceMatcher


def basic_plagiarism_check(submitted: str, database: list[str]) -> list[dict]:
    """Compare the submission against every document and keep the close matches."""
    results = []

    for doc in database:
        # Character-level similarity ratio between 0.0 and 1.0
        similarity = SequenceMatcher(None, submitted.lower(), doc.lower()).ratio()

        if similarity > 0.3:  # 30% threshold
            results.append({
                "similarity": similarity,
                "matched_text": doc[:100],
                "match_percent": f"{similarity:.1%}"
            })

    # Most similar documents first
    return sorted(results, key=lambda x: x["similarity"], reverse=True)


# N-gram fingerprinting (more robust)
def get_ngrams(text: str, n: int = 5) -> set[str]:
    """Return the set of overlapping n-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i+n]) for i in range(len(words) - n + 1)}


def ngram_similarity(text1: str, text2: str, n: int = 5) -> float:
    """Jaccard similarity of the two texts' n-gram sets."""
    ng1 = get_ngrams(text1, n)
    ng2 = get_ngrams(text2, n)
    intersection = ng1 & ng2
    union = ng1 | ng2
    return len(intersection) / len(union) if union else 0.0


# These short sentences share no 5-word sequence, so the default n=5 would give 0.0%.
# Word pairs (n=2) show the overlap: "on the" and "the mat" are shared.
similarity = ngram_similarity("The cat sat on the mat", "A cat was sitting on the mat", n=2)
print(f"Similarity: {similarity:.1%}")  # Similarity: 22.2%
```
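A similarity score alone does not show *which* passages match, which is what the highlighting step in a similarity report needs. One way to recover them, sketched here with the standard library's `difflib.SequenceMatcher.get_matching_blocks`, is to compare the texts word by word and keep the longer shared runs. The helper name `matching_passages`, the `min_words` cutoff, and the sample sentences are assumptions for illustration.

```python
from difflib import SequenceMatcher


def matching_passages(submitted: str, source: str, min_words: int = 4) -> list[str]:
    """Return runs of at least `min_words` consecutive words shared by both texts."""
    sub_words = submitted.lower().split()
    src_words = source.lower().split()
    # Compare word lists rather than characters; autojunk=False keeps common words in play
    matcher = SequenceMatcher(None, sub_words, src_words, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(sub_words[block.a:block.a + block.size]))
    return passages


source = "Plagiarism checkers compare submitted content against large databases of documents"
submitted = "Most tools compare submitted content against large databases to find overlap"
print(matching_passages(submitted, source))
# ['compare submitted content against large databases']
```

Raising `min_words` cuts down on incidental matches such as common phrases, at the cost of missing short copied fragments.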

### Popular Tools

| Tool | Use Case | Database |
|------|---------|---------|
| **Turnitin** | Academic/education | Student papers, web, journals |
| **Grammarly** | Writing assistance | Web content |
| **Copyscape** | Web content | Web pages |
| **iThenticate** | Research/publishing | Academic journals |
| **Unicheck** | Education | Web + student submissions |
| **PlagScan** | Enterprise | Web + academic |

### Types of Plagiarism Detected

| Type | Description |
|------|-------------|
| **Verbatim** | Exact word-for-word copy |
| **Paraphrasing** | Same ideas, different words |
| **Mosaic** | Mixing quoted and paraphrased content |
| **Self-plagiarism** | Reusing one's own previous work without disclosure or citation |
| **AI-generated** | Content produced by AI tools (flagged only by newer detectors) |

### Modern Plagiarism Checkers vs AI Detectors

| | Plagiarism Checker | AI Detector |
|--|-------------------|------------|
| **Detects** | Copying from sources | AI-generated text |
| **Compares against** | Document databases | Statistical patterns |
| **Accuracy** | High for exact matches | Variable (roughly 80-90%, depending on tool and text) |
| **False positives** | Low | Higher (roughly 10-20%) |
| **Available in Turnitin** | ✅ Classic function | ✅ Added in 2023 |
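To see why the false-positive row matters in practice, it helps to work through the arithmetic for a batch of submissions. The sketch below treats the table's rough accuracy figure as a detection rate, which is a simplifying assumption. The batch size and the 10% share of AI-written submissions are hypothetical numbers chosen only for illustration.

```python
def flagged_breakdown(
    total: int,
    ai_share: float,
    detection_rate: float,
    false_positive_rate: float,
) -> dict[str, float]:
    """Estimate how many flags are correct when a detector scans a batch."""
    ai_docs = total * ai_share
    human_docs = total - ai_docs
    true_flags = ai_docs * detection_rate          # AI-written and flagged
    false_flags = human_docs * false_positive_rate  # human-written but flagged
    flagged = true_flags + false_flags
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        "precision": true_flags / flagged if flagged else 0.0,
    }


# Hypothetical batch: 1000 submissions, 10% AI-written,
# 85% detection rate and 10% false-positive rate (rough values from the table's ranges)
result = flagged_breakdown(total=1000, ai_share=0.10, detection_rate=0.85, false_positive_rate=0.10)
print(f"Correct flags: {result['true_flags']:.0f}")
print(f"False flags:   {result['false_flags']:.0f}")
print(f"Share of flags that are correct: {result['precision']:.1%}")
```

Under these assumed numbers, about 85 flags are correct and about 90 are false alarms, so fewer than half of the flagged submissions are actually AI-written. This is why a flag from an AI detector is usually treated as a prompt for review and not as proof.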

### Integration in CI/CD (Code Plagiarism)

```python
# Detecting code plagiarism (for programming assignments)
import ast
from difflib import SequenceMatcher

def normalize_code(code: str) -> str:
    '''Normalize Python code by replacing variable and argument names with a placeholder'''
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Not valid Python: fall back to comparing the raw text
        return code
    # Replace identifiers with a generic placeholder so renaming does not hide a copy
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "VAR"
        elif isinstance(node, ast.arg):
            node.arg = "VAR"
    return ast.dump(tree)

def code_similarity(code1: str, code2: str) -> float:
    '''Similarity ratio (0.0 to 1.0) of the two normalized syntax trees'''
    norm1 = normalize_code(code1)
    norm2 = normalize_code(code2)
    return SequenceMatcher(None, norm1, norm2).ratio()
```
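To use this kind of check as a CI/CD step, one option is a small script that compares every pair of submissions in a directory and exits with a non-zero status when any pair looks too similar, which fails the pipeline job. The following is a sketch under assumptions: the names `normalized_dump`, `suspicious_pairs`, and `main`, the 0.9 threshold, and the flat directory of `.py` files are all hypothetical, and it repeats the normalization idea from above so that it runs on its own.

```python
import ast
import sys
from difflib import SequenceMatcher
from itertools import combinations
from pathlib import Path


def normalized_dump(code: str) -> str:
    """Dump the syntax tree with variable and argument names replaced by a placeholder."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return code  # not valid Python: compare the raw text instead
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.id = "VAR"
        elif isinstance(node, ast.arg):
            node.arg = "VAR"
    return ast.dump(tree)


def suspicious_pairs(sources: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair scoring at or above the threshold."""
    dumps = {name: normalized_dump(code) for name, code in sources.items()}
    flagged = []
    for a, b in combinations(sorted(dumps), 2):
        score = SequenceMatcher(None, dumps[a], dumps[b], autojunk=False).ratio()
        if score >= threshold:
            flagged.append((a, b, score))
    return flagged


def main(directory: str) -> int:
    """Check all .py files in the directory; return 1 if any pair is flagged."""
    sources = {p.name: p.read_text(encoding="utf-8") for p in Path(directory).glob("*.py")}
    flagged = suspicious_pairs(sources)
    for a, b, score in flagged:
        print(f"{a} <-> {b}: {score:.1%}")
    return 1 if flagged else 0


# Runs only when a directory argument is given, e.g. `python check.py submissions/`
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Pairwise comparison grows quadratically with the number of submissions, so this approach suits a single class or assignment more than a large archive. A flagged pair is a candidate for manual review, since short assignments with one obvious solution will often score high without any copying.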
