Duplicate Files Search & Link — A Complete Guide for Windows & macOS
Overview
Duplicate files consume disk space, cause version confusion, and complicate backups. This guide covers how to find duplicates on Windows and macOS, methods to handle them safely, and how to replace duplicates with links (hard links or symbolic links) to save space while preserving accessibility.
Key concepts
- Duplicate file: Two or more files with identical content (not necessarily the same name).
- Checksum/hash: A fingerprint (e.g., MD5, SHA-1, SHA-256) used to confirm identical content.
- Hard link: A directory entry that points to the same filesystem inode as another file. Saves space; both names are equal — deleting one leaves the other intact. Works only within the same filesystem.
- Symbolic link (symlink): A special file that points to another file path. Can cross filesystems and point to directories; deleting the target breaks the link.
- Reparse point/junction (Windows): Reparse points are the NTFS mechanism behind both junctions (directory-only links) and symbolic links (which can target files or directories).
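The difference between the two link types on macOS/Linux is easiest to see in a scratch directory. The following is a minimal sketch, not part of the original guide; the directory and file names are made up for illustration.

```shell
# Throwaway demo directory (hypothetical names throughout).
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/original.txt"

ln    "$demo_dir/original.txt" "$demo_dir/hard.txt"   # second name for the same inode
ln -s "$demo_dir/original.txt" "$demo_dir/soft.txt"   # small file that stores a path

# -ef is true when both names resolve to the same device and inode.
[ "$demo_dir/original.txt" -ef "$demo_dir/hard.txt" ] && echo "hard link shares the inode"

rm "$demo_dir/original.txt"

cat "$demo_dir/hard.txt"                                    # data is still reachable
[ -e "$demo_dir/soft.txt" ] || echo "symlink is now dangling"
```

After the original name is removed, the hard link still reads the content, while the symlink points at a path that no longer exists.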
Preparation and safety
- Back up important data before deleting or replacing files.
- Work on copies or start with a small folder to confirm behavior.
- Prefer read-only checks first: identify duplicates and review before modifying.
- Understand filesystem limits: hard links can’t cross drives; symlinks require admin/Developer Mode on Windows (or appropriate privileges).
How to find duplicates
General approach:
- Compare file sizes (and, optionally, names) to shortlist candidates; files of different sizes cannot be identical.
- Compare file hashes (e.g., SHA-256) to confirm identical content.
- Optionally perform byte-by-byte comparison for final verification.
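The size-then-hash approach above could be sketched in shell roughly as follows. The function names `sha256_of` and `find_duplicates` are hypothetical, and the sketch assumes paths contain no tab or newline characters. It picks `sha256sum` when available and falls back to `shasum -a 256`, since the two tools are not installed on every system.

```shell
# Print only the SHA-256 hex digest of a file, using whichever tool exists.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum < "$1" | awk '{print $1}'
  else
    shasum -a 256 < "$1" | awk '{print $1}'
  fi
}

# List groups of identical files under directory $1 as "hash  path" lines.
find_duplicates() {
  find "$1" -type f |
  while IFS= read -r f; do
    # Stage 1: emit "size<TAB>path" for every regular file.
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done |
  # Keep only paths whose size occurs more than once.
  awk -F'\t' '{ n[$1]++; size[NR] = $1; path[NR] = $2 }
       END { for (i = 1; i <= NR; i++) if (n[size[i]] > 1) print path[i] }' |
  while IFS= read -r f; do
    # Stage 2: hash only the shortlisted files.
    printf '%s  %s\n' "$(sha256_of "$f")" "$f"
  done |
  sort |
  # Print every line whose hash appears at least twice, once each.
  awk '$1 == last { if (!shown) print last_line; print; shown = 1; next }
       { last = $1; last_line = $0; shown = 0 }'
}
```

Calling `find_duplicates /Path/To/Scan` only reads files, so it fits the "identify and review first" stage. Hashing only files that share a size avoids reading most unique files at all.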
Tools — Windows:
- Built-in: Use PowerShell (script below) for robust, scriptable detection.
- GUI apps: AllDup, Duplicate Cleaner, dupeGuru (cross-platform), CCleaner’s duplicate finder (use cautiously).
- Command-line: fdupes (via WSL or Windows ports).
Tools — macOS:
- Built-in: Terminal commands (find, md5, shasum -a 256).
- GUI apps: Gemini 2, dupeGuru.
- Command-line: fdupes, rdfind, or custom scripts using md5 or shasum.
Example PowerShell (Windows) — list duplicates by SHA-256:
powershell
Get-ChildItem -Recurse -File "C:\Path\To\Scan" |
  Get-FileHash -Algorithm SHA256 |
  Group-Object -Property Hash |
  Where-Object { $_.Count -gt 1 } |
  ForEach-Object { $_.Group | Select-Object Path, Hash }
Example macOS shell (SHA-256):
bash
# Stock macOS ships shasum rather than sha256sum.
find /Path/To/Scan -type f -print0 | xargs -0 shasum -a 256 | sort |
  awk '$1==last { if (!shown) print last_line; print; shown=1; next }
       { last=$1; last_line=$0; shown=0 }'
How to handle duplicates safely
Options:
- Delete duplicates: Keep one canonical copy; delete others.
- Move duplicates to archive: Move to a separate folder for review before deletion.
- Replace duplicates with links: Convert duplicates into hard links or symlinks pointing to single canonical file.
When to use links:
- Use hard links when files must appear at multiple paths on the same filesystem and you want true single-storage behavior.
- Use symlinks when duplicates span different drives or when linking directories.
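On macOS/Linux, one way to apply this rule without inspecting mount points is to attempt the hard link and fall back to a symlink when the filesystem refuses it (for example across drives, or for a directory). This is a rough sketch; `link_or_symlink` is a hypothetical helper, and it assumes the target is given as an absolute path so the symlink resolves from any location.

```shell
# Hypothetical helper: try a hard link first, fall back to a symlink.
# $1 = existing target (absolute path), $2 = new link path (must not exist yet).
link_or_symlink() {
  if ln "$1" "$2" 2>/dev/null; then
    echo "hardlink"
  elif ln -s "$1" "$2"; then
    echo "symlink"
  else
    return 1
  fi
}
```

The function prints which link type it created, which is convenient for logging. Note that it also falls back to a symlink when the hard link fails for other reasons, so review the output before relying on it.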
Example PowerShell to replace duplicates with hard links:
- Identify duplicates (as above).
- For each duplicate group, choose one master file (e.g., earliest or in preferred folder).
- Remove other files and create hard links:
powershell
# Outline only: test on copies first. Scans the current directory.
$groups = Get-ChildItem -Recurse -File |
  Get-FileHash -Algorithm SHA256 |
  Group-Object Hash |
  Where-Object { $_.Count -gt 1 }

foreach ($g in $groups) {
  # Keep the first file in each group as the master copy.
  $master = $g.Group | Select-Object -First 1
  $others = $g.Group | Select-Object -Skip 1
  foreach ($f in $others) {
    # Delete the duplicate, then recreate its name as a hard link to the master.
    Remove-Item -LiteralPath $f.Path
    New-Item -ItemType HardLink -Path $f.Path -Target $master.Path
  }
}
Notes: Run on copies first. Hard links require same filesystem and appropriate permissions.
Example macOS/Linux to replace with hard links:
bash
# Using rdfind to replace duplicates with hard links:
rdfind -makesymlinks false -makehardlinks true /Path/To/Scan
Or manual:
- Compute hashes and choose master.
- Remove the duplicate and link the master in its place: rm /path/to/duplicate && ln /path/to/master /path/to/duplicate
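The manual steps leave a short window in which the duplicate's name does not exist, and they trust the hash alone. A slightly more careful variant might verify the content byte by byte, create the link under a temporary name, and then rename it over the duplicate. The function name and the temporary suffix below are hypothetical.

```shell
# Replace $2 (duplicate) with a hard link to $1 (master), if contents match.
replace_with_hardlink() {
  master=$1
  dup=$2
  # Already the same inode: nothing to do.
  [ "$master" -ef "$dup" ] && return 0
  # Final byte-by-byte check; refuse to link files that differ.
  if ! cmp -s "$master" "$dup"; then
    echo "skip (content differs): $dup" >&2
    return 1
  fi
  tmp="$dup.linktmp.$$"            # hypothetical temporary name beside the duplicate
  ln "$master" "$tmp" || return 1  # fails across filesystems; duplicate is untouched
  mv -f "$tmp" "$dup"              # rename replaces the duplicate in one step
}
```

If the `ln` step fails, the duplicate is left as it was. After a successful call the duplicate's own permissions and timestamps are gone, because both names now refer to the master's inode.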
Windows specifics
- Creating symlinks: New-Item -ItemType SymbolicLink (may require admin or Developer Mode).
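- Junctions: Use mklink /J in Command Prompt (mklink is a cmd.exe built-in, not a PowerShell command) or New-Item -ItemType Junction in PowerShell; junctions work for directories only.
- Hard links: Use mklink /H in Command Prompt or New-Item -ItemType HardLink in PowerShell.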
macOS specifics
- Use ln for hard links: ln /path/to/master /path/to/link
- Use ln -s for symlinks: ln -s /path/to/target /path/to/link
- APFS supports clones (copy-on-write), another space-efficient option: cp -c creates a clone, but other tools save space only if they explicitly support cloning.
Automation and best practices
- Exclude system folders, application directories, and cloud-synced folders you don’t control.
- Keep a log of actions (what was deleted or linked) for recovery.
- Use checksums rather than names to avoid false positives.
- Prefer moving duplicates to a quarantine folder for a retention period (e.g., 30 days) before permanent deletion.
- Test link behavior with applications that access the files to ensure compatibility.
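The quarantine and logging practices above could be combined in a small shell helper. This is a sketch under assumptions: the function name, the log file name `actions.log`, and the choice to mirror the original path under the quarantine folder are illustrative, not something the guide prescribes.

```shell
# Move $1 (absolute path) under quarantine root $2 and record the action.
quarantine() {
  file=$1
  qdir=$2
  case $file in
    /*) ;;
    *) echo "need an absolute path: $file" >&2; return 1 ;;
  esac
  dest=$qdir$file                       # mirror the original path to avoid name clashes
  mkdir -p "$(dirname "$dest")" || return 1
  mv "$file" "$dest" || return 1
  # Tab-separated log line: timestamp, action, original path, new path.
  printf '%s\tmoved\t%s\t%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$file" "$dest" \
    >> "$qdir/actions.log"
}
```

Because the original path is preserved inside the quarantine folder and in the log, restoring a file is a single `mv` back to the logged location.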
Quick decision flow
- Scan and verify duplicates with hashes.
- Move duplicates to quarantine (optional).
- If keeping multiple paths is required and same filesystem → create hard links.
- If cross-filesystem or directory linking needed → create symlinks/junctions.
- Maintain logs and backups.
Date: February 8, 2026