
PDF Layer Extractor for Industrial Diagrams

Extract colored layers from PDF industrial diagrams with white backgrounds. Automatically handles variable layer counts and antialiasing around text.

Features

  • PDF Support: Direct PDF processing at configurable DPI
  • Automatic Layer Detection: K-means clustering identifies distinct colored layers
  • Handles Antialiasing: Tolerates color mixing around text and fine details
  • Variable Layer Counts: Works with any number of colored layers, or a fixed count via -n
  • Strict White Filtering: Pure white (255,255,255) treated as background only
  • Transparent PNG Output: Each layer saved as a separate PNG with a transparent background

Installation

pip install -r requirements.txt

Quick Start

# Basic usage
python layer_extractor.py diagram.pdf

# Higher resolution
python layer_extractor.py diagram.pdf --dpi 600

# Extract to specific folder
python layer_extractor.py diagram.pdf -o my_layers/

Usage

Basic Command

python layer_extractor.py diagram.pdf

Output: output/diagram_layer1_255_000_000.png, output/diagram_layer2_000_000_255.png, etc.

Common Options

# High resolution rendering (better for detailed diagrams)
python layer_extractor.py diagram.pdf --dpi 600

# Adjust color tolerance (for antialiasing issues)
python layer_extractor.py diagram.pdf -t 40

# Extract 3 layers instead of auto-detecting the count
python layer_extractor.py diagram.pdf -n 3

# Custom output directory
python layer_extractor.py diagram.pdf -o layers/

Parameters

  • --dpi (default: 300) - PDF rendering resolution

    • 150: Draft quality, quick preview
    • 300: Standard quality, faster
    • 600: High quality, larger files
  • -t, --tolerance (default: 30) - Color matching tolerance (0-100 scale)

    • 10-15: Very strict, only nearly identical colors
    • 20-25: Strict, minimal antialiasing
    • 30: Default, handles moderate antialiasing (RECOMMENDED)
    • 40-50: Lenient, good for heavy antialiasing around text
    • 60+: Very lenient, may blur layer boundaries
  • -n, --n-layers (default: auto-detect) - Number of layers to extract; skips auto-detection

  • -m, --min-pixels (default: 100) - Minimum pixel count for a color to be kept as a layer
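To make the option list above concrete, here is a minimal argparse sketch that mirrors the documented flags and defaults. It is a hypothetical reconstruction, not the tool's actual source: the function name build_parser, the dest name output_dir, and the choice of a float tolerance are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI; not the real layer_extractor.py source.
    parser = argparse.ArgumentParser(
        prog="layer_extractor.py",
        description="Extract colored layers from a PDF diagram.",
    )
    parser.add_argument("pdf", help="input PDF file")
    # Only the short -o flag is documented; dest name is an assumption.
    parser.add_argument("-o", dest="output_dir", default="output",
                        help="output directory (default: output)")
    parser.add_argument("--dpi", type=int, default=300,
                        help="PDF rendering resolution")
    parser.add_argument("-t", "--tolerance", type=float, default=30,
                        help="color matching tolerance on a 0-100 scale")
    parser.add_argument("-n", "--n-layers", type=int, default=None,
                        help="number of layers (default: auto-detect)")
    parser.add_argument("-m", "--min-pixels", type=int, default=100,
                        help="minimum pixel count for a valid layer")
    return parser


if __name__ == "__main__":
    # Equivalent to: python layer_extractor.py diagram.pdf --dpi 600 -t 40
    args = build_parser().parse_args(["diagram.pdf", "--dpi", "600", "-t", "40"])
    print(args.dpi, args.tolerance, args.n_layers, args.min_pixels)
```

Leaving n_layers as None is one simple way to signal "auto-detect" to the rest of the program.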

How It Works

  1. PDF Rendering: Converts PDF to high-resolution image at specified DPI
  2. Color Analysis: Uses K-means clustering on pixel colors
  3. White Filtering: Removes the white background (pixels with every RGB channel ≥ 250)
  4. Layer Extraction: For each detected color, creates a mask of pixels within the color tolerance (-t)
  5. Alpha Blending: Gives partially matching (antialiased) pixels partial transparency instead of a hard cutoff
  6. Output: Saves each layer as transparent PNG
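Steps 2 and 3 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the tool's code: it uses a plain Lloyd's k-means written in NumPy as a stand-in for scikit-learn's KMeans, applies the ≥ 250 near-white rule from step 3, and takes the layer count k explicitly (as with -n), because the README does not say how the count is auto-detected. The helper names nonwhite_pixels and kmeans_colors are hypothetical.

```python
import numpy as np


def nonwhite_pixels(rgb: np.ndarray, threshold: int = 250) -> np.ndarray:
    """Return an (N, 3) float array of pixels that are not near-white.

    A pixel counts as background when every channel is >= threshold.
    """
    flat = rgb.reshape(-1, 3)
    keep = ~np.all(flat >= threshold, axis=1)
    return flat[keep].astype(np.float64)


def kmeans_colors(pixels: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means over RGB values (stand-in for sklearn's KMeans).

    Requires at least k distinct colors in `pixels`.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centres from k distinct colors present in the image.
    unique = np.unique(pixels, axis=0)
    centers = unique[rng.choice(len(unique), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest centre by Euclidean RGB distance.
        dist = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its pixels (keep it if the cluster is empty).
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers.round().astype(int)
```

The distance matrix holds N × k entries, so on a full 300 DPI page it is more practical to cluster a random subsample of pixels and only then build the per-layer masks.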

Output Format

Files are named: {pdf_name}_layer{N}_{R}_{G}_{B}.png

Example:

output/
├── piping_diagram_layer1_220_050_050.png  (Red layer)
├── piping_diagram_layer2_050_100_220.png  (Blue layer)
└── piping_diagram_layer3_050_180_050.png  (Green layer)
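The naming pattern is easy to reproduce when post-processing the output, for example to match files back to colors. A minimal sketch, assuming the zero-padded three-digit channels seen in the example above; layer_filename is a hypothetical helper, not part of the tool.

```python
from pathlib import Path


def layer_filename(pdf_path, index, rgb):
    """Build {pdf_name}_layer{N}_{R}_{G}_{B}.png with zero-padded channels."""
    stem = Path(pdf_path).stem  # "drawings/piping_diagram.pdf" -> "piping_diagram"
    r, g, b = rgb
    return f"{stem}_layer{index}_{r:03d}_{g:03d}_{b:03d}.png"


print(layer_filename("piping_diagram.pdf", 1, (220, 50, 50)))
# → piping_diagram_layer1_220_050_050.png
```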

Troubleshooting

Colors bleeding between layers (antialiasing issue)

Problem: Gray antialiasing pixels end up in the wrong layer, especially around black text on gray layers

Explanation: When black text (0,0,0) sits on a gray layer (150,150,150), antialiasing creates intermediate grays such as (75,75,75) and (100,100,100) that are far from both the black text and the layer gray in color space.
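A quick worked check of those distances, assuming plain Euclidean distance in RGB space (the README does not state which metric the tool uses):

```python
import math


def rgb_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


black = (0, 0, 0)
gray_layer = (150, 150, 150)
print(round(rgb_distance(black, gray_layer), 1))  # → 259.8
for mixed in [(75, 75, 75), (100, 100, 100)]:
    print(mixed, round(rgb_distance(mixed, black), 1), round(rgb_distance(mixed, gray_layer), 1))
# → (75, 75, 75) 129.9 129.9
# → (100, 100, 100) 173.2 86.6
```

The midpoint gray is about 130 units from both endpoints, half of the full black-to-gray span, so a tight tolerance leaves it unassigned or assigns it arbitrarily.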

Solution: Increase tolerance to capture these intermediate colors

# For moderate antialiasing (default, usually works)
python layer_extractor.py diagram.pdf -t 30

# For heavy antialiasing (small text, compressed PDFs)
python layer_extractor.py diagram.pdf -t 45

# For extreme cases (very compressed or low quality)
python layer_extractor.py diagram.pdf -t 60
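Why a higher -t helps can be shown with a small mask-and-alpha sketch of steps 4 and 5. It rests on a loud assumption: the README only says the tolerance is on a 0-100 scale, so here it is treated as a percentage of the maximum RGB distance (√3 × 255), with alpha fading linearly from fully opaque at an exact match to transparent at the threshold. layer_rgba is a hypothetical helper, not the tool's implementation.

```python
import numpy as np

# Distance from (0,0,0) to (255,255,255).
MAX_RGB_DISTANCE = np.sqrt(3) * 255


def layer_rgba(rgb: np.ndarray, layer_color, tolerance: float) -> np.ndarray:
    """Build an RGBA image for one layer.

    ASSUMPTION: tolerance (0-100) is a percentage of MAX_RGB_DISTANCE.
    Pixels beyond the threshold get alpha 0; closer pixels fade linearly
    from 255 (exact match) to 0 (at the threshold).
    """
    threshold = tolerance / 100.0 * MAX_RGB_DISTANCE
    diff = rgb.astype(np.float64) - np.asarray(layer_color, dtype=np.float64)
    dist = np.linalg.norm(diff, axis=-1)
    if threshold > 0:
        alpha = np.clip(1.0 - dist / threshold, 0.0, 1.0)
    else:
        alpha = (dist == 0).astype(np.float64)  # tolerance 0: exact matches only
    out = np.zeros(rgb.shape[:2] + (4,), dtype=np.uint8)
    out[..., :3] = layer_color
    out[..., 3] = np.round(alpha * 255).astype(np.uint8)
    return out


# Layer gray, two antialiasing grays, and black text, as in the explanation above.
row = np.array([[(150, 150, 150), (100, 100, 100), (75, 75, 75), (0, 0, 0)]], dtype=np.uint8)
for t in (20, 30):
    print(t, layer_rgba(row, (150, 150, 150), t)[0, :, 3].tolist())
```

Under this assumption, raising the tolerance from 20 to 30 pulls the (75,75,75) edge pixel into the gray layer as a faint, mostly transparent pixel while the black text stays excluded.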

Missing fine details

Problem: Thin lines or small text not captured

Solution: Increase tolerance or DPI

python layer_extractor.py diagram.pdf -t 40 --dpi 600
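The DPI part of this fix follows from simple arithmetic: PDF dimensions are in points (1/72 inch), so a stroke covers width × DPI / 72 pixels. The 0.25 pt hairline below is a hypothetical example width, not a value from the tool.

```python
def stroke_width_px(width_pt: float, dpi: int) -> float:
    """Rendered width in pixels of a stroke given in PDF points (1/72 inch)."""
    return width_pt * dpi / 72.0


for dpi in (150, 300, 600):
    print(dpi, round(stroke_width_px(0.25, dpi), 2))
# → 150 0.52
# → 300 1.04
# → 600 2.08
```

Below about one pixel, a line is rendered only as faint antialiased color, which can fall outside the tolerance. At 600 DPI the same line spans roughly two full pixels.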

Too many layers detected

Problem: Small color artifacts creating extra layers

Solution: Increase minimum pixel threshold

python layer_extractor.py diagram.pdf -m 500
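The effect of -m can be pictured as a filter over per-cluster pixel counts. A minimal sketch, assuming each pixel already carries a cluster label; valid_layers and the gray artifact color are hypothetical.

```python
from collections import Counter


def valid_layers(labels, centers, min_pixels=100):
    """Return (color, pixel_count) for clusters with at least min_pixels pixels, largest first."""
    counts = Counter(labels)
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(centers[j], n) for j, n in ranked if n >= min_pixels]


centers = [(220, 50, 50), (50, 100, 220), (128, 128, 128)]  # last one: a small artifact
labels = [0] * 5000 + [1] * 3000 + [2] * 120
print(valid_layers(labels, centers, min_pixels=100))  # default: artifact kept as a layer
print(valid_layers(labels, centers, min_pixels=500))  # -m 500: artifact dropped
```

Because pixel counts grow with DPI squared, a threshold that works at 300 DPI may need raising at 600 DPI.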

Blurry output

Problem: Output images look soft or pixelated when viewed at full size

Solution: Increase DPI

python layer_extractor.py diagram.pdf --dpi 600
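Raising DPI has a cost: the rendered image scales by DPI / 72 in each dimension, so doubling DPI quadruples the pixel count. A small sketch using a US Letter page (612 × 792 pt) as the example; render_size is a hypothetical helper, and a real renderer may round edge pixels differently.

```python
def render_size(width_pt: float, height_pt: float, dpi: int):
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


for dpi in (300, 600):
    w, h = render_size(612, 792, dpi)
    print(dpi, (w, h), w * h)
# → 300 (2550, 3300) 8415000
# → 600 (5100, 6600) 33660000
```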

Examples

Standard industrial diagram

python layer_extractor.py electrical_schematic.pdf

High-detail mechanical drawing

python layer_extractor.py mechanical_drawing.pdf --dpi 600 -t 25

Diagram with exactly 4 known layers

python layer_extractor.py hvac_diagram.pdf -n 4

Compressed/low-quality PDF

python layer_extractor.py scanned_diagram.pdf -t 50 --dpi 300

Tips

  1. Start with defaults - They work for most diagrams
  2. Check first - Run once and review output before batch processing
  3. DPI vs File Size - Higher DPI gives sharper output but larger files (doubling DPI quadruples the pixel count)
  4. Tolerance tuning - Adjust in steps of 5-10
  5. Layer count - Use -n when you know the exact number of layers; skipping auto-detection is faster

Requirements

  • Python 3.7+
  • PyMuPDF (PDF rendering)
  • Pillow (image processing)
  • NumPy (array operations)
  • scikit-learn (color clustering)