Initial commit
label/cores/file/SUMMARY.md
Normal file
@@ -0,0 +1,120 @@
# PDF Layer Extractor - Summary

## What It Does

Extracts colored layers from PDF industrial diagrams into separate transparent PNG files.

✓ Single PDF file processing
✓ White background filtered (pixels with all RGB channels ≥ 250)
✓ Variable number of layers (auto-detected)
✓ Handles antialiasing around text
✓ High-quality output at configurable DPI

## Quick Start

1. Install dependencies:
```bash
pip install -r requirements.txt
```

2. Run on your PDF:
```bash
python layer_extractor.py your_diagram.pdf
```

3. Find layers in `output/` folder

## Key Features

### Automatic Color Detection
Uses K-means clustering to identify distinct colored layers. Pixels whose R, G, and B values are all ≥ 250 are treated as white background and are never extracted as a layer.
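
The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names (`is_background`, `detect_layer_colors`) are hypothetical, the number of clusters `k` is passed in rather than auto-detected, and the real script presumably uses scikit-learn's K-means rather than this hand-rolled loop.

```python
import random

WHITE_THRESHOLD = 250  # from this summary: all channels >= 250 means background


def is_background(pixel, threshold=WHITE_THRESHOLD):
    """True when every RGB channel is at or above the white threshold."""
    return all(channel >= threshold for channel in pixel)


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_layer_colors(pixels, k, iterations=20, seed=0):
    """Hypothetical helper: drop background pixels, then run a small K-means."""
    foreground = [p for p in pixels if not is_background(p)]
    rng = random.Random(seed)
    # Seed the centers with k distinct colors so no cluster starts as a duplicate.
    # Raises ValueError if the image has fewer than k distinct foreground colors.
    centers = rng.sample(sorted(set(foreground)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in foreground:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        centers = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else center
            for center, members in zip(centers, clusters)
        ]
    return centers
```

Filtering the background before clustering keeps the large white area from claiming a cluster of its own.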

### Antialiasing Handling
The tolerance parameter (default 30) handles color mixing:
- Text antialiasing creates gray pixels around black text
- The tolerance value captures these gradual color transitions
- Each pixel gets an alpha value based on its distance from the layer's target color
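
One way to turn distance into alpha is a linear falloff, sketched below. The exact mapping used by the script is not stated in this summary, so the linear ramp and the function name `layer_alpha` are assumptions made for illustration.

```python
import math


def layer_alpha(pixel, target, tolerance=30):
    """Map the RGB distance from the target color to an alpha value in 0..255.

    Assumed mapping: fully opaque at distance 0, fading linearly to
    fully transparent at distance >= tolerance.
    """
    distance = math.dist(pixel, target)  # Euclidean distance in RGB space
    if distance >= tolerance:
        return 0
    return round(255 * (1 - distance / tolerance))
```

With this mapping an exact color match is fully opaque, and antialiased edge pixels fade out gradually instead of being cut off at a hard boundary.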

### Output Format
Files named: `diagram_layerN_RRR_GGG_BBB.png`
- Transparent PNG with only that color layer
- RGB values in filename for reference
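
The naming pattern can be reproduced with a small formatter. This sketch assumes that `diagram` stands for the input file's stem and that each channel is zero-padded to three digits, as the `RRR_GGG_BBB` placeholder suggests; `layer_filename` is a hypothetical name.

```python
def layer_filename(stem, index, rgb):
    """Build an output name such as 'diagram_layer1_220_050_050.png'."""
    r, g, b = rgb
    # :03d zero-pads each channel so names sort and align consistently.
    return f"{stem}_layer{index}_{r:03d}_{g:03d}_{b:03d}.png"
```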

## Common Usage

```bash
# Default (works for most diagrams)
python layer_extractor.py diagram.pdf

# High quality
python layer_extractor.py diagram.pdf --dpi 600

# Strict color separation (less antialiasing bleed)
python layer_extractor.py diagram.pdf -t 20

# Lenient (more antialiasing tolerance)
python layer_extractor.py diagram.pdf -t 40

# Extract top 3 layers only
python layer_extractor.py diagram.pdf -n 3

# Custom output folder
python layer_extractor.py diagram.pdf -o my_layers/
```

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `--dpi` | 300 | PDF rendering resolution (150/300/600) |
| `-t, --tolerance` | 30 | Color matching tolerance (15-50 typical) |
| `-n, --n-layers` | auto | Number of layers to extract |
| `-m, --min-pixels` | 100 | Minimum pixels for valid layer |
| `-o, --output` | output | Output directory |
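
For reference, a command line matching this table could be declared with `argparse` roughly as follows. The flags and defaults come from the table; the positional name `pdf`, the integer types, and the use of `None` to represent "auto" are assumptions, and the actual script may be organized differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameter table."""
    parser = argparse.ArgumentParser(description="Extract colored layers from a PDF.")
    parser.add_argument("pdf", help="input PDF file (positional name is assumed)")
    parser.add_argument("--dpi", type=int, default=300,
                        help="PDF rendering resolution")
    parser.add_argument("-t", "--tolerance", type=int, default=30,
                        help="color matching tolerance")
    parser.add_argument("-n", "--n-layers", type=int, default=None,
                        help="number of layers to extract (None means auto-detect)")
    parser.add_argument("-m", "--min-pixels", type=int, default=100,
                        help="minimum pixels for a valid layer")
    parser.add_argument("-o", "--output", default="output",
                        help="output directory")
    return parser
```

Parsing `["diagram.pdf", "-t", "20"]` with this parser yields `tolerance=20` and leaves every other option at its default from the table.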

## Tolerance Guide

The tolerance parameter is key to handling antialiasing:

- **15-20**: Very strict; for clean diagrams with no antialiasing
- **30** (default): Balanced; handles moderate antialiasing
- **40-50**: Lenient; for heavy antialiasing or compression artifacts

### Example: Gray Layer with Black Text

When you have a light gray layer with black text:
- Black text creates gray antialiasing pixels
- These gray pixels are close to the gray layer color
- Higher tolerance includes them in the gray layer
- Lower tolerance might miss them

Start with the default (30) and adjust by ±10 based on the results.
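
To make the gray-layer case concrete, the sketch below checks one antialiasing pixel against several tolerance values. The colors are invented for illustration (a layer at RGB 200 and a fringe pixel at RGB 180), and it assumes a pixel belongs to a layer when its Euclidean RGB distance is below the tolerance.

```python
import math

GRAY_LAYER = (200, 200, 200)    # illustrative light gray layer color
FRINGE_PIXEL = (180, 180, 180)  # illustrative antialiasing pixel near black text


def included_at(tolerances, pixel=FRINGE_PIXEL, target=GRAY_LAYER):
    """Report, per tolerance, whether the pixel falls inside the layer."""
    distance = math.dist(pixel, target)  # sqrt(3 * 20**2), about 34.6
    return {t: distance < t for t in tolerances}


print(included_at((20, 30, 40)))  # {20: False, 30: False, 40: True}
```

Here the fringe pixel is missed at 20 and 30 but captured at 40, which is the kind of result that suggests raising the tolerance by 10.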

## Files Included

1. **layer_extractor.py** - Main script
2. **requirements.txt** - Dependencies (PyMuPDF, Pillow, numpy, scikit-learn)
3. **README.md** - Full documentation
4. **QUICKSTART.md** - Quick reference guide

## Technical Notes

- Uses PyMuPDF to render PDF at specified DPI
- K-means clustering identifies dominant colors
- Euclidean distance in RGB space for color matching
- Alpha channel gradient for smooth edges
- White detection: all RGB values ≥ 250
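
The DPI setting determines how many pixels the later steps have to process. PDF page sizes are expressed in points (1/72 inch), so the rendered size scales by `dpi / 72`. The helper below is a hypothetical back-of-the-envelope calculation, not part of the script, and the renderer's own rounding may differ by a pixel.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def render_size(width_pt, height_pt, dpi=300):
    """Approximate pixel dimensions of a page rendered at the given DPI."""
    zoom = dpi / POINTS_PER_INCH
    return round(width_pt * zoom), round(height_pt * zoom)


# An A4 page is roughly 595 x 842 points.
print(render_size(595, 842, dpi=300))  # (2479, 3508)
```

Doubling the DPI quadruples the pixel count, so `--dpi 600` produces sharper layers at the cost of noticeably more memory and clustering time.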

## Example Output

Input: `piping_diagram.pdf`
Output:
```
output/
├── piping_diagram_layer1_220_050_050.png (red piping)
├── piping_diagram_layer2_050_100_220.png (blue electrical)
├── piping_diagram_layer3_150_150_150.png (gray annotations)
└── piping_diagram_layer4_050_180_050.png (green mechanical)
```

Each PNG has a transparent background with only that color layer visible.