Tolerance Parameter Fix - Update Notes
What Was Wrong
The original tolerance parameter used raw Euclidean distance in RGB space (0-255 scale), which was unintuitive:
- Max possible distance in RGB = sqrt(3 × 255²) ≈ 441
- A tolerance of "30" was actually very strict (only ~7% of max distance)
- To capture antialiasing around text, you needed values of 150 or more, which wasn't obvious
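The arithmetic behind these figures can be checked with a short sketch. The names `MAX_RGB_DISTANCE` and `raw_to_percent` are illustrative and are not taken from `layer_extractor.py`.

```python
import math

# Largest possible Euclidean distance in 8-bit RGB:
# black (0, 0, 0) to white (255, 255, 255).
MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


def raw_to_percent(raw_tolerance):
    """Express an old-style raw tolerance as a percentage of the maximum distance.

    Hypothetical helper, for illustration only.
    """
    return 100 * raw_tolerance / MAX_RGB_DISTANCE


print(round(MAX_RGB_DISTANCE, 1))      # 441.7
print(round(raw_to_percent(30), 1))    # 6.8
print(round(raw_to_percent(150), 1))   # 34.0
```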
What's Fixed
New Scale: 0-100 (percentage-based)
- 0 = exact color match only
- 30 = 30% of maximum color distance (default, RECOMMENDED)
- 100 = maximum tolerance (every color matches)
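A minimal sketch of how a percentage tolerance could map onto a color match, assuming the tool compares Euclidean RGB distance against a threshold scaled from the 0-100 value. The function names and the inclusive `<=` comparison are assumptions, not the actual code in `layer_extractor.py`.

```python
import math

# Distance from black to white, the largest distance two RGB colors can have.
MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


def color_distance(a, b):
    """Euclidean distance between two (r, g, b) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def matches(pixel, target, tolerance_percent=30):
    """Return True if pixel is within tolerance_percent of target.

    Hypothetical helper: the inclusive comparison is an assumption.
    """
    threshold = tolerance_percent / 100 * MAX_RGB_DISTANCE
    return color_distance(pixel, target) <= threshold


gray = (150, 150, 150)
print(matches(gray, gray, 0))                    # True: exact match only
print(matches((151, 150, 150), gray, 0))         # False: any difference fails at 0
print(matches((0, 0, 0), (255, 255, 255), 100))  # True: 100 accepts every color
```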
Why This Matters for Antialiasing:
Example: Gray layer (150,150,150) with black text (0,0,0)
- Antialiasing creates intermediate colors: (75,75,75), (100,100,100), (125,125,125)
- Distance from gray (150,150,150) to (75,75,75) = sqrt(3 × 75²) ≈ 130
- Old scale: You'd need tolerance ~130 (not intuitive)
- New scale: 130 / 441 ≈ 29%, so a tolerance of 30-45 captures these (makes sense!)
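The example above can be worked through for each intermediate color. This sketch computes the smallest percentage tolerance that would still capture a given pixel; `required_tolerance` is an illustrative name, not part of the tool.

```python
import math

MAX_RGB_DISTANCE = math.sqrt(3 * 255 ** 2)


def required_tolerance(pixel, target):
    """Smallest 0-100 tolerance at which pixel would still match target.

    Hypothetical helper, assuming Euclidean distance in RGB space.
    """
    distance = math.sqrt(sum((p - t) ** 2 for p, t in zip(pixel, target)))
    return 100 * distance / MAX_RGB_DISTANCE


layer = (150, 150, 150)  # gray layer color from the example
for shade in [(125, 125, 125), (100, 100, 100), (75, 75, 75)]:
    print(shade, round(required_tolerance(shade, layer), 1))
# (125, 125, 125) 9.8
# (100, 100, 100) 19.6
# (75, 75, 75) 29.4
```

All three shades fall under 30, which is consistent with the default capturing them. Darker antialiased pixels would need a higher value.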
Updated Recommendations
# Default - good for most diagrams
python layer_extractor.py diagram.pdf -t 30
# Heavy antialiasing (small text, complex diagrams)
python layer_extractor.py diagram.pdf -t 45
# Extreme antialiasing (compressed PDFs, low quality)
python layer_extractor.py diagram.pdf -t 60
# Very strict (clean diagrams, no antialiasing)
python layer_extractor.py diagram.pdf -t 15
Key Point
If you see missing pixels around text or edges → INCREASE tolerance (not decrease!)
The antialiased pixels are "far" from the target color in RGB space, so they need higher tolerance to be captured.
Test Your Diagram
Start with default (30), then:
- Missing pixels/gaps around text? → Try 45
- Still missing details? → Try 60
- Layers bleeding together? → Try 20
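The escalation steps above could be scripted as a sweep. This sketch only builds the command lines, using the `-t` flag shown in these notes. `build_sweep_commands` is a hypothetical helper. These notes do not say whether successive runs overwrite each other's output, so check that before running them back to back.

```python
def build_sweep_commands(pdf_path, tolerances=(30, 45, 60)):
    """Return one argv list per tolerance value, in the order given."""
    return [
        ["python", "layer_extractor.py", pdf_path, "-t", str(t)]
        for t in tolerances
    ]


for argv in build_sweep_commands("diagram.pdf"):
    # Each list could be passed to subprocess.run(argv, check=True) to execute it.
    print(" ".join(argv))
```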