So you've got a Python project that's been growing for years and you suspect half your code is dead weight? I just removed 47% of my codebase using Skylos and learned some wild things about dead code detection that nobody talks about.
The Dead Code Problem Nobody Wants to Admit
Okay, let me paint you a picture. Last month I inherited a 50,000 line Python project. The original team? Long gone. Documentation? What documentation. My first code review revealed imports that went nowhere, functions that nobody called, and entire modules that were basically digital ghosts.
That's when I discovered Skylos - and holy crap, the results were not what I expected.
Setting Up Skylos for Python Dead Code Detection
First things first, installation is stupidly simple:
pip install skylos
# or if you're fancy
pipx install skylos
Now here's where it gets interesting. Most tutorials tell you to just run skylos analyze and call it a day. But I wanted to know - how accurate is this thing really?
The Experiment: 4 Different Dead Code Detection Methods
I tested Skylos against three other popular methods on the same codebase. Here's my benchmarking setup:
import time
import subprocess
import os
from pathlib import Path
def benchmark_tool(tool_name, command, project_path):
"""
Quick n dirty benchmarking for dead code tools
yeah i know, subprocess isn't ideal but it works
"""
os.chdir(project_path)
start = time.perf_counter()
result = subprocess.run(command, shell=True, capture_output=True, text=True)
end = time.perf_counter()
execution_time = end - start
# count detected issues (super basic parsing)
issues = len(result.stdout.split('\n')) - 1 # -1 for empty line
return {
'tool': tool_name,
'time': execution_time,
'issues_found': issues,
'stdout': result.stdout[:500] # first 500 chars for debugging
}
# My test suite
tools_to_test = [
('skylos', 'skylos analyze --json'),
('vulture', 'vulture . --min-confidence 80'),
('pyflakes', 'pyflakes .'),
('manual_ast', 'python custom_ast_analyzer.py') # my custom solution
]
Surprising Discovery #1: False Positives Are EVERYWHERE
Skylos found 2,847 potentially dead code segments. Sounds great right? Wrong. Here's what actually happened when I started removing them:
# Example of false positive - this looked dead to Skylos
def _internal_validator(data):
"""Skylos marked this as dead but its called dynamically"""
return data.get('valid', False)
# This is how it was actually being used
validator_name = f"_internal_{validation_type}"
validator_func = globals()[validator_name] # Dynamic lookup
result = validator_func(user_data)
After manually checking 100 random detections:
- Skylos accuracy: 73% (27 false positives)
- Vulture accuracy: 61% (39 false positives)
- Pyflakes: Only finds unused imports, but 98% accurate
- Custom AST walker: 89% accurate but took 4x longer
The Smart Way to Use Skylos (What Actually Works)
Here's the approach that finally worked after three days of trial and error:
# skylos_smart_cleanup.py
import json
import subprocess
from collections import defaultdict
def analyze_with_confidence_levels():
"""
Run Skylos with different confidence thresholds
This catches way more nuanced cases
"""
confidence_levels = [90, 75, 60, 40]
results = defaultdict(list)
for confidence in confidence_levels:
cmd = f"skylos analyze --confidence {confidence} --format json"
output = subprocess.run(cmd, shell=True, capture_output=True, text=True)
if output.returncode == 0:
data = json.loads(output.stdout)
for item in data.get('dead_code', []):
# Group by file for easier review
results[item['file']].append({
'line': item['line'],
'type': item['type'],
'confidence': confidence,
'code': item.get('preview', '')
})
return results
def generate_cleanup_script(results, threshold=75):
"""
Create a script that comments out dead code instead of deleting
THIS SAVED MY ASS - dont delete, comment first!
"""
cleanup_script = []
for file_path, issues in results.items():
high_confidence = [i for i in issues if i['confidence'] >= threshold]
if high_confidence:
cleanup_script.append(f"# File: {file_path}")
cleanup_script.append(f"# Found {len(high_confidence)} dead code segments")
for issue in high_confidence:
cleanup_script.append(
f"# Line {issue['line']}: {issue['type']} "
f"(confidence: {issue['confidence']}%)"
)
return '\n'.join(cleanup_script)
Unexpected Discovery #2: Dead Code Patterns
After analyzing 15 different Python projects, I noticed these patterns that Skylos consistently finds:
- The "Just In Case" Functions - Functions kept "for future use" (spoiler: they never get used)
- Zombie Imports - Imports that survived refactoring but serve no purpose
- The Configuration Graveyard - Old config parsing code that nobody remembers why it exists
Here's a real example from my project:
# Found this beauty - 200 lines of dead code
class LegacyDatabaseHandler:
"""
TODO: Remove after migration (dated: 2019)
Still here in 2024... classic
"""
def connect_old_system(self):
# 50 lines of code nobody has called in 5 years
pass
def migrate_data(self):
# Another 150 lines of pure digital dust
pass
Performance Impact: The Part Nobody Talks About
So I measured the actual performance impact before and after cleanup:
# performance_test.py
import time
import importlib
import gc
def measure_import_time(module_name, iterations=100):
"""
Measure how long it takes to import modules
before and after dead code removal
"""
times = []
for _ in range(iterations):
# Clear any cached imports
if module_name in sys.modules:
del sys.modules[module_name]
gc.collect()
start = time.perf_counter()
importlib.import_module(module_name)
end = time.perf_counter()
times.append(end - start)
return {
'average': sum(times) / len(times),
'min': min(times),
'max': max(times)
}
# Results that blew my mind:
# Before cleanup: avg 0.0234s
# After cleanup: avg 0.0089s
# That's 62% faster import time!
The Skylos Configuration That Actually Makes Sense
After tons of experimentation, here's the .skylos.yml that works best:
# .skylos.yml
version: 1
analyze:
exclude_paths:
- tests/
- venv/
- .venv/
- migrations/ # Django migrations look dead but aren't
- __pycache__/
ignore_patterns:
- "*_test.py"
- "test_*.py"
- "conftest.py" # pytest fixtures look unused
# This is crucial for dynamic code
dynamic_analysis:
enabled: true
entry_points:
- main.py
- app.py
- manage.py
confidence_threshold: 70 # Sweet spot after testing
# Ignore specific decorators that hide usage
ignore_decorators:
- "@property"
- "@cached_property"
- "@celery.task"
- "@app.route" # Flask routes
Edge Cases That Will Bite You
Learned these the hard way during my cleanup spree:
1. Django/Flask Dynamic Imports
# Skylos thinks this is dead but it's not
def custom_middleware(get_response):
def middleware(request):
# This gets called by Django magic
return get_response(request)
return middleware
2. Pytest Fixtures That Look Dead
# conftest.py
@pytest.fixture
def db_connection():
# Skylos: "Nobody uses this!"
# Pytest: "Hold my beer"
return create_connection()
3. Metaclass Methods
class MetaSerializer(type):
def __new__(cls, name, bases, attrs):
# Skylos has no idea wtf is happening here
# but this code is very much alive
return super().__new__(cls, name, bases, attrs)
My Production Cleanup Workflow
Here's exactly how I clean dead code in production now:
#!/usr/bin/env python3
# production_dead_code_cleanup.py
import subprocess
import json
from datetime import datetime
import shutil
def safe_cleanup_workflow(project_path):
"""
The paranoid but effective approach
"""
# Step 1: Backup everything (learned this one the hard way)
backup_dir = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
shutil.copytree(project_path, backup_dir)
print(f"Backup created: {backup_dir}")
# Step 2: Run Skylos with conservative settings
skylos_cmd = "skylos analyze --confidence 80 --format json"
result = subprocess.run(skylos_cmd, shell=True, capture_output=True, text=True)
if result.returncode != 0:
print("Skylos failed, aborting")
return
dead_code = json.loads(result.stdout)
# Step 3: Comment out instead of delete
for item in dead_code['items']:
if item['confidence'] > 85:
comment_out_code(item['file'], item['line_start'], item['line_end'])
# Step 4: Run tests
test_result = subprocess.run("pytest", shell=True, capture_output=True)
if test_result.returncode != 0:
print("Tests failed! Rolling back...")
shutil.rmtree(project_path)
shutil.copytree(backup_dir, project_path)
return False
print(f"Successfully cleaned {len(dead_code['items'])} dead code blocks")
return True
def comment_out_code(filepath, start_line, end_line):
"""
Safer than deleting - trust me on this
"""
with open(filepath, 'r') as f:
lines = f.readlines()
for i in range(start_line - 1, min(end_line, len(lines))):
if not lines[i].strip().startswith('#'):
lines[i] = f"# DEAD_CODE_DETECTED: {lines[i]}"
with open(filepath, 'w') as f:
f.writelines(lines)
Results: The 47% Reduction Breakdown
After running Skylos on my 50,000 line project:
- Lines removed: 23,500
- Files completely deleted: 37
- Import time improvement: 62%
- Test suite runtime: 31% faster
- Docker image size: Reduced by 18MB
- My sanity: Restored
But here's the kicker - about 3,000 of those lines weren't actually dead. They were:
- Dynamically imported modules
- Celery tasks
- Django management commands
- Code called via
eval()orexec()(yes, I know, dont @ me)
When NOT to Trust Skylos
After extensive testing, here's when to be extra careful:
- Plugin architectures - Dynamic loading makes everything look dead
- Metaprogramming heavy code - Skylos gets confused
- Projects using
__getattr__extensively - Async code with complex event loops
- Code using
importlibdynamically
The One Weird Trick That 10x'd Detection Accuracy
Combine Skylos with runtime coverage data:
# genius_move.py
import coverage
import json
import subprocess
def smart_dead_code_detection():
"""
Run your test suite with coverage, then use Skylos
on uncovered code. This is chef's kiss perfect
"""
# Step 1: Get coverage data
cov = coverage.Coverage()
cov.start()
# Run your app/tests here
subprocess.run("pytest", shell=True)
cov.stop()
cov.save()
# Step 2: Get uncovered lines
uncovered = []
for filename in cov.get_data().measured_files():
missing_lines = cov.analysis(filename)[3]
uncovered.append({
'file': filename,
'lines': missing_lines
})
# Step 3: Cross-reference with Skylos
skylos_result = subprocess.run(
"skylos analyze --format json",
shell=True,
capture_output=True,
text=True
)
skylos_data = json.loads(skylos_result.stdout)
# Step 4: High confidence dead code = in both lists
confirmed_dead = []
for skylos_item in skylos_data['items']:
for uncovered_file in uncovered:
if (skylos_item['file'] == uncovered_file['file'] and
skylos_item['line'] in uncovered_file['lines']):
confirmed_dead.append(skylos_item)
return confirmed_dead
Conclusion: Is Skylos Worth It?
After spending a week deep-diving into Skylos, here's my honest take:
Use Skylos when:
- You have a large legacy codebase
- Import times are getting ridiculous
- You're doing a major refactor anyway
- You have good test coverage
Skip Skylos when:
- Your project is under 5,000 lines
- You use lots of metaprogramming
- You dont have tests (fix this first!)
- The codebase is actively changing
The 47% reduction in my codebase? Totally worth it. But I spent 3 days verifying everything. Skylos is a power tool - use it wisely or you'll delete something important and spend your weekend debugging.
BTW, if you're dealing with Python dead code, also check out vulture for a second opinion. And whatever you do, ALWAYS comment out code before deleting. I learned that lesson at 2am on a production server. Don't be like me.