Skylos Dead Code Detection in Python: 47% Smaller Codebase After This Experiment

So you've got a Python project that's been growing for years and you suspect half your code is dead weight? I just removed 47% of my codebase using Skylos and learned some wild things about dead code detection that nobody talks about.

The Dead Code Problem Nobody Wants to Admit

Okay, let me paint you a picture. Last month I inherited a 50,000 line Python project. The original team? Long gone. Documentation? What documentation. My first code review revealed imports that went nowhere, functions that nobody called, and entire modules that were basically digital ghosts.

That's when I discovered Skylos - and holy crap, the results were not what I expected.

Setting Up Skylos for Python Dead Code Detection

First things first, installation is stupidly simple:

pip install skylos
# or if you're fancy
pipx install skylos

Now here's where it gets interesting. Most tutorials tell you to just run skylos analyze and call it a day. But I wanted to know - how accurate is this thing really?

The Experiment: 4 Different Dead Code Detection Methods

I tested Skylos against three other popular methods on the same codebase. Here's my benchmarking setup:

import time
import subprocess
import os
from pathlib import Path

def benchmark_tool(tool_name, command, project_path):
    """
    Quick n dirty benchmarking for dead code tools
    yeah i know, subprocess isn't ideal but it works
    """
    os.chdir(project_path)
    
    start = time.perf_counter()
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    end = time.perf_counter()
    
    execution_time = end - start
    
    # count detected issues (super basic parsing)
    issues = len(result.stdout.split('\n')) - 1  # -1 for empty line
    
    return {
        'tool': tool_name,
        'time': execution_time,
        'issues_found': issues,
        'stdout': result.stdout[:500]  # first 500 chars for debugging
    }

# My test suite
tools_to_test = [
    ('skylos', 'skylos analyze --json'),
    ('vulture', 'vulture . --min-confidence 80'),
    ('pyflakes', 'pyflakes .'),
    ('manual_ast', 'python custom_ast_analyzer.py')  # my custom solution
]

Surprising Discovery #1: False Positives Are EVERYWHERE

Skylos found 2,847 potentially dead code segments. Sounds great right? Wrong. Here's what actually happened when I started removing them:

# Example of false positive - this looked dead to Skylos
def _internal_validator(data):
    """Skylos marked this as dead but its called dynamically"""
    return data.get('valid', False)

# This is how it was actually being used
validator_name = f"_internal_{validation_type}"
validator_func = globals()[validator_name]  # Dynamic lookup
result = validator_func(user_data)

After manually checking 100 random detections:

Skylos accuracy: 73% (27 false positives)
Vulture accuracy: 61% (39 false positives)
Pyflakes: Only finds unused imports, but 98% accurate
Custom AST walker: 89% accurate but took 4x longer

The Smart Way to Use Skylos (What Actually Works)

Here's the approach that finally worked after three days of trial and error:

# skylos_smart_cleanup.py
import json
import subprocess
from collections import defaultdict

def analyze_with_confidence_levels():
    """
    Run Skylos with different confidence thresholds
    This catches way more nuanced cases
    """
    confidence_levels = [90, 75, 60, 40]
    results = defaultdict(list)
    
    for confidence in confidence_levels:
        cmd = f"skylos analyze --confidence {confidence} --format json"
        output = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        
        if output.returncode == 0:
            data = json.loads(output.stdout)
            for item in data.get('dead_code', []):
                # Group by file for easier review
                results[item['file']].append({
                    'line': item['line'],
                    'type': item['type'],
                    'confidence': confidence,
                    'code': item.get('preview', '')
                })
    
    return results

def generate_cleanup_script(results, threshold=75):
    """
    Create a script that comments out dead code instead of deleting
    THIS SAVED MY ASS - dont delete, comment first!
    """
    cleanup_script = []
    
    for file_path, issues in results.items():
        high_confidence = [i for i in issues if i['confidence'] >= threshold]
        
        if high_confidence:
            cleanup_script.append(f"# File: {file_path}")
            cleanup_script.append(f"# Found {len(high_confidence)} dead code segments")
            
            for issue in high_confidence:
                cleanup_script.append(
                    f"# Line {issue['line']}: {issue['type']} "
                    f"(confidence: {issue['confidence']}%)"
                )
    
    return '\n'.join(cleanup_script)

Unexpected Discovery #2: Dead Code Patterns

After analyzing 15 different Python projects, I noticed these patterns that Skylos consistently finds:

The "Just In Case" Functions - Functions kept "for future use" (spoiler: they never get used)
Zombie Imports - Imports that survived refactoring but serve no purpose
The Configuration Graveyard - Old config parsing code that nobody remembers why it exists

Here's a real example from my project:

# Found this beauty - 200 lines of dead code
class LegacyDatabaseHandler:
    """
    TODO: Remove after migration (dated: 2019)
    Still here in 2024... classic
    """
    def connect_old_system(self):
        # 50 lines of code nobody has called in 5 years
        pass
    
    def migrate_data(self):
        # Another 150 lines of pure digital dust
        pass

Performance Impact: The Part Nobody Talks About

So I measured the actual performance impact before and after cleanup:

# performance_test.py
import time
import importlib
import gc

def measure_import_time(module_name, iterations=100):
    """
    Measure how long it takes to import modules
    before and after dead code removal
    """
    times = []
    
    for _ in range(iterations):
        # Clear any cached imports
        if module_name in sys.modules:
            del sys.modules[module_name]
        gc.collect()
        
        start = time.perf_counter()
        importlib.import_module(module_name)
        end = time.perf_counter()
        
        times.append(end - start)
    
    return {
        'average': sum(times) / len(times),
        'min': min(times),
        'max': max(times)
    }

# Results that blew my mind:
# Before cleanup: avg 0.0234s
# After cleanup:  avg 0.0089s
# That's 62% faster import time!

The Skylos Configuration That Actually Makes Sense

After tons of experimentation, here's the .skylos.yml that works best:

# .skylos.yml
version: 1
analyze:
  exclude_paths:
    - tests/
    - venv/
    - .venv/
    - migrations/  # Django migrations look dead but aren't
    - __pycache__/
  
  ignore_patterns:
    - "*_test.py"
    - "test_*.py"
    - "conftest.py"  # pytest fixtures look unused
  
  # This is crucial for dynamic code
  dynamic_analysis:
    enabled: true
    entry_points:
      - main.py
      - app.py
      - manage.py
    
  confidence_threshold: 70  # Sweet spot after testing
  
  # Ignore specific decorators that hide usage
  ignore_decorators:
    - "@property"
    - "@cached_property" 
    - "@celery.task"
    - "@app.route"  # Flask routes

Edge Cases That Will Bite You

Learned these the hard way during my cleanup spree:

1. Django/Flask Dynamic Imports

# Skylos thinks this is dead but it's not
def custom_middleware(get_response):
    def middleware(request):
        # This gets called by Django magic
        return get_response(request)
    return middleware

2. Pytest Fixtures That Look Dead

# conftest.py
@pytest.fixture
def db_connection():
    # Skylos: "Nobody uses this!"
    # Pytest: "Hold my beer"
    return create_connection()

3. Metaclass Methods

class MetaSerializer(type):
    def __new__(cls, name, bases, attrs):
        # Skylos has no idea wtf is happening here
        # but this code is very much alive
        return super().__new__(cls, name, bases, attrs)

My Production Cleanup Workflow

Here's exactly how I clean dead code in production now:

#!/usr/bin/env python3
# production_dead_code_cleanup.py

import subprocess
import json
from datetime import datetime
import shutil

def safe_cleanup_workflow(project_path):
    """
    The paranoid but effective approach
    """
    # Step 1: Backup everything (learned this one the hard way)
    backup_dir = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    shutil.copytree(project_path, backup_dir)
    print(f"Backup created: {backup_dir}")
    
    # Step 2: Run Skylos with conservative settings
    skylos_cmd = "skylos analyze --confidence 80 --format json"
    result = subprocess.run(skylos_cmd, shell=True, capture_output=True, text=True)
    
    if result.returncode != 0:
        print("Skylos failed, aborting")
        return
    
    dead_code = json.loads(result.stdout)
    
    # Step 3: Comment out instead of delete
    for item in dead_code['items']:
        if item['confidence'] > 85:
            comment_out_code(item['file'], item['line_start'], item['line_end'])
    
    # Step 4: Run tests
    test_result = subprocess.run("pytest", shell=True, capture_output=True)
    
    if test_result.returncode != 0:
        print("Tests failed! Rolling back...")
        shutil.rmtree(project_path)
        shutil.copytree(backup_dir, project_path)
        return False
    
    print(f"Successfully cleaned {len(dead_code['items'])} dead code blocks")
    return True

def comment_out_code(filepath, start_line, end_line):
    """
    Safer than deleting - trust me on this
    """
    with open(filepath, 'r') as f:
        lines = f.readlines()
    
    for i in range(start_line - 1, min(end_line, len(lines))):
        if not lines[i].strip().startswith('#'):
            lines[i] = f"# DEAD_CODE_DETECTED: {lines[i]}"
    
    with open(filepath, 'w') as f:
        f.writelines(lines)

Results: The 47% Reduction Breakdown

After running Skylos on my 50,000 line project:

Lines removed: 23,500
Files completely deleted: 37
Import time improvement: 62%
Test suite runtime: 31% faster
Docker image size: Reduced by 18MB
My sanity: Restored

But here's the kicker - about 3,000 of those lines weren't actually dead. They were:

Dynamically imported modules
Celery tasks
Django management commands
Code called via eval() or exec() (yes, I know, dont @ me)

When NOT to Trust Skylos

After extensive testing, here's when to be extra careful:

Plugin architectures - Dynamic loading makes everything look dead
Metaprogramming heavy code - Skylos gets confused
Projects using __getattr__ extensively
Async code with complex event loops
Code using importlib dynamically

The One Weird Trick That 10x'd Detection Accuracy

Combine Skylos with runtime coverage data:

# genius_move.py
import coverage
import json
import subprocess

def smart_dead_code_detection():
    """
    Run your test suite with coverage, then use Skylos
    on uncovered code. This is chef's kiss perfect
    """
    # Step 1: Get coverage data
    cov = coverage.Coverage()
    cov.start()
    
    # Run your app/tests here
    subprocess.run("pytest", shell=True)
    
    cov.stop()
    cov.save()
    
    # Step 2: Get uncovered lines
    uncovered = []
    for filename in cov.get_data().measured_files():
        missing_lines = cov.analysis(filename)[3]
        uncovered.append({
            'file': filename,
            'lines': missing_lines
        })
    
    # Step 3: Cross-reference with Skylos
    skylos_result = subprocess.run(
        "skylos analyze --format json",
        shell=True, 
        capture_output=True,
        text=True
    )
    
    skylos_data = json.loads(skylos_result.stdout)
    
    # Step 4: High confidence dead code = in both lists
    confirmed_dead = []
    for skylos_item in skylos_data['items']:
        for uncovered_file in uncovered:
            if (skylos_item['file'] == uncovered_file['file'] and
                skylos_item['line'] in uncovered_file['lines']):
                confirmed_dead.append(skylos_item)
                
    return confirmed_dead

Conclusion: Is Skylos Worth It?

After spending a week deep-diving into Skylos, here's my honest take:

Use Skylos when:

You have a large legacy codebase
Import times are getting ridiculous
You're doing a major refactor anyway
You have good test coverage

Skip Skylos when:

Your project is under 5,000 lines
You use lots of metaprogramming
You dont have tests (fix this first!)
The codebase is actively changing

The 47% reduction in my codebase? Totally worth it. But I spent 3 days verifying everything. Skylos is a power tool - use it wisely or you'll delete something important and spend your weekend debugging.

BTW, if you're dealing with Python dead code, also check out vulture for a second opinion. And whatever you do, ALWAYS comment out code before deleting. I learned that lesson at 2am on a production server. Don't be like me.

sCoding

Search This Blog