How changedetection.io Works: Architecture, System Design & Code Deep Dive

Project Overview

changedetection.io is a Python Flask-based application designed for monitoring web pages and other digital content for changes. It provides both a web interface and a RESTful API, enabling users or integrated systems to define 'watches' on specific URLs or content sections. The system asynchronously fetches content, calculates differences against previous versions, persists watch configurations and change history, and dispatches notifications via various channels when significant changes are detected. It operates primarily as a backend service, focusing on robust content monitoring and alert delivery.

Category: monitoring
Difficulty: intermediate
Tech Stack: Docker, Python
Author: dgtlmoon
Tags: python, scraping, automation

Data Flow

Data flows between three parts: the Flask web application, the persistence layer, and the asynchronous workers. A watch configuration submitted through the Flask API (`changedetectionio/api/Watch.py`) is validated, structured into a `changedetectionio/model/Watch.py` object, and stored by `changedetectionio/store.py`. The watch is then enqueued as a task in `changedetectionio/queue_handlers.py::RecheckPriorityQueue`. Asynchronous workers (`changedetectionio/worker_handler.py`) take tasks from this queue, fetch the external content (potentially involving `changedetectionio/content_fetchers/screenshot_handler.py`), process it with `changedetectionio/html_tools.py`, and compare it against the previously stored version using `changedetectionio/diff.py`. If a change is detected, notification data is generated and routed through `changedetectionio/notification_service.py`. At the same time, the watch's status, history, and updated content are persisted back to the datastore via `changedetectionio/store.py`. User interfaces and API clients read current watch state and history by querying the Flask API, which fetches the relevant `changedetectionio/model/Watch.py` objects from `changedetectionio/store.py` for serialization and display.
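
The flow above can be sketched with the standard library alone. This is a minimal illustration of the queue-and-worker pattern, not the project's code: `RecheckTask`, `run_worker`, `demo`, and the in-memory `snapshots` dict are hypothetical names standing in for `RecheckPriorityQueue`, the worker handler, and the store, and the fetch step is stubbed so nothing touches the network.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass(order=True)
class RecheckTask:
    # Hypothetical task type: a lower number means higher priority.
    # Only `priority` takes part in ordering; `watch_id` is ignored.
    priority: int
    watch_id: str = field(compare=False)


def run_worker(tasks, fetch, snapshots, notify):
    """Drain the queue: fetch, compare with the last snapshot, notify, persist."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        new_text = fetch(task.watch_id)
        old_text = snapshots.get(task.watch_id)
        # Notify only when a previous snapshot exists and differs.
        if old_text is not None and old_text != new_text:
            notify(task.watch_id, old_text, new_text)
        # Persist the latest content (a dict stands in for the store).
        snapshots[task.watch_id] = new_text
        tasks.task_done()


def demo():
    tasks = queue.PriorityQueue()
    tasks.put(RecheckTask(priority=5, watch_id="news"))
    tasks.put(RecheckTask(priority=1, watch_id="pricing"))

    pages = {"news": "headline A", "pricing": "EUR 12"}      # stubbed "web"
    snapshots = {"news": "headline A", "pricing": "EUR 10"}  # previous versions
    order, alerts = [], []

    def fetch(watch_id):
        order.append(watch_id)
        return pages[watch_id]

    def notify(watch_id, old, new):
        alerts.append((watch_id, old, new))

    # The worker runs off the main thread, as a background worker would.
    worker = threading.Thread(target=run_worker, args=(tasks, fetch, snapshots, notify))
    worker.start()
    worker.join()
    return order, alerts, snapshots
```

Calling `demo()` processes the `pricing` watch before `news` because of its lower priority number, and only `pricing` produces an alert, since its content differs from the stored snapshot.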

Key Modules & Components

  • Watch Management and Scheduling: Provides the core functionality for defining, storing, retrieving, and scheduling web content monitoring tasks ('watches'). It manages the lifecycle of watches, from creation and modification via the API to their execution within the background worker queue.
    Key files: changedetectionio/api/Watch.py, changedetectionio/model/Watch.py, changedetectionio/store.py
  • Content Fetching and Processing: Handles the retrieval of web content from specified URLs, including support for dynamic content rendering via headless browsers. It also provides tools for parsing, filtering, and extracting relevant content from the fetched data, preparing it for change detection.
    Key files: changedetectionio/content_fetchers/screenshot_handler.py, changedetectionio/html_tools.py
  • Change Detection Engine: Implements the core logic for comparing different versions of web content and identifying meaningful changes. It utilizes text differencing algorithms to highlight additions, deletions, and modifications, making it possible to alert users to relevant content updates.
    Key files: changedetectionio/diff.py
  • Notification and Alerting Service: Orchestrates the creation and dispatch of notifications when changes are detected in monitored web content. It allows for configurable notification channels (e.g., email, webhooks) and customizable message templates, enabling users to receive timely alerts about relevant updates.
    Key files: changedetectionio/notification_service.py
  • API Management and Authentication: Provides a RESTful API for programmatic access to the application's functionality, including watch management and data retrieval. It also enforces API key authentication to protect endpoints from unauthorized access, enabling secure integration with other systems.
    Key files: changedetectionio/flask_app.py, changedetectionio/api/auth.py
  • Background Task Processing: Manages the execution of asynchronous tasks, such as content fetching and change detection, using a worker queue pattern. This ensures that long-running operations do not block the main application thread, maintaining responsiveness and scalability.
    Key files: changedetectionio/worker_handler.py
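
To make the Change Detection Engine entry above more concrete, here is a minimal sketch of line-based text comparison using Python's standard `difflib` module. It assumes a simple line-by-line diff; the project's `changedetectionio/diff.py` may work differently, and `summarize_changes` is a hypothetical helper name used only for illustration.

```python
import difflib


def summarize_changes(old_text, new_text):
    """Return (added, removed) lines between two text snapshots."""
    added, removed = [], []
    # ndiff prefixes each line with a two-character code:
    # "+ " added, "- " removed, "  " unchanged, "? " intraline hint.
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed


def has_changed(old_text, new_text):
    """True when any line was added or removed."""
    added, removed = summarize_changes(old_text, new_text)
    return bool(added or removed)
```

For example, comparing `"Price: 10\nIn stock"` with `"Price: 12\nIn stock"` reports `Price: 12` as added and `Price: 10` as removed, while the unchanged `In stock` line is ignored. A notification step could then include only these lines in its message.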

Source repository: https://github.com/dgtlmoon/changedetection.io