Robust Multithreaded Web Crawler

University project: a multithreaded web crawler with a responsive WPF UI and MSSQL-backed persistence.

Description

A WPF desktop crawler that traverses websites, extracts anchor links, and stores them in MSSQL. Crawling runs on background threads so the UI stays responsive.
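
The traversal pattern described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's own WPF code, and every name in it (`crawl`, `fetch_links`, `max_workers`, `max_pages`) is hypothetical. Page fetching is passed in as a function so the sketch stays self-contained; in a desktop app the coordinating loop itself would also need to run off the UI thread.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def crawl(seed, fetch_links, max_workers=4, max_pages=100):
    """Crawl outward from `seed` using a pool of worker threads.

    `fetch_links(url)` is an assumed callback that returns the URLs found
    on a page. Only the coordinating thread touches `visited`, so the set
    needs no lock.
    """
    visited = {seed}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(fetch_links, seed)}
        while pending:
            # Block until at least one worker finishes a page.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                try:
                    links = future.result()
                except Exception:
                    continue  # one failed page should not stop the crawl
                for link in links:
                    if link not in visited and len(visited) < max_pages:
                        visited.add(link)
                        pending.add(pool.submit(fetch_links, link))
    return visited
```

Calling `crawl("https://example.com/", fetch_links)` with a real fetcher would return the set of discovered URLs, capped at `max_pages`.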

Objective / Aim

Demonstrate scalable, concurrent crawling and structured persistence built on a sound object-oriented design.

Role & Responsibilities

Engineered the crawler core, integrated MSSQL, built the WPF UI, implemented thread management, and added domain filters.

Features / Implementation

Crawl workers run on separate threads; HTML is parsed for anchor links; a domain allowlist restricts which links are kept; results are stored in an MSSQL schema; the WPF UI shows progress and provides crawl controls.
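
As an illustration of the anchor parsing and domain allowlist steps, here is a minimal sketch in Python using only the standard library. It is not the project's implementation: the names `AnchorCollector`, `is_allowed`, `extract_allowed_links`, and the `example.com` allowlist are assumptions made for the example, and treating subdomains of an allowed domain as allowed is likewise an assumed policy.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical allowlist; the project's actual domains are not stated.
ALLOWED_DOMAINS = {"example.com"}


class AnchorCollector(HTMLParser):
    """Collects href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """True for http(s) URLs whose host is an allowed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def extract_allowed_links(html, base_url, allowed=ALLOWED_DOMAINS):
    """Parse `html` and return the absolute links that pass the allowlist."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if is_allowed(url, allowed)]
```

Filtering on the parsed hostname rather than on a substring of the URL avoids accepting look-alike hosts such as `example.com.evil.org`.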

Impact / Results

Delivered a reusable crawling framework showcasing advanced OOP and concurrency in a desktop setting.