Project: AI-Driven Mini-SOC for Real-Time Threat Detection

Description

Date: 16/11/2025
Categories: Project

Live Demonstration

A live demonstration of the HIDS (Host-Based Intrusion Detection System) dashboard is available at: https://me.packprotv.com/works/smart-soc

Project Overview

This project is a fully operational cybersecurity lab, or “Mini-SOC,” built from the ground up to demonstrate a modern Defense-in-Depth (DiD) strategy. The core of this lab is an AI-driven HIDS that uses a Deep Learning model (a GRU Autoencoder) to detect anomalies in real-time network and system logs.

The entire environment is virtualized, segmented, and monitored, simulating a small corporate network with dedicated zones for an attacker, a victim, a DMZ, and a central firewall that also doubles as the HIDS analysis engine.

Core Architecture

The infrastructure is built on VirtualBox and segmented into four distinct zones (firewall, Kali, Windows, and DMZ), all managed by the central Debian 12 firewall. The host machine serves as the monitoring station.

Firewall & HIDS (Debian 12): The “brain” of the operation. It handles all routing, network address translation (NAT), and firewalling (iptables). Most importantly, it runs the FastAPI backend that hosts the trained GRU model for real-time log analysis.
Kali Zone (Attacker): A VM simulating an attacker’s machine. This zone is used to launch simulated “normal” traffic (like browsing) and “attack” traffic (like SSH scans).
Windows Zone (Victim): A Windows 10 VM representing a standard corporate workstation, used as a target for lateral movement tests (e.g., RDP).
DMZ Zone (Honeypot): A dedicated, isolated zone running a Cowrie honeypot to capture and analyze unauthorized SSH/Telnet attempts.
Host Machine (Monitoring): My host machine, which runs the Streamlit Dashboard (the SOC analyst’s view) and accesses the HIDS API over the virtual network.

Key Features & Implementation

1. Network Segmentation & Hardening

The foundation of the lab is strong segmentation using iptables on the Debian firewall.

Host-Only Networks: VirtualBox Host-only networks create isolated L2 segments for each zone.
iptables Rules: The FORWARD chain is configured to drop all inter-zone traffic by default.
Explicit Rules: Specific rules are added to allow legitimate traffic (e.g., Kali to Internet via NAT, Kali to Windows via RDP for testing) and log dropped packets.
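
The default-drop policy and the explicit exceptions above could be expressed roughly as follows. This is a minimal sketch, not the lab's actual ruleset: the interface names (enp0s3, enp0s8, enp0s9), the log prefix, and the helper build_forward_rules are assumptions made for illustration. The script only builds and prints the iptables commands; it does not execute them.

```python
import shlex
from typing import Dict, List

# Hypothetical interface names; the lab's real names are not given in the write-up.
WAN_IF = "enp0s3"
ZONE_IFS: Dict[str, str] = {"kali": "enp0s8", "windows": "enp0s9"}


def build_forward_rules(wan_if: str = WAN_IF,
                        zone_ifs: Dict[str, str] = ZONE_IFS) -> List[List[str]]:
    """Return iptables commands (as argument lists) for a default-drop FORWARD chain."""
    kali, windows = zone_ifs["kali"], zone_ifs["windows"]
    return [
        # Drop all forwarded (inter-zone) traffic unless a rule below allows it.
        ["iptables", "-P", "FORWARD", "DROP"],
        # Let replies to already-permitted connections back through.
        ["iptables", "-A", "FORWARD", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Kali to Internet, with source NAT on the outbound interface.
        ["iptables", "-A", "FORWARD", "-i", kali, "-o", wan_if, "-j", "ACCEPT"],
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-o", wan_if, "-j", "MASQUERADE"],
        # Kali to Windows over RDP (TCP/3389) for lateral-movement tests.
        ["iptables", "-A", "FORWARD", "-i", kali, "-o", windows,
         "-p", "tcp", "--dport", "3389", "-j", "ACCEPT"],
        # Log whatever reaches the end of the chain just before the policy drops it.
        ["iptables", "-A", "FORWARD", "-j", "LOG", "--log-prefix", "FW-DROP: "],
    ]


if __name__ == "__main__":
    for rule in build_forward_rules():
        print(shlex.join(rule))
```

Placing the LOG rule last means only packets that matched no ACCEPT rule are logged before the chain policy drops them.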

2. Intelligent Threat Deception (Honeypot)

To actively trap the simulated attacker, a DNAT (Destination NAT) rule is implemented on the firewall.

The Trap: Any SSH (TCP/22) connection originating from the Kali Zone (192.168.57.x) to any destination (e.g., Google’s DNS, a random IP) is transparently redirected.
The Target: The traffic is forced to the Cowrie Honeypot (192.168.58.10) in the DMZ.
The Result: The attacker believes they are connecting to a real server, while all their credentials and commands are safely logged by Cowrie.
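
To make the redirect logic concrete, the snippet below models the DNAT decision in plain Python. It is an illustrative sketch only: the /24 mask for the Kali Zone is inferred from "192.168.57.x", the function name apply_dnat is hypothetical, and the destination port is assumed to be preserved (the write-up does not say which port Cowrie listens on). On the firewall itself, a rule along the lines of the one shown in the comment would do the actual rewriting.

```python
import ipaddress
from typing import Tuple

# Roughly equivalent firewall rule (subnet mask is an assumption):
#   iptables -t nat -A PREROUTING -s 192.168.57.0/24 -p tcp --dport 22 \
#            -j DNAT --to-destination 192.168.58.10
KALI_NET = ipaddress.ip_network("192.168.57.0/24")  # assumed /24 for "192.168.57.x"
HONEYPOT_IP = "192.168.58.10"                       # Cowrie address from the write-up
SSH_PORT = 22


def apply_dnat(src: str, dst: str, proto: str, dport: int) -> Tuple[str, int]:
    """Return the (destination IP, port) a packet would have after the DNAT rule."""
    from_kali = ipaddress.ip_address(src) in KALI_NET
    if from_kali and proto == "tcp" and dport == SSH_PORT:
        # SSH from the Kali Zone is rewritten to the honeypot, whatever the target was.
        return HONEYPOT_IP, dport
    # Everything else keeps its original destination.
    return dst, dport


if __name__ == "__main__":
    print(apply_dnat("192.168.57.20", "8.8.8.8", "tcp", 22))   # SSH from Kali
    print(apply_dnat("192.168.57.20", "8.8.8.8", "tcp", 443))  # HTTPS from Kali
```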

3. AI-Driven Intrusion Detection (GRU Autoencoder)

This is the core of the HIDS, built with Python and TensorFlow/Keras.

Model: A GRU (Gated Recurrent Unit) Autoencoder is trained on a dataset of “normal” logs collected from the firewall (including syslog, auth.log, and iptables logs).
Training: The model learns the sequential patterns of normal activity (pings, web traffic, admin logins, and even the “normal” attacker traffic being redirected to the honeypot).
Anomaly Detection: In production, each new real-time log sequence is fed to the model, which tries to reconstruct it. The reconstruction error, the gap between the input and the model’s output, serves as the anomaly score.
- Normal Log: The model reconstructs it accurately (low error).
- Anomalous Log (e.g., a port scan, a new exploit): The model fails to reconstruct it (high error).
Threshold: A threshold is set from the reconstruction errors observed on normal data (e.g., their 98th percentile). Any log sequence whose error exceeds this threshold is flagged as “Anomaly Detected”.
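
The scoring and thresholding step can be sketched without the model itself. The example below assumes the error is a mean squared error over a sequence of feature vectors and that the percentile is computed with linear interpolation; both are assumptions, as are the function names and the sample numbers, which are made up purely for illustration.

```python
import math
from typing import List, Sequence


def reconstruction_error(original: Sequence[Sequence[float]],
                         reconstructed: Sequence[Sequence[float]]) -> float:
    """Mean squared error over every timestep and feature of one sequence."""
    total, count = 0.0, 0
    for row_in, row_out in zip(original, reconstructed):
        for a, b in zip(row_in, row_out):
            total += (a - b) ** 2
            count += 1
    return total / count


def percentile(values: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def is_anomaly(error: float, threshold: float) -> bool:
    """A sequence is anomalous when its error is strictly above the threshold."""
    return error > threshold


if __name__ == "__main__":
    # Made-up errors standing in for the model's errors on normal training logs.
    normal_errors: List[float] = [0.01 * i for i in range(1, 101)]
    threshold = percentile(normal_errors, 98)

    # Made-up input sequence and a poor reconstruction of it.
    seq = [[0.0, 1.0], [1.0, 0.0]]
    bad_reconstruction = [[1.5, -0.5], [-0.5, 1.5]]
    err = reconstruction_error(seq, bad_reconstruction)
    print("threshold:", threshold, "error:", err, "anomaly:", is_anomaly(err, threshold))
```

A higher percentile lowers the false-positive rate on normal traffic at the cost of missing subtler anomalies, so the value is a tuning choice rather than a fixed constant.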

4. Real-Time Monitoring Pipeline (API & Dashboard)

The system provides instant alerts to a SOC analyst via a web dashboard.

log_collector.py: A Python script on the firewall tails critical logs (/var/log/syslog, /var/log/auth.log) in real time.
api_server.py: A FastAPI backend (running on the firewall) ingests these logs, aggregates them into 1-second windows, and runs each window through the loaded Keras model, producing one prediction per second. It exposes endpoints such as /get_prediction and /get_logs.
dashboard.py: A Streamlit dashboard (running on the host machine) polls the FastAPI endpoints every 2 seconds, displaying the current system status (Normal/Anomaly), the latest reconstruction error, and the raw logs being analyzed.
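
The first two stages of this pipeline, tailing a log and grouping lines into 1-second windows, can be illustrated with the standard library alone. This is a hedged sketch rather than the project's code: the names follow and window_by_second, the polling interval, and the max_idle_polls stop condition (added so the example terminates) are all assumptions, and the model call and the API layer are left out.

```python
import io
import time
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple


def follow(stream: TextIO, poll_interval: float = 0.2,
           max_idle_polls: Optional[int] = None) -> Iterator[str]:
    """Yield lines as they are appended to an open file, similar to `tail -f`.

    Stops after `max_idle_polls` consecutive empty reads; None means run forever.
    """
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        line = stream.readline()
        if line:
            idle = 0
            yield line.rstrip("\n")
        else:
            # Nothing new yet: wait briefly before checking again.
            idle += 1
            time.sleep(poll_interval)


def window_by_second(events: Iterable[Tuple[float, str]]) -> Dict[int, List[str]]:
    """Group (unix_timestamp, line) pairs into buckets keyed by whole second."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for timestamp, line in events:
        windows[int(timestamp)].append(line)
    return dict(sorted(windows.items()))


if __name__ == "__main__":
    # An in-memory stream stands in for /var/log/auth.log in this demo.
    fake_log = io.StringIO("sshd: failed password\nsshd: accepted password\n")
    stamped = [(time.time(), line)
               for line in follow(fake_log, poll_interval=0, max_idle_polls=1)]
    for second, lines in window_by_second(stamped).items():
        print(second, lines)
```

In a real collector, each window would then be turned into a feature sequence and handed to the model. That step depends on the preprocessing used at training time, so it is not shown here.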

Technology Stack

Virtualization: VirtualBox
Firewall & Routing: Debian 12, iptables (DNAT, FORWARD, NAT, LOGGING)
Honeypot: Cowrie
HIDS Model: Python, TensorFlow/Keras (GRU Autoencoder), Pandas, Scikit-learn
Backend API: FastAPI, Uvicorn
Frontend Dashboard: Streamlit, Requests
Log Collection: Custom Python multithreaded script
Attacker/Victim: Kali Linux, Windows 10

Conclusion

This project successfully integrates network engineering (segmentation, iptables), defensive security (honeypots, HIDS), and AI (Deep Learning) into a single, functional Mini-SOC. It demonstrates the complete lifecycle of threat detection: from hardening the network to actively trapping attackers, analyzing their behavior with an intelligent model, and reporting anomalies to an analyst dashboard in real time.