Research domains for production engineering.

HexTechGuide organizes technical research around durable engineering disciplines, not temporary tools or vendor categories. Each domain represents a long-term area of production systems work.

Domain Index

AI Infrastructure

GPU inference, model serving, RAG, embeddings, runtimes, and AI gateways.

Security Architecture

Confidential AI, threat detection, cyberdeception, secrets, and identity.

Cloud Architecture

Azure, networking, landing zones, private endpoints, storage, and recovery.

Kubernetes

Scheduling, node pools, probes, ingress, autoscaling, and operations.

Python Systems

Flask, FastAPI, WSGI, automation, background jobs, and deployment behavior.

Reliability Engineering

Observability, SLOs, alerts, incidents, runbooks, and recovery procedures.

Architecture Reviews

Reference architectures, trade-offs, constraints, and platform design analysis.

Incident Analysis

Failure reports, root causes, remediation patterns, and operational lessons.

Domains are not product categories.

A runtime, cloud provider, framework, or model may appear inside an article, but the domain describes the engineering discipline behind the work.

Example

An article about GPU inference on AKS belongs to AI Infrastructure. Kubernetes and Azure are implementation tags on that article, not its domain.

Why this matters

Tools change quickly. Domains keep the publication organized around problems that remain relevant for years.

AI Infrastructure

Operating AI systems beyond the demo.

Covers GPU inference, model serving, RAG infrastructure, embedding services, AI gateways, runtime selection, latency, capacity planning, and production scaling.

Security Architecture

Designing systems that remain trustworthy under pressure.

Covers confidential computing, threat intelligence, cyberdeception, secrets, identity, RBAC, network policy, runtime security, and data protection.

Cloud Architecture

Cloud patterns for systems that must survive production.

Covers landing zones, networking, identity, storage, private endpoints, cost constraints, hybrid architecture, backup, and disaster recovery.

Kubernetes

Cluster behavior, scheduling, and operational failure modes.

Covers scheduling, probes, node pools, taints, tolerations, ingress, autoscaling, storage, operators, Helm, and cluster troubleshooting.

Python Systems

Python applications as production systems.

Covers Flask, FastAPI, WSGI, background jobs, automation, packaging, deployment structure, performance tuning, and managed hosting behavior.

Reliability Engineering

Understanding what breaks and how systems recover.

Covers observability, metrics, logs, tracing, SLOs, alerts, incidents, runbooks, rollback planning, and operational recovery.