Specialization Track

Backend Engineer Path

Learn how production services are designed and operated: runtime fundamentals, HTTP contracts, persistence, indexing, auth, async work, observability, monitoring, security, and reliability.

10 lessons
3 phases
0 official source families

Certificate Lane

Docs-Driven Specialization Review

Complete the authored lessons, finish the track assessment, and pass at 80%+ to unlock certificate eligibility.


Lesson Flow

Runtime and Service Contracts
  • Step 1: Node Runtime, Event Loop, and Service Boundaries
  • Step 2: HTTP Anatomy and API Contracts
  • Step 3: Modules, Configuration, and Validation Boundaries
Data, Trust, and Async Work
  • Step 4: Persistence, Data Modeling, and Query Design
  • Step 5: Database Indexing, Query Plans, and Performance Paths
  • Step 6: Authentication, Authorization, and Trust Boundaries
  • Step 7: Async Work, Streams, Caching, and Background Jobs
Observability and Ownership
  • Step 8: Observability, Incidents, and Reliability
  • Step 9: Monitoring Systems, SLOs, and Alerting Discipline
  • Step 10: Capstone: Production Backend Service

Phase 1

Runtime and Service Contracts

Understand the runtime, the shape of HTTP work, and the validation boundaries that keep services predictable.

Phase 2

Data, Trust, and Async Work

Model the business clearly, enforce trust boundaries, and decide what belongs in the request path versus background work.

Phase 3

Observability and Ownership

Close the loop with telemetry, incidents, and a capstone that proves the service can be operated, not just demoed.

Lesson 1 of 10

Node Runtime, Event Loop, and Service Boundaries

Backend engineers need a runtime model before they can make sound service decisions. In Node.js, asynchronous I/O and the event loop shape how work flows through the process.

That model affects latency, concurrency, background work, and debugging. Without it, teams often misuse blocking logic or misunderstand where slowdowns come from.

A service boundary should also be explicit: what the service owns, what it exposes, and what it refuses to do.

runtime-model.js
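// Jobs wait here until the event loop runs their timer callback.
const queue = [];

function schedule(job) {
  queue.push(job);
  // setTimeout(..., 0) defers the work to a later event-loop turn,
  // so schedule() returns before the job is processed.
  setTimeout(() => {
    console.log("Processed:", queue.shift());
  }, 0);
}
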
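To make the ordering concrete, here is a minimal sketch that records when synchronous code, a promise callback, and a zero-delay timer actually run in Node.js. The helper name traceEventLoopOrder is hypothetical and exists only for this illustration.

```javascript
// Hypothetical helper: records the order in which Node.js runs three kinds of work.
function traceEventLoopOrder() {
  const order = [];
  return new Promise((resolve) => {
    // Timer callback: runs in a later event-loop turn, even with a 0 ms delay.
    setTimeout(() => {
      order.push("timer");
      resolve(order);
    }, 0);

    // Promise callback (microtask): runs once the current synchronous code
    // has finished, but before the timer callback.
    Promise.resolve().then(() => order.push("microtask"));

    // Synchronous code: runs first, and nothing else runs until it returns.
    order.push("sync");
  });
}

traceEventLoopOrder().then((order) => console.log(order.join(" -> ")));
// prints "sync -> microtask -> timer"
```

The practical consequence for a service is that any long synchronous section delays every queued callback behind it, including other requests.
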
Real-World Scenario

A small team is building several services quickly, but nobody has agreed on service boundaries or what belongs in the request path.

What you learned
  • Node favors async I/O and event-driven service flow.
  • The runtime model affects service shape and debugging.
  • A good backend starts with clear boundaries.
Build Mission

Describe one backend service you would build and list exactly what it owns, what it calls, and what it leaves to other services.

Check Yourself
  • What kind of work should not block the main request path?
  • Why does runtime knowledge matter for service design?
  • What should a service boundary protect?
Definition of Done
  • The runtime model is explained in terms of real service behavior.
  • The service boundary is narrow enough to reason about cleanly.
  • At least one blocking-risk workload is identified and handled deliberately.
Lesson 2 of 10

HTTP Anatomy and API Contracts

Backend work is communication work. Requests, responses, headers, methods, bodies, status codes, and error formats form the contract between your service and its consumers.

A service that works but communicates poorly still creates product pain. API consumers need stable shapes, clear errors, and predictable semantics.

The fastest way to make backend work collaborative is to make contracts boringly clear.

http-server.js
const { createServer } = require("node:http");

// Every request gets the same explicit status, content type, and JSON body.
createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader("Content-Type", "application/json");
  res.end(JSON.stringify({ ok: true }));
}).listen(3000);

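One way to keep success and failure responses equally predictable is a shared envelope. The sketch below is an assumption, not a standard: the helper names (successBody, errorBody, toHttpResponse), the field names, and the error codes are all hypothetical.

```javascript
// Hypothetical envelope helpers; the field names are illustrative assumptions.
function successBody(data) {
  return { ok: true, data };
}

function errorBody(code, message, details = []) {
  return { ok: false, error: { code, message, details } };
}

// Maps an internal outcome to an HTTP status and a body with a predictable shape.
function toHttpResponse(result) {
  if (result.type === "ok") {
    return { status: 200, body: successBody(result.data) };
  }
  if (result.type === "invalid") {
    return {
      status: 400,
      body: errorBody("validation_failed", "Request body is invalid.", result.details),
    };
  }
  if (result.type === "not_found") {
    return { status: 404, body: errorBody("not_found", "Resource does not exist.") };
  }
  // Unknown outcomes become a generic 500 so internal details are not leaked.
  return { status: 500, body: errorBody("internal_error", "Unexpected server error.") };
}

console.log(JSON.stringify(toHttpResponse({ type: "not_found" })));
```

Because every error carries the same three fields, a consumer can write one error handler instead of one per endpoint.
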
Real-World Scenario

A frontend team is blocked because a backend endpoint behaves differently depending on hidden assumptions that were never documented.

What you learned
  • HTTP details are part of the product contract.
  • Consistent responses reduce consumer confusion.
  • Clear API semantics make systems easier to evolve.
Build Mission

Design one endpoint with method, request shape, validation rules, error cases, and response schema written down before coding.

Check Yourself
  • Why should error responses be consistent too?
  • What makes an endpoint contract easy for another team to adopt?
  • Which details belong in a request header versus a request body?
Definition of Done
  • Endpoint contract is explicit and reviewable.
  • Success and failure responses are structured consistently.
  • A second engineer can consume the endpoint without guessing.
Lesson 3 of 10

Modules, Configuration, and Validation Boundaries

Backend reliability often fails at the edges: misconfigured environments, unvalidated input, and unclear module ownership.

Configuration should be explicit, validated at startup, and separated from request logic. Validation should happen at system boundaries, not as an afterthought buried deep inside the code.

Good module design keeps handlers thin and pushes reusable logic into focused units.

config-validation.js
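// Variables the service cannot run without.
const requiredEnv = ["DATABASE_URL", "JWT_SECRET"];

// Fail at startup, before any request is served, if one is missing or empty.
for (const key of requiredEnv) {
  if (!process.env[key]) {
    throw new Error("Missing environment variable: " + key);
  }
}
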
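The same fail-early idea applies to request input. The following is a minimal sketch of boundary validation, assuming a hypothetical "create invoice" payload with customerId and amountCents fields; the function name and rules are illustrative, and a real service might use a schema library instead.

```javascript
// Hypothetical validator for a "create invoice" request body.
function validateCreateInvoice(body) {
  if (body === null || typeof body !== "object" || Array.isArray(body)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }

  const errors = [];
  if (typeof body.customerId !== "string" || body.customerId.trim() === "") {
    errors.push("customerId must be a non-empty string");
  }
  if (!Number.isInteger(body.amountCents) || body.amountCents <= 0) {
    errors.push("amountCents must be a positive integer");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }

  // Return only the known fields so unexpected input never reaches business logic.
  return {
    ok: true,
    value: { customerId: body.customerId.trim(), amountCents: body.amountCents },
  };
}

console.log(validateCreateInvoice({ customerId: "cus_1", amountCents: 1250, isAdmin: true }));
```

A handler would call this first and return a 400 with the errors list when ok is false, so internal modules only ever see the validated value.
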
Real-World Scenario

A deployment succeeded technically, but production requests failed because a secret was missing and bad payloads slipped through to internal logic.

What you learned
  • Configuration problems should fail early.
  • Validation belongs at the boundary of the system.
  • Focused modules are easier to test and maintain.
Build Mission

Write a configuration and validation checklist for a service before its first production deployment.

Check Yourself
  • What should fail at startup rather than at first request?
  • Why is boundary validation safer than scattered checks?
  • What kind of logic should not live directly in route handlers?
Definition of Done
  • Critical environment configuration is validated at startup.
  • Input validation is performed before business logic runs.
  • Module boundaries are clean enough to test independently.
Lesson 4 of 10

Persistence, Data Modeling, and Query Design

Backend systems live or die on data clarity. A poor data model leaks confusion into APIs, reports, jobs, and performance work.

Schema design is not only about normalization or denormalization. It is about expressing the real business entities and access patterns of the system.

The right question is not simply 'SQL or NoSQL?' It is 'what access patterns, consistency needs, and relationships does the product require?'

data-model.ts
type Invoice = {
  id: string;
  customerId: string;
  amountCents: number;
  status: "draft" | "sent" | "paid";
};

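A status field only expresses a lifecycle if the allowed moves are written down. The sketch below assumes the order draft, then sent, then paid, with paid as a terminal state; that ordering and the names allowedTransitions and transitionInvoice are assumptions for illustration.

```javascript
// Assumed lifecycle: draft -> sent -> paid. "paid" is terminal.
const allowedTransitions = new Map([
  ["draft", ["sent"]],
  ["sent", ["paid"]],
  ["paid", []],
]);

// Returns a new invoice object with the next status, or throws on an illegal move.
function transitionInvoice(invoice, nextStatus) {
  const allowed = allowedTransitions.get(invoice.status) || [];
  if (!allowed.includes(nextStatus)) {
    throw new Error(
      `Cannot move invoice ${invoice.id} from ${invoice.status} to ${nextStatus}`
    );
  }
  return { ...invoice, status: nextStatus };
}

const draft = { id: "inv_1", customerId: "cus_1", amountCents: 1250, status: "draft" };
console.log(transitionInvoice(draft, "sent").status); // prints "sent"
```

Keeping the transition table next to the model makes the lifecycle reviewable in one place instead of being implied by scattered update statements.
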
Real-World Scenario

A team keeps changing a billing service, but the schema does not clearly represent invoice lifecycle or customer relationships.

What you learned
  • The data model shapes every downstream system.
  • Queries should follow access patterns, not only raw structure.
  • Schema choices are product decisions with technical consequences.
Build Mission

Model one product domain with entities, relationships, and the three most important queries the system must support.

Check Yourself
  • Which fields define the real lifecycle of this entity?
  • What queries will happen most often?
  • Where would the current model create duplication or ambiguity?
Definition of Done
  • Entity model matches the real business domain.
  • Important queries are identified alongside the schema.
  • Tradeoffs around consistency and performance are explained clearly.
Lesson 5 of 10

Database Indexing, Query Plans, and Performance Paths

Good data models still fail if the query path is slow. Indexes exist to support real access patterns, but every index adds storage cost, write overhead, and operational tradeoffs.

Backend engineers should learn to ask: which query is slow, what access path is the database taking, and is the current index strategy aligned with the product's hottest reads?

Learning indexing well means moving past the vague idea that 'indexes make things faster' and into query plans, composite indexes, selectivity, and the cost of maintaining each index.

invoice-index.sql
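-- Hot read path: one customer's invoices, newest first.
SELECT customer_id, status, created_at
FROM invoices
WHERE customer_id = $1
ORDER BY created_at DESC;

-- Composite index matching the filter column first, then the sort order.
CREATE INDEX invoices_customer_created_idx
  ON invoices (customer_id, created_at DESC);
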
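The intuition behind a composite index can be sketched in plain JavaScript. This is an in-memory analogy only, not how a database engine stores its indexes: the "index" is simply an array kept sorted by (customerId, createdAt descending), and all function names and the generated data are hypothetical.

```javascript
// Generates hypothetical rows standing in for the invoices table.
function makeInvoices(count, customerCount) {
  const rows = [];
  for (let i = 0; i < count; i += 1) {
    const customerId = "c" + String(i % customerCount).padStart(2, "0");
    rows.push({ id: "inv_" + i, customerId, createdAt: i });
  }
  return rows;
}

// Full scan: visit every row, filter, then sort the matches.
function scanForCustomer(rows, customerId) {
  let visited = 0;
  const matches = rows.filter((row) => {
    visited += 1;
    return row.customerId === customerId;
  });
  matches.sort((a, b) => b.createdAt - a.createdAt);
  return { ids: matches.map((row) => row.id), visited };
}

// "Index": a copy of the rows sorted by customerId ASC, then createdAt DESC.
function buildIndex(rows) {
  return [...rows].sort((a, b) => {
    if (a.customerId === b.customerId) return b.createdAt - a.createdAt;
    return a.customerId < b.customerId ? -1 : 1;
  });
}

// Binary search for the customer's first entry, then walk while the key matches.
// The entries are already in the requested order, so no sort step is needed.
function lookupWithIndex(index, customerId) {
  let visited = 0;
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    visited += 1;
    if (index[mid].customerId < customerId) lo = mid + 1;
    else hi = mid;
  }
  const ids = [];
  for (let i = lo; i < index.length && index[i].customerId === customerId; i += 1) {
    visited += 1;
    ids.push(index[i].id);
  }
  return { ids, visited };
}

const invoices = makeInvoices(1000, 50);
const index = buildIndex(invoices);
const scanned = scanForCustomer(invoices, "c07");
const indexed = lookupWithIndex(index, "c07");
console.log("rows visited by scan:", scanned.visited, "by index:", indexed.visited);
```

Both paths return the same rows in the same order, but the sorted structure touches only a small range. The cost is that buildIndex has to be kept up to date on every write, which mirrors the write overhead of a real index.
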
Real-World Scenario

A dashboard query was instant with a thousand rows, but at production volume it now stalls because the database is scanning far more data than the product can tolerate.

What you learned
  • Indexes should follow access patterns, not guesswork.
  • Query plans explain why a query is slow or fast.
  • Read optimization always comes with write and maintenance tradeoffs.
Build Mission

Take one slow query from a realistic product workflow and propose an index strategy with the tradeoffs written down.

Check Yourself
  • Which query pattern justifies a new index?
  • Why can too many indexes hurt write-heavy systems?
  • What does a composite index optimize better than two separate indexes?
Definition of Done
  • The learner can explain why an index helps one query and not another.
  • Index choice is tied to an explicit read pattern and ordering requirement.
  • Tradeoffs include write cost and operational overhead, not only speed.
Lesson 6 of 10

Authentication, Authorization, and Trust Boundaries

Authentication answers who the caller is. Authorization answers what the caller is allowed to do. Backend systems need both, and they need them at the right layer.

Trust boundaries matter because every request boundary is an opportunity for misuse, confusion, or escalation. Tokens, sessions, roles, and permissions are tools, not magic.

Strong backend engineers model trust assumptions clearly and fail safely when those assumptions break.

authorization-rule.js
function canManageProject(role) {
  return role === "owner" || role === "admin";
}

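As rules multiply, a single role check tends to grow into a permission matrix. The following sketch assumes a deny-by-default policy; the member role, the action names, and the helpers can and authorize are hypothetical and not part of the lesson's example.

```javascript
// Hypothetical role-to-permission matrix. Anything not listed is denied.
const permissions = new Map([
  ["owner", new Set(["project:read", "project:manage", "billing:read"])],
  ["admin", new Set(["project:read", "project:manage"])],
  ["member", new Set(["project:read"])],
]);

// Authorization: is this role allowed to perform this action?
function can(role, action) {
  const granted = permissions.get(role);
  return granted !== undefined && granted.has(action);
}

// Combines both checks: 401 when identity is missing, 403 when permission is missing.
function authorize(user, action) {
  if (!user) return { allowed: false, status: 401 };
  if (!can(user.role, action)) return { allowed: false, status: 403 };
  return { allowed: true, status: 200 };
}

console.log(authorize({ id: "user_42", role: "member" }, "project:manage"));
// prints { allowed: false, status: 403 }
```

Separating 401 from 403 keeps the authentication and authorization questions distinct, and an unknown role falls through to a denial rather than an accidental grant.
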
Real-World Scenario

An admin tool exposes finance records, and the team needs role enforcement that is clear, reviewable, and difficult to bypass accidentally.

What you learned
  • Authentication and authorization solve different problems.
  • Trust boundaries need explicit rules.
  • Permission logic should be visible and testable.
Build Mission

Write the role and permission matrix for one service before adding protected routes.

Check Yourself
  • What is the difference between identity and permission?
  • Which routes should be protected first?
  • Where could over-broad trust cause data exposure?
Definition of Done
  • Authentication method is chosen and documented.
  • Authorization rules are explicit and testable.
  • High-risk routes are protected at the service boundary, not only in the UI.
Lesson 7 of 10

Async Work, Streams, Caching, and Background Jobs

Not all backend work belongs inside a request. Some work should stream, some should cache, and some should move into background jobs entirely.

The job of a backend engineer is to keep user-facing latency predictable while still doing the necessary work safely.

That means understanding queues, backpressure, cache invalidation, and how the runtime handles streaming workloads.

streaming-handler.js
const http = require("node:http");

http.createServer((request, response) => {
  if (request.method === "POST" && request.url === "/echo") {
    // Stream the request body straight back. pipe() handles backpressure
    // instead of buffering the whole body in memory.
    request.pipe(response);
    return;
  }

  response.statusCode = 404;
  response.end();
}).listen(8080);

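Caching is the other tool named above, and its invalidation rules are worth making explicit. Here is a minimal sketch of an in-memory cache with a time-to-live and manual invalidation; createTtlCache and its injectable clock parameter are hypothetical, and a production service would more likely use a shared cache.

```javascript
// Hypothetical in-memory cache. `now` is injectable so expiry can be tested
// without waiting for real time to pass.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      // Time-based invalidation: expired entries are dropped on read.
      if (now() >= entry.expiresAt) {
        entries.delete(key);
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    // Explicit invalidation: call this when the underlying data changes.
    invalidate(key) {
      entries.delete(key);
    },
  };
}

const reportCache = createTtlCache(60000);
reportCache.set("monthly-report", { total: 1250 });
console.log(reportCache.get("monthly-report"));
```

The two invalidation paths answer different risks: the TTL bounds how stale a value can get, while invalidate handles writes that must be visible immediately.
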
Real-World Scenario

Uploads, reports, and email generation are all happening inside request handlers, and latency is becoming unpredictable under load.

What you learned
  • Expensive work does not always belong on the critical path.
  • Streams and jobs are tools for controlling latency and memory.
  • Caching only helps when invalidation is understood.
Build Mission

Take one expensive endpoint and decide what should stay synchronous, what should be cached, and what should become background work.

Check Yourself
  • What belongs in the request path versus a worker queue?
  • When would a stream beat buffering everything in memory?
  • Which cache invalidation risk matters most for this service?
Definition of Done
  • The critical path is narrower and better justified.
  • At least one async or streaming boundary is chosen intentionally.
  • Cache or queue behavior is documented well enough to explain failures.
Lesson 8 of 10

Observability, Incidents, and Reliability

A backend that cannot explain itself during failure is not production-ready. Logs, metrics, traces, alerts, and runbooks are part of the service.

Observability is not only for SRE teams. It is how backend engineers prove that the system is behaving correctly and, when it is not, diagnose why.

Incidents reward systems that were made legible before the incident started.

structured-log.js
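// Context that lets a log line be joined back to a specific request and user.
const logContext = {
  requestId: "req_123",
  route: "/invoices",
  userId: "user_42",
};

// Stable event name first, context object second.
console.log("invoice_fetch_failed", logContext);
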
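Log pipelines usually parse one JSON object per line more reliably than free-form text. The sketch below shows one way to emit that shape; formatLogLine and its field names (timestamp, level, event) are assumptions, not a specific logging library's API.

```javascript
// Hypothetical formatter: one JSON object per log line.
// Context is spread first so it cannot overwrite the reserved fields.
function formatLogLine(level, event, context = {}, now = new Date()) {
  return JSON.stringify({
    ...context,
    timestamp: now.toISOString(),
    level,
    event,
  });
}

console.log(
  formatLogLine("error", "invoice_fetch_failed", {
    requestId: "req_123",
    route: "/invoices",
    userId: "user_42",
  })
);
```

With every line carrying requestId, a single search can reconstruct what happened to one request across handlers and background jobs.
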
Real-World Scenario

A payment-sync service fails overnight, but the team cannot tell whether the problem is upstream latency, queue backlog, or a broken deployment.

What you learned
  • Reliability depends on visibility, not only correctness.
  • Structured logs and basic metrics speed up diagnosis.
  • Runbooks turn panic into repeatable recovery.
Build Mission

Define the logs, metrics, and runbook steps for one backend failure that would matter to your users.

Check Yourself
  • Which metric would warn you before users complain?
  • What context should every high-value log line include?
  • What is the first recovery step during an incident?
Definition of Done
  • The service exposes enough telemetry to diagnose one realistic incident.
  • Logs include context that joins requests to failures.
  • A runbook exists for the first-response path during an outage.
Lesson 9 of 10

Monitoring Systems, SLOs, and Alerting Discipline

Observability gives you raw signals; monitoring decides which signals matter enough to wake people up. That distinction matters because noisy alerts destroy trust and silent systems hide damage.

As a startup grows, the monitoring strategy has to mature too. A small team might begin with basic uptime and error-rate alerts, but production systems eventually need SLIs, SLOs, alert thresholds, and on-call discipline.

A backend team should know what healthy service behavior looks like, how much error budget exists, and which alert means immediate action versus later investigation.

service-slo.js
const serviceSLO = {
  availabilityTarget: 99.9, // percent
  p95LatencyMs: 400,
  alertOnErrorRatePct: 2,
  pagingWindowMinutes: 10,
};

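An availability target implies an error budget, and the arithmetic is short: a 99.9% target over a 30-day window leaves about 43.2 minutes of allowed unavailability. The sketch below computes that and classifies an alert. The 30-day window, the helper names, and the reading of pagingWindowMinutes as "how long the error rate must stay above the threshold before paging" are assumptions.

```javascript
const slo = {
  availabilityTarget: 99.9, // percent
  alertOnErrorRatePct: 2,
  pagingWindowMinutes: 10,
};

// Minutes of unavailability the target allows over the window.
function errorBudgetMinutes(availabilityTargetPct, windowDays) {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - availabilityTargetPct / 100);
}

// Assumed policy: below the threshold is quiet, a short breach is a ticket,
// and a breach sustained for the whole paging window is a page.
function classifyAlert(sample, sloConfig) {
  if (sample.errorRatePct < sloConfig.alertOnErrorRatePct) return "none";
  return sample.sustainedMinutes >= sloConfig.pagingWindowMinutes ? "page" : "ticket";
}

console.log(errorBudgetMinutes(slo.availabilityTarget, 30).toFixed(1)); // prints "43.2"
console.log(classifyAlert({ errorRatePct: 5, sustainedMinutes: 3 }, slo)); // prints "ticket"
```

Writing the page-versus-ticket rule as code forces the team to state the threshold and duration explicitly instead of leaving them to on-call judgment.
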
Real-World Scenario

A startup has grown past founder-debugging mode, but the on-call rotation still gets flooded with vague alerts that do not tell anyone what users are actually experiencing.

What you learned
  • Monitoring is about actionability, not collecting every metric.
  • SLOs turn reliability expectations into concrete targets.
  • Alert design should reduce noise while catching meaningful regressions.
Build Mission

Define one service's health model with core SLIs, an SLO target, and the exact conditions that should trigger a page versus a ticket.

Check Yourself
  • Which metrics are truly user-facing for this service?
  • What makes an alert actionable enough to wake someone up?
  • How should a startup's monitoring evolve as traffic and risk grow?
Definition of Done
  • Health signals are tied to real user impact, not vanity graphs.
  • At least one SLO and alerting policy is written in concrete terms.
  • The learner can explain the difference between telemetry, monitoring, and on-call action.
Lesson 10 of 10

Capstone: Production Backend Service

Now combine contracts, configuration, persistence, auth, async workflows, observability, and reliability into one service that can survive real usage.

A backend capstone should prove not only that requests succeed, but that the service can be operated, debugged, and evolved.

This is where backend engineering becomes visible as ownership, not only implementation.

backend-capstone-dod.ts
const serviceReadiness = {
  validatedConfig: true,
  protectedRoutes: true,
  backgroundJobs: true,
  telemetry: true,
  rollbackPlan: true,
};

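A readiness object becomes more useful when something checks it. As a small sketch, the hypothetical helper missingReadinessItems below lists every item that is not explicitly true, which could gate a release script or a review checklist.

```javascript
// Hypothetical gate: returns the names of readiness items that are not yet true.
function missingReadinessItems(readiness) {
  return Object.entries(readiness)
    .filter(([, done]) => done !== true)
    .map(([item]) => item);
}

const candidate = {
  validatedConfig: true,
  protectedRoutes: true,
  backgroundJobs: true,
  telemetry: false,
  rollbackPlan: true,
};

console.log(missingReadinessItems(candidate)); // prints [ 'telemetry' ]
```

An empty list means every item was explicitly confirmed; anything else names exactly what still blocks the launch.
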
Real-World Scenario

You are the primary owner of a service launch and need to present its design, risks, safeguards, and rollout plan to the rest of engineering.

What you learned
  • Production backend work includes operation and recovery, not only code paths.
  • Readiness should be measurable and reviewable.
  • A strong capstone proves judgment under realistic constraints.
Build Mission

Ship a service such as billing, notifications, or admin records with a written service contract, runbook, threat notes, and release checklist.

Check Yourself
  • What would make another engineer trust this service in production?
  • Which failure mode still feels underdefined?
  • Can you explain the service contract, risks, and recovery plan in one short review?
Definition of Done
  • The service has a clear contract, validated boundaries, and operational notes.
  • At least one realistic incident path has logging and recovery steps.
  • The capstone is strong enough to discuss in interviews or design reviews.