Created 2025/04/27 at 06:43PM
Last Modified 2025/04/27 at 10:39PM
Some time back, my friends and I decided to play Ghost of Tsushima multiplayer.
Headset on, controller ready, excitement high -- until
"Cannot join game. NAT error."
The dreaded "NAT Type 3" curse. After blaming each other's internet and resetting our routers, we ended up playing two separate single-player campaigns.
Now, I've been playing games for years, and it was about time I looked into this issue and understood it on a deeper level.
The Invisible Barrier: NAT
So what is NAT?
When you’re on home WiFi, your PlayStation (or PC) isn’t directly exposed to the internet. Instead, your router hides all your devices behind a single public IP address. Your router does the work of mapping your devices to this public IP via Network Address Translation (NAT).
It's good for:
- Security -- NAT wasn't designed as a security feature, but because it works on IP and port headers, it can route or block traffic based on source/destination address and port. So you could do things like:
- Enable a server on the Inside Network to reach the Internet using a Public IP address
- Enable users on the Inside Network to access the Internet using the Outside Interface's Public IP Address
- Dealing with IPv4 Address Exhaustion -- The initial version of the Internet Protocol, IPv4, uses 32-bit addresses. This means there are approximately 4.3 billion (2^32) unique IPv4 addresses available. NAT allows multiple devices within a private network (like your home network) to share a single public IP address. Instead of each device needing its own unique public IP address to communicate with the internet, the router acts as an intermediary. It takes all the requests from devices on your private network and translates their private IP addresses and port numbers to its own public IP address and a different port number for each outgoing connection.
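To make the translation step concrete, here is a toy model of a router's NAT table in Rust (the language of the project later in this post). It assumes the simplest behavior, where a device's private IP:port keeps the same public port whatever it talks to; real routers may allocate a port per connection instead. `NatTable`, `translate_out`, and `translate_in` are hypothetical names for illustration only.

```rust
use std::collections::HashMap;
use std::net::{Ipv4Addr, SocketAddrV4};

/// Toy NAT table: many private endpoints share one public IP,
/// told apart by the public port the router hands out.
pub struct NatTable {
    public_ip: Ipv4Addr,
    next_port: u16,
    outbound: HashMap<SocketAddrV4, u16>, // private endpoint -> public port
    inbound: HashMap<u16, SocketAddrV4>,  // public port -> private endpoint
}

impl NatTable {
    pub fn new(public_ip: Ipv4Addr, first_port: u16) -> Self {
        NatTable { public_ip, next_port: first_port, outbound: HashMap::new(), inbound: HashMap::new() }
    }

    /// Rewrite the source address of an outgoing packet.
    pub fn translate_out(&mut self, private: SocketAddrV4) -> SocketAddrV4 {
        let port = match self.outbound.get(&private).copied() {
            Some(p) => p,
            None => {
                // First packet from this device/port: allocate a public port.
                // (Toy model: no handling for running out of ports.)
                let p = self.next_port;
                self.next_port += 1;
                self.outbound.insert(private, p);
                self.inbound.insert(p, private);
                p
            }
        };
        SocketAddrV4::new(self.public_ip, port)
    }

    /// Map an incoming packet's destination port back to a device.
    /// `None` means nothing inside asked for this traffic, so it gets dropped.
    pub fn translate_in(&self, public_port: u16) -> Option<SocketAddrV4> {
        self.inbound.get(&public_port).copied()
    }
}
```

Note that `translate_in` has no answer for a port no device inside has opened. That gap is exactly the peer-to-peer problem that follows.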
But it's terrible for peer-to-peer gaming, because:
- Your device has no way to accept incoming connections directly.
- Only your router knows how to "translate" your private network to the outside world.
- Routers don't know where to forward unsolicited inbound packets, so they usually just drop them.
Hence the NAT Type ratings your console reports:
- Type 1 NAT (Open) -- full access (rare, direct public IP)
- Type 2 NAT (Moderate) -- behind NAT but ports properly forwarded
- Type 3 NAT (Strict) -- hardcore NAT, usually no direct incoming possible
To work around this, games and apps do something ingenious: network hole punching. There's both TCP hole punching and UDP hole punching, but we'll focus on UDP hole punching only.
Here’s the idea:
- Initial Connection - Both players send outgoing packets to a known rendezvous server (often called a coordinator; in standard setups this role is filled by STUN/TURN servers, tied together by the ICE framework). This outgoing connection is crucial because NAT generally allows outgoing connections, and these outgoing packets punch a hole that later lets incoming traffic flow back in.
- STUN servers help devices discover their public IP / port.
- TURN servers relay data if direct connections fail.
- Sharing Information - The rendezvous server facilitates the exchange of each player's public IP address and the port number their router has temporarily opened for the connection to the server.
- Simultaneous Connection Attempts - Now, each player's device, knowing the other's public IP and port, attempts to establish a direct connection to the other player's public IP and the port they learned from the rendezvous server.
- NAT Traversal - Because both devices send packets to each other's external address and port at roughly the same time, each router creates a mapping for that destination on the way out. When the other side's packets arrive, they match that existing mapping (the "hole") and are let through instead of being dropped as unsolicited. This forms a peer-to-peer link between the two devices.
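A toy model of why both sides have to send: assume each router only accepts packets from a remote IP:port it has already sent something to (a "port-restricted" filter, one common behavior). Alice's first packet is dropped by Bob's router until Bob sends one of his own. `Router` is a hypothetical type, and no real packets are involved.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Toy port-restricted NAT filter: inbound traffic is accepted only from
/// remote endpoints this router has already sent packets to.
#[derive(Default)]
pub struct Router {
    contacted: HashSet<SocketAddr>,
}

impl Router {
    /// An outgoing packet opens (or refreshes) a hole for that destination.
    pub fn send_to(&mut self, remote: SocketAddr) {
        self.contacted.insert(remote);
    }

    /// Unsolicited packets are dropped; packets through an open hole get in.
    pub fn accepts_from(&self, remote: SocketAddr) -> bool {
        self.contacted.contains(&remote)
    }
}
```

Walking through it: after only Alice has sent, Bob's router still rejects her; once Bob has also sent toward Alice, both routers accept each other's packets.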
Imagine two players: Alice and Bob. Both are behind NAT routers. They want to connect directly without using a relay.

What Happens
- Alice and Bob each connect to the Coordinator (a small public server).
- The Coordinator sees each player's <Public IP>:<Port> from their NAT mapping.
- Coordinator tells Alice what Bob’s public address is, and vice versa.
- Alice starts sending UDP packets to Bob's public address, even if Bob can’t accept yet.
- Bob does the same toward Alice.
- NAT routers notice outgoing UDP traffic and open temporary mappings.
- Eventually, packets punch through, and a direct peer-to-peer connection is made.
- Heartbeats are sent regularly to keep NAT bindings alive (because routers close idle mappings).
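The "keep sending until the other side's packets get through" part of these steps can be sketched with nothing but Rust's standard `UdpSocket`. This assumes the peer's public address has already come from the Coordinator and that both peers run the same loop at about the same time. `punch_hole` is a hypothetical helper, and the 200 ms retry interval is an arbitrary choice.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::{Duration, Instant};

const PUNCH: &[u8] = b"PUNCH";

/// Fire small UDP packets at the peer until a packet from that exact
/// address arrives (returns true) or `deadline` passes (returns false).
/// Every send (re)creates our own NAT mapping toward the peer.
pub fn punch_hole(sock: &UdpSocket, peer: SocketAddr, deadline: Duration) -> io::Result<bool> {
    sock.set_read_timeout(Some(Duration::from_millis(200)))?;
    let start = Instant::now();
    let mut buf = [0u8; 64];
    while start.elapsed() < deadline {
        sock.send_to(PUNCH, peer)?;
        match sock.recv_from(&mut buf) {
            Ok((_, from)) if from == peer => {
                // Send once more so the peer hears us even if our earlier
                // packets were dropped before its own hole was open.
                sock.send_to(PUNCH, peer)?;
                return Ok(true);
            }
            Ok(_) => continue, // stray packet from someone else: ignore it
            // Timeouts show up as WouldBlock (Unix) or TimedOut (Windows);
            // Windows may also report ConnectionReset after an ICMP error.
            Err(e) if matches!(
                e.kind(),
                io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut | io::ErrorKind::ConnectionReset
            ) => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(false)
}
```

Each peer should call this on the same socket whose public mapping was shared through the Coordinator, since that is the mapping the other side is aiming at. Once it returns `true`, the heartbeats take over keeping that mapping alive.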
Why does it still fail sometimes?
Network hole punching depends heavily on:
- How routers behave (different brands act differently)
- Carrier-grade NAT (ISPs sometimes double-NAT you)
- Symmetric NAT (where the router assigns a different public port for each destination, so the port the coordinator saw is not the one your peer's packets need to hit)
- Firewall rules (that block even established UDP connections)
- VPNs often worsen NAT restrictions
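One way to spot symmetric NAT, simplified from the approach STUN uses for NAT behavior discovery (RFC 5780): send from the same local socket to two servers at different addresses and compare the public IP:port each one reports back. `MappingBehavior` and `classify` are hypothetical names, and real discovery distinguishes more cases than this.

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
pub enum MappingBehavior {
    /// Same public endpoint whoever we talk to: hole punching usually works.
    EndpointIndependent,
    /// Same public IP but a new port per destination: classic symmetric NAT.
    PortChangesPerDestination,
    /// Even the public IP differs per destination (e.g. an address pool).
    AddressChangesPerDestination,
}

/// Compare the public endpoints two different servers observed
/// for the same local socket.
pub fn classify(seen_by_a: SocketAddr, seen_by_b: SocketAddr) -> MappingBehavior {
    if seen_by_a == seen_by_b {
        MappingBehavior::EndpointIndependent
    } else if seen_by_a.ip() == seen_by_b.ip() {
        MappingBehavior::PortChangesPerDestination
    } else {
        MappingBehavior::AddressChangesPerDestination
    }
}
```

In the last two cases, the address the Coordinator hands to your peer is not the one your router will use toward that peer, which is why plain hole punching tends to fail there.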
If both players are behind strict NATs, hole punching can fail, and you'll need:
- Manual port forwarding,
- UPnP enabled (automatic port mapping),
- Or a fallback to relay servers (TURN servers), which adds latency.
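For contrast, the core of what a relay does is small: both peers send to the relay's public address (an ordinary outbound flow every NAT allows), and the relay forwards each datagram to the other peer. This sketch leaves out everything the actual TURN protocol (RFC 8656) adds, such as allocations, permissions, and authentication; `relay_target` and `relay_once` are hypothetical names.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Which peer should get a datagram that arrived from `from`?
/// Anyone outside the registered pair gets nothing forwarded.
pub fn relay_target(pair: (SocketAddr, SocketAddr), from: SocketAddr) -> Option<SocketAddr> {
    if from == pair.0 {
        Some(pair.1)
    } else if from == pair.1 {
        Some(pair.0)
    } else {
        None
    }
}

/// Receive one datagram and forward it to the other member of `pair`.
/// In a real deployment, `pair` would hold the peers' public addresses
/// as seen by the relay. Returns whether anything was forwarded.
pub fn relay_once(sock: &UdpSocket, pair: (SocketAddr, SocketAddr)) -> io::Result<bool> {
    let mut buf = [0u8; 1500];
    let (n, from) = sock.recv_from(&mut buf)?;
    match relay_target(pair, from) {
        Some(to) => {
            sock.send_to(&buf[..n], to)?;
            Ok(true)
        }
        None => Ok(false),
    }
}
```

The added latency comes from that extra hop: every packet travels peer → relay → peer instead of taking the direct path.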
Every NAT error you hit, every connection error you see in co-op games, every "can't join lobby" issue is almost always a hole punching failure behind the scenes. When you and a friend can't connect, it's because your routers couldn't find a way to let packets through.
Major gaming networks like Xbox Live and PlayStation Network maintain large, globally distributed fleets of relay servers. When network hole punching fails, your traffic is relayed through their infrastructure (often cloud providers such as Azure or AWS) without you realizing it. That's why you sometimes feel extra lag in co-op games -- your packets may be taking a 10,000 km detour through a relay server instead of going directly peer-to-peer.
Now that I understood NAT and hole punching better, I wanted to recreate what professional games do under the hood and implement a similar system from scratch with the following features (several trade-offs have been made for simplicity):
- Encrypted peer-to-peer UDP communication
- Heartbeat messages to keep NAT bindings alive
- No relay servers (direct only, for simplicity)
- A Coordinator server to exchange addresses
- A Main Server to manage rooms, players, and passwords
The code can be found at this GitHub repo.
Architecture Overview
Components
- Main Server (main_server.rs) - Creates and manages game rooms, issues JWT tokens to players.
- Coordinator (coordinator.rs) - Shares each peer's public IP:Port mapping with the other peers in its room; lightweight matchmaking.
- Peer (peer.rs) - Connects directly to other peers via UDP hole punching, encrypted messaging, heartbeats.
Code Flow
- All servers are started: Main Server + Coordinator
- Peer processes start. A peer can register as a host or a normal peer. A host peer can create a room with a password, whereas a normal peer can only join an existing room if it knows that room's password.
- Peers connect to the Coordinator over TCP to register their address in a room. If no one else is in the room, the peer waits; otherwise, the Coordinator sends back the addresses of the other peers.
- Peers periodically reconnect to the Coordinator to check whether new peers have joined the room, and attempt hole punching with any new ones. This creates a P2P mesh in which every peer is connected to, and communicates with, every other peer in the room.
- These periodic reconnects also let the Coordinator track inactive peers and empty rooms, clean up resources when they're no longer needed, and force inactive peers to disconnect.
- Peers start receiving the list of other peers in the same room and begin UDP hole punching to connect to each other.
- Peers authenticate by sending an encrypted "HELLO {token}" handshake message.
- Peers start exchanging UDP packets directly over the Internet. Packets are encrypted with AES-GCM using a key derived from the room password.
- Heartbeats start to keep NAT mapping alive.
- A heartbeat watcher starts, which tries to re-punch the hole if heartbeats are lost.
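The Coordinator's room bookkeeping in this flow (record a peer on every periodic reconnect, hand back the others, evict peers that stop reconnecting, drop empty rooms) might look roughly like this. It is a sketch, not the repo's coordinator.rs: TCP handling, passwords, and JWT checks are left out, and `Coordinator`, `register`, and `sweep` are hypothetical names.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::time::{Duration, Instant};

/// Room id -> (peer address -> last time that peer checked in).
#[derive(Default)]
pub struct Coordinator {
    rooms: HashMap<String, HashMap<SocketAddr, Instant>>,
}

impl Coordinator {
    /// Called on every (re)connect: mark the peer as alive and return the
    /// other members of the room, in no particular order.
    /// An empty list means "nobody else yet, keep waiting".
    pub fn register(&mut self, room: &str, peer: SocketAddr, now: Instant) -> Vec<SocketAddr> {
        let members = self.rooms.entry(room.to_string()).or_default();
        members.insert(peer, now);
        members.keys().copied().filter(|a| *a != peer).collect()
    }

    /// Forget peers that haven't reconnected within `ttl`, then drop rooms
    /// that ended up empty. Returns how many peers were evicted.
    pub fn sweep(&mut self, now: Instant, ttl: Duration) -> usize {
        let mut evicted = 0;
        for members in self.rooms.values_mut() {
            let before = members.len();
            members.retain(|_, seen| now.saturating_duration_since(*seen) <= ttl);
            evicted += before - members.len();
        }
        self.rooms.retain(|_, members| !members.is_empty());
        evicted
    }

    pub fn room_count(&self) -> usize {
        self.rooms.len()
    }
}
```

Passing `now` in rather than reading the clock inside keeps the eviction logic deterministic and easy to test.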

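The "HELLO {token}" handshake itself is simple to build and validate. This sketch covers only the plaintext format: encryption happens before sending and after receiving, and verifying the token as a JWT issued by the Main Server would need a JWT library, so it is omitted. `hello` and `parse_hello` are hypothetical names.

```rust
/// Plaintext of the handshake message (encrypted before it is sent).
pub fn hello(token: &str) -> String {
    format!("HELLO {}", token)
}

/// Pull the token out of a decrypted handshake; reject anything else.
/// A JWT never contains whitespace, so any whitespace means a bad message.
pub fn parse_hello(msg: &str) -> Option<&str> {
    let token = msg.strip_prefix("HELLO ")?;
    if token.is_empty() || token.contains(char::is_whitespace) {
        None
    } else {
        Some(token)
    }
}
```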

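One detail worth getting right with AES-GCM: a nonce must never repeat under the same key. Since every peer in a room derives the same key from the shared password, per-peer counters alone would collide across peers. A common fix is to build each 12-byte nonce from a unique sender id plus a counter, and to send the nonce in front of the ciphertext. This sketch assumes each peer gets a distinct 4-byte `sender_id` (for example, assigned by the Main Server); `NonceSeq`, `frame`, and `split_frame` are hypothetical names, and the actual AES-GCM sealing would come from a crate such as `aes-gcm` and isn't shown.

```rust
/// 12-byte nonce = 4-byte sender id || 8-byte big-endian counter.
/// The sender id keeps two peers sharing one key from ever
/// producing the same nonce.
pub struct NonceSeq {
    sender_id: [u8; 4],
    counter: u64,
}

impl NonceSeq {
    pub fn new(sender_id: [u8; 4]) -> Self {
        NonceSeq { sender_id, counter: 0 }
    }

    /// Next unique nonce, or None once the counter is exhausted
    /// (never wraps around, since reuse would break GCM's security).
    pub fn next_nonce(&mut self) -> Option<[u8; 12]> {
        let mut nonce = [0u8; 12];
        nonce[..4].copy_from_slice(&self.sender_id);
        nonce[4..].copy_from_slice(&self.counter.to_be_bytes());
        self.counter = self.counter.checked_add(1)?;
        Some(nonce)
    }
}

/// Wire format: nonce (12 bytes) followed by ciphertext + 16-byte GCM tag.
pub fn frame(nonce: [u8; 12], sealed: &[u8]) -> Vec<u8> {
    let mut packet = Vec::with_capacity(12 + sealed.len());
    packet.extend_from_slice(&nonce);
    packet.extend_from_slice(sealed);
    packet
}

/// Split a received packet back into (nonce, ciphertext + tag).
pub fn split_frame(packet: &[u8]) -> Option<([u8; 12], &[u8])> {
    if packet.len() < 12 + 16 {
        return None; // too short to hold a nonce and a GCM tag
    }
    let (n, rest) = packet.split_at(12);
    let mut nonce = [0u8; 12];
    nonce.copy_from_slice(n);
    Some((nonce, rest))
}
```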

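The heartbeat sender and watcher from the last two steps can be driven by a small state machine: send a heartbeat every `interval`, and if nothing has been heard from the peer for `timeout`, assume the NAT mapping has expired and re-punch. The durations are placeholders (routers' idle timeouts vary), and `HeartbeatWatcher` and `Action` are hypothetical names.

```rust
use std::time::{Duration, Instant};

/// What the peer loop should do next.
#[derive(Debug, PartialEq)]
pub enum Action {
    Nothing,
    SendHeartbeat,
    RePunch,
}

pub struct HeartbeatWatcher {
    interval: Duration, // how often we send heartbeats
    timeout: Duration,  // silence after which we assume the hole is gone
    last_sent: Instant,
    last_heard: Instant,
}

impl HeartbeatWatcher {
    pub fn new(now: Instant, interval: Duration, timeout: Duration) -> Self {
        HeartbeatWatcher { interval, timeout, last_sent: now, last_heard: now }
    }

    /// Call whenever any packet arrives from the peer.
    pub fn heard(&mut self, now: Instant) {
        self.last_heard = now;
    }

    /// Call periodically from the peer loop.
    pub fn poll(&mut self, now: Instant) -> Action {
        if now.saturating_duration_since(self.last_heard) > self.timeout {
            // Reset the clock so the re-punch gets a full timeout to work
            // instead of being retriggered on every poll.
            self.last_heard = now;
            Action::RePunch
        } else if now.saturating_duration_since(self.last_sent) >= self.interval {
            self.last_sent = now;
            Action::SendHeartbeat
        } else {
            Action::Nothing
        }
    }
}
```

The interval needs to stay comfortably below the router's idle timeout for UDP mappings, since a heartbeat that arrives after the mapping has expired is dropped like any other unsolicited packet.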
Takeaways
- Networking is hard ("._.)
- Modern games heavily engineer around NAT issues silently behind the scenes
- At scale, adding complexity (like STUN/TURN servers and the ICE framework) is inevitable
Every time you see a "Cannot connect to lobby" error, remember -- there's a war happening between your router, your ISP, and your poor little packet trying to reach your friend's device.