Created 2025/04/27 at 06:43PM

Last Modified 2025/04/27 at 10:39PM

Some time back, my friends and I decided to play Ghost of Tsushima multiplayer.

Headset on, controller ready, excitement high -- until

"Cannot join game. NAT error."

The dreaded "NAT Type 3" curse. After blaming each other's internet and resetting our routers, we ended up playing two separate single-player campaigns.

Now, I've been playing games for years, and it was about time that I started looking into this issue and understanding it on a deeper level.

The Invisible Barrier: NAT

So what is NAT?

When you’re on home WiFi, your PlayStation (or PC) isn’t directly exposed to the internet. Instead, your router hides all your devices behind a single public IP address. Your router does the work of mapping your devices to this public IP via Network Address Translation (NAT).
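To make that mapping concrete, here is a minimal sketch of the kind of table a router keeps. It is a toy model, not how any real router is implemented: the `ToyNat` class, the IP addresses, and the starting port 40000 are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Toy NAT: maps (private IP, private port) pairs to ports on one public IP."""
    public_ip: str
    next_port: int = 40000  # hypothetical starting point for public ports
    outbound: dict = field(default_factory=dict)  # (private_ip, private_port) -> public_port
    inbound: dict = field(default_factory=dict)   # public_port -> (private_ip, private_port)

    def translate_outgoing(self, private_ip, private_port):
        """Return the public (ip, port) that the outside world sees for this device."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            # First packet from this device/port: allocate a new public port.
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.outbound[key])

    def translate_incoming(self, public_port):
        """Return the private (ip, port) behind a public port, or None if unmapped."""
        return self.inbound.get(public_port)


nat = ToyNat(public_ip="203.0.113.7")
playstation = nat.translate_outgoing("192.168.1.20", 5000)
laptop = nat.translate_outgoing("192.168.1.31", 5000)

print(playstation)                    # ('203.0.113.7', 40000)
print(laptop)                         # ('203.0.113.7', 40001)
print(nat.translate_incoming(40000))  # ('192.168.1.20', 5000)
print(nat.translate_incoming(12345))  # None
```

The last line is the important one for gaming: a packet arriving on a port that no device has opened from the inside has nowhere to go, so it gets dropped.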

It’s good for

But terrible for peer-to-peer gaming, because:

Thus, the NAT Type error:

To work around this, games and apps do something ingenious: network hole punching. There are two flavours, TCP hole punching and UDP hole punching, but we'll be focusing on UDP hole punching only.

Here’s the idea:

Imagine two players: Alice and Bob. Both are behind NAT routers. They want to connect directly without using a relay.

What Happens

  1. Alice and Bob each connect to the Coordinator (a small public server).
  2. The Coordinator sees each player's <Public IP>:<Port> as mapped by their NAT.
  3. The Coordinator tells Alice what Bob’s public address is, and vice versa.
  4. Alice starts sending UDP packets to Bob's public address, even if Bob can’t accept them yet.
  5. Bob does the same toward Alice.
  6. Each NAT router notices the outgoing UDP traffic and opens a temporary mapping.
  7. Eventually, packets punch through, and a direct peer-to-peer connection is made.
  8. Heartbeats are sent regularly to keep the NAT bindings alive (because routers close idle mappings).
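The steps above can be sketched in a few dozen lines of Python. This is a minimal illustration running entirely on localhost, so there is no real NAT in the way and the "punch" always succeeds; it only shows the message flow. The function names (`run_coordinator`, `run_peer`, `demo`), the message formats, and the timeouts are my own assumptions, not taken from any particular game or library.

```python
import socket
import threading


def run_coordinator(sock):
    """Steps 1-3: wait for two peers to register, then tell each the other's address."""
    peers = []
    while len(peers) < 2:
        _, addr = sock.recvfrom(1024)  # addr is the sender's address as seen by us
        if addr not in peers:
            peers.append(addr)
    a, b = peers
    sock.sendto(f"{b[0]}:{b[1]}".encode(), a)
    sock.sendto(f"{a[0]}:{a[1]}".encode(), b)


def run_peer(name, coordinator_addr, results):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    try:
        # Step 1: register with the coordinator.
        sock.settimeout(5.0)
        sock.sendto(b"register", coordinator_addr)
        # Step 3: wait for the peer's address, ignoring any early punch packets.
        while True:
            data, addr = sock.recvfrom(1024)
            if addr == coordinator_addr:
                break
        host, port = data.decode().rsplit(":", 1)
        peer_addr = (host, int(port))

        # Steps 4-7: keep sending to the peer until we hear back from it.
        sock.settimeout(0.5)
        for _ in range(10):
            sock.sendto(f"punch from {name}".encode(), peer_addr)
            try:
                data, addr = sock.recvfrom(1024)
            except socket.timeout:
                continue  # nothing yet, send another packet
            if addr == peer_addr:
                results[name] = data.decode()
                # One more packet so the peer is sure to hear from us too.
                sock.sendto(f"punch from {name}".encode(), peer_addr)
                break
        # Step 8 (not shown): a real client would now send periodic heartbeats.
    finally:
        sock.close()


def demo():
    coordinator = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    coordinator.bind(("127.0.0.1", 0))
    coordinator.settimeout(5.0)
    coordinator_addr = coordinator.getsockname()
    results = {}
    threads = [threading.Thread(target=run_coordinator, args=(coordinator,))]
    for name in ("alice", "bob"):
        threads.append(threading.Thread(target=run_peer, args=(name, coordinator_addr, results)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    coordinator.close()
    return results


results = demo()
print(results)
```

After the run, `results` holds the first message each peer received directly from the other. Note that the coordinator is only involved in the introduction; once both sides know each other's address, it drops out of the conversation.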

Why does it still fail sometimes?

Network hole punching depends heavily on:

If both players are behind strict NATs, hole punching can fail, and you'll need:

Almost every NAT error you hit, every connection error you see in co-op games, and every "can't join lobby" issue is a hole punching failure behind the scenes. When you and your friend were unable to connect, it was because your routers couldn’t find a way to let packets through.
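Here is a rough simulation of why strict NATs break the trick. I am assuming "strict" means the router allocates a different public port for every destination (often called a symmetric NAT) and only lets a packet in if it comes from an address the inside device has already sent to. The `FilteringNat` class, the addresses, and the port numbers are all hypothetical.

```python
class FilteringNat:
    """Toy NAT that tracks which remote addresses may send packets back in."""

    def __init__(self, public_ip, per_destination):
        self.public_ip = public_ip
        self.per_destination = per_destination  # True models a "strict" NAT
        self.next_port = 40000
        self.mappings = {}  # mapping key -> public port
        self.allowed = {}   # public port -> set of remote addresses allowed in

    def send(self, private_addr, dest_addr):
        """Record an outgoing packet and return the public source address used."""
        # A strict NAT allocates a fresh public port for each new destination.
        key = (private_addr, dest_addr) if self.per_destination else private_addr
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.allowed.setdefault(port, set()).add(dest_addr)
        return (self.public_ip, port)

    def accepts(self, public_port, from_addr):
        """An incoming packet passes only if we previously sent to its source."""
        return from_addr in self.allowed.get(public_port, set())


def punch_succeeds(alice_strict, bob_strict):
    coordinator = ("198.51.100.1", 3478)
    nat_a = FilteringNat("203.0.113.7", alice_strict)
    nat_b = FilteringNat("192.0.2.9", bob_strict)
    alice = ("192.168.1.20", 5000)
    bob = ("10.0.0.5", 5000)

    # The coordinator observes each player's public address.
    alice_seen = nat_a.send(alice, coordinator)
    bob_seen = nat_b.send(bob, coordinator)

    # Each player sends toward the address the coordinator reported.
    alice_src = nat_a.send(alice, bob_seen)
    bob_src = nat_b.send(bob, alice_seen)

    # Does each packet get through the other side's NAT?
    alice_to_bob = nat_b.accepts(bob_seen[1], alice_src)
    bob_to_alice = nat_a.accepts(alice_seen[1], bob_src)
    return alice_to_bob and bob_to_alice


print(punch_succeeds(alice_strict=False, bob_strict=False))  # True
print(punch_succeeds(alice_strict=True, bob_strict=True))    # False
```

In the strict case, the port the coordinator saw is not the port the router uses when talking to the other player, so both sides end up knocking on a door that was never opened for them.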

Major gaming networks, like Xbox Live and PlayStation Network, maintain massive relay server farms, with thousands of TURN relay servers distributed globally. When hole punching fails, your traffic is relayed through servers on Azure or AWS without you ever noticing. That's why you sometimes feel extra lag when playing co-op games -- your packets are taking a 10,000 km detour (via a TURN relay server) instead of going directly peer-to-peer.
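To put a rough number on that detour, here is a back-of-the-envelope calculation. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two-thirds of the speed of light in a vacuum) and ignores routing, queuing, and processing delays, so treat it as a lower bound rather than a measurement.

```python
# Approximate signal speed in optical fiber (assumption for this estimate).
FIBER_SPEED_KM_PER_S = 200_000


def propagation_delay_ms(distance_km):
    """Time for a signal to cover distance_km in fiber, in milliseconds."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000


detour_km = 10_000
one_way_ms = propagation_delay_ms(detour_km)
round_trip_ms = 2 * one_way_ms

print(f"one-way: {one_way_ms:.0f} ms, round trip: {round_trip_ms:.0f} ms")
# one-way: 50 ms, round trip: 100 ms
```

An extra 100 ms on every round trip is easily enough to notice in a fast-paced co-op game.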


Now that I understood NAT and hole punching better, I wanted to recreate what professional games do under the hood and implement a similar system from scratch, with the following features (several trade-offs have been made for simplicity):

The code can be found at this GitHub repo.

Architecture Overview

Components

Code Flow


VIDEO DEMO

Takeaways


Every time you see a "Cannot connect to lobby" error, remember -- there's a war happening between your router, your ISP, and your poor little packet trying to reach your friend's device.