Introduction

kcp-go is a Production-Grade Reliable-UDP library for golang.

This library intents to provide a smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDPpackets, it has been battle-tested with opensource project kcptun. Millions of devices(from low-end MIPS routers to high-end servers) have deployed kcp-go powered program in a variety of forms like online games, live broadcasting, file synchronization and network acceleration.

Lastest Release

Features

  1. Designed for Latency-sensitive scenarios.
  2. Cache friendly and Memory optimized design, offers extremely High Performance core.
  3. Handles >5K concurrent connections on a single commodity server.
  4. Compatible with net.Conn and net.Listener, a drop-in replacement for net.TCPConn.
  5. FEC(Forward Error Correction) Support with Reed-Solomon Codes
  6. Packet level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc. in CFB mode, which generates completely anonymous packet.
  7. Only A fixed number of goroutines will be created for the entire server application, costs in context switch between goroutines have been taken into consideration.
  8. Compatible with skywind3000's C version with various improvements.

Documentation

For complete documentation, see the associated Godoc.

Specification

+-----------------+
| SESSION         |
+-----------------+
| KCP(ARQ)        |
+-----------------+
| FEC(OPTIONAL)   |
+-----------------+
| CRYPTO(OPTIONAL)|
+-----------------+
| UDP(PACKET)     |
+-----------------+
| IP              |
+-----------------+
| LINK            |
+-----------------+
| PHY             |
+-----------------+
(LAYER MODEL OF KCP-GO)

Usage

Client: full demo

kcpconn, err := kcp.DialWithOptions("192.168.0.1:10000", nil, 10, 3)

Server: full demo

lis, err := kcp.ListenWithOptions(":10000", nil, 10, 3)

Benchmark

  Model Name:	MacBook Pro
  Model Identifier:	MacBookPro14,1
  Processor Name:	Intel Core i5
  Processor Speed:	3.1 GHz
  Number of Processors:	1
  Total Number of Cores:	2
  L2 Cache (per Core):	256 KB
  L3 Cache:	4 MB
  Memory:	8 GB
$ go test -v -run=^$ -bench .
beginning tests, encryption:salsa20, fec:10/3
goos: darwin
goarch: amd64
pkg: github.com/xtaci/kcp-go
BenchmarkSM4-4                 	   50000	     32180 ns/op	  93.23 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES128-4              	  500000	      3285 ns/op	 913.21 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES192-4              	  300000	      3623 ns/op	 827.85 MB/s	       0 B/op	       0 allocs/op
BenchmarkAES256-4              	  300000	      3874 ns/op	 774.20 MB/s	       0 B/op	       0 allocs/op
BenchmarkTEA-4                 	  100000	     15384 ns/op	 195.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkXOR-4                 	20000000	        89.9 ns/op	33372.00 MB/s	       0 B/op	       0 allocs/op
BenchmarkBlowfish-4            	   50000	     26927 ns/op	 111.41 MB/s	       0 B/op	       0 allocs/op
BenchmarkNone-4                	30000000	        45.7 ns/op	65597.94 MB/s	       0 B/op	       0 allocs/op
BenchmarkCast5-4               	   50000	     34258 ns/op	  87.57 MB/s	       0 B/op	       0 allocs/op
Benchmark3DES-4                	   10000	    117149 ns/op	  25.61 MB/s	       0 B/op	       0 allocs/op
BenchmarkTwofish-4             	   50000	     33538 ns/op	  89.45 MB/s	       0 B/op	       0 allocs/op
BenchmarkXTEA-4                	   30000	     45666 ns/op	  65.69 MB/s	       0 B/op	       0 allocs/op
BenchmarkSalsa20-4             	  500000	      3308 ns/op	 906.76 MB/s	       0 B/op	       0 allocs/op
BenchmarkCRC32-4               	20000000	        65.2 ns/op	15712.43 MB/s
BenchmarkCsprngSystem-4        	 1000000	      1150 ns/op	  13.91 MB/s
BenchmarkCsprngMD5-4           	10000000	       145 ns/op	 110.26 MB/s
BenchmarkCsprngSHA1-4          	10000000	       158 ns/op	 126.54 MB/s
BenchmarkCsprngNonceMD5-4      	10000000	       153 ns/op	 104.22 MB/s
BenchmarkCsprngNonceAES128-4   	100000000	        19.1 ns/op	 837.81 MB/s
BenchmarkFECDecode-4           	 1000000	      1119 ns/op	1339.61 MB/s	    1606 B/op	       2 allocs/op
BenchmarkFECEncode-4           	 2000000	       832 ns/op	1801.83 MB/s	      17 B/op	       0 allocs/op
BenchmarkFlush-4               	 5000000	       272 ns/op	       0 B/op	       0 allocs/op
BenchmarkEchoSpeed4K-4         	    5000	    259617 ns/op	  15.78 MB/s	    5451 B/op	     149 allocs/op
BenchmarkEchoSpeed64K-4        	    1000	   1706084 ns/op	  38.41 MB/s	   56002 B/op	    1604 allocs/op
BenchmarkEchoSpeed512K-4       	     100	  14345505 ns/op	  36.55 MB/s	  482597 B/op	   13045 allocs/op
BenchmarkEchoSpeed1M-4         	      30	  34859104 ns/op	  30.08 MB/s	 1143773 B/op	   27186 allocs/op
BenchmarkSinkSpeed4K-4         	   50000	     31369 ns/op	 130.57 MB/s	    1566 B/op	      30 allocs/op
BenchmarkSinkSpeed64K-4        	    5000	    329065 ns/op	 199.16 MB/s	   21529 B/op	     453 allocs/op
BenchmarkSinkSpeed256K-4       	     500	   2373354 ns/op	 220.91 MB/s	  166332 B/op	    3554 allocs/op
BenchmarkSinkSpeed1M-4         	     300	   5117927 ns/op	 204.88 MB/s	  310378 B/op	    6988 allocs/op
PASS
ok  	github.com/xtaci/kcp-go	50.349s

Key Design Considerations

  1. slice vs. container/list
kcp.flush()

I've wrote a benchmark for comparing sequential loop through slice and container/list here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/cachemiss_test.go

BenchmarkLoopSlice-4   	2000000000	         0.39 ns/op
BenchmarkLoopList-4    	100000000	        54.6 ns/op
kcp.flush()
  1. Timing accuracy vs. syscall clock_gettime
time.Now()

The benchmark for time.Now() lies here:

https://github.com/xtaci/notes/blob/master/golang/benchmark2/syscall_test.go

BenchmarkNow-4         	100000000	        15.6 ns/op
kcp.output()kcp.flush()kcp.output()time.Now()

Connection Termination

Control messages like SYN/FIN/RST in TCP are not defined in KCP, you need some keepalive/heartbeat mechanism in the application-level. A real world example is to use some multiplexing protocol over session, such as smux(with embedded keepalive mechanism), see kcptun for example.

FAQ

Q: I'm handling >5K connections on my server, the CPU utilization is so high.

agentgateintervalSetNoDelayconn.SetNoDelay(1, 40, 1, 1)

Who is using this?

  1. https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
  2. https://github.com/getlantern/lantern -- Lantern delivers fast access to the open Internet.
  3. https://github.com/smallnest/rpcx -- A RPC service framework based on net/rpc like alibaba Dubbo and weibo Motan.
  4. https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
  5. https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.
  6. https://play.google.com/store/apps/details?id=com.k17game.k3 -- Battle Zone - Earth 2048, a world-wide strategy game.

Links

  1. https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
  2. https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
  3. https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go