Introduction
kcp-go is a production-grade reliable-UDP library for Go.
This library intends to provide smooth, resilient, ordered, error-checked and anonymous delivery of streams over UDP packets. It has been battle-tested with the open-source project kcptun. Millions of devices (from low-end MIPS routers to high-end servers) have deployed kcp-go powered programs in a variety of forms, such as online games, live broadcasting, file synchronization and network acceleration.
Latest Release
Features
- Designed for latency-sensitive scenarios.
- Cache-friendly and memory-optimized design, offering an extremely high-performance core.
- Handles >5K concurrent connections on a single commodity server.
- Compatible with net.Conn and net.Listener, a drop-in replacement for net.TCPConn.
- FEC (Forward Error Correction) support with Reed-Solomon codes.
- Packet-level encryption support with AES, TEA, 3DES, Blowfish, Cast5, Salsa20, etc. in CFB mode, which generates completely anonymous packets.
- Only a fixed number of goroutines are created for the entire server application; the cost of context switches between goroutines has been taken into consideration.
- Compatible with skywind3000's C version with various improvements.
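Because sessions are compatible with net.Conn, handler code can be written against the interface alone. The sketch below is a minimal illustration under that assumption: echoOnce and roundTrip are hypothetical helpers (not part of kcp-go), and net.Pipe stands in for a real session so the example runs with the standard library only.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOnce reads one message from conn and writes it back unchanged.
// It relies only on the net.Conn interface, so any implementation works.
func echoOnce(conn net.Conn) error {
	buf := make([]byte, 1024)
	n, err := conn.Read(buf)
	if err != nil {
		return err
	}
	_, err = conn.Write(buf[:n])
	return err
}

// roundTrip sends msg through an in-memory connection pair and returns
// the echoed reply. net.Pipe is only a stand-in for a real session.
func roundTrip(msg string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	errc := make(chan error, 1)
	go func() { errc <- echoOnce(server) }()

	if _, err := client.Write([]byte(msg)); err != nil {
		return "", err
	}
	reply := make([]byte, len(msg))
	if _, err := io.ReadFull(client, reply); err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(reply), nil
}

func main() {
	reply, err := roundTrip("hello")
	fmt.Println(reply, err)
}
```

Since echoOnce only calls Read and Write, passing it a connection obtained from kcp-go instead of the pipe end should require no changes to the handler.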
Documentation
For complete documentation, see the associated Godoc.
Specification
+-----------------+
|     SESSION     |
+-----------------+
|    KCP(ARQ)     |
+-----------------+
|  FEC(OPTIONAL)  |
+-----------------+
| CRYPTO(OPTIONAL)|
+-----------------+
|   UDP(PACKET)   |
+-----------------+
|       IP        |
+-----------------+
|      LINK       |
+-----------------+
|       PHY       |
+-----------------+
(LAYER MODEL OF KCP-GO)
Usage
Client: full demo
kcpconn, err := kcp.DialWithOptions("192.168.0.1:10000", nil, 10, 3)
Server: full demo
lis, err := kcp.ListenWithOptions(":10000", nil, 10, 3)
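The trailing arguments 10 and 3 in the calls above are the FEC data-shard and parity-shard counts. As a rough sketch of what that ratio implies for a Reed-Solomon erasure code in general, the hypothetical helpers below compute the bandwidth overhead and whether a shard group survives a given number of lost packets. They model the property of the code only, not kcp-go's internals.

```go
package main

import "fmt"

// fecOverhead returns the extra packets sent for parity, as a fraction
// of the data packets (hypothetical helper for illustration).
func fecOverhead(dataShards, parityShards int) float64 {
	if dataShards <= 0 {
		return 0
	}
	return float64(parityShards) / float64(dataShards)
}

// recoverable reports whether a group of dataShards+parityShards packets
// can still be reconstructed after `lost` of them are dropped. With an
// erasure code, any dataShards packets out of the group are sufficient.
func recoverable(dataShards, parityShards, lost int) bool {
	return lost >= 0 && lost <= parityShards
}

func main() {
	fmt.Printf("overhead: %.0f%%\n", fecOverhead(10, 3)*100)
	for lost := 0; lost <= 4; lost++ {
		fmt.Printf("lost %d of 13: recoverable=%v\n", lost, recoverable(10, 3, lost))
	}
}
```

A higher parity count tolerates more loss per group without retransmission, at the price of proportionally more bandwidth.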
Benchmark
Model Name: MacBook Pro
Model Identifier: MacBookPro14,1
Processor Name: Intel Core i5
Processor Speed: 3.1 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Memory: 8 GB
$ go test -v -run=^$ -bench .
beginning tests, encryption:salsa20, fec:10/3
goos: darwin
goarch: amd64
pkg: github.com/xtaci/kcp-go
BenchmarkSM4-4 50000 32180 ns/op 93.23 MB/s 0 B/op 0 allocs/op
BenchmarkAES128-4 500000 3285 ns/op 913.21 MB/s 0 B/op 0 allocs/op
BenchmarkAES192-4 300000 3623 ns/op 827.85 MB/s 0 B/op 0 allocs/op
BenchmarkAES256-4 300000 3874 ns/op 774.20 MB/s 0 B/op 0 allocs/op
BenchmarkTEA-4 100000 15384 ns/op 195.00 MB/s 0 B/op 0 allocs/op
BenchmarkXOR-4 20000000 89.9 ns/op 33372.00 MB/s 0 B/op 0 allocs/op
BenchmarkBlowfish-4 50000 26927 ns/op 111.41 MB/s 0 B/op 0 allocs/op
BenchmarkNone-4 30000000 45.7 ns/op 65597.94 MB/s 0 B/op 0 allocs/op
BenchmarkCast5-4 50000 34258 ns/op 87.57 MB/s 0 B/op 0 allocs/op
Benchmark3DES-4 10000 117149 ns/op 25.61 MB/s 0 B/op 0 allocs/op
BenchmarkTwofish-4 50000 33538 ns/op 89.45 MB/s 0 B/op 0 allocs/op
BenchmarkXTEA-4 30000 45666 ns/op 65.69 MB/s 0 B/op 0 allocs/op
BenchmarkSalsa20-4 500000 3308 ns/op 906.76 MB/s 0 B/op 0 allocs/op
BenchmarkCRC32-4 20000000 65.2 ns/op 15712.43 MB/s
BenchmarkCsprngSystem-4 1000000 1150 ns/op 13.91 MB/s
BenchmarkCsprngMD5-4 10000000 145 ns/op 110.26 MB/s
BenchmarkCsprngSHA1-4 10000000 158 ns/op 126.54 MB/s
BenchmarkCsprngNonceMD5-4 10000000 153 ns/op 104.22 MB/s
BenchmarkCsprngNonceAES128-4 100000000 19.1 ns/op 837.81 MB/s
BenchmarkFECDecode-4 1000000 1119 ns/op 1339.61 MB/s 1606 B/op 2 allocs/op
BenchmarkFECEncode-4 2000000 832 ns/op 1801.83 MB/s 17 B/op 0 allocs/op
BenchmarkFlush-4 5000000 272 ns/op 0 B/op 0 allocs/op
BenchmarkEchoSpeed4K-4 5000 259617 ns/op 15.78 MB/s 5451 B/op 149 allocs/op
BenchmarkEchoSpeed64K-4 1000 1706084 ns/op 38.41 MB/s 56002 B/op 1604 allocs/op
BenchmarkEchoSpeed512K-4 100 14345505 ns/op 36.55 MB/s 482597 B/op 13045 allocs/op
BenchmarkEchoSpeed1M-4 30 34859104 ns/op 30.08 MB/s 1143773 B/op 27186 allocs/op
BenchmarkSinkSpeed4K-4 50000 31369 ns/op 130.57 MB/s 1566 B/op 30 allocs/op
BenchmarkSinkSpeed64K-4 5000 329065 ns/op 199.16 MB/s 21529 B/op 453 allocs/op
BenchmarkSinkSpeed256K-4 500 2373354 ns/op 220.91 MB/s 166332 B/op 3554 allocs/op
BenchmarkSinkSpeed1M-4 300 5117927 ns/op 204.88 MB/s 310378 B/op 6988 allocs/op
PASS
ok github.com/xtaci/kcp-go 50.349s
Key Design Considerations
- slice vs. container/list
kcp.flush() loops through the send queue to check for retransmissions on every update interval.
I wrote a benchmark comparing a sequential loop through a slice and a container/list here:
https://github.com/xtaci/notes/blob/master/golang/benchmark2/cachemiss_test.go
BenchmarkLoopSlice-4 2000000000 0.39 ns/op
BenchmarkLoopList-4 100000000 54.6 ns/op
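The list structure introduces heavy cache misses, while the slice has much better locality, so the slice keeps the cost of each kcp.flush() low.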
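For readers who do not want to follow the link, the two traversals being compared look roughly like the sketch below. The function names are illustrative and this is not the linked benchmark file; to measure timings, each sum function would be wrapped in a testing.B loop.

```go
package main

import (
	"container/list"
	"fmt"
)

// build returns a slice and a list holding the same values 0..n-1.
func build(n int) ([]int, *list.List) {
	s := make([]int, 0, n)
	l := list.New()
	for i := 0; i < n; i++ {
		s = append(s, i)
		l.PushBack(i)
	}
	return s, l
}

// sumSlice walks contiguous memory, which is friendly to the CPU cache.
func sumSlice(s []int) int {
	sum := 0
	for _, v := range s {
		sum += v
	}
	return sum
}

// sumList follows a pointer per element, and each element may live
// anywhere on the heap.
func sumList(l *list.List) int {
	sum := 0
	for e := l.Front(); e != nil; e = e.Next() {
		sum += e.Value.(int)
	}
	return sum
}

func main() {
	s, l := build(1000)
	fmt.Println(sumSlice(s), sumList(l))
}
```

Both functions compute the same result; only the memory access pattern differs, which is what the benchmark numbers above reflect.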
- Timing accuracy vs. syscall clock_gettime
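Timing is critical to the RTT estimator: inaccurate timing leads to false retransmissions in KCP, but every time.Now() call has a cost.
The benchmark for time.Now() is here: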
https://github.com/xtaci/notes/blob/master/golang/benchmark2/syscall_test.go
BenchmarkNow-4 100000000 15.6 ns/op
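In kcp-go, the current clock time is updated after each kcp.output() call returns, and a single kcp.flush() operation queries the system time only once, so the number of time.Now() calls grows with the number of kcp.output() calls.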
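One common way to bound the number of time.Now() calls is to read the system clock once per pass and reuse the cached value for every segment. The sketch below illustrates that general idea; cachedClock, countDue and demo are hypothetical names and do not correspond to kcp-go's actual types or functions.

```go
package main

import (
	"fmt"
	"time"
)

// cachedClock is a hypothetical helper that remembers the last time read
// from its source and counts how often the source was queried.
type cachedClock struct {
	source  func() time.Time // real time source, e.g. time.Now
	current time.Time
	queries int
}

// refresh queries the underlying source once and caches the result.
func (c *cachedClock) refresh() {
	c.current = c.source()
	c.queries++
}

// countDue refreshes the clock once, then counts the deadlines that
// have been reached, reusing the cached time for every comparison.
func countDue(c *cachedClock, deadlines []time.Time) int {
	c.refresh()
	due := 0
	for _, d := range deadlines {
		if !d.After(c.current) {
			due++
		}
	}
	return due
}

// demo runs countDue with a fixed fake time so the result is deterministic.
func demo() (due int, queries int) {
	base := time.Unix(1000, 0)
	c := &cachedClock{source: func() time.Time { return base }}
	deadlines := []time.Time{
		base.Add(-time.Millisecond), // already due
		base.Add(time.Millisecond),  // not yet due
		base.Add(-time.Second),      // already due
	}
	due = countDue(c, deadlines)
	return due, c.queries
}

func main() {
	due, queries := demo()
	fmt.Printf("due=%d clock queries=%d\n", due, queries)
}
```

The trade-off is that every comparison within a pass sees the same timestamp, which is acceptable when a pass is short relative to the timer resolution.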
Connection Termination
Control messages like TCP's SYN/FIN/RST are not defined in KCP, so you need a keepalive/heartbeat mechanism at the application level. A real-world example is to run a multiplexing protocol over the session, such as smux (which has an embedded keepalive mechanism); see kcptun for an example.
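A minimal sketch of an application-level liveness check, assuming the peer periodically sends a heartbeat byte: the reader treats the connection as dead if nothing arrives before a read deadline. waitHeartbeat and probe are hypothetical helpers, the 200 ms timeout is arbitrary, and net.Pipe stands in for a real session so the example needs only the standard library. A multiplexer such as smux handles this for you.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// waitHeartbeat reports whether any byte arrives on conn within timeout.
// A deadline error (or any other read error) is treated as a dead peer.
func waitHeartbeat(conn net.Conn, timeout time.Duration) bool {
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return false
	}
	buf := make([]byte, 1)
	_, err := conn.Read(buf)
	return err == nil
}

// probe runs waitHeartbeat over an in-memory pipe. peerAlive controls
// whether the other end sends a heartbeat byte.
func probe(peerAlive bool) bool {
	local, peer := net.Pipe()
	defer local.Close()
	defer peer.Close()

	if peerAlive {
		go peer.Write([]byte{0}) // heartbeat payload is arbitrary here
	}
	return waitHeartbeat(local, 200*time.Millisecond)
}

func main() {
	fmt.Println("peer sending heartbeats:", probe(true))
	fmt.Println("peer silent:", probe(false))
}
```

In a real application the check would run in a loop, and the connection would be closed after one or more missed heartbeats.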
FAQ
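Q: I'm handling >5K connections on my server and the CPU utilization is very high.
A: Running kcp-go in a standalone agent or gate server is suggested. Increasing the update interval with SetNoDelay, e.g. conn.SetNoDelay(1, 40, 1, 1), reduces system load at the cost of some performance.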
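The second argument of SetNoDelay is the update interval in milliseconds. As a back-of-the-envelope sketch of why a longer interval lowers load, the hypothetical helper below estimates how many per-connection updates a server performs each second, assuming every connection is updated exactly once per interval. The interval values other than 40 are illustrative.

```go
package main

import "fmt"

// updatesPerSecond estimates the total number of per-connection update
// passes per second, assuming one pass per connection per interval.
func updatesPerSecond(connections, intervalMs int) int {
	if connections <= 0 || intervalMs <= 0 {
		return 0
	}
	return connections * 1000 / intervalMs
}

func main() {
	for _, interval := range []int{10, 20, 40} {
		fmt.Printf("interval %2d ms: %d updates/s\n",
			interval, updatesPerSecond(5000, interval))
	}
}
```

Under this simple model, quadrupling the interval cuts the periodic work to a quarter, while retransmissions and acknowledgements are scheduled less often.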
Who is using this?
- https://github.com/xtaci/kcptun -- A Secure Tunnel Based On KCP over UDP.
- https://github.com/getlantern/lantern -- Lantern delivers fast access to the open Internet.
- https://github.com/smallnest/rpcx -- An RPC service framework based on net/rpc, like Alibaba Dubbo and Weibo Motan.
- https://github.com/gonet2/agent -- A gateway for games with stream multiplexing.
- https://github.com/syncthing/syncthing -- Open Source Continuous File Synchronization.
- https://play.google.com/store/apps/details?id=com.k17game.k3 -- Battle Zone - Earth 2048, a world-wide strategy game.
Links
- https://github.com/xtaci/libkcp -- FEC enhanced KCP session library for iOS/Android in C++
- https://github.com/skywind3000/kcp -- A Fast and Reliable ARQ Protocol
- https://github.com/klauspost/reedsolomon -- Reed-Solomon Erasure Coding in Go