The difference between the byte and rune types in Go

Overview

byte and rune are both type aliases: byte is an alias for uint8, and rune is an alias for int32.

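Unicode itself only assigns a code point to each character; how many bytes a character occupies depends on the encoding. A common Chinese character takes two bytes in UTF-16 and three bytes in UTF-8.
UTF-8, UTF-16, and UTF-32 are all encodings of Unicode.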
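
As a rough illustration of how the byte count depends on the encoding, the sketch below measures a few characters with the standard unicode/utf8 and unicode/utf16 packages. The helper name encodedSizes is made up for this example and is not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// encodedSizes is a hypothetical helper: it reports how many bytes the
// rune r occupies in UTF-8, UTF-16, and UTF-32.
func encodedSizes(r rune) (u8, u16, u32 int) {
	u8 = utf8.RuneLen(r)                   // bytes in UTF-8 (-1 if r is not a valid rune)
	u16 = 2 * len(utf16.Encode([]rune{r})) // each UTF-16 code unit is 2 bytes
	u32 = 4                                // UTF-32 always uses 4 bytes per code point
	return u8, u16, u32
}

func main() {
	for _, r := range []rune{'A', '世', '😀'} {
		u8, u16, u32 := encodedSizes(r)
		fmt.Printf("%c U+%04X: UTF-8=%d UTF-16=%d UTF-32=%d bytes\n", r, r, u8, u16, u32)
	}
}
```

Characters outside the Basic Multilingual Plane, such as the emoji here, need two UTF-16 code units, so "two bytes per Chinese character" only holds for the common ones.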

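Go source code and string literals are UTF-8 encoded, so len returns the number of bytes, not the number of characters: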

str := "hello 世界"
fmt.Println(len(str))   // 12

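Because strings are UTF-8 encoded, count characters (runes) rather than bytes with utf8.RuneCountInString from the unicode/utf8 package: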

str := "hello 世界"
fmt.Println(utf8.RuneCountInString(str))     //  8

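Because byte is really a uint8 (not an int8), it is a good fit for ASCII characters, while int32 has a much larger range and can hold any Unicode code point. We can therefore use the rune type to work with Unicode characters: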

str := "hello 世界"
str2 := []rune(str)
fmt.Println(len(str2))     // 8

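A string is backed by a read-only sequence of 8-bit bytes. To iterate over its characters, use the range keyword, because range implicitly decodes the UTF-8 bytes into runes.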
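
To make the difference concrete, here is a minimal sketch that walks the same string once by byte index and once with range. The helper runeOffsets is hypothetical and exists only to show which index values range produces.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at
// which each rune of s starts, as reported by a range loop.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "a世界"

	// Indexing walks the raw bytes: 7 iterations for 3 characters.
	for i := 0; i < len(s); i++ {
		fmt.Printf("byte %d: %#x\n", i, s[i])
	}

	// range decodes one rune per iteration; i is the rune's starting byte offset.
	for i, r := range s {
		fmt.Printf("offset %d: %c (U+%04X)\n", i, r, r)
	}

	fmt.Println(runeOffsets(s)) // [0 1 4]
}
```

Note that the index from range is a byte offset, not a character count, so it jumps by 3 after each Chinese character.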

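Apart from their underlying types, the practical difference is that a rune can represent any Unicode character, while a single byte can only represent an ASCII character.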
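
The following sketch shows what goes wrong when a non-ASCII character is forced into a byte, and one possible way to check whether byte-wise handling is safe. The helper isASCII is an assumption made for this example; utf8.RuneSelf is the standard constant 0x80, below which every byte is a complete ASCII character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII is a hypothetical helper: it reports whether every byte of s
// is an ASCII character, i.e. whether treating s byte by byte is safe.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	r := '世'     // rune, code point U+4E16
	b := byte(r) // the conversion keeps only the low 8 bits
	fmt.Printf("%#x -> %#x\n", r, b) // 0x4e16 -> 0x16

	fmt.Println(isASCII("hello"), isASCII("hello 世界")) // true false
}
```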

Code examples

    var a byte = 'A'
    var b rune = 'B'
    fmt.Printf("a occupies %d byte(s)\n", unsafe.Sizeof(a))
    fmt.Printf("b occupies %d byte(s)\n", unsafe.Sizeof(b))

    // output
    a occupies 1 byte(s)
    b occupies 4 byte(s)

s1 := "abcd"
b1 := []byte(s1)
fmt.Println(b1) // [97 98 99 100]

s2 := "中文"
b2 := []byte(s2)
fmt.Println(b2) // [228 184 173 230 150 135], UTF-8 bytes: each Chinese character is made up of three bytes

r1 := []rune(s1)
fmt.Println(r1) // [97 98 99 100], one value (code point) per character
r2 := []rune(s2)
fmt.Println(r2) // [20013 25991], one value (code point) per character

Slicing strings

len(s) returns the length of a string in bytes: an ASCII letter takes 1 byte, and a Chinese character takes 3 bytes.

Indexing with s[n] returns the (n+1)-th byte of the string as a uint8 value.

func main() {
    s := "smallming张"
    a := s[0]
    fmt.Println(a)        // prints: 115
    fmt.Printf("%T\n", a) // prints: uint8
    b := fmt.Sprintf("%c", a)
    fmt.Printf("%T\n", b) // prints: string
    fmt.Println(b)        // prints: s
}

func main() {
    s := "smallming张"
    s1 := []rune(s)
    fmt.Println(len(s1))    // prints: 10
    fmt.Println(s1[9])      // prints: 24352
    fmt.Printf("%c", s1[9]) // prints: 张

    // Iterate over the string: i is the byte offset, n is the rune
    for i, n := range s {
        fmt.Println(i, n)
    }
}
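
Since slicing a string with s[i:j] works on bytes, it can cut a multi-byte character in half. One way to slice by characters instead is to convert to []rune first. The sketch below assumes a hypothetical helper named substringRunes; it is not a standard library function, and clamping out-of-range indices is just one possible design choice.

```go
package main

import "fmt"

// substringRunes is a hypothetical helper: it returns the characters of s
// in [start, end), counted in runes rather than bytes. Out-of-range
// indices are clamped instead of causing a panic.
func substringRunes(s string, start, end int) string {
	r := []rune(s)
	if start < 0 {
		start = 0
	}
	if end > len(r) {
		end = len(r)
	}
	if start >= end {
		return ""
	}
	return string(r[start:end])
}

func main() {
	s := "smallming张"

	fmt.Println(len(s[9:]))     // 3: the three UTF-8 bytes of 张
	fmt.Printf("%q\n", s[9:10]) // "\xe5": a byte slice that cuts 张 in the middle

	fmt.Println(substringRunes(s, 5, 10)) // ming张
}
```

Converting to []rune copies the whole string, so for very long strings it may be preferable to walk the string with range and slice at the byte offsets it reports.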