Golang学习笔记（二）——计算字符串长度 len()和RuneCountInString()

1. 内建函数 len() 函数用来获取字符串的 ASCII 字符个数或字节长度。

由于 Go 语言的字符串都以 UTF-8 格式保存，每个中文占用 3 个字节，因此使用 len() 获得两个中文文字对应的 6 个字节。

package main

import "fmt"

func main() {
	str1 := "hello world"
	fmt.Println(len(str1)) //11

	str2 := "你好"
	fmt.Println(len(str2)) //6
}

2. UTF-8 包提供的utf8.RuneCountInString() 函数用来统计 Uncode 字符数量。

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	str1 := "hello world"
	fmt.Println(utf8.RuneCountInString(str1)) //11

	str2 := "你好"
	fmt.Println(utf8.RuneCountInString(str2)) //2
}

3.相应的，在遍历一个字符串时，也要注意ascii和Unicode的区别。

用普通的 for 循环遍历时，取到的是对应的ascii码。

package main

import (
	"fmt"
)

func main() {
	str := "hello 世界"
	for i := 0; i < len(str); i++ {
		fmt.Printf("ascii: %c\n", str[i])
	}
}

输出：

12
ascii: h
ascii: e
ascii: l
ascii: l
ascii: o
ascii:
ascii: ä
ascii: ¸
ascii:
ascii: ç
ascii:
ascii:

用 for ... range 循环遍历时，取到的是对应的Unicode码。

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
    str := "hello 世界"
	fmt.Println(utf8.RuneCountInString(str))
	for _, s := range str {
	    fmt.Printf("Unicode: %c\n", s)
	}
}

输出：

8
Unicode: h
Unicode: e
Unicode: l
Unicode: l
Unicode: o
Unicode:
Unicode: 世
Unicode: 界