Go Language Series 7: Strings

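In Go, a string is an immutable sequence of bytes. Go strings are Unicode-compatible, and string literals are encoded in UTF-8. A text string is conventionally interpreted as a UTF-8-encoded sequence of Unicode code points (runes). A string can be declared like this: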

var str string = "Let's Go"
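
To make "immutable" concrete, here is a minimal sketch. It assumes a hypothetical helper, replaceFirstByte, that is not part of any library: because the bytes of a string cannot be assigned to, the helper copies them into a []byte, changes the copy, and converts back.

```go
package main

import "fmt"

// replaceFirstByte is a hypothetical helper for this illustration only.
// It returns a copy of s whose first byte is replaced by b.
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // the conversion copies the bytes, so s is untouched
	buf[0] = b
	return string(buf)
}

func main() {
	str := "Let's Go"
	// str[0] = 'l' // does not compile: string bytes cannot be assigned to
	modified := replaceFirstByte(str, 'l')
	fmt.Println(str)      // Let's Go
	fmt.Println(modified) // let's Go
}
```

The original string is unchanged after the call; only the copy differs.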

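Getting the number of bytes in a string

The built-in len function returns the number of bytes in a string, and the index expression str[i] returns the value of the i-th byte, where i must satisfy 0 ≤ i < len(str). Accessing a byte outside this range causes a panic.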

package main

import "fmt"

func main() {
 str := "Let's go"
 fmt.Println("len(str) =", len(str))
 fmt.Println("str[0] =", str[0])
 fmt.Println("str[1] =", str[1])
}

Running it prints:

len(str) = 8
str[0] = 76
str[1] = 101

Here the ASCII value 76 corresponds to the character L, and the ASCII value 101 corresponds to the character e.

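Getting the length of a string in characters

The RuneCountInString function in the unicode/utf8 package returns the length of a string in characters. It takes a string argument and returns the number of runes in it.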

package main

import (
 "fmt"
 "unicode/utf8"
)

func main() {
 str1 := "Let's go"
 str2 := "你好世界"
 fmt.Printf("length of %s is %d\n", str1, utf8.RuneCountInString(str1))
 fmt.Printf("length of %s is %d\n", str2, utf8.RuneCountInString(str2))
}

The program above prints:

length of Let's go is 8
length of 你好世界 is 4
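
It can help to see len and utf8.RuneCountInString side by side. The sketch below uses a hypothetical helper named counts (an assumption made for this example) that returns both numbers for the same string.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper: it returns the number of bytes
// and the number of runes in s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"Let's go", "你好世界"} {
		b, r := counts(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
	// "Let's go": 8 bytes, 8 runes
	// "你好世界": 12 bytes, 4 runes
}
```

For ASCII-only text the two numbers agree; they diverge as soon as a character needs more than one byte in UTF-8.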

Getting each byte of a string

We can use a loop to access every byte in a string.

package main

import "fmt"

func main() {
 str1 := "Let's go"
 for i := 0; i < len(str1); i++ {
  fmt.Printf("str1[%d] = %c\n", i, str1[i])
 }

 str2 := "你"
 for i := 0; i < len(str2); i++ {
  fmt.Printf("str2[%d] = %c\n", i, str2[i])
 }
}

When you run this program, str1 prints one character per line as expected. str2, however, contains only a single Chinese character yet produces three lines:

str1[0] = L
str1[1] = e
str1[2] = t
str1[3] = '
str1[4] = s
str1[5] =  
str1[6] = g
str1[7] = o
str2[0] = ä
str2[1] = ½
str2[2] =

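Why? The character 你 is encoded in UTF-8 as three bytes (0xE4 0xBD 0xA0). Indexing yields one byte at a time, and %c treats each byte value as a separate code point (U+00E4 ä, U+00BD ½, and U+00A0, a non-breaking space), so three lines are printed. How do we solve this? The answer is to use rune.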
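
To look at the raw bytes directly, a small sketch follows. The helper hexBytes is a name made up for this example; it relies on the "% x" verb of fmt, which prints each byte of a string as two hex digits separated by spaces.

```go
package main

import "fmt"

// hexBytes is a hypothetical helper: it formats each byte of s
// as two hex digits, separated by spaces.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("你")) // e4 bd a0
	fmt.Println(hexBytes("L"))  // 4c
}
```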

rune

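rune is a built-in type in Go and an alias for int32. It represents a single Unicode code point; no matter how many bytes a code point occupies in UTF-8, it can be represented by one rune. Below we print the characters using rune.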

package main

import "fmt"

func main() {
 str2 := "你好，世界"
 runes := []rune(str2)
 for i := 0; i < len(runes); i++ {
  fmt.Printf("runes[%d] = %c\n", i, runes[i])
 }
}

Running the program prints:

runes[0] = 你
runes[1] = 好
runes[2] = ，
runes[3] = 世
runes[4] = 界
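
Since rune is just another name for int32, a rune value is simply the numeric code point. The sketch below illustrates this with a hypothetical helper, codePoints (not a library function), that formats each rune with the %U verb.

```go
package main

import "fmt"

// codePoints is a hypothetical helper: it returns the code point of
// each rune in s in U+XXXX notation.
func codePoints(s string) []string {
	var out []string
	for _, r := range []rune(s) {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	var r rune = '你'
	var n int32 = r // no conversion needed: rune is an alias for int32
	fmt.Println(n)  // 20320
	fmt.Println(codePoints("你好")) // [U+4F60 U+597D]
}
```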

The for range loop over a string

The previous example can be written more simply by iterating over the string with a for range loop.

package main

import "fmt"

func main() {
 str2 := "你好，世界"
 for index, word := range str2 {
  fmt.Printf("%c starts at byte %d\n", word, index)
 }
}

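The program prints the following. The byte offsets show that each of these Chinese characters (including the full-width comma) occupies three bytes.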

你 starts at byte 0
好 starts at byte 3
， starts at byte 6
世 starts at byte 9
界 starts at byte 12
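
Roughly the same walk can be done by hand with utf8.DecodeRuneInString, which returns the first rune of a string together with its width in bytes. The sketch below is only an illustration of what for range does for valid UTF-8 input; runeOffsets is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets is a hypothetical helper: it returns the starting byte
// offset of every rune in s.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// Decode the rune that starts at byte i and learn its width.
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("你好，世界")) // [0 3 6 9 12]
	fmt.Println(runeOffsets("Go语言"))   // [0 1 2 5]
}
```

In everyday code the for range form is shorter and is usually the better choice; the manual loop is useful mainly when the width of each rune is needed.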

Constructing a string from a byte slice

package main

import "fmt"

func main() {
 byteSlice := []byte{0x4c, 0x65, 0x74, 0x27, 0x73, 0x20, 0x67, 0x6f}
 str := string(byteSlice)
 fmt.Println(str)
}

In this program, byteSlice holds the hexadecimal byte values of the encoded string Let's go. The program prints:

Let's go
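
One detail worth knowing is that the conversion string(byteSlice) copies the data, so the resulting string stays the same even if the slice is modified afterwards. The sketch below shows this, along with the analogous conversion from a rune slice; fromBytes is a hypothetical wrapper used only for this illustration.

```go
package main

import "fmt"

// fromBytes is a hypothetical wrapper around the string conversion.
// The conversion copies the data, so later changes to b do not
// affect the returned string.
func fromBytes(b []byte) string {
	return string(b)
}

func main() {
	b := []byte{0x47, 0x6f} // "Go"
	s := fromBytes(b)
	b[0] = 0x4e               // the slice now spells "No"
	fmt.Println(s, string(b)) // Go No

	r := []rune{0x4f60, 0x597d} // code points of 你 and 好
	fmt.Println(string(r))      // 你好
}
```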

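Substring operations

The substring operation str[i:j] produces a new string from the bytes of str starting at index i and ending just before index j; the result contains j-i bytes. As with indexing, an index that is out of range, or a j that is less than i, causes a panic (or a compile-time error when the string and the indices are constants). Either i or j may be omitted: a missing i defaults to 0 and a missing j defaults to len(str). The syntax resembles string slicing in Python, but Go does not accept negative indices and does not clamp out-of-range indices. Note also that i and j are byte offsets, not character positions.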

str := "Let's go"
fmt.Println(str[:])     // Let's go
fmt.Println(str[0:3])   // Let
fmt.Println(str[:5])    // Let's
fmt.Println(str[6:])    // go
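
Because slice indices count bytes, slicing a string that contains multi-byte characters can cut a character in half. The sketch below shows the problem and one possible way around it; firstRunes is a hypothetical helper written for this example, not a standard library function.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes is a hypothetical helper: it returns the first n runes
// of s, or all of s if it contains fewer than n runes.
func firstRunes(s string, n int) string {
	if n <= 0 {
		return ""
	}
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

func main() {
	s := "你好，世界"
	cut := s[:4] // ends one byte into 好
	fmt.Println(utf8.ValidString(cut)) // false
	fmt.Println(firstRunes(s, 2))      // 你好
}
```

Slicing at offsets reported by for range keeps every character intact, at the cost of scanning the string from the start.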

