Three Ways to Copy Files in Go

This article introduces three typical ways to copy a file and compares their performance, so you can decide which method suits which situation.

1. Three ways to copy a file

The three typical methods are: io.Copy() from the Go standard library; ioutil.ReadFile() together with ioutil.WriteFile(); and the Read() and Write() methods of os.File.

1.1 The io.Copy() method

The first method uses io.Copy() from the Go standard library. The copy() function below contains the complete logic:

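// copy copies the regular file src to dst and returns the number of bytes copied.
// Note: naming the function copy shadows Go's built-in copy() inside this package.
func copy(src, dst string) (int64, error) {
	// Make sure the source exists and is a regular file (not a directory, device, etc.)
	sourceFileStat, err := os.Stat(src)
	if err != nil {
		return 0, err
	}

	if !sourceFileStat.Mode().IsRegular() {
		return 0, fmt.Errorf("%s is not a regular file", src)
	}

	source, err := os.Open(src)
	if err != nil {
		return 0, err
	}
	defer source.Close()

	// os.Create truncates dst if it already exists
	destination, err := os.Create(dst)
	if err != nil {
		return 0, err
	}
	defer destination.Close()

	// io.Copy reads from source and writes to destination until EOF or an error
	nBytes, err := io.Copy(destination, source)
	return nBytes, err
}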

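The actual copying is done by the single call io.Copy(destination, source); the rest of copy() only checks the source and opens the two files.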

The following main() function calls copy() with the two file names given on the command line:

func main() {
	if len(os.Args) != 3 {
		fmt.Println("Please provide two command line arguments!")
		return
	}

	sourceFile := os.Args[1]
	destinationFile := os.Args[2]

	nBytes, err := copy(sourceFile, destinationFile)
	if err != nil {
		fmt.Printf("The copy operation failed %q\n", err)
	} else {
		fmt.Printf("Copied %d bytes!\n", nBytes)
	}
}

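This approach is simple, but it gives the developer little control over how the file is read and written. That is not necessarily a bad thing, but sometimes you need more flexibility when reading or writing a file.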

1.2 ioutil.ReadFile() and ioutil.WriteFile()

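The second method uses ioutil.ReadFile() and ioutil.WriteFile(). The first function reads the entire file into a byte slice, and the second writes that byte slice to a file. Since Go 1.16 the io/ioutil package is deprecated, and these two functions simply call os.ReadFile() and os.WriteFile().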

The core of the copy2() function is:

    input, err := ioutil.ReadFile(sourceFile)
    if err != nil {
            fmt.Println(err)
            return
    }

    err = ioutil.WriteFile(destinationFile, input, 0644)
    if err != nil {
            fmt.Println("Error creating", destinationFile)
            fmt.Println(err)
            return
    }

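The argument and file checks are also needed; they are the same as in the previous section, so only the reading and writing part is shown here.
This method also copies the file correctly, but it is inefficient for large files: ioutil.ReadFile() holds the entire file in memory, so memory use grows with the file size.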

1.3 The Read() and Write() methods of os.File

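The third method uses the Read() and Write() methods of os.File. The parts shared with the first method (checking and opening the files) stay the same, but the program takes one extra argument: the buffer size. The core of the code is the for loop below: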

    buf := make([]byte, BUFFERSIZE)
    for {
            n, err := source.Read(buf)
            if err != nil && err != io.EOF {
                    return err
            }
            if n == 0 {
                    break
            }

            if _, err := destination.Write(buf[:n]); err != nil {
                    return err
            }
    }

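Each call to Read() reads the next part of the file into the buffer, and Write() writes the bytes that were read to the destination file. Copying stops when Read() returns an error other than io.EOF, or when it returns no more data at the end of the file (io.EOF).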

The complete program:

package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strconv"
)

func copy(src, dst string, BUFFERSIZE int64) error {
	sourceFileStat, err := os.Stat(src)
	if err != nil {
		return err
	}

	if !sourceFileStat.Mode().IsRegular() {
		return fmt.Errorf("%s is not a regular file.", src)
	}

	source, err := os.Open(src)
	if err != nil {
		return err
	}
	defer source.Close()

	_, err = os.Stat(dst)
	if err == nil {
		return fmt.Errorf("File %s already exists.", dst)
	}

	destination, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer destination.Close()

	// Copy in chunks of BUFFERSIZE bytes until Read returns no more data
	buf := make([]byte, BUFFERSIZE)
	for {
		n, err := source.Read(buf)
		if err != nil && err != io.EOF {
			return err
		}
		if n == 0 {
			break
		}

		if _, err := destination.Write(buf[:n]); err != nil {
			return err
		}
	}
	return nil // read and write errors inside the loop were already returned
}

func main() {
	if len(os.Args) != 4 {
		fmt.Printf("usage: %s source destination BUFFERSIZE\n", filepath.Base(os.Args[0]))
		return
	}

	source := os.Args[1]
	destination := os.Args[2]
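	BUFFERSIZE, err := strconv.ParseInt(os.Args[3], 10, 64)
	if err != nil || BUFFERSIZE <= 0 {
		// make() panics on a negative size, and a zero-size buffer copies nothing
		fmt.Printf("Invalid buffer size: %q\n", os.Args[3])
		return
	}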

	fmt.Printf("Copying %s to %s\n", source, destination)
	err = copy(source, destination, BUFFERSIZE)
	if err != nil {
		fmt.Printf("File copying failed: %q\n", err)
	}
}

2. Testing

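Below, the time command is used as a simple benchmark: first the three methods are compared, then the third method is run with different buffer sizes. Because go run compiles the program before running it, these timings also include compilation time.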

The following compares the three methods by copying the same file of roughly 500 MB (512,000,000 bytes):

$ ls -l INPUT
-rw-r--r--  1 mtsouk  staff  512000000 Jun  5 09:39 INPUT
$ time go run cp1.go INPUT /tmp/cp1
Copied 512000000 bytes!

real    0m0.980s
user    0m0.219s
sys     0m0.719s
$ time go run cp2.go INPUT /tmp/cp2

real    0m1.139s
user    0m0.196s
sys     0m0.654s
$ time go run cp3.go INPUT /tmp/cp3 1000000
Copying INPUT to /tmp/cp3

real    0m1.025s
user    0m0.195s
sys     0m0.486s

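The three timings are close (the third method was run with a 1,000,000-byte buffer), which suggests that the standard library functions are already well optimized. Next, the third method is tested with buffer sizes of 10, 20 and 1000 bytes, each copying the same 500 MB file: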

$ ls -l INPUT
-rw-r--r--  1 mtsouk  staff  512000000 Jun  5 09:39 INPUT
$ time go run cp3.go INPUT /tmp/buf10 10
Copying INPUT to /tmp/buf10

real    6m39.721s
user    1m18.457s
sys     5m19.186s
$ time go run cp3.go INPUT /tmp/buf20 20
Copying INPUT to /tmp/buf20

real    3m20.819s
user    0m39.444s
sys     2m40.380s
$ time go run cp3.go INPUT /tmp/buf1000 1000
Copying INPUT to /tmp/buf1000

real    0m4.916s
user    0m1.001s
sys     0m3.986s

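The output shows that larger buffers give better copy performance, and that buffers of 20 bytes or less make copying extremely slow. Each Read() and Write() on an os.File is a system call, so with a 10-byte buffer the 512,000,000-byte file needs about 51.2 million reads and as many writes, which is why most of the time is spent in sys.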

3. Summary

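This article discussed three ways to copy a file and compared their performance with the time command. For a 500 MB file the three methods performed similarly; ioutil.ReadFile() keeps the whole file in memory, and a hand-written Read()/Write() loop lets you choose the buffer size, but buffers of only a few bytes make it dramatically slower.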