一、清除和回收
前面标记分析完成后,又把内存写屏障分析了,唠唠叨叨的终于扯回来分析GC在标记完成后的动作了。在GC过程中,标记是一个非常重要的过程,类似于一个敌我识别系统。在判定成功后,就可以动用后面的清理和回收内存的动作了。清除和回收不过是把内存重新放回前面所分析的几个缓冲区内罢了。也就是说过的内存池。
内存从分配到标记再回到清除回收,就形成了一个闭环,GC机制就是这其中后个重要的关键点。
二、源码分析
先看一下Sweep中的相关的数据结构:
// State of background sweep.
type sweepdata struct {
lock mutex
g *g
parked bool
started bool
nbgsweep uint32
npausesweep uint32
// pacertracegen is the sweepgen at which the last pacer trace
// "sweep finished" message was printed.
pacertracegen uint32
}
这个数据结构是一个后台清扫协程的状态数据结构体。互斥体是为了保证清扫过程中状态参数的原子性,g表示当前协程,parked表示是否为可用状态,started表示是否开始,nbgsweep和npausesweep 表示清扫的协程的数量 ,这个在前面分析过,Go是支持并发GC的。最后一个参数是一个输出“清扫完成”的判断值。
在前面分析标记时可以看到在函数gcMarkTermination会调用gcSweep这个函数:
func gcMarkTermination() {
......
systemstack(func() {
work.heap2 = work.bytesMarked
if debug.gccheckmark > 0 {
// Run a full stop-the-world mark using checkmark bits,
// to check that we didn't forget to mark anything during
// the concurrent mark process.
gcResetMarkState()
initCheckmarks()
gcMark(startTime)
clearCheckmarks()
}
// marking is complete so we can turn the write barrier off
setGCPhase(_GCoff)
//此处调用清理
gcSweep(work.mode)
if debug.gctrace > 1 {
startTime = nanotime()
// The g stacks have been scanned so
// they have gcscanvalid==true and gcworkdone==true.
// Reset these so that all stacks will be rescanned.
gcResetMarkState()
finishsweep_m()
// Still in STW but gcphase is _GCoff, reset to _GCmarktermination
// At this point all objects will be found during the gcMark which
// does a complete STW mark and object scan.
setGCPhase(_GCmarktermination)
gcMark(startTime)
setGCPhase(_GCoff) // marking is done, turn off wb.
//此处调用清理
gcSweep(work.mode)
}
})
......
}
所以这里重点看一下这个函数:
func gcSweep(mode gcMode) {
if gcphase != _GCoff {
throw("gcSweep being done but phase is not GCoff")
}
lock(&mheap_.lock)
mheap_.sweepgen += 2
mheap_.sweepdone = 0
if mheap_.sweepSpans[mheap_.sweepgen/2%2].index != 0 {
// We should have drained this list during the last
// sweep phase. We certainly need to start this phase
// with an empty swept list.
throw("non-empty swept list")
}
unlock(&mheap_.lock)
//判断是否block模式,非并行,强制
if !_ConcurrentSweep || mode == gcForceBlockMode {
// Special case synchronous sweep.
// Record that no proportional sweeping has to happen.
lock(&mheap_.lock)
mheap_.sweepPagesPerByte = 0
mheap_.pagesSwept = 0
unlock(&mheap_.lock)
// Sweep all spans eagerly.清扫所有Span
for sweepone() != ^uintptr(0) {
sweep.npausesweep++
}
// Do an additional mProf_GC, because all 'free' events are now real as well.
mProf_GC()
mProf_GC()
return
}
//并行清除
// Concurrent sweep needs to sweep all of the in-use pages by
// the time the allocated heap reaches the GC trigger. Compute
// the ratio of in-use pages to sweep per byte allocated.
heapDistance := int64(memstats.gc_trigger) - int64(memstats.heap_live)
// Add a little margin so rounding errors and concurrent
// sweep are less likely to leave pages unswept when GC starts.
heapDistance -= 1024 * 1024
if heapDistance < _PageSize {
// Avoid setting the sweep ratio extremely high
heapDistance = _PageSize
}
lock(&mheap_.lock)
mheap_.sweepPagesPerByte = float64(mheap_.pagesInUse) / float64(heapDistance)
mheap_.pagesSwept = 0
mheap_.spanBytesAlloc = 0
unlock(&mheap_.lock)
// Background sweep.唤醒后台清扫任务
lock(&sweep.lock)
if sweep.parked {
sweep.parked = false
ready(sweep.g, 0, true)
}
unlock(&sweep.lock)
}
代码一开始仍然是各种一致性保障和状态设置,然后就是分成阻塞和并行两种状态进行处理。阻塞状态下需要注意的是mProf_GC执行了两次,说明也很清楚就是为了确保一致性。并行中一开始的计算和前面提到的相关参数的计算有关,可以对比看一看,如何设置启动GC的相关参数。
并行GC时, 初始化的过程就会启动 bgsweep()这个函数,不断的循环判断各种标志来确定是否启动相关的GC动作。看一下相关代码:
// The main goroutine.
func main() {
g := getg()
......
runtime_init() // must be before defer
// Defer unlock so that runtime.Goexit during init does the unlock too.
needUnlock := true
defer func() {
if needUnlock {
unlockOSThread()
}
}()
gcenable()
......
}
// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine and enables GC.
func gcenable() {
c := make(chan int, 1)
go bgsweep(c)
<-c
memstats.enablegc = true // now that runtime is initialized, GC is okay
}
func bgsweep(c chan int) {
sweep.g = getg()
lock(&sweep.lock)
sweep.parked = true
c <- 1
goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)
for {
//清扫 span,如果清扫了一部分 span,则记录 bgsweep 的次数
for gosweepone() != ^uintptr(0) {
sweep.nbgsweep++
Gosched()
}
lock(&sweep.lock)
if !gosweepdone() {
// This can happen if a GC runs between
// gosweepone returning ^0 above
// and the lock being acquired.
unlock(&sweep.lock)
continue
}
sweep.parked = true
goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)
}
}
//go:nowritebarrier
func gosweepone() uintptr {
var ret uintptr
systemstack(func() {
ret = sweepone()
})
return ret
}
//go:nowritebarrier
func gosweepdone() bool {
return mheap_.sweepdone != 0
}
但是,它们都需要通过调用sweepone这个函数来实现真正的清扫动作。看一下这个函数:
// sweeps one span
// returns number of pages returned to heap, or ^uintptr(0) if there is nothing to sweep
//go:nowritebarrier
func sweepone() uintptr {
_g_ := getg()
// increment locks to ensure that the goroutine is not preempted
// in the middle of sweep thus leaving the span in an inconsistent state for next GC
_g_.m.locks++
sg := mheap_.sweepgen
for {
s := mheap_.sweepSpans[1-sg/2%2].pop()
if s == nil {
mheap_.sweepdone = 1
_g_.m.locks--
if debug.gcpacertrace > 0 && atomic.Cas(&sweep.pacertracegen, sg-2, sg) {
print("pacer: sweep done at heap size ", memstats.heap_live>>20, "MB; allocated ", mheap_.spanBytesAlloc>>20, "MB of spans; swept ", mheap_.pagesSwept, " pages at ", mheap_.sweepPagesPerByte, " pages/byte\n")
}
return ^uintptr(0)
}
if s.state != mSpanInUse {
// This can happen if direct sweeping already
// swept this span, but in that case the sweep
// generation should always be up-to-date.
if s.sweepgen != sg {
print("runtime: bad span s.state=", s.state, " s.sweepgen=", s.sweepgen, " sweepgen=", sg, "\n")
throw("non in-use span in unswept list")
}
continue
}
if s.sweepgen != sg-2 || !atomic.Cas(&s.sweepgen, sg-2, sg-1) {
continue
}
npages := s.npages
if !s.sweep(false) {
// Span is still in-use, so this returned no
// pages to the heap and the span needs to
// move to the swept in-use list.
npages = 0
}
_g_.m.locks--
return npages
}
}
内存的控制都是以span这个单元为基础操作的,一定要清楚的搞明白。在gcsweep中也有这个索引除2的动作,它的意思是每次sweepgen都增长两次,GC完成后sweepSpans做角色互换(GC内存管理上有说明)。最后经历各种条件判断后,进行sweep,源代码如下:
// Sweep frees or collects finalizers for blocks not marked in the mark phase.
// It clears the mark bits in preparation for the next GC round.
// Returns true if the span was returned to heap.
// If preserve=true, don't return it to heap nor relink in MCentral lists;
// caller takes care of it.
//TODO go:nowritebarrier
func (s *mspan) sweep(preserve bool) bool {
// It's critical that we enter this function with preemption disabled,
// GC must not start while we are in the middle of this function.
_g_ := getg()
if _g_.m.locks == 0 && _g_.m.mallocing == 0 && _g_ != _g_.m.g0 {
throw("MSpan_Sweep: m is not locked")
}
sweepgen := mheap_.sweepgen
if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
throw("MSpan_Sweep: bad span state")
}
//启动跟踪
if trace.enabled {
traceGCSweepStart()
}
//更新清扫的页数
atomic.Xadd64(&mheap_.pagesSwept, int64(s.npages))
cl := s.sizeclass
size := s.elemsize
res := false
nfree := 0
c := _g_.m.mcache
freeToHeap := false
// The allocBits indicate which unmarked objects don't need to be
// processed since they were free at the end of the last GC cycle
// and were not allocated since then.
// If the allocBits index is >= s.freeindex and the bit
// is not marked then the object remains unallocated
// since the last GC.
// This situation is analogous to being on a freelist.
// Unlink & free special records for any objects we're about to free.
// Two complications here:
// 1. An object can have both finalizer and profile special records.
// In such case we need to queue finalizer for execution,
// mark the object as live and preserve the profile special.
// 2. A tiny object can have several finalizers setup for different offsets.
// If such object is not marked, we need to queue all finalizers at once.
// Both 1 and 2 are possible at the same time.
specialp := &s.specials
special := *specialp
for special != nil {
// A finalizer can be set for an inner byte of an object, find object beginning.
objIndex := uintptr(special.offset) / size
p := s.base() + objIndex*size
//判断在special中的对象是否存活,是否至少有一个finalizer,释放没有finalizer的对象,把有finalizer的对象组成队列
mbits := s.markBitsForIndex(objIndex)
if !mbits.isMarked() {
// This object is not marked and has at least one special record.
// Pass 1: see if it has at least one finalizer.
hasFin := false
endOffset := p - s.base() + size
for tmp := special; tmp != nil && uintptr(tmp.offset) < endOffset; tmp = tmp.next {
if tmp.kind == _KindSpecialFinalizer {
// Stop freeing of object if it has a finalizer.
mbits.setMarkedNonAtomic()
hasFin = true
break
}
}
// Pass 2: queue all finalizers _or_ handle profile record.
for special != nil && uintptr(special.offset) < endOffset {
// Find the exact byte for which the special was setup
// (as opposed to object beginning).
p := s.base() + uintptr(special.offset)
if special.kind == _KindSpecialFinalizer || !hasFin {
// Splice out special record.
y := special
special = special.next
*specialp = special
freespecial(y, unsafe.Pointer(p), size)
} else {
// This is profile record, but the object has finalizers (so kept alive).
// Keep special record.
specialp = &special.next
special = *specialp
}
}
} else {
// object is still live: keep special record
specialp = &special.next
special = *specialp
}
}
if debug.allocfreetrace != 0 || raceenabled || msanenabled {
// Find all newly freed objects. This doesn't have to
// efficient; allocfreetrace has massive overhead.
mbits := s.markBitsForBase()
abits := s.allocBitsForIndex(0)
for i := uintptr(0); i < s.nelems; i++ {
if !mbits.isMarked() && (abits.index < s.freeindex || abits.isMarked()) {
x := s.base() + i*s.elemsize
if debug.allocfreetrace != 0 {
tracefree(unsafe.Pointer(x), size)
}
if raceenabled {
racefree(unsafe.Pointer(x), size)
}
if msanenabled {
msanfree(unsafe.Pointer(x), size)
}
}
mbits.advance()
abits.advance()
}
}
// Count the number of free objects in this span.计算可回收量
nfree = s.countFree()
if cl == 0 && nfree != 0 {
s.needzero = 1
freeToHeap = true
}
nalloc := uint16(s.nelems) - uint16(nfree)
nfreed := s.allocCount - nalloc
if nalloc > s.allocCount {
print("runtime: nelems=", s.nelems, " nfree=", nfree, " nalloc=", nalloc, " previous allocCount=", s.allocCount, " nfreed=", nfreed, "\n")
throw("sweep increased allocation count")
}
s.allocCount = nalloc
wasempty := s.nextFreeIndex() == s.nelems
s.freeindex = 0 // reset allocation index to start of span.
// gcmarkBits becomes the allocBits.
// get a fresh cleared gcmarkBits in preparation for next GC
s.allocBits = s.gcmarkBits
s.gcmarkBits = newMarkBits(s.nelems)
// Initialize alloc bits cache.
s.refillAllocCache(0)
// We need to set s.sweepgen = h.sweepgen only when all blocks are swept,
// because of the potential for a concurrent free/SetFinalizer.
// But we need to set it before we make the span available for allocation
// (return it to heap or mcentral), because allocation code assumes that a
// span is already swept if available for allocation.
if freeToHeap || nfreed == 0 {
// The span must be in our exclusive ownership until we update sweepgen,
// check for potential races.
if s.state != mSpanInUse || s.sweepgen != sweepgen-1 {
print("MSpan_Sweep: state=", s.state, " sweepgen=", s.sweepgen, " mheap.sweepgen=", sweepgen, "\n")
throw("MSpan_Sweep: bad span state after sweep")
}
// Serialization point.
// At this point the mark bits are cleared and allocation ready
// to go so release the span.
atomic.Store(&s.sweepgen, sweepgen)
}
//根据内存大小确定回的位置
if nfreed > 0 && cl != 0 {
c.local_nsmallfree[cl] += uintptr(nfreed)
res = mheap_.central[cl].mcentral.freeSpan(s, preserve, wasempty)
// MCentral_FreeSpan updates sweepgen
} else if freeToHeap {
// Free large span to heap
// NOTE(rsc,dvyukov): The original implementation of efence
// in CL 22060046 used SysFree instead of SysFault, so that
// the operating system would eventually give the memory
// back to us again, so that an efence program could run
// longer without running out of memory. Unfortunately,
// calling SysFree here without any kind of adjustment of the
// heap data structures means that when the memory does
// come back to us, we have the wrong metadata for it, either in
// the MSpan structures or in the garbage collection bitmap.
// Using SysFault here means that the program will run out of
// memory fairly quickly in efence mode, but at least it won't
// have mysterious crashes due to confused memory reuse.
// It should be possible to switch back to SysFree if we also
// implement and then call some kind of MHeap_DeleteSpan.
if debug.efence > 0 {
s.limit = 0 // prevent mlookup from finding this span
sysFault(unsafe.Pointer(s.base()), size)
} else {
mheap_.freeSpan(s, 1)
}
c.local_nlargefree++
c.local_largefree += size
res = true
}
if !res {
// The span has been swept and is still in-use, so put
// it on the swept in-use list.
mheap_.sweepSpans[sweepgen/2%2].push(s)
}
if trace.enabled {
traceGCSweepDone()
}
return res
}
上面的Debug部分和Trace部分都可以忽略,重点抓住主干。
在清扫完成后,会根据情况启动将相关内存交回到堆内存中:
// Free the span back into the heap.
func (h *mheap) freeSpan(s *mspan, acct int32) {
systemstack(func() {
mp := getg().m
lock(&h.lock)
memstats.heap_scan += uint64(mp.mcache.local_scan)
mp.mcache.local_scan = 0
memstats.tinyallocs += uint64(mp.mcache.local_tinyallocs)
mp.mcache.local_tinyallocs = 0
if msanenabled {
// Tell msan that this entire span is no longer in use.
base := unsafe.Pointer(s.base())
bytes := s.npages << _PageShift
msanfree(base, bytes)
}
if acct != 0 {
memstats.heap_objects--
}
if gcBlackenEnabled != 0 {
// heap_scan changed.
gcController.revise()
}
h.freeSpanLocked(s, true, true, 0)
unlock(&h.lock)
})
}
这样基本上清除回收就基本完成了。有些细节还是需要根据不同的版本对比才能更好的理解相关GC的演进。
三、总结
其实GC的过程,就是一个内存保持安全稳定的过程。有出有进,有增有减,在较长的一个时间段来看,把内存稳定在一定的范围波动内。需要内存时,可以迅速的分配出来,不需要了可以较快的回收。回收时,不需要或者能较短的STW,这就是一个好的GC算法。
但是现在GC仍然没有达到上面的要求,只能说,在环境比较相对不苛刻的情况下可以实现上述的要求。否则就只能使用传统的方法,加机器,加机器,加机器。所以刚刚提到的较长的一段时间,到底多长为好?当然像c++这种立刻回收为好,可内存碎片怎么控制?这都是GC面临的问题。诸君仍需努力!