# Perf
[![GoDoc](https://godoc.org/github.com/hodgesds/perf-utils?status.svg)](https://godoc.org/github.com/hodgesds/perf-utils)

This package is a go library for interacting with the `perf` subsystem in
Linux. It allows you to do things like see how many CPU instructions a function
takes, profile a process for various hardware events, and other interesting
things. The library is by no means finalized and should be considered pre-alpha
at best.

# Use Cases
A majority of the utility methods in this package should only be used for
testing and/or debugging performance issues. Due to the nature of the go
runtime profiling on the goroutine level is extremely tricky, with the
exception of a long running worker goroutine locked to an OS thread. Eventually
this library could be used to implement many of the features of `perf` but in
accessible via Go directly.

## Caveats
* Some utility functions will call
  [`runtime.LockOSThread`](https://golang.org/pkg/runtime/#LockOSThread) for
  you, they will also unlock the thread after profiling. ***Note*** using these
  utility functions will incur significant overhead.
* Overflow handling is not implemented.

# Setup
Most likely you will need to tweak some system settings unless you are running as root. From `man perf_event_open`:

```
   perf_event related configuration files
       Files in /proc/sys/kernel/

           /proc/sys/kernel/perf_event_paranoid
                  The perf_event_paranoid file can be set to restrict access to the performance counters.

                  2   allow only user-space measurements (default since Linux 4.6).
                  1   allow both kernel and user measurements (default before Linux 4.6).
                  0   allow access to CPU-specific data but not raw tracepoint samples.
                  -1  no restrictions.

                  The existence of the perf_event_paranoid file is the official method for determining if a kernel supports perf_event_open().

           /proc/sys/kernel/perf_event_max_sample_rate
                  This sets the maximum sample rate.  Setting this too high can allow users to sample at a rate that impacts overall machine performance and potentially lock up the machine.  The default value is 100000  (samples  per
                  second).

           /proc/sys/kernel/perf_event_max_stack
                  This file sets the maximum depth of stack frame entries reported when generating a call trace.

           /proc/sys/kernel/perf_event_mlock_kb
                  Maximum number of pages an unprivileged user can mlock(2).  The default is 516 (kB).

```

# Example
Say you wanted to see how many CPU instructions a particular function took:

```
package main

import (
	"fmt"
	"log"
	"github.com/hodgesds/perf-utils"
)

func foo() error {
	var total int
	for i:=0;i<1000;i++ {
		total++
	}
	return nil
}

func main() {
	profileValue, err := perf.CPUInstructions(foo)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("CPU instructions: %+v\n", profileValue)
}
```

# Benchmarks
To profile a single function call there is an overhead of ~0.4ms.

```
$ go test  -bench=BenchmarkCPUCycles .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkCPUCycles-8        3000            397924 ns/op              32 B/op          1 allocs/op
PASS
ok      github.com/hodgesds/perf-utils  1.255s
```

The `Profiler` interface has low overhead and suitable for many use cases:

```
$ go test  -bench=BenchmarkProfiler .
goos: linux
goarch: amd64
pkg: github.com/hodgesds/perf-utils
BenchmarkProfiler-8      3000000               488 ns/op              32 B/op          1 allocs/op
PASS
ok      github.com/hodgesds/perf-utils  1.981s
```

# BPF Support
BPF is supported by using the `BPFProfiler` which is available via the
`ProfileTracepoint` function. To use BPF you need to create the BPF program and
then call `AttachBPF` with the file descriptor of the BPF program. This is not
well tested so use at your own peril.

# Misc
Originally I set out to use `go generate` to build Go structs that were
compatible with perf, I found a really good
[article](https://utcc.utoronto.ca/~cks/space/blog/programming/GoCGoCompatibleStructs)
on how to do so. Eventually, after digging through some of the `/x/sys/unix`
code I found pretty much what I was needed. However, I think if you are
interested in interacting with the kernel it is a worthwhile read.