Performance

Tutorials

Functions as Arguments (1.9)

When a function is passed as an argument and only forwarded to another function (here map) rather than called directly, the compiler by default does not specialize on it. The result is runtime dispatch and many allocations. For example:

# This is slow: no specialization on f
function foo(f, x)
    y = map(f, x)
    for yi in y
        # something
    end
end

# This specializes on f and is fast
function foo(f::F, x) where {F<:Function}
    # same body
end

The problem is that the type of y is not known at compile time (no specialization on f). Hence, we get runtime dispatch in the for loop.
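To make the comparison concrete, here is a minimal runnable sketch of the two variants. The names sum_mapped_slow and sum_mapped_fast and the summing loop body are assumptions made for illustration; they are not from the original discussion. The allocation counts depend on the Julia version, so the sketch prints them rather than promising particular numbers.

```julia
# Hypothetical illustration: both functions have the same body and differ
# only in how the function argument `f` is declared.

# `f` is only passed through to `map`, so Julia may not specialize on it.
function sum_mapped_slow(f, x)
    y = map(f, x)
    s = zero(eltype(y))
    for yi in y
        s += yi
    end
    return s
end

# The type parameter F forces specialization on the concrete type of `f`.
function sum_mapped_fast(f::F, x) where {F<:Function}
    y = map(f, x)
    s = zero(eltype(y))
    for yi in y
        s += yi
    end
    return s
end

x = rand(1000)

# Call each function once first so that compilation is not measured.
sum_mapped_slow(abs2, x)
sum_mapped_fast(abs2, x)

println("slow: ", @allocated(sum_mapped_slow(abs2, x)), " bytes")
println("fast: ", @allocated(sum_mapped_fast(abs2, x)), " bytes")
```

Both variants return the same value; only the amount of dynamic dispatch, and therefore allocation, is expected to differ.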

Note that a function barrier solves the problem:

function foo(f, x)
    y = map(f, x)
    # Function barrier: the callee is compiled for the concrete type of y
    return result_of_for_loop(y)
end
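A runnable sketch of the function barrier might look as follows. The body of result_of_for_loop (a plain sum) is an assumption chosen only to give the loop something to do.

```julia
# Kernel behind the barrier: Julia compiles a specialized method for the
# concrete type of `y`, so the loop itself is type stable.
# The summing body is a hypothetical stand-in for the real loop.
function result_of_for_loop(y)
    s = zero(eltype(y))
    for yi in y
        s += yi
    end
    return s
end

# Outer function: even if the type of `y` is not inferred here, the cost is
# a single dynamic dispatch at the call to `result_of_for_loop`.
function foo(f, x)
    y = map(f, x)
    return result_of_for_loop(y)
end

println(foo(abs2, [1.0, 2.0, 3.0]))
```

The design trade-off is that the one dynamic dispatch remains, but it happens once per call rather than once per loop iteration.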

References: Discourse

Profiling (1.5)

A blog post on profiling and benchmarking.

If the profiler shows that an assignment (e.g. setindex!) takes a lot of time, it may indicate dynamic dispatch on the RHS of the assignment.

I find it most efficient to have a global profiling environment that Pkg.adds Profile, BenchmarkTools, StatProfilerHTML. To use it:

using Pkg
using MyPackage
Pkg.activate("path/to/profiling")   # switch to the global profiling environment
using BenchmarkTools, Profile, StatProfilerHTML
Pkg.activate(".")                   # switch back to the package environment
include("profiling_code.jl") # for this package

The output generated by the built-in profiler is hard to read. Fortunately, there are packages that improve readability or graph the results.
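For reference, a minimal sketch of driving the built-in profiler directly, using only the Profile standard library. The function work is a hypothetical workload, not part of any package mentioned here.

```julia
using Profile

# Hypothetical workload to profile.
function work(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return s
end

work(10)                      # run once so that compilation is not profiled
Profile.clear()               # discard samples from earlier runs
@profile work(10_000_000)     # collect samples while the call runs
Profile.print(maxdepth = 12)  # text tree; this is the hard-to-read output
```

The packages listed below read the same collected samples and present them in a friendlier form.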

ProfileView now compiles (as of 1.3), although compilation takes a surprisingly long time. Personally, I find the presentation of StatProfilerHTML more convenient, though.

ProfileCanvas.jl

Can visualize allocations. Cannot click through to see underlying code.

StatProfilerHTML

  • It provides a flame graph with clickable links that show which lines in a function take up most time.
  • After running statprofilehtml(), one needs to locate index.html and open it by hand in the browser. Alternatively, click the path link printed in the terminal.

PProf.jl

  • requires Graphviz. On macOS, install it with brew install graphviz. However, Graphviz has a very large number of dependencies and did not install on my system (as of 2023); without it, PProf cannot be used.

ProfileView

Shows a flame graph, but does not create tables with line-by-line timing (cf. StatProfilerHTML). Interesting option.

TimerOutputs.jl

  • can be used to time selected lines of code
  • produces a nicely formatted table that is much easier to digest than profiler output.

Tracy.jl

  • profiles annotated sections

Loops (1.5)

LoopVectorization.jl can give massive speed improvements for for loops. An example.

Manual dispatch (1.5)

It is beneficial to manually dispatch at runtime when a variable could potentially take on many types (as far as the compiler knows) but we know that only a few of those are possible. This is done automatically for small unions (known as union splitting). But for parametric types, the compiler has to look up methods in the method table at runtime because they could be extended.

The package ManualDispatch.jl has a @unionsplit macro for this purpose. But AFAIK one may just as well write out an explicit if else. This would look weird:

if x isa A
  foo(x);
elseif x isa B
  foo(x);
end

but it seems to work. See the discussion on Discourse.
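A minimal sketch of such an explicit branch, under the assumption of a small hypothetical type hierarchy (Shape, Circle, Square, and area are made up for illustration):

```julia
# Hypothetical types: the compiler only knows that elements are `Shape`s.
abstract type Shape end
struct Circle <: Shape; r::Float64; end
struct Square <: Shape; a::Float64; end

area(s::Circle) = pi * s.r^2
area(s::Square) = s.a^2

function total_area(shapes::Vector{Shape})
    total = 0.0
    for s in shapes
        if s isa Circle
            total += area(s)           # s is known to be a Circle here
        elseif s isa Square
            total += area(s)           # s is known to be a Square here
        else
            total += area(s)::Float64  # fallback: dynamic dispatch for other subtypes
        end
    end
    return total
end

shapes = Shape[Circle(1.0), Square(2.0)]
println(total_area(shapes))
```

Inside each isa branch the compiler can narrow the type of s, so the identical-looking calls to area can be resolved statically. The else branch keeps the function correct if further subtypes are added later.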

GPU computing (1.5)

Tutorials:

  • Nextjournal 2019
  • CUDA.jl tutorial