Performance
Tutorials
- 2022 practical guide with links to other tutorials.
- High performance Julia
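Functions as Arguments (1.9)
When a function is passed as an argument and is only passed through to another function (here map) rather than called in the body, the compiler by default does not specialize on it. The result is runtime dispatch and lots of allocations. That is: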
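# This is slow: f is only passed through to map, so foo is not specialized on typeof(f)
function foo(f, x)
    y = map(f, x)
    for yi in y
        # something
    end
end
# This specializes on the type of f and is fast
function foo(f::F, x) where {F<:Function}
    y = map(f, x)    # same body as above
    for yi in y
        # something
    end
end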
The problem is that the type of y is not known at compile time (there is no specialization on f). Hence, the for loop incurs runtime dispatch.
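Note that a function barrier also solves the problem:
function foo(f, x)
    y = map(f, x)
    z = result_of_for_loop(y)   # function barrier: a single runtime dispatch happens here
end
function result_of_for_loop(y)  # compiled for the concrete type of y
    for yi in y
        # something
    end
end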
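The heuristic only applies to functions that are passed through. Here is a minimal sketch, using a hypothetical helper sum_of: when f is actually called inside the method body, Julia does specialize on its type, so no f::F annotation is needed.

```julia
# Hypothetical helper: f is *called* inside the loop, so Julia compiles a
# method instance specialized on the concrete type of f (e.g. typeof(abs2)).
function sum_of(f, x)
    s = zero(eltype(x))
    for xi in x
        s += f(xi)
    end
    return s
end

sum_of(abs2, [1, 2, 3])   # 14
```

The annotation trick is therefore mainly needed when f is handed on to map, broadcast, or similar higher-order functions.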
References: Discourse
Profiling (1.5)
A blog post on profiling and benchmarking.
If the profiler shows that an assignment (e.g. setindex!) takes a lot of time, it may indicate dynamic dispatch on the RHS of the assignment.
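Here is a minimal sketch of that situation, with hypothetical function names. Because xs is a Vector{Any}, the multiplication on the right-hand side is dispatched at runtime, and a profile may attribute the cost to the assignment. The concretely typed version does the same work without runtime dispatch.

```julia
# xs[i] has static type Any, so `2 * xs[i]` is dispatched at runtime and the
# value handed to setindex! (out[i] = ...) is not inferred.
function double_unstable!(out::Vector{Float64}, xs::Vector{Any})
    for i in eachindex(out, xs)
        out[i] = 2 * xs[i]
    end
    return out
end

# Same loop with a concretely typed input: everything is inferred statically.
function double_stable!(out::Vector{Float64}, xs::Vector{Float64})
    for i in eachindex(out, xs)
        out[i] = 2 * xs[i]
    end
    return out
end

xs = Any[1.0, 2.0, 3.0]
out = zeros(3)
double_unstable!(out, xs)          # [2.0, 4.0, 6.0]
double_stable!(out, Float64.(xs))  # same result, no runtime dispatch
# In the REPL, @code_warntype double_unstable!(out, xs) highlights the Any-typed values.
```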
I find it most efficient to keep a global profiling environment to which Profile, BenchmarkTools, and StatProfilerHTML have been added with Pkg.add. To use it:
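using Pkg
using MyPackage                     # load the package from its own (active) environment first
Pkg.activate("path/to/profiling")   # switch to the global profiling environment
using BenchmarkTools, Profile, StatProfilerHTML
Pkg.activate(".")                   # switch back to the package environment
include("profiling_code.jl")        # profiling script for this package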
The output generated by the built-in profiler is hard to read. Fortunately, there are packages that improve readability or graph the results.
ProfileView does compile now (1.3), although compilation takes a surprisingly long time. Personally, though, I find the presentation of StatProfilerHTML more convenient.
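Even without extra packages, the built-in output becomes easier to scan if you print a flat list sorted by sample count instead of the default call tree. Here is a sketch using only the Profile standard library. The workload busywork is hypothetical, and the mincount threshold is an arbitrary choice.

```julia
using Profile

# Hypothetical workload to profile.
function busywork(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

busywork(10)        # run once so that compilation is not profiled
Profile.clear()     # discard samples from earlier runs
@profile busywork(10^8)
# Flat listing sorted by sample count; lines with fewer than 10 samples are hidden.
Profile.print(format = :flat, sortedby = :count, mincount = 10)
```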
ProfileCanvas.jl
Can visualize allocations, but you cannot click through to see the underlying code.
StatProfilerHTML
- It provides a flame graph with clickable links that show which lines in a function take up most of the time.
- After running statprofilehtml(), you need to locate index.html and open it in the browser by hand. Alternatively, you can click the path link printed in the terminal.
PProf.jl
- Requires Graphviz. On macOS, install it with brew install graphviz. However, Graphviz has TONS of dependencies and did not install on my system (as of 2023), in which case PProf cannot be used.
ProfileView
Shows a flame graph, but does not create tables with line-by-line timing (unlike StatProfilerHTML). Still an interesting option.
TimerOutputs.jl
- Times selected sections of code that you annotate.
- Produces a nicely formatted table that is much easier to digest than profiler output.
Loops (1.5)
LoopVectorization.jl can give massive speed improvements in for loops. An example.
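Here is a package-free sketch of the kind of loop involved: a dot product written with Base's @inbounds and @simd, which let the compiler drop bounds checks and vectorize the reduction. The name mydot is hypothetical. With LoopVectorization.jl, its @turbo macro would be placed on the same for loop in place of @inbounds @simd.

```julia
function mydot(x::Vector{Float64}, y::Vector{Float64})
    s = 0.0
    # eachindex(x, y) throws if the axes differ, so @inbounds is safe here.
    # @simd permits reordering the additions, so rounding may differ slightly.
    @inbounds @simd for i in eachindex(x, y)
        s += x[i] * y[i]
    end
    return s
end

mydot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])   # 32.0
```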
Manual dispatch (1.5)
It is beneficial to dispatch manually at runtime when a variable could, as far as the compiler knows, take on many types, but we know that only a few of those are possible. The compiler does this automatically for small unions (known as union splitting). But when it only knows an abstract type (e.g. a parametric type with unknown parameters), it has to look up methods in the method table at runtime, because new methods and subtypes could be added later.
The package ManualDispatch.jl has a @unionsplit macro for this purpose. But AFAIK one may just as well write out an explicit if/else. This looks odd, because every branch makes the same call:
if x isa A
    foo(x)
elseif x isa B
    foo(x)
end
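but it works: inside each branch the compiler knows the concrete type of x, so the call foo(x) is resolved at compile time instead of through a runtime method lookup. See the discussion on Discourse.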
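Here is a self-contained sketch of the if/else pattern. The types Shape, Square, and Circle and the function names are hypothetical. A Vector{Shape} has an abstract element type, so area(s) is normally dispatched at runtime. The split version gives the compiler a concrete type in each branch.

```julia
abstract type Shape end
struct Square <: Shape
    side::Float64
end
struct Circle <: Shape
    r::Float64
end
area(s::Square) = s.side^2
area(c::Circle) = π * c.r^2

# Abstract element type: each call to area is dispatched at runtime.
total_area(shapes) = sum(area(s) for s in shapes)

# Manual split: within each branch s is known to be concrete, so area(s)
# is resolved statically (and can be inlined).
function total_area_split(shapes)
    t = 0.0
    for s in shapes
        if s isa Square
            t += area(s)
        elseif s isa Circle
            t += area(s)
        else
            t += area(s)   # fallback keeps other subtypes working (runtime dispatch)
        end
    end
    return t
end

shapes = Shape[Square(2.0), Circle(1.0), Square(1.0)]
total_area_split(shapes)   # 5 + π
```

The final else branch is a design choice. Without it, a subtype added later would be silently skipped rather than handled through ordinary dispatch.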
GPU computing (1.5)
Tutorials
- Nextjournal 2019
- CUDA.jl tutorial