Profiling (1.5)

A blog post on profiling and benchmarking.

If the profiler shows that an assignment (e.g. setindex!) takes a lot of time, it may indicate dynamic dispatch on the RHS of the assignment.
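A minimal sketch of how this can happen. The types Loose and Tight and the function fill_from! are hypothetical names made up for illustration. With the abstractly typed field, the compiler cannot know the concrete type of c.x, so both the multiplication and the conversion inside the assignment are resolved at runtime, and the profiler tends to attribute that time to the assignment line.

```julia
# Hypothetical types, for illustration only.
struct Loose
    x::Real            # abstract field type: concrete type unknown to the compiler
end

struct Tight{T<:Real}
    x::T               # concrete once T is known
end

function fill_from!(out::Vector{Float64}, c)
    for i in eachindex(out)
        # This line lowers to setindex!(out, c.x * i, i).
        # With c::Loose the right-hand side is computed via dynamic dispatch.
        out[i] = c.x * i
    end
    return out
end

slow = fill_from!(zeros(1_000), Loose(2.0))
fast = fill_from!(zeros(1_000), Tight(2.0))

# In the REPL, compare the inferred types of the two calls:
#   @code_warntype fill_from!(zeros(3), Loose(2.0))
#   @code_warntype fill_from!(zeros(3), Tight(2.0))
```

Both calls return the same values; only the amount of runtime dispatch differs.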

I find it most efficient to keep a global profiling environment into which I Pkg.add Profile, BenchmarkTools, and StatProfilerHTML. To use it:

using MyPackage
using BenchmarkTools, Profile, StatProfilerHTML
include("profiling_code.jl") # for this package
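Once the environment is loaded, a profiling run with the built-in Profile standard library might look like the sketch below. The function work is a hypothetical stand-in for whatever the package code does; only the Profile calls are real API.

```julia
using Profile

# Hypothetical workload, only for illustration.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i) * sin(i)
    end
    return s
end

work(10)               # run once first so that compilation is not profiled
Profile.clear()        # discard samples left over from earlier runs
@profile work(10^7)    # collect samples while the call runs

# Flat listing sorted by sample count; the default is a call tree.
Profile.print(format = :flat, sortedby = :count)
```

The same collected samples can then be handed to a viewer package instead of Profile.print.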

The output generated by the built-in profiler is hard to read. Fortunately, there are packages that improve readability or graph the results.

ProfileView now compiles (as of 1.3), although it takes a surprisingly long time. Personally, I find the presentation of StatProfilerHTML more convenient, though.


  • It provides a flame graph with clickable links that show which lines in a function take up the most time.
  • After running statprofilehtml(), you need to locate index.html and open it by hand in the browser. Alternatively, click the path link printed in the terminal.


  • PProf requires Graphviz. On macOS, install it with brew install graphviz. However, Graphviz has a huge number of dependencies and did not install on my system, in which case PProf cannot be used.


  • TimerOutputs can be used to time selected lines of code.
  • It produces a nicely formatted table that is much easier to digest than profiler output.

Loops (1.5)

LoopVectorization.jl can give massive speed improvements for loops written with for.

Manual dispatch (1.5)

It is beneficial to dispatch manually at runtime when, as far as the compiler knows, a variable could take on many types, but we know that only a few of them are possible. The compiler does this automatically for small unions (known as union splitting). But for parametric types, the compiler has to look up methods in the method table at runtime because the set of methods could be extended.

The package ManualDispatch.jl has a @unionsplit macro for this purpose. But AFAIK one may just as well write out an explicit if-elseif chain. This would look weird, because every branch contains the same call (foo is a placeholder here):

if x isa A
    foo(x)
elseif x isa B
    foo(x)
end

but it seems to work. See the discussion on Discourse.
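A fuller sketch of the explicit if-elseif approach, under stated assumptions: Shape, Circle, Square, and area are hypothetical names invented for this example. Inside each isa branch the compiler knows the concrete type of x, so the call can be resolved statically. Whether the plain version is already optimized this well depends on the Julia version and on how many methods exist, so it is worth benchmarking both.

```julia
# Hypothetical type hierarchy, for illustration only.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    s::Float64
end

area(c::Circle) = pi * c.r^2
area(q::Square) = q.s^2

# Elements of a Vector{Shape} are only known to be Shapes,
# so area(x) may be looked up at runtime.
function total_area_plain(shapes::Vector{Shape})
    t = 0.0
    for x in shapes
        t += area(x)
    end
    return t
end

# Manual dispatch: each branch narrows x to a concrete type.
function total_area_manual(shapes::Vector{Shape})
    t = 0.0
    for x in shapes
        if x isa Circle
            t += area(x)    # x is known to be a Circle here
        elseif x isa Square
            t += area(x)    # x is known to be a Square here
        else
            t += area(x)    # fallback for any other subtype
        end
    end
    return t
end

shapes = Shape[Circle(1.0), Square(2.0), Circle(0.5)]
```

Both functions return the same total; the else branch keeps the manual version correct if new subtypes are added later.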

GPU computing (1.5)

Tutorials:

  • Nextjournal 2019
  • CUDA.jl tutorial