A blog post on profiling and benchmarking.
I find it most efficient to have a global profiling environment that contains
Profile, BenchmarkTools, and StatProfilerHTML. To use it:
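    using Pkg
    using MyPackage
    Pkg.activate("path/to/profiling")  # switch to the shared profiling environment
    using BenchmarkTools, Profile, StatProfilerHTML
    Pkg.activate(".")                  # switch back to the package's own environment
    include("profiling_code.jl") # for this package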
The output generated by the built-in profiler is hard to read. Fortunately, there are packages that improve readability or graph the results.
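To make this concrete, here is a minimal sketch of collecting a profile with the built-in Profile standard library and printing it in both of its text formats. The function busy_work and its argument are made up for illustration.

```julia
using Profile

# Hypothetical workload, only here to give the profiler something to sample.
function busy_work(n)
    total = 0.0
    for i in 1:n
        total += sqrt(i) * sin(i)
    end
    return total
end

busy_work(10)              # run once so that compilation is not profiled
Profile.clear()            # discard samples from earlier runs
@profile busy_work(10^7)   # collect samples while the call runs

Profile.print()                                   # default: nested call tree
Profile.print(format = :flat, sortedby = :count)  # flat listing, sorted by sample count
```

Even for a toy function like this, most of the printed lines refer to Base internals, which is why a graphical front end helps.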
ProfileView does compile now (as of Julia 1.3), but it takes a surprisingly long time. Personally, I find the presentation of StatProfilerHTML more convenient.
- It provides a flame graph with clickable links that show which lines in a function take up the most time.
- After running statprofilehtml(), you need to locate index.html and open it by hand in the browser.
Alternatively, click the path link that is printed in the terminal.
- PProf requires Graphviz. On macOS, install it using brew install graphviz. But Graphviz has TONS of dependencies and did not install on my system; without it, PProf cannot be used.
- can be used to time selected lines of code
- produces a nicely formatted table that is much easier to digest than profiler output.
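The last two bullets match TimerOutputs.jl; that identification is an assumption, since the package is not named in the bullets themselves. Under that assumption, a minimal sketch of timing selected lines could look like this. The labels and the toy workload are made up for illustration.

```julia
using TimerOutputs

const to = TimerOutput()   # collects all timings

function run_model(n)
    # Each @timeit call is timed under its label and returns the value of the expression.
    x = @timeit to "setup" rand(n)
    s = 0.0
    for _ in 1:5
        s += @timeit to "sum of squares" sum(abs2, x)
    end
    return s
end

run_model(10^6)
show(to)   # prints a table with call counts, time, and allocations per label
```

Repeated calls under the same label are accumulated, so the table shows one row per label rather than one row per call.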
LoopVectorization.jl can give massive speed improvements for
for loops. An example.
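As an illustration, here is a minimal sketch of a dot product with and without the @turbo macro from LoopVectorization.jl. The function names are made up for this example. How large the speedup is depends on the loop and the hardware, so it is worth measuring both versions with @btime from BenchmarkTools.

```julia
using LoopVectorization

# Plain loop as a baseline.
function dot_plain(a, b)
    s = zero(eltype(a))
    for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

# Same loop; @turbo asks LoopVectorization to vectorize it.
# @turbo assumes that iterations may be reordered and does no bounds checking.
function dot_turbo(a, b)
    s = zero(eltype(a))
    @turbo for i in eachindex(a, b)
        s += a[i] * b[i]
    end
    return s
end

a = rand(1_000); b = rand(1_000)
dot_plain(a, b) ≈ dot_turbo(a, b)   # same result up to floating-point reordering
```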
Manual dispatch (1.5)
It is beneficial to manually dispatch at runtime when a variable could potentially take on many types (as far as the compiler knows) but we know that only a few of those are possible. This is done automatically for small unions (known as union splitting). But for parametric types, the compiler has to look up methods in the method table at runtime because they could be extended.
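ManualDispatch.jl has a @unionsplit macro for this purpose. But AFAIK one may just as well write out an explicit if/elseif chain. This looks weird because every branch makes the same call:

    if x isa A
        foo(x)
    elseif x isa B
        foo(x)
    end

but it seems to work: inside each branch the compiler knows the concrete type of x, so it can resolve the call to foo at compile time. See the discussion on Discourse.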
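Here is a slightly fuller sketch of the same idea for a parametric type. Box, scaled, and both total_ functions are hypothetical names made up for this example, and no timings are claimed; compare the two versions with @btime on your own types.

```julia
# Hypothetical parametric type. Vector{Box} has an abstract element type,
# so the compiler cannot know which Box{T} each element is.
struct Box{T}
    value::T
end

scaled(b::Box{Int}) = 2.0 * b.value
scaled(b::Box{Float64}) = 0.5 * b.value

# Relies on a runtime method lookup for every element.
function total_dynamic(boxes)
    total = 0.0
    for b in boxes
        total += scaled(b)
    end
    return total
end

# Manual dispatch: each branch narrows b to a concrete type before the call.
function total_manual(boxes)
    total = 0.0
    for b in boxes
        if b isa Box{Int}
            total += scaled(b)
        elseif b isa Box{Float64}
            total += scaled(b)
        else
            total += scaled(b)   # fallback: ordinary dynamic dispatch
        end
    end
    return total
end

boxes = Box[Box(1), Box(2.0), Box(3)]
total_manual(boxes) == total_dynamic(boxes)   # both versions compute the same sum
```

The final else branch keeps the function correct if another Box{T} shows up later; it simply falls back to the slower lookup.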