# Data Handling

DataFrames tutorials:

Useful packages:

- `DataSkimmer.jl` produces a summary of tabular data (e.g. a `DataFrame`), including histograms.
- `Strapping.jl` converts between `struct`s and tables.
- `SplitApplyCombine.jl` contains data manipulation routines such as `splitdims` (converting between vectors of vectors and matrices, etc.), `group`, and `innerjoin`. It is similar to what `DataFrames` offers, but works on additional data types.
- `InvertedIndices.jl` provides `Not` for selecting everything except the given indices.
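A minimal sketch of `Not` from `InvertedIndices.jl` on a plain vector, assuming the package is installed; the vector `x` is made up for illustration.

```julia
using InvertedIndices

x = [10, 20, 30, 40]

# Not(i) selects every index except i
allButSecond = x[Not(2)]      # [10, 30, 40]

# Not also accepts a collection of indices to exclude
allButEnds = x[Not([1, 4])]   # [20, 30]
```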

## DataFrames

Chaining transformations:

- Use a single `combine` with multiple transformations, as in `combine(df, :a => sum, :b => mean)`.
- Even with `@chain` from `Chain.jl`, multiple `combine` calls in a row do not work as a substitute. This makes sense: the result of each `combine` is fed into the next step, so later steps no longer see the original columns.
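A small sketch of both points, using made-up data and plain nested calls (which is what `@chain` would produce for two consecutive `combine` steps):

```julia
using DataFrames, Statistics

df = DataFrame(a = [1, 2, 3], b = [2.0, 4.0, 6.0])

# One combine with several transformations: one output column per pair
res = combine(df, :a => sum, :b => mean)   # columns a_sum and b_mean

# Two combines in a row: the second only sees the output of the first,
# which no longer has column :b, so this call throws an error
failed = try
    combine(combine(df, :a => sum), :b => mean)
    false
catch
    true
end
```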

Converting to multi-dimensional array:

Deleting columns:

- Using `Not` from `InvertedIndices`: `select!(df, Not(:x1));`
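A short sketch contrasting `select` and `select!` on a made-up table. `DataFrames` re-exports `Not`, so loading `InvertedIndices` separately is not needed here.

```julia
using DataFrames

df = DataFrame(x1 = [1, 2], x2 = [3, 4], x3 = [5, 6])

# select (without !) returns a new DataFrame and leaves df untouched
df2 = select(df, Not(:x1))

# select! removes the columns from df itself; a vector drops several at once
select!(df, Not([:x1, :x3]))
```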

Renaming columns:

`rename!(df, :old => :new)`
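A sketch of a few `rename!` forms on a made-up table: a single pair, several pairs at once, and a function applied to every column name.

```julia
using DataFrames

df = DataFrame(old = [1, 2], other = [3, 4])

# A single old => new pair renames one column in place
rename!(df, :old => :new)

# Several pairs can be passed in one call
rename!(df, :new => :a, :other => :b)

# A function is applied to each column name (given as a String)
rename!(uppercase, df)
```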

Vector-valued outputs of a transformation:

- Example: compute grouped quantiles (the weighted `quantile` method and `FrequencyWeights` come from `StatsBase`)
  `combine(gdf, [:y, :wt] => Ref ∘ ((y, wt) -> quantile(y, FrequencyWeights(wt), [0.1, 0.7])))`
- Composing `Ref` with the actual transformation keeps the returned vector in a single cell; without it, the vector is expanded into rows (one row for each quantile).
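A minimal sketch of the `Ref` trick. To keep dependencies to `DataFrames` and `Statistics`, it swaps in the unweighted `quantile` from `Statistics` for the weighted `StatsBase` version; the helper `qs` and the data are hypothetical.

```julia
using DataFrames, Statistics

df = DataFrame(g = [1, 1, 1, 2, 2, 2], y = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0])
gdf = groupby(df, :g)

# Hypothetical helper: two unweighted quantiles per group
qs(y) = quantile(y, [0.1, 0.7])

# Without Ref: the returned vector is expanded, one row per quantile
long = combine(gdf, :y => qs => :q)         # 4 rows

# With Ref: the vector is stored as a single cell, one row per group
wide = combine(gdf, :y => Ref ∘ qs => :q)   # 2 rows, q is a Vector{Float64} column
```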

### Grouping

The following gives all the rows of the original `DataFrame` for which the "keys" have the selected values:

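```
using DataFrames
df = DataFrame(pk1=rand(1:10, 100), pk2=rand('a':'z', 100), value=rand(100));
gdf = groupby(df, [:pk1, :pk2]);
# Use a key that surely exists: with random data, (1, 'a') may be absent and throw an error
key = (df.pk1[1], df.pk2[1]);
dfSub = gdf[key]  # view (SubDataFrame) of all rows where (pk1, pk2) == key
```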
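Because indexing a `GroupedDataFrame` with a key that has no rows throws an error, it can help to check with `haskey` first. A sketch on deterministic, made-up data that also shows the equivalent `NamedTuple` form of the key:

```julia
using DataFrames

df = DataFrame(pk1 = [1, 1, 2, 2, 1], pk2 = ['a', 'b', 'a', 'a', 'a'], value = 1:5)
gdf = groupby(df, [:pk1, :pk2])

# Check whether a key combination is present before indexing
present = haskey(gdf, (1, 'a'))   # true
absent = haskey(gdf, (3, 'z'))    # false

# Tuple key (in grouping-column order) or NamedTuple key (by column name)
sub1 = gdf[(1, 'a')]
sub2 = gdf[(pk1 = 1, pk2 = 'a')]
```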

## Missing Values

`Missings.jl` has convenience functions for dealing with missing values.
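A few of those helpers in a short sketch (`passmissing`, `missings`, `allowmissing`, `disallowmissing`); the input vectors are made up for illustration.

```julia
using Missings

x = [1, missing, 3]

# passmissing(f) wraps f so that a missing argument yields missing without calling f
strs = passmissing(string).(x)

# missings(T, n) allocates a Vector{Union{Missing, T}} filled with missing
buf = missings(Float64, 3)

# allowmissing / disallowmissing widen or narrow the element type
widened = allowmissing([1, 2])
narrowed = disallowmissing(Union{Missing, Int}[1, 2])
```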

Also useful, but more general, is `Skipper.jl`. Example:

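```
using Skipper, Statistics
data = [1.0, NaN, 3.0, Inf, 5.0];
sa = skip(x -> isnan(x) || isinf(x), data); # Hides NaN and Inf without copying
dataMean = mean(sa); # Ignores skipped values
sa .* 2; # Ignores skipped values
sa[3]; # Uses original indices; only valid for indices that are not skipped
complement(sa) .= mean(sa); # Sets the skipped values (in data) to the mean
```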
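For comparison, a sketch of the same two operations (mean over the non-skipped values, then overwriting the skipped ones) using only Base and `Statistics`. For floats, `isfinite(x)` is equivalent to `!(isnan(x) || isinf(x))`; the data are made up.

```julia
using Statistics

data = [1.0, NaN, 3.0, Inf, 5.0]

# Mean over finite values only; Iterators.filter is lazy, so nothing is copied
m = mean(Iterators.filter(isfinite, data))

# Overwrite the non-finite entries with that mean, in place
data[.!isfinite.(data)] .= m
```

Unlike the `Skipper.jl` version, this has no reusable "skipped view" object: the predicate is re-applied in each step.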