## Documentation Center |

The `dataset` data type might be removed in
a future release. To work with heterogeneous data, use the MATLAB^{®} `table` data
type instead. See MATLAB `table` documentation
for more information.

`C = join(A,B)C = join(A,B,keys)C = join(A,B,`

`C = join(A,B)` creates a
dataset array `C` by merging observations from the
two dataset arrays `A` and `B`. `join` performs
the merge by first finding *key variables*, that
is, pairs of dataset variables, one in `A` and one
in `B`, that share the same name. Each observation
in `B` must contain a unique combination of values
in the key variables, and must contain all combinations of values
that are present in the keys from `A`. `join` then
uses these key variables to define a many-to-one correspondence between
observations in `A` and those in `B`. `join` uses
this correspondence to replicate the observations in `B` and
combine them with the observations in `A` to create `C`.

`C = join(A,B,keys)` performs
the merge using the variables specified by `keys` as
the key variables in both `A` and `B`. `keys` is
a positive integer, a vector of positive integers, a variable name,
a cell array of variable names, or a logical vector.

`C` contains one observation for each observation
in `A`. Variables in `C` include
all of the variables from `A`, as well as one variable
corresponding to each variable in `B` (except for
the keys from `B`). If `A` and `B` contain
variables with identical names, `join` adds the suffix `'_left'` and `'_right'` to
the corresponding variables in `C`.

`C = join(A,B,param1,val1,param2,val2,...)` specifies
optional parameter name/value pairs to control how the dataset variables
in

`'Keys'`— Specifies the variables to use as keys in both`A`and`B`.`'LeftKeys'`— Specifies the variables to use as keys in`A`.`'RightKeys'`— Specifies the variables to use as keys in`B`.

You may provide either the `'Keys'` parameter,
or both the `'LeftKeys'` and `'RightKeys'` parameters.
The value for these parameters is a positive integer, a vector of
positive integers, a variable name, a cell array containing variable
names, or a logical vector. `'LeftKeys'` or `'RightKeys'` must
both specify the same number of key variables, and `join` pairs
the left and right keys in the order specified.

`'LeftVars'`— Specifies which variables from`A`to include in`C`. By default,`join`includes all variables from`A`.`'RightVars'`— Specifies which variables from`B`to include in`C`. By default,`join`includes all variables from`B`except the key variables.

You can use `'LeftVars'` or `'RightVars'` to
include or exclude key variables as well as data variables. The value
for these parameters is a positive integer, a vector of positive integers,
a variable name, a cell array containing one or more variable names,
or a logical vector.

`[C,IB] = join(...)` returns
an index vector `IB`, where `join` constructs `C` by
horizontally concatenating `A(:,LeftVars)` and `B(IB,RightVars)`. `join` can
also perform more complicated inner and outer join operations that
allow a many-to-many correspondence between `A` and `B`,
and allow unmatched observations in either `A` or `B`.

`C = join(A,B,'Type',TYPE,...)` performs
the join operation specified by `TYPE`. `TYPE` is
one of `'inner'`, `'leftouter'`, `'rightouter'`, `'fullouter'`,
or `'outer'` (which is a synonym for `'fullouter'`).
For an inner join, `C` only contains observations
corresponding to a combination of key values that occurred in both `A` and `B`.
For a left (or right) outer join, `C` also contains
observations corresponding to keys in `A` (or `B`)
that did not match any in `B` (or `A`).
Variables in `C` taken from `A` (or `B`)
contain null values in those observations. A full outer join is equivalent
to a left and right outer join. `C` contains variables
corresponding to the key variables from both `A` and `B`,
and `join` sorts the observations in `C` by
the key values.

For inner and outer joins, `C` contains variables
corresponding to the key variables from both `A` and `B` by
default, as well as all the remaining variables. `join` sorts
the observations in the result `C` by the key values.

`C = join(A,B,'Type',TYPE,'MergeKeys',true,...)` includes
a single variable in `C` for each key variable pair
from `A` and `B`, rather than including
two separate variables. For outer joins, `join` creates
the single variable by merging the key values from `A` and `B`,
taking values from `A` where a corresponding observation
exists in `A`, and from `B` otherwise.
Setting the `'MergeKeys'` parameter to `true` overrides
inclusion or exclusion of any key variables specified via the `'LeftVars'` or `'RightVars'` parameter.
Setting the `'MergeKeys'` parameter to `false` is
equivalent to not passing in the `'MergeKeys'` parameter.

`[C,IA,IB] = join(A,B,'Type',TYPE,...)` returns
index vectors `IA` and `IB` indicating
the correspondence between observations in `C` and
those in `A` and `B`. For an inner
join, `join` constructs `C` by horizontally
concatenating `A(IA,LeftVars)` and `B(IB,RightVars)`.
For an outer join, `IA` or `IB` may
also contain zeros, indicating the observations in `C` that
do not correspond to observations in `A` or `B`,
respectively.

Create a dataset array from Fisher's iris data:

load fisheriris NumObs = size(meas,1); NameObs = strcat({'Obs'},num2str((1:NumObs)','%-d')); iris = dataset({nominal(species),'species'},... {meas,'SL','SW','PL','PW'},... 'ObsNames',NameObs);

Create a separate dataset array with the diploid chromosome counts for each species of iris:

snames = nominal({'setosa';'versicolor';'virginica'}); CC = dataset({snames,'species'},{[38;108;70],'cc'}) CC = species cc setosa 38 versicolor 108 virginica 70

Broadcast the data in `CC` to the rows of `iris` using
the key variable `species` in each dataset:

iris2 = join(iris,CC); iris2([1 2 51 52 101 102],:) ans = species SL SW PL PW cc Obs1 setosa 5.1 3.5 1.4 0.2 38 Obs2 setosa 4.9 3 1.4 0.2 38 Obs51 versicolor 7 3.2 4.7 1.4 108 Obs52 versicolor 6.4 3.2 4.5 1.5 108 Obs101 virginica 6.3 3.3 6 2.5 70 Obs102 virginica 5.8 2.7 5.1 1.9 70

Create two datasets and join them using the `'MergeKeys'` flag:

% Create two data sets that both contain the key variable % 'Key1'. The two arrays contain observations with common % values of Key1, but each array also contains observations % with values of Key1 not present in the other. a = dataset({'a' 'b' 'c' 'e' 'h'}',[1 2 3 11 17]',... 'VarNames',{'Key1' 'Var1'}) b = dataset({'a' 'b' 'd' 'e'}',[4 5 6 7]',... 'VarNames',{'Key1' 'Var2'}) % Combine a and b with an outer join, which matches up % observations with common key values, but also retains % observations whose key values don't have a match. % Keep the key values as separate variables in the result. couter = join(a,b,'key','Key1','Type','outer') % Join a and b, merging the key values as a single variable % in the result. coutermerge = join(a,b,'key','Key1','Type','outer',... 'MergeKeys',true) % Join a and b, retaining only observations whose key % values match. cinner = join(a,b,'key','Key1','Type','inner',... 'MergeKeys',true) a = Key1 Var1 'a' 1 'b' 2 'c' 3 'e' 11 'h' 17 b = Key1 Var2 'a' 4 'b' 5 'd' 6 'e' 7 couter = Key1_left Var1 Key1_right Var2 'a' 1 'a' 4 'b' 2 'b' 5 'c' 3 '' NaN '' NaN 'd' 6 'e' 11 'e' 7 'h' 17 '' NaN coutermerge = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'c' 3 NaN 'd' NaN 6 'e' 11 7 'h' 17 NaN cinner = Key1 Var1 Var2 'a' 1 4 'b' 2 5 'e' 11 7

Was this topic helpful?