Inner join() Producing Duplicate Entries

Paul Quelet on 1 Sep 2014
Edited: Oleg Komarov on 3 Sep 2014
I have several large time series of meteorological variables from the same measurement tower. I wanted to compare data values at the exact same measurement points in time. I set up serial date numbers and values in dataset() arrays similar to the following post:
I followed Message 5 to code something like C = join(Dataset1, Dataset2, 'Type', 'inner'). The results looked good at first, with input dates like the following:
[DS1.Time DS2.Time] =
01-Jan-2012 00:07:22 01-Jan-2012 00:07:22
01-Jan-2012 00:17:22 01-Jan-2012 00:17:22
01-Jan-2012 00:37:22 01-Jan-2012 00:37:22
01-Jan-2012 00:47:22 01-Jan-2012 00:47:22
01-Jan-2012 00:57:22 01-Jan-2012 00:57:22
01-Jan-2012 01:47:22 01-Jan-2012 01:07:22
01-Jan-2012 01:57:22 01-Jan-2012 01:27:22
01-Jan-2012 02:07:22 01-Jan-2012 01:47:22
01-Jan-2012 02:17:22 01-Jan-2012 01:57:22
01-Jan-2012 02:27:22 01-Jan-2012 02:07:22 ...
so that the resulting dates (with data) using C = join(DS1,DS2,'Type','inner') would be:
C.Time =
01-Jan-2012 00:07:22
01-Jan-2012 00:17:22
01-Jan-2012 00:37:22
01-Jan-2012 00:47:22
01-Jan-2012 00:57:22
01-Jan-2012 01:47:22
01-Jan-2012 01:57:22
01-Jan-2012 02:07:22
01-Jan-2012 02:17:22
01-Jan-2012 02:27:22 ...
The problems started when I took the output C and used it for more time series merging. Since an inner join is like an intersection of the times in the two datasets, it stands to reason that length(C) <= length(DS1) and length(C) <= length(DS2). This was no longer the case with Cnew = join(C,DS4,'Type','inner'). The times at the ends looked fine, but I eventually discovered repeated data rows in the middle of the resulting dataset, like:
Cnew.Time =
...
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11
02-Nov-2012 09:50:11 ...
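One possible explanation, offered as a sketch rather than a diagnosis: the length argument above only holds if every time stamp is unique within each input. If a time stamp occurs more than once in either input, a many-to-many inner join returns one row for every pairing of matching rows, so the output can be longer than both inputs. The keys below are made-up integers, not values from the tower data, and all variable names are hypothetical.

```matlab
% Hypothetical integer keys standing in for time stamps (not the real data).
keysA = [10; 20; 20; 30];   % key 20 appears twice in A
keysB = [20; 20; 20; 40];   % key 20 appears three times in B

% Keys present in both inputs (intersect returns each common key once).
common = intersect(keysA, keysB);

% Count how often each common key occurs on each side.
nA = arrayfun(@(k) sum(keysA == k), common);
nB = arrayfun(@(k) sum(keysB == k), common);

% A many-to-many inner join pairs every matching row of A with every
% matching row of B, so the row count is the sum of the products.
nRows = sum(nA .* nB);
fprintf('Inputs have %d and %d rows; inner join would have %d rows.\n', ...
        numel(keysA), numel(keysB), nRows);
```

Here the joined result would have more rows than either input, even though only one key is shared.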
After much investigation, the only way I found to fix this problem after an inner join was to use the unique() function in the following way:
Cnew = join( C, DS4, 'key', 'Time', 'Type', 'inner', 'MergeKeys', true ) ;
CnewUnique = unique(Cnew , 'Time') ;
This would finally produce the output I was looking for:
CnewUnique.Time = ...
02-Nov-2012 09:00:11
02-Nov-2012 09:10:11
02-Nov-2012 09:20:11
02-Nov-2012 09:30:11
02-Nov-2012 09:40:11
02-Nov-2012 09:50:11
02-Nov-2012 10:00:11
02-Nov-2012 10:10:11
02-Nov-2012 10:20:11 ...
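Rather than removing repeats after the join, it may help to check whether a key column already contains repeats going into it. A minimal sketch in base MATLAB, assuming a hypothetical numeric vector timeKey that stands in for a column such as C.Time or DS4.Time:

```matlab
% Hypothetical key column with some repeated values (not the real data).
timeKey = [1; 2; 2; 3; 3; 3];

% unique returns the distinct keys and, in idx, the position of each
% original entry within that list of distinct keys.
[uniqueKeys, ~, idx] = unique(timeKey);

% Count the occurrences of each distinct key.
counts = accumarray(idx, 1);

% Keys that occur more than once are candidates for multiplied join rows.
repeatedKeys = uniqueKeys(counts > 1);
disp(repeatedKeys.');
```

If repeatedKeys is empty for both inputs, repeated keys are unlikely to be the source of the extra rows and something else is going on.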
This took many hours to figure out so I wanted to ask the following question(s):
  1. Why was the join(...,'inner',...) not working the way I expected, as it did before?
  2. Is there a better way to match up the times from several time series? (I did not have success with the synchronize function either for an "intersection" of the times.)
  3. Has anyone else had a similar problem? Is MATLAB possibly exhibiting bug-like behavior here?
Any insights are appreciated. Thank you for contributing to this post.
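For question 2, one possible approach that avoids join entirely is intersect with its index outputs, which returns each common time only once. This is a sketch under assumptions: the vectors tA, tB, vA and vB are hypothetical, the keys are made-up integers, and exact equality of the time values is assumed.

```matlab
% Hypothetical time keys and measured values for two series.
tA = [1; 2; 3; 4];   vA = [10; 20; 30; 40];
tB = [2; 4; 6];      vB = [0.2; 0.4; 0.6];

% tCommon holds the sorted times present in both series; iA and iB are
% the row indices of those times in tA and tB.
[tCommon, iA, iB] = intersect(tA, tB);

% Build one aligned matrix: time, value from A, value from B.
aligned = [tCommon, vA(iA), vB(iB)];
disp(aligned);
```

A third series could be matched the same way by intersecting tCommon with its time vector and indexing the already aligned columns with the resulting indices.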
  4 Comments
per isakson on 2 Sep 2014
Edited: per isakson on 2 Sep 2014
Disclaimer: I have not worked with the dataset class of the Statistics Toolbox. But I have worked with time series, meteorological and others.
I looked at the code of join. It uses the function unique for comparison. unique does not handle double values well, because it tests for exact equality. So why isn't there a test in the code? At least a warning would have been appropriate.
The documentation of join says: C = join(A,B,keys) performs the merge using the variables specified by keys as the key variables in both A and B. keys is a positive integer, a vector of positive integers, a variable name, a cell array of variable names, or a logical vector.
My conclusion is that serial date numbers (double) cannot be used as keys in join.
I have ended up using a "serial second number" stored as uint32 to avoid problems like this.
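A minimal sketch of the integer-key idea. The comment does not say which reference point the seconds are counted from, so the epoch t0 below is an assumption (seconds counted from year 0 would not fit in uint32 for 2012 dates). The perturbation added to tNoisy is artificial and only imitates rounding noise.

```matlab
% Assumed epoch: seconds are counted from 1 Jan 2012 (hypothetical choice).
t0 = datenum(2012, 1, 1);

% Three serial date numbers, and a copy with tiny artificial noise
% (1e-9 days is roughly 86 microseconds).
t      = datenum(2012, 1, 1, 0, [7; 17; 37], 22);
tNoisy = t + 1e-9;

% Convert to whole seconds since the epoch and store as integers.
secKey      = uint32(round((t      - t0) * 86400));
secKeyNoisy = uint32(round((tNoisy - t0) * 86400));

% The doubles no longer compare equal, but the integer keys do.
fprintf('doubles equal: %d, integer keys equal: %d\n', ...
        isequal(t, tNoisy), isequal(secKey, secKeyNoisy));
```

Rounding to whole seconds is only appropriate if the sampling interval is much longer than one second, as it appears to be for the ten-minute data in the question.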
Oleg Komarov on 3 Sep 2014
Edited: Oleg Komarov on 3 Sep 2014
What if you try to use table() instead of dataset? The table.join() has no restriction on the type of variable that you can use as keys.
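A minimal sketch of the table-based route, assuming R2013b or later, where table is available. It uses innerjoin, the table function for inner joins, rather than join itself; the variable names Temp and Wind and all values are invented for illustration.

```matlab
% Two small hypothetical tables keyed on serial date numbers.
timeA = datenum(2012, 1, 1, 0, [7; 17; 37], 22);
timeB = datenum(2012, 1, 1, 0, [17; 27; 37], 22);

T1 = table(timeA, [1.5; 1.7; 2.0], 'VariableNames', {'Time', 'Temp'});
T2 = table(timeB, [3.1; 3.4; 2.9], 'VariableNames', {'Time', 'Wind'});

% Keep only the rows whose Time value occurs in both tables.
T = innerjoin(T1, T2, 'Keys', 'Time');

% Show the matched times in readable form next to the joined values.
disp(datestr(T.Time));
disp(T);
```

The keys here match because both tables compute them the same way; time stamps that differ by rounding error would still be treated as different values.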

Answers (0)
