Unibz 2008/2009 Distributed Databases Final Project FAQ
created: 2009-04-30 12:00
updated: 2009-05-01 15:00
updated: 2009-05-28 16:54
What algorithms do I need to implement?
The article talks about:
What aggregate function do I need to implement?
Implement count(). Optionally, add others such
as sum() or avg().
I don't understand how to merge the leaves?
Just think about the last step in merge sort, where two sorted arrays are merged into
one (final) array. You can do that in O(n). Don't use nested loops.
I don't understand how to compute the global partition set in TDM-C?
The article does not really give a good, general algorithm. You should just do something simple,
like evenly split your total interval. In the general case this is not the most efficient solution,
but we start from random data that is equally distributed, so it doesn't really matter.
My benchmark results don't really make any sense. What should I do?
First of all you should be aware that embedded HSQLDB is thread-safe but not multi-threaded.
If your benchmark includes the I/O step (get the data from HSQLDB into the worker
threads) it will be very much skewed by HSQLDB's locking issues. One solution is to
load all the data into an java.util.ArrayList and then start the worker
threads that access the list in an unsynchronized way (java.util.ArrayList,
like all members of the collection framework, is not thread-safe, but read access won't give
you any problems). Idea credits Heinisch' group.
Next, you should note that TDM-C starts to win over PM only if you've got very high
parallelism. For a very high thread count PM's merge steps become inefficient: think about
1024 threads: you need 10 iterations of pairwise merges to get the final result, each
iteration must wait for the previous to complete. The article mentions that TDM-C starts
to win if the core count is more than 32 cores - way more than we've got on our computers.
How do I hand in the final project?
updated
You need to discuss your work with Prof. Gamper and me (chris at 1006 dot org, +39 340 2251141).
A formal presentation is not required (no slides, no report), but you should bring
your notebook and be prepared to run the code and answer questions about it. All group members
must be present. Also, send an archive of your source code (tar.gz or zip) to me.
The discussion can be done: