Clark School Home UMD

ISR News Story

The Dwarf Structure for Creating, Storing, and Querying Highly Compressed Data Cubes (ISR IP)

ISR intellectual property available to license

Inventors: Nicholas Roussopoulos, John Sismanis, Antonios Deligiannakis, Ioannis Kotidis

U.S. Patent: 7,133,876

Description
University of Maryland researchers have developed and implemented a super-compressed data structure called Dwarf for storing and querying cubes of multidimensional data. Dwarf can "slice and dice" data cubes on every possible projection. For example, click-data from a web ISP server can be stored very efficiently in data slice that group and aggregate data by click destination, or by destination, source, by time (minute/hour/day/week/month/year, etc.). Such groupings are then used in "data mining" over an extremely large set of log data.

Dwarf automatically identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.

The dwarf structure plays a double role as a storage and indexing mechanism for high dimension data. Roll-up and drill-down queries seem to benefit from the dwarf structure due to common paths that are exploited while caching. In terms of update speed, dwarf by far outperforms the closest competitor for storing the full data cube, while their performance is comparable when the competitor is reduced to storing only a partial cube of the same size as Dwarf. For the APB-1 Benchmark density 5, the fastest performance results are published by Oracle-Express and take 4.5 hours to generate a partial cube of about 30GB while the corresponding Dwarf cube is generated in 21 minutes and has a size of 4.3GB.

A number of Dwarf patents are pending.

For more information
If you would like to license this intellectual property, have questions, would like to contact the inventors, or need more information, contact ISR External Relations Director Jeff Coriale at coriale@umd.edu or 301.405.6604.

Find more ISR IP
You can go to our main IP search page to search by research category or faculty name. Or view the entire list of available IP on our complete IP listing page.

ISR-IP-Roussopoulos ISR-IP-multidimensional ISR-IP databases

Related Articles:
Probabilistic Wavelet Synopses For Multiple Measures (ISR IP)
Confidentiality Preserving Rank-Ordered Search (ISR IP)
Indexing RDF and Temporal RDF Databases (ISR IP)
The T-REX RDF Extraction System (ISR IP)
Method and Apparatus for Bandwidth-Constrained Queries in Sensor Networks (ISR IP)
D-Dupe: A Visual Interface for Relational DeDuplication (ISR IP)
HCE: Hierarchical Clustering Explorer (ISR IP)
Treemap 4.0 (ISR IP)
Optimal Data Diagnosis Algorithms (ISR IP)
Treemap 3.0 (ISR IP)

June 19, 2007


Prev   Next

 

 

For more information, contact ISR External Relations Director
Jeff Coriale at coriale@umd.edu or 301.405.6604.

Current Headlines

Alumni Profile: Paul Gaske

Ephremides is PI for new ONR 'age of information' grant

UMD Hosts Third Technica with over 800 Students from Around the Country

Srivastava Named Clark School Inaugural Associate Dean for Graduate Affairs

Pamela Abshire elected IEEE Fellow

Shihab Shamma elected IEEE Fellow

Wu Named AAAS Fellow

Baras, Kenyon, Hughes Network Systems win MIPS Global Impact Award

New microsystems detect, treat bacterial biofilms that cause post-operative infections

Srivastava named Clark School Dean for Graduate Affairs

News Resources

Return to Newsroom

Search News

Archived News

Events Resources

Events Calendar