Clark School Home UMD

ISR News Story

The Dwarf Structure for Creating, Storing, and Querying Highly Compressed Data Cubes (ISR IP)

ISR intellectual property available to license

Inventors: Nicholas Roussopoulos, John Sismanis, Antonios Deligiannakis, Ioannis Kotidis

U.S. Patent: 7,133,876

University of Maryland researchers have developed and implemented a super-compressed data structure called Dwarf for storing and querying cubes of multidimensional data. Dwarf can "slice and dice" data cubes on every possible projection. For example, click-data from a web ISP server can be stored very efficiently in data slice that group and aggregate data by click destination, or by destination, source, by time (minute/hour/day/week/month/year, etc.). Such groupings are then used in "data mining" over an extremely large set of log data.

Dwarf automatically identifies prefix and suffix structural redundancies and factors them out by coalescing their store. Prefix redundancy is high on dense areas of cubes but suffix redundancy is significantly higher for sparse areas. Putting the two together fuses the exponential sizes of high dimensional full cubes into a dramatically condensed data structure. The elimination of suffix redundancy has an equally dramatic reduction in the computation of the cube because recomputation of the redundant suffixes is avoided. This effect is multiplied in the presence of correlation amongst attributes in the cube. A Petabyte 25-dimensional cube was shrunk this way to a 2.3GB Dwarf Cube, in less than 20 minutes, a 1:400000 storage reduction ratio. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure which requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.

The dwarf structure plays a double role as a storage and indexing mechanism for high dimension data. Roll-up and drill-down queries seem to benefit from the dwarf structure due to common paths that are exploited while caching. In terms of update speed, dwarf by far outperforms the closest competitor for storing the full data cube, while their performance is comparable when the competitor is reduced to storing only a partial cube of the same size as Dwarf. For the APB-1 Benchmark density 5, the fastest performance results are published by Oracle-Express and take 4.5 hours to generate a partial cube of about 30GB while the corresponding Dwarf cube is generated in 21 minutes and has a size of 4.3GB.

A number of Dwarf patents are pending.

For more information
If you would like to license this intellectual property, have questions, would like to contact the inventors, or need more information, contact ISR External Relations Director Jeff Coriale at or 301.405.6604.

Find more ISR IP
You can go to our main IP search page to search by research category or faculty name. Or view the entire list of available IP on our complete IP listing page.

ISR-IP-Roussopoulos ISR-IP-multidimensional ISR-IP databases

Related Articles:
Probabilistic Wavelet Synopses For Multiple Measures (ISR IP)
Confidentiality Preserving Rank-Ordered Search (ISR IP)
Indexing RDF and Temporal RDF Databases (ISR IP)
The T-REX RDF Extraction System (ISR IP)
Method and Apparatus for Bandwidth-Constrained Queries in Sensor Networks (ISR IP)
D-Dupe: A Visual Interface for Relational DeDuplication (ISR IP)
HCE: Hierarchical Clustering Explorer (ISR IP)
Treemap 4.0 (ISR IP)
Optimal Data Diagnosis Algorithms (ISR IP)
Treemap 3.0 (ISR IP)

June 19, 2007

Prev   Next



For more information, contact ISR External Relations Director
Jeff Coriale at or 301.405.6604.

Current Headlines

Narayan is PI for NSF information-theoretic signal processing sampling research grant

The Battery Revolution

Alumnus Domenic Forte is PECASE recipient

Hyperdimensional computer theory featured in Voice of America video

Adomaitis explains model-based design in video talk

Ankur Srivastava named seventh director of ISR

Building Tomorrow’s Aviation Together: Airport, UAS Test Site Celebrate Ties

Derek Paley named Maryland Robotics Center director

Do Good Robotics Start-Up Competition coming this fall

Looking For Freshwater In All the Snowy Places

News Resources

Return to Newsroom

Search News

Archived News

Events Resources

Events Calendar