---
layout: default
title: "Grappa: Scaling data-intensive applications on commodity clusters"
people:
publications:
---
Note: Grappa is no longer under active development.
Grappa makes an entire cluster look like a single, powerful, shared-memory machine. By leveraging the massive amount of concurrency in large-scale data-intensive applications, Grappa can provide this useful abstraction with high performance. Unlike classic distributed shared memory (DSM) systems, Grappa does not require spatial locality or data reuse to perform well.
Data-intensive, or "Big Data", workloads are an important class of large-scale computations. However, the commodity clusters they run on are not well suited to these problems, which require careful partitioning of data and computation. A diverse ecosystem of frameworks, such as MapReduce, Spark, Dryad, and GraphLab, has arisen to tackle these problems. These frameworks ease the development of large-scale applications by specializing to a particular algorithmic structure and behavior.
Grappa provides an abstraction at a level high enough to subsume many of the performance optimizations common to these data-intensive platforms. At the same time, its relatively low-level interface makes it a convenient substrate on which to build data-intensive frameworks. Prototype implementations of (simplified) MapReduce, GraphLab, and a relational query engine have been built on Grappa, and they outperform the original systems.
Grappa’s runtime system consists of three key components:

- Distributed shared memory (DSM)
  - Provides fine-grained access to data anywhere in the system with strong consistency guarantees.
- Tasking system
  - Supports millions of lightweight threads and global distributed work stealing for load balancing.
- Communication layer
  - Sustains high throughput even for extremely small messages by delaying them and aggregating them into larger network packets.
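The communication layer's central idea, delaying small messages and aggregating them into larger packets, can be illustrated with a short C++ sketch. The `Message`, `Aggregator`, and `count_packets` names and the simple size-based flush policy are hypothetical assumptions made for illustration. This is not Grappa's API or implementation, which must also handle real networking, timeouts, and concurrency.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Illustrative only: these types are hypothetical, not Grappa's API.
struct Message {
  std::uint32_t dest;     // destination node id
  std::uint64_t payload;  // a tiny payload, e.g. an address or an increment
};

// Buffers small messages per destination and hands each full buffer to
// `send` as one batch, standing in for one larger network packet.
class Aggregator {
 public:
  using SendFn = std::function<void(std::uint32_t, const std::vector<Message>&)>;

  Aggregator(std::uint32_t num_nodes, std::size_t batch_size, SendFn send)
      : buffers_(num_nodes), batch_size_(batch_size), send_(std::move(send)) {}

  // Delay the message; only send once this destination's buffer is full.
  void enqueue(const Message& m) {
    std::vector<Message>& buf = buffers_.at(m.dest);
    buf.push_back(m);
    if (buf.size() >= batch_size_) flush_one(m.dest);
  }

  // Push out partially filled buffers (e.g. on a timeout or at a barrier).
  void flush() {
    for (std::uint32_t d = 0; d < buffers_.size(); ++d) flush_one(d);
  }

 private:
  void flush_one(std::uint32_t dest) {
    std::vector<Message>& buf = buffers_[dest];
    if (buf.empty()) return;
    send_(dest, buf);
    buf.clear();
  }

  std::vector<std::vector<Message>> buffers_;
  std::size_t batch_size_;
  SendFn send_;
};

// Sends `n` messages round-robin to `nodes` destinations and returns how
// many batched "packets" the aggregator emitted.
std::size_t count_packets(std::uint64_t n, std::uint32_t nodes, std::size_t batch) {
  std::size_t packets = 0;
  Aggregator agg(nodes, batch,
                 [&](std::uint32_t, const std::vector<Message>&) { ++packets; });
  for (std::uint64_t i = 0; i < n; ++i) {
    agg.enqueue({static_cast<std::uint32_t>(i % nodes), i});
  }
  agg.flush();
  return packets;
}
```

For example, `count_packets(1000, 4, 64)` spreads 1,000 messages across 4 destinations. Each destination receives 250 messages, which go out as three full batches of 64 plus one partial batch of 58. The result is 16 packets instead of 1,000. The trade-off is added latency per message in exchange for much higher throughput. Grappa's lightweight threads help absorb this trade-off, because other tasks can run while a message waits in a buffer.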
Grappa is freely available on GitHub under a BSD license. Anyone interested in seeing Grappa at work can follow the quick-start directions in the README to build and run it on their own cluster. To learn how to write your own Grappa applications, check out the Tutorial.
Grappa is still quite young, so please don't hesitate to ask for help if you run into problems. To find answers to questions or ask new ones, please use GitHub Issues. The developers hang out in the #grappa.io IRC channel on freenode; you can join with your favorite IRC client or this web interface. Finally, to stay up to date on the latest releases and information about the project, you can subscribe to the mailing list below.
{% for pub in page.publications %}
{% if pub.link %}[{{ pub.title }}]({{ pub.link }}).{% else %}{{ pub.title }}.{% endif %} {% if pub.techreport %}([Expanded tech report]({{ pub.techreport }})){% endif %}
{{pub.authors}}
{{ pub.publication }}
{% endfor %}
Autogenerated API documentation
Grappa is a project of the Sampa Group at the University of Washington.
{% for p in page.people %}
{% endfor %}