Filesystems for Grid Computing

PI: Douglas Thain, University of Notre Dame

Grid computing systems such as the Open Science Grid and the NSF TeraGrid give users easy access to hundreds or thousands of CPUs at once. However, it is not always easy to access one’s data from within a computing grid. Traditional filesystems such as NFS and AFS are not usable in most grid computing systems, because installing and using them requires privileged access on both the client and the server. A grid user rarely has such access.

To remedy this problem, we have designed and implemented a variety of filesystems for grid computing, all based on the Parrot and Chirp software. These user-level tools can be deployed without special privileges into existing grids, and used to access data wherever it may be located. We work directly with users in bioinformatics and high energy physics to design and deploy production filesystem services. You can download and use our software from this page.

Related Publications

  1. Wharf: Sharing Docker Images in a Distributed File System
    Chao Zheng, Lukas Rupprecht, Vasily Tarasov, Douglas Thain, Mohamed Mohamed, Dimitrios Skourtis, Amit S. Warke, and Dean Hildebrand
    In ACM Symposium on Cloud Computing, 2018
    doi: 10.1145/3267809.3267836
  2. Taming Metadata Storms in Parallel Filesystems with MetaFS
    Tim Shaffer and Douglas Thain
    In Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, 2017
    doi: 10.1145/3149393.3149401
  3. The Evolution of Global Scale Filesystems for Scientific Software Distribution
    Jakob Blomer, Predrag Buncic, Rene Meusel, Gerardo Ganis, Igor Sfiligoi, and Douglas Thain
    IEEE/AIP Computing in Science and Engineering, 2015
    doi: 10.1109/MCSE.2015.111
  4. Fine-Grained Access Control in the Chirp Distributed File System
    Patrick Donnelly and Douglas Thain
    In IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2012
    doi: 10.1109/CCGrid.2012.128
  5. A Rich Metadata Filesystem for Scientific Data
    Hoang Bui
    2012
  6. Data Intensive Computing with Clustered Chirp Servers
    Douglas Thain, Michael Albrecht, Hoang Bui, Peter Bui, Rory Carmichael, Scott Emrich, and Patrick Flynn
    In Data Intensive Distributed Computing: Challenges and Solutions for Large Scale Information Management, 2012
    isbn: 9781615209712
  7. Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop
    Patrick Donnelly, Peter Bui, and Douglas Thain
    In IEEE International Conference on Cloud Computing Technology and Science, 2010
    doi: 10.1109/CloudCom.2010.74
  8. ROARS: A Scalable Repository for Data Intensive Scientific Computing
    Hoang Bui, Peter Bui, Patrick Flynn, and Douglas Thain
    In The Third International Workshop on Data Intensive Distributed Computing at ACM HPDC 2010, 2010
    doi: 10.1145/1851476.1851587
  9. CDF Software Distribution on the Grid using Parrot
    Gabriele Compostella, Simone Pagan Griso, Donatella Lucchesi, Igor Sfiligoi, and Douglas Thain
    In Computing in High Energy Physics, 2009
    doi: 10.1088/1742-6596/219/6/062009
  10. Experience with BXGrid: A Data Repository and Computing Grid for Biometrics Research
    Hoang Bui, Michael Kelly, Christopher Lyon, Mark Pasquier, Deborah Thomas, Patrick Flynn, and Douglas Thain
    Journal of Cluster Computing, 2009
    doi: 10.1007/s10586-009-0098-7
  11. Chirp: A Practical Global Filesystem for Cluster and Grid Computing
    Douglas Thain, Christopher Moretti, and Jeffrey Hemmes
    Journal of Grid Computing, 2009
    doi: 10.1007/s10723-008-9100-5
  12. Efficient Access to Many Small Files in a Filesystem for Grid Computing
    Douglas Thain and Christopher Moretti
    In IEEE Grid Computing, 2007
    doi: 10.1109/GRID.2007.4354139
  13. Flexible Object Based Filesystems for Scientific Computing
    Christopher Moretti
    2007
  14. Grid Deployment of Legacy Bioinformatics Applications with Transparent Data Access
    Christophe Blanchet, Remi Mollon, Douglas Thain, and Gilbert Deleage
    In IEEE Grid Computing, 2006
    doi: 10.1109/ICGRID.2006.311006
  15. Operating System Support for Space Allocation in Grid Storage Systems
    Douglas Thain
    In IEEE Grid Computing, 2006
    doi: 10.1109/ICGRID.2006.311004
  16. Cacheable Decentralized Groups for Grid Resource Access Control
    Jeffrey Hemmes and Douglas Thain
    In IEEE Grid Computing, 2006
    doi: 10.1109/ICGRID.2006.311015
  17. Transparent Access to Grid Resources for User Software
    Sander Klous, Jaime Frey, Se-Chang Son, Douglas Thain, Alain Roy, Miron Livny, and Jo Brand
    Concurrency and Computation: Practice and Experience, 2006
    doi: 10.1002/cpe.961
  18. Using Condor Glide-Ins and Parrot to Move from Dedicated Resources to the Grid
    Stefano Belforte, Matthew Norman, Subir Sarkar, Igor Sfiligoi, Douglas Thain, and Frank Wuerthwein
    Lecture Notes in Informatics, 2006
  19. Transparently Distributing CDF Software with Parrot
    Douglas Thain, Christopher Moretti, and Igor Sfiligoi
    In Computing in High Energy Physics, 2006
  20. The Consequences of Decentralized Security in a Cooperative Storage System
    Douglas Thain, Christopher Moretti, Paul Madrid, Phil Snowberger, and Jeff Hemmes
    In Workshop on Security in Storage at IEEE FAST, 2005
    doi: 10.1109/SISW.2005.11
  21. Separating Abstractions from Resources in a Tactical Storage System
    Douglas Thain, Sander Klous, Justin Wozniak, Paul Brenner, Aaron Striegel, and Jesus Izaguirre
    In IEEE/ACM Supercomputing, 2005
    doi: 10.1109/SC.2005.64
  22. Parrot: An Application Environment for Data-Intensive Computing
    Douglas Thain and Miron Livny
    Scalable Computing: Practice and Experience, 2005
  23. Explicit Control in a Batch Aware Distributed File System
    John Bent, Douglas Thain, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny
    In USENIX Networked Systems Design and Implementation (NSDI), 2004
  24. Parrot: Transparent User-Level Middleware for Data Intensive Computing
    Douglas Thain and Miron Livny
    In Workshop on Adaptive Grid Middleware at PACT, 2003
  25. The Kangaroo Approach to Data Movement on the Grid
    Douglas Thain, Jim Basney, Se-Chang Son, and Miron Livny
    In IEEE High Performance Distributed Computing, 2001
    doi: 10.1109/HPDC.2001.945200