[contrail xtreemfs]

A distributed and replicated file system for the Cloud

Cloud issues addressed

Cloud computing is distributed computing, which requires seamless integration and availability of data in many (physical) places. For users, seamless and fast access matters most. For data providers, protecting data and granting the right levels of access count. For infrastructure administrators, manageability of data and infrastructure is important. Above all, though, reliability counts: despite the distributed nature of the storage and the sometimes unreliable Cloud and networking infrastructure, the data has to be there when the user's application needs it. It also has to stay available while the application is active, even if some network nodes or file servers crash.

Contrail's Solution

Cloud data storage capacity seems endless. Numerous virtual resources offer distributed and redundant pay-for-what-you-use capacity at a bargain: reliable, elastic and always accessible. But where is your data? Somewhere invisible in the Cloud, or on a private Cloud of your competitor? How reliable is your provider, and how secure is your data?

The best option is to set up your own distributed storage using XtreemFS. It is a distributed and replicated file system for the Cloud and offers significant advantages over other solutions. First: XtreemFS is POSIX compliant. From a user's point of view, it looks and behaves like a local file system. Any application can therefore access XtreemFS without having to be adapted to a specific Cloud storage system or API. This can considerably reduce the development and maintenance costs of applications for the Cloud. You are by no means limited to POSIX, since XtreemFS also provides alternative Cloud storage interfaces.
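To illustrate what POSIX compliance means for application code, here is a minimal Python sketch that uses only ordinary file calls. It assumes the volume is already mounted at some directory; the directory and file names are hypothetical, and a temporary local directory stands in for the mount point so the example runs anywhere.

```python
import os
import tempfile


def write_and_read_back(directory: str, name: str, payload: bytes) -> bytes:
    """Write a file with ordinary file calls, then read it back."""
    path = os.path.join(directory, name)
    # Plain open/write/fsync: nothing here is specific to a storage system.
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    # Standard metadata call, exactly as on a local file system.
    size = os.stat(path).st_size
    with open(path, "rb") as f:
        data = f.read()
    if size != len(data):
        raise IOError("size reported by stat() does not match data read")
    return data


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the mounted volume.
    with tempfile.TemporaryDirectory() as mount_point:
        print(write_and_read_back(mount_point, "report.txt", b"hello cloud\n"))
```

The point of the sketch is what is absent: no storage-specific client library and no special API, only the calls an application would use on a local disk.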

Second: XtreemFS offers high availability and data safety through replication. Unlike most other Cloud storage systems, it offers strong replica consistency and provides different replication mechanisms. Read-only replication can, for example, be used to prevent I/O bottlenecks in Content Distribution Network (CDN)-like scenarios, in which large chunks of data (e.g., virtual machine images) are accessed by many users at the same time. Alternatively, read-write replication can be used to ensure that any written data is safely stored in the face of storage device failures.

Third: locality of data. For every file, you can tell XtreemFS which resource providers may be used, depending on your business needs.

Replication Features of the XtreemFS file system:

  • SPOF-free: no single point of failure, since all services are replicated
  • Efficient read-only replication to increase I/O throughput for immutable files
  • POSIX compatibility
  • Support for SSL and X.509 certificates (no need for a VPN)
  • Globally distributed (Cloud) installations
  • Elasticity and scalability
  • Extensible through policies and plug-ins
  • Striping for parallel I/O
  • Asynchronous MRC backups and file system snapshots
  • Metadata caching on the client side
  • Network and geo-location awareness
  • Data integrity

In addition to this core functionality, XtreemFS offers a wide range of additional features:

  • Striping: XtreemFS can spread chunks of a single file across multiple storage servers, thus increasing throughput when accessing large files.
  • Client-side Metadata Caching: XtreemFS clients can maintain a local metadata cache to ensure low-latency access to metadata.
  • Snapshots: XtreemFS can record consistent snapshots of volumes.
  • Checksums: Storage servers are capable of calculating and verifying checksums whenever data is read or written, so as to detect corruptions of file content.
  • Hadoop Support: XtreemFS can be accessed by Apache Hadoop applications through an HDFS adapter.
  • Monitoring: XtreemFS installations can be easily monitored with third-party monitoring tools like Ganglia and Nagios through an SNMP-based monitoring service.
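The striping feature listed above can be sketched as a simple address calculation. This is a minimal illustration that assumes fixed-size chunks placed round-robin across the storage servers; the function names and parameters are hypothetical, and the layout XtreemFS actually uses may differ.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, num_servers: int) -> Tuple[int, int, int]:
    """Return (chunk index, server index, offset inside the chunk)."""
    if offset < 0 or stripe_size <= 0 or num_servers <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    chunk = offset // stripe_size
    # Assumption: chunks are assigned to servers in round-robin order.
    return chunk, chunk % num_servers, offset % stripe_size


def split_read(offset: int, length: int, stripe_size: int,
               num_servers: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into per-chunk requests: (server, chunk, byte count)."""
    requests = []
    end = offset + length
    while offset < end:
        chunk, server, within = locate(offset, stripe_size, num_servers)
        # Read up to the end of the current chunk or the end of the range.
        count = min(stripe_size - within, end - offset)
        requests.append((server, chunk, count))
        offset += count
    return requests


if __name__ == "__main__":
    # An 8-byte read starting at offset 2, with 4-byte chunks on 3 servers.
    print(split_read(2, 8, 4, 3))
```

Because consecutive chunks land on different servers, the per-chunk requests of one large read can be issued in parallel, which is where the throughput gain comes from.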

Many cloud storage solutions exist for storing Cloud user data, but only a few provide POSIX file system semantics. Ideally, users can run their applications on a cloud without major modifications. XtreemFS provides the same guarantees as a local file system when files are accessed, even if accesses are directed to different replicas. Furthermore, XtreemFS can use a shared pool of storage resources for the data of many different users while protecting the interests of individual users in terms of privacy and isolation.

Several features make XtreemFS particularly suitable for Cloud computing environments. One of them is its POSIX® compatibility, which provides the same interface and operation semantics as a common local Linux file system. Applications can thus use XtreemFS without having to be adapted to a specific storage subsystem.

XtreemFS supports elasticity. Servers can be added to an XtreemFS installation easily and dynamically in order to increase the storage and I/O capacity of the file system. This can happen at any time, without maintenance downtime. Newly added servers are immediately integrated into the system.

Data safety is inherent to XtreemFS, which provides robustness against storage device failures by means of replication. Maintaining multiple replicas of files and metadata ensures data safety even if the underlying storage devices suffer physical damage.

High availability makes XtreemFS suitable for larger environments. In peta-scale storage installations, hardware failures and downtimes are the norm rather than the exception. XtreemFS transparently falls back to available replicas of files and metadata if individual servers become unavailable. It also supports off-site replication over wide area networks to ensure availability even when entire data centres are down.
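The fallback to available replicas can be pictured with a small sketch. It only illustrates the general idea of trying replicas in turn; the names `ReplicaUnavailable`, `read_with_failover` and `fetch` are hypothetical and do not describe how the XtreemFS client is implemented.

```python
from typing import Callable, List, Sequence


class ReplicaUnavailable(Exception):
    """Raised when a replica cannot be reached."""


def read_with_failover(replicas: Sequence[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Return data from the first replica that answers."""
    errors: List[str] = []
    for replica in replicas:
        try:
            return fetch(replica)
        except ReplicaUnavailable as exc:
            # Remember the failure and move on to the next replica.
            errors.append(f"{replica}: {exc}")
    raise ReplicaUnavailable("all replicas failed: " + "; ".join(errors))


if __name__ == "__main__":
    down = {"server-a"}  # hypothetical server names

    def fake_fetch(replica: str) -> bytes:
        # Stand-in for a network read; a real client would contact the server.
        if replica in down:
            raise ReplicaUnavailable("connection refused")
        return b"data from " + replica.encode()

    print(read_with_failover(["server-a", "server-b"], fake_fetch))
```

The caller sees either the data or a single error once every replica has failed, so a single server outage stays invisible to the application.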

XtreemFS comes with an integrated security infrastructure that protects data from unauthorized access. SSL connections ensure that data transfers between clients and servers are encrypted. X.509 certificates enable secure authentication of individual users. POSIX® permissions and ACLs provide the basis for fine-grained access control to different data volumes.

Extensibility is the final feature that makes XtreemFS Cloud ready. Most behavior in XtreemFS can be controlled by means of policies. Examples are the authentication and authorization of users, the placement of files and replicas, and the selection of replicas. In addition to the predefined policies, XtreemFS offers a plug-in mechanism to support custom, user-defined policies.
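As a rough picture of policy-driven behavior, the sketch below models a replica selection policy as a named, swappable function. Everything in it (the `Replica` type, the registry, the policy names) is a hypothetical illustration of the concept and not the XtreemFS plug-in interface, which this document does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Replica:
    server: str
    datacenter: str


# A policy takes the replica list and the client's data centre,
# and returns the replicas in the order they should be tried.
Policy = Callable[[List[Replica], str], List[Replica]]

_POLICIES: Dict[str, Policy] = {}


def register(name: str) -> Callable[[Policy], Policy]:
    """Decorator that adds a policy to the registry under a name."""
    def decorator(fn: Policy) -> Policy:
        _POLICIES[name] = fn
        return fn
    return decorator


@register("default")
def keep_order(replicas: List[Replica], client_dc: str) -> List[Replica]:
    return list(replicas)


@register("same-datacenter-first")
def same_dc_first(replicas: List[Replica], client_dc: str) -> List[Replica]:
    # sorted() is stable and False sorts before True, so replicas in the
    # client's data centre come first and the original order is otherwise kept.
    return sorted(replicas, key=lambda r: r.datacenter != client_dc)


def select(policy_name: str, replicas: List[Replica],
           client_dc: str) -> List[Replica]:
    """Apply the named policy; raises KeyError for an unknown name."""
    return _POLICIES[policy_name](replicas, client_dc)


if __name__ == "__main__":
    replicas = [Replica("a", "eu"), Replica("b", "us"), Replica("c", "us")]
    print([r.server for r in select("same-datacenter-first", replicas, "us")])
```

Adding a custom policy then means registering one more function under a new name, without touching the code that calls `select`.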

XtreemFS is available for Linux (openSUSE, SLE, Fedora, CentOS, RHEL, Mandriva, Debian, Ubuntu, Gentoo), Mac OS X and Windows.

More information

  1. Documentation: Contrail XtreemFS User Guide
  2. Download: Download directory
  3. Website: http://xtreemfs.org