Analysis of Distributed File System Properties

Migara Pramod
Aug 24, 2021

A Distributed File System (DFS) is a file system whose files are stored across multiple file servers. Many cloud providers such as Google, AWS and Microsoft have built their own distributed file systems, used both internally and by external customers. Sharing stored information is the most important aspect of a DFS. When designing one, these providers consider many distributed-system properties such as transparency, scalability, high availability, security, simplicity, data integrity and more.

In this blog, I’m going to review the Google File System, the Andrew File System and the Hadoop Distributed File System against some of the fundamental properties of a distributed file system.

Google File System (GFS)

GFS Architecture

GFS clusters include three types of interdependent entities: clients, a master and chunk servers. Clients can be computers or applications manipulating existing files or creating new files on the system. The master server is the orchestrator of the cluster: it maintains the operation log and keeps track of the locations of chunks within the cluster. Clients read and write file data directly to the chunk servers. Following are some of the key properties of GFS.
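
GFS has no public client API, so the following is only a minimal sketch of the read path described above, with hypothetical MasterClient and ChunkServerClient interfaces standing in for the real protocol: the client asks the master where a chunk lives, then fetches the bytes directly from one of the chunk servers.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical interfaces -- GFS exposes no public API; these only model
// the control flow: metadata from the master, data from a chunkserver.
interface MasterClient {
    ChunkLocations findChunk(String fileName, long chunkIndex);
}

interface ChunkServerClient {
    byte[] readRange(String chunkHandle, long offsetInChunk, int length);
}

record ChunkLocations(String chunkHandle, List<String> replicaServers) {}

class GfsReadSketch {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // GFS chunks are 64 MB

    static byte[] read(MasterClient master,
                       Function<String, ChunkServerClient> connect,
                       String fileName, long fileOffset, int length) {
        long chunkIndex = fileOffset / CHUNK_SIZE;       // chunk covering the offset
        long offsetInChunk = fileOffset % CHUNK_SIZE;
        ChunkLocations loc = master.findChunk(fileName, chunkIndex);
        // Read from one replica (here simply the first); no file data
        // ever flows through the master.
        ChunkServerClient server = connect.apply(loc.replicaServers().get(0));
        return server.readRange(loc.chunkHandle(), offsetInChunk, length);
    }
}
```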

High availability

A GFS cluster contains thousands of servers, and at any given time some of them will be unavailable. GFS therefore relies on fast recovery and replication to keep the cluster highly available.

With fast recovery, both the master and the chunk servers are designed to restore their state and restart in seconds, no matter how they were terminated; servers are routinely shut down simply by killing the process. Clients and other servers may notice a brief hiccup while they time out, reconnect and retry.

Replication takes two forms: chunk replication and master replication.

In chunk replication, each chunk is replicated on multiple chunk servers on different racks. Users can specify different replication levels for different parts of the file namespace. The master clones existing replicas as needed to keep each chunk fully replicated as chunk servers go offline or report replicas found corrupted through checksum verification.
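
The following is a minimal sketch of that re-replication decision, with illustrative names only (this is not GFS code): when the number of healthy replicas of a chunk falls below its target level, the master picks a destination chunk server on a rack that holds no copy yet.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch of the master's re-replication choice; all names are
// hypothetical, and the real placement policy considers more factors
// (disk utilization, recent creations, rack bandwidth).
class ReReplicationSketch {
    record Replica(String chunkServer, String rack, boolean healthy) {}

    static Optional<String> pickCloneTarget(List<Replica> replicas,
                                            int targetLevel,
                                            Map<String, String> serverToRack) {
        List<Replica> live = replicas.stream().filter(Replica::healthy).toList();
        if (live.size() >= targetLevel) {
            return Optional.empty();                  // already fully replicated
        }
        Set<String> usedRacks = new HashSet<>();
        live.forEach(r -> usedRacks.add(r.rack()));
        // Prefer a chunk server on a rack that holds no replica of this chunk,
        // so a single rack failure cannot take out every copy.
        return serverToRack.entrySet().stream()
                .filter(e -> !usedRacks.contains(e.getValue()))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```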

In master replication, the master’s operation log and checkpoints are replicated on multiple machines.

When the master process fails, it restarts almost instantly. If its machine or disk fails, monitoring infrastructure outside GFS starts a new master process elsewhere using the replicated operation log. GFS also runs shadow masters, which provide read-only access to the file system even when the primary master is down. They enhance read availability for files that are not being actively mutated, or for applications that do not mind slightly stale results.

Data Integrity

With thousands of disks across thousands of servers, disk failures and data corruption are inevitable in GFS. Each chunk server therefore uses checksumming to detect corruption in the data it stores. Like other metadata, checksums are kept in memory and stored persistently with logging, separate from user data. For read requests, the chunk server verifies the checksum of the requested data range before returning it; if the checksum does not match, it returns an error to the requestor and reports the mismatch to the master. The requestor then reads the data from another replica, and the master clones the chunk from a healthy replica and instructs the chunk server to delete its corrupted copy. This guarantees that corrupted data is never returned on a read, but it does not catch corruption in blocks that are rarely read. To cover those, the chunk server scans and verifies inactive blocks during idle periods; if corruption is found, it informs the master, which creates a new uncorrupted replica and deletes the corrupted one.
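
As a rough illustration of the per-block checksumming idea (not GFS’s actual implementation; CRC32 is used here as a stand-in for its 32-bit checksum), a chunk server could keep one checksum per 64 KB block and verify the blocks touched by each read:

```java
import java.util.zip.CRC32;

// Sketch of per-block checksumming: one 32-bit checksum per 64 KB block,
// recomputed and compared on every read that touches the block.
class ChecksumSketch {
    static final int BLOCK = 64 * 1024;        // GFS checksums 64 KB blocks

    // Compute one checksum per 64 KB block of a chunk.
    static long[] checksumBlocks(byte[] chunkData) {
        int nBlocks = (chunkData.length + BLOCK - 1) / BLOCK;
        long[] sums = new long[nBlocks];
        for (int i = 0; i < nBlocks; i++) {
            CRC32 crc = new CRC32();
            int len = Math.min(BLOCK, chunkData.length - i * BLOCK);
            crc.update(chunkData, i * BLOCK, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // On a read, recompute the checksum of each covered block and compare it
    // with the stored value; a mismatch means this replica is corrupt and the
    // requestor should be redirected to another replica.
    static boolean verifyBlock(byte[] blockData, long storedChecksum) {
        CRC32 crc = new CRC32();
        crc.update(blockData, 0, blockData.length);
        return crc.getValue() == storedChecksum;
    }
}
```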

Scalability

GFS uses a cluster-based architecture that is highly scalable. A cluster consists of hundreds or even thousands of storage machines built from inexpensive commodity parts. The largest cluster has over 1,000 storage nodes and over 300 TB of disk storage, and is heavily accessed by hundreds of clients on distinct machines on a continuous basis. GFS can be scaled horizontally by adding more chunk servers to cope with growing data volumes and user traffic.

Consistency

GFS applies mutations to a chunk in the same order on all of its replicas. A mutation is an operation that changes the contents or metadata of a chunk, such as a write or an append. GFS uses chunk version numbers to detect any replica that has become stale because it missed mutations while its chunk server was down. The window in which a client can read from a stale replica whose location is cached is small, because cache entries time out and the client purges all cached chunk information for a file on the next open.
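
A minimal sketch of the version-number idea, with illustrative names only: the master bumps a chunk’s version whenever it grants a new lease for a mutation, so a replica that later reports an older version is known to have missed mutations and is stale.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch, not GFS code: stale-replica detection via chunk
// version numbers kept by the master.
class ChunkVersionSketch {
    private final Map<String, Long> currentVersion = new HashMap<>();

    // Called before a mutation: increment the version of the chunk being leased.
    long grantLease(String chunkHandle) {
        return currentVersion.merge(chunkHandle, 1L, Long::sum);
    }

    // Called when a chunk server reports its replicas (e.g. after a restart):
    // a reported version older than the master's is a stale replica, to be
    // garbage collected and re-replicated from a current copy.
    boolean isStale(String chunkHandle, long reportedVersion) {
        return reportedVersion < currentVersion.getOrDefault(chunkHandle, 0L);
    }
}
```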

Security

GFS considers both physical and virtual security. Data center locations are undisclosed as a physical security measure, and access to them is allowed only to authorized employees and vendors. Files are encrypted both in transit and at rest. Clients must also authenticate with the system before communicating with GFS, and file access is protected by file permissions.

Cache Management

In GFS, clients cache metadata (such as chunk locations) in memory, but they do not cache file data. Chunk servers do not need to cache file data either, because the Linux buffer cache on each chunk server already keeps frequently accessed data in memory.

Cache Consistency

GFS adopts an append-once-read-many model, which avoids the need to lock files for writing in a distributed environment: clients simply append new data to existing files.

Communication

Communication between clients, the master and chunk servers in GFS takes place over TCP connections. For data transfer, the data is pipelined over chains of TCP connections: each chunk server forwards data to the next replica as it receives it.

Andrew File System (AFS)

AFS Architecture

AFS is a distributed network file system that uses a set of trusted servers to present a homogeneous, location-transparent file name space to all client workstations. Its goal is to support information sharing on a large scale while minimizing client-server communication. Following are some of the key properties of AFS.

Transparency

AFS provides location transparency, so users do not need to remember which file server stores their data. Thanks to this transparency, system administrators can move files from one server to another without inconveniencing users, who are completely unaware of such a move.

Security

Server locations are physically secured, and the servers run trusted system software. No user programs are executed on the servers, which protects them from injected, untrusted code. Data transmission is encrypted, and users must authenticate with the system before reading or writing stored data. AFS has a master authentication server and slave authentication servers: password changes and new-user additions are handled by the master, which distributes these changes to the slaves. A client requests a token from a slave authentication server and uses it to establish secure connections with the file servers. For file system protection, AFS uses access lists, which are associated with directories rather than individual files. An access list can also specify negative rights: an entry in a negative rights list indicates denial of the specified rights. Negative rights decouple the problems of rapid revocation and propagation of group membership information, and are particularly valuable in a large distributed system.
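
To make the negative-rights idea concrete, here is a small sketch of how a directory ACL with positive and negative entries could be evaluated (illustrative names only, not AFS code): the effective rights are everything granted to the user or their groups, minus everything explicitly denied.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Illustrative ACL evaluation with negative rights; the right names mirror
// the AFS directory rights, but the classes and logic here are hypothetical.
class AfsAclSketch {
    enum Right { READ, LOOKUP, INSERT, DELETE, WRITE, LOCK, ADMINISTER }

    record Entry(String principal, EnumSet<Right> rights) {}

    static EnumSet<Right> effectiveRights(String user, Set<String> userGroups,
                                          List<Entry> positive, List<Entry> negative) {
        EnumSet<Right> granted = EnumSet.noneOf(Right.class);
        for (Entry e : positive) {
            if (e.principal().equals(user) || userGroups.contains(e.principal())) {
                granted.addAll(e.rights());
            }
        }
        // Negative entries win: adding a user here revokes access immediately,
        // even if group membership changes have not yet propagated everywhere.
        for (Entry e : negative) {
            if (e.principal().equals(user) || userGroups.contains(e.principal())) {
                granted.removeAll(e.rights());
            }
        }
        return granted;
    }
}
```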

User mobility

AFS supports user mobility: a user can sit at any workstation in the system and access the files they are authorized to use.

Scalability

In AFS, scalability is achieved by reducing static bindings to a bare minimum and by maximizing the number of active clients each server can support. AFS has a cell-based structure, and its callback-based caching minimizes server and network load, which also contributes to scalability.

Cache Management

In AFS, whole files (or large file chunks) are cached on the client’s local disk. The cache is persistent and survives reboots of the workstation, and there is a further level of file caching by the UNIX kernel in main memory.

Cache Consistency

In AFS, cache consistency is preserved by callbacks: when a client caches a file, the server promises to notify the client if another client updates that file. Callbacks thus ensure the user is working with the most recent copy of the file.
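
A minimal sketch of the callback mechanism on the server side (illustrative names, not the actual AFS implementation): the server records which clients hold a cached copy of each file and “breaks” their callbacks when the file is updated.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative server-side callback bookkeeping: register a callback promise
// when a client fetches a file, break the promises when the file changes.
class CallbackSketch {
    interface ClientStub { void breakCallback(String fileId); }

    private final Map<String, Set<ClientStub>> callbacks = new HashMap<>();

    // Called when a client fetches and caches a file.
    void registerCallback(String fileId, ClientStub client) {
        callbacks.computeIfAbsent(fileId, k -> new HashSet<>()).add(client);
    }

    // Called when a client stores an updated copy back to the server: notify
    // every other caching client so it discards its now-stale cached copy.
    void fileUpdated(String fileId, ClientStub writer) {
        for (ClientStub c : callbacks.getOrDefault(fileId, Set.of())) {
            if (c != writer) {
                c.breakCallback(fileId);
            }
        }
        callbacks.put(fileId, new HashSet<>(Set.of(writer)));
    }
}
```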

Communication

AFS uses an RPC-based protocol on top of TCP/IP.

Hadoop Distributed File System (HDFS)

Hadoop is a distributed software framework designed for transforming and managing large quantities of data. HDFS provides the underlying storage for files on the cluster’s storage nodes. It splits large data files into blocks that are stored on different machines in the cluster. HDFS has a master-slave architecture consisting of a single NameNode (the master server) and many DataNodes (the slaves). The NameNode is a central server that manages the file system namespace, while each DataNode manages the data stored on it. HDFS is a highly fault-tolerant distributed file system designed to be deployed on low-cost hardware. Following are some of the key properties of HDFS.

Replication management

HDFS is designed to store very large files reliably across the machines of a large cluster. Each file is stored as a sequence of blocks; all blocks in a file except the last are the same size. The blocks of a file are replicated for fault tolerance, and the block size and replication factor are configurable per file. By default HDFS keeps three copies of each block using rack-aware placement: two replicas on different DataNodes in the same rack and a third on a DataNode in a different rack. An application can specify the number of replicas of a file that HDFS should maintain; the replication factor can be set at file creation time and changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding block replication. It periodically receives a Heartbeat and a Blockreport from each DataNode in the cluster: receipt of a Heartbeat implies that the DataNode is functioning properly, and a Blockreport contains a list of all blocks on that DataNode.
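
For example, with the Java FileSystem API the replication factor can be set per file at creation time and changed afterwards (the path and block size below are just placeholders, and the code assumes a Hadoop client configuration on the classpath that points at your cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);          // cluster from core-site.xml
        Path file = new Path("/data/events.log");      // hypothetical path

        // Create the file with a per-file replication factor of 3 and a 128 MB block size.
        try (FSDataOutputStream out = fs.create(file, true, 4096, (short) 3, 128L * 1024 * 1024)) {
            out.writeUTF("example record");
        }

        // The replication factor can be changed later; the NameNode then
        // schedules extra copies or deletions to match the new target.
        fs.setReplication(file, (short) 2);
    }
}
```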

Consistency

HDFS stores a checksum with each data block to maintain data consistency. Checksums are verified when a client reads a file, which helps detect corruption caused by the client, the DataNodes or the network. When a client creates an HDFS file, it computes the checksum sequence for each block and sends it to the DataNode along with the data. The DataNode stores the checksums in a metadata file separate from the block’s data file. When HDFS reads a file, each block’s data and checksums are returned to the client, which verifies them.
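
From the client side this verification is transparent: every read through the FileSystem API checks the block checksums, and a whole-file checksum can also be requested explicitly (the path below is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/events.log");   // hypothetical path

        // Checksum verification happens transparently on every read; a corrupt
        // block raises an error and the client falls back to another replica.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buffer = new byte[4096];
            in.read(buffer);
        }

        // A whole-file checksum (a composite of the per-block checksums) can
        // also be requested, e.g. to compare two files without copying them.
        FileChecksum checksum = fs.getFileChecksum(file);
        System.out.println(checksum);
    }
}
```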

Scalability

HDFS uses a cluster-based architecture and scales horizontally: capacity and throughput grow as more DataNodes are added to the cluster.

Security

HDFS security is based on the POSIX model of users and groups. Security is limited to simple file permissions.
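
These permissions can be inspected and changed through the same FileSystem API (the path, user and group below are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/report.csv");   // hypothetical path

        // rw- for the owner, r-- for the group, no access for others (mode 640).
        fs.setPermission(file,
                new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.NONE));

        // Ownership follows the same user/group model (changing it needs superuser rights).
        fs.setOwner(file, "alice", "analysts");
    }
}
```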

Cache management

For cache management, Hadoop jobs running over HDFS use the distributed cache, a facility provided by the MapReduce framework to cache application-specific, large, read-only files (text, archives, JARs and so on) on the worker nodes. Cached files can be private (belonging to one user) or public (belonging to all users of the same node).
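
With the current MapReduce API, files are added to the distributed cache when the job is configured (the file path and job name below are hypothetical):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileExample {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lookup join");

        // Ship a small, read-only lookup file (hypothetical path) to every
        // worker; the fragment after '#' becomes a local symlink name that
        // tasks can open directly from their working directory.
        job.addCacheFile(new URI("/apps/lookup/countries.txt#countries"));

        // Inside a Mapper or Reducer, the cached files are visible via
        // context.getCacheFiles() or simply as the local file "countries".
    }
}
```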

Cache consistency

HDFS follows a write-once-read-many model, which relaxes concurrency control requirements, simplifies data coherency and enables high-throughput access.

Communication

HDFS uses an RPC-based protocol on top of TCP/IP.

That’s it. I hope this gave you some idea of the properties of different distributed file systems.

You can refer to the links below for more in-depth details on these distributed file systems.

http://www.lnse.org/papers/27-D0046.pdf

https://static.googleusercontent.com/media/research.google.com/en//archive/gfs-sosp2003.pdf

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.454.4159&rep=rep1&type=pdf
