A Stitch in Code Saves Nine - An Interview with Dr. Vijay Kumar, IISc

Prashanth Hebbar, Managing Partner, Knobly Consulting
Vijay Kumar, ACCS-CDAC

Prof. Vijay Kumar was conferred the ACCS-CDAC Foundation award for 2018, along with Prof. V. Kamakoti of the Indian Institute of Technology, Madras. Prof. Vijay Kumar is recognized for his seminal contributions to error-correcting codes and signal design for wireless communication, which have been adopted into the WCDMA cellular standards.

In today’s data-hungry world, Prof. Vijay Kumar’s work, though deeply theoretical, is an important contribution towards solving practical problems faced by industry. Prof. Vijay Kumar did his BTech at IIT Kharagpur and his MTech at IIT Kanpur before leaving for the US in 1979 to pursue his PhD at the University of Southern California. He joined the same department as faculty in 1983. His association at USC with Dr. Irving Reed, of Reed-Solomon codes fame, reinforced his deep-set interest in pursuing a career in coding theory. Prof. Vijay Kumar joined the Department of Electrical Communication Engineering at the Indian Institute of Science (IISc) in 2003.

Prof. Vijay Kumar’s work started turning heads in industry and academia when he began delving into erasure coding as applied to computing and storage. Clay (short for coupled-layer) codes are erasure codes designed to bring about significant savings in network bandwidth and disk I/O when a failed node/OSD/rack is being repaired. These codes, belonging to the Minimum Storage Regenerating (MSR) class, offer a number of desirable properties, including low storage overhead and optimal disk I/O, and make recovery of lost data from a storage node faster, more accurate, and less expensive. Prof. Vijay Kumar and his team of researchers are applying regenerating erasure codes to new and interesting domains.

Here’s an excerpt from an interview by Prashanth Hebbar for the ACC Journal.

Your team made seminal contributions to the Clay code by showing that any MDS code can be used to build a Clay code. Can you explain what led to this important work?

Our work on Clay codes started soon after a discussion on the limitations of Reed-Solomon codes with a visiting Berkeley professor. Two bright students from my department came up with the first construction of what would become the Clay code, which works with any MDS (Maximum Distance Separable) code, making it a regenerating code rather than a replication scheme.

The Clay code, as the very name suggests, is constructed by stacking multiple layers of an MDS code and performing pair-wise coupling across layers. A Clay code can be built from a Reed-Solomon code or any other MDS code. For example, take a block of storage encoded using a Reed-Solomon code; we can pick up four layers and start coupling the layers, replacing every pair of symbols with a different pair, to make a Clay code. We use Galois Field computations to do these operations. Among the three teams which contributed to this work, it was our team that showed you can use any MDS code. What it means is that, if someone gives us a [20, 16] code, where the block length is 20 and the dimension is 16 (that is, there are 16 information symbols and four pieces of redundancy, making 20 in all), we take that and build the Clay code by applying either Galois Field operations or, where possible, the simpler XOR operations.
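The key property of the coupling step is that it replaces each pair of symbols with a different pair in an invertible way. The sketch below is a hypothetical simplification, not the actual Clay transform: it uses a trivially invertible XOR-based pairing over bytes, whereas the real construction uses a specific invertible 2x2 transform over a larger Galois field. The function names `couple` and `decouple` are illustrative.

```python
# Toy illustration of pair-wise coupling across layers (hypothetical
# simplification, NOT the actual Clay transform). The pair (a, b) is
# replaced by (a ^ b, b), which is invertible over GF(2).

def couple(a, b):
    """Replace the pair (a, b) with the coupled pair (a ^ b, b)."""
    return a ^ b, b

def decouple(c1, c2):
    """Invert the coupling: recover the original pair (a, b)."""
    return c1 ^ c2, c2

# Two 'layers' of byte symbols, e.g. one from each of two RS codewords.
layer_i = [0x12, 0x34, 0x56]
layer_j = [0xAB, 0xCD, 0xEF]

coupled = [couple(a, b) for a, b in zip(layer_i, layer_j)]
recovered = [decouple(c1, c2) for c1, c2 in coupled]

assert recovered == list(zip(layer_i, layer_j))  # coupling is invertible
```

Invertibility is what lets a Clay code fall back to plain layer-by-layer MDS decoding after decoupling, while the coupled form is what enables bandwidth-efficient repair.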

How can practitioners use an implementation of your work and how accessible is it?

We have implemented Clay codes and made the implementation available as open source under the LGPL license. The Clay code is available as a plug-in for the open-source unified distributed storage software Ceph. Ceph traditionally supported scalar erasure codes like the Reed-Solomon code. We have modified Ceph to support the Clay code, which is a vector code. Our work is now part of the Ceph master code base.

You have also worked on Local Reconstruction Codes (LRC) and extended that work in an interesting direction, Hierarchical codes. Can you give us some context for this problem domain?

Today, we have built several classes of codes with different properties. Regenerating codes are one direction; another is what we call locally repairable codes. We are making a lot of important contributions to this branch of erasure coding. This branch was started by a team at Microsoft consisting of very smart computer scientists, Sergey Yekhanin, Parikshit Gopalan and others. They came up with this scheme and called it Codes with Locality, but it is now known as Local Reconstruction Codes (LRC). Microsoft used it in the Windows Azure storage system. Windows Azure data is protected by this code, an [18, 14] code (18 total nodes, 14 data chunks). That has reportedly saved Microsoft anywhere from millions to hundreds of millions of dollars, because they were able to replace an existing RS code with 50% [storage] overhead by the LRC, which has only 29% [storage] overhead. When you spread this gain over a huge amount of data, it translates into huge cost savings.
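The overhead figures quoted above follow directly from the code parameters: an [n, k] code stores n symbols for every k data symbols, giving (n − k)/k redundancy per data symbol. The snippet below just reproduces that arithmetic; the [21, 14] RS parameters are a hypothetical example of a code with 50% overhead, not necessarily Azure's actual earlier configuration.

```python
# Storage overhead of an [n, k] erasure code: (n - k) redundant symbols
# for every k data symbols. The [21, 14] RS parameters are hypothetical,
# chosen only to illustrate a 50%-overhead code.

def overhead(n, k):
    return (n - k) / k

print(f"RS  [21, 14]: {overhead(21, 14):.0%}")   # 50% overhead
print(f"LRC [18, 14]: {overhead(18, 14):.1%}")   # ~28.6%, i.e. the ~29% quoted
```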

We have made some interesting extensions to the LRC. For example, we had a situation where, within a larger erasure code, there were smaller erasure codes built in. Earlier, this was done using RS codes, which allowed recovering the data from a localized code. The question of what happens in the case of hierarchical nodes, and how much of the network should be disturbed, became important. We looked at it and came up with a code which allows one to say: if you do not have this machine, then you can look at the next level and go on through the hierarchy until you find the nodes that enable you to reconstruct the lost data. You do not have to touch machines that do not directly belong to the hierarchy. For the same reason, it is called the Hierarchical code, and we contributed that code to the literature [on LRC]. We brought in the notion of the Hierarchical code and built the code. Later, we realized that the notion of a Hierarchical code already existed, but the team which worked on it had made no claims about the fundamental limits, the bounds beyond which you cannot do better.
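The idea of repairing a failure without disturbing machines outside its own group can be illustrated with a deliberately simple layout. This is a toy sketch of locality only, not the actual Hierarchical code: two local groups, each protected by its own XOR parity, so a single failure is repaired entirely within its group.

```python
# Toy sketch of locality (hypothetical layout, NOT the Hierarchical code
# itself): two local groups of data symbols, each with its own XOR
# parity. A single failure is repaired by touching only its own group.

def xor_parity(group):
    p = 0
    for s in group:
        p ^= s
    return p

group_a = [0x11, 0x22, 0x33]          # local group A
group_b = [0x44, 0x55, 0x66]          # local group B
parity_a = xor_parity(group_a)        # local parity for A
parity_b = xor_parity(group_b)        # local parity for B

# Repair a lost symbol of group A from its local group only:
lost_index = 1
survivors = [s for i, s in enumerate(group_a) if i != lost_index]
repaired = xor_parity(survivors) ^ parity_a
assert repaired == group_a[lost_index]  # group B was never touched
```

A hierarchical construction extends this idea through several levels: only when a local group cannot repair on its own does the decoder move up to the next level of the hierarchy.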

What makes Clay codes so important? What are their benefits?

An erasure code is useful to society or industry if we can show it is optimal within the bounds. The Clay code, for example, is optimal in four respects. First, Clay codes have the smallest possible storage overhead, because they are MDS codes. Second, they have the smallest repair bandwidth, because they are regenerating codes. Third, there is the concept of calling upon the least possible help: if one node fails, you call upon helper nodes to fix it. Fourth, there is an industry concern with how much you read versus how much you download; you may read a lot [of data] but actually download less, yet you are still disturbing the machine. That is called optimal access, and Clay codes are minimal and optimal in that respect as well.

What do you see on the horizon that is interesting in regenerating erasure codes? Are you working on any interesting problem domains?

Two things. One, in these regenerating codes there is still one hole that needs to be filled, namely the number of layers. If you want the minimum possible repair bandwidth, which means the least amount of network traffic, Clay codes are good, but they require a certain number of layers. In the case of a block length of 20, it is 1024 [layers of the RS code]. But industry may say, "I am only willing to do 16 layers" – and for that nobody has an answer. So the question is: if I limit the number of layers because I want to work in smaller chunks, what is the best you can do for me? I understand you can give me everything a Clay code can, but can you meet me half-way and get me the best of both worlds? There is a code worked on by a former student of mine, Rashmi, who is now at CMU, together with another student, Nihar. They published a paper called Piggybacking, where they took only two layers of Reed-Solomon codes, and that already got attention because it reduces the bandwidth significantly. So the next natural question is: if Piggybacking does something with two layers and the Clay code does all of this [with many layers], what is the tradeoff in between? What can you do to reduce the layers? That is an open problem.
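The 1024 figure quoted above is the sub-packetization (number of layers) of a Clay code. Assuming the standard parameter setting in which q = n − k divides the block length n, the layer count is q^(n/q); the function name below is illustrative.

```python
# Number of layers (sub-packetization) of a Clay code, assuming the
# standard setting where q = n - k divides the block length n:
#   layers = q ** (n / q)
# For the [20, 16] example: q = 4, n/q = 5, so 4**5 = 1024 layers.

def clay_layers(n, k):
    q = n - k
    assert n % q == 0, "this sketch assumes q divides n"
    return q ** (n // q)

print(clay_layers(20, 16))  # 1024
```

The exponential growth of this quantity in n is exactly why a construction with fewer layers, trading away some repair bandwidth, would be valuable in practice.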

We are really in the business of using erasure coding wherever it is applicable. One of my students is working on streaming codes. Nowadays there is data that is streamed: you have a stream of packets coming in, and packets may occasionally get dropped. So you have to build redundancy into the stream so that losses do not become unacceptable. You also want to do this with very low latency, meaning that if streaming packets are coming in and you want to decode a particular packet, you are willing to delay by only a few packets – maybe three or four. If this is the fifth packet, then by the time the ninth packet comes in, I want you to decode the fifth. That puts a constraint on the erasure coding, because it essentially means I cannot build a long code, wait for the entire codeword to arrive, and then decode. I have to do it on the fly. There was a group at MIT which developed a class of codes very nicely built using Galois Field arithmetic. My student Nikhil has come up with a construction in which we can solve the general problem with a field size that is the square of the block length – so if you are blocking 9 [packets], its square is 81, and the Galois field must have 81 elements. That is not a problem, because in finite field arithmetic a field size of 256 is very common. But in many cases we can reduce the field size to just the block length, 9, and you cannot do it with less. That is a very recent result; he is just writing it up for a paper. These are called streaming codes.
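The delay constraint described above can be made concrete with a very crude sketch. This is not the construction being discussed (which uses Galois Field arithmetic over larger fields); it is only a toy: after every block of T data packets, one XOR parity packet is sent, so a single lost packet in the block is recoverable as soon as the parity arrives, i.e. with decoding delay at most T packets.

```python
# Toy streaming sketch (NOT the construction in the interview): after
# every block of T data packets, transmit one XOR parity packet. One
# lost data packet per block is recovered once the parity arrives,
# i.e. with decoding delay at most T packets.

T = 4
data = [10, 20, 30, 40]            # one block of the stream
parity = 0
for p in data:
    parity ^= p                    # parity packet sent after the block

received = data.copy()
received[2] = None                 # packet 2 is dropped in transit

# Recover the erased packet from the survivors plus the parity packet.
recovered = parity
for p in received:
    if p is not None:
        recovered ^= p

assert recovered == data[2]
```

Real streaming codes do much better than this: they handle bursts and random erasures under a strict per-packet delay, which is where the careful field-size constructions come in.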

Isn’t it expensive to run erasure codes on a channel – say, for your streaming code, which tries to optimize redundancy for fault-tolerant transmission of packets?

We have been looking into this aspect. First, it depends upon the block length of the code. For many types of erasures there is a model that says we are building this code to handle certain kinds of erasures. Take a window over a stream of packets. One kind is a burst erasure, where you can have a burst of consecutive packets dropping. The other is a smaller number of random erasures. If you make the block length small, the algorithm to handle a burst is very simple, but for random erasures we have to do the equivalent of decoding an RS code; again, for a short block length we can improvise. So far, we have just shown that such a code exists and that the field size is small. What lies ahead of us is to minimize or optimize the field size.

We are addressing a channel model just now and we need to work on a lot of things.

Another thing we are interested in is retrieving private data, which involves retrieving information from a database without letting the database know what information you are retrieving. Computer scientists have shown us how to do it. But a coding theorist says the computer scientists’ schemes are good but take up a lot of storage; we can use erasure coding to reduce the storage and still do what the computer scientists’ schemes allow you to do. These are called private information retrieval codes.

Then there is something called coding for computation, which brings us to the MapReduce operation on large datasets, say, in a Hadoop system. Suppose you are doing a word count on a huge file: you break it up into a hundred fragments, count the word occurrences in each of them, and add them all up to get the total. The Map phase is where you compute the number of occurrences, and the Reduce phase is where you collect them together. In between Map and Reduce there is the Shuffle phase, where erasure coding can cut the shuffle time by duplicating the computation, bringing the operation down from 900 seconds to 270 seconds in one experimental instance. That is the kind of return one can expect. There is a USC professor who has done a nice job of showing how erasure coding can be done here. We are looking at that.
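The three phases described above can be sketched in plain Python (no Hadoop). Each fragment is mapped to partial word counts, the shuffle groups the partial counts by word, and the reduce sums them; the shuffle is the phase that coded computation speeds up.

```python
# Minimal map/shuffle/reduce word count, mirroring the phases described
# above. Plain single-process Python, used only to show the data flow.
from collections import defaultdict

fragments = ["the cat sat", "the dog sat", "the cat ran"]

# Map: compute per-fragment partial counts.
mapped = []
for frag in fragments:
    counts = defaultdict(int)
    for word in frag.split():
        counts[word] += 1
    mapped.append(counts)

# Shuffle: group partial counts by word (the phase coding can shorten).
shuffled = defaultdict(list)
for counts in mapped:
    for word, c in counts.items():
        shuffled[word].append(c)

# Reduce: sum each word's partial counts to get the totals.
totals = {word: sum(cs) for word, cs in shuffled.items()}
print(totals)  # {'the': 3, 'cat': 2, 'sat': 2, 'dog': 1, 'ran': 1}
```

In a coded-computation scheme, fragments are mapped redundantly on several machines so that the shuffle can send coded combinations of partial results, trading extra computation for less communication.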
After distributed storage, these are some of the novel directions being pursued.

How can someone get into erasure coding?

I gave NPTEL lectures on erasure coding some years ago. For regenerating codes, we wrote a 45-page survey which was published in Science China Information Sciences. It was just published in October of this year. It gives an overview of the field and is the most detailed survey of this area. I believe it is accessible online.

There is another angle: the complexity angle. If you are willing to spend enough computational resources, the computational complexity can be minimized. If you can come up with an algorithm, someone can come up with an implementation. For that, you should know about the various thresholds and bounds. You should be aware of the tradeoffs and find the codes that operate at those limits.