Lab3.html

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/><meta name="exporter-version" content="Evernote Mac 7.2.2 (456839)"/><meta name="created" content="2014-04-20 06:26:32 +0000"/><meta name="updated" content="2014-05-10 14:33:36 +0000"/><title>Lab 3: Cache for Locks</title></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<p style="text-align: left;"><b><span style="font-size: 24pt;">Lab-3: Cache for Locks </span><span style="font-size: 24pt; text-align: center;"> </span></b></p>
<div>
<h3><span style="font-size: 24pt;"><b>Due: 11-19-2018 23:59 (</b><b>UTC</b><b>+8)</b></span></h3>
</div>
<div><span style="color: #fe1701; font-family: Verdana;"><strong>Important update: we have updated the rpc library (librpc64.a). To avoid annoying git-merge issues, you can fetch the git repo in a new directory and replace your librpc64.a with the new one. The new rpc library will handle loss of rpc packets and provide exactly-once guarantee.</strong></span></div>
<div><span><strong>% mkdir newdir; cd newdir; (enter a clean directory)</strong></span></div>
<div><span><strong>% git clone https://ipads.se.sjtu.edu.cn:1312/lab/cse-2018-fall.git;</strong></span></div>
<div><span><strong>% cd cse-2018-fall; git checkout lab3;</strong></span></div>
<div><span><strong>% cp rpc/librpc64.a path-to-your-lab-dir/rpc/;</strong></span></div>
<div><span style="font-size: 24pt;"><br/></span></div>
<p><span style="font-family: Verdana;"><strong style="font-size: 24px;">Introduction</strong><br clear="none"/></span></p>
<p><span style="font-family: Verdana;">In this lab you will build a lock server and client that cache locks at the client, reducing the load on the server and improving client performance. For example, suppose that an application using YFS creates 100 files in a directory. Your Lab 2 yfs_client will probably send 100 acquire and release RPCs for the directory's lock. This lab will modify the lock client and server so that the lock client sends (in the common case) just one acquire RPC for the directory's lock, and caches the lock thereafter, only releasing it if another yfs_client needs it.</span></p>
<div><span style="font-family: Verdana;">The challenge in the lab is the protocol between the clients and the server. For example, when client 2 acquires a lock that client 1 has cached, the server must revoke that lock from client 1 by sending a revoke RPC to client 1. The server can give client 2 the lock only after client 1 has released the lock, which may be a long time after sending the revoke (e.g., if a thread on client 1 holds the lock for a long period). The protocol is further complicated by the fact that concurrent RPC requests and replies may not be delivered in the same order in which they were sent.</span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;">We'll test your caching lock server and client by seeing whether it reduces the amount of lock RPC traffic that your yfs_client generates. We will test with both <span style="font-family: 'courier new', courier, monospace;">RPC_LOSSY</span> set to 0 and <span style="font-family: 'courier new', courier, monospace;">RPC_LOSSY</span> set to 5.</span></div>
<p><span style="font-family: Verdana;">If you have questions about this lab, either in programming environment or requirement, please ask TA: Mo Zou<span>(</span><span>lostzoumo@gmail.com</span><span>)</span>.<br clear="none"/></span></p>
<p> </p>
<hr/>
<p> </p>
<p><span style="font-family: Verdana;"><strong>Getting started</strong><br clear="none"/></span></p>
<div><span style="color: #fe1701; font-family: Verdana;"><strong> </strong></span></div>
<div><span style="color: #fe1701; font-family: Verdana;"><strong>At first, please remember to save your lab2 solution:</strong></span></div>
<div><br clear="none"/></div>
<div><span><strong>% cd lab-cse/lab</strong></span></div>
<div><strong> </strong></div>
<div><span style="font-family: Verdana;"><strong>% git commit -a -m “solution for lab2” </strong></span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span>Then, pull from the repository:</span><span style="font-family: Verdana;"><br clear="none"/></span></div>
<div><br clear="none"/></div>
<div><span><strong>% git pull</strong></span></div>
<div><span>remote: Counting objects: 43, done.</span></div>
<div><span style="font-size: small;"><span style="font-family: Verdana;">…</span><br clear="none"/></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>...</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span> * [new branch]      lab3      -&gt; origin/lab3</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>Already up-to-date</span></span></div>
<div><span style="font-family: Verdana;"><span> </span></span></div>
<div><span style="font-family: Verdana;"><span>Then, change to lab3 branch:</span></span></div>
<div><span style="font-family: Verdana;"><span> </span></span></div>
<div><span><strong>% git checkout lab3</strong></span><span style="font-family: Verdana;"><span><br clear="none"/></span></span></div>
<div><span> </span></div>
<div><span style="font-family: Verdana;"><span>Merge with lab2, and solve the conflict by yourself (</span></span><span style="font-family: Verdana;">Git may not be able to figure out how to merge your changes with the new lab assignment (e.g., if you modified some of the code that the second lab assignment changes). In that case, git merge will tell you which files have conflicts, and you should first resolve the conflicts (by editing the relevant files) and then run 'git commit -a'<span>):</span></span></div>
<div><span style="font-family: Verdana;"><span> </span></span></div>
<div><span style="font-family: Verdana;"><span><strong>% git merge lab2</strong></span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>Auto-merging fuse.cc</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>CONFLICT (content): Merge conflict in yfs_client.cc</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>Auto-merging yfs_client.cc</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>CONFLICT (content): Merge conflict in ifs_client.cc</span></span></div>
<div><span style="font-family: Verdana; font-size: small;"><span>Automatic merge failed; fix conflicts and then commit the result</span></span></div>
<div><span style="font-family: Verdana;"><span>…</span></span></div>
<div><span style="font-family: Verdana;"><span> </span></span></div>
<div><span style="font-family: Verdana;"><span>After merge all of the conflicts, you should be able to compile successfully:</span></span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><strong><span>% </span><span>make</span></strong><span style="font-family: Verdana;"><br clear="none"/></span></div>
<div><span> </span></div>
<div><span style="font-family: Verdana;">if there's no error in make, 6 executable files <strong> </strong></span><span style="font-family: 'Comic Sans MS';"><strong>yfs_client, lock_server, lock_tester, extent_server, test-lab-3-a, test-lab-3-b </strong></span><span style="font-family: Verdana;">will be generated.</span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;"><strong> </strong></span></div>
<span style="font-family: Verdana;">This will add these new files to your lab directory:</span>
<div>
<ul>
<li><span>lock_client_cache.{cc,h}: This will be the new lock client class that the lock_tester and your yfs_client should instantiate. lock_client_cache must receive revoke RPCs from the server (as well as retry RPCs, explained below), so we have provided you with code in the lock_client_cache constructor that picks a random port to listen on, creates an rpcs for that port, and constructs an id string with the client's IP address and port that the client can send to the server when requesting a lock.</span> <span>Note that although lock_client_cache extends the lock_client class from Lab 2, you probably won't be able to reuse any code from the parent class; we use a subclass here so that yfs_client can use the two implementations interchangeably. However, you might find some member variables useful (such as lock_client's RPC client cl).</span></li>
</ul>
</div>
<div>
<ul>
<li><span>lock_server_cache.{cc,h}: Similarly, you will not necessarily be able to use any code from lock_server. lock_server_cache should be instantiated by</span> <a shape="rect" href="http://lock_smain.cc/" target="_blank">lock_smain.cc</a><span>, which should also register the RPC handlers for the new class.</span></li>
</ul>
</div>
<div>
<ul>
<li><span>handle.{cc,h}: this class maintains a cache of RPC connections to other servers. You will find it useful in your lock_server_cache when sending revoke and retry RPCs to lock clients. Look at the comments at the start of handle.h. You can pass the lock client's id string to handle to tell it which lock client to talk to.</span></li>
</ul>
</div>
<div>
<ul>
<li><span>tprintf.h: this file defines a macro that prints out the time when a printf is invoked. You may find this helpful in debugging distributed deadlocks.</span></li>
</ul>
</div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;">We have also made changes to the following files:</span></div>
<div>
<ul>
<li><span>GNUmakefile: links lock_client_cache into lock_tester and yfs_client, and link lock_server_cache into lock_server.</span></li>
<li><span>lock_tester.cc</span><span>: creates lock_client_cache objects.</span></li>
<li><span>lock_protocol.h: Contains a second protocol for the RPCs that the lock server sends to the client.</span></li>
<li><span>lock_smain.cc</span><span>: #include lock_server_cache.h instead of lock_server.h </span></li>
</ul>
</div>
<div><br clear="none"/></div>
<div>
<hr/></div>
<p><span style="font-family: Verdana;"><strong>Your Job</strong><br clear="none"/></span></p>
<p><strong><span style="font-family: Verdana;">Step One: Design the Protocol</span></strong></p>
<p><span style="font-family: Verdana;">Your lock client and lock server will each keep some state about each lock, and will have a protocol by which they change that state. Start by making a design (on paper) of the states, protocol, and how the protocol messages generate state transitions. Do this before you implement anything (though be prepared to change your mind in light of experience).</span></p>
<div><span style="font-family: Verdana;">Here is the set of states we recommend for the client:</span></div>
<div>
<ul>
<li><span>none: client knows nothing about this lock</span></li>
<li><span>free: client owns the lock and no thread has it</span></li>
<li><span>locked: client owns the lock and a thread has it</span></li>
<li><span>acquiring: the client is acquiring ownership</span></li>
<li><span>releasing: the client is releasing ownership</span></li>
</ul>
</div>
<p><span style="font-family: Verdana;">A single client may have multiple threads waiting for the same lock, but only one thread per client ever needs to be interacting with the server; once that thread has acquired and released the lock it can wake up other threads, one of which can acquire the lock (unless the lock has been revoked and released back to the server). If you need a way to identify a thread, you can use its thread id (tid), which you can get using pthread_self().</span></p>
<p><span style="font-family: Verdana;">When a client asks for a lock with an acquire RPC, the server grants the lock and responds with OK if the lock is not owned by another client (i.e., the lock is free). If the lock is not free, and there are other clients waiting for the lock, the server responds with a RETRY. Otherwise, the server sends a revoke RPC to the owner of the lock, and waits for the lock to be released by the owner. Finally, the server sends a retry to the next waiting client (if any), grants the lock and responds with OK.</span></p>
<p><span style="font-family: Verdana;"><span>Note that RETRY and retry are two different things.</span> RETRY is the value the server returns for a acquire RPC to indicate that the requested lock is not currently available. retry is the RPC that the server sends the client which is scheduled to hold a previously requested lock next.</span></p>
<p><span style="font-family: Verdana;">Once a client has acquired ownership of a lock, the client caches the lock (i.e., it keeps the lock instead of sending a release RPC to the server when a thread releases the lock on the client). The client can grant the lock to other threads on the same client without interacting with the server.</span></p>
<p><span style="font-family: Verdana;">The server sends the client a revoke RPC to get the lock back. This request tells the client that it should send the lock back to the server when it releases the lock or right now if no thread on the client is holding the lock.</span></p>
<p><span style="font-family: Verdana;">The server's per-lock state should include whether it is held by some client, the ID (host name and port number) of that client, and the set of other clients waiting for that lock. The server needs to know the holding client's ID in order to sent it a revoke message when another client wants the lock. The server needs to know the set of waiting clients in order to send one of them a retry RPC when the holder releases the lock.</span></p>
<p><span style="font-family: Verdana;">For your convenience, we have defined a new RPC protocol called rlock_protocol in lock_protocol.h to use when sending RPCs from the server to the client. This protocol contains definitions for the retry and revoke RPCs.</span></p>
<div><span style="font-family: Verdana;"><span style="color: #fc2902;">Hint: don't hold any mutexes while sending an RPC.</span> An RPC can take a long time, and you don't want to force other threads to wait. Worse, holding mutexes during RPCs is an easy way to generate distributed deadlock.</span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><br clear="none"/></div>
<div><span style="font-family: Verdana;">The following questions might help you with your design (they are in no particular order):</span></div>
<div>
<ul>
<li><span>If a thread on the client is holding a lock and a second thread calls acquire(), what happens? You shouldn't need to send an RPC to the server.</span></li>
<li><span>How do you handle a revoke on a client when a thread on the client is holding the lock? How do you handle a retry showing up on the client before the response on the corresponding acquire?</span></li>
</ul>
</div>
<span>Hint: a client may receive a revoke RPC for a lock before it has received an OK response from its acquire RPC. Your client code will need to remember the fact that the revoke has arrived, and release the lock as soon as you are done with it. The same situation can arise with retry RPCs, which can arrive at the client before the corresponding acquire returns the RETRY failure code.</span><br clear="none"/>
<ul>
<li><span>How do you handle a revoke showing up on the client before the response on the corresponding acquire?</span></li>
</ul>
<p><strong><span style="font-family: Verdana;">Step Two: Lock Client and Server, and Testing with RPC_LOSSY=0</span></strong></p>
<p><em><span style="font-family: Verdana;">If you finished your design, or decided to refer to the design presented in the comments of the handout code, please move on.</span></em></p>
<p><span style="font-family: Verdana;">A reasonable first step would be to implement the basic design of your acquire protocol on both the client and the server, including having the server send revoke messages to the holder of a lock if another client requests it, and retry messages to the next waiting client.<br clear="none"/></span></p>
<p><span>Next you'll probably want to implement the release code path on both the client and the server. Of course, the client should only inform the server of the release if the lock has been revoked.</span></p>
<p><span style="font-family: Verdana;">Also make sure you instantiate a lock_server_cache object in lock_smain.cc, and correctly register the RPC handlers.</span></p>
<p><span style="font-family: Verdana;">Once you have your full protocol implemented, you can run it using the lock tester, just as in Lab 2. For now, don't bother testing with loss:</span></p>
<p><span style="font-family: Verdana;"><strong>% export RPC_LOSSY=0<br clear="none"/></strong></span></p>
<p><span style="font-family: Verdana;"><strong>% ./lock_server 3772</strong></span></p>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;">Then, in another terminal:<br clear="none"/></span></div>
<p><span style="font-family: Verdana;"><strong>% ./lock_tester 3772</strong></span></p>
<p><span style="font-family: Verdana;">Run lock_tester. You should pass all tests and see no timeouts. You can hit Ctrl-C in the server's window to stop it.</span></p>
<p><span style="font-family: Verdana;">A lock client might be holding cached locks when it exits. This may cause another run of lock_tester using the same lock_server to fail when the lock server tries to send revokes to the previous client. To avoid this problem without worrying about cleaning up, you must restart the lock_server for each run of lock_tester.<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><span style="font-family: Verdana;"><b>Step Three: Testing the Lock Client and Server with RPC_LOSSY=5</b></span></p>
<p><span style="font-family: Verdana;">Now that it works without loss, you should try testing with RPC_LOSSY=5. Here you may discover problems with reordered RPCs and responses.<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"><strong>% export</strong> <span>RPC_LOSSY=5</span><br clear="none"/></span></p>
<p><span style="font-family: Verdana;"><strong>% ./lock</strong><span>_server 3772</span><br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><span style="font-family: Verdana;">Then, in another terminal:<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"><strong>% export RPC_LOSSY=5</strong><br clear="none"/></span></p>
<p><span style="font-family: Verdana;"><strong>% ./lock</strong><span>_tester 3772</span></span></p>
<p><span style="font-family: Verdana;">Again, you must restart the lock_server for each run of lock_tester.<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><strong><span style="font-family: Verdana;">Step Four: Run File System Tests</span></strong></p>
<p><span style="font-family: Verdana;">In the constructor for your <span style="font-family: 'courier new', courier, monospace;">yfs_client</span>, you should now instantiate a lock_client_cache object, rather than a lock_client object. You will also have to include lock_client_cache.h. Once you do that, your YFS should just work under all the Lab 2 tests. We will run your code against all tests from Lab 2.</span></p>
<p><span style="font-family: Verdana;">You should also compare running your YFS code with the two different lock clients and servers, with RPC count enabled at the lock server. For this reason, it would be helpful to keep your Lab 2 code around and intact, the way it was when you submitted it. As mentioned before, you can turn on RPC statistics using the RPC_COUNT environment variable. Look for a dramatic drop in the number of acquire (0x7001) RPCs between your Lab 2 and Lab 3 code during the test-lab-3-b test.</span></p>
<p><span style="font-family: Verdana;">The file system tests should pass with RPC_LOSSY set as well. You can pass a loss parameter to <a shape="rect" href="http://start.sh/" target="_blank">start.sh</a> and it will enable RPC_LOSSY automatically:<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"><span style="font-family: 'courier new', courier, monospace;">% ./start.sh 5 # sets RPC_LOSSY to 5</span><br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><span style="font-family: Verdana;">If you're having trouble, make sure that the Lab 2 tester passes. If it doesn't, then the issues are most likely with YFS under RPC_LOSSY, rather than your caching lock client.<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><strong><span style="font-family: Verdana;">Evaluation Criteria</span></strong></p>
<p><span style="font-family: Verdana;">Our measure of performance is the number of acquire RPCs sent to your lock server while running yfs_client and test-lab-3-b.</span></p>
<p><span style="font-family: Verdana;">The RPC library has a feature that counts unique RPCs arriving at the server. You can set the environment variable RPC_COUNT to N before you launch a server process, and it will print out RPC statistics every N RPCs. For example, in the bash shell you could do:<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><span style="font-family: Verdana;"><strong>% export RPC_COUNT=25<br clear="none"/></strong></span></p>
<p><span style="font-family: Verdana;"><strong>% .</strong><span>/lock_server 3772</span><br clear="none"/></span></p>
<p><span style="font-family: Verdana;">RPC STATS: 7001:23 7002:2<br clear="none"/></span></p>
<p><span style="font-family: Verdana;">...<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<p><span style="font-family: Verdana;">This means that the RPC with the procedure number 0x7001 (acquire in the original lock_protocol.h file) has been called 23 times, while RPC 0x7002 (release) has been called twice.</span></p>
<p><span style="font-family: Verdana;">test-lab-3-b creates two subdirectories and creates/deletes 100 files in each directory, using each directory through only one of the two YFS clients. You should count the acquire RPCs for your lab 2 and for your lab 3 <i>(which means you should run test-lab-3-b in your lab 2 code)</i>. If your lab 3 produces a factor of 10 fewer acquire RPCs, then you are doing a good job. This performance goal is vague because the exact numbers depend a bit on how you use locks in yfs_client.<br clear="none"/></span></p>
<p><span style="font-family: Verdana;"> </span></p>
<div><span style="font-family: Verdana;">We will check the following:</span></div>
<div>
<ul>
<li><span>Your caching lock server passes lock_tester with RPC_LOSSY=0 and RPC_LOSSY=5.</span></li>
<li><span>Your file system using the caching lock client passes all the Lab 3 tests (a, b) with RPC_LOSSY=0 and RPC_LOSSY=5.</span></li>
<li><span style="font-family: Verdana;">Your lab 2 code generates about a tenth as many acquire RPCs as your lab 3 code on test-lab-3-b.</span><span style="font-family: Helvetica, Arial, sans-serif;"> </span></li>
</ul>
</div>
<div>
<div>
<hr/></div>
<p><span style="font-family: Verdana;"><strong>Handin procedure</strong><br clear="none"/></span></p>
<div><span style="font-family: Verdana;">After all above done:</span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;"><span><strong>% make handin</strong></span><br clear="none"/></span></div>
<div><br clear="none"/></div>
<div><span style="font-family: Verdana;">That should produce a file called lab3<span>.tgz</span> in the directory. Change the file name to your student id:</span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;"><span><strong>% mv lab3.tgz lab3_[your student id].tgz</strong></span><br clear="none"/></span></div>
<div><span style="font-family: Verdana;"> </span></div>
<div><span style="font-family: Verdana;">Then upload lab3_[your student id].tgz file to ftp://SJTU.Ticholas.Huang:public@public.sjtu.edu.cn/upload/cse/lab3/ before the deadline. You are only given the permission to list and create new file, but no overwrite and read. So make sure your implementation has passed all the tests before final submit.</span></div>
</div>
<span>You will receive 80% credit if your software passes the same tests we gave you when we run your software on our machines, other credit will be given you after TA reviewing your lab code.</span>
</body></html>