You are expected to do your own work on all homework assignments. You may (and are encouraged to) engage in general discussions with your classmates regarding the assignments, but specific details of a solution, including the solution itself, must always be your own work. (See the statement on Academic Dishonesty in the Syllabus.)
Assignments need to be turned in via Laulima. Check the Syllabus for the late assignment policy for the course.
You should turn in a single plain-text file named README.txt with your answers to the assignment’s questions. NO PDF SUBMISSIONS ALLOWED.
For this assignment you only need to consult man pages. You can do so in your own Linux environment (see Assignment #0) using the man command. Linux man pages are also available online.
The pedagogic objective of this assignment is twofold:
Understand strace output and be exposed to well-known syscalls
Increase your familiarity with the Linux command line. You can do everything in this assignment using these powerful and commonplace commands (help with them is available online, in man pages, and of course from your instructor and TA):
cat, grep, cut, wc, sort, sed, tail, uniq
A popular web server implementation is provided by the Apache HTTP Server Project. Your company had a machine that ran this web server, but that machine was eventually compromised and permanently retired. You are tasked with performing some forensics, but unfortunately all log files have been lost. The only thing that remains is an strace output that the system administrator collected for about 6 minutes (the observation period) and saved before the machine was retired.
The strace output was collected from the server using the following command:
strace -f -x -o /tmp/apache2.strace -v -s 1024 -T -ttt apache2
You can use the strace -h command on Linux (or look up the strace man page) to see the list of all command-line options to strace, so as to fully understand what the above command did when producing this 14,000-line plain-text output.
Based on this output, answer the following questions.
q1.1 [3 pts]: How many different syscalls are invoked by the web server during the observation period? (That is, if a syscall is called multiple times, count it only once.)
q1.2 [2 pts]: Give a one-line, piped Shell command that prints the number of different syscalls. Your answer should look like: cat apache2.strace | ...
q1.3 [3 pts]: What are the top 5 syscalls that are invoked the most frequently by the web server?
q1.4 [2 pts]: Give a one-line, piped Shell command that prints the occurrence counts and names of the top 5 most frequently invoked syscalls, sorted by increasing number of occurrences. (Your answer should look like: cat apache2.strace | ... ). The output produced by your answer should look like:
101 pineapple
230 orange
356 banana
1231 guava
2902 mango
Each line of the strace output starts with a PID (Process ID), which is a unique number associated with the process that invoked the system call on that line. The web server uses multiple processes during its execution. (We will talk more about PIDs later this semester.)
q2.1 [3 pts]: How many different processes were active during the observation period?
q2.2 [2 pts]: What is the name of the syscall used to create new processes? (hint: that syscall returns the PID of the newly created process)
q2.3 [3 pts]: How many times is this syscall invoked? Does it make sense? That is, does the number of calls during the observation period correspond to the number of processes that are active during the observation period?
q2.4 [2 pts]: What is the stack size, in MiB, of each newly created process? (hint: read the man page of the syscall you identified in q2.2)
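For q2.4, note that the question asks for mebibytes (MiB), a binary unit, and not megabytes (MB, which are 10^6 bytes). The conversion below is only a general reminder, not the answer: size_bytes stands for whatever byte count you read from the trace, and if the trace prints that value in hexadecimal (0x...), convert it to decimal first.

```latex
1\ \mathrm{MiB} = 2^{20}\ \text{bytes} = 1{,}048{,}576\ \text{bytes}
\qquad\Longrightarrow\qquad
\text{size}_{\mathrm{MiB}} = \frac{\text{size}_{\mathrm{bytes}}}{2^{20}}
```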
The web server responds to many HTTP GET requests for downloading image files in the PNG format. There is suspicion that the AWS-Logo-for-dark-150x150-1.png image file served by the web server was corrupted.
q3.1 [2 pts]: How many seconds after the beginning of the observation period does the web server receive the first GET request for this PNG file?
q3.2 [2 pts]: How many GET requests for this PNG file are received in total?
q3.3 [2 pts]: Give a one-line, piped Shell command that prints the number of GET requests for this PNG file. (Your answer should look like: cat apache2.strace | ... )
q3.4 [4 pts]: Is this PNG file corrupted? That is, is its byte content, which is sent back as the answer to the GET request, what is expected for a PNG file? If your answer is “no”, explain.
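The arithmetic behind q3.1 can be sketched as a single subtraction. This rests on two assumptions: the trace timestamps are absolute times in seconds (as the q5.1 timestamp 1684276562.314195 suggests), and the observation period begins at the timestamp on the first line of the trace. Here t_0 is that first timestamp, and t_GET is the timestamp of the line on which the first GET request for the PNG file is read. Both symbols are introduced only for illustration.

```latex
\Delta t = t_{\text{GET}} - t_{0} \quad \text{(seconds)}
```

The same subtraction applies to the "how many seconds into the observation period" part of q4.2.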
The web server answers requests sent by clients (e.g., web browsers) that run on various machines during the observation period. The main thing that a web server does is wait for a connection and then “accept” the connection to handle whatever request was sent. The syscall that accepts a connection has a very intuitive name.
q4.1 [2 pts]: How many times does the web server accept a connection from a remote IP during the observation period?
q4.2 [2 pts]: You’ll see that a single process accepts all requests. How many seconds into the observation period was that process created, and which process created it?
q4.3 [3 pts]: What is the average request arrival rate during the observation period? (i.e., the average number of requests that arrive per second).
q4.4 [3 pts]: How many different IPs contact the web server during the observation period?
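For q4.3, the average arrival rate is a count divided by a duration. In the sketch below, N is the number of requests you counted and T_obs is the length of the observation period in seconds. Taking T_obs as the difference between the last and first timestamps in the trace is an assumption; the introduction only says the period lasted about 6 minutes, i.e., roughly 360 s.

```latex
\bar{\lambda} = \frac{N}{T_{\text{obs}}} \quad \text{(requests per second)}, \qquad
T_{\text{obs}} = t_{\text{last}} - t_{\text{first}}
```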
Your boss, for some reason, is particularly interested in the file 935.json.
q5.1 [4 pts]: At timestamp 1684276562.314195, the web server receives a request for the file 935.json (via a read syscall). After a few more syscalls, the web server sends data back to the client via the writev syscall. Based on the man page for this syscall, determine the HTTP data payload overhead, that is, the percentage of bytes sent back that are not bytes of the JSON content requested by the user.
q5.2 [3 pts]: In the call to writev, the size of the 935.json file in bytes is passed as an argument. What syscall was used to determine this size?
q5.3 [3 pts]: On what day was the 935.json file last modified on the web server’s machine?
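The overhead asked for in q5.1 can be written as a ratio. Here B_sent is the total number of bytes sent back to the client by the writev call, and B_json is the size of 935.json in bytes (the value q5.2 refers to). Both names are introduced only for illustration. The formula also assumes that the entire response goes out in that single writev call.

```latex
\text{overhead} = \frac{B_{\text{sent}} - B_{\text{json}}}{B_{\text{sent}}} \times 100\,\%
```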