ICS 332 Spring 2024 | Homework Assignment #2

Homework Assignment #2 – Using strace to perform forensics on the Apache web server [50 pts]

You are expected to do your own work on all homework assignments. You may (and are encouraged to) engage in general discussions with your classmates regarding the assignments, but specific details of a solution, including the solution itself, must always be your own work. (See the statement on Academic Dishonesty in the Syllabus.)

How to turn in?

Assignments need to be turned in via Laulima. Check the Syllabus for the late assignment policy for the course.

What to turn in?

You should turn in a single plain text file named README.txt with your answers to the assignment’s questions. NO PDF SUBMISSION ALLOWED.

Environment

For this assignment you only need to consult man pages. You can do so in your own Linux environment (see Assignment #0) using the man command. Linux man pages are also available on-line.

Overall objective

The pedagogic objective of this assignment is twofold:

Understand strace output and be exposed to well-known syscalls
Increase your familiarity with the Linux command-line. You can do everything in this assignment using these powerful and commonplace commands (help about these commands is available on-line, in man pages, and of course from your instructor and TA):
- cat
- grep
- cut
- wc
- sort
- sed
- tail
- uniq
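
As a warm-up unrelated to the strace data, the sketch below chains several of these commands on a few made-up lines to count how often each value of a field occurs. The user names and shells are hypothetical, and the trailing tr call (not in the list above) only strips the padding that some versions of wc add.

```shell
# Hypothetical input: one "user:shell" record per line.
records='alice:/bin/bash
bob:/bin/zsh
carol:/bin/bash
dave:/bin/sh
erin:/bin/bash
frank:/bin/zsh'

# cut keeps the second ':'-separated field, sort groups equal lines,
# uniq -c prefixes each distinct line with its count, and sort -n
# orders the result by increasing count.
shell_counts=$(printf '%s\n' "$records" | cut -d: -f2 | sort | uniq -c | sort -n)
printf '%s\n' "$shell_counts"

# Number of distinct shells: one line per distinct value, counted by wc -l.
distinct=$(printf '%s\n' "$records" | cut -d: -f2 | sort | uniq | wc -l | tr -d ' ')
echo "distinct shells: $distinct"
```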

Exercise #1: Apache web server forensics [50 pts]

A popular web server implementation is provided by the Apache HTTP Server Project. Your company had a machine that ran this web server, but eventually that machine got compromised and was permanently retired. You are tasked with performing some forensics, but unfortunately all log files have been lost. The only thing that remains is an strace output that the system administrator collected for about 6 minutes (the observation period) and saved before the machine was retired.

The strace output was collected from the server using the following command:

strace -f -x -o /tmp/apache2.strace -v -s 1024 -T -ttt apache2

You can use the strace -h command on Linux (or look up the strace man page) to see the list of all command-line options to strace, so as to fully understand what the above command did when producing this 14,000-line plain text output.
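
To get a feel for the line format before opening the trace, here is a minimal sketch that splits one line into its parts. The line is made up for illustration and is not taken from apache2.strace; it assumes the fields are separated by single spaces, which may not hold for every line of a real trace.

```shell
# Hypothetical strace line in the shape produced by -f -o ... -T -ttt:
#   PID  epoch-timestamp  syscall(args) = return-value <seconds-in-syscall>
line='14242 1700000000.123456 close(7) = 0 <0.000031>'

pid=$(printf '%s\n' "$line" | cut -d' ' -f1)      # field 1: PID
stamp=$(printf '%s\n' "$line" | cut -d' ' -f2)    # field 2: timestamp
# Field 3 starts with the syscall name; keep what precedes the first '('.
name=$(printf '%s\n' "$line" | cut -d' ' -f3 | cut -d'(' -f1)
# The duration sits between '<' and '>' at the end of the line.
spent=$(printf '%s\n' "$line" | sed 's/.*<\(.*\)>$/\1/')

echo "pid=$pid time=$stamp syscall=$name spent=$spent"
```

Real traces also contain lines that do not follow this shape (for example process exit notifications, or calls split into unfinished and resumed halves), so any pipeline built on this idea should be checked against the actual file.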

Based on this strace output, answer the questions below.

Question #1: Counting syscalls [10 pts]

q1.1 [3 pts]: How many different syscalls are invoked by the web server during the observation period? (that is, if a syscall is called multiple times we count it only once)
q1.2 [2 pts]: Give a one-line, piped Shell command that prints the number of different syscalls. Your answer should look like: cat apache2.strace | ...
q1.3 [3 pts]: What are the top 5 syscalls that are invoked the most frequently by the web server?
q1.4 [2 pts]: Give a one-line, piped Shell command that prints the occurrence counts and names of the top 5 most frequently invoked syscalls, sorted by increasing number of occurrences. (Your answer should look like: cat apache2.strace | ... ). The output produced by your answer should look like:

        101 pineapple
        230 orange
        356 banana
       1231 guava
       2902 mango

Question #2: Processes [10 pts]

Each line of the strace output starts with a PID (Process ID), which is a unique number associated with the process that invoked the system call on that line. The web server uses multiple processes during its execution. (We will talk more about PIDs later this semester.)

q2.1 [3 pts]: How many different processes were active during the observation period?
q2.2 [2 pts]: What is the name of the syscall used to create new processes? (hint: that syscall returns the PID of the newly created process)
q2.3 [3 pts]: How many times is this syscall invoked? Does it make sense? That is, does the number of calls during the observation period correspond to the number of processes that are active during the observation period?
q2.4 [2 pts]: What is the stack size, in MiB, of each newly created process? (hint: read the man page of the syscall you identified in q2.2)

Question #3: PNG Headers [10 pts]

The web server responds to many HTTP GET requests for downloading image files in the PNG format. There is suspicion that the AWS-Logo-for-dark-150x150-1.png image file served by the web server was corrupted.

q3.1 [2 pts]: How many seconds after the beginning of the observation period is the first GET request for this PNG file received by the web server?
q3.2 [2 pts]: How many GET requests for this PNG file are received in total?
q3.3 [2 pts]: Give a one-line, piped Shell command that prints the number of GET requests for this PNG file. (Your answer should look like: cat apache2.strace | ... )
q3.4 [4 pts]: Is this PNG file corrupted? That is, is its byte content, which is sent back as the answer to the GET request, what’s expected for a PNG file? If your answer is “no” explain.
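
Questions such as q3.1 ask for a time relative to the beginning of the observation period, whereas the -ttt option prints absolute timestamps (seconds since the epoch). One possible approach is to subtract the timestamp found on the first line of the trace. The sketch below uses made-up timestamps and relies on awk for the floating-point subtraction; awk is not in the command list above and is simply one convenient choice.

```shell
# Hypothetical timestamps (seconds since the epoch, microsecond precision).
first_stamp=1700000000.250000    # timestamp on the first line of a trace
event_stamp=1700000012.750000    # timestamp on the line of interest

# Shell arithmetic is integer-only, so delegate the subtraction to awk.
elapsed=$(awk -v a="$first_stamp" -v b="$event_stamp" 'BEGIN { printf "%.6f\n", b - a }')
echo "elapsed: $elapsed s"
```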

Question #4: Connected Clients [10 pts]

The web server answers requests sent by clients (e.g., web browsers) that run on various machines during the observation period. The main thing that a web server does is wait for a connection and then “accept” the connection to handle whatever request was sent. The name of the syscall to accept a connection is very intuitive.

q4.1 [2 pts]: How many times does the web server accept a connection from a remote IP during the observation period?
q4.2 [2 pts]: You’ll see that a single process accepts all requests. How many seconds into the observation period was that process created and which process created it?
q4.3 [3 pts]: What is the average request arrival rate during the observation period? (i.e., the average number of requests that arrive per second).
q4.4 [3 pts]: How many different IPs contact the web server during the observation period?

Question #5: The 935.json File [10 pts]

Your boss, for some reason, is particularly interested in the file 935.json.

q5.1 [4 pts]: At time-stamp 1684276562.314195, the web server receives a request for file 935.json (a read syscall). After a few syscalls, the web server sends back data to the client via the writev syscall. Based on reading the man page for this syscall, determine the HTTP data payload overhead, that is, the percentage of bytes sent back that are not bytes of the JSON content requested by the user.
q5.2 [3 pts]: In the call to writev, the size of the 935.json file in bytes is passed as an argument. What syscall was used to determine this size?
q5.3 [3 pts]: On what day was the 935.json file last modified on the web server’s machine?
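
For q5.1, the overhead is the share of the bytes sent back that do not belong to the requested file. Once you have read the two byte counts off the writev call, the percentage can be computed as sketched below. The numbers are made up and are not the values in the trace, and awk is again an assumption of convenience rather than a required tool.

```shell
# Hypothetical byte counts, not taken from the trace.
total_bytes=1000      # all bytes sent back to the client
content_bytes=800     # bytes that belong to the requested file

# overhead (%) = 100 * (total - content) / total
overhead=$(awk -v t="$total_bytes" -v c="$content_bytes" 'BEGIN { printf "%.1f\n", 100 * (t - c) / t }')
echo "overhead: $overhead%"
```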