doodle

UCSD CSE29 FA24 Syllabus and Logistics

Basics - Staff & Resources - Schedule - Course Components - Grading - Policies

CSE 29 introduces you to the broad field of systems programming, including 1) the basics of how programs execute on a computer, 2) programming in C with direct access to memory and system calls, 3) software tools to manage and interact with code and programs. All very cool stuff that makes every programmer better!

Basics

Staff Resources

Office Hours Calendar

Schedule

The schedule below outlines topics, due dates, and links to assignments. We'll typically update the material for the upcoming week before Monday's lecture so you can see what's coming.

Week 10 – File I/O and Goodbye 👋🥲

Week 9 – Process Creation and Control 🦃

Week 8 – Allocators and Virtual Memory

Week 7 – Implementing an Allocator

Week 6 – URLs, Servers, and a Bit of Everything

Week 5 – Managing (Heap) Memory

Week 4 – Representations and Memory

Week 3 – Where (Some) Things are in Memory

Week 2 – Number Representations, Sizes, and Signs

Week 1 – Strings, Memory and Bitwise Representations (in C)

Week 0 – Welcome!

  • Announcements

    • Lab attendance is required and a lot happens there, make sure to go to lab.
    • Submit the welcome survey before lab on Tuesday of week 1.
    • Assignments, quizzes, and other things with deadlines will start in week 1.
    • No discussion on Friday of week 0 (discussion starts in week 1).
  • Lecture Materials

Syllabus

There are several components to the course:

  • Lab sessions
  • Lecture and discussion sessions
  • Weekly quizzes
  • Assignments
  • Exams

Labs

The course's lab component meets for 2 hours. In each lab you'll switch between working on your own, working in pairs, and participating in group discussions about your approach, lessons learned, programming problems, and so on.

The lab sessions and groups will be led by TAs and tutors, who will note your participation in these discussions for credit. Note that you must participate, not merely attend, for credit.

If you miss lab, you'll still be held accountable for understanding the relevant material via Exams and Assignments. You can miss 2 labs without it impacting your grade (see Grading below). There is no way to make up a lab, even for illness, travel, or emergencies. Our preference would be to require all 10 labs for an A, and have some kind of excused absences. However, tracking excused absences doesn't really scale, so the “two for any reason” policy is how we handle it. You don't need to justify your missed labs. Contact the instructor if you'll miss more than 2 labs for unavoidable reasons.

Lecture and Discussion Sessions

Lecture sessions are on Monday, Wednesday, and Friday, and discussion sections are Wednesday and Friday. We recommend attending every lecture and one of the two discussion sections.

Weekly Quizzes

Each week there will be a (p)review quiz given on Gradescope or PrairieLearn to both review content you've seen and preview upcoming content, due before labs start on Tuesday at 8am.

You can submit these repeatedly with no penalty up to the deadline. The purpose of this quiz is to make sure everyone has checked in on the concepts we will be using in lab. They are open for late submission until the end of the quarter; see Grading below for how late submissions impact grades.

Assignments

The course has 5 assignments that involve programming and writing. Individual assignments will have detailed information about submission components; in general you'll submit some code and some written work to Gradescope. Two files will have special meaning in this class: in CREDITS.txt you'll put information about who or what helped you with the assignment, and in DESIGN.txt you'll put answers to open-ended written design questions.

For each assignment, we will give a 0-3 score along with feedback:

  • 3 for a complete submission with high code and writing quality with few mistakes, and no significant errors
  • 2 for a complete submission with some mistakes or some unclear writing
  • 1 for a submission missing key components, or with clear inaccuracies in multiple components
  • 0 for no submission or a submission unrecognizable as a partial or complete submission

After each assignment is graded, you'll have a chance to resubmit it based on the feedback you received, which will detail what you need to do to increase your score.

  • For an original score of 0 or 1, you can raise your score to 2 (but not to 3)
  • For an original score of 2, you can raise your score to 3

This is also the only late policy for assignments. Unsubmitted assignments are initially given a 0, and can get a maximum of 2 points on resubmission.

Exams

This course is participating in a pilot study of a computer-based testing facility on campus (see this paper if you're interested in some background).

Exams will take place in AP&M B349, which is a computer lab. You will schedule your exam at a time that's convenient for you in the given exam week, and you will go to that lab and check in for your exam at the time you picked. The exam will be proctored by staff from the Triton Testing Center (not by the course staff from this course). No study aids or devices are allowed to be used in the testing center. You will need only a photo ID and something to write with (scratch paper is available on request).

The Triton Testing Center has shared a document of rules and tips for using the testing center.

The exams will be administered through PrairieLearn; we will give you practice exams and exercises so you can get used to the format we'll use before you take the first one. The exams will have a mix of questions; they will typically include some that involve programming and interacting with a terminal.

There are three exams during the quarter in weeks 3, 5, and 8. On each you'll get a Full Pass (2 points), Partial Pass (1 point), or Try Again (0 points) as your score.

We don't have a traditionally-scheduled final exam for this course (you can ignore the block provided in Webreg). Instead, in week 10, you'll have the opportunity to retake up to two of the exams from during the quarter to improve your score up to a Full Pass regardless of the score on the first attempt. The retakes may be different than the original exam, but will test the same learning outcomes. This is also the only make-up option for missed exams during the quarter: if you miss an exam for any reason it will be scored as 0, and you can use one of your retake opportunities on that exam.

Grading

Each component of the course has a minimum achievement level to get an A, B, or C in the course. You must reach that achievement level in all of the categories to get an A, B, or C.

  • A achievement:
    • 8 or more lab participation (out of 10 labs)
    • At least 12 total assignment points (e.g. any scores that add up to 12: [3, 3, 2, 2, 2], [3, 3, 3, 2, 1], [3, 3, 3, 3, 0], etc)
    • At least 5 total exam points (Full Pass on any 2 of the exams, Partial Pass on the other)
  • B achievement:
    • 6 or 7 lab participation
    • At least 10 total assignment points (e.g. any scores that add up to 10: [2, 2, 2, 2, 2], [3, 3, 3, 1, 0], etc)
    • At least 4 total exam points (2 Full Pass and one No Pass, 1 Full Pass and 2 Partial Pass)
  • C achievement:
    • 4 or 5 lab participation
    • At least 8 total assignment points
    • At least 3 total exam points

Pluses and minuses will be given around the boundaries of these categories the based on quiz performance and to-be-determined cutoffs. We don't publish an exact number for these in advance, but it's consistent across the class. Some general examples: if you complete all the quizzes completely, correctly, and on time, you'll get a + modifier. If you meet some of the criteria for the next higher letter grade but not all, you may get a + modifier (e.g. B+ for 7 lab participation, 12 assignment points, 5 exam points). If you submit no quizzes on time or don't get them done completely or correctly, you will get a - modifier.

Policies

Individual assignments describe policies specific to the assignment. Some general policies for the course are here.

Assignments and Academic Integrity

You can use code that we provide or that your group develops in lab as part of your assignment. If you use code that you developed with other students (whether in lab or outside it), got from Piazza, or got from the internet, say which students you worked with and a sentence or two about what you did together in CREDITS.txt. All of the writing in assignments (e.g. in DESIGN.txt) must be your own.

You can use an AI assistant like ChatGPT or Copilot to help you author assignments in this class. If you do, you are required to include in CREDITS.txt:

  • The prompts you gave to the AI chat, or the context in which you used Copilot autocomplete
  • What its output was and how you changed the output after it was produced (if at all)

This helps us all learn how these new, powerful, and little-understood tools work (and don't).

If you don't include a CREDITS.txt and it's clear you included code from others or from an AI tool, you may lose credit or get a 0 on the assignment, and repeated or severe violations can be escalated to reports of academic integrity violations.

Exams and Academic Integrity

Examples for exams will be posted in the week before they happen. You're free to collaborate with others on preparing for the exam, trying things out beforehand, and so on.

You cannot share details of your exam with others until after you receive your grade for it. You cannot communicate with anyone during the exam.

Quizzes and Academic Integrity

You can work on weekly quizzes with other students.

FAQ/AFQ (Anticipated Frequent Questions)

Can I attend a lab section other than the one I'm enrolled in?

No, please do not try to do this. The lab sections have limited seating and are full. We cannot accommodate switching.

How can I switch sections?

You have to drop and re-add (which may involve getting [back on] the waitlist). Sorry.

Can I leave lab early if I'm done or have a conflict?

The labs are designed to not be things you can “finish”. Labs have plenty of extension and exploration activities at the end for you to try out, discuss, and help one another with. Co-located time with other folks learning the same things is precious and what courses are for. Also, if you need an extrinsic motivation, you won't get credit for participation if you don't stay, and participate, the whole time. We can often accommodate one-off exceptions – if you have a particular day where you need to leave early, it's a good idea to be extra-engaged in your participation so your lab leader can give you participation credit before you leave.

Do I have to come to lab?

Yes, see grading above.

What should I do if I'm on the waitlist?

Attend and complete all the work required while waitlisted (this is consistent with CSE policy).

I missed lab, what should I do?

You cannot makeup missed lab credit (but have a few “allowed” misses). Make sure you understand the material from lab because it will be used on exams and assignments; try to do the parts that don't involve discussion on your own, and review your group's lab notes.

I missed a quiz deadline, what should I do?

You can submit it late until the end of the quarter. Generally we allow lots (think like 1/3 to 1/2) of the quizzes to be late without it impacting your grade, but do take them seriously before lab so you're prepared.

I missed an assignment deadline, what should I do?

Some time after each assignment deadline (usually around 2 weeks) there is a late/resubmission deadline. You can resubmit then. See the assignment section above for grading details about resubmissions.

I missed a assignment resubmission deadline, what should I do?

You cannot get an extension on assignment resubmissions; we cannot support multiple late deadlines and still grade all the coursework on time.

I missed my exam time, what should I do?

Stay tuned for announcements about scheduling make-ups in week 10.

Where is the financial aid survey?

We do this for you; as long as you submit a quiz or do a lab participation in the first two weeks, we will mark you as commencing academic activity.

When are the midterms scheduled?

The midterms will be flexibly scheduled during the quarter using a testing center. More details will come; you will need to set aside some outside-of-class time to do them, but there is not a specific class-wide time you have to put on your calendar.

I have a conflict with the final exam time, what can I do?

The final exam will also be flexibly scheduled during week 10 using the testing center.

Week 1 – Command Line and Running C Programs

Lab Logistics

In this class, you are assigned to lab sections, in which you will be in the CSE basement lab rooms to work hands-on on activities related to the tools and techniques that you’ll be using on programming assignments.

Labs are graded by participation, based on your attendance and engagement with the lab activity. Throughout the lab, you will fill in a shared Google Doc with notes and results from what you try and learn. This will be shared with the group of students around you so you have a collective record of your activities (and so we can comment on your progress!)

Your TA/tutor will share the link to the Google Docs with you at the beginning of the lab. When the lab prompts you to write something in the Google doc, be sure to put your name next to your contributions so that we can confirm your participation!

Meet your Group!

We will split into groups of 6-8 students for discussion. For week 1, you may sit wherever you want and choose who you want to work with. Starting week 2, we will have assigned seating and groups. These groups will be somewhat stable throughout the quarter, though some small changes will likely happen. You will have a tutor or TA assigned to your group for help and discussion.

Your discussion leader (the tutor/TA in your lab) will share a Google Doc with your group where you can fill in notes as you work; this document is only for your group. Your discussion leader will not take notes for you.

Write down in notes: In your groups, share, and note in the running notes document (discussion leaders, you answer these as well!):

  • How you'd like people to refer to you (pronounce your name/nickname, pronouns like he/her/they, etc)
  • Your major
  • One of:
    • A UCSD student organization you're a member of or interested in
    • Your favorite place you've found on campus so far
    • A useful campus shortcut or trick you know
  • Your answer to the following question. Get to know your fellow group members!

If you could wake up tomorrow and be any (non-human) animal, what animal would you choose to be and why? Would you want to be a pet or be free? Have you had any pets growing up?

Github

Much of our work this quarter will happen through Github, which is a platform for sharing code and collaborating on projects. You should make sure you have a Github account – you're free to use an existing one if you already have one. We will make use of the resources available in the Github Student Developer Pack. This may require signing up with a @ucsd.edu email address, or adding it to your account, in order to verify your student status.

Making an Account

For this lab, you do need to make a Github account right away. You do not need the Student developer pack yet, but you should submit the application in lab if possible so that you'll have the needed resources going forward.

Write down in your Google Doc:

For each student, their @ucsd email and their Github username they will use for this class (this is helpful for the TAs to know who is who!), and a link to their Github profile, which looks something like https://github.com/jpolitz/.

Making a Repository

From your Github profile page, click the + in the upper right to make a new repository (you can call it something like lab1). For lab work, your repositories can be public (for assignments we will show you how to make them be private).

Definition: A repository is a place to store code and other files. It has some similarities to a shared folder in Google Drive: it stores a collection of files and folders, and there are ways to share it with others. The main special thing about repositories is that they store a detailed history of changes to files, which turns out to be really important for programming projects.

There are many ways to interact with a repository; we'll see a few today and many more throughout the quarter. After you create the repository, you'll see a page like this:

image

For a first step, you'll make a README file directly on github.com. Click the README link on your empty repository page and you'll be taken to an editor where you can write some text. You can write anything you like for this first section; when you're done, click the Commit changes... button.

Write down in your Google Doc:

For each student, a screenshot of their repository after adding the README file. If anyone ran into errors or issues, describe them!

Github Codespaces

Github has a feature called Codespaces that provide an online environment for the full development process – writing code, running programs, and managing a repository. We'll use this feature throughout the course to have a standardized programming environment for everyone; Codespaces is also similar to many production environments in tech companies and research labs.

For this lab, we'll create a Codespace to do our work. Click the Code button on your repository page, and then click the Create codespace on main button:

image

This will open a new Codespace (sometimes it takes a minute to start up), which will look like this:

image

(A cute name will be randomly generated for each new Codespace you make. Check out the URL to see what name you got!)

The README file you made can be seen in the file navigation on the left. You can click on it to open it in a new editor tab (it may have opened by default when the Codespace started, as well). You can type into it to make edits. You may need to double click on the file in the file navigation to open it in the editor.

Do now: Type some edits into the README file. Then, open a new tab and go find the repository again from your Github page. Do you see the edits to the file there?

Write down in your Google Doc: For each student, a screenshot of the edited text in your Codespace, and a screenshot of the repository on Github showing the contents of the README. (NOTE: there will be a difference in these two screenshots!)

To propagate the changes from our Codespace to the repository, we need to commit and push them. We'll talk in detail about these in future labs and lectures; for now we'll just show you how in the Codespace interface and defer explanations to later.

On the left sidebar there is a “Source Control” icon. Click on this, you'll see a M next to your README file listing, which means “modified”. Click on the + next to the M to stage the changes. Then, write a short message in the text box (something simple like “Edited README” suffices). Finally, click the checkmark icon to commit the changes.

Then, click the three dots next to the checkmark and choose “Push”. This will send the changes to the repository on Github.

Write down in your Google Doc: For each student, a screenshot of the repository on Github showing the contents of the README including the edits.

Github Codespaces Summary

In this intro, you've:

  • Set up your Github account
  • Created a repository on Github
  • Created a file through the Github editing interface
  • Created a Codespace
  • Edited the file in the Codespace
  • Committed and pushed the changes to your repository

You'll use all of this setup many times throughout the quarter, and likely the same or similar steps thousands of times throughout your programming career! 🚀

Terminal and ieng6 login

A working systems programmer spends a fair amount of time at the terminal – a text-based interface to a computer. In this course we'll make heavy use of the terminal, both for running programs and for interacting with the system.

In your Codespace, you can open a Terminal by clicking the Terminal menu and choosing New Terminal. This will open a terminal window at the bottom of the screen. A terminal may already be open at the bottom of the screen from when you created the codespace, if not you can also press Ctrl+Shift+` to open a new terminal (hold down the Control key and press backtick, the key to the left of 1 on the keyboard).

Try running the following commands in your Codespace terminal. To run a command, type it in and press enter.

pwd

ls

ls .

cat README.md

cat README

cat does-not-exist

mkdir myfirstfolder

cd myfirstfolder

ls

ls .

pwd

touch file1.txt

touch file2.txt

ls

rm file2.txt

ls

ls ..

cd ..

pwd

ls

ls myfirstfolder

Discuss in your group and write down in notes: For each command you just ran, include both a copy-paste of output you see at the terminal and any changes you see in the file navigation as observations of the command's behavior. Then, for the commands ls, cd, pwd, touch, cat, do your best to give a general description of the command. If you're not sure, that's okay! We'll talk more about all of these going forward, and after you are done trying this out, a staff member will add some definitions we have for them to your notes doc to make sure we have agreed-upon summaries.

Downloading a File From the Command Line

Run the following command:

curl https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/lec/week1/hello.c

You should see the contents of an example from class. Then run this command:

curl -o hello.c https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/lec/week1/hello.c

You should see the file hello.c from class downloaded and saved into your Codespace. You can observe it in the file navigation, and with ls and cat (try all of them!).

Try opening the link from above in your browser.

curl takes a URL (a link) and downloads the contents. The optional -o somefile command-line option or command-line flag specifies an output file to save the contents to.

Definition: A command-line option or command-line flag is a way to give extra information to a program. Options are usually specified with a dash (-) followed by a letter or word. For example, -o is an option for curl that specifies an output file. The full list of options and flags is usually available by doing a search for the program online, or by using the command man (short for “manual”, nothing to do with dudes). Try running man curl at the terminal (the result may “take over” your terminal, you can use the up and down arrows to scroll and q to get back to the command prompt).

Logging into ieng6

As a student, you are assigned an account on the ieng6 server hosted by UCSD. These are similar to accounts you might get on other systems at other institutions (or a future job). We’ll see how to use your machine to connect to a remote computer over the Internet to do work there.

Your account name is the same account name as the one that’s used for your school Google account, i.e. it is the string that precedes “@ucsd.edu” in your school email address. In case you need to check the status of your student account, refer to the UCSD Student Account Lookup page.

Next run this command (with yourusername replaced by your actual username):

ssh yourusername@ieng6.ucsd.edu

Since this is likely the first time you’ve connected to this server, you will probably get a message like this:

$ ssh yourusername@ieng6.ucsd.edu
The authenticity of host 'ieng6.ucsd.edu (128.54.70.227)' can't be established.
RSA key fingerprint is SHA256:ksruYwhnYH+sySHnHAtLUHngrPEyZTDl/1x99wUQcec.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Copy and paste the one of the corresponding listed public key fingerprints and press enter.

  • If you see the phrase ED25519 key fingerprint answer with: SHA256:8vAtB6KpnYXm5dYczS0M9sotRVhvD55GYz8EjN1DYgs
  • If you see the phrase ECDSA key fingerprint answer with: SHA256:/bQ70BSkHU8AEUqommBUhdAg0M4GaFIHLKq0YQyKvmw
  • If you see the phrase RSA key fingerprint answer with: SHA256:npmS8Gk0l+zAXD0nNGUxr7hLeYPn7zzhYWVKxlfNaeQ

(Getting the fingerprint from a trusted source is the best thing to do here. You can also just type “yes”, as it's pretty unlikely anything nefarious is going on. If you get this message when you're connecting to a server you connect to often, it could mean someone is trying to listen in on or control the connection. This answer is a decent description of what's going on and how you might calibrate your own risk assessment: Ben Voigt's answer.)

After this, you get a prompt to enter your password. This is the same password you use to log into your student account on other websites, like Canvas and Tritonlink. The terminal itself does not show what you type when you enter your password. This is conventionally done for your own security, so that others looking at your screen don’t see it. Just trust that it gets inputted when you type.

Now your terminal is connected to a computer in the CSE basement, and any commands you run will run on that computer! We call your computer the client and the computer in the basement the server based on how you are connected.

The whole interaction will look something like this:

# On your client
$ ssh yourusername@ieng6.ucsd.edu
The authenticity of host 'ieng6-202.ucsd.edu (128.54.70.227)' can't be established.
RSA key fingerprint is SHA256:ksruYwhnYH+sySHnHAtLUHngrPEyZTDl/1x99wUQcec.
Are you sure you want to continue connecting (yes/no/[fingerprint])? 
Password: 
# Now on remote server
Last login: Tue Oct 1 14:03:05 2024 from 107-217-10-235.lightspeed.sndgca.sbcglobal.net
quota: No filesystem specified.
Hello user, you are currently logged into ieng6-203.ucsd.edu

You are using 0% CPU on this system

Cluster Status 
Hostname     Time    #Users  Load  Averages  
ieng6-201   23:25:01   0  0.08,  0.17,  0.11
ieng6-202   23:25:01   1  0.09,  0.15,  0.11
ieng6-203   23:25:01   1  0.08,  0.15,  0.11

To begin work for one of your courses [ cs29fa24 ], type its name
at the command prompt.  (For example, "cs29fa24", without the quotes).

To see all available software packages, type "prep -l" at the command prompt,
or "prep -h" for more options

Then, execute the following commands:

export MODULEPATH=/home/linux/ieng6/cs29fa24/public/modulefiles:/public/modulefiles
module load cs29fa24

You should get the following output:

Tue Oct 01, 2024 11:28pm - Prepping cs29fa24

Now your terminal is connected to a computer in the CSE basement, and any commands you run will run on that computer! We call your computer the client and the computer in the basement the server based on how you are connected.

If, in this process, you run into errors and can't figure out how to proceed, ask! When you ask, take a screenshot of your problem and add it to your group's running notes document, then describe what the fix was. If you don't know how to take a screenshot, ask!

Running Commands, Remotely

Run all the commands from the earlier section again, while logged into ieng6 in the terminal. Include the curl commands as well.

Discuss and write down in notes: For each command, compare its output to what you got in the Codespace. Discuss any differences you see. Which commands work identically? Which seem to have different behavior? Why might that be? What does it mean for a file to be “in” your Codespace vs. “in” your ieng6 account?

Running C Programs

Sometimes we will work in the Codespace, and other times it will be important to do work on ieng6. This mirrors common practice in industry and research labs, where you may do some development locally, while other times you have to configure, edit, or run programs on a specific machine. In this part of the lab we'll see how to build and run a C program (like the one from class) in both places.

At this point you should have hello.c present in your Codespace and in your ieng6 account.

Compiling and Running in Both Places

You can open more than one terminal in your Codespace, and in fact you can see more than one terminal at a time. There is a little icon in the top right of the terminal that looks like a square with a plus sign in it. Clicking that will open a new terminal. You can also drag the tab of a terminal to the side of the screen to create a new panel, and you can see both terminals at the same time. You may want to try that for this activity! New terminals in your Codespace will always open in the context of your Codespace, not logged into ieng6.

Open two terminals, one logged into ieng6 and one in your Codespace.

In both, run the following commands one at a time:

ls
gcc hello.c -o hello
ls
./hello

Write down in notes: What effect did the gcc command have? What did the ./hello command do?

Some definitions:

  • gcc is a compiler for C programs. It takes .c files (and other things we'll see later) as inputs, and generates executable files (like hello) as outputs.
  • An executable file is specially encoded with instructions for a specific (type of) machine. We won't talk in much detail about it in this course, but the way an executable file runs on a computer is one of the the main topics of CSE30. These files can be run without any other tools. When we run hello by writing ./hello, we are directly telling the operating system to run the instructions in that file.

Making an Edit

Next, in your Codespace editor, make an edit to hello.c to change the string message to one of your choice. Save the file, then rerun the commands above in both terminals.

Write down in notes: What was the output in each terminal? What did you observe about the behavior of the program in each place?

When you change the file in your Codespace, the file on ieng6 does not change. You can use cat to verify this; the edit you made is only to the file stored in your Codespace. This is an important concept! Each computer has its own filesystem that stores files and folders. Each terminal is connected to one computer (and to one filesystem) at a time. Knowing which computer you're running commands on is really important for that reason!

It's worth asking how we can get our edits onto ieng6 to run the modified program there. There is a command, related to ssh, called scp, that does just that.

From your terminal connected to your Codespace, run this command (replacing yourusername with your username as before; you will be prompted for your password again):

scp hello.c YOURUSERNAME@ieng6.ucsd.edu:./

Then, in your terminal connected to ieng6, run the following commands:

cat hello.c
gcc hello.c -o hello
./hello

The command scp takes files and copies them to the computer and folder given by the second argument. The : is separating the username and name of the server from the path or folder to copy it into – ./ just means “your home directory”, which is the same folder you see when you log in. scp is a useful command for moving things around between computers. There are many other ways we could accomplish this, but this is one of the most direct.

Write down in notes: For each student, take a screenshot of successfully running the scp command and then using cat to see the new output on ieng6, and put it in the notes.

Summary

  • A repository is a place to store code; Github is a service for storing repositories
  • A Codespace is a web page that connects to a computer at Github that holds your repository and lets you run commands and edit your code
  • A terminal is a text-based interface to a computer.
  • We run commands in the terminal, which have options and flags and arguments. We can look up commands' behavior online or with man
  • commands are programs that can do all kinds of things, from file management (ls, cat, cd) to connecting to other computers (curl, ssh) to compiling and running programs (gcc, ./hello)
  • We can connect to ieng6, which is a collection of computers at UCSD, using ssh. Many times we need to connect to other computers we will use ssh
  • Each computer has its own filesystem, which is a collection of files and folders. We saw the Codespace filesystem and the ieng6 filesystem
  • We can compile C programs using gcc which creates executable files
  • We can run executable files with ./filename
  • The scp command can be used to move files between computers (using the same login as ssh)

Before you leave - Lab Group Preference Form

Please go ahead and fill out this Google form before you leave, this will help us create the seating chart for next week. Link to Google form

More Practice

These are open-ended exercises for you to try and discuss with your group. These are also great conversation-starters for office hours, deeper understanding with friends outside of class, or conceptual questions on Piazza. Feel free to take your results and write down in notes what you found!

  • Commit and push your changes to hello.c Github. Can you figure out how to download the code you wrote from Github to ieng6 using curl? (Hint: inspect the URLs we shared for downloading class code)
  • Try creating a new .c file from scratch that prints something different. Compile and run it in both your Codespace and on ieng6, using your method of choice to move it to ieng6.
  • Try running the scp command without the ./ at the end of the path. What happens?
  • Use scp to copy a file into a different folder on ieng6 (like the one you made with mkdir). What does the command look like?
  • Cause an error in hello.c (change the name of main, or drop a ", or otherwise cause an issue), then recompile and re-run. What does the gcc command output? What does the ./hello command output? Why?
  • Try running gcc without the -o hello part. What happens? What is the name of the output file? Can you run it?
  • Try running gcc with the -o flag with a different name. What happens? What is the name of the output file? Can you run it?
  • Run ls .. after logging into ieng6. What do you see?

Week 2 – Project Management and Testing

Meet your Group!

Meet your more permanent group for labs!

Write down in notes: In your groups, share, and note in the running notes document (discussion leaders, you answer these as well!):

  • How you'd like people to refer to you (pronounce your name/nickname, pronouns like he/her/they, etc)

  • Your github username

  • Your major

  • One of these that you didn't share in week 1:

    • A UCSD student organization you're a member of or interested in
    • Your favorite place you've found on campus so far
    • A useful campus shortcut or trick you know
  • Your answer to the following question. Get to know your fellow group members!

    • Pick a song as the soundtrack of your life! Why you did you choose this song? What’s your favorite genre? Who’s your favorite artist?

Testing

The main activity is around writing good tests, with some other activities around managing your code repository.

Some Tests are Always Better Than None

We ask you for a few categories of tests in PA1, which make sure some basic (and not-so-basic) functionality is working. Each of these has a few ways to write it, and you should definitely write more tests than these.

Write down in notes: To get started, everyone in your group should put a test into the shared doc (there should be an example of what one should look like in the doc already!). A test for this assignment includes both the input file and the .expect file. You shouldn't pick the simplest test you wrote; pick one you think is particularly interesting. If you haven't written any tests yet, write an interesting test!

Then, add all the tests from your group to your PA's tests and try them on your implementation, and show what happens for each.

If you haven't started yet, this is a great time to get your repository set up, get those tests in, and make sure you know how to run the tests. The program that reads from input from the first quiz, or the programs from discussion, are great starting points.

(NOTE: Google docs will automatically turn typed ASCII quotes (",') into fancy quotes (“,”), so be careful when copying quotes out of a google doc into code / test files. You can turn this off in Tools>Preferences>Use Smart Quotes)

Write down in notes: When your group runs the tests on one another's implementations, what happens? Did your test find a bug in anyone else's implementation? Did someone else's test find a bug in yours? (Does this whole experience just make you extremely motivated to start early next time 😬?)

What Makes Tests Good?

That process, and your own work before today, gave you a few example tests. These aren't thorough enough to test everything that needs to be covered by the program. But what does it mean to “cover all the cases”? In general, there is no theoretical limit to the number of strings we can create, so we can't test all possible inputs (also called exhaustive testing). So we can't say “cover all the cases” means that we write a test for every possible string.

Another way to think of this is in terms of implementations. There are many correct and many incorrect (or incomplete) versions of the UTF analyzer that we can imagine. Some of them are yours! Each test we write would pass or fail on each of these implementations. If it passes on an incorrect implementation, we could say that test doesn't “notice” or “witness” that implementation being wrong. If it fails on an incorrect implementation (while passing on all the correct ones), that's great! It means it is a high-quality test that tells us something about a bug in an that incorrect implementation.

Then, a way of thinking about “covering all the cases” is “how many bad implementations would these tests catch?” Of course, there are infinitely many bad implementations (just like there are infinitely many strings), so in general we can't expect perfection. But most incorrect implementations are close to correct ones (they just have a bug or two), and targeting tests at common mistakes that might be made is a useful way to organize our thinking.

To that end, we've handwritten some bad implementations of the UTF8 analyzer. Your job as a group will be to write a set of tests where some test fails on each of the bad implementations (while still being a correct test).

Your lab leader will share a repository with you that all of you can use for this lab (it will be public and you'll be able to access it later).

Everyone should be able to make their own Codespace from this repository – the Codespace is just for you, the repository is shared with your whole group. In the Codespace, spend a few minutes writing some tests and trying them out against the provided implementations. All of them are mostly right, but have some specific bugs related to their names. We made one very obviously wrong; it simply always returns 1 for is_ascii, so any test with a non-ASCII character will fail.

$ cat tests/crab.txt
🦀
$ ./bin/always_ascii < tests/crab.txt
Enter a UTF-8 encoded string:
Valid ASCII: true
... other output

If you run with the testing script, you'll see the failure:

$ ./test_script bin/always_ascii 
Test ./tests/crab.txt failed.
Expected line not found in output:
Valid ASCII: false

Test ./tests/invalid_and_uppercased_ascii_test.txt failed.
Expected line not found in output:
Valid ASCII: false

Test ./tests/is_valid_ascii_test.txt passed.

Your task (as a group) is to write tests to make all of these bad implementations fail using correct tests (e.g. tests that ought to pass on any correct implementation).

Do some work in your individual Codespaces, then commit and sync your tests. Make sure to pick unique names for your test files (consider prefixing with your username) so that the filenames don't conflict.

Can you trigger the bugs in all the bad implementations?

Managing Your Repository

The rest of the lab will have you work on your own repository for PA1 and make progress on the PA. We have some advice about managing your repository that we want you to try.

We recommend making commits, and pushing, to your repository whenever you get to a useful working checkpoint. That could mean that you got a new test to pass, for example, or got the first version of one of the functions written, and so on. We also recommend using .gitignore to manage files that are created by the compiler (but shouldn't be in version control).

  • Practice creating a small commit in lab – add a test and/or some small amount of code, and make a commit and push. In your notes document: put a screenshot of the commit that you pushed to Github.

    • If you've been working this way all along, you can just share a small commit you made earlier.
    • If you're just getting started, this can be your first commit of the main function.
    • If you're in the middle of working, you can make one commit and push with whatever the current state of your work is, and then another small commit

    Notice that you can click on any commit and see the individual files at that point in time! This can help you get back to an earlier state if you're stuck.

  • The file you create with gcc (probably called utf8analyzer or similar) changes every time you compile, and could be entirely different when compiled on different computers. It also doesn't have a meaningful “diff” between versions, since it's a binary file. For these reasons, built files are usually not included in a repository. The file .gitignore in the root of a repository can list files that should be completely ignored by git – it won't include them in the list of files to add or change.

    • Compile your program and show a screenshot of the status of the binary file that was created (for example, in the source control/Git tab of the codespace)
    • Change your .gitignore to include the path of this file
    • Observe that it no longer shows up as a file to add!

    It's really useful to keep repositories organized and tidy. .gitignore is one way to do that, separating the source code of a project from other “build artifacts” that are temporarily important but shouldn't be tracked and saved in the same way as code.

Working on the PA

Armed with your thorough tests, spend some time working on the PA. You might choose to just include some of the tests you've written in your own code at first as you work on invidual milestones.

Feel free to work together, though keep in mind the academic integrity policies and CREDITS.md. In this (and in all things learning-based) focus on what's best for your learning and understanding, not just what's best for finishing the PA. The conceptual knowledge pays off later in the course, on exams, and in your career, getting the code complete for a PA without understanding it has little effect.

Signing Up for An Exam Slot

Next week (week 3) is the first exam. It will have questions in the same platform you've seen in the PrairieLearn quizzes. You need to sign up for an exam slot during week 3, which you can do HERE

This document is the Triton Testing Center's guide: https://tritontesting.ucsd.edu/for-students/scheduling-a-test.html

You should not use lecture or lab time to take the exam; next week will be a normal week of lecture, lab, quiz, and so on with the added exam at some time outside of class.

(You can do this outside of lab time, too, but we wanted to make a point to remind you during lab!)

If you are done early:

  • Implement a Zalgo text generator (https://zalgo.org). Do your own research to figure out how that works.
  • Implement reverse that takes an argument buffer to store the reversed string. If they do that, then they should do reverse_in_place that uses as little extra memory as possible and reverses in place.
  • Research how to go about implementing uppercase/lowercase that includes accented characters. Can you find a library or tool that would help? How do you think that works in software tools like autocorrect that capitalize the first letter of a sentence?

Week 3 – Terminal Usage and Git

Lab Goals

  1. Set up SSH keys for ieng6 and Github
  2. Learn how to edit files on the terminal using vim
  3. Do all the steps for working on a PA entirely on a remote computer (ieng6)
  4. Learn how to manage repositories using git CLI

Lab Tasks

Part 1 – Let's get a Terminal Setup

  • If you're using Windows, install Git for Windows using all the default/recommended settings as suggested by the installation wizard. Mac users have a pre-installed terminal application.
  • Open the Git Bash terminal application (Windows) or Terminal (Mac) and try some commands from Lab 1
  • Try to ls your Desktop directory. Do those files look familiar?
  • Try to ssh into ieng6 with your username and password (just like you did in Lab 1)
  • Help each other if anyone has issues installing or figuring out the terminal.

Discuss and write in notes:

  • What was the working directory of the terminal when you launched it?
  • What is your home directory on this computer?
  • What files and folders are in the home directory?
  • Where do you think files that download from your web browser go? Can you list them with ls? What's the absolute path to that folder?
  • Do any commands work differently than you expect on this computer?
  • Are you able to use ssh with your username and password from your local terminal to log into ieng6 and enter the course-specific account?

Take a few screenshots of what you found particularly interesting, and discuss how this environment differs from the terminal you used on Codespaces.

Part 2 - Setting up SSH Keys for Easy Access to ieng6

With the setup we've used so far this quarter, each time you log in to your course-specific account, you have to type the password. This can get a bit tedious and luckily there is a cool and interesting way to avoid this while still staying secure using SSH keys.

We've labelled each step with whether it should run on [Y] our computer or [i] eng6. Make sure you follow the instructions carefully!

  1. [Y] In your local terminal, run ssh-keygen. This command will generate a pair of SSH keys for you, one public and one private.

  2. [Y] Keep pressing <Enter> until the program shows some text it calls the "randomart image".

    • Note the path where the public key is saved (underlined below).
    • Image
  3. [Y]/[I] Now, log into your remote course specific account on ieng6 with ssh (using your password as usual) DO NOT run the cs29fa24 command to prepare your course-specific environment!

  4. [I] Run mkdir .ssh in the terminal

  5. [I]/[Y] Log out of your remote account by pressing Ctrl-D or typing exit.

  6. [Y] Now, we want to copy the public SSH key you created on your local machine onto your remote account; specifically inside the .ssh directory you just created, into a file called authorized_keys. Scroll up in your terminal to where you were creating the SSH key, find the line where it says: Your public key has been saved in: <path to your public SSH key>, copy the path. Make sure you get the public key file, ending in .pub, here, not the private file.

  1. [Y] Think about a command that will perform the copying of the public key file from your local machine to the .ssh directory on your remote account with the appropriate name (HINT: you used this command in Week 1's lab). Work with your group members if you need help!
If you're really stuck, click here to see the answer From your local computer, run
scp PATH_TO_YOUR_PUBLIC_SSH_KEY USER@ieng6.ucsd.edu:~/.ssh/authorized_keys

Make sure to replace USER with your UCSD username and PATH... with the appropriate path


  1. [Y]/[I] Try to log onto your remote account again, you shouldn’t be prompted for a password anymore. If you are, ask for help and carefully review the steps above with your group. To review:
    • You should have a directory called .ssh on your computer in your home directory.
    • That folder should have a key file created by ssh-keygen and a corresponding .pub version of the file.
    • You should have a directory called .ssh on ieng6 (that you created with mkdir) in your home directory.
    • The .pub file from your computer should be copied to the .ssh/authorized_keys file on ieng6

Add to your notes: A screenshot of you logging into your ieng6 account without a password prompt.

Part 3 - Working in Terminal

So far, we've been primarily using our terminal to compile our C code (with gcc) and run our programs, but we've just scratched the service of what the terminal can do. The terminal is the ultimate gateway into communicating with our computer, and today we're going to dive more into the different ways we can use the terminal to make our lives easier.

3.1 - VIM

vim is a text editor that runs entirely in the terminal. For better or worse, many times developers find themselves with only terminal access to a remote computer, and need to do some programming – that is, editing source code files. A terminal-based editor makes this possible. (As a side effect you can look like hacker from a movie controlling everything without needing a mouse.)

There are a lot of online resources available for vim, but thankfully, the program itself comes with a interactive tutorial (that's pretty neat!). We can access this tutorial with the vimtutor command from our terminal

Task: Open the vimtutor tutorial in your terminal and complete Chapters 1 & 2. Image

Once you have completed the first 2 chapters of vimtutor, you will now be using some of the commands that you learned to correct a bug in a C program that we have written for you.

Task:

  1. [Y]/[I] If you aren't logged into ieng6, log in now. (password-free hopefully!)
  2. [I] Download our buggy C program using curl. The command would look like

curl https://raw.githubusercontent.com/ucsd-cse29/lab3-starter/refs/heads/main/average.c -o average.c

  1. [I] Compile and run the average.c file.
  2. [I] Open the average.c file in vim and read through the program.
  3. [I] Determine the bug in the program and correct it in vim. (Try doing it without introducing a new variable!)

Write in your notes: The keys/commands you are pressing/using while navigating vim

  1. [I] Re-compile and re-run the program to ensure that it now outputs the correct value.

Write in your notes:

  • The original output of the program.
  • What do you think the program is supposed to do/output?
  • What was the bug and how did you fix it?
  • The output of the program after the bug has been fixed.

When you are done, discuss with a partner discuss what was comfortable and what was tricky about correcting the file. Compare the commands you used with other members in your group and note the differences.

3.2 - VIM Telephone

Next, we will see how different members of your group approached fixing the bug in the provided C code. You will be running your group members' vim keystrokes to see the differences in how each of you navigated the vim editor.

Task:

  1. [I] Redownload our C file using a similar curl to the previous section. curl https://raw.githubusercontent.com/ucsd-cse29/lab3-starter/refs/heads/main/average.c -o average2.c.
  2. [I] Now, run the commands that another member of your group used to correct the average.c file. You should find the commands in your shared Google Doc.
  3. Write in your notes: What were some of the key differences you noticed between your keystrokes and your group members?

Part 4 - Github with ieng6

With the bug fixed, we now want to push our changes to our Github repository. Before we can do that though, we need to be able to perform actions against our Github repository from ieng6. This is where SSH keys come in handy once again!

4.1 - Setting up SSH Keys for Github

  1. [Y]/[I] Login to ieng6 as usual (hopefully, without typing a password now!)
  2. [I] Run the command ssh-keygen, and again press <Enter> until the command completes and shows the "randomart image". Just like before, this will put a key file and a .pub version of it into the .ssh directory – this time on ieng6!

Next, we want to add the public key to your Github account. This is like the step of copying the public key to authorized_keys on ieng6, but instead we're copying to Github.

  1. [I] Display the SSH public key generated above using cat <path of your ssh key .pub file> and copy it to your clipboard; you can copy it by highlighting and right-clicking

  2. Open your Github account on the browser.

  3. In the upper right corner, click on your profile photo, then click Settings. Image

  4. In the “Access” section of the sidebar, click SSH and GPG keys.

  5. Click New SSH key or Add SSH key under the “SSH keys” section. Image

  6. Add a “Title” to your key (ex: Aaron's ieng6 machine).

  7. Select the “Key Type” to be an Authentication Key

  8. Copy your public key from the output of the cat command and paste it into the “Key” field Image

  9. Click Add SSH key.

  10. If prompted, confirm access to your account on Github.

Go back to the ieng6 terminal and:

  1. [I] Run the following command to add github.com as a recognized host (this avoids the scary yes/no prompt about accepting new connections the first time you connect)
    • ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
    • >> means "append stdout of the command to file"
  2. [I] Check your connection by running the following command:
    • ssh -T git@github.com
    • It will say something like "Hi supercoolstudent1234! You've successfully authenticated, but GitHub does not provide shell access."

Now we have an SSH key which can be used to authenticate to GitHub! In addition to using https clone URLs, we can now use SSH clone URLs that look like this:

Image

Crucially, these will allow both cloning and pushing to the repository (as long as your account has access) from ieng6!

4.2 - git CLI commands

So far you have been using the "Source Control" tab in your Github codespaces to commit and push your changes. Since we only have our terminal today, we will be learning how to use the git CLI to do the same thing.

  • Forking a repository

A fork of a repository is a personal copy of the repository that you can make changes to without affecting the original repository. To fork a repository, click the "Fork" button in the top right corner of the repository page.

Task: Create a fork of the Lab 3 starter repository here. Image

  • Cloning with git clone

To retrieve a local copy of our git repository, we use the git clone command. clone takes a link from github (usually beginning with https://github.com or git@github.com).

Task: Use the SSH clone URL to clone your forked repository to your ieng6 account into another new directory. Fix the bug that you fixed earlier in the average.c file again and then proceed!

Image

After we are done making changes to our local branch it's now time to push our changes to our remote branch (in this case github)


  • Getting the status with git status

Let's first run git status, to see the status of our repository. status returns which files are untracked (new) modified (changed) and deleted.

Task: After correcting the buggy C code, add the output of running the git status command in your terminal to your notes.

  • Staging file with git add

When we are done making changes to a file, we "stage" it to mark it as ready to be commited. Using the git add command with the path of the changed file(s) will stage each to be included in the next commit. Using git add . will stage all changed and/or new files in the current directory.

Task: Use git add to stage our corrected C file. Compare the output of git status to the output written in your notes.

  • Committing with git commit

A commit is a package of associated changes. Running the git commit command will take all of our staged files, and package them into a single commit. With the -m flag, we can specify a message detailing the changes of the commit. Without -m, git opens a vim window to write the commit message.

Task: Use git commit to commit our staged changes. Use vim to write your commit message.

  • Pushing to the remote repository with git push

With our commit now made, we can use the git push command to upload our changes to our remote branch (github). If this is the first time you are using git push, git may ask you to set the name and email you want associated with your commits.

  • Viewing the log with git log

Git also keeps a log of all the changes. The git log command prints this log into our terminal. From this log we can see the date, author, and message associated with each commit. The HEAD -> marker points to the latest commit locally, while the origin/ marker points to the latest commit on the remote. After running git push, both of these markers should be on the same line.

Task: After pushing your changes, add the output of git log to your notes.

Week 4 – Compiling and “Building”

Welcome to Week 4!

Take a moment to ensure you have your group's Google Doc pulled up and answer the following question as a group while also filling it out in the Google Doc!

Q: What is your favorite place that you’ve been to? If it’s a travel destination, who did you go there with, and when?

Lab Goals

  1. Set up SSH keys for connecting ieng6 to Github From Week 3 Lab
  2. Practice pushing PA2 code to Github from the command line
  3. Tools for solving Memory errors in C programs
  4. Testing with assert + Makefiles

Pushing PA2 Code to Github from ieng6

Make sure your ieng6 account has Github access.

  1. Run the corresponding prep command for your class (cs29fa24 for Joe's, cs29fa24b for Aaron's) after you are logged in. If you have already started working on your PA2 and had cloned the repository without running this command, you will need to reclone after running this command and continue working. This command sets up your working directory for CSE29 which is where you want to be working!

  2. Get your PA2 code onto ieng6; if you've already done a step, skip it

    • Follow the Github Classroom Link to create your PA2 repository. See the PA2 Assignment for the actual functions to write and instructions for this PA.

    • Log into ieng6 and clone the repository you just made using the Github Classroom Link with git clone [your-repository-link]. Remember to use the SSH clone link!

      Image

    • IMPORTANT NOTE: Make sure to clone the SSH URL and not the HTTPS URL! The Github link should start with git@github.com

    • Change into the directory with your PA2 repository. In the example above, that would be cd pa2-hashing-and-passwords-<your-github-username>

  3. Check status by running git status. This will tell you (from the command-line) what's going in the copy of your code you have checked out in the current working directory. If you haven't done any work yet, you'll see something like:

    On branch main
    Your branch is up to date with 'origin/main'.
    
    nothing to commit, working tree clean
    

    You can also use ls to see what's going on. In a brand-new repository, that will show these files:

    $ ls
    CREDITS.md  DESIGN.md  README.md
    

    This means you haven't edited or created any files since you cloned the code. If you've done some work, you might see all kinds of things here, most likely pwcrack.c, which you may have created.

    IMPORTANT NOTE: If you have already done some work and your file name is not pwcrack.c, use the command mv <your_file_name>.c pwcrack.c to rename it (mv is also used to move files from one place to another in a filesystem). This will be important for future sections of the lab!

  4. Create or edit pwcrack.c. If you already created pwcrack.c and the git status command shows that it is new or modified, there's nothing to do here. If you created it, but it has no modifications, open it and make a small edit (even just adding a comment like /* edit in lab */) If you haven't created it yet, you can create it using vim:

    $ vim pwcrack.c
    

    Then into the file type:

    #include <stdio.h>
    
    int main() {
        printf("did not find a matching password\n");
    }
    

    Then save and quit with :wq. Now git status should show it as a new untracked file:

    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    Untracked files:
    (use "git add <file>..." to include in what will be committed)
            pwcrack.c
    
    nothing added to commit but untracked files present (use "git add" to track)
    
  5. Add the change you made to “stage” it with git add pwcrack.c. After you do this, git status will show it as ready to commit.

    $ git add pwcrack.c
    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    Changes to be committed:
    (use "git restore --staged <file>..." to unstage)
            new file:   pwcrack.c
    
  6. Commit the file by using git commit -m "your message here", where your message might just be about adding or editing this file.

    $ git commit -m "created first version of pwcrack.c"
    [main 91d2f60] created first version of pwcrack.c
    1 file changed, 5 insertions(+)
    create mode 100644 pwcrack.c
    $ git status
    On branch main
    Your branch is ahead of 'origin/main' by 1 commit.
    (use "git push" to publish your local commits)
    
    nothing to commit, working tree clean
    
  7. Push your work to Github by using git push origin main. Then you should be able to visit your repository on Github and see the change!

    $ git push origin main
    Enumerating objects: 4, done.
    Counting objects: 100% (4/4), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 383 bytes | 191.00 KiB/s, done.
    Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (1/1), completed with 1 local object.
    To github.com:ucsd-cse29-fa24/pa2-hashing-and-passwords-jpolitz.git
       0787d50..91d2f60  main -> main
    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    nothing to commit, working tree clean
    

Write in your notes: A screenshot of the commit(s) you pushed to Github from your terminal and also a screenshot of the commit(s) on the Github website.

Check with the person sitting next to you to see if they got a similar result. Did you run into any errors? Try understanding and fixing them: maybe you made a typo in a command, or maybe something isn't set up correctly about your connection to Github. Make sure you can create and edit files, add and commit the changes, and push them to Github before moving on.

New Debugging Tools

For this section, you'll be working with the following program:

#include <stdio.h>

// edits string, then returns how many changed
int X_to_Y(char str[]) {
    int num_xes = 0;
    for (int i=0; str[i] != '\0'; i++) {
        if (str[i] = 'X') {
            str[i] = 'Y';
            num_xes += 1;
        }
    }
}

int main(int argc, char *argv[]) {
    for (int i=1; i<=argc; i++) {
        int num_xes = X_to_Y(argv[i]);
        printf("\"%s\": %d 'X'es changed\n", argv[i], num_xes);
    }
}
  1. Read through this program without running it.
  2. In your notes, write down what you think it would output if ran as ./buggy helloX XOXO
  3. Discuss your thoughts with your group, and see if you notice any bugs in it. In your notes: Write down any bugs you think you've fonud.

Getting set up on ieng6

  1. On ieng6, use the following command to curl the above program into a file called buggy.c:
curl -o buggy.c https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/week4/buggy.c
  1. Then run the following command to compile the program:
gcc -std=c11 buggy.c -o buggy 

Notice the new command-line flag introduced here! We use the -std=c11 flag to tell gcc what version of C we want to compile with. This is necessary since the default version on ieng6 would fail to compile this program due to us declaring i within the for loop.

  1. Now, try running the program with some string arguments:
./buggy helloX XOXO

Oh no! You should see it outputting far too many Y-s and zeroes, and then crashing (Segmentation Fault). Let's see if we can use -Wall to get to the bottom of this.

The -Wall

Many common errors can be caught by the compiler, but a lot of these checks aren't enabled by default. We can ask the compiler to warn us about them with the -Wall flag ("- W(arn) all")

  1. Run the following command to recompile the program with warnings enabled:
gcc -Wall -std=c11 buggy.c -o buggy
  1. This time, gcc should have given you two warnings, read through them. Would they have helped you catch those two bugs?
  2. In your notes: Write down the warnings you got ([-Wxxxxxx])

In general, It's almost always a good idea to have -Wall enabled by default because it will helpfully warn you about many common C errors.

NOTE: One unfortunate side effect of -Wall is that if you ever declare a variable but don't use it, the compiler will warn you about it. This can be helpful to find bugs sometimes, but often ends up being more annoying than helpful. You can disable this by with the -Wno-unused-variable flag:

gcc -Wall -Wno-unused-variable -std=c11 buggy.c -o buggy

Segfaults

Now, we've found two bugs, but why is the program still crashing? If we provide a single argument (argc = 2), ourfor loop iterates until i <= argc, which means it will try to access argv[argc], which is out of bounds.

Unfortunately, when we try to read this as a memory address (char*) , it ends up being an invalid memory address (it happens to be 0, though other values may give similar errors), so the program ended with a segmentation fault, which is a fancy term for “the program tried to access memory it shouldn't and was stopped by the operating system”.

This is pretty different from an error like ArrayIndexOutOfBounds in a language like Java, which gives information like a stack trace and the value of the incorrect index. Because of C's focus on doing only what is necessary and giving low-level access to memory, this kind of mistake is permitted, for better or worse. Aside from causing numerous security vulnerabilities, these memory errors are difficult to debug, since you can't even tell which line of code caused the crash.

Thankfully, there are a few tools that help debug invalid memory accesses.

AddressSanitizer

Let's use the AddressSanitizer to figure out which line of code we crashed on. AddressSanitizer (or ASan) is a tool, built into gcc and other C compilers these days, that turns invalid memory accesses (the kind that give segfaults) into descriptive errors.

  1. Run the following commands (these are needed if you don't run cs29fa24 on login, to work around an old version of gcc)
export ASAN_OPTIONS=symbolize=1:print_legend=0
export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer

  1. Compile the program again with the following option to enable the sanitizer:
gcc -std=c11 -g -fsanitize=address buggy.c -o buggy.asan

The -fsanitize=address part does extra work in the compiler to put special checks in to look for memory errors. The -g option we saw in lecture turns on some debugging information like line numbers in stack traces.

  1. After compiling, rerun the program like before. You should see it output a stacktrace that start something like this, which shows the function calls that led to the error:
$ ./buggy.asan helloX
"YYYYYY": 0 'X'es changed
AddressSanitizer:DEADLYSIGNAL
=================================================================
==303858==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55f690eb0320 bp 0x7fff2e028b50 sp 0x7fff2e028b30 T0)
==303858==The signal is caused by a READ memory access.
... (several lines omitted)
  1. Read through the stack trace and identify the functions that were called. In your notes: Write down the last function called, and what line of buggy.c the error was on.

Now, we can see exactly the path the code took before crashing, making our lives much easier! So, why isn't ASan the default? Mainly because it makes programs run slower. It's great for debugging, or for programs where speed isn't an issue, but if you're trying to hash as many passwords as you can in 10 seconds, it's best to not turn it on! For this reason, it's good to have the option to compile two separate executables, which we called buggy and buggy.asan above.

GDB

Now that we've identified where the crash happens, let's use gdb to help us debug it further. gdb (which you've already seen a bit of in lecture / discussion) is a command-line debugger that allows us to step through our program line by line and inspect the values of variables at each step. It can also give us backtraces through our program to show us where the error may have occurred.

For more GDB resources, see the Week 3 Reading on GDB Debugging, as well as the GDB Quick Reference Guide and GDB Tips and Tricks linked under the Week 3 Readings & Resources :-)

  1. After compiling with the -g flag, load our program into gdb
$ gdb buggy.asan
  1. To see our source code while in gdb, run the command:
(gdb) layout src
  1. Before running the program, set a breakpoint in the code before the program reaches its error. Here, the error is on line 6, so type in your gdb terminal:
(gdb) break buggy.c:6
  1. After setting the breakpoint, run the program from inside gdb with an argument (everything after run gets passed as command line args to your program)
(gdb) run helloX

If done correctly, gdb should print out the for(int i=0... line in the program. The breakpoint paused the execution before this line was executed.

  1. Try the following commands

    • info args: tells us what arguments were passed into this function,
    • info locals: tells the values of the local variables at this point of the execution.
    • p s[3]: this lets your print out the values of expressions, not just variables. Try some others!
      • Can also specify the format you want, e.g. p/x var1 prints the value of var1 in hex, p/t prints in binary).
    • bt should print out a stacktrace similar to what ASan did.
  2. Now, use the c command to continue executing until it hits the breakpoint a second time

  3. Use the commands you learned to print the variable that buggy.c is attempting to access (and crashing on).

  4. In your notes: Write the command you used, and what gdb printed out

    • NOTE: If your program runs too far ahead, you can always ask gdb to restart it by using run ARG1 ARG2 ... again

Testing with assert and Makefiles

In this section, we will be working on one of the functions that is part of PA2 and provide a cool way to incorporate unit testing into your main() functions without needing to change/comment lots of lines of code!

Finally, we will introduce a convenient way to compile our programs using Makefiles and the make utility!

Testing with assert

Task: While on your ieng6 account within your PA2 repo directory perform the following steps:

  1. Open your pwcrack.c file that we created/renamed in the previous portion of the lab. We are providing an incomplete implementation of the hex_to_byte function. Add the following function to your C file and complete it by determining what the function should return. To see the function's intended behavior, refer to the PA2 spec found here.

    NOTE: If you have already implemented your own hex_to_byte function, you can skip this step.

    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    
    uint8_t hex_to_byte(unsigned char h1, unsigned char h2) {
        uint8_t x = 0;
        uint8_t y = 0;
    
        // Convert h1 to a decimal value
        if (h1 >= '0' && h1 <= '9') {
            x += h1 - '0';
        }
        else if (h1 >= 'a' && h1 <= 'f') {
            x += h1 - 'a' + 10;
        }
    
        // Convert h2 to a decimal value
        if (h2 >= '0' && h2 <= '9') {
            y += h2 - '0';
        }
        else if (h2 >= 'a' && h2 <= 'f') {
            y += h2 - 'a' + 10;
        }
        // TODO: Determine what the function should return
    }
    
  1. Now we will need to add a main() function to this file to give the program an entrypoint. Copy the following main function into your pwcrack.c file.

    NOTE: If you already have a main() function from working on the PA, you can keep it as is and focus on the unit testing portion of the code given below.

    int main(int argc, char **argv) {
    
        // UNIT TESTING SECTION
        int test = 0; // Set this variable to 1 to run unit tests instead of the entire program
        if (test) {
            assert(hex_to_byte('a', '2') == 162);
            // ADD MORE TESTS HERE. MAKE SURE TO ADD TESTS THAT FAIL AS WELL TO SEE WHAT HAPPENS!
            
            printf("ALL TESTS PASSED!\n");
            return 0;
        }
    
        // MAIN PROGRAM SECTION
        if (argc < 2) {
            printf("Error: not enough arguments provided!\n");
            printf("Usage: %s <byte 1 in hex> <byte 2 in hex> ...\n", argv[0]);
            printf("Example: %s a2 b7 99\n", argv[0]);
            return 1;
        }
    
        int i = 1;
        for (; i < argc; i++) {
            printf("Value of hex byte %s is %d\n", argv[i], hex_to_byte(argv[i][0], argv[i][1]));
        }
    }
    

    You will note that this main() function serves two purposes:

    • It parses your command-line arguments to your program and runs your password cracker. In this lab, we will be demonstrating the use of command-line arguments to only test the hex_to_byte function but you will extend this to work for your complete password cracker implementation.
    • The if statement at the beginning of the main() function is a way to unit test specific functions as you work on your PA. Setting the value of the test variable to 1 will run the code within the if statement that should have several assert() statements to test your individual functions (we have provided an example test for hex_to_byte). If all your tests pass, the program will print ALL TESTS PASSED! and exit successfully. If not, you will receive an error when you run your program with int test = 1;.
  1. Commit and push your changes to Github. Make sure to include the new files you created in your commit.

In your notes: Briefly explain what you had the hex_to_byte function return and why?

Makefiles

Thus far, we have been typing out the various gcc commands needed to compile our C programs. We have also learned some command-line flags that can be useful when compiling our programs like -g for debugging information, -std=c11 to use the correct C version when compiling on ieng6 and -fsanitize for memory errors. Wouldn't it be nice if we could just type one command and have all the necessary files compiled with the correct flags? This is where make comes in.

make is a utility for building C programs. A Makefile is a file that contains a set of rules that tell the make utility what commands it should run for that rule. We will be creating a Makefile with rules to compile our C programs with the correct flags and dependencies.

Task: Perform the following steps to create a Makefile for your PA2 project:

  1. In your PA2 repository, create a new file named Makefile and add the following content to it:

    all: pwcrack
      
    pwcrack: pwcrack.c
        gcc -std=c11 -Wall -Wno-unused-variable -fsanitize=address -g pwcrack.c -o pwcrack
    
    clean:
        rm -f pwcrack
    

    IMPORTANT NOTE: The indentation in the Makefile is done with a TAB character, not spaces. Make sure to use a TAB character when indenting the commands in the Makefile.

    NOTE: If your file names are different than what is given above at this point, change the rules to match the file names you are using.

  2. Now, in your terminal, run make all. This will perform the same operation as running gcc -std=c11 -fsanitize=address -g pwcrack.c -o pwcrack.out without needing to type or remember that command! You should see the pwcrack file in your directory. However, if you run make all again, you will see that it doesn't recompile the files because they have not changed since the last time they were compiled. This is one of the benefits of using make to compile your programs.

  3. Now run make clean. This should delete the executable file.

  4. Next, run just make with no arguments.

In your notes: What happened when you ran make without any arguments? Why do you think this is the case?

We now have a clean and simple way to compile our C programs with the correct flags and dependencies using make. This will be very useful as we continue to work on our PA2 project and need to compile our programs multiple times.

If you are done early

  • Work on PA2! Use the remaining time in lab to make progress on PA2 along with your group members and your tutor!
  • Try adding a separate rule that would build pwcrack without address sanitizer enabled. Rename the rules so that one generates pwcrack.asan, and the other generates pwcrack. You may need to adjust all and clean as well to work correctly with this.
  • If you are done with PA2 here is a fun extension you could work on: Extend your password cracker such that when given a hash, it will attempt to go through all possible passwords of length 6 until the password is found.

Week 5 – Building a Simple Server

Welcome to Week 5!

Take a moment to ensure you have your group's Google Doc pulled up and answer the following question as a group while also filling it out in the Google Doc!

Q: If you could pick up a new skill in an instant what would it be and why?

Lab Goals

  1. Let's Have a Chat
  2. Headers and Servers
  3. Implementing the Number Server

Part 1: Let's Have a Chat

Entering lab, you may have noticed something interesting running on the projector. We've started up a demonstration of the next PA to play around with. For the next PA, you'll be creating your own chatroom server, where users can send and react to other's messages.

We've included a client on the ieng6 machines that you can use to communicate in our chatroom.

We can run our chat-client with: (where MACHINE is ieng6-201 for Joe's lab, or ieng6-202 for Aaron's lab)

/home/linux/ieng6/cs29fa24/public/chat-client USERNAME MACHINE 8000

Task:

  1. Choose a username and send a message to your lab room's chat server.
  2. Try reacting to a member of your lab group's message. Try reacting with an emoji 😁
  3. In your notes: Add a screenshot of your message and reaction.

The client is cool, but there are many other ways we can also interact with the chat server: In the past we've been using curl primarily to download single files from the internet. We can also use curl to send messages!

To see all the messages of the chat server we can curl the url:

curl 'MACHINE.ucsd.edu:8000/chats'

To send a message to the chat server, we can curl the url:

curl 'MACHINE.ucsd.edu:8000/post?user=USERNAME&message=MESSAGE'

Task:

  1. Try sending a message to the chatroom server using curl

To react to another user's message, we can curl the url:

curl 'MACHINE.ucsd.edu:8000/react?user=USERNAME&message=REACTION&id=ID'

REACTION can be any 16 byte message and ID is the id of the message you wish to react to.

Task:

  1. Try reacting to your group member's message from your web browser (Chrome, Safari, Firefox, etc.)
  2. In your notes: Add a screenshot from your web browser of MACHINE.ucsd.edu:8000/chats.

Part 2: Headers and Servers

In parts 2 and 3, you'll be writing your own simple webserver that keeps track of and lets you modify the value of a number (which should be a bit easier than writing a whole chat server).

2.0 Getting started

Task

  1. If you haven't done so already, log onto ieng6, then run the corresponding prep command for your class (cs29fa24 for Joe's, cs29fa24b for Aaron's).
  2. Fork the lab5 starter repo, then clone it. (If you're unsure how, you can refer back to how we did this in Lab 3 and Lab 4)

Header files

Looking into the lab5 repository, you'll find two source files (number-server.c, http-server.c), and one header file (http-server.h). In the previous labs and PAs for the class, our code has consisted of a single .c file, containing a main() and all associated functions. As programs get more complex, it becomes increasingly important to organize the code by seperating it into multiple files.

http-server.h should look like this:

#ifndef HTTP_SERVER_H
#define HTTP_SERVER_H

#include <stdlib.h>
// ... more #includes

void start_server(void(*handler)(char*, int), int port);
#define BUFFER_SIZE 2048

#endif

This header file contains the declaration of start_server, which includes all info needed for C to call a function, which are its arguments and return type. However the definition of the function (i.e. the actual code that it runs) are in http-server.c. To call this function, number-server.c needs the declaration, which it gets by including http-server.h.

EXTRA INFO 📝:

  • It's not necessary to fully understand the implementation of start_server, but the idea behind the function is available in the pa3 writeup.

  • Every function in a C file should have its visibility defined in one of two ways:

    • Public functions (meant to be called from other source files) need to be declared in a header file.
    • Private functions (typically helpers or utilities) should always be declared as static (e.g, static int foo() {...}) to specify that they shouldn't be visible to other .c files.
  • The ifndef/define/endif at the start and end of a header are an include guard

2.1 Compiling with multiple files

When compiling a program with multiple source files, all source (.c) files must be included as arguments to gcc. NOTE: you don't need to specify the header files to gcc; they are automatically pulled in by #include statements.

Task

  1. Try compiling the number_server using gcc. (Note: you may need to pass in the -std=c11 flag)
  2. In your notes: Write down what happens when you don't include the http-server.c file in gcc. What happens when you ONLY include the http-server.c file in gcc?
  3. Once you're successfully able to compile this program, create a Makefile so that you can compile number_server just by typing make.
    • (Note: You can refer to how we did this in the last lab)
    • You may find it useful to also include the debugging flags from last lab: -Wall -Wno-unused-variable -fsanitize=address -g
  4. Run the number-server you just compiled; it should say something like "Server started on port PPPPP".
  5. Open your browser and go to MACHINE.ucsd.edu:PPPPP where MACHINE is the ieng6 server you are on like ieng6-201, ieng6-202, or ieng6-203; PPPPP is the port number from your server like 8000. Your browser will show an error page, but you should see some output from number-server in your terminal
  6. In your notes: Add a screenshot of the terminal output from number-server

Congrats, you've got a working server running!

2.2 String Hunt

For the purposes of this lab, we'll also need a few more functions from the C <string> library in order to build our backend.

In your notes: For each of the following functions, add its signature and write a brief description of what the function does. You can look it up in the manpages by typing (man XXX) in the terminal, where XXX is a function name. For example, typing man strstr in the terminal would give you the manual for the strstr function.

  • strstr

  • sscanf

  • snprintf

2.3 HTTP 404 Response

Currently our number server isn't responding to requests at all. In part 3 of the lab, you'll make your server handle three kinds of requests, but for now lets start by responding with a '404 Page Not Found'.

HTTP Requests

When we request a given url e.g. ieng6-201.ucsd.edu:8000/post?user=joe&message=hi, our web-client (a browser, curl, or our fancy-client) does some work on our behalf.

The start of the url gives us the address (ieng6-201.ucsd.edu) and port (8000) of the server our client connects to. Once connected, it sends an HTTP request, which looks something like this:

GET /post?user=joe&message=hi HTTP/1.1
User-Agent: curl/7.29.0
... many other headers

The string after GET is the path of the request, and that's the part that will change depending on the type of request. In part 3, you'll be examining this path to determine how to respond.

HTTP Responses

Once we've decided what to do with a request, we can send a response back . The second argument to handle_response is an int that is a file descriptor (called the "client socket") for us to write a response to. To send data back to the client, we can use the write function:


write(client_sock, message, message_size);
//    int          char[]   size_t

An HTTP response for an unknown path should return a "404: Not Found" response that looks like the following:

HTTP/1.1 404 Not Found
Content-Type: text/plain

... 404 message to display to user ...

Note: A nuance of this is that the newlines in HTTP responses are not just \n; they are required to be \r\n (which, interestingly, is how line endings work on Windows). Some software will work with just a \n for line breaks, but the correct thing to do is use \r\n, so in a string literal the HTTP header above would look like

"HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\n"

Task

  1. Use curl to access MACHINE.ucsd.edu:PPPPP with the numeric server running. How does curl respond?

  2. In your notes: Add a screenshot of your browser accessing MACHINE.ucsd.edu:PPPPP with the numeric-server still running. How is this different than if the server was not running?

  3. In number-server.c, modify handle_404 to send an HTTP response back to the client. You'll need to call write twice, once to write the response header, and once to write out the body of the response.

    IMPORTANT NOTE: To see the response header being printed, you will need to add the -v flag to the curl command, i.e.

    curl -v MACHINE.ucsd.edu:PPPPP
    
  4. Once you've done this, connect to your server from a browser and make sure you can see the error message, including the path of the url you tried to connect to.

  5. In your notes: Add a screenshot of the 'Not Found' page in your browser. How is this different from the previous error page?

  6. Try connecting to different paths on your server. Does you notice anything interesting if you put in add special characters to the path? Spaces? Emoji?

Part 3: Implementing the Number Server

Now that we've completed our walkthrough of the basics of the http-server library. It's time to use the library to create our own server.

The number-server needs handle 3 different routes:

  • /shownum

    This function responds to the client socket with the current value of num.

  • /increment

    This function increments the value of num by one and responds to the client with the new value of num.

  • /add?value=NNN

    This function increments the value of num by the value encoded in NNN (a decimal integer) and responds to the client with the new value of num

We'll walk you through the first one, then you can try the others on your own.

3.1 Getting started

Successful Responses

However, now that we're adding support for /shownum and friends, we need to be able to send back a successful response, not just a "page not found". The HTTP status code for success is 200, so a successful response would look something like this:

HTTP/1.1 200 OK
Content-Type: text/plain

... body of response ...

As a string literal, the response header would look like this:

"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"

Path parsing:

We also need to ensure that our /shownum, /increment, and /add responses only trigger when the correct path is requested, a path like /badpath should still result in a 404 response. Using a function like strcmp or strstr can be useful to check if the path matches a given string.

Maintaining state

In the lab5 starter, num is a global int variable, so functions can read/write its value.


After completing the /shownum route, curl the route with:

curl localhost:<port>/shownum

You should see something like:

Your number is 0

Congrats! You officially got a web server running! Now, you have all the tools to implement the other functions of number-server.

3.2 Query parameters:

This section might be useful once you start working on the /add endpoint.

When sending a GET request in HTTP, any data accompanying the request needs to be sent as part of the path. This data is encoded as query parameters which follow a ? in the path and usually take the form of

?keyA=foo&keyB=bar&...

For the case of the /add endpoint, we need to encode the amount to add as part of the query string. A valid request to /add looks like

/add?value=<number>

Parsing query parameters correctly can be tricky, but in this case we only have one parameter to worry about, which makes it easier.

Task:

  1. What string function could we use to scan through our path and extract the number?
  2. Try using this function to print out the number after value= from the url path
  3. Implement the /add endpoint for your number-server
  4. In your notes: Put a screenshot of your number-server working with the endpoints you implemented.
  5. In your notes: Think of a request you could make that would cause your numeric-server to crash.

Week 6 – Work on PA3 (Chat Server)

Work on your PA3! Your tutors are here to help.

You are welcome to work with your classmates, share ideas, and so on.

In your notes doc, you should be able to show by the end of the lab:

  • You loading a URL via curl or browser a request on one of your classmates' chat servers that shows adding at least one chat, or part of a chat.
  • One of your classmates loading a URL via curl or browser on your chat server that shows adding at least one chat, or part of a chat.

If you think you're all the way done with PA3, here are some suggestions:

  • Implement threads in your chat server.
  • Right now the chat server allows anyone to post with any username. Brainstorm ways to have a user who uses a username first “claim” it somehow, so others can't. HINT: add a path for “registering” a username and share a secret between the server and the user to recognize them in the future.

Week 7 - More GDB + Valgrind

Welcome to Week 7

The goal of this weeks' lab is to have you perform various interesting tasks involving using GDB and Valgrind!

To get started, login to ieng6 and clone this repository which contains the activities for today's lab and cd into the lab7-starter directory.

If you see the following upon running ls, you're ready to get started!

[ygarde@ieng6-203]:lab7-starter:505$ ls
README.md  ctf  gradebook  valgrind

Lab Goals

  1. Crack the password
  2. Change your Grade with Buffer Overflows
  3. Addressing your Memory Errors

Crack the Password

The relevant files for this part of the lab are found within the password-crack directory in the lab7-starter repository you cloned earlier.

You are provided with an executable binary that performs a similar task to your password cracker from PA2. It randomly selects a word as the password (this word is unique to your student ID) and displays the hash of this word and then prompts you to guess the password. You are also given the main.c file that calls the function that generates the secret word and hashes it. However, you do not have the source code for the word generator itself.

Your goal is to crack the password given only the hash and the main.c file.

./ctf
Generating and hashing secret word...
the hash is 229478424054275
What is the password?: 
yash
Password not found, Try again 
hello
Password not found, Try again 
return
Password not found, Try again
.
.
.

TASK:

  1. Using tools you have learned about in CSE 29, determine what your random secret word is and crack the password!

In your notes: Describe the tool(s)/process you used and provide a screenshot of you cracking the password!

Change your Grade with Buffer Overflows

The relevant file for this part of the lab is found within the gradebook directory in the lab7-starter repository you cloned at the start of lab.

In this directory you will find an executable file called gradebook. Running this program will display your (fictional) assignment grade for Assignment 1 and ask whether you would like to leave a comment for a regrade:

./gradebook
Welcome to your gradebook!
=========== ASSIGNMENT 1 SUBMISSION ===========
Due Date : 10/24
Grade    : D (65/100)
===============================================

If you'd like to leave a comment on your submission, please enter it here (or Ctrl + D to end):
> 

After entering a comment, you will see an updated assignment entry printed out with your comment and the option to edit the comment:

If you'd like to leave a comment on your submission, please enter it here (or Ctrl + D to end):
> please regrade my submission
=========== UPDATED ASSIGNMENT 1 SUBMISSION ===========
Due Date : 10/24
Grade    : D (65/100)
Comment  : please regrade my submission
=======================================================

If you'd like to edit your comment, please type your new comment here (or Ctrl + D to end):
> 

A confidential source has informed you of a vulnerability in this gradebook that allows you to change your grade for the assignment if you provide an appropriate comment. This source has also given you the following screenshot of the source code to help you:

gradebook.c Source Code:

gradebook source code

TASK:

  1. Using the information above and tools you have learned about in CSE 29, determine how you would change your grade in Assignment 1 to an A.
  2. Once you have successfully changed your grade to an A, change your percent to reflect your new grade as well, a 95% sounds about right!

In your notes: Describe how you changed your letter grade and percentage for the assignment. What properties of the program did you exploit for this? Compare your approach to that of your group members and discuss any differences in your approaches.

Addressing your Memory Errors

The relevant files for this part of the lab are found within the valgrind directory in the lab7-starter repository you cloned earlier.

In this directory, you will find a Makefile, jstr.c and arraylist.c which contain the implementations of the String and List structs you have seen in lecture. In addition to these files, there are 6 additional C files labeled q1-q6.

Each of these q files implements function(s) on the String and List structs but with one or more memory related errors in them. Your task is to use valgrind to determine these errors and fix them. The files are numbered in increasing order of difficulty, so it is recommended to start with q1_expand_list.c and work your way up.

TASKS:

  1. Open jstr.c and arraylist.c and familiarize yourself with the implementations of the String and List structs.

  2. Run the make command to compile all the buggy C files. You should now have 6 executables in your directory. (Check this by using ls).

  3. For each program (starting with q1), run it both with and without valgrind to see what happens. Use the program output (whether it crashed or not) and valgrind to determine where the error may be and edit the C file to fix it. Remember to use --leak-check=full to get more detailed information about memory leaks.

    NOTE: The names of the files as well as the function names within the files are hints for where the error may be.

    NOTE: All the changes should be in the buggy_ functions found in each file. The main function should not need to be modified.

  4. Once you have fixed the error(s) in a file, run the program with valgrind again to ensure that there are no memory leaks or other memory-related errors. Your valgrind output should look like: valgrind output All heap memory should be freed and there should be no errors reported in the ERROR SUMMARY.

  5. When you determine the bug in each of the files, describe the error you found and how you fixed it in your notes. Provide a screenshot of the valgrind output for the fixed program.

Week 8 – Work on PA4 (Malloc)

Work on your PA4! Your tutors are here to help.

You are welcome to work with your classmates, share ideas, and so on.

In your notes doc, by the end of lab please include a screenshot of you using the various functions provided to you in the starter code:

  • vminit() to initialize the heap to a certain size
  • vmload() to load one of the images as the heap
  • vminfo() to print the heap layout

along with your understanding of these functions.

Week 8 Lab Quiz Answers

Week 9 - More UNIX Commands

Welcome to Week 9

The goal of this week's lab is to introduce you to various new and interesting UNIX commands and features to help you get more comfortable with working on a terminal environment!

To get started, login to ieng6 and fork and clone this repository which contains the various files needed for the lab and cd into the lab9-starter directory.

If you see the following upon running ls, you're ready to get started!

~/cse29/lab9-starter$ ls
Makefile  README.md  message.txt  pokemon  problem.c

Lab Goals

  1. Searching and Filtering Program Output
  2. Flow like Water with Pipes
  3. Pokemon Mailtime!

Viewing and Redirecting Program Output

In the starter code, you are given a problem.c source code file. This program creates a large 2-D array, fills it with numbers, and prints them out. Fortunately, whoever wrote this was thoughtful enough to write code to check for an out-of-bounds error. Unfortunately, whoever wrote this was not thoughtful enough to debug the program. We’re going to use this program as an example of how the following UNIX commands can be used to parse a lot of program output easily.

Compile and run problem to see exactly what “a lot of program output” means.

$ make problem
$ ./problem

You will notice that the program occassionally prints out an error amidst all the other output of the program. For programs with more errors and more output, you could imagine how we wouldn’t want to keep scrolling up the terminal to read. There are multiple ways in which we can filter out everything but the error messages.

Say less

During the quarter, you have used input redirection (<) and output redirection (>) operators to read from and write to files. Let’s first save the output of the program to a text file:

$ ./problem > problem.txt

Note: output redirection (>) will replace the contents of the specified file. If you'd rather append to it instead, use >>.

To view larger files like this, we can use the less command to be able to scroll through and/or search their contents.

$ less problem.txt

TASK:

  1. Redirect the output of ./problem to a file problem.txt
  2. Use less to view the contents of problem.txt; try using j/k to scroll around.
  3. Hit h while in less to show the help screen, and skim through some of the commands (note: ^ refers to the Ctrl key)
  4. Use the help screen to find how to search for a pattern, and how to repeat that search.
  5. IN YOUR NOTES: Provide a screenshot of the output of less with the search results for error (it should be highlighted) .

grep

The grep command searches files for lines that match a pattern. grep takes in two arguments: the string to match and a file to read from.

$ grep error problem.txt

We can also run grep with only one argument, e.g. grep error, in which case grep will read input from stdin instead of from a file. This means we can also write an equivalent command using input redirection, like so:

$ grep error < problem.txt

This grep command looks similar to the first, but here "problem.txt" is not passed as an argument to grep; we'll see how this can be useful in part 2 of this lab.

TASK:

  1. We have provided an additional file named message.txt file with more program output, but some lines were written way out of bounds at column index 42.
  2. Use less to skim through the contents of message.txt
  3. Use grep to search for all the lines that contain the string 42 in message.txt.
  4. IN YOUR NOTES: Provide a screenshot of your search results and the message that was hidden in the out of bounds entries.

Output Streams

In the examples above, we use redirection to manipulate the standard output (stdout) of a program. By default, printf() outputs to this stdout stream, which we illustrate above. What these diagrams do not show is that there actually exists another output stream called standard error (stderr).

To use stderr, we use an alternative to printf(), fprintf(), and specify that it should output to the stderr stream. In problem.c, this line is given to you and commented out. Uncomment this line, and comment the line above (the original print statement). Recompile the program.

TASK: Write your observations for each step in your notes and compare!

  1. Edit problem.c to print out errors to stderr instead of stdout and recompile.
  2. Run the program ($ ./problem) and observe the output. Do you notice anything different?
  3. Run the program with output redirection to a file ($ ./problem > problem.txt) and observe the output. What do you notice? What is the difference between what's in the file and what's on the terminal?
  4. Run the program with output redirection to a file ($ ./problem 2> errors.txt) and observe the output. What do you notice?

UNIX uses 1> implicitly as the default redirection operator when we just use >. We can read this as “redirect to the stream with file descriptor 1”, where the stream with file descriptor 1 is stdout. Similarly, 2> redirects to the stream with file descriptor 2, which is stderr.

We can also use two redirections in conjuction with each other to redirect different streams to different files in one command.

$ ./problem > problem.txt 2> errors.txt

Or we can redirect both streams to the same file.

$ ./problem > output.txt 2>&1

where this syntax is read as “redirect the stream with file descriptor 2 to the same place as the stream with file descriptor 1”.

Flow like Water with Pipes

Before starting, revert the fprintf() line in problem.c back to the printf() that you commented out!

The searching and filtering methods we have seen thus far require that the output of a program be saved to a file before we can manipulate it. However, there are times when we want to filter the output of a program without saving it to a file. This is where pipes come in!

Within UNIX, the pipe operator (|) is used to directly feed output from one program as input into another program. We can pipe directly from the problem program into either less or grep to accomplish the same result as running these commands on the problem.txt file:

$ ./problem | less
$ ./problem | grep error

We can compound indirection, redirection, and pipe operators to create helpful commands. Understanding how these operators work in conjunction with each other can be understood as an analogy to actual pipes, rendered below in immaculate ASCII art.

  • By default, print statements in programs output directly to the terminal interface. So this command:

    $ ./problem
    

    looks like this:

    +-----------+
    |           |==
    | ./problem | ~ ~ ~ ~ ~> terminal
    |           |==
    +-----------+
    
  • By using the redirection operator, we install an elbow pipe to redirect the flow of output into a file. The output no longer flows into the terminal.

    $ ./problem > problem.txt
    
    +-----------+  redirect
    |           |==------+    
    | ./problem | ~ ~ ~  |   terminal
    |           |==--+ $ +
    +-----------+    | $ |
                     | v |
                 problem.txt
    
  • By using the pipe operator, we install a coupling between the output and input of two programs. Without any other pipes at the end, the output of the second program flow into the terminal.

    $ ./problem | grep error
    
    +-----------+   pipe    +-----------+
    |           |==-------==|           |==
    | ./problem | ~ ~ ~ ~ ~>|   grep    | ~ ~ ~ ~ ~> terminal
    |           |==-------==|           |==
    +-----------+           +-----------+
    
  • By using both piping and redirection, we can chain multiple commands and file writing actions. In this command, we run the problem program, filter its output with grep, and output the filtered lines into a file. This file now contains the error messages from problem, and can serve as an error log. IN YOUR NOTES: Does this output to the terminal as well?

    $ ./problem | grep error > errors.txt
    
    +-----------+   pipe    +-----------+  redirect
    |           |==-------==|           |==------+
    | ./problem | ~ ~ ~ ~ ~>|   grep    | ~ ~ ~  |   terminal
    |           |==-------==|           |==--+ $ +
    +-----------+           +-----------+    | $ |
                                             | v |
                                         errors.txt
    

Having your tee and Drinking it too

Redirection lets us write to a file. Not having redirection lets us directly see the output in the terminal. Why don’t we have both?

By using the tee command, we can simultaneously write output to a file and and continue flowing data rightward, usually to the terminal. tee is a command, not an operator like redirection and piping, so we must pipe into tee first. tee takes a filename as an argument, which it outputs to.

$ ./problem | tee problem.txt
+-----------+   pipe    +-----------+
|           |==-------==|           |==
| ./problem | ~ ~ ~ ~ ~>|   tee     | ~ ~ ~ ~ ~> terminal
|           |==-------==|           |==
+-----------+           +-----------+
                            | $ |
                            | v |
                         problem.txt

We can continue to construct longer pipe systems to feed output through multiple commands. This command runs problem, filters its output with grep, then outputs to both a file and the terminal.

$ ./problem | grep error | tee errors.txt
+-----------+   pipe    +-----------+   pipe    +-----------+
|           |==-------==|           |==-------==|           |==
| ./problem | ~ ~ ~ ~ ~>|   grep    | ~ ~ ~ ~ ~>|   tee     | ~ ~ ~ ~ ~> terminal
|           |==-------==|           |==-------==|           |==
+-----------+           +-----------+           +-----------+
                                                    | $ |
                                                    | v |
                                                  errors.txt

Pokemon Mailtime!

The mail command, unlike other commands we’ve taught you in this lab and previous ones, is especially unique: literally no one uses this! As such, this section is not relevant to any course material. But the idea of sending each other mail via the terminal, all 1970s-core, is too appealing to pass up on.

Using the mail Shell

Throughout this section, we’ll use myname and friendname to refer to your and your partner’s UCSD username, respectively. This is the username you use for your UCSD email, and the one you use to log into ieng6.

In order for mail to work, you and whoever you are mailing to must be on the same cluster on ieng6. If each line of your command prompt starts with this:

[myname@ieng6-203]:~:500$

then ieng6-203 is the cluster you are logged into. In order to SSH into a specific cluster, exit out of your current SSH session, and in your local machine terminal, type:

$ ssh myname@ieng6-203.ucsd.edu

You can choose between clusters 201, 202, and 203, as long as you make sure you and your partner are in the same cluster.

Once you and your partner are in the same cluster, try using the mail command to begin composing an e-mail:

$ mail friendname@ieng6-203.ucsd.edu

This will prompt you with Subject: so you can type your email’s subject. The subject is only one line of text, so when you press Enter you are now typing the contents of your mail. When done writing your message, press Ctrl + D to finish and send.

Now, your partner can use the mail command by itself to see that they have received mail! It will have a number next to it as it enumerates messages every time you open the mail shell. Type this number and press Enter to see the message!

While in the mail shell, you can type ? to see a list of commands you can use to navigate your mail.

You will likely not need to use all of these commands but at least a few are worth noting:

  • type <message list> can be used to print out the contents of selected messages. For example, type 3 4 prints out the contents of messages 3 and 4.
  • headers will print out the list of enumerated messages along with their senders, subjects, and statuses.
  • You can respond to a message with Reply <message list>, which will open a response to the message to type in a reply. Again, you can finish and send with Ctrl + D. For example, Reply 5 opens a response to message 5.

Sending Mail from outside the mail Shell

There also exists another method to send mail from outside the mail shell, using the pipe operator (|) we learned earlier.

This command also makes use of the echo command, which outputs its argument to stdout (echo "hello" will print hello to your terminal!). Sounds redundant, but it’s intended to be used in this way to input strings into other commands, or to be used as print statements in bash scripts.

$ echo “email body here” | mail -s “subject here” friendname@ieng6-203.ucsd.edu

Trading Pokemon

For this section, you will be using the files within the pokemon directory of the starter code. This directory contains a pokeget.sh file which will generate a Pokemon when given its Pokedex number (don't worry if you don't know what that is, it's just a unique number for each Pokemon). The format to use the script is as follows:

$ ./pokeget.sh <Pokedex number> > <filename>.pk

Sender Tasks

  1. Use the pokeget.sh script to generate a Pokemon of your choice and save it to a file. You can do this by running the script with the Pokedex number of the Pokemon you want to generate. For example, to generate Pikachu, you would run ./pokeget.sh 25 > pikachu.pk.
  2. Once you have generated your .pk file, use the following command to send your it to your partner, using the -a option. Replace pikachu.pk with the name of your Pokemon file if it is different.
$ echo "email body here" | mail -s "subject here" -a pikachu.pk friendname@ieng6-203.ucsd.edu

Receiver Tasks

  1. Open your mail shell from within the pokemon directory and check for new mail. You should see the email with the Pokemon file attached.
  2. Save the email to a file in the pokemon directory by typing save <message number> pikachu.email. For example, if the message number is 3, you would type save 3 pikachu.email.
  3. Extract the attachment to retrieve the Pokemon that you were sent using the provided script:
$ ./extract_pokemon.sh pikachu.email > pikachu.pk
$ cat pikachu.pk

NOTE: the filenames here aren't official or necessary, we just picked them to make it easier to keep track of what kinds files things are

If everything went well, you should see get the Pokemon your partner sent you! Have a bit of fun with this and send each other some cool Pokemon.

PA3 Resubmission - Web Server

Resubmission instructions

If you want to resubmit PA3, please read this section carefully. You need to pass all the tests in the original PA3, while also implementing an extra functionality and answering a new design question described below.

In addition to /chats, /post, /react, /reset requests, the chat server also listens for the following request:

/edit

/edit?id=<id>&message=<message>

Edits the message in the post with the given id (the ids are the #N at the beginning of posts) by replacing it with the new message. It must respond with the list of all chats (including the chat with the replaced message).

Limits and constraints:

  • If the id is not the ID of some existing chat, respond with some kind of error (HTTP code 400 or 500)
  • If a parameter ( id or message) is missing, respond with some kind of error (HTTP code 400 or 500)
  • If message is longer than 255 bytes, respond with some kind of error (HTTP code 400 or 500)

Updated DESIGN questions for the resubmission

We recommended representing chats and reactions as structs. Another option would be to just represent every chat as the string of text that gets printed (including the id number, timestamp, username, message, any reactions, etc). This representation would be a single char* per chat. Then, the list of all chats would be a char**. Adding a reaction would append a new reaction to the string for that chat.

  1. What is one good and one bad thing about this alternate design? For this part, consider only the original implementation without /edit.
  2. How would this char** design make it harder to add the /edit feature in your code?

Web Servers and HTTP

HTTP is one of the most common protocols for communicating across computers. At the systems programming level, this means using system calls (usually in C) to tell the operating system to send bytes over a network.

One nice feature of HTTP is that it is a primarily text-based protocol, which makes it more straightforward for humans to read and debug. It is also well-understood by web browsers and programs like curl, making it easy to test and connect to user-facing devices.

It's useful to get experience with the format of HTTP, and with using system calls in C to manipulate HTTP requests.

Task – Chat Server

In this programming assignment, you'll write a C program to implement a chat room (think a plain-text version of Slack or Discord).

It's best to complete the PA on ieng6, because it gives a consistent testing environment for the live server.

You can also use the client from Lab 5 to try out your server.

Your programs should compile and run with:

$ make chat-server
$ ./chat-server <optional port number>

The server should start with ./chat-server and print a single message:

$./chat-server
Server started on port PPPPP

If a port number was provided, it should use that port, otherwise it should print an open port that was selected.

It should continue running, listening for requests on that port, until shutdown with Ctrl-c. It can print any other logging messages or other output needed to the terminal.

The requests the chat server listens for are described in this section

/chats

A request to /chats responds with the plain text rendering of all the chats.

The rendered chat format is

[#N 20XX-MM-DD HH:MM]   <username>: <message>
                    (<rusername>)  <reaction>
                        ... [more reactions] ...
... [more chats] ...

Where:

  • #N is # followed by a number like #5 or #33, where the integer is the id of the chat (ids start at 1 and count up).
  • YYYY-mm-dd HH:MM is the timestamp of the chat (when it was sent).
    • YYYY is the year, like 2024
    • mm is the two-digit month, like 10 for October
    • dd is the two-digit day, like 31
    • HH is the two-digit hour in 24-hour format, like 14 for 2pm
    • MM is the two-digit minute, like 55
  • <username> is the username of the user who sent the chat
  • <message> is the text of the message the user entered
  • <rusername> is the name of a user who reacted to the message
  • <reaction> is a reaction to a message

In terms of alignment and spacing:

  • There should be at least one space between the number and the timestamp
  • There should be at least one space between the closing ] and the username
  • There should be no space right after the username in a chat, it should be immediately followed by a colon
  • There should be a single space after the colon before the message
  • There should be some space before the ( in a reaction, no space between the () and the username, and some space after the ) before the reaction message

You can put in whatever effort you like into lining things up nicely within these constraints, but these are the requirements.

So an example chats rendering might look like:

[#1 2024-10-06 09:01]         joe: hi aaron
[#2 2024-10-06 09:02]       aaron: sup
[#3 2024-10-06 09:04]         joe: working on the example chat for the PA
[#4 2024-10-06 09:06]       aaron: oh cool what should it say
[#5 2024-10-06 09:07]         joe: I dunno we could go pretty meta with it? I pushed an example go check it out. like a chat about the chat
[#6 2024-10-06 09:10]       aaron: eh kinda lame tbh
[#7 2024-10-06 09:11]         joe: whatever I already wrote it, going with it as-is
[#8 2024-10-06 09:12]       aaron: ok but make sure we don't look like jerks
[#9 2024-10-06 09:12]       aaron: or at least not me
                            (joe)  👍🏻 
[#10 2024-10-06 09:12]         joe: good talk

Here's another:

[#1 2024-10-24 13:01]        yash: hi all! react with 👍🏻 if you think the lab is good to go, or 😬 if you think it needs more testing
                          (aaron)  👍🏻 
                          (elena)  😬 
                         (arunan)  😬 
                          (janet)  😬 
                         (travis)  😬 
                            (joe)  🐿️
[#2 2024-10-24 13:02]        yash: OK we'll go with what joe said

/post

A post request looks like this:

/post?user=<username>&message=<message>

This creates a new chat with the given username and message string with a timestamp given by the time the request is received by the server. It must respond with the list of all chats (including the new one).

Limits and constraints:

  • If a parameter (username or message) is missing, respond with some kind of error (HTTP code 400 or 500)
  • If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If message is longer than 255 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If a post would make there be more than 100000 (one hundred thousand) chats, the server should respond with an error (HTTP code 404 or 500)

/react

/react?user=<username>&message=<reaction>&id=<id>

Creates a new reaction to a chat by the given username with the given message string, reacting to the post with the given id (the ids are the #N at the beginning of posts). It must respond with the list of all chats (including the new one).

Limits and constraints:

  • If the id is not the ID of some existing chat, respond with some kind of error (HTTP code 400 or 500).
  • If a parameter (username or message or id) is missing, respond with some kind of error (HTTP code 400 or 500)
  • If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If message is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500) – reactions are intended to be short!
  • If a reaction would make a chat have more than 100 reactions, the server should respond with an error (HTTP code 404 or 500)

/reset

A /reset request resets the chat server to have no chats or reactions, starting from the empty initial state. Should respond with a successful HTTP response with an empty body.

It should be possible to /reset the room many times, and after resetting the memory usage of the program should be the same as in the empty initial state.

After a reset, it should be possible to immediately shut down the program and have valgrind report no memory leaks.

Design Questions (You do not need to answer these for the resubmission)

  1. How much working memory do 10 chats take in your program, in between processing requests (assume no one has reacted to them)? How about 100? 1000? We can call this part of the memory your program uses the chat storage. How much additional working memory is used when handling the /chats request in your code for 10, 100, and 1000 chats? Speculate on ways to lower either the chat storage or the memory used to process a request, or explain why your approach minimizes them.

  2. The PA writeup specifies many limits and constraints on the input (usernames, messages, ids, and so on). Choose one of these limits or constraints, and imagine removing it. What's something that would now work and might make a user happy because the limit is removed? What's a problem that could result from removing that limit? What impact would it have on your implementation to remove it (any new possibilities of errors, new cases to handle, etc)?

Implementation Guide

This page is the entire specification for the assignment; it's what you need to implement. You are free to make whatever choices you like in your code within these constraints. To help you on your way, we have an implementation guide:

Handin

  • Hand in your repository on Gradescope
  • Make sure make chat-server builds your server (that's the command we will run), and the server runs on a predictable port with, for example, ./chat-server 8000
  • Don't forget DESIGN.md and CREDITS.md

Implementation Guide: The http-server Library

We've provided you with http-server.c and http-server.h. This is code for you to treat as a library, you shouldn't change it at all (though you may learn from reading it!).

It provides one function for you to use:

void start_server(void(*handler)(char*, int), int port);

This function takes two arguments: a handler that takes HTTP requests and writes responses, and a port for telling the server what address to use. The port can be 0 to ask the operating system to choose one for us.

The type for handler is the type of a function pointer, which is the address of a function. To use start_server, we define a function (it can have any name) with the appropriate signature and pass it into start_server as an argument. The handler function takes two arguments:

void(*handler)(char*, int)
  • The first, the char*, is a string containing the request from the internet. It will be an HTTP request.
  • The second, the int, is a file descriptor that our program can use to write a response using the write system call

Here's a straightforward example that just responds to the user with a constant string:

#include "http-server.h" // http-server includes a lot of other useful things
char const* HTTP_200_OK = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n";
void handle_response(char *request, int client_socket) {
    printf("The user sent a request with: %s\n", request);
    write(client_socket, HTTP_200_OK, strlen(HTTP_200_OK));
    write(client_socket, "Hello!", 6);
}
int main() {
    start_server(&handle_response, 0);
}
$ gcc -o print-request print-request.c http-server.c
$ ./print-request
Server started on port PPPPP (will be a specific number for you)
# the lines below print *when the request is received*
The user sent a request with: GET /post?user=joe&message=hi HTTP/1.1
Host: localhost:36611
User-Agent: curl/7.68.0
Accept: */*
The user sent a request with: GET /post?user=aaron&message=sup HTTP/1.1
Host: localhost:36611
User-Agent: curl/7.68.0
Accept: */*

(in another terminal)

$ curl "http://localhost:PPPPP/post?user=joe&message=hi"
Hello!
$ curl "http://localhost:PPPPP/post?user=aaron&message=sup"
Hello!

Then, every time the server gets a request, it will call the handle_request function, which will print the contents of the request, and then use the write system call to send a response back to the client (in this case with a constant string). This is the entry point to our program, and it's a good starting point for you to build up from!

One other tip for using start_server:

  • If port is 0, the library will pick an open port provided by the operating system. This is convenient for getting started
  • However, when you restart a program it's annoying to have to change the port number in any requests you have in your bash history, etc. So you can also provide the port number as an argument, which could be hardcoded or taken from a command-line argument. You can do that to re-run the program with the same port it used last time to keep your examples consistent

HTTP Requests

The first argument to handler is a char* with a a reference to the HTTP request. For our chat-server, all requests will be GET requests and will look something like

GET /post?user=joe&message=hi HTTP/1.1
Host: localhost:36611 User-Agent:
curl/7.68.0 Accept: */ *

The string after GET is the path of the request, and that's the part that will change depending on the type of request we get. Part of the job of the chat-server will be to extract the relevant information from the path, like which action to take (post, react, etc.) and the parameters of the action (user, message, etc.) in order to work with them.

It probably makes sense to use some C string library functions to do this. In particular strtok will allow you to “split up” a string by replacing delimiters like , &, and = with null terminators. You may also find strcspn, strstr, or strpbrk useful; it's up to you to come up with a plan to put these together to extract the information out of each request.

HTTP Responses

The second argument to handler is an int that is a file descriptor for the response. The http-server library has set things up so writing to the socket sends bytes directly back to whatever client made the request.

The format of a HTTP response for a successful response is

HTTP/1.1 200 OK
Content-Type: text/plain

... response body ...

Note: A nuance of this is that the newlines in HTTP responses are not just \n; they are required to be \r\n (which, interestingly, is how line endings work on Windows). Some software will work with just a \n for line breaks, but the correct thing to do is use \r\n, so in a string literal the HTTP header above would look like

"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n...response body...

There are many other Content-Types in the world other than text/plain! But we will focus on just plain text reponses for this assignment.

Working Incrementally: Data vs. Requests

One way to break down the work the server needs to do is:

  • Parsing and interpreting requests (is the request a new post, a reaction, etc)
  • Updating the current data (chats and reactions) based on the parameters in the request
  • Responding to requests based on the current state of the data (chats and requests)

One way to work incrementally is to separate the data handling and the request handling parts into different functions.

The data handling functions can be tested with asserts, and the request handling can be tested with curl or a client.

We think the following functions might be useful for you to implement. In your program you might have slightly different signatures or ideas, but these are a useful starting point. Also, our staff is more familiar with this approach, so it will take us less time to help you in office hours!

Data Handling Functions

These functions can be written and tested without starting a server at all. You could consider having a separate main function in its own file that just tests these!

add_chat

A function add_chat can add a single chat.

uint8_t add_chat(char* username, char* message)

This function might have several tasks:

  1. Update the current id
  2. Get the current timestamp
  3. Create a new chat and fill in its username and message fields
  4. As needed, allocate new space, put the new chat in heap memory, store a reference to it in an array, etc. depending on your specific representation of chats

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

add_reaction

Similar to add_chat, this adds a single reaction:

uint8_t add_reaction(char* username, char* message, char* id)

This function might have several tasks:

  • Use the id to locate the chat that this reaction is for (and maybe return early with an error if the id is invalid/out of range)
  • Create a new reaction and fill in its username and message fields
  • Add the reaction to the chat struct somehow, maybe with newly allocated space, an added element or reference in an array, etc. depending on your specific representation of chats and reactions
  • Update the count of reactions on the referenced chat

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

reset

This function can use free as necessary to clear dynamically-allocated heap memory and reset logical values (like a counter of the number of chats) back to 0.

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results. You can also set up a main function in a fresh file just for checking that after using a sequence of add_chat and add_reaction followed by reset, there are no memory leaks when run with valgrind.

Request and Response Handling Functions

You will definitely need to write a function for handling responses. But the work of handling individual responses can be broken up. One approach could be to get the path and query parameter string from the request and check if it's path is /post, /chats, etc, then pass the string to other functions

respond_with_chats

void respond_with_chats(int client)

This function is reponsible for using write or send to send the response to the client that made the request. It might include:

  • Using snprintf to format strings with data from the timestamp or ids
  • Calling write(client, str, size) on various strings (with the appropirate size) to directly send the data to the client

handle_post

// path is a string like "/post?user=joe&message=hi"
void handle_post(char* path, int client)

This function can have several tasks:

  • Use string functions to extract the username and message from the path
  • Call add_chat to do the data update
  • Call respond_with_chats to send the response

handle_reaction

// path is a string like "/react?user=joe&message=hi&id=3"
void handle_reaction(char* path, int client)

This function can have several tasks:

  • Use string functions to extract the username, message, and id from the path
  • Call add_reaction to do the data update
  • Call respond_with_chats to send the response

Parsing Functions

One crucial task is getting information out of requests. That is, given a string like this:

"GET /post?user=joe&message=hi HTTP/1.1\r\n"

How do we extract the parts that say "joe" and "hi"?

There are a few substeps you might consider, whether as helper functions or just parts of an overall approach:

  • Isolate the path part (the part starting with / and up until the space before HTTP)
  • Search for known strings and delimiters like user= and &
  • Copy parts of the string into new character arrays, paying careful attention to the necessary lengths (some of the limits on requests may be helpful here!). Alternatively, create interior pointers into the path at the appropriate places for the username and message

Remember that Week 4 Monday had a video specifically about strings and interior pointers, and Week 5 Discussions are specifically about parsing requests!

Representing Chats

Chats and reactions both have multiple fields, so a natural choice is to represent both chats and reactions as structs.

A chat has several components, which may be good candidates for struct fields:

  • The message
  • The username
  • The timestamp
  • The reactions to the message

A reaction has the message content and the user who posted it (no timestamp or reactions-to-reactions), both of which are fixed-size.

You could consider structures like these; what are some tradeoffs? (We've assumed that the program defines some useful constants to avoid repeating specific numbers).

struct Reaction {
    char user[USERNAME_SIZE];
    char message[REACTION_SIZE];
}
struct Chat {
    uint32_t id;
    char user[USERNAME_SIZE];
    char message[MESSAGE_SIZE];
    char timestamp[TIMESTAMP_SIZE];
    uint32_t num_reactions;
    Reaction reactions[MAX_REACTIONS];
}
struct Reaction {
    char *user;
    char *message;
}
struct Chat {
    uint32_t id;
    char *user;
    char *message;
    char *timestamp;
    uint32_t num_reactions;
    Reaction *reactions;
}

These are ideas – some combination of them might work, and they are not necessarily perfect or complete. Some things to think about:

  • Which fields are fixed-size?
  • Which fields can grow?
  • Which fields can change?
  • What are limits for them described in the specification?

Other Helpful Functions

This PA explores several features that are straightforward to use, but there are many of them. We might add more to this list as the PA goes on! Here are a few functions you'll probably find useful; try man on them, or follow the links, or do your own searching and research. Don't forget all the functions from class (e.g. malloc and other allocation functions, strstr and other string manipulation functions, and so on). This list is mainly focused on things we haven't tried in class.

  • atoi: convert char* to integer

    For example:

    #include <stdio.h>
    #include <stdlib.h>
    
    int main() {
      char numeric[] = "123";
      int asnum = atoi(numeric);
      printf("%d\n", asnum * 2);
    }
    
  • Time functions:

    • time: get the current time

    • localtime: convert the time to the current local time zone

    • strftime: print the time in a given format

      For example:

      #include <time.h>
      #include <stdio.h>
      
      int main() {
          char buffer[100];
          time_t now = time(NULL);
          struct tm *tm_info = localtime(&now);
          strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", tm_info);
          printf("%s", buffer);
      }
      

PA4 Resubmission – Malloc

This assignment is thanks to the staff of CSE29 spring 2024, especially Gerald Soosairaj and Jerry Yu.

Resubmission instructions

If you want to resubmit PA4, please read this section carefully. You need to pass all the tests in the original PA4, while also implementing an extra functionality and answering a new design question described below.

vminfo

In utils.c, we have implemented the function vminfo(), which traverses through the heap blocks and prints out the metadata in each block header. Update the function so that at the end of its output, it prints details about the largest available block as follows:

The largest free block is #x with size y.

where x is the blockid starting from 0, and y is the size (including header) of the largest available block for allocation. If there are multiple candidates for the largest block, x should be the smallest blockid of the candidates. Refer the following updated output of vminfo as an example:

vminit: heap created at 0x707df3026000 (4096 bytes).
vminit: heap initialization done.
---------------------------------------
 #      stat    offset   size     prev   
---------------------------------------
 0      BUSY    8        48       BUSY   
 1      FREE    56       32       BUSY   
 2      BUSY    88       112      FREE   
 3      BUSY    200      96       BUSY   
 4      FREE    296      208      BUSY   
 5      BUSY    504      64       FREE   
 6      BUSY    568      320      BUSY   
 7      FREE    888      208      BUSY   
 8      BUSY    1096     160      FREE   
 9      BUSY    1256     368      BUSY   
 10     BUSY    1624     272      BUSY   
 11     FREE    1896     672      BUSY   
 12     BUSY    2568     128      FREE   
 13     BUSY    2696     464      BUSY   
 14     FREE    3160     672      BUSY   
 15     BUSY    3832     160      FREE   
 16     FREE    3992     96       BUSY   
 END    N/A     4088     N/A      N/A    
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 11, Total: 17
The largest free block is #11 with size 672

In this example, there are six FREE blocks with the two largest FREE blocks being 672 bytes in size: the output line shows the index of the 672 sized block that comes first.

Updated DESIGN question for the resubmission

In our implementation, when freeing an allocated block, we coalesce it with the previous and next blocks in the heap if they are free - to make one larger free block.

  1. Consider an updated implementation where in case of freeing, we only coalesce it with exactly 1 next block in the heap if it is free. Give an example of a program (in C or pseudocode) where all the allocations succeed in the current design (like in this PA), but some allocations would fail with the updated freeing strategy.

In class, quizzes, and PAs we've _used_ `malloc` and `free` to manage memory. These are functions [written in C](https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c)! `glibc` contains one of the [frequently used implementations](https://sourceware.org/glibc/wiki/MallocInternals).

There are a lot of details that go into these implementations – we don't expect that anyone could internalize all the details from the documentation above quickly. However, there are a few key details that are really interesting to study. In this project, we'll explore how to implement a basic version of malloc and free that would be sufficient for many of the programs we have written.

Specfically, we will practice the following concepts in C:

  • bitwise operations,
  • pointer arithmetic,
  • memory management, and of course,
  • using the terminal and vim.

Task Specification

You will implement and test two functions.

vmalloc

void *vmalloc(size_t size);

The size_t size parameter specifies the number of bytes the caller wants. This has the same meaning as the usual malloc we have been using: the returned pointer is to the start of size bytes of memory that is now exclusively available to the caller.

Note the void * return type, whic is the same as the usual malloc. void * is not associatd with any concrete data type. It is used as a generic pointer, and can be implicitly assigned to any other type of pointer.

This is how malloc (and our vmalloc) allow assigning the result to any pointer type:

int *p = malloc(sizeof(int) * 10);
char *c = malloc(sizeof(char) * 64);

If size is not greater than 0, or if the allocator could not find a heap block that is large enough for the allocation, then vmalloc should return NULL to indicate no allocation was made.

Your implementation of vmalloc must:

  • Use a “best fit” allocation policy
  • Always return an address that is a multiple of 16
  • Always allocate space on 16-byte boundaries

More details are in the implementation guide.

vmfree

void vmfree(void* ptr)

The vmfree function expects a 16-byte aligned address obtained by a previous call to vmalloc. If the address ptr is NULL, then vmfree does not perform any action and returns directly. If the block appears to be a free block, then vmfree does not perform any action and returns.

For allocated blocks, vmfree updates the block metadata to indicate that it is free, and coalesces with adjacent (previous and next) blocks if they are also free.

More details are in the implementation guide.

Design Questions (You do not need to answer these for the resubmission)

Heap fragmentation occurs when there are many available blocks that are small-sized and not adjacent (so they cannot be coalesced).

  1. Give an example of a program (in C or pseudocode) that sets up a situation where a 20-byte vmalloc call fails to allocate despite the heap having (in total) over 200 bytes free across many blocks. Assume all the policies and layout of the allocator from the PA are used (best fit, alignment, coalescing rules, and so on)

  2. Give an example of a program (in C or pseudocode) where all the allocations succeed if using the best fit allocation policy (like in this PA), but some allocations would fail due to fragmentation if the first fit allocation policy was used instead, keeping everything else the same.

Submission

You'll submit on Gradescope as usual. Refer to the policies on collaboration if you need to refresh your memory.

We will compile your program using the provided Makefiles, which you should not change.

To get started read through the next few sections about the PA:

Getting Started

The starter code for this assignment is hosted on GitHub Classroom. Use the following link to accept the GitHub Classroom assignment: https://classroom.github.com/a/ngxgHFB3

Just like on PA2/3, clone the repository to ieng6 server.

The Code Base

Unlike previous PAs, this PA comes with starter code that you should read and understand.

The Library

  • vmlib.h: This header file defines the public interface of our library. Other programs wishing to use our library will include this header file.
  • vm.h: This header file defines the internal data structures, constants, and helper functions for our memory management system. It is not meant to be exposed to users of the library.
  • vminit.c: This file implements functions that set up our “heap”.
  • vmalloc.c: This file implements our own allocation function called vmalloc.
  • vmfree.c: This file implements our own free function called vmfree.
  • utils.c: This file implements helper functions and debug functions.

Testing

  • vmtest.c: This file is not a part of the library. It defines a main function and uses the library functions we created. We can test our library by compiling this file into its own program to run tests. You can write code in the main() function here for testing purposes.
  • tests/: This directory contains small programs and other files which you should use for testing your code. We will explain this in more detail in a later section.

Compiling the Starter Code

To compile, run make in the terminal from the root directory of the repository. You should see the following items show up in your directory:

  • libvm.so: This is a dynamically linked library, i.e., our own memory allocation system that can be linked to other programs (e.g., vmtest). The interface for this library is defined in vmlib.h. The Makefiles handle compiling with a .so for you; we set it up this way to match how production systems include libraries like stdlib
  • vmtest: This executable is compiled from vmtest.c with libvm.so linked together. It uses your memory management library to allocate/free memory.
  • The starter code in vmtest.c is very simple: it calls vminit() to initialize our simulated “heap”, and calls the vminfo() function to print out the contents of the heap in a neatly readable format. Run the vmtest executable to find out what the heap looks like right after it’s been initialized! (Hint: it’s just one giant free block.)

Reading the Starter Code

For this programming assignment, we want the addresses returned by vmalloc to always be 16-byte aligned (i.e., at an address ending in 0x0). Since the metadata (block header) is 8 bytes in size, block headers will therefore always be placed at addresses ending in 0x8. To maintain this alignment, the first block header (*heapstart) starts at an offset of 8, and all blocks going forward must have sizes that are multiples of 16 bytes.

Block sizes here refer to the size of the block including the header. The smallest possible block is 16 bytes, and would consist of an 8 byte header followed by 8 bytes of allocatable memory. When free, the last 8 bytes of the block would instead contain the block footer, to be used for coalescing free blocks.

The size_t data type, which you will see frequently throughout this PA, is a 64-bit (8-byte) unsigned integer type.

Begin by reading through the vm.h file, where we define the internal data structures for the heap. This is where you will find the all-important block headers and block footers. Focus on understanding how struct block_header is used. You will see it in action later.

Next, open vminit.c. This is a very big file containing functions that create and set up the simulated heap for this assignment. Find the init_heap() function, and read through the entire thing to understand how it is setting up the heap. There is pointer arithmetic involved, you will need to do similar things for implementing vmalloc and vmfree, so make sure you have a solid understanding of that. (Why is it necessary to cast pointers to different types?)

Just like production allocators, we create a large chunk of memory as our heap using mmap, and allocations/frees are performed in that memory region.

Once you understand what the init_heap() function is doing, open up utils.c. Here we have implemented the function vminfo(), which will be your ally throughout this PA. This function traverses through the heap blocks and prints out the metadata in each block header. You should find inspiration for how to write your own vmalloc function here. Look at how it manipulates the pointer to jump between blocks!

Implementation Guide

This provides detailed implementation information and a suggested workflow.

Incremental Development Tips

  • Run the starter code and understand the output. Always figure out how to run your program first! You can’t do any testing if you don’t know how to run your program.
  • Understand the init_heap() function.
  • Understand the vminfo() function.
  • Begin writing the vmalloc() function:
    • write and test the size calculation code.
    • write and test the best-fit policy. (A helper function would be great here.) See later section for testing images.
    • write and test splitting free blocks.
  • Begin writing the vmfree() function:
    • write and test a basic vmfree() implementation that only frees one block.
    • write and test coalescing with the next block if it is also free.
  • Add block footers / coalescing backwards
    • update and test vmfree() and vmalloc() to write block footers into free blocks.
    • update and test vmfree() to coalesce with the previous block if it is free.
  • Test everything!

Consider some helper functions along the way:

  • A function to get the size of a block in bytes given a pointer to the beginning of it
  • A function to get the size of a block in 8-byte words given a pointer to the beginning of it
  • A function for getting a pointer to the next block given a pointer to the beginning of a block

Implementing vmalloc

Allocation Size Calculation

We need to do some calculations based on the requested size to get the actual block size that we need to look for in our heap.

  • Add 8 bytes for the block header to the requested size;
  • Round up the new size to the nearest multiple of 16.

For example, if the user calls vmalloc(10), we first add 8 bytes for the header to get 18, and then round up to the nearest multiple of 16, which gets us 32. So to allocate 10 bytes, we need to look for a heap block that is at least 32 bytes in size.

Allocation Policy

Many allocation policies are possible – choosing the first block that fits, choosing the last block that fits, choosing the most-recently-freed block that fits, and so on. The one you must implement for this PA is the best-fit policy.

That is, once you have determined the size requirement of the desired heap block, you need to traverse the entire heap to find the best-fitting block – the smallest block that is large enough to fit the allocation. If there are multiple candidates of the same size, then you should use the first one you find.

Splitting Large Blocks

If the best-fitting block you find is larger than what we need, then we need to split that block into two. For example, if we are looking for a 16-byte block for vmalloc(4), and the best fitting candidate is a 64-byte block, then we will split it into a 16-byte block and a 48-byte block. The former is allocated and returned to the caller, the latter remains free.

Updating Block Header(s)

If a new block was created as a result of a split, you need to create a new header for it, and

  • set the correct size.
  • set the status bit to 0.
  • set the previous bit to 1.

And for the block that was allocated, you need to

  • update the size (if splitting happened).
  • set the status bit to 1.
  • go to the next block header (if it’s not the end mark) and set its previous bit to 1.

Returning the Address

After updating all the relevant metadata, vmalloc should return a pointer to the start of the payload. Do not return the address of the block header!

Implementing vmfree

The simplest version of vmfree just finds the header for the given block and resets the block status bit to 0.

Coalescing freed blocks

Just freeing a block is not necessarily enough, since this may leave us with many small free blocks that can't hold a large allocation. When freeing, we also want to check if the next / previous block in the heap are free, and coalesce with them to make one large free block.

If you are coalescing two blocks, remember to update both the header and the footer!

Block footers / Previous Bit

Since a block's header contains its size, we know how far forward to move to get to the next block's header. However, when coalescing backwards, we need to know the size of the previous block to get to its header. To be able to do this without walking the entire list of blocks from the beginning, we write the block size to a footer in the last 8 bytes of the block.

Since footers are only needed during coalescing, we only need to add footers to free blocks; this means that footers don't take up extra space in allocated blocks, and free blocks weren't using that space anyway. However, we then need to make sure we update the "previous block busy" bit correctly so that we don't confuse user data in an allocated block with footer data in a free block.

You will need to update both your vmalloc and your vmfree implementation to add code for creating/updating accurate footers, and making sure the "previous block busy" bit is correct.

Testing

For this PA we provide you with some local testing code to help you make sure your program is working; you will need to write more thorough tests than what we've provided! But we give you a big start

Test Programs

In the repository, you will find the tests/ directory, where we have provided some small testing programs that test your library function.

By coding in to the tests/ directory and running make, you can compile all these test programs and run them yourself. These programs are what we will be using in the autograder as well.

For example, here’s the code for the very first test: alloc_basic.c:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#include "vmlib.h"
#include "vm.h"

/*
 * Simplest vmalloc test.
 * Makes one allocation and confirms that it returns a valid pointer,
 * and the allocation takes place at the beginning of the heap.
 */
int main()
{
    vminit(1024);

    void *ptr = vmalloc(8);
    struct block_header *hdr = (struct block_header *)
                               ((char *)ptr - sizeof(struct block_header));

    // check that vmalloc succeeded.
    assert(ptr != NULL);
    // check pointer is aligned.
    assert((uint64_t)ptr % 16 == 0);
    // check position of malloc'd block.
    assert((char *)ptr - (char *)heapstart == sizeof(struct block_header));
    // check block header has the correct contents.
    assert(hdr->size_status == 19);

    vmdestroy();
    return 0;
}

In this test, we simply make one vmalloc() call, and we verify that the allocation returned a valid pointer at the correct location in the heap.

These tests use the assert() statement to check certain test conditions. If your code passes the tests, then nothing will be printed. If one of the assert statements fail, then you will see an error.

Test Images

When you are writing vmalloc, you may wonder how you can test the allocation policy fully when you don’t have the ability to free blocks to create a more realistic allocation scenarios (i.e., a heap with many different allocated/free blocks to search through).

To help you with that, we have created some images in the starter code in the tests/img directory. In your test programs, instead of calling vminit(), you can call the vmload() function instead to load one of these images.

Our alloc_basic2 program uses one such image. If you open tests/alloc_basic2.c, you will see that it creates the simulated heap using the following function call:

vmload("img/many_free.img");

If you load this image and call vminfo(), you can see exactly how this image is laid out:

vmload: heap created at 0x7c2abd1da000 (4096 bytes).
vmload: heap initialization done.
---------------------------------------
 #      stat    offset   size     prev   
---------------------------------------
 0      BUSY    8        48       BUSY   
 1      BUSY    56       48       BUSY   
 2      FREE    104      48       BUSY   
 3      BUSY    152      32       FREE   
 4      FREE    184      32       BUSY   
 5      BUSY    216      48       FREE   
 6      FREE    264      128      BUSY   
 7      BUSY    392      112      FREE   
 8      BUSY    504      32       BUSY   
 9      FREE    536      112      BUSY   
 10     BUSY    648      352      FREE   
 11     BUSY    1000     304      BUSY   
 12     BUSY    1304     336      BUSY   
 13     FREE    1640     320      BUSY   
 14     BUSY    1960     288      FREE   
 15     BUSY    2248     448      BUSY   
 16     BUSY    2696     256      BUSY   
 17     BUSY    2952     96       BUSY   
 18     BUSY    3048     368      BUSY   
 19     FREE    3416     672      BUSY   
 END    N/A     4088     N/A      N/A    
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 14, Total: 20

You can use this image to test allocating in a more realistic heap.

Three images exist in total:

  • last_free.img: the last block is free,
  • many_free.img: many blocks are free,
  • no_free.img: no block is free (use this to test allocation failure).

Project 5: The Pioneer Shell

Due Friday, December 6, 10:00PM

This assignment is from CSE29 SP24, which has its own list of acknowledgments!

Learning Goals

This assignment calls upon many of the concepts that you have practiced in previous PAs. Specfically, we will practice the following concepts in C:

  • string manipulation using library functions,
  • command line arguments,
  • process management using fork(), exec(), and wait(), and of course,
  • using the terminal and vim.

We'll also get practice with a new set of library functions for opening, reading from, and writing to files.

Introduction

Throughout this quarter, you have been interacting with the ieng6 server via the terminal – you’ve used vim to write code, used gcc and make to compile, used git to commit and push your changes, etc. At the heart of all this lies the shell, which acts as the user interface for the operating system.

At its core, the shell is a program that reads and parses user input, and runs built-in commands (such as cd) or executable programs (such as ls, gcc, make, or vim).

As a way to wrap up this quarter, you will now create your own shell (a massively simplified one of course). We shall call it–

The Pioneer Shell

The PIoneer SHell, or as we endearingly call it, pish (a name with such elegance as other popular programs in the UNIX world, e.g., git).

Interactive Mode

Your basic shell simply runs on an infinite loop; it repeatedly–

  • prints out a prompt,
  • reads keyboard input from the user
  • parses the input into a command and argument list
  • if the input is a built-in shell command, then executes the command directly
  • otherwise, creates a child process to run the program specified in the input command, and waits for that process to finish

This mode of operation is called the interactive mode. The other mode you will need to implement is–

Batch Mode

Shell programs (like bash, which is what you have been using on ieng6) also support a batch execution mode, i.e., scripts. In this mode, instead of printing out a prompt and waiting for user input, the shell reads from a script file and executes the commands from that file one line at a time.

In both modes, once the shell hits the end-of-file marker (EOF), it should call exit(EXIT_SUCCESS) to exit gracefully. In the interactive mode, this can be done by pressing Ctrl-D.

Parsing Input

Every time the shell reads a line of input (be it from stdin or from a file), it breaks it down into our familiar argv array.

For instance, if the user enters "ls -a -l\n" (notice the newline character), the shell should break it down into argv[0] = "ls", argv[1] = "-a", and argv[2] = "-l". In the starter code, we provide a struct to hold the parsed command.

Handling Whitespaces

You should make sure your code is robust enough to handle various sorts of whitespace characters. In this PA, we expect your shell to handle any arbitrary number of spaces ( ) and tabs (\t) between arguments.

For example, your shell should be able to handle the following input: " \tls\t\t-a -l ", and still run the ls program with the correct argv array.

Splitting the arguments is very much like what you did in PA3 parsing URLs. We give a specific suggestion of a function for this PA:

The strtok() function

char *strtok(char *str, const char *delim);

The strtok() function breaks a string into a sequence of zero or more nonempty tokens. On the first call to strtok(), the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str must be NULL. Read that last bit again - strtok is not like other functions we've used!

The delim argument specifies a set of characters that delimit the tokens in the parsed string.

Each call to strtok() returns a pointer to a null-terminated string containing the next token. If no more tokens are found, strtok() returns NULL.

The above paragraphs are taken directly from the Linux manual (except for "read that last bit again". Joe added that). Now let’s see this function in action:

Suppose we have the following string:

char str[32] = "A|BC|D E   |FGH   \n"

And we would like to break it down by any of '|', ' ', or '\n', which would leave us "A", "BC", "D", "E", and "FGH". We can write the following code:

#include <stdio.h>
#include <string.h>

int main()
{
    char str[32] = "  A|BC|D E   |FGH   \n";
    // break down the string at '|', ' ', and '\n' delimiters.
    char *tok = strtok(str, " |\n");
    while (tok) {
        printf("\'%s\'\n", tok);
        tok = strtok(NULL, " |\n");
    }
    return 0;
}

And running this program gives us:

'A'
'BC'
'D'
'E'
'FGH'

Note again that we try to break down (tokenize) the string using three possible delimiter characters: '|', ' ', or '\n', the delim string we pass to strtok() has these three characters: " |\n".

We think you will find strtok() very useful for your implementation!

Built-In Commands

Whenver your shell executes a command, it should check whether the command is a built-in command or not. Specifically, the first whitespace-separated value in the user input string is the command. For example, if the user enters ls -a -l tests/, we break it down into argv[0] = "ls", argv[1] = "-a", argv[2] = "-l", and argv[3] = "tests/", and the command we are checking for is argv[0], which is "ls".

If the command is one of the following built-in commands, your shell should invoke your implmementation of that built-in command.

There are three built-in commands to implement for this project: exit, cd, and history.

Built-in Command: exit

When the user types exit, your shell should call the exit system call with EXIT_SUCCESS (macro for 0). This command does not take arguments. If any is provided, the shell raises a usage error.

Built-in Command: cd

cd should be run with precisely 1 argument, which is the path to change to. You should use the chdir() system call with the argument supplied by the user. If chdir() fails (refer to man page to see how to detect failure), you should use call perror("cd") to print an error message. We will explain the perror() function in a later section.

Built-in Command: history

When the user enters the history command, the shell should print out the list of commands a user has ever entered in interactive mode.

(If you are on ieng6, open the ~/.bash_history file to take a look at all the commands you have executed. How far you’ve come this quarter!)

To do this, we will need to write the execution history to a file for persistent storage. Just like bash, we designate a hidden file in the user’s home directory to store the command history.

Our history file will be stored at ~/.pish_history. (You will find a function in the starter code that help you get this file path.)

When adding a command to history, If the user enters an empty command (0 length or whitespace-only), it should not be added to the history.

When the user types in the history command to our shell, it should print out all the contents of our history file, adding a counter to each line:

▶  history
1 history
▶  pwd
/home/jpolitz/cse29fa24/pa5/Simple-Shell
▶  ls
Makefile  script.sh  pish  pish.c  pish.h	pish_history.c  pish_history.o  pish.o
▶  history
1 history
2 pwd
3 ls
4 history

Note that the number before each line is added by history. The contents of .pish_history should not contain the leading numbers.

Running Programs

If, instead, the command is not one of the aforementioned built-in commands, the shell treats it as a program, and spawns a child process to run and manage the program using the fork() and exec() family of system calls, along with wait().

Refer to the material from Week 9 for information on these system calls.

Excluded Features

Now because our shells are quite simple, there are a lot of things that you may be accustomed to using that will not be present in our shell. (Just so you are aware how much work the authors of the bash shell put into their product!)

You will not be able to

-use the arrow keys to navigate your command history, -use to autocomplete commands, -use the tilde character (~) to represent your home directory, -use redirection (> and <), -pipe between commands (|), -and many more…

Don’t be concerned when these things don’t work in your shell implementation!

If this were an upper-division C course, we would also ask you to implement redirection and piping, but it's [checks notes] week 10 already!

Handling Errors

Because the shell is quite a complex program, we expect you to handle many different errors and print appropriate error messages. To make this simple, we now introduce–

Usage Errors

This only applies to built-in commands. When the user invokes one of the shell’s built-in commands, we need to check if they are doing it correctly.

  • For cd, we expect argc == 2,
  • For history and exit, we expect argc == 1.

If the users enters an incorrect command, e.g. exit 1 or cd without a path, then you should call the usage_error() function in the starter code, and continue to the next iteration of the loop.

The perror() function

void perror(const char *s);

The perror() function produces a message on stderr describing the last error encountered during a library function/system call.

When printing the error message, perror() first prints whatever string s we give it, followed by a colon and a blank, followed by the error message.

Consider an example where we try to open a nonexistent file using fopen(), the call should fail and not return a valid FILE pointer, in which case, we call perror() to report the problem:

FILE *fp = fopen("noexist.txt", "r");
if (fp == NULL) {
    perror("open");
    return EXIT_FAILURE;
}

If we run this program, we would get the following output:

open: No such file or directory

Errors to handle

You need to handle errors from the following system calls/library functions using perror(). Please pay attention to the string we give to perror() in each case and reproduce it in your code exactly.

  • fopen() failure: perror("open"),
  • chdir() failure: perror("cd"),
  • execvp() failure: perror("pish"),
  • fork() failure: perror("fork")

Getting Started

The starter code for this assignment is hosted on GitHub classroom. Use the following link to accept the GitHub Classroom assignment:

Github Classroom: https://classroom.github.com/a/ZT3W6c0Z

The Code Base

You are given the following files:

  • pish.h: Defines struct pish_arg for handling command parsing; declares functions handling the history feature.
  • pish.c: Implements the shell, including parsing, some built-in commands, and running programs.
  • pish_history.c: Implements the history feature.
  • Makefile: Builds the project.
  • ref-pish: A reference implementation of the shell. Note that in this version, the history is written to ~/.ref_pish_history rather than ~/.pish_history, to avoid conflict with your own shell program.

(Note: The reference implementation contains a few inconsistencies with the write-up. For instance, the reference program does not report an error when cding into a nonexistent directory. If such an inconsistency occur, please stick to the writeup!)

Running pish

First, run make to compile everything. You should see the pish executable in your assignment directory.

To run pish in interactive mode (accepting keyboard input), type

$ ./pish

Or, to run a script (e.g., script.sh), type

$ ./pish script.sh

The same applies for the reference implementation ref-pish.

struct pish_arg

In pish.h, you will find the definition of struct pish_arg:

#define MAX_ARGC 64

struct pish_arg {
  int argc;
  char *argv[MAX_ARGC];
};

In our starter code, all functions handling a command (after it has been parsed) will be given a struct pish_arg argument.

Hints

This project once more requires you to think carefully about incremental development. There are many things to, how should you go about everything? In what order? Here’s one possibile plan:

  • Start from simple commands without any arguments, e.g., "ls".
  • After that, start implementing running programs with fork and exec.
  • Implementing input parsing using strtok(). Think about how to break down the line and put it into struct pish_arg.
  • Once command parsing is working, go on to implement the two simple built-in commands: exit and cd. Make sure to take care of error handling.
  • Next, make sure you can parse commands with arguments, e.g., "ls -a".
  • Next, make sure you can handle arbitrary whitespaces.
  • Make sure reading from a script file works just as well as from stdin.
  • Once that’s working, you can finish implementing the history command.
  • The list above is just a suggestion. You are of course encouraged to come up with your own implementation plan. But the most important thing is that you should have a plan!

Start early, start often!

Valgrind

You have a lot of freedom for how to implement this PA. You do not necessarily need to use malloc, but if you use dynamic memory allocation at any point, make sure you free everything properly. We will include a valgrind test when grading your assignment. So make sure there are no memory issues! (Memory leaks, invalid read/writes, using uninitialized variables, etc.)

Submission

  • Submit your code to Gradescope
  • We will run your program with make and ./pish in both interactive and batch mode
  • There are no design questions
  • CREDITS.md is still required