doodle

UCSD CSE29 FA24 Syllabus and Logistics

Basics - Staff & Resources - Schedule - Course Components - Grading - Policies

CSE 29 introduces you to the broad field of systems programming, including 1) the basics of how programs execute on a computer, 2) programming in C with direct access to memory and system calls, 3) software tools to manage and interact with code and programs. All very cool stuff that makes every programmer better!

Basics

Staff Resources

Office Hours Calendar

Schedule

The schedule below outlines topics, due dates, and links to assignments. We'll typically update the material for the upcoming week before Monday's lecture so you can see what's coming.

Week 8 – Allocators and Virtual Memory

Week 7 – Implementing an Allocator

Week 6 – URLs, Servers, and a Bit of Everything

Week 5 – Managing (Heap) Memory

Week 4 – Representations and Memory

Week 3 – Where (Some) Things are in Memory

Week 2 – Number Representations, Sizes, and Signs

Week 1 – Strings, Memory and Bitwise Representations (in C)

Week 0 – Welcome!

  • Announcements

    • Lab attendance is required and a lot happens there, make sure to go to lab.
    • Submit the welcome survey before lab on Tuesday of week 1.
    • Assignments, quizzes, and other things with deadlines will start in week 1.
    • No discussion on Friday of week 0 (discussion starts in week 1).
  • Lecture Materials

Syllabus

There are several components to the course:

  • Lab sessions
  • Lecture and discussion sessions
  • Weekly quizzes
  • Assignments
  • Exams

Labs

The course's lab component meets for 2 hours. In each lab you'll switch between working on your own, working in pairs, and participating in group discussions about your approach, lessons learned, programming problems, and so on.

The lab sessions and groups will be led by TAs and tutors, who will note your participation in these discussions for credit. Note that you must participate, not merely attend, for credit.

If you miss lab, you'll still be held accountable for understanding the relevant material via Exams and Assignments. You can miss 2 labs without it impacting your grade (see Grading below). There is no way to make up a lab, even for illness, travel, or emergencies. Our preference would be to require all 10 labs for an A, and have some kind of excused absences. However, tracking excused absences doesn't really scale, so the “two for any reason” policy is how we handle it. You don't need to justify your missed labs. Contact the instructor if you'll miss more than 2 labs for unavoidable reasons.

Lecture and Discussion Sessions

Lecture sessions are on Monday, Wednesday, and Friday, and discussion sections are Wednesday and Friday. We recommend attending every lecture and one of the two discussion sections.

Weekly Quizzes

Each week there will be a (p)review quiz given on Gradescope or PrairieLearn to both review content you've seen and preview upcoming content, due before labs start on Tuesday at 8am.

You can submit these repeatedly with no penalty up to the deadline. The purpose of this quiz is to make sure everyone has checked in on the concepts we will be using in lab. They are open for late submission until the end of the quarter; see Grading below for how late submissions impact grades.

Assignments

The course has 5 assignments that involve programming and writing. Individual assignments will have detailed information about submission components; in general you'll submit some code and some written work to Gradescope. Two files will have special meaning in this class: in CREDITS.txt you'll put information about who or what helped you with the assignment, and in DESIGN.txt you'll put answers to open-ended written design questions.

For each assignment, we will give a 0-3 score along with feedback:

  • 3 for a complete submission with high code and writing quality with few mistakes, and no significant errors
  • 2 for a complete submission with some mistakes or some unclear writing
  • 1 for a submission missing key components, or with clear inaccuracies in multiple components
  • 0 for no submission or a submission unrecognizable as a partial or complete submission

After each assignment is graded, you'll have a chance to resubmit it based on the feedback you received, which will detail what you need to do to increase your score.

  • For an original score of 0 or 1, you can raise your score to 2 (but not to 3)
  • For an original score of 2, you can raise your score to 3

This is also the only late policy for assignments. Unsubmitted assignments are initially given a 0, and can get a maximum of 2 points on resubmission.

Exams

This course is participating in a pilot study of a computer-based testing facility on campus (see this paper if you're interested in some background).

Exams will take place in AP&M B349, which is a computer lab. You will schedule your exam at a time that's convenient for you in the given exam week, and you will go to that lab and check in for your exam at the time you picked. The exam will be proctored by staff from the Triton Testing Center (not by the course staff from this course). No study aids or devices are allowed to be used in the testing center. You will need only a photo ID and something to write with (scratch paper is available on request).

The Triton Testing Center has shared a document of rules and tips for using the testing center.

The exams will be administered through PrairieLearn; we will give you practice exams and exercises so you can get used to the format we'll use before you take the first one. The exams will have a mix of questions; they will typically include some that involve programming and interacting with a terminal.

There are three exams during the quarter in weeks 3, 5, and 8. On each you'll get a Full Pass (2 points), Partial Pass (1 point), or Try Again (0 points) as your score.

We don't have a traditionally-scheduled final exam for this course (you can ignore the block provided in Webreg). Instead, in week 10, you'll have the opportunity to retake up to two of the exams from during the quarter to improve your score up to a Full Pass regardless of the score on the first attempt. The retakes may be different than the original exam, but will test the same learning outcomes. This is also the only make-up option for missed exams during the quarter: if you miss an exam for any reason it will be scored as 0, and you can use one of your retake opportunities on that exam.

Grading

Each component of the course has a minimum achievement level to get an A, B, or C in the course. You must reach that achievement level in all of the categories to get an A, B, or C.

  • A achievement:
    • 8 or more lab participation (out of 10 labs)
    • At least 12 total assignment points (e.g. any scores that add up to 12: [3, 3, 2, 2, 2], [3, 3, 3, 2, 1], [3, 3, 3, 3, 0], etc)
    • At least 5 total exam points (Full Pass on any 2 of the exams, Partial Pass on the other)
  • B achievement:
    • 6 or 7 lab participation
    • At least 10 total assignment points (e.g. any scores that add up to 10: [2, 2, 2, 2, 2], [3, 3, 3, 1, 0], etc)
    • At least 4 total exam points (2 Full Pass and one No Pass, 1 Full Pass and 2 Partial Pass)
  • C achievement:
    • 4 or 5 lab participation
    • At least 8 total assignment points
    • At least 3 total exam points

Pluses and minuses will be given around the boundaries of these categories the based on quiz performance and to-be-determined cutoffs. We don't publish an exact number for these in advance, but it's consistent across the class. Some general examples: if you complete all the quizzes completely, correctly, and on time, you'll get a + modifier. If you meet some of the criteria for the next higher letter grade but not all, you may get a + modifier (e.g. B+ for 7 lab participation, 12 assignment points, 5 exam points). If you submit no quizzes on time or don't get them done completely or correctly, you will get a - modifier.

Policies

Individual assignments describe policies specific to the assignment. Some general policies for the course are here.

Assignments and Academic Integrity

You can use code that we provide or that your group develops in lab as part of your assignment. If you use code that you developed with other students (whether in lab or outside it), got from Piazza, or got from the internet, say which students you worked with and a sentence or two about what you did together in CREDITS.txt. All of the writing in assignments (e.g. in DESIGN.txt) must be your own.

You can use an AI assistant like ChatGPT or Copilot to help you author assignments in this class. If you do, you are required to include in CREDITS.txt:

  • The prompts you gave to the AI chat, or the context in which you used Copilot autocomplete
  • What its output was and how you changed the output after it was produced (if at all)

This helps us all learn how these new, powerful, and little-understood tools work (and don't).

If you don't include a CREDITS.txt and it's clear you included code from others or from an AI tool, you may lose credit or get a 0 on the assignment, and repeated or severe violations can be escalated to reports of academic integrity violations.

Exams and Academic Integrity

Examples for exams will be posted in the week before they happen. You're free to collaborate with others on preparing for the exam, trying things out beforehand, and so on.

You cannot share details of your exam with others until after you receive your grade for it. You cannot communicate with anyone during the exam.

Quizzes and Academic Integrity

You can work on weekly quizzes with other students.

FAQ/AFQ (Anticipated Frequent Questions)

Can I attend a lab section other than the one I'm enrolled in?

No, please do not try to do this. The lab sections have limited seating and are full. We cannot accommodate switching.

How can I switch sections?

You have to drop and re-add (which may involve getting [back on] the waitlist). Sorry.

Can I leave lab early if I'm done or have a conflict?

The labs are designed to not be things you can “finish”. Labs have plenty of extension and exploration activities at the end for you to try out, discuss, and help one another with. Co-located time with other folks learning the same things is precious and what courses are for. Also, if you need an extrinsic motivation, you won't get credit for participation if you don't stay, and participate, the whole time. We can often accommodate one-off exceptions – if you have a particular day where you need to leave early, it's a good idea to be extra-engaged in your participation so your lab leader can give you participation credit before you leave.

Do I have to come to lab?

Yes, see grading above.

What should I do if I'm on the waitlist?

Attend and complete all the work required while waitlisted (this is consistent with CSE policy).

I missed lab, what should I do?

You cannot makeup missed lab credit (but have a few “allowed” misses). Make sure you understand the material from lab because it will be used on exams and assignments; try to do the parts that don't involve discussion on your own, and review your group's lab notes.

I missed a quiz deadline, what should I do?

You can submit it late until the end of the quarter. Generally we allow lots (think like 1/3 to 1/2) of the quizzes to be late without it impacting your grade, but do take them seriously before lab so you're prepared.

I missed an assignment deadline, what should I do?

Some time after each assignment deadline (usually around 2 weeks) there is a late/resubmission deadline. You can resubmit then. See the assignment section above for grading details about resubmissions.

I missed a assignment resubmission deadline, what should I do?

You cannot get an extension on assignment resubmissions; we cannot support multiple late deadlines and still grade all the coursework on time.

I missed my exam time, what should I do?

Stay tuned for announcements about scheduling make-ups in week 10.

Where is the financial aid survey?

We do this for you; as long as you submit a quiz or do a lab participation in the first two weeks, we will mark you as commencing academic activity.

When are the midterms scheduled?

The midterms will be flexibly scheduled during the quarter using a testing center. More details will come; you will need to set aside some outside-of-class time to do them, but there is not a specific class-wide time you have to put on your calendar.

I have a conflict with the final exam time, what can I do?

The final exam will also be flexibly scheduled during week 10 using the testing center.

Week 1 – Command Line and Running C Programs

Lab Logistics

In this class, you are assigned to lab sections, in which you will be in the CSE basement lab rooms to work hands-on on activities related to the tools and techniques that you’ll be using on programming assignments.

Labs are graded by participation, based on your attendance and engagement with the lab activity. Throughout the lab, you will fill in a shared Google Doc with notes and results from what you try and learn. This will be shared with the group of students around you so you have a collective record of your activities (and so we can comment on your progress!)

Your TA/tutor will share the link to the Google Docs with you at the beginning of the lab. When the lab prompts you to write something in the Google doc, be sure to put your name next to your contributions so that we can confirm your participation!

Meet your Group!

We will split into groups of 6-8 students for discussion. For week 1, you may sit wherever you want and choose who you want to work with. Starting week 2, we will have assigned seating and groups. These groups will be somewhat stable throughout the quarter, though some small changes will likely happen. You will have a tutor or TA assigned to your group for help and discussion.

Your discussion leader (the tutor/TA in your lab) will share a Google Doc with your group where you can fill in notes as you work; this document is only for your group. Your discussion leader will not take notes for you.

Write down in notes: In your groups, share, and note in the running notes document (discussion leaders, you answer these as well!):

  • How you'd like people to refer to you (pronounce your name/nickname, pronouns like he/her/they, etc)
  • Your major
  • One of:
    • A UCSD student organization you're a member of or interested in
    • Your favorite place you've found on campus so far
    • A useful campus shortcut or trick you know
  • Your answer to the following question. Get to know your fellow group members!

If you could wake up tomorrow and be any (non-human) animal, what animal would you choose to be and why? Would you want to be a pet or be free? Have you had any pets growing up?

Github

Much of our work this quarter will happen through Github, which is a platform for sharing code and collaborating on projects. You should make sure you have a Github account – you're free to use an existing one if you already have one. We will make use of the resources available in the Github Student Developer Pack. This may require signing up with a @ucsd.edu email address, or adding it to your account, in order to verify your student status.

Making an Account

For this lab, you do need to make a Github account right away. You do not need the Student developer pack yet, but you should submit the application in lab if possible so that you'll have the needed resources going forward.

Write down in your Google Doc:

For each student, their @ucsd email and their Github username they will use for this class (this is helpful for the TAs to know who is who!), and a link to their Github profile, which looks something like https://github.com/jpolitz/.

Making a Repository

From your Github profile page, click the + in the upper right to make a new repository (you can call it something like lab1). For lab work, your repositories can be public (for assignments we will show you how to make them be private).

Definition: A repository is a place to store code and other files. It has some similarities to a shared folder in Google Drive: it stores a collection of files and folders, and there are ways to share it with others. The main special thing about repositories is that they store a detailed history of changes to files, which turns out to be really important for programming projects.

There are many ways to interact with a repository; we'll see a few today and many more throughout the quarter. After you create the repository, you'll see a page like this:

image

For a first step, you'll make a README file directly on github.com. Click the README link on your empty repository page and you'll be taken to an editor where you can write some text. You can write anything you like for this first section; when you're done, click the Commit changes... button.

Write down in your Google Doc:

For each student, a screenshot of their repository after adding the README file. If anyone ran into errors or issues, describe them!

Github Codespaces

Github has a feature called Codespaces that provide an online environment for the full development process – writing code, running programs, and managing a repository. We'll use this feature throughout the course to have a standardized programming environment for everyone; Codespaces is also similar to many production environments in tech companies and research labs.

For this lab, we'll create a Codespace to do our work. Click the Code button on your repository page, and then click the Create codespace on main button:

image

This will open a new Codespace (sometimes it takes a minute to start up), which will look like this:

image

(A cute name will be randomly generated for each new Codespace you make. Check out the URL to see what name you got!)

The README file you made can be seen in the file navigation on the left. You can click on it to open it in a new editor tab (it may have opened by default when the Codespace started, as well). You can type into it to make edits. You may need to double click on the file in the file navigation to open it in the editor.

Do now: Type some edits into the README file. Then, open a new tab and go find the repository again from your Github page. Do you see the edits to the file there?

Write down in your Google Doc: For each student, a screenshot of the edited text in your Codespace, and a screenshot of the repository on Github showing the contents of the README. (NOTE: there will be a difference in these two screenshots!)

To propagate the changes from our Codespace to the repository, we need to commit and push them. We'll talk in detail about these in future labs and lectures; for now we'll just show you how in the Codespace interface and defer explanations to later.

On the left sidebar there is a “Source Control” icon. Click on this, you'll see a M next to your README file listing, which means “modified”. Click on the + next to the M to stage the changes. Then, write a short message in the text box (something simple like “Edited README” suffices). Finally, click the checkmark icon to commit the changes.

Then, click the three dots next to the checkmark and choose “Push”. This will send the changes to the repository on Github.

Write down in your Google Doc: For each student, a screenshot of the repository on Github showing the contents of the README including the edits.

Github Codespaces Summary

In this intro, you've:

  • Set up your Github account
  • Created a repository on Github
  • Created a file through the Github editing interface
  • Created a Codespace
  • Edited the file in the Codespace
  • Committed and pushed the changes to your repository

You'll use all of this setup many times throughout the quarter, and likely the same or similar steps thousands of times throughout your programming career! 🚀

Terminal and ieng6 login

A working systems programmer spends a fair amount of time at the terminal – a text-based interface to a computer. In this course we'll make heavy use of the terminal, both for running programs and for interacting with the system.

In your Codespace, you can open a Terminal by clicking the Terminal menu and choosing New Terminal. This will open a terminal window at the bottom of the screen. A terminal may already be open at the bottom of the screen from when you created the codespace, if not you can also press Ctrl+Shift+` to open a new terminal (hold down the Control key and press backtick, the key to the left of 1 on the keyboard).

Try running the following commands in your Codespace terminal. To run a command, type it in and press enter.

pwd

ls

ls .

cat README.md

cat README

cat does-not-exist

mkdir myfirstfolder

cd myfirstfolder

ls

ls .

pwd

touch file1.txt

touch file2.txt

ls

rm file2.txt

ls

ls ..

cd ..

pwd

ls

ls myfirstfolder

Discuss in your group and write down in notes: For each command you just ran, include both a copy-paste of output you see at the terminal and any changes you see in the file navigation as observations of the command's behavior. Then, for the commands ls, cd, pwd, touch, cat, do your best to give a general description of the command. If you're not sure, that's okay! We'll talk more about all of these going forward, and after you are done trying this out, a staff member will add some definitions we have for them to your notes doc to make sure we have agreed-upon summaries.

Downloading a File From the Command Line

Run the following command:

curl https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/lec/week1/hello.c

You should see the contents of an example from class. Then run this command:

curl -o hello.c https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/lec/week1/hello.c

You should see the file hello.c from class downloaded and saved into your Codespace. You can observe it in the file navigation, and with ls and cat (try all of them!).

Try opening the link from above in your browser.

curl takes a URL (a link) and downloads the contents. The optional -o somefile command-line option or command-line flag specifies an output file to save the contents to.

Definition: A command-line option or command-line flag is a way to give extra information to a program. Options are usually specified with a dash (-) followed by a letter or word. For example, -o is an option for curl that specifies an output file. The full list of options and flags is usually available by doing a search for the program online, or by using the command man (short for “manual”, nothing to do with dudes). Try running man curl at the terminal (the result may “take over” your terminal, you can use the up and down arrows to scroll and q to get back to the command prompt).

Logging into ieng6

As a student, you are assigned an account on the ieng6 server hosted by UCSD. These are similar to accounts you might get on other systems at other institutions (or a future job). We’ll see how to use your machine to connect to a remote computer over the Internet to do work there.

Your account name is the same account name as the one that’s used for your school Google account, i.e. it is the string that precedes “@ucsd.edu” in your school email address. In case you need to check the status of your student account, refer to the UCSD Student Account Lookup page.

Next run this command (with yourusername replaced by your actual username):

ssh yourusername@ieng6.ucsd.edu

Since this is likely the first time you’ve connected to this server, you will probably get a message like this:

$ ssh yourusername@ieng6.ucsd.edu
The authenticity of host 'ieng6.ucsd.edu (128.54.70.227)' can't be established.
RSA key fingerprint is SHA256:ksruYwhnYH+sySHnHAtLUHngrPEyZTDl/1x99wUQcec.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Copy and paste the one of the corresponding listed public key fingerprints and press enter.

  • If you see the phrase ED25519 key fingerprint answer with: SHA256:8vAtB6KpnYXm5dYczS0M9sotRVhvD55GYz8EjN1DYgs
  • If you see the phrase ECDSA key fingerprint answer with: SHA256:/bQ70BSkHU8AEUqommBUhdAg0M4GaFIHLKq0YQyKvmw
  • If you see the phrase RSA key fingerprint answer with: SHA256:npmS8Gk0l+zAXD0nNGUxr7hLeYPn7zzhYWVKxlfNaeQ

(Getting the fingerprint from a trusted source is the best thing to do here. You can also just type “yes”, as it's pretty unlikely anything nefarious is going on. If you get this message when you're connecting to a server you connect to often, it could mean someone is trying to listen in on or control the connection. This answer is a decent description of what's going on and how you might calibrate your own risk assessment: Ben Voigt's answer.)

After this, you get a prompt to enter your password. This is the same password you use to log into your student account on other websites, like Canvas and Tritonlink. The terminal itself does not show what you type when you enter your password. This is conventionally done for your own security, so that others looking at your screen don’t see it. Just trust that it gets inputted when you type.

Now your terminal is connected to a computer in the CSE basement, and any commands you run will run on that computer! We call your computer the client and the computer in the basement the server based on how you are connected.

The whole interaction will look something like this:

# On your client
$ ssh yourusername@ieng6.ucsd.edu
The authenticity of host 'ieng6-202.ucsd.edu (128.54.70.227)' can't be established.
RSA key fingerprint is SHA256:ksruYwhnYH+sySHnHAtLUHngrPEyZTDl/1x99wUQcec.
Are you sure you want to continue connecting (yes/no/[fingerprint])? 
Password: 
# Now on remote server
Last login: Tue Oct 1 14:03:05 2024 from 107-217-10-235.lightspeed.sndgca.sbcglobal.net
quota: No filesystem specified.
Hello user, you are currently logged into ieng6-203.ucsd.edu

You are using 0% CPU on this system

Cluster Status 
Hostname     Time    #Users  Load  Averages  
ieng6-201   23:25:01   0  0.08,  0.17,  0.11
ieng6-202   23:25:01   1  0.09,  0.15,  0.11
ieng6-203   23:25:01   1  0.08,  0.15,  0.11

To begin work for one of your courses [ cs29fa24 ], type its name
at the command prompt.  (For example, "cs29fa24", without the quotes).

To see all available software packages, type "prep -l" at the command prompt,
or "prep -h" for more options

Then, execute the following commands:

export MODULEPATH=/home/linux/ieng6/cs29fa24/public/modulefiles:/public/modulefiles
module load cs29fa24

You should get the following output:

Tue Oct 01, 2024 11:28pm - Prepping cs29fa24

Now your terminal is connected to a computer in the CSE basement, and any commands you run will run on that computer! We call your computer the client and the computer in the basement the server based on how you are connected.

If, in this process, you run into errors and can't figure out how to proceed, ask! When you ask, take a screenshot of your problem and add it to your group's running notes document, then describe what the fix was. If you don't know how to take a screenshot, ask!

Running Commands, Remotely

Run all the commands from the earlier section again, while logged into ieng6 in the terminal. Include the curl commands as well.

Discuss and write down in notes: For each command, compare its output to what you got in the Codespace. Discuss any differences you see. Which commands work identically? Which seem to have different behavior? Why might that be? What does it mean for a file to be “in” your Codespace vs. “in” your ieng6 account?

Running C Programs

Sometimes we will work in the Codespace, and other times it will be important to do work on ieng6. This mirrors common practice in industry and research labs, where you may do some development locally, while other times you have to configure, edit, or run programs on a specific machine. In this part of the lab we'll see how to build and run a C program (like the one from class) in both places.

At this point you should have hello.c present in your Codespace and in your ieng6 account.

Compiling and Running in Both Places

You can open more than one terminal in your Codespace, and in fact you can see more than one terminal at a time. There is a little icon in the top right of the terminal that looks like a square with a plus sign in it. Clicking that will open a new terminal. You can also drag the tab of a terminal to the side of the screen to create a new panel, and you can see both terminals at the same time. You may want to try that for this activity! New terminals in your Codespace will always open in the context of your Codespace, not logged into ieng6.

Open two terminals, one logged into ieng6 and one in your Codespace.

In both, run the following commands one at a time:

ls
gcc hello.c -o hello
ls
./hello

Write down in notes: What effect did the gcc command have? What did the ./hello command do?

Some definitions:

  • gcc is a compiler for C programs. It takes .c files (and other things we'll see later) as inputs, and generates executable files (like hello) as outputs.
  • An executable file is specially encoded with instructions for a specific (type of) machine. We won't talk in much detail about it in this course, but the way an executable file runs on a computer is one of the the main topics of CSE30. These files can be run without any other tools. When we run hello by writing ./hello, we are directly telling the operating system to run the instructions in that file.

Making an Edit

Next, in your Codespace editor, make an edit to hello.c to change the string message to one of your choice. Save the file, then rerun the commands above in both terminals.

Write down in notes: What was the output in each terminal? What did you observe about the behavior of the program in each place?

When you change the file in your Codespace, the file on ieng6 does not change. You can use cat to verify this; the edit you made is only to the file stored in your Codespace. This is an important concept! Each computer has its own filesystem that stores files and folders. Each terminal is connected to one computer (and to one filesystem) at a time. Knowing which computer you're running commands on is really important for that reason!

It's worth asking how we can get our edits onto ieng6 to run the modified program there. There is a command, related to ssh, called scp, that does just that.

From your terminal connected to your Codespace, run this command (replacing yourusername with your username as before; you will be prompted for your password again):

scp hello.c YOURUSERNAME@ieng6.ucsd.edu:./

Then, in your terminal connected to ieng6, run the following commands:

cat hello.c
gcc hello.c -o hello
./hello

The command scp takes files and copies them to the computer and folder given by the second argument. The : is separating the username and name of the server from the path or folder to copy it into – ./ just means “your home directory”, which is the same folder you see when you log in. scp is a useful command for moving things around between computers. There are many other ways we could accomplish this, but this is one of the most direct.

Write down in notes: For each student, take a screenshot of successfully running the scp command and then using cat to see the new output on ieng6, and put it in the notes.

Summary

  • A repository is a place to store code; Github is a service for storing repositories
  • A Codespace is a web page that connects to a computer at Github that holds your repository and lets you run commands and edit your code
  • A terminal is a text-based interface to a computer.
  • We run commands in the terminal, which have options and flags and arguments. We can look up commands' behavior online or with man
  • commands are programs that can do all kinds of things, from file management (ls, cat, cd) to connecting to other computers (curl, ssh) to compiling and running programs (gcc, ./hello)
  • We can connect to ieng6, which is a collection of computers at UCSD, using ssh. Many times we need to connect to other computers we will use ssh
  • Each computer has its own filesystem, which is a collection of files and folders. We saw the Codespace filesystem and the ieng6 filesystem
  • We can compile C programs using gcc which creates executable files
  • We can run executable files with ./filename
  • The scp command can be used to move files between computers (using the same login as ssh)

Before you leave - Lab Group Preference Form

Please go ahead and fill out this Google form before you leave, this will help us create the seating chart for next week. Link to Google form

More Practice

These are open-ended exercises for you to try and discuss with your group. These are also great conversation-starters for office hours, deeper understanding with friends outside of class, or conceptual questions on Piazza. Feel free to take your results and write down in notes what you found!

  • Commit and push your changes to hello.c Github. Can you figure out how to download the code you wrote from Github to ieng6 using curl? (Hint: inspect the URLs we shared for downloading class code)
  • Try creating a new .c file from scratch that prints something different. Compile and run it in both your Codespace and on ieng6, using your method of choice to move it to ieng6.
  • Try running the scp command without the ./ at the end of the path. What happens?
  • Use scp to copy a file into a different folder on ieng6 (like the one you made with mkdir). What does the command look like?
  • Cause an error in hello.c (change the name of main, or drop a ", or otherwise cause an issue), then recompile and re-run. What does the gcc command output? What does the ./hello command output? Why?
  • Try running gcc without the -o hello part. What happens? What is the name of the output file? Can you run it?
  • Try running gcc with the -o flag with a different name. What happens? What is the name of the output file? Can you run it?
  • Run ls .. after logging into ieng6. What do you see?

Week 2 – Project Management and Testing

Meet your Group!

Meet your more permanent group for labs!

Write down in notes: In your groups, share, and note in the running notes document (discussion leaders, you answer these as well!):

  • How you'd like people to refer to you (pronounce your name/nickname, pronouns like he/her/they, etc)

  • Your github username

  • Your major

  • One of these that you didn't share in week 1:

    • A UCSD student organization you're a member of or interested in
    • Your favorite place you've found on campus so far
    • A useful campus shortcut or trick you know
  • Your answer to the following question. Get to know your fellow group members!

    • Pick a song as the soundtrack of your life! Why you did you choose this song? What’s your favorite genre? Who’s your favorite artist?

Testing

The main activity is around writing good tests, with some other activities around managing your code repository.

Some Tests are Always Better Than None

We ask you for a few categories of tests in PA1, which make sure some basic (and not-so-basic) functionality is working. Each of these has a few ways to write it, and you should definitely write more tests than these.

Write down in notes: To get started, everyone in your group should put a test into the shared doc (there should be an example of what one should look like in the doc already!). A test for this assignment includes both the input file and the .expect file. You shouldn't pick the simplest test you wrote; pick one you think is particularly interesting. If you haven't written any tests yet, write an interesting test!

Then, add all the tests from your group to your PA's tests and try them on your implementation, and show what happens for each.

If you haven't started yet, this is a great time to get your repository set up, get those tests in, and make sure you know how to run the tests. The program that reads from input from the first quiz, or the programs from discussion, are great starting points.

(NOTE: Google docs will automatically turn typed ASCII quotes (",') into fancy quotes (“,”), so be careful when copying quotes out of a google doc into code / test files. You can turn this off in Tools>Preferences>Use Smart Quotes)

Write down in notes: When your group runs the tests on one another's implementations, what happens? Did your test find a bug in anyone else's implementation? Did someone else's test find a bug in yours? (Does this whole experience just make you extremely motivated to start early next time 😬?)

What Makes Tests Good?

That process, and your own work before today, gave you a few example tests. These aren't thorough enough to test everything that needs to be covered by the program. But what does it mean to “cover all the cases”? In general, there is no theoretical limit to the number of strings we can create, so we can't test all possible inputs (also called exhaustive testing). So we can't say “cover all the cases” means that we write a test for every possible string.

Another way to think of this is in terms of implementations. There are many correct and many incorrect (or incomplete) versions of the UTF analyzer that we can imagine. Some of them are yours! Each test we write would pass or fail on each of these implementations. If it passes on an incorrect implementation, we could say that test doesn't “notice” or “witness” that implementation being wrong. If it fails on an incorrect implementation (while passing on all the correct ones), that's great! It means it is a high-quality test that tells us something about a bug in an that incorrect implementation.

Then, a way of thinking about “covering all the cases” is “how many bad implementations would these tests catch?” Of course, there are infinitely many bad implementations (just like there are infinitely many strings), so in general we can't expect perfection. But most incorrect implementations are close to correct ones (they just have a bug or two), and targeting tests at common mistakes that might be made is a useful way to organize our thinking.

To that end, we've handwritten some bad implementations of the UTF8 analyzer. Your job as a group will be to write a set of tests where some test fails on each of the bad implementations (while still being a correct test).

Your lab leader will share a repository with you that all of you can use for this lab (it will be public and you'll be able to access it later).

Everyone should be able to make their own Codespace from this repository – the Codespace is just for you, the repository is shared with your whole group. In the Codespace, spend a few minutes writing some tests and trying them out against the provided implementations. All of them are mostly right, but have some specific bugs related to their names. We made one very obviously wrong; it simply always returns 1 for is_ascii, so any test with a non-ASCII character will fail.

$ cat tests/crab.txt
🦀
$ ./bin/always_ascii < tests/crab.txt
Enter a UTF-8 encoded string:
Valid ASCII: true
... other output

If you run with the testing script, you'll see the failure:

$ ./test_script bin/always_ascii 
Test ./tests/crab.txt failed.
Expected line not found in output:
Valid ASCII: false

Test ./tests/invalid_and_uppercased_ascii_test.txt failed.
Expected line not found in output:
Valid ASCII: false

Test ./tests/is_valid_ascii_test.txt passed.

Your task (as a group) is to write tests to make all of these bad implementations fail using correct tests (e.g. tests that ought to pass on any correct implementation).

Do some work in your individual Codespaces, then commit and sync your tests. Make sure to pick unique names for your test files (consider prefixing with your username) so that the filenames don't conflict.

Can you trigger the bugs in all the bad implementations?

Managing Your Repository

The rest of the lab will have you work on your own repository for PA1 and make progress on the PA. We have some advice about managing your repository that we want you to try.

We recommend making commits, and pushing, to your repository whenever you get to a useful working checkpoint. That could mean that you got a new test to pass, for example, or got the first version of one of the functions written, and so on. We also recommend using .gitignore to manage files that are created by the compiler (but shouldn't be in version control).

  • Practice creating a small commit in lab – add a test and/or some small amount of code, and make a commit and push. In your notes document: put a screenshot of the commit that you pushed to Github.

    • If you've been working this way all along, you can just share a small commit you made earlier.
    • If you're just getting started, this can be your first commit of the main function.
    • If you're in the middle of working, you can make one commit and push with whatever the current state of your work is, and then another small commit

    Notice that you can click on any commit and see the individual files at that point in time! This can help you get back to an earlier state if you're stuck.

  • The file you create with gcc (probably called utf8analyzer or similar) changes every time you compile, and could be entirely different when compiled on different computers. It also doesn't have a meaningful “diff” between versions, since it's a binary file. For these reasons, built files are usually not included in a repository. The file .gitignore in the root of a repository can list files that should be completely ignored by git – it won't include them in the list of files to add or change.

    • Compile your program and show a screenshot of the status of the binary file that was created (for example, in the source control/Git tab of the codespace)
    • Change your .gitignore to include the path of this file
    • Observe that it no longer shows up as a file to add!

    It's really useful to keep repositories organized and tidy. .gitignore is one way to do that, separating the source code of a project from other “build artifacts” that are temporarily important but shouldn't be tracked and saved in the same way as code.

Working on the PA

Armed with your thorough tests, spend some time working on the PA. You might choose to just include some of the tests you've written in your own code at first as you work on invidual milestones.

Feel free to work together, though keep in mind the academic integrity policies and CREDITS.md. In this (and in all things learning-based) focus on what's best for your learning and understanding, not just what's best for finishing the PA. The conceptual knowledge pays off later in the course, on exams, and in your career, getting the code complete for a PA without understanding it has little effect.

Signing Up for An Exam Slot

Next week (week 3) is the first exam. It will have questions in the same platform you've seen in the PrairieLearn quizzes. You need to sign up for an exam slot during week 3, which you can do HERE

This document is the Triton Testing Center's guide: https://tritontesting.ucsd.edu/for-students/scheduling-a-test.html

You should not use lecture or lab time to take the exam; next week will be a normal week of lecture, lab, quiz, and so on with the added exam at some time outside of class.

(You can do this outside of lab time, too, but we wanted to make a point to remind you during lab!)

If you are done early:

  • Implement a Zalgo text generator (https://zalgo.org). Do your own research to figure out how that works.
  • Implement reverse that takes an argument buffer to store the reversed string. If they do that, then they should do reverse_in_place that uses as little extra memory as possible and reverses in place.
  • Research how to go about implementing uppercase/lowercase that includes accented characters. Can you find a library or tool that would help? How do you think that works in software tools like autocorrect that capitalize the first letter of a sentence?

Week 3 – Terminal Usage and Git

Lab Goals

  1. Set up SSH keys for ieng6 and Github
  2. Learn how to edit files on the terminal using vim
  3. Do all the steps for working on a PA entirely on a remote computer (ieng6)
  4. Learn how to manage repositories using git CLI

Lab Tasks

Part 1 – Let's get a Terminal Setup

  • If you're using Windows, install Git for Windows using all the default/recommended settings as suggested by the installation wizard. Mac users have a pre-installed terminal application.
  • Open the Git Bash terminal application (Windows) or Terminal (Mac) and try some commands from Lab 1
  • Try to ls your Desktop directory. Do those files look familiar?
  • Try to ssh into ieng6 with your username and password (just like you did in Lab 1)
  • Help each other if anyone has issues installing or figuring out the terminal.

Discuss and write in notes:

  • What was the working directory of the terminal when you launched it?
  • What is your home directory on this computer?
  • What files and folders are in the home directory?
  • Where do you think files that download from your web browser go? Can you list them with ls? What's the absolute path to that folder?
  • Do any commands work differently than you expect on this computer?
  • Are you able to use ssh with your username and password from your local terminal to log into ieng6 and enter the course-specific account?

Take a few screenshots of what you found particularly interesting, and discuss how this environment differs from the terminal you used on Codespaces.

Part 2 - Setting up SSH Keys for Easy Access to ieng6

With the setup we've used so far this quarter, each time you log in to your course-specific account, you have to type the password. This can get a bit tedious and luckily there is a cool and interesting way to avoid this while still staying secure using SSH keys.

We've labelled each step with whether it should run on [Y] our computer or [i] eng6. Make sure you follow the instructions carefully!

  1. [Y] In your local terminal, run ssh-keygen. This command will generate a pair of SSH keys for you, one public and one private.

  2. [Y] Keep pressing <Enter> until the program shows some text it calls the "randomart image".

    • Note the path where the public key is saved (underlined below).
    • Image
  3. [Y]/[I] Now, log into your remote course specific account on ieng6 with ssh (using your password as usual) DO NOT run the cs29fa24 command to prepare your course-specific environment!

  4. [I] Run mkdir .ssh in the terminal

  5. [I]/[Y] Log out of your remote account by pressing Ctrl-D or typing exit.

  6. [Y] Now, we want to copy the public SSH key you created on your local machine onto your remote account; specifically inside the .ssh directory you just created, into a file called authorized_keys. Scroll up in your terminal to where you were creating the SSH key, find the line where it says: Your public key has been saved in: <path to your public SSH key>, copy the path. Make sure you get the public key file, ending in .pub, here, not the private file.

  1. [Y] Think about a command that will perform the copying of the public key file from your local machine to the .ssh directory on your remote account with the appropriate name (HINT: you used this command in Week 1's lab). Work with your group members if you need help!
If you're really stuck, click here to see the answer From your local computer, run
scp PATH_TO_YOUR_PUBLIC_SSH_KEY USER@ieng6.ucsd.edu:~/.ssh/authorized_keys

Make sure to replace USER with your UCSD username and PATH... with the appropriate path


  1. [Y]/[I] Try to log onto your remote account again, you shouldn’t be prompted for a password anymore. If you are, ask for help and carefully review the steps above with your group. To review:
    • You should have a directory called .ssh on your computer in your home directory.
    • That folder should have a key file created by ssh-keygen and a corresponding .pub version of the file.
    • You should have a directory called .ssh on ieng6 (that you created with mkdir) in your home directory.
    • The .pub file from your computer should be copied to the .ssh/authorized_keys file on ieng6

Add to your notes: A screenshot of you logging into your ieng6 account without a password prompt.

Part 3 - Working in Terminal

So far, we've been primarily using our terminal to compile our C code (with gcc) and run our programs, but we've just scratched the service of what the terminal can do. The terminal is the ultimate gateway into communicating with our computer, and today we're going to dive more into the different ways we can use the terminal to make our lives easier.

3.1 - VIM

vim is a text editor that runs entirely in the terminal. For better or worse, many times developers find themselves with only terminal access to a remote computer, and need to do some programming – that is, editing source code files. A terminal-based editor makes this possible. (As a side effect you can look like hacker from a movie controlling everything without needing a mouse.)

There are a lot of online resources available for vim, but thankfully, the program itself comes with a interactive tutorial (that's pretty neat!). We can access this tutorial with the vimtutor command from our terminal

Task: Open the vimtutor tutorial in your terminal and complete Chapters 1 & 2. Image

Once you have completed the first 2 chapters of vimtutor, you will now be using some of the commands that you learned to correct a bug in a C program that we have written for you.

Task:

  1. [Y]/[I] If you aren't logged into ieng6, log in now. (password-free hopefully!)
  2. [I] Download our buggy C program using curl. The command would look like

curl https://raw.githubusercontent.com/ucsd-cse29/lab3-starter/refs/heads/main/average.c -o average.c

  1. [I] Compile and run the average.c file.
  2. [I] Open the average.c file in vim and read through the program.
  3. [I] Determine the bug in the program and correct it in vim. (Try doing it without introducing a new variable!)

Write in your notes: The keys/commands you are pressing/using while navigating vim

  1. [I] Re-compile and re-run the program to ensure that it now outputs the correct value.

Write in your notes:

  • The original output of the program.
  • What do you think the program is supposed to do/output?
  • What was the bug and how did you fix it?
  • The output of the program after the bug has been fixed.

When you are done, discuss with a partner discuss what was comfortable and what was tricky about correcting the file. Compare the commands you used with other members in your group and note the differences.

3.2 - VIM Telephone

Next, we will see how different members of your group approached fixing the bug in the provided C code. You will be running your group members' vim keystrokes to see the differences in how each of you navigated the vim editor.

Task:

  1. [I] Redownload our C file using a similar curl to the previous section. curl https://raw.githubusercontent.com/ucsd-cse29/lab3-starter/refs/heads/main/average.c -o average2.c.
  2. [I] Now, run the commands that another member of your group used to correct the average.c file. You should find the commands in your shared Google Doc.
  3. Write in your notes: What were some of the key differences you noticed between your keystrokes and your group members?

Part 4 - Github with ieng6

With the bug fixed, we now want to push our changes to our Github repository. Before we can do that though, we need to be able to perform actions against our Github repository from ieng6. This is where SSH keys come in handy once again!

4.1 - Setting up SSH Keys for Github

  1. [Y]/[I] Login to ieng6 as usual (hopefully, without typing a password now!)
  2. [I] Run the command ssh-keygen, and again press <Enter> until the command completes and shows the "randomart image". Just like before, this will put a key file and a .pub version of it into the .ssh directory – this time on ieng6!

Next, we want to add the public key to your Github account. This is like the step of copying the public key to authorized_keys on ieng6, but instead we're copying to Github.

  1. [I] Display the SSH public key generated above using cat <path of your ssh key .pub file> and copy it to your clipboard; you can copy it by highlighting and right-clicking

  2. Open your Github account on the browser.

  3. In the upper right corner, click on your profile photo, then click Settings. Image

  4. In the “Access” section of the sidebar, click SSH and GPG keys.

  5. Click New SSH key or Add SSH key under the “SSH keys” section. Image

  6. Add a “Title” to your key (ex: Aaron's ieng6 machine).

  7. Select the “Key Type” to be an Authentication Key

  8. Copy your public key from the output of the cat command and paste it into the “Key” field Image

  9. Click Add SSH key.

  10. If prompted, confirm access to your account on Github.

Go back to the ieng6 terminal and:

  1. [I] Run the following command to add github.com as a recognized host (this avoids the scary yes/no prompt about accepting new connections the first time you connect)
    • ssh-keyscan -t rsa github.com >> ~/.ssh/known_hosts
    • >> means "append stdout of the command to file"
  2. [I] Check your connection by running the following command:
    • ssh -T git@github.com
    • It will say something like "Hi supercoolstudent1234! You've successfully authenticated, but GitHub does not provide shell access."

Now we have an SSH key which can be used to authenticate to GitHub! In addition to using https clone URLs, we can now use SSH clone URLs that look like this:

Image

Crucially, these will allow both cloning and pushing to the repository (as long as your account has access) from ieng6!

4.2 - git CLI commands

So far you have been using the "Source Control" tab in your Github codespaces to commit and push your changes. Since we only have our terminal today, we will be learning how to use the git CLI to do the same thing.

  • Forking a repository

A fork of a repository is a personal copy of the repository that you can make changes to without affecting the original repository. To fork a repository, click the "Fork" button in the top right corner of the repository page.

Task: Create a fork of the Lab 3 starter repository here. Image

  • Cloning with git clone

To retrieve a local copy of our git repository, we use the git clone command. clone takes a link from github (usually beginning with https://github.com or git@github.com).

Task: Use the SSH clone URL to clone your forked repository to your ieng6 account into another new directory. Fix the bug that you fixed earlier in the average.c file again and then proceed!

Image

After we are done making changes to our local branch it's now time to push our changes to our remote branch (in this case github)


  • Getting the status with git status

Let's first run git status, to see the status of our repository. status returns which files are untracked (new) modified (changed) and deleted.

Task: After correcting the buggy C code, add the output of running the git status command in your terminal to your notes.

  • Staging file with git add

When we are done making changes to a file, we "stage" it to mark it as ready to be commited. Using the git add command with the path of the changed file(s) will stage each to be included in the next commit. Using git add . will stage all changed and/or new files in the current directory.

Task: Use git add to stage our corrected C file. Compare the output of git status to the output written in your notes.

  • Committing with git commit

A commit is a package of associated changes. Running the git commit command will take all of our staged files, and package them into a single commit. With the -m flag, we can specify a message detailing the changes of the commit. Without -m, git opens a vim window to write the commit message.

Task: Use git commit to commit our staged changes. Use vim to write your commit message.

  • Pushing to the remote repository with git push

With our commit now made, we can use the git push command to upload our changes to our remote branch (github). If this is the first time you are using git push, git may ask you to set the name and email you want associated with your commits.

  • Viewing the log with git log

Git also keeps a log of all the changes. The git log command prints this log into our terminal. From this log we can see the date, author, and message associated with each commit. The HEAD -> marker points to the latest commit locally, while the origin/ marker points to the latest commit on the remote. After running git push, both of these markers should be on the same line.

Task: After pushing your changes, add the output of git log to your notes.

Week 4 – Compiling and “Building”

Welcome to Week 4!

Take a moment to ensure you have your group's Google Doc pulled up and answer the following question as a group while also filling it out in the Google Doc!

Q: What is your favorite place that you’ve been to? If it’s a travel destination, who did you go there with, and when?

Lab Goals

  1. Set up SSH keys for connecting ieng6 to Github From Week 3 Lab
  2. Practice pushing PA2 code to Github from the command line
  3. Tools for solving Memory errors in C programs
  4. Testing with assert + Makefiles

Pushing PA2 Code to Github from ieng6

Make sure your ieng6 account has Github access.

  1. Run the corresponding prep command for your class (cs29fa24 for Joe's, cs29fa24b for Aaron's) after you are logged in. If you have already started working on your PA2 and had cloned the repository without running this command, you will need to reclone after running this command and continue working. This command sets up your working directory for CSE29 which is where you want to be working!

  2. Get your PA2 code onto ieng6; if you've already done a step, skip it

    • Follow the Github Classroom Link to create your PA2 repository. See the PA2 Assignment for the actual functions to write and instructions for this PA.

    • Log into ieng6 and clone the repository you just made using the Github Classroom Link with git clone [your-repository-link]. Remember to use the SSH clone link!

      Image

    • IMPORTANT NOTE: Make sure to clone the SSH URL and not the HTTPS URL! The Github link should start with git@github.com

    • Change into the directory with your PA2 repository. In the example above, that would be cd pa2-hashing-and-passwords-<your-github-username>

  3. Check status by running git status. This will tell you (from the command-line) what's going in the copy of your code you have checked out in the current working directory. If you haven't done any work yet, you'll see something like:

    On branch main
    Your branch is up to date with 'origin/main'.
    
    nothing to commit, working tree clean
    

    You can also use ls to see what's going on. In a brand-new repository, that will show these files:

    $ ls
    CREDITS.md  DESIGN.md  README.md
    

    This means you haven't edited or created any files since you cloned the code. If you've done some work, you might see all kinds of things here, most likely pwcrack.c, which you may have created.

    IMPORTANT NOTE: If you have already done some work and your file name is not pwcrack.c, use the command mv <your_file_name>.c pwcrack.c to rename it (mv is also used to move files from one place to another in a filesystem). This will be important for future sections of the lab!

  4. Create or edit pwcrack.c. If you already created pwcrack.c and the git status command shows that it is new or modified, there's nothing to do here. If you created it, but it has no modifications, open it and make a small edit (even just adding a comment like /* edit in lab */) If you haven't created it yet, you can create it using vim:

    $ vim pwcrack.c
    

    Then into the file type:

    #include <stdio.h>
    
    int main() {
        printf("did not find a matching password\n");
    }
    

    Then save and quit with :wq. Now git status should show it as a new untracked file:

    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    Untracked files:
    (use "git add <file>..." to include in what will be committed)
            pwcrack.c
    
    nothing added to commit but untracked files present (use "git add" to track)
    
  5. Add the change you made to “stage” it with git add pwcrack.c. After you do this, git status will show it as ready to commit.

    $ git add pwcrack.c
    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    Changes to be committed:
    (use "git restore --staged <file>..." to unstage)
            new file:   pwcrack.c
    
  6. Commit the file by using git commit -m "your message here", where your message might just be about adding or editing this file.

    $ git commit -m "created first version of pwcrack.c"
    [main 91d2f60] created first version of pwcrack.c
    1 file changed, 5 insertions(+)
    create mode 100644 pwcrack.c
    $ git status
    On branch main
    Your branch is ahead of 'origin/main' by 1 commit.
    (use "git push" to publish your local commits)
    
    nothing to commit, working tree clean
    
  7. Push your work to Github by using git push origin main. Then you should be able to visit your repository on Github and see the change!

    $ git push origin main
    Enumerating objects: 4, done.
    Counting objects: 100% (4/4), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 383 bytes | 191.00 KiB/s, done.
    Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (1/1), completed with 1 local object.
    To github.com:ucsd-cse29-fa24/pa2-hashing-and-passwords-jpolitz.git
       0787d50..91d2f60  main -> main
    $ git status
    On branch main
    Your branch is up to date with 'origin/main'.
    
    nothing to commit, working tree clean
    

Write in your notes: A screenshot of the commit(s) you pushed to Github from your terminal and also a screenshot of the commit(s) on the Github website.

Check with the person sitting next to you to see if they got a similar result. Did you run into any errors? Try understanding and fixing them: maybe you made a typo in a command, or maybe something isn't set up correctly about your connection to Github. Make sure you can create and edit files, add and commit the changes, and push them to Github before moving on.

New Debugging Tools

For this section, you'll be working with the following program:

#include <stdio.h>

// edits string, then returns how many changed
int X_to_Y(char str[]) {
    int num_xes = 0;
    for (int i=0; str[i] != '\0'; i++) {
        if (str[i] = 'X') {
            str[i] = 'Y';
            num_xes += 1;
        }
    }
}

int main(int argc, char *argv[]) {
    for (int i=1; i<=argc; i++) {
        int num_xes = X_to_Y(argv[i]);
        printf("\"%s\": %d 'X'es changed\n", argv[i], num_xes);
    }
}
  1. Read through this program without running it.
  2. In your notes, write down what you think it would output if ran as ./buggy helloX XOXO
  3. Discuss your thoughts with your group, and see if you notice any bugs in it. In your notes: Write down any bugs you think you've fonud.

Getting set up on ieng6

  1. On ieng6, use the following command to curl the above program into a file called buggy.c:
curl -o buggy.c https://raw.githubusercontent.com/ucsd-cse29/fa24/refs/heads/main/src/week4/buggy.c
  1. Then run the following command to compile the program:
gcc -std=c11 buggy.c -o buggy 

Notice the new command-line flag introduced here! We use the -std=c11 flag to tell gcc what version of C we want to compile with. This is necessary since the default version on ieng6 would fail to compile this program due to us declaring i within the for loop.

  1. Now, try running the program with some string arguments:
./buggy helloX XOXO

Oh no! You should see it outputting far too many Y-s and zeroes, and then crashing (Segmentation Fault). Let's see if we can use -Wall to get to the bottom of this.

The -Wall

Many common errors can be caught by the compiler, but a lot of these checks aren't enabled by default. We can ask the compiler to warn us about them with the -Wall flag ("- W(arn) all")

  1. Run the following command to recompile the program with warnings enabled:
gcc -Wall -std=c11 buggy.c -o buggy
  1. This time, gcc should have given you two warnings, read through them. Would they have helped you catch those two bugs?
  2. In your notes: Write down the warnings you got ([-Wxxxxxx])

In general, It's almost always a good idea to have -Wall enabled by default because it will helpfully warn you about many common C errors.

NOTE: One unfortunate side effect of -Wall is that if you ever declare a variable but don't use it, the compiler will warn you about it. This can be helpful to find bugs sometimes, but often ends up being more annoying than helpful. You can disable this by with the -Wno-unused-variable flag:

gcc -Wall -Wno-unused-variable -std=c11 buggy.c -o buggy

Segfaults

Now, we've found two bugs, but why is the program still crashing? If we provide a single argument (argc = 2), ourfor loop iterates until i <= argc, which means it will try to access argv[argc], which is out of bounds.

Unfortunately, when we try to read this as a memory address (char*) , it ends up being an invalid memory address (it happens to be 0, though other values may give similar errors), so the program ended with a segmentation fault, which is a fancy term for “the program tried to access memory it shouldn't and was stopped by the operating system”.

This is pretty different from an error like ArrayIndexOutOfBounds in a language like Java, which gives information like a stack trace and the value of the incorrect index. Because of C's focus on doing only what is necessary and giving low-level access to memory, this kind of mistake is permitted, for better or worse. Aside from causing numerous security vulnerabilities, these memory errors are difficult to debug, since you can't even tell which line of code caused the crash.

Thankfully, there are a few tools that help debug invalid memory accesses.

AddressSanitizer

Let's use the AddressSanitizer to figure out which line of code we crashed on. AddressSanitizer (or ASan) is a tool, built into gcc and other C compilers these days, that turns invalid memory accesses (the kind that give segfaults) into descriptive errors.

  1. Run the following commands (these are needed if you don't run cs29fa24 on login, to work around an old version of gcc)
export ASAN_OPTIONS=symbolize=1:print_legend=0
export ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer

  1. Compile the program again with the following option to enable the sanitizer:
gcc -std=c11 -g -fsanitize=address buggy.c -o buggy.asan

The -fsanitize=address part does extra work in the compiler to put special checks in to look for memory errors. The -g option we saw in lecture turns on some debugging information like line numbers in stack traces.

  1. After compiling, rerun the program like before. You should see it output a stacktrace that start something like this, which shows the function calls that led to the error:
$ ./buggy.asan helloX
"YYYYYY": 0 'X'es changed
AddressSanitizer:DEADLYSIGNAL
=================================================================
==303858==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55f690eb0320 bp 0x7fff2e028b50 sp 0x7fff2e028b30 T0)
==303858==The signal is caused by a READ memory access.
... (several lines omitted)
  1. Read through the stack trace and identify the functions that were called. In your notes: Write down the last function called, and what line of buggy.c the error was on.

Now, we can see exactly the path the code took before crashing, making our lives much easier! So, why isn't ASan the default? Mainly because it makes programs run slower. It's great for debugging, or for programs where speed isn't an issue, but if you're trying to hash as many passwords as you can in 10 seconds, it's best to not turn it on! For this reason, it's good to have the option to compile two separate executables, which we called buggy and buggy.asan above.

GDB

Now that we've identified where the crash happens, let's use gdb to help us debug it further. gdb (which you've already seen a bit of in lecture / discussion) is a command-line debugger that allows us to step through our program line by line and inspect the values of variables at each step. It can also give us backtraces through our program to show us where the error may have occurred.

For more GDB resources, see the Week 3 Reading on GDB Debugging, as well as the GDB Quick Reference Guide and GDB Tips and Tricks linked under the Week 3 Readings & Resources :-)

  1. After compiling with the -g flag, load our program into gdb
$ gdb buggy.asan
  1. To see our source code while in gdb, run the command:
(gdb) layout src
  1. Before running the program, set a breakpoint in the code before the program reaches its error. Here, the error is on line 6, so type in your gdb terminal:
(gdb) break buggy.c:6
  1. After setting the breakpoint, run the program from inside gdb with an argument (everything after run gets passed as command line args to your program)
(gdb) run helloX

If done correctly, gdb should print out the for(int i=0... line in the program. The breakpoint paused the execution before this line was executed.

  1. Try the following commands

    • info args: tells us what arguments were passed into this function,
    • info locals: tells the values of the local variables at this point of the execution.
    • p s[3]: this lets your print out the values of expressions, not just variables. Try some others!
      • Can also specify the format you want, e.g. p/x var1 prints the value of var1 in hex, p/t prints in binary).
    • bt should print out a stacktrace similar to what ASan did.
  2. Now, use the c command to continue executing until it hits the breakpoint a second time

  3. Use the commands you learned to print the variable that buggy.c is attempting to access (and crashing on).

  4. In your notes: Write the command you used, and what gdb printed out

    • NOTE: If your program runs too far ahead, you can always ask gdb to restart it by using run ARG1 ARG2 ... again

Testing with assert and Makefiles

In this section, we will be working on one of the functions that is part of PA2 and provide a cool way to incorporate unit testing into your main() functions without needing to change/comment lots of lines of code!

Finally, we will introduce a convenient way to compile our programs using Makefiles and the make utility!

Testing with assert

Task: While on your ieng6 account within your PA2 repo directory perform the following steps:

  1. Open your pwcrack.c file that we created/renamed in the previous portion of the lab. We are providing an incomplete implementation of the hex_to_byte function. Add the following function to your C file and complete it by determining what the function should return. To see the function's intended behavior, refer to the PA2 spec found here.

    NOTE: If you have already implemented your own hex_to_byte function, you can skip this step.

    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    
    uint8_t hex_to_byte(unsigned char h1, unsigned char h2) {
        uint8_t x = 0;
        uint8_t y = 0;
    
        // Convert h1 to a decimal value
        if (h1 >= '0' && h1 <= '9') {
            x += h1 - '0';
        }
        else if (h1 >= 'a' && h1 <= 'f') {
            x += h1 - 'a' + 10;
        }
    
        // Convert h2 to a decimal value
        if (h2 >= '0' && h2 <= '9') {
            y += h2 - '0';
        }
        else if (h2 >= 'a' && h2 <= 'f') {
            y += h2 - 'a' + 10;
        }
        // TODO: Determine what the function should return
    }
    
  1. Now we will need to add a main() function to this file to give the program an entrypoint. Copy the following main function into your pwcrack.c file.

    NOTE: If you already have a main() function from working on the PA, you can keep it as is and focus on the unit testing portion of the code given below.

    int main(int argc, char **argv) {
    
        // UNIT TESTING SECTION
        int test = 0; // Set this variable to 1 to run unit tests instead of the entire program
        if (test) {
            assert(hex_to_byte('a', '2') == 162);
            // ADD MORE TESTS HERE. MAKE SURE TO ADD TESTS THAT FAIL AS WELL TO SEE WHAT HAPPENS!
            
            printf("ALL TESTS PASSED!\n");
            return 0;
        }
    
        // MAIN PROGRAM SECTION
        if (argc < 2) {
            printf("Error: not enough arguments provided!\n");
            printf("Usage: %s <byte 1 in hex> <byte 2 in hex> ...\n", argv[0]);
            printf("Example: %s a2 b7 99\n", argv[0]);
            return 1;
        }
    
        int i = 1;
        for (; i < argc; i++) {
            printf("Value of hex byte %s is %d\n", argv[i], hex_to_byte(argv[i][0], argv[i][1]));
        }
    }
    

    You will note that this main() function serves two purposes:

    • It parses your command-line arguments to your program and runs your password cracker. In this lab, we will be demonstrating the use of command-line arguments to only test the hex_to_byte function but you will extend this to work for your complete password cracker implementation.
    • The if statement at the beginning of the main() function is a way to unit test specific functions as you work on your PA. Setting the value of the test variable to 1 will run the code within the if statement that should have several assert() statements to test your individual functions (we have provided an example test for hex_to_byte). If all your tests pass, the program will print ALL TESTS PASSED! and exit successfully. If not, you will receive an error when you run your program with int test = 1;.
  1. Commit and push your changes to Github. Make sure to include the new files you created in your commit.

In your notes: Briefly explain what you had the hex_to_byte function return and why?

Makefiles

Thus far, we have been typing out the various gcc commands needed to compile our C programs. We have also learned some command-line flags that can be useful when compiling our programs like -g for debugging information, -std=c11 to use the correct C version when compiling on ieng6 and -fsanitize for memory errors. Wouldn't it be nice if we could just type one command and have all the necessary files compiled with the correct flags? This is where make comes in.

make is a utility for building C programs. A Makefile is a file that contains a set of rules that tell the make utility what commands it should run for that rule. We will be creating a Makefile with rules to compile our C programs with the correct flags and dependencies.

Task: Perform the following steps to create a Makefile for your PA2 project:

  1. In your PA2 repository, create a new file named Makefile and add the following content to it:

    all: pwcrack
      
    pwcrack: pwcrack.c
        gcc -std=c11 -Wall -Wno-unused-variable -fsanitize=address -g pwcrack.c -o pwcrack
    
    clean:
        rm -f pwcrack
    

    IMPORTANT NOTE: The indentation in the Makefile is done with a TAB character, not spaces. Make sure to use a TAB character when indenting the commands in the Makefile.

    NOTE: If your file names are different than what is given above at this point, change the rules to match the file names you are using.

  2. Now, in your terminal, run make all. This will perform the same operation as running gcc -std=c11 -fsanitize=address -g pwcrack.c -o pwcrack.out without needing to type or remember that command! You should see the pwcrack file in your directory. However, if you run make all again, you will see that it doesn't recompile the files because they have not changed since the last time they were compiled. This is one of the benefits of using make to compile your programs.

  3. Now run make clean. This should delete the executable file.

  4. Next, run just make with no arguments.

In your notes: What happened when you ran make without any arguments? Why do you think this is the case?

We now have a clean and simple way to compile our C programs with the correct flags and dependencies using make. This will be very useful as we continue to work on our PA2 project and need to compile our programs multiple times.

If you are done early

  • Work on PA2! Use the remaining time in lab to make progress on PA2 along with your group members and your tutor!
  • Try adding a separate rule that would build pwcrack without address sanitizer enabled. Rename the rules so that one generates pwcrack.asan, and the other generates pwcrack. You may need to adjust all and clean as well to work correctly with this.
  • If you are done with PA2 here is a fun extension you could work on: Extend your password cracker such that when given a hash, it will attempt to go through all possible passwords of length 6 until the password is found.

Week 5 – Building a Simple Server

Welcome to Week 5!

Take a moment to ensure you have your group's Google Doc pulled up and answer the following question as a group while also filling it out in the Google Doc!

Q: If you could pick up a new skill in an instant what would it be and why?

Lab Goals

  1. Let's Have a Chat
  2. Headers and Servers
  3. Implementing the Number Server

Part 1: Let's Have a Chat

Entering lab, you may have noticed something interesting running on the projector. We've started up a demonstration of the next PA to play around with. For the next PA, you'll be creating your own chatroom server, where users can send and react to other's messages.

We've included a client on the ieng6 machines that you can use to communicate in our chatroom.

We can run our chat-client with: (where MACHINE is ieng6-201 for Joe's lab, or ieng6-202 for Aaron's lab)

/home/linux/ieng6/cs29fa24/public/chat-client USERNAME MACHINE 8000

Task:

  1. Choose a username and send a message to your lab room's chat server.
  2. Try reacting to a member of your lab group's message. Try reacting with an emoji 😁
  3. In your notes: Add a screenshot of your message and reaction.

The client is cool, but there are many other ways we can also interact with the chat server: In the past we've been using curl primarily to download single files from the internet. We can also use curl to send messages!

To see all the messages of the chat server we can curl the url:

curl 'MACHINE.ucsd.edu:8000/chats'

To send a message to the chat server, we can curl the url:

curl 'MACHINE.ucsd.edu:8000/post?user=USERNAME&message=MESSAGE'

Task:

  1. Try sending a message to the chatroom server using curl

To react to another user's message, we can curl the url:

curl 'MACHINE.ucsd.edu:8000/react?user=USERNAME&message=REACTION&id=ID'

REACTION can be any 16 byte message and ID is the id of the message you wish to react to.

Task:

  1. Try reacting to your group member's message from your web browser (Chrome, Safari, Firefox, etc.)
  2. In your notes: Add a screenshot from your web browser of MACHINE.ucsd.edu:8000/chats.

Part 2: Headers and Servers

In parts 2 and 3, you'll be writing your own simple webserver that keeps track of and lets you modify the value of a number (which should be a bit easier than writing a whole chat server).

2.0 Getting started

Task

  1. If you haven't done so already, log onto ieng6, then run the corresponding prep command for your class (cs29fa24 for Joe's, cs29fa24b for Aaron's).
  2. Fork the lab5 starter repo, then clone it. (If you're unsure how, you can refer back to how we did this in Lab 3 and Lab 4)

Header files

Looking into the lab5 repository, you'll find two source files (number-server.c, http-server.c), and one header file (http-server.h). In the previous labs and PAs for the class, our code has consisted of a single .c file, containing a main() and all associated functions. As programs get more complex, it becomes increasingly important to organize the code by seperating it into multiple files.

http-server.h should look like this:

#ifndef HTTP_SERVER_H
#define HTTP_SERVER_H

#include <stdlib.h>
// ... more #includes

void start_server(void(*handler)(char*, int), int port);
#define BUFFER_SIZE 2048

#endif

This header file contains the declaration of start_server, which includes all info needed for C to call a function, which are its arguments and return type. However the definition of the function (i.e. the actual code that it runs) are in http-server.c. To call this function, number-server.c needs the declaration, which it gets by including http-server.h.

EXTRA INFO 📝:

  • It's not necessary to fully understand the implementation of start_server, but the idea behind the function is available in the pa3 writeup.

  • Every function in a C file should have its visibility defined in one of two ways:

    • Public functions (meant to be called from other source files) need to be declared in a header file.
    • Private functions (typically helpers or utilities) should always be declared as static (e.g, static int foo() {...}) to specify that they shouldn't be visible to other .c files.
  • The ifndef/define/endif at the start and end of a header are an include guard

2.1 Compiling with multiple files

When compiling a program with multiple source files, all source (.c) files must be included as arguments to gcc. NOTE: you don't need to specify the header files to gcc; they are automatically pulled in by #include statements.

Task

  1. Try compiling the number_server using gcc. (Note: you may need to pass in the -std=c11 flag)
  2. In your notes: Write down what happens when you don't include the http-server.c file in gcc. What happens when you ONLY include the http-server.c file in gcc?
  3. Once you're successfully able to compile this program, create a Makefile so that you can compile number_server just by typing make.
    • (Note: You can refer to how we did this in the last lab)
    • You may find it useful to also include the debugging flags from last lab: -Wall -Wno-unused-variable -fsanitize=address -g
  4. Run the number-server you just compiled; it should say something like "Server started on port PPPPP".
  5. Open your browser and go to MACHINE.ucsd.edu:PPPPP where MACHINE is the ieng6 server you are on like ieng6-201, ieng6-202, or ieng6-203; PPPPP is the port number from your server like 8000. Your browser will show an error page, but you should see some output from number-server in your terminal
  6. In your notes: Add a screenshot of the terminal output from number-server

Congrats, you've got a working server running!

2.2 String Hunt

For the purposes of this lab, we'll also need a few more functions from the C <string> library in order to build our backend.

In your notes: For each of the following functions, add its signature and write a brief description of what the function does. You can look it up in the manpages by typing (man XXX) in the terminal, where XXX is a function name. For example, typing man strstr in the terminal would give you the manual for the strstr function.

  • strstr

  • sscanf

  • snprintf

2.3 HTTP 404 Response

Currently our number server isn't responding to requests at all. In part 3 of the lab, you'll make your server handle three kinds of requests, but for now lets start by responding with a '404 Page Not Found'.

HTTP Requests

When we request a given url e.g. ieng6-201.ucsd.edu:8000/post?user=joe&message=hi, our web-client (a browser, curl, or our fancy-client) does some work on our behalf.

The start of the url gives us the address (ieng6-201.ucsd.edu) and port (8000) of the server our client connects to. Once connected, it sends an HTTP request, which looks something like this:

GET /post?user=joe&message=hi HTTP/1.1
User-Agent: curl/7.29.0
... many other headers

The string after GET is the path of the request, and that's the part that will change depending on the type of request. In part 3, you'll be examining this path to determine how to respond.

HTTP Responses

Once we've decided what to do with a request, we can send a response back . The second argument to handle_response is an int that is a file descriptor (called the "client socket") for us to write a response to. To send data back to the client, we can use the write function:


write(client_sock, message, message_size);
//    int          char[]   size_t

An HTTP response for an unknown path should return a "404: Not Found" response that looks like the following:

HTTP/1.1 404 Not Found
Content-Type: text/plain

... 404 message to display to user ...

Note: A nuance of this is that the newlines in HTTP responses are not just \n; they are required to be \r\n (which, interestingly, is how line endings work on Windows). Some software will work with just a \n for line breaks, but the correct thing to do is use \r\n, so in a string literal the HTTP header above would look like

"HTTP/1.1 404 Not Found\r\nContent-Type: text/plain\r\n\r\n"

Task

  1. Use curl to access MACHINE.ucsd.edu:PPPPP with the numeric server running. How does curl respond?

  2. In your notes: Add a screenshot of your browser accessing MACHINE.ucsd.edu:PPPPP with the numeric-server still running. How is this different than if the server was not running?

  3. In number-server.c, modify handle_404 to send an HTTP response back to the client. You'll need to call write twice, once to write the response header, and once to write out the body of the response.

    IMPORTANT NOTE: To see the response header being printed, you will need to add the -v flag to the curl command, i.e.

    curl -v MACHINE.ucsd.edu:PPPPP
    
  4. Once you've done this, connect to your server from a browser and make sure you can see the error message, including the path of the url you tried to connect to.

  5. In your notes: Add a screenshot of the 'Not Found' page in your browser. How is this different from the previous error page?

  6. Try connecting to different paths on your server. Does you notice anything interesting if you put in add special characters to the path? Spaces? Emoji?

Part 3: Implementing the Number Server

Now that we've completed our walkthrough of the basics of the http-server library. It's time to use the library to create our own server.

The number-server needs handle 3 different routes:

  • /shownum

    This function responds to the client socket with the current value of num.

  • /increment

    This function increments the value of num by one and responds to the client with the new value of num.

  • /add?value=NNN

    This function increments the value of num by the value encoded in NNN (a decimal integer) and responds to the client with the new value of num

We'll walk you through the first one, then you can try the others on your own.

3.1 Getting started

Successful Responses

However, now that we're adding support for /shownum and friends, we need to be able to send back a successful response, not just a "page not found". The HTTP status code for success is 200, so a successful response would look something like this:

HTTP/1.1 200 OK
Content-Type: text/plain

... body of response ...

As a string literal, the response header would look like this:

"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n"

Path parsing:

We also need to ensure that our /shownum, /increment, and /add responses only trigger when the correct path is requested, a path like /badpath should still result in a 404 response. Using a function like strcmp or strstr can be useful to check if the path matches a given string.

Maintaining state

In the lab5 starter, num is a global int variable, so functions can read/write its value.


After completing the /shownum route, curl the route with:

curl localhost:<port>/shownum

You should see something like:

Your number is 0

Congrats! You officially got a web server running! Now, you have all the tools to implement the other functions of number-server.

3.2 Query parameters:

This section might be useful once you start working on the /add endpoint.

When sending a GET request in HTTP, any data accompanying the request needs to be sent as part of the path. This data is encoded as query parameters which follow a ? in the path and usually take the form of

?keyA=foo&keyB=bar&...

For the case of the /add endpoint, we need to encode the amount to add as part of the query string. A valid request to /add looks like

/add?value=<number>

Parsing query parameters correctly can be tricky, but in this case we only have one parameter to worry about, which makes it easier.

Task:

  1. What string function could we use to scan through our path and extract the number?
  2. Try using this function to print out the number after value= from the url path
  3. Implement the /add endpoint for your number-server
  4. In your notes: Put a screenshot of your number-server working with the endpoints you implemented.
  5. In your notes: Think of a request you could make that would cause your numeric-server to crash.

Week 6 – Work on PA3 (Chat Server)

Work on your PA3! Your tutors are here to help.

You are welcome to work with your classmates, share ideas, and so on.

In your notes doc, you should be able to show by the end of the lab:

  • You loading a URL via curl or browser a request on one of your classmates' chat servers that shows adding at least one chat, or part of a chat.
  • One of your classmates loading a URL via curl or browser on your chat server that shows adding at least one chat, or part of a chat.

If you think you're all the way done with PA3, here are some suggestions:

  • Implement threads in your chat server.
  • Right now the chat server allows anyone to post with any username. Brainstorm ways to have a user who uses a username first “claim” it somehow, so others can't. HINT: add a path for “registering” a username and share a secret between the server and the user to recognize them in the future.

Week 7 - More GDB + Valgrind

Welcome to Week 7

The goal of this weeks' lab is to have you perform various interesting tasks involving using GDB and Valgrind!

To get started, login to ieng6 and clone this repository which contains the activities for today's lab and cd into the lab7-starter directory.

If you see the following upon running ls, you're ready to get started!

[ygarde@ieng6-203]:lab7-starter:505$ ls
README.md  ctf  gradebook  valgrind

Lab Goals

  1. Crack the password
  2. Change your Grade with Buffer Overflows
  3. Addressing your Memory Errors

Crack the Password

The relevant files for this part of the lab are found within the password-crack directory in the lab7-starter repository you cloned earlier.

You are provided with an executable binary that performs a similar task to your password cracker from PA2. It randomly selects a word as the password (this word is unique to your student ID) and displays the hash of this word and then prompts you to guess the password. You are also given the main.c file that calls the function that generates the secret word and hashes it. However, you do not have the source code for the word generator itself.

Your goal is to crack the password given only the hash and the main.c file.

./ctf
Generating and hashing secret word...
the hash is 229478424054275
What is the password?: 
yash
Password not found, Try again 
hello
Password not found, Try again 
return
Password not found, Try again
.
.
.

TASK:

  1. Using tools you have learned about in CSE 29, determine what your random secret word is and crack the password!

In your notes: Describe the tool(s)/process you used and provide a screenshot of you cracking the password!

Change your Grade with Buffer Overflows

The relevant file for this part of the lab is found within the gradebook directory in the lab7-starter repository you cloned at the start of lab.

In this directory you will find an executable file called gradebook. Running this program will display your (fictional) assignment grade for Assignment 1 and ask whether you would like to leave a comment for a regrade:

./gradebook
Welcome to your gradebook!
=========== ASSIGNMENT 1 SUBMISSION ===========
Due Date : 10/24
Grade    : D (65/100)
===============================================

If you'd like to leave a comment on your submission, please enter it here (or Ctrl + D to end):
> 

After entering a comment, you will see an updated assignment entry printed out with your comment and the option to edit the comment:

If you'd like to leave a comment on your submission, please enter it here (or Ctrl + D to end):
> please regrade my submission
=========== UPDATED ASSIGNMENT 1 SUBMISSION ===========
Due Date : 10/24
Grade    : D (65/100)
Comment  : please regrade my submission
=======================================================

If you'd like to edit your comment, please type your new comment here (or Ctrl + D to end):
> 

A confidential source has informed you of a vulnerability in this gradebook that allows you to change your grade for the assignment if you provide an appropriate comment. This source has also given you the following screenshot of the source code to help you:

gradebook.c Source Code:

gradebook source code

TASK:

  1. Using the information above and tools you have learned about in CSE 29, determine how you would change your grade in Assignment 1 to an A.
  2. Once you have successfully changed your grade to an A, change your percent to reflect your new grade as well, a 95% sounds about right!

In your notes: Describe how you changed your letter grade and percentage for the assignment. What properties of the program did you exploit for this? Compare your approach to that of your group members and discuss any differences in your approaches.

Addressing your Memory Errors

The relevant files for this part of the lab are found within the valgrind directory in the lab7-starter repository you cloned earlier.

In this directory, you will find a Makefile, jstr.c and arraylist.c which contain the implementations of the String and List structs you have seen in lecture. In addition to these files, there are 6 additional C files labeled q1-q6.

Each of these q files implements function(s) on the String and List structs but with one or more memory related errors in them. Your task is to use valgrind to determine these errors and fix them. The files are numbered in increasing order of difficulty, so it is recommended to start with q1_expand_list.c and work your way up.

TASKS:

  1. Open jstr.c and arraylist.c and familiarize yourself with the implementations of the String and List structs.

  2. Run the make command to compile all the buggy C files. You should now have 6 executables in your directory. (Check this by using ls).

  3. For each program (starting with q1), run it both with and without valgrind to see what happens. Use the program output (whether it crashed or not) and valgrind to determine where the error may be and edit the C file to fix it. Remember to use --leak-check=full to get more detailed information about memory leaks.

    NOTE: The names of the files as well as the function names within the files are hints for where the error may be.

    NOTE: All the changes should be in the buggy_ functions found in each file. The main function should not need to be modified.

  4. Once you have fixed the error(s) in a file, run the program with valgrind again to ensure that there are no memory leaks or other memory-related errors. Your valgrind output should look like: valgrind output All heap memory should be freed and there should be no errors reported in the ERROR SUMMARY.

  5. When you determine the bug in each of the files, describe the error you found and how you fixed it in your notes. Provide a screenshot of the valgrind output for the fixed program.

Week 8 – Work on PA4 (Malloc)

Work on your PA4! Your tutors are here to help.

You are welcome to work with your classmates, share ideas, and so on.

In your notes doc, by the end of lab please include a screenshot of you using the various functions provided to you in the starter code:

  • vminit() to initialize the heap to a certain size
  • vmload() to load one of the images as the heap
  • vminfo() to print the heap layout

along with your understanding of these functions.

PA3 - Web Server

Web Servers and HTTP

HTTP is one of the most common protocols for communicating across computers. At the systems programming level, this means using system calls (usually in C) to tell the operating system to send bytes over a network.

One nice feature of HTTP is that it is a primarily text-based protocol, which makes it more straightforward for humans to read and debug. It is also well-understood by web browsers and programs like curl, making it easy to test and connect to user-facing devices.

It's useful to get experience with the format of HTTP, and with using system calls in C to manipulate HTTP requests.

Task – Chat Server

In this programming assignment, you'll write a C program to implement a chat room (think a plain-text version of Slack or Discord).

It's best to complete the PA on ieng6, because it gives a consistent testing environment for the live server.

You can also use the client from Lab 5 to try out your server.

Your programs should compile and run with:

$ make chat-server
$ ./chat-server <optional port number>

The server should start with ./chat-server and print a single message:

$./chat-server
Server started on port PPPPP

If a port number was provided, it should use that port, otherwise it should print an open port that was selected.

It should continue running, listening for requests on that port, until shutdown with Ctrl-c. It can print any other logging messages or other output needed to the terminal.

The requests the chat server listens for are described in this section

/chats

A request to /chats responds with the plain text rendering of all the chats.

The rendered chat format is

[#N 20XX-MM-DD HH:MM]   <username>: <message>
                    (<rusername>)  <reaction>
                        ... [more reactions] ...
... [more chats] ...

Where:

  • #N is # followed by a number like #5 or #33, where the integer is the id of the chat (ids start at 1 and count up).
  • YYYY-mm-dd HH:MM is the timestamp of the chat (when it was sent).
    • YYYY is the year, like 2024
    • mm is the two-digit month, like 10 for October
    • dd is the two-digit day, like 31
    • HH is the two-digit hour in 24-hour format, like 14 for 2pm
    • MM is the two-digit minute, like 55
  • <username> is the username of the user who sent the chat
  • <message> is the text of the message the user entered
  • <rusername> is the name of a user who reacted to the message
  • <reaction> is a reaction to a message

In terms of alignment and spacing:

  • There should be at least one space between the number and the timestamp
  • There should be at least one space between the closing ] and the username
  • There should be no space right after the username in a chat, it should be immediately followed by a colon
  • There should be a single space after the colon before the message
  • There should be some space before the ( in a reaction, no space between the () and the username, and some space after the ) before the reaction message

You can put in whatever effort you like into lining things up nicely within these constraints, but these are the requirements.

So an example chats rendering might look like:

[#1 2024-10-06 09:01]         joe: hi aaron
[#2 2024-10-06 09:02]       aaron: sup
[#3 2024-10-06 09:04]         joe: working on the example chat for the PA
[#4 2024-10-06 09:06]       aaron: oh cool what should it say
[#5 2024-10-06 09:07]         joe: I dunno we could go pretty meta with it? I pushed an example go check it out. like a chat about the chat
[#6 2024-10-06 09:10]       aaron: eh kinda lame tbh
[#7 2024-10-06 09:11]         joe: whatever I already wrote it, going with it as-is
[#8 2024-10-06 09:12]       aaron: ok but make sure we don't look like jerks
[#9 2024-10-06 09:12]       aaron: or at least not me
                            (joe)  👍🏻 
[#10 2024-10-06 09:12]         joe: good talk

Here's another:

[#1 2024-10-24 13:01]        yash: hi all! react with 👍🏻 if you think the lab is good to go, or 😬 if you think it needs more testing
                          (aaron)  👍🏻 
                          (elena)  😬 
                         (arunan)  😬 
                          (janet)  😬 
                         (travis)  😬 
                            (joe)  🐿️
[#2 2024-10-24 13:02]        yash: OK we'll go with what joe said

/post

A post request looks like this:

/post?user=<username>&message=<message>

This creates a new chat with the given username and message string with a timestamp given by the time the request is received by the server. It must respond with the list of all chats (including the new one).

Limits and constraints:

  • If a parameter (username or message) is missing, respond with some kind of error (HTTP code 400 or 500)
  • If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If message is longer than 255 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If a post would make there be more than 100000 (one hundred thousand) chats, the server should respond with an error (HTTP code 404 or 500)

/react

/react?user=<username>&message=<reaction>&id=<id>

Creates a new reaction to a chat by the given username with the given message string, reacting to the post with the given id (the ids are the #N at the beginning of posts). It must respond with the list of all chats (including the new one).

Limits and constraints:

  • If the id is not the ID of some existing chat, respond with some kind of error (HTTP code 400 or 500).
  • If a parameter (username or message or id) is missing, respond with some kind of error (HTTP code 400 or 500)
  • If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
  • If message is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500) – reactions are intended to be short!
  • If a reaction would make a chat have more than 100 reactions, the server should respond with an error (HTTP code 404 or 500)

/reset

A /reset request resets the chat server to have no chats or reactions, starting from the empty initial state. Should respond with a successful HTTP response with an empty body.

It should be possible to /reset the room many times, and after resetting the memory usage of the program should be the same as in the empty initial state.

After a reset, it should be possible to immediately shut down the program and have valgrind report no memory leaks.

Design Questions

  1. How much working memory do 10 chats take in your program, in between processing requests (assume no one has reacted to them)? How about 100? 1000? We can call this part of the memory your program uses the chat storage. How much additional working memory is used when handling the /chats request in your code for 10, 100, and 1000 chats? Speculate on ways to lower either the chat storage or the memory used to process a request, or explain why your approach minimizes them.

  2. The PA writeup specifies many limits and constraints on the input (usernames, messages, ids, and so on). Choose one of these limits or constraints, and imagine removing it. What's something that would now work and might make a user happy because the limit is removed? What's a problem that could result from removing that limit? What impact would it have on your implementation to remove it (any new possibilities of errors, new cases to handle, etc)?

Implementation Guide

This page is the entire specification for the assignment; it's what you need to implement. You are free to make whatever choices you like in your code within these constraints. To help you on your way, we have an implementation guide:

Handin

  • Hand in your repository on Gradescope
  • Make sure make chat-server builds your server (that's the command we will run), and the server runs on a predictable port with, for example, ./chat-server 8000
  • Don't forget DESIGN.md and CREDITS.md

Implementation Guide: The http-server Library

We've provided you with http-server.c and http-server.h. This is code for you to treat as a library, you shouldn't change it at all (though you may learn from reading it!).

It provides one function for you to use:

void start_server(void(*handler)(char*, int), int port);

This function takes two arguments: a handler that takes HTTP requests and writes responses, and a port for telling the server what address to use. The port can be 0 to ask the operating system to choose one for us.

The type for handler is the type of a function pointer, which is the address of a function. To use start_server, we define a function (it can have any name) with the appropriate signature and pass it into start_server as an argument. The handler function takes two arguments:

void(*handler)(char*, int)
  • The first, the char*, is a string containing the request from the internet. It will be an HTTP request.
  • The second, the int, is a file descriptor that our program can use to write a response using the write system call

Here's a straightforward example that just responds to the user with a constant string:

#include "http-server.h" // http-server includes a lot of other useful things
char const* HTTP_200_OK = "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n";
void handle_response(char *request, int client_socket) {
    printf("The user sent a request with: %s\n", request);
    write(client_socket, HTTP_200_OK, strlen(HTTP_200_OK));
    write(client_socket, "Hello!", 6);
}
int main() {
    start_server(&handle_response, 0);
}
$ gcc -o print-request print-request.c http-server.c
$ ./print-request
Server started on port PPPPP (will be a specific number for you)
# the lines below print *when the request is received*
The user sent a request with: GET /post?user=joe&message=hi HTTP/1.1
Host: localhost:36611
User-Agent: curl/7.68.0
Accept: */*
The user sent a request with: GET /post?user=aaron&message=sup HTTP/1.1
Host: localhost:36611
User-Agent: curl/7.68.0
Accept: */*

(in another terminal)

$ curl "http://localhost:PPPPP/post?user=joe&message=hi"
Hello!
$ curl "http://localhost:PPPPP/post?user=aaron&message=sup"
Hello!

Then, every time the server gets a request, it will call the handle_request function, which will print the contents of the request, and then use the write system call to send a response back to the client (in this case with a constant string). This is the entry point to our program, and it's a good starting point for you to build up from!

One other tip for using start_server:

  • If port is 0, the library will pick an open port provided by the operating system. This is convenient for getting started
  • However, when you restart a program it's annoying to have to change the port number in any requests you have in your bash history, etc. So you can also provide the port number as an argument, which could be hardcoded or taken from a command-line argument. You can do that to re-run the program with the same port it used last time to keep your examples consistent

HTTP Requests

The first argument to handler is a char* with a a reference to the HTTP request. For our chat-server, all requests will be GET requests and will look something like

GET /post?user=joe&message=hi HTTP/1.1
Host: localhost:36611 User-Agent:
curl/7.68.0 Accept: */ *

The string after GET is the path of the request, and that's the part that will change depending on the type of request we get. Part of the job of the chat-server will be to extract the relevant information from the path, like which action to take (post, react, etc.) and the parameters of the action (user, message, etc.) in order to work with them.

It probably makes sense to use some C string library functions to do this. In particular strtok will allow you to “split up” a string by replacing delimiters like , &, and = with null terminators. You may also find strcspn, strstr, or strpbrk useful; it's up to you to come up with a plan to put these together to extract the information out of each request.

HTTP Responses

The second argument to handler is an int that is a file descriptor for the response. The http-server library has set things up so writing to the socket sends bytes directly back to whatever client made the request.

The format of a HTTP response for a successful response is

HTTP/1.1 200 OK
Content-Type: text/plain

... response body ...

Note: A nuance of this is that the newlines in HTTP responses are not just \n; they are required to be \r\n (which, interestingly, is how line endings work on Windows). Some software will work with just a \n for line breaks, but the correct thing to do is use \r\n, so in a string literal the HTTP header above would look like

"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n...response body...

There are many other Content-Types in the world other than text/plain! But we will focus on just plain text reponses for this assignment.

Working Incrementally: Data vs. Requests

One way to break down the work the server needs to do is:

  • Parsing and interpreting requests (is the request a new post, a reaction, etc)
  • Updating the current data (chats and reactions) based on the parameters in the request
  • Responding to requests based on the current state of the data (chats and requests)

One way to work incrementally is to separate the data handling and the request handling parts into different functions.

The data handling functions can be tested with asserts, and the request handling can be tested with curl or a client.

We think the following functions might be useful for you to implement. In your program you might have slightly different signatures or ideas, but these are a useful starting point. Also, our staff is more familiar with this approach, so it will take us less time to help you in office hours!

Data Handling Functions

These functions can be written and tested without starting a server at all. You could consider having a separate main function in its own file that just tests these!

add_chat

A function add_chat can add a single chat.

uint8_t add_chat(char* username, char* message)

This function might have several tasks:

  1. Update the current id
  2. Get the current timestamp
  3. Create a new chat and fill in its username and message fields
  4. As needed, allocate new space, put the new chat in heap memory, store a reference to it in an array, etc. depending on your specific representation of chats

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

add_reaction

Similar to add_chat, this adds a single reaction:

uint8_t add_reaction(char* username, char* message, char* id)

This function might have several tasks:

  • Use the id to locate the chat that this reaction is for (and maybe return early with an error if the id is invalid/out of range)
  • Create a new reaction and fill in its username and message fields
  • Add the reaction to the chat struct somehow, maybe with newly allocated space, an added element or reference in an array, etc. depending on your specific representation of chats and reactions
  • Update the count of reactions on the referenced chat

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

reset

This function can use free as necessary to clear dynamically-allocated heap memory and reset logical values (like a counter of the number of chats) back to 0.

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results. You can also set up a main function in a fresh file just for checking that after using a sequence of add_chat and add_reaction followed by reset, there are no memory leaks when run with valgrind.

Request and Response Handling Functions

You will definitely need to write a function for handling responses. But the work of handling individual responses can be broken up. One approach could be to get the path and query parameter string from the request and check if it's path is /post, /chats, etc, then pass the string to other functions

respond_with_chats

void respond_with_chats(int client)

This function is reponsible for using write or send to send the response to the client that made the request. It might include:

  • Using snprintf to format strings with data from the timestamp or ids
  • Calling write(client, str, size) on various strings (with the appropirate size) to directly send the data to the client

handle_post

// path is a string like "/post?user=joe&message=hi"
void handle_post(char* path, int client)

This function can have several tasks:

  • Use string functions to extract the username and message from the path
  • Call add_chat to do the data update
  • Call respond_with_chats to send the response

handle_reaction

// path is a string like "/react?user=joe&message=hi&id=3"
void handle_reaction(char* path, int client)

This function can have several tasks:

  • Use string functions to extract the username, message, and id from the path
  • Call add_reaction to do the data update
  • Call respond_with_chats to send the response

Parsing Functions

One crucial task is getting information out of requests. That is, given a string like this:

"GET /post?user=joe&message=hi HTTP/1.1\r\n"

How do we extract the parts that say "joe" and "hi"?

There are a few substeps you might consider, whether as helper functions or just parts of an overall approach:

  • Isolate the path part (the part starting with / and up until the space before HTTP)
  • Search for known strings and delimiters like user= and &
  • Copy parts of the string into new character arrays, paying careful attention to the necessary lengths (some of the limits on requests may be helpful here!). Alternatively, create interior pointers into the path at the appropriate places for the username and message

Remember that Week 4 Monday had a video specifically about strings and interior pointers, and Week 5 Discussions are specifically about parsing requests!

Representing Chats

Chats and reactions both have multiple fields, so a natural choice is to represent both chats and reactions as structs.

A chat has several components, which may be good candidates for struct fields:

  • The message
  • The username
  • The timestamp
  • The reactions to the message

A reaction has the message content and the user who posted it (no timestamp or reactions-to-reactions), both of which are fixed-size.

You could consider structures like these; what are some tradeoffs? (We've assumed that the program defines some useful constants to avoid repeating specific numbers).

struct Reaction {
    char user[USERNAME_SIZE];
    char message[REACTION_SIZE];
}
struct Chat {
    uint32_t id;
    char user[USERNAME_SIZE];
    char message[MESSAGE_SIZE];
    char timestamp[TIMESTAMP_SIZE];
    uint32_t num_reactions;
    Reaction reactions[MAX_REACTIONS];
}
struct Reaction {
    char *user;
    char *message;
}
struct Chat {
    uint32_t id;
    char *user;
    char *message;
    char *timestamp;
    uint32_t num_reactions;
    Reaction *reactions;
}

These are ideas – some combination of them might work, and they are not necessarily perfect or complete. Some things to think about:

  • Which fields are fixed-size?
  • Which fields can grow?
  • Which fields can change?
  • What are limits for them described in the specification?

Other Helpful Functions

This PA explores several features that are straightforward to use, but there are many of them. We might add more to this list as the PA goes on! Here are a few functions you'll probably find useful; try man on them, or follow the links, or do your own searching and research. Don't forget all the functions from class (e.g. malloc and other allocation functions, strstr and other string manipulation functions, and so on). This list is mainly focused on things we haven't tried in class.

  • atoi: convert char* to integer

    For example:

    #include <stdio.h>
    #include <stdlib.h>
    
    int main() {
      char numeric[] = "123";
      int asnum = atoi(numeric);
      printf("%d\n", asnum * 2);
    }
    
  • Time functions:

    • time: get the current time

    • localtime: convert the time to the current local time zone

    • strftime: print the time in a given format

      For example:

      #include <time.h>
      #include <stdio.h>
      
      int main() {
          char buffer[100];
          time_t now = time(NULL);
          struct tm *tm_info = localtime(&now);
          strftime(buffer, sizeof(buffer), "%Y-%m-%d %H:%M:%S", tm_info);
          printf("%s", buffer);
      }
      

PA4 – Malloc

This assignment is thanks to the staff of CSE29 spring 2024, especially Gerald Soosairaj and Jerry Yu.

In class, quizzes, and PAs we've used malloc and free to manage memory. These are functions written in C! glibc contains one of the frequently used implementations.

There are a lot of details that go into these implementations – we don't expect that anyone could internalize all the details from the documentation above quickly. However, there are a few key details that are really interesting to study. In this project, we'll explore how to implement a basic version of malloc and free that would be sufficient for many of the programs we have written.

Specfically, we will practice the following concepts in C:

  • bitwise operations,
  • pointer arithmetic,
  • memory management, and of course,
  • using the terminal and vim.

Task Specification

You will implement and test two functions.

vmalloc

void *vmalloc(size_t size);

The size_t size parameter specifies the number of bytes the caller wants. This has the same meaning as the usual malloc we have been using: the returned pointer is to the start of size bytes of memory that is now exclusively available to the caller.

Note the void * return type, whic is the same as the usual malloc. void * is not associatd with any concrete data type. It is used as a generic pointer, and can be implicitly assigned to any other type of pointer.

This is how malloc (and our vmalloc) allow assigning the result to any pointer type:

int *p = malloc(sizeof(int) * 10);
char *c = malloc(sizeof(char) * 64);

If size is not greater than 0, or if the allocator could not find a heap block that is large enough for the allocation, then vmalloc should return NULL to indicate no allocation was made.

Your implementation of vmalloc must:

  • Use a “best fit” allocation policy
  • Always return an address that is a multiple of 16
  • Always allocate space on 16-byte boundaries

More details are in the implementation guide.

vmfree

void vmfree(void* ptr)

The vmfree function expects a 16-byte aligned address obtained by a previous call to vmalloc. If the address ptr is NULL, then vmfree does not perform any action and returns directly. If the block appears to be a free block, then vmfree does not perform any action and returns.

For allocated blocks, vmfree updates the block metadata to indicate that it is free, and coalesces with adjacent (previous and next) blocks if they are also free.

More details are in the implementation guide.

Design Questions

Heap fragmentation occurs when there are many available blocks that are small-sized and not adjacent (so they cannot be coalesced).

  1. Give an example of a program (in C or pseudocode) that sets up a situation where a 20-byte vmalloc call fails to allocate despite the heap having (in total) over 200 bytes free across many blocks. Assume all the policies and layout of the allocator from the PA are used (best fit, alignment, coalescing rules, and so on)

  2. Give an example of a program (in C or pseudocode) where all the allocations succeed if using the best fit allocation policy (like in this PA), but some allocations would fail due to fragmentation if the first fit allocation policy was used instead, keeping everything else the same.

Submission

You'll submit on Gradescope as usual. Refer to the policies on collaboration if you need to refresh your memory.

We will compile your program using the provided Makefiles, which you should not change.

To get started read through the next few sections about the PA:

Getting Started

The starter code for this assignment is hosted on GitHub Classroom. Use the following link to accept the GitHub Classroom assignment: https://classroom.github.com/a/ngxgHFB3

Just like on PA2/3, clone the repository to ieng6 server.

The Code Base

Unlike previous PAs, this PA comes with starter code that you should read and understand.

The Library

  • vmlib.h: This header file defines the public interface of our library. Other programs wishing to use our library will include this header file.
  • vm.h: This header file defines the internal data structures, constants, and helper functions for our memory management system. It is not meant to be exposed to users of the library.
  • vminit.c: This file implements functions that set up our “heap”.
  • vmalloc.c: This file implements our own allocation function called vmalloc.
  • vmfree.c: This file implements our own free function called vmfree.
  • utils.c: This file implements helper functions and debug functions.

Testing

  • vmtest.c: This file is not a part of the library. It defines a main function and uses the library functions we created. We can test our library by compiling this file into its own program to run tests. You can write code in the main() function here for testing purposes.
  • tests/: This directory contains small programs and other files which you should use for testing your code. We will explain this in more detail in a later section.

Compiling the Starter Code

To compile, run make in the terminal from the root directory of the repository. You should see the following items show up in your directory:

  • libvm.so: This is a dynamically linked library, i.e., our own memory allocation system that can be linked to other programs (e.g., vmtest). The interface for this library is defined in vmlib.h. The Makefiles handle compiling with a .so for you; we set it up this way to match how production systems include libraries like stdlib
  • vmtest: This executable is compiled from vmtest.c with libvm.so linked together. It uses your memory management library to allocate/free memory.
  • The starter code in vmtest.c is very simple: it calls vminit() to initialize our simulated “heap”, and calls the vminfo() function to print out the contents of the heap in a neatly readable format. Run the vmtest executable to find out what the heap looks like right after it’s been initialized! (Hint: it’s just one giant free block.)

Reading the Starter Code

For this programming assignment, we want the addresses returned by vmalloc to always be 16-byte aligned (i.e., at an address ending in 0x0). Since the metadata (block header) is 8 bytes in size, block headers will therefore always be placed at addresses ending in 0x8. To maintain this alignment, the first block header (*heapstart) starts at an offset of 8, and all blocks going forward must have sizes that are multiples of 16 bytes.

Block sizes here refer to the size of the block including the header. The smallest possible block is 16 bytes, and would consist of an 8 byte header followed by 8 bytes of allocatable memory. When free, the last 8 bytes of the block would instead contain the block footer, to be used for coalescing free blocks.

The size_t data type, which you will see frequently throughout this PA, is a 64-bit (8-byte) unsigned integer type.

Begin by reading through the vm.h file, where we define the internal data structures for the heap. This is where you will find the all-important block headers and block footers. Focus on understanding how struct block_header is used. You will see it in action later.

Next, open vminit.c. This is a very big file containing functions that create and set up the simulated heap for this assignment. Find the init_heap() function, and read through the entire thing to understand how it is setting up the heap. There is pointer arithmetic involved, you will need to do similar things for implementing vmalloc and vmfree, so make sure you have a solid understanding of that. (Why is it necessary to cast pointers to different types?)

Just like production allocators, we create a large chunk of memory as our heap using mmap, and allocations/frees are performed in that memory region.

Once you understand what the init_heap() function is doing, open up utils.c. Here we have implemented the function vminfo(), which will be your ally throughout this PA. This function traverses through the heap blocks and prints out the metadata in each block header. You should find inspiration for how to write your own vmalloc function here. Look at how it manipulates the pointer to jump between blocks!

Implementation Guide

This provides detailed implementation information and a suggested workflow.

Incremental Development Tips

  • Run the starter code and understand the output. Always figure out how to run your program first! You can’t do any testing if you don’t know how to run your program.
  • Understand the init_heap() function.
  • Understand the vminfo() function.
  • Begin writing the vmalloc() function:
    • write and test the size calculation code.
    • write and test the best-fit policy. (A helper function would be great here.) See later section for testing images.
    • write and test splitting free blocks.
  • Begin writing the vmfree() function:
    • write and test a basic vmfree() implementation that only frees one block.
    • write and test coalescing with the next block if it is also free.
  • Add block footers / coalescing backwards
    • update and test vmfree() and vmalloc() to write block footers into free blocks.
    • update and test vmfree() to coalesce with the previous block if it is free.
  • Test everything!

Consider some helper functions along the way:

  • A function to get the size of a block in bytes given a pointer to the beginning of it
  • A function to get the size of a block in 8-byte words given a pointer to the beginning of it
  • A function for getting a pointer to the next block given a pointer to the beginning of a block

Implementing vmalloc

Allocation Size Calculation

We need to do some calculations based on the requested size to get the actual block size that we need to look for in our heap.

  • Add 8 bytes for the block header to the requested size;
  • Round up the new size to the nearest multiple of 16.

For example, if the user calls vmalloc(10), we first add 8 bytes for the header to get 18, and then round up to the nearest multiple of 16, which gets us 32. So to allocate 10 bytes, we need to look for a heap block that is at least 32 bytes in size.

Allocation Policy

Many allocation policies are possible – choosing the first block that fits, choosing the last block that fits, choosing the most-recently-freed block that fits, and so on. The one you must implement for this PA is the best-fit policy.

That is, once you have determined the size requirement of the desired heap block, you need to traverse the entire heap to find the best-fitting block – the smallest block that is large enough to fit the allocation. If there are multiple candidates of the same size, then you should use the first one you find.

Splitting Large Blocks

If the best-fitting block you find is larger than what we need, then we need to split that block into two. For example, if we are looking for a 16-byte block for vmalloc(4), and the best fitting candidate is a 64-byte block, then we will split it into a 16-byte block and a 48-byte block. The former is allocated and returned to the caller, the latter remains free.

Updating Block Header(s)

If a new block was created as a result of a split, you need to create a new header for it, and

  • set the correct size.
  • set the status bit to 0.
  • set the previous bit to 1.

And for the block that was allocated, you need to

  • update the size (if splitting happened).
  • set the status bit to 1.
  • go to the next block header (if it’s not the end mark) and set its previous bit to 1.

Returning the Address

After updating all the relevant metadata, vmalloc should return a pointer to the start of the payload. Do not return the address of the block header!

Implementing vmfree

The simplest version of vmfree just finds the header for the given block and resets the block status bit to 0.

Coalescing freed blocks

Just freeing a block is not necessarily enough, since this may leave us with many small free blocks that can't hold a large allocation. When freeing, we also want to check if the next / previous block in the heap are free, and coalesce with them to make one large free block.

If you are coalescing two blocks, remember to update both the header and the footer!

Block footers / Previous Bit

Since a block's header contains its size, we know how far forward to move to get to the next block's header. However, when coalescing backwards, we need to know the size of the previous block to get to its header. To be able to do this without walking the entire list of blocks from the beginning, we write the block size to a footer in the last 8 bytes of the block.

Since footers are only needed during coalescing, we only need to add footers to free blocks; this means that footers don't take up extra space in allocated blocks, and free blocks weren't using that space anyway. However, we then need to make sure we update the "previous block busy" bit correctly so that we don't confuse user data in an allocated block with footer data in a free block.

You will need to update both your vmalloc and your vmfree implementation to add code for creating/updating accurate footers, and making sure the "previous block busy" bit is correct.

Testing

For this PA we provide you with some local testing code to help you make sure your program is working; you will need to write more thorough tests than what we've provided! But we give you a big start

Test Programs

In the repository, you will find the tests/ directory, where we have provided some small testing programs that test your library function.

By coding in to the tests/ directory and running make, you can compile all these test programs and run them yourself. These programs are what we will be using in the autograder as well.

For example, here’s the code for the very first test: alloc_basic.c:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#include "vmlib.h"
#include "vm.h"

/*
 * Simplest vmalloc test.
 * Makes one allocation and confirms that it returns a valid pointer,
 * and the allocation takes place at the beginning of the heap.
 */
int main()
{
    vminit(1024);

    void *ptr = vmalloc(8);
    struct block_header *hdr = (struct block_header *)
                               ((char *)ptr - sizeof(struct block_header));

    // check that vmalloc succeeded.
    assert(ptr != NULL);
    // check pointer is aligned.
    assert((uint64_t)ptr % 16 == 0);
    // check position of malloc'd block.
    assert((char *)ptr - (char *)heapstart == sizeof(struct block_header));
    // check block header has the correct contents.
    assert(hdr->size_status == 19);

    vmdestroy();
    return 0;
}

In this test, we simply make one vmalloc() call, and we verify that the allocation returned a valid pointer at the correct location in the heap.

These tests use the assert() statement to check certain test conditions. If your code passes the tests, then nothing will be printed. If one of the assert statements fail, then you will see an error.

Test Images

When you are writing vmalloc, you may wonder how you can test the allocation policy fully when you don’t have the ability to free blocks to create a more realistic allocation scenarios (i.e., a heap with many different allocated/free blocks to search through).

To help you with that, we have created some images in the starter code in the tests/img directory. In your test programs, instead of calling vminit(), you can call the vmload() function instead to load one of these images.

Our alloc_basic2 program uses one such image. If you open tests/alloc_basic2.c, you will see that it creates the simulated heap using the following function call:

vmload("img/many_free.img");

If you load this image and call vminfo(), you can see exactly how this image is laid out:

vmload: heap created at 0x7c2abd1da000 (4096 bytes).
vmload: heap initialization done.
---------------------------------------
 #      stat    offset   size     prev   
---------------------------------------
 0      BUSY    8        48       BUSY   
 1      BUSY    56       48       BUSY   
 2      FREE    104      48       BUSY   
 3      BUSY    152      32       FREE   
 4      FREE    184      32       BUSY   
 5      BUSY    216      48       FREE   
 6      FREE    264      128      BUSY   
 7      BUSY    392      112      FREE   
 8      BUSY    504      32       BUSY   
 9      FREE    536      112      BUSY   
 10     BUSY    648      352      FREE   
 11     BUSY    1000     304      BUSY   
 12     BUSY    1304     336      BUSY   
 13     FREE    1640     320      BUSY   
 14     BUSY    1960     288      FREE   
 15     BUSY    2248     448      BUSY   
 16     BUSY    2696     256      BUSY   
 17     BUSY    2952     96       BUSY   
 18     BUSY    3048     368      BUSY   
 19     FREE    3416     672      BUSY   
 END    N/A     4088     N/A      N/A    
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 14, Total: 20

You can use this image to test allocating in a more realistic heap.

Three images exist in total:

  • last_free.img: the last block is free,
  • many_free.img: many blocks are free,
  • no_free.img: no block is free (use this to test allocation failure).