doodle

UCSD CSE29 WI26 Syllabus and Logistics

CSE 29 introduces you to the broad field of systems programming, including 1) the basics of how programs execute on a computer, 2) programming in C with direct access to memory and system calls, 3) software tools to manage and interact with code and programs. All very cool stuff that makes every programmer better!

Basics

Lecture (attend the one you're enrolled in):
- A section: Tue/Thu 11:00am, Center Hall 115
- B section: Tue/Thu 12:30pm, Center Hall 113
Discussions:
- A section: Tue 5:00pm, Pepper Canyon Hall 109
- B section: Thu 5:00pm, Center Hall 212
Labs: Wednesdays (check your schedule!). Either B240 or B250 in the CSE building
Exams:
- Weeks 4 and 8: AP&M Testing Center, flexible scheduling on PrairieTest
- Week 10: During your regularly-scheduled lab time
- Finals week: In CSE labs scheduling instructions coming soon!
Final exam: AP&M Testing Center, flexible scheduling in final exam week on PrairieTest
Professor office hours (these are the standard times but check the calendar below for changes!)
- Tuesday 2:30-3:30pm, CSE 3206 (advising focused, open to all students)
- Wednesday 1:30-2:30pm CSE B270/tables outside labs (CSE29 students only)
Office Hours: Refer to calendar
Q&A forum: Piazza
PrairieLearn: https://us.prairielearn.com
Textbook/readings: Dive Into Systems, plus additional readings we will assign
- Free: MIT Missing Semester
- Not free but pretty cheap: Julia Evans Zines, especially The Pocket Guide to Debugging
Reference Sheets: Google Doc

Office Hours Calendar

Schedule

Week 10 – Review and Wrapup
- Lectures
  - Tuesday
  - Thursday
    - Please fill out your SETS and IA Evaluations! :)
Week 9 – HTTP Servers
- Lectures
  - Tuesday
  - Thursday
Week 8 – Allocator Wrapup, More on System Calls, Practice
- Readings & Resources
- Lectures
  - Tuesday
    - Blank Handout
    - Annotated Handout (11am then 12:30pm)
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
  - Thursday
Week 7 – Implementing an Allocator
- Readings & Resources
- Lectures
  - Tuesday
    - Review Questions PDF, Slides
    - Heap Visualizer HTML Doc (Note: there may be bugs in this implementation, the specification given in the PA is the ground truth)
    - Lecture Videos
      - video 1 watch from time 9:30-end
      - video 2 watch from beginning to ≈15 minutes in.
    - Notes from video 1
  - Thursday
    - Blank Handout
    - Annotated Handout (both sections)
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
Week 6 – Actually Structs, Heap Memory
- Readings and Resources
  - Same as week 5
  - Python UTF8 Struct: cpython/unicodeobject.h
- Lectures
  - Tuesday
    - Blank Handout
    - Annotated iPad notes (both sections 11am then 12:30pm)
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
  - Thursday
    - Blank Handout
    - Annotated iPad notes (both sections 11am then 12:30pm)
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
Week 5 – Structs, Memory Management
- Readings
- Lectures
  - Tuesday
    - Blank Handout
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
    - layout.c (code from handout)
    - Notes on exam from Joe ---> exam.txt
    - Annotated Handout 12:30pm
    - Chalkboard (11am) Memory, Memory Extended, fork(), execvp(..), Full Board
    - Code shell.c shell-complete.c
  - Thursday
    - Review Question materials
      - reviewq1.txt reviewq2.txt
        fork2.c
        exec-arg0.c arg0.c exec-arg0-nonsense.c
    - Code
      - 11am shell.c
      - 12:30pm shell.c vm.c
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
Week 4 – Processes and Memory
- Readings
  - Processes, especially fork and exec
  - C Structs
  - make and Makefiles
- Lectures
  - Tuesday
    - Blank Handout
    - Annotated Handout 11am 12:30pm
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
  - Thursday
    - Blank Handout
    - Annotated Handout 11am 12:30pm
    - Code
      - strtok.c 11am 12:30pm
      - shell.c 11am 12:30pm
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
Week 3 – Where (Some) Things Are in Memory
- Readings
  - Debugging with GDB
  - Pointers and Functions
- Lectures
  - Tuesday
    - Blank Handout
    - Annotated Handout: 11am 12:30pm
    - Code
      - pre-lecture: analyzer.c
      - 11am analyzer.c adjacent.c
      - 12:30pm analyzer.c adjacent.c
  - Thursday
    - Blank Handout
    - Annotated Handout: 11am 12:30pm
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
Week 2: Numbers, Bitwise Representations, and UTF8
- Readings
  - Binary and Data Representaion (Sec 4.1-4.6)
  - Arrays in C
  - Scanf and fgets (though this week we'll only use fgets with stdin)
- Lecture Materials
  - Tuesday
  - Thursday
    - Handout
    - Annotated Handout: 11am 12:30pm
    - Code: 11am 12:30 pm
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF
Week 1: Welcome, Strings and Bitwise Representations
- Readings and Videos
- Lecture Materials
  - Tuesday
    - Handout
    - Annotated Handout: 11am 12:30pm
    - Lecture Summary Google Slides
    - Lecture Summary
  - Thursday
    - Handout
    - Annotated Handout: 11am 12:30pm
    - Lecture Summary Google Slides
    - Lecture Summary Slides PDF
    - Lecture Summary with Speaker Notes PDF

Course Components

There are three main parts of the course: Assignments, Exams, and Social Learning.

Assignments

The course has 5 assignments that involve programming and writing. For each there will be:

A problem set that has practice problems related to the assignment
A programming project where you create a program that does something interesting and that we automatically test
Design questions about the programming project

Grading: There are 6 points available for each assignment.

2 points for the corresponding problem set (2 points for fully correct and complete by the deadline, 1 point for somewhat complete at the deadline, 0 points for incomplete or very little completed at the deadline).
2 points for program correctness (2 points for perfect or nearly-perfect, 1 point for significant errors, 0 for programs that work on very few or no examples)
2 points for design questions (2 points for perfect or nearly-perfect, 1 point for minor errors, 0 points for major errors or incomplete)

After each assignment is graded, you'll have a chance to resubmit all components based on the feedback you received, which will detail what you need to do to increase your score. You can increase your score by up to 1 point in each component on a resubmit.

This is also the only late policy for assignments. Unsubmitted assignments are initially given a 0, and can get a maximum of 3 points on resubmission.

Exams

There will be 3 exams during the quarter and make-up opportunities in finals week.

Exams will use a mix of the testing facility in AP&M B349 (weeks 4 and 8), and our regularly-scheduled lab times (week 10). You will schedule your week 4/8 exam on PrairieTest by logging in with your @ucsd.edu account. You can schedule the exam at a time that's convenient for you in the given exam week, and you will go to that lab and check in for your exam at the time you picked. The exam will be proctored by staff from the Triton Testing Center (not by the course staff from this course). No study aids or devices are allowed to be used in the testing center. You will need only a photo ID and something to write with (scratch paper is available on request). The Triton Testing Center has shared a document of rules and tips for using the testing center.

The exams will be administered through PrairieLearn and PrairieTest. The exams will have a mix of questions; they will typically include some that involve programming and interacting with a terminal.

We'll know the precise scheduling by the end of week 1 for both the exams during the quarter and the final exam.

Exams are (necessarily) cumulative. They typically focus on content that has already had a due date (e.g. PAs, problem sets, design questions, and labs that have their work due before the exam period starts), and typically don't focus on specific details only found in textbook chapters or mentioned conversationally in lecture. However, that's only a statement about the overall design, not a hard constraint: we will never promise that course content from any particular week, lecture, assignment, textbook chapter, or other course component won't appear on an exam. We say this publicly in the syllabus to avoid misunderstandings (“the professor/TA/my friend told me not to study that”) and to make it clear that all the content matters (we wouldn't assign it if it didn't).

Learning (in college) is in part, a social process. Your engagement with it forms part of your grade for the class.

Lectures are designed to introduce you to concepts, show how I (Joe) approach problems, give you a chance to ask questions about content, connect course content to the world outside the classroom, talk to your classmates and the staff, and provide an overarching story for the material.

Most lectures will come with two on-paper activities that are handed in for credit:

A review check, near the beginning of class. These serve mainly as a way to check in on content from previous lectures before starting the content for the day, and for us to get some information about how everyone is following along with lecture content. Don't think of them as quizzes, rather as an engagement check. You can discuss these with your classmates, and they are graded on completeness rather than correctness.
An exit slip at the end of class (last 5 minutes). These serve as a way to get feedback from you on the lecture; they will often ask you to restate or summarize a learning outcome, or say what the “fuzziest” concept from lecture was for you.

We say these will happen in “most” lectures because sometimes we may skip one or both for various timing or logistics reasons.

Grading: Each lecture has 3 social learning points available:

2 for the review check (only 1 point if handed in but incomplete)
1 for the exit slip

If you miss either or both during class, you can submit the makeup assignment later for 1 point total. This is the only way to make up credit for missing a lecture.

The makeup assignment requires both the review check and exit slip, regardless of whether you turned one in during class. So:

If you only hand in one during class, you can submit the full makeup assignment to earn 2/3 points for that day.
If you miss lecture entirely, you can submit the full makeup assignment to earn 1 point.

All resubmissions must be before the next lecture to count, with no exceptions.

The course's lab component meets for 2 hours. In each lab you'll switch between working on your own, working in pairs, and participating in group discussions about your approach, lessons learned, programming problems, and so on.

The lab sessions and groups will be led by TAs and tutors, who will note your participation in these discussions for credit. Note that you must participate, not merely attend, for credit.

Grading: For each lab there are 4 possible points to earn:

1 for being present
2 for being an active, professional, productive participant while present
1 for submitting and/or getting work checked off that was done in lab

If you miss lab, you can earn the submission/check-off point by submitting the check off work before the next Tuesday at 11am, but cannot earn the 3 points related to participation. There is no way to make up a lab beyond this 1 point, even for illness, travel, or emergencies. Our preference would be to require all 10 labs for an A, and have some kind of excused absences. However, tracking excused absences doesn't really scale, so this policy is how we handle it.

There are discussions on Tuesdays and Thursdays, led by the course staff, for going over problem set problems and the programming assignments.

Grading: Attendance is taken, and attending discussion gives 1 social learning point.

Grading

Each component of the course has a minimum achievement level to get an A, B, or C in the course. You must reach that achievement level in all of the categories to get an A, B, or C.

A achievement:
- ≥85 social learning points (may adjust depending on the total number of available lecture points)
- ≥27/30 assignment points
- ≥10/12 exam points
B achievement:
- 70-84 social learning points
- 23-26 assignment points
- 8-9 exam points
C achievement:
- 55-69 social learning points
- 20-23 assignment points
- 6-7 exam points

Below C achievement (in any category) is an F/No Pass.

Pluses and minuses will be given based on performance on social learning and assignments (exam grades will not be used to determine plus and minus grades; this is to avoid extra load on retake testing). We don't publish the exact cutoffs in advance.

Requests to change this grading policy (for a specific student or class-wide) will be denied with a link to this syllabus section. Consider this: we may, as instructors, decide for academic reasons that the most accurate way of assigning letter grades in the class needs to change (and we tend to only make changes that improve letter grades relative to this starting policy). However, it would be inappropriate for us to do so in response to student requests: that could create an appearance that we give students the grades they ask for rather than the grades they earned.

Policies and Perspective

CSE29, Large Language Models, and You

Evidence suggests that experts can generate (some kinds of) programs more quickly with the assistance of large language models. But, they did not become experts by using them.

We offer the following metaphor as guidance:

Consider physical conditioning, let's take running for fitness in particular. There are many machines in the world that are remarkably effective at moving people around: electric scooters for example. Taking a scooter for a few miles gets you to your destination more quickly and less sweatily than running, but completely misses the point of the run. The goal of running for fitness has little to do with getting to a destination and everything to do with the changes that happen inside your body. Many people can run the same miles on the same road and get the benefits from it, despite them all doing the same work that a machine (the scooter) could have done.

Fitness is just one example like this. Similar comparisons make sense for learning to play an instrument, perform onstage, or upskill at games like billiards, darts, or Fortnite.

The programming in this class is like running for fitness (doing the typing and thinking on your own), not about getting to a destination quickly (handing in a completed program). The answers to many programming problems are known, or close-to-known, and many tools exist that can generate part or all of them, and there are many people who can write them faster than you. We assign the work not because knowing a solution is particularly important, but because the act of creating the solution changes you.

So, you should not use AI to generate code or prose for work in this class.

That said, it's relatively difficult for us to detect AI usage on take-home work. It's also useful to know what AI and code generation can do. So feel free to experiment, and do not read this policy as prohibiting, or assigning formal consequences to LLM use. A good way to go about learning in this course is to do the work yourself first, then go back and see how the LLM would have done it. Then you may be able to notice things, like the LLM generating extra unnecessary code, or doing something in a more or less complex way than you did it, or maybe doing exactly what you did! The key thing is that you developed a sense of what a solution was supposed to look like, rather than trusting the version from the machine.

(Also, the exams require that you program on your own, without the aid of any AI assistants.)

Device Policy

To a large degree, you are responsible for managing your time, attention, and learning in lecture, and I hesitate to make policies that attempt to govern or restrict your choices with respect to note taking or device use. However, lecture is a communal space, and your actions can affect others' learning. In particular, what you have on your screen may be unavoidably in the field of view of other students. Because of this, you are responsible for a fragment of the attention of everyone in a cone of space behind you. With this in mind, the policy for lecture is that if you use a device, you must have lecture-related content onscreen. There is even research that shows that the content of screens in the classroom, even quite far away, can have a detrimental effect on learning. If you cannot resist checking social media, playing a game, or doing other off-topic tasks during lecture, sit in the back 2 rows so that you are only having an effect on your own attention, or the attention of others with a similar mindset.

Assignments and Academic Integrity

You can use code that we provide or that your group develops in lab as part of your assignment. If you use code that you developed with other students (whether in lab or outside it), got from Piazza, or got from the internet, say which students you worked with and a sentence or two about what you did together in CREDITS.txt. All of the writing in assignments (e.g. in open-ended written questions) must be your own.

You can use an AI assistant like ChatGPT or Copilot to help you author assignments in this class. If you do, you are required to include in CREDITS.txt:

The prompts you gave to the AI chat, or the context in which you used Copilot autocomplete
What its output was and how you changed the output after it was produced (if at all)

This helps us all learn how these new, powerful, and little-understood tools work (and don't).

If you don't include a CREDITS.txt and it's clear you included code from others or from an AI tool, you may lose credit or get a 0 on the assignment, and repeated or severe violations can be escalated to reports of academic integrity violations.

Exams and Academic Integrity

It is a violation of academic integrity to share details of your exam with others until after you receive your grade for it. Keep in mind that the exams are randomized to discourage casual cheating, so your exam may not be the same one that others see. It is a violation of academic integrity to communicate with anyone other than the official proctors during the exam, or to use devices other than the ones you are using to complete the exam.

Lab and Professionalism

Lab work will be highly collaborative, and designed to encourage communication between students. Some of the activities will have you talk to other students, or exchange code, ideas, or commands with other students, and write down what happened.

In all of this communication, remember to be polite, professional, and focus on the work. A huge part of the job of a working software professional or researcher is professional and clear communication.

Some tricks for this: avoid statements that reference the author of the code, frame negative feedback as possible improvements or ways your expectations were violated, and take responsibility for anything you don't understand.

Examples:

Don't say “You made a mistake here: ...” Instead say “On line I expect the condition to be true but I think it will be false because XYZ ... we could demonstrate that by doing ABC.”
Don't say “It seems like you don't know about...”, instead say “On line 10-12, these 3 lines could be shortened into one line using grep -r instead of find followed by grep: ....” And use some judgment: it's likely that some of this advice is helpful and constructive, and some of it is just showing off.
Don't say “This code is (going to be) slow...” Instead wait until you can run it and try an example that demonstrates it “This code took 10 seconds to run for input XYZ and that seems like a lot. When we were writing ... I think I noticed that we did X, if we did Y instead I think it will go faster
Don't say “That conditional is ugly:...” Instead say “I find it easier to read these conditions when they are written as .... because ...” If you don't have a good explanation to put after the “because”, how do you know it's a good suggestion? Refer also to the advice about not showing off
Don't say “You wrote lines 20-24 very confusingly.” Instead say “I'm having trouble understanding lines 20-24. It would help me to work through an example of how that part is supposed to work”

FAQ/AFQ (Anticipated Frequent Questions)

Q: I am waitlisted for one of the sections at position N. What should I do?

Come to class and do all the work as usual (you will have Canvas access to get all the needed links) so you can be up to speed if you do get a spot. This is aligned with official department policy.

We can't make accurate predictions about chances of getting into the course from a specific waitlist position, or control that process as instructors.

Can I attend a lab section other than the one I'm enrolled in?

No, please do not try to do this. The lab sections have limited seating and are full. We cannot accommodate switching.

How can I switch sections?

You have to drop and re-add (which may involve getting [back on] the waitlist). Sorry.

Can I leave lab early if I'm done or have a conflict?

The labs are designed to not be things you can “finish”. Labs have plenty of extension and exploration activities at the end for you to try out, discuss, and help one another with. Co-located time with other folks learning the same things is precious and what courses are for. Also, if you need an extrinsic motivation, you won't get credit for participation if you don't stay, and participate, the whole time. We can often accommodate one-off exceptions – if you have a particular day where you need to leave early, it's a good idea to be extra-engaged in your participation so your lab leader can give you participation credit before you leave.

Do I have to come to lab?

Yes, see grading above.

I missed lab, what should I do?

The lab page will have instructions on how to submit the make-up, which can get you 1 (of 4 possible) points.

I missed my exam time, what should I do?

Stay tuned for announcements about scheduling make-ups in final exam week.

Where is the financial aid survey?

We do this for you; as long as you submit something in the first two weeks, we will mark you as commencing academic activity.

When are the midterms scheduled?

The midterms will be flexibly scheduled during the quarter using a testing center. More details will come; you will need to set aside some outside-of-class time to do them, but there is not a specific class-wide time you have to put on your calendar.

I have a conflict with the final exam time, what can I do?

The final exam will also be flexibly scheduled during final exam week using the testing center.

Lab 1

January 7, 2026

Welcome! Lab setup

Fill out the Lab 1 Form

If you computer is Windows and looks like this windows

Restart the computer and when shown the following menu bootmenu

Select the top option: "Linux Mint 21.1 Cinnamon" by using the arrow keys and pressing enter.

If you were not presented with this menu, restart and hold f12 has it reboots.
Then on this menu ubuntu

Select the ubuntu option as shown using the arrow keys and press enter to reach the first black menu.

Then you can log into the computer you are sitting at (use your “Active Directory” username and password – the same one you use for Gmail/Canvas)

References

Terminal

On the lab machines, you can open a terminal either one of two ways:

On the toolbar at the bottom of the screen, there are a few icons on the far right. The last one (shaped like a black rectangle) will open a new terminal.

⠀

Right-click on the desktop, and a menu of options will appear. Select the option to create a new terminal.

⠀

The following are some commands you can run in your terminal. To run a command, type it in and press enter.

`pwd`

Prints the name of the working directory, which is the location in the file system that the terminal is currently in.

`whoami`

Prints the username of the account you are using on the computer.

`uname -a`

Prints information about the computer that the terminal is connected to.

`cd [directoryname]`

Changes to the directory named [directoryname]. The special directory .. takes you up one level.

`ls`

Prints a list of all the files and directories that are in the working directory.

`mkdir [directoryname]`

Creates a new directory called [directoryname].

`touch [filename]`

Creates a new file called [filename].

`rm [filename]`

Removes the file called [filename].

`man [commandname]`

Displays a documentation page for the terminal command [commandname].

Logging into ieng6

`ssh yourusername@ieng6.ucsd.edu`

Your account name is the same account name as the one that’s used for your school Google account, i.e. it is the string that precedes “@ucsd.edu” in your school email address. In case you need to check the status of your student account, refer to the UCSD Student Account Lookup page.

Your password is the same password that you use for your student account on other websites (i.e. Canvas). The terminal does not show the characters that you type when you enter your password.
The first time you use this command, you will get a message like this:

⠀$ ssh yourusername@ieng6.ucsd.edu
The authenticity of host 'ieng6.ucsd.edu (128.54.70.227)' can't be established.
RSA key fingerprint is SHA256:ksruYwhnYH+sySHnHAtLUHngrPEyZTDl/1x99wUQcec.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Copy and paste the one of the corresponding listed public key fingerprints and press enter. ¹

If you see the phrase ED25519 key fingerprint answer with: SHA256:8vAtB6KpnYXm5dYczS0M9sotRVhvD55GYz8EjN1DYgs
If you see the phrase ECDSA key fingerprint answer with: SHA256:/bQ70BSkHU8AEUqommBUhdAg0M4GaFIHLKq0YQyKvmw
If you see the phrase RSA key fingerprint answer with: SHA256:npmS8Gk0l+zAXD0nNGUxr7hLeYPn7zzhYWVKxlfNaeQ

Getting the fingerprint from a trusted source is the best thing to do here. You can also just type “yes”, as it's pretty unlikely anything nefarious is going on. If you get this message when you're connecting to a server you connect to often, it could mean someone is trying to listen in on or control the connection. This answer is a decent description of what's going on and how you might calibrate your own risk assessment: Ben Voigt's answer

PrairieLearn

PrairieLearn URL: us.prairielearn.com

Enroll in CSE 29: Systems Programming and Software Tools, Winter 2026 at https://us.prairielearn.com/pl/enroll

Complete parts 1 and 2 of the vimtutor demo. (Run “vimtutor” in the terminal)

Lab Checkoff Point (1/4)

Complete the vimtutor demo on Prairie Learn.
Submit the Lab 1 Prairie Learn assignment.
Fill out the Lab 1 form.

Part 1: git motivated

git: command line program that enables version control (in case you break it)

Repository: folder containing code

GitHub: website that holds (repos)itories; can contribute and share with others

Part 2: SSH Keys on GitHub

Command to get the Github People repo

$ git clone git@github.com:ucsd-cse29/lab2-git-people.git

This command should fail. We need to set up SSH Keys on GitHub to fix that!

Setting up SSH Keys on GitHub

Recall from last lab how to log into your ieng6 account:


$ ssh yourusername@ieng6.ucsd.edu

Please log into ieng6. Within your ieng6 account, use the following command to generate a new key pair, replacing github_email with your GitHub email address:


$ ssh-keygen -t rsa -b 4096 -C github_email

You’ll be prompted to “Enter a file in which to save the key”. Press Enter to accept the default location. You’ll then be prompted to enter a passphrase, which isn’t really necessary. Press Enter twice to continue without setting a passphrase. Though if you really want to set a passphrase, refer to GitHub docs on passphrases.

Adding your SSH key to GitHub

By default, the public SSH key is saved to a file at ~/.ssh/id_rsa.pub.

Instead of typing out the whole filename, you can type out some prefix of the name (e.g. ~/.ssh/id), and press Tab to autocomplete the name. In this case, tab complete won’t complete the full filename, since the private key happens to be named id_rsa. Please be too lazy to type out entire filenames and use tab complete instead!

View the contents of ~/.ssh/id_rsa.pub (using cat), then copy the contents of the public key file to your clipboard.

On the GitHub website, click your profile picture in the top right to open a menu, and click on “Settings”.

Click on Settings in your GitHub account menu

On the left, open “SSH and GPG keys”, then click on “New SSH key”.

Go to "New SSH key"

Populate the fields as follows:

Title: “ieng6”
- The title doesn’t affect the functionality of the key, it’s just a note for you that this key is tied to your ieng6 account.
Key type: “Authentication key”
Key: Paste the contents of the public key file here (entire block including the email portion).

Go to the "Key" box

Click “Add SSH key”. You may need to confirm access to your account on GitHub at this point.

Testing your SSH key

Finally, test your connection to GitHub with the command:

$ ssh -T git@github.com

If this is your first time connecting to GitHub, you might get a warning about the “authenticity of host can’t be established”. This is a warning for you to make sure that you’re connecting to the right thing. For the purposes of this lab, we assume that GitHub didn’t suddenly get hacked, so you can safely respond with “yes”. But if you’re really paranoid, you can check GitHub’s public key fingerprint here.

After a successful connection, it should output Hi <your-username>! You've successfully authenticated, but GitHub does not provide shell access.

Command to get the Github People repo (Take 2!)

If you set up the SSH Keys correctly in the previous steps, now you should be able clone the repo:

$ git clone git@github.com:ucsd-cse29/lab2-git-people.git

Part 3: Whiteboard Activity - UTF-8 Strings

★彡:)
¿Sí?
ㅋ😂!
Jé😀

Part 4: Submitting the PA (Lab Work Checkoff)

PA 1 GitHub Classroom Link: https://classroom.github.com/a/pff3Avby

To get the final 1 point for lab work checkoff this week (out of 4 total points), you just need to make a submission on Gradescope for pa1!

Important git commands

$ git status

Gets the status of our repository. It returns which files are untracked (new) modified (changed) and deleted.

$ git add [filename(s)]

When we are done making changes to a file, we "stage" it to mark it as ready to be committed. Using the git add command with the path of the changed file(s) will stage each to be included in the next commit.

$ git commit -m “message”

A commit is a package of associated changes. Running the git commit command will take all of our staged files, and package them into a single commit. We specify the -m option to specify we want to write a commit message, and write our commit message in the “”. Without -m, git opens a vim window to write the commit message.

$ git push

Pushes the commit with our changes to the remote repository (i.e. GitHub).

Part 1: SSH Keys to ieng6

Note: if you have a personal laptop you can set up ssh keys on it to be able to ssh to ieng6. If you're on windows, you’ll need a linux-like terminal, so use the terminal in VSCode, a program like git bash, or WSL for this.

If you don’t have a laptop or it isn’t set up with a linux terminal, work from your lab computer for today. You can set up ssh keys on your laptop at home on your own time.

Generate a new pair of ssh keys (on your laptop, or on your lab computer)

cd to your .ssh directory, (if it doesn’t exist, first create it with mkdir ~/.ssh)
IF YOU ALREADY HAVE AN 'id_rsa', SKIP THIS STEP (otherwise you'll overwrite your github keys)
If you don't have an id_rsa, run: ssh-keygen -t rsa -b 4096 -C YOUR_EMAIL_ADDRESS
- You'll be using the default filename and no password, so just press enter twice
Copy the contents of your new public key (you can print out the contents with cat)

Copy your public key onto ieng6, into the file ~/.ssh/authorized_keys

On ieng6, in your .ssh directory, create a file called authorized_keys (or open it if it exists already)
Use vim to paste in the contents of the new public key into the authorized_keys file
- (if your authorized_keys already existed, paste the new key on a new line at the end)

Log out from ieng6 (use exit), and make sure you can ssh into it without typing your password

If you used a different name than id_rsa, you have to ssh with a -i [private key path] flag

Color ls

Use vim to edit the following file in your home directory: vim .bash_profile
Add a line to the end of this file (you can scroll to the bottom of the file, and press ‘o’ to enter insert mode on a new line): alias ls="ls --color"

if you have file permission issues, you can use sudo chmod u+rwx ~/.bash_profile (or search up ways to fix permission issues for your .bash_profile)

Part 2: Debugging with -Wall

Clone the repository we’ll be using for this lab (you’ll need to get the ssh URL). You’ll be working with one of the three programs, depending on your group: https://classroom.github.com/a/tYCFrrGl

Many common errors can be caught by the compiler, but a lot of these checks aren't enabled by default. We can ask the compiler to warn us by adding the -Wall flag ("- W(arn) all") when you compile:

$ gcc -Wall YOUR_CODE.c -o YOUR_CODE

Part 3: Debugging with `gdb`

Intro to gdb commands:

To use gdb, make sure you’ve compiled your program with “debug symbols”

$ gcc -g YOUR_CODE.c

Run your compiled code in gdb

$ gdb YOUR_BINARY

This puts you into a gdb prompt: normal terminal commands don’t work here, and you can instead run gdb-specific commands

(gdb) run     # starts running your program

If your program stops running or segfaults in gdb, print out the backtrace (also called a “stack trace”)

(gdb) backtrace

(gdb) bt

Use the following commands to print out values at the point the program stopped:

(gdb) info locals
(gdb) info args
(gdb) print VALUE (or p VALUE)
- You can print any variable or expression, e.g.
  - print x, p arr[5], p ((x & 0b1111) << 3)
- You can also specify a format to print in
  - print/t (binary), print/x (hex), print/d (decimal)
(gdb) x ADDRESS
- This prints out memory at an address, e.g. strings / arrays / pointers
(gdb) x/16cb str1
- This prints the first 16 bytes of str1 as characters
(gdb) x/20xb str2
- This prints 20 bytes of str2 in hex
(gdb) x/4dw arr
- This prints 4 “words” (i.e. int32s) of arr, as decimal numbers
You can use the following reference card for reference on gdb commands, and format commands for x and print.

If you’re done early, use the reference card to try exploring other gdb commands!

Lab 3 Work Check-Off (due before next Tuesday lecture):

In your copy of the lab3-buggy repo: fix the bugs in your assigned program, then commit and push your changes so they show up on GitHub.

Lab 4 Reference Document

Part 1: `.vimrc`, `.bash_profile`, and 'ssh config'

In the file ~/.bash_profile on ieng6, add:

alias ls="ls --color"

Create the file ~/.vimrc on ieng6, and copy in:

set tabstop=4
set softtabstop=4
set shiftwidth=4
set cindent
set number

On Your computer (not ieng6) create the file ~/.ssh/config:

Host <shortcut> <possibly more shortcuts>
    HostName <the host name of the ssh server>
    User <your username on the ssh server>

Part 2: GDB Debugging

To run a program in GDB with command-line arguments, use the --args flag:

gdb --args ./program arg1 arg2 arg3

New GDB commands

`start [ARGUMENTS]`

Starts the program with the specified arguments and pauses at the first line of main.

`break LINE_NUMBER`

Sets a breakpoint at the specified line number. (Breakpoint: a line in the program at which GDB will stop at before the line executes.)

`next`

Executes to the next line of code.

`step`

If the next line contains a function call, steps into the first line of the function. Otherwise, behaves the same way as next.

`continue`

Continues running the program until the next breakpoint.

`layout src`

Shows a view of the code on the top, and a window for gdb interactions below.

Screenshot of GDB

The line that is shown above the (gdb) prompt has not run yet. It is the line that will run if you type next.

Activity

(clone the github classroom repo from here: https://classroom.github.com/a/HivgO-T6)

Part 3: Hacking

3.1. Background

Imagine that you're a less ethical student than I'm sure you actually are. You overhear from some other students in lab that there's a binary available on the pi-cluster that can show you your grades on assignments before we formally release them. You hear quieter whispers that someone found a way to use it to change their grade. Given our less-than-ethical assumption about your state of mind, you might be tempted to exploit this for yourself.

3.2. The Plot Thickens

You see some code open on the professor's laptop during office hours. You do your best to commit it to memory and write it down (remember, you're acting quite unethically in this story), because it strikes you that the code was something regarding assignment scores.
gradebook source code

Using this information, you decide to give yourself and A with a score to match while maintaining a real due date.
HINT When important values are adjacent on the stack, overflowing an array with values that you control can let you assign into other stack-allocated values.

GDB commands that may be useful for this activity:

(gdb) info locals
(gdb) info args
(gdb) print VALUE (or p VALUE) You can print any variable or expression, e.g.
print x, p arr[5], p ((x & 0b1111) << 3)
You can also specify a format to print in
print/t (binary), print/x (hex), print/d (decimal)
(gdb) x ADDRESS This prints out memory at an address, e.g. strings / arrays / pointers
(gdb) x/16cb str1 This prints the first 16 bytes of str1 as characters
(gdb) x/20xb str2 This prints 20 bytes of str2 in hex
(gdb) x/4dw arr This prints 4 "words" (i.e. int32s) of arr, as decimal numbers
You can use the following reference card for reference on gdb commands, and format commands for x and print.

Work Check-off

Push a copy-paste or screenshot of your gdb session from part 2 when investigating index_of_E to the github classroom assignment for this lab.

Lab 5 Reference Document

Part 1: `.swp`

rm .my_program.c.swp
Press E for last save you made and
Press R to recover if you didn't save and it is in the .swp file

Part 2: Writing a Search Program

(clone the github classroom repo from here: https://classroom.github.com/a/tRWujWB3)

For this lab, you'll be testing your program on the "rockyou" password list from your PA2, you can copy it over from this path on ieng6:

/home/linux/ieng6/CSE29_WI26_A00/public/pa2/rockyou_clean.txt

You'll be writing "mysearch.c". Your program should take 1 argument (the string to search for). It should read lines from standard input, and print out all of the input lines that contain the search string.

./mysearch PATTERN

You can use the strstr() function to search for a string within another string.

Part 3: Extra Options

Expand your program to handle extra flags:

./mysearch -n PATTERN : print line numbers before each line

./mysearch -v PATTERN : print only lines that don't contain PATTERN

./mysearch -c PATTERN : don't print pattern matches, just print out the count of matching lines at the end

Each person in your group should implement a different one of these; you'll need all three to do the whiteboard activity.

You can use the strcmp() function to check whether two strings are equal.

Make sure you keep the original functionality of the program without flags!!

Work Check-off

Commit and push your code. You should have at least one of -n, -c, or -v implemented.

Sample Test Cases

$ ./mysearch -n alpaca < /home/linux/ieng6/CSE29_WI26_A00/public/pa2/rockyou_clean.txt
49372: alpaca
261400: alpacas
646255: englandalpaca
2388349: hugoalpaca
2413735: hildaalpaca
3132574: babyalpaca
3253313: alpacast
$ ./mysearch -v a < /home/linux/ieng6/CSE29_WI26_A00/public/pa2/rockyou_clean.txt | head -n 10
iloveyou
princess
nicole
monkey
lovely
qwerty
iloveu
tigger
sunshine
soccer
$ ./mysearch -c alpaca < /home/linux/ieng6/CSE29_WI26_A00/public/pa2/rockyou_clean.txt
7

If done early, implement some of the following (in no particular order)

Make your program handle all 3 of -n, -v, and -c.
Add the option -i, for case-insensitive search (i.e. search -i pattern should match lines containing "Pattern" or "PATTERN")
Use ANSI Escape Codes to make your program bold or highlight the matches in every matching line, either always or with a --color option. (more info on Wikipedia)
Add the option -A to print extra context after a match, so e.g. search -A 2 pattern would print an extra 2 lines after every match.
Add the option -B to print extra context before a match (search -B 2 pattern would print an extra 2 lines before every match.) (Note: to be able to do this in C with the tools we've seen so far, you might need to set an upper limit on how many lines back your program will be able to support)
Make your program handle options more flexibly, e.g. it could:
- be able to handle multiple options simultaneously
- accept arguments in any order
- exit with a help message for unrecognized options (e.g. -f, -o) instead of treating them as search patterns. (Or if you pass it the -h or --help options)
- if it sees the option "--", treat all following arguments as search patterns, even if they start with a "-". (This is how actual command line programs allow you to search for the string "-n" instead of specifying the "-n" option.)
- allow long versions of options, e.g. --invert-match, --line-number, --count, etc

Lab 6 Reference Document

(clone the GitHub Classroom repo from here: https://classroom.github.com/a/EM-1PiRu)

Part 1: Header Guards and Makefiles

Header Guards

#ifndef EXAMPLE_H
#define EXAMPLE_H

struct example {
    char *str;
};

#endif

Header guards prevent the content within them from being processed multiple times by the compiler. This can be problematic if the header file intends to define any symbol, not just declare them. In the example above, the header guard ensures that struct example is defined at most once. Let's illustrate the utility of header guards with a concrete example.

After cloning the Github classroom repository onto ieng6, cd into 2lab6-headers-and-makefiles. Then cd into headers and inspect the contents of the five files inside. These files together represent 3 "modules" with the following dependency graph:

header_diagram

When the compiler reads test.c, its preprocessor will process span.h twice: once through the direct arrow pointing to span.h and once through queries.h, which also points to span.h. As a result, the contents of span.h will be "pasted" into the source file twice. Since span.h contains a struct definition for struct string_span, this definition will be repeated twice. Try the following compilation command to see what this causes:

$ gcc span.c queries.c test.c -Wall -o test

The compiler seems to be confused by the duplicated definition for struct string_span, which is the first error it reports.

To Do: Use what you have learned about header guards to fix this compiler error! Please note that by convention, everything in a header file is wrapped in a header guard.

Makefiles

Part 2-0: Recipes and Dependencies

Exit the headers directory and enter the part2-0 directory. We have given you an example Makefile that illustrates its basic structure. A Makefile mostly consists of "rules", which have the form:

target: dependencies
	recipe

In a lot of ways, you can think of defining rules in Makefiles like defining functions in C, but there are important differences.

The target could be thought of as the name of the rule. We use the target to tell make which rule should be used. Unlike functions, Makefiles expect targets to be the names of files.
The dependencies are files or other targets that the creation of the target depends on. For C programs, these dependencies are usually source code and object files.
The recipe contains the commands that are executed when make uses this rule. Recipes can have one or more different commands to be executed sequentially.

Use this explanation to understand the contents of the Makefile in part2-0. Try running make with the cse100 target to ask Make to build cse100 along with its dependencies:

$ make cse100

Pretty cool! It might also be useful to clean out and remove any files that got produced if we want to run make again, like all the cse files that we just made in this Makefile. We do that using:

$ make clean

This will remove all the files that start with cse. Now, after running make clean, we can run:

$ make cse30
$ make cse100

See what files get made at each step! Try out the other targets as well, like cse12 and cse29.

Part 2-1: Makefile for One

Exit the part2-0 directory and enter the part2-1 directory, where we are given a single, very simple source code file program.c. You can look at its contents, but there’s nothing there to see (or do).

It’s not necessary to define dependencies, but we often do because Makefile automatically checks if any of its dependencies have changed more recently than the target file. If not (and if the target file already exists), then make does not bother to execute the recipe, because the target file must already be up to date. This means that make will only execute the recipe if the target file doesn’t exist, or one of its dependencies is more recently updated than the target file.

A typical example of a rule for C is the one below:

program: program.c
	gcc -Wall -g -o program program.c

In this rule, the target is program, which is the executable file we want to create with this rule. The recipe is a gcc command to produce program, which you would normally run manually in the terminal. Since we define program.c to be a dependency of this rule, this means that program will only be recompiled if program.c is more recently updated than program.

This rule example just so happens to work perfectly for the Makefile we want to write in this section, so fill in your Makefile with this rule. Please note you should create your Makefile, for example, using the command touch Makefile.

After writing this rule into the Makefile, you can then run make with the target to run the compilation command in the recipe:

$ make program

Notice that make prints out the recipe command, and, if you check the contents of the directory, executes that command to compile program.c. Try running make program again to see that make refuses to recompile program, because it’s already up to date. Then make a small change to program.c, and run make program again to see that it recompiles if program.c is changed.

This Makefile has already greatly simplified our workflow: instead of typing 33 characters to compile the program, you can type just 12 characters instead. But we can do even better! Add the following rule to the top of the Makefile:

default: program

This rule creates the default target with the program target as the sole dependency and no recipe. Since it is the first rule to appear in the Makefile, running make by itself will default to executing it, which in turns executes the "program" rule as needed. Technically, you could rename the target from default to something else, and the behavior of the make command by itself would stay the same.

Running and Cleaning

Although recipes typically contain commands used to create their corresponding target files, recipes can also contain any other commands you could run in the terminal. As such, some other common uses for Makefiles are to run a program and clean up after a program.

For this program, the rule for running program could be defined as:

run: program
	./program

This simple rule depends on the program target, meaning that it will automatically recompile program if necessary, and run the program. In this case, the target is not a file that we expect to compile, just a convenient name that we use to use this rule.

Similarly, we also often define a rule to clean up files that are produced from the build process. This specific example does not produce any, but sometimes it is also desirable to clean up the target file itself in order to recompile without changes to the source code.

clean:
	rm program

In most cases, this will work without issue, but in the rare case that you create a file called “run” or “clean”, the corresponding rule won’t work properly anymore. This occurs because make does not recognize that “run” and “clean” are not supposed to be files. So when a file of that name is created, the standard behavior of make causes our intended functionality of these two rules to fail: make will not use these rules unless that file no longer exists or a dependency updates. If you want to, try making a file called “run” or “clean” to see this happen.

In order to account for this edge case, we can manually define run and clean to be phony targets. A phony target doesn't really refer to a file; rather it is just a recipe to be executed when requested.

.PHONY: run clean

After defining these rules, your Makefile might look something like this:

default: program

program: program.c
	gcc -Wall -g -o program program.c

.PHONY: run clean

run: program
	./program

clean:
	rm program

All the rules (and phony target definition) can be defined in any order, except default must be placed at the top in order to be executed when you run $ make by itself.

Part 2-2: Makefile for Many

In this section, we’ll show multiple valid Makefiles for the programs in the part2-2 directory. As you follow along, pick one and use it to compile all three programs.

When we have multiple programs to be compiled in a single project, we could create a Makefile with rules for each:

default: program1 program2 program3
program1: program1.c
	gcc -Wall -g -o program1 program1.c
program2: program2.c
	gcc -Wall -g -o program2 program2.c
program3: program3.c
	gcc -Wall -g -o program3 program3.c

Notice how much repetition there is between each rule here. In this case, the repetition is just mildly annoying, but if you have more independent programs (like I do when designing lab activities), mildly annoying becomes very annoying! We’ll see how we can reduce repetition in two different ways that we’ll use together to create a very concise and flexible Makefile.

Variables

Like in C programs, you can also define variables in Makefiles. But unlike C programs, where defined variables are allocated in some memory when the program is run, variables in Makefiles just represent some string value. This lets us reduce the amount of repetition when we want to, for example, change the gcc flags to use in all rules. As such, some common values we can define as variables are the compiler command and its flags:

CC = gcc
CFLAGS = -Wall -g

default: program1 program2 program3

program1: program1.c
	$(CC) $(CFLAGS) -o program1 program1.c
program2: program2.c
	$(CC) $(CFLAGS) -o program2 program2.c
program3: program3.c
	$(CC) $(CFLAGS) -o program3 program3.c

The variables CC and CFLAGS are defined with the values gcc and -Wall -g, respectively. Then we use these variables in each of the recipes. Note that there is a special syntax when we use the variables: $(X), where X is the variable name. This syntax tells the Makefile to expand the variable X to use its value, instead of interpreting "X" as a literal string.

Pattern Rules

Each of these three rules have a similar pattern: each one is identical to the others except for a single number that changes. To eliminate this repetition, we can merge these rules into one pattern rule:

CC = gcc
CFLAGS = -Wall -g

default: program1 program2 program3

program%: program%.c
	$(CC) $(CFLAGS) -o $@ $<

A couple of new symbols were introduced in this pattern rule:

A target with a "%" character creates a pattern rule. The "%" in the target can match any non-empty string, then for each corresponding match, the "%" has that same value in the dependencies. For example, this rule matches program1, program2, and program3 and defines their respective dependencies program1.c, program2.c, and program3.c. This will also define dependencies for any valid match to the target: program4 depends on program4.c, programaaa depends on programaaa.c, etc.
In a pattern rule, we use automatic variables to refer to the target and dependencies, since their exact value is not determined explicitly.
- $@ is an automatic variable which represents the target of the rule.
- $< is an automatic variable which represents the first dependency of the rule.
Other useful automatic variables are given here.

If we were really bold (which we are), we could generalize this Makefile further:

CC = gcc
CFLAGS = -Wall -g

default: program1 program2 program3

%: %.c
	$(CC) $(CFLAGS) -o $@ $<

This pattern rule now matches any name (not just names that begin with "program") to be a target, and defines its dependency to be a file with that name plus the ".c" suffix.

In this section, we’ve developed a Makefile to be increasingly more flexible, both in making future changes easier and expanding the scope of valid targets. An important point to make (pun intended?) is that each of these Makefiles is a valid Makefile for compiling the three programs given in this directory, and they have their own pros and cons. For example, a Makefile similar to the last one was used in last week’s lab to easily compile programs with different names, where the compilation process is the same across programs. However, it might be undesirable to enable the programmer to attempt compiling any file ending in “.c”. On the other hand, the first Makefile might be a good fit for a use case where we know we will customize the build process for each program, but this could lead to a very large Makefile.

Part 2-3: Linking Object Files

When we use gcc to manually compile programs, we typically compile directly from the source file to the executable program. But, the build process involves multiple steps with intermediary files. One of these intermediary files are object files, which contain machine code from a particular module (.c and .h combo) and are linked into the eventual executable file. If .class files from Java sound familiar to you, object files are like .class files. To instruct gcc to compile a source file into an object file, we add the -c flag.

Build Process (Credit: Cloudflare)

The linking process resolves symbol references between object files, meaning that functions defined in one file can be used in another. In part2-3, a long program with 50000 adder functions (each of which adds the integer in its name to the parameter and returns it) is given in adders.c. The corresponding header file, adders.h, contains function declarations to be shared between source files. Then, in main.c, we print the return value of run_adders, which calls all of the adder functions and sums their results.

We can use the following gcc commands to create then link the object files (we will run the time command so it will tell us how long each of these commands took to run):

$ time gcc adders.c -o adders.o -c 
$ time gcc main.c -o main.o -c
$ time gcc adders.o main.o -o adders

We create adders.o from adders.c, create main.o from main.c, then link the two produce the executable adders. My fingers hurt from all that typing; I wish there was an easier way to MAKE all these files...

CC = gcc
CFLAGS = -Wall -g
TARGET = adders
OBJS = adders.o main.o

default: $(TARGET)

adders.o: adders.c adders.h
	$(CC) $(CFLAGS) -c -o $@ $<

main.o: main.c
	$(CC) $(CFLAGS) -c -o $@ $<

$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) -o $(TARGET) $(OBJS)

run: $(TARGET)
	./$(TARGET)

clean:
	rm $(TARGET) $(OBJS)

Here, we make extensive use of variables for the ultimate target (adders) and its prerequisite object files (adders.o and main.o) so that we can easily use these strings in multiple places, e.g. in both the compile command and in the rm command. Examine this Makefile and feel free to ask your groupmates, tutors, or TA about anything unclear.

Part 2-4: Makefile challenge in `headers` directory

Let's go back to the headers directory from the Header Guards section and open the Makefile there, which is partially completed. Complete the Makefile according to the requirements listed inside it. Feel free to copy code segments from above. Once you're done, try make to see your Makefile in action!

Lab 6 Work Check-off (Due before next Tuesday lecture)

Commit and push your code for any of the Makefile parts or span.h to your Github Classroom repo! In the commit message, name the part that you edited.

Lab 7 Reference Document

Part 1: git good

Partner Activity

Get into groups of 2, and decide who will be Partner 1 and Partner 2.

Partner 1 ONLY:

Create a new team on the Github Classroom assignment for you and your partner (DO NOT accept until we do this together in class): https://classroom.github.com/a/4gO-KCH3

Partner 2 ONLY:

Join the team that Partner 1 created.

Both:

Clone the repo to ieng6
Edit hello.c to fill in your own name in the print statement

Partner 1 ONLY:

Commit and push this change

Partner 2 AFTER PARTNER 1 HAS PUSHED:

Commit and then attempt to push this change

Work together on Partner 2's computer. Resolve the merge conflict so that the program prints Hello [PARTNER 1 NAME] and [PARTNER 2 NAME]!. Commit and push when done. (Hint: git pull and git config pull.rebase false may be useful)

Once finished:

Repeat starting from step 4 with goodbye.c and switch roles (i.e. Partner 2 commits and pushes first, then Partner 1 tries to push. Resolve conflict on Partner 1's computer)

Part 2: More Terminal Commands

`mv [FILE] [PATH]`

Move the file [FILE] to the location specified by [PATH].

`cp [FILE] [PATH]`

Copy the file [FILE] to the location specified by [PATH].

`scp [FILE] [PATH]`

Copy the file [FILE] to the location specified by [PATH]. [FILE] and [PATH] can both be located on a server. So for example, if you had Screenshot.png in your working directory, you could scp it to the home directory on ieng6 by (also filling in USERNAME with your actual username):

scp Screenshot.png USERNAME@ieng6.ucsd.edu:~/

And then you can move the screenshot to your lab7-git repo using mv.

Here's the diagram of what that looks like to scp from your laptop/desktop to the repo on ieng6, and then push to GitHub:

Work Check-off

Run git log, which shows the commit history of the repo. Take a screenshot to show that both partners made commits, and push that screenshot to the repo. (Hint: scp) Each partner should push their own screenshot to receive credit for the checkoff.

Lab 8 Reference Document

Starter Repository

Part 1: Valgrind

to run valgrind on an executable file called my_program, run:
valgrind ./my_program

the end out your output will read something like: leak

as it states, running valgrind --leak-check=full ./my_program will give more information.

Note on the -g: This flag instructs GCC to add debugging symbols to the compiled executable. These symbols enable GDB to look up and show your source code during debugging. They also enable Valgrind to show you line numbers in its error backtraces.

You are given a Makefile with the default set to compile both provided programs. you should only need to run make to compile.
To run search, recall you must provide an argument of the substring to search for, ie:
./search alpaca < rockyou_clean.txt will search through the rockyou_clean.txt file line by line, printing out lines containing "alpaca".

To run student, we have provided students.txt which uses our program and will leak memory. ./student < students.txt is how you use this in our program.

Part 2: Scripts in a nutshell and Pokemon Mailtime

Aside: The executable bit

The file permissions scheme used in Unix-style systems is designed to protect files from unauthorized access and unsafe execution. It prevents you from snooping into other users' files on ieng6. It also prevents you from executing a file by accident. Under this scheme, three bits describe the operations that can be performed on every file: Read (r), Write (w), and Execute (x). To see the settings for these bits for all the files in a directory, run ls -l and focus on the second, third, and fourth character in each row, which specify what the owner of the file is allowed to do. For example, in -rwxr-x---, rwx means the owner can read, write, and execute the file, whereas in -r--r-----, r-- means the owner can read but cannot write to or execute the file.

Why does this matter for shell scripting? Well, when you create a script file, it starts out with the Execute bit disabled, which means you can't execute the file by default. (If you try, you would see Permission denied.) To be able to execute your script, you need to set the Execute bit by running chmod +x <files to mark exacutable> using ls -l.

Note: > Several students in the past had the misconception that you need to run chmod +x every time you run the same script. Since a file's permissions are stored alongside it, permission changes are stored permanently, and script files don't unset their executable bit every time they run. So just set it once for each script!

💡 **KEY POINT**: Any command that you can run on the command line can be an instruction in a shell script.

Fill the provided task_1.sh file with some commands that you know like echo, ls, or pwd. Each command should be on a new line.
To mark the file as executable, run chmod +x task_1.sh.
Try running ./task_1.sh in your command line, and you should see the output of each command in the order you added them to the shell script.

The #!/bin/bash at the top specifies that this script should be parsed using the bash program at the directory /bin/. The leading #! is called a "shebang" and signals the start of this directive. A Python script, for example, could have #!/usr/bin/python3 as its first line instead.

As it turns out, all that is needed to complete task1 is to run a couple of commands sequentially. Here are a couple of knowledge bits that could be helpful:

The wildcard pattern *

In shell scripting, the * character acts as a way to target all files and directories. For example, if you ran cat *, you would print out the contents of all the files in your current working directory (you would likely also get error messages from trying to view the contents of folders in your directory). Furthermore, the * character can also be used to match specific patterns. For example, if you ran cat *.txt, you would print out the contents of all the files in your working directory that have the .txt extension.

The mv command

The move command, as its name suggests, can be used to move files into directories. For example, you can use it like so:
mv a.txt Books
This will move a.txt into the Books directory. You can also move multiple files at once like so: mv a.txt b.txt Books, which will move a.txt and b.txt into the Books directory.

Task 1: Books and Music

With these tidbits, you are now ready to write your (possibly) first ever shell script! Your first task will be to write a shell script in the minihome directory to print out the number of words in each .txt file, and bytes in each .mp3 file, then organize them into the Books and Music directories respectively. Please work on this with your groupmates.

Reminder: ls -R will show where all of your files currently are.

While testing your script, you may end up accidentally messing up your directory structure. If this happens, we have provided you with a reset_task_1.sh script in the minihome directory which you can run to reset the .txt and .mp3 files. This will leave your .sh files intact, however, so don’t worry about losing your progress.

Our minihome is now looking a lot cleaner. But maybe we can go further. The length of these books seems a bit varied, no? It would be nice if we were able to sort these books into short stories and novels. This will be your next task. One requirement for your script is that you should be able to supply an argument for the cutoff between a novel and a short story. Before you begin the task, first change into the Books directory.

Here are some knowledge bits that could be useful:

Shell scripting - variables

You can declare and use variables like so in shell scripting:

# Contents of demo.sh
#!/bin/bash
num=5
fruit="apples"
echo $num $fruit

$ ./demo.sh
5 apples

To reference a variable, you simply put a dollar sign in front of it. Note that this still works inside double-quoted strings, so echo "$num $fruit" would have achieved the same effect in the script above. To print out $num $fruit literally, you would need to either use single quotes ('$num $fruit') or escape the dollar signs ("\$num \$fruit").

Shell scripting - accessing command line arguments

To access the nth command line argument, use $n in your script. For example, the 1st command line argument would be accessed with $1. Note that the 0th command line argument is the name of the command you executed, as was the case in your PA 3. Here is a demonstration:

# Contents of demo.sh
#!/bin/bash
echo $1 $2

$ ./demo.sh 5 apples and more
5 apples

Note that a string wrapped in quotes counts as one argument. For example, in: $ ./demo.sh 4 "PA 3" We have $1 equal to 4 and $2 equal to PA 3.

Shell scripting - if statements

The basic structure of an if statement in shell scripting is as follows:

if [ condition ]; then
    # Commands if condition is true
else
    # Commands if condition is false
fi

Here is an example of a script that takes in a number from the user, and prints out whether or not it is greater than 5:

# Contents of demo.sh
if [ $1 -gt 5 ]; then
    echo "Number is greater than 5"
else
    echo "Number is 5 or less"
fi

$ ./demo.sh 10
Number is greater than 5

As you can see, most operators (save for string equality) in bash scripts are different from what you would see in most other programming languages; they look like a hyphen followed by an mnemonic string. Here, the -gt comparison operator checks if $1 is greater than 5. Though you likely won’t need them for this task, here is a decently comprehensive list of the different types of conditional operators you can find in shell scripting:

File/directory existence:

if [ -f filename ]; then   # Check if file exists
if [ -d directory ]; then  # Check if directory exists
if [ -e filename ]; then   # Check if file or directory exists
if [ -s filename ]; then   # Check if file exists and is not empty
if [ -x filename ]; then   # Check if file is executable

String comparison:

if [ "$str1" = "$str2" ]; then   # Strings are equal
if [ "$str1" != "$str2" ]; then  # Strings are not equal
if [ -z "$str1" ]; then          # String is empty
if [ -n "$str1" ]; then          # String is not empty

Numerical comparison:

if [ "$num1" -eq "$num2" ]; then   # Numbers are equal
if [ "$num1" -ne "$num2" ]; then   # Numbers are not equal
if [ "$num1" -gt "$num2" ]; then   # num1 is greater than num2
if [ "$num1" -lt "$num2" ]; then   # num1 is less than num2
if [ "$num1" -ge "$num2" ]; then   # num1 is greater than or equal to num2
if [ "$num1" -le "$num2" ]; then   # num1 is less than or equal to num2

Shell scripting - for loops

In Bash, there are many different types of loops, each having their use cases. Here, we’ll introduce one of those loops: the for loop. The basic syntax is as follows:

for item in [list of items]; do
  # do something with $item
done

Here are some examples of how you can use a for loop:

Looping through variables:

for var in item1 item2 item3; do
  echo "$var"
done

Looping through a range of numbers:

for i in {1..5}; do
  echo "Number: $i"
done

Looping through files in a directory:

for file in *.txt; do
  echo "Processing $file"
done

The `wc` command

Here, wc does not stand for water closet, but rather word count. Passing it a text file will print out the number of lines, words, and characters in that file followed by the name of that file. For example:

$ wc alice.txt
3757  29564 174357 alice.txt

To print out just the number of lines, words, and characters, You can use the -l, -w, and -c options respectively:

$ wc -l alice.txt
3757 alice.txt
$ wc -w alice.txt
29564 alice.txt
$ wc -c alice.txt
174357 alice.txt

If you don’t want the name of the file after, and only wanted the number for whatever reason, you can redirect the contents of the file in to the wc command like so:

$ wc < alice.txt
3757  29564 174357
$ wc -l < alice.txt
3757

Neat!

Shell scripting - command substitution

You can run a command and assign its output (as a string) to a variable using var=$(command). For example:

lines=$(wc -l < sample.txt)
echo $lines

This would store the number of lines in sample.txt into lines and print it.

Task 2: Novels and Short Stories

With the above knowledge, you can now write a script to sort text files into the Novels and Short_Stories folder based on their word length from scratch. In task_2.sh, we have provided you a base script to work off of, with blanks for you to fill. There is also a reset_task_2.sh script, should you wish to restart. To test if your script works, 20000 works well as an argument for differentiating between Novels and Short Stories. Good luck, and have fun!

Once you have made sure that both of your scripts work, we may now proceed to the fun part. First change back to the minihome directory, run the reset_task_3.sh script to undo the changes from both task 1 and task 2. Next, run the task_3.sh script. This is a pre-written script that will run your tasks 1 and 2 in succession. This means that in one command, you can sort all of your files into Books, Novels, Short_Stories, and Music folders!

Confirm that all the text files are in the Novel folder (we used a lower threshold), and all the mp3 files are in the Music folder, and run the reset_task_3.sh script. Make sure your script works before running the next step!

Let’s pretend our user has done a lot of internet browsing and downloaded a lot of files. Run the simulate_downloads.sh script.

That’s a lot of txt and mp3 files! Normally it would take quite a while to sort these out, but luckily, we have our script! Run task_3.sh, and hopefully you should see all of the files get sorted. What normally would have taken perhaps an hour can now be done in an instant!

Pokemon and mail!

While sorting books is cool and all, it may not be the most exciting task.

Making a complex script may not fit in the time alloted for this lab, but using one might be!

You've got Mail!

The mail command, unlike other commands we’ve taught you in this lab and previous ones, is especially unique: literally no one* uses this! As such, this section is not relevant to any course material. But the idea of sending each other mail via the terminal, all 1970s-core, is too appealing to pass up on.

*By "literally no one", I mean "literally no one, except for at least one person at this university", so I've been told.

Throughout this section, we’ll use myname and friendname to refer to your and your partner’s UCSD username, respectively. This is the username you use for your UCSD email, and the one you use to log into ieng6.

In order for mail to work, you and whoever you are mailing to must be on the same cluster on ieng6. You do not need to know what a cluster is, but you do need to know how to SSH into a specific one on ieng6. If each line of your command prompt starts with this:

[myname@ieng6-640]:~:500$

Then ieng6-640 is the cluster you are logged into. In order to SSH into a specific cluster, exit out of your current SSH session, and in your local machine terminal, type:

$ ssh myname@ieng6-640.ucsd.edu

Make sure both you and your partner are on cluster 640.

In the following instructions, you will use the mail command, but it is buggy on the ieng6-2xx servers. Please make sure to log out of ieng6 and then sign into ieng6-640.ucsd.edu! This specific cluster runs an older operating system that still has the less-buggy mail command.

Once you and your partner are in the same cluster, try using the mail command to begin composing an e-mail (electronic mail). Either command below works:

$ mail friendname@ieng6-640.ucsd.edu

This will prompt you with Subject: so you can type your email’s subject. The subject is only one line of test, so when you press Enter you are now typing the contents of your mail. When done writing your message, press Ctrl+D to finish and send.

Now, your partner can use the mail command by itself to see that they have received mail! It will have a number next to it as it enumerates messages every time you open the mail shell. Type this number and press Enter to see the message.

Now that you have mail and therefore can enter the mail shell, you can type ? to get a list of commands that you can use.

type <message list>          type messages
next                         goto and type next message
from <message list>          give head lines of messages
headers                      print out active message headers
delete <message list>        delete messages
undelete <message list>      undelete messages
save <message list> folder   append messages to folder and mark as saved
copy <message list> folder   append messages to folder without marking them
write <message list> file    append message texts to file, save attachments
preserve <message list>      keep incoming messages in mailbox even if saved
Reply <message list>         reply to message senders
reply <message list>         reply to message senders and all recipients
mail addresses               mail to specific recipients
file folder                  change to another folder
quit                         quit and apply changes to folder
xit                          quit and discard changes made to folder
!                            shell escape
cd <directory>               chdir to directory or home if none given
list                         list names of all available commands

A <message list> consists of integers, ranges of same, or other criteria
separated by spaces.  If omitted, Mail uses the last message typed.

Note that when the help dialog says "folder", this is a bit of a misnomer. The "folder" is actually the filename that the command will use. For the save and copy commands, you can give it a valid path to a file and it will save or copy the contents of the email into the file. The path is relative to wherever you opened the mail shell.

You will likely not need to use all of these commands but at least a few are worth noting:

type <message list> can be used to print out the contents of selected messages. For example, type 3 4 prints out the contents of messages 3 and 4.
headers will print out the list of enumerated messages along with their senders, subjects, and statuses.
You can respond to a message with Reply <message list>, which will open a response to the message to type in a reply. Again, you can finish and send with Ctrl+D. For example, Reply 5 opens a response to message 5.

There also exists another method to send mail from outside the mail shell, using the pipe operator we learned earlier. This command also makes use of the echo command, which outputs its argument to stdout. Sounds redundant, but it’s intended to be used in this way to input strings into other commands, or to be used as print statements in bash scripts.

$ echo "email body here" | mail -s "subject here" friendname@ieng6-640.ucsd.edu

Trading Pokemon

We can also use email to send attachments in the form of files. In this section, we’ll trade Pokemon with each other via mail. We will use the pokeget.sh script provided in your lab starter code for fun to generate some files with Pokemon in them. First, let one person be the sender, and the other will be the receiver. After you can successfully send and receive Pokemon from one to the other, swap roles and try sending one the other way.

Sender

Use the below command to generate a ".pk" file with the name of the Pokemon you picked. For example, if you picked Pikachu, #25, you would call this file pikachu.pk. Alternatively, if you want the receiver to not know what the Pokemon is until they open the file, you can call it mystery.pk. Feel free to replace 25 with the National Dex number of any Pokemon.

./pokeget.sh 25 > pikachu.pk

If the pokeget.sh script seems to take more than a few seconds to finish running, try using Ctrl + C to interrupt the program and try running the same command again.

Once you have a ".pk" file, use the following command to send it to your partner, using the -a option. Replace "pikachu.pk" with the name of your Pokemon file if it is different.

$ echo "email body here" | mail -s "subject here" -a pikachu.pk friendname@ieng6-640.ucsd.edu

Receiver

Open up your mail shell with the mail command. You should have a new email if the sender did their job properly. Type the following command to save the email you received as a file, with n replaced by the ID number of the email. Note that the "&" is the prompt, and does not need to be typed, just like "$" in the terminal.

& save n pokemon.mail

Extracting the attachment in a readable form from the email itself is quite involved, so please use the provided script to get your Pokemon:

./extract_pokemon.sh pokemon.mail > pokemon.pk
cat pokemon.pk

If you get an error running extract_pokemon.sh that mentions something about a bad interpreter, this is likely because the file hasn't been formatted for unix correctly yet. Run sed -i 's/\r$//' extract_pokemon.sh to reformat the file and try again.

If everything went well, you should get the Pokemon your partner sent you! Have a bit of fun with this and send each other some cool Pokemon.

Adding Pokemon to `.bash_profile`

If you would like your Pokemon to appear when you open ieng6, you may do the following.

If you would like 1 Pokemon, add the line cat ./path/to/pokemon.pk (replace the path with the entire path to your Pokemon file, starting with ~) to the end of the .bash_profile file in your home directory.

If you would like 2 Pokemon, add the line paste ./path/to/pokemon/bigPoke.pk ./path/to/pokemon/smallPoke.pk (replace the path with the actual path to your Pokemon file) to the end of the .bash_profile file in your home directory. Note that if you paste the smaller Pokemon first, the larger one will be cut in half. You are welcome to try to troubleshoot this.

If you want to do the same thing on your own computer, you can add this same line to your .bash_profile or .bashrc file. Just make sure to download the referenced .pk files using pokeget.sh and edit the paths as needed!

Other Real world Script examples:

A real life script by Katherine Izhikevich and Ben Du which run several python scripts to retrieve and sort through data.

The script Elena made to print out pokemon randomly when she opens ieng6 pokemon.sh (complete with commented out old path variables before she figured out how to pick randomly from an arbitrary amount of directories.)

The script Elena made to print out Joe quotes also upon opening ieng6. juotes.sh Note that the file is full of lines that look like

Feb 10 -|Energy is priceless, but so is uninterrupted descriptions of structs.
Sept25 2025 @10:16? -|I'm thankful to the computer engineering wizards that provided the magic sand to do this

The script Brendan made to automatically zip and re-install python addons for the 3D software Blender. blender_addon_updater.sh

Note: Understanding scripts.
Bash scripting is no small undertaking to understand. It's basically a whole new programming language with it's own syntax and rules, and while this lab covers a ton of the basics, there is a whole truckload of other things. While you do not know everything there is to know about bash-scripting, you do have a foundation of knowledge and internet access. Both google searching and asking ChatGPT can be super helpful for 2 things: understanding a script, and making one.
Wonder what mapfile is? googling this gets an AI overview describing it and a link to a man page. This may lead you to the thought "oh right the man pages, maybe I should check it on my computer".
pokemon.sh was made with much help from ChatGPT. Elena had a starting point but was unfamiliar with how to get prefixes, suffixes, or how exactly to randomly pick between folders. So, she used ChatGPT how she might switch the file extension from .mail to .pk. Prompts included "how would I remove a prefix" and "how do i pick randomly from a list" to piece together the script.
It is also noteworthy that GenAI models will be good at things they have a lot of training on. Common things like what we find in pokemon.sh they will get correct a lot. Things they are not as well trained on they may very well get wrong and they will do it confidently. So, please do be careful.

Work Check-off

Complete fixing the memory leaks in search.c and student.c using free appropriately.

Lab 9 Reference Document

(Accept and clone the github classroom assignment at this link)

Part 1 - Connecting to the chat server

Run ./chat-client NAME HOST PORT

Connect to the host and port of the chat server being run by your lab TA. Try sending some chats, and reacting to some messages.

Try sending HTTP requests directly by putting the following urls into your browser (after changing the uppercased portions)

http://ieng6-201.ucsd.edu:PORT/chats
http://ieng6-201.ucsd.edu:PORT/post?user=NAME&message=MESSAGE
http://ieng6-201.ucsd.edu:PORT/react?user=NAME&message=MESSAGE&id=ID

You can also make requests from the terminal using curl “URL”

you may need to put quotes around the url if it contains special characters like = or &
using the -v option will show extra info like request and response headers.
Connecting to the special hostname “localhost” will connect to the computer you are currently on, useful for testing your own servers

Part 2: Basic Server

Basic Server

Modify handle_request in main.c to print incoming requests to the terminal,
Connect to your and your groupmates server using curl and your browser (does a phone browser work?)

Responses

Send a response back to whoever made the request by using the send function on the client_socket that was given to you. send(client_socket, message, size, 0);

You’ll have to first send an appropriate HTTP response header, then send the body of the response.

HTTP/1.1 200 OK
Content-Type: text/plain

... body of response ...

(remember, the ends of lines in HTTP are "\r\n", not just "\n", and there has to be one blank line before the body of the response to mark that the headers have ended)

Some useful string methods

snprintf(buffer, sizeof(buffer), “format string %d”, 5); like printf, but writes the formatted string to buffer, useful for formatting responses before you send them

sscanf(str, “number=%d”, &x); uses similar format strings to printf, but can parse numbers out of a string like “number=12”.

strstr(s, substr) find substring in a larger string

Part 3 - Number Server

Add a global variable int num to your server.

Make your server handle the following paths: (You can also make the responses fancier if you like)

/shownum
- responds with the current value of num.
/increment
- adds one to num, responds with the new value of num.
/add?value=NNN
- adds value encoded in NNN (a decimal integer) to num responds with the new value of num

Pokemon Mailtime

If you would like to finish the week8 lab on bash scripting, which includes Pokemon mailtime, you are welcome to do that.

Your lab checkoff will be due Tuesday week 10 @11am.

Work Check-Off

Commit and push your number server
Fill out the Lab Survey

Bonus

If you’re done early, you can use curl to add custom headers to your http request with curl -H “SomeHeader: SomeValue” ... . Modify your number server to detect the custom header “SetNumber: NN” and set the number accordingly. (NOTE: this isn’t a defined HTTP header, but it should work fine for this lab)

PA1 - UTF-8:

Due Date: Monday 01/19 Tuesday 01/20 at 11:59pm

Looking for resubmission? Click here

UTF-8

Representing text is straightforward using ASCII: one byte per character fits well within char[] and it represents most English text. However, there are many more than 256 characters in the text we use, from non-Latin alphabets (Cyrillic, Arabic, and Chinese character sets, etc.) to emojis and other symbols like €, to accented characters like é and ü.

The UTF-8 encoding is the default encoding of text in the majority of software today. If you've opened a web page, read a text message, or sent an email in the past 15 years that had any special characters, the text was probably UTF-8 encoded.

Not all software handles UTF-8 correctly! For example, Joe got a marketing email recently with a header “Take your notes further with Connectâ€‹” We're guessing that was supposed to be an ellipsis (…), UTF-8 encoded as the three bytes 0x11100010 0x10000000 0x10100110, and likely the software used to author the email mishandled the encoding and treated it as three extended ASCII characters.

This can cause serious problems for real people. For example, people with accented letters in their names can run into issues with sign-in forms (check out @yournameisvalid for some examples). People with names best written in an alphabet other than Latin can have their names mangled in official documents, and need to have a "Latinized" version of their name for business in the US. Joe had trouble writing lecture notes because LaTeX does not support UTF-8 by default.

UTF-8 bugs can and do cause security vulnerabities in products we use every day. A simple search for UTF-8 in the CVE database of security vulnerabilities turns up hundreds of results.

It's useful to get some experience with UTF-8 so you understand how it's supposed to work and can recognize when it doesn't. To that end, you'll write several functions that work with UTF-8 encoded text, and use them to analyze some example texts.

Getting Started

You can do your programming however you like; using vim on ieng6 will give you good practice for labs, exams, and problem sets, but you are not required to use any particular environment (we'd have no way to check anyway).

To get started create a GitHub Classroom assignment.

UTF-8 Analyzer

You'll write a program that reads UTF-8 input and prints out some information about it.

Here's what the output of a sample run of your program should look like:

$ ./utf8analyzer
Enter a UTF-8 encoded string: My 🐩’s name is Erdős.
Valid ASCII: false
Uppercased ASCII: MY 🐩’S NAME IS ERDőS.
Length in bytes: 27
Number of code points: 21
Bytes per code point: 1 1 1 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1
Substring of the first 6 code points: My 🐩’s
Code points as decimal numbers: 77 121 32 128041 8217 115 32 110 97 109 101 32 105 115 32 69 114 100 337 115 46
Animal emojis: 🐩

You can also test the contents of files by using the < operator:

$ cat utf8test.txt
My 🐩’s name is Erdős.
$ ./utf8analyzer < utf8test.txt
Enter a UTF-8 encoded string:
Valid ASCII: false
Uppercased ASCII: MY 🐩’S NAME IS ERDőS.
Length in bytes: 27
Number of code points: 21
Bytes per code point: 1 1 1 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1
Substring of the first 6 code points: My 🐩’s
Code points as decimal numbers: 77 121 32 128041 8217 115 32 110 97 109 101 32 105 115 32 69 114 100 337 115 46
Animal emojis: 🐩

Using the Problem Set for your PA

Most of the needed functionality for the PA is in the problem set problems! So part of your workflow for this PA should be to move code over from the appropriate part of your problem set and into your PA code.

A good first task is to move over only is_ascii, and then write the corresponding part of main needed to read input and print the result for is_ascii, and make sure you can test that. Then move onto capitalize_ascii, and so on.

You can and should save your work by using git commits (if you're comfortable with that), or even just saving copies of your .c file when you hit important milestones. We may ask to see your work from an earlier milestone if you ask us for help on a function from a later one.

Testing

We provide 3 basic tests in the tests folder - which contain simple tests identifying valid ASCII and converting ASCII lowercase to uppercase characters.

You can see the result for a single test by using:

./utf8analyzer < test-file

Here are some other ideas for tests you should write. They aren't necessarily comprehensive (you should design your own!) but they should get you started. For each of these kinds of strings, you should check how UTF-8 analyzer handles them:

Strings with a single UTF-8 character that is 1, 2, 3, 4 bytes
Strings with two UTF-8 characters in all combinations of 1/2/3/4 bytes. (e.g. "aa", "aá", "áa", "áá", and so on)
Strings with and without animal emojii, including at the beginning, middle, and end of the string, and at the beginning, middle, and end of the range
Strings of exactly 5 characters

We recommend saving your input in files and using redirection to test so you don't have to figure out how to type the same UTF8 characters over and over.

PA Design Questions

You will answer the following questions on the Gradescope Assignment. Answer each of these with a few sentences or paragraphs; don't write a whole essay, but use good writing practice to communicate the essence of the idea. A good response doesn't need to be long, but it needs to have attention to detail and be clear. Examples help!

Question 1

Consider these two bytes: 11000001 10000001

Based on the definition of UTF-8 we used in PA1, what code point does it encode? What are 1-byte, 3-byte, and 4-byte encodings of the same code point? Which of these are valid UTF-8 encodings of this code point? Why are they valid or not valid? (This requires some outside research – "valid" is a technical term here)

Describe how you would write a function that detects if a UTF-8 string has any invalid code points like these.

Question 2

Consider a comparison of UTF-8 with the alternate encoding UTF-32.

For a string s that is n+1 bytes long (n bytes of data with a 1-byte null terminator), strlen(s) must be equal to n.

Is this property true for UTF-8? Explain why or give a counterexample.
Is this property true for UTF-32? Explain why or give a counterexample.

For a string s that is n+1 bytes long (n bytes of data with a 1-byte null terminator) with the n bytes encoding 2c code points (that is, it is even length in terms of unicode characters), there is a valid code point starting at byte s[n / 2].

Is this property true for UTF-8? Explain why or give a counterexample.
Is this property true for UTF-32? Explain why or give a counterexample.

HINT: Write out the UTF-8 and UTF-32 encoding of a few short strings, including characters in the ASCII range and outside it.

Resources and Policy

Refer to the policies on assignments for working with others or appropriate use of tools like ChatGPT or Github Copilot.

You can use any code from class, lab, or discussion in your work.

What to Hand In

Any .c files you wrote (can be one file or many; it's totally reasonable to only have one). We will run gcc *.c -o utf8analyzer to compile your code, so you should make sure it works when we do that.
Your tests are in files tests/*.txt
Submit the Gradescope assignment containing the answers to the design questions

Hand in to the pa1 assignment on Gradescope. You can either submit a link to your GitHub Classroom Repository or upload the files from your computer.

The submission system will show you the output of compiling and running your program on the test input described above to make sure the baseline format of your submission works. You will not get feedback about your overall grade before the deadline.

PA1 Resubmission: Due Date 1/29 at 11:59pm

If you want to resubmit PA1, please read this section carefully. You need to pass all the tests in the original PA1, while also implementing an extra function described below

`void next_utf8_char(char str[], int32_t cpi, char result[])`

Takes a UTF-8 encoded string and a code point index. Calculates the code point at that index. Then, calculates the code point with value one higher (so e.g. for ”é“ U+00E9 that would be “ê” (U+00EA), and for “🐩” (U+1F429) that would be “🐪” (U+1F42A)). Saves the encoding of that code point in the result array starting at index 0.

Example Usage:

char str[] = "Joséph";
char result[100];
int32_t idx = 3;
next_utf8_char(str, idx, result);
printf("Next character of code point at index 3: %s\n",result);
// 'é' is the 4th codepoint represented by the bytes 0xC3 0xA9
// 'ê' in UTF-8 hex bytes is represented as 0xC3 0xAA

=== Output ===
Next character of code point at index 3: ê

Now, Your final output on running the utfanalyzer code that will be graded should contain this extra line:

Next character of code point at index 3: <FILL>

Note: If the number of codepoints in the input string is less than 4, this added line would only have the prompt without any character as follows:

Next character of code point at index 3:

The complete program output for example, should look like:

$ ./utf8analyzer
Enter a UTF-8 encoded string: My 🐩’s name is Erdős.
Valid ASCII: false
Uppercased ASCII: "MY 🐩’S NAME IS ERDőS."
Length in bytes: 27
Number of code points: 21
Bytes per code point: 1 1 1 4 3 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1
Substring of the first 6 code points: My 🐩’s
Code points as decimal numbers: 77 121 32 128041 8217 115 32 110 97 109 101 32 105 115 32 69 114 100 337 115 46
Animal emojis: 🐩
Next character of code point at index 3: 🐪

(All our tests will check for this newly added line, in addition to lines from the original PA)

Design Question Resubmission

If you want to resubmit the design questions, we will be asking this updated design question in a new Gradescope assignment:

UTF-8 has a leading 10 on all the bytes past the first for multi-byte code points. This seems wasteful – if the encoding for 3 bytes were instead 1110XXXX XXXXXXXX XXXXXXXX (where X can be any bit), that would fit 20 bits, which is over a million code points worth of space, removing the need for a 4-byte encoding. What are some tradeoffs or reasons the leading 10 might be useful? Can you think of anything that could go wrong with some programs if the encoding didn't include this restriction on multi-byte code points?

PA2 - Hashing and Passwords: Due 1/29 at 11:59pm - Github Classroom Link

Cryptographic hash functions take an input of arbitrary length and produces a fixed length output. The special features are that the outputs are deterministic (the same input always generates the same output) and yet the outputs are apparently “random” – two similar inputs are likely to produce very different outputs, and it's difficult to determine the input by looking at the ouput.

A common application of hashing is in storing passwords. Many systems store only the hash of a user's password along with their username. This ensures that even if someone gains access to the stored data, users' actual passwords are not exposed. When a user types in a password on such a system, the password handling software grants access if the hash value generated from the user's entry matches the hash stored in the password database.

We said above that it is difficult to determine an input given an output. Password cracking is a family of techniques for accomplishing this difficult (but possible!) task. That is, let's say we have access to a user's password hash only. Can we figure out their password? We could then use it to log in, and it may also be shared across their accounts which we could also access.

In some cases, password cracking can exploit the structure of a hash function; this is a topic for a cryptography class. In our case, we will take a more brute-force approach: trying variations on existing known passwords, under the assumption that many passwords are easy to guess.

Getting Started

To get started, visit the Github Classroom assignment link. Select your username from the list (or if you don't see it, you can skip and use your Github username). A repository will be created for you to use for your work. This PA should be completed on the ieng6 server. Refer this section in Week 1's Lab for instructions on logging in to your account on ieng6 and working with files on the server.

Note - The autograder uses the C11 standard of the C programming language.

Overall Task

We'll start by describing the overall task and goal so it's clear where you're going. However, don't start by trying to implement this exactly – use the milestones below to get there.

`pwcrack`, a Password Cracker

Write a program pwcrack that takes one command-line argument, the SHA256 hash of a password in hex format.

The program then reads from stdin potential passwords, one per line (assumed to be in UTF-8 format).

The program should, for each password:

Check if the SHA256 hash of the potential password matches the given hash
Check if the SHA256 hash of the potential password with each of its ASCII characters uppercased or lowercased matches the given hash

If a matching password is found, the program should print the message below and then exit:

Found password: SHA256(<matching password>) = <hash>

If a matching password is not found, the program should print:

Did not find a matching password

Compiling Your Code

On ieng6

Because the OpenSSL libraries on ieng6 are not in the default library search path, you need to use this command:

gcc -L/usr/lib/x86_64-linux-gnu/ *.c -o pwcrack -lcrypto -lssl

The -L flag tells the compiler where to find the crypto libraries during linking.

On Other Systems

On most other Linux systems or macOS, you can compile with:

gcc *.c -o pwcrack -lcrypto

Examples

seCret has a SHA256 hash of a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd, and notinlist has a SHA256 hash of 21d3162352fdd45e05d052398c1ddb2ca5e9fc0a13e534f3c183f9f5b2f4a041

$ ./pwcrack a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd
Password
NeverGuessMe!!
secret
Found password: SHA256(seCret) = a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd
$ ./pwcrack 21d3162352fdd45e05d052398c1ddb2ca5e9fc0a13e534f3c183f9f5b2f4a041
Password
NeverGuessMe!!
secret
<Press Ctrl-D for end of input>
Did not find a matching password

We could also put the potential passwords in a file:

$ cat guesses.txt
Password
NeverGuessMe!!
secret
$ ./pwcrack a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd < guesses.txt
Found password: SHA256(seCret) = a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd

Note that we only consider single-character changes when trying uppercase/lowercase variants of the guesses (i.e. we DON'T try all possible capitalizations of the string): In the example below, the correct password is NOT found because going from SECRET to seCret would require 5 characters to be changed.

$ ./pwcrack a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd
SECRET
<Press Ctrl-D for end of input>
Did not find a matching password

Design Questions

Problem 1

The time command takes another command and reports how long it took to run. Use the time command on your password cracker for a password that will not match anything in the rockyou file (so it tests all of the passwords).

Show the output of the time command for your ./pwcrack on that input Calculate (or use your program to count) the total number of times you called the SHA256 function and checked a hash against the input How many password variants, on average, does your password cracker check per second?

Problem 2

Consider this run of your password cracker:

$ ./pwcrack 5118f76d9067edc593d6946b88693cefa6604c7e613111193db118166d4af589
apple
alpaca
backpack
Found SHA256(Backpack) = 5118f76d9067edc593d6946b88693cefa6604c7e613111193db118166d4af589

At some point in the run, your password cracker should hash and check the string alPaca .

For that moment in the program, right when your program would call SHA256 on that string:

Copy/paste your code into your answer
What function calls are active in your program? That is, which function is currently executing? Which function was it called from? And so on, up to main .
For each of those function calls, what are the values of the arguments?
For each of those function calls, what are the values of all the local variables?

What to Hand In

Any .c files you wrote (can be one file or many; it's totally reasonable to only have one). We will compile your code, so you should make sure it works when we do that. You will hand in your code to the pa2 assignment on Gradescope. An autograder will give you information about if your code compiles and works on some simple examples like the ones from this writeup.
Submit your answers to the design questions on Gradescope under the assignment PA2 Design Questions.

Fun (and Authentic!) Testing

To help testing your PA, we are providing you with a file containing 3 million real plaintext passwords famously found a data breach of the RockYou social network in 2009. You can use the password file present in the ieng6 servers by reading it into pwcrack using the following commandline:

$./pwcrack a2c3b02cb22af83d6d1ead1d4e18d916599be7c2ef2f017169327df1f7c844fd < /home/linux/ieng6/CSE29_WI26_A00/public/pa2/rockyou_clean.txt

Note: these are real human-generated passwords, so they may contain profane words (and offensive concepts). We are providing a censored version of the RockYou password list. Our filtering methodology was to remove any passwords that matched a widely-used 2,800 profane word list. If you read the contents of the password file, note that you are doing so at your own risk, as we can not guarantee that we removed all offensive passwords from the list.

Staff Passwords Scenario

You see in the news and on social media that there was a “data breach” of PrairieLearn, and the passwords of all instructors and TAs were exposed on the “dark web”. A classmate sends you a mysterious link that clearly contains usernames for the CSE29 staff and gibberish after each name.

PrairieLearn Password Dump

Can you figure out the passwords of each of the staff members using your pwcrack program, the data at that link, and the rockyou dataset shared above?

Cracking these is ungraded, and just provided for fun and to get your attention with a provocative scenario where password cracking would be interesting (if unethical). They aren't our real passwords.

It also gives us a moment to discuss something important: just because you can crack passwords doesn't mean you should. There are plenty of laws prohibiting the use of passwords that aren't yours to access data that isn't yours, regardless of how you obtained the password. Responsibility and ethics are another – it may simply be wrong to reverse-engineer (access to) someone else's private data, even if it happens to be legal. Of course, there are good uses for doing this – cracking can recover a lost password needed for access to critical data, for example, if the hash is available. Like any powerful tool, your knowledge of C programming and passwords should be used with care and appropriately in context!

In many cases, finding and reporting security issues is possible to do responsibly – there's an entire theory and practice of responsible disclosure. Researchers like Professor Aaron Schulman do work on responsibly informing companies, governments, and the public about vulnerabilities in real-world systems. So it's critical to learn about how systems work, how secure systems fail, and how to navigate the surrounding ethical, legal, and social issues!

Resources and Policy

Refer to the policies on assignments for working with others or appropriate use of tools like ChatGPT or Github Copilot.

You can use any code from class, lab, or discussion in your work.

PA2 - Hashing and Passwords: Resubmission Due 2/12 at 11:59pm

If you want to resubmit PA2, please read this section carefully. You need to pass all the tests in the original PA2, while also implementing an extra functionality.

Implement pwcrack as originally described below. Also update pwcrack to check if the SHA256 hash of the potential password with each of its numeric digits '0'-'9' replaced by all possible numberic digits '0'-'9' (considering only single digit changes) matches the given hash.

As before, only one character change at a time is tested, so if Secret111 is a potential password:

This does NOT check secret101 as it would combine changing the case of a character and replacing a digit
SeCreT111 is NOT a valid variation to check as it modifies the case of two characters at the same time
Secret123 is NOT a valid variation to check as it modifies two digits at the same time
secret111, Secret112, SecreT111, etc. are some valid variations to hash and check for a match

For example, Secret112 has a SHA256 hash of b54051d1abdba8656126f85f96d9270283d34b1cb8787b78c50646d9eb4a502d

$ ./pwcrack b54051d1abdba8656126f85f96d9270283d34b1cb8787b78c50646d9eb4a502d
NeverGuessMe!!
Secret
Secret111
Found password: SHA256(Secret112) = b54051d1abdba8656126f85f96d9270283d34b1cb8787b78c50646d9eb4a502d
$ ./pwcrack b54051d1abdba8656126f85f96d9270283d34b1cb8787b78c50646d9eb4a502d
NeverGuessMe!!
secret
secret222
<Press Ctrl-D for end of input>
Did not find a matching password

Design Question Resubmission

If you want to resubmit the design questions, we will be asking this updated design question in a new Gradescope assignment. Please submit a PDF or a Markdown file containing your answer to the following question:

Consider the following run of an updated pwcrack that follows the new requirements:

$ ./pwcrack b54051d1abdba8656126f85f96d9270283d34b1cb8787b78c50646d9eb4a502d
secret118
secret111
<Press Ctrl-D for end of input>
Did not find a matching password

Assume that it processes each potential password without any optimizations and does not store information from previous passwords it has checked. Answer the following questions:

How many password variations were hashed and tested for a match?
How many duplicate password variations were hashed and checked?
What are ways you might change the implementation to avoid this repeated and redundant work?

PA3 - The Pioneer Shell: Due Thursday, February 12th at 11:59pm - Github Classroom Link

Looking for resubmission? Click here

This assignment is from CSE29 SP24, which has its own list of acknowledgments!

Learning Goals

This assignment calls upon many of the concepts that you have practiced in previous PAs. Specifically, we will practice the following concepts in C:

string manipulation using library functions,
command line arguments,
process management using fork(), exec(), and wait(), and of course,
using the terminal and vim.

We'll also get practice with a new set of library functions for opening, reading from, and writing to files.

Introduction

Throughout this quarter, you have been interacting with the ieng6 server via the terminal – you've used vim to write code, used gcc and make to compile, used git to commit and push your changes, etc. At the heart of all this lies the shell, which acts as the user interface for the operating system.

At its core, the shell is a program that reads and parses user input, and runs built-in commands (such as cd) or executable programs (such as ls, gcc, make, or vim).

As a way to wrap up this quarter, you will now create your own shell (a massively simplified one of course). We shall call it– (drumroll please)

The Pioneer Shell

The PIoneer SHell, or as we endearingly call it, pish (a name with such elegance as other popular programs in the UNIX world, e.g., git).

Getting Started

The starter code for this assignment is hosted on GitHub classroom. Use the following link to accept the GitHub Classroom assignment:

Github Classroom: https://classroom.github.com/a/cpys2JKd

Overall Task

Interactive Mode

Your basic shell simply runs on an infinite loop; it repeatedly:

prints out a prompt,
reads keyboard input from the user
parses the input into a command and argument list
if the input is a built-in shell command, then executes the command directly
otherwise, creates a child process to run the program specified in the input command, and waits for that process to finish

This mode of operation is called the interactive mode.

Batch Mode

Shell programs (like bash, which is what you have been using on ieng6) also support a batch execution mode, i.e., scripts. In this mode, instead of printing out a prompt and waiting for user input, the shell reads from a script file and executes the commands from that file one line at a time.

In both modes, once the shell hits the end-of-file marker (EOF), it should call exit(EXIT_SUCCESS) to exit gracefully. In the interactive mode, this can be done by pressing Ctrl-D.

Parsing Input

Every time the shell reads a line of input (be it from stdin or from a file), it breaks it down into our familiar argv array.

For instance, if the user enters "ls -a -l\n" (notice the newline character), the shell should break it down into argv[0] = "ls", argv[1] = "-a", and argv[2] = "-l". In the starter code, we provide a struct to hold the parsed command.

Handling Whitespaces

You should make sure your code is robust enough to handle various sorts of whitespace characters. In this PA, we expect your shell to handle any arbitrary number of spaces ( ) and tabs (\t) between arguments.

For example, your shell should be able to handle the following input: " \tls\t\t-a -l ", and still run the ls program with the correct argv array.

strtok will be VERY helpful for this.

Built-In Commands

Whenever your shell executes a command, it should check whether the command is a built-in command or not. Specifically, the first whitespace-separated value in the user input string is the command. For example, if the user enters ls -a -l tests/, we break it down into argv[0] = "ls", argv[1] = "-a", argv[2] = "-l", and argv[3] = "tests/", and the command we are checking for is argv[0], which is "ls".

If the command is one of the following built-in commands, your shell should invoke your implementation of that built-in command.

There are three built-in commands to implement for this project: exit, cd, and history.

Built-in Command: `exit`

When the user types exit, your shell should call the exit system call with EXIT_SUCCESS (macro for 0). This command does not take arguments. If any is provided, the shell raises a usage error.

Built-in Command: `cd`

cd should be run with precisely 1 argument, which is the path to change to. You should use the chdir() system call with the argument supplied by the user. If chdir() fails (refer to man page to see how to detect failure), you should use call perror("cd") to print an error message.

Built-in Command: `history`

When the user enters the history command, the shell should print out the list of commands a user has ever entered in interactive mode.

(If you are on ieng6, open the ~/.bash_history file to take a look at all the commands you have executed. How far you've come this quarter!)

To do this, we will need to write the execution history to a file for persistent storage. Just like bash, we designate a hidden file in the user's home directory to store the command history.

Our history file will be stored at ~/.pish_history. (You will find a function in the starter code that will help you get this file path.)

Important: When adding a command to history, if the user enters an empty command (0 length or whitespace-only), it should not be added to the history.

When the user types in the history command to our shell, it should print out all the contents of our history file, adding a counter to each line:

▶ history
1 history
▶ pwd
/home/jpolitz/cse29wi26/pa3/Simple-Shell
▶ ls
Makefile  script.sh  pish  pish.c  pish.h  pish_history.c
pish_history.o  pish.o
▶ history
1 history
2 pwd
3 ls
4 history

Note that the number before each line is added by history. The contents of .pish_history should not contain the leading numbers.

Running Programs

If, instead, the command is not one of the aforementioned built-in commands, the shell treats it as a program, and spawns a child process to run and manage the program using the fork() and exec() family of system calls, along with wait().

Excluded Features

Now because our shells are quite simple, there are a lot of things that you may be accustomed to using that will not be present in our shell. (Just so you are aware how much work the authors of the bash shell put into their product!)

You will not be able to:

use the arrow keys to navigate your command history,
use Tab to autocomplete commands,
use the tilde character (~) to represent your home directory,
use redirection (> and <),
pipe between commands (|),
and many more…

Don't be concerned when these things don't work in your shell implementation!

If this were an upper-division C course, we would also ask you to implement redirection and piping.

Handling Errors

Because the shell is quite a complex program, we expect you to handle many different errors and print appropriate error messages. To make this simple, we now introduce:

Usage Errors

This only applies to built-in commands. When the user invokes one of the shell's built-in commands, we need to check if they are doing it correctly.

For cd, we expect argc == 2,
For history and exit, we expect argc == 1.

If the user enters an incorrect command, e.g. exit 1 or cd without a path, then you should call the usage_error() function in the starter code, and continue to the next iteration of the loop.

Errors to handle

You need to handle errors from the following system calls/library functions using perror(). Please pay attention to the string we give to perror() in each case and reproduce it in your code exactly.

fopen() failure: perror("open"),
chdir() failure: perror("cd"),
execvp() failure: perror("pish"),
fork() failure: perror("fork")

The Code Base

You are given the following files:

pish.h: Defines struct pish_arg for handling command parsing; declares functions handling the history feature.
pish.c: Implements the shell, including parsing, some built-in commands, and running programs.
pish_history.c: Implements the history feature.
Makefile: Builds the project.
ref-pish: A reference implementation of the shell. Note that in this version, the history is written to ~/.ref_pish_history rather than ~/.pish_history, to avoid conflict with your own shell program.

struct pish_arg

In pish.h, you will find the definition of struct pish_arg:

#define MAX_ARGC 64

struct pish_arg {
    int argc;
    char *argv[MAX_ARGC];
};

In our starter code, all functions handling a command (after it has been parsed) will be given a struct pish_arg argument.

Running pish

First, run make to compile everything. You should see the pish executable in your assignment directory.

To run pish in interactive mode (accepting keyboard input), type

$ ./pish

Or, to run a script (e.g., script.sh), type

$ ./pish script.sh

The same applies for the reference implementation ref-pish.

Design Questions

On ieng6 , start your shell, and from your shell, run vim . Then, in another terminal logged into the same ieng6 machine (e.g. ieng6-201, ieng6-202, ieng6-203), run ps -u . Copy-paste the output, and answer:

Which listed process is your shell?
Which listed process is the bash shell that you started your shell from?
Which listed process is the vim process that started from your shell?
Run ps -au . Copy-paste the output. Make a few guesses about what you think is happening with these other processes.

What to Hand In

Submit your code to Gradescope
CREDITS.txt is required (more details on the syllabus)
We will run your program with make and ./pish in both interactive and batch mode
Submit a PDF with your Design Question Answers on Gradescope

Resources and Policy

Refer to the policies on assignments for working with others or appropriate use of tools like ChatGPT or Github Copilot.

You can use any code from class, lab, or discussion in your work.

PA3 - The Pioneer Shell: Resubmission Due 2/26 at 11:59pm

If you want to resubmit PA3, please read this section carefully. You need to pass all the tests in the original PA3, while also implementing an extra functionality

Implement pish as originally described. Also update pish to support a new built-in command $? that prints the exit status of the last command executed. This should work for both built-in and non-built-in commands. For example:

▶ ls
Makefile  script.sh  pish  pish.c  pish.h  pish_history.c  pish_history.o  pish.o
▶ $?
0
▶ echo hello
hello
▶ $?
0
▶ ls nonexistent
ls: nonexistent: No such file or directory
▶ $?
1
▶ $?
1
▶ cd nonexistent
cd: no such file or directory: nonexistent
▶ $?
1

The $? command should also be recorded in the command history. It should not affect what the last exit code was.

Errors

For each of the following errors, call the usage_error() function in the starter code and continue to the next iteration of the loop.

If the user enters $? as the very first command (there is no previous exit code).
If the user enters $? <anything> (if argc != 1).

Hints

You may find the WEXITSTATUS macro useful.

Design Question Resubmission

If you want to resubmit the design questions, we will be asking this updated design question in a new Gradescope assignment. Please submit a PDF containing your answer to the following question:

Look up how to run background processes at the shell with & (feel free to use an AI tool to help you see some examples!). Describe a little bit about what might change in your shell if you wanted to allow background processes to run while the shell continues to take input.

What to Hand In

Submit your updated code to Gradescope
CREDITS.txt is required (more details on the syllabus)
We will run your program with make and ./pish in both interactive and batch mode

If you are resubmitting the design question, submit a PDF containing your answer to the design question in the new Gradescope assignment.

Resources and Policy

Refer to the policies on assignments for working with others or appropriate use of tools like ChatGPT or Github Copilot.

You can use any code from class, lab, or discussion in your work. You must write your own answers to the design questions.

PA4 – Malloc

This assignment is thanks to the staff of CSE29 spring 2024, especially Gerald Soosairaj and Jerry Yu.

Due Thursday, February 26th, 11:59pm
GitHub Classroom: https://classroom.github.com/a/5PI5i7dZ

In class, quizzes, and PAs we've used malloc and free to manage memory. These are functions written in C! glibc contains one of the frequently used implementations.

There are a lot of details that go into these implementations – we don't expect that anyone could internalize all the details from the documentation above quickly. However, there are a few key details that are really interesting to study. In this project, we'll explore how to implement a basic version of malloc and free that would be sufficient for many of the programs we have written.

Specifically, we will practice the following concepts in C:

bitwise operations,
pointer arithmetic,
memory management, and of course,
using the terminal and vim.

Task Specification

You will implement and test two functions. As with all PAs, you can use all your code from PSet 4 for helper functions and to help you understand the task.

`vmalloc`

void *vmalloc(size_t size);

The size_t size parameter specifies the number of bytes the caller wants. This has the same meaning as the usual malloc we have been using: the returned pointer is to the start of size bytes of memory that is now exclusively available to the caller.

Note the void * return type, which is the same as the usual malloc. void * is not associated with any concrete data type. It is used as a generic pointer, and can be implicitly assigned to any other type of pointer.

This is how malloc (and our vmalloc) allow assigning the result to any pointer type:

int *p = malloc(sizeof(int) * 10);
char *c = malloc(sizeof(char) * 64);

If size is not greater than 0, or if the allocator could not find a heap block that is large enough for the allocation, then vmalloc should return NULL to indicate no allocation was made.

Your implementation of vmalloc must:

Use a “best fit” allocation policy
Always return an address that is a multiple of 16
Always allocate space on 16-byte boundaries
Ensure that all blocks, either free or allocated, have correct block headers and footers

Best Fit Allocation Policy

Many allocation policies are possible – choosing the first block that fits, choosing the last block that fits, choosing the most-recently-freed block that fits, and so on. The one you must implement for this PA is the best-fit policy.

That is, once you have determined the size requirement of the desired heap block, you need to traverse the entire heap to find the best-fitting block – the smallest block that is large enough to fit the allocation. If there are multiple candidates of the same size, then you should use the first one you find.

If the best-fitting block you find is larger than what we need, then we need to split that block into two. For example, if we are looking for a 16-byte block for vmalloc(4), and the best fitting candidate is a 64-byte block, then we will split it into a 16-byte block and a 48-byte block. The former is allocated and returned to the caller, the latter remains free. Make sure to create a block header for any new blocks, and update block headers of modified blocks.

Returning the Address

After updating all the relevant metadata, vmalloc should return a pointer to the start of the payload. Do not return the address of the block header!

`vmfree`

void vmfree(void* ptr)

The vmfree function expects a 16-byte aligned address obtained by a previous call to vmalloc. If the address ptr is NULL, then vmfree does not perform any action and returns directly. If the block appears to be a free block, then vmfree does not perform any action and returns.

For allocated blocks, vmfree updates the block metadata to indicate that it is free, and coalesces with adjacent (previous and next) blocks if they are also free.

Coalescing freed blocks

Just freeing a block is not necessarily enough, since this may leave us with many small free blocks that can't hold a large allocation. When freeing, we also want to check if the next / previous block in the heap are free, and coalesce with them to make one large free block.

If you are coalescing two blocks, remember to update both the header and the footer!

Block footers / Previous Bit

You will need to update both your vmalloc and your vmfree implementation to add code for creating/updating accurate footers, and making sure the "previous block busy" bit is correct.

Design Questions

Heap fragmentation occurs when there are many available blocks that are small-sized and not adjacent (so they cannot be coalesced).

Give an example of a program (in C or pseudocode) that sets up a situation where a 20-byte vmalloc call fails to allocate despite the heap having (in total) over 200 bytes free across many blocks. Assume all the policies and layout of the allocator from the PA are used (best fit, alignment, coalescing rules, and so on)
Write a short C program that uses the built-in malloc and free that triggers an error when run with valgrind for attempting to read or write in a block of memory that has been malloc'd and then freed. Submit the program and the output.

Then, in the context of this assignment and your vmalloc and vmfree, describe how you would implement a function uint8_t safe_readwrite(void* p) that takes a pointer and returns whether it is within the payload of a live region on the heap.

Submission

You'll submit your code Gradescope as usual and submit a PDF with your Design Question Answers on Gradescope. Refer to the policies on collaboration if you need to refresh your memory.

Important: We will compile your program using the provided Makefiles, which you should not change.

To get started, read through the next few sections about the PA:

Getting Started

The starter code for this assignment is hosted on GitHub Classroom. Use the following link to accept the GitHub Classroom assignment: https://classroom.github.com/a/5PI5i7dZ

Just like on previous PAs, clone the repository to ieng6 server.

The Code Base

Unlike previous PAs, this PA comes with starter code that you should read and understand.

The Library

vmlib.h: This header file defines the public interface of our library. Other programs wishing to use our library will include this header file.
vm.h: This header file defines the internal data structures, constants, and helper functions for our memory management system. It is not meant to be exposed to users of the library.
vminit.c: This file implements functions that set up our “heap”.
vmalloc.c: This file implements our own allocation function called vmalloc.
vmfree.c: This file implements our own free function called vmfree.
utils.c: This file implements helper functions and debug functions.

Testing

vmtest.c: This file is not a part of the library. It defines a main function and uses the library functions we created. We can test our library by compiling this file into its own program to run tests. You can write code in the main() function here for testing purposes.
tests/: This directory contains small programs and other files which you should use for testing your code. We will explain this in more detail in a later section.

Compiling the Starter Code

To compile, run make in the terminal from the root directory of the repository. You should see the following items show up in your directory:

libvm.so: This is a dynamically linked library, i.e., our own memory allocation system that can be linked to other programs (e.g., vmtest). The interface for this library is defined in vmlib.h. The Makefiles handle compiling with a .so for you; we set it up this way to match how production systems include libraries like stdlib
vmtest: This executable is compiled from vmtest.c with libvm.so linked together. It uses your memory management library to allocate/free memory.
The starter code in vmtest.c is very simple: it calls vminit() to initialize our simulated “heap”, and calls the vminfo() function to print out the contents of the heap in a neatly readable format. Run the vmtest executable to find out what the heap looks like right after it’s been initialized! (Hint: it’s just one giant free block.)

Reading the Starter Code

For this programming assignment, we want the addresses returned by vmalloc to always be 16-byte aligned (i.e., at an address ending in 0x0). Since the metadata (block header) is 8 bytes in size, block headers will therefore always be placed at addresses ending in 0x8. To maintain this alignment, the first block header (*heapstart) starts at an offset of 8, and all blocks going forward must have sizes that are multiples of 16 bytes.

Block sizes here refer to the size of the block including the header. The smallest possible block is 16 bytes, and would consist of an 8 byte header followed by 8 bytes of allocatable memory. When free, the last 8 bytes of the block would instead contain the block footer, to be used for coalescing free blocks.

Begin by reading through the vm.h file, where we define the internal data structures for the heap. This is where you will find the all-important block headers and block footers. Focus on understanding how struct block_header is used. You will see it in action later.

Next, open vminit.c. This is a very big file containing functions that create and set up the simulated heap for this assignment. Find the init_heap() function, and read through the entire thing to understand how it is setting up the heap. There is pointer arithmetic involved, you will need to do similar things for implementing vmalloc and vmfree, so make sure you have a solid understanding of that. (Why is it necessary to cast pointers to different types?)

Just like production allocators, we create a large chunk of memory as our heap using mmap, and allocations/frees are performed in that memory region.

Once you understand what the init_heap() function is doing, open up utils.c. Here we have implemented the function vminfo(), which will be your ally throughout this PA. This function traverses through the heap blocks and prints out the metadata in each block header. You should find inspiration for how to write your own vmalloc function here. Look at how it manipulates the pointer to jump between blocks!

Testing

For this PA we provide you with some local testing code to help you make sure your program is working; you will need to write more thorough tests than what we've provided! But we give you a big start

Test Programs

In the repository, you will find the tests/ directory, where we have provided some small testing programs that test your library function.

By coding in to the tests/ directory and running make, you can compile all these test programs and run them yourself. These programs are what we will be using in the autograder as well.

For example, here’s the code for the very first test: alloc_basic.c:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#include "vmlib.h"
#include "vm.h"

/*
 * Simplest vmalloc test.
 * Makes one allocation and confirms that it returns a valid pointer,
 * and the allocation takes place at the beginning of the heap.
 */
int main()
{
    vminit(1024);

    void *ptr = vmalloc(8);
    struct block_header *hdr = (struct block_header *)
                               ((char *)ptr - sizeof(struct block_header));

    // check that vmalloc succeeded.
    assert(ptr != NULL);
    // check pointer is aligned.
    assert((uint64_t)ptr % 16 == 0);
    // check position of malloc'd block.
    assert((char *)ptr - (char *)heapstart == sizeof(struct block_header));
    // check block header has the correct contents.
    assert(hdr->size_status == 19);

    vmdestroy();
    return 0;
}

In this test, we simply make one vmalloc() call, and we verify that the allocation returned a valid pointer at the correct location in the heap.

These tests use the assert() statement to check certain test conditions. If your code passes the tests, then nothing will be printed. If one of the assert statements fail, then you will see an error.

Test Images

When you are writing vmalloc, you may wonder how you can test the allocation policy fully when you don’t have the ability to free blocks to create a more realistic allocation scenarios (i.e., a heap with many different allocated/free blocks to search through).

To help you with that, we have created some images in the starter code in the tests/img directory. In your test programs, instead of calling vminit(), you can call the vmload() function instead to load one of these images.

Our alloc_basic2 program uses one such image. If you open tests/alloc_basic2.c, you will see that it creates the simulated heap using the following function call:

vmload("img/many_free.img");

If you load this image and call vminfo(), you can see exactly how this image is laid out:

vmload: heap created at 0x7c2abd1da000 (4096 bytes).
vmload: heap initialization done.
---------------------------------------
 #      stat    offset   size     prev
---------------------------------------
 0      BUSY    8        48       BUSY
 1      BUSY    56       48       BUSY
 2      FREE    104      48       BUSY
 3      BUSY    152      32       FREE
 4      FREE    184      32       BUSY
 5      BUSY    216      48       FREE
 6      FREE    264      128      BUSY
 7      BUSY    392      112      FREE
 8      BUSY    504      32       BUSY
 9      FREE    536      112      BUSY
 10     BUSY    648      352      FREE
 11     BUSY    1000     304      BUSY
 12     BUSY    1304     336      BUSY
 13     FREE    1640     320      BUSY
 14     BUSY    1960     288      FREE
 15     BUSY    2248     448      BUSY
 16     BUSY    2696     256      BUSY
 17     BUSY    2952     96       BUSY
 18     BUSY    3048     368      BUSY
 19     FREE    3416     672      BUSY
 END    N/A     4088     N/A      N/A
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 14, Total: 20

You can use this image to test allocating in a more realistic heap.

Three images exist in total:

last_free.img: the last block is free,
many_free.img: many blocks are free,
no_free.img: no block is free (use this to test allocation failure).

PA4 – Malloc: Resubmission Due 3/13 at 11:59pm

If you want to resubmit PA4, please read this section carefully. You need to pass all the tests in the original PA4, while also implementing an extra functionality.

vminfo

In utils.c, we have implemented the function vminfo(), which traverses through the heap blocks and prints out the metadata in each block header. Update the function so that at the end of its output, it prints details about the largest available block as follows:

The largest free block is #x with size y.

where x is the blockid starting from 0, and y is the size (including header) of the largest available block for allocation. If there are multiple candidates for the largest block, x should be the smallest blockid of the candidates. Refer the following updated output of vminfo as an example:

vminit: heap created at 0x707df3026000 (4096 bytes).
vminit: heap initialization done.
---------------------------------------
 #      stat    offset   size     prev
---------------------------------------
 0      BUSY    8        48       BUSY
 1      FREE    56       32       BUSY
 2      BUSY    88       112      FREE
 3      BUSY    200      96       BUSY
 4      FREE    296      208      BUSY
 5      BUSY    504      64       FREE
 6      BUSY    568      320      BUSY
 7      FREE    888      208      BUSY
 8      BUSY    1096     160      FREE
 9      BUSY    1256     368      BUSY
 10     BUSY    1624     272      BUSY
 11     FREE    1896     672      BUSY
 12     BUSY    2568     128      FREE
 13     BUSY    2696     464      BUSY
 14     FREE    3160     672      BUSY
 15     BUSY    3832     160      FREE
 16     FREE    3992     96       BUSY
 END    N/A     4088     N/A      N/A
---------------------------------------
Total: 4080 bytes, Free: 6, Busy: 11, Total: 17
The largest free block is #11 with size 672

Total: 4080 bytes, Free: 6, Busy: 11, Total: 17 The largest free block is #11 with size 672 In this example, there are six FREE blocks with the two largest FREE blocks being 672 bytes in size: the output line shows the index of the 672 sized block that comes first.

Updated DESIGN question for the resubmission

If you want to resubmit the design questions, we will be asking this updated design question in a new Gradescope assignment. Please submit a PDF containing your answer to the following question:

In our implementation, when freeing an allocated block, we coalesce it with the previous and next blocks in the heap if they are free - to make one larger free block.

Consider an updated implementation where in case of freeing, we only coalesce it with exactly 1 next block in the heap if it is free. That is, we do NOT coalesce backwards. Give an example of a program (in C or pseudocode) where all the allocations succeed in the current design (like in this PA), but some allocations would fail with the updated freeing strategy.

In class, quizzes, and PAs we've used malloc and free to manage memory. These are functions written in C! glibc contains one of the frequently used implementations.

Specifically, we will practice the following concepts in C:

bitwise operations,
pointer arithmetic,
memory management, and of course,
using the terminal and vim.

PA5 - Web Server

Due Friday, March 13th, 11:59PM

Github Classroom Assignment: https://classroom.github.com/a/exV5tX-2

Important: There is no resubmission available for Assignment 5 (Problem Set, Design Questions, and Programming Assignment). Please plan your development and testing accordingly. There will also be no hidden tests for PA5, so you will receive immediate feedback on your code.

Web Servers and HTTP

HTTP is one of the most common protocols for communicating across computers. At the systems programming level, this means using system calls (usually in C) to tell the operating system to send bytes over a network.

One nice feature of HTTP is that it is a primarily text-based protocol, which makes it more straightforward for humans to read and debug. It is also well-understood by web browsers and programs like curl, making it easy to test and connect to user-facing devices.

It's useful to get experience with the format of HTTP, and with using system calls in C to manipulate HTTP requests.

Task – Chat Server

In this programming assignment, you'll write a C program to implement a chat room (think a plain-text version of Slack or Discord).

It's best to complete the PA on ieng6, because it gives a consistent testing environment for the live server.

Your programs should compile and run with:

make chat-server
./chat-server <optional port number>

The server should start with ./chat-server and print a single message:

$./chat-server
Server started on port PPPPP

If a port number was provided, it should use that port, otherwise it should print an open port that was selected.

It should continue running, listening for requests on that port, until shutdown with Ctrl-c. It can print any other logging messages or other output needed to the terminal.

The requests the chat server listens for are described in this section:

/chats

A request to /chats responds with the plain text rendering of all the chats.

The rendered chat format is

[#N 20XX-MM-DD HH:MM:SS] <username>: <message>
(<rusername>) <reaction>
... [more reactions] ...
... [more chats] ...

Chats must be rendered properly, as in HW5.5. You can put in whatever effort you like into lining things up nicely within these constraints, but these are the requirements.

Example chats rendering might look like:

[#1 2025-11-06 09:01:00]         joe: hi aaron
[#2 2025-11-06 09:02:00]       aaron: sup
[#3 2025-11-06 09:04:00]         joe: working on the example chat for the PA
[#4 2025-11-06 09:06:00]       aaron: oh cool what should it say
[#5 2025-11-06 09:07:00]         joe: I dunno we could go pretty meta with it? like a chat about the chat
[#6 2025-11-06 09:10:00]       aaron: eh kinda lame tbh
[#7 2025-11-06 09:11:00]         joe: whatever I already wrote it, going with it as-is
[#8 2025-11-06 09:12:00]       aaron: ok but make sure we don't look like jerks
[#9 2025-11-06 09:12:05]       aaron: or at least not me
                            (joe)  👍🏻
[#10 2025-11-06 09:12:25]         joe: good talk

/post

A post request looks like this:

/post?user=<username>&message=<message>

This creates a new chat with the given username and message string with a timestamp given by the time the request is received by the server. It must respond with the list of all chats (including the new one).

Limits and constraints:

If a parameter (username or message) is missing, respond with some kind of error (HTTP code 400 or 500)
If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
If message is longer than 255 bytes, respond with some kind of error (HTTP code 400 or 500)
If a post would make there be more than 100000 (one hundred thousand) chats, the server should respond with an error (HTTP code 404 or 500)

/react

/react?user=<username>&message=<reaction>&id=<id>

Creates a new reaction to a chat by the given username with the given message string, reacting to the post with the given id (the ids are the #N at the beginning of posts). It must respond with the list of all chats (including the new one).

Limits and constraints:

If the id is not the ID of some existing chat, respond with some kind of error (HTTP code 400 or 500).
If a parameter (username or message or id) is missing, respond with some kind of error (HTTP code 400 or 500)
If username is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500)
If message is longer than 15 bytes, respond with some kind of error (HTTP code 400 or 500) – reactions are intended to be short!
If a reaction would make a chat have more than 100 reactions, the server should respond with an error (HTTP code 404 or 500)

Implementation Guide

This page is the entire specification for the assignment; it's what you need to implement. You are free to make whatever choices you like in your code within these constraints. To help you on your way, we have some implementation notes below with suggestions and ideas for how to get started and what to think about. These are not requirements, just suggestions!

First, make sure to do Problem Set 5 first if you haven't already! While completing it you will create helper functions you can use as well as help you develop an understanding of your task.

Then, one way to break down the work the server needs to do is:

Parsing and interpreting requests (is the request a new post, a reaction, etc)
Updating the current data (chats and reactions) based on the parameters in the request
Responding to requests based on the current state of the data (chats and requests)

One way to work incrementally is to separate the data handling and the request handling parts into different functions.

The data handling functions can be tested with by writing a main and using printf or assert, and the request handling can be tested with curl or a client.

We think the following functions might be useful for you to implement. In your program you might have slightly different signatures or ideas, but these are a useful starting point. Also, our staff is more familiar with this approach, so it will take us less time to help you in office hours!

Data Handling Functions

These functions can be written and tested without starting a server at all. You could consider having a separate main function in its own file that just tests these!

`add_chat`

A function add_chat can add a single chat.

uint8_t add_chat(char* username, char* message)

This function might have several tasks:

Update the current id
Get the current timestamp
Create a new chat and fill in its username and message fields
As needed, allocate new space, put the new chat in heap memory, store a reference to it in an array, etc. depending on your specific representation of chats

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

`add_reaction`

Similar to add_chat, this adds a single reaction:

uint8_t add_reaction(char* username, char* message, char* id)

This function might have several tasks:

Use the id to locate the chat that this reaction is for (and maybe return early with an error if the id is invalid/out of range)
Create a new reaction and fill in its username and message fields
Add the reaction to the chat struct somehow, maybe with newly allocated space, an added element or reference in an array, etc. depending on your specific representation of chats and reactions
Update the count of reactions on the referenced chat

This is testable by setting up initial states of chats and reactions, running the function, and then using assert or printf on the results.

Request and Response Handling Functions

You will definitely need to write a function for handling responses. But the work of handling individual responses can be broken up. One approach could be to get the path and query parameter string from the request and check if it's path is /post, /chats, etc, then pass the string to other functions

`respond_with_chats`

void respond_with_chats(int client)

This function is reponsible for using write or send to send the response to the client that made the request. It might include:

Using snprintf to format strings with data from the timestamp or ids
Calling write(client, str, size) on various strings (with the appropirate size) to directly send the data to the client

`handle_post`

// path is a string like "/post?user=joe&message=hi"
void handle_post(char* path, int client)

This function can have several tasks:

Use string functions to extract the username and message from the path
Call add_chat to do the data update
Call respond_with_chats to send the response

`handle_reaction`

// path is a string like "/react?user=joe&message=hi&id=3"
void handle_reaction(char* path, int client)

This function can have several tasks:

Use string functions to extract the username, message, and id from the path
Call add_reaction to do the data update
Call respond_with_chats to send the response

Representing Chats and Reactions

Chats and reactions both have multiple fields, so a natural choice is to represent both chats and reactions as structs.

A chat has several components, which may be good candidates for struct fields:

The message
The username
The timestamp
The reactions to the message

A reaction has the message content and the user who posted it (no timestamp or reactions-to-reactions), both of which are fixed-size.

You should make structs to hold the Chats and Reactions in your server, and use the constrains above about the lengths of usernames, messages, and so on to help you decide what data to store.

Other Helpful Functions

This PA explores several features that are straightforward to use, but there are many of them. We might add more to this list as the PA goes on! Here are a few functions you'll probably find useful; try man on them, or follow the links, or do your own searching and research. Don't forget all the functions from class (e.g. malloc and other allocation functions, strstr and other string manipulation functions, and so on). This list is mainly focused on things we haven't tried in class.

atoi: convert char* to integer
Time functions:
- time: get the current time
- localtime: convert the time to the current local time zone
- strftime: print the time in a given format

Design Questions

We know from lab 8 that we can write bash scripts to automate running commands in loops. We know from this PA and lab 9 that we can use wget and curl to make requests to our chat server from the command line. We also hear in the news that AI agents are pretty good at writing code.

Choose an AI agent (ChatGPT, Claude Code, Github Copilot, etc) and use it to generate a bash script for testing your chat server. Give it enough information about the specification to generate meaningful requests, and make sure it makes at least 1000 requests to your server.

In your response:

Include relevant prompts and responses from the chat agent
Include the bash script generated
Show an example run of the bash script
Discuss if it found any bugs in your implementation
Discuss if you are more confident in your implementation as a result
Discuss anything else you learned about bash scripting, programming, or URLs/servers as a result

Handin

Be sure to complete your Problem Set on PrairieLearn
Submit a PDF with your Design Question Answers on Gradescope
Submit your PA5 repository on Gradescope
Make sure make chat-server builds your server (that's the command we will run), and the server runs on a predictable port with, for example, ./chat-server 8000
Don't forget CREDITS.txt
Important: There is no resubmission for this Assignment
There are no hidden tests for this PA, so you can get immediate feedback on your code.
Make sure to keep following good coding practices, including comments and style.

UCSD CSE29 Winter 2026

Basics