CIS5200: Machine Learning, Fall 2023

Homework 1

Release Date: September 20, 2023
Due Date: October 4, 2023
• You will submit your solution for the written part of HW1 as a single PDF file via Gradescope. The deadline is 11:59 PM ET. Contact the TAs on Ed if you face any issues uploading your homework.
• Collaboration is permitted and encouraged for this homework, though each student must understand, write, and hand in their own submission. In particular, it is acceptable for students to discuss problems with each other; it is not acceptable for students to look at another student's written answers when writing their own. It is also not acceptable to publicly post your (partial) solution on Ed, but you are encouraged to ask public questions on Ed. If you choose to collaborate, you must indicate on each homework with whom you collaborated.

Please refer to the notes and videos posted on the website if you need to recall the material discussed in the lectures.
1 Written Questions (46 points)
Problem 1: Margin Perceptron (15 points)

Recall the Perceptron algorithm we saw in lecture. The Perceptron algorithm terminates once it classifies all points correctly. It does not guarantee that the hyperplane it finds has large margin (γ), despite our assumption that the true hyperplane w∗ has margin γ, where

    γ = min_{i∈{1,...,m}} y_i⟨w∗, x_i⟩.

In this problem, we will consider the following simple modification to the Perceptron algorithm:

Algorithm 1: Margin Perceptron
    Initialize w_1 = 0 ∈ R^d
    for t = 1, 2, . . . do
        if there exists i ∈ {1, . . . , m} such that y_i⟨w_t, x_i⟩ ≤ 1 then
            update w_{t+1} = w_t + y_i x_i
        else
            output w_t
        end
    end

We will show that Margin Perceptron stops after 3/γ² steps and returns a hyperplane w such that

    min_{i∈{1,...,m}} y_i⟨w, x_i⟩ / ∥w∥_2 ≥ γ/3.

Note that the margin is the distance of the closest point to the hyperplane, and since w is not necessarily of norm 1, this quantity is given by min_{i∈{1,...,m}} y_i⟨w, x_i⟩ / ∥w∥_2. As in the lecture, we will assume that ∥x_i∥_2 ≤ 1 for all i ∈ {1, . . . , m} and ∥w∗∥_2 = 1.

(Parts 1.1–1.5 are missing from this copy.)

Hint: You will need to use the results in 1.2 and 1.3 plus the stopping condition of the algorithm.

1.6 (1 point) Why is it desirable to learn a predictor that has margin?
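For reference, the update rule of Algorithm 1 can be sketched in a few lines of PyTorch. This is only an illustrative sketch, not part of the assignment or its starter code: the function name, the max_steps cap, and the choice of always updating on the first violating index are assumptions made here for concreteness.

    import torch

    def margin_perceptron(X, y, max_steps=10000):
        """Illustrative sketch of Algorithm 1 (Margin Perceptron).

        X: (m, d) tensor of points with ||x_i||_2 <= 1
        y: (m,) tensor of labels in {-1, +1}
        """
        m, d = X.shape
        w = torch.zeros(d)                         # w_1 = 0 in R^d
        for _ in range(max_steps):                 # the analysis bounds the number of updates by 3 / gamma^2
            margins = y * (X @ w)                  # y_i <w_t, x_i> for every i
            violators = (margins <= 1).nonzero(as_tuple=True)[0]
            if len(violators) == 0:                # no i with y_i <w_t, x_i> <= 1: output w_t
                break
            i = violators[0]                       # pick any violating point
            w = w + y[i] * X[i]                    # w_{t+1} = w_t + y_i x_i
        return w

The returned w can then be checked against the claimed guarantee by computing (y * (X @ w)).min() / w.norm().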
2 Bayes Optimal Classifier (15 points)
Let η(x) denote the conditional probability of the label being 1 given a point x under the distribution D. That is,

    η(x) = Pr[y = 1 | x].

Recall that the true risk, under the 0/1 loss, for any classifier f is

    R(f) = E_{x,y}[1[f(x) ≠ y]].

The Bayes optimal classifier w.r.t. D is the classifier f∗ that achieves the minimum risk among all possible classifiers. In this problem, we will work out what the Bayes optimal classifier is.

2.1 (3 points) Show that

    R(f) = E_x[ η(x)·1[f(x) = −1] + (1 − η(x))·1[f(x) = +1] ].

2.2 Show that the minimum risk achievable by any classifier is

    E_x[ min(η(x), 1 − η(x)) ].

Hint: For a fixed x, think about what the minimum loss is using 2.1.

2.3 (2 points) Show that the Bayes optimal classifier that achieves the above risk is

    f∗(x) = +1 if η(x) ≥ 1/2, and −1 otherwise.

2.5 (6 points) Now suppose we modify the loss function from 0/1 to the following cost-based loss function:

    ℓ_c(f(x), y) = c·1[y = +1, f(x) = −1] + (1 − c)·1[y = −1, f(x) = +1].

Here the loss penalizes a false negative with cost c and a false positive with cost 1 − c, penalizing different types of mistakes differently.¹ Note that the true risk under this loss is

    R_c(f) = E_{x,y}[ℓ_c(f(x), y)].

Find the Bayes optimal classifier in this setting.

Hint: Follow the same ideas you used to solve 2.1–2.3, using ℓ_c instead of the 0/1 loss.

¹ Let us see why this is a useful loss function. Consider the case of medical diagnosis: a high false negative rate means that we are predicting that patients do not have the disease when they actually do. Such a prediction could lead to the patient not getting the care they need. In such a setting, you would want c to be closer to 1.
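As a quick numerical illustration of the quantities defined above (not part of the assignment), the sketch below evaluates the 0/1 risk of a classifier on a small discrete distribution with made-up values of η(x), and checks it against E_x[min(η(x), 1 − η(x))]. The distribution, the probabilities, and the function names are all invented for this example.

    import torch

    # Three discrete points x, equally likely, with made-up values of eta(x) = Pr[y = 1 | x].
    p_x = torch.tensor([1/3, 1/3, 1/3])            # marginal distribution over x
    eta = torch.tensor([0.90, 0.40, 0.55])         # eta(x) for each point (illustrative values)

    def risk_01(pred):
        """0/1 risk of a classifier given by its predictions in {-1, +1} on the three points."""
        pred = torch.as_tensor(pred, dtype=torch.float32)
        # For each x: the probability of a mistake is 1 - eta(x) if we predict +1, and eta(x) if we predict -1.
        per_x = torch.where(pred == 1, 1 - eta, eta)
        return (p_x * per_x).sum().item()

    thresholded = torch.where(eta >= 0.5, torch.ones_like(eta), -torch.ones_like(eta))
    print(risk_01(thresholded))                                # ~0.3167
    print((p_x * torch.minimum(eta, 1 - eta)).sum().item())    # ~0.3167, matches E_x[min(eta, 1 - eta)]
    print(risk_01([1.0, 1.0, 1.0]))                            # ~0.3833, a different classifier does worse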
3 Programming Questions (16 points)
Use the link here to access the Google Colaboratory (Colab) file for this homework. Be sure to make a copy by going to "File" and "Save a copy in Drive". This assignment uses the PennGrader system for students to receive immediate feedback. As noted in the notebook, please be sure to change the student ID from the default '99999999' to your 8-digit PennID.

Instructions for how to submit the programming component of HW 1 to Gradescope are included in the Colab notebook. You may find this PyTorch reference to be helpful; if you get stuck, it may be helpful to review some of the PyTorch documentation and functions.
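The notebook's actual tasks are not reproduced here, but as a rough illustration, below are a few generic PyTorch tensor operations of the kind that tend to come up when implementing linear classifiers; all names and data are placeholders.

    import torch

    X = torch.randn(5, 3)                      # 5 placeholder points in R^3
    y = torch.tensor([1., -1., 1., 1., -1.])   # placeholder labels in {-1, +1}
    w = torch.randn(3)                         # a weight vector

    scores = X @ w                             # matrix-vector product, shape (5,)
    margins = y * scores                       # elementwise product y_i <w, x_i>
    mask = margins <= 1                        # boolean mask of points violating the margin condition
    print(mask.sum().item())                   # count of violating points, as a Python int
    print(torch.sign(scores))                  # elementwise sign, e.g. to turn scores into predictions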