MTHM017 Advanced Topics in Statistics

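the effect of using different priors, and (ii) fitting fixed effects and random effects models in order to compare the posterior distributions under the different models. Part B involves using different methods to classify data into two groups.

A. Bayesian inference [66 marks]

1. The first question of Part A involves fitting Poisson regression models to the Scotland dataset, which contains the observed and expected numbers of lip cancer cases in the local authority administrative areas of Scotland between 1975 and 1986.

a. [3 marks] Calculate the standardised mortality ratio (SMR = observed/expected) for each administrative area and plot the distribution of the SMRs. Then map the SMRs by administrative area. To map the SMRs you can use the ready-made ScotlandMap function available on the ELE page. The code below uses random numbers to demonstrate how the ScotlandMap function works. Note: to run it, you need to add the Scotland shapefiles (.shx, .shp, .prj, .dbf) to your working directory. The ScotlandMap.R file and the shapefiles are available on the ELE page.

library(tidyverse)
library(RColorBrewer)
library(sf)
library(rgdal)
source("ScotlandMap.R")  # needed to read in the ScotlandMap function
testdat <- runif(56)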
Download the HW1 Skeleton before you begin.

1. Overview

Vast amounts of digital data are generated each day, but raw data are often not immediately usable. Instead, we are interested in the information content of the data: what patterns are captured? This assignment covers a few useful tools for acquiring, cleaning, storing, and visualizing datasets.

Why are specific versions of software used in homework assignments? Using specific versions of software in homework assignments enables us to grade and provide immediate feedback to the large number of students in the course (1000+ OMS students, 250+ Atlanta students). Autograders are used to grade students' code submissions, and to ensure that these autograders can grade all submissions, we need to know the specific versions of software that students use. Different versions of software can have different features, and fixed versions also let the autograders detect potential errors that may occur in different libraries and provide students with appropriate feedback to resolve them. Continuously updating assignments to keep up with the latest versions of technology is a significant undertaking, so we carefully select which aspects of our autograders to update, to balance the workload for our course staff and provide a positive learning experience for students. As a result, you may see that certain assignment questions require the use of "older" versions of software or specific libraries.

Q1 [40 points] Collect data from TMDb to build a co-actor network

Goal: Collect data using an API for The Movie Database (TMDb). Construct a graph representation of this data that shows which actors have acted together in various movies. We use the words "graph" and "network" interchangeably.
Technology: Python 3.7.x only (question and autograder developed and tested for these versions). More recent versions may also work, but we do not officially support them (code written with newer versions may break the autograder).
TMDb API version 3
Allowed Libraries: The Python Standard Library only. All other libraries (including but not limited to Pandas, Numpy, and Requests) are NOT allowed. Providing a consistent autograder experience for all students vastly outweighs the marginal utility of extending the scope of supported libraries. For example, urllib can easily be used instead of Requests in solving this question.
Max runtime: 10 minutes. Submissions exceeding this will receive zero credit.
Deliverables [Gradescope]:
Q1.py: The completed Python file
nodes.csv: The csv file containing nodes
edges.csv: The csv file containing edges

For this question, you will use and submit a Python file. Complete all tasks according to the instructions found in Q1.py to complete the Graph class, the TMDbAPIUtils class, and the one global function. The Graph class will serve as a re-usable way to represent and write out your collected graph data. The TMDbAPIUtils class will be used to work with the TMDB API for data retrieval.

Tasks and point breakdown
a) [10 pts] Implementation of the Graph class according to the instructions in Q1.py.
o The graph is undirected, thus {a, b} and {b, a} refer to the same undirected edge in the graph; keep only either {a, b} or {b, a} in the Graph object. A node's degree is the number of (undirected) edges incident on it. In/out-degrees are not defined for undirected graphs.
b) [10 pts] Implementation of the TMDbAPIUtils class according to the instructions in Q1.py. Use version 3 of the TMDb API to download data about actors and their co-actors. To use the API:
o Create a TMDb account and follow the instructions on this document to obtain an authentication token.
o Refer to the TMDB API Documentation as you work on this question.
c) [20 pts] Producing correct nodes.csv and edges.csv.
o As mentioned in the Python file, if an actor name has comma characters (,), remove those characters before writing that name into the csv files.

SQLite is a lightweight, serverless, embedded database that can easily handle multiple gigabytes of data. It is one of the world's most popular embedded database systems. It is convenient to share data stored in an SQLite database: it is just one cross-platform file, which does not need to be parsed explicitly (unlike CSV files, which must be parsed).

You will modify the given Q2.py file by adding SQL statements to it. We suggest that you consider testing your SQL locally on your computer using interactive tools to speed up testing and debugging, such as DB Browser for SQLite.

Goal: Construct a TMDb database in SQLite. Partition and combine information within tables to answer questions.
Technology: SQLite release 3.22. As some students have encountered challenges installing earlier versions of SQLite, we have further verified that this question can be completed with SQLite version 3.39.2 on our local machine. It is possible that other SQLite versions may also work. Note: while window functions may work in some versions of SQLite, they DO NOT work in v3.22.
Python 3.6.x only (question developed and tested for these versions). It is possible that more recent versions may also work, but we do not officially support them.
Allowed Libraries: Do not modify import statements. Everything you need to complete this question has been imported for you. Do not use other libraries for this question.
Max runtime: 10 minutes. Submissions exceeding this will receive zero credit.
Deliverables [Gradescope]: Q2.py: Modified file containing all the SQL statements you have used to answer parts a - h in the proper sequence.

Tasks and point breakdown
NOTE: A sample class has been provided to show example SQL statements; you can turn off this output by changing the global variable SHOW from True to False. This must be set to False before uploading to Gradescope.
NOTE: In this question, you must only use INNER JOIN when performing a join between two tables, except for part g. Other types of joins may result in incorrect results.
GTusername: update the method GTusername with your credentials.
a. [9 points] Create tables and import data.
i. [2 points] Create two tables (via two separate methods, part_ai_1 and part_ai_2, in Q2.py) named movies and movie_cast with columns having the indicated data types:
1. movies

3. score (real)
2. movie_cast
1. movie_id (integer)
2. cast_id (integer)
3. cast_name (text)
4. birthday (text)
5. popularity (real)

ii. [2 points] Import the provided movies.csv file into the movies table and movie_cast.csv into the movie_cast table.
1. Write Python code that imports the .csv files into the individual tables. This will include looping through the file and using the corresponding SQL insert command. You must only use relative paths while importing files, since absolute/local paths are specific locations that exist only on your computer and will cause the autograder to fail.
iii. [5 points] Vertical Database Partitioning. Database partitioning is an important technique that divides large tables into smaller tables, which may help speed up queries. Create a new table cast_bio from the movie_cast table (i.e., columns in cast_bio will be a subset of those in movie_cast). Do not edit the movie_cast table. Be sure that the values are unique when inserting into the new cast_bio table. Read this page for an example of vertical database partitioning.
cast_bio

1. cast_id (integer)
2. cast_name (text)
3. birthday (text)
4. popularity (real)
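The import and partitioning steps of part a can be sketched with the standard-library sqlite3 and csv modules. This is only an illustration under stated assumptions: the rows below are made up, the CSV is assumed to have no header row, and an in-memory string stands in for the real movie_cast.csv (which would be opened through a relative path). The helper name build_tables is hypothetical and is not part of Q2.py.

```python
import csv
import io
import sqlite3

# Made-up stand-in for movie_cast.csv (assumption: no header row).
# Columns: movie_id, cast_id, cast_name, birthday, popularity
SAMPLE_CSV = (
    "1,10,Actor A,1970-01-01,12.5\n"
    "2,10,Actor A,1970-01-01,12.5\n"
    "2,11,Actor B,None,3.0\n"
)


def build_tables(csv_text):
    """Create movie_cast, load the CSV rows, then derive cast_bio from it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movie_cast ("
        "movie_id INTEGER, cast_id INTEGER, cast_name TEXT, "
        "birthday TEXT, popularity REAL)"
    )
    # Loop over the file and insert each row with a parameterized statement.
    for row in csv.reader(io.StringIO(csv_text)):
        conn.execute("INSERT INTO movie_cast VALUES (?, ?, ?, ?, ?)", row)

    # Vertical partition: keep only the per-actor columns, one row per actor.
    conn.execute(
        "CREATE TABLE cast_bio ("
        "cast_id INTEGER, cast_name TEXT, birthday TEXT, popularity REAL)"
    )
    conn.execute(
        "INSERT INTO cast_bio "
        "SELECT DISTINCT cast_id, cast_name, birthday, popularity "
        "FROM movie_cast"
    )
    conn.commit()
    return conn


conn = build_tables(SAMPLE_CSV)
cast_bio_rows = conn.execute(
    "SELECT cast_id, cast_name, birthday, popularity "
    "FROM cast_bio ORDER BY cast_id"
).fetchall()
print(cast_bio_rows)
```

SELECT DISTINCT is one way to satisfy the uniqueness requirement; the movie_cast table itself is left unchanged.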

b. [1 point] Create indexes. Create the following indexes. Indexes increase data retrieval speed; though the speed improvement may be negligible for this small database, it is significant for larger databases.
1. movie_index for the id column in movies table
2. cast_index for the cast_id column in movie_cast table
3. cast_bio_index for the cast_id column in cast_bio table
c. [3 points] Calculate a proportion. Find the proportion of actors who were born between 1965 and 1985 (both years included). Consider actors whose birthday is "None" to be born before 1965 or after 1985. The proportion should be calculated as a percentage and should only be based on the total number of rows in the cast_bio table. Format all decimals to two places using printf(). Do NOT use the ROUND() function, as in some rare cases it works differently on different platforms.
Output format and example value:
7.70
d. [4 points] Find the most prolific actors. List 5 cast members with the highest number of movie appearances that have a popularity > 10. Sort the results by the number of appearances in descending order, then by cast_name in alphabetical order.
Output format and example row values (cast_name,appearance_count):
Harrison Ford,2
e. [4 points] Find the highest scoring movies with the smallest cast. List the 5 highest-scoring movies that have the fewest cast members. Sort the intermediate result by score in descending order, then by number of cast members in ascending order, then by movie name in alphabetical order. Format all decimals to two places using printf().
Output format and example values (movie_title,movie_score,cast_count):
Star Wars: Holiday Special,75.01,12
Games,58.49,33
f. [4 points] Get high scoring actors. Find the top ten cast members who have the highest average movie scores. Format all decimals to two decimal places using printf(). Include only cast members who have appeared in three or more movies with score >= 25.
Output format and example value (cast_id,cast_name,average_score):
8822,Julia Roberts,53.00
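Two recurring requirements above, creating an index and formatting decimals with printf() rather than ROUND(), can be illustrated on a toy table. This is a minimal sketch, not the expected answer: the rows are invented, birthdays are assumed to be stored as YYYY-MM-DD text or the literal string None, and pct_born_between is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cast_bio ("
    "cast_id INTEGER, cast_name TEXT, birthday TEXT, popularity REAL)"
)
# Invented rows; the 'None' birthday falls outside any year range.
conn.executemany(
    "INSERT INTO cast_bio VALUES (?, ?, ?, ?)",
    [
        (1, "Actor A", "1970-05-01", 12.0),
        (2, "Actor B", "1990-03-15", 4.0),
        (3, "Actor C", "None", 1.0),
    ],
)
# An index on the lookup column, named as in part b.
conn.execute("CREATE INDEX cast_bio_index ON cast_bio (cast_id)")


def pct_born_between(conn, first_year, last_year):
    """Return the share of rows with a birth year in range, as a 2-decimal string."""
    query = (
        "SELECT printf('%.2f', 100.0 * SUM("
        "CASE WHEN CAST(substr(birthday, 1, 4) AS INTEGER) BETWEEN ? AND ? "
        "THEN 1 ELSE 0 END) / COUNT(*)) "
        "FROM cast_bio"
    )
    return conn.execute(query, (first_year, last_year)).fetchone()[0]


print(pct_born_between(conn, 1965, 1985))
```

Because printf() returns text, the value comes back to Python as a string with exactly two decimal places, which is the form the example outputs above use.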
row in which cast_member_id1 has a lower numeric value. For example, faverage_movie_score corresponding to each cast member, including actors in cast_member_id1 as well as cast_member_id2. Format all decimals to two places using printf().
Order your output by collaboration_score (before formatting) in descending order, then by cast_name alphabetically.
Output format and example values (cast_id,cast_name,collaboration_score):
2,Mark Hamil,99.32

Q3 [15 points] D3 (v5) Warmup

Read chapters 4-8 of Scott Murray's Interactive Data Visualization for the Web, 2nd edition (sign in using your GT account, e.g., jdoe3@gatech.edu). Briefly review chapters 1-3 if you need additional background on web development. This reading provides an important foundation you will need for Homework 2. This question and the autograder have been developed and tested for D3 version 5 (v5), while the book covers D3 v4. What you learn from the book (v4) is transferable to v5 because v5 introduced few breaking changes. In Homework 2, you will work with D3 extensively.

Goal: Visualize temporal trends in movie releases using D3 to showcase how interactive, rather than static, plots can make data more visually appealing, engaging and easier to parse.
Technology: D3 Version 5 (included in the lib folder)
Chrome 97.0 (or newer): the browser for grading your code
Python http server (for local testing)
Allowed Libraries: D3 library is provided to you in the lib folder. You must NOT use any D3 libraries (d3*.js) other than the ones provided. In Gradescope, these libraries will be provided for you in the auto-grading environment.
Deliverables [Gradescope]: Q3.html: Modified file containing all html, javascript, and any css code required to produce the bar plot. Do not include the D3 libraries or q3.csv dataset.
NOTE the following important points:
1. You will need to set up an HTTP server to run your D3 visualizations as discussed in the D3 lecture (OMS students: the video "Week 5 - Data Visualization for the Web (D3) - Prerequisites: JavaScript and SVG". Campus students: see lecture PDF.). The easiest way is to use http.server for Python 3.x. Run your local HTTP server in the hw1-skeleton/Q3 folder.
2. We have provided sections of code along with comments in the skeleton to help you complete the implementation. While you do not need to remove them, you may need to write additional code to make things work.
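For local testing, the http.server module mentioned in point 1 can be started from the command line with python -m http.server run inside the Q3 folder. The same thing can be done from a script, as in the sketch below. The helper name make_server and the default port 8000 are assumptions for illustration only; the handout does not prescribe either.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build a local static-file server for `directory` (not yet serving)."""
    # SimpleHTTPRequestHandler accepts a `directory` argument on Python 3.7+.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to the loopback address only, so the server stays local.
    return http.server.HTTPServer(("127.0.0.1", port), handler)


# Typical use from the hw1-skeleton/Q3 folder (stop with Ctrl+C):
#     server = make_server()
#     server.serve_forever()
```

Opening Q3.html through the served address rather than from the file system avoids the browser restrictions on loading local data files such as q3.csv.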
following specifications:
a. [3.5 points] The bar plot must display one bar per row in the q3.csv dataset. Each bar corresponds to the running total of movies for a given year. The height of each bar represents the running total. The bars are ordered by ascending time with the earliest observation at the far left, i.e., 1880, 1890, ..., 2000.
b. [1 point] The bars must have the same fixed width, and there must be some space between two bars, so that the bars do not overlap.
c. [3 points] The plot must have visible X and Y axes that scale according to the generated bars. That is, the axes are driven by the data that they are representing. Likewise, the ticks on these axes must adjust automatically based on the values within the datasets, i.e., they must not be hard-coded. The x-axis must be a <g> element having the ifunction:d3.scaleLinear()).
g. [1 point] Set the HTML title tag and display a title for the plot. These two titles are independent of each other and need to be set separately. Set the HTML title tag (i.e., <title> Running Total of TMDb Movies by Year </title>). Position the title
7. Gradescope will render your plot using Chrome and present you with a Dropbox link to view the screenshot of your plot with the solution plot in both a side-by-side and an overlay display. The visual feedback helps you make adjustments and identify errors, e.g., a blank plot likely indicates a serious error. It is not necessary that your design replicates the solution plot. However, the autograder requires the following DOM structure (including using correct ids for elements) and sizing attributes, so that it knows how your chart is built. We recommend using the Web Inspector to keep track of the DOM structure and debug. Based on our experience, most errors students encounter are due to incorrect DOM structures (including wrong ids). Make sure you have strictly followed all
Goal: Use OpenRefine to clean data from Mercari. Construct GREL queries to filter the entries in this dataset.
Technology: OpenRefine 3.6.2
Deliverables [Gradescope]
would be split across the newly created columns as olls & Accessories
Use the existing functionality in OpenRefine that creates multiple columns from an existing column based on a separator and does not remove the original category_name column. Provide the number of new columns that are created by this operation, excluding the original category_name column.
Output format and sample values:
ii.columns: 10
NOTE: There are many possible ways to split the data. While we have provided one way to accomplish this in step ii, some methods could create columns that are completely empty. In this dataset, none of the new columns should be completely empty. Therefore, to validate your output, we recommend you verify that there are no columns that are completely empty by sorting and cheproduced.
Output format and sample values:
iii.function: fingerprint, 200
NOTE: Use the default Ngram size when testing Ngram-fingerprint.
iv. ple values:
iv.GREL_categoryname: endsWith("food", "o
Q5 [5 points] Introduction to Python Flask

Flask is a lightweight web application framework written in Python that provides you with tools, libraries and technologies to quickly build a web application and scale up as needed.

You will modify the given file: wrangling_scripts/Q5.py

Goal: Build a web application that displays a table of TMDb data on a single-page website using Flask.
Technology: Python 3.7.x only (question developed and tested for these versions)
Flask
Allowed Libraries: Python standard libraries
Libraries already included in Q5.py
Any other libraries (including but not limited to Pandas and NumPy) are NOT allowed in this assignment
Deliverables [Gradescope]: Q5.py: Completed Python file with your changes

Username() - Update the username() method inside Q5.py by including your GTUsername.
Install Flask on your machine by running pip install Flask
a. You can optionally create a virtual environment by following the steps here. Creating a virtual environment is purely optional and can be skipped.
To run the code, navigate to the Q5 folder in your terminal/command prompt and execute the following command: python run.py. After running the command, go to http://127.0.0.1:3001/ in your browser. This will open index.html, showing a table in which the rows returned by data_wrangling() are displayed.

You must solve the following 2 sub-questions:
a. [2 points] Read and store the first 100 rows in a table using the data_wrangling() method.
NOTE: The skeleton code by default reads all the rows from movies.csv. You must add the required code to ensure reading only the first 100 data rows. The skeleton code already handles reading the table header for you.
b. [3 points] Sort this table in descending order of the values in the last (3rd) column, i.e., with larger values at the top and smaller values at the bottom of the table. Note that this column needs to be returned as a string for the autograder, but sorting may require float casting.
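The two sub-questions reduce to limiting the number of rows read and sorting on a numeric key while leaving the stored values as strings. The sketch below shows that idea with the standard-library csv module only. It is not the skeleton's data_wrangling() method: the function name first_rows_sorted, the column names, and the sample rows are all made up for illustration.

```python
import csv
import io


def first_rows_sorted(csv_file, limit=100):
    """Read the header plus at most `limit` data rows, sorted by the last column."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = []
    for row in reader:
        if len(rows) >= limit:
            break
        rows.append(row)
    # Cast to float only inside the sort key; the cells themselves stay strings.
    rows.sort(key=lambda r: float(r[-1]), reverse=True)
    return header, rows


# Hypothetical three-column file; the limit is 3 here to keep the demo short.
sample = io.StringIO(
    "col_a,col_b,value\n"
    "1,x,9.5\n"
    "2,y,10.25\n"
    "3,z,2.0\n"
    "4,w,99.0\n"
)
header, rows = first_rows_sorted(sample, limit=3)
print(header)
print(rows)
```

Sorting the raw strings would place "9.5" above "10.25", which is why the key casts to float; the fourth row is never read because the limit is reached first.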