数据库表结构设计

原文地址：数据库表结构设计

Introduction

根据需求，设计和实现一个数据库的表结构，练习DDL的语法。

DDL

Overview

This assignment will teach students how to create and design databases. Unlike with other databases, you may write to your own database (e.g. you can create tables user_xxxxxx.random_table_name).

Please submit (1) code and (2) slides or PDF writeup to the course Moodle. The writeup should show the output and the code statement. Please answer straightforwardly.

I am trying to strike a balance between attribution and encouragement of learning. Teams of three allowed, but students who do solo work or on teams of 2 are likelier to benefit if their total course grade is on the margin. If you are in a team of three, please do not rely on the same partners more than twice.

Creating a table

Create a copy of crsp.msf in your local space without the keys
- a. How big is the table?
- b. Report the run time for a query that calculates the average turnover, return, and spread for all stockyears
- c. Show the EXPLAIN for the query
Create a copy of crsp.msf in your local space without the keys
- a. How big is the table?
- b. Report the run time for a query that calculates the average turnover, return, and spread for all stockyears
- c. Show the EXPLAIN for the query - why is this query more efficient than the previous one?
Create a table from scatch based on crsp.msf
- a. I have put crsp.msf in a file. Using R or Python, create a copy of the table
- b. "Checksum": formal checksum methodologies exist to check data integrity, but let's do some crude checks. Show me that (1) the number of rows is the same, and (2) the cumulative lifetime returns to Microsoft and Amazon.
A new file - design a schema
- a. Please write the schema of this table based on inspecting the field lengths such that you have no data loss. Please show me the schema.
- b. Please perform a check to show that the table contents were preserved (e.g. minimal data loss).

Altering a table

I just created a horrific table in homework2.crap_table. Create a copy in your workspace. Fix the following columns such that the most spaceefficient data type is used

CUSIP, PERMNO, COMNAM, TICKER, NCUSIP, RET
- Assume that the largest number or character observed is the largest you will observe in the future
- Hints: For the character columns, inspect the length. Assume that the maximum character length observed is indeed the longest character length observed in the field. For the numeric columns, decide which numbers should be integers by examining those which lose information when rounded. Is there data loss?
Alter table
- Drop redundant variations of the return columns
- Create a financial ratio - debt to assets where assets > 0
- Change the dividend column to decimal from integer
- Update the empty column dividend
- Create a copy of the returns columns so that you have at most 5 digits (although there is data loss). Report the average difference and average absolute difference.

Restructuring an existing table

Separate the monthly stock return data, annual data, and quarterly data, and companylevel data.
- For simplicity, we assume the company name, share code, hexcd, industry code (hsiccd), and ticker/tsymbol NEVER change
- This should yield four tables
- Create an index on each table such that an inner join on four tables to create the original table is possible
- Show me that inner join statement
  - It must produce the same # of rows
  - The explain must be optimized using keys
- Show me the table structure (SHOW CREATE TABLE) of the four new tables
- What is the total size of the four new data? How much did you save?

Draw a schema diagram a schema diagram example is here.
Draw a schema diagram of your new table

Extra credit

Download the following file: slide_output/data/dsf.csv.gz
- The data underlies the table crsp.dsf minus some fields
- Create a logical table structure
- Loop through the file and upload the file into a compressed MySQL table via R or Python
  - Report your code
- Key the table with a clustered index
- Take some steps to verify the similarity of the data to crsp.dsf
- PLEASE DROP THE TABLE so as to not take up all the space on the server
Recreate normalized_reprisk from reprisk. Likely requires a bunch of R/Python queries

(本文出自csprojectedu.com，转载请注明出处)

数据库表结构设计

Introduction

Overview

Creating a table

Altering a table

Restructuring an existing table

Extra credit

csprojectedu

引用和评论

3-Tier网络的开发

对于项目管理过程，如何减少项目返工？

一名成功的项目经理的20个项目管理经验

甘特图：项目经理的时空魔方，解锁高效管理的秘密

项目管理中的双螺旋：解码男女项目经理的差异化领导密码

项目经理的时间密码：解码日周月三维工作法则

项目经理的“汇报力”修炼：如何快速打造让领导、客户眼前一亮的方案？