1 Introduction to Data Mining Methods
Data types, data quality (cleaning and preprocessing), data presentation, algorithms
1.1 Data Types
Why do the instructors of both 719 and 723 start with data types?
1.1.1 Types of attributes
Knowing the attribute type helps avoid meaningless operations (e.g. averaging student IDs)
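Categorical (qualitative), typically discrete
Nominal: distinct names only (==, !=), e.g. ID numbers, eye color, gender
Ordinal: order is meaningful (==, !=, <, <=, >, >=), e.g. letter grades A/B/C/D
Numeric (quantitative), often continuous
Interval: differences are meaningful (+, -), e.g. calendar dates
Ratio: ratios are meaningful (*, /), e.g. length, age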
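As a minimal sketch of why the attribute type matters, the following picks a summary statistic that is valid for each type: mode for nominal, median for ordinal, mean for interval and ratio. The names `AttrType` and `summarize` and the sample values are illustrative assumptions, not part of the course notes.

```python
from enum import Enum
from statistics import mean, median_low, mode


class AttrType(Enum):
    NOMINAL = "nominal"    # only ==, != are meaningful
    ORDINAL = "ordinal"    # adds <, <=, >, >=
    INTERVAL = "interval"  # adds +, -
    RATIO = "ratio"        # adds *, /


def summarize(values, attr_type, order=None):
    """Return a measure of central tendency that is valid for attr_type.

    `order` lists the ordinal categories from lowest to highest.
    """
    if attr_type is AttrType.NOMINAL:
        return mode(values)  # most frequent value
    if attr_type is AttrType.ORDINAL:
        ranks = [order.index(v) for v in values]
        return order[median_low(ranks)]  # middle category by rank
    return mean(values)  # the arithmetic mean needs at least an interval scale


print(summarize(["brown", "blue", "brown"], AttrType.NOMINAL))
print(summarize(["A", "C", "B", "B"], AttrType.ORDINAL, order=["D", "C", "B", "A"]))
print(summarize([20, 30, 40], AttrType.RATIO))
```

Averaging a nominal attribute such as a student ID would run without error on the raw numbers, which is exactly why the type has to be tracked separately from the stored value.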
1.1.2 Types of data sets
Record data
Usually stored in Excel sheets or relational databases
General record data
Transaction data
Data matrix
Graph-based data
Ordered data
Sequential data
Time series data
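The data set types listed here can be sketched as plain Python structures. This is only an illustration: every variable name and value below is made up, and real projects would usually load such data from files or a database.

```python
# Record data: a fixed set of attributes per object (like rows of a table).
records = [
    {"id": 1, "age": 23, "eye_color": "brown"},
    {"id": 2, "age": 31, "eye_color": "blue"},
]

# Transaction data: each record is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
]

# Data matrix: all attributes are numeric, so rows are points in a vector space.
data_matrix = [
    [5.1, 3.5],
    [4.9, 3.0],
]

# Graph-based data: objects are nodes, relationships are edges (adjacency list).
graph = {"a": ["b", "c"], "b": ["c"], "c": []}

# Ordered data: position matters. A time series pairs a timestamp with a value.
time_series = [("2024-01-01", 10.0), ("2024-01-02", 12.5), ("2024-01-03", 11.0)]

# Day-over-day change only makes sense because the observations are in sequence.
changes = [b[1] - a[1] for a, b in zip(time_series, time_series[1:])]
print(changes)
```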
1.2 What modeling methods exist, and what problem does each solve?
Introduction to Algorithms? Algorithms?
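Supervised learning
Regression: predict a numeric value
Classification: predict a class label
Unsupervised learning
Clustering: group similar objects without labels
Association rules: find items that frequently occur together
Dimension reduction: represent the data with fewer features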
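To make the supervised versus unsupervised distinction concrete, here is a toy sketch in pure Python on one-dimensional data. It is a hand-written stand-in, not scikit-learn code: `fit_nearest_centroid`, `predict`, and `two_means` are hypothetical helper names, and the data points are invented.

```python
from statistics import mean


def fit_nearest_centroid(xs, labels):
    """Supervised: learn one centroid per class from labeled 1-D points."""
    classes = sorted(set(labels))
    return {c: mean([x for x, y in zip(xs, labels) if y == c]) for c in classes}


def predict(centroids, x):
    """Classify x as the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled 1-D points into two clusters (k-means, k=2)."""
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Recompute each center; keep the old one if a cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]

# Classification: labels are given, and the model predicts a label for a new point.
model = fit_nearest_centroid(xs, ["low", "low", "low", "high", "high", "high"])
print(predict(model, 7.0))

# Clustering: no labels are given, and the algorithm finds the two groups itself.
print(two_means(xs))
```

The same points are used in both halves: the only difference is whether labels are supplied, which is the defining line between the two families.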
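Learning tools and methods
1. Read the MSCI 723 instructor's notes or the book "Intro to DM"
Simple, easy-to-follow examples
No deep understanding of the math is required
2. Udacity's Intro to ML course
After a brief overview you can start applying the methods right away
Works even better alongside the scikit-learn docs during hands-on practice
3. To go deeper, start with Andrew Ng's ML course
Clearly explained; an introductory course that builds foundations; light on math
MATLAB
4. To climb higher: NTU's Machine Learning Foundations, and similar courses
To build algorithms yourself, you need a solid math foundation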
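2 How to take notes
Markdown
XMind
3 Assignment
Each group picks one classification method to make predictions on the Titanic data and briefly explains the theory behind it
January 22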
4 Linux system
A Linux system is available through the online course CS50 on edX.
CS50x IDE
CS50 Python
Configure the Linux system
Install a Python environment on Linux
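Command line:
ls: list directory contents
cd: change the current directory
mv: move or rename files
pip: install Python packages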
5 Database, SQL basics
Structured Query Language
DML: Data Manipulation Language
Selection:= where
Projection:= select Column1, Col2 from table1
Cartesian product, joining tables:= from table1, table2
Renaming:= select Column1 as C1 from table1
* Useful in subqueries
Set operations:= union, intersect, except
Aggregation and grouping:= avg, min, max, count, group by
* ex. select dept_name, avg(salary) from instructor group by dept_name
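The operations in this list can be tried out with Python's built-in `sqlite3` module and an in-memory database. This is a sketch under assumptions: only the `instructor` table with `dept_name` and `salary` comes from the notes, while the `department` table, the `name` and `building` columns, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
cur.execute("CREATE TABLE department (dept_name TEXT, building TEXT)")
cur.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("Ann", "CS", 90000), ("Bob", "CS", 70000), ("Cat", "Math", 60000)],
)
cur.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("CS", "Taylor"), ("Math", "Watson")],
)

# Selection (WHERE) combined with projection (the SELECT column list).
cur.execute("SELECT name FROM instructor WHERE salary > 65000 ORDER BY name")
high_paid = [row[0] for row in cur.fetchall()]

# Join of two tables, with renaming (AS) of tables and a column.
cur.execute(
    "SELECT i.name AS instructor_name, d.building "
    "FROM instructor AS i JOIN department AS d ON i.dept_name = d.dept_name "
    "ORDER BY i.name"
)
joined = cur.fetchall()

# Set operation: departments with no instructor paid above 65000.
cur.execute(
    "SELECT dept_name FROM department "
    "EXCEPT SELECT dept_name FROM instructor WHERE salary > 65000"
)
low_only = [row[0] for row in cur.fetchall()]

# Aggregation and grouping: average salary per department.
cur.execute(
    "SELECT dept_name, avg(salary) FROM instructor "
    "GROUP BY dept_name ORDER BY dept_name"
)
avg_by_dept = cur.fetchall()

print(high_paid, joined, low_only, avg_by_dept)
conn.close()
```

Writing `FROM instructor, department` without the `ON` condition would instead return the full Cartesian product, that is, every instructor paired with every department.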
For more advanced material, covering almost everything about databases, see the Stanford online course.
Before that, start with these basics:
Computer Science for Business Leaders: NoSQL, SQL
CS50: SQL lecture, tutorial, and more
Microsoft: Introduction to Python for Business leaders: lists, NumPy, pandas, histograms, plots
Microsoft: DAT201x Querying with Transact-SQL, if you are interested in SQL software