1 Introduction to Data Mining Methods

Data types, data quality (cleaning and preprocessing), data presentation, algorithms

1.1 Data Type

  • Why do the instructors of both 719 and 723 start with data types?

1.1.1 Types of attributes

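  • Avoid foolish operations (e.g. averaging student IDs)

  • Categorical, Discrete

    • Nominal: values are just different names (==, !=); e.g. ID numbers, eye color, gender

    • Ordinal: values can be ordered (==, !=, <, <=, >, >=); e.g. letter grades A, B, C, D

  • Numeric, Continuous

    • Interval: differences are meaningful (+, -); e.g. calendar dates

    • Ratio: ratios are meaningful (*, /); e.g. length, age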
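
The attribute types above differ mainly in which operations make sense on them. A minimal sketch using only Python's standard library; all sample values (eye colors, grades, dates, lengths) are made up for illustration:

```python
from datetime import date
from statistics import mean, mode

# Nominal: only equality is meaningful, so the mode is a valid summary.
eye_colors = ["brown", "blue", "brown", "green"]      # made-up sample
most_common_color = mode(eye_colors)

# Ordinal: order is meaningful, so a median rank makes sense,
# but "B minus C" has no defined value.
grade_order = ["D", "C", "B", "A"]                    # worst to best
grades = ["B", "A", "C", "B", "A"]                    # made-up sample
ranks = sorted(grade_order.index(g) for g in grades)
median_grade = grade_order[ranks[len(ranks) // 2]]

# Interval: differences are meaningful, ratios are not.
days_between = (date(2020, 1, 22) - date(2020, 1, 1)).days

# Ratio: there is a true zero, so ratios and means are meaningful.
lengths_cm = [10.0, 20.0, 40.0]                       # made-up sample
ratio = lengths_cm[2] / lengths_cm[0]
mean_length = mean(lengths_cm)

print(most_common_color, median_grade, days_between, ratio)
```

Averaging the student IDs would run without error, which is exactly why knowing the attribute type matters: the code cannot tell that the result is meaningless.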

1.1.2 Types of data sets

  • Record data

    • Usually stored in Excel spreadsheets or relational databases

    • General record data

    • Transaction data

    • Data matrix

  • Graph-based data

  • Ordered data

    • sequential data

    • time series data
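
As a rough illustration of the data set types listed above, here is a small Python sketch. The baskets, edges, and series values are all made up, and the binary item matrix is just one possible way to turn transaction data into a data matrix:

```python
# Transaction data: each record is a set of items (made-up baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
]

# Data matrix: one row per record, one column per item, 1 if present.
items = sorted(set().union(*transactions))
matrix = [[1 if item in t else 0 for item in items] for t in transactions]

# Graph-based data: an undirected adjacency list built from an edge list.
edges = [("a", "b"), ("b", "c")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)

# Time series data: (time step, value) pairs where the order carries meaning,
# so step-to-step differences are a natural first thing to compute.
series = [(1, 10.0), (2, 12.0), (3, 11.0)]
diffs = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]

print(items)
print(matrix)
print(adjacency)
print(diffs)
```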

1.2 Which models and methods exist, and what problem does each solve

Introduction to Algorithms? Algorithms?

  • Supervised learning

    • Regression

    • Classification

  • Unsupervised learning

    • Clustering

    • Association rule

    • Dimension Reduction
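
The difference between the two families is whether the training data comes with answers. A minimal sketch in plain Python, assuming tiny made-up 1-D data; the helper names (fit_line, predict_1nn, two_means) are hypothetical, and in practice a library such as scikit-learn would be used instead. Association rules and dimension reduction are not covered here.

```python
from statistics import mean

# Supervised, regression: learn a numeric target from labeled pairs (x, y).
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])

# Supervised, classification: predict a label from labeled examples.
labeled = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]

def predict_1nn(x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised, clustering: no labels, group points by proximity.
def two_means(points, iterations=10):
    """Tiny 1-D k-means with k=2, initialised at the min and max point."""
    centers = [min(points), max(points)]
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = two_means([1.0, 2.0, 8.0, 9.0])
print(slope, intercept, predict_1nn(7.0), centers)
```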

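Learning tools and methods

1. Read the MSCI 723 instructor's notes or the book "Intro to DM"

  • Simple, easy-to-follow examples

  • No need to dig into the underlying math

2. Udacity's Intro to ML course

  • A little background is enough to start applying things right away

  • Works even better alongside the scikit-learn docs during hands-on practice

3. To go deeper, start with Andrew Ng's ML course

  • Clearly explained; an introductory course that builds the foundations without going deep into the math

  • MATLAB

4. To climb higher still: NTU's Machine Learning Foundations and similar courses
To develop algorithms yourself, you need a solid grounding in math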

2 How to take notes

  • Markdown

  • Xmind

3 Assignment

  • Each group chooses one classification method to make predictions on the Titanic data and briefly explains the theory behind it

  • January 22

4 Linux system

A Linux system is available through the online course CS50 on edX.
CS50x IDE
CS50 Python

Configure the Linux system

Install a Python environment on Linux

Command line:

  • ls

  • cd

  • mv

  • pip
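
To get a feel for what these commands do, the sketch below mirrors ls, cd, and mv from Python inside a throwaway temporary directory. The file names are made up. pip itself is run from the shell (for example `python -m pip install <package>`), so it appears only as a comment.

```python
import os
import shutil
import tempfile

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as workdir:
    os.chdir(workdir)                          # like: cd <workdir>
    with open("notes.txt", "w") as f:          # create a file to work with
        f.write("hello\n")
    before = sorted(os.listdir("."))           # like: ls
    shutil.move("notes.txt", "renamed.txt")    # like: mv notes.txt renamed.txt
    after = sorted(os.listdir("."))            # like: ls
    os.chdir(start_dir)                        # leave before the directory is deleted

# pip installs Python packages and is invoked from the shell, not from here:
#   python -m pip install <package>
print(before, after)
```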

5 Database, SQL basics

Structured Query Language

DML: Data Manipulation Language

  • Selection := where

  • Projection := select Column1, Col2 from table1

  • Cartesian product, joining tables := from table1, table2

  • Renaming := select Column1 as C1 from table1
    * Useful in subqueries

  • Set operations := union, intersect, except

  • Aggregation and grouping := avg, min, max, count, group by
    * e.g. select dept_name, avg(salary) from instructor group by dept_name
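
The operations above can be tried without installing a database server by using Python's built-in sqlite3 module. This is only a sketch: the rows are made up, and the department table with its building column is a hypothetical addition so that there is something to join against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")             # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE instructor (name TEXT, dept_name TEXT, salary REAL)")
cur.execute("CREATE TABLE department (dept_name TEXT, building TEXT)")
cur.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?)",
    [("Ann", "CS", 90000), ("Bob", "CS", 70000), ("Cat", "Math", 60000)],
)
cur.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("CS", "Hall A"), ("Math", "Hall B")],
)

# Selection (WHERE) combined with projection (SELECT column list).
high_paid = cur.execute(
    "SELECT name FROM instructor WHERE salary > 65000 ORDER BY name"
).fetchall()

# Cartesian product narrowed by a join condition, with renaming (AS).
joined = cur.execute(
    "SELECT i.name AS n, d.building AS b "
    "FROM instructor AS i, department AS d "
    "WHERE i.dept_name = d.dept_name ORDER BY n"
).fetchall()

# Set operation: departments with no instructor paid above 65000.
low_only = cur.execute(
    "SELECT dept_name FROM department "
    "EXCEPT SELECT dept_name FROM instructor WHERE salary > 65000"
).fetchall()

# Aggregation and grouping: average salary per department.
avg_by_dept = cur.execute(
    "SELECT dept_name, avg(salary) FROM instructor "
    "GROUP BY dept_name ORDER BY dept_name"
).fetchall()
conn.close()

print(high_paid, joined, low_only, avg_by_dept)
```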

For more advanced material, the Stanford online course covers almost everything about databases.
Before that, some basics can be found here:
Computer Science for Business Leaders: NoSQL, SQL
The CS50 SQL lecture, tutorial, and more
Microsoft: Introduction to Python for Business leaders: lists, NumPy, pandas, histograms, plots
Microsoft: DAT201x Querying with Transact-SQL, if you are interested in SQL software


tony