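```r
# 1. Read the data with read.table
system.time(
  read.table("/home/data/test_data", sep = "\001",
             quote = "", stringsAsFactors = FALSE, comment.char = "",
             col.names = colNames)
)
# colNames is a predefined character vector of column names.
# To take the names from the file's first line instead, drop col.names
# and set header = TRUE (read.table's default is header = FALSE).

#    user  system elapsed 
#  67.943   0.277  68.326 
```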
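To see what the separator and quoting options do without the original 796 MB file, here is a minimal sketch on a tiny synthetic file. The file contents, the temporary path and the column names `id`/`name`/`note` are hypothetical stand-ins for `test_data` and `colNames`. The fields deliberately contain `'`, `"` and `#`, which read.table's default `quote` and `comment.char` settings would misinterpret.

```r
# Hypothetical stand-in for /home/data/test_data: two rows, three fields
# separated by the \001 (Ctrl-A) byte.
path <- tempfile(fileext = ".txt")
writeLines(c("1\001alice\001it's #1",
             "2\001bob\001say \"hi\""), path)

colNames <- c("id", "name", "note")  # assumed column names

df <- read.table(path, sep = "\001",
                 quote = "",          # no quote characters: keep ' and " literally
                 comment.char = "",   # no comment character: keep '#' literally
                 stringsAsFactors = FALSE,
                 col.names = colNames)
str(df)
```

With the defaults, the apostrophe in `it's` would open a quoted field and `#1` would be dropped as a comment; disabling both keeps every field intact.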


```r
# 2. Read the data with readr::read_delim
library(readr)
system.time(
  read_delim("/home/data/test_data",
             delim = "\001", quote = "", comment = "",
             col_names = colNames)
)
# colNames is a predefined character vector of column names;
# col_names = TRUE (use the first line) or FALSE (generate X1, X2, ...) also work.

# =================================| 100%  796 MB
#   user  system elapsed 
# 12.790   0.245  12.947 
```
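An equivalent readr sketch on the same kind of hypothetical file, recreated here so the snippet runs on its own (it assumes readr is installed). Passing `col_types` is an addition that is not in the original benchmark: it fixes the column types up front so readr does not have to guess them. Its effect on the 796 MB timing above was not measured.

```r
library(readr)  # assumes readr is installed

# Same hypothetical two-row, \001-delimited stand-in for /home/data/test_data.
path <- tempfile(fileext = ".txt")
writeLines(c("1\001alice\001it's #1",
             "2\001bob\001say \"hi\""), path)

colNames <- c("id", "name", "note")  # assumed column names

tb <- read_delim(path, delim = "\001", quote = "", comment = "",
                 col_names = colNames,
                 col_types = "icc",   # integer, character, character: no type guessing
                 progress = FALSE)    # suppress the progress bar for a tiny file
tb
```

The result is a tibble rather than a plain data.frame; wrap it in `as.data.frame()` if downstream code expects the base class.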

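As the timings show, reading the 796 MB file test_data took read.table 67.943 s of user CPU time (68.326 s elapsed), whereas read_delim needed only 12.790 s (12.947 s elapsed). That is a substantial improvement: read_delim is roughly 5 times as fast as read.table.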
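The speedup figure can be checked directly from the two `system.time()` outputs above. This small sketch recomputes it for both user CPU time and elapsed (wall-clock) time, plus the implied throughput, taking the 796 MB file size from readr's progress bar as given.

```r
# Timings copied from the system.time() output above (seconds).
user    <- c(read.table = 67.943, read_delim = 12.790)
elapsed <- c(read.table = 68.326, read_delim = 12.947)
size_mb <- 796  # file size reported by readr's progress bar

# Ratio of read.table time to read_delim time.
speedup <- c(user    = unname(user["read.table"]    / user["read_delim"]),
             elapsed = unname(elapsed["read.table"] / elapsed["read_delim"]))
round(speedup, 2)                 # user 5.31, elapsed 5.28

throughput <- size_mb / elapsed   # MB per second of wall-clock time
round(throughput, 1)              # read.table 11.7, read_delim 61.5
```

Both ratios land just above 5, so the "about 5 times" estimate holds whether CPU or wall-clock time is used.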


xiao蜗牛