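When reading data from Excel and CSV files, I keep running into decimal-point problems: the numbers are not displayed correctly. It is high time to stop using the read.csv function.
Here are two better packages for reading data, both by Hadley: readxl and readr.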
Test data:
Function overview:
readxl::read_excel("test.xlsx", col_names = FALSE, col_types = rep("numeric", 3))
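col_types
There are four options to choose from: "blank", "numeric", "date" or "text". "blank" skips the column; the other three are self-explanatory. (In newer readxl releases "skip" replaces "blank", and further types such as "guess" and "logical" are available.)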
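As a minimal sketch of per-column types in readxl, the snippet below reads the example workbook that ships with the package instead of test.xlsx. It assumes readxl 1.0 or later (for `readxl_example()` and the `"skip"` type) and that the first sheet of `datasets.xlsx` is the five-column iris table; the name `iris_xl` is only illustrative.

```r
library(readxl)

# readxl_example() returns the path to a workbook bundled with the package.
path <- readxl_example("datasets.xlsx")

# Five columns on the first sheet: read columns 1 and 3 as numeric,
# skip columns 2 and 4, and read column 5 as text.
iris_xl <- read_excel(
  path,
  col_types = c("numeric", "skip", "numeric", "skip", "text")
)

str(iris_xl)
```

One type per column has to be supplied, or a single type that is recycled for all columns, which is what `rep("numeric", 3)` does by hand in the call above.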
vignette("column-types") # see this documentation
readr::read_csv("test.csv", col_names = FALSE, col_types = readr::cols(X1 = "d", X2 = readr::col_skip(), X3 = "d"))
The col_types here are richer:
col_logical() [l], containing only T, F, TRUE or FALSE.
col_integer() [i], integers.
col_double() [d], doubles.
col_character() [c], everything else.
col_date(format = "") [D]: Y-m-d dates.
col_datetime(format = "") [T]: ISO8601 date times.
col_number() [n], finds the first number in the field. A number is defined as a sequence of -, "0-9", decimal_mark and grouping_mark. This is useful for currencies and percentages.
decimal_mark is set inside locale(); see the help document vignette("locales") for details.
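To make the role of decimal_mark concrete, here is a small sketch that parses comma-decimal values, first as single strings and then from a file. The temporary file, its contents, and the names `eu`, `pi_value`, `amount` and `prices` are made up for illustration.

```r
library(readr)

# A locale where "," is the decimal mark and "." the grouping mark.
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# Parse single values with that locale.
pi_value <- parse_double("3,14", locale = eu)
amount   <- parse_number("1.234,56 EUR", locale = eu)

# Apply the same locale to a whole semicolon-delimited file.
path <- tempfile(fileext = ".csv")
writeLines(c("id;price", "1;1.234,56", "2;0,5"), path)

prices <- read_delim(
  path,
  delim = ";",
  locale = eu,
  col_types = cols(id = col_integer(), price = col_number())
)

print(prices)
```

Passing the locale to the reader keeps the column numeric from the start, so no string clean-up is needed after import.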
You can also manually specify other column types:
col_skip() [_, -], don't import this column.
col_date(format), dates with given format.
col_datetime(format, tz), date times with given format. If the timezone is UTC (the default), this is >20x faster than loading then parsing with strptime().
col_time(format), times. Returned as number of seconds past midnight.
col_factor(levels, ordered), parse a fixed set of known values into a factor.
Example:
library(readr) # provides read_csv(), cols() and col_factor()
read_csv("iris.csv", col_types = cols(
  Sepal.Length = "d",
  Sepal.Width = "d",
  Petal.Length = "d",
  Petal.Width = "d",
  Species = col_factor(c("setosa", "versicolor", "virginica"))
))
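The manually specified column types can also be tried without any input file at hand. The sketch below writes a tiny CSV to a temporary path and reads it back with col_date(format), col_skip(), col_number() and col_factor(levels); the file contents and the names `day`, `note`, `amount`, `grade` and `sales` are invented for this example.

```r
library(readr)

# Write a small CSV to a temporary file; the quoted field contains a grouping comma.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'day,note,amount,grade',
  '03/01/2016,first,"$1,200.50",low',
  '04/01/2016,second,"$35.00",high'
), path)

sales <- read_csv(path, col_types = cols(
  day    = col_date(format = "%d/%m/%Y"),        # day/month/year dates
  note   = col_skip(),                           # not imported
  amount = col_number(),                         # drops "$" and the grouping mark
  grade  = col_factor(levels = c("low", "high")) # fixed set of known values
))

str(sales)
```

Values that do not match the declared type or levels are reported by `problems(sales)` rather than silently coerced.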
After reading the data in, we often run into output like this:
a$X3
[1] 3.000000e-06 1.237595e+06
Solution:
formattable::digits(a$X3,7)
[1] 0.0000030 1237594.5455460
The formattable package has many other uses; for details see http://renkun.me/formattable/
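The scientific notation above is only a display choice; the stored numbers are unchanged. As a base R alternative to formattable (not the approach used above), the following sketch prints the same two values in fixed notation. The vector `x` is typed in by hand to approximate a$X3.

```r
# Values approximating a$X3 from the output above (entered by hand).
x <- c(3e-06, 1237594.545546)

# Fixed notation with 7 decimal places; the result is a character vector.
fixed <- sprintf("%.7f", x)
print(fixed)

# format() can also be told to avoid scientific notation.
print(format(x, scientific = FALSE))

# The numeric vector itself is untouched by either call.
print(is.numeric(x))
```

Unlike `formattable::digits()`, which keeps a numeric object and only changes how it prints, `sprintf()` and `format()` return strings, so they suit final output rather than further calculation.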