haven 1.0.0
-
The ReadStat library is stored in a subdirectory of src (#209, @krlmlr). -
Import tibble so that tibbles are printed consistently (#154, @krlmlr).
-
Update to latest ReadStat (#65). Includes:
-
Added support for reading and writing variable formats. Similarly to
variable labels, formats are stored as an attribute on the vector.
Use zap_formats() if you want to remove these attributes
(@gorcha, #119, #123). -
Added support for reading file "label" and "notes". These are not currently
printed, but are stored in the attributes if you need to access them (#186). -
Added support for "tagged" missing values (in Stata these are called
"extended" and in SAS these are called "special") which carry an extra
byte of information: a character label from "a" to "z". The downside of
this change is that all integer columns are now converted to doubles,
to support the encoding of the tag in the payload of a NaN. -
New labelled_spss() is a subclass of labelled() that can model
user missing values from SPSS. These can either be a set of distinct
values, or for numeric vectors, a range. zap_labels() strips labels,
and replaces user-defined missing values with NA. New zap_missing()
just replaces user-defined missing values with NA.
labelled_spss() is potentially dangerous to work with in R because
base functions don't know about labelled_spss() objects so will
return the wrong result in the presence of user-defined missing values.
For this reason, they will only be created by read_spss() when
user_na = TRUE (normally user-defined missings are converted to
NA). -
as_factor() no longer drops the label attribute (variable label) when
used (#177, @itsdalmo). -
Using as_factor() with levels = "default" or levels = "both" preserves
unused labels (implicit missing) when converting (#172, @itsdalmo). Labels
(and the resulting factor levels) are always sorted by values. -
as_factor() gains a new levels = "default" mechanism. This uses the
labels where present, and otherwise uses the values. This is now the
default, as it seems to map better to the semantics of labelled values
in other statistical packages (#81). You can also use levels = "both"
to combine the value and the label into a single string (#82). It also
gains a method for data frames, so you can easily convert every labelled
column to a factor in one function call. -
New vignette("semantics", package = "haven") discusses the semantics
of missing values and labelling in SAS, SPSS, and Stata, and how they
are translated into R. -
Support for hms() has been moved into the hms package (#162).
Time variables now have class c("hms", "difftime") and a units attribute
with value "secs" (#162). -
labelled() is less strict with its checks: you can mix double and integer
values and labels (#86, #110, @lionel-), and is.labelled() is now exported
(#124). Putting a labelled vector in a data frame now generates the correct
column name (#193). -
read_dta() now recognises "%d" and custom date types (#80, #130).
It also gains an encoding parameter which you can use to override
the default encoding. This is particularly useful for Stata 13 and below
which did not store the encoding used in the file (#163). -
read_por() now actually works (#35). -
read_sav() now correctly recognises EDATE and JDATE formats as dates (#72).
Variables with format DATE, ADATE, EDATE, JDATE or SDATE are imported as
Date variables instead of POSIXct. You can now set user_na = TRUE to
preserve user defined missing values: they will be given class
labelled_spss. -
read_dta(), read_sas(), and read_sav() have a better test for missing
string values (#79). They can all read from connections and compressed files
(@lionel-, #109). -
read_sas() gains an encoding parameter to override the encoding stored
in the file if it is incorrect (#176). It gets better argument names (#214). -
Added type_sum() method for labelled objects so they print nicely in
tibbles. -
write_dta() now verifies that variable names are valid Stata variables
(#132), and throws an error if you attempt to save a labelled vector that
is not an integer (#144). You can choose which version of Stata's file
format to output (#217). -
New write_sas() allows you to write data frames out to sas7bdat
files. This is still somewhat experimental. -
write_sav() writes hms variables to SPSS time variables, and the
"measure" type is set for each variable (#133). -
write_dta() and write_sav() support writing date and date/times
(#25, #139, #145). Labelled values are always converted to UTF-8 before
being written out (#87). Infinite values are now converted to missing values
since SPSS and Stata don't support them (#149). Both use a better test
for missing values (#70). -
zap_labels() has been completely overhauled. It now works
(@markriseley, #69), and only drops label attributes; it no longer replaces
labelled values with NAs. It also gains a data frame method that zaps
the labels from every column. -
print.labelled() and print.labelled_spss() now display the type.
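
The missing-value and as_factor() changes listed above are easier to follow with something to run. The following is a minimal sketch, assuming a recent haven release is installed. The codes and labels (1, 2, 99, "Refused", and so on) are made up for illustration, and printed output and class names may differ between haven versions, so no output is shown.

```r
# Illustrative sketch only: assumes a recent version of haven is installed.
library(haven)

# Tagged missing values: a double NA that carries a one-letter tag.
x <- c(1, 2, tagged_na("a"), tagged_na("z"), NA)
is.na(x)              # tagged values still count as missing
na_tag(x)             # the tag for each element, NA where there is none
is_tagged_na(x, "a")  # TRUE only for the value tagged "a"

# SPSS-style user-defined missing values (hypothetical codes and labels).
s <- labelled_spss(
  c(1, 2, 99, 1),
  labels = c(Yes = 1, No = 2, Refused = 99),
  na_values = 99
)
zap_missing(s)  # user-defined missing values become NA
zap_labels(s)   # labels are stripped

# Converting a labelled vector to a factor. The value 2 has no label here.
v <- labelled(c(1, 2, 3, 1), labels = c(Low = 1, High = 3))
as_factor(v)                     # levels = "default": label if present, else value
as_factor(v, levels = "both")    # value and label combined in one string
as_factor(v, levels = "values")  # values only
```

Comparing the three as_factor() calls on a vector with an unlabelled value is a quick way to see why "default" falls back to the value rather than dropping it.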