Datetimes #684
Currently converts everything to numpy datetimes when writing
Codecov Report
@@ Coverage Diff @@
## master #684 +/- ##
==========================================
+ Coverage 83.12% 83.16% +0.04%
==========================================
Files 34 34
Lines 4396 4419 +23
==========================================
+ Hits 3654 3675 +21
- Misses 742 744 +2
We do, technically. If we detect datetime we always just copy it to obs directly.
An example (or you giving this branch a shot) would be great. Do you have a way of saving these
@Imipenem can you help here?
At ehrapy, it just worked out of the box when writing these
Thought so as well. We didn't run into any issues.
I'm a little confused here. If I put any sort of dates into Can you make an example of this? For me:
Failing example
import anndata as ad, pandas as pd, numpy as np
from vega_datasets import data
print(ad.__version__)
cars = data.cars()
dt_array = cars["Year"]
np_dt_array = dt_array.to_numpy()
N = np_dt_array.shape[0]
adata = ad.AnnData(X=np.ones((N, N)), obs=pd.DataFrame({"dt": dt_array}))
adata.write_h5ad("test_dt.h5ad")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
208 try:
--> 209 return func(elem, key, val, *args, **kwargs)
210 except Exception as e:
~/github/anndata/anndata/_io/h5ad.py in write_array(f, key, value, dataset_kwargs)
184 value = _to_hdf5_vlen_strings(value)
--> 185 f.create_dataset(key, data=value, **dataset_kwargs)
186
/usr/local/lib/python3.9/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
148
--> 149 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
150 dset = dataset.Dataset(dsid)
/usr/local/lib/python3.9/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
90 dtype = numpy.dtype(dtype)
---> 91 tid = h5t.py_create(dtype, logical=1)
92
h5py/h5t.pyx in h5py.h5t.py_create()
h5py/h5t.pyx in h5py.h5t.py_create()
h5py/h5t.pyx in h5py.h5t.py_create()
TypeError: No conversion path for dtype: dtype('<M8[ns]')
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
208 try:
--> 209 return func(elem, key, val, *args, **kwargs)
210 except Exception as e:
~/github/anndata/anndata/_io/h5ad.py in write_series(group, key, series, dataset_kwargs)
288 else:
--> 289 write_array(group, key, series.values, dataset_kwargs=dataset_kwargs)
290
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
211 parent = _get_parent(elem)
--> 212 raise type(e)(
213 f"{e}\n\n"
TypeError: No conversion path for dtype: dtype('<M8[ns]')
Above error raised while writing key 'dt' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
208 try:
--> 209 return func(elem, key, val, *args, **kwargs)
210 except Exception as e:
~/github/anndata/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
262 for col_name, (_, series) in zip(col_names, df.items()):
--> 263 write_series(group, col_name, series, dataset_kwargs=dataset_kwargs)
264
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
211 parent = _get_parent(elem)
--> 212 raise type(e)(
213 f"{e}\n\n"
TypeError: No conversion path for dtype: dtype('<M8[ns]')
Above error raised while writing key 'dt' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'dt' of <class 'h5py._hl.group.Group'> from /.
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/var/folders/bd/43q20k0n6z15tdfzxvd22r7c0000gn/T/ipykernel_4792/2332825967.py in <module>
----> 1 adata.write_h5ad("test_dt.h5ad")
~/github/anndata/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
1910 filename = self.filename
1911
-> 1912 _write_h5ad(
1913 Path(filename),
1914 self,
~/github/anndata/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
109 else:
110 write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
--> 111 write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
112 write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)
113 write_attribute(f, "obsm", adata.obsm, dataset_kwargs=dataset_kwargs)
/usr/local/Cellar/python@3.9/3.9.9/Frameworks/Python.framework/Versions/3.9/lib/python3.9/functools.py in wrapper(*args, **kw)
875 '1 positional argument')
876
--> 877 return dispatch(args[0].__class__)(*args, **kw)
878
879 funcname = getattr(func, '__name__', 'singledispatch function')
~/github/anndata/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
128 if key in f:
129 del f[key]
--> 130 _write_method(type(value))(f, key, value, *args, **kwargs)
131
132
~/github/anndata/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
210 except Exception as e:
211 parent = _get_parent(elem)
--> 212 raise type(e)(
213 f"{e}\n\n"
214 f"Above error raised while writing key {key!r} of {type(elem)}"
TypeError: No conversion path for dtype: dtype('<M8[ns]')
Above error raised while writing key 'dt' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'dt' of <class 'h5py._hl.group.Group'> from /.
Above error raised while writing key 'obs' of <class 'h5py._hl.files.File'> from /.
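The root error in the traceback above is h5py reporting that it has no HDF5 type for `datetime64[ns]` (`<M8[ns]`). As a rough sketch of a user-side workaround, and not what this PR implements, a datetime column can be encoded as int64 nanoseconds since the epoch before writing and viewed back as datetimes after reading. The frame and its values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical obs frame with one datetime column (values are illustrative).
obs = pd.DataFrame(
    {"dt": pd.to_datetime(["1970-01-01", "1982-01-01", "1976-06-15"])}
)

# Pin the unit to nanoseconds, then reinterpret the same bytes as int64.
# int64 is a dtype h5py can store; note that NaT would become the
# minimum int64 value under this encoding.
encoded = obs["dt"].to_numpy().astype("datetime64[ns]").view("int64")

# After reading the integers back, reinterpret them as datetimes again.
decoded = encoded.view("datetime64[ns]")

print(encoded.dtype, decoded.dtype)
```

The unit ("ns" here) has to be remembered separately, for example as an attribute next to the stored array, since the integers alone do not carry it.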
Sure:
I would not be surprised if we store things differently than you somewhere, but feel free to play around with it. I suspect the datetimes are just read as strings somewhere and then mapped to categoricals. They are not real datetimes. Feedback is always appreciated!
That seems to be the case.
adata_encoded.obs["charttime"].cat.categories.dtype
Would it be useful if these were actual datetimes? Then you could do things like ask how far apart the times were.
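To make the difference concrete, here is a small sketch, assuming a categorical column of timestamp strings like the `charttime` one mentioned above (the values are invented). Parsing the strings into real datetimes is what makes questions like "how far apart are these?" answerable.

```python
import pandas as pd

# Hypothetical column: timestamps stored as strings in a categorical.
charttime = pd.Series(
    ["2021-03-01 08:00", "2021-03-01 09:30", "2021-03-02 08:00"],
    dtype="category",
)

# The categories are strings, not datetimes, so no time arithmetic is possible.
print(charttime.cat.categories.dtype)

# Parse into real datetimes; the explicit format avoids guessing.
as_datetimes = pd.to_datetime(charttime.astype(str), format="%Y-%m-%d %H:%M")

# Gap between consecutive entries; the first entry has no predecessor (NaT).
gaps = as_datetimes.diff()
print(gaps)
```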
Not surprised. Our primary motivation was the coloring of plots and things like that. Yeah, your suggested use-case is a good one. Although, in general I am trying to reduce the dependency on real time as much as possible with ehrapy and to work more with pseudotime :)
@ivirshup is this PR still an approach that you'd follow, or has that changed since Pandas 2.0 was released? Datetime support would still be great for ehrapy, especially for things like comparing datetimes.
Basic datetime IO support.
Currently this converts everything to numpy datetime arrays at write time. I'm not preserving pandas array types since there are multiple, seemingly overlapping ways to deal with datetimes in pandas. This implementation also does not support time zones, but that would be easy to add.
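As an illustration of what "convert everything to numpy datetimes" can look like, here is a minimal sketch. `to_numpy_datetimes` is a hypothetical helper, not a function from this branch, and the choice to convert time-zone-aware values to UTC and drop the zone is an assumption made only for this example.

```python
from datetime import timedelta, timezone

import numpy as np
import pandas as pd


def to_numpy_datetimes(values):
    """Hypothetical helper: normalize datetime-like input to datetime64[ns]."""
    series = pd.Series(values)
    if isinstance(series.dtype, pd.DatetimeTZDtype):
        # Assumption for this sketch: express aware values in UTC,
        # then drop the zone, since numpy datetimes are zone-naive.
        series = series.dt.tz_convert("UTC").dt.tz_localize(None)
    return series.to_numpy(dtype="datetime64[ns]")


# A naive index and the same instant expressed at a fixed +01:00 offset.
naive = pd.to_datetime(["2020-01-01 12:00"])
aware = pd.to_datetime(["2020-01-01 13:00"]).tz_localize(
    timezone(timedelta(hours=1))
)

out_naive = to_numpy_datetimes(naive)
out_aware = to_numpy_datetimes(aware)
print(out_naive.dtype, out_aware.dtype)
```

Both inputs end up as plain `datetime64[ns]` arrays, which also shows what is lost: the original pandas array type and the time zone are not recoverable from the result.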
It would be good to get someone working with time series data to try this out and see if it meets their needs.
(I thought this would solve #455, but now see that was for datetime scalars which this does not currently support)