-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the flexibility to specify DateTimeStyle in CsvDataReaderOptions. #254
Conversation
There is also |
I haven't used net6 before but on face value, it appears they do however, I'm not sure how best to make the change as there is an inconsistency between CsvDataReader.cs and CsvDataReader.net6.cs. CsvDataReader.cs uses DateTimeStyles.RoundtripKind by default. In my pull request, I left the default in CsvDataReader.cs as DateTimeStyles.RoundtripKind, however the default setting probably should be DateTimeStyles.None; but to change that may be undesirable as it would be a breaking change? In CsvDataReaderOptions where I set the default for DateTimeStyle, I can change the code to have the different default for NET6_0_OR_GREATER: public DateTimeStyles DateTimeStyle { get { return _dateTimeStyle; } set { _dateTimeStyle = value; } } #if NET6_0_OR_GREATER I think I've done this correctly and it seems to pass the test (and fail when I play around with it to try and make it fail) but this is outside of experience level so I'm not certain. Please let me know how best to proceed with the default values and I can (try to) commit a change to this pull request. p.s. I'll have another pull request coming for additional changes; I had a couple of use cases where the headers are not located on the first row because there was metadata and notes on the rows prior to the header row. I've written changes that allow 1. the header row number to be defined OR 2. the header row string to be defined. The code finds and changes the buffer index position to correspond to the row number by counting \r\n OR it matches the headerRowString against the chunks of chr between sets of \r\n. I just need to write the tests before I can submit (which might not happen for a week). I'm thinking it best to keep it separate to this pull request? (More so letting you know in case you want to hold off pushing a new release to GitHub NuGet.) |
I just did a quick experiment, and it looks like the style for DateOnly and TimeOnly parsing is pretty limited: |
@JaseRandall Looking at this again, I'm a bit reluctant to add this feature. I guess I'm worried that I'll end up adding similar options for specifying number format styles, and I don't want to proliferate a ton of options if there are potential workarounds. So, as a workaround, can you please consider this code: using Sylvan.Data;
using Sylvan.Data.Csv;
using System.Data.Common;
const string Data =
"""
a,b
1,2022-01-13T12:33:45.1234567
2,2022-01-13T12:33:45.1234567-08:00
3,2022-01-13T12:33:45.1234567Z
""";
DbDataReader r = CsvDataReader.Create(new StringReader(Data));
r = new DateTimeAdapter(r);
while (r.Read())
{
var id = r.GetInt32(0);
var x = r.GetDateTime(1);
Console.WriteLine($"{id} {x} {x.Kind}");
}
class DateTimeAdapter : DataReaderAdapter // <-- DataReaderAdapater is from Sylvan.Data nupkg
{
public DateTimeAdapter(DbDataReader dr) : base(dr)
{
}
public override DateTime GetDateTime(int ordinal)
{
var value = this.Reader.GetDateTime(ordinal);
if(value.Kind == DateTimeKind.Unspecified)
{
value = new DateTime(value.Ticks, DateTimeKind.Utc);
}
return value;
}
} Basically, you can wrap the CsvDataReader with a custom data reader that adjusts your DateTime values as you desire. Would this be an adequate solution for your problem? |
@MarkPflug I agree with keeping the package streamlined rather than catering to different number format options, especially if there are workaround. That workaround would be perfect except I'm using a CsvSchema: using Sylvan.Data;
using Sylvan.Data.Csv;
using System.Data.Common;
const string Data =
"""
a,b
1,2022-01-13T12:33:45.1234567
2,2022-01-13T12:33:45.1234567-08:00
3,2022-01-13T12:33:45.1234567Z
""";
var schema = new CsvSchema(Schema.Parse("a:int,b:datetime"));
var options = new CsvDataReaderOptions { Schema = schema, HasHeaders = true };
DbDataReader r = CsvDataReader.Create(new StringReader(Data), options);
r = new DateTimeAdapter(r);
while (r.Read())
{
var id = r.GetInt32(0);
var x = r.GetDateTime(1);
var y = r[1];
Console.WriteLine($"{id} {x} {x.Kind}");
Console.WriteLine($"{id} {y} {((DateTime)y).Kind}");
}
class DateTimeAdapter : DataReaderAdapter // <-- DataReaderAdapater is from Sylvan.Data nupkg
{
public DateTimeAdapter(DbDataReader dr) : base(dr)
{
}
public override DateTime GetDateTime(int ordinal)
{
var value = this.Reader.GetDateTime(ordinal);
if(value.Kind == DateTimeKind.Unspecified)
{
value = new DateTime(value.Ticks, DateTimeKind.Utc);
}
return value;
}
} So, I still get this: I have a potential workout for this issue at https://github.com/JaseRandall/Sylvan/tree/Proposed-Change-to-allow-Providing-a-CustomAccessor-in-the-CsvDataAccessor-class. I'll submit this as an issue, and you can decide if you want to include idea before I submit a pull request. |
Oh, I think you've pointed about a bug with the DataReaderAdapter. Currently the indexers are implemented to delegate to the inner/wrapped datareader: /// <inheritdoc/>
public override object this[int ordinal] => dr[ordinal];
/// <inheritdoc/>
public override object this[string name] => dr[name]; These need to be fixed to be: /// <inheritdoc/>
public override object this[int ordinal] => this.GetValue(ordinal);
/// <inheritdoc/>
public override object this[string name] => this.GetValue(this.GetOrdinal(name)); I will make this fix, which should allow this to work correctly. Until then, you can override these and fix it yourself, by adding this override to the adapter:
Maybe not the most elegant solution, but it should allow things to work until I fix the base implementations, at which point you can delete this override (or leave it, at it will be harmless). |
Nevermind. Not sure what I was thinking, you can just provide the same override:
I'll still make this change to the base implementation though. |
I had a csv file containing date time values which were in UTC format, but the format did not contain the appropriate notations within the date time string specifying it was in UTC time format.
Datetime string is automatically parsed as a local time unless the DateTimeStyles is set as DateTimeStyles.AssumeUniversal, a change needed to be made because the solution was using a hardcoded style other than what I needed (i.e. was hard coded as DateTimeStyles.RoundtripKind).
i.e. needed to be able to do this:
var dateTimeString = "2020-01-01 00:00:00";
var format = "yyyy-MM-dd hh:mm:ss";
DateTime value;
DateTime.TryParseExact(dateTimeString, format, opts.Culture, DateTimeStyles.AssumeUniversal, out value);
So:
In CsvDataReaderOptions, I added the flexibility to specify DateTimeStyles.
In CsvDataReader, I added a DateTimeStyles property set with a default value of DateTimeStyles.RoundtripKind as that was the value previously hardcoded in the solution.
In CsvDataReaderTests, I added regression test Date4 to ensure the change worked correctly.
Hopefully the change and test are acceptable for a merge, if not, please let me know what further is required.