I'm using `Parallel.ForEach` to consume a `DbDataReader` as an enumerable, and inside the parallel loop I validate every field of every record against regular expressions. When one or more fields of a record fail validation, I add the record to a `ConcurrentBag` collection and reject the file. However, I would love to be able to remove the record from the collection.
I found a solution that uses a `DataTable` as the storage mechanism, but it's much slower. I'm processing tens of millions of records, and the schema always varies, so it is defined in JSON prior to processing.
Is there a way to identify the current record that failed validation and either remove it or mark it for removal?
```csharp
// Needs: System, System.Linq, System.Text.RegularExpressions, System.Threading.Tasks
var po = new ParallelOptions
{
    CancellationToken = cancelTokenSource.Token,
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

// Split the header once instead of once per record.
var fields = header.Split(',');
var uidField = fields.Contains("email") ? "email" : "phone";

// Build each validation regex once, up front. Flags are combined with | (bitwise OR);
// the original & produced RegexOptions.None, so neither option was applied.
var validators = fields
    .Distinct()
    .Where(f => regExValidation.ContainsKey(f))
    .ToDictionary(
        f => f,
        f => new Regex(Convert.ToString(regExValidation[f]), RegexOptions.Singleline | RegexOptions.Compiled));

Parallel.ForEach(StreamFromReader(csv).AsEnumerable(), po, record =>
{
    try
    {
        Parallel.ForEach(fields, po, field =>
        {
            var value = Convert.ToString(record[field]).Trim();
            try
            {
                // Empty values are skipped; only non-empty values are checked against the rule.
                if (validators.TryGetValue(field, out var regex) && !string.IsNullOrEmpty(value) && !regex.IsMatch(value))
                {
                    errorList.Add($"removing record [{record[uidField]}]: [{value}] for data validation failure. Did not comply with field [{field}] rule [{regExValidation[field]}]");
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Error while processing record [{record[uidField]}]: {ex.Message} {ex.InnerException?.Message} {ex.InnerException?.InnerException?.Message}");
            }
        });
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error while processing [{record[uidField]}]: {ex.Message} {ex.InnerException?.Message} {ex.InnerException?.InnerException?.Message}");
    }
});
```
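For reference, here is a minimal sketch of one way the "mark it for removal" part could look. It is not tested against the real data and rests on several assumptions: each record has already been copied into a dictionary of field name to string value, the rules have been precompiled into `Regex` instances, and the value of `uidField` is present and unique per record. `RecordValidator` and `FindInvalidKeys` are hypothetical names introduced only for this illustration. Inside the loop body, `record` is the current record, so its key can be stored in a `ConcurrentDictionary` that serves as a thread-safe set.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of the code in the question.
public static class RecordValidator
{
    // Returns the uidField values of all records that have at least one
    // non-empty field failing its validation rule.
    // Assumption: every record contains uidField (a missing key would throw).
    public static ISet<string> FindInvalidKeys(
        IEnumerable<IReadOnlyDictionary<string, string>> records,
        IReadOnlyDictionary<string, Regex> rules,
        string uidField)
    {
        // ConcurrentDictionary used as a thread-safe set; the byte value is unused.
        var invalid = new ConcurrentDictionary<string, byte>();

        Parallel.ForEach(records, record =>
        {
            // A plain loop over the rules avoids nesting a second parallel loop.
            foreach (var rule in rules)
            {
                if (record.TryGetValue(rule.Key, out var value)
                    && !string.IsNullOrWhiteSpace(value)
                    && !rule.Value.IsMatch(value.Trim()))
                {
                    // Mark the current record by its key.
                    invalid.TryAdd(record[uidField], 0);
                    break; // one failing field is enough to mark the record
                }
            }
        });

        return new HashSet<string>(invalid.Keys);
    }
}
```

With the returned set in hand, a later step could skip any record whose `uidField` value is in the set when writing the output, rather than rejecting the whole file. Storing only keys keeps memory lower than storing whole records, which may matter at tens of millions of rows.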