Replies: 11 comments 47 replies
-
My first guess would be to use […]. For the errors that occurred, it's tricky.
-
Yes, disabling the foreign keys will work, but the tables are locked during sync and data inconsistencies can happen. In most scenarios sync is a background process, so any application using the database will have queries hanging because sync holds locks on the tables within the scope. I made a sample a few months ago: https://github.com/slagtejn/Dotmim.Sync.FKError

This issue is not limited to this specific scenario; it's a general issue when using filters on a relational database. The data has to be perfect for sync to finish. If filtered data crosses tables, sync can keep failing instead of continuing to sync the rest of the data and only failing specific records. It will fail at the same point every sync cycle, and then it's in a broken state with no way to fix it. Yes, it's tricky to find a solution for this.
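For reference, here is a minimal sketch of what "disabling the foreign keys" looks like on the agent side. I'm assuming the SyncOptions.DisableConstraintsOnApplyChanges flag and the SyncAgent overload that takes options (both worth checking against the version you're running); DBHelper, setup and the database names are the same helpers used in the snippets later in this thread.

// Minimal sketch: ask the provider to disable FK constraints while applying changes.
// Assumes SyncOptions.DisableConstraintsOnApplyChanges exists under that name in the version in use.
var serverProvider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString(serverDbName));
var clientProvider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString(clientDbName1));

var options = new SyncOptions
{
    DisableConstraintsOnApplyChanges = true // trades FK enforcement during apply for fewer constraint errors
};

var agent = new SyncAgent(clientProvider, serverProvider, options);
var result = await agent.SynchronizeAsync(setup);
Console.WriteLine(result);

This is exactly the trade-off described above: the constraint errors go away, but the rows that needed the constraints disabled are applied without integrity checks.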
-
Yes, I think we only use the bulk stored procs and the setting 'UseBulkOperations' is never used. Maybe if these records fail we could fall back to the _update stored proc and try to apply them one at a time, so we then know which records failed.
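In case it helps to picture the fallback being suggested, here is a rough, generic sketch of the pattern. This is not Dotmim.Sync's internal code; applyBulk and applyRow stand in for the bulk and per-row _update stored proc calls, and the class and method names are made up.

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;
using System.Threading.Tasks;

public static class FallbackApply
{
    // Try the whole batch through the bulk path; if that throws, replay the rows one at a
    // time so the specific offenders can be reported instead of failing the entire batch.
    public static async Task<List<DataRow>> ApplyWithFallbackAsync(
        DataTable batch,
        Func<DataTable, Task> applyBulk,
        Func<DataRow, Task> applyRow)
    {
        var failedRows = new List<DataRow>();
        try
        {
            await applyBulk(batch);   // fast path: one bulk call for the whole batch
            return failedRows;        // nothing failed
        }
        catch (DbException)
        {
            foreach (DataRow row in batch.Rows)
            {
                try { await applyRow(row); }
                catch (DbException) { failedRows.Add(row); }  // these rows need remediation
            }
            return failedRows;
        }
    }
}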
-
We have this issue a lot. We cannot disable foreign keys, as sync errors can result in inconsistent data. So we often find that a single sync error causes all subsequent syncs to fail. Our customers are beyond our control, so some of them ignore sync errors for months at a time.
-
Perhaps I have misunderstood. We don't want to bypass foreign keys. We would prefer that a record that failed to sync did not cause all subsequent data to fail to sync. I think where it gets tricky is in determining which row has the error that causes a batch to fail. Is there any guidance on how to do that? If we had that information we could remediate.
-
How about an option to process one record at a time? We service lots of companies, but they typically sync one order at a time, so bulk operations aren't that important for us. At the moment, even if we choose a very small batch size, we still see hundreds of orders in a batch.
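For what it's worth, the batch size knob is a size, not a row count, which would explain seeing hundreds of small order rows per batch even with a low value. A minimal sketch, assuming SyncOptions.BatchSize is the setting being adjusted (verify the unit, which I believe is KB, against your version); the providers and setup are the same ones used elsewhere in this thread.

// Sketch: shrink the per-batch file size; small rows can still pack many records into one batch.
var options = new SyncOptions
{
    BatchSize = 500 // approximate size of each batch file (a size, not a number of rows)
};

var agent = new SyncAgent(clientProvider, serverProvider, options);
var result = await agent.SynchronizeAsync(setup);
Console.WriteLine(result);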
-
I've just created a POC that re-introduces the possibility to choose between bulk and non-bulk operations for SQL Server (Sqlite & MySql are not concerned here, since bulk is not supported), using the UseBulkOperations property. It's a major performance downgrade, especially for the first sync (when initializing a new client database). I've also created a new interceptor called OnApplyChangesErrorOccured (I think I will rename it ... if you have any idea ;) )

var serverProvider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString(serverDbName));
var employee1Provider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString(clientDbName1))
{
    // .... And .... We're back again
    UseBulkOperations = false
};
try
{
    var emp1Agent = new SyncAgent(employee1Provider, serverProvider);
    var emp1Params = new SyncParameters(("EmployeeId", 1));

    emp1Agent.LocalOrchestrator.OnApplyChangesErrorOccured(args =>
    {
        Console.WriteLine(args.ErrorRow);

        // We can do something here with the failed row
        // ....
        // Then set the resolution to Continue to prevent a failure
        args.Resolution = ErrorResolution.Continue;
    });

    var emp1result = await emp1Agent.SynchronizeAsync(setup, emp1Params, progress);
    Console.WriteLine(emp1result);
}
catch (Exception ex)
{
    // catch block not shown in the original snippet; minimal handler added so it compiles
    Console.WriteLine(ex.Message);
}

As you can see, we downloaded 6 rows, but only 5 have been applied locally. Your thoughts?
-
Will this solution work when using UseBulkOperations? If an error occurs on the bulk records, could it fall back to the single stored proc to figure out the failed records? We would then be able to use UseBulkOperations and still figure out which records failed via the fallback.
-
I think it is OK for bulk operations to be the default. Then, on error, switch to row-at-a-time sync and sync again to find the faulting row. After a faulting row has been remediated, switch back to bulk operations. This might be done in a loop: switch back and forth between bulk and row-at-a-time operations until the last faulting row is remediated.
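A rough sketch of that loop, building on the POC snippet above. UseBulkOperations, OnApplyChangesErrorOccured, ErrorResolution.Continue, setup, emp1Params and progress are all taken from that snippet; the retry cap and the remediation placeholder are made up.

// Sketch: bulk by default; on failure, switch to row-at-a-time so the interceptor can pinpoint
// and skip the faulting rows, then go back to bulk for the next cycle.
var clientProvider = new SqlSyncProvider(DBHelper.GetDatabaseConnectionString(clientDbName1));
var agent = new SyncAgent(clientProvider, serverProvider);

agent.LocalOrchestrator.OnApplyChangesErrorOccured(args =>
{
    Console.WriteLine($"Faulting row: {args.ErrorRow}"); // hand this to whatever remediation exists
    args.Resolution = ErrorResolution.Continue;          // keep applying the rest of the rows
});

for (var attempt = 0; attempt < 3; attempt++) // arbitrary retry cap
{
    try
    {
        clientProvider.UseBulkOperations = true;  // fast path
        var result = await agent.SynchronizeAsync(setup, emp1Params, progress);
        Console.WriteLine(result);
        break;
    }
    catch (Exception)
    {
        // Bulk apply failed somewhere: re-run row-at-a-time so the faulting rows are isolated,
        // then loop back and try bulk again on the next pass.
        clientProvider.UseBulkOperations = false;
        await agent.SynchronizeAsync(setup, emp1Params, progress);
    }
}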
-
Will this be finished in 0.9.6? I noticed it was in the draft, but does it work when using bulk operations?
-
Can a similar scenario arise without a schema change? I see that quite a bit in our logs. Maybe I'm barking up the wrong branch again. EDIT: It's such a big issue that I might have to introduce soft deletes in the next release of this product ... with all the querying overhead that requires.
-
I know I have asked something similar before, but probably did not explain it well. In our syncing environment we have a scope called user data. This scope contains all the tables needed for the user's data, with all the correct foreign key relations. This scope is filtered, but the way we handle filtered scopes is to create an instance of the base scope with a single filter parameter. The user logging into the application can be linked to multiple users and thus will generate multiple user scopes. For example, the scopes would be in the format:
UserData_1
UserData_2
UserData_3
We do this because it's a dynamically changing parameter and we would need to get the historical data of a user if they are linked to a new user (a manager getting a new employee, for example). The scope_info table will contain the timestamp for each filter. With this implementation we don't have to re-initialize the scope every time the parameter changes to get the historical data, and it transfers a lot less data when new users are added instead of getting all data for all linked users. Over time some of the scopes become obsolete and the data in the database becomes old, so we have a mechanism to clean it up.
Ok, so now the problem. There are cases where foreign key data crosses scopes, for example some data from UserData_1 crosses into UserData_2. When this happens there are foreign key constraint errors and sync just stops, with no other data syncing on the scope. Once UserData_2 syncs and its data is inserted, the next sync of UserData_1 might finish. If there are a lot of FK errors this process could take a long time and could end in an endless loop. An endless loop would point to a data issue that needs to be fixed on the server, but at least sync would be able to detect it instead of failing.
Sync Framework was able to handle this with the knowledge blob, which contained the failed records; this error went through the OnApplyChangedFailed interceptor with the type 'ErrorsOccurred'. Sync was able to continue syncing the rest of the data for the scope, but somehow ignore those failed records (still a mystery how it was able to do that).
Do you see a possible solution to this? I know disabling foreign keys will work, but that locks the tables and is not the best solution since it could ruin the integrity of the data. I can think of adding a table that contains the JSON of the failed records per scope and appending to it during the next sync of a specific scope so data is not lost. If sync continues and updates the timestamp on the scope, the failed records would not be retrieved again, so there needs to be some sort of tracker.
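On the "table containing the JSON of the failed records per scope" idea, here is a rough sketch of the capture side, reusing the OnApplyChangesErrorOccured interceptor from the POC above; agent, setup, parameters and progress are as in that snippet. The scope name, the JSON shape and the file used as a stand-in for the tracking table are all hypothetical.

using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Sketch: record every row that fails (e.g. on a FK error), let the rest of the scope keep
// syncing, and persist the failures so they can be replayed on the next sync of this scope.
var failedRows = new List<string>();

agent.LocalOrchestrator.OnApplyChangesErrorOccured(args =>
{
    failedRows.Add(JsonSerializer.Serialize(new
    {
        Scope = "UserData_1",                  // hypothetical: the filtered scope being synced
        Row = args.ErrorRow.ToString(),
        FailedAtUtc = DateTime.UtcNow
    }));
    args.Resolution = ErrorResolution.Continue; // don't block the rest of the scope
});

var result = await agent.SynchronizeAsync(setup, parameters, progress);

// Stand-in for the per-scope tracking table: anything durable works, as long as the next sync
// of this scope reads it back and retries/remediates these rows before trusting the timestamp.
File.AppendAllLines("failed_rows_UserData_1.json", failedRows);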