**This is a continuation of part one — I suggest you read that first to get a clear understanding.**
Once the first condition is complete, let's set up the second, which I named ValidRows because it captures only the non-error values. Compared to the first condition this one is very simple: we don't have to build any expression, as it simply collects all the records that do not satisfy the first condition.
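The split logic can be sketched in plain Python. Note that the exact error condition here is an assumption (part one defined the real expression); I use an unparsable date as a stand-in. The point is that ValidRows needs no expression of its own — it is just everything that fails the ErrorRows test.

```python
from datetime import datetime

def is_error(row):
    """Hypothetical stand-in for the ErrorRows condition built in part one:
    flag a row whose Date2 value cannot be parsed as a date."""
    try:
        datetime.strptime(row.get("Date2", ""), "%Y-%m-%d")
        return False
    except (TypeError, ValueError):
        return True

def conditional_split(rows):
    # ErrorRows gets everything matching the condition; ValidRows is
    # simply every record that does NOT satisfy it, exactly like the
    # second branch of the conditional split.
    error_rows = [r for r in rows if is_error(r)]
    valid_rows = [r for r in rows if not is_error(r)]
    return error_rows, valid_rows

rows = [
    {"Item": "Pen", "Date2": "2021-05-01"},
    {"Item": "Ink", "Date2": "not-a-date"},
]
errors, valid = conditional_split(rows)
```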
With both conditions in place, go to the Data Preview tab and refresh to verify that the split is working properly.
Select the first output in the data preview drop-down; this is where all the error or incomplete rows accumulate. The second output contains all the valid data rows.
The next step of our demo is to store both the ErrorRows and ValidRows streams in their respective SQL database tables. For this we add a sink after the existing ErrorRows output in the data flow, as shown below. Note that this sink is meant to trap all the error rows, hence we attach it to the ErrorRows output.
When we create a new dataset, we can see the available linked services and pick the target table from the drop-down.
If you remember, we created an additional column named 'FileName' in our database table to store the name of the source file. We will add a derived column in between, which will stamp the filename onto each row whenever the sink runs.
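The derived-column step amounts to appending one extra field to every row before it reaches the sink. A minimal sketch, assuming the file name is available as a parameter (in ADF it usually comes from the source's file-name metadata):

```python
def add_file_name(rows, file_name):
    """Sketch of the derived-column step: stamp every outgoing row with
    the source file name so it lands in the FileName column of the
    database table. 'file_name' is a stand-in for the source metadata."""
    return [{**row, "FileName": file_name} for row in rows]

stamped = add_file_name([{"Item": "Pen"}, {"Item": "Ink"}], "sales_2021.csv")
```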
One thing worth mentioning here: every column in the 'Sales_table_ErrorRows' table, including the date, is defined as varchar. Error rows can contain values of any kind, so a strict datatype such as datetime would cause the insert itself to fail.
I disabled auto-mapping in the mapping settings, refreshed the data preview, and could see the error rows, including the filename, being fetched properly.
Let us move on to the next step: the valid rows.
Create a sink that moves the records into the database table, similar to the previous example. Unfortunately, when I previewed the data, the 'Date2' column was empty. After a while I found the reason and fixed it by creating a derived column that converts Date2 from string to the Date datatype. Always make sure the data you pass on matches the datatype already defined in the database table; this avoids half of all run failures.
I added the Date2 column to the existing derived column settings and converted it from string to date using the visual expression builder, as shown below. You can also see the string symbol (abc) change to a calendar symbol in the data preview section, as marked below.
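The conversion itself is a single parse, sketched below in Python. The format string is an assumption — use whatever pattern your source file actually contains; a value that does not match the pattern is exactly the kind of row the ErrorRows branch exists to catch.

```python
from datetime import datetime, date

def convert_date2(value):
    """Mirror of the derived-column fix: turn the Date2 string into a
    real date so it matches the date column in the target table.
    The 'DD-MM-YYYY' format here is a hypothetical example."""
    return datetime.strptime(value, "%d-%m-%Y").date()

converted = convert_date2("01-05-2021")
```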
All the valid records are now landing with the file name included.
The next and final step is to move this data into the database tables we created earlier. For this we will create a new pipeline.
After debugging the data flow, we can see the execution completed successfully. You can also publish everything and run the data flow directly.
The valid rows and the error rows have been segregated into two separate tables.
We saw how Azure Data Factory can be used to filter out error rows and store them in a SQL database table. This is one of many real-world scenarios that come up when you are dealing with millions of rows that must be transformed before being presented to management. I am planning to post more on solving such real-world problems with ADF and Synapse, so stay tuned. Cheers!