Post-copy script in Copy Activity
In copy activity there is a feature of pre-copy script. Similarly if there is post-copy script feature it will help to execute code post copy operation is completed from same activity.
Traditionally when data is being copied from source sql to destination sql, the data is copied incrementally from source to temporary/stage tables/in-memory tables in destination. Post copy the merge code is executed to merge data into target table.
If post-copy script option is provided in copy activity it will help to call the merge code from copy activity instead of calling another activity like Execute stored procedure.
While you can run a stored procedure, I don't want to create hundreds of stored procs just to move and process table data. The pre-copy script is great and lets me do a lot to prep the destination before copying. A posy-copy script would be as useful and I would think would use the exact same mechanism just triggered after the copy instead of before it. This would be a game changer for us.
A post copy script is essential in many ETL. We need it as soon as possible!
Abdul Khan commented
Its very logical to provide such functionality and would help in many ETL tasks
Abhinav Joshi commented
This is absolutely an essential requirement. its a very standard requirement to have a pre and post processing for any data load operation and is a feature in most ELT / ETL tools. Reuqest the product team to include this feature too .
demolition porg commented
Yes, would be good... strangely this feature exists in the data flows part of data factory, but for some reason it isn't in the 'copy data' activity, I understand see why it wouldn't have been implemented in both? Please add this feature :-)
This is going to be nuanced, but rather than have a postCopy script, I think a checkbox to indicate that the Copy job is loading into a temporary table would be sufficient to then inform ADF to require a following Store Procedure activity to execute immediately after in the same connection. I think this would be far more clear to illustrate the nuance (use case specific to temp tables) rather than a postCopy script. I can already see the confusion and annoyance between different developers battling over a postCopy script vs. a separate Stored Procedure activity.
I don't know much about preCopy scripts, but it's not immediately clear to me what the advantage of having a preCopy script vs. a stored procedure activity right before the copy activity provides?
Jonathan Basys commented
@Eugen excellent answer, does the job - thanks!
Eugen Podaru commented
While for most of the scenarios running a stored procedure as a subsequent activity is fine, there is one scenario where that is not possible, and this feature becomes mandatory.
From the documentation, the best practice for loading data into azure sql in case of an upsert is to use a global temporary table. Unfortunately this does not work as described in the documentation since while it is possible to create the temp table with the preCopy script, then use the copy activity to load data into that table, there is no way to execute a merge script without a postCopy script. Running a separate stored procedure activity is not possible, since by the time it is run, the temporary table is already dropped, since the new activity creates a new session.
Manh Vo commented
Unfortunately, the stored procedure is not supported with ODBC data source. We have to workaround by spin up a Batch job and run a custom script to do post-copy activity. Hate it.
Joe Lu commented
Thanks for the comments. We will evaluate this request. Meanwhile, please use stored procedure activity in pipeline as an alternative solution as you mentioned.