Azure Data Engineer Interview Questions

1. How will you do an incremental load using a Mapping Data Flow, and without one?
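Without a data flow, the usual pattern is a high-watermark load: a Lookup activity reads the last saved watermark, the Copy activity's source query filters on it, and a final activity saves the new watermark. A minimal PySpark sketch of the same watermark logic, assuming a hypothetical `etl.watermarks` control table and a `modified_date` column on the source:

```python
# High-watermark incremental load (sketch). Table, column, and path names
# are hypothetical; handling of an empty control table is omitted.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# 1. Read the watermark saved by the previous run.
last_wm = (spark.table("etl.watermarks")
                .filter(F.col("table_name") == "sales.orders")
                .select("last_modified")
                .first()[0])

# 2. Pull only the rows that changed since then.
delta = spark.table("sales.orders").where(F.col("modified_date") > last_wm)
delta.write.format("delta").mode("append").save("/mnt/lake/orders_incremental")

# 3. Compute the new watermark; persist it back to etl.watermarks afterwards.
new_wm = delta.agg(F.max("modified_date")).first()[0]
```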

2. What does the PolyBase option do internally in ADF?

3. How do you execute the activities in a ForEach loop in sequence rather than in parallel?

Enable the Sequential setting on the ForEach activity (isSequential: true in the activity JSON). If you instead choose to run iterations in parallel, you can limit the number of parallel executions by setting the Batch Count; the default is 20 and the maximum is 50.


4. When a failure occurs in a ForEach loop, how do you make it stop instead of continuing with the next iteration?

ForEach has no built-in break. A common workaround is to run the loop sequentially, set a failure flag variable when an activity fails, and wrap the loop's contents in an If Condition activity that skips the remaining iterations once the flag is set.

5. When data comes from multiple sources, say Oracle, Netezza, etc., each with 50 tables to load, how will you do it?

6. When loading into Parquet format, how do you do partitioning in ADF?

7. How will you pick the latest file from Blob Storage?
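In ADF itself you can use a Get Metadata activity over the folder's child items and filter on the Last modified property. Outside the pipeline, a minimal sketch with the azure-storage-blob Python SDK (the connection string and container name are placeholders):

```python
# Find the most recently modified blob in a container.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("landing")  # placeholder container

# list_blobs() yields BlobProperties; each carries a last_modified timestamp.
latest = max(container.list_blobs(), key=lambda b: b.last_modified)
print(latest.name, latest.last_modified)
```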

8. How do you do error handling in ADF? How for bad rows, and how for execution errors?

In the data flow, connect the Source to a Conditional Split transformation.

Add conditions for error rows and valid rows based on a column such as a date, checking whether it holds a valid or an invalid value.

ErrorRow: isNull(toDate(OrderDate, 'dd-MM-yyyy'))

ValidRow: rows that do not meet any condition (the split's default stream)

9. What are the limitations of using event triggers?

10. What are the limitations of using parameters in Azure ARM templates?

11. When you use the Azure-SSIS integration runtime, do you keep it on all the time, or start it only when a pipeline runs?

12. How do you connect to an SFTP location using Azure Data Factory? (Frequently asked.)

13. If a customer asks for all the files and filenames from Azure DLS that were updated recently, how do you get that data, including the file count and the file names?
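A sketch of one way to answer this with the azure-storage-file-datalake Python SDK (the account URL, credential, filesystem name, and the 3-day window are all assumptions):

```python
# List ADLS Gen2 files modified in the last 3 days, with the total count.
from datetime import datetime, timedelta, timezone
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(account_url="https://<account>.dfs.core.windows.net",
                                credential="<account-key>")
fs = service.get_file_system_client("raw")  # placeholder filesystem

cutoff = datetime.now(timezone.utc) - timedelta(days=3)
recent = [p.name for p in fs.get_paths(recursive=True)
          if not p.is_directory and p.last_modified >= cutoff]

print(f"{len(recent)} files updated since {cutoff:%Y-%m-%d}")
for name in recent:
    print(name)
```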

14. What is a surrogate key, what is its use, and where do we use it?

15. What are constraints, and how can we restrict access using constraints?

16. What are the security components in Azure Data Lake, and how can we restrict user access in ADLS?

17. What is PolyBase, and how do we implement it in Azure Data Warehouse?

18. What is the difference between Azure SQL and Azure Data Warehouse?

19. What is the difference between Azure Blob and Azure Data Lake?

20. What is SCD, what type of SCD have you used, and how can we create a Type 2 SCD?
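In an ADF data flow this is typically a lookup against the dimension plus Alter Row transformations (update to expire old versions, insert for new ones). The same two-step logic on the Databricks side with Delta Lake, as a sketch; the staging table and all table/column names are assumptions:

```python
# SCD Type 2 (sketch): expire the changed current rows, then insert new versions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
updates = spark.table("staging.customer_updates")  # hypothetical staging table
dim = DeltaTable.forName(spark, "dim_customer")    # hypothetical dimension

# Step 1: close out current rows whose tracked attribute changed.
(dim.alias("t")
    .merge(updates.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(condition="t.address <> s.address",
                       set={"is_current": F.lit(False),
                            "valid_to": F.current_timestamp()})
    .execute())

# Step 2: append the incoming rows as the new current versions.
# (A real job would first filter `updates` down to genuinely new/changed rows.)
(updates.withColumn("is_current", F.lit(True))
        .withColumn("valid_from", F.current_timestamp())
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .write.format("delta").mode("append").saveAsTable("dim_customer"))
```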

21. What are OLTP and OLAP, what is the difference between them, when is the data for each generated, and where do we store it?

22. I have 10 tables: how do we copy all 10 tables' data in ADF, and how many pipelines do we create to copy them?

23. How do you set up an ADF pipeline that copies data only during a given time interval, and not after it?

24. What kinds of transformations have you used in your project, and what are they for?

25. How do you bring cloud-native data into ADF, and which integration runtime will you use for it?

26. What is Snowflake, what is its use, and where do we use it?

27. I have a pipeline with multiple activities, and we want to skip a particular activity every day. How do we achieve this?

28. What are the types of Integration Runtime in ADF?

29. I have data in Amazon (AWS) or GCP from which we have to copy; which Integration Runtime will we use?

30. We have data modified in the last 3 days; how do we ingest that data into ADLS using ADF?

31. How can we perform an incremental copy of data in ADF?

32. How do you scan the files present in ADLS?

33. How do you load a full copy of the data instead of incremental data in ADF?

34. What is data modelling, and what is its process?

35. What is coalesce in Azure Databricks, when do we use it, and how do we implement it?
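coalesce(n) lowers the partition count without a full shuffle, which makes it a cheap way to avoid writing many small files; repartition shuffles and can also increase partitions. A minimal PySpark sketch (paths and counts are illustrative):

```python
# Merge output into fewer, larger files before writing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/mnt/raw/orders")

# coalesce(8): collapse existing partitions into 8 without shuffling.
df.coalesce(8).write.mode("overwrite").parquet("/mnt/curated/orders")

# Use repartition when you need *more* partitions or an even redistribution;
# it triggers a full shuffle, so it costs more than coalesce.
df.repartition(64, "order_date").write.mode("overwrite").parquet("/mnt/curated/orders_by_date")
```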

36. How do you set an email notification for every activity in a pipeline when the activity fails, succeeds, or raises an error?

37. Why do we need Azure Data Warehouse if Azure SQL is already in your project?

38. What are IRs and their types in ADF?

39. What is meant by a dataset in ADF?

40. How do you create a linked service in ADF?

41. What are the different kinds of scheduling windows in ADF?

42. What is meant by a pool in Databricks?

43. How do you move today's blob files to a source folder and the rest of the files to an archive folder using ADF?
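Inside ADF this is a Get Metadata activity (child items) plus a Filter on Last modified, feeding Copy and Delete activities. The same routing logic as a Python SDK sketch (the connection string and container names are placeholders):

```python
# Move today's blobs to a "source" folder and everything else to "archive".
from datetime import datetime, timezone
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
src = service.get_container_client("landing")  # placeholder source container
today = datetime.now(timezone.utc).date()

for blob in src.list_blobs():
    folder = "source" if blob.last_modified.date() == today else "archive"
    dest = service.get_blob_client("processed", f"{folder}/{blob.name}")
    # Server-side copy; a production job should poll the copy status
    # before deleting the original.
    dest.start_copy_from_url(src.get_blob_client(blob.name).url)
    src.delete_blob(blob.name)
```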

44. Can we pass parameters to a trigger?

45. How do you implement SCD Type 1 and SCD Type 2 using data flows?

46. If a file can be moved to a target location using a Copy activity, and the same is possible with data flows, which costs less and why?

47. What is a shared IR?

48. Can we use a self-hosted IR to connect to cloud resources as well?


49. How do you handle null values in ADF data flows?

iif(isNull(Name), 'Unknown', Name)

OR

iifNull(Name, 'Unknown')
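Both expressions are equivalent; iifNull returns the first non-null value. For the Databricks side of the same question, the analogous PySpark calls (the column name and sample data are hypothetical):

```python
# Replace nulls in a column, mirroring the data flow expressions above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("Alice",), (None,)], ["Name"])

df = df.withColumn("Name", F.coalesce(F.col("Name"), F.lit("Unknown")))
# or, for simple literal defaults:
df = df.fillna({"Name": "Unknown"})
```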


50. When data is being copied from a SQL DB to a SQL data warehouse and the load fails after 5,000 records, how do you make sure the rerun starts from row 5,001? More generally, what is the best way to handle rerun scenarios in ADF?

51. How do you load data from 100 tables residing on one server?

Use Get Metadata (or a Lookup) to list the tables, a ForEach loop to iterate over them, and a parameterized Copy activity inside the loop.


52. How do you run the same pipeline 50 times?


53. If some of those 50 instances fail, what happens to the rest?

54. How are pipelines deployed to the PROD environment?


55. How do you deploy Azure Data Factory from a code repo?

From adf_publish branch

Directly from JSON files

Manually with ARM templates


56. How do you avoid multiple instances of a pipeline running concurrently?

(This applies when the pipeline's runtime exceeds the trigger interval.)

Set the pipeline's Concurrency property to 1; additional triggered runs are queued until the active run finishes.
