MS Fabric with bc2adls – Performance

In my last two posts we talked about the new feature of bc2adls that exports data to Microsoft Fabric:
– BC2ADLS: MS Fabric enabled part 1 – Discover Microsoft Business Central (bertverbeek.nl)
– BC2ADLS: MS Fabric enabled part 2 – Discover Microsoft Business Central (bertverbeek.nl)
Over the past few weeks I have tested the performance of the export, and I was really impressed!
For this setup I used an MS Fabric trial workspace. A trial comes with 128 vCores:
Concurrency limits and queueing for Fabric Spark – Microsoft Fabric | Microsoft Learn
For this test I set up 29 tables with the following numbers of fields and records:
Table | # Fields selected | Entity name | Total records |
--- | --- | --- | --- |
G/L Entry | 59 | GLEntry-17 | 150000 |
Customer | 101 | Customer-18 | 100000 |
Cust. Ledger Entry | 74 | CustLedgerEntry-21 | 100000 |
Vendor | 79 | Vendor-23 | 100000 |
Vendor Ledger Entry | 68 | VendorLedgerEntry-25 | 100000 |
Item | 142 | Item-27 | 100000 |
Item Ledger Entry | 64 | ItemLedgerEntry-32 | 50000 |
Sales Header | 162 | SalesHeader-36 | 50000 |
Sales Line | 181 | SalesLine-37 | 50000 |
Purchase Header | 144 | PurchaseHeader-38 | 50000 |
Purchase Line | 205 | PurchaseLine-39 | 50000 |
Purch. Comment Line | 7 | PurchCommentLine-43 | 50000 |
Sales Comment Line | 7 | SalesCommentLine-44 | 50000 |
G/L Budget Entry | 16 | GLBudgetEntry-96 | 50000 |
Sales Shipment Header | 104 | SalesShipmentHeader-110 | 50000 |
Sales Shipment Line | 97 | SalesShipmentLine-111 | 50000 |
Sales Invoice Header | 117 | SalesInvoiceHeader-112 | 50000 |
Sales Invoice Line | 101 | SalesInvoiceLine-113 | 50000 |
Purch. Rcpt. Header | 92 | PurchRcptHeader-120 | 50000 |
Purch. Rcpt. Line | 120 | PurchRcptLine-121 | 50000 |
Purch. Inv. Header | 102 | PurchInvHeader-122 | 50000 |
Purch. Inv. Line | 121 | PurchInvLine-123 | 50000 |
Res. Capacity Entry | 5 | ResCapacityEntry-160 | 50000 |
Job | 71 | Job-167 | 50000 |
Job Ledger Entry | 76 | JobLedgerEntry-169 | 50000 |
G/L Entry – VAT Entry Link | 2 | GLEntryVATEntryLink-253 | 50000 |
VAT Entry | 70 | VATEntry-254 | 50000 |
Detailed Cust. Ledg. Entry | 38 | DetailedCustLedgEntry-379 | 50000 |
Detailed Vendor Ledg. Entry | 38 | DetailedVendorLedgEntry-380 | 50000 |
Initial run
When we export from Business Central to Microsoft Fabric, bc2adls uses the OneLake API (a sketch of such an upload follows below). The initial export ran with the following numbers:
Table | Records | BC export time |
--- | --- | --- |
G/L Entry | 150000 | 2 minutes 50 seconds 900 milliseconds |
Customer | 100000 | 3 minutes 43 seconds 400 milliseconds |
Cust. Ledger Entry | 100000 | 1 second 500 milliseconds |
Vendor | 100000 | 17 minutes 31 seconds 700 milliseconds |
Vendor Ledger Entry | 100000 | 1 minute 3 seconds 200 milliseconds |
Item | 100000 | 5 minutes 29 seconds 400 milliseconds |
Item Ledger Entry | 50000 | 57 seconds 900 milliseconds |
Sales Header | 50000 | 3 seconds 600 milliseconds |
Sales Line | 50000 | 1 second 900 milliseconds |
Purchase Header | 50000 | 2 minutes 44 seconds 500 milliseconds |
Purchase Line | 50000 | 1 minute 10 seconds 200 milliseconds |
Purch. Comment Line | 50000 | 34 seconds 600 milliseconds |
Sales Comment Line | 50000 | 32 seconds 600 milliseconds |
G/L Budget Entry | 50000 | 41 seconds 900 milliseconds |
Sales Shipment Header | 50000 | 1 minute 28 seconds 600 milliseconds |
Sales Shipment Line | 50000 | 2 minutes 9 seconds 400 milliseconds |
Sales Invoice Header | 50000 | 7 minutes 8 seconds 700 milliseconds |
Sales Invoice Line | 50000 | 42 seconds 700 milliseconds |
Purch. Rcpt. Header | 50000 | 48 seconds 900 milliseconds |
Purch. Rcpt. Line | 50000 | 54 seconds 900 milliseconds |
Purch. Inv. Header | 50000 | 1 minute 31 seconds 200 milliseconds |
Purch. Inv. Line | 50000 | 59 seconds 400 milliseconds |
Res. Capacity Entry | 50000 | 27 seconds 400 milliseconds |
Job | 50000 | 45 seconds 900 milliseconds |
Job Ledger Entry | 50000 | 44 seconds 600 milliseconds |
G/L Entry – VAT Entry Link | 50000 | 36 seconds 300 milliseconds |
VAT Entry | 50000 | 36 seconds 400 milliseconds |
Detailed Cust. Ledg. Entry | 50000 | 1 minute 2 seconds 900 milliseconds |
Detailed Vendor Ledg. Entry | 50000 | 36 seconds 100 milliseconds |
Only the Vendor table took quite some time, but this was probably caused by a server issue: in this case I ran Business Central on an on-premises VM (Standard E4as v5, 4 vCPUs, 32 GiB memory). Other partners don't see this on a Business Central Online environment.
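The export itself runs from AL code inside Business Central, so purely as an illustration of the endpoint involved, here is a hedged Python sketch of writing a file through the ADLS Gen2-compatible API that OneLake exposes. The workspace, lakehouse, and file names are placeholders, not bc2adls's actual layout:

```python
# Illustrative only: bc2adls uploads from AL inside Business Central.
# This sketch shows the same kind of call against OneLake's
# ADLS Gen2-compatible endpoint; the names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# In OneLake the "file system" is the Fabric workspace, and paths are
# rooted at a lakehouse item inside that workspace.
fs = service.get_file_system_client("MyWorkspace")
file = fs.get_file_client("MyLakehouse.Lakehouse/Files/deltas/GLEntry-17/export.csv")

with open("export.csv", "rb") as data:
    file.upload_data(data, overwrite=True)
```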
When the files are in Microsoft Fabric, the Notebook transforms the .csv files into lakehouse delta tables.
In that initial run, when there were no tables yet, the Notebook took only 4 minutes and 54 seconds to run.
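Conceptually, that consolidation step looks something like the sketch below: read the staged CSV files for one entity and write them out as a delta table. The paths and entity name are assumptions for illustration; the real bc2adls notebook also applies the exported schema and handles deletions.

```python
# Minimal sketch of the initial consolidation for one entity, assuming
# the notebook's default lakehouse holds the staged CSV files under
# Files/deltas/<entity>. Not the actual bc2adls notebook code.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Fabric notebooks

entity = "GLEntry-17"
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")  # the real notebook applies the exported schema instead
      .csv(f"Files/deltas/{entity}/*.csv"))

# Initial run: no table exists yet, so a plain overwrite is enough.
df.write.format("delta").mode("overwrite").saveAsTable(entity.replace("-", "_"))
```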
Second export
Usually the delta files take a lot more time, because the notebook has to sort the data and remove the duplicates (see the sketch after the table below). In the second run I exported the following records:
Table | New records | Modified records |
--- | --- | --- |
G/L Entry | 2000 | |
Customer | 2000 | 1000 |
Cust. Ledger Entry | 2000 | |
Vendor | 2000 | 1000 |
Vendor Ledger Entry | 2000 | |
Item | 2000 | 1000 |
Item Ledger Entry | 2000 | |
Sales Header | 2000 | 1000 |
Sales Line | 2000 | |
Purchase Header | 2000 | 1000 |
Purchase Line | 2000 | |
Purch. Comment Line | 2000 | |
Sales Comment Line | 2000 | |
G/L Budget Entry | 2000 | |
Sales Shipment Header | 2000 | |
Sales Shipment Line | 2000 | |
Sales Invoice Header | 2000 | |
Sales Invoice Line | 2000 | |
Purch. Rcpt. Header | 2000 | |
Purch. Rcpt. Line | 2000 | |
Purch. Inv. Header | 2000 | |
Purch. Inv. Line | 2000 | |
Res. Capacity Entry | 2000 | |
Job | 2000 | 1000 |
Job Ledger Entry | 2000 | |
G/L Entry – VAT Entry Link | 2000 | |
VAT Entry | 2000 | |
Detailed Cust. Ledg. Entry | 2000 | |
Detailed Vendor Ledg. Entry | 2000 |
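To give an idea of that sort-and-deduplicate work, here is a hedged sketch of an incremental merge, assuming each row carries the record's systemId and a SystemModifiedAt timestamp. Those column names and the entity are illustrative, not necessarily bc2adls's exact schema:

```python
# Hypothetical incremental merge: union the existing delta table with the
# newly staged rows and keep only the latest version of each record.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

existing = spark.read.table("GLEntry_17")
incoming = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("Files/deltas/GLEntry-17/*.csv"))

merged = existing.unionByName(incoming, allowMissingColumns=True)

# Sort each record's versions by modification time and keep the newest;
# this shuffle-heavy step is what makes incremental runs slower.
w = Window.partitionBy("systemId").orderBy(F.col("SystemModifiedAt").desc())
deduped = (merged
           .withColumn("rn", F.row_number().over(w))
           .where(F.col("rn") == 1)
           .drop("rn"))

# Delta's snapshot isolation allows overwriting the table we read above.
deduped.write.format("delta").mode("overwrite").saveAsTable("GLEntry_17")
```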
When I ran the Notebook again, the second run took 9 minutes and 25 seconds. After that I ran another export from Business Central with the same amount of new and modified records, and that third Notebook run took 11 minutes and 11 seconds to complete.
Very impressive, because with Azure Synapse it took almost 7 minutes per entity to complete the same task.
So in short:
Tables exported | New records | Modified records | Notebook |
--- | --- | --- | --- |
29 | 1,800,000 | | 4 min 54 sec |
29 | 58,000 | 6,000 | 9 min 25 sec |
29 | 58,000 | 6,000 | 11 min 11 sec |
Parallelism
A Spark dataframe is split across multiple partitions because of how large a dataframe can be (usually the number of partitions matches the number of CPU cores).
This allows the job to complete faster. But if data has to move from one partition to another, as happens in operations like sorting, that shuffling can slow your job down.
In that case it is better to tune how many partitions you want in your Spark pool.
That is why there is now a built-in parameter: you can define the number of partitions yourself to get the fastest job for your workspace.
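As a hedged sketch (the parameter name below is illustrative, not the actual bc2adls setting), that tuning boils down to fixing the partition count before the shuffle-heavy steps:

```python
# Sketch of partition tuning; `no_partitions` is an illustrative
# parameter, not necessarily bc2adls's actual setting.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

no_partitions = 8  # e.g. align with the cores your capacity gives you

# Option 1: cap the partitions produced by shuffles (sorts, windows, joins).
spark.conf.set("spark.sql.shuffle.partitions", str(no_partitions))

# Option 2: explicitly repartition a dataframe before a wide operation.
df = (spark.read
      .option("header", "true")
      .csv("Files/deltas/GLEntry-17/*.csv")
      .repartition(no_partitions))
```

Fewer, larger partitions mean less shuffle overhead for small incremental loads; more partitions help when a single entity's data is large.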