How to get large KQL queries into Power BI

How to get large KQL queries into Power BI

Sometimes you want analyse your telemetry in your own Power BI next to the standard Power BI telemetry dashboard found here:
Find the right app | Microsoft AppSource

But when you are loading large sets of data in your Power BI you don’t get all the data. This is because the Application Insights API has two hard limits:
– Receiving 500.000 rows
– Receiving 64000000 bytes

You can lower the numbers by adding a summarize (or other things) to you KQL query. But if you want the raw data you can also create a function in Power BI to retrieve all the entries.

In this case we want all the Pageviews for 30 days in Power BI with the following query (this query it from the standard Power Bi report from Kennie):

pageViews
| where timestamp > ago(30d)
| where customDimensions has 'CL0001'
| where customDimensions.eventID == 'CL0001' or customDimensions.eventId == 'CL0001'
| where isnotempty(customDimensions.aadTenantId) // filter away signal from Docker sandboxes
| extend TenantId = strcat( toupper(customDimensions.aadTenantId), toupper(customDimensions.environmentName) )
, ExtensionVersion = iff( tostring(customDimensions.appVersion) == 'null', '', tostring(customDimensions.appVersion) )
| parse kind=regex client_Browser with browserName:string ' ' browserVersion:string
| parse ExtensionVersion with majorVersion '.' minorVersion '.' theRest
| summarize Count=count(), SumDurationInMs = sum(duration)
by 
timestamp = bin(timestamp, 1h)
, TenantId
, ClientType = tostring( customDimensions.clientType )
, ObjectId = tostring( customDimensions.alObjectId )
, PageName = tostring( customDimensions.alObjectName )
, ObjectType = tostring( customDimensions.alObjectType )
, PageType = tostring( customDimensions.pageType )
, ExtensionId = tostring(customDimensions.appId)
, ExtensionName = iff( tostring(customDimensions.appName) == 'null', 'Platform', tostring(customDimensions.appName) )
, ExtensionVersion, MajorVersion=toint(majorVersion), MinorVersion=toint(minorVersion)
, ExtensionPublisher = iff( tostring(customDimensions.appPublisher) == 'null', 'Microsoft', tostring(customDimensions.appPublisher))
, CompanyName = tostring( customDimensions.companyName )
, Browser = iff(browserName=='Edg', 'Edge', browserName)

If we run this query it exceeds in more then 500.000 records.
So that is why we want to query each day so we get less results in a single query.
We change the following in the query:
| where timestamp > ago(30d)
by
| where timestamp between (now(-1d)..now(-0d))

In this case we query the pageviews only for one day.
When we run it we can save it to an “Power BI M Query”:

In Power BI we create a new query and in the “Advanced editor” we put the “M Query” and save it
After above we are creating an extra new query and add the following column:

Then we do the following:
– Convert it to a table
– Add an index column:

– Change the columns into Text values
– Rename the columns

Now we are going to create a function from the “Power BI M query” we just added and change the following in the “Advanced editor”:

Then we are going to add a function column in the “days” query:

After this is added we can expand the column and use the original names:

When everything is done we can remove the query (because this is converted into a function).
When we apply the changes, Power BI will query each day end we get the following result:

In this case 5.5 million records.

Good luck 😊


3 COMMENTS

Ruben Tijhuis

Nice!

I will try it out some day, there’s no need for more than 500k results in my dashboard right now. But good to know how to achieve this!

Ruben Tijhuis

I got this to work yesterday, and in the end used another method (hash_md5() to link to (distinct) StackTraces in another table to stay under the 8 MiB-limit), but was wondering about this method.

It uses now(), which is always the same datetime in √≥ne query, but here you split it in 30 queries. So if every query takes 1 minute to execute, don’t you have 1-minute-periods that you don’t ingest? For instance, if you start at 04-04-2023 15:50 the first query will take data from (’04-04-2023 15:50′(-1d)..’04-04-2023 15:50′(-0d)). That takes 1 minute, and then the next one will take data from (04-04-2023 15:51(-2d)..04-04-2023 15:51(-1d)).

Is this so? Would there be workaround?

    Bert Verbeek

    That could indeed be possible if the qeuries is slower.
    Moslty it will be seperate queries that you send to App insights back-end.
    What you can do is hash it in your query and then retrieve it in batches.

    Example is (in Power BI M query):
    | extend partitionhash = hash( strcat(ClientTypeId, ExtensionHash, timestamp, ObjectHash, ReportPayloadHash ) )
    | extend partition = partitionhash%” & Number.ToText(partitions) & ”
    | where partition ” & “==” & Number.ToText(partition_number)

    Then in Power BI you get a table with 30 partitions and you query every partition seperate.

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.