Popularity

What is popularity?
The popularity of a table or dashboard tells you how frequently it is used by human users.
Popularity is a score given to data assets, from 1 to 1 million, which is then scaled down to a 5-star system.
It is computed as a quantile, based on the number of queries (for tables) or views (for dashboards), among all tables or dashboards of the same source.
We may also exclude table queries from specific users (configured in the extraction settings), usually bots or services.
How is it computed?
- Ranking: We sort the assets by their score (number of queries or views).
- Bucketing: We put each asset into a bucket according to its rank, aka "global scoring".
  There are 8 buckets of varying size with the following thresholds:
Bucket 0: assets ranked 0% -> 33% (bottom 33%)
Bucket 1: assets ranked 33% -> 48%
Bucket 2: assets ranked 48% -> 63%
Bucket 3: assets ranked 63% -> 73%
Bucket 4: assets ranked 73% -> 83%
Bucket 5: assets ranked 83% -> 93%
Bucket 6: assets ranked 93% -> 98%
Bucket 7: assets ranked 98% -> 100% (top 2%)
- In-bucket ranking: Within a single bucket, we sort the assets again and share them out equally over that bucket's part of the score range.
- Final score: Finally, we compute each asset's score relative to the maximum popularity across all assets of a given source, with MAX being 1 000 000 and the number of buckets being 8.
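The four steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the production implementation: the in-bucket formula is not given in this section, so score = MAX × (bucket + (i + 1) / k) / 8, where i is an asset's 0-based position inside its bucket and k is the bucket size, is an assumption. It was chosen because it places the most used asset at exactly 1 000 000 whenever Bucket 7 is non-empty and keeps every Bucket 7 score above 875 000, matching the facts listed further down. The function names and the dictionary-of-counts input are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

MAX_SCORE = 1_000_000
NUM_BUCKETS = 8
# Upper rank bounds of Buckets 0..6 as fractions of the ranked list; Bucket 7 runs to 100%.
BUCKET_UPPER_BOUNDS = [0.33, 0.48, 0.63, 0.73, 0.83, 0.93, 0.98]


def assign_buckets(counts: dict[str, int]) -> dict[str, int]:
    """Ranking + bucketing: sort assets by usage and map each rank to a bucket 0..7."""
    ranked = sorted(counts, key=lambda asset: counts[asset])  # least used first
    n = len(ranked)
    # position / n is the fraction of assets ranked below this one, in [0, 1).
    # Ties keep their input order, so equal counts can straddle a threshold.
    return {
        asset: bisect_right(BUCKET_UPPER_BOUNDS, position / n)
        for position, asset in enumerate(ranked)
    }


def popularity_scores(counts: dict[str, int]) -> dict[str, int]:
    """In-bucket ranking + final score, using the hypothetical formula above."""
    members = defaultdict(list)
    for asset, bucket in assign_buckets(counts).items():
        members[bucket].append(asset)

    scores = {}
    for bucket, assets in members.items():
        assets.sort(key=lambda asset: counts[asset])  # in-bucket ranking
        k = len(assets)
        for i, asset in enumerate(assets):
            # MAX * (bucket + (i + 1) / k) / NUM_BUCKETS, computed with integers
            # so the most used asset of Bucket 7 lands exactly on MAX_SCORE.
            scores[asset] = MAX_SCORE * (bucket * k + i + 1) // (NUM_BUCKETS * k)
    return scores
```

With two assets that both have 5 queries, `assign_buckets({"a": 5, "b": 5})` puts them in Buckets 0 and 2, which illustrates how equal usage can still end up in different buckets.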
From the number to the stars 💫
- Everything up to this step produces a score that we store in our database. A separate process in the frontend then converts that score into the number of stars shown to you.
- As mentioned, popularity can range from 0 to 1 000 000 (or be undefined). The frontend then buckets it down to 11 states, corresponding to whole and half stars: 0, 0.5, 1, ..., 4.5, 5.
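A minimal sketch of the score-to-stars mapping, assuming the frontend rounds 5 × score / MAX to the nearest half star (with halves rounded up). The rounding rule is an assumption, not a documented behavior; it is consistent with the boundary facts listed at the end of this page. The function name is hypothetical.

```python
from typing import Optional

MAX_SCORE = 1_000_000


def score_to_stars(score: Optional[int]) -> Optional[float]:
    """Map a 0..MAX_SCORE score to one of 11 states: 0, 0.5, 1, ..., 4.5, 5 stars."""
    if score is None:  # popularity can be undefined
        return None
    # score / MAX_SCORE * 10 gives half-stars; adding MAX_SCORE // 2 before the
    # integer division rounds to the nearest half star without float error.
    half_stars = (score * 10 + MAX_SCORE // 2) // MAX_SCORE
    return half_stars / 2
```

For example, `score_to_stars(875_000)` gives 4.5 and `score_to_stars(125_000)` gives 0.5 under this rounding rule.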
Which queries are used for the computation?
First of all, popularity is calculated on 30 days of activity following the last refresh of your source.
We then apply a few exclusions to obtain a more accurate popularity:
- We only use read queries: popularity should reflect usage, not updates.
- We exclude queries that are extremely large or too small.
- We exclude service accounts from the calculation, as we want to measure human usage and behavior. Queries by service accounts are reflected in the lineage instead, where you'll find the parent and child assets.
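The exclusions above can be pictured as a filter applied before queries are counted per table. This is a hypothetical sketch: the `Query` record and its fields, measuring query size as text length, and the `MIN_QUERY_LENGTH` / `MAX_QUERY_LENGTH` thresholds are all illustrative assumptions, since the actual size limits and how read queries are detected are not documented here. The 30-day window is left out.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds: the real limits for "too small" and "extremely large" are not documented.
MIN_QUERY_LENGTH = 20
MAX_QUERY_LENGTH = 100_000


@dataclass
class Query:
    table: str
    user: str
    is_read: bool  # hypothetical flag: True for read (SELECT-style) queries
    text: str


def count_human_reads(queries: list[Query], service_accounts: set[str]) -> Counter:
    """Count, per table, the queries that survive the exclusions listed above."""
    counts = Counter()
    for q in queries:
        if not q.is_read:
            continue  # usage, not updates
        if not MIN_QUERY_LENGTH <= len(q.text) <= MAX_QUERY_LENGTH:
            continue  # too small or extremely large
        if q.user in service_accounts:
            continue  # bots and services show up in lineage instead
        counts[q.table] += 1
    return counts
```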
Some facts
- There is exactly 1 asset per source with a perfect score of 1M
- Due to the way the bucketing is done, two assets with the same number of queries might end up in two different buckets.
- The top 2% of assets are all in the 8th bucket (Bucket 7) and hence have a score over 875000, which means a number of stars between 4.5 and 5.
- The bottom 33% of assets are all in the 1st bucket (Bucket 0) and as such have a score lower than 125000, which means a number of stars between 0 and 0.5.
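The boundary values in the last two facts follow from simple arithmetic, assuming each of the 8 buckets covers an equal 1/8 share of MAX and that stars are 5 × score / MAX rounded to the nearest half star (an assumption consistent with these facts, not a documented rule):

```latex
\begin{aligned}
\frac{\mathrm{MAX}}{8} &= \frac{1\,000\,000}{8} = 125\,000, &
7 \times 125\,000 &= 875\,000,\\
5 \times \frac{875\,000}{\mathrm{MAX}} &= 4.375 \;\to\; 4.5\ \text{stars}, &
5 \times \frac{125\,000}{\mathrm{MAX}} &= 0.625 \;\to\; 0.5\ \text{stars}.
\end{aligned}
```

Under these assumptions, every asset in the top bucket shows 4.5 or 5 stars, and every asset in the bottom bucket shows 0 or 0.5 stars.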