
Popularity

Popularity score displayed as stars on a table or dashboard in Catalog.

What Is Popularity

A table's or dashboard's popularity tells you how frequently it is used by human users.

Popularity is a score from 1 to 1,000,000 assigned to each data asset, which is then scaled down to a 5-star rating.

Popularity is computed as a quantile: each asset's number of queries (for tables) or number of views (for dashboards) is ranked against all tables and dashboards of the same source.

info

Table queries from specific users, usually bots or services, can be excluded through the extraction settings.

How Is It Computed

  • Ranking: We sort the assets by their usage count (number of queries or views).

  • Bucketing: We place each asset into a bucket according to its rank. This step is also known as "global scoring".

    • There are 8 buckets of varying size with the following thresholds:

      Bucket 0: assets ranked between 0% and 33% -- Bottom 33%
      Bucket 1: assets ranked between 33% and 48%
      Bucket 2: assets ranked between 48% and 63%
      Bucket 3: assets ranked between 63% and 73%
      Bucket 4: assets ranked between 73% and 83%
      Bucket 5: assets ranked between 83% and 93%
      Bucket 6: assets ranked between 93% and 98%
      Bucket 7: assets ranked between 98% and 100% -- Top 2%
  • In-bucket ranking: Within each bucket, we sort the assets again and spread them evenly across the bucket.

  • Final score: Finally, we compute each asset's score relative to the maximum popularity (MAX) for all assets of a given source, using these constants:

    • MAX is 1,000,000
    • The number of buckets is 8
  • From the score to the stars

    • Everything before this step produces a score that we store in our database. A separate process in the frontend then converts that score into the number of stars you see.
    • As mentioned, the popularity can range from 0 to 1,000,000 (or be undefined). The frontend maps it to one of 11 states, corresponding to stars and half stars: 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5.
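
The steps above can be sketched in Python. This is an illustrative sketch only, not the actual implementation: the exact final-score formula and the star thresholds are not given on this page, so the sketch assumes that each bucket covers an equal slice of the 0 to 1,000,000 range, that assets are spread evenly inside their bucket, and that the frontend rounds to the nearest half star. The names `popularity_scores`, `stars`, and the tie-break by asset name are hypothetical.

```python
from typing import Dict, List, Optional

MAX_SCORE = 1_000_000
# Upper rank percentile of each of the 8 buckets, as listed in the steps above.
BUCKET_UPPER_BOUNDS = [33, 48, 63, 73, 83, 93, 98, 100]


def popularity_scores(usage: Dict[str, int]) -> Dict[str, int]:
    """Map asset name -> score, given asset name -> query or view count."""
    # Ranking: least used first. Ties are broken by name (an assumption),
    # so two assets with the same count can still land in different buckets.
    ranked = sorted(usage, key=lambda name: (usage[name], name))
    total = len(ranked)

    # Bucketing ("global scoring"): assign each asset by its rank percentile.
    buckets: List[List[str]] = [[] for _ in BUCKET_UPPER_BOUNDS]
    for position, name in enumerate(ranked):
        index = next(
            i for i, bound in enumerate(BUCKET_UPPER_BOUNDS)
            if (position + 1) * 100 <= bound * total
        )
        buckets[index].append(name)

    # In-bucket ranking and final score (assumed formula):
    # score = MAX * (bucket index + position within bucket) / number of buckets
    bucket_count = len(BUCKET_UPPER_BOUNDS)
    scores: Dict[str, int] = {}
    for index, members in enumerate(buckets):
        for rank, name in enumerate(members, start=1):
            fraction = rank / len(members)  # evenly spread, in (0, 1]
            scores[name] = round(MAX_SCORE * (index + fraction) / bucket_count)
    return scores


def stars(score: Optional[int]) -> Optional[float]:
    """Round a score to the nearest half star (assumed frontend mapping)."""
    if score is None:
        return None  # undefined popularity stays undefined
    half_stars = (20 * score + MAX_SCORE) // (2 * MAX_SCORE)
    return half_stars / 2
```

Under these assumptions, a source with 100 tables whose query counts are all different gives the most queried table a score of 1,000,000 (5 stars) and the second most queried a score of 937,500 (4.5 stars).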

Which Queries Are Used for Computation

First of all, popularity is calculated over a 30-day window of activity, anchored on the last refresh of your source.

A few exclusions then make the popularity more accurate:

  • We only use read queries, because we want to measure usage, not updates
  • We exclude queries that are extremely large or too small
  • We exclude service accounts from the calculation, because we want to measure human usage and behavior. Queries by service accounts are still reflected in the lineage, where you will find the parent and child assets.
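
The exclusions above can be pictured as a simple filter. The sketch below is hypothetical: the `Query` fields, the `queries_for_popularity` function, and the size bounds are illustrative names and parameters, not part of the product. It also assumes the 30-day window ends at the last refresh, and it leaves the size thresholds to the caller because this page does not state them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Set


@dataclass
class Query:
    user: str
    is_read: bool          # True for read queries, False for writes/updates
    size: int              # hypothetical size measure for the query
    executed_at: datetime


def queries_for_popularity(
    queries: Iterable[Query],
    last_refresh: datetime,
    service_accounts: Set[str],
    min_size: int,
    max_size: int,
    window_days: int = 30,
) -> List[Query]:
    """Keep only the queries that would count toward popularity."""
    # Assumption: the window covers the 30 days up to the last refresh.
    window_start = last_refresh - timedelta(days=window_days)
    return [
        q for q in queries
        if window_start <= q.executed_at <= last_refresh  # inside the window
        and q.is_read                                      # usage, not updates
        and q.user not in service_accounts                 # human users only
        and min_size <= q.size <= max_size                 # drop size outliers
    ]
```

Passing the service accounts and size bounds as arguments mirrors the fact that these exclusions are settings rather than fixed rules.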

Some Facts

  • There is exactly 1 asset per source with a perfect score of 1,000,000
  • Because of the way the bucketing is done, two assets with the same number of queries might end up in two different buckets
  • The top 2% of assets are all in the 8th bucket (Bucket 7) and hence have a score over 875,000, which means between 4.5 and 5 stars
  • The bottom 33% of assets are all in the first bucket (Bucket 0) and as such have a score lower than 125,000, which means between 0 and 0.5 stars
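
The 125,000 and 875,000 thresholds follow from dividing the maximum score by the number of buckets. A short check, assuming each of the 8 buckets covers an equal slice of the score range (the helper name `bucket_score_range` is hypothetical):

```python
from typing import Tuple

MAX_SCORE = 1_000_000
BUCKET_COUNT = 8


def bucket_score_range(bucket: int) -> Tuple[int, int]:
    """Return the (lower, upper) score bounds of a bucket numbered 0 to 7."""
    width = MAX_SCORE // BUCKET_COUNT  # 125,000 points per bucket
    return bucket * width, (bucket + 1) * width


print(bucket_score_range(0))  # (0, 125000)
print(bucket_score_range(7))  # (875000, 1000000)
```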