Handling large sites

By default, Unlighthouse is configured to run on sites of any size and to perform only useful scans.

Default configuration

Unlighthouse is configured by default to work on large sites. If you're scanning a smaller site, you may want to change some of the following defaults:

ignoreI18nPages enabled
maxRoutes set to 200
skipJavascript enabled
samples set to 1
throttling disabled
crawler enabled
dynamicSampling set to 5

For example, when scanning a blog with thousands of posts, scanning every single post is likely redundant because the DOM is very similar across them. The configuration lets you choose exactly how many posts are scanned.
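
As a rough sketch, a smaller site might relax a few of these defaults. The snippet assumes that the options listed above live under the `scanner` key, as `exclude`, `include`, and `dynamicSampling` do in the other examples on this page. The values are illustrative only, not recommendations.

```typescript
export default {
  scanner: {
    // assumption: turn off ignoreI18nPages (enabled by default)
    ignoreI18nPages: false,
    // assumption: take 3 samples instead of the default 1
    samples: 3,
    // no dynamic sampling, so similar pages are not reduced to a sample
    dynamicSampling: false,
  },
}
```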

Manually select URLs

You can configure Unlighthouse to use an explicit list of relative paths. This can be useful if you have a fairly complex and large site.

See Manually providing URLs for more information.
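
For illustration, an explicit list might look like the sketch below. The top-level `urls` key is an assumption here, since this page does not show the option name, and the paths are placeholders; the linked guide has the exact syntax.

```typescript
export default {
  // assumed option name; only these relative paths would be scanned
  urls: [
    '/',
    '/about',
    '/pricing',
  ],
}
```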

Provide Route Definitions (optional)

To make the most intelligent sampling decisions, Unlighthouse needs to know which page files are available. When you run Unlighthouse through the integration API, this information is provided automatically.

When using the CLI, follow the providing route definitions guide.

Note: When no route definitions are provided, Unlighthouse matches based on URL fragments, e.g. /blog/post-slug-3 will be mapped to blog-slug.
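
To make the idea of fragment-based matching concrete, here is a minimal sketch that groups paths by their parent segments and keeps a limited number of samples per group. The helpers `routeKey` and `samplePaths` and the `:slug` placeholder are hypothetical. They illustrate the general technique and are not Unlighthouse's actual implementation.

```typescript
// Hypothetical helper: derive a route-like key from a URL path by keeping
// the parent segments and replacing the last segment with a placeholder.
function routeKey(path: string): string {
  const segments = path.split('/').filter(segment => segment.length > 0)
  if (segments.length <= 1)
    return '/' + segments.join('/')
  return '/' + segments.slice(0, -1).join('/') + '/:slug'
}

// Keep at most `limit` paths per route key, preserving input order.
function samplePaths(paths: string[], limit: number): string[] {
  const seen = new Map<string, number>()
  const kept: string[] = []
  for (const path of paths) {
    const key = routeKey(path)
    const count = seen.get(key) ?? 0
    if (count < limit) {
      kept.push(path)
      seen.set(key, count + 1)
    }
  }
  return kept
}

const sampled = samplePaths(
  ['/', '/about', '/blog/post-1', '/blog/post-2', '/blog/post-3'],
  2,
)
// sampled → ['/', '/about', '/blog/post-1', '/blog/post-2']
```

In this sketch all three blog posts share the key `/blog/:slug`, so only the first two are kept, while `/` and `/about` each form their own group.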

Exclude URL Patterns

Paths to exclude from scanning.

For example, if your site has a documentation section that doesn't need to be scanned:

export default {
  scanner: {
    exclude: [
      '/docs/*'
    ]
  }
}

Include URL Patterns

Explicitly include paths; this will exclude any paths not listed here.

For example, if you run a blog and only want to scan your article and author pages:

export default {
  scanner: {
    include: [
      '/articles/*',
      '/authors/*'
    ]
  }
}
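
The effect of an include list can be sketched as a simple filter: any path that matches none of the patterns is dropped. The helpers `matchesPattern` and `applyInclude` are hypothetical, and treating `*` as "any characters" is an assumption about the pattern syntax, not a description of Unlighthouse's own matcher.

```typescript
// Hypothetical matcher: treats `*` as "any characters" and anchors the
// pattern to the whole path. All other regex characters are escaped.
function matchesPattern(path: string, pattern: string): boolean {
  const escaped = pattern
    .split('*')
    .map(part => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*')
  return new RegExp(`^${escaped}$`).test(path)
}

// Keep only paths that match at least one include pattern.
function applyInclude(paths: string[], include: string[]): string[] {
  return paths.filter(path =>
    include.some(pattern => matchesPattern(path, pattern)),
  )
}

const kept = applyInclude(
  ['/', '/articles/hello-world', '/authors/jane', '/docs/setup'],
  ['/articles/*', '/authors/*'],
)
// kept → ['/articles/hello-world', '/authors/jane']
```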

Change Dynamic Sampling Limit

By default, up to 5 URLs will be matched to each route definition.

You can change the sample limit with:

export default {
  scanner: {
    // scan up to 20 samples for each page file
    dynamicSampling: 20
  }
}

Disabling Sampling

When route definitions aren't provided, a less precise form of sampling is used, in which URLs under the same parent path are sampled together.

In these cases, you may want to disable sampling as follows:

export default {
  scanner: {
    // no dynamic sampling
    dynamicSampling: false
  }
}
