
Test Workflows - Matrix and Sharding

info

This Test Workflows functionality is not available when running the Testkube Agent in Standalone Mode.

Often you want to run a test across multiple scenarios or environments, either to distribute the load or to verify behavior on different setups.

Test Workflows have a built-in mechanism for all these cases - both static and dynamic.

Configuration File Setup

Test Workflow sharding is configured through YAML files that define TestWorkflow custom resources in your Kubernetes cluster.

Where to Define Configuration

You can create and apply Test Workflow configurations in several ways:

  1. Create a YAML file (e.g., my-workflow.yaml) with your TestWorkflow definition
  2. Apply it using kubectl:
    kubectl apply -f my-workflow.yaml
  3. Or use the Testkube CLI:
    testkube create testworkflow -f my-workflow.yaml
  4. Or use the Testkube Dashboard - navigate to Test Workflows and create/edit workflows through the UI

All Test Workflows are stored as custom resources in your Kubernetes cluster under the testworkflows.testkube.io/v1 API version.
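
You can confirm they are stored correctly straight from the cluster. A quick check, assuming Testkube runs in its default testkube namespace (replace <name> with your workflow's metadata.name):

# List all TestWorkflow custom resources
kubectl get testworkflows -n testkube

# Print a single workflow definition as YAML
kubectl get testworkflow <name> -n testkube -o yaml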

Basic Configuration Structure

A minimal sharded workflow configuration looks like this:

apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: my-sharded-workflow
spec:
  # Your container and content configuration
  container:
    image: your-test-image:latest

  steps:
  - name: Run tests
    parallel:
      count: 3 # Number of shards to create
      shell: 'run-your-tests.sh'

Choosing the Right Shard Number

The number of shards you configure has a direct impact on performance and resource utilization:

Performance Impact

Shard Count     | Execution Time    | Resource Usage | Best For
1 (no sharding) | Baseline          | Low            | Small test suites, limited resources
2-5             | ~50-80% reduction | Medium         | Medium test suites (10-50 tests)
5-10            | ~60-90% reduction | High           | Large test suites (50-200 tests)
10+             | ~70-95% reduction | Very High      | Very large test suites (200+ tests)
tip

General Guidelines:

  • Small test suites (fewer than 10 tests): Use 1-2 shards. More shards add overhead without benefit.
  • Medium test suites (10-50 tests): Use 3-5 shards for optimal balance.
  • Large test suites (50-200 tests): Use 5-10 shards based on available cluster resources.
  • Very large test suites (200+ tests): Use 10-20 shards, but monitor resource consumption.

The optimal number depends on:

  • Test duration: Longer tests benefit more from sharding
  • Cluster capacity: Each shard requires a pod with allocated resources
  • Test distribution: Shards work best when tests can be evenly distributed
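
For example, if a 100-test suite takes about 50 minutes serially and the tests are of similar duration, splitting it evenly across 5 shards should bring the wall time down to roughly 10 minutes, since the run finishes only when the slowest shard does.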

Resource Considerations

Each shard runs in its own pod, so consider:

  • CPU and memory: Each shard consumes the resources defined in container.resources
  • Cluster capacity: Ensure your cluster can schedule count × resources simultaneously
  • Cost: More shards = more parallel pods = higher infrastructure costs during execution
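
A quick way to sanity-check this is to multiply the per-shard requests by the shard count before applying. A sketch with illustrative numbers:

parallel:
  count: 5
  container:
    resources:
      requests:
        cpu: 1      # 5 shards × 1 CPU = 5 CPUs requested in total
        memory: 1Gi # 5 shards × 1Gi = 5Gi of memory requested in total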

Step-by-Step Configuration Guide

Step 1: Determine Your Sharding Strategy

Choose between static and dynamic sharding:

Static Sharding (count only): Fixed number of shards

parallel:
  count: 5 # Always creates exactly 5 shards

Dynamic Sharding (maxCount + shards): Adaptive based on test data

parallel:
  maxCount: 5 # Creates up to 5 shards based on available tests
  shards:
    testFiles: 'glob("tests/**/*.spec.js")'

Step 2: Define Resource Limits

Specify resources for each shard to prevent resource contention:

parallel:
  count: 3
  container:
    resources:
      requests:
        cpu: 1 # Each shard gets 1 CPU
        memory: 1Gi # Each shard gets 1GB RAM
      limits:
        cpu: 2
        memory: 2Gi

Step 3: Configure Data Distribution

For dynamic sharding, define how to split your test data:

parallel:
  maxCount: 5
  shards:
    testFiles: 'glob("cypress/e2e/**/*.cy.js")' # Discover test files
  shell: |
    # Access distributed test files via shard.testFiles
    npx cypress run --spec '{{ join(shard.testFiles, ",") }}'

Step 4: Apply and Verify

# Apply your workflow
kubectl apply -f my-sharded-workflow.yaml

# Run the workflow
testkube run testworkflow my-sharded-workflow -f

# Monitor execution
kubectl get pods -l testworkflow=my-sharded-workflow

Common Use Cases

Use Case 1: Sharding Cypress Tests

Distribute Cypress E2E tests across multiple shards:

apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: cypress-sharded
spec:
  content:
    git:
      uri: https://github.com/your-org/your-repo
      paths: [cypress]
  container:
    image: cypress/included:13.6.4
    workingDir: /data/repo/cypress

  steps:
  - name: Install dependencies
    shell: npm ci

  - name: Run tests in parallel
    parallel:
      maxCount: 5 # Up to 5 shards for optimal distribution
      shards:
        testFiles: 'glob("cypress/e2e/**/*.cy.js")'
      description: 'Shard {{ index + 1 }}/{{ count }}: {{ join(shard.testFiles, ", ") }}'
      transfer:
      - from: /data/repo
      container:
        resources:
          requests:
            cpu: 1
            memory: 1Gi
      run:
        args: [--spec, '{{ join(shard.testFiles, ",") }}']

Use Case 2: Load Testing with K6

Generate load from multiple nodes:

apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: k6-load-test
spec:
  container:
    image: grafana/k6:latest

  steps:
  - name: Run distributed load test
    parallel:
      count: 10 # 10 shards generating concurrent load
      description: 'Load generator {{ index + 1 }}/{{ count }}'
      container:
        resources:
          requests:
            cpu: 2
            memory: 2Gi
      shell: |
        k6 run --vus 50 --duration 5m \
          --tag shard={{ index }} script.js

Use Case 3: Multi-Browser Testing with Playwright

Test across different browsers with sharding:

apiVersion: testworkflows.testkube.io/v1
kind: TestWorkflow
metadata:
  name: playwright-multi-browser
spec:
  container:
    image: mcr.microsoft.com/playwright:latest

  steps:
  - name: Run tests
    parallel:
      matrix:
        browser: [chromium, firefox, webkit] # Test on each browser
      count: 3 # Shard each browser's tests into 3 parts
      description: '{{ matrix.browser }} - shard {{ shardIndex + 1 }}/{{ shardCount }}'
      shell: |
        npx playwright test \
          --project={{ matrix.browser }} \
          --shard={{ shardIndex + 1 }}/{{ shardCount }}

Usage

Matrix and sharding are supported in Services (services), Test Suite (execute), and Parallel Steps (parallel) operations.

kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: example-matrix-services
spec:
  services:
    remote:
      matrix:
        browser:
        - driver: chrome
          image: selenium/standalone-chrome:4.21.0-20240517
        - driver: edge
          image: selenium/standalone-edge:4.21.0-20240517
        - driver: firefox
          image: selenium/standalone-firefox:4.21.0-20240517
      image: "{{ matrix.browser.image }}"
      description: "{{ matrix.browser.driver }}"
      readinessProbe:
        httpGet:
          path: /wd/hub/status
          port: 4444
        periodSeconds: 1
  steps:
  - shell: 'echo {{ shellquote(join(map(services.remote, "tojson(_.value)"), "\n")) }}'

Syntax

This feature lets you provide a few properties:

  • matrix to run the operation for different combinations
  • count/maxCount to replicate or distribute the operation
  • shards to provide the dataset to distribute among replicas

Both matrix and shards can be used together - all the sharding (shards + count/maxCount) will be replicated for each matrix combination.

Matrix

Matrix allows you to run the operation for multiple combinations. The values for each instance are accessible by matrix.<key>.

For example:

parallel:
  matrix:
    image: ['node:20', 'node:21', 'node:22']
    memory: ['1Gi', '2Gi']
  container:
    resources:
      requests:
        memory: '{{ matrix.memory }}'
  run:
    image: '{{ matrix.image }}'

This will instantiate 6 copies:

index | matrixIndex | matrix.image | matrix.memory | shardIndex
0     | 0           | "node:20"    | "1Gi"         | 0
1     | 1           | "node:20"    | "2Gi"         | 0
2     | 2           | "node:21"    | "1Gi"         | 0
3     | 3           | "node:21"    | "2Gi"         | 0
4     | 4           | "node:22"    | "1Gi"         | 0
5     | 5           | "node:22"    | "2Gi"         | 0

The matrix properties can be a static list of values, like:

matrix:
  browser: [ 'chrome', 'firefox', '{{ config.another }}' ]

or a dynamic one, using Test Workflow expressions:

matrix:
  files: 'glob("/data/repo/**/*.test.js")'

Sharding

Often you may want to distribute the load to speed up the execution. To do so, you can use the shards and count/maxCount properties.

  • shards is a map of data to split across the different instances
  • count/maxCount describe the number of instances to start
    • count defines a static number of instances (always)
    • maxCount defines the maximum number of instances (fewer will start if there is not enough data in shards to split)

parallel:
  count: 5
  description: "{{ index + 1 }} instance of {{ count }}"
  run:
    image: grafana/k6:latest

Similarly to matrix, the shards map may contain a static list or a Test Workflow expression.
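
For instance, a minimal sketch showing both forms (the URL list and the glob pattern here are illustrative):

parallel:
  maxCount: 3
  shards:
    # Static list, split across up to 3 instances
    url: ["https://testkube.io", "https://docs.testkube.io", "https://app.testkube.io"]
    # Or a dynamic expression, e.g.:
    # testFiles: 'glob("/data/repo/**/*.test.js")'
  shell: 'echo {{ shellquote(join(shard.url, "\n")) }}'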

Counters

Besides matrix.<key> and shard.<key>, there are some counter variables available in Test Workflow expressions:

  • index and count - counters for total instances
  • matrixIndex and matrixCount - counters for the combinations
  • shardIndex and shardCount - counters for the shards
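
These counters can be combined in a description to label each instance; a small sketch (the matrix values are illustrative):

parallel:
  matrix:
    browser: ["chrome", "firefox"]
  count: 2
  description: 'Instance {{ index + 1 }}/{{ count }}: {{ matrix.browser }} (combination {{ matrixIndex + 1 }}/{{ matrixCount }}, shard {{ shardIndex + 1 }}/{{ shardCount }})'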

Matrix and sharding together

Sharding can be combined with matrix. In that case, the configured replicas/shards are created for every matrix combination. For example:

matrix:
  browser: ["chrome", "firefox"]
  memory: ["1Gi", "2Gi"]
count: 2
shards:
  url: ["https://testkube.io", "https://docs.testkube.io", "https://app.testkube.io"]

This will start 8 instances:

index | matrixIndex | matrix.browser | matrix.memory | shardIndex | shard.url
0     | 0           | "chrome"       | "1Gi"         | 0          | ["https://testkube.io", "https://docs.testkube.io"]
1     | 0           | "chrome"       | "1Gi"         | 1          | ["https://app.testkube.io"]
2     | 1           | "chrome"       | "2Gi"         | 0          | ["https://testkube.io", "https://docs.testkube.io"]
3     | 1           | "chrome"       | "2Gi"         | 1          | ["https://app.testkube.io"]
4     | 2           | "firefox"      | "1Gi"         | 0          | ["https://testkube.io", "https://docs.testkube.io"]
5     | 2           | "firefox"      | "1Gi"         | 1          | ["https://app.testkube.io"]
6     | 3           | "firefox"      | "2Gi"         | 0          | ["https://testkube.io", "https://docs.testkube.io"]
7     | 3           | "firefox"      | "2Gi"         | 1          | ["https://app.testkube.io"]

Troubleshooting and Best Practices

Common Issues

Issue: Shards Not Starting

Symptoms: Some or all shards remain in pending state

Solutions:

  1. Check cluster resources: Ensure your cluster has enough capacity for all shards
    kubectl describe nodes  # Check available resources
    kubectl get pods -n testkube # Check pod status
  2. Review resource requests: Each shard needs allocated resources
    container:
      resources:
        requests:
          cpu: 500m # Reduce if resources are limited
          memory: 512Mi
  3. Reduce shard count: If resources are constrained, use fewer shards
    parallel:
      count: 3 # Reduced from 10

Issue: Uneven Test Distribution

Symptoms: Some shards finish much faster than others

Solutions:

  1. Use dynamic sharding with maxCount instead of count:
    parallel:
      maxCount: 5 # Adapts to available tests
      shards:
        testFiles: 'glob("tests/**/*.test.js")'
  2. Ensure test files are similar in size/duration: Group fast and slow tests evenly
  3. Monitor execution times:
    testkube get twe EXECUTION_ID  # Check individual shard durations

Issue: Out of Memory Errors

Symptoms: Pods crash with OOM (Out of Memory) errors

Solutions:

  1. Increase memory limits:
    container:
      resources:
        limits:
          memory: 4Gi # Increased from 2Gi
  2. Reduce tests per shard: Increase shard count to distribute load
    parallel:
      count: 10 # More shards = fewer tests per shard

Best Practices

1. Start Conservative and Scale Up

Begin with a small shard count and increase based on results:

# Week 1: Baseline
parallel:
  count: 2

# Week 2: If successful, increase
parallel:
  count: 5

# Week 3: Optimize based on metrics
parallel:
  count: 8 # Sweet spot for your test suite

2. Monitor Resource Usage

Track resource consumption to optimize shard configuration:

# Watch resource usage during execution
kubectl top pods -n testkube -l testworkflow=my-workflow

# Review completed execution metrics
testkube get twe EXECUTION_ID

3. Use Descriptive Names

Make debugging easier with clear descriptions:

parallel:
  maxCount: 5
  shards:
    testFiles: 'glob("tests/**/*.test.js")'
  description: 'Shard {{ index + 1 }}/{{ count }}: {{ join(shard.testFiles, ", ") }}'

4. Implement Retry Logic

Account for transient failures in sharded tests:

steps:
- name: Run tests with retry
  parallel:
    count: 3
    retry:
      count: 2 # Retry failed shards up to 2 times
    shell: 'run-tests.sh'

5. Consider Cost vs. Speed Tradeoffs

More shards = faster execution but higher cost:

  • Development: Use fewer shards (2-3) to save resources
  • CI/CD: Use optimal shards (5-8) for speed
  • Production validation: Use maximum shards (10+) for critical releases

6. Balance Matrix and Sharding

When combining matrix and sharding, avoid excessive parallelism:

# This creates 3 browsers × 5 shards = 15 pods
parallel:
  matrix:
    browser: [chrome, firefox, safari] # 3 combinations
  count: 5 # 5 shards per combination
# Total: 15 concurrent pods - ensure cluster can handle this!

Additional Resources