File: JSON_FORMAT.md

package info (click to toggle)
firefox 149.0-1
links: PTS, VCS
area: main
in suites: sid
size: 4,767,760 kB
sloc: cpp: 7,416,064; javascript: 6,752,859; ansic: 3,774,850; python: 1,250,473; xml: 641,578; asm: 439,191; java: 186,617; sh: 56,634; makefile: 18,856; objc: 13,092; perl: 12,763; pascal: 5,960; yacc: 4,583; cs: 3,846; lex: 1,720; ruby: 1,002; php: 436; lisp: 258; awk: 105; sql: 66; sed: 53; csh: 10; exp: 6
file content (352 lines) | stat: -rw-r--r-- 8,831 bytes
# Manifest Runtime Data Format Documentation

This document describes the JSON file formats created by `fetch-manifest-data.js`.

## Overview

The script generates two types of JSON files:

1. **Detailed data**: `manifests.json` - Contains all individual manifest runs with task IDs and commit hashes
2. **Runtimes data**: `manifests-runtimes.json` - Smaller file grouping runtimes by manifest and job

Both formats use string tables and index-based lookups to minimize file size.

---

## Detailed Data Format (`manifests.json`)

### Top-Level Structure

```json
{
  "metadata": { ... },
  "manifests": [ ... ],
  "jobNames": [ ... ],
  "commits": [ ... ],
  "prefixes": [ ... ],
  "tasks": { ... },
  "runs": { ... }
}
```

### metadata

Contains information about the data collection:

```json
{
  "date": "2025-02-04",
  "repository": "mozilla-central",
  "generatedAt": "2025-02-05T14:24:33.451Z",
  "processedJobCount": 1234,
  "failedJobCount": 56,
  "skippedJobCount": 789,
  "manifestCount": 3456,
  "jobNameCount": 45,
  "timingCount": 12345
}
```

### String Tables

All strings are deduplicated and stored once:

```json
{
  "manifests": [
    "dom/tests/mochitest/general/mochitest.toml",
    "toolkit/components/extensions/test/mochitest/mochitest.toml",
    ...
  ],
  "jobNames": [
    "test-linux2404-64/debug-mochitest-browser-chrome",
    "test-macosx1470-64/opt-mochitest-plain",
    ...
  ],
  "commits": [
    "f37a6863f87aeeb870b16223045ea7614b1ba0a7",
    "abc123def456789012345678901234567890abcd",
    ...
  ],
  "prefixes": [
    "mochitest-browser-chrome",
    "mochitest-plain",
    "wpt",
    "xpcshell",
    "reftest",
    ...
  ]
}
```

### tasks

Object containing parallel arrays with task metadata:

```json
{
  "tasks": {
    "id": [
      "YJJe4a0CRIqbAmcCo8n63w",
      "XPPf5b1DRJrcBndDp9o74x.1",
      ...
    ],
    "jobName": [5, 12, 8, 15, ...],
    "commitId": [0, 0, 1, 1, ...],
    "prefix": [0, 1, 2, 1, ...]
  }
}
```

All arrays are parallel - index `i` corresponds to the same task:
- `id[i]`: Task ID string (includes retry suffix when retry > 0, e.g., `taskId.1`)
- `jobName[i]`: Index into `jobNames` array (full job name including chunk number)
- `commitId[i]`: Index into `commits` array
- `prefix[i]`: Index into `prefixes` array (artifact type used)

**Note**: The `jobNames` table contains both base names (without chunks, used in `runs.jobNameIds`) and full names (with chunks, used in `tasks.jobName`).

### runs

Parallel arrays containing timing data:

```json
{
  "runs": {
    "manifestIds": [0, 1, 0, 2, ...],
    "jobNameIds": [0, 0, 1, 1, ...],
    "taskIds": [0, 1, 2, 3, ...],
    "durations": [1234, 5678, 2345, ...]
  }
}
```

All arrays are parallel - index `i` in each array corresponds to the same manifest run:
- `manifestIds[i]`: Index into `manifests` array
- `jobNameIds[i]`: Index into `jobNames` array (base name without chunk)
- `taskIds[i]`: Index into `tasks.id` array
- `durations[i]`: Runtime in milliseconds

**Example lookup:**
```javascript
const i = 5;
const manifest = data.manifests[data.runs.manifestIds[i]];
const jobBaseName = data.jobNames[data.runs.jobNameIds[i]];
const duration = data.runs.durations[i];

// Get task details
const taskIdx = data.runs.taskIds[i];
const taskId = data.tasks.id[taskIdx];
const fullJobName = data.jobNames[data.tasks.jobName[taskIdx]];
const commitIdx = data.tasks.commitId[taskIdx];
const commit = data.commits[commitIdx];
const prefixIdx = data.tasks.prefix[taskIdx];
const prefix = data.prefixes[prefixIdx];

// Open in Treeherder
const treeherderUrl = `https://treeherder.mozilla.org/#/jobs?repo=${data.metadata.repository}&revision=${commit}&selectedTaskRun=${taskId}`;
```

---

## Runtimes Data Format (`manifests-runtimes.json`)

### Top-Level Structure

```json
{
  "metadata": { ... },
  "jobNames": [ ... ],
  "manifests": { ... }
}
```

### metadata

```json
{
  "date": "2025-02-04",
  "repository": "mozilla-central",
  "generatedAt": "2025-02-05T14:24:33.451Z",
  "manifestCount": 3456,
  "jobNameCount": 45
}
```

### jobNames

String table for job names:

```json
{
  "jobNames": [
    "test-linux2404-64/debug-mochitest-browser-chrome",
    "test-macosx1470-64/opt-mochitest-plain",
    ...
  ]
}
```

### manifests

Object where keys are manifest names and values are objects containing parallel arrays:

```json
{
  "manifests": {
    "dom/tests/mochitest/general/mochitest.toml": {
      "jobs": [0, 1],
      "runtimes": [
        [1200, 1234, 1250, 1280, 1300],
        [2100, 2150, 2200]
      ]
    },
    "toolkit/components/extensions/test/mochitest/mochitest.toml": {
      "jobs": [0],
      "runtimes": [
        [3400, 3500, 3600]
      ]
    },
    ...
  }
}
```

Each manifest has an object containing parallel arrays:
- `jobs`: Array of indices into the `jobNames` array
- `runtimes`: Array of arrays, where `runtimes[i]` is the sorted runtime values in milliseconds for the manifest on `jobs[i]`

**Example lookup:**
```javascript
const manifest = "dom/tests/mochitest/general/mochitest.toml";
const manifestData = data.manifests[manifest];

for (let i = 0; i < manifestData.jobs.length; i++) {
  const jobName = data.jobNames[manifestData.jobs[i]];
  const runtimes = manifestData.runtimes[i];
  const median = runtimes[Math.floor(runtimes.length / 2)];
  const mean = runtimes.reduce((a, b) => a + b) / runtimes.length;
  console.log(`${jobName}: median=${median}ms, mean=${mean}ms, runs=${runtimes.length}`);
}
```

---

## Use Cases

### Detailed File

**Purpose**: Deep analysis with full traceability to specific CI runs

**Use cases:**
- Link to specific Treeherder jobs for investigation
- Track timing changes across commits
- Identify which specific runs were slow
- Debug timing regressions

**Example: Find all slow runs for a manifest**
```javascript
const targetManifest = "dom/tests/mochitest/general/mochitest.toml";
const manifestId = data.manifests.indexOf(targetManifest);
const threshold = 5000; // ms

const slowRuns = [];
for (let i = 0; i < data.runs.durations.length; i++) {
  if (data.runs.manifestIds[i] === manifestId && data.runs.durations[i] > threshold) {
    slowRuns.push({
      duration: data.runs.durations[i],
      job: data.jobNames[data.runs.jobNameIds[i]],
      commit: data.commits[data.runs.commitIds[i]],
      taskId: data.taskIds[data.runs.taskIds[i]],
    });
  }
}
```

### Runtimes File

**Purpose**: Fast dashboard loading and statistical analysis

**Use cases:**
- Calculate median/mean runtimes per manifest
- Compare timing distributions across jobs
- Identify consistently slow manifests
- Generate runtime estimates for scheduling

**Example: Find slowest manifests by median runtime**
```javascript
const manifestStats = [];

for (const [manifestName, manifestData] of Object.entries(data.manifests)) {
  let totalMedian = 0;

  for (const runtimes of manifestData.runtimes) {
    const median = runtimes[Math.floor(runtimes.length / 2)];
    totalMedian += median;
  }

  manifestStats.push({
    manifest: manifestName,
    avgMedian: totalMedian / manifestData.runtimes.length,
  });
}

manifestStats.sort((a, b) => b.avgMedian - a.avgMedian);
const slowest = manifestStats.slice(0, 10);
```

---

## Data Characteristics

### Job Names

Job names in the `jobNames` table have chunk numbers removed. For example:
- `test-linux2404-64/debug-mochitest-browser-chrome-5` → `test-linux2404-64/debug-mochitest-browser-chrome`
- `test-macosx1470-64/opt-web-platform-tests-12` → `test-macosx1470-64/opt-web-platform-tests`

### Task IDs

Task IDs in the detailed file format:
- Retry 0: `"taskId"` (no suffix)
- Retry > 0: `"taskId.N"` (includes retry number)

### Durations

All duration values are in **milliseconds** and are integers.

### Filtering

The data only includes:
- Jobs from `test-*` (actual test jobs, not build jobs)
- Jobs with `state === "completed"` and `result` in `["success", "testfailed"]`
- Test harnesses that produce manifest-level timing data (excludes `cppunittest`, `marionette`, `gtest`, etc.)

### Manifests

Manifest names are the full path to the manifest file, typically ending in:
- `mochitest.toml`
- `browser.toml`
- `chrome.toml`
- `xpcshell.toml`

For web-platform-tests, manifest names may be test directory paths.

---

## File Size Comparison

Typical file sizes for one day of mozilla-central data:

- **Detailed file**: ~8 MB (contains all individual runs)
- **Runtimes file**: ~3 MB (grouped and sorted)

---

## Notes

- Runtimes in the runtimes file are **sorted** for efficient percentile calculations
- The detailed file allows reconstruction of the runtimes file
- Both files share the same `jobNames` table for consistency
- Commit hashes are full 40-character SHA-1 hashes from mozilla-central
- Task IDs can be used with the TaskCluster API or Treeherder UI