ComputeEstimator

Plan your compute with transparent methodology.

A guided estimator grounded in documented formulas, benchmark-informed assumptions, and hardware specifications that translates your research plans into credible GPU-hour requests for clusters like Jean Zay.

This tool is a work in progress and will be continuously calibrated with benchmark evidence from MINERVA and partner contributions.
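As a rough sketch of the kind of formula such an estimator rests on, the scaling-law literature approximates dense-transformer training cost as 6 × parameters × tokens. All constants below are illustrative assumptions, not this tool's calibrated values:

# Minimal sketch of the standard training-cost approximation
# (FLOPs ~= 6 * N * D). Constants are assumptions, not the tool's
# calibrated values.
def estimate_gpu_hours(n_params, n_tokens, peak_flops, mfu=0.35):
    """Ideal single-run GPU-hours, before iteration/dev overheads."""
    total_flops = 6.0 * n_params * n_tokens
    effective_flops_per_s = peak_flops * mfu   # MFU discounts the peak
    gpu_seconds = total_flops / effective_flops_per_s
    return gpu_seconds / 3600.0

# Example: a 7B model on an assumed 100B tokens, H100 BF16 dense
# peak ~989 TFLOP/s.
print(f"{estimate_gpu_hours(7e9, 100e9, 989e12):,.0f} GPU-hours")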

Input

Questionnaire

Section 1

Workload Intent

Clarifies whether compute is for full-model training, partial adaptation, PEFT, or inference/evaluation.

Captures iteration overhead beyond the single best run.
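For illustration, a minimal sketch of how an iteration buffer scales a single-run budget; the best-run cost below is an assumed value, and the 1.6× multiplier mirrors the Iteration/Ops Overhead baseline shown in the output:

# Hypothetical numbers: a request sized only for the single best run
# ignores the exploratory/failed runs around it.
best_run_gpu_hours = 3_850          # assumed cost of the final run
iteration_multiplier = 1.6          # retries, sweeps, restarts
budget = best_run_gpu_hours * iteration_multiplier
print(f"Request {budget:,.0f} GPU-hours, not {best_run_gpu_hours:,.0f}")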

Section 2

Architecture

Architecture family changes default model sizes and typical data profiles.

Select a representative size band or enter a custom value.

Using 7B parameters for the estimate.
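A minimal sketch of how a size band resolves to a parameter count; the bands below are hypothetical defaults, not the tool's actual values:

# Hypothetical size-band defaults for illustration only.
SIZE_BANDS = {
    "small":  1e9,    # ~1B parameters
    "medium": 7e9,    # ~7B, the value used in this walkthrough
    "large":  70e9,   # ~70B
}

def resolve_params(band, custom=None):
    """A custom entry overrides the representative band."""
    return custom if custom is not None else SIZE_BANDS[band]

print(resolve_params("medium"))  # 7e9, matching the estimate above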

Section 3

Data

Not all datasets have the same per-sample compute cost.

How much data is processed for one epoch (before iteration buffers).

How many times to pass through the entire dataset.

Min: 1
Max: 100
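As a rough sketch, the Section 3 inputs compose multiplicatively; the 50B-token dataset below is an assumed example, not a tool default:

# Total tokens processed = per-epoch volume x number of passes,
# clamped to the questionnaire's 1-100 epoch range.
def total_training_tokens(tokens_per_epoch, epochs):
    epochs = max(1, min(epochs, 100))   # UI slider bounds
    return tokens_per_epoch * epochs

print(f"{total_training_tokens(50e9, 2):,.0f} tokens")  # 100,000,000,000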

Section 4

Efficiency

The numeric format used for matrix multiplications.

Number of GPUs you intend to use for this run.

Min: 1
Max: 128
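A minimal sketch of how these two inputs set effective throughput; the peak figures below are published NVIDIA dense (non-sparse) peaks and are assumptions of this sketch, not values read from the tool:

# Effective cluster throughput = per-GPU peak (precision-dependent)
# x MFU x scaling efficiency x GPU count.
PEAK_FLOPS = {
    ("H100", "bf16"): 989e12,   # tensor-core BF16, dense
    ("H100", "fp32"): 67e12,    # non-tensor-core FP32
    ("A100", "bf16"): 312e12,
}

def cluster_throughput(gpu, precision, n_gpus, mfu=0.35, scaling=1.0):
    n_gpus = max(1, min(n_gpus, 128))   # UI slider bounds
    return PEAK_FLOPS[(gpu, precision)] * mfu * scaling * n_gpus

print(f"{cluster_throughput('H100', 'bf16', 8, scaling=0.96):.3e} FLOP/s")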

Output

Draft Estimate

GPU-Hours by Architecture
Primary output: estimated total GPU-hours for each architecture.

Axis in GPU-hours (auto-scaled per scenario)

Target Cluster

8× H100 (80 GB)

Estimated Real Time

32.1 days
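The wall-clock figure follows directly from the GPU-hour total and the number of GPUs running concurrently; a quick check using the panel's own numbers:

gpu_hours = 6_163      # from "GPU-Hours On Recommended Type"
n_gpus = 8             # the 8x H100 target cluster
days = gpu_hours / n_gpus / 24
print(f"{days:.1f} days")   # 32.1 days, matching "Estimated Real Time"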

Request Recommendation
Recommended request summary for allocation.
H100

GPU-Hours On Recommended Type

6,163 GPU-hours (H100)

Recommended GPU Type

H100 (80 GB)

Call Type

Dynamic access

Dynamic limit for H100: 12,500 h

High compute or communication pressure: H100 is recommended for faster kernels and stronger scaling across nodes.

Memory Analysis
Careful: 24% VRAM per GPU

Fits with multi-GPU sharding; communication/memory overhead can increase.

Est. VRAM / GPU: 18.9 GB / 80 GB
Total Request VRAM: 151 GB
Minimum GPUs (80 GB tier): 2
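A sketch reproducing the memory panel from its own numbers; how the tool derives the 151 GB total (weights, optimizer state, activations) is not shown here, so this only checks the sharding arithmetic:

import math

total_vram_gb = 151     # "Total Request VRAM" from the panel
n_gpus = 8              # target cluster size
gpu_capacity_gb = 80    # H100 80 GB tier

per_gpu = total_vram_gb / n_gpus                       # 18.9 GB
utilization = per_gpu / gpu_capacity_gb                # ~24% -> "Careful"
min_gpus = math.ceil(total_vram_gb / gpu_capacity_gb)  # 2

print(f"{per_gpu:.1f} GB/GPU, {utilization:.0%} VRAM, min {min_gpus} GPUs")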

Info

This setup spans 2 nodes on NVIDIA H100 SXM5 (4 GPUs/node).

Inter-node communication over IB400 can reduce scaling efficiency.

Baseline Assumptions

Model FLOPs Utilization (MFU) [Megatron-LM / PaLM papers]: 35%
Node-aware scaling efficiency: 96% (2 nodes)
Iteration/Ops Overhead: 1.6×
Development overhead (installs/downloads/interactive retries): +71 GPU-hours
Precision mode [Micikevicius 2018]: mixed precision (FP16/BF16) tensor-core kernels
Main result to read first
Estimated GPU-hours for each GPU type (V100, A100, H100).
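For intuition only, one plausible way the baseline assumptions above could compose into the headline figure. The token count here is back-solved purely for illustration, and the tool's calibrated formula may differ:

peak = 989e12           # H100 BF16 dense peak, assumed
mfu = 0.35              # Model FLOPs Utilization
scaling = 0.96          # node-aware scaling efficiency (2 nodes)
overhead = 1.6          # iteration/ops overhead
dev_hours = 71          # development overhead, GPU-hours

def gpu_hours(n_params, n_tokens):
    flops = 6 * n_params * n_tokens
    seconds = flops / (peak * mfu * scaling)
    return seconds * overhead / 3600 + dev_hours

# ~6,138 GPU-hours with an assumed 108B tokens, near the panel's 6,163.
print(f"{gpu_hours(7e9, 108e9):,.0f} GPU-hours")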