
Config Reference

Every training run in ALIGNN is controlled by a single JSON file. The schema is enforced by two pydantic classes: TrainingConfig (top-level) in alignn/config.py and a per-model config (e.g. ALIGNNAtomWiseConfig) in alignn/models/alignn_atomwise.py.

At runtime the config is validated and any unknown field is rejected, so this page is the authoritative list of the available fields.
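Because unknown fields are rejected, a typo in a key name stops the run at validation time. The sketch below is a stdlib-only pre-check that flags such keys before launching; it is not ALIGNN's validator. The helper name `unknown_fields` and the set `KNOWN_TOP_LEVEL` are hypothetical, and the set is only a subset of the fields listed on this page.

```python
import json

# Hypothetical, partial list of top-level field names taken from this page.
KNOWN_TOP_LEVEL = {
    "dataset", "target", "id_tag", "atom_features", "neighbor_strategy",
    "cutoff", "max_neighbors", "epochs", "batch_size", "learning_rate",
    "criterion", "model",
}


def unknown_fields(config_text, known=KNOWN_TOP_LEVEL):
    """Return the sorted top-level keys of a JSON config that are not in `known`."""
    config = json.loads(config_text)
    return sorted(set(config) - set(known))


example = '{"dataset": "user_data", "epochs": 10, "learnin_rate": 0.001}'
print(unknown_fields(example))  # ['learnin_rate']
```

The pydantic classes remain the real source of truth; a check like this only catches misspellings early.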


Top-level (TrainingConfig)

Dataset & target

Field Type / default Meaning
dataset one of dft_3d, megnet, qm9, user_data, … (see enum) / "dft_3d" Which dataset. Use "user_data" for your own POSCARs.
target string / "exfoliation_energy" Column in id_prop.csv to predict.
id_tag "jid" | "id" | "_oqmd_entry_id" / "jid" Which column holds the ID.
classification_threshold float? / None If set, regression targets above it become class 1; enables classification mode.
target_multiplication_factor float? / None Scale all targets by this constant (useful when units don't match loss scale).

Atoms → graph

Field Default Meaning
atom_features "cgcnn" Node feature scheme. cgcnn = 92-d one-hot-ish table, atomic_number = scalar Z, basic = 11 hand-picked features, cfid = 438 chemical descriptors. Must match model.atom_input_features.
neighbor_strategy "k-nearest" How to build the neighbor graph. See Neighbor strategies.
cutoff 8.0 Radius cutoff (Å) for pair edges.
cutoff_extra 3.0 Fallback slack for radius_graph if too few neighbors are found.
three_body_cutoff None Separate cutoff (Å) for line-graph / angle triplets. Only honored by neighbor_strategy="pure_torch". None → use cutoff. Must be ≤ cutoff.
max_neighbors 12 Per-source cap.
use_canonize True Canonicalize k-nearest edge order (reproducibility).
compute_line_graph True Build the bond-angle line graph.
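The rules for three_body_cutoff can be sketched as a small resolver. This is an illustration of the table text, not ALIGNN code: the function name `resolve_three_body_cutoff` is hypothetical, and returning `cutoff` for strategies other than pure_torch is an assumption about what "only honored by" means in practice.

```python
def resolve_three_body_cutoff(config):
    """Return the cutoff (Å) assumed to apply to line-graph triplets."""
    cutoff = config.get("cutoff", 8.0)
    three_body = config.get("three_body_cutoff")  # None means "use cutoff"
    strategy = config.get("neighbor_strategy", "k-nearest")
    # Assumption: strategies other than pure_torch ignore the field entirely.
    if three_body is None or strategy != "pure_torch":
        return cutoff
    if three_body > cutoff:
        raise ValueError("three_body_cutoff must be <= cutoff")
    return three_body


print(resolve_three_body_cutoff(
    {"neighbor_strategy": "pure_torch", "cutoff": 6.0, "three_body_cutoff": 4.0}
))  # 4.0
```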

Neighbor strategies

Value Backend Autograd through r? Notes
k-nearest (default) Jarvis no Fastest for small property-prediction cells.
radius_graph pure torch no Honest cutoff; slower.
radius_graph_jarvis Jarvis no Uses Jarvis's NeighborsAnalysis.
fast_graph matscipy (C) no Same topology as radius_graph but ~200× faster on large cells. Requires pip install matscipy.
torch_graph matscipy + torch yes Recomputes r = positions[dst] − positions[src] + shift @ lattice in torch → gradients flow to positions and lattice.
pure_torch matscipy + torch yes Like torch_graph, plus respects three_body_cutoff for the line graph.
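The edge vector used by torch_graph and pure_torch, r = positions[dst] − positions[src] + shift @ lattice, can be written out in plain Python to show the arithmetic. This sketch assumes Cartesian positions and a lattice stored as three row vectors; `edge_vector` is a hypothetical helper. In the torch backends the same expression is evaluated on tensors, which is what lets gradients reach positions and lattice.

```python
def edge_vector(positions, lattice, src, dst, shift):
    """Compute r = positions[dst] - positions[src] + shift @ lattice for one edge."""
    # shift @ lattice: integer periodic-image offsets times the lattice row vectors.
    offset = [sum(shift[k] * lattice[k][i] for k in range(3)) for i in range(3)]
    return [positions[dst][i] - positions[src][i] + offset[i] for i in range(3)]


# Two atoms in a 4 Å cubic cell; the edge crosses the periodic boundary along x.
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
positions = [[0.5, 0.0, 0.0], [3.5, 0.0, 0.0]]
r = edge_vector(positions, lattice, src=1, dst=0, shift=[1, 0, 0])
print(r)  # [1.0, 0.0, 0.0]
```

Without the shift term the same pair would appear 3 Å apart in the opposite direction; the shift selects the nearer periodic image.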

Splitting

Field Default Meaning
train_ratio / val_ratio / test_ratio 0.8 / 0.1 / 0.1 Fractions (used when counts are not set).
n_train / n_val / n_test None Absolute counts; override the ratios when set.
keep_data_order True If True, no shuffling before the split — makes comparisons across runs reproducible.
random_seed 123 Seed for Python / numpy / torch / CUDA RNGs.
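How the split fields interact can be sketched as follows. This is a minimal illustration, not ALIGNN's splitting routine: `split_ids` is a hypothetical helper, and both the truncation with `int()` and the choice of contiguous slices from the front of the list are assumptions.

```python
import random


def split_ids(n_items, train_ratio=0.8, val_ratio=0.1, test_ratio=0.1,
              n_train=None, n_val=None, n_test=None,
              keep_data_order=True, random_seed=123):
    """Return (train, val, test) index lists for a dataset of n_items entries."""
    ids = list(range(n_items))
    if not keep_data_order:
        # Seeded shuffle, so the same seed always gives the same split.
        random.Random(random_seed).shuffle(ids)
    # Absolute counts, when set, override the ratios.
    if n_train is None:
        n_train = int(train_ratio * n_items)
    if n_val is None:
        n_val = int(val_ratio * n_items)
    if n_test is None:
        n_test = int(test_ratio * n_items)
    if n_train + n_val + n_test > n_items:
        raise ValueError("split sizes exceed dataset size")
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


train, val, test = split_ids(10)
print(len(train), len(val), len(test))  # 8 1 1
```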

Training loop

Field Default Meaning
epochs 300 Number of training passes.
batch_size 64 Per-step batch size.
learning_rate 1e-2 Base LR.
weight_decay 0 AdamW / SGD weight decay.
warmup_steps 2000 OneCycle warmup steps (ignored if scheduler="none").
criterion "mse" Regression loss. "l1" is what the force-field examples use. "zig" / "poisson" are specialized.
optimizer "adamw" "adamw" or "sgd".
scheduler "onecycle" "onecycle" or "none".
n_early_stopping None Stop if val loss hasn't improved in this many epochs (~50 is common).
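The n_early_stopping rule amounts to a patience counter on the validation loss. The sketch below illustrates that logic on a list of per-epoch losses; `stop_epoch` is a hypothetical helper, and treating only a strict decrease as an improvement is an assumption.

```python
def stop_epoch(val_losses, n_early_stopping):
    """Return the 0-based epoch at which training would stop, or None."""
    if n_early_stopping is None:
        return None  # early stopping disabled
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= n_early_stopping:
            return epoch
    return None


history = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
print(stop_epoch(history, n_early_stopping=3))  # 4
```

With a patience of 3 the run stops at epoch 4 and never sees the better loss at epoch 5, which is why a generous value is the safer choice.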

Graph caching

Field Default Meaning
use_lmdb True Cache built graphs in LMDB (fast reload, memory-bounded).
read_existing False If True, reuse an existing LMDB cache on disk. Must match the current model backend (DGL vs pure-torch). Default False rebuilds from scratch every run to avoid footguns.

Runtime & distributed

Field Default Meaning
dtype "float32" "float32" or "float64".
num_workers 4 DataLoader workers. 0 for debugging.
pin_memory False Pinned host memory for faster H2D copies on GPU.
distributed False Enable DDP multi-process training (paired with torchrun).
data_parallel False Legacy single-process multi-GPU. Prefer distributed.

Logging & outputs

Field Default Meaning
output_dir . Where checkpoints / logs / predictions are written.
filename "sample" Prefix for cache dirs (<prefix>train_data, etc.).
write_checkpoint True Save best_model.pt / current_model.pt.
write_predictions True Dump per-sample predictions at the end.
store_outputs True Keep intermediate outputs in the history JSONs.
progress True tqdm progress bars.
log_tensorboard False Also write a TensorBoard run.
save_dataloader False Pickle the built DataLoader objects (large; rarely useful).
standard_scalar_and_pca False Z-score + PCA the targets (legacy).
normalize_graph_level_loss False Divide the graph-level loss by number of atoms (enable for extensive quantities mixed with intensive ones).

model

A nested object whose schema is chosen by model.name. Currently available:

name Class Purpose
alignn ALIGNNConfig Original ALIGNN (scalar regression only).
alignn_atomwise ALIGNNAtomWiseConfig Adds atomwise / gradwise / stresswise outputs — this is what force-field.md uses.
alignn_atomwise_pure ALIGNNAtomWisePureConfig DGL-free, scatter-based reimplementation of alignn_atomwise. Accepts the same config fields; TorchScript-exportable for LAMMPS. See Pure-torch notes.
ealignn_atomwise eALIGNNAtomWiseConfig Equivariant variant.

See Model config (alignn_atomwise) below.


Model config (alignn_atomwise)

All fields below apply to both alignn_atomwise and alignn_atomwise_pure unless noted.

Architecture

Field Default Meaning
name required "alignn_atomwise" or "alignn_atomwise_pure".
alignn_layers 2 Number of ALIGNN (node + edge + triplet) conv blocks.
gcn_layers 2 Number of subsequent node + edge conv blocks (no triplets).
atom_input_features 1 Width of incoming atom features. Must match atom_features (92 for cgcnn, 1 for atomic_number, 11 for basic, 438 for cfid).
edge_input_features 80 Number of RBF bins for bond lengths.
triplet_input_features 40 Number of RBF bins for bond-angle cosines.
embedding_features 64 Width of the two-layer MLP inside the edge/angle embeddings.
hidden_features 64 Core channel width running through all conv blocks.
output_features 1 Graph-level output dimension (>1 for multi-output regression).
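The required pairing between atom_features and model.atom_input_features can be checked before training starts. The widths below come from the table above; the helper `check_atom_features` is hypothetical and is not part of ALIGNN.

```python
# Feature widths as listed for atom_input_features.
ATOM_FEATURE_WIDTH = {"cgcnn": 92, "atomic_number": 1, "basic": 11, "cfid": 438}


def check_atom_features(config):
    """Raise ValueError if model.atom_input_features does not match atom_features."""
    scheme = config.get("atom_features", "cgcnn")  # documented default
    declared = config.get("model", {}).get("atom_input_features", 1)  # documented default
    expected = ATOM_FEATURE_WIDTH[scheme]
    if declared != expected:
        raise ValueError(
            f"atom_features={scheme!r} needs atom_input_features={expected}, got {declared}"
        )
    return expected


print(check_atom_features(
    {"atom_features": "cgcnn", "model": {"atom_input_features": 92}}
))  # 92
```

Note that the two documented defaults ("cgcnn" and 1) do not match each other, so a config that sets neither field fails this check.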

Output heads and loss weights

Field Default Meaning
calculate_gradient True Compute per-atom forces via ∂E/∂r.
atomwise_output_features 0 Per-atom prediction dim (e.g. charges, magmoms). 0 disables.
additional_output_features 0 Extra per-graph head dim. 0 disables.
graphwise_weight 1.0 Loss weight on the graph-level target.
gradwise_weight 1.0 Loss weight on forces. Setting to 0 automatically disables calculate_gradient.
stresswise_weight 0.0 Loss weight on the 3×3 stress tensor.
atomwise_weight 0.0 Loss weight on per-atom outputs (requires atomwise_output_features > 0).
additional_output_weight 0.0 Loss weight on the additional head.
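One way to read the weights is as coefficients in a weighted sum of the per-head losses. The sketch below assumes exactly that form, which this page does not state explicitly; `combined_loss` is a hypothetical helper and the loss values are made up for illustration.

```python
def combined_loss(model_cfg, graph_loss, grad_loss=0.0, stress_loss=0.0,
                  atom_loss=0.0, additional_loss=0.0):
    """Weighted sum of per-head losses, using the documented default weights."""
    return (
        model_cfg.get("graphwise_weight", 1.0) * graph_loss
        + model_cfg.get("gradwise_weight", 1.0) * grad_loss
        + model_cfg.get("stresswise_weight", 0.0) * stress_loss
        + model_cfg.get("atomwise_weight", 0.0) * atom_loss
        + model_cfg.get("additional_output_weight", 0.0) * additional_loss
    )


ff_weights = {"graphwise_weight": 0.85, "gradwise_weight": 0.05, "stresswise_weight": 0.05}
total = combined_loss(ff_weights, graph_loss=0.2, grad_loss=1.0, stress_loss=2.0)
print(round(total, 6))  # 0.32
```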

Force-field specifics

Field Default Meaning
grad_multiplier -1 Multiplier applied to ∂E/∂r when converting to forces. The physics convention F = −∂E/∂x corresponds to -1.
add_reverse_forces True Sum ∂E/∂r over both (i→j) and (j→i) edges (proper Newton's third law on directed graphs).
lg_on_fly True Recompute bond-angle cosines inside the autograd graph every forward (needed for exact force/stress derivatives). Leave True unless profiling.
batch_stress True Compute stress per graph in the batch (vs. one stress per whole batch).
force_mult_natoms False Scale forces by number of atoms (rare; use for unit conventions).
energy_mult_natoms True Treat the graph-head output as energy per atom and multiply by natoms before taking the gradient. Standard for ALIGNN-FF.
stress_multiplier 1.0 Final scalar on the stress output (for unit fixes).
include_pos_deriv False Alternative force path: differentiate through Cartesian positions directly (rather than through edge displacements). Currently experimental; the _pure variant accepts the field but ignores it.
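The role of grad_multiplier can be illustrated on a toy one-dimensional bond. The sketch uses a harmonic energy and a central finite difference in place of autograd; the functions `energy` and `force` and the spring parameters are illustrative assumptions, not ALIGNN code.

```python
def energy(x, k=2.0, x0=1.0):
    """Toy harmonic bond energy at separation x, with minimum at x0."""
    return 0.5 * k * (x - x0) ** 2


def force(x, grad_multiplier=-1, h=1e-5):
    """Force as grad_multiplier times dE/dx."""
    # Central finite difference stands in for the autograd derivative.
    dE_dx = (energy(x + h) - energy(x - h)) / (2 * h)
    return grad_multiplier * dE_dx


print(round(force(1.2), 6))  # -0.4
```

A stretched bond (x > x0) gets a negative force that pulls it back toward x0, which is the behaviour the default of -1 encodes.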

Cutoff shaping

Field Default Meaning
use_cutoff_function False Apply a smooth polynomial envelope to distances before RBF. Useful when cutoff is tight.
multiply_cutoff False If use_cutoff_function=True, multiply the edge embedding by the envelope (instead of the bondlength).
inner_cutoff 3.0 Envelope radius (Å); envelope = 1 below this, smoothly → 0 up to cutoff.
exponent 5 Exponent of the smooth-cutoff polynomial.

Short-distance penalty

Field Default Meaning
use_penalty True Add a repulsive penalty when any bondlength falls below penalty_threshold. Prevents the model from hallucinating attractive wells at unphysical distances — important for MD stability.
penalty_factor 0.1 Strength.
penalty_threshold 1.0 Distance (Å) below which the penalty kicks in.

Miscellaneous

Field Default Meaning
link "identity" Output activation. "log" applies exp() (positive outputs), "logit" applies sigmoid().
classification False Two-class NLL head (set automatically when classification_threshold is non-null at the training level).
zero_inflated False Zero-inflated Gaussian output (specialized).
extra_features 0 If > 0, concatenate this many pre-computed fingerprint features per atom (e.g. Gong et al.).
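The three link options map a raw model output to the reported prediction as described in the table. A minimal scalar sketch, with `apply_link` as a hypothetical helper name:

```python
import math


def apply_link(x, link="identity"):
    """Apply the output activation selected by the `link` field to a scalar."""
    if link == "identity":
        return x
    if link == "log":
        return math.exp(x)  # always positive
    if link == "logit":
        return 1.0 / (1.0 + math.exp(-x))  # sigmoid, in (0, 1)
    raise ValueError(f"unknown link: {link!r}")


print(apply_link(0.0, "log"), apply_link(0.0, "logit"))  # 1.0 0.5
```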

Pure-torch notes

model.name = "alignn_atomwise_pure" swaps the DGL message-passing core for scatter_add / scatter_mean primitives. The forward outputs are numerically equivalent to alignn_atomwise: a parity sweep across all doc examples shows |Δ| ≤ 0.04 on all losses after 3 epochs, and the force-field scenario is bit-identical (see alignn/scripts/parity_dgl_vs_pure.py). The residual drift comes from scatter_add nondeterminism on CUDA, not from a modelling difference.

Not carried over from alignn_atomwise (config fields accepted but ignored):

  • include_pos_deriv
  • Some extra_features code paths — experimental in the DGL model too.

When the pure model is selected, the training pipeline automatically:

  • Uses alignn/pure_lmdb_dataset.py (pickles TorchGraph objects, no DGL).
  • Uses PureTorchLMDBDataset.collate_line_graph for batching.
  • Leaves dataloader / loss / metrics / checkpoints untouched.

For LAMMPS integration, call alignn/scripts/torch/export_torchscript.py (installed as the export_torchscript.py console script) — it scripts the forward_tensors_z(positions, lattice, atomic_numbers, src, dst, shift, compute_stress) entry point and bakes the atomic-number → feature lookup into the .pt so the C++ host only needs atomic numbers.


Minimal examples

Single-output regression

{
  "dataset": "user_data",
  "target": "formation_energy_peratom",
  "atom_features": "cgcnn",
  "neighbor_strategy": "k-nearest",
  "cutoff": 8.0,
  "max_neighbors": 12,
  "epochs": 100,
  "batch_size": 32,
  "learning_rate": 1e-3,
  "criterion": "mse",
  "model": {
    "name": "alignn_atomwise",
    "atom_input_features": 92,
    "alignn_layers": 4,
    "gcn_layers": 4,
    "hidden_features": 256,
    "output_features": 1,
    "calculate_gradient": false,
    "graphwise_weight": 1.0,
    "gradwise_weight": 0.0,
    "stresswise_weight": 0.0,
    "use_penalty": false
  }
}

ALIGNN-FF (energy + forces + stress)

{
  "dataset": "user_data",
  "target": "target",
  "atom_features": "cgcnn",
  "neighbor_strategy": "fast_graph",
  "cutoff": 6.0,
  "max_neighbors": 12,
  "epochs": 200,
  "batch_size": 2,
  "learning_rate": 1e-3,
  "criterion": "l1",
  "model": {
    "name": "alignn_atomwise",
    "atom_input_features": 92,
    "alignn_layers": 3,
    "gcn_layers": 3,
    "hidden_features": 128,
    "output_features": 1,
    "calculate_gradient": true,
    "graphwise_weight": 0.85,
    "gradwise_weight": 0.05,
    "stresswise_weight": 0.05,
    "add_reverse_forces": true,
    "lg_on_fly": true,
    "use_penalty": true
  }
}

Pure-torch FF with separate 3-body cutoff

{
  "neighbor_strategy": "pure_torch",
  "cutoff": 6.0,
  "three_body_cutoff": 4.0,
  "max_neighbors": 12,
  "model": { "name": "alignn_atomwise_pure", "…": "…" }
}

Classification

{
  "classification_threshold": 0.01,
  "target": "is_metal",
  "criterion": "mse",
  "model": {
    "name": "alignn_atomwise",
    "classification": true,
    "graphwise_weight": 1.0,
    "gradwise_weight": 0.0,
    "calculate_gradient": false
  }
}

Common gotchas

  • atom_input_features vs atom_features. These must match (cgcnn → 92, atomic_number → 1, basic → 11, cfid → 438). A mismatch raises a shape error at the first forward pass, not at config parse.
  • read_existing. Keep false unless you're sure the cache matches the current model backend and graph settings. A backend mismatch (DGL vs pure) is caught and raises a clear error; parameter drift (a different cutoff, etc.) is not caught.
  • gradwise_weight=0 silently disables calculate_gradient in the model code. If you think forces should be training but see Grad=0.0000, check both flags.
  • use_penalty=true with a loose cutoff can inflate training loss when your data legitimately has short bonds (e.g. H-containing systems). Lower penalty_threshold or set use_penalty=false in that case.
  • batch_stress=false + large batch gives one stress for the whole batch, not per-graph — almost never what you want.
  • lg_on_fly=false caches triplet angle cosines; this saves time, but forces computed by autograd will miss the angle contribution unless the cosines are recomputed. Leave true for force-field training.
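Several of the gotchas above can be caught with a small lint pass over the loaded config. The sketch below is hypothetical and not part of ALIGNN: `lint_config` and its messages are illustrative, and the fallback values are the documented defaults.

```python
def lint_config(config):
    """Return warnings for flag combinations described in the gotchas list."""
    warnings = []
    model = config.get("model", {})
    calc_grad = model.get("calculate_gradient", True)
    if model.get("gradwise_weight", 1.0) == 0 and calc_grad:
        warnings.append("gradwise_weight=0 disables calculate_gradient")
    if config.get("read_existing", False):
        warnings.append("read_existing=true: cache parameter drift is not detected")
    if not model.get("batch_stress", True) and model.get("stresswise_weight", 0.0) > 0:
        warnings.append("batch_stress=false yields one stress for the whole batch")
    if not model.get("lg_on_fly", True) and calc_grad:
        warnings.append("lg_on_fly=false drops the angle contribution to forces")
    return warnings


bad = {
    "read_existing": True,
    "model": {"gradwise_weight": 0.0, "calculate_gradient": True, "lg_on_fly": False},
}
for message in lint_config(bad):
    print(message)
```

Running it on the ALIGNN-FF example from this page returns an empty list, since that config keeps lg_on_fly and the default batch_stress and uses a non-zero gradwise_weight.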