I'm working on a machine learning project aimed at automatically predicting dependency links between tasks in industrial maintenance procedures, where tasks are grouped into units called gammes.
Each gamme consists of a list of textual task descriptions, often grouped by equipment type (e.g., heat exchanger, column, drum) and work phase (e.g., "to be done before shutdown", "during shutdown", etc.). The goal is to learn which tasks depend on which others, i.e., a directed dependency graph (precursor → successor), based only on their textual descriptions.
What I’ve built so far:
Model architecture: A custom link prediction model using a CamemBERT-large encoder. For each pair of tasks (i, j) in a gamme, the model predicts whether a dependency i → j exists.
Data format:
Each training sample is a gamme (i.e., a sequence of tasks), represented as:
{
  "lines": ["[PHASE] [equipment] Task description ; DURATION=n", ...],
  "task_ids": [...],
  "edges": [[i, j], ...],  // known dependencies
  "phases": [...],
  "equipment_type": "echangeur"
}
Model inputs:
For each task:
Tokenized text (via CamemBERT tokenizer)
Phase and equipment type, passed both as text in the input and as learned embeddings
Link prediction: For each (i, j) pair:
Extract [CLS] embeddings + phase/equipment embeddings
Concatenate + feed into MLP
Binary output: 1 if dependency predicted, 0 otherwise
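In code, the pair-scoring head is roughly equivalent to this sketch (the class name, hidden sizes, and forward signature are illustrative, not my exact implementation):

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Scores a directed pair (i -> j) from [CLS] + phase/equipment embeddings.

    Illustrative sketch: dimensions and the phase/equipment vocab sizes
    are placeholders, not the exact values from my code.
    """
    def __init__(self, text_dim=768, phase_dim=32, equip_dim=16,
                 n_phases=13, n_equip=3, hidden=256):
        super().__init__()
        self.phase_emb = nn.Embedding(n_phases, phase_dim)
        self.equip_emb = nn.Embedding(n_equip, equip_dim)
        pair_dim = 2 * (text_dim + phase_dim) + equip_dim
        self.mlp = nn.Sequential(
            nn.Linear(pair_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one logit per (i, j) pair
        )

    def forward(self, cls_i, cls_j, phase_i, phase_j, equip):
        # Concatenate both tasks' [CLS] vectors with their phase embeddings
        # and the shared equipment embedding, then score with the MLP.
        feat = torch.cat([
            cls_i, self.phase_emb(phase_i),
            cls_j, self.phase_emb(phase_j),
            self.equip_emb(equip),
        ], dim=-1)
        return self.mlp(feat).squeeze(-1)  # sigmoid(logit) > threshold => edge
```

At inference I apply a sigmoid and threshold each pair independently, which is part of why nothing forces the predicted edge set to form a coherent graph.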
Dataset size:
988 gammes (~30 tasks each on average)
~35,000 positive dependency pairs, ~1.25 million negative ones
Coverage of 13 distinct work phases, 3 equipment types
Many gammes include multiple dependencies per task
Sample of my dataset (Dataset.jsonl):
{
"gamme_id": "L_echangeur_30",
"equipment_type": "heat_exchanger",
"lines": [
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] WORK TO BE DONE BEFORE SHUTDOWN ; DURATION=0",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] INSTALLATION OF RUBBER-LINED PIPING ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] JOINT INSPECTION ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] WORK RECEPTION ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] DISMANTLING OF SCAFFOLDING ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] INSTALLATION OF SCAFFOLDING ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] SCAFFOLDING INSPECTION ; DURATION=1",
"[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] MEASUREMENTS BEFORE PREFABRICATION ; DURATION=1",
[...]
"[END OF WORK] [heat_exchanger] MILESTONE: END OF WORK ; DURATION=0"
],
"task_ids": [
"E2010.T1.10", "E2010.T1.100", "E2010.T1.110", "E2010.T1.120", "E2010.T1.130",
"E2010.T1.20", "E2010.T1.30", "E2010.T1.40", "E2010.T1.45", "E2010.T1.50",
"E2010.T1.60", "E2010.T1.70", "E2010.T1.80", "E2010.T1.90", "E2010.T1.139"
],
"edges": [
[0, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12],
[12, 13], [13, 1], [1, 2], [2, 3], [3, 4], [4, 14]
],
"phases": [
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE BEFORE SHUTDOWN",
"WORK TO BE DONE DURING SHUTDOWN",
"WORK TO BE DONE DURING SHUTDOWN",
[...]
"END OF WORK"
]
}
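For completeness, this is how I turn each gamme into labeled (i, j) candidate pairs (sketch; function names are just for illustration):

```python
import json
from itertools import permutations

def load_gammes(path="Dataset.jsonl"):
    """Read one gamme dict per line from the JSONL file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def candidate_pairs(gamme):
    """All ordered (i, j) pairs of a gamme, labeled 1 iff (i, j) is in 'edges'."""
    gold = set(map(tuple, gamme["edges"]))
    n = len(gamme["lines"])
    return [((i, j), int((i, j) in gold)) for i, j in permutations(range(n), 2)]
```

With ~30 tasks per gamme this yields ~900 ordered pairs each, of which only a handful are positive, hence the ~35k vs. ~1.25M imbalance.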
The problem:
Even when evaluating on gammes from the training dataset itself, the model performs poorly (low precision/recall, wrong structure) and doesn't seem to learn meaningful patterns. Examples of issues:
Predicts dependencies where there shouldn't be any
Fails to capture multi-dependency tasks
Often outputs inconsistent or cyclic graphs
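(For the record, I know I could post-process cycles away with a greedy acyclic decode like the sketch below, where `scores` is a hypothetical dict of (i, j) → predicted probability, but that only masks the problem rather than fixing the model.)

```python
def greedy_acyclic_edges(scores, threshold=0.5):
    """Keep highest-scoring edges that don't create a cycle.

    scores: dict mapping (i, j) -> predicted probability of edge i -> j.
    Returns a list of accepted (i, j) edges forming a DAG.
    """
    def reaches(adj, src, dst):
        # Iterative DFS: is dst reachable from src via accepted edges?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj.get(node, ()))
        return False

    adj, kept = {}, []
    for (i, j), p in sorted(scores.items(), key=lambda kv: -kv[1]):
        if p < threshold:
            break
        # Adding i -> j is safe iff i is not already reachable from j.
        if not reaches(adj, j, i):
            adj.setdefault(i, []).append(j)
            kept.append((i, j))
    return kept
```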
What I’ve already tried:
Using BCEWithLogitsLoss with pos_weight to handle class imbalance
Limiting negative sampling (3:1 ratio)
Embedding phase and equipment info both as text and as vectors
Reducing batch size and model size (CamemBERT-base instead of large)
Evaluating across different decision thresholds (0.3 to 0.7)
Visualizing predicted edges vs. ground truth
Trying GNN and MLP models: the MLP's results were not great, and a GNN needs edge_index at inference time, which is exactly what we're trying to generate
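Concretely, the imbalance handling from the first two bullets looks roughly like this (a sketch; the 3:1 ratio and pos_weight are what I described, everything else is illustrative):

```python
import random
import torch
import torch.nn as nn

def sample_pairs(edges, n_tasks, neg_ratio=3):
    """Per-gamme 3:1 negative sampling over ordered (i, j) task pairs."""
    positives = list(map(tuple, edges))
    pos_set = set(positives)
    negatives = [(i, j) for i in range(n_tasks) for j in range(n_tasks)
                 if i != j and (i, j) not in pos_set]
    k = min(len(negatives), neg_ratio * len(positives))
    pairs = positives + random.sample(negatives, k)
    labels = torch.tensor([1.0] * len(positives) + [0.0] * k)
    return pairs, labels

# pos_weight upweights the rare positive class in the loss
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))

pairs, labels = sample_pairs([[0, 1], [1, 2]], n_tasks=5)
logits = torch.zeros(len(pairs))  # stand-in for the model's pair logits
loss = loss_fn(logits, labels)
```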
My questions:
Is my dataset sufficient to train such a model, or is the class imbalance too severe / the textual signal too weak?
Would removing the separate embeddings for phase/equipment and relying solely on text help or hurt?
Should I switch to another model architecture?
Are there better strategies for modeling context-aware pairwise dependencies in sequences where order doesn’t imply logic?
Any advice or references would be appreciated. Thanks a lot in advance!