
I'm working on a machine learning project aimed at automatically predicting dependency links between tasks in industrial maintenance procedures, where each group of tasks is called a gamme.

Each gamme consists of a list of textual task descriptions, often grouped by equipment type (e.g., heat exchanger, column, balloon) and work phases (e.g., "to be done before shutdown", "during shutdown", etc.). The goal is to learn which tasks depend on others in a directed dependency graph (precursor → successor), based only on their textual descriptions.

What I’ve built so far:

Model architecture: A custom link prediction model using a CamemBERT-large encoder. For each pair of tasks (i, j) in a gamme, the model predicts whether a dependency i → j exists.

Data format:

Each training sample is a gamme (i.e., a sequence of tasks), represented as:

```json
{
  "lines": ["[PHASE] [equipment] Task description ; DURATION=n", ...],
  "task_ids": [...],
  "edges": [[i, j], ...],   // known dependencies
  "phases": [...],
  "equipment_type": "echangeur"
}
```
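For concreteness, here is a minimal sketch (plain Python; the function name, seed, and default 3:1 ratio are my own choices, the ratio echoing the negative-sampling setup described further down) of turning one such record into labeled (i, j) training pairs:

```python
import random
from itertools import permutations

def build_pairs(gamme, neg_ratio=3, seed=0):
    """Build (i, j, label) training pairs from one gamme record.

    Positive pairs come from the known `edges`; negatives are sampled
    from all other ordered task pairs at `neg_ratio` negatives per positive.
    """
    n = len(gamme["lines"])
    positives = {tuple(e) for e in gamme["edges"]}
    candidates = [p for p in permutations(range(n), 2) if p not in positives]
    rng = random.Random(seed)
    negatives = rng.sample(candidates, min(len(candidates), neg_ratio * len(positives)))
    pairs = [(i, j, 1) for i, j in positives] + [(i, j, 0) for i, j in negatives]
    rng.shuffle(pairs)
    return pairs

# Toy record with 4 tasks and 2 known dependencies:
gamme = {"lines": ["a", "b", "c", "d"], "edges": [[0, 1], [1, 2]]}
pairs = build_pairs(gamme)
```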

Model inputs:

For each task:

  • Tokenized text (via the CamemBERT tokenizer)

  • Phase and equipment type, passed both as text in the input and as learned embeddings

Link prediction, for each (i, j) pair:

  • Extract the [CLS] embeddings plus the phase/equipment embeddings

  • Concatenate them and feed the result into an MLP

  • Binary output: 1 if a dependency is predicted, 0 otherwise
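The pair-scoring head described above can be sketched as follows. This is a NumPy illustration only: the dimensions are assumptions (CamemBERT-large's hidden size is 1024; the phase/equipment embedding sizes and hidden width are invented), the weights are random stand-ins for learned parameters, and random vectors stand in for the encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions (only D_TEXT=1024 matches CamemBERT-large; the rest are made up):
D_TEXT, D_PHASE, D_EQUIP, D_HIDDEN = 1024, 32, 16, 256
D_IN = 2 * (D_TEXT + D_PHASE + D_EQUIP)   # concatenation of both tasks' features

# Random stand-ins for the learned parameters of a 2-layer MLP head.
W1 = rng.normal(0, 0.02, (D_IN, D_HIDDEN))
b1 = np.zeros(D_HIDDEN)
W2 = rng.normal(0, 0.02, (D_HIDDEN, 1))
b2 = np.zeros(1)

def task_features(cls_emb, phase_emb, equip_emb):
    """Per-task feature vector: [CLS] embedding + phase/equipment embeddings."""
    return np.concatenate([cls_emb, phase_emb, equip_emb])

def score_pair(feat_i, feat_j):
    """Probability that a dependency i -> j exists, from concatenated features."""
    x = np.concatenate([feat_i, feat_j])
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    logit = (h @ W2 + b2).item()
    return 1.0 / (1.0 + np.exp(-logit))       # sigmoid

# Random vectors standing in for the encoder outputs of two tasks:
f_i = task_features(rng.normal(size=D_TEXT), rng.normal(size=D_PHASE), rng.normal(size=D_EQUIP))
f_j = task_features(rng.normal(size=D_TEXT), rng.normal(size=D_PHASE), rng.normal(size=D_EQUIP))
p = score_pair(f_i, f_j)
```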

Dataset size:

988 gammes (~30 tasks each on average)

~35,000 positive dependency pairs, ~1.25 million negative ones

Coverage of 13 distinct work phases, 3 equipment types

Many gammes include multiple dependencies per task

Sample from my dataset (Dataset.jsonl):

```json
{
  "gamme_id": "L_echangeur_30",
  "equipment_type": "heat_exchanger",
  "lines": [
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] WORK TO BE DONE BEFORE SHUTDOWN ; DURATION=0",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] INSTALLATION OF RUBBER-LINED PIPING ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] JOINT INSPECTION ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] WORK RECEPTION ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] DISMANTLING OF SCAFFOLDING ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] INSTALLATION OF SCAFFOLDING ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] SCAFFOLDING INSPECTION ; DURATION=1",
    "[WORK TO BE DONE BEFORE SHUTDOWN] [heat_exchanger] MEASUREMENTS BEFORE PREFABRICATION ; DURATION=1",
    [...]
    "[END OF WORK] [heat_exchanger] MILESTONE: END OF WORK ; DURATION=0"
  ],
  "task_ids": [
    "E2010.T1.10", "E2010.T1.100", "E2010.T1.110", "E2010.T1.120", "E2010.T1.130",
    "E2010.T1.20", "E2010.T1.30", "E2010.T1.40", "E2010.T1.45", "E2010.T1.50",
    "E2010.T1.60", "E2010.T1.70", "E2010.T1.80", "E2010.T1.90", "E2010.T1.139"
  ],
  "edges": [
    [0, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12],
    [12, 13], [13, 1], [1, 2], [2, 3], [3, 4], [4, 14]
  ],
  "phases": [
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE BEFORE SHUTDOWN",
    "WORK TO BE DONE DURING SHUTDOWN",
    "WORK TO BE DONE DURING SHUTDOWN",
    [...]
    "END OF WORK"
  ]
}
```

The problem:

Even when evaluating on gammes from the training data itself, the model performs poorly (low precision/recall, wrong structure) and struggles to learn meaningful patterns. Examples of issues:

Predicts dependencies where there shouldn't be any

Fails to capture multi-dependency tasks

Often outputs inconsistent or cyclic graphs
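As a quick sanity check for the cyclic-output issue, a standard DFS coloring pass over the predicted edge list tells you whether a prediction is a valid DAG (stdlib-only sketch; the function name is my own):

```python
def has_cycle(n_tasks, edges):
    """Return True if the directed graph given by `edges` contains a cycle."""
    adj = {i: [] for i in range(n_tasks)}
    for i, j in edges:
        adj[i].append(j)

    WHITE, GRAY, BLACK = 0, 1, 2      # unvisited / on current DFS path / done
    color = [WHITE] * n_tasks

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY:      # back edge to the current path -> cycle
                return True
            if color[v] == WHITE and dfs(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n_tasks))
```

Running this over predicted edge sets gives a cheap metric (fraction of gammes predicted as cyclic) to track alongside precision/recall.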

What I’ve already tried:

Using BCEWithLogitsLoss with pos_weight to handle class imbalance

Limiting negative sampling (3:1 ratio)

Embedding phase and equipment info both as text and as vectors

Reducing batch size and model size (CamemBERT-base instead of large)

Evaluating across different decision thresholds (0.3 to 0.7)

Visualizing predicted edges vs. ground truth

Trying GNN and MLP models: the MLP's results were not great, and a GNN needs edge_index at inference time, which is exactly what we are trying to generate
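For reference on the first item above, `pos_weight` in `BCEWithLogitsLoss` simply upweights the positive term of the loss. A NumPy sketch of the same formula (mirroring PyTorch's definition; the value 35.0 roughly matches the ~35k positive vs. ~1.25M negative pairs in the dataset):

```python
import numpy as np

def weighted_bce_with_logits(logits, targets, pos_weight=1.0):
    """Mean BCE-with-logits with a positive-class weight,
    mirroring torch.nn.BCEWithLogitsLoss(pos_weight=...)."""
    p = 1.0 / (1.0 + np.exp(-logits))         # sigmoid
    loss = -(pos_weight * targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return loss.mean()

logits = np.array([0.0, 2.0, -1.0])
targets = np.array([1.0, 0.0, 1.0])

plain = weighted_bce_with_logits(logits, targets, pos_weight=1.0)
weighted = weighted_bce_with_logits(logits, targets, pos_weight=35.0)   # ~neg/pos ratio
```

Note that combining a large `pos_weight` with 3:1 negative subsampling double-corrects the imbalance, which can push the model toward over-predicting edges.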

My questions:

  • Is my dataset sufficient to train such a model, or is the class imbalance / signal too weak?

  • Would removing the separate embeddings for phase/equipment and relying solely on text help or hurt?

  • Should I switch to another model architecture?

  • Are there better strategies for modeling context-aware pairwise dependencies in sequences where order doesn’t imply logic?

  • Any advice or references would be appreciated. Thanks a lot in advance!

