If I understand you correctly, you are trying to create a kind of visual novel system, where lines of text are accompanied by images.
In that case it would make sense to declare a struct type that contains both the line and (references to the) image(s) you want that line of text to be accompanied by. That way you keep both pieces of information together and can handle them as one piece of data.
Example:
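    using System;
    using UnityEngine;

    public class Dialog : MonoBehaviour
    {
        // Must be public, because the public field below exposes this type.
        [Serializable]
        public struct DialogLine
        {
            public string text;
            public Sprite image;
        }

        public DialogLine[] lines;

        [...]

        void Update()
        {
            // Advance on left click, but never step past the last line.
            if (Input.GetMouseButtonDown(0) && currentLineIndex < lines.Length - 1)
            {
                currentLineIndex++;
                yourUiImage.sprite = lines[currentLineIndex].image;
                yourUiText.text = lines[currentLineIndex].text;
            }
        }
    }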
Due to the [Serializable] attribute, you will get an automatic editor for your lines array in the inspector of this game object. So you can write your dialogue directly in the inspector by adding new elements to that array, entering text and assigning sprite assets via drag and drop.
This might be sufficient for a small game or prototype, or one where the cutscenes remain linear and simple. But if you intend to create a more ambitious project, you might want to consider how to streamline your workflow further, especially when you need more advanced things like branching dialogues, variables embedded in text or programming logic embedded within dialogues. In that case the solution I presented here might quickly turn ugly.
A more extensible solution would be to read the lines from a text file written in a scripting language. You could invent and implement your own language for that (been there, done that, went temporarily insane), but you will probably save a lot of time and sanity by using an existing solution like Ink or Yarn Spinner. Or you could go for a complete kitchen-sink-included visual novel system like Naninovel.
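To make the text-file idea a bit more concrete, here is a minimal sketch of what the very simplest "own format" could look like. Everything in it is an assumption made up for illustration: the "imageName|text" row format, the '#' comment rows, and the names ParsedLine and DialogParser. It is not the syntax of Ink, Yarn Spinner or Naninovel, and it is plain C# without Unity types so it can be tried outside the editor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical plain-data counterpart of DialogLine: the image is referenced
// by name, so the parser does not depend on any Unity types.
public struct ParsedLine
{
    public string ImageName;
    public string Text;
}

public static class DialogParser
{
    // Parses a made-up format with one entry per row: "imageName|text".
    // Blank rows and rows starting with '#' are skipped.
    public static List<ParsedLine> Parse(string source)
    {
        var result = new List<ParsedLine>();
        foreach (string raw in source.Split('\n'))
        {
            // Trim also removes a trailing '\r' from Windows line endings.
            string row = raw.Trim();
            if (row.Length == 0 || row[0] == '#') continue;

            // Split on the first '|' only, so the text itself may contain '|'.
            int separator = row.IndexOf('|');
            if (separator < 0)
                throw new FormatException("Missing '|' in row: " + row);

            result.Add(new ParsedLine
            {
                ImageName = row.Substring(0, separator).Trim(),
                Text = row.Substring(separator + 1).Trim()
            });
        }
        return result;
    }
}
```

In Unity, the source string could come from the text property of a TextAsset, and you would still need some way to map each image name to a Sprite, for example a lookup table filled in the inspector. As soon as you want branching or variables, a format like this needs a lot more machinery, which is the point where the existing tools start to pay off.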