candied_orange

> machine learning software behaviour is unpredictable since the developer cannot control what the software learns from the data.

If you control the model, the data, the input, and the randomness, then the output will be the same every time.
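To make that concrete, here's a minimal sketch in plain Python (the "model" here is just hypothetical sampling logic, not a real LLM): once the seed is fixed, the randomness is controlled, and the output repeats exactly.

```python
import random

def generate(seed, length=8):
    # Stand-in for a model's sampling step: same "model" (this logic),
    # same input (the alphabet), same randomness (the seed) -> same output.
    rng = random.Random(seed)
    return "".join(rng.choice("abcd") for _ in range(length))

# Two runs with the same seed are identical, every time.
print(generate(42) == generate(42))
```

The unpredictability people complain about comes from not controlling those inputs, not from the software being nondeterministic.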

What you don’t have is feature flags. If you tell your code monkey “This LLM is racist! Fix it!” they won't find a racist flag to set to false, or racist code to remove.

Training data that reflects racism, or whatever problem you don’t like, will spread to all the nodes in a way that keeps the code monkey from reaching in and tweaking it. That’s because this kind of programming isn’t optimized for manual tweaking. It’s optimized to reflect the training data. You fix it with better data.
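A toy sketch of what "spread to all the nodes" means, assuming a hypothetical one-weight logistic model and made-up data: the unwanted behaviour lives in the learned numbers, there's no flag to flip, and only retraining on better data moves them.

```python
import math
import random

def train(data, steps=2000, lr=0.5):
    # One-feature logistic model: p(bad) = sigmoid(w*x + b).
    # The behaviour is smeared across w and b together; nothing here
    # is a "racist flag" you can set to false. Retraining is the fix.
    w, b = 0.0, 0.0
    for _ in range(steps):
        x, y = random.choice(data)
        p = 1 / (1 + math.exp(-(w * x + b)))
        w -= lr * (p - y) * x
        b -= lr * (p - y)
    return w, b

# Hypothetical data: x encodes some group attribute, y the label.
biased   = [(1, 1), (0, 0)] * 50                  # attribute predicts label
balanced = [(1, 1), (1, 0), (0, 1), (0, 0)] * 25  # attribute is irrelevant

random.seed(0)
w_biased, _ = train(biased)
random.seed(0)
w_fair, _ = train(balanced)
# Biased data drives the weight far from zero; better data pulls it back.
print(abs(w_biased), abs(w_fair))
```

Scale that up to billions of weights and you see why nobody can "reach in and tweak it" by hand.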

If you don’t want your kids to swear, don’t swear in front of your kids.

Or you can teach it what swearing is and when it’s inappropriate. It still won’t show up as a feature flag. It’s just more data.

They can also massage your input and censor the output to attempt to sanitize it. But once the problem behavior is learned it’s always there, waiting for a new way to sneak out.
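That kind of massaging is just a surface filter, something like this hypothetical blocklist wrapper: it masks known bad output without touching what the model actually learned.

```python
def sanitize(text, blocklist=("darn",)):
    # Surface filter: masks the words we know about. Anything the
    # model learned that isn't on the list sneaks straight through.
    for word in blocklist:
        text = text.replace(word, "*" * len(word))
    return text

def guarded_model(prompt, model):
    # Filter on the way in and on the way out; the model in the
    # middle is unchanged, problem behaviour and all.
    return sanitize(model(sanitize(prompt)))

print(sanitize("darn it"))  # the blocklisted word gets masked
```

The filter only catches what you anticipated; the learned behaviour underneath is still there.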
