Revisions to How to best manage multilingual information in documents in JSON/MongoDB/Mongoose?

added 212 characters in body; deleted 42 characters in body

edited May 12, 2021 at 10:13


Comparison of both formats

I understand that you have documents in which you have an English version of the content, and potentially several other language versions of the same content:

The first format has the advantage of always having an English version. The fact that it’s English is implicit. The drawback is that it requires you to handle English differently from the other languages, which means extra coding.
The second format is more flexible: all the languages are handled equally. You retrieve English exactly like any other language, and it’s up to you to fall back to English if the preferred language is not available. (Depending on your needs, you may consider marking the original language in the “header”.)
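To make the difference concrete, here is a minimal TypeScript sketch of both formats. The exact schemas from the question are not reproduced here, so the shapes are assumptions: in the first format, English sits in top-level `title`/`description`/`content` fields with the other languages in a `translations` array; in the second, every language, English included, is an array element tagged with a `language` field. The helpers `getFormat1` and `getFormat2` are hypothetical.

```typescript
// Hypothetical document shapes for the two formats; only title, description,
// content and language come from the discussion, the rest is assumed.

interface LocalizedContent {
  title: string;
  description: string;
  content: string;
}

interface TaggedContent extends LocalizedContent {
  language: string; // e.g. "EN", "DE"
}

// Format 1: English is implicit at the top level, other languages in an array.
interface Format1Doc extends LocalizedContent {
  slug: string;
  translations: TaggedContent[];
}

// Format 2: every language, English included, is an element of the array.
interface Format2Doc {
  slug: string;
  translation: TaggedContent[];
}

// Format 1 needs a special case for English, which lives elsewhere.
function getFormat1(doc: Format1Doc, lang: string): LocalizedContent {
  const english: LocalizedContent = {
    title: doc.title,
    description: doc.description,
    content: doc.content,
  };
  if (lang === "EN") return english;
  return doc.translations.find((t) => t.language === lang) ?? english;
}

// Format 2 handles all languages alike; the fallback is just a second lookup.
function getFormat2(
  doc: Format2Doc,
  lang: string,
  fallback = "EN"
): LocalizedContent | undefined {
  return (
    doc.translation.find((t) => t.language === lang) ??
    doc.translation.find((t) => t.language === fallback)
  );
}

const doc2: Format2Doc = {
  slug: "hello",
  translation: [
    { language: "EN", title: "Hello", description: "", content: "" },
    { language: "DE", title: "Hallo", description: "", content: "" },
  ],
};

console.log(getFormat2(doc2, "DE")?.title); // Hallo
console.log(getFormat2(doc2, "FR")?.title); // Hello (falls back to EN)
```

The `if (lang === "EN")` branch in `getFormat1` is the extra coding mentioned above; `getFormat2` only needs a second lookup for the fallback.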

From a performance point of view, there won’t be a significant difference. With MongoDB you’d load the whole document anyway, so it’s the same amount of data (perhaps one extra “language”:“EN”, but this seems really marginal). And with MongoDB’s internal use of BSON, that won’t really make a difference.

A third way?

What could affect performance is the array: you’ll have to iterate through the successive elements and test whether each one is the right language. This is true for both formats. In the second format, you can put English as the first element to avoid any relative performance impact compared to the first format.
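A minimal sketch of that sequential lookup, assuming array elements tagged with a `language` field. The `findLanguage` helper and its comparison counter are illustrative only; they show why putting English first makes the English case cost a single test, and that a missing language always costs a full scan.

```typescript
// Sketch: finding a language in an array is a sequential scan.
// The comparison counter only exists to make that cost visible.

interface Translation {
  language: string; // e.g. "EN", "DE"
  title: string;
}

function findLanguage(
  translations: Translation[],
  lang: string
): { match: Translation | undefined; comparisons: number } {
  let comparisons = 0;
  for (const t of translations) {
    comparisons++;
    if (t.language === lang) {
      return { match: t, comparisons };
    }
  }
  // Not found: every element has been tested.
  return { match: undefined, comparisons };
}

const englishFirst: Translation[] = [
  { language: "EN", title: "Hello" },
  { language: "DE", title: "Hallo" },
  { language: "FR", title: "Bonjour" },
];
const englishLast: Translation[] = [...englishFirst.slice(1), englishFirst[0]];

console.log(findLanguage(englishFirst, "EN").comparisons); // 1
console.log(findLanguage(englishLast, "EN").comparisons); // 3
console.log(findLanguage(englishFirst, "IT").comparisons); // 3 (missing: full scan)
```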

If you need to select by language frequently, however, you may consider replacing the array with an object, where each language would be a member name and the value would be the translated/localized content:

[
  {
      "_id": "",
      "image": "",
      "slug": "",
      "translation": {
        "EN":{            
          "title": "",
          "description": "",
          "content": ""
        },

        "DE":{            
          "title": "",
          "description": "",
          "content": ""
        }
        …

      }
  }
]

While this may seem less intuitive, you’d benefit from optimized access to the members/languages (dictionary-level performance for members vs. sequential array access; see for example here for JS).
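A possible TypeScript sketch of working with the keyed `translation` shape, assuming "EN" as the fallback language. `getTranslation` and `toTranslationMap` are hypothetical helpers; the latter shows how documents still in the array format could be converted, for example during a migration.

```typescript
// Sketch of the keyed "translation" shape from the JSON above.
// The fallback language ("EN") and the helper names are assumptions.

interface LocalizedContent {
  title: string;
  description: string;
  content: string;
}

// Array element shape used by the second format (assumed).
interface ArrayTranslation extends LocalizedContent {
  language: string;
}

type TranslationMap = Record<string, LocalizedContent>;

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Direct member lookup instead of a scan; only own members are considered,
// so a key such as "constructor" is never picked up from the prototype.
function getTranslation(
  translation: TranslationMap,
  lang: string,
  fallback = "EN"
): LocalizedContent | undefined {
  if (hasOwn(translation, lang)) return translation[lang];
  if (hasOwn(translation, fallback)) return translation[fallback];
  return undefined;
}

// Converts the array format into the keyed format, rejecting duplicates.
function toTranslationMap(items: ArrayTranslation[]): TranslationMap {
  const map: TranslationMap = {};
  for (const { language, ...rest } of items) {
    if (hasOwn(map, language)) {
      throw new Error(`Duplicate translation for language "${language}"`);
    }
    map[language] = rest;
  }
  return map;
}

const translation = toTranslationMap([
  { language: "EN", title: "Hello", description: "", content: "" },
  { language: "DE", title: "Hallo", description: "", content: "" },
]);

console.log(getTranslation(translation, "DE")?.title); // Hallo
console.log(getTranslation(translation, "FR")?.title); // Hello (falls back to EN)
```

Checking own members with `hasOwnProperty` matters when the language code comes from user input, since a plain object also inherits members such as `constructor`.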

In the end, whatever my advice, so many factors affect performance that you’ll need to do some profiling/benchmarking to validate these hypotheses.
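As a starting point for such benchmarking, here is a rough micro-benchmark sketch in plain TypeScript comparing an array scan with a member lookup. The language codes, the count of 20 languages and the iteration count are arbitrary assumptions, and it measures only in-memory access in the application, not MongoDB queries or network transfer.

```typescript
// Rough micro-benchmark sketch: scanning an array vs. reading an object member.
// Timings depend on the runtime, data size and access pattern, so treat the
// numbers as input for your own profiling, not as a general conclusion.

interface Translation {
  language: string;
  title: string;
}

function makeData(languageCount: number) {
  // Hypothetical language codes "L0", "L1", ...
  const codes = Array.from({ length: languageCount }, (_, i) => `L${i}`);
  const asArray: Translation[] = codes.map((language) => ({
    language,
    title: `title ${language}`,
  }));
  const asObject: Record<string, Translation> = {};
  for (const t of asArray) asObject[t.language] = t;
  return { codes, asArray, asObject };
}

function timeIt(label: string, iterations: number, lookup: (i: number) => unknown): number {
  let hits = 0; // consumes the results so the lookups are not trivially dead code
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    if (lookup(i) !== undefined) hits++;
  }
  const elapsedMs = Date.now() - start;
  console.log(`${label}: ${elapsedMs} ms (${hits} hits)`);
  return elapsedMs;
}

const { codes, asArray, asObject } = makeData(20);
const iterations = 1_000_000;

const arrayMs = timeIt("array scan", iterations, (i) =>
  asArray.find((t) => t.language === codes[i % codes.length])
);
const objectMs = timeIt("object member", iterations, (i) => asObject[codes[i % codes.length]]);
```

Run it several times and with realistic data: JIT warm-up and the number of languages per document can easily change the picture.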

deleted 97 characters in body

edited May 12, 2021 at 10:02

Christophe


answered May 12, 2021 at 9:51

Christophe

