Revisions to When to use a dictionary vs tuple in Python

added 1 character in body

edited Nov 20, 2017 at 15:56

{"filename": "blabla", "size": 123}, or just ("blabla", 123)

This is the age-old question of whether to encode your format / schema in-band or out-of-band.

You trade off some memory to get the readability and portability that comes from expressing the format of the data right in the data. If you don't do this, the knowledge that the first field is the file name and the second is the size has to be kept elsewhere. That saves memory but it costs readability and portability. Which is going to cost your company more money?
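To make the trade-off concrete, here is a minimal sketch of the same record stored both ways. The index constants `FILENAME` and `SIZE` are hypothetical names standing in for "knowledge kept elsewhere", and the byte counts are shallow container sizes that vary by Python version and platform.

```python
import sys

# In-band: the field names travel with the data.
record_dict = {"filename": "blabla", "size": 123}

# Out-of-band: the meaning of each position lives somewhere else,
# here in two hypothetical index constants.
FILENAME, SIZE = 0, 1
record_tuple = ("blabla", 123)

# Both hold the same information; only the place the schema lives differs.
same_name = record_dict["filename"] == record_tuple[FILENAME]
same_size = record_dict["size"] == record_tuple[SIZE]

# Shallow sizes of the containers only (not the strings and ints inside).
dict_bytes = sys.getsizeof(record_dict)
tuple_bytes = sys.getsizeof(record_tuple)
print(dict_bytes, tuple_bytes)
```

On CPython the tuple should come out smaller, which is the memory you are trading for self-describing data.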

As for the immutable issue, remember immutable doesn't mean useless in the face of change. It means we need to grab more memory, make the change in a copy, and use the new copy. That's not free but it's often not a deal breaker. We use immutable strings for changing things all the time.
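A small sketch of what "make the change in a copy" looks like in practice, assuming the record is the two-field tuple from the question. The new size `456` is just an illustrative value.

```python
record = ("blabla", 123)

# Tuples cannot be modified in place, so build a new tuple
# that reuses the old filename and carries the new size.
updated = (record[0], 456)

# Strings behave the same way: replace() returns a new string
# and leaves the original untouched.
name = "blabla"
renamed = name.replace("bla", "foo", 1)

print(record, updated)
print(name, renamed)
```

The original values are still there afterwards; the cost is the extra allocation, not lost functionality.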

Another consideration is extensibility. When you store data only positionally, without encoding format information, then you're condemned to only single inheritance, which really is nothing but the practice of concatenating additional fields after the established fields. I can define a 3rd field to be the creation date and still be compatible with your format since I define first and second the same way.
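As a rough illustration of that positional "single inheritance", the sketch below appends a third field and shows that a consumer written for the two-field layout keeps working. The function `describe` and the date string are hypothetical, chosen only for the example.

```python
def describe(record):
    # A consumer written against the original two-field layout:
    # position 0 is the filename, position 1 is the size.
    filename, size = record[0], record[1]
    return f"{filename} ({size} bytes)"

original = ("blabla", 123)

# Extend the format by concatenating a third field (creation date)
# after the established ones.
extended = original + ("2017-11-20",)

print(describe(original))
print(describe(extended))
```

This only works because the new field went at the end; inserting it anywhere else would break every existing reader.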

However, what I can't do is bring together two independently defined formats that have some overlapping fields, some not, store them in one format, and have it be useful to things that only know about one format or the other.

To do that I need to encode the format info from the beginning. I need to say "this field is the filename". Doing that allows for multiple inheritance.
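Here is a minimal sketch of that idea with dictionaries. The two formats, the `duration_s` field, and the two consumer functions are all hypothetical; the point is only that each consumer reads the keys it knows and ignores the rest.

```python
# Two independently defined formats that overlap on "filename".
file_info = {"filename": "blabla", "size": 123}
media_info = {"filename": "blabla", "duration_s": 42}

# Because every field is named, the two can be stored as one record.
combined = {**file_info, **media_info}

def report_size(record):
    # Only knows about the file format.
    return f"{record['filename']}: {record['size']} bytes"

def report_duration(record):
    # Only knows about the media format.
    return f"{record['filename']}: {record['duration_s']} s"

print(report_size(combined))
print(report_duration(combined))
```

With bare tuples the two layouts would disagree about what the second position means, and there would be no way for both consumers to read the merged record.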

You're probably used to inheritance only being expressed in the context of objects but the same ideas work for data formats because, well, objects are stored in data formats. It's exactly the same problem.

So use whichever you think you're most likely to need. I reach for flexibility unless I can point to a good reason not to.

edited body

edited Nov 20, 2017 at 14:32

candied_orange

{"filename": "blabla", "size": 123}, or just ("blabla", 123)

This is the age-old question of whether to encode your format / schema in-band or out-of-band.

You trade off some memory to get the readability and portability that comes from expressing the format of the data right in the data. If you don't do this the knowledge that the first field is the file name and the second is the size has to be kept elsewhere. That saves memory but it costs readability and portability. Which is going to cost your company more money?

As of the immutable issue, remember immutable doesn't mean useless in the face of change. It means we need to grab more memory, make the change in a copy, and use the new copy. That's not free but it's often not a deal breaker. We use immutable strings for changing things all the time.

Another consideration is extensibility. When you store data only positionally, without encoding format information, then you're condemned to only single inheritance, which really is nothing but the practice of concatenating additional fields after the established fields. I can define a 3rd field to be the creation date and still be compatible with your format since I define first and second the same way.

However, what I can't do is bring together two independently defined formats that have some overlapping fields, some not, store them in one format, and have it be useful to things that only know about one format or the other.

To do that I need to encode the format info from the beginning. I need to say "this field is the filename". Doing that allows for multiple inheritance.

You're probably used to inheritance only being expressed in the context of objects but the same ideas work for data formats because, well, objects are stored in data formats. It's exactly the same problem.

So use whichever you think you're most likely to need. I reach for flexibility unless I can point to a good reason not to.
