How does sorting for arrays of arrays differ from multidimensional arrays in awk?

Question

I am working on a problem in awk: listing a set of items, each of which has components, which in turn have properties.

I have tried to approach the problem in two ways.

1) Define an array list[item-number,component-number][properties].
2) Define an array list[item-number][component-number][properties].

This was interesting in many ways, as I noticed that (2) maintains the order of insertion while (1) does not. I know arrays are associative in awk, and it could very well be a coincidence that this happened. However, as the order of insertion is important in my case (and I also want to learn more about awk), I would like to know if this is what is happening and why.

Any ideas? BR Patrik

Ed Morton · Accepted Answer · 2020-03-26 14:26:23Z

5

Neither approach retains any information about the order of insertion; if it seems like either does, that is just coincidence. If the order of insertion is important to you, then you need to write some code to track that order, e.g.:

key = foo FS bar
if ( !(key in list) ) {
    keys[++numKeys] = key
}
list[key] = whatever

would give you an array keys[] of the indices in the order they were inserted and an array list[] that maps each key to its value, so you can later do:

for (keyNr=1; keyNr<=numKeys; keyNr++) {
    key = keys[keyNr]
    print list[key]
}

or similar to print the contents of list[] in the order they were inserted.
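
Putting the two fragments above together, here is a minimal, self-contained sketch run from a shell. The sample input, the field layout (item, component, property), and the function name print_in_insertion_order are assumptions made for illustration; they are not part of the original question.

```shell
#!/bin/sh
# Hypothetical helper: reads "item component property" lines on stdin and
# prints each (item, component) pair once, in first-seen order.
print_in_insertion_order() {
    awk '
    {
        key = $1 FS $2                 # composite key: item and component
        if ( !(key in list) ) {
            keys[++numKeys] = key      # remember when the key first appeared
        }
        list[key] = $3                 # a later line overwrites the value
    }
    END {
        for (keyNr = 1; keyNr <= numKeys; keyNr++) {
            key = keys[keyNr]
            print key " -> " list[key]
        }
    }'
}

# Made-up sample data: "item2 compB" arrives first and is updated later.
printf '%s\n' 'item2 compB red' 'item1 compA blue' 'item2 compB green' |
    print_in_insertion_order
```

With this input the pair "item2 compB" is printed before "item1 compA", because keys[] records first appearance, while the printed value is the most recent one stored in list[].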

edited Mar 26, 2020 at 14:26

answered Mar 26, 2020 at 14:19

Ed Morton


5 Comments

patrik Over a year ago

I think I will try to use this approach for the order. Unfortunately, this seems to be the Achilles heel of awk. Thanks for the tip!

kvantour Over a year ago

Dear Ed, why do you prefer FS over SUBSEP?

Ed Morton Over a year ago

@patrik you're welcome, but it's not an Achilles heel at all. Why track insertion order and use up time and memory for everyone when very few applications need it and it's trivial to write code to support it if/when you do want it? Awk is a tiny language that executes extremely fast (typically faster than equivalent C programs) due to its philosophy of only providing language constructs to do things that are difficult to do with existing constructs, and this isn't close to being difficult to do. So it's a "pro", not a "con".

Ed Morton Over a year ago

@kvantour because, though unlikely, SUBSEP can appear in your input fields while the default FS can't. It's also a couple of chars shorter to type and when you want to print the array indices a blank or other string is easier to see than SUBSEP. If you're using a regexp for FS then the equation changes and then I'd consider using OFS vs SUBSEP.
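
To make the FS versus SUBSEP trade-off concrete, the sketch below builds the same two-part key both ways and splits it back into its parts. It assumes the default FS (a single blank) and the default SUBSEP; the function name show_key_styles and the sample values are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds one composite key with FS and one with SUBSEP,
# then recovers the parts of each with split().
show_key_styles() {
    awk 'BEGIN {
        fsKey  = "item1" FS "compA"        # readable when printed
        subKey = "item1" SUBSEP "compA"    # joined by a control character

        n = split(fsKey, parts, FS)        # a blank FS splits on whitespace
        print n ": " parts[1] " / " parts[2] " (key: " fsKey ")"

        n = split(subKey, parts, SUBSEP)
        print n ": " parts[1] " / " parts[2]

        # A comma-separated subscript is joined with SUBSEP internally,
        # so list[a,b] and list[a SUBSEP b] name the same element.
        list["item1", "compA"] = "blue"
        if (subKey in list) print "comma subscript uses SUBSEP: yes"
    }'
}

show_key_styles
```

The last check also shows why approach (1) in the question is still a one-dimensional array: list[item,component] is just a single string key.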

kvantour Over a year ago

Also important to mention: if the keys are generated from floating-point numbers, you might want to adjust CONVFMT.
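
A small sketch of why CONVFMT matters here: a non-integer number used as a subscript is first converted to a string using CONVFMT, so two slightly different values can collapse into one key. The function name count_float_keys and the chosen values are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: uses two nearly equal floating-point values as array
# keys under the CONVFMT passed as $1 and prints the number of distinct keys.
count_float_keys() {
    awk -v CONVFMT="$1" 'BEGIN {
        a = 0.1 + 0.2      # not exactly 0.3 in binary floating point
        b = 3 / 10         # the closest double to 0.3
        seen[a] = 1        # non-integer subscripts are stringified via CONVFMT
        seen[b] = 1
        n = 0
        for (k in seen) n++
        print n
    }'
}

count_float_keys '%.6g'     # the default CONVFMT: both values become "0.3"
count_float_keys '%.17g'    # enough digits to tell the two values apart
```

With the default format the two values share one key; with 17 significant digits they become two separate keys. Which behaviour is wanted depends on the application.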
