0

All,

I've recently picked up Python and am currently learning to work with lists. I'm using a test file containing several lines of tab-separated characters and piping it into my Python program. The aim of the script is to insert each line into a list using its length as the index, which I expected would leave the list automatically sorted. I am considering only the most basic case and am not concerned about any complex cases.

My Python code is below:

import sys

newList = []

for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList.insert(size, data)
for i in range(len(newList)):
    print ( newList[i])

My 'test' file is below:

2   2   2   2
1
3   2
2   3   3   3   3
3   3   3

I expect the script to print the contents of the list in the following order, sorted by length:

['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']

However, when I pass my test file to the script, I get the following:

cat test | ./listSort.py 
['2', '2', '2', '2']
['1']
['3', '2']
['3', '3', '3']
['2', '3', '3', '3', '3']

The first line of the output, ['2', '2', '2', '2'], is in the wrong place. I'm trying to figure out why it isn't printed on the 4th line: it has length 4, so I expected it to be inserted at index 4 of the list. Could someone please explain why this happens? My understanding is that I am inserting each 'data' into the list using 'size' as the index, so when I print the contents of the list they should come out in sorted order.

Thanks in advance!

4
  • 3
    Try to replay your algorithm using pen & paper and you will see why the result is wrong. Commented Oct 12, 2017 at 10:52
  • 1
    Also note the "useless use of cat": cat filename | program is the same as program <filename. Commented Oct 12, 2017 at 10:54
  • 2
    Or if your understanding of lists is wrong, then using pen & paper might not help you... The thing is that if you have a list of length N then inserting something at index n > N will just append it to the end of the list. E.g. inserting x into an empty list (N = 0) at index "4" still just results in a list [x], not something like [–, –, –, –, x]. Commented Oct 12, 2017 at 11:02
  • Thank you very much, that explained it better. My understanding of lists was incorrect. Commented Oct 12, 2017 at 11:10

2 Answers

4

Inserting into lists works quite differently from what you think:

>>> newList = []
>>> newList.insert(4, 4)
>>> newList
[4]
>>> newList.insert(1, 1)
>>> newList
[4, 1]
>>> newList.insert(2, 2)
>>> newList
[4, 1, 2]
>>> newList.insert(5, 5)
>>> newList
[4, 1, 2, 5]
>>> newList.insert(3, 3)
>>> newList
[4, 1, 2, 3, 5]
>>> newList.insert(0, 0)
>>> newList
[0, 4, 1, 2, 3, 5]

Hopefully you can see two things from this example:

  • The list indices are 0-based. That is to say, the first entry has index 0, the second has index 1, etc.
  • list.insert(idx, val) inserts val at the position that currently has index idx, and shifts the item at that position and everything after it one place towards the end. If idx is greater than or equal to the current length of the list, the new item is silently appended in the last position.
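
To see how these two rules produce the output in the question, here is a small sketch that replays the same insert calls on the rows of the test file. The rows are hard-coded instead of being read from stdin, and the names rows, new_list and lengths are only illustrative.

```python
# Rows from the question's test file, in file order.
rows = [
    ['2', '2', '2', '2'],
    ['1'],
    ['3', '2'],
    ['2', '3', '3', '3', '3'],
    ['3', '3', '3'],
]

new_list = []
for data in rows:
    # An index past the end is clamped to len(new_list),
    # so such an insert() behaves like append().
    new_list.insert(len(data), data)
    # Show the row lengths currently stored, in list order.
    print(len(data), [len(row) for row in new_list])

lengths = [len(row) for row in new_list]
print(lengths)  # [4, 1, 2, 3, 5]
```

The first four inserts all use an index at or past the end of the list, so they simply append. Only the last row (length 3) lands in the middle, which is why the length-4 row stays in front.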

There are several ways to implement the functionality you want:

  1. If you can predict the length of the longest line (the number of tab-separated fields), you can allocate the list beforehand and simply assign to its elements instead of inserting:

    import sys

    newList = [None] * 5
    
    for line in sys.stdin:
        data = line.strip().split('\t')
        size = len(data)
        newList[size - 1] = data
    for i in range(len(newList)):
        print(newList[i])
    

    If you can only predict a reasonable upper bound on the line length, you can still do this, but you need some way to remove the None entries afterwards. Note that two lines of the same length will overwrite each other.

  2. Use a dictionary:

    import sys

    newList = {}
    
    for line in sys.stdin:
        data = line.strip().split('\t')
        newList[len(data)] = data
    # Walk the keys in sorted order; they need not be contiguous.
    for size in sorted(newList):
        print(newList[size])
    
  3. Add elements to the list as necessary, which is probably a little bit more involved:

    newList = []
    
    for line in sys.stdin:
        data = line.strip().split('\t')
        size = len(data)
        if len(newList) < size:
            newList.extend([None] * (size - len(newList)))
        newList[size - 1] = data
    for i in range(len(newList)):
        print ( newList[i])
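
Options 1 and 3 can leave None placeholders in the list whenever some lengths never occur. A minimal sketch of one way to drop them afterwards is below; drop_placeholders is a hypothetical helper name, the upper bound of 8 is arbitrary, and the rows are hard-coded rather than read from stdin.

```python
def drop_placeholders(slots):
    """Return the filled slots in order, skipping the None placeholders."""
    return [row for row in slots if row is not None]


# Preallocate more slots than needed (8 is an arbitrary upper bound).
slots = [None] * 8
for data in (['2', '2', '2', '2'], ['1'], ['3', '2']):
    # A row with n fields goes into slot n - 1.
    slots[len(data) - 1] = data

result = drop_placeholders(slots)
for row in result:
    print(row)
```

Comparing with "is not None" rather than relying on truthiness keeps an empty row ([]) from being dropped by accident.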
    

1

I believe I've figured out the answer to my question, thanks to mkrieger1. I append each line to the list and then sort the list once, using the length as the key:

import sys

newList = []

for line in sys.stdin:
    data = line.strip().split('\t')
    newList.append(data)
newList.sort(key=len)
for i in range(len(newList)):
    print (newList[i])

I got the output I wanted:

./listSort.py < test
['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']
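
For reference, the same append-then-sort idea can be sketched without stdin. This version uses sorted(), which returns a new list instead of sorting in place. The extra row ['9', '9'] is not in the test file; it is an assumption added only to show what happens when two rows have the same length.

```python
# Rows from the test file, plus one made-up row of length 2.
rows = [
    ['2', '2', '2', '2'],
    ['1'],
    ['3', '2'],
    ['2', '3', '3', '3', '3'],
    ['3', '3', '3'],
    ['9', '9'],  # hypothetical extra row, not in the original test file
]

# key=len compares rows by their number of fields only.
by_length = sorted(rows, key=len)
for row in by_length:
    print(row)
```

Python's sort is stable, so rows of equal length keep their original relative order: ['3', '2'] still comes before ['9', '9'].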

2 Comments

You don't need to sort the list every time, only after you've appended all the data.
Yep, I accidentally indented the sort. Edited the answer so that the sort occurs after appending all the data.
