Python Sorting

the-coder · November 8, 2020

Python's built-in sort uses Timsort. Say you have the following files and you want to sort them.

1.txt
2.txt
3.txt
4.txt
5.txt
...
...
10.txt
11.txt
12.txt

If you sort the list with Python's default sort function, the result is the following.

>>> import os
>>> flist = os.listdir('.')
>>> flist.sort()
>>> flist
1.txt
10.txt
11.txt
12.txt
2.txt
3.txt
...
9.txt

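This happens because file names are strings, and strings are compared character by character according to each character's position in the ASCII table. In other words, they're sorted by comparing ASCII codes (Unicode code points in Python 3, which are the same as ASCII for these characters). Since 1 comes before 2 in the table, 10.txt lands ahead of 2.txt. See an ASCII table for the exact codes.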
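To see the character-by-character comparison for yourself, here is a small sketch. The three file names are just a sample picked for illustration, not the full listing above.

```python
# Strings are compared character by character, using each character's code.
names = ["1.txt", "2.txt", "10.txt"]

# '1' has code 49 and '2' has code 50, so names starting with '1' sort first.
code_of_1 = ord("1")
code_of_2 = ord("2")

# "10.txt" vs "2.txt" is decided at the first character: '1' < '2'.
ten_before_two = "10.txt" < "2.txt"

# "1.txt" vs "10.txt" is decided at the second character: '.' (46) < '0' (48).
one_before_ten = "1.txt" < "10.txt"

default_order = sorted(names)
print(code_of_1, code_of_2, ten_before_two, one_before_ten)
print(default_order)
```

The comparison never looks at the number as a whole, only at one character at a time, which is why 10 can end up in front of 2.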

It makes sense for the computer to sort it this way, but it doesn't make sense for a human, since 10.txt definitely comes after 2.txt and the like. The result doesn't feel natural or sequential (hence the name "natural sort" for the ordering we actually want). So if we want to sort naturally, we have to pass some parameters to the sort() call.

Help on built-in function sort:

sort(...)
    L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
    cmp(x, y) -> -1, 0, 1

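Going by the help text above (it comes from Python 2; Python 3 dropped cmp and accepts only key and reverse), the parameter to use is "key". It takes a function that is called on each element, and the list is then sorted by the values that function returns rather than by the elements themselves.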
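Before applying key to file names, here is a minimal sketch of what it does on its own. The word list is made up for illustration, and str.lower stands in for whatever key function you need.

```python
words = ["banana", "Cherry", "apple"]

# Without a key, uppercase letters sort before lowercase ones (ASCII order).
plain = sorted(words)

# With key=str.lower, elements are compared by their lowercased form,
# but the original strings are what end up in the result.
by_lower = sorted(words, key=str.lower)

print(plain)     # ['Cherry', 'apple', 'banana']
print(by_lower)  # ['apple', 'banana', 'Cherry']
```

The key function only changes what gets compared. The list still holds the original values afterwards.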

To sort the text files correctly from 1.txt to 12.txt, all you need to do is take the number out of each file name and pass it as the key to the sort call.

numlist = [int(x[:x.index('.')]) for x in flist]

This loops through flist using x as the loop variable, slices x from the beginning up to the first ".", and converts that slice to an int. The results are collected in a list. This is just to demonstrate how we can get the numbers out of a list of file names. To actually sort by them, wrap the same expression in a lambda and pass it as key:

flist.sort(key=lambda x: int(x[:x.index('.')]))
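The lambda above assumes every name is a number followed by a dot. It raises ValueError on a name like notes.txt, or on one with no dot at all. If your directory is messier than that, one possible sketch of a more general natural sort key is below. natural_key is a hypothetical helper name, not something from the standard library, and the file list is hard-coded here instead of coming from os.listdir.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks, e.g. 'img12.png' -> ['img', 12, '.png']."""
    # With a capturing group, re.split keeps the digit runs, so the result
    # alternates text, digits, text, ... and always starts with a text chunk
    # (an empty string when the name begins with a digit).
    parts = re.split(r"(\d+)", name)
    # Odd positions are the digit runs: compare those as integers.
    # Even positions are text: compare those case-insensitively.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

flist = ["10.txt", "2.txt", "1.txt", "12.txt", "11.txt", "9.txt"]
flist.sort(key=natural_key)
print(flist)  # ['1.txt', '2.txt', '9.txt', '10.txt', '11.txt', '12.txt']
```

Because text and number chunks always land in the same positions, two keys never end up comparing a str against an int, so names like img2.png and 10.txt can live in the same list.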

Now let's think about it for a moment. What kind of sane person would want ASCII ordering? It doesn't make sense for anyone to sort their files or directories that way. And besides, it's not the 90s anymore, when programming languages were arcane and esoteric. Yet the default string sort in almost every programming language is still an ASCII sort.

References AKA good reads on this topic

(CodingHorror) Sorting For Humans: Natural Sort Order
ASCIIbetical != Alphabetical
Ascii Table
