I had a discussion with a colleague on how to persist a sorted (linked?) list in MySQL, and would like to write down my thoughts on this since this is likely a problem I will face in the future. There must be a better way of thinking about this than I’ve come up with. If you have an idea, don’t hesitate to send me an email.
Problem description
We have a sorted list of a generic length. It should be possible to reorder the items in a generic fashion (e.g you can drag and drop the items to change the list order from the GUI). This means supporting both swapping items and inserting items between two existing items. Then list should also gracefully handle growing and shrinking.
As you can hear from the description, this sounds like a linked list. This is a trivial data structure to implement in a programming language, but it does not map well to a RDBMS. It seems to map more gracefully to a NoSQL structure, but to my limited knowledge that would require resaving the whole list every time you do a change, which might not scale well.
Ideas
I don’t have many ideas, but these are the thoughts I’ve had. I hope to add more here in the future, since you can basically summarize it as “I don’t have a good idea yet”, and I feel that there must be a better solution out there.
Idea 1: Integer-order column
The simplest solution, which probably comes immediately when thinking of the problem above is creating a column with a sort index. E.g. with three items [{“One”, “Two”, “Three”}], the table looks like this:
Data | Sort order |
---|---|
One | 0 |
Two | 1 |
Three | 2 |
Thoughts:
- Appending to the end is easy, you can simply just add previous +1 (-> 3, 4, 5).
- Inserting before the first object is just as easy if you are using a signed integer, you simply walk backwards (-> -1, -2, -3).
Inserting between two items is more problematic.
Inserting a value between One and Two causes us having to chose a direction (negative or positive) and change all values in that direction by one value.
To avoid doing this too often, a good idea would be not inserting them right next to each other. E.g. inserting them as 0, 1000, 2000, … .
When inserting value c
between two values a
and b
you insert it at (a+b)/2
, keeping the space between the elements.
If you do this, you will still need to be on watch for when b = a + 1
,
and in that case run a “fragmentation”, where you spread out the items in a range around the high density location.
Unsolved problems:
- When should a “fragmentation” be run? As a maintenance job or when inserting? At what density? And how many items should join the fragmentation?
- What is a good initial distance between items?
Idea 1.5: Floating-order column
A variant on the above solution, but instead of using integers, we use a floating point.
I have actually seen this used in a large application, which surprised me.
Maybe it’s better since it’s more “natural” to give items the numbers 1, 2, 3, etc and then doing the same c=(1+2)/2
function?
At a first glance, this seems “more natural” compared to how humans think, but it also introduces unnecessary complexity.
You have to know the implementation details of the floating point value in the database to be able to measure density in a table location.
And you have to make sure how the value behave when comparing them.
And you need a reliable way of knowing when c=(a+b)/2
becomes a value which the database will se as equal to a
or b
.
My thought on this idea is that it’s probably a worse way of doing Idea 1, since it adds complexity but does not seem to solve any of its issues.
Idea 2: A linked list
Note that this is probably a very bad idea, but in theory, and depending on how you want to use it, it should actually be possible to persist the list as a linked list in the database. For example:
Id | Data | Previous | Next |
---|---|---|---|
1 | One | 0 | 1 |
2 | Two | 1 | 3 |
3 | Three | 2 | 0 |
Obviously, this moves all problems from inserting (which is now very simple) to reading. Sadly, this is untenable, since in all systems I’ve coded in, writing is much much less common than reading. You will probably need to load the whole table, even if you only want to show a part of the table. And you will need to sort in application memory.