Sorting based on previous sorts

2018-06-03 05:40:45

I'm trying to sort a list of IDs based on a "sorting map" which is an array of (ID1, ID2, timestamp) tuples that determines which IDs should be sorted before other IDs. Here are the rules:

1. ID1 should be sorted before ID2.

2. The timestamp can be used to break ties, with newer timestamps beating older timestamps. E.g. given the sorting map (C, A, 1/1/1900), (C, B, 1/1/2000), B is sorted before A.

3. There can be cycles, e.g. (A, B, 1/1/1950), (B, C, 1/1/1980), (C, A, 1/1/1900). The timestamp can be used to break cycles, with older-timestamped records in a cycle being removed from the sorting map until the cycle is gone.

4. If an ID is not present in the sorting map, it's sorted after any ID that is present in the sorting map.

Example: given the sorting map (C, A, 1/1/1900), (C, B, 1/1/2000) and the list (A, B, C, D) to sort, the sorted output would be (C, B, A, D).

I'm stumped on how to turn these rules into an algorithm. Here's what I have so far:

1. Fetch the most current sorting map from the database. I'll get at most one record for every unique pair of IDs.

2. Remove cycles from the sorting map. How? Or is it easier to simply ignore cycles as part of step 4?

3. Transform the sorting map in memory for optimal performance. For example, build a hashtable keyed by each unique ID in the sorting map so I can quickly find all sorting map rows that contain a particular ID.

4. Sort my array of IDs using a generic comparison-based sorting library with a custom comparison function that accepts any two IDs, ID1 and ID2, as parameters. The comparison function:

a. Looks up all sorting map entries containing ID1 or ID2 using the hashtable from step 3.

b. If there's already a record containing both ID1 and ID2 in the sorting map, stops: we know which one should be first!

c. If neither ID1 nor ID2 is found in the sorting map, it's a tie. Returns a deterministic but arbitrary result (e.g. lower ID wins).

d. If one ID is in the sorting map but the other isn't, stops. The found one should be sorted first.

e. If we get to here, both IDs are in the sorting map but there is no direct comparison available in the sorting map. Now what?

Performance is not a big concern because the maximum size of the sorting map is under 20K rows and the maximum number of IDs being sorted is under 30.

Got ideas?

FWIW, we'll be using .NET's List<T>.Sort(Comparison<T>) to do the sorting in C#, but the underlying algorithm is obviously language- and platform-agnostic.

If you're curious, here's the real-world need for this algorithm:

Our company builds mobile apps for delivery drivers who every day visit about 20 locations out of a territory of 100-150 total locations they are responsible for. The list of locations each day is dynamically assigned based on inventory of each location. Locations which have low inventory get a delivery of new inventory, while locations that still have enough inventory are not visited.

Drivers are free to visit locations in any order, but they usually take similar routes every day (eg visit locations in the South part of town when traffic is light in the morning, then visit locations in the North part of town when traffic is heavier down South).

We chose not to use 3rd-party routing software which automatically determines the most efficient driving route. Instead we've found it's better to let the driver choose the route because routing software has a hard time with constraints like "that building's loading dock is usually only free before 7AM" or "the guy who needs to sign the delivery receipt leaves early on Fridays" that have a big impact on delivery schedules.

Anyway, we'd like to use the driver's historical choices to sort each day's itinerary in the same order that the driver visited the same locations last time. This will give the driver a nicely arranged itinerary each day that matches his preferences, without him having to manually rearrange the schedule except in unusual cases. This will save the driver a minute or two each day, which adds up over time.

Each historical itinerary is really a list like this: (ID1, ID2, ID3, ..., IDN, timestamp). But as an alternative to storing hundreds of past schedules, I was thinking it'd be easier to decompose each N-machine historical itinerary into pairs of machines. This means I have to store at most N*(N-1) tuples, because newer orderings always kick older ones out of the sorting map. If this is a bad simplification, let me know. ;-)
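To make the decomposition concrete, here's a minimal C# sketch of how one itinerary could be folded into the sorting map. The names (SortingMapBuilder, AddItinerary) and the choice to key the dictionary by the unordered pair of IDs are assumptions for illustration, not anything from our actual code base.

```csharp
using System;
using System.Collections.Generic;

static class SortingMapBuilder
{
    // Records every ordered pair (earlier stop, later stop) from one itinerary.
    // The dictionary key is the unordered pair, so a newer itinerary overwrites
    // whatever an older one said about the same two locations.
    public static void AddItinerary(
        Dictionary<(string, string), (string Before, string After, DateTime When)> map,
        IReadOnlyList<string> itinerary,
        DateTime when)
    {
        for (int i = 0; i < itinerary.Count; i++)
        {
            for (int j = i + 1; j < itinerary.Count; j++)
            {
                string a = itinerary[i], b = itinerary[j];

                // Normalize the key so that (A, B) and (B, A) collide.
                var key = string.CompareOrdinal(a, b) < 0 ? (a, b) : (b, a);

                // Keep the record only if it is at least as new as the stored one.
                if (!map.TryGetValue(key, out var old) || old.When <= when)
                    map[key] = (a, b, when);
            }
        }
    }
}
```

Feeding it the itinerary (C, B, A) dated 2000 and then (A, B) dated 2010 leaves three records: C before B and C before A from 2000, plus A before B from 2010, which replaced the older B before A.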

What you are looking for is called a topological sort. Using that search term you can probably find very good resources.
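For orientation, here's a minimal C# sketch of one common topological sort (Kahn's algorithm) applied to a sorting map that is already free of cycles. The names TopologicalOrder and Sort are made up for this example. The tie-break (among IDs that are ready at the same time, the one touched by the newest record goes first) is my assumption about how the timestamp rule could be applied, not something the question pins down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TopologicalOrder
{
    // Kahn's algorithm over an acyclic sorting map.
    public static List<string> Sort(
        IEnumerable<string> ids,
        IEnumerable<(string Before, string After, DateTime When)> map)
    {
        var successors = new Dictionary<string, List<string>>();
        var inDegree = new Dictionary<string, int>();
        var newest = new Dictionary<string, DateTime>();

        foreach (var (before, after, when) in map)
        {
            foreach (var id in new[] { before, after })
            {
                if (!successors.ContainsKey(id))
                {
                    successors[id] = new List<string>();
                    inDegree[id] = 0;
                    newest[id] = when;
                }
                else if (when > newest[id]) newest[id] = when;
            }
            successors[before].Add(after);
            inDegree[after]++;
        }

        // Rank every ID in the map, not only the ones being sorted, so that
        // indirect orderings (A before X before B) are still honored.
        var rank = new Dictionary<string, int>();
        var ready = inDegree.Where(p => p.Value == 0).Select(p => p.Key).ToList();
        while (ready.Count > 0)
        {
            // Assumed tie-break: newest record first, then ordinal ID order.
            string next = ready
                .OrderByDescending(id => newest[id])
                .ThenBy(id => id, StringComparer.Ordinal)
                .First();
            ready.Remove(next);
            rank[next] = rank.Count;
            foreach (var after in successors[next])
                if (--inDegree[after] == 0) ready.Add(after);
        }
        if (rank.Count != inDegree.Count)
            throw new InvalidOperationException("The sorting map contains a cycle.");

        // IDs missing from the map sort last, in ordinal order among themselves.
        return ids
            .OrderBy(id => rank.TryGetValue(id, out int r) ? r : int.MaxValue)
            .ThenBy(id => id, StringComparer.Ordinal)
            .ToList();
    }
}
```

With the question's example map (C, A, 1/1/1900), (C, B, 1/1/2000) and the list (A, B, C, D), this returns (C, B, A, D). If the map still contains a cycle, the sketch throws instead of returning a partial order.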

There is one complication in your specific domain: cycles (because drivers have behaved inconsistently over time). You're right that you need to break dependency cycles, because otherwise the topological sort would fail.

You also need to break all cycles longer than two.

Let's treat your ID map as a graph: IDs (places) are nodes, and entries in your map are edges (from place ID1 to place ID2). A simple way to break the cycles would be this:

while true
 allCycles = getListOfAllCycles();
 if (allCycles.length == 0) break;
 breakNode = chooseBreakNode(allCycles); //defined below
 deleteFromGraph(breakNode); //remove the node and its edges, then recompute the cycles

chooseBreakNode:
 choose the node which has been driven to the least //a rarely visited node matters least
 if ambiguous: choose the node in the dependency graph which is present in the highest number of cycles //breaks multiple cycles at once
 if ambiguous: choose the node which is in the longest cycle
 if ambiguous: pick an arbitrary node

I probably didn't get chooseBreakNode quite right. It is a heuristic that you can tune to your needs.
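If you'd rather follow the timestamp rule from the question (older records lose) than remove whole nodes, a different and simpler technique is to add records newest-first and skip any record that would close a cycle with the records already kept. Below is a minimal C# sketch of that greedy idea; CycleBreaker, RemoveCycles and CanReach are names invented for this example, and it is a heuristic under those assumptions rather than a guaranteed-optimal way to break cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CycleBreaker
{
    // Visits records newest-first and drops any record that would close a
    // cycle with the (newer) records already kept.
    public static List<(string Before, string After, DateTime When)> RemoveCycles(
        IEnumerable<(string Before, string After, DateTime When)> map)
    {
        var kept = new List<(string Before, string After, DateTime When)>();
        var successors = new Dictionary<string, List<string>>();

        foreach (var record in map.OrderByDescending(r => r.When))
        {
            // Adding Before -> After closes a cycle exactly when After can
            // already reach Before through the kept records.
            if (CanReach(successors, record.After, record.Before)) continue;

            if (!successors.TryGetValue(record.Before, out var list))
                successors[record.Before] = list = new List<string>();
            list.Add(record.After);
            kept.Add(record);
        }
        return kept;
    }

    // Iterative depth-first search from 'start' looking for 'target'.
    private static bool CanReach(
        Dictionary<string, List<string>> successors, string start, string target)
    {
        var seen = new HashSet<string> { start };
        var stack = new Stack<string>();
        stack.Push(start);
        while (stack.Count > 0)
        {
            string node = stack.Pop();
            if (node == target) return true;
            if (!successors.TryGetValue(node, out var next)) continue;
            foreach (var n in next)
                if (seen.Add(n)) stack.Push(n);
        }
        return false;
    }
}
```

For the cycle from the question, (A, B, 1/1/1950), (B, C, 1/1/1980), (C, A, 1/1/1900), this keeps B before C and A before B and drops the oldest record, C before A. Each record costs one graph search, which should be fine at the sizes mentioned in the question (under 20K rows).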

I'll propose an alternative approach but let me know if I'm misunderstanding the business need.

Have a table like (DriverId, LocationId, Priority) that stores the relative priority of locations for each driver.

Anytime you need to process a completed itinerary, start from the bottom of the list (the last visited location) and run the following algorithm for each location, going up the list:

If the priority of the location is not already greater than the priority of the location below it, then set newPriority = priorityBelow + 1. (If there's nothing below, priorityBelow = 0.)

When you're done processing the list, re-normalize the priority points as 1, 2, 3... (by making the lowest priority = 1, the second lowest = 2, and so on).

Then when you need to order a new itinerary, you'd just order locations by their relative priority values for that driver.

Have you considered this approach?

EDIT: Adding example code per comment below.

Given 4 historical itineraries: ABCD (newest), ACBE, CBDF, CBDFA (oldest), how would I sort a new itinerary ABCDEF?

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Relative priority per location; a higher value means "visit earlier".
    static Dictionary<string, int> Priorities = new Dictionary<string, int>();

    static void Main(string[] args)
    {
        // past itineraries, ordered from oldest to newest
        var itineraries = new string[][]{
            new string[] { "C", "B", "D", "F", "A" },
            new string[] { "C", "B", "D", "F" },
            new string[] { "A", "C", "B", "E" },
            new string[] { "A", "B", "C", "D" } };

        //process past itineraries
        foreach (var itinerary in itineraries)
            ProcessItinerary(itinerary);

        //sort new itinerary (unknown locations get the lowest priority, 1)
        string[] newItinerary = { "A", "B", "C", "D", "E", "F" };
        string[] sortedItinerary = newItinerary.OrderByDescending(
            x => Priorities.ContainsKey(x) ? Priorities[x] : 1).ToArray();

        Console.WriteLine(String.Concat(sortedItinerary));
        Console.ReadKey();
    }

    static void ProcessItinerary(string[] itinerary)
    {
        // Aggregate without a seed throws on an empty sequence
        if (itinerary.Length == 0) return;

        // walk from the last visited location up to the first
        Enumerable.Reverse(itinerary).Aggregate((below, above) =>
        {
            int priBelow = Priorities.ContainsKey(below) ?
                Priorities[below] : Priorities[below] = 1;

            // raise 'above' only if it doesn't already outrank 'below'
            if (!(Priorities.ContainsKey(above) &&
                Priorities[above] > priBelow))
                Priorities[above] = priBelow + 1;

            return above;
        });

        //normalize priorities
        // (note: running in reverse so that if priorities tie, 
        //  the older location has higher priority)
        int i = Priorities.Count;
        foreach (var pair in Priorities.OrderByDescending(x => x.Value))
            Priorities[pair.Key] = i--;
    }
}

This would print out: ABCDFE
