Vol. I — Graph Theory·Chương V·Graph Theory Ngoài Đời Thật

GT - Chapter V

Shortest Path, but Shortest by Which Definition,
when defining the weight is the actual problem

The ETA says 25 minutes, the driver takes 50, and Dijkstra is entirely innocent. In the textbook, weights are given numbers; in production, defining the weight is the problem. The core skill is three decisions made before touching any algorithm: best by which definition, which laws the road graph contains, and where the computational cost gets paid.

Tech · 11 tháng 6, 2026 · 27 phút đọc

The dashboard of a food delivery app, one Friday afternoon. The cancellation curve starts climbing at three o'clock, and by five it no longer deserves the word curve; it is a wall. The operations team filters the cancelled orders, reads the reasons customers typed in, and one reason repeats so often it is practically the only answer: the app said twenty-five minutes, the driver took fifty. Customers are not cancelling because they are hungry. They are cancelling because a promise was broken.

The meeting that follows runs on a script you can probably guess. Someone projects the number onto the screen, someone asks how the system computes ETA, and a familiar sentence gets said: the routing algorithm probably has a bug.

The engineering team does the right thing: they take one specific order, log the road graph at that moment, log the numbers attached to every road segment, and replay the computation step by step. The algorithm returns exactly the minimum-total path on the graph it was given, by the numbers it was handed, ten reruns producing ten identical results. Dijkstra is innocent.

And that innocence is the bad news. A bug can be fixed; it has a stack trace, a line of code to point at. But a system in which every component runs correctly while the result is still wrong by a factor of two, that wrongness has no address. It has to be hunted.

Hunting needs a map. A routing answer stands on three layers. The bottom layer is the graph: which roads exist, where they connect, which directions they allow. The middle layer is the weight: what each segment costs, and in what unit. The top layer is the algorithm: find the smallest total over whatever the two layers below declare. The textbook hands you the bottom two layers, the problem statement compressing them into one line, given a graph G with weight function w, and then spends the whole chapter teaching the top layer. Production is the reverse. The top layer almost never breaks; it is a few hundred lines of code verified over forty years. The bottom two layers are almost never given.

Figure I — The three layers of a routing answer. The textbook hands you the bottom two and teaches the top one. Production is the reverse: the top layer almost never breaks, and the bottom two are almost never given.

Back to the cancelled orders. The weight in that system is segment length divided by the legal speed limit. Which means the word shortest is being defined as fastest across a city in which nobody else lives, no red lights, no rush hour, no rain. Twenty-five minutes is the correct answer to a problem that does not exist.

In the previous chapter, the question stopped at whether something could be reached, and a list of what was reachable was a complete answer. This chapter's question begins right after that: reached by which route, and which word in the phrase best route is the hard one. The distance between shortest path in the textbook and routing in production is not a better algorithm. It is that in the book, the word short is defined for you. In life, that definition is your job.

“
The shortest-path algorithm is always right. What is wrong is the definition of the word short.
”

The Coefficient Table Nobody Dares Delete

Before describing a different way, fairness demands a look at the way nearly every team actually goes, because it is not stupid.

The trajectory begins very reasonably. Take an existing routing library or engine, OSRM say, with the default weight of distance or time at the legal speed limit per road class. Running within a sprint, a smooth demo, and on an empty city at ten in the morning the results are genuinely good. Any^° team walking this road is doing the most reasonable thing possible with the information and time they have.

Then the errors appear, and the natural reflex appears with them. ETA runs systematically below reality, so multiply everything by one point three. Rush hour is worse, so after five p.m. the factor becomes one point six. The city center is wrong in a different way than the suburbs, so each district gets its own factor. Rain adds fifteen percent. Every coefficient improves the average on the dashboard, and that is exactly why this trajectory survives so long: it appears to be converging. With every patch, the average error shrinks a little, and the feeling of being just one coefficient away never goes away.

a pause

In the system you operate, how many coefficients are there that nobody remembers were added to compensate for which variable of reality?

Think about it for a moment.

The first breaking point is technical. A multiplicative coefficient fixes proportional error; it cannot fix structural error. Look at two routes between the same pair of points. The first is shortest in meters, passing through three unprotected left turns, each one a wait for a gap to open in oncoming traffic. The second is eight hundred meters longer, runs along a main road, turns right twice. The second is faster at almost every hour of the day. But the algorithm already chose the first route, based on weights measured in meters, and only then was the coefficient multiplied onto the travel time of the wrongly chosen road. No coefficient multiplied onto the wrong route produces the right route. The error happens before the coefficient gets a chance to intervene, at the moment of comparing and choosing.

The second breaking point runs deeper, and it is why I tell this section with respect instead of mockery. Every coefficient is a genuine piece of knowledge about reality, compressed into a dead number. Multiplying by one point six after five o'clock is a way of saying travel time depends on the hour of the day, without letting the model^° know it. Adding fifteen percent in rain is a way of saying weather is a variable, while hiding the variable from the model^°. The trouble with this encoding shows when the variables meet: rain plus rush hour plus city center multiplies how exactly, add first or multiply first, and nobody can answer because the coefficients were never designed to meet. Eventually the coefficient table becomes a thing nobody dares delete a single row of, because nobody remembers what the row compensates for, and deleting it experimentally makes the dashboard worse in a corner nobody expected.

The coefficient table is knowledge about reality encoded where the model cannot read it. So the work is not tuning coefficients more skillfully; it is moving each variable to where it belongs, into the weight. That means admitting something the textbook's problem statement hid: the weight is not a number that arrives with the map. The weight is a model, and a model has to have a designer.

Weight Is a Model, Not Data

The problem statement in the book says: given a graph G with weight function w. The four words given a weight function are where the entire difficulty of production is hidden, neatly, like a rug over a hole. Out in the world, nobody gives you w. You have to answer what w is, in what unit, depending on what, and that answer determines the solution more than every algorithmic choice after it combined.

To say plainly what you already know and set it aside: Dijkstra, A*, and their variants are the most trustworthy layer of the whole system, rigorously proven, carefully implemented in every respectable library. They are almost never where you earn your keep. The three decisions below are.

Best by Which Definition

The first decision, and the heaviest: choose the quantity the word best is measured in, then admit that the quantity is a function, not a constant. The same road segment takes different times at six in the morning and six in the evening, differs between weekdays and Sundays, between sun and rain. And there is a dimension spoken of less: uncertainty. Some roads are fast on average but high in variance, flowing on ordinary days, but one minor collision and everything stands still for half an hour, because there is no escape in the middle.

That uncertainty dimension leads to what I consider this decision's most valuable observation. The business says fastest, but their behavior is usually buying most reliable. Customers of a delivery app do not cancel because the ETA is long. They cancel because the ETA is wrong. A thirty-five minute promise kept retains the customer; a twenty-five minute promise broken loses them, even though twenty-five is less than thirty-five. Which means the true objective function^° is not expected time; it is the trustworthiness of the promise. Those two functions produce two different routes: one prefers roads fast on average, the other prefers roads with few surprises.

When there is more than one criterion, fast, stable, fuel-cheap, few left turns, you have two honest options. One is to fold them into a single scalar function with explicit weights, written on paper, readable by everyone, three parts time to one part variance, say. The other is to accept that best does not exist uniquely, only solutions not completely dominated by any other (optimization people call this the Pareto front), and choosing among them is a human's job, not the algorithm's. The dishonest option is the most common one: leaving the criteria implicit, one private version per head, and letting the algorithm optimize something nobody has defined.

a pause

For the problem you are thinking of, try writing this sentence out in full: route A is better than route B if and only if. Can you finish it, and would the person paying for the system sign it?

Think about it for a moment.

The check for this decision sits inside that very question. If the if-and-only-if sentence cannot yet be written, every tuning effort downstream is optimizing something undefined, and your dashboard is measuring the progress of a walk in the wrong direction.

A Graph of Laws, Not of the Map

The second decision: the road graph must be a model of the laws of movement, not the geometry of the map. The two look alike enough to be mistaken for one another. The map says there is a road here. The law says who may drive it, in which direction, at which hours, and what it costs to move from this road onto that one.

Part of the law lives comfortably on the familiar graph structure. A one-way street means an edge with one direction. A truck ban by hour means an edge that exists conditionally; for trucks it vanishes from the graph during the banned window. But there is one kind of cost that can live nowhere at all in a graph drawn from the map: the cost of turning. An unprotected left turn at a busy intersection costs a full minute on average, and that minute belongs neither to the road before the intersection, nor to the road after it, nor to the geometric dot called the intersection. It is the cost of moving from one edge onto another, a thing the current model has no place to hold.

And here chapter three's tool comes back to work. When a real cost cannot attach to any existing node or edge, that is the signal that the model is missing a kind of entity^°, and the answer is to split the relationship into an entity^°: reification^°. The intersection stops being a dot. It becomes a small cluster of nodes, one per incoming and outgoing direction, and each way of passing through, straight, right, left, becomes an internal edge with its own weight. Turning left at intersection X is now an edge, and that waiting minute has a home. Worth pausing on for a beat: the map did not change by a meter, the city is the same city, but the graph changed shape entirely. That is chapter three's thesis running at full current: the model is cut along the question, not along the geometry.

Figure II — Intersection X after reification. The geometric dot becomes a cluster of nodes split by incoming and outgoing direction, and each way through it is an internal edge with its own weight. The left-turn minute, which had no home in the map-drawn graph, now has one.

The check for this decision can be done this very afternoon. Take three real movement laws of your city, one one-way street, one banned turn, one time-restricted zone, and point to where each law lives in your system's graph. Any law without a home is a wrong result waiting for its detonation day, and the later part of this chapter shows what the detonation looks like.

Pay Up Front or Pay per Question

The third decision is about money, in the sense of computational cost. On a graph the size of a city or larger, nobody runs plain Dijkstra from zero for every query. The operational question is how much to pay up front, in preprocessing, so that each query, asked while a user is waiting, only has to pay the remainder.

If you read the previous chapter, this trade-off smells familiar; it is kin to the materialization story, trading memory and preparation work for read speed, with the true price being that everything precomputed must be recomputed when the data changes. At the level of principle, that is all that needs saying. The concrete mechanics, what gets precomputed and why it is so fast, deserve to be told through a real system, and that is the job of the very next section.

The check is still the previous chapter's division: query frequency over data-change frequency. What is new in this chapter is that at the scale of global routing, the division produces a number so large the answer is practically forced, and how people cope with the fast-changing part of the data is where the real lesson lives.

Three decisions, and notice, not one of them is an algorithm. The heaviest is the first, because it is where the mathematics touches what the business actually wants, and it is the only place where a wrong answer looks exactly like a right one on a dashboard. The three systems below were each tested to the limit by one of these three decisions.

The Price of a Millisecond

A continental road graph has tens of millions of intersections, and a global mapping service receives daily routing queries at a scale where hundreds of millions is the modest way to say it. A user types two places three thousand kilometers apart and waits no longer than a breath. Plain Dijkstra on a graph that size, for one transcontinental query, is a seconds-to-minutes problem depending on hardware. The gap between what the user waits for and what the textbook algorithm delivers is about three orders of magnitude, and three orders of magnitude cannot be closed with faster machines.

It is closed with an intuition every veteran driver has. Going far, get on the big roads. Nobody drives from Hanoi to Da Nang by weighing every alley of every town along the way; you take your alley out to the highway, ride the highway for most of the journey, and only dive back into alleys at the far end. The contraction hierarchies family of techniques industrializes exactly that intuition. During preparation, the system ranks roads into tiers by their role in long journeys, then precomputes shortcuts, each shortcut an artificial edge summarizing a whole stretch of journey across minor-road territory, carrying a correctly computed total weight. At query time, the algorithm climbs from both ends up into the high tiers and meets there, instead of crawling through every intersection of a continent. The result is still the correct path by the weights; it is just that most of the additions were done in advance.

One note belongs here, for tidiness. The importance tier of a road in this technique is a tool for creating shortcuts for fast queries, a ranking in service of computation. It is not a serious claim about which roads matter to the network; the serious version of the question of importance is a different kind of question, and its turn has not come in this chapter.

Now the part that earns its keep: the price. Every shortcut stands on the weights as of preparation time. But a good weight, as the first decision said, depends on traffic, and traffic changes by the minute. Recomputing the preprocessing for a continent cannot run every minute. Those two statements cannot both hold over a single block of data, so the mature solution is to stop treating it as a single block. Cut the data along its rate of change. A static layer, the network's shape and the laws of movement, things that change weekly or monthly, gets deep, expensive preprocessing. A dynamic layer, the current state of traffic, gets applied at query time or prepared shallowly on a short cycle. This is the previous chapter's trade-off grown one level older: the question is no longer pay before or pay after for the whole system, but cut the data along the rate-of-change boundary and answer separately for each layer.

And that cut-by-rate-of-change principle is the transferable part, even if you never touch a continent-sized graph. An inner-city logistics team also has static data, the city's road network and laws, and dynamic data, today's congestion. Their preprocessing decision should also cut along that boundary, instead of choosing one answer for two kinds of data living at two different speeds.

The Optimal Solution the Drivers Would Not Drive

The second system was tested at the first decision, and tested in a place no optimization textbook teaches.

UPS: every delivery driver makes more than a hundred stops a day, and the order of the stops is a variable. This is no longer A to B. The number of ways to order a hundred stops exceeds the enumerating capacity of every computer that has ever existed, and at the scale of tens of thousands of routes a day, every percent of distance saved is an enormous number once multiplied out. ORION, UPS's route optimization system, was born for that problem, took roughly a decade from research to full fleet coverage, and UPS's public statements cite savings of hundreds of millions of dollars and hundreds of millions of driving miles per year.

A boundary note before going on, for tidiness. The problem ORION optimizes, a relative of the traveling salesman with real-world constraints, is a different class from shortest path, with its own books and its own profession; this chapter does not teach it. But it still stands on exactly this chapter's layers, a road graph and a definition of cost, with a combinatorial layer stacked on top. And the reason it is here is not that combinatorial layer. The reason is what happened after the optimal solution was found.

The routes ORION proposed often looked absurd to the person behind the wheel. It told a driver to skip a delivery right across the street and loop back two hours later; it drew arcs where the human eye saw only waste. By the objective function, the solution was correct, every number adding up for a reason. To the eyes of someone who had driven that route for ten years and knew half the customers by name, the solution looked stupid. And drivers did what people do when handed an order that looks stupid: they did not trust it, they drove their own way, and every percent of on-paper optimality evaporated right at the truck's doorstep.

This is the moment the first decision returns wearing a new face. The ORION team came to see that their objective function was missing a variable, one measured neither in miles nor in minutes: the acceptance of the person executing the plan. An optimal solution nobody drives has a real-world value of zero, which means drivability has to live inside the definition of the word good from the start, not in the training materials at rollout. ORION's subsequent adjustments, at the level the public sources tell it, all orbit this realization: making routes more explainable and consistent even at the cost of a few percent of optimality, leaving drivers a margin of adjustment instead of dictating every step, and investing in explaining why a route that looks roundabout is in fact faster. The word best was redefined, from shortest in miles to shortest within the set of solutions human beings are willing to drive.

Set ORION beside the cancelled orders at the start of the chapter and you see they are one error wearing two outfits. A twenty-five minute ETA, optimal on a city that does not exist; the error lives between the model and the asphalt. An optimal route on a fleet that does not exist, where drivers are eyeless executing machines; the error lives between the model and trust. Both are objective functions that forgot a piece of the world, and neither can be fixed by a better algorithm.

“
An optimal solution that cannot be explained is a solution that does not get used, and drivability belongs inside the definition of good, not inside the training materials.
”

When Static Weights Run Out

There is one more end to the first decision, the far end, where even the most carefully designed weight function runs out of breath.

Following this chapter's spirit faithfully, you would build a time function that depends on hour of day, then add day of week, then weather, then events, vehicle type, the habits of the individual driver. Every variable is real and worth including. But the number of interactions between variables grows combinatorially, rain at rush hour is not rain at midnight, and past a certain complexity, the coefficient table nobody dares delete returns, this time in a more sophisticated costume: a hand-designed function with hundreds of parameters nobody dares touch. The same disease, a new house.

Modern ETA systems handle this exhaustion point with a move that sounds grand but is structurally humble: treat time estimation as a prediction problem and let a model learn from real trip data. The detail most valuable to this chapter is that the system does not throw the graph away. In the architecture Uber published for DeepETA, the routing layer on the road graph still runs first, producing the route and a raw ETA correct in this chapter's sense, and the machine learning layer stands behind it, correcting the raw ETA with signals the graph cannot see. The graph provides structure; the learned model fills in the numbers. Skeleton and soft tissue of the same body.

Put back into this chapter's language, this is still the first decision, defining the weight, except the answer has evolved across three generations: a dead number, a hand-designed function, a learned function. That evolutionary axis erases neither of the other two decisions: the graph still has to contain the laws of movement, and the computational cost still has to be paid somewhere, before or after. As for how deeply graphs and machine learning interleave, that story is longer than a section, and it will return when this series reaches the place where language models meet graphs. Here one observation suffices: they do not replace each other. They divide the work.

Two Ways It Breaks, and One Buy-or-Build Question

The last part of the chapter is for judgment, starting with the two failures I have seen repeat most, and ending with advice whose flavor is the exact opposite of the previous chapter's.

The first failure: optimizing the measurable number instead of the number customers care about. It looks like this: the average-ETA dashboard improves quarter after quarter, the engineering team has charts to show in every review, and the cancellation rate does not move, so everyone meets weekly to ask each other why. The why sits in the first decision being made the convenient way: the expectation is easy to measure, easy to baseline, easy to plot, while the trustworthiness of a promise is an order of magnitude harder, so the team optimizes the expectation while customers punish the variance, two numbers living two separate lives. How to catch it: regularly set the number being optimized next to the behavioral numbers, cancellations, complaints, drivers abandoning routes, and if the two curves have not spoken to each other across several quarters, your objective function is talking to itself.

The second failure: a wrong graph making a correct algorithm produce wrong results. It looks like this: the proposed route drives the wrong way down a street that became one-way last month, sends a truck into an hour-restricted street, or confidently crosses a bridge that closed for repairs three weeks ago. Users do not say your graph is stale; users say this app is stupid, and they are right in the only way users need to be right. The why has two layers: the first is the graph rotting against real time, kin to the previous chapter's snapshots and film; the second is nastier, errors of the data layer wearing the uniform of errors of the algorithm layer, so users report stupid directions and the engineering team debugs the algorithm for weeks, exactly the trap the investigation at the start of this chapter defused. How to catch it: separate the tests by layer, one suite for the question does the graph reflect current law, checked against official road data sources and user reports, running independently of every algorithm test; and measure the age of the graph data as a first-class operational metric, not a footnote.

And this chapter's when-to-deviate has the opposite flavor of the previous chapter's, a flavor worth stating plainly. For reachability^°, writing your own recursive CTE^° is healthy; the previous chapter defended that. For routing on real roads, building your own is almost always wrong. The road graph of the entire world, real-time traffic data, the movement laws of every city changing weekly with every administrative decision: that is the lifetime product of teams of hundreds, and routing services sell exactly that through an API. The single test: is routing your core business, meaning does your company's competitive advantage lie in finding better routes than your competitors? Ride-hailing and food delivery at scale, last-mile logistics, the answer may be yes, and it is precisely those companies that hold proprietary trip data dense enough to build better weights than a general-purpose provider. Everyone else: almost certainly no.

But buying does not exempt you from the three decisions. The API returns good routes by their definition, with their weights, on their graph. You still have to know what your own definition of good is in order to choose the right parameters, still have to check whether their graph contains the local laws your business collides with, whether your trucks get routed into hour-restricted streets. This entire chapter, even if you never build a routing engine in your life, keeps its full value in exactly that place: it is the list of questions to ask your vendor. Buying the solution still means owning the problem.

Return to the Friday-afternoon dashboard once more. The system is different now: the weight is a function of the hour and of the rain, the left turn at the intersection has its own price, the ETA promises thirty-five minutes and keeps thirty-five minutes, and the Friday cancellation curve looks like a curve again. The question from A to B can be considered answered, and answered trustworthily.

Then look at the whole city at rush hour, not at any single trip anymore but at all of them at once, and you will see something no A-to-B query can display. There are intersections that every good route wants to squeeze through, thousands of optimal solutions of thousands of different trips all betting on the same bridge. When that bridge chokes, it is not one route that slows down; the whole city changes shape. Comparing routes is one thing. How much weight a position in the network carries, that is a kind of question this chapter does not yet have a name for.

■

One sentence to remember

The shortest-path algorithm is always right. What is wrong is the definition of the word short.

from Shortest Path, but Shortest by Which Definition

the author’s choice · not an algorithm

Tiếp tục trong Graph Theory Ngoài Đời Thật

← Chương trước

Chương IV

Traversal và reachability: câu hỏi "cái gì chạm tới cái gì"

Chương tiếp →

Chương VI

Centrality, ai quan trọng trong mạng lưới

or xem toàn bộ tập

A quiet word

What did you take away from this?

Not published. Never shown to other readers.

In this neighborhood

hand-picked by the author

Opens the door

Traversal and Reachability: What Touches WhatTech · 27 min

“The previous chapter taught the question "can it be reached" and closed on an observation: the list of what is reachable says nothing about how. This chapter opens exactly into that gap, and discovers that the hardest word in "best route" is not route. It is best.”

→

The same thread

Nodes and Edges Are Design DecisionsTech · 24 min

“Chapter three taught that a model is a decision and gave reification its name. This chapter runs that principle at the weight layer, and the turn penalty is the first time reification gets used as an operational tool: a real cost with nowhere to live forces the intersection to stop being a dot.”

→

If you’d rather wander, the full archive is here.

← Traversal và reachability: câu hỏi "cái gì chạm tới cái gì"Kếtin →Đường đi ngắn nhất, nhưng "ngắn" theo nghĩa nào →