Feat/0-cost hops for favorite routers #7992
…ation
- Preserve hop_limit when both local device and previous relay are routers/CLIENT_BASE
- Only preserve hops for favorite routers to prevent abuse
- Apply to both FloodingRouter and NextHopRouter
- Update hop counting logic in MeshService for router-to-router communication

This allows routers to communicate over longer distances without consuming hop limits, improving mesh network efficiency for infrastructure nodes.
|
|
This reverts the protobufs submodule back to a84657c22 to remove unintended changes from this branch.
|
This is a really neat idea for extending the range of the higher bandwidth presets. I do think it should probably be gated for presets below the LONG_FAST link budget. |
GUVWAF
left a comment
I’m not totally against the idea, but there is a problem with this, namely that in several places the firmware relies on the hop limit decrementing at each node for routing decisions. For example:
firmware/src/mesh/FloodingRouter.cpp
Line 30 in 09de0e3
| bool isRepeated = p->hop_start > 0 && p->hop_start == p->hop_limit; |
And:
firmware/src/mesh/NextHopRouter.cpp
Line 80 in 09de0e3
| (p->hop_start != 0 && p->hop_start == p->hop_limit && |
When not decrementing the hop limit, it thinks the packet was received directly.
Furthermore, it messes with the “hopsAway” counter and also traceroutes that include “unknown” nodes.
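To make the concern above concrete, here is a minimal sketch of how the quoted `hop_start == hop_limit` check misreads a relayed packet once a hop is not decremented. `PacketHeader`, `looksDirect` and `relay` are hypothetical names used only for illustration; only the comparison itself is taken from the quoted FloodingRouter line.

```cpp
#include <cstdint>

// Hypothetical, stripped-down header holding only the two fields the check uses.
struct PacketHeader {
    uint8_t hop_start; // hop limit the original sender started with
    uint8_t hop_limit; // hops remaining on the copy that was just received
};

// Same comparison as the quoted isRepeated line: an undecremented hop limit
// is read as "this copy came straight from the original sender".
bool looksDirect(const PacketHeader &p)
{
    return p.hop_start > 0 && p.hop_start == p.hop_limit;
}

// Hypothetical relay step: decrement unless the hop is treated as 0-cost.
PacketHeader relay(PacketHeader p, bool zeroCostHop)
{
    if (!zeroCostHop && p.hop_limit > 0)
        p.hop_limit--;
    return p;
}
```

With a normal relay the copy no longer looks direct; with a 0-cost relay on the first hop it still does, which is the confusion described above.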
That said, I believe meshes in metro regions would benefit from #5534 and I would give higher priority in getting that one merged than this more pervasive PR.
But how do you know that at the point where it is ready to rebroadcast (after scanning the channel and random delays due to CSMA/CA), it hasn’t received any duplicates with a higher hop limit?
While indeed the impact of a higher hop limit on channel utilization is limited when using a faster preset, I’m not sure the overall reliability will increase. At each transmission, there is a chance it fails, e.g. due to collisions, external interference/noise, or just a marginal link budget. Thus, the chance that more than 8 transmissions (7 hops) in a row succeed gets lower and lower.
I’m also talking about any intermediate node, so only two is enough. For example, node A sends a packet that gets heard by two nodes B and C, where B is a favorite router that does not decrement the hop limit. Once C hears the rebroadcast of B with the original hop limit, it thinks it’s a repeated packet from A and hence tries to rebroadcast again, while it shouldn’t. This can only be fixed by relying on the
Yes, for example when you request position/NodeInfo/traceroute/telemetry of a router. A node receiving a rebroadcast from another router without decremented hop limit thinks it’s the original message. |
We don't. In the bay mesh logger, we log the first packet that comes into the node, and generally it is correct. We can see the hops fan out from each route appropriately via traceroute or MQTT data. The routers are all well connected, and we don't see evidence a lower hop packet arrives after a higher one. I'd be willing to run test code on my router though to validate how often packets like this arrive out of order.
I don't know how other meshes operate when using routers, but in the bay mesh if a router doesn't receive a message (due to noise, collision, etc) chances are it will receive it from another router. Same with clients. The core routers of the environment are highly meshed with each router having 3-4 others it hears. Reliability is solved through redundancy.
This is not completely correct... If node A sends a packet that gets heard by nodes B and C, both B and C will decrement the hop limit since node A is not a favorite router, and will not fit the criteria. The only way this gets weird is if A, B, and C are all routers, A is liked by B and C, and A is the sender. Your point is still valid and correct. I'll fix it. As long as hop_limit is less than hop_start the rest should be good. Thank you for this input. I appreciate it. |
…o add isFirstHop. If isFirstHop, always decrease hop_limit to avoid retry logic.
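A minimal sketch of the rule this commit message describes, assuming the decision depends only on the fields shown. `HopContext` and `shouldDecrementHopLimit` are hypothetical names, not the firmware's actual functions.

```cpp
#include <cstdint>

// Hypothetical summary of what the relaying node knows about a packet.
struct HopContext {
    uint8_t hop_start;
    uint8_t hop_limit;
    bool localIsInfrastructure;         // local role is ROUTER, ROUTER_LATE or CLIENT_BASE
    bool relayIsFavoriteInfrastructure; // previous relay is a favorited infrastructure node
};

// The first hop is the one where nothing has been decremented yet.
bool isFirstHop(const HopContext &c)
{
    return c.hop_start != 0 && c.hop_start == c.hop_limit;
}

// Always decrement on the first hop, so hop_limit < hop_start afterwards.
// After that, skip the decrement only between favorited infrastructure nodes.
bool shouldDecrementHopLimit(const HopContext &c)
{
    if (isFirstHop(c))
        return true;
    return !(c.localIsInfrastructure && c.relayIsFavoriteInfrastructure);
}
```

Because the first hop always costs one, any later copy has `hop_limit < hop_start`, so the "received directly" checks are no longer triggered by a 0-cost hop.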
That would be nice. It would be a shame if this would get nullified just by packets arriving via other routes first, which is common in dense and active meshes (where routers often have multiple packets in the queue).
I agree that indeed redundancy helps a lot here, but having 4-5 routers in each other’s range is a rather unique set-up. (And to be honest also not really recommended if they are real routers (not
A and B yes, but C, no. My point was that any What remains are the more cosmetic anomalies like incorrect “hops away” counts, traceroutes and NeighborInfo. |
From my testing (admittedly limited), traceroutes work as intended and ignore hop_start and hop_limit, although a route can reach ROUTE_SIZE, which is limited to 8. It looks like a graceful failure (with an error), but without completely redoing the protobufs I can't find a good solution here. We're definitely limited by the payload size, as increasing it past 8 will break a lot of things. I don't have a good fix other than that it works up to 8 hops. On the other side, it will only show "unknown" hops as long as the number of hopsTaken is less than hop_start - hop_limit. I guess this is a blessing and a curse: the hops work as intended, but if there is a missing unknown hop somewhere, it will be masked. I personally don't see this as a dealbreaker given our current constraints. For NeighborInfo, the previous change of decreasing hop_limit on the first hop will fix all the issues, as it only cares about local hops. I'm starting to appreciate my first-hop fix a little more now. The hops-away count seems to be a simple calculation of hop_start - hop_limit, which will treat the "favorite routers" as a single hop. I would say this is preferred, even if it is a lie, while working within the confines of the current protobuf implementation. |
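As a rough illustration of the hops-away point, the sketch below walks a packet along a path and then applies the hop_start - hop_limit calculation mentioned above. `hopLimitAfterPath` and its `zeroCost` flags are hypothetical helpers for this example only.

```cpp
#include <cstdint>
#include <vector>

// Apparent hops away, as described in the comment above.
int hopsAway(uint8_t hop_start, uint8_t hop_limit)
{
    return hop_start - hop_limit;
}

// Hypothetical walk along a path: zeroCost[i] is true when relay i forwards
// the packet without decrementing hop_limit.
uint8_t hopLimitAfterPath(uint8_t hop_start, const std::vector<bool> &zeroCost)
{
    uint8_t limit = hop_start;
    for (bool isFree : zeroCost) {
        if (!isFree && limit > 0)
            limit--;
    }
    return limit;
}
```

For a chain of three favorited routers where only the first relay decrements, the receiver computes one hop away even though three relays were involved.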
Yes, I think we'll need to live with the above limitations.
Indeed, nice. |
f9394ff to
5e91335
Compare
|
I am currently unable to add labels to fix the remaining failing check. |
GUVWAF
left a comment
@thebentern do we still want to gate this for faster presets only?
In principle I'm OK with this, but still I would prefer to have it merged after #5534, or at least at the same time to get the most out of this.
- Replace multiple individual role checks with cleaner IS_ONE_OF macro - Add CLIENT_BASE support as suggested in PR meshtastic#7992 - Include MeshTypes.h for IS_ONE_OF macro - Makes code more maintainable and consistent with other parts of codebase
|
I thought about this for about 2 minutes before deciding to include CLIENT_BASE, but the thought process was something like this...
The assumption is that if someone is using a CLIENT_BASE node, they're not using the node as a CLIENT since the whole purpose of CLIENT_BASE is to forward messages from inside a structure outwards (please let me know if this is incorrect). The hope is users would just use CLIENT if the node is on the roof and they're connecting to it directly. The reason for including CLIENT_BASE is to allow CLIENT_BASE nodes to repeat a packet from a favorite ROUTER (one hopefully nearby) before the packet runs out of hops. I'll need to look at the CLIENT_BASE code to see if it discriminates favorite CLIENTs from ROUTERs when making this determination of sending a packet as priority content and sends it before other ROUTERs have a chance to send it. Technically, it's adding a 0-cost hop to the path to other routers, and other routers will forward the message anyway however it is received. But I do see the edge case. The most logical fix in my eyes is to make it so only favorite non-repeating roles (client, client_mute, etc) can make the priority window when using CLIENT_BASE |
Yes, that is correct. Either from inside a structure or from a mobile node, say a mobile CLIENT_MUTE node, where you want to prevent client nodes in the trenches of city buildings from passing a packet on when the rooftop CLIENT_BASE node would be better suited.
They might now, since the feature of trusted / favorite routers-from (client_base, router, router_late) for non-reduction of 0 hop count is favorite-based and client_base favoriting of nodes is also a part of the client_base PR #7873 Basically, your changes and the ones from that PR change have some intersection in terms of the idea of favorites.
Yeah, this is a bit too deep at this moment in time in the evening for me, but I think I get the idea of what you are checking in the code. In our experience, with too many Routers around, problems quickly arise, even if it's just two routers. We decided to put our mountain nodes on ROUTER_LATE here in Switzerland.
Yes, people in our Swiss network who have that possibility are doing so, depending on BLE / Bluetooth range conditions, the BLE antenna used and so on. However, we have a lot of condo-style, multi-story houses, where being on floor 0 or 1 means one won't be able to connect to a rooftop node via Bluetooth anymore. So people are excited about the new CLIENT_BASE role in general, especially those with access to a roof.
@h3lix1 yes, that would be great, thank you for having taken my thoughts into account. That is THE added bonus / edge case prevention feature. |
|
@shalberd CLIENT_BASE strictly uses p->from and p->to for its decision making. The 0 hop routing focuses on p->relay_node to make its decisions about the packet. Combined with the first-hop rule (the first hop always decrements) there isn't a case where two client_base favorited neighboring nodes will cause an issue. I think this is working perfectly as-is. I'm unable to determine an edge case where the two will conflict. The only edge cases I can think of are the ones CLIENT_BASE introduces itself. (i.e. client (f)->router->client_base(f')->router) where CLIENT_BASE adds an extra hop between routers .. Ping me on discord (h3lix) and let's see if we can find a way this can cause an issue. |
|
@h3lix1 thinking about this more, I think we will be ok. Definitely make sure to write a blog article and update the documentation on non-hop-decrement / 0-cost hops for favorited ROUTER/ROUTER_LATE/CLIENT_BASE nodes, because otherwise I can well imagine ROUTER_LATE or CLIENT_BASE operators accidentally favoriting each other when that is not even intended. |
We have very well-visible mountain nodes, ROUTER_LATE, that can see each other well. Those people I will definitely have to tell to NOT favorite their mountain nodes. Same for more than / > 2 CLIENT_BASE that can see each other well in different cities each on a higher mountain ... that could have unintended consequences if they are on the favorite list of each other. |
|
If you don't mind, can you draw a diagram that explains what drawbacks you see with 0-cost hops? I want to make sure I'm solving for the right scenario. |
|
here is my fear diagram regarding node favorites and their role in non-decrementing / 0 cost hops.
We had scenarios with 60 - 80 km hops and round trips of Switzerland before, and were able to prevent excessive early rebroadcasting by no longer using Router, only Router_Late. The new combination of Client_Base re-transmitting in the early contention window, plus the 0-cost hop feature, all based on favorites, worries me. Maybe those worries are unfounded, but it leaves me with a kind of question mark. https://github.com/orgs/meshtastic/discussions/409#discussioncomment-14523768 |
Meshtastic has duplicate packet detection. If the same packet is received again by a node, that node will simply ignore it. This prevents the 'endless loop' scenario you are concerned about. |
That is not what we observed with role Router and our very well-visible, in terms of topography, Router nodes. Speaking of firmware 2.6.11, possible this has changed for the better very recently, e.g. with #8216 and #8148 |
It's not new; the ignore-dupes functionality has existed for ages and was definitely present in v2.6.11. That is one of the things that the Can you please clarify exactly what it was that you were observing? I would be extremely surprised if you have actually observed a node repeatedly rebroadcasting the same packet outside of one of the intended exceptions (although if you have, then there's a bug somewhere that needs squashing ASAP). |
early rebroadcast / contention window, multiple routers, role ROUTER, all broadcasting at the same time. https://github.com/orgs/meshtastic/discussions/409#discussioncomment-14523768 One user mentions there: This highlights the directional message sending problem that even Client nodes can have with each other. It is just more obvious and likely with nodes that have higher gain antennas and placed in very good locations. Now me again: That, together with Client_Base now using early rebroadcast window / Router-type logic for favorite-based routing, leads me to worry about this 0-cost-forwarding feature here in our alpine scenarios. We have switched our alpine nodes to role ROUTER_LATE and don't have that problem anymore, but as mentioned, Client_Base utilizing Router, not Router_Late, type behavior for favorited nodes, along with 0 cost hop forwarding for favorited nodes, has the potential, as an attack vector, to cause massive problems. Somewhere else, someone from either Canada or NZ mentioned this: technical vs. social engineering and coordination question. We are all learning, and we appreciate the effort put into Meshtastic. Just please don't put in obvious config traps that could be avoided, or try finding a balance between complete dependency on user good-will (especially problematic in larger networks) and functionality. |
That is completely unrelated, and has nothing to do with your infinite-rebroadcast concern. Those are mandatory-rebroadcast roles, they are supposed to rebroadcast packets. But as I pointed out above, as a general rule no node will rebroadcast the same packet a second time. Duplicate detection prevents them from doing so. Once they have received a packet, they remember doing so, and simply ignore it on any subsequent occasion they happen to hear that packet again. Are you perhaps confusing the concept of duplicate detection (i.e. that a node will only act on the same packet once, regardless of how many times it is received), with the concept of cancelling pending rebroadcasts of packets where another node was heard to rebroadcast it? As an example of the former, let's say you have 50 ROUTER nodes, and they can all see each other. A packet arrives. Each of those 50 routers will rebroadcast that packet precisely once, and will then proceed to ignore any further instances of that packet which they might subsequently hear. It ends up on the air 50 times as a result (because there are 50 routers), but there is no routing loop to worry about. The hop limit has no impact on a node's duplicate-detection behaviour.
This is still offtopic, but yes - you're absolutely correct on that. Meshtastic is inherently a very easy system to initiate denial-of-service attacks against. There's not really much you can do to avoid that unfortunately. Tweaking things to make it harder to accidentally cause problems is quite helpful (e.g. removal of the repeater role), but ultimately an attacker who is actively malicious is always going to find it pretty easy to cause major problems. I do feel that the CLIENT_BASE role should be using the same timing behaviour as ROUTER_LATE though, not ROUTER, and I agree with you that having it in the early window has the potential to cause a number of frustrating problems. However others seem to feel that the problems arising from running it in the early window can be safely ignored, and it was ultimately implemented in the early window anyway. We'll have to wait and see what happens once it starts seeing significant real-world use.
What did you have in mind re config traps? That isn't something that anybody here wants, so if you see one, please by all means point it out 🙂 |
yes, I think so, thanks for clearing that up
Mhh, when we did traceroutes between our Router mountain nodes, there might not have been a loop, but because the packet was transmitted via 4 routers in our case, the hop limit was reached prematurely. I guess it wasn't simultaneous in that case, I see your point. Since we switched to Router_Late for our mountaintop nodes, the situation has much improved because we do indeed have good paths between Client role nodes as well.
Config trap 1) the use of favorites in the two features we are talking about @compumike A user mentioned in ticket "CLIENT_BASE role unwanted behavior"
We are already seeing problems in the wild with this, issue 8338, but also when I discuss with early adopters.
Just to summarize and be clear: I think both features are really cool and useful: 0 cost hops and CLIENT_BASE preferential routing. It is just I think that they will cause more problems, in the sense of favorite notion being a config trap, than they solve. cross-linking #8338 (comment) Config trap 2) when setting user HAM mode is_licensed, override duty cycle is set to true https://github.com/meshtastic/firmware/blob/master/src/modules/AdminModule.cpp#L1339 Plus, at least on the Android app, users would not even see that setting being enabled / true (override duty cycle) meshtastic/Meshtastic-Android#3324 (comment) |
|
Dumb question maybe, but how do you even favorite another Router on a Router-Node? I don't see how that's available via remote administration. In |
That only seems possible when connected via USB or Bluetooth. |
|
@h3lix1 @fifieldt Intuitively, a firmware search does not support my hunch that is_ignored also leads to is_favorite ... hmm then again, a search here shows that an is_favorite: true node could also be is_ignored: true or vice versa ... https://github.com/meshtastic/firmware/blob/master/src/mesh/NodeDB.cpp#L1917 However, the nodes are then not seen in the Android app under favorites, only under filter "show ignored". I want to make sure that nodes I set as is_ignored do not by accident end up in consideration for originating-from 0 hop links ... As for a different field, I was suggesting a similar approach over at Client_Base preferred nodes (new field): |
Not as a normal course of business.. Unless you message the other router, which might be what @shalberd is finding above. (May have tried to message a node that was ignored to test it) The favorites flag is a little overloaded. I might not have realized about how much until after the fact. I think it's still manageable for now since routers favoriting routers is a little odd, but as you found it will take wifi/bluetooth/serial access to set favorites normally. I'll see if there is a way to do this remotely one way or another. |
@thebentern @fifieldt no, I did not message that node at all after I set it to is_ignored
No, it is not, isn't the whole point of this feature 0 cost hops here that is_favorite determines whether to make a from-hop a 0 cost hop? I mean, your feature "Feat/0-cost hops for favorite routers" requires operators to favorite other ROUTER, ROUTER_LATE, CLIENT_BASE nodes. you currently only check for an originating node to be marked as is_favorite: However, as I have shown above, now, even is_ignored nodes, nodes clearly to be ignored, are given 0 hop cost because somewhere in the background, they are also marked as favorite. |
|
@thebentern Update: same on Router_Late device role. I just absolutely want to make sure that ignored nodes are never ever considered for 0 cost hops, nor for preferential routing early window in the case of CLIENT_BASE. Nonetheless, for both features, using is_favorite is not good, in my opinion. Separation of concerns. proposing a new NodeInfo field name, e.g. 0_cost_from_node true/false |
You can set it via remote admin, but must use the CLI tool to do it. The phone apps don't have that ability currently. |
|
Hi @h3lix1 I have a question about this great new feature. My use case is that I have poor signal inside my home and at best I see only one nearby node and only sometimes. To solve this, I installed a node on the roof. I understand that by seeing more nodes the hop cost should not matter very much but anyway I see no harm in taking a free hop from inside my house to the roof. Why does this implementation need to have both devices as CLIENT_BASE? Why the first hop is always subtracted? If I understand correctly, this new feature doesn't meet the needs of my use case. Am I correct? Do I have an alternative? Thanks! |
|
@Hamberthm The issue is that structures within the Meshtastic code treat the first hop as special for other functions such as NeighborInfo. They expect the first hop to be the actual sender, which can be very confusing for these functions if the sender is actually a few hops away. (Other things include gratuitous NodeInfo for neighbors.) It was easier to expect someone to bump up their local max hops to account for their roof node than to handle all of these edge cases. |
All right. I suppose setting one more hop on the outgoing messages would solve that part. What about receiving? I see the previous relay also has to be set to ROUTER/ROUTER_LATE/CLIENT_BASE and favorited, so there's no way of preventing a "no hops left" message from dying on my roof before hopping one more time to my inside node. |
|
@Hamberthm That is correct. It assumes that the hop before the roof node is a ROUTER (or ROUTER_LATE) node that is favorited. The expectation is that if there is one (or zero) hops left when the packet reaches your roof node, it will still forward it along. It's not a guarantee that the node will avoid the no-hops-left issue, although I guess we can try to make exceptions for all ROUTER_LATE packets. |
|
@h3lix1 @compumike |


With a move to faster (but shorter-range) presets, the 7-hop limit may make it hard for some networks to grow within metro regions, limiting adoption of presets like SHORT_FAST or SHORT_TURBO in environments that could otherwise support them.
Allowing users to configure past 7 hops could cause problems with meshes not prepared for this. A compromise is to allow routers with other routers in the "friend" list to not decrease max_hops.
Taking some inspiration from CLIENT_BASE, it uses "friended" routers to determine whether the hop_limit is decreased. For this purpose, CLIENT_BASE is also considered a router.
This implementation treats p->relay_node as the last byte of the relaying node's ID. It first checks whether that byte matches a friend, and then whether that friend is a router in the node DB. There could be collisions here, since there are only 256 possible values from 0x00 to 0xff. There is a much smaller possibility that a client will match both.
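The matching described above can be sketched roughly as follows. `NodeEntry` and `relayIsFavoriteInfrastructure` are hypothetical stand-ins for the node DB lookup, not the firmware's real types, and the sketch assumes the role check is already reduced to a boolean.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node DB entry with only the fields this check needs.
struct NodeEntry {
    uint32_t num;           // full 32-bit node ID
    bool is_favorite;       // marked as a favorite ("friend") locally
    bool is_infrastructure; // role is ROUTER, ROUTER_LATE or CLIENT_BASE
};

// relay_node carries only the last byte of the relayer's ID, so any favorited
// infrastructure node sharing that byte will match (the collision case).
bool relayIsFavoriteInfrastructure(uint8_t relay_node, const std::vector<NodeEntry> &db)
{
    for (const NodeEntry &n : db) {
        if (n.is_favorite && n.is_infrastructure &&
            static_cast<uint8_t>(n.num & 0xFF) == relay_node)
            return true;
    }
    return false;
}
```

A relay byte of 0x78 matches a favorited router with ID 0x12345678, but would equally match any other relayer whose ID ends in 0x78.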
The probability of at least one collision is 1 - ((256 - F) / 256)^(N - F), where N is the number of nodes and F is the number of favorites.
Practical examples:
10 nodes, 1 favorite - about 0.04 colliding nodes expected (3.5% probability of at least one collision)
50 nodes, 3 favorites - about 0.55 colliding nodes expected (43% probability of at least one collision)
100 nodes, 5 favorites - about 1.86 colliding nodes expected (84.5% probability of at least one collision)
200 nodes, 10 favorites - about 7.42 colliding nodes expected (99.9% probability of at least one collision)
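The figures above can be checked with a short sketch. `collisionProbability` implements the stated equation directly. `expectedCollisions` uses (N - F) * F / 256, which is an assumption: that formula is not given in the description, but it reproduces the listed node counts.

```cpp
#include <cmath>

// Probability that at least one of the (nodes - favorites) other nodes shares
// a last ID byte with one of the favorites, per the equation above.
double collisionProbability(int nodes, int favorites)
{
    return 1.0 - std::pow((256.0 - favorites) / 256.0, nodes - favorites);
}

// Assumed formula for the expected number of colliding nodes: each
// non-favorite matches a favorite's byte with probability favorites / 256.
double expectedCollisions(int nodes, int favorites)
{
    return (nodes - favorites) * favorites / 256.0;
}
```

Calling these with (10, 1), (50, 3), (100, 5) and (200, 10) gives values that agree with the listed examples to within rounding.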
Even with a collision, it would require both nodes to be in close proximity to talk to each other, which makes this an edge case at best, and there is no harm done if there is a collision.
Nodes maintain a PacketHistory with sender/id to avoid sending the same packet twice. This will avoid common routing loops.
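As a simplified illustration of this duplicate check, the sketch below records (sender, id) pairs and reports repeats. `SeenPackets` and `checkAndRecord` are hypothetical names. The real PacketHistory is more involved, and this sketch never expires old entries.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical stand-in for a packet history keyed on (sender, packet id).
class SeenPackets
{
  public:
    // Returns true if this (from, id) pair was already recorded, meaning the
    // packet is a duplicate and should be ignored; otherwise records it.
    bool checkAndRecord(uint32_t from, uint32_t id)
    {
        return !seen.insert({from, id}).second;
    }

  private:
    std::set<std::pair<uint32_t, uint32_t>> seen;
};
```

The key does not include hop_limit, so a copy that arrives with an undecremented hop limit is still recognised as a duplicate.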
Features
Supported Infrastructure Roles:
Behavior:
Usage:
To benefit from this feature:
Safety:
🤝 Attestations
Station G2