How is the Management VRF special?

Updated 1 year ago
Hi,

I'm wondering what being the mgmt VRF implies, exactly.

Mostly, what's different between:
 * a mgmt VRF + using the default VRF for "public internet" routing.
 * using the default VRF for mgmt + a data plane VRF for "public internet".

Currently I'm leaning more toward the second solution, since it seems to have fewer limitations (i.e. not being limited to eth0, for instance). Also, if I ever want OSPF for my internal private network, that part has to be in the default VRF (not sure if VRF-aware OSPF is planned for the future?).


Here's a bit of explanation about my setup if needed for context.

 - 2 * 10G switches running cumulus with mlags to the hosts.

 - The switches' 'eth0' ports are connected to an 'emergency' gateway that will only be used as a last resort. Most likely we won't administer them through there unless something goes very wrong.

 - What we actually call the "management network", which has the machine BMCs and such, is just a separate VLAN connected to the same switches (there are way too many ports, so we can afford to 'waste' a few 10G ports on 1G copper for that). The switches would have an L3 interface on that VLAN, and that's where they would be managed from most of the time and also get their 'management' internet access (fetching packages, ...).

 - Most machines (physical & virtual) only have private RFC1918 IPs (in the management VLANs and most other VLANs for inter-VM communication), and I'm trying to keep that as separate from the "public internet" side of things as possible. Machines on that private network get their internet access from routers "on a stick" on those switches (doing the NAT, firewall, ...).

 - Routes in those private IP VLANs are redistributed using OSPF (to other sites), and there are dedicated routers that handle that. So technically the switches don't have to participate in OSPF, but maybe in the future ...

 - Those switches act as our border routers to the public internet (but they only receive default routes + a few more specific ones). I would put all the BGP and public IP stuff in a separate VRF.

Maybe actually doing both options at once is worth considering:
 - Use the mgmt VRF for the 'emergency access' so it has its own routes.
 - Use the default VRF for our RFC1918 internal network
 - Use a data plane VRF for all the BGP public internet stuff.


Cheers,

   Sylvain
Sylvain Munaut

Posted 1 year ago
Sean Cavanaugh, Alum
The mgmt VRF is a special VRF for eth0 (or eth1 on some switch platforms). The eth ports are software-only ports that are not hardware accelerated.

 - The switches' 'eth0' ports are connected to an 'emergency' gateway that will only be used as a last resort. Most likely we won't administer them through there unless something goes very wrong.
You only want OOB (out-of-band) traffic to use your eth0, whether or not you have a VRF. This is because this port is SW switched, not HW switched.

Where a VRF helps is when you have multiple matching routes. For example, you could have a default route (0.0.0.0/0) on eth0 that you got from your DHCP server, and a default from your internet service provider for your data plane traffic. You now have two routes:
  • 0.0.0.0/0 via DHCP (kernel route), admin distance 0
  • 0.0.0.0/0 via OSPF or BGP, admin distance > 1
What happens is that your dynamic routing protocol's default route never gets installed. A VRF gives you two route tables, so you can have overlapping routes.
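To make the two-table idea concrete, here's a minimal sketch using plain iproute2 on a VRF-capable Linux system. The interface names and table number are illustrative assumptions, not what Cumulus configures for you:

```shell
# Sketch only: create a management VRF bound to route table 1001 and
# move eth0 into it (names/numbers are examples, not Cumulus defaults).
ip link add mgmt type vrf table 1001
ip link set dev mgmt up
ip link set dev eth0 master mgmt

# The DHCP default route on eth0 now lands in table 1001 ...
ip route show table 1001

# ... while the OSPF/BGP default stays in the main table, so the two
# 0.0.0.0/0 routes no longer compete with each other.
ip route show
```

Since the two defaults live in separate tables, neither one shadows the other; lookups for management traffic use the mgmt table and everything else uses main.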

The rest of your questions don't seem to be about VRFs. What is the exact problem? Did my answer help here? Maybe we can dive down a level deeper now.
Pete B, Official Rep
I like this explanation a lot, Sean. Mind if I repurpose it for the docs? 
Sean Cavanaugh, Alum
lol sure
Sylvain Munaut
Yeah, I'm aware the eth0 port should only be used for OOB administration.

But one of the things I was wondering was whether I could use a front-panel port instead for day-to-day administration. It turns out that seems like a bad idea, because (1) a bunch of rate-limit rules are applied to traffic from front-panel ports to the Linux host, so this needs quite a bit of default config overriding, and (2) traffic on those ports goes through switchd, which tends to use CPU time.

So if I want the switch to be administrable via two different paths (eth0 itself is not redundant, and I want two paths in case one goes down ...), I'm better off using eth0 for "day-to-day" administration and access, and then using a front-panel port (SVI) for "emergency / recovery" in case my eth0 link is screwed up.

The other thing I was wondering is how / where the special name 'mgmt' is matched, and what different behavior it triggers vs. naming it 'admin' (just an example). That name is apparently special and triggers some different parts of the code; I'm wondering which ones.

As for my reasons for using VRFs, it's mostly security / isolation: to make sure some misconfiguration couldn't lead to packets from the internet leaking into my internal network (since the switch is directly connected to upstream providers and I can't trust anything coming from there). It also allows me to make sure all the "apps" running on the switch (ntp / smtp / ...) go through our internal router/firewall rather than directly to the upstream providers.
Eric Pulvino, Official Rep
The mgmt name triggers special handling of DNS traffic specifically. If you're using the mgmt VRF and run the "ip rule ls" command, you'll see that, when applied, the mgmt VRF builds a special rule to send traffic to your currently configured DNS server out the eth0 port (instead of out your front-panel ports). From talking to David Ahern, I understand that will become a configurable setting soon, but that is the only piece of specialness I'm aware of. I'll send this thread over to David in case I've missed something. (I'm sure I have.)
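If you want to see that rule for yourself, a quick way (on the switch, with the management VRF enabled) is to list the policy routing rules. The exact output depends on your DNS configuration, so treat this as a sketch:

```shell
# Sketch: list the policy routing rules in effect.
ip rule ls

# Alongside the usual local/main/default rules, you would expect entries
# matching "to <your-dns-server-ip>" that look up the mgmt route table,
# which is what steers DNS queries out eth0.
```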
Sylvain Munaut
Ok, thanks, that's really good to know.

Is that rule inserted in the HW? (i.e. if I have packets being routed from one front-panel port to another toward the DNS server IP, will they be affected?)
David Ahern, Employee
The "mgmt" name is special-cased to distinguish the Management VRF from a data plane VRF. As Eric mentioned, FIB rules are installed for DNS servers, since that is the usual deployment case. In addition, the user's shell is set to the Management VRF context at login. This allows admin tools like ansible, chef, and apt-get to Just Work over the management plane with no change in how the command is run. It really comes down to making the Management VRF transparent and easy, especially for new users doing a typical deployment.
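As a rough illustration of that "shell context" idea, modern iproute2 lets you bind a single command (or a whole shell) to a VRF; Cumulus also shipped its own vrf helper utility for this. The commands below are a sketch, and the ping destination is a documentation address, not anything in your network:

```shell
# Sketch: run a one-off command inside the management VRF, so its
# sockets are bound to the mgmt table and traffic leaves via eth0.
ip vrf exec mgmt ping -c 1 192.0.2.1

# Show which VRF the current shell is associated with
# (prints nothing if you're in the default VRF).
ip vrf identify
```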