network_driver_refactor [Switches]

DRAFT!!

This still needs to be validated and verified, and a second flow should be added to show the SDK U-Boot flow.

Right now the flow is a bit of a spaghetti: configuring the PHY, the serdes (on both sides) and the serdes calibration is spread all over the place.

The PHY framework offers `probe` and `config_init` functions.

The SerDes itself is a PHY, gets probed as such, and needs to be set up. The remote PHY currently only gets initialized, but that initialization does many things in many places and relies on information that has to be obtained from elsewhere. E.g. we probably need to know the connected MDIO bus number (so we can send messages, though the PHY framework probably handles that) and the serdes (which should be the `phys` phandle).

= rtl9300 =

The rtl9300 serdes gets probed based on its PHY ID (0x70d03106), which is connected to MDIO bus 0x1a (26) and 0x1b (27); these are not configured via the devicetree. Where/how are these set? The `rtl9300_serdes_probe` function doesn't actually do anything.

= rtl8218d =

The same serdes is probed again later, this time as a slave, on _slave_ MDIO bus numbers 0x1a (26), 0x1b (27) *and* 0x3f (63). The bus name is set up in `rtl83xx_mdio_probe`, but it is unclear how or why that triggers the internal serdes detection again, and where these addresses are coming from. Also, do we have 2 or 3 internal serdes controllers?

Next, DSA sets up the rtl93xx. Should it wait/defer until the PHYs are actually all set up?

Then the MDIO bus probes the external PHYs (0x001cc983) via `rtl8218d_phy_probe`. PHY probing does nothing except 'joining' the PHYs into a package with `devm_phy_package_join`, as we have 8 PHYs in one XSGMII package (4 PHYs in a QSGMII package, times 2).
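
The package grouping can be sketched in plain C. This is a minimal user-space sketch, not driver code: `PHYS_PER_PACKAGE`, `package_base_addr`, `package_port_index` and `package_qsgmii_half` are hypothetical names, and it assumes that the eight PHYs of one package occupy eight consecutive MDIO addresses starting at a multiple of 8, which would need to be checked against the actual board. In the kernel, a base address computed like this is the kind of value that would be handed to `devm_phy_package_join`.

```c
#include <stdint.h>

/* Hypothetical: number of PHYs sharing one RTL8218D package. */
#define PHYS_PER_PACKAGE 8u

/* Assumption: a package starts at an MDIO address that is a multiple of 8. */
static inline uint8_t package_base_addr(uint8_t mdio_addr)
{
	return (uint8_t)(mdio_addr - (mdio_addr % PHYS_PER_PACKAGE));
}

/* Index of a PHY inside its package (0..7). */
static inline uint8_t package_port_index(uint8_t mdio_addr)
{
	return (uint8_t)(mdio_addr % PHYS_PER_PACKAGE);
}

/* Which QSGMII half (0 or 1) of the package a PHY sits in: 4 PHYs each. */
static inline uint8_t package_qsgmii_half(uint8_t mdio_addr)
{
	return (uint8_t)(package_port_index(mdio_addr) / 4u);
}
```

Under these assumptions, MDIO address 11 belongs to the package based at address 8, is PHY index 3 in that package, and sits in the first QSGMII half.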

Next comes the messy bit: `rtl9300_configure_8218d`, the `config_init` function for the PHY, does a little bit of everything.

It gets the SDS number from the devicetree; each link config without an SDS number set is ignored. This is used to indicate how the whole block connects to a serdes, which is not so nice.
It gets the CMU band from the actual link config per SDS, but that is only displayed, not used.
All of the ports get their SMI/MDIO polling disabled.
The MAC gets turned off for the port.
The serdes in question gets turned off.
In effect, the external PHY is now disconnected: polling is off, serdes is off.
The serdes now gets patched (configured) with binary data (page, register, value).
For even serdes numbers one table is written, for odd serdes numbers a second table is written. This is of course problematic, because it all depends on how the hardware is set up. E.g. in some cases serdes 3 is XSGMII, which this setup breaks completely. Also, a second table gets written, overwriting some bits of the first?
Then the PHY gets set up with `rtl9300_rtl8218d_phy_setup`:
- The current serdes mode gets read from the actual PHY via MDIO, but this data is NOT used! It could be useful to validate how the host connects to the PHY. It should not be relied upon to decide how to configure the serdes, however, as not every PHY will offer this way of detecting it.
- Then the serdes model data is retrieved, to determine whether it is an 8218D or an 8218D NMP.
- The PHY is now patched with the correct table for either an 8218D or an 8218D NMP. TODO: figure out if `phy_write_paged` writes to the actual PHY, and whether we need to do this for each individual MDIO address or only once … The `phy_config` structure seems to imply only the first MDIO address, regardless …
The MAC link gets configured in `rtl9300_serdes_mac_link_config`, but in reality the 10 Gbit and 1 Gbit pages of the serdes are modified depending on whether we want TX or RX 'normal'. We do not know what the 'other' setting is. Since the before/after values are identical, nothing is done in effect. Loopback maybe? Check SDK!
The newly patched serdes is then enabled in the requested mode.
The MAC for the port gets turned back on.
More serdes configuration is needed, in `rtl9300_sds_tx_config`:
- SDS TX eye data is written to the serdes (apparently this is not part of the other serdes pages, or are those the RX parameters? Doubtful though. Generic setup? The SDK does far more calibration here too?)
- Polling of the PHY is re-enabled (but only on the primary MDIO address, what about the 7 others?)
- Link power down saving is enabled (shouldn't this be configurable?)
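
The serdes patching step in this flow boils down to replaying a list of (page, register, value) triples. A minimal user-space sketch of that idea follows. All names (`sds_patch_entry`, `sds_write_fn`, `sds_apply_patch`, `sds_log_write`) are hypothetical and do not come from the driver; the register access is abstracted behind a callback so the same loop can drive real hardware or a dry run.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a serdes patch table: (page, register, value). */
struct sds_patch_entry {
	uint8_t page;
	uint8_t reg;
	uint16_t val;
};

/* Hypothetical register-write hook; returns 0 on success, non-zero on error. */
typedef int (*sds_write_fn)(void *ctx, int sds, uint8_t page, uint8_t reg,
			    uint16_t val);

/*
 * Write every entry of a table to one serdes, stopping at the first error.
 * Returns 0 on success or the error code of the failing write.
 */
static int sds_apply_patch(void *ctx, sds_write_fn write, int sds,
			   const struct sds_patch_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = write(ctx, sds, tbl[i].page, tbl[i].reg, tbl[i].val);

		if (err)
			return err;
	}
	return 0;
}

/* Dry-run backend: counts writes and remembers the last one, no hardware. */
struct sds_write_log {
	size_t count;
	struct sds_patch_entry last;
};

static int sds_log_write(void *ctx, int sds, uint8_t page, uint8_t reg,
			 uint16_t val)
{
	struct sds_write_log *log = ctx;

	(void)sds; /* the dry run does not distinguish serdes instances */
	log->count++;
	log->last.page = page;
	log->last.reg = reg;
	log->last.val = val;
	return 0;
}
```

Keeping the table format separate from the write path like this would also make it easier to compare what the current code writes for even and odd serdes numbers, since both tables could be replayed through the dry-run backend and diffed.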

It seems that (other than the binary table choice) `rtl9300_configure_8218d` configures both sides of the serdes link, which is not ideal; however, we need to figure out how to do this properly within the kernel frameworks.
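
To make the "both sides" problem more concrete, the order of operations described above can be written down as data. This is only a sketch of my reading of the flow, not code from the driver: the `cfg_side` attribution of each step is an assumption, and steps where the side is not clear from these notes are marked `SIDE_UNKNOWN` rather than guessed.

```c
#include <stddef.h>

/* Which end of the serdes link a step touches (assumed, not verified). */
enum cfg_side { SIDE_HOST, SIDE_PHY, SIDE_UNKNOWN };

struct cfg_step {
	enum cfg_side side;
	const char *what;
};

/* Steps in the order the notes describe them for rtl9300_configure_8218d. */
static const struct cfg_step cfg_flow[] = {
	{ SIDE_HOST,    "disable SMI/MDIO polling" },
	{ SIDE_HOST,    "turn the port MAC off" },
	{ SIDE_HOST,    "turn the serdes off" },
	{ SIDE_HOST,    "patch the serdes (page, register, value)" },
	{ SIDE_PHY,     "patch the PHY (rtl9300_rtl8218d_phy_setup)" },
	{ SIDE_HOST,    "configure MAC link (rtl9300_serdes_mac_link_config)" },
	{ SIDE_HOST,    "enable the serdes in the requested mode" },
	{ SIDE_HOST,    "turn the port MAC back on" },
	{ SIDE_HOST,    "write SDS TX data (rtl9300_sds_tx_config)" },
	{ SIDE_HOST,    "re-enable polling of the PHY" },
	{ SIDE_UNKNOWN, "enable link power down saving" },
};

#define CFG_FLOW_LEN (sizeof(cfg_flow) / sizeof(cfg_flow[0]))

/* Count how many steps of the flow are attributed to one side. */
static size_t cfg_count_side(enum cfg_side side)
{
	size_t n = 0;

	for (size_t i = 0; i < CFG_FLOW_LEN; i++)
		if (cfg_flow[i].side == side)
			n++;
	return n;
}
```

If this attribution is roughly right, most of what the PHY's `config_init` does is really host-side serdes and MAC work, which is the part that would have to move elsewhere in a refactor.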

The devicetree design is new, in that we have a serdes that we really ought to treat specially in the devicetree, as each serdes needs to be configured.

Also logically, it follows that we have switch → serdes → port (range). While we can infer how the serdes is connected (QSGMII, XSGMII …), we don't know which ports belong to which serdes etc.

Currently, we do this with the `sds` property, which tells the driver which SDS is to be used, but the code then already makes a lot of assumptions. E.g. that there is always a quad PHY connected, and things are derived from the fact that the first PHY has an `sds` property and the rest do not.

The thing is, we need to configure a serdes (patch it) with a blob of configuration data, which is based on type (configuration) and board layout, as these affect the configuration data. As such, it makes much more sense to treat things in a more tree-like fashion.
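
As a sketch of what "based on type and board layout" could mean in code, the lookup below picks a patch set by serdes mode and board instead of by even/odd serdes number. Everything here is hypothetical: the `sds_mode` values, the `sds_patch_set` structure, the board strings and the fallback rule (a set without a board name applies to any board) are assumptions for illustration, and `table_name` merely stands in for an actual patch table.

```c
#include <stddef.h>
#include <string.h>

/* Serdes modes mentioned in these notes; the enum itself is hypothetical. */
enum sds_mode { SDS_MODE_QSGMII, SDS_MODE_XSGMII };

struct sds_patch_set {
	const char *board;      /* hypothetical board id, NULL = any board */
	enum sds_mode mode;
	const char *table_name; /* placeholder for the real patch table */
};

/*
 * Find the patch set for a mode, preferring an exact board match and
 * falling back to the first board-independent set. Returns NULL if the
 * mode has no patch set at all.
 */
static const struct sds_patch_set *
sds_find_patch(const struct sds_patch_set *sets, size_t n,
	       const char *board, enum sds_mode mode)
{
	const struct sds_patch_set *fallback = NULL;

	for (size_t i = 0; i < n; i++) {
		if (sets[i].mode != mode)
			continue;
		if (!sets[i].board) {
			if (!fallback)
				fallback = &sets[i];
			continue;
		}
		if (board && strcmp(sets[i].board, board) == 0)
			return &sets[i];
	}
	return fallback;
}
```

With a lookup of this shape, a serdes 3 running XSGMII would get the XSGMII table regardless of its number being odd.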

Finally, the mdio bus also plays a part, as this is where we discover and configure the actual PHY, based on the PHY id, and also configure the 'other side' of the serdes link, e.g. the phy's serdes equally needs to be configured.

This draft does not yet cover where to store these configuration tables. Keeping them in C code means they can't easily be changed and are still 'sort of' hard coded. Also, the tables are unique per device, as it is very possible that routing/layout of the PCB traces affects the configuration (though this hasn't been observed yet!). Logically, then, they should be in the devicetree, but these are relatively big blobs, and the devicetree should be kind of immutable (in theory), so that might not be ideal. The alternative is to install them as firmware blobs and load them on demand. But all these are future discussions.

In the devicetree, the configuration currently looks like this:

ethernet@1b00a300 {
    mdio-bus {
        regmap = <&ethernet>;

        phy0: ethernet-phy@0 {
            rtl9300,smi-address = <0 0>;
            sds = <0>;
        };
        ...
        phy23: ethernet-phy@23 {
            rtl9300,smi-address = <2 8>;
            sds = <2>;
        };
    };
};

First, the MDIO bus needs to become its own node, as it has no relation to the Ethernet device. This is especially true as the NIC on the Realtek SoC doesn't even have an MDIO bus and is directly connected with a fixed link. There are some hacks in the driver now that make little or no sense to have on the NIC, and that should be handled by the DSA 'cpu port' node. What is unclear right now is how the fixed link is configured, or rather whether it can be. The driver implies it is configurable, but it could very well be that the NIC just follows exactly how the switch port is configured. How is unclear: there are no NIC-related registers, everything happens via the CPU switch port.

Secondly, all the 'switch' registers are used for Ethernet, PHY and DSA alike. This needs to be split up, of course.

ethernet@1b007c60 {
};

mdio@1b00ca00 {
    frequency = <25000000>;
    bus@0 {
        max-frequency = <8000000>;

        phy0: phy@0 {
        };
        phy3: phy@3 {
        };
        ...
        phy4: phy@0 {
        };
    };
    ...
    bus@2 {
        ...
        phy23: phy@7 {
        };
    };
};

serdes: serdes@1b00080 {
    // A serdes really is just a generic PHY, see [[https://elixir.bootlin.com/linux/latest/source/drivers/phy/microchip/sparx5_serdes.c || sparx5 ethernet switch serdes]]
    #phy-cells = <2>; // sds num + optional lane number
};

switch@1b00000 {
    port@0 {
        label = "lan1";
        phys = <&serdes 0 0>, <&serdes 1 1>; // sds_num, lane_num, probably only even/odd matters
        phy-handle = <&phy0>;
        phy-mode = "qsgmii";
    };
    port@1 {
        label = "lan2";
        phys = <&serdes 0 0>, <&serdes 1 1>;
        phy-handle = <&phy1>;
        phy-mode = "qsgmii";
    };
    ...
    port@8 {
        label = "lan9";
        phys = <&serdes 2>;
        phy-handle = <&phy8>;
        phy-mode = "xgmii";
    };
};
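
On the driver side, the `phys` specifier from this proposal would have to be decoded into a serdes number and a lane. A minimal user-space sketch of such a decode step follows. `sds_phandle_args` only mimics the shape of the argument list a kernel translate callback receives; it, `sds_ref`, `sds_xlate`, `SDS_LANE_ANY` and the `num_sds` bound are all hypothetical, and the real number of serdes is one of the open points in these notes.

```c
#include <stdint.h>

/* Hypothetical marker for "no specific lane requested". */
#define SDS_LANE_ANY (-1)

/* Stand-in for the cells following the phandle in a `phys` entry. */
struct sds_phandle_args {
	int args_count;
	uint32_t args[2];
};

/* Decoded reference to one serdes (and optionally one lane). */
struct sds_ref {
	int sds;
	int lane;
};

/*
 * Decode <sds_num [lane_num]> into an sds_ref.
 * Returns 0 on success, -1 if the cell count or the serdes number is invalid.
 */
static int sds_xlate(const struct sds_phandle_args *spec, int num_sds,
		     struct sds_ref *out)
{
	if (spec->args_count < 1 || spec->args_count > 2)
		return -1;
	if (num_sds <= 0 || spec->args[0] >= (uint32_t)num_sds)
		return -1;

	out->sds = (int)spec->args[0];
	out->lane = spec->args_count == 2 ? (int)spec->args[1] : SDS_LANE_ANY;
	return 0;
}
```

One caveat for the binding itself: with a fixed `#phy-cells` value every specifier has to carry that many cells, so a truly optional lane cannot simply be left out as in `phys = <&serdes 2>`; it would need either a single-cell binding or a placeholder lane value.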

Open Questions

What if an octal RTL8218 PHY is connected with only a single QSGMII link?

The picture in the datasheet would suggest that there is a 1:1 mapping, in that mac0 - 3 map to phy0 - 3. The serdes lane configuration however suggests otherwise, in that you just have one big '2 bit per clock' data pipe. Regardless, the MII addresses will still all be consumed/available. E.g. MII address 7 will still relate to port 7, regardless of whether there is a port 7 (assuming we can run on half a QSGMII, which is not listed as a supported mode). So an octal PHY will be detected 8 times and thus present 8 PHYs to the kernel, even if fewer ports are configured in the switch section.
