Scapy traffic generator for DPDK L3FWD application - Linux

I am new to DPDK and am trying to run the L3FWD app, using Scapy to send traffic to it.
I have two hosts: Host A (Ubuntu, kernel 4.15.0-154-generic), which runs Scapy to send the traffic, and Host B (Ubuntu, kernel 5.11.0-25-generic), which runs DPDK 21.08.0. On Host B the vfio-pci module is loaded and two NICs (Ethernet Controller XXV710) are bound to it.
I have huge pages set up as shown below:
mkdir -p /dev/hugepages
mountpoint -q /dev/hugepages || mount -t hugetlbfs nodev /dev/hugepages
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 1024 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
I verified the connectivity between the hosts using Scapy on Host A and tcpdump on Host B (traffic from Host A arrives on Host B).
I built DPDK and its example apps. I am trying to run L3FWD with the following arguments on Host B:
./dpdk-l3fwd -l 1,2 -n 4 -- -p 0x3 --config="(0,0,1),(1,0,2)"
I have tried several ways of sending traffic from Host A to B using Scapy, as shown below, but I still cannot see any output from L3FWD on Host B.
Way 1)
sendp(Ether()/IP(src="1X.1X.2x.1x"), iface="enp25s0f0",count=1000)
Using the sendp command, I did not see traffic on Host B with L3FWD running. Please note that for the source IP address I have replaced the numeric values with 'x' here.
Way 2)
send(IP(src="1x.1x.2x.1x"), iface="enp25s0f0",count=1000)
Using the send command, I did not see traffic on Host B with L3FWD running.
Way 3)
x = Ether(src='xc:xd:xe:a9:x9:x0', dst='xC:xx:xx:Ax:Bx:x1')
sendp(x, iface='enp25s0f0',count=10000)
This also did not work.
Please note that I have replaced the actual MAC and IP addresses above with a few 'x' characters.
The output of L3FWD is as below:
./build/examples/dpdk-l3fwd -l 1,2 -n 4 -- -p 0x3 --config="(0,0,1),(1,0,2)"
EAL: Detected 56 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
**EAL: No available 1048576 kB hugepages reported**
EAL: VFIO support initialized
EAL: Using IOMMU type 8 (No-IOMMU)
EAL: Probe PCI driver: net_i40e (8086:158b) device: 0000:18:00.0 (socket 0)
EAL: Probe PCI driver: net_i40e (8086:158b) device: 0000:18:00.1 (socket 0)
**TELEMETRY: No legacy callbacks, legacy socket not created**
Neither LPM, EM, or FIB selected, defaulting to LPM
Initializing port 0 ... Creating queues: nb_rxq=1 nb_txq=2... Port 0 modified RSS hash function based on hardware support,requested:0xa38c configured:0x2288
Address:xx:xx:xx:xx:xx:xx, Destination:02:00:00:00:00:00, Allocated mbuf pool on socket 0
LPM: Adding route 198.18.0.0 / 24 (0)
LPM: Adding route 198.18.1.0 / 24 (1)
LPM: Adding route 2001:200:: / 64 (0)
LPM: Adding route 2001:200:0:1:: / 64 (1)
txq=1,0,0 txq=2,1,0
Initializing port 1 ... Creating queues: nb_rxq=1 nb_txq=2... Port 1 modified RSS hash function based on hardware support,requested:0xa38c configured:0x2288
Address:xx:xx:xx:xx:xx:xx, Destination:02:00:00:00:00:01, txq=1,0,0 txq=2,1,0
Initializing rx queues on lcore 1 ... rxq=0,0,0
Initializing rx queues on lcore 2 ... rxq=1,0,0
Checking link status........done
Port 0 Link up at 25 Gbps FDX Autoneg
Port 1 Link up at 25 Gbps FDX Autoneg
L3FWD: entering main loop on lcore 1
L3FWD: -- lcoreid=1 portid=0 rxqueueid=0
L3FWD: entering main loop on lcore 2
L3FWD: -- lcoreid=2 portid=1 rxqueueid=0
The output of L3FWD does not advance past this point.
Can any of you please help me find out where I am going wrong, or help me understand how to send traffic from Host A so that the L3FWD app on Host B processes it?
Thanks for your responses.

[EDIT] The solution mentioned in the answer has been tried and tested on both hosts and virtual machines, and it works.
There are a couple of ways to check whether l3fwd is actually receiving, and forwarding, traffic that matches its rules. One can check the statistics:
using the DPDK secondary application dpdk-procinfo for the DPDK ports
checking the statistics at the Host-B PF port via ethtool
checking the stats at the sender (Scapy) on Host-A
For your specific use case there are two factors to take care of:
Since the DPDK ports on Host-B are created as VFs over a single PF interface, you at least have to send packets with the VF-1 MAC address for them to be received.
For packets to be sent out from VF-2, the MAC address of the packet has to be modified so that it is allowed to be forwarded out from the Host-B PF to Host-A.
Note: If you had 2 separate PF NICs on Host-B, you would instead need to configure the ports on Host-A in promiscuous mode to receive packets with any MAC address.
Solution:
Program Scapy to send an ETH/IP packet with the MAC address of VF-1 as the destination (assuming MAC-VLAN is not enabled on the PF) and a destination IP address matching an LPM table entry, i.e. 198.18.X.X (where X stands for a wildcard byte value); see the sketch after these steps.
Check with the DPDK secondary application (dpdk-procinfo) that the statistics of Port-0 (VF-1) and Port-1 (VF-2) confirm packets are indeed received and transmitted out.
Check the statistics of the PF on Host-B using cat /proc/net/dev to see whether packets are received, transmitted or dropped.
Note: the sample program can also easily be edited to add prints or rte_pktmbuf_dump calls.
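A minimal Scapy sketch of the first step might look like the following; the port-0 MAC is a placeholder (use the "Address:" value l3fwd prints for port 0), the interface name enp25s0f0 is taken from the question, and the IP addresses are chosen to hit the default LPM entries:
from scapy.all import Ether, IP, sendp
# Placeholder MAC: replace with the address l3fwd prints for port 0 at startup.
port0_mac = "aa:bb:cc:dd:ee:01"
# 198.18.1.0/24 is routed to port 1 by the default LPM table, so a packet
# received on port 0 with this destination should be forwarded out of port 1.
pkt = Ether(dst=port0_mac) / IP(src="198.18.0.10", dst="198.18.1.10")
sendp(pkt, iface="enp25s0f0", count=1000)
If forwarding works, the RX counter of Port-0 and the TX counter of Port-1 should both increase, which dpdk-procinfo can confirm as described in the second step.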

Related

Packet capturing in l3fwd

I am performing a DPDK experiment. In my setup, I have two physical machines, Host1 and Host2, each with two 10 Gbps NICs. One interface of Host1 is bound to DPDK and generates traffic using pktgen. Both interfaces of Host2 are bound to DPDK, and l3fwd is running as the packet-forwarding application. The second NIC of Host2 is used to capture the packets. I want to break down the delay experienced by a packet by measuring the time spent in each interface of Host2.
Is there any way to capture packets on DPDK interfaces while using l3fwd as the packet-forwarding application?
For DPDK interfaces you can make use of DPDK pdump capture to get packets from a DPDK-bound NIC. Refer to https://doc.dpdk.org/guides-16.07/sample_app_ug/pdump.html.
The l3fwd application has to be modified with an rte_pdump_init API call right after rte_eal_init. This enables the multi-process communication channel, so that when the dpdk-pdump (secondary) application is run, an rte_ring is created and the packet contents are copied over it.
Note: please check the DPDK pdump app documentation for usage. For example, to copy packets from port 0, queue 1, use sudo ./[path to application]/dpdk-pdump -- --pdump 'port=0,queue=1,rx-dev=/tmp/port0_queue1.pcap'
pdump is a good tool to capture packets at any port bound to DPDK. Launch the pdump tool as follows:
sudo ./build/app/dpdk-pdump -- --pdump 'port=0,queue=*,rx-dev=/tmp/capture.pcap'
and after packets are captured, run the following command in the directory containing the pcap (/tmp here) to view them:
tcpdump -nr ./capture.pcap
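If you prefer Scapy over tcpdump for inspecting the result, a small sketch that reads the capture file written by the command above could be:
from scapy.all import rdpcap
# Load the pcap written by dpdk-pdump and print a one-line summary per packet.
pkts = rdpcap("/tmp/capture.pcap")
pkts.summary()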

ARP Retry count exceeded in uboot

I was testing my Ethernet connection on my i.MX6 board in U-Boot.
I used the following commands:
setenv ipaddr xx.xx.xx.xx
setenv serverip xx.xx.xx.xx
setenv netmask xx.xx.xx.xx
setenv gatewayip xx.xx.xx.xx
setenv ethaddr xx:xx:xx:xx:xx:xx
When I do a ping to my address, it fails:
=> ping xx.xx.xx.xx
Using FEC device
ARP Retry count exceeded; starting again
ARP Retry count exceeded; starting again
=> mii info
PHY 0x00: OUI = 0x209A, Model = 0x01, Rev = 0x00, 100baseT, FDX
=> mii dump 0 0
0. (3100) -- PHY control register --
(8000:0000) 0.15 = 0 reset
(4000:0000) 0.14 = 0 loopback
(2040:2000) 0. 6,13 = b01 speed selection = 100 Mbps
(1000:1000) 0.12 = 1 A/N enable
(0800:0000) 0.11 = 0 power-down
(0400:0000) 0.10 = 0 isolate
(0200:0000) 0. 9 = 0 restart A/N
(0100:0100) 0. 8 = 1 duplex = full
(0080:0000) 0. 7 = 0 collision test enable
(003f:0000) 0. 5- 0 = 0 (reserved)
What can be the error? I have seen on the NXP website that a fake MAC address will not work with ping. How do I make it work?
First of all, we need to check whether the MAC address is properly written into the hardware registers, i.e. the SpecAdd1top and SpecAdd1bottom registers (which hold the MAC address). Read the values of these two registers and check whether they match the assigned MAC address.
Verify whether the ARP request is reaching the server PC by running Wireshark on the server side. If it is reaching the server, then the board is not getting the ARP response back within its ARP timeout of 5 ms. If it is not reaching the server side, the board is not sending the ARP request properly.
Check from the U-Boot Ethernet MAC driver whether the ENET MAC is successfully sending the ARP request or not.
Increase the ARP timeout from 5 ms to a larger value.
Use Wireshark or tcpdump to validate what is actually happening on the "wire". As modern Ethernet is switched, you need to find a way to actually see that particular Ethernet segment. This is easy if your device is connected via a crossover cable and you have your laptop on the other side. Otherwise you might have to configure your switch so that it copies traffic from a certain port to the port where your laptop is attached.
You might want to set up the network adapter using the dhcp command instead of using setenv.
Using dhcp you can verify that you have a working network connection, and you can display the correct values for the environment variables that you are currently setting manually using printenv.
If a basic ping doesn't work with a static IP address, what makes you think a DHCP request has a better chance of success?
Static addresses can be set up incorrectly. E.g. if the wrong gateway or network mask is specified you may not be able to ping a target.

tcpdump returns 0 packets captured, received and dropped

I am currently trying to debug a networking problem that has been plaguing me for almost three weeks. I'm working with OpenStack and can create virtual machines and networks fine, but cannot connect to them at all. When I run this command on the server, I have to press Ctrl+C to stop it after it times out, and it returns:
[root@xxxxxx ~(keystone_admin)]# tcpdump -i any -n -v 'icmp[icmptype] = icmp-echoreply or icmp[icmptype] = icmp-echo'
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
I'm not sure if this is exclusively an OpenStack problem or just a networking problem in general, but I know that tcpdump is supposed to return something other than 0 packets captured, received or dropped. I am new to networking and therefore do not have much experience, so please be gentle. Any help is appreciated. Thanks.
tcpdump is the right tool to dump IP packets, but if your OpenStack security group rules block ICMP, 0 ICMP packets are expected.
I just want to understand what you mean by "cannot connect to the virtual machines at all": does the ping command not work, or other protocols like SSH or HTTP?
Generally the first common problem when connecting to an OpenStack VM is the security group rules; the default group disallows the ICMP protocol. You can run the following commands to see the rules:
nova secgroup-list: it usually returns a default group
nova secgroup-rules-list default: it shows the defined rules; there must be at least one rule allowing the ICMP protocol.
The official documentation explains how to add rules allowing ICMP and SSH.

accept_local doesn't work

I want to send data out from one NIC and receive it on another NIC on a CentOS 6.4 box (x86 platform, 3 NICs: one onboard Realtek NIC and two Intel NICs).
First, I configured the Intel NIC IPs: 192.168.1.1/24 on eth0 and 192.168.1.2/24 on eth1.
Second, I added routes with the following commands:
# route add -host 192.168.1.1 dev eth1
# route add -host 192.168.1.2 dev eth0
Third, I enabled accept_local in /etc/sysctl.conf:
net.ipv4.conf.eth0.accept_local = 1
net.ipv4.conf.eth1.accept_local = 1
I also disabled iptables and SELinux. I rebooted the system, connected eth0 and eth1 with a cable, and then tested like this:
#ping 192.168.1.1 -I eth1
The message returned is:
"From 192.168.1.2 icmp_seq=xx Destination Host Unreachable"
Have I missed something?
I have already read this topic: How can I configure Linux routing to send packets out one interface, over a bridge and into another interface on the same box.
Try setting sysctl -w net.ipv4.conf.all.rp_filter=2
Refer to https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt:
accept_local - BOOLEAN
Accept packets with local source addresses. In combination
with suitable routing, this can be used to direct packets
between two local interfaces over the wire and have them
accepted properly.
rp_filter must be set to a non-zero value in order for
accept_local to have an effect.
rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface
is not the best reverse path the packet check will fail.
By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB
and if the source address is not reachable via any interface
the packet check will fail.
Current recommended practice in RFC3704 is to enable strict mode
to prevent IP spoofing from DDos attacks. If using asymmetric routing
or other complicated routing, then loose mode is recommended.
The max value from conf/{all,interface}/rp_filter is used
when doing source validation on the {interface}.
Default value is 0. Note that some distributions enable it
in startup scripts.

UDP packet greater than 1500 bytes dropped

I'm developing a TFTP client and server and I want to dynamically select the UDP payload size to boost transfer performance.
I have tested it with two Linux machines (one has a gigabit Ethernet card, the other a Fast Ethernet one). I changed the MTU of the gigabit card to 2048 bytes and left the other at 1500.
I have used setsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &optval, sizeof(optval)) to set the MTU_DISCOVER flag to IP_PMTUDISC_DO.
From what I have read, this option should set the DF bit to one, and so it should be possible to find the minimum MTU of the path (the MTU of the host that has the lowest MTU). However, it only gives me an error when I send a packet whose size is bigger than the MTU of the machine I'm sending from.
Also, the other machine (the server in this case) doesn't receive the oversized packets (the server has an MTU of 1500). All such UDP packets are dropped; the only way is to send packets of 1472 bytes or smaller.
Why do the hosts do this? From what I have read, if I send a packet larger than the MTU, the IP layer should fragment it.
I fail to see the problem. You are setting the "don't fragment" bit, and you send a packet smaller than the sending host's MTU but larger than the receiving host's MTU. Of course nobody will fragment here (doing so would violate the DF bit). Instead, the sending host should get an ICMP message back.
Edit: IP specifies that an ICMP error message of type 3 (destination unreachable), code 4 (fragmentation required but DF bit set) is sent to the originating host at the point where the fragmentation would have occurred. The TCP layer handles this on its own for PMTU discovery. On connectionless sockets, Linux reports the error in the socket's error queue if the IP_RECVERR option is activated; see ip(7).
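As a rough illustration of this behaviour on a connected UDP socket, here is a hedged Python sketch; the receiver address, port and payload size are placeholders, and the IP_MTU_DISCOVER, IP_PMTUDISC_DO and IP_MTU constants fall back to their Linux values from <linux/in.h> in case the Python build does not export them:
import errno
import socket
# Linux socket option constants; fall back to the <linux/in.h> values if missing.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)
IP_MTU = getattr(socket, "IP_MTU", 14)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)  # sets the DF bit
s.connect(("192.0.2.10", 9999))  # placeholder receiver address and port
payload = b"x" * 2000  # larger than a 1500-byte MTU somewhere on the path
try:
    s.send(payload)  # may go out before any ICMP "fragmentation needed" arrives
    s.send(payload)  # fails with EMSGSIZE once the kernel has learned a smaller path MTU
except OSError as e:
    if e.errno == errno.EMSGSIZE:
        print("path MTU known to the kernel:", s.getsockopt(socket.IPPROTO_IP, IP_MTU))
    else:
        raise
Note that this only works when a router sits between the two MTUs and returns the ICMP error; on a flat layer-2 segment with mismatched MTUs, as in your setup, no such error ever comes back and the oversized frames are silently dropped.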
That "DF bit" you're setting, stands for "Don't Fragment". The IP layer should not be expected to fragment packets when you've told it not to.
It is not correct to run hosts with different interface MTUs on the same subnet [1].
This is a host/network misconfiguration, and IP path MTU discovery is not expected to work correctly in this situation.
If you wish to test your application's path MTU discovery, you will need to set up multiple subnets connected by a router [2], with different MTUs. In this situation, the router is the device that will pick up the MTU mismatch and send back an ICMP "Fragmentation Needed" error.
[1] Well, technically, the same broadcast domain.
[2] The devices sold as "home routers" are really router/switches - they route between the WAN and the LAN, but switch between the Ethernet ports on the LAN. This isn't sufficient to separate networks with different MTUs.
