Commit graph

6319 commits

Author SHA1 Message Date
Stephen Hemminger 156055da95 ethdev: improve API comment for MAC address addition
The comment used the term whitelist and was awkardly written.
Replace it with simpler direct description of adding a new address.
No code or API changes for this.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: John McNamara <john.mcnamara@intel.com>
2020-08-07 13:02:10 +02:00
Stephen Hemminger 95a2e18dfb kni: fix reference to master/slave process
In DPDK, the correct terms for process are primary/secondary.
This is bugfix, not a change in terms for new release.

Fixes: f2e7592c47 ("kni: fix multi-process support")
Cc: stable@dpdk.org

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
2020-08-07 13:01:54 +02:00
Thomas Monjalon 2fca871ce7 ethdev: remove device-specific comments from VLAN API
Some confusing comments were still present from old days,
when most drivers were from Intel.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-08-05 20:01:49 +02:00
Patrick Fu 6563cf9238 vhost: fix async copy on multi-page buffers
Async copy fails when single ring buffer vector is split on multiple
physical pages. This happens because current hpa address translation
function doesn't handle multi-page buffers. A new gpa to hpa address
conversion function, which returns the hpa on the first hitting host
pages, is implemented in this patch. Async data path recursively calls
this new function to construct a multi-segments async copy descriptor
for ring buffers crossing physical page boundaries.

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-30 00:41:24 +02:00
Maxime Coquelin b53a497294 vhost: fix guest notification setting
If rte_vhost_enable_guest_notification is called before
the virtqueue is ready, the configuration is lost.

This patch fixes this by saving the guest notification
enablement value requested by the application, and apply
it before the virtqueue is made ready to the application.

Fixes: 604052ae53 ("net/vhost: support queue update")

Reported-by: Yinan Wang <yinan.wang@intel.com>
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Tested-by: Yinan Wang <yinan.wang@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-30 00:41:24 +02:00
Yuying Zhang 4a67e71816 net: fix IPv6 checksum with TSO
The ol_flags check lacks of flag for IPv6 which causes checksum
flag configuration error while IPv6/TCP TSO packet is sent.
This patch fixes the issue by adding PKT_TX_TCP_SEG flag.

The rte_net_intel_cksum_flags_prepare() function prepares the
pseudo header checksum in packet data when doing checksum or TSO
offload.

Fixes: 520059a41a ("net: check fragmented headers in non-debug as well")

Signed-off-by: Yuying Zhang <yuying.zhang@intel.com>
Tested-by: Xi Zhang <xix.zhang@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-30 00:41:24 +02:00
Patrick Fu 819a716858 vhost: fix async callback return type
The async copy device callbacks are used by async APIs to transfer data
and check completion status. Async APIs return the number of packets
successfully processed to the caller applications and no error
(negative) value is allowed for API return value. Thus, negative return
values from async device callbacks don't have meaningful usage, while
adding overhead in checking the return value validity. This patch change
the callback return values from "int" to "uint32_t" to get aligned with
async API definition.

Fixes: 78639d5456 ("vhost: introduce async enqueue registration API")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-30 00:41:23 +02:00
Parav Pandit 21587b4921 eal: introduce macro for bit definition
There are several drivers which duplicate bit generation macro.
Introduce a generic bit macros so that such drivers avoid redefining
same in multiple drivers.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Acked-by: Matan Azrad <matan@mellanox.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-07-28 18:27:46 +02:00
Yunjian Wang a5f803c804 hash: fix out-of-memory handling in hash creation
The function rte_zmalloc_socket() could return NULL, the return
value need to be checked.

Fixes: 5915699153 ("hash: fix scaling by reducing contention")
Cc: stable@dpdk.org

Reported-by: Bin Huang <brian.huangbin@huawei.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Yipeng Wang <yipeng1.wang@intel.com>
2020-07-27 12:53:40 +02:00
Anatoly Burakov 8b7b02f945 power: fix environment detection
Anything coming from sysfs has a newline at the end. Cut it off before
comparing the strings.

Fixes: 20ab67608a ("power: add environment capability probing")

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: David Hunt <david.hunt@intel.com>
Tested-by: Lihong Ma <lihongx.ma@intel.com>
Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-22 01:35:39 +02:00
Zhike Wang 9dbe628a7b mempool: fix allocation in memzone during retry
If allocation is successful on the first attempt, typically
there is no problem since we allocated everything required and
we'll terminate the loop (if memory chunk is really sufficient
to populate required number of mempool elements).

If the first attempt fails, we try to allocate half
of mem_size and it succeed, we'll have one more iteration of
the for-loop to allocate memory for remaining elements and
should not try the next time with quarter of the mem_size.

It is wrong that max_alloc_size is divided by 2 in the
case of successful allocation as well, or invalid memory
can be allocated, and leads to population failure, then errno
other than ENOMEM may be returned.

Fixes: 3a3d0c75b4 ("mempool: fix slow allocation of large pools")
Cc: stable@dpdk.org

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Signed-off-by: Zhike Wang <wangzhike@jd.com>
Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-07-22 01:27:10 +02:00
Nithin Dabilpuram 5b2655a693 node: add packet classifier
This node classifies pkts based on packet type and
sends them to appropriate next node. This is node
helps in distribution of packets from ethdev_rx node
to different next node with a constant overhead for
all packet types.

Currently all except non fragmented IPV4 packets are marked
to be sent to "pkt_drop" node.
Performance difference on ARM64 Octeontx2 is -4.9% due to
addition of new node in the path.

Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
2020-07-22 01:18:59 +02:00
Raslan Darawsheh e4f9eab7d9 net: fix pedantic build
when trying to compile rte_mpls with pedantic enabled,
on old compilers like 4.8 it will complain about bit field definition.

error: type of bit-field 'bs' is a GCC extension [-Werror=pedantic]
error: type of bit-field 'tc' is a GCC extension [-Werror=pedantic]
error: type of bit-field 'tag_lsb' is a GCC extension [-Werror=pedantic]

This fixes the compilation error by adding extension to the header
definition.

Fixes: e480cf487a ("net: add MPLS header structure")
Cc: stable@dpdk.org

Signed-off-by: Raslan Darawsheh <rasland@mellanox.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-21 17:36:54 +02:00
Patrick Fu a608436b63 vhost: fix double-free with zero-copy
zmbufs should be set to NULL when getting freed to avoid double free on
the same buffer pointer

Fixes: b0a985d1f3 ("vhost: add dequeue zero copy")
Cc: stable@dpdk.org

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:55:30 +02:00
Patrick Fu 47958f7cbf vhost: fix async completion of multi-seg packets
In async enqueue copy, a packet could be split into multiple copy
segments. When polling the copy completion status, current async data
path assumes the async device callbacks are aware of the packet
boundary and return completed segments only if all segments belonging
to the same packet are done. Such assumption are not generic to common
async devices and may degrade the copy performance if async callbacks
have to implement it in software manner.

This patch adds tracking of the completed copy segments at vhost side.
If async copy device reports partial completion of a packets, only
vhost internal record is updated and vring status keeps unchanged
until remaining segments of the packet are also finished. The async
copy device is no longer necessary to care about the packet boundary.

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:54:58 +02:00
Patrick Fu 5c7ddd6b14 vhost: fix missing virtqueue status check in async path
Vring should not be touched if vq is disabled. This patch adds the vq
status check in async enqueue polling to avoid accessing to a disabled
queue.

Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Patrick Fu 6a82bceb56 vhost: fix missing device pointer validity check
This patch adds the check of dev pointer in vhost async enqueue
completion poll. If a NULL dev pointer detected, the poll function
returns immediately.

Coverity issue: 360839
Fixes: cd6760da10 ("vhost: introduce async enqueue for split ring")

Signed-off-by: Patrick Fu <patrick.fu@intel.com>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
2020-07-21 16:50:29 +02:00
Andrew Rybchenko 520059a41a net: check fragmented headers in non-debug as well
Pseudo-header checksum calculation requires contiguous headers.
There is no any formal requirements on data location and mbuf
structure which could be used by the application.

Since

commit dfc6b2fd8d ("mbuf: remove Intel offload checks from generic API")

fragmented headers checks are done inside
rte_net_intel_cksum_flags_prepare() in RTE_LIBRTE_ETHDEV_DEBUG build
because it is moved from rte_validate_tx_offload() which is called
under debug only.

Make corresponding check to be done in non-debug build as well
to avoid bad accesses, incorrect checksum calculation and to
return appropriate error from Tx prepare.

Make no-offloads check more precise and do it in non-debug build
as well to avoid contiguous headers check and Tx prepare failure
if it is not actually required.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 13:54:54 +02:00
Ruifeng Wang 4cdd49f9b0 lpm: report error when defer queue overflows
Coverity complains about unchecked return value of rte_rcu_qsbr_dq_enqueue.
By default, defer queue size is big enough to hold all tbl8 groups. When
enqueue fails, return error to the user to indicate system issue.

Coverity issue: 360832
Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
2020-07-21 20:48:40 +02:00
Phil Yang db48bae253 mbuf: use C11 atomic builtins for refcnt
Use C11 atomic builtins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Suggested-by: Dodji Seketeli <dodji@redhat.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 10:30:35 +02:00
Ciara Power 2a7d0b872f telemetry: add upper limit on connections
This patch limits the number of client connections to the new telemetry
socket. The limit is set to 10.

Signed-off-by: Ciara Power <ciara.power@intel.com>
2020-07-19 15:36:37 +02:00
Phil Yang 672a150563 eal: add wrapper for C11 atomic thread fence
Provide a wrapper for __atomic_thread_fence builtins to support
optimized code for __ATOMIC_SEQ_CST memory order for x86 platforms.

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ola Liljedahl <ola.liljedahl@arm.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2020-07-17 16:00:30 +02:00
Ciara Power 9683022930 metrics: fix header installation with meson
If Jansson was found, the headers list is overwritten when including
rte_metrics_telemetry.h, which prevents rte_metrics.h from being
installed. This is now fixed to add to headers, rather than overwrite,
to allow both headers be installed when Jansson is present.

Fixes: c5b7197f66 ("telemetry: move some functions to metrics library")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Acked-by: David Marchand <david.marchand@redhat.com>
2020-07-17 16:00:30 +02:00
Honnappa Nagarahalli 8831678b51 eal: change the log level for test asserts
Change the log level for RTE_TEST_ASSERT macro to error to help
log errors while running test cases.

Suggested-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Acked-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
2020-07-17 10:47:56 +02:00
Ferruh Yigit 353162537f lpm: fix build dependency on RCU library
'librte_rcu' is now dependency to 'librte_lpm' library, this dependency
should be reflected to build system.

Fixes: 8a9f8564e9 ("lpm: implement RCU rule reclamation")

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
2020-07-15 13:15:06 +02:00
Bing Zhao d164c609e7 ethdev: add eCPRI key fields to flow API
Add a new item "rte_flow_item_ecpri" in order to match eCRPI header.

eCPRI is a packet based protocol used in the fronthaul interface of
5G networks. Header format definition could be found in the
specification via the link below:
https://www.gigalight.com/downloads/standards/ecpri-specification.pdf

eCPRI message can be over Ethernet layer (.1Q supported also) or over
UDP layer. Message header formats are the same in these two variants.

Signed-off-by: Bing Zhao <bingz@mellanox.com>
Acked-by: Ori Kam <orika@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-13 02:11:30 +02:00
Renata Saiakhova c9c74288f0 ethdev: add function to release HW rings
Free previously allocated memzone for HW rings

Signed-off-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:54 +02:00
Viacheslav Ovsiienko 9da82e8d8b mbuf: introduce accurate packet Tx scheduling
There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this patchset is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature
from mlx5 PMD side [1].

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. If might (or not) be synchronized with real time
via PTP, might (or not) be synchronous with CPU clock (for example
if NIC and CPU share the same clock source there might be no
any drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
deprecation and obsoleting, these dynamic flag and field might be
used to manage the timestamps on receiving datapath as well. Having
the dedicated flags for Rx/Tx timestamps allows applications not
to perform explicit flags reset on forwarding and not to promote
received timestamps to the transmitting datapath by default.
The static PKT_RX_TIMESTAMP is considered as candidate to become
the dynamic flag and this move should be discussed.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and might be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_read_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:54 +02:00
Ivan Malov 5cf04fd15a net: use named constants for deprecated QinQ TPIDs
Add named constants for deprecated QinQ TPIDs.
Update drivers which have already been using existing
TPID named constants from librte_net to use the
new named constants rather than magic numbers.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Junfeng Guo d9a8bc6570 ethdev: add RSS types for IPv6 prefix
This patch defines new RSS offload types for IPv6 prefix with 32, 40,
48, 56, 64, 96 bits of both SRC and DST IPv6 address.
Ref https://tools.ietf.org/html/rfc6052.

Signed-off-by: Junfeng Guo <junfeng.guo@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier) 50ce3e7aec ethdev: fix VLAN offloads set if no relative capabilities
Currently, there is a potential problem that calling the API function
rte_eth_dev_set_vlan_offload to start VLAN hardware offloads which the
driver does not support. If the PMD driver does not support certain VLAN
hardware offloads and does not check for it, the hardware setting will
not change, but the VLAN offloads in dev->data->dev_conf.rxmode.offloads
will be turned on.

It is supposed to check the hardware capabilities to decide whether the
relative callback needs to be called just like the behavior in the API
function named rte_eth_dev_configure. And it is also needed to cleanup
duplicated checks which are done in some PMDs. Also, note that it is
behaviour change for some PMDs which simply ignore (with error/warning
log message) unsupported VLAN offloads, but now it will fail.

Fixes: a4996bd89c ("ethdev: new Rx/Tx offloads API")
Fixes: 0ebce6129b ("net/dpaa2: support new ethdev offload APIs")
Fixes: f9416bbafd ("net/enic: remove VLAN filter handler")
Fixes: 4f7d9e383e ("fm10k: update vlan offload features")
Fixes: fdba3bf15c ("net/hinic: add VLAN filter and offload")
Fixes: b96fb2f0d2 ("net/i40e: handle QinQ strip")
Fixes: d4a27a3b09 ("nfp: add basic features")
Fixes: 56139e85ab ("net/octeontx: support VLAN filter offload")
Fixes: ba1b3b081e ("net/octeontx2: support VLAN offloads")
Fixes: d87246a437 ("net/qede: enable and disable VLAN filtering")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Hyong Youb Kim <hyonkim@cisco.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Xiaoyun Wang <cloud.wangxiaoyun@huawei.com>
Acked-by: Harman Kalra <hkalra@marvell.com>
Acked-by: Jeff Guo <jia.guo@intel.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Wei Hu (Xavier) 36fbaaf30d ethdev: fix data room size verification in Rx queue setup
In the rte_eth_rx_queue_setup API function, the local variable named
mbp_buf_size, which is the data room size of the input parameter mp,
is checked to guarantee that each memory chunk used for net device
in the mbuf is bigger than the min_rx_bufsize. But if mbp_buf_size is
less than RTE_PKTMBUF_HEADROOM, the value of the following  statement
will be a large number since the mbp_buf_size is a unsigned value.
    mbp_buf_size - RTE_PKTMBUF_HEADROOM
As a result, it will cause a segment fault in this situation.

This patch fixes it by modify the check condition to guarantee that the
local variable named mbp_buf_size is bigger than RTE_PKTMBUF_HEADROOM.

Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Sachin Saxena <sachin.saxena@nxp.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
2020-07-11 06:18:53 +02:00
Ferruh Yigit cacd2bb786 ethdev: verify reserved HW ring
Function 'rte_eth_dma_zone_reserve()' returns an existing memzone based
on name match, but other requested attributes are discarded.
This may cause driver using a memzone with wrong size or alignment.

Verify size, alignment and socket_id for matched memzone, and do not use
memzone if any one of the attributes are not justified.

It is possible to free the existing memzone and allocate again with the
requested attributes but it is better caller do the explicit free.

Reported-by: Renata Saiakhova <renata.saiakhova@ekinops.com>
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
2020-07-11 06:18:52 +02:00
Adrian Moreno 2025f4fe6c vhost: support virtio status message
This patch adds support to the new Virtio device get status
Vhost-user message.

The driver can send this new message to read the device status.

One of the uses of this message is to ensure the feature negotiation has
succeeded.  According to the virtio spec, after completing the feature
negotiation, the driver sets the FEATURE_OK status bit and re-reads it
to ensure the device has accepted the features.

This patch also clears the FEATURE_OK status bit if the feature
negotiation has failed to let the driver know about his failure.

Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin 41d201804c vhost: support virtio status
This patch adds support to the new Virtio device status
Vhost-user protocol feature.

Getting such information in the backend helps to know
when the driver is done with the device configuration
and so makes the initialization phase more robust.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin a15f9dbba0 vhost: check vDPA configuration succeed
This patch checks whether vDPA device configuration
succeed and does not set the CONFIGURED flag if it
didn't.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin b46a99c600 vhost: make some vDPA callbacks mandatory
Some of the vDPA callbacks have to be implemented
for vDPA to work properly.

This patch marks them as mandatory in the API doc and
simplify code calling these ops with removing
unnecessary checks that are now done at registration
time.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin 2ab58f20db vhost: refactor virtio ready check
This patch is a small refactoring, as preliminary work
for adding support to Virtio status support.

No functional change here.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
Maxime Coquelin 1c3df72bda vhost: fix virtio ready flag check
Before checking whether the device is ready is done
a check on whether the RUNNING flag is set. Then the
READY flag is set if virtio_is_ready() returns true.

While it seems to not cause any issue, it makes more
sense to check whether the READY flag is set and not
the RUNNING one.

Fixes: c0674b1bc8 ("vhost: move the device ready check at proper place")
Cc: stable@dpdk.org

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Chenbo Xia <chenbo.xia@intel.com>
2020-07-11 06:18:52 +02:00
David Marchand 7d1af09e98 eal/linux: truncate thread name
pthread_setname_np refuses names larger than 16 bytes (\0 included).
Rather than return an error, truncate the name to this limit in the
rte_thread_setname helper.

Caught with ixgbe which creates control thread with name
"ixgbe-link-handler":

Configuring Port 0 (socket 0)
EAL: Cannot set name for ctrl thread
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event
...
EAL: Cannot set name for ctrl thread

Port 0: link state change event

Note: before this change, the thread would keep its original name, which
meant in my test for the ixgbe handler either "dpdk-testpmd" or
"eal-intr-thread".

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2020-07-11 15:03:47 +02:00
Ruifeng Wang 0f392d91b9 lpm: hide defer queue handle
There is no need to return the defer queue handle in rte_lpm_rcu_qsbr_add,
since enough flexibility has been provided to configure the defer queue.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
2020-07-11 14:35:04 +02:00
Anatoly Burakov 20ab67608a power: add environment capability probing
Currently, there is no way to know if the power management env is
supported without trying to initialize it. The init API also does
not distinguish between failure due to some error and failure due to
power management not being available on the platform in the first
place.

Thus, add an API that provides capability of probing support for a
specific power management API.

Suggested-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
2020-07-11 13:31:16 +02:00
Thomas Monjalon 9d2b245937 pci: keep API compatibility with mmap values
The function pci_map_resource() returns MAP_FAILED in case of error.
When replacing the call to mmap() by rte_mem_map(),
the error code became NULL, breaking the API.
This function is probably not used outside of DPDK,
but it is still a problem for two reasons:
	- the deprecation process was not followed
	- the Linux function pci_vfio_mmap_bar() is broken for i40e

The error code is reverted to the Unix value MAP_FAILED.
Windows needs to define this special value (-1 as in Unix).
After proper deprecation process, the API could be changed again
if really needed.

Because of the switch from mmap() to rte_mem_map(),
another part of the API was changed: "int additional_flags"
are defined as "additional flags for the mapping range"
without mentioning it was directly used in mmap().
Currently it is directly used in rte_mem_map(),
that's why the values rte_map_flags must be mapped (sic) on the mmap ones
in case of Unix OS.

These are side effects of a badly defined API using Unix values.

Bugzilla ID: 503
Fixes: 2fd3567e54 ("pci: use OS generic memory mapping functions")

Reported-by: David Marchand <david.marchand@redhat.com>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Tested-by: Lihong Ma <lihongx.ma@intel.com>
2020-07-11 11:48:13 +02:00
Harman Kalra 3596a037ab eal: fix parentheses in alignment macros
Found an issue while using RTE_ALIGN_MUL_NEAR with an
expression, like as passed in estimate_tsc_freq().
RTE_ALIGN_MUL_FLOOR resulted in unexpected value as
parathesis are required to evaluate an expression.

Fixes: 5120203d75 ("eal: add macros to align value to multiple")
Cc: stable@dpdk.org

Signed-off-by: Harman Kalra <hkalra@marvell.com>
2020-07-11 11:41:33 +02:00
Dmitry Kozlyuk 7daf5bdb0f eal/windows: detect insufficient privileges for hugepages
AdjustTokenPrivileges() succeeds even if no requested privileges have
been granted; this behavior is documented. Check last error code in
addition to return value to detect such case.

Make error messages more specific and add troubleshooting hint.

Signed-off-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo 982bb68cab net: fix checksum on big endian CPUs
With current code, the checksum of odd-length buffers is wrong on
big endian CPUs: the last byte is not properly summed to the
accumulator.

Fix this by left-shifting the remaining byte by 8. For instance,
if the last byte is 0x42, we should add 0x4200 to the accumulator
on big endian CPUs.

This change is similar to what is suggested in Errata 3133 of
RFC 1071.

Fixes: 6006818cfb26("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-11 00:45:20 +02:00
Hongzhi Guo d5df2ae042 net: fix unneeded replacement of TCP checksum 0
Per RFC768:
If the computed checksum is zero, it is transmitted as all ones.
An all zero transmitted checksum value means that the transmitter
generated no checksum.

RFC793 for TCP has no such special treatment for the checksum of zero.

Fixes: 6006818cfb ("net: new checksum functions")
Cc: stable@dpdk.org

Signed-off-by: Hongzhi Guo <guohongzhi1@huawei.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
2020-07-11 00:45:20 +02:00
Joyce Kong 58902736a4 vhost: restrict pointer aliasing for packed ring
Restrict pointer aliasing to allow the compiler to vectorize loop
more aggressively.

With this patch, a 9.6% improvement is observed in throughput for
the packed virtio-net PVP case, and a 2.8% improvement in throughput
for the packed virtio-user PVP case. All performance data are measured
on ThunderX-2 platform under 0.001% acceptable packet loss with 1 core
on both vhost and virtio side.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Acked-by: Adrián Moreno <amorenoz@redhat.com>
2020-07-10 15:43:41 +02:00
Joyce Kong 428e684795 introduce restricted pointer aliasing marker
The 'restrict' keyword is recognized in C99, while type qualifier
'__restrict' compiles ok in C with all language levels. This patch
is to replace the existing 'restrict' with '__rte_restrict' which
is a common wrapper supported by all compilers.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-07-10 15:35:32 +02:00
Ruifeng Wang 8a9f8564e9 lpm: implement RCU rule reclamation
Currently, the tbl8 group is freed even though the readers might be
using the tbl8 group entries. The freed tbl8 group can be reallocated
quickly. This results in incorrect lookup results.

RCU QSBR process is integrated for safe tbl8 group reclaim.
Refer to RCU documentation to understand various aspects of
integrating RCU library into other libraries.

To avoid ABI breakage, a struct __rte_lpm is created for lpm library
internal use. This struct wraps rte_lpm that has been exposed and
also includes members that don't need to be exposed such as RCU related
config.

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Ray Kinsella <mdr@ashroe.eu>
Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
2020-07-10 13:41:29 +02:00