Commit graph

279 commits

Author SHA1 Message Date
Phil Yang db48bae253 mbuf: use C11 atomic builtins for refcnt
Use C11 atomic builtins with explicit ordering instead of rte_atomic
ops which enforce unnecessary barriers on aarch64.

Suggested-by: Olivier Matz <olivier.matz@6wind.com>
Suggested-by: Dodji Seketeli <dodji@redhat.com>
Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-07-21 10:30:35 +02:00
Viacheslav Ovsiienko 9da82e8d8b mbuf: introduce accurate packet Tx scheduling
There is the requirement on some networks for precise traffic timing
management. The ability to send (and, generally speaking, receive)
the packets at the very precisely specified moment of time provides
the opportunity to support the connections with Time Division
Multiplexing using the contemporary general purpose NIC without involving
an auxiliary hardware. For example, the supporting of O-RAN Fronthaul
interface is one of the promising features for potentially usage of the
precise time management for the egress packets.

The main objective of this patchset is to specify the way how applications
can provide the moment of time at what the packet transmission must be
started and to describe in preliminary the supporting this feature
from mlx5 PMD side [1].

The new dynamic timestamp field is proposed, it provides some timing
information, the units and time references (initial phase) are not
explicitly defined but are maintained always the same for a given port.
Some devices allow to query rte_eth_read_clock() that will return
the current device timestamp. The dynamic timestamp flag tells whether
the field contains actual timestamp value. For the packets being sent
this value can be used by PMD to schedule packet sending.

The device clock is opaque entity, the units and frequency are
vendor specific and might depend on hardware capabilities and
configurations. If might (or not) be synchronized with real time
via PTP, might (or not) be synchronous with CPU clock (for example
if NIC and CPU share the same clock source there might be no
any drift between the NIC and CPU clocks), etc.

After PKT_RX_TIMESTAMP flag and fixed timestamp field supposed
deprecation and obsoleting, these dynamic flag and field might be
used to manage the timestamps on receiving datapath as well. Having
the dedicated flags for Rx/Tx timestamps allows applications not
to perform explicit flags reset on forwarding and not to promote
received timestamps to the transmitting datapath by default.
The static PKT_RX_TIMESTAMP is considered as candidate to become
the dynamic flag and this move should be discussed.

When PMD sees the "rte_dynfield_timestamp" set on the packet being sent
it tries to synchronize the time of packet appearing on the wire with
the specified packet timestamp. If the specified one is in the past it
should be ignored, if one is in the distant future it should be capped
with some reasonable value (in range of seconds). These specific cases
("too late" and "distant future") can be optionally reported via
device xstats to assist applications to detect the time-related
problems.

There is no any packet reordering according timestamps is supposed,
neither within packet burst, nor between packets, it is an entirely
application responsibility to generate packets and its timestamps
in desired order. The timestamps can be put only in the first packet
in the burst providing the entire burst scheduling.

PMD reports the ability to synchronize packet sending on timestamp
with new offload flag:

This is palliative and might be replaced with new eth_dev API
about reporting/managing the supported dynamic flags and its related
features. This API would break ABI compatibility and can't be introduced
at the moment, so is postponed to 20.11.

For testing purposes it is proposed to update testpmd "txonly"
forwarding mode routine. With this update testpmd application generates
the packets and sets the dynamic timestamps according to specified time
pattern if it sees the "rte_dynfield_timestamp" is registered.

The new testpmd command is proposed to configure sending pattern:

set tx_times <burst_gap>,<intra_gap>

<intra_gap> - the delay between the packets within the burst
              specified in the device clock units. The number
              of packets in the burst is defined by txburst parameter

<burst_gap> - the delay between the bursts in the device clock units

As the result the bursts of packet will be transmitted with specific
delays between the packets within the burst and specific delay between
the bursts. The rte_eth_read_clock is supposed to be engaged to get the
current device clock value and provide the reference for the timestamps.

[1] http://patches.dpdk.org/patch/73714/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2020-07-11 06:18:54 +02:00
Xiaolong Ye 6248368a9a mbuf: add dump of free dynamic flags
Add support to dump free_flags as below format:

Free bit in mbuf->ol_flags (0 = occupied, 1 = free):
  0000: 0 0 0 0 0 0 0 0
  0008: 0 0 0 0 0 0 0 0
  0010: 0 0 0 0 0 0 0 1
  0018: 1 1 1 1 1 1 1 1
  0020: 1 1 1 1 1 1 1 1
  0028: 1 0 0 0 0 0 0 0
  0030: 0 0 0 0 0 0 0 0
  0038: 0 0 0 0 0 0 0 0

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye d27a4cb3ef mbuf: fix dynamic field dump log
For each mbuf byte, free_space[i] == 0 means the space is occupied,
free_space[i] != 0 means space is free.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye 3a7fb882fd mbuf: fix free space update for dynamic field
The value free_space[i] is used to save the size of biggest aligned
element that can fit in the zone, current implementation has one flaw,
for example, if user registers dynfield1 (size = 4, align = 4, req = 124)
first, the free_space would be as below after registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 08 08 08 08 00 00 00 00

Then if user continues to register dynfield2 (size = 4, align = 4),
free_space would become:

  0070: 00 00 00 00 04 04 04 04
  0078: 04 04 04 04 00 00 00 00

Further request dynfield3 (size = 8, align = 8) would fail to register
due to alignment requirement can't be satisfied, though there is enough
space remained in mbuf.

This patch fixes above issue by saving alignment only in aligned zone,
after the fix, above registrations order can be satisfied, free_space
would be like:

After dynfield1 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 04 04 04 04 00 00 00 00

After dynfield2 registration:

  0070: 08 08 08 08 08 08 08 08
  0078: 00 00 00 00 00 00 00 00

After dynfield3 registration:

  0070: 00 00 00 00 00 00 00 00
  0078: 00 00 00 00 00 00 00 00

This patch also reduces iterations in process_score() by jumping align
steps in each loop.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye 359386e15b mbuf: fix error code in dynamic field/flag registration
Set rte_errno as ENOMEM when allocation failure.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Xiaolong Ye f8eb26dda8 mbuf: fix boundary check at dynamic field registration
We should make sure off + size < sizeof(struct rte_mbuf) to avoid
possible out-of-bounds access of free_space array, there is no issue
currently due to the low bits of free_flags (which is adjacent to
free_space) are always set to 0. But we shouldn't rely on it since it's
fragile and layout of struct mbuf_dyn_shm may be changed in the future.
This patch adds boundary check explicitly to avoid potential risk of
out-of-bounds access.

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-25 23:03:18 +02:00
Tal Shnaiderman 4887a7e234 mbuf: align layout in Windows
Using uint32_t type bit-fields in Windows will pads the
'L2/L3/L4 and tunnel information' union with additional bits.

This padding causes rte_mbuf size misalignment and the total size
increases to 3 cache-lines.

Changed packet_type bit-fields types from uint32_t to uint8_t
to allow unified 2 cache-line structure size.

Added the __extension__ attribute over the modified struct to avoid
the warning:

type of bit-field ... is a GCC extension [-pedantic]

Signed-off-by: Tal Shnaiderman <talshn@mellanox.com>
Tested-by: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Acked-by: Ranjit Menon <ranjit.menon@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 16:26:33 +02:00
Alexander Kozyrev d6eb247371 mbuf: fix external buffer pool boundaries
Memzones are created in testpmd in order to test external data
buffers functionality. Each memzone is 2Mb in size and divided among
the pool of external memory buffers.

Memzone may not always be fully utilized because mbufs size can vary
and some space can be left unused at the tail of a memzone. This is
not handled properly and mbuf can get the address of this leftover
space since this address is still valid (part of memzone), but there
is not enough space to fit the whole packet data. As a result packet
data may overflow and cause the memory corruption.

Take mbuf size into account when distributing memory addresses from
a memzone to external mbufs. Skip the remaining tail in case there
is not enough room for a packet and move to a next memzone instead.

Fixes: 6c8e50c2e5 ("mbuf: create pool with external memory buffers")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 09:51:46 +02:00
Xiaolong Ye c67a423c53 mbuf: remove unused next member in dynamic flag/field
TAILQ_ENTRY next is not needed in struct mbuf_dynfield_elt and
mbuf_dynflag_elt, since they are actually chained by rte_tailq_entry's
next field when calling TAILQ_INSERT_TAIL(mbuf_dynfield/dynflag_list, te,
next).

Fixes: 4958ca3a44 ("mbuf: support dynamic fields and flags")
Cc: stable@dpdk.org

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-06-11 09:32:43 +02:00
Thomas Monjalon d1342ea419 mbuf: document guideline for new fields and flags
Since dynamic fields and flags were added in 19.11,
the idea was to use them for new features, not only PMD-specific.

The guideline is made more explicit in doxygen, in the mbuf guide,
and in the contribution design guidelines.

For more information about the original design, see the presentation
https://www.dpdk.org/wp-content/uploads/sites/35/2019/10/DynamicMbuf.pdf

This decision was discussed in the Technical Board:
http://mails.dpdk.org/archives/dev/2020-June/169667.html

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
2020-06-11 09:29:15 +02:00
Muhammad Bilal ec5a905ec8 mbuf: prevent setting mempool ops name empty
Bugzilla ID: 353

Signed-off-by: Muhammad Bilal <m.bilal@emumba.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
2020-04-25 22:56:10 +02:00
Thomas Monjalon f2fc83b40f replace unused attributes
There is a common macro __rte_unused, avoiding warnings,
which is now used where appropriate for consistency.

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
2020-04-16 18:30:58 +02:00
Pavan Nikhilesh acec04c4b2 build: disable experimental API check internally
Remove setting ALLOW_EXPERIMENTAL_API individually for each Makefile and
meson.build. Instead, enable ALLOW_EXPERIMENTAL_API flag across app, lib
and drivers.
This changes reduces the clutter across the project while still
maintaining the functionality of ALLOW_EXPERIMENTAL_API i.e. warning
external applications about experimental API usage.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2020-04-14 16:22:34 +02:00
Alexander Kozyrev 759a6a221e mbuf: optimize memory loads during freeing
Introduction of pinned external buffers doubled memory loads in the
rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
code shows unnecessary load of the pool field of the rte_mbuf structure.
Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
Before the change the code was:
	movq  0x18(%rbx), %rax // load the ol_flags field
	test %r13, %rax	       // check if ol_flags equals to 0x60...0
	jz 0x9a8718 <Block 2>  // jump out to "if (m->next != NULL)"
After the change the code became:
	movq  0x18(%rbx), %rax // load ol_flags
	test %r14, %rax	       // check if ol_flags equals to 0x60...0
	jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
	movq  0x48(%rbx), %rax // load the pool field
	jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
Look like this absolutely unneeded memory load of the pool field is an
optimization for the external buffer case in GCC (4.8.5), since Clang
generates the same assembly for both before and after the change versions.
Plus, GCC favors the external buffer case over the simple case.
This assembly code layout causes the performance degradation because the
rte_pktmbuf_prefree_seg() function is a part of a very hot path.
Workaround this compilation issue by moving the check for pinned buffer
apart from the check for external buffer and restore the initial code
flow that favors the direct mbuf case over the external one.

Fixes: 6ef1107ad4 ("mbuf: detach mbuf with pinned external buffer")
Cc: stable@dpdk.org

Signed-off-by: Alexander Kozyrev <akozyrev@mellanox.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-03-31 02:57:42 +02:00
Ciara Power f42c9ac5b6 lib: fix unnecessary double negation
An equality expression already returns either 0 or 1.
There is no need to use double negation for these cases.

Fixes: ea672a8b16 ("mbuf: remove the rte_pktmbuf structure")
Fixes: a0fd91cefc ("mempool: rename functions with confusing names")
Cc: stable@dpdk.org

Signed-off-by: Ciara Power <ciara.power@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-02-14 17:39:46 +01:00
Viacheslav Ovsiienko 545fa736b9 mbuf: fix pinned memory function style
Minor style issue is fixed.

Fixes: 6c8e50c2e5 ("mbuf: create pool with external memory buffers")

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-02-06 16:17:14 +01:00
Stephen Hemminger 5b6eaea8ea mbuf: display more fields in dump
The rte_pktmbuf_dump should display offset, refcount, and vlan
info since these are often useful during debugging.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-02-06 16:17:14 +01:00
Viacheslav Ovsiienko 6c8e50c2e5 mbuf: create pool with external memory buffers
The dedicated routine rte_pktmbuf_pool_create_extbuf() is
provided to create mbuf pool with data buffers located in
the pinned external memory. The application provides the
external memory description and routine initializes each
mbuf with appropriate virtual and physical buffer address.
It is entirely application responsibility to register
external memory with rte_extmem_register() API, map this
memory, etc.

The new introduced flag RTE_PKTMBUF_POOL_F_PINNED_EXT_BUF
is set in private pool structure, specifying the new special
pool type. The allocated mbufs from pool of this kind will
have the EXT_ATTACHED_MBUF flag set and initialiazed shared
info structure, allowing cloning with regular mbufs (without
attached external buffers of any kind).

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-01-20 23:36:38 +01:00
Viacheslav Ovsiienko 6ef1107ad4 mbuf: detach mbuf with pinned external buffer
Update detach routine to check the mbuf pool type.
Introduce the special internal version of detach routine to handle
the special case of pinned external bufferon mbuf freeing.

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-01-20 23:36:15 +01:00
Viacheslav Ovsiienko c621a9b66b mbuf: introduce routine to get private mbuf pool flags
The routine rte_pktmbuf_priv_flags is introduced to fetch
the flags from the mbuf memory pool private structure
in unified fashion.

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2020-01-20 23:30:38 +01:00
Jerin Jacob 3e6181b070 mbuf: use structure marker from EAL
Use new marker typedef available in EAL and remove private marker
typedef.

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Gavin Hu <gavin.hu@arm.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Matan Azrad <matan@mellanox.com>
2020-01-20 21:17:35 +01:00
Jörg Thalheim a18e79cd3c mbuf: improve API doc for attaching external buffer
Enhance API documentation of rte_pktmbuf_attach_extbuf() to
explain that the attached mbuf is initialized with length = 0.

Link: https://bugs.dpdk.org/show_bug.cgi?id=362

Signed-off-by: Jörg Thalheim <joerg@thalheim.io>
Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
2020-01-20 16:37:27 +01:00
Shahaf Shuler 39a19ae03d mbuf: extend mbuf pool private structure
With the API and ABI freeze ahead, it will be good to reserve
some bits on the private structure for future use.

Otherwise we will potentially need to maintain two different
private structure during 2020 period.

There is already one use case for those reserved bits[1]

The reserved field should be set to 0 by the user.

[1] https://patches.dpdk.org/patch/63077/

Signed-off-by: Shahaf Shuler <shahafs@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-11-25 22:44:46 +01:00
Pawel Modrak 85ff364f3b build: align symbols with global ABI version
Merge all versions in linker version script files to DPDK_20.0.

This commit was generated by running the following command:

:~/DPDK$ buildtools/update-abi.sh 20.0

Signed-off-by: Pawel Modrak <pawelx.modrak@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Anatoly Burakov fbaf943887 build: remove individual library versions
Since the library versioning for both stable and experimental ABI's is
now managed globally, the LIBABIVER and version variables no longer
serve any useful purpose, and can be removed.

The replacement in Makefiles was done using the following regex:

	^(#.*\n)?LIBABIVER\s*:=\s*\d+\n(\s*\n)?

(LIBABIVER := numbers, optionally preceded by a comment and optionally
succeeded by an empty line)

The replacement for meson files was done using the following regex:

	^(#.*\n)?version\s*=\s*\d+\n(\s*\n)?

(version = numbers, optionally preceded by a comment and optionally
succeeded by an empty line)

[David]: those variables are manually removed for the files:
- drivers/common/qat/Makefile
- lib/librte_eal/meson.build
[David]: the LIBABIVER is restored for the external ethtool example
library.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
2019-11-20 23:05:39 +01:00
Viacheslav Ovsiienko 9bf26e1318 ethdev: move egress metadata to dynamic field
The dynamic mbuf fields were introduced by [1]. The egress metadata is
good candidate to be moved from statically allocated field tx_metadata to
dynamic one. Because mbufs are used in half-duplex fashion only, it is
safe to share this dynamic field with ingress metadata.

The shared dynamic field contains either egress (if application going to
transmit mbuf with tx_burst) or ingress (if mbuf is received with rx_burst)
metadata and can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_TX_DYNF_METADATA/PKT_RX_DYNF_METADATA flag will be set
along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior accessing the data.

The availability of dynamic mbuf metadata field can be checked with
rte_flow_dynf_metadata_avail() routine.

DEV_TX_OFFLOAD_MATCH_METADATA offload and configuration flag is removed.
The metadata support in PMDs is engaged on dynamic field registration.

Metadata feature is getting complex. We might have some set of actions
and items that might be supported by PMDs in multiple combinations,
the supported values and masks are the subjects to query by perfroming
trials (with rte_flow_validate).

[1] http://patches.dpdk.org/patch/62040/

Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:05 +01:00
Viacheslav Ovsiienko e02ecc1324 ethdev: extend flow metadata
Currently, metadata can be set on egress path via mbuf tx_metadata field
with PKT_TX_METADATA flag and RTE_FLOW_ITEM_TYPE_META matches metadata.

This patch extends the metadata feature usability.

1) RTE_FLOW_ACTION_TYPE_SET_META

When supporting multiple tables, Tx metadata can also be set by a rule and
matched by another rule. This new action allows metadata to be set as a
result of flow match.

2) Metadata on ingress

There's also need to support metadata on ingress. Metadata can be set by
SET_META action and matched by META item like Tx. The final value set by
the action will be delivered to application via metadata dynamic field of
mbuf which can be accessed by RTE_FLOW_DYNF_METADATA() macro or with
rte_flow_dynf_metadata_set() and rte_flow_dynf_metadata_get() helper
routines. PKT_RX_DYNF_METADATA flag will be set along with the data.

The mbuf dynamic field must be registered by calling
rte_flow_dynf_metadata_register() prior to use SET_META action.

The availability of dynamic mbuf metadata field can be checked
with rte_flow_dynf_metadata_avail() routine.

If application is going to engage the metadata feature it registers
the metadata  dynamic fields, then PMD checks the metadata field
availability and handles the appropriate fields in datapath.

For loopback/hairpin packet, metadata set on Rx/Tx may or may not be
propagated to the other path depending on hardware capability.

MARK and METADATA look similar and might operate in similar way,
but not interacting.

Initially, there were proposed two metadata related actions:

- RTE_FLOW_ACTION_TYPE_FLAG
- RTE_FLOW_ACTION_TYPE_MARK

These actions set the special flag in the packet metadata, MARK action
stores some specified value in the metadata storage, and, on the packet
receiving PMD puts the flag and value to the mbuf and applications can
see the packet was threated inside flow engine according to the appropriate
RTE flow(s). MARK and FLAG are like some kind of gateway to transfer some
per-packet information from the flow engine to the application via
receiving datapath. Also, there is the item of type RTE_FLOW_ITEM_TYPE_MARK
provided. It allows us to extend the flow match pattern with the capability
to match the metadata values set by MARK/FLAG actions on other flows.

From the datapath point of view, the MARK and FLAG are related to the
receiving side only. It would useful to have the same gateway on the
transmitting side and there was the feature of type RTE_FLOW_ITEM_TYPE_META
was proposed. The application can fill the field in mbuf and this value
will be transferred to some field in the packet metadata inside the flow
engine. It did not matter whether these metadata fields are shared because
of MARK and META items belonged to different domains (receiving and
transmitting) and could be vendor-specific.

So far, so good, DPDK proposes some entities to control metadata inside
the flow engine and gateways to exchange these values on a per-packet basis
via datapaths.

As we can see, the MARK and META means are not symmetric, there is absent
action which would allow us to set META value on the transmitting path.
So, the action of type:

- RTE_FLOW_ACTION_TYPE_SET_META was proposed.

The next, applications raise the new requirements for packet metadata.
The flow ngines are getting more complex, internal switches are introduced,
multiple ports might be supported within the same flow engine namespace.
From the DPDK points of view, it means the packets might be sent on one
eth_dev port and received on the other one, and the packet path inside
the flow engine entirely belongs to the same hardware device. The simplest
example is SR-IOV with PF, VFs and the representors. And there is a
brilliant opportunity to provide some out-of-band channel to transfer
some extra data from one port to another one, besides the packet data
itself. And applications would like to use this opportunity.

It is supposed for application to use trials (with rte_flow_validate)
to detect which metadata features (FLAG, MARK, META) actually supported
by PMD and underlying hardware. It might depend on PMD configuration,
system software, hardware settings, etc., and should be detected
in run time.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Ori Kam <orika@mellanox.com>
2019-11-08 23:15:04 +01:00
Olivier Matz 4958ca3a44 mbuf: support dynamic fields and flags
Many features require to store data inside the mbuf. As the room in mbuf
structure is limited, it is not possible to have a field for each
feature. Also, changing fields in the mbuf structure can break the API
or ABI.

This commit addresses these issues, by enabling the dynamic registration
of fields or flags:

- a dynamic field is a named area in the rte_mbuf structure, with a
  given size (>= 1 byte) and alignment constraint.
- a dynamic flag is a named bit in the rte_mbuf structure.

The typical use case is a PMD that registers space for an offload
feature, when the application requests to enable this feature.  As
the space in mbuf is limited, the space should only be reserved if it
is going to be used (i.e when the application explicitly asks for it).

The registration can be done at any moment, but it is not possible
to unregister fields or flags.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Acked-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-10-26 19:08:50 +02:00
Konstantin Ananyev 3eb860b08e mbuf: move definitions into a separate file
Right now inclusion of rte_mbuf.h header can cause inclusion of
some arch/os specific headers.
That prevents it to be included directly by some
non-DPDK (but related) entities: KNI, BPF programs, etc.
To overcome that problem usually a separate definitions of rte_mbuf
structure is created within these entities.
That aproach has a lot of drawbacks: code duplication, error prone, etc.
This patch moves rte_mbuf structure definition (and some related macros)
into a separate file that can be included by both rte_mbuf.h and
other non-DPDK entities.

Note that it doesn't introduce any change for current DPDK code.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Michel Machado <michel@digirati.com.br>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-25 19:30:38 +02:00
Ting Xu d892768c6d mbuf: add GTP tunnel type
Add GTP tunnel type flag in mbuf for future use in GTP
Tx checksum offload.

Signed-off-by: Ting Xu <ting.xu@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
2019-10-23 16:43:10 +02:00
Morten Brørup 0f824df6f8 mbuf: add bulk free function
Add function for freeing a bulk of mbufs.

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-24 02:45:40 +02:00
Stephen Hemminger c3a90c381d mbuf: add a copy routine
This is a commonly used operation that surprisingly the
DPDK has not supported. The new rte_pktmbuf_copy does a
deep copy of packet. This is a complete copy including
meta-data.

It handles the case where the source mbuf comes from a pool
with larger data area than the destination pool. The routine
also has options for skipping data, or truncating at a fixed
length.

This patch also introduces internal inline to copy the
metadata fields of mbuf.

Add a test for this new function, based of the clone tests.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:43:53 +02:00
Stephen Hemminger 1d2db47c9f mbuf: deinline clone function
Cloning mbufs requires allocations and iteration
and therefore should not be an inline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:42:04 +02:00
Stephen Hemminger 6b1dd3be54 mbuf: deinline linearize function
This copy part of this function is too big to be put inline.
The places it is used are only in special exception paths
where a highly fragmented mbuf arrives at a device that can't handle it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-10-16 12:42:04 +02:00
Ivan Malov b51690ec69 mbuf: clarify outer offsets for non-tunnel packets
The default policy for offload-specific fields is that
they are undefined unless the corresponding offloads are
requested in mbuf ol_flags. This is also the case for outer
L2 and L3 length fields which must not be assumed to contain
zeros for non-tunnel packets. The patch clarifies this behaviour
in the comments and also adds appropriate checks to the PMDs which
do not check any tunnel-related offloads before using the said fields.

Signed-off-by: Ivan Malov <ivan.malov@oktetlabs.ru>
Reviewed-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-07-01 17:15:01 +02:00
David Marchand 18218713bf enforce experimental tag at beginning of declarations
Putting a '__attribute__((deprecated))' in the middle of a function
prototype does not result in the expected result with gcc (while clang
is fine with this syntax).

$ cat deprecated.c
void * __attribute__((deprecated)) incorrect() { return 0; }
__attribute__((deprecated)) void *correct(void) { return 0; }
int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
$ gcc -o deprecated.o -c deprecated.c
deprecated.c: In function ‘main’:
deprecated.c:3:1: warning: ‘correct’ is deprecated (declared at
deprecated.c:2) [-Wdeprecated-declarations]
 int main(int argc, char *argv[]) { incorrect(); correct(); return 0; }
 ^

Move the tag on a separate line and make it the first thing of function
prototypes.
This is not perfect but we will trust reviewers to catch the other not
so easy to detect patterns.

sed -i \
     -e '/^\([^#].*\)\?__rte_experimental */{' \
     -e 's//\1/; s/ *$//; i\' \
     -e __rte_experimental \
     -e '/^$/d}' \
     $(git grep -l __rte_experimental -- '*.h')

Special mention for rte_mbuf_data_addr_default():

There is either a bug or a (not yet understood) issue with gcc.
gcc won't drop this inline when unused and rte_mbuf_data_addr_default()
calls rte_mbuf_buf_addr() which itself is experimental.
This results in a build warning when not accepting experimental apis
from sources just including rte_mbuf.h.

For this specific case, we hide the call to rte_mbuf_buf_addr() under
the ALLOW_EXPERIMENTAL_API flag.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
2019-06-29 19:04:48 +02:00
David Marchand cfe3aeb170 remove experimental tags from all symbol definitions
We had some inconsistencies between functions prototypes and actual
definitions.
Let's avoid this by only adding the experimental tag to the prototypes.
Tests with gcc and clang show it is enough.

git grep -l __rte_experimental |grep \.c$ |while read file; do
	sed -i -e '/^__rte_experimental$/d' $file;
	sed -i -e 's/  *__rte_experimental//' $file;
	sed -i -e 's/__rte_experimental  *//' $file;
done

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2019-06-29 19:04:43 +02:00
Tom Barbette 5e74137765 ethdev: add API to read device clock
Add rte_eth_read_clock to read the raw clock of a device.

The main use is to get the device clock conversion co-efficients to be
able to translate the raw clock of the timestamp field of the pkt mbuf
to a local synced time value.

This function was missing to allow users to convert the Rx timestamp
field to real time without the complexity of the rte_timesync* facility.
One can derivate the clock frequency by calling twice read_clock and
then keep a common time base.

Signed-off-by: Tom Barbette <barbette@kth.se>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
2019-06-06 20:21:20 +09:00
John McNamara 8bd5f07c7a doc: fix spelling reported by aspell in comments
Fix spelling errors in the doxygen docs.

Signed-off-by: John McNamara <john.mcnamara@intel.com>
2019-05-03 00:38:14 +02:00
Thomas Monjalon 914631872c mbuf: fix big endian build
Compilation was failing when using a big endian toolchain:

rte_mbuf.h:504:2: error: expected ',' or '}' before 'RTE_MBUF_L3_LEN_OFS'

Fixes: 8d9c2c3a1f ("mbuf: add function to generate raw Tx offload value")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
2019-04-22 15:27:23 +02:00
Ferruh Yigit 00cc9701e4 mbuf: update Tx VLAN and QinQ flags documentation
Currently PKT_TX_VLAN and PKT_TX_QINQ mbuf flags are documented as
they are to say packet contains VLAN or QINQ information.

Updating the definition as they are requests from application to
driver to insert VLAN or double VLAN tags into packet.

Fixes: dc6c911c99 ("mbuf: use reserved space for double vlan")
Fixes: af75078fec ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-04-19 23:21:13 +02:00
Konstantin Ananyev 8d9c2c3a1f mbuf: add function to generate raw Tx offload value
Operations to set/update bit-fields often cause compilers
to generate suboptimal code.
To help avoid such situation for tx_offload fields:
introduce new enum for tx_offload bit-fields lengths and offsets,
and new function to generate raw tx_offload value.
Add new test-case into UT for introduced function.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Akhil Goyal <akhil.goyal@nxp.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-04-02 16:50:24 +02:00
Andrew Rybchenko dfc6b2fd8d mbuf: remove Intel offload checks from generic API
rte_validate_tx_offload() is used in Tx prepare callbacks
(RTE_LIBRTE_ETHDEV_DEBUG only) to check Tx offloads consistency.
Requirement that packet headers should not be fragmented is not
documented and unclear where it comes from except
rte_net_intel_cksum_prepare() functions which relies on it.

It could be NIC vendor specific driver or hardware limitation, but,
if so, it should be documented and checked in corresponding Tx
prepare callbacks.

Signed-off-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-04-02 16:42:52 +02:00
Rami Rosen b13baac8d5 mbuf: fix a typo
This trivial patch fixes a typo in rte_mbuf.h.

Fixes: f20b50b946 ("mbuf: optimize refcnt update")
Cc: stable@dpdk.org

Signed-off-by: Rami Rosen <ramirose@gmail.com>
2019-02-12 14:32:01 +01:00
Ed Czeck 14ff7fb97e mbuf: fix struct initialization with C++
g++ reports "error: missing initializer for member"

Fixes: 5d3f721009 ("mbuf: implement generic format for sched field")

Signed-off-by: Ed Czeck <ed.czeck@atomicrules.com>
2019-01-28 00:57:57 +01:00
Yongseok Koh c277b34c1b mbuf: add function returning buffer address
This patch introduces two new functions - rte_mbuf_buf_addr() and
rte_mbuf_data_addr_default().

rte_mbuf_buf_addr() reutrns the buffer address of given mbuf which comes
after mbuf structure and private data.

rte_mbuf_data_addr_default() returns the default address of mbuf data
taking the headroom into account.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
Reviewed-by: David Marchand <david.marchand@redhat.com>
2019-01-15 02:40:40 +01:00
David Marchand d4dca8fe43 mbuf: add a non fatal sanity check helper
Let's add a little helper that does the same as rte_mbuf_sanity_check but
without the panic.

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-01-15 02:40:40 +01:00
David Marchand dcd2a33d3c mbuf: add sanity checks on segment metadata
Add some basic checks on the segments offset and length metadata:
always funny to have a < 0 tailroom cast to uint16_t ;-).

Signed-off-by: David Marchand <david.marchand@6wind.com>
Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-01-15 02:40:40 +01:00
Yongseok Koh bf78d4dc2b mbuf: remove experimental tag for external attachment
Remove the experimental tag of rte_pktmbuf_attach_extbuf() which was
introduced in 18.05.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
Acked-by: Olivier Matz <olivier.matz@6wind.com>
2019-01-14 16:37:59 +01:00