Blog
4/22/24 MPEG 146 - Video Coding for Machines enters final stages of development
Coding for Machines standards enter final stages of development
The Video Coding for Machines (VCM) New Work Item Proposal (NWIP) was officially approved by the votes of the national bodies as ISO/IEC 23888-2. VCM is expected to reach the Committee Draft (CD) stage after the next meeting, in July. A total of 48 input documents were received and reviewed during the meeting.
For Feature Coding for Machines (FCM), a total of 61 input documents were received and reviewed.
OP Solutions, together with partners from Florida Atlantic University, presented two contributions to FCM:
- “[FCM] CE2-Related: Reconstruction Refinement”, and
- “[FCM] CE2-Related: Feature Map Targeted Region of Interest Encoding”.
After presenting promising preliminary results, we joined the core experiment, whose results will be evaluated at the next meeting, in July.
FCM reached the preliminary working draft (PWD) stage.
This blog will be updated once MPEG releases a public report on the meeting.
The next meeting, MPEG 147, will take place in Sapporo from 2024-07-15 to 2024-07-19.
MPEG released the roadmap and exploration plans for the period after the 146th meeting.
1/22/24 MPEG 145 - Enhanced image coding, updates to genetic coding, and more
MPEG’s 145th meeting took place online from 2024-01-22 until 2024-01-26.
MPEG’s imaging standard evolves with cutting-edge features for enhanced image decoding and annotation
MPEG Systems (WG 3) ratified the third edition of its High Efficiency Image Format (HEIF; ISO/IEC 23008-12: Image file format). HEIF has solidified its position as one of the most rapidly and widely adopted standards in the imaging industry. The newest edition represents a significant leap forward, introducing progressive decoding capabilities that elevate image quality through a sequential, single-decoder instance process. This enhancement empowers users to decode a bitstream in successive steps, with each phase delivering perceptible improvements in image quality compared to the preceding step.
Additionally, this edition introduces a sophisticated data structure that describes the spatial configuration of the camera and outlines the distinctive characteristics of the camera responsible for generating the image content. Furthermore, the updated HEIF specification encompasses innovative tools for annotating specific areas in diverse shapes, enhancing the versatility of image content manipulation.
MPEG finalizes the third edition of MPEG-D dynamic range control
MPEG Audio Coding (WG 6) completed work on the third edition of ISO/IEC 23003-4, Dynamic range control, promoting it to the Final Draft International Standard (FDIS) stage.
The third edition includes the specification of dynamic range control (DRC) side-chain information and metadata-based real-time loudness leveling for live workflows. These technologies enable producers of live content, such as sports broadcasts and concerts, to seamlessly integrate MPEG-D DRC-based loudness leveling into their existing workflows. The metadata-based approach offers the highest possible quality of loudness processing and dynamic range control while maintaining full flexibility and control in playback devices.
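To make the metadata-based idea concrete, here is a minimal sketch (our own illustration, not the MPEG-D DRC specification): the production side sends per-frame gain metadata alongside the audio essence, and the playback device decides how to apply it. Real MPEG-D DRC gain sequences and loudness metadata are considerably richer than the single gain value per frame assumed here.

```python
import numpy as np

# Illustration only: apply per-frame gain metadata to audio frames.
# Frame size and gain values are assumptions for this sketch; MPEG-D DRC
# defines far richer gain sequences and loudness metadata.
audio_frames = np.random.randn(10, 1024).astype(np.float32)  # audio essence
gain_db = np.linspace(-3.0, 0.0, 10)                         # gain metadata (dB)

# The playback device converts dB to linear gain and applies it per frame;
# it could equally rescale or ignore the metadata, retaining full control.
leveled = audio_frames * (10.0 ** (gain_db / 20.0))[:, None]
print(leveled.shape)  # (10, 1024): leveled audio, same shape as the input
```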
MPEG finalizes the second edition of MPEG-4 audio conformance
MPEG Audio Coding (WG 6) celebrated the completion of the second edition of ISO/IEC 14496-26, audio conformance, elevating it to the Final Draft International Standard (FDIS) stage. This significant update incorporates seven corrigenda and five amendments into the initial edition, originally published in 2010. ISO/IEC 14496-26 serves as a pivotal standard, providing a framework for designing tests to ensure the compliance of compressed data and decoders with the requirements outlined in ISO/IEC 14496-3 (MPEG-4 Audio).
MPEG genomic coding extended to support transport and file format for genomic annotations
The MPEG Genomic Coding working group (WG 8) extended the transport and file format support to the coding of any common type of annotation obtained from the analysis of DNA sequencing data. ISO/IEC 23092-1 (3rd edition) – Transport and file format, supporting joint coding of sequencing and annotation data – has been promoted to Final Draft International Standard (FDIS). The MPEG-G standard series (ISO/IEC 23092) can now support full application pipelines, covering data representation and compression from the output of sequencing up to the results of tertiary analysis, in a single structured transport and file format. The extended structured and compressed representation provides the basis for standard APIs implementing advanced browsing and searching features, including exact and approximate string matching directly in the compressed domain for sequencing data, metadata, and annotations. These new standard functionalities are fundamental for searching large databases of compressed sequencing and annotation data, which result from the massive amounts of sequencing data generated by next-generation sequencing technologies.
Continued work on coding for machines
Video Coding for Machines (VCM) reached a new milestone: the new work item proposal (NWIP) was sent for ballot in ISO, with the voting and comment period expected to run from 19 January to 13 April. A total of 33 input contributions were registered for the meeting and reviewed.
Feature Coding for Machines (FCM) received a total of 58 input contributions that were reviewed during the ad-hoc group and break-out group meetings. It was agreed to initiate work on the preliminary working draft during the next meeting, in April.
OP Solutions continues its active participation in the VCM and FCM standardization efforts in partnership with Florida Atlantic University.
MPEG white paper
At the 145th MPEG meeting, MPEG Liaison and Communication (AG 3) approved the MPEG white paper summarized below; MPEG’s white papers are available at https://www.mpeg.org/whitepapers/.
Neural Network Coding (NNC) – Efficient Storage and Inference of Neural Networks for Multimedia Applications
Artificial neural networks have been adopted for a broad range of tasks in almost every technical field, such as medical applications, transportation, network optimization, big data analysis, surveillance, speech, audio, image and video classification, image and video compression, and many more. An additional factor in this exponential growth is the appearance of new use cases, such as federated learning with continuous communication between many devices. To effectively reduce bandwidth usage in communication and to reduce the size of networks for inference, an optimal compression ratio must be achieved. Thus, a standard for neural network coding (NNC) has been defined in ISO/IEC 15938-17 (Compression of Neural Networks for Multimedia Description and Analysis), with the second edition adding new compression tools and support for coding incremental updates of neural networks.
Incremental coding, one of the main extensions in the second edition, targets neural network updates as a difference signal between a base neural network (i.e., an instance of a trained neural network for the particular use case) and an updated neural network.
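As a rough illustration of the incremental-coding idea — a minimal sketch under our own assumptions, not the actual NNC toolchain — one can code the update as a quantized difference against the base weights, so that only the delta needs to be stored or transmitted:

```python
import numpy as np

# Sketch: code a network update as a difference signal against a base model.
# Uniform quantization stands in for NNC's real compression tools.

def encode_update(base, updated, step=0.01):
    """Quantize the weight delta; only these symbols need to be sent."""
    return np.round((updated - base) / step).astype(np.int32)

def decode_update(base, coded_delta, step=0.01):
    """Reconstruct the updated weights from the base plus the coded delta."""
    return base + coded_delta.astype(np.float32) * step

base = np.random.randn(1000).astype(np.float32)                   # base network
updated = base + 0.05 * np.random.randn(1000).astype(np.float32)  # after retraining

coded = encode_update(base, updated)
reconstructed = decode_update(base, coded)
print("max reconstruction error:", np.abs(reconstructed - updated).max())  # <= step/2
```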
10/16/23 MPEG 144 - FCM standardization underway, new metrics for codec quality, green metadata
Call for Proposals on Feature Compression for Video Coding for Machines deemed a success, resulting in new standardization project
At the 144th MPEG meeting, which took place in Hannover, Germany from October 16-20, 2023, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) on Feature Compression for Video Coding for Machines (FCVCM). Feature Compression for Video Coding for Machines investigates technology directed towards compression of intermediate ‘features’ encountered within neural networks, enabling use cases such as distributed execution of neural networks. This stands in contrast to Video Coding for Machines, which compresses conventional video data but with optimizations targeting machine consumption of the decoded video, rather than human consumption.
OP Solutions partnered with Florida Atlantic University (FAU) and InterDigital in producing two proposals.
Based on the 12 responses received to this CfP, the overall pipeline of FCVCM can be divided into two stages: (1) feature reduction and (2) feature coding. Technologies related to feature reduction include – but are not limited to – neural network-based feature fusion, temporal and spatial resampling, and adaptive feature truncation. Technologies related to feature coding include learning-based codecs, existing block-based video codecs, and hybrid codecs.
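Schematically, the two stages can be pictured as follows. This is a toy sketch with stand-ins we chose for illustration — average pooling for feature reduction and uniform quantization for feature coding; the actual CfP responses use far more sophisticated tools:

```python
import numpy as np

# Toy two-stage pipeline: (1) feature reduction, (2) feature coding.

def reduce_features(feat, factor=2):
    """Stage 1 stand-in: average-pool each channel spatially by `factor`."""
    c, h, w = feat.shape
    feat = feat[:, :h - h % factor, :w - w % factor]
    return feat.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def code_features(feat, step=0.1):
    """Stage 2 stand-in: uniform quantization in place of a real codec."""
    return np.round(feat / step).astype(np.int16)

features = np.random.randn(256, 64, 64).astype(np.float32)  # an intermediate tensor
symbols = code_features(reduce_features(features))
print(features.size, "->", symbols.size, "symbols to entropy-code")
```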
All responses were evaluated on three tasks across four datasets. The results provide an overall gain, measured in average Bjøntegaard-Delta (BD) rate, of up to 94% against the feature anchors and 69% against the visual anchors.
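For readers unfamiliar with the metric, BD rate expresses the average bitrate difference between two codecs at equal quality (here, equal task accuracy). A common way to compute it is to fit a cubic polynomial to quality versus log-rate and integrate the gap over the overlapping quality range; the sketch below uses made-up rate/accuracy points:

```python
import numpy as np

# Classic BD-rate computation: fit quality -> log(rate) cubics, integrate over
# the common quality interval, and convert the mean log-rate gap to percent.

def bd_rate(rates_anchor, qual_anchor, rates_test, qual_test):
    """Average bitrate difference (%) of test vs. anchor at equal quality."""
    p_a = np.polyfit(qual_anchor, np.log(rates_anchor), 3)
    p_t = np.polyfit(qual_test, np.log(rates_test), 3)
    lo = max(min(qual_anchor), min(qual_test))   # overlapping quality range
    hi = min(max(qual_anchor), max(qual_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    return (np.exp((int_t - int_a) / (hi - lo)) - 1.0) * 100.0

# Made-up rate (kbps) / task-accuracy points; a negative result means the
# test codec needs less bitrate than the anchor for the same accuracy.
print(bd_rate([100, 200, 400, 800], [40.0, 45.0, 48.0, 50.0],
              [60, 120, 240, 480], [40.5, 45.2, 48.1, 50.2]))
```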
MPEG Video Coding (WG 4) took over the management of the project. The new standardization project is planned to be completed, reaching the status of Final Draft International Standard (FDIS), by July 2025. The tentative name of the project is “Feature Coding for Machines” (FCM).
OP Solutions, together with FAU, continues to participate in the FCM core experiments.
MPEG issues Call for Learning-Based Video Codecs for Study of Quality Assessment
MPEG Visual Quality Assessment (AG 5) issued a call for learning-based video codecs for study of quality assessment. AG 5 has been conducting subjective quality evaluations for coded video content and studying their correlation with objective quality metrics. Most of these studies focused on the High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) standards. MPEG maintains the Compressed Video for study of Quality Metrics (CVQM) dataset for the purpose of this study.
Given the recent advancements in the development of learning-based video compression algorithms, MPEG studies compression using learning-based codecs. MPEG anticipates that different types of distortion would be present in a reconstructed video that has been compressed using learning-based codecs compared to those induced by traditional block-based motion-compensated video coding designs. To facilitate a deeper understanding of these distortions and their impact on visual quality, MPEG issued a public call for learning-based video codecs for study of quality assessment. MPEG welcomes inputs in response to the call. Upon evaluating the responses, MPEG will invite those responses that meet the call’s requirements to submit compressed bitstreams for further study of their subjective quality and potential inclusion into the CVQM dataset.
MPEG enhances the Support of Energy-Efficient Media Consumption
MPEG Systems (WG 3) promoted the ISO/IEC 23001-11 Amendment 1 (energy-efficient media consumption (green metadata) for Essential Video Coding (EVC)) to Final Draft Amendment (FDAM), the final milestone of the standard development. This latest amendment defines metadata that enables a reduction in decoder power consumption for ISO/IEC 23094-1 (Essential Video Coding (EVC)). At the same time, ISO/IEC 23001-11 Amendment 2 (energy-efficient media consumption for new display power reduction metadata) has been promoted to Committee Draft Amendment (CDAM), the first stage of standard development. This amendment introduces a novel way to carry metadata about display power reduction encoded as a video elementary stream interleaved with the video it describes. The amendment is expected to be completed and reach the status of Final Draft Amendment (FDAM) by the beginning of 2025. These developments represent a significant step towards more energy-efficient media consumption and a more sustainable future.
Other developments
MPEG Systems (WG 3) progressed the development of various ISO Base Media File Format (ISOBMFF) related standards. As a part of the family of ISOBMFF-related standards, ISO/IEC 14496-15 defines the carriage of Network Abstraction Layer (NAL) unit structured video data such as Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), Essential Video Coding (EVC), and Low Complexity Enhancement Video Coding (LCEVC). ISO/IEC 14496-15 has been further improved by adding support for enhanced features such as Picture-in-Picture (PiP) use cases particularly enabled by VVC, which resulted in the approval of the Final Draft Amendment (FDAM).
MPEG Systems (WG 3) promoted ISO/IEC 23090-18 Amendment 1 (support of temporal scalability) to Final Draft Amendment (FDAM), the final stage of standard development. The amendment enables the compression of a single elementary stream of point cloud data using ISO/IEC 23090-9 and storing it in more than one track of ISO Base Media File Format (ISOBMFF)-based files, thereby enabling support for applications that require multiple frame rates within a single file.
MPEG Coding of 3D Graphics and Haptics (WG 7) promoted ISO/IEC 23090‑28 (efficient 3D graphics media representation for render-based systems and applications) to Committee Draft (CD), the first stage of standard development. This standard aims to streamline the interchange of 3D graphics formats.
MPEG Genomic Coding (WG 8) announced the completion of ISO/IEC 23092‑6 (coding of genomic annotations). This standard addresses the need to provide compressed representations of genomic annotations linked to the compressed representation of raw sequencing data and metadata.
7/17/23 MPEG 143 - VVC, VCM, FCVCM and other developments
MPEG 143 took place in Geneva from July 17-21, 2023. Most of the sessions were held at the ITU-T headquarters and a nearby conference center.
Video coding updates and milestones
A. ISOBMFF
MPEG Systems (WG 3) finalized ISO/IEC 23001-17 – Carriage of uncompressed video and images in ISO Base Media File Format (ISOBMFF) – by promoting it to the Final Draft International Standard (FDIS) stage. The ISOBMFF supports the carriage of a wide range of media data such as video, audio, point clouds, haptics, etc., which is now further expanded to uncompressed video and images.
WG 3 also enhanced the capabilities of the ISO Base Media File Format (ISOBMFF) family of standards by promoting two standards to their first milestone, Committee Draft Amendment (CDAM):
- ISO/IEC 14496-12 (8th edition) CDAM 1 – Support for T.35, original sample duration, and other improvements – will enable the carriage of the user data registered as specified in ITU-T Rec. T.35 as part of the media sample data. It also supports a more efficient way of describing subsamples by referencing the same features defined by other subsamples.
- ISO/IEC 14496-15 (6th edition) CDAM 3 – Support for neural-network post-filter supplemental enhancement information and other improvements – will enable the carriage of the newly defined Supplemental Enhancement Information (SEI) messages for neural-network post-filters in ISOBMFF. The carriage of the neural-network post-filter characteristics (NNPFC) SEI message and the neural-network post-filter activation (NNPFA) SEI message enable the delivery of a base post-processing filter and a series of neural network updates synchronized with the input video pictures.
Both standards are planned to be completed, i.e., to reach the status of Final Draft Amendment (FDAM), by the end of 2024.
B. VVC
MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5) issued the Final Draft International Standard (FDIS) texts of the third editions of the Versatile Video Coding (VVC, ISO/IEC 23090-3) and the Versatile Supplemental Enhancement Information (VSEI, ISO/IEC 23002-7) standards. The corresponding twin texts were also submitted to ITU-T SG 16 for consent as ITU-T H.266 and ITU-T H.274, respectively. New elements contained in VVC are the support of an unlimited level for the video profiles, as well as some technical corrections and editorial improvements on top of the second edition text of VVC. Furthermore, the VVC-specific support is specified for some supplemental enhancement information (SEI) messages that may be included in VVC bitstreams but are defined in external standards. These SEI messages include two systems-related SEI messages, (a) one for signaling of green metadata as specified in ISO/IEC 23001-11 and (b) the other for signaling of an alternative video decoding interface for immersive media as specified in ISO/IEC 23090-13. Furthermore, four other SEI messages are contained in the third edition of VSEI, namely (i) the shutter interval information SEI message, (ii) the neural network post-filter characteristics SEI message, (iii) the neural-network post-processing filter activation SEI message, and (iv) the phase indication SEI message.
While the shutter interval indication is already known from Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC), the new one on subsampling phase indication is relevant for variable-resolution streaming. The two SEI messages for describing and activating post-filters using neural network technology in video bitstreams could, for example, be used for reducing coding noise, spatial and temporal upsampling, colour improvement, or general denoising of the decoder output. The description of the neural network architecture itself is based on MPEG’s neural network representation standard (ISO/IEC 15938‑17). As results from an exploration experiment have shown, neural network-based post-filters can deliver better results than conventional filtering methods. Processes for invoking these new post-filters have already been tested in a software framework and will be made available in an upcoming version of the VVC reference software (ISO/IEC 23090-16).
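To give a feel for what such a post-filter does, here is a toy stand-in (our illustration only; a real filter’s architecture would be signalled via the NNR standard and activated by the SEI messages described above): a small CNN predicts a residual that is added to the decoder’s output picture.

```python
import torch

# Toy post-filter: a small CNN predicts a correction (e.g., coding-noise
# removal) for decoded pictures. Architecture and sizes are placeholders.
post_filter = torch.nn.Sequential(
    torch.nn.Conv2d(3, 32, kernel_size=3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 32, kernel_size=3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 3, kernel_size=3, padding=1))

decoded = torch.rand(1, 3, 720, 1280)        # stand-in for a decoded picture
with torch.no_grad():
    filtered = (decoded + post_filter(decoded)).clamp(0.0, 1.0)
print(filtered.shape)                        # same resolution, filtered output
```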
C. Current and legacy codecs
MPEG Joint Video Experts Team with ITU-T SG 16 (WG 5) also issued the Committee Draft (CD) text of the eleventh edition of the Advanced Video Coding standard (AVC, ISO/IEC 14496-10) and the Committee Draft Amendment (CDAM) text for the extension of the High Efficiency Video Coding standard (HEVC, ISO/IEC 23008-2). Both add specific support for three new supplemental enhancement information (SEI) messages from the third edition of Versatile Supplemental Enhancement Information (VSEI), namely (i) the subsampling phase indication SEI message, (ii) the neural network post-filter characteristics SEI message, and (iii) the neural-network post-processing filter activation SEI message, so these can be included in AVC and HEVC bitstreams. Furthermore, code point identifiers for YCgCo-R colour representation with equal luma and chroma bit depths, and for a colour representation referred to as IPT-PQ-C2 (from the upcoming SMPTE ST 2128 specification), are added. The new edition of AVC also contains some technical corrections and editorial improvements on top of the 10th edition text, and the HEVC amendment specifies additional profiles supporting multiview applications, namely a 10-bit multiview profile, as well as 8-bit, 10-bit, and 12-bit monochrome multiview profiles, which could be beneficial for coding depth maps as auxiliary pictures.
D. Video Coding for Machines (VCM) and Feature Coding for VCM (FCVCM)
A total of 56 input documents related to VCM were presented in the ad-hoc group and break-out group meetings, including two by Florida Atlantic University: “Containerized VCM Reference Software” and “Study of DCT-Based Filtering for Improving Machine Task Performance”. A timeline for VCM standardization was agreed upon: July 2023 – Working Draft; July 2024 – Committee Draft; October 2024 – Draft International Standard; April 2025 – Final Draft International Standard. The Working Draft document was issued after the meeting.
Regarding FCVCM, an intermediate report was issued for the Call for Proposals (CfP): the timeline has been strictly followed. All test materials were cross-checked before May 2nd, and a total of 19 proposal registrations were received before the July 3rd deadline. So far, the CfP is on schedule, and only minor issues have been spotted. The next major milestone is the upload of bitstream files and results before September 13th. All proposals will be evaluated at the October meeting, and the Working Draft is scheduled for release in November 2023.
Future video coding explorations
Two main avenues of exploration for future video coding keep producing improved results (although still with prohibitive complexity).
The “Exploration experiment on enhanced compression beyond VVC capability” reported the following combined improvements of ECM-9.0 over the VTM-11.0ecm8.0 anchor:
- For the All-Intra configuration, an 11.59% overall bitrate improvement at the cost of 788.5% encoder and 430.8% decoder runtime.
- For the Random Access configuration, a 21.03% overall bitrate improvement at the cost of 685.5% encoder and 777.1% decoder runtime.
- For the Low Delay configuration, a 17.10% overall bitrate improvement at the cost of 596.3% encoder and 596.4% decoder runtime.
The neural network-based video coding exploration reported the following combined improvements over the previous reference model:
- For the All-Intra configuration, a 7.81% overall bitrate improvement at the cost of 201% encoder and 4507% decoder runtime.
- For the Random Access configuration, a 6.62% overall bitrate improvement at the cost of 132% encoder and 7557% decoder runtime.
- For the Low Delay configuration, a 4.98% overall bitrate improvement at the cost of 138% encoder and 7257% decoder runtime.
Call for interest - Audio coding for machines
All audio coding schemes and formats standardized by MPEG have so far targeted human consumption of audio content. High compression ratios have been achieved by incorporating models of the human auditory system. Such data formats might not be adequate for computerized analysis of sound and sound scenes. Audio Coding for Machines is targeting several new applications where computers, rather than humans, are listening and analyzing audio content.
In principle, two different operation modes are foreseen:
- The new data format is used to store and exchange data to be used in the development of audio analysis algorithms. In general, this will involve training artificial intelligence (AI) models using huge data sets.
- The new data format is used for the communication between acoustic sensors and some analysis and control algorithms. In general, such algorithms will be AI-based and trained on huge data sets.
For both operation modes, the data set will contain the audio essence together with rich metadata describing the content.
MPEG’s WG 2 is assessing the pros and cons of working on Audio Coding for Machines (ACoM) and invites contributions from within and outside MPEG on this topic. Contributions might include expressions of interest in using such a format in applications, or in working on the creation and standardization of the technology.
Interested parties include, but are not limited to, academic institutions, research labs, service providers, device manufacturers, equipment vendors, network operators, and technology providers.
MPEG Roadmap after 143rd meeting
MPEG issued an updated roadmap, which continues to be shaped by significant developments and needs:
• The relentless increase of IP-distributed and mobile media
• Higher quality media
• More immersion (UHD, VR, AR, Light Fields, Holography)
• The Internet of Media Things & Wearables
• Cloud-based media processing, storage and delivery
• New high-speed networks including fiber, 5G mobile, and cable 10G
• New emerging technologies (machine vision, AI)
5/1/23 MPEG 142 - Call for proposals for FCVCM and other developments
The 142nd MPEG meeting took place in Antalya, Turkey from April 24-28. This was the first time MPEG held its quarterly meeting in Turkey. The organization was good, and the impressions of experts and visitors were positive.
Feature Coding for Video Coding for Machines
MPEG issued the call for proposals (CfP) for Feature Compression for Video Coding for Machines (FCVCM). As we already mentioned in our blog, Video Coding for Machines (VCM) has been under development for a couple of meeting cycles. The main purpose of VCM is to standardize the encoding of images and videos that are processed (i.e., consumed) by machines. The input to VCM is an image or video that comes straight from the camera or a storage device. In the case of FCVCM, the input to the standard will be a feature stream taken from the output of the layers of a neural network. Instead of encoding visual information such as videos and images, FCVCM will encode features from the “middle” of a neural network, while still producing a bitstream that is ultimately consumed by machines.
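A minimal sketch of where such a feature stream comes from (the split point and network are our own example choices, not anything mandated by the draft standard): the front of a network runs near the camera, the intermediate feature tensor is what FCVCM would compress, and the rest of the network finishes the task elsewhere.

```python
import torch
import torchvision

# Split a ResNet-50 after layer2 as an example "middle" of the network.
# Weights are left uninitialized (weights=None) to keep the sketch
# self-contained; a deployed system would load trained weights.
model = torchvision.models.resnet50(weights=None).eval()

front = torch.nn.Sequential(                 # runs on the capture device
    model.conv1, model.bn1, model.relu, model.maxpool,
    model.layer1, model.layer2)
back = torch.nn.Sequential(                  # runs after feature decoding
    model.layer3, model.layer4, model.avgpool,
    torch.nn.Flatten(), model.fc)

frame = torch.randn(1, 3, 224, 224)          # stand-in for a camera frame
with torch.no_grad():
    features = front(frame)                  # this tensor is what FCVCM codes
    logits = back(features)                  # machine task completes from features
print(features.shape)                        # torch.Size([1, 512, 28, 28])
```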
From the CfP’s introduction:
In 2019 MPEG started an investigation into the area of video coding for machines. The focus of this exploration was to study the case where images and videos are compressed not to be looked at and evaluated by humans, but rather by machine vision algorithms. These algorithms can serve different purposes such as object detection, instance segmentation, or object tracking. As video compression standards such as HEVC or VVC are developed and optimized towards the human visual system, the existing standards may not be optimal for applications where the video is analyzed by machines. One aspect is the compression of intermediate features seen in a neural network.
Regarding feature compression, a formal call for evidence was issued in July 2022 and provided evidence that this can be achieved in different ways. This call for proposals is the start of a process which has the creation of a new international standard as its goal.
This work on “Feature Compression for Video Coding for Machines” (FCVCM) aims at compressing features for machine tasks. As networks increase in complexity, architectures such as ‘Collaborative Intelligence’ (whereby a network is distributed across an edge device and the cloud) become advantageous. With the rise of newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. As a consequence of such architectures, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions could be different from conventional ones in order to achieve optimized performance for machine usage. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has been rapidly growing. Typical use cases include intelligent transportation, smart cities, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, it is essential to extract and compress the features from video for efficient transmission and storage. Feature compression technology solicited in this CfP can also be helpful in some other regards, such as computational offloading and privacy protection. This call focuses on the compression of features, and thus responses are expected to produce decoded features that will be used to complete execution of a pre-defined set of machine vision algorithms to generate the performance results.
This CfP welcomes submissions of proposals from companies and other organizations. Registration is required by the 3rd of July 2023; the submission of bitstream files, results, and decoder packages is required by the 13th of September 2023; and the submission of proponent documentation is due by the 9th of October 2023. Evaluation of the submissions in response to the CfP will be performed at the 144th MPEG meeting in October 2023.
OP Solutions plans to respond to the call with a joint proposal with Florida Atlantic University.
Video Coding for Machines
Video Coding for Machines (VCM) work continued with a main focus on refining core experiments, defining appropriate anchors (references), and producing the first output documents. Upon review of the proposals in all Core Experiments (CEs), it was decided to merge some of the core experiments, resulting in 3 CEs, compared to 5 CEs before the meeting. The current CEs are:
CE 1 - Region-of-interest based coding methods,
CE 2 - Neural network based inner coding,
CE 3 - Spatial Resampling.
Work on the preliminary draft of the Technology-under-Consideration (TuC) document was initiated. This document will describe candidate core technologies that are being tested for possible adoption into the final standard.
Ad-hoc group (AhG) mandates were specified to guide the work before the next meeting and beyond – including:
- Complete the output documents, including the TuC.
- Release VCM Reference Software v0.5.
- Continue developing VCM technologies.
- Continue collecting test and training materials.
- Continue refining the cross-check procedure.
The Working Group for Video, which oversees the development of the VCM standard, targets January 2024 as the date for the release of the first Working Draft of the standard.
OP Solutions will continue participation in VCM jointly with Florida Atlantic University.
Some of the other developments
In the context of the WG 2/Market Needs activity, in which the group is tasked to identify MPEG standards and technologies applicable to the Metaverse, potential use cases were collected, and matching MPEG technologies were identified. Use cases are collected using a template that includes a self-assessment of seven characteristics considered specific to the Metaverse: real-time aspects, 3D experiences, interactivity of user senses, user navigation, social aspects, persistence of events, and representation of users and objects. Documented use cases include: Virtual dressing room, Online game enjoyed simultaneously on different immersive displays, Digital asset bank for online communities, Virtual museums, AR two-party call, Immersive Live Performances, and B2B Digital twin systems in critical environments. Work on identifying additional use cases and developing the MPEG architectures to support the current use cases will continue.
The MPEG immersive video (MIV) conformance and reference software standard (ISO/IEC 23090-23) has been promoted to the Final Draft International Standard (FDIS) stage, the last formal milestone of its approval process. The document specifies how to conduct conformance tests and provides reference encoder and decoder software for ISO/IEC 23090-12 MPEG immersive video. This draft includes 23 verified and validated conformance bitstreams and encoding and decoding reference software based on version 15.1.1 of the Test Model for MPEG Immersive Video (TMIV). The test model, objective metrics, and some other tools are publicly available at https://gitlab.com/mpeg-i-visual.
At this meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32, “Carriage of haptics data”, by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Considering that haptics data is composed of spatial and temporal components, a data unit carrying various spatial or temporal data packets is used as a basic entity, analogous to an access unit of audio-visual media. Additionally, an explicit indication of a silent period, reflecting the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.
MPEG released a white paper titled “White paper on Geometry based Point Cloud Compression (G-PCC)”. This white paper is notable for describing technology that is potentially very useful in the automotive industry and other industries that use 3D modalities such as LiDAR. Geometry-based Point Cloud Compression (G-PCC) provides a standard for the coded representation of point cloud media. Point clouds may be created in various manners; recently, 3D sensors such as Light Detection And Ranging (LiDAR) or Time of Flight (ToF) devices have been widely used to scan dynamic 3D scenes. To precisely describe 3D objects or real-world scenes, point clouds consist of a large set of points in 3D space with geometry information and attribute information. The geometry information represents the 3D coordinates of each point in the point cloud; the attribute information describes the characteristics (e.g., colour and reflectance) of each point. Point clouds require a large amount of data, posing huge challenges to data storage and transmission. The white paper is available at https://www.mpeg.org/whitepapers/.
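As a small illustration of the data model described above (the sizes and value ranges are our own assumptions, not G-PCC’s): each point carries geometry (its 3D coordinates) plus attributes such as colour and reflectance, which is why uncompressed point clouds get large quickly.

```python
import numpy as np

# Toy point-cloud frame: geometry plus per-point attributes.
num_points = 100_000                                   # arbitrary frame size
geometry = np.random.randint(0, 1024, (num_points, 3)).astype(np.int32)  # x, y, z
colour = np.random.randint(0, 256, (num_points, 3)).astype(np.uint8)     # R, G, B
reflectance = np.random.randint(0, 256, num_points).astype(np.uint8)

raw_mb = (geometry.nbytes + colour.nbytes + reflectance.nbytes) / 1e6
print(f"uncompressed: {raw_mb:.1f} MB per frame")      # motivates compression
```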
Finally, MPEG issued the roadmap for all its standards, reflecting updates from the 142nd meeting, presented below.
The 143rd MPEG meeting will take place July 17-21 at CICG, in Geneva, Switzerland.
2/17/23 MPEG 141 - MPEG-AI is born, VCM further evolves
The 141st MPEG meeting, held in person and online in the third week of January, covered many topics, including the establishment of an over-arching project that will cover all machine learning-related initiatives within MPEG and Video Coding for Machines (VCM).
1. MPEG-AI
At the meeting, experts initiated MPEG-AI, an umbrella initiative for all the AI-related activities within MPEG (VCM, FCVCM, NNC, etc.). The project will officially be launched at the April 2023 MPEG meeting. More information can be found on the project’s website, at https://www.mpeg.org/standards/MPEG-AI/.
2. VCM
Activities on the Video Coding for Machines (VCM) standard continued during the ad-hoc group and break-out group meetings.
As a reminder, the VCM call for proposals was issued in July 2022. An excerpt from MPEG’s press release (https://www.mpeg.org/meetings/mpeg-140/) states:
At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.
The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, respectively, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by a variety of proposals.
Given the success of this call, MPEG will continue working on video compression methods for machine vision tasks. The work will continue in MPEG Video Coding (WG 4) within a new standardization project. A test model will be developed based on technologies from the responses to the CfP and results from the first round of core experiments in one or two meeting cycles. At the same time, the Joint Video Team with ITU-T SG 16 (WG 5) will study encoder optimization methods for machine vision tasks on top of existing MPEG video compression standards.
Furthermore, in the public document “CfP response report for Video Coding for Machines” (https://www.mpeg.org/wp-content/uploads/mpeg_meetings/140_Mainz/w22071.zip), MPEG expressed acknowledgment of the participating organizations:
The following organizations are thanked for responding to this CfP:
· Alibaba
· Institute of Computing Technology, Chinese Academy of Sciences (CAS-ICT)
· China Telecom
· City University of Hong Kong
· Ericsson
· Electronics and Telecommunications Research Institute (ETRI)
· Florida Atlantic University (FAU)
· Konkuk University
· Myongji University
· Nokia
· OP Solutions
· Poznan University of Technology (PUT)
· Tencent
· V-Nova
· Wuhan University
· Zhejiang University
During the 141st meeting, updated results from the proponents that responded to the Call for Proposals were reviewed, and the decision was made to continue the work on the reference software as well as five core experiments (CEs):
CE 1 – Region-of-interest based coding methods,
CE 2 – Neural network based inner coding,
CE 3 – Frame level spatial resampling,
CE 4 – Temporal resampling,
CE 5 – Post filtering.
OP Solutions, together with its partner institution, Florida Atlantic University, continues to participate in the development of the VCM standard as a proponent of proposals directed at several core experiments. New and updated results of our proposed technology will be presented at the 142nd MPEG meeting in April.
In addition, a draft CfP was issued for Feature Compression for Video Coding for Machines (FCVCM). In contrast to VCM, which takes as input a pixel-domain picture or a frame of a video, FCVCM takes as input features from an arbitrary layer of the neural network processing the input picture. (We are planning to write additional blog posts explaining the details of these technologies in the near future – stay tuned!)
The final CfP for FCVCM will be issued in April. OP Solutions plans to respond to this CfP as well.
3. MPEG roadmap
MPEG’s roadmap emphasizes the importance of VCM, FCVCM, and related activities. It is a short-term plan that results from MPEG experts’ assessment of the current status and near-term viability of the ongoing standardization efforts.
In the accompanying presentation, MPEG gives the following rationale for producing and publicizing the roadmap:
MPEG has created, and still produces, media standards that enable huge markets to flourish
• MPEG works on requirements from industry.
• Many industries are represented in MPEG, but not all of MPEG’s customers can or need to participate in the process.
• MPEG wants to inform its customers about its long-term plans (~ 5 years out).
• MPEG collects feedback and requirements from these customers.
The roadmap is shaped by significant developments
• The relentless increase of IP-distributed and mobile media
• Higher quality media
• More immersion (UHD, VR, AR, Light Fields, Holography)
• The Internet of Media Things & Wearables
• Cloud-based media processing, storage and delivery
• New high-speed networks including fiber, 5G mobile, and cable 10G
• New emerging technologies (machine vision, AI)
The short-term plan for the MPEG’s roadmap, after the 141st meeting, is depicted in the picture accompanying this blog post.
We are glad to announce that OP Solutions will continue participating in the MPEG’s work on the exciting and promising new technologies.