Geoscientific Model Development An interactive open-access journal of the European Geosciences Union
Discussion papers
https://doi.org/10.5194/gmd-2018-250
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Development and technical paper 20 Nov 2018

Review status
This discussion paper is a preprint. A revision of this manuscript was accepted for the journal Geoscientific Model Development (GMD) and is expected to appear here in due course.

Evaluation of lossless and lossy algorithms for the compression of scientific datasets in NetCDF-4 or HDF5 formatted files

Xavier Delaunay¹, Aurélie Courtois¹, and Flavien Gouillon²
  • ¹ Thales Services, 290 allée du Lac, 31670 Labège, France
  • ² CNES, Centre spatial de Toulouse, 18 avenue Edouard Belin, 31401 Toulouse, France

Abstract. The growing volume of scientific datasets makes compression necessary to reduce data storage and transmission costs, particularly for the oceanographic and meteorological datasets generated by Earth observation mission ground segments. These data are mostly produced as NetCDF files. The NetCDF-4/HDF5 file formats are widely used in the global scientific community because of the features they offer. In particular, HDF5 provides dynamically loaded filter plugins, which allow users to write filters, such as compression/decompression filters, that process the data before it is read from or written to disk. In this work, we evaluate the performance of lossy and lossless compression/decompression methods through NetCDF-4 and HDF5 tools on analytical and real scientific floating-point datasets. We also introduce the Digit Rounding algorithm, a new relative-error-bounded data reduction method inspired by the Bit Grooming algorithm. The Digit Rounding algorithm achieves high compression ratios while preserving a given number of significant digits in the dataset. It achieves higher compression ratios than the Bit Grooming algorithm at a similar compression speed.
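The idea of a relative-error-bounded data reduction step followed by lossless compression can be illustrated with a minimal sketch. This is not the Digit Rounding algorithm, nor the Bit Grooming algorithm as published: it is plain mantissa truncation ("bit shaving"), chosen here only because it is the simplest member of this family. The function names `shave_mantissa` and `compressed_size`, the synthetic sine-wave data, and the use of zlib in place of an HDF5 filter pipeline are all assumptions made for illustration.

```python
import math
import struct
import zlib


def shave_mantissa(value: float, nsd: int) -> float:
    """Zero the low mantissa bits of a float64, keeping about `nsd` significant digits.

    Illustrative bit shaving only (an assumption of this sketch), not the
    Digit Rounding or Bit Grooming algorithm from the paper.
    """
    # Bits needed to represent nsd decimal digits, plus one guard bit so that
    # the truncation error stays below half a unit of the last kept digit.
    keep_bits = min(52, math.ceil(nsd * math.log2(10)) + 1)
    drop_bits = 52 - keep_bits
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    # Mask that clears the lowest `drop_bits` bits of the mantissa.
    mask = (0xFFFFFFFFFFFFFFFF >> drop_bits) << drop_bits
    (result,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
    return result


def compressed_size(values) -> int:
    """Size in bytes of the zlib-compressed little-endian float64 buffer."""
    payload = struct.pack(f"<{len(values)}d", *values)
    return len(zlib.compress(payload, 6))


# Synthetic, hypothetical "temperature-like" signal; not data from the paper.
samples = [273.15 + 100.0 * math.sin(0.01 * i) for i in range(10000)]
shaved = [shave_mantissa(v, 3) for v in samples]

raw_size = compressed_size(samples)
shaved_size = compressed_size(shaved)
print("compressed bytes without shaving:", raw_size)
print("compressed bytes with 3 significant digits kept:", shaved_size)
```

Zeroing the trailing mantissa bits leaves long runs of identical bytes, which a lossless compressor can then exploit. In this sketch the relative error introduced on each normal (non-zero, non-subnormal) value stays below 0.5 × 10^-nsd.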

Latest update: 16 Jul 2019
Short summary
This work evaluates the performance of lossy and lossless compression/decompression of NetCDF-4/HDF5 floating-point datasets. It also introduces the Digit Rounding algorithm, a relative-error-bounded data reduction method inspired by the Bit Grooming algorithm. Digit Rounding achieves high compression ratios while preserving a given number of significant digits in the dataset, and it achieves higher compression ratios than the Bit Grooming algorithm at a similar compression speed.