GC file transformation

Summary of how it works

The GC Transformer works in three main steps:

  1. Classification - Based on an analysis of several dozen historical print-formatted GC reports, we recorded where the analysis is specified, which columns hold the context headers and data, and other variables. From this we can identify whether a file is presented in a familiar structure and know where to find the context data (depth, well name, etc.).

  2. Context data - We capture information such as company, location, well name and depth from the block near the top of the file. This block is typically structured as two header columns and two value columns.



  3. Page data - The bulk of the data is then captured in repeating cycles, each consisting of context data followed by a table of measurements.


    Each page starts with the repeated block of context data, followed by the headers and then the data. Whilst iterating through the rows we pick up indicators that a page is starting or finishing: for example, after the context data block we expect to see empty rows or the headers, and after the headers we expect to see page data.

    We use the Peak Label (and ion channel, if given) to identify the properties, and the Area, Height etc. columns to identify the indicators and units of measure used.

    Because the sheets are formatted for presentation rather than ease of processing, the header columns do not always correspond to the value columns even if they align visually. For example, it is common for a value to span a larger range of merged columns than its header but be right-aligned so that the two match up visually. We use consistency checks to ensure that the number of value columns matches the number of header columns (correcting for multiple values within merged cells where needed).

    Rows that do not meet this convention, such as subheadings, are disregarded.
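
The row handling described in step 3 can be sketched roughly as follows. This is a minimal illustration only, not the service's implementation: the function names, the use of "Peak Label" as the marker for the header row, and the sample rows are all assumptions made for the example. It ignores column positions and compares only the number of non-empty values in a row with the number of headers, so a subheading row is skipped.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]


def non_empty(row: Row) -> List[str]:
    """Return the stripped, non-empty cell values of a row, left to right."""
    return [c.strip() for c in row if c is not None and c.strip()]


def parse_page(
    rows: List[Row], header_marker: str = "Peak Label"
) -> Tuple[List[str], List[Dict[str, str]], List[List[str]]]:
    """Split one page into (headers, data rows, skipped rows).

    Assumed convention: the header row is the first row containing
    `header_marker`; a later row counts as data only if it has exactly
    as many non-empty values as there are headers.
    """
    headers: List[str] = []
    data: List[Dict[str, str]] = []
    skipped: List[List[str]] = []
    for row in rows:
        values = non_empty(row)
        if not values:
            continue  # empty spacer rows carry no information
        if not headers:
            if header_marker in values:
                headers = values
            continue  # anything else before the headers is context data
        if len(values) == len(headers):
            data.append(dict(zip(headers, values)))
        else:
            skipped.append(values)  # e.g. a subheading row
    return headers, data, skipped


# Hypothetical page: headers and values sit in different columns
# because of merged cells, but the counts still agree.
page: List[Row] = [
    ["Company", None, "ExampleCo", None],
    [None, None, None, None],
    ["Peak Label", None, "Area", "Height"],
    ["n-Alkanes", None, None, None],
    [None, "nC17", "1200", "340"],
]
headers, data, skipped = parse_page(page)
```

Counting values rather than matching column indices is one simple way to tolerate merged cells; a fuller version would also need to split multiple values held within a single merged cell.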

 

Value Selection

In some cases, columns are included for both standard and response-factor-corrected values of the same indicator. Where both are given, the standard values are preferred.
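
As a rough sketch of this preference rule, assume each column has already been reduced to an indicator name plus a flag saying whether it is response factor corrected. That representation and the function name are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (indicator, is_response_factor_corrected) for each value column
Column = Tuple[str, bool]


def select_columns(columns: List[Column]) -> List[int]:
    """Return the indices of the columns to keep, one per indicator.

    A standard column replaces a corrected column for the same indicator;
    a corrected column is kept only when no standard one exists.
    """
    chosen: Dict[str, int] = {}
    for idx, (indicator, corrected) in enumerate(columns):
        if indicator not in chosen:
            chosen[indicator] = idx
        elif not corrected and columns[chosen[indicator]][1]:
            chosen[indicator] = idx  # prefer the standard column
    return sorted(chosen.values())


columns = [("Area", True), ("Area", False), ("Height", False), ("Conc", True)]
kept = select_columns(columns)
```

Here the standard Area column is kept in place of the corrected one, while the corrected Conc column is kept because it has no standard counterpart.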

We also do not currently capture the retention time or the full compound name (unless the peak label is unspecified).

 

Suggested output checks

The first thing to check is that all of the sheets you expected were transformed. If a file contains multiple sheets and some but not all have a recognised structure, only the sheets that could be transformed are included in the output.
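
If you have the sheet names from the input file and from the output to hand, this check is a simple comparison. The sketch below assumes both are available as plain lists of names; how you obtain them depends on your own tooling, and the sheet names shown are invented.

```python
from typing import List


def missing_sheets(expected: List[str], transformed: List[str]) -> List[str]:
    """Return the expected sheet names that are absent from the output."""
    done = set(transformed)
    return [name for name in expected if name not in done]


not_transformed = missing_sheets(
    ["Sat GC", "Aro GC", "Notes"],  # sheets in the input file
    ["Sat GC", "Aro GC"],           # sheets present in the output
)
```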

Another useful check is that you are happy with the indicators and units of measure that have been assigned. We do our best to capture appropriate values for these, but have seen a range of different abbreviations, so unfamiliar headings may not be mapped appropriately. It is enough to check the first set of values after the context data, because the same indicators and units of measure are repeated for each property.
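
One way to review the assignments at a glance is to tally the distinct indicator and unit pairs in the output. The record layout below (dictionaries with "property", "indicator" and "unit" keys) and the unit strings are assumptions made for this sketch, not the service's actual output format.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarise_measures(records: List[Dict[str, str]]) -> Counter:
    """Count how often each (indicator, unit) pair occurs in the output."""
    pairs: List[Tuple[str, str]] = [(r["indicator"], r["unit"]) for r in records]
    return Counter(pairs)


records = [
    {"property": "nC17", "indicator": "Area", "unit": "mV.s"},
    {"property": "nC17", "indicator": "Height", "unit": "mV"},
    {"property": "nC18", "indicator": "Area", "unit": "mV.s"},
    {"property": "nC18", "indicator": "Height", "unit": "mV"},
]
summary = summarise_measures(records)
```

An unexpected pair in the tally, or one that occurs far less often than the others, is a good place to start looking for a mis-mapped heading.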

 

Unsupported files / sheets

You may come across files or individual sheets that are not yet supported by the service. Whilst we cannot guarantee that every GC file can be converted, due to the wide variety of structures seen, we would like to support as wide a range as we can (especially commonly occurring file structures). If you find any files that are not yet supported, please consider submitting them to IGI (anonymised if you wish) for investigation: dataservice@igiltd.com.


Known Issues

© 2024 Integrated Geochemical Interpretation Ltd. All rights reserved.
