Skip to content

Python code to help read ASEG GDF2 packages

License

Notifications You must be signed in to change notification settings

csiro-hydrogeology/aseg_gdf2

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

48 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

aseg_gdf2

Python code to help read ASEG GDF2 packages. See the ASEG technical standards page for more information about the file format.

Still very much a work in progress.

Usage

In [1]: import aseg_gdf2

In [2]: gdf = aseg_gdf2.read(r'tests/example_datasets/3bcfc711/GA1286_Waveforms')

In [3]: gdf.field_names()
Out[3]: ['FLTNUM', 'Rx_Voltage', 'Flight', 'Time', 'Tx_Current']

In [4]]: for row in gdf.iterrows():
   ...:     print(row)
   ...:
OrderedDict([('Index', 0), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0052), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 1), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0104), ('Tx_Current', 0.00176)])
OrderedDict([('Index', 2), ('FLTNUM', 1.0), ('Rx_Voltage', -0.0), ('Flight', 1), ('Time', 0.0156), ('Tx_Current', 0.00176)])
...

For .dat files that will fit in memory, you can read them into a pandas.DataFrame:

In [5]: gdf.df()
Out[5]:
       FLTNUM  Rx_Voltage  Flight     Time  Tx_Current
0         1.0        -0.0       1   0.0052     0.00176
1         1.0        -0.0       1   0.0104     0.00176
2         1.0        -0.0       1   0.0156     0.00176
3         1.0        -0.0       1   0.0208     0.00176
4         1.0        -0.0       1   0.0260     0.00176
5         1.0        -0.0       1   0.0312     0.00176
...       ...         ...     ...      ...         ...
23034     2.0         0.0       2  59.9687    -0.00170
23035     2.0        -0.0       2  59.9740    -0.00170
23036     2.0        -0.0       2  59.9792    -0.00170
23037     2.0        -0.0       2  59.9844    -0.00170
23038     2.0        -0.0       2  59.9896    -0.00170
23039     2.0        -0.0       2  59.9948    -0.00170

[23040 rows x 5 columns]

For .dat files that are too big for memory, you can use the chunksize= keyword argument to specify the number of rows. Normally you could get away with a few hundred thousand, but for the example we'll use something less:

In [6]: for chunk in gdf.df_chunked(chunksize=10000):
    ...:     print('{} length = {}'.format(type(chunk), len(chunk)))
    ...:
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 10000
<class 'pandas.core.frame.DataFrame'> length = 3040

The metadata from the .dfn file is there too:

In [7]: gdf.record_types
Out[7]:
{'': {'fields': [{'cols': 1,
    'comment': '',
    'format': 'F10.1',
    'long_name': 'FlightNumber',
    'name': 'FLTNUM',
    'null': None,
    'unit': '',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'F10.5',
    'long_name': 'Rx_Voltage',
    'name': 'Rx_Voltage',
    'null': '-99.99999',
    'unit': 'Volt',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'I6',
    'long_name': 'Flight',
    'name': 'Flight',
    'null': '-9999',
    'unit': '',
    'width': 6},
   {'cols': 1,
    'comment': '',
    'format': 'F10.4',
    'long_name': 'Time',
    'name': 'Time',
    'null': '-999.9999',
    'unit': 'msec',
    'width': 10},
   {'cols': 1,
    'comment': '',
    'format': 'F13.5',
    'long_name': 'Tx_Current',
    'name': 'Tx_Current',
    'null': '-99999.99999',
    'unit': 'Amp',
    'width': 13}],
  'format': None}}

Get the data just for one field/column:

In [8]: gdf.get_field('Time')
Out[8]:
array([  5.20000000e-03,   1.04000000e-02,   1.56000000e-02, ...,
         5.99844000e+01,   5.99896000e+01,   5.99948000e+01])

What about fields which are 2D arrays? Some GDF2 data files have fields with more than one value per row/record. e.g. in this one the last four fields each take up 30 columns:

In [9]: gdf = aseg_gdf2.read(r'tests/example_datasets/9a13704a/Mugrave_WB_MGA52.dfn')

In [10]: print(gdf.dfn_contents)
DEFN   ST=RECD,RT=COMM;RT:A4;COMMENTS:A76
DEFN 1 ST=RECD,RT=;GA_Project:I10:Geoscience Australia airborne survey project number
DEFN 2 ST=RECD,RT=;Job_No:I10:SkyTEM Australia Job Number
DEFN 3 ST=RECD,RT=;Fiducial:F15.2:Fiducial
DEFN 4 ST=RECD,RT=;DATETIME:F18.10:UNIT=days,Decimal days since midnight December 31st 1899
DEFN 5 ST=RECD,RT=;LINE:I10:Line number
DEFN 6 ST=RECD,RT=;Easting:F12.2:NULL=-9999999.99,UNIT=m,Easting (GDA94 MGA Zone 52)
DEFN 7 ST=RECD,RT=;NORTH:F15.2:NULL=-9999999999.99,UNIT=m,Northing (GDA 94 MGA Zone 52)
DEFN 8 ST=RECD,RT=;DTM_AHD:F10.2:NULL=-99999.99,Digital terrain model (AUSGeoid09 datum)
DEFN 9  ST=RECD,RT=;RESI1:F10.3:NULL=-9999.999,Residual of data
DEFN 10 ST=RECD,RT=;HEIGHT:F10.2:NULL=-99999.99,UNIT=m,Laser altimeter measured height of Tx loop centre above ground
DEFN 11 ST=RECD,RT=;INVHEI:F10.2:NULL=-99999.99,UNIT=m,Calulated inversion height
DEFN 12 ST=RECD,RT=;DOI:F10.2:NULL=-99999.99,UNIT=m,Calculated depth of investigation
DEFN 13 ST=RECD,RT=;Elev:30F12.2:NULL=-9999999.99,UNIT=m,Elevation to the top of each layer
DEFN 14 ST=RECD,RT=;Con:30F15.5:NULL=-9999999.99999,UNIT=mS/m,Inverted Conductivity for each layer
DEFN 15 ST=RECD,RT=;Con_doi:30F15.5:NULL=-9999999.99999,UNIT=mS/m, Inverted conductivity for each layer, masked to the depth of investigation
DEFN 16 ST=RECD,RT=;RUnc:30F12.3:NULL=-999999.999,Relative uncertainty of conductivity layer;END DEFN

You can see the field names in the normal manner:

In [11]: gdf.field_names()
Out[11]:
['GA_Project',
 'Job_No',
 'Fiducial',
 'DATETIME',
 'LINE',
 'Easting',
 'NORTH',
 'DTM_AHD',
 'RESI1',
 'HEIGHT',
 'INVHEI',
 'DOI',
 'Elev',
 'Con',
 'Con_doi',
 'RUnc']

Or you can see an "expanded" version of the fields, which is used for the column headings of the data table:

In [12]: gdf.column_names()
Out[12]:
['GA_Project', 'Job_No', 'Fiducial', 'DATETIME', 'LINE', 'Easting', 'NORTH', 'DTM_AHD', 'RESI1',
'HEIGHT', 'INVHEI', 'DOI', 'Elev[0]', 'Elev[1]', 'Elev[2]', 'Elev[3]', 'Elev[4]', 'Elev[5]',
'Elev[6]', 'Elev[7]', 'Elev[8]', 'Elev[9]', 'Elev[10]', 'Elev[11]', 'Elev[12]', 'Elev[13]',
'Elev[14]', 'Elev[15]', 'Elev[16]', 'Elev[17]', 'Elev[18]', 'Elev[19]', 'Elev[20]', 'Elev[21]',
'Elev[22]', 'Elev[23]', 'Elev[24]', 'Elev[25]', 'Elev[26]', 'Elev[27]', 'Elev[28]', 'Elev[29]',
'Con[0]', 'Con[1]', 'Con[2]', 'Con[3]', 'Con[4]', 'Con[5]', 'Con[6]', 'Con[7]', 'Con[8]', 'Con[9]',
'Con[10]', 'Con[11]', 'Con[12]', 'Con[13]', 'Con[14]', 'Con[15]', 'Con[16]', 'Con[17]', 'Con[18]',
'Con[19]', 'Con[20]', 'Con[21]', 'Con[22]', 'Con[23]', 'Con[24]', 'Con[25]', 'Con[26]', 'Con[27]',
'Con[28]', 'Con[29]', 'Con_doi[0]', 'Con_doi[1]', 'Con_doi[2]', 'Con_doi[3]', 'Con_doi[4]',
'Con_doi[5]', 'Con_doi[6]', 'Con_doi[7]', 'Con_doi[8]', 'Con_doi[9]', 'Con_doi[10]', 'Con_doi[11]',
'Con_doi[12]', 'Con_doi[13]', 'Con_doi[14]', 'Con_doi[15]', 'Con_doi[16]', 'Con_doi[17]',
'Con_doi[18]', 'Con_doi[19]', 'Con_doi[20]', 'Con_doi[21]', 'Con_doi[22]', 'Con_doi[23]',
'Con_doi[24]', 'Con_doi[25]', 'Con_doi[26]', 'Con_doi[27]', 'Con_doi[28]', 'Con_doi[29]', 'RUnc[0]',
'RUnc[1]', 'RUnc[2]', 'RUnc[3]', 'RUnc[4]', 'RUnc[5]', 'RUnc[6]', 'RUnc[7]', 'RUnc[8]', 'RUnc[9]',
'RUnc[10]', 'RUnc[11]', 'RUnc[12]', 'RUnc[13]', 'RUnc[14]', 'RUnc[15]', 'RUnc[16]', 'RUnc[17]',
'RUnc[18]', 'RUnc[19]', 'RUnc[20]', 'RUnc[21]', 'RUnc[22]', 'RUnc[23]', 'RUnc[24]', 'RUnc[25]',
'RUnc[26]', 'RUnc[27]', 'RUnc[28]', 'RUnc[29]']

In [13]: gdf.df().head()
Out[13]:
  GA_Project  Job_No   Fiducial      DATETIME    LINE   Easting      NORTH  \
0        1288   10013  3621109.0  42655.910984  112601  948001.6  7035223.1
1        1288   10013  3621110.0  42655.910995  112601  948001.9  7035196.8
2        1288   10013  3621111.0  42655.911007  112601  948001.5  7035169.5
3        1288   10013  3621112.0  42655.911019  112601  948000.6  7035141.6
4        1288   10013  3621113.0  42655.911030  112601  947999.1  7035113.6

  DTM_AHD  RESI1  HEIGHT    ...     RUnc[20]  RUnc[21]  RUnc[22]  RUnc[23]  \
0    354.1  1.091   40.98    ...         1.39      1.76      2.35      3.26
1    353.8  1.101   41.08    ...         1.43      1.84      2.47      3.41
2    353.7  0.813   41.03    ...         1.45      1.88      2.53      3.48
3    353.9  0.567   40.79    ...         1.45      1.87      2.53      3.49
4    354.2  0.522   40.37    ...         1.45      1.88      2.54      3.52

  RUnc[24]  RUnc[25]  RUnc[26]  RUnc[27]  RUnc[28]  RUnc[29]
0      4.45      5.74      6.94      8.00      8.99      98.0
1      4.62      5.90      7.09      8.15      9.15      98.0
2      4.70      5.97      7.16      8.22      9.21      98.0
3      4.71      5.98      7.16      8.21      9.20      98.0
4      4.74      6.01      7.18      8.23      9.22      98.0

[5 rows x 132 columns]

You can retrieve one of the original field arrays using get_field():

In [14]: gdf.get_field('Elev')
Out[14]:
array([[ 354.1,  352.1,  349.8, ..., -105.8, -171.2, -245.7],
       [ 353.8,  351.8,  349.5, ..., -106.1, -171.5, -246. ],
       [ 353.7,  351.7,  349.4, ..., -106.2, -171.6, -246.1],
       ...,
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.5,  508.5,  506.2, ...,   50.6,  -14.8,  -89.3],
       [ 510.6,  508.6,  506.3, ...,   50.7,  -14.7,  -89.2]])

Or one of the columns:

In [15]: gdf.get_field('Elev[0]')
Out[15]:
array([ 354.1,  353.8,  353.7,  353.9,  354.2,  354.5,  354.6,  354.7,
        354.6,  354.5,  354.3,  354.1,  353.9,  353.8,  353.9,  354. ,
        512.8,  512.6,  512.4,  512.3,  512.3,  512.5,  512.7,  512.9,
        512.9,  512.8,  512.6,  512.4,  512. ,  511.7,  511.4,  511.2,
        511. ,  510.6,  510.5,  510.5,  510.5,  510.6])

You can also retrieve a subset of fields and column names as a pandas.DataFrame using the usecols keyword argument -- you don't necessarily need to retrieve the whole file at once. Note that the multidimensional 'Con' field is expanded into the column names:

In [16]: gdf.df(usecols=['Easting', 'NORTH', 'Con']).head()
Out[16]:
    Easting      NORTH    Con[0]    Con[1]    Con[2]     Con[3]     Con[4]  \
0  948001.6  7035223.1  28.76870  31.88776  46.04052   83.68201  157.48031
1  948001.9  7035196.8  31.06555  35.47357  51.17707   92.08103  165.37126
2  948001.5  7035169.5  38.18251  42.48088  59.91612  103.59474  174.79462
3  948000.6  7035141.6  47.61905  51.84033  70.17544  114.31184  178.79492
4  947999.1  7035113.6  58.58231  61.12469  77.45933  118.04982  173.64126

      Con[5]     Con[6]     Con[7]    ...        Con[20]    Con[21]  \
0  231.53508  242.01355  198.84669    ...      108.63661  145.39110
1  235.73786  237.41690  190.65777    ...      108.95620  144.84357
2  232.07241  225.88660  178.98693    ...      110.29006  146.09204
3  219.92523  212.44954  170.64846    ...      112.81588  148.58841
4  209.51184  204.49898  168.60563    ...      114.48197  150.03751

     Con[22]    Con[23]    Con[24]    Con[25]    Con[26]    Con[27]  \
0  181.29079  191.60759  178.44397  162.31131  152.43902  148.38997
1  179.79144  190.36741  177.99929  162.33766  152.55530  148.47810
2  179.88847  189.35808  177.11654  161.89089  152.46227  148.54427
3  180.31013  187.68769  175.37706  161.00467  152.23017  148.65468
4  180.21265  186.35855  174.27675  160.56519  152.23017  148.83167

     Con[28]    Con[29]
0  147.49263  147.42739
1  147.53615  147.47087
2  147.66686  147.62327
3  147.88524  147.86337
4  148.10427  148.08233

[5 rows x 32 columns]

About

Python code to help read ASEG GDF2 packages

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Jupyter Notebook 76.4%
  • Python 23.6%