
HBS Research Assistant project to convert XML formatted files to CSV formats.


jackyan540/convert-xml-to-csv


convert-xml-to-csv

HBS Research Associate project to convert XML-formatted files to CSV format. Project completed for Professor Jonathan Wallen.

The source data covers Jan. 1, 2017 through Mar. 14, 2022 at daily frequency, amounting to 1,900 individual XML files. The objective is to convert these XML files into easier-to-use CSV files without any data loss.

The data is proprietary and cannot be shared in this repository. Details about the data are described in the "Data Specifications" section.

Data Specifications

XML file names are formatted as: WSH_DAILY_SNAPSHOT_ED_V03_YYYYMMDD.xml (e.g. for Jan. 1, 2022: WSH_DAILY_SNAPSHOT_ED_V03_20220101.xml). Some XML file name extensions are capitalized (i.e. .XML rather than .xml). CSV file names have the same format: WSH_DAILY_SNAPSHOT_ED_V03_YYYYMMDD.csv.
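
The naming convention can be checked mechanically. The following is a minimal sketch, not the project's actual code: the function name `parse_snapshot_name` and the regular expression are illustrative assumptions, built from the eight-digit date in the example file name and the note that some extensions are upper case.

```python
import re
from datetime import date, datetime

# Assumed pattern: fixed prefix, 8-digit date, and an extension that may be
# .xml or .XML (only the extension is matched case-insensitively).
_NAME_RE = re.compile(r"^(WSH_DAILY_SNAPSHOT_ED_V03_(\d{8}))\.(?i:xml)$")


def parse_snapshot_name(xml_name: str) -> tuple[date, str]:
    """Return (snapshot date, CSV file name) for a snapshot XML file name."""
    match = _NAME_RE.match(xml_name)
    if match is None:
        raise ValueError(f"unexpected file name: {xml_name!r}")
    stem, stamp = match.group(1), match.group(2)
    # strptime raises ValueError if the digits are not a real calendar date.
    snapshot_date = datetime.strptime(stamp, "%Y%m%d").date()
    return snapshot_date, f"{stem}.csv"
```

Rejecting names that do not match, rather than skipping them, makes a misnamed file visible instead of leaving it silently unconverted.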

Data is organized by year and file format. For example, for an XML file dated Jan. 1, 2022, the file path is 2022\xml\WSH_DAILY_SNAPSHOT_ED_V03_20220101.xml, and the file path for the corresponding CSV file is 2022\csv\WSH_DAILY_SNAPSHOT_ED_V03_20220101.csv.
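
As an illustration of this layout, the CSV path can be derived from the XML path. This is a sketch under the assumption that every XML file sits directly inside a `<year>\xml` folder; the helper name `csv_path_for` is hypothetical and not taken from the repository.

```python
from pathlib import PurePath, PureWindowsPath


def csv_path_for(xml_path: PurePath) -> PurePath:
    """Map <year>/xml/<name>.xml (any extension case) to <year>/csv/<name>.csv."""
    if xml_path.parent.name.lower() != "xml" or xml_path.suffix.lower() != ".xml":
        raise ValueError(f"not a snapshot XML path: {xml_path}")
    year_dir = xml_path.parent.parent
    return year_dir / "csv" / f"{xml_path.stem}.csv"


# PureWindowsPath is used here only so the backslash-separated example parses
# the same way on any operating system.
example = PureWindowsPath(r"2022\xml\WSH_DAILY_SNAPSHOT_ED_V03_20220101.xml")
print(csv_path_for(example))
```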

XML files are formatted as:
<WallStreetHorizon>
<Source FileName="WSH_EARNINGS_V03" CreateTime="MM/DD/YYYY hh:mm:ss AM">
    <earnings>
        <event_id> </event_id>
        <company_id> </company_id>
        <stock_symbol> </stock_symbol>
        <company_name> </company_name>
        <stock_exchange> </stock_exchange>
        <isin> </isin>
        <earnings_date> </earnings_date>
        <quarter> </quarter>
        <fiscal_year> </fiscal_year>
        <earnings_date_status> </earnings_date_status>
        <time_of_day> </time_of_day>
        <quarter_end_date> </quarter_end_date>
        <audit_source> </audit_source>
        <disclaimer> </disclaimer>
    </earnings>
</Source>
</WallStreetHorizon>
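
Given that structure, one way to flatten a file is to emit one CSV row per earnings element, with one column per child element. The sketch below uses only the Python standard library and is not necessarily how this project does it. The names `COLUMNS`, `earnings_rows`, and `write_csv` are illustrative. Two choices are assumptions: surrounding whitespace in element text is stripped, and the Source attributes (FileName, CreateTime) are not carried into the CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column order follows the child elements of <earnings> listed above.
COLUMNS = [
    "event_id", "company_id", "stock_symbol", "company_name",
    "stock_exchange", "isin", "earnings_date", "quarter", "fiscal_year",
    "earnings_date_status", "time_of_day", "quarter_end_date",
    "audit_source", "disclaimer",
]


def earnings_rows(xml_source):
    """Yield one dict per <earnings> element; xml_source is a path or file object."""
    root = ET.parse(xml_source).getroot()
    for record in root.iter("earnings"):
        row = {}
        for child in record:
            if child.tag not in COLUMNS:
                # Fail loudly rather than silently dropping an unknown field.
                raise ValueError(f"unexpected element <{child.tag}>")
            # Assumption: leading/trailing whitespace is not meaningful data.
            row[child.tag] = (child.text or "").strip()
        yield row


def write_csv(xml_source, csv_file) -> int:
    """Write all records to an open text file and return the row count."""
    # restval fills columns whose element is absent from a record.
    writer = csv.DictWriter(csv_file, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    count = 0
    for row in earnings_rows(xml_source):
        writer.writerow(row)
        count += 1
    return count
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends. Comparing the returned row count with the number of earnings elements in the source is a simple check against dropped records.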
