Research code for ECG machine learning exploration.
Alexander William Wong e10d2d78c3
col headers are now unknowns
1 week ago
.vscode Reworked SQL parsing and PDF generation scripts 1 week ago
sierra-ecg-tools @ c56c30f8a2 Reworked SQL parsing and PDF generation scripts 1 week ago
.gitignore Reworked SQL parsing and PDF generation scripts 1 week ago
.gitmodules Reworked SQL parsing and PDF generation scripts 1 week ago
LICENSE Initial commit 1 month ago
README.md Reworked SQL parsing and PDF generation scripts 1 week ago
check_xml_with_diff.py Reworked SQL parsing and PDF generation scripts 1 week ago
check_xml_with_diff.py.out Reworked SQL parsing and PDF generation scripts 1 week ago
gen_ecg_pdfs.py Reworked SQL parsing and PDF generation scripts 1 week ago
gen_ecg_pdfs.py.out Reworked SQL parsing and PDF generation scripts 1 week ago
sql_decoder.py col headers are now unknowns 1 week ago
sql_decoder.py.out Reworked SQL parsing and PDF generation scripts 1 week ago

README.md

ecgml_research

Research code for ECG machine learning exploration. See Trace Master Vue ECG Management System Database And XML Schema Data Dictionary for Philips ECG XML and Database Schema definitions.

Out Folder

fragments
Raw decoded XML values from tblECGMain {headerInfo, userDefines, orderInfo, documentInfo, reportInfo, acquisitionInfo, patientInfo, interpretationInfo} and tblECGWaveforms {waveformInfo, measurementInfo}. Mismatches between column names and decoded values are logged in sql_decoder.py.out.
full
Decoded XML fragments, joined together according to the schema definition. Because column names do not always match the values they hold, the join relies on content structure rather than column names. Warnings are logged in sql_decoder.py.out.
full-invalid
Symlinks to full. These files contain data that does not match the reference columns in the tblECGMain database table (for example, tblECGMain says the RR interval should be 1000, but the XML file shows 847). Warnings are logged in check_xml_with_diff.py.out.
full-valid
Symlinks to full. These files contain data that matches the reference columns in the tblECGMain database table.
json
Remaining columns of the tblECGMain table that do not appear in the XML file. Used for the XML-to-table reference check. Also holds additional information (UUIDs) that is currently unused.
pdf
ECG printouts readable by technicians and cardiologists. Errors and warnings are logged in gen_ecg_pdfs.py.out.
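
The split between full-valid and full-invalid can be illustrated with a minimal sketch. This is not the code in check_xml_with_diff.py; it only assumes that each reference column maps to one XML element of the same name. The element name `rrint`, the helper names `local_name`, `find_mismatches` and `link_by_validity`, and the plain string comparison are all hypothetical choices made for illustration.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def find_mismatches(xml_text, reference):
    """Return {name: (expected, found)} for each reference value the XML contradicts.

    `found` is None when the element is absent from the XML.
    """
    found_values = {}
    for element in ET.fromstring(xml_text).iter():
        name = local_name(element.tag)
        # Keep only the first occurrence of each element of interest.
        if name in reference and name not in found_values:
            found_values[name] = (element.text or "").strip()

    mismatches = {}
    for name, expected in reference.items():
        found = found_values.get(name)
        # Plain string comparison (an assumption): "1000" and "1000.0" differ.
        if found != str(expected):
            mismatches[name] = (str(expected), found)
    return mismatches


def link_by_validity(xml_path, reference, out_root):
    """Symlink xml_path into full-valid/ or full-invalid/ under out_root."""
    with open(xml_path, encoding="utf-8") as handle:
        mismatches = find_mismatches(handle.read(), reference)
    bucket = "full-invalid" if mismatches else "full-valid"
    target_dir = os.path.join(out_root, bucket)
    os.makedirs(target_dir, exist_ok=True)
    link_path = os.path.join(target_dir, os.path.basename(xml_path))
    if not os.path.lexists(link_path):
        os.symlink(os.path.abspath(xml_path), link_path)
    return bucket, mismatches
```

With the RR interval example above, `find_mismatches(xml_text, {"rrint": 1000})` on a file whose `rrint` element holds 847 would report that one mismatch, and `link_by_validity` would place the symlink under full-invalid. Matching on local element names sidesteps XML namespaces, at the cost of ignoring where in the tree an element sits.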

Potential Research Questions

Fri, Jan 24, 2020

  • Who has afib diagnosed vs not diagnosed (+/-)
    • these labels are provided, diagnosed by physicians
  • Who has a stroke vs who does not have a stroke (+/-)
    • these labels are provided, diagnosed by physicians

Can we detect silent afib using historical ECG data from stroke patients? {afib: -, stroke: +}

  • Is stroke + and afib = silent afib? (maybe, not always)
  • Limit to some time between afib diagnosis and stroke
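
One way to read the cohort definition above is sketched below. It is only an interpretation: it treats a patient as a silent afib candidate when they had a stroke and either were never diagnosed with afib, or were diagnosed only after the stroke within some time window. The `Patient` record, the function name and the 365-day default window are all hypothetical; the notes do not fix any of them.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    """Hypothetical per-patient labels; None means 'never diagnosed'."""
    patient_id: str
    stroke_date: Optional[date]
    afib_diagnosis_date: Optional[date]


def is_silent_afib_candidate(patient, window=timedelta(days=365)):
    """Return True for {afib: -, stroke: +} at the time of the stroke.

    Assumption: an afib diagnosis made after the stroke still counts,
    as long as it falls within `window` of the stroke date.
    """
    if patient.stroke_date is None:
        return False  # no stroke, so outside the cohort
    if patient.afib_diagnosis_date is None:
        return True  # stroke without any afib diagnosis
    gap = patient.afib_diagnosis_date - patient.stroke_date
    # Afib already known before (or on the day of) the stroke is not silent.
    return timedelta(0) < gap <= window
```

For example, a patient with a stroke on 2019-06-01 and an afib diagnosis on 2019-09-01 would be a candidate under the default window, while one diagnosed on 2019-01-01 would not. As the notes say, stroke without diagnosed afib does not always mean silent afib, so this selects candidates rather than confirmed labels.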

  1. Predict afib given an ECG waveform (labels derived from the ECG machine itself; investigate how the ECG machine does this)

Wed, Feb 5, 2020

  • Output PDFs need to be ‘calibrated’ so that they are readable by ECG technicians and cardiologists
  • TODO: rework SQL extraction, verification, and PDF generation scripts
  • Investigate QRS score (manually generated feature)
  1. Can we train a model to predict sudden cardiac death probability in the ST-elevation myocardial infarction (STEMI) population?
  2. Can we train a model to predict stroke probability in the afib population?