Day 4: Infective Endocarditis microbiology

Day 4: With some of the Anatomy stuff sorted out, I started exploring the bugs, i.e. the bacteria, viruses, fungi etc. that cause Infective Endocarditis. The main culprits in IE are bacteria, with fungi causing the occasional infection.
Exploring IDO, I find that the only class for microorganisms is Viruses (NCBITaxon_10239), as a subclass of organism (OBI_0100026). A basic categorization of microorganisms is nowhere to be found. Is this an oversight or a deliberate design decision? For now, it looks like an oversight. Also of note: NCBI does not consider Viruses to be organisms!
I will reuse the NCBI classification (see image), limited to the three subclasses of ‘cellular organisms’: Archaea, Bacteria and Eukaryota.
NCBI-Taxonomy root

I think NCBI’s ‘cellular organisms’ would be equivalent to organism (OBI_0100026) in IDO. I created a preliminary list of the microorganisms involved in Infective Endocarditis (see list below) and then matched them with the corresponding NCBI terms (see the Inf. Endocarditis Google spreadsheet). The next step is to extract the hierarchy from each term up to these three categories.
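Extracting the hierarchy up to the three categories amounts to walking parent links from each matched taxon until ‘cellular organisms’ is reached. Here is a minimal Python sketch of that walk. The `PARENT` map is a hypothetical, hand-written fragment with the intermediate ranks left out (it is not real NCBI Taxonomy data), and the function names are illustrative only.

```python
# Hypothetical, hand-written lineage fragment for illustration only.
# Intermediate ranks are omitted; this is NOT data from an NCBI Taxonomy dump.
PARENT = {
    "Staphylococcus aureus": "Staphylococcus",
    "Staphylococcus": "Bacteria",
    "Candida albicans": "Candida",
    "Candida": "Fungi",
    "Fungi": "Eukaryota",
    "Bacteria": "cellular organisms",
    "Archaea": "cellular organisms",
    "Eukaryota": "cellular organisms",
    "Viruses": "root",  # viruses sit outside 'cellular organisms'
}

TOP = "cellular organisms"
CATEGORIES = {"Archaea", "Bacteria", "Eukaryota"}


def lineage(term, parent=PARENT):
    """Walk parent links from term up to 'cellular organisms' (inclusive)."""
    path = [term]
    while path[-1] != TOP:
        nxt = parent.get(path[-1])
        if nxt is None:
            raise KeyError(f"{path[-1]!r} is not under {TOP!r}")
        path.append(nxt)
    return path


def category(term, parent=PARENT):
    """Return which of the three cellular-organism categories term falls under."""
    for node in lineage(term, parent):
        if node in CATEGORIES:
            return node
    raise ValueError(f"{term!r} reached {TOP!r} without a category")


if __name__ == "__main__":
    print(lineage("Staphylococcus aureus"))
    print(category("Candida albicans"))
```

A taxon such as Viruses never reaches ‘cellular organisms’, so the walk raises an error instead of assigning it a category. That matches the decision to stay within the three cellular subclasses.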

Microbiology of Infective Endocarditis
[Haldar Saptarsi M, O’Gara Patrick T, “Chapter 85. Infective Endocarditis” (Chapter). Fuster V, O’Rourke RA, Walsh RA, Poole-Wilson P, Eds. King SB, Roberts R, Nash IS, Prystowsky EN, Assoc. Eds.: Hurst’s The Heart, 12e: http://www.accessmedicine.com/content.aspx?aID=3079425]
  • Streptococci
    • Viridans group streptococci
      • Streptococcus sanguis, Streptococcus bovis, Streptococcus mutans, Streptococcus mitior
    • Streptococcus intermedius
    • Streptococcus pneumoniae (or pneumococcus)
    • Nutritionally deficient streptococci, now known as Abiotrophia spp. and Granulicatella spp.
    • Group A
      • Streptococcus pyogenes
    • Group B
      • Streptococcus agalactiae
  • Enterococci (disease typically runs an indolent course)
    • E. faecalis (80 percent)
    • E. faecium (10 percent)
  • Staphylococci
    • Staphylococcus aureus (leading cause of IE worldwide)
    • Staphylococcus epidermidis
  • HACEK organisms
    • Haemophilus spp. – H. aphrophilus, H. parainfluenzae
    • Actinobacillus actinomycetemcomitans
    • Cardiobacterium hominis
    • Eikenella corrodens
    • Kingella kingae
  • Enterobacteriaceae (Escherichia coli, Klebsiella, Enterobacter, Serratia, etc.) are rare causes of IE.
  • Pseudomonas aeruginosa
  • Rickettsial organisms
    • Coxiella burnetii (Q fever)
  • Bartonella species (B. quintana, B. henselae, B. elizabethae)
  • Brucella spp.
  • Fungi – associated with exceedingly high mortality (survival rates of <20 percent)
    • Candida species
    • Aspergillus species
Noninfectious causes of endocarditis may behave like culture-negative IE; these include conditions related to:
  • neoplasia (nonbacterial thrombotic endocarditis)
  • autoimmune diseases (antiphospholipid antibody syndrome, systemic lupus erythematosus)
  • the postcardiac surgery state (thrombi, stitches)

Extracting the term hierarchies from the NCBI model
Last year, at ICBO 2009, Melanie Courtot presented MIREOT (Minimum Information to Reference an External Ontology Term), an approach for extracting fragments of one ontology for integration into another. It looked fairly straightforward. So I first identified the NCBI source terms corresponding to the microorganisms listed above and added them to my Inf. Endocarditis Google spreadsheet.
I found some categorization differences between the list above and NCBI; I will go with the NCBI term matches and let the hierarchy come from NCBI. I matched a total of 36 NCBITaxon terms. Now to extract the hierarchies …
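OntoFox’s low-level term box takes one URI per line. Suppose the matched taxa are kept as (taxon ID, label) pairs in the spreadsheet. A sketch like the one below could then turn them into OBO PURLs with a `#label` note, following the `NCBITaxon_131567 #cellular organisms` pattern used in the form. The helper names are hypothetical. 1280 is the NCBI Taxonomy ID for Staphylococcus aureus.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def ncbitaxon_uri(taxid):
    """Build the OBO PURL for an NCBI Taxonomy ID, e.g. 131567 -> .../NCBITaxon_131567."""
    return f"{OBO_BASE}NCBITaxon_{int(taxid)}"


def ontofox_lines(matches):
    """One 'URI #label' line per matched taxon, skipping repeated IDs."""
    seen = set()
    lines = []
    for taxid, label in matches:
        if taxid in seen:
            continue  # the same taxon may be matched from two list entries
        seen.add(taxid)
        lines.append(f"{ncbitaxon_uri(taxid)} #{label}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(ontofox_lines([(1280, "Staphylococcus aureus"), (131567, "cellular organisms")]))
```

Deduplicating matters because two entries in the clinical list can map to the same NCBI term.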

OntoFox is a browser-based implementation of MIREOT. I decided to go with option #1, data input using web forms, and filled it out as follows:
  1. Select one ontology:
    – I selected ‘NCBI organismal classification (NCBITaxon)’.
  2. Class term specification:
    Section A: Bottom up term specification
    (a) Include low level source term URIs (One URI per line):
    – entered the 36 NCBITaxon term URIs
    (b) Include top level source term URIs and target direct superclass URIs (One URI per line, optional):
    – selected NCBITaxon_131567 #cellular organisms for the top level term
    (c) Select a setting for retrieving intermediate source terms:
    – selected includeAllIntermediates
  3. Annotation Specification: Include source annotation URIs (One URI per line, optional):
    – added includeAllAxioms
  4. URI of the OWL(RDF/XML) output file:
    – I will go with the recommendation here and use http://purl.obolibrary.org/obo/vo/external/NCBITaxon_import.owl.
Then I clicked ‘Get OWL (RDF/XML) Output File’.
….. waiting
….. waiting
After several seconds (less than a minute), the Results page was displayed with links to the output and input files. I downloaded both files and opened the OWL file in Protege 4.1: it shows a Class count of 91 and an Individual count of 96.
Why ‘Individual’???
In the Classes tab, there are two subclasses under ‘Thing’: ‘cellular organisms’ and ‘Synonym’. ‘Synonym’ has a ton of members (400+):
NCBI-Taxon extracted file - Protege view showing genids

I guess these are all anonymous individuals corresponding to synonyms. The classes themselves do not carry the alternate labels (synonyms); instead they point to these ‘genid’ individuals, which are of absolutely no use. And of course, in the Individuals tab, all the classes show up as individuals too; these are OWL 2 NamedIndividual declarations. Again, I am not sure they are of any real use here. I found this thread: http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/strange-classes-named-genidxxx-appear-in-4-0-115-td2299948.html
Well, I’ll clean this up manually in a text editor …
– Removed all the NamedIndividuals that mirror the classes and saved the file. Protege detects the change and I ask it to update, but nothing visibly changes: the Individual count is the same and the Individuals tab still shows the same individuals as before. Hmm… not sure what’s going on here; is this a bug? After closing and reopening Protege, the Individual count is down to 10 and the Individuals tab is updated. In the Classes tab, Synonym no longer has any members, so I deleted it too.
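The manual clean-up can also be expressed as a filter over RDF triples. This sketch assumes the OWL file has already been parsed into (subject, predicate, object) string tuples, for example by an RDF library. It drops only the `rdf:type owl:NamedIndividual` declarations whose subject is also declared an `owl:Class`, and leaves genuine individuals alone. The function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_NAMED_INDIVIDUAL = "http://www.w3.org/2002/07/owl#NamedIndividual"


def drop_mirrored_individuals(triples):
    """Remove 'X rdf:type owl:NamedIndividual' when X is also declared an owl:Class."""
    classes = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_CLASS}
    return [
        (s, p, o)
        for s, p, o in triples
        if not (p == RDF_TYPE and o == OWL_NAMED_INDIVIDUAL and s in classes)
    ]
```

Keying on the owl:Class declaration, rather than deleting every NamedIndividual, keeps individuals that do not mirror a class. The 10 remaining individuals in Protege would be of that kind.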
Now, to change all the classes with genids – currently it looks like this:
Class definition - with genids

I want to change the ‘hasRelatedSynonym’ to ‘skos:altLabel’ annotations:
Class definition - with skos:altLabels
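The ‘hasRelatedSynonym’ to ‘skos:altLabel’ rewrite could be scripted roughly as below, assuming the OWL file has been parsed into (subject, predicate, object) string tuples. It also assumes that each genid synonym node carries its text as an rdfs:label. This is an assumption about the OntoFox output, not something verified here, so nodes without a label are left untouched rather than silently dropped. The function name is hypothetical.

```python
HAS_RELATED_SYNONYM = "http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"


def synonyms_to_alt_labels(triples):
    """Replace class -hasRelatedSynonym-> genid -rdfs:label-> text
    with class -skos:altLabel-> text, and drop the genid nodes."""
    labels = {}
    for s, p, o in triples:
        if p == RDFS_LABEL:
            labels.setdefault(s, []).append(o)
    # Only synonym nodes that actually carry a label are rewritten.
    syn_nodes = {o for s, p, o in triples if p == HAS_RELATED_SYNONYM and o in labels}
    out = []
    for s, p, o in triples:
        if p == HAS_RELATED_SYNONYM and o in syn_nodes:
            out.extend((s, SKOS_ALT_LABEL, text) for text in labels[o])
        elif s in syn_nodes:
            continue  # the genid node's own triples (label, type, ...)
        else:
            out.append((s, p, o))
    return out
```

The class keeps its own rdfs:label. Only the indirection through the genid node disappears.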

Posted in IDO, Infectious Diseases, Medical Informatics, Medicine, Microbiology, Ontology

Day 3: Organizing terms

Day 3: I started organizing the terms into categories, anatomy and microbiology to start with. I want to get some of the ‘basic’ information in first: anatomy, physiology, microbiology, etc. Not an exhaustive list, only a few terms; I think this will be a good starting point, and I will build on it later in areas such as pathology, phenotypes and procedures. The approach will be to analyze each term and break it into its component parts.

BioPortal is an excellent place for browsing biomedical ontologies. I explored some of these terms (mitral valve, aortic valve, Staphylococcus aureus, etc.) and think I have enough to form an initial plan of the candidate ontologies I can reuse:

  • FMA for anatomy terms
  • NCBI organismal classification for bacteria, viruses, etc.

Now, the challenge is going to be in organizing the IE ontology. Luckily, IDO (with OGMS and BFO) provides a framework to build on. Here goes …

I am exploring IDO for a place to anchor the anatomy terms. The only term I can find is ‘anatomical entity’ (CARO#CARO_0000000), a subClass of BFO’s ‘material entity’ (snap#MaterialEntity), an independent continuant. So far so good …

CARO's Anatomical entity

FMA has the following high level structure:

FMA's upper categories

Hmm… this will need some thinking!

CARO_0000000 (anatomical entity) := Biological entity that is either an individual member of a biological species or constitutes the structural organization of an individual member of a biological species.

fma:Anatomical_entity := Organismal continuant entity which is enclosed by the bona fide boundary of an organism or is an attribute of its structural organization. Examples: cell, heart, head, peritoneal cavity, apex of lung, anatomical term, sagittal plane.
– the scope of the FMA is described in its FAQ (#1):
“The FMA is a reference ontology for the domain of anatomy. It is a symbolic representation of the phenotypic structure of the human body.”

This means that fma:Anatomical_entity, being restricted to the human body, is a subClass of the much broader CARO ‘anatomical entity’ term. But if you look at the class hierarchies, it appears that fma:Material_anatomical_entity is the class most closely aligned to CARO_0000000 (anatomical entity), via a subClass relation in the snap:IndependentContinuant hierarchy.

What about the rest of FMA? fma:Immaterial_anatomical_entity and fma:Non-physical_anatomical_entity consist of things that depend on other things, i.e. they belong to the snap:DependentContinuant hierarchy.

So, for now, I will add fma:Anatomical_entity as a top-level class under snap:Continuant, keeping FMA’s structure intact, and add other subclass relations as needed within the BFO/IDO hierarchy.
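To see where this placement puts things, the planned subclass links can be written down and their ancestors computed. This is an illustrative sketch only. The edges restate the alignment described above, with the intermediate FMA classes between fma:Anatomical_entity and fma:Material_anatomical_entity omitted. They assume the CARO link runs from fma:Material_anatomical_entity up to CARO_0000000. Names are CURIE-style strings, not resolved URIs.

```python
# Planned subclass links (illustrative). Intermediate FMA classes are omitted,
# and the direction of the CARO link is an assumption.
SUBCLASS_OF = {
    "fma:Anatomical_entity": {"snap:Continuant"},
    "fma:Material_anatomical_entity": {"fma:Anatomical_entity", "CARO_0000000"},
    "CARO_0000000": {"snap:MaterialEntity"},
    "snap:MaterialEntity": {"snap:IndependentContinuant"},
    "snap:IndependentContinuant": {"snap:Continuant"},
}


def ancestors(cls, edges=SUBCLASS_OF):
    """All superclasses reachable from cls (transitive closure, cls excluded)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Under these edges, material anatomical entities reach snap:IndependentContinuant through CARO. The FMA root itself sits only under snap:Continuant, which keeps FMA’s own structure intact.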

Going forward, to manage this kind of organization, I want to try cross-tabulating the terms in the Inf. Endocarditis Google spreadsheet: clinical categories (anatomy, physiology, biochemistry, microbiology, pathology, …) on the X-axis and BFO categories (and subcategories) on the Y-axis.
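Here is a minimal sketch of the proposed cross-tabulation, assuming each term is tagged with one clinical category and one BFO category. The term placements are hypothetical examples, not decisions from the spreadsheet. The output is tab-separated so that it could be pasted into a spreadsheet.

```python
from collections import defaultdict

# Hypothetical placements for illustration; not decisions from the spreadsheet.
TERMS = {
    "mitral valve": ("anatomy", "snap:MaterialEntity"),
    "aortic valve": ("anatomy", "snap:MaterialEntity"),
    "Staphylococcus aureus": ("microbiology", "snap:MaterialEntity"),
    "echocardiography": ("procedures", "span:Process"),
}


def cross_tab(terms):
    """Group terms by (BFO category, clinical category)."""
    grid = defaultdict(list)
    for term, (clinical, bfo) in terms.items():
        grid[(bfo, clinical)].append(term)
    return {cell: sorted(names) for cell, names in grid.items()}


def to_tsv(grid):
    """Render the grid as tab-separated text: BFO rows (Y) by clinical columns (X)."""
    rows = sorted({bfo for bfo, _ in grid})
    cols = sorted({clinical for _, clinical in grid})
    lines = ["\t".join(["BFO \\ clinical"] + cols)]
    for bfo in rows:
        cells = [", ".join(grid.get((bfo, c), [])) for c in cols]
        lines.append("\t".join([bfo] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_tsv(cross_tab(TERMS)))
```

Empty cells in the output show clinical/BFO combinations that have no terms yet, which is one way to spot gaps.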

Posted in Anatomy, BFO, FMA, IDO, Infectious Diseases, Medical Informatics, Medicine, Microbiology, OGMS, Ontology

Day 2: Goals, frameworks, conventions

Day 2: After starting the list of terms, I realized that it was turning out to be a long one. I wanted to limit it to about 50 terms, since experience tells me that five times as many terms will end up being modeled in the ontology. To help me narrow the term set, I focused on the central theme for this ontology. Coming from a surgical background, I was looking at this more from a surgical treatment point of view. Not that I am not interested in the pathophysiology, but it will have to wait. For now, the focus will be on the signs and symptoms that bring the patient to the doctor, the diagnostic procedures, and the treatment aspects, especially surgical treatment.


Today, I created the new ontology (IDO-IE) in Protege 4.1 and imported the Infectious Disease Ontology (IDO) into it. IDO is a domain-level ontology that covers entities generally relevant to infectious diseases. IDO itself imports OGMS (Ontology for General Medical Science). I am using IDO as the general framework upon which I can develop this Infective Endocarditis extension.


So, the first thing I noticed is that IDO (and OGMS) use numeric identifiers for terms, with a namespace prefix, e.g. IDO_0000506 (for ‘bacteremia’). The full URI looks like this: http://purl.obolibrary.org/obo/IDO_0000506
  • This uses the generic address (http://purl.obolibrary.org/obo/) for all the OBO ontologies, and each term ID carries its ontology’s name as a prefix: IDO_0000506 (bacteremia) for IDO and OGMS_0000073 (diagnosis) for OGMS. Seems OK, but it causes huge problems with namespace declarations in Protege; the result is that the namespace cannot be rendered.
  • Both IDO and OGMS use numeric identifiers. Therefore, to view the terms meaningfully, Protege’s preferences setting of ‘render entities using annotation values’ needs to be selected.
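Because every OBO term shares the same base address, the ontology can only be recovered from the prefix of the local name. The sketch below shows that split, plus a label lookup that mimics rendering by annotation value. The function names and the small `LABELS` table (taken from the two examples above) are illustrative.

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"

# Labels for the two example terms mentioned above.
LABELS = {"IDO_0000506": "bacteremia", "OGMS_0000073": "diagnosis"}


def split_obo_purl(uri):
    """Split an OBO PURL into (ontology prefix, numeric ID)."""
    if not uri.startswith(OBO_BASE):
        raise ValueError(f"not an OBO PURL: {uri}")
    local = uri[len(OBO_BASE):]  # e.g. "IDO_0000506"
    prefix, sep, number = local.partition("_")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected local name: {local}")
    return prefix, number


def display_name(uri, labels=LABELS):
    """Return the label if one is known, else fall back to the bare local ID."""
    prefix, number = split_obo_purl(uri)
    local = f"{prefix}_{number}"
    return labels.get(local, local)
```

The fallback case is what Protege shows without the ‘render entities using annotation values’ setting: an opaque ID such as IDO_0000506.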
Now, I need to decide whether to use the same naming style as IDO or not!
Posted in Uncategorized

Day 1: Infective endocarditis – introduction, modeling approach …

Background
Infective endocarditis (IE) is an infection of the heart; to be more precise, of the endocardium of the heart, leading to valvular damage, heart failure and death if untreated. Bacterial infection is the most common cause and there is often a predisposing condition (most important being previous valve damage) that results in an altered blood flow. Intravenous drug abuse, prosthetic heart valves and other intravascular procedures are important risk factors.
IE can present acutely or subacutely with fever, a changing heart murmur, and symptoms and signs of septic emboli and immunological phenomena. Diagnosis often requires blood cultures and echocardiography. The Duke criteria (2 major and 6 minor criteria) are used to establish a diagnosis. Treatment involves high-dose IV antibiotics to eradicate the infection, along with management of the complications. Surgical treatment is required in many patients.
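The Duke clinical criteria combine as a simple counting rule: definite IE by clinical criteria requires 2 major criteria, or 1 major plus 3 minor, or 5 minor. The sketch below encodes only that rule. It does not model the pathological criteria or the ‘possible’ and ‘rejected’ categories, and the function name is hypothetical.

```python
def definite_ie_by_clinical_criteria(major, minor):
    """Duke clinical criteria for definite IE:
    2 major, or 1 major + 3 minor, or 5 minor."""
    if not (0 <= major <= 2 and 0 <= minor <= 6):
        raise ValueError("expected 0-2 major and 0-6 minor criteria")
    return major == 2 or (major == 1 and minor >= 3) or minor >= 5
```

Counting satisfied criteria, rather than naming them, is enough for this rule. An ontology model would still need each criterion as its own term.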

Ontological engineering approach
I plan to develop this OWL model using the BFO framework [http://www.ifomis.org/bfo] and the OBO Foundry principles [http://www.obofoundry.org/crit.shtml]. It will extend the Infectious Disease Ontology (IDO), which in turn imports OGMS and BFO. The naming conventions will follow the OBO Foundry recommendations [http://www.obofoundry.org/wiki/index.php/Naming] and [http://www.obofoundry.org/wiki/index.php/NewConventions].

Tools/Software
I will be using a bunch of freely available tools, including Google Docs and the Protege editor, to develop the OWL model.
– Google Spreadsheet for the list of terms, definitions, etc.; this will be updated over time.
– Protege 4.1 for ontology model development.
Posted in Infectious Diseases, Medical Informatics, Medicine, Ontology

Experiment 1: A lab notebook for developing an ontology for Infective Endocarditis

This is the start of an experiment in keeping a weblog of my efforts in developing an ontology for Infective Endocarditis as an extension of the Infectious Disease Ontology (IDO). I started developing this about 3 years ago but then put it on hold when I changed jobs last year. The break, however, has been good because a few things changed in this past year. An Ontology for General Medical Science (OGMS) based on the Basic Formal Ontology (BFO) was started last year and has made good progress. The Core IDO model has also evolved during this time and now uses OGMS as its general medical framework ontology.
The Infective Endocarditis (IE) extension of IDO (called IDO-IE) was originally developed directly using Basic Formal Ontology (BFO), and the model evolved, in hindsight, quite haphazardly. So now that a Core-IDO based on OGMS and BFO is available, I will once again begin from a clean slate and add the IE related terms. This blog will be my lab notebook where I will record my efforts in building an IDO-IE model.
Posted in Infectious Diseases, Medical Informatics, Medicine, Ontology