Day 4: With some of the Anatomy stuff sorted out, I started exploring the bugs, i.e. the bacteria, viruses, fungi, etc. that cause Infective Endocarditis (IE). The main culprits in IE are bacteria, with a few fungi causing occasional infections.
Exploring IDO, the only class I can find for microorganisms is Viruses (NCBITaxon_10239) – as a subclass of organism (OBI_0100026). A basic categorization of microorganisms is nowhere to be found. Is this an oversight or a deliberate design decision? For now, it looks like an oversight. Also to note, NCBI does not consider Viruses to be organisms!
I will reuse the NCBI classification (see image) limited to the three subclasses of ‘cellular organisms’ – Archaea, Bacteria and Eukaryota.
I think NCBI’s cellular organisms would be equivalent to organism (OBI_0100026) in IDO. I created a preliminary list of the microorganisms involved in Infective endocarditis (see list below) and then matched them with the corresponding NCBI term (see Inf. Endocarditis google spreadsheet). The next step is to extract the hierarchy up to the three categories.
Microbiology of Infective Endocarditis
[Haldar Saptarsi M, O’Gara Patrick T, “Chapter 85. Infective Endocarditis” (Chapter). Fuster V, O’Rourke RA, Walsh RA, Poole-Wilson P, Eds. King SB, Roberts R, Nash IS, Prystowsky EN, Assoc. Eds.: Hurst’s The Heart, 12e: http://www.accessmedicine.com/content.aspx?aID=3079425]
- Streptococcus viridans
- Streptococcus sanguis, Streptococcus bovis, Streptococcus mutans, Streptococcus mitior
- Streptococcus intermedius
- Streptococcus pneumoniae (or pneumococcus)
- Nutritionally deficient streptococci, now known as Abiotrophia spp. and Granulicatella spp.
- Group A
- Streptococcus pyogenes
- Group B
- Streptococcus agalactiae
- Streptococcus viridans
- Enterococci (disease typically runs an indolent course)
- E. faecalis (80 percent)
- E. faecium (10 percent)
- Staphylococcus aureus (leading cause of IE worldwide)
- Staphylococcus epidermidis
- HACEK organisms
- Haemophilus spp. – H. aphrophilus, H. parainfluenzae
- Actinobacillus actinomycetemcomitans
- Cardiobacterium hominis
- Eikenella corrodens
- Kingella kingae
- Enterobacteriaceae (Escherichia coli, Klebsiella, Enterobacter, Serratia, etc.) are rare causes of IE.
- Pseudomonas aeruginosa
- Rickettsial organism
- Coxiella burnetii (Q fever)
- Bartonella species (B. quintana, B. henselae, B. elizabethae)
- Brucella spp.
- Fungi – associated with exceedingly high mortality (survival rates of <20 percent)
- Candida and
- Aspergillus species
Noninfectious causes of endocarditis that may behave like culture-negative IE include those related to:
- neoplasia (nonbacterial thrombotic endocarditis),
- autoimmune diseases (antiphospholipid antibody syndrome, systemic lupus erythematosus),
- the postcardiac surgery state (thrombi, stitches)
Extracting the term hierarchies from NCBI model
Last year, at ICBO 2009, Melanie Courtot presented the MIREOT Tool for extracting fragments of an ontology for the purpose of integration into another ontology. It looked fairly straightforward. So for this, I first identified the source terms from NCBI corresponding to the microorganisms listed above and added them to my Inf. Endocarditis google spreadsheet.
There are some categorization differences between the list above and NCBI. I will go with the NCBI term matches and let the hierarchy come from NCBI. I matched a total of 36 NCBITaxon terms. Now to extract the hierarchies …
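Since the OntoFox form wants one term URI per line, the matched taxon IDs can be turned into that block with a few lines of Python. This is only a minimal sketch: the helper name `ncbitaxon_uri` is made up for illustration, it assumes the usual OBO PURL pattern for NCBITaxon terms, and it lists just four of the 36 matched organisms (IDs worth double-checking against the spreadsheet).

```python
# Hypothetical helper: turn NCBI Taxonomy IDs into NCBITaxon term URIs.
# Assumes the standard OBO PURL pattern for NCBITaxon classes.
OBO_BASE = "http://purl.obolibrary.org/obo/"

# A small illustrative subset of the matched organisms (not the full list of 36).
taxon_ids = {
    "Staphylococcus aureus": 1280,
    "Streptococcus pneumoniae": 1313,
    "Enterococcus faecalis": 1351,
    "Pseudomonas aeruginosa": 287,
}


def ncbitaxon_uri(taxon_id: int) -> str:
    """Build the term URI for a single NCBI Taxonomy ID."""
    return f"{OBO_BASE}NCBITaxon_{taxon_id}"


# One URI per line, ready to paste into the low level source term box.
uri_block = "\n".join(ncbitaxon_uri(t) for t in taxon_ids.values())
print(uri_block)
```

Keeping the IDs in a dictionary keyed by organism name makes it easy to compare the pasted block against the spreadsheet later.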
OntoFox is a browser-based implementation of the MIREOT tool. I decided to go with option #1: Data input using web forms, and filled it out as follows:
- Select one ontology:
– I selected ‘NCBI organismal classification (NCBITaxon)’.
- Class term specification:
Section A: Bottom up term specification
(a) Include low level source term URIs (One URI per line):
– entered the 36 NCBITaxon term URIs
(b) Include top level source term URIs and target direct superclass URIs (One URI per line, optional):
– selected NCBITaxon_131567 #cellular organisms for the top level term
(c) Select a setting for retrieving intermediate source terms:
– selected includeAllIntermediates
- Annotation Specification: Include source annotation URIs (One URI per line, optional):
– added includeAllAxioms
- URI of the OWL(RDF/XML) output file:
– I will go with the recommendation here and use http://purl.obolibrary.org/obo/vo/external/NCBITaxon_import.owl.
And go: I selected Get OWL (RDF/XML) Output File.
After several seconds (less than a minute), the Results page was displayed with links to the output and input files. I downloaded the two files and opened the OWL file in Protege 4.1, which shows a Class count of 91 and an Individual count of 96.
Why ‘Individual’ ???
In the Classes tab, there are 2 subclasses under ‘Thing’ – ‘cellular organisms’ and ‘Synonym’. ‘Synonym’ has a ton (~400+) of members:
I guess these are all anonymous individuals corresponding to ‘synonyms’. The classes themselves do not carry the alternate labels (or synonyms) and instead point to these ‘genid’ nodes, which are of absolutely no use. And of course, in the Individuals tab, all the classes are found as individuals; these are the NamedIndividuals from the OWL 2 specification. Again, I am not sure they are of any real use here. Found this thread: http://protege-ontology-editor-knowledge-acquisition-system.136.n4.nabble.com/strange-classes-named-genidxxx-appear-in-4-0-115-td2299948.html
Well, I will clean this up manually in a text editor ….
– Removed all the NamedIndividuals that mirror the classes and saved the file. Protege detected the change and I asked it to update. No change was visible in Protege: the Individual count was the same and the Individuals tab still showed the same individuals as before. Hmm… not sure what’s going on here. Is this a bug? I closed and reopened Protege, and now the Individual count is down to 10 and the Individuals tab is updated as well. In the Classes tab, Synonym no longer has any members. Deleting this one too.
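For anyone who would rather not do this by hand, the same cleanup could be scripted. The following is a minimal sketch, not what I actually ran: it assumes the RDF/XML file declares each mirrored individual as a top-level `owl:NamedIndividual` element with the same `rdf:about` IRI as an `owl:Class`, and the function name and sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
ABOUT = f"{{{RDF}}}about"  # qualified name of the rdf:about attribute


def strip_mirrored_individuals(root):
    """Remove top-level owl:NamedIndividual elements whose IRI is also an owl:Class.

    Returns the number of elements removed. Individuals with any other IRI are kept.
    """
    class_iris = {c.get(ABOUT) for c in root.findall(f"{{{OWL}}}Class")}
    class_iris.discard(None)  # ignore anonymous classes
    removed = 0
    for ind in list(root.findall(f"{{{OWL}}}NamedIndividual")):
        if ind.get(ABOUT) in class_iris:
            root.remove(ind)
            removed += 1
    return removed


# Tiny made-up document standing in for the downloaded OWL file (assumed layout).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1280"/>
  <owl:NamedIndividual rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1280"/>
  <owl:NamedIndividual rdf:about="http://example.org/keep-me"/>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
removed = strip_mirrored_individuals(root)
remaining = len(root.findall(f"{{{OWL}}}NamedIndividual"))
print(removed, remaining)
```

On a real file, `ET.parse(path)` followed by `tree.write(...)` would replace the inline sample. It is worth working on a copy, since ElementTree may rewrite namespace prefixes on output.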
Now, to change all the classes with genids – currently it looks like this:
I want to change the ‘hasRelatedSynonym’ to ‘skos:altLabel’ annotations:
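One way this conversion could be scripted is sketched below. It rests on assumptions about the file layout: that each class refers to its synonym through an `oboInOwl:hasRelatedSynonym` element carrying an `rdf:nodeID` (the ‘genid’), and that the matching `oboInOwl:Synonym` node holds the text in an `rdfs:label`. The function name, the sample document and the label "example synonym" are all made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def q(prefix, local):
    """Return the ElementTree qualified name for prefix:local."""
    return f"{{{NS[prefix]}}}{local}"


def synonyms_to_altlabels(root):
    """Replace genid-based hasRelatedSynonym links with skos:altLabel literals."""
    # Map each blank-node id (genid) to the label text of its Synonym node.
    labels = {}
    for syn in root.findall(q("oboInOwl", "Synonym")):
        node_id = syn.get(q("rdf", "nodeID"))
        label = syn.find(q("rdfs", "label"))
        if node_id and label is not None and label.text:
            labels[node_id] = label.text

    converted = 0
    for cls in root.findall(q("owl", "Class")):
        for ref in list(cls.findall(q("oboInOwl", "hasRelatedSynonym"))):
            text = labels.get(ref.get(q("rdf", "nodeID")))
            if text is None:
                continue  # unresolved reference: leave it untouched
            alt = ET.SubElement(cls, q("skos", "altLabel"))
            alt.text = text
            cls.remove(ref)
            converted += 1
    return converted


# Made-up sample mirroring the assumed structure of the extracted file.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/NCBITaxon_1280">
    <rdfs:label>Staphylococcus aureus</rdfs:label>
    <oboInOwl:hasRelatedSynonym rdf:nodeID="genid1"/>
  </owl:Class>
  <oboInOwl:Synonym rdf:nodeID="genid1">
    <rdfs:label>example synonym</rdfs:label>
  </oboInOwl:Synonym>
</rdf:RDF>"""

root = ET.fromstring(SAMPLE)
converted = synonyms_to_altlabels(root)
cls = root.find(q("owl", "Class"))
print(converted, cls.find(q("skos", "altLabel")).text)
```

The sketch leaves the now-unreferenced Synonym nodes in place. Removing them is a separate step, and any reference it cannot resolve is kept as-is rather than guessed at.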