{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Example notebook showcasing the use of the PseudomonasDotComScraper as a programmatic interface to the pseudomonas.com database" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents \n", "[Load required python modules](#Load-required-python-modules) \n", "[Setting things up](#Setting-things-up) \n", "[Retrieve the data](#Retrieve-the-data) \n", "[Display the data](#Display-the-data) \n", "[List what data is in the results](#List-all-keys-in-the-results-dict) \n", "[Get the data for one queried gene](#Get-the-data-for-one-queried-gene) \n", "[Display one table](#Display-one-table) \n", "[Display a given table for all three genes](#Display-a-given-table-for-all-three-genes) \n", "[Select all rows with a given value in one column](#Select-all-rows-with-a-given-value-in-one-column) \n", "[Save to disk](#Save-to-disk) \n", "[Read from disk](#Read-results-from-disk) \n", "[Example with references](#Example-for-a-query-with-references) \n", "[Display references](#Display-references-with-proper-html-links)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load required python modules " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# The scraper\n", "from GenDBScraper.PseudomonasDotComScraper import PseudomonasDotComScraper as scraper\n", "\n", "# The query object (derived from collections.namedtuple)\n", "from GenDBScraper.PseudomonasDotComScraper import pdc_query\n", "\n", "# Regular expressions\n", "import re\n", "\n", "# pandas DataFrame, the workhorse data structure\n", "import pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting things up " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# We want to get data for three adjacent genes: pflu0915, pflu0916, and pflu0917\n", "queries = [pdc_query(strain='sbw25', feature=feat) for feat in 
['pflu0915', 'pflu0916', 'pflu0917']]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Set up the scraper. Note: this rebinds the name `scraper` from the imported class to the instance.\n", "scraper = scraper(query=queries)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "INFO: Good response from https://www.pseudomonas.com .\n" ] } ], "source": [ "# Connect to the database\n", "scraper.connect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Retrieve the data" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887 .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887&view=functions .\n", "WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.\n", "DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889 .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889&view=functions .\n", "WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. 
Will return empty pandas.DataFrame.\n", "DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891 .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891&view=functions .\n", "WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.\n" ] } ], "source": [ "results = scraper.run_query()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Display the data" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'sbw25__pflu0915': {'Gene Feature Overview': 1\n", " 0 \n", " Strain Pseudomonas fluorescens SBW25\n", " Locus Tag PFLU0915\n", " Name NaN\n", " Replicon chromosome\n", " Genomic location 1015020 - 1015640 (+ strand),\n", " 'Cross-References': 1\n", " 0 \n", " RefSeq YP_002870577.1\n", " GI 229588458\n", " Entrez 7816630,\n", " 'Product': 1\n", " 0 \n", " Feature Type CDS\n", " Coding Frame 1\n", " Product\\tName hypothetical protein\n", " Synonyms NaN\n", " Evidence for Translation NaN\n", " Charge (pH 7) 2.14\n", " Kyte-Doolittle Hydrophobicity Value -0.358\n", " Molecular Weight (kDa) 23.2\n", " Isoelectric Point (pI) 8.48,\n", " 'Subcellular localization': Confidence \\\n", " Localization \n", " Unknown Class 3 \n", " Individual Mappings Localization Confidence PMID Unknown Class 3 ... 
\n", " Additional evidence for subcellular localization NaN \n", " \n", " PMID \n", " Localization \n", " Unknown 20472543.0 \n", " Individual Mappings NaN \n", " Additional evidence for subcellular localization NaN ,\n", " 'Pathogen Association Analysis': 1\n", " 0 \n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " 0 \n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG004285 (549 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " 0 \n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Empty DataFrame\n", " Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]\n", " Index: [],\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession Description \\\n", " 0 Gene3D G3DSA:3.90.1680.10 NaN \n", " 1 SUPERFAMILY SSF143081 NaN \n", " 2 Pfam PF02586 SOS response associated peptidase (SRAP) \n", " \n", " Interpro Accession Interpro Description \\\n", " 0 IPR036590 SOS response associated peptidase-like \n", " 1 IPR036590 SOS response associated peptidase-like \n", " 2 IPR003738 SOS response associated peptidase (SRAP) \n", " \n", " Amino Acid Start Amino Acid Stop E-value \n", " 0 1 206 3.400000e-46 \n", " 1 2 205 1.090000e-47 \n", " 2 1 192 1.600000e-38 },\n", " 'sbw25__pflu0916': {'Gene Feature Overview': 1\n", " 0 \n", " Strain Pseudomonas fluorescens SBW25\n", " Locus Tag PFLU0916\n", " Name NaN\n", " Replicon chromosome\n", " Genomic location 1015719 - 1017857 (- strand),\n", " 'Cross-References': 1\n", " 0 \n", " RefSeq YP_002870578.1\n", " GI 229588459\n", " Entrez 7816631\n", " INSDC CAY47182.1\n", " UniParc UPI00019D9DE0\n", " 
UniProtKB Acc C3KBK5\n", " UniProtKB ID C3KBK5_PSEFS\n", " UniRef100 UniRef100_C3KBK5\n", " UniRef50 UniRef50_Q4K6C9\n", " UniRef90 UniRef90_C3KBK5,\n", " 'Product': 1\n", " 0 \n", " Feature Type CDS\n", " Coding Frame 1\n", " Product\\tName putative methyl-accepting chemotaxis protein\n", " Synonyms NaN\n", " Evidence for Translation NaN\n", " Charge (pH 7) -20.37\n", " Kyte-Doolittle Hydrophobicity Value -0.035\n", " Molecular Weight (kDa) 76.4\n", " Isoelectric Point (pI) 4.78,\n", " 'Subcellular localization': Confidence \\\n", " Localization \n", " Cytoplasmic Membrane Class 3 \n", " Individual Mappings Localization Confidence PMID Cytoplasmic Memb... \n", " Additional evidence for subcellular localization NaN \n", " \n", " PMID \n", " Localization \n", " Cytoplasmic Membrane 20472543.0 \n", " Individual Mappings NaN \n", " Additional evidence for subcellular localization NaN ,\n", " 'Pathogen Association Analysis': 1\n", " 0 \n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " 0 \n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG002657 (1824 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " 0 \n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Ontology Accession Term \\\n", " 0 Biological Process GO:0007165 signal transduction \n", " 1 Cellular Component GO:0016021 integral component of membrane \n", " 2 Cellular Component GO:0016020 membrane \n", " \n", " GO Evidence \\\n", " 0 ISM Inferred from Sequence Model Term mapped ... \n", " 1 ISM Inferred from Sequence Model Term mapped ... \n", " 2 ISM Inferred from Sequence Model Term mapped ... \n", " \n", " Evidence Ontology (ECO) Code Reference Comments \n", " 0 ECO:0000259 match to InterPro signature eviden... 
NaN NaN \n", " 1 ECO:0000259 match to InterPro signature eviden... NaN NaN \n", " 2 ECO:0000259 match to InterPro signature eviden... NaN NaN ,\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession \\\n", " 0 CDD cd06225 \n", " 1 Gene3D G3DSA:1.10.287.950 \n", " 2 Gene3D G3DSA:3.30.450.20 \n", " 3 Pfam PF00672 \n", " 4 SUPERFAMILY SSF58104 \n", " 5 Gene3D G3DSA:3.30.450.20 \n", " 6 Pfam PF02743 \n", " 7 Coils Coil \n", " 8 Gene3D G3DSA:3.30.450.20 \n", " 9 Pfam PF00015 \n", " 10 ProSiteProfiles PS50111 \n", " 11 SMART SM00304 \n", " 12 CDD cd11386 \n", " 13 ProSiteProfiles PS50885 \n", " 14 SMART SM00283 \n", " \n", " Description Interpro Accession \\\n", " 0 HAMP IPR003660 \n", " 1 NaN NaN \n", " 2 NaN NaN \n", " 3 HAMP domain IPR003660 \n", " 4 NaN NaN \n", " 5 NaN NaN \n", " 6 Cache domain IPR033479 \n", " 7 NaN NaN \n", " 8 NaN NaN \n", " 9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089 \n", " 10 Bacterial chemotaxis sensory transducers domai... IPR004089 \n", " 11 NaN IPR003660 \n", " 12 MCP_signal NaN \n", " 13 HAMP domain profile. IPR003660 \n", " 14 NaN IPR004089 \n", " \n", " Interpro Description Amino Acid Start \\\n", " 0 HAMP domain 383 \n", " 1 NaN 381 \n", " 2 NaN 64 \n", " 3 HAMP domain 383 \n", " 4 NaN 403 \n", " 5 NaN 238 \n", " 6 Double Cache domain 1 47 \n", " 7 NaN 584 \n", " 8 NaN 73 \n", " 9 Methyl-accepting chemotaxis protein (MCP) sign... 495 \n", " 10 Methyl-accepting chemotaxis protein (MCP) sign... 440 \n", " 11 HAMP domain 381 \n", " 12 NaN 477 \n", " 13 HAMP domain 381 \n", " 14 Methyl-accepting chemotaxis protein (MCP) sign... 
450 \n", " \n", " Amino Acid Stop E-value \n", " 0 431 8.02722E-7 \n", " 1 712 1.2E-78 \n", " 2 72 2.7E-40 \n", " 3 431 1.3E-8 \n", " 4 712 8.89E-82 \n", " 5 341 2.7E-40 \n", " 6 331 1.2E-16 \n", " 7 604 - \n", " 8 237 2.7E-40 \n", " 9 678 2.6E-45 \n", " 10 676 48.239 \n", " 11 435 3.0E-11 \n", " 12 672 3.08124E-57 \n", " 13 435 10.351 \n", " 14 711 8.7E-86 },\n", " 'sbw25__pflu0917': {'Gene Feature Overview': 1\n", " 0 \n", " Strain Pseudomonas fluorescens SBW25\n", " Locus Tag PFLU0917\n", " Name NaN\n", " Replicon chromosome\n", " Genomic location 1018094 - 1018918 (+ strand),\n", " 'Cross-References': 1\n", " 0 \n", " RefSeq YP_002870579.1\n", " GI 229588460\n", " Entrez 7816632,\n", " 'Product': 1\n", " 0 \n", " Feature Type CDS\n", " Coding Frame 1\n", " Product\\tName putative exported peptidase\n", " Synonyms NaN\n", " Evidence for Translation NaN\n", " Charge (pH 7) 1.89\n", " Kyte-Doolittle Hydrophobicity Value -0.234\n", " Molecular Weight (kDa) 29338.2\n", " Isoelectric Point (pI) 8.19,\n", " 'Subcellular localization': Confidence \\\n", " Localization \n", " Unknown Class 3 \n", " Individual Mappings Localization Confidence PMID Unknown Class 3 ... 
\n", " Additional evidence for subcellular localization NaN \n", " \n", " PMID \n", " Localization \n", " Unknown 20472543.0 \n", " Individual Mappings NaN \n", " Additional evidence for subcellular localization NaN ,\n", " 'Pathogen Association Analysis': 1\n", " 0 \n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " 0 \n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG004284 (550 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " 0 \n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Ontology Accession Term \\\n", " 0 Molecular Function GO:0004222 metalloendopeptidase activity \n", " 1 Biological Process GO:0006508 proteolysis \n", " \n", " GO Evidence \\\n", " 0 ISM Inferred from Sequence Model Term mapped ... \n", " 1 ISM Inferred from Sequence Model Term mapped ... \n", " \n", " Evidence Ontology (ECO) Code Reference Comments \n", " 0 ECO:0000259 match to InterPro signature eviden... NaN NaN \n", " 1 ECO:0000259 match to InterPro signature eviden... NaN NaN ,\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession \\\n", " 0 Pfam PF01435 \n", " 1 ProSiteProfiles PS51257 \n", " 2 CDD cd07331 \n", " \n", " Description Interpro Accession \\\n", " 0 Peptidase family M48 IPR001915 \n", " 1 Prokaryotic membrane lipoprotein lipid attachm... 
NaN \n", " 2 M48C_Oma1_like NaN \n", " \n", " Interpro Description Amino Acid Start Amino Acid Stop E-value \n", " 0 Peptidase M48 75 259 9.300000e-35 \n", " 1 NaN 1 21 6.000000e+00 \n", " 2 NaN 80 264 6.944380e-83 }}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results object is a two-level nested dictionary. The outer keys have the form `<strain>__<feature>` (note the double underscore): \n",
"\n",
"    results \n",
"    | \n",
"    +---sbw25__pflu0915 \n",
"    |   | \n",
"    |   +---Gene Feature Overview \n",
"    |   +---Cross-References \n",
"    |   +---Orthologs/Comparative Genomics \n",
"    |   . \n",
"    |   . \n",
"    |   . \n",
"    +---sbw25__pflu0916 \n",
"    |   | \n",
"    |   +---Gene Feature Overview \n",
"    |   +---Cross-References \n",
"    |   +---Orthologs/Comparative Genomics \n",
"    |   . \n",
"    |   . \n",
"    |   . \n",
"    +---sbw25__pflu0917 \n",
"        | \n",
"        +---Gene Feature Overview \n",
"        +---Cross-References \n",
"        +---Orthologs/Comparative Genomics \n",
"        . \n",
"        . \n",
"        . \n",
"\n",
"The entries at the lowest level (\"Gene Feature Overview\", \"Cross-References\", \n", "etc.) are the data tables downloaded from pseudomonas.com. They are \n", "instances of the pandas.DataFrame class, a versatile data \n", "structure that supports many dataset operations such as slicing, \n", "selection based on values and ranges, and much more. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### List all keys in the results dict" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['sbw25__pflu0915', 'sbw25__pflu0916', 'sbw25__pflu0917']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[k for k in results.keys()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get the data for one queried gene" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "pflu0915_data = results['sbw25__pflu0915']" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "['Gene Feature Overview',\n", " 'Cross-References',\n", " 'Product',\n", " 'Subcellular localization',\n", " 'Pathogen Association Analysis',\n", " 'Orthologs/Comparative Genomics',\n", " 'Interactions',\n", " 'References',\n", " 'Gene Ontology',\n", " 'Functional Classifications Manually Assigned by PseudoCAP',\n", " 'Functional Predictions from Interpro']" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# List all keys in the first gene.\n", "[k for k in pflu0915_data]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display one table " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Analysis</th><th>Accession</th><th>Description</th><th>Interpro Accession</th><th>Interpro Description</th><th>Amino Acid Start</th><th>Amino Acid Stop</th><th>E-value</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Gene3D</td><td>G3DSA:3.90.1680.10</td><td>NaN</td><td>IPR036590</td><td>SOS response associated peptidase-like</td><td>1</td><td>206</td><td>3.400000e-46</td></tr>\n",
"<tr><th>1</th><td>SUPERFAMILY</td><td>SSF143081</td><td>NaN</td><td>IPR036590</td><td>SOS response associated peptidase-like</td><td>2</td><td>205</td><td>1.090000e-47</td></tr>\n",
"<tr><th>2</th><td>Pfam</td><td>PF02586</td><td>SOS response associated peptidase (SRAP)</td><td>IPR003738</td><td>SOS response associated peptidase (SRAP)</td><td>1</td><td>192</td><td>1.600000e-38</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Analysis Accession Description \\\n", "0 Gene3D G3DSA:3.90.1680.10 NaN \n", "1 SUPERFAMILY SSF143081 NaN \n", "2 Pfam PF02586 SOS response associated peptidase (SRAP) \n", "\n", " Interpro Accession Interpro Description \\\n", "0 IPR036590 SOS response associated peptidase-like \n", "1 IPR036590 SOS response associated peptidase-like \n", "2 IPR003738 SOS response associated peptidase (SRAP) \n", "\n", " Amino Acid Start Amino Acid Stop E-value \n", "0 1 206 3.400000e-46 \n", "1 2 205 1.090000e-47 \n", "2 1 192 1.600000e-38 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display the functional predictions from Interpro.\n", "display(pflu0915_data['Functional Predictions from Interpro'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display a given table for all three genes " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "sbw25__pflu0915\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Analysis</th><th>Accession</th><th>Description</th><th>Interpro Accession</th><th>Interpro Description</th><th>Amino Acid Start</th><th>Amino Acid Stop</th><th>E-value</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Gene3D</td><td>G3DSA:3.90.1680.10</td><td>NaN</td><td>IPR036590</td><td>SOS response associated peptidase-like</td><td>1</td><td>206</td><td>3.400000e-46</td></tr>\n",
"<tr><th>1</th><td>SUPERFAMILY</td><td>SSF143081</td><td>NaN</td><td>IPR036590</td><td>SOS response associated peptidase-like</td><td>2</td><td>205</td><td>1.090000e-47</td></tr>\n",
"<tr><th>2</th><td>Pfam</td><td>PF02586</td><td>SOS response associated peptidase (SRAP)</td><td>IPR003738</td><td>SOS response associated peptidase (SRAP)</td><td>1</td><td>192</td><td>1.600000e-38</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Analysis Accession Description \\\n", "0 Gene3D G3DSA:3.90.1680.10 NaN \n", "1 SUPERFAMILY SSF143081 NaN \n", "2 Pfam PF02586 SOS response associated peptidase (SRAP) \n", "\n", " Interpro Accession Interpro Description \\\n", "0 IPR036590 SOS response associated peptidase-like \n", "1 IPR036590 SOS response associated peptidase-like \n", "2 IPR003738 SOS response associated peptidase (SRAP) \n", "\n", " Amino Acid Start Amino Acid Stop E-value \n", "0 1 206 3.400000e-46 \n", "1 2 205 1.090000e-47 \n", "2 1 192 1.600000e-38 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "sbw25__pflu0916\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Analysis</th><th>Accession</th><th>Description</th><th>Interpro Accession</th><th>Interpro Description</th><th>Amino Acid Start</th><th>Amino Acid Stop</th><th>E-value</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>CDD</td><td>cd06225</td><td>HAMP</td><td>IPR003660</td><td>HAMP domain</td><td>383</td><td>431</td><td>8.02722E-7</td></tr>\n",
"<tr><th>1</th><td>Gene3D</td><td>G3DSA:1.10.287.950</td><td>NaN</td><td>NaN</td><td>NaN</td><td>381</td><td>712</td><td>1.2E-78</td></tr>\n",
"<tr><th>2</th><td>Gene3D</td><td>G3DSA:3.30.450.20</td><td>NaN</td><td>NaN</td><td>NaN</td><td>64</td><td>72</td><td>2.7E-40</td></tr>\n",
"<tr><th>3</th><td>Pfam</td><td>PF00672</td><td>HAMP domain</td><td>IPR003660</td><td>HAMP domain</td><td>383</td><td>431</td><td>1.3E-8</td></tr>\n",
"<tr><th>4</th><td>SUPERFAMILY</td><td>SSF58104</td><td>NaN</td><td>NaN</td><td>NaN</td><td>403</td><td>712</td><td>8.89E-82</td></tr>\n",
"<tr><th>5</th><td>Gene3D</td><td>G3DSA:3.30.450.20</td><td>NaN</td><td>NaN</td><td>NaN</td><td>238</td><td>341</td><td>2.7E-40</td></tr>\n",
"<tr><th>6</th><td>Pfam</td><td>PF02743</td><td>Cache domain</td><td>IPR033479</td><td>Double Cache domain 1</td><td>47</td><td>331</td><td>1.2E-16</td></tr>\n",
"<tr><th>7</th><td>Coils</td><td>Coil</td><td>NaN</td><td>NaN</td><td>NaN</td><td>584</td><td>604</td><td>-</td></tr>\n",
"<tr><th>8</th><td>Gene3D</td><td>G3DSA:3.30.450.20</td><td>NaN</td><td>NaN</td><td>NaN</td><td>73</td><td>237</td><td>2.7E-40</td></tr>\n",
"<tr><th>9</th><td>Pfam</td><td>PF00015</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>IPR004089</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>495</td><td>678</td><td>2.6E-45</td></tr>\n",
"<tr><th>10</th><td>ProSiteProfiles</td><td>PS50111</td><td>Bacterial chemotaxis sensory transducers domai...</td><td>IPR004089</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>440</td><td>676</td><td>48.239</td></tr>\n",
"<tr><th>11</th><td>SMART</td><td>SM00304</td><td>NaN</td><td>IPR003660</td><td>HAMP domain</td><td>381</td><td>435</td><td>3.0E-11</td></tr>\n",
"<tr><th>12</th><td>CDD</td><td>cd11386</td><td>MCP_signal</td><td>NaN</td><td>NaN</td><td>477</td><td>672</td><td>3.08124E-57</td></tr>\n",
"<tr><th>13</th><td>ProSiteProfiles</td><td>PS50885</td><td>HAMP domain profile.</td><td>IPR003660</td><td>HAMP domain</td><td>381</td><td>435</td><td>10.351</td></tr>\n",
"<tr><th>14</th><td>SMART</td><td>SM00283</td><td>NaN</td><td>IPR004089</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>450</td><td>711</td><td>8.7E-86</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Analysis Accession \\\n", "0 CDD cd06225 \n", "1 Gene3D G3DSA:1.10.287.950 \n", "2 Gene3D G3DSA:3.30.450.20 \n", "3 Pfam PF00672 \n", "4 SUPERFAMILY SSF58104 \n", "5 Gene3D G3DSA:3.30.450.20 \n", "6 Pfam PF02743 \n", "7 Coils Coil \n", "8 Gene3D G3DSA:3.30.450.20 \n", "9 Pfam PF00015 \n", "10 ProSiteProfiles PS50111 \n", "11 SMART SM00304 \n", "12 CDD cd11386 \n", "13 ProSiteProfiles PS50885 \n", "14 SMART SM00283 \n", "\n", " Description Interpro Accession \\\n", "0 HAMP IPR003660 \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 HAMP domain IPR003660 \n", "4 NaN NaN \n", "5 NaN NaN \n", "6 Cache domain IPR033479 \n", "7 NaN NaN \n", "8 NaN NaN \n", "9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089 \n", "10 Bacterial chemotaxis sensory transducers domai... IPR004089 \n", "11 NaN IPR003660 \n", "12 MCP_signal NaN \n", "13 HAMP domain profile. IPR003660 \n", "14 NaN IPR004089 \n", "\n", " Interpro Description Amino Acid Start \\\n", "0 HAMP domain 383 \n", "1 NaN 381 \n", "2 NaN 64 \n", "3 HAMP domain 383 \n", "4 NaN 403 \n", "5 NaN 238 \n", "6 Double Cache domain 1 47 \n", "7 NaN 584 \n", "8 NaN 73 \n", "9 Methyl-accepting chemotaxis protein (MCP) sign... 495 \n", "10 Methyl-accepting chemotaxis protein (MCP) sign... 440 \n", "11 HAMP domain 381 \n", "12 NaN 477 \n", "13 HAMP domain 381 \n", "14 Methyl-accepting chemotaxis protein (MCP) sign... 450 \n", "\n", " Amino Acid Stop E-value \n", "0 431 8.02722E-7 \n", "1 712 1.2E-78 \n", "2 72 2.7E-40 \n", "3 431 1.3E-8 \n", "4 712 8.89E-82 \n", "5 341 2.7E-40 \n", "6 331 1.2E-16 \n", "7 604 - \n", "8 237 2.7E-40 \n", "9 678 2.6E-45 \n", "10 676 48.239 \n", "11 435 3.0E-11 \n", "12 672 3.08124E-57 \n", "13 435 10.351 \n", "14 711 8.7E-86 " ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\n", "sbw25__pflu0917\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Analysis</th><th>Accession</th><th>Description</th><th>Interpro Accession</th><th>Interpro Description</th><th>Amino Acid Start</th><th>Amino Acid Stop</th><th>E-value</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>Pfam</td><td>PF01435</td><td>Peptidase family M48</td><td>IPR001915</td><td>Peptidase M48</td><td>75</td><td>259</td><td>9.300000e-35</td></tr>\n",
"<tr><th>1</th><td>ProSiteProfiles</td><td>PS51257</td><td>Prokaryotic membrane lipoprotein lipid attachm...</td><td>NaN</td><td>NaN</td><td>1</td><td>21</td><td>6.000000e+00</td></tr>\n",
"<tr><th>2</th><td>CDD</td><td>cd07331</td><td>M48C_Oma1_like</td><td>NaN</td><td>NaN</td><td>80</td><td>264</td><td>6.944380e-83</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Analysis Accession \\\n", "0 Pfam PF01435 \n", "1 ProSiteProfiles PS51257 \n", "2 CDD cd07331 \n", "\n", " Description Interpro Accession \\\n", "0 Peptidase family M48 IPR001915 \n", "1 Prokaryotic membrane lipoprotein lipid attachm... NaN \n", "2 M48C_Oma1_like NaN \n", "\n", " Interpro Description Amino Acid Start Amino Acid Stop E-value \n", "0 Peptidase M48 75 259 9.300000e-35 \n", "1 NaN 1 21 6.000000e+00 \n", "2 NaN 80 264 6.944380e-83 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Display functional predictions from all three genes.\n", "for f in results.keys():\n", " print(\"\\n\\n\")\n", " print(f)\n", " display(results[f]['Functional Predictions from Interpro'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Select all rows with a given value in one column " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# Display all Pfam analysis" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "# Temporary list of data\n", "tmp = []\n", "\n", "# Iterate all three results.\n", "for q,r in results.items():\n", " # Take the functional predictions\n", " f = r['Functional Predictions from Interpro']\n", " # Select only rows where Analysis is \"Pfam\"\n", " pfam = f[f['Analysis'] == 'Pfam']\n", " # Add a column to denote the gene.\n", " newcol = [q]*len(pfam)\n", " pfam.insert(0, value=newcol, column=\"Feature\")\n", " \n", " # Append to the temporary holder.\n", " tmp.append(pfam)\n", "\n", "# Concatenate into one pandas DataFrame\n", "tmp = pandas.concat(tmp)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>Feature</th><th>Analysis</th><th>Accession</th><th>Description</th><th>Interpro Accession</th><th>Interpro Description</th><th>Amino Acid Start</th><th>Amino Acid Stop</th><th>E-value</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>2</th><td>sbw25__pflu0915</td><td>Pfam</td><td>PF02586</td><td>SOS response associated peptidase (SRAP)</td><td>IPR003738</td><td>SOS response associated peptidase (SRAP)</td><td>1</td><td>192</td><td>1.6e-38</td></tr>\n",
"<tr><th>3</th><td>sbw25__pflu0916</td><td>Pfam</td><td>PF00672</td><td>HAMP domain</td><td>IPR003660</td><td>HAMP domain</td><td>383</td><td>431</td><td>1.3E-8</td></tr>\n",
"<tr><th>6</th><td>sbw25__pflu0916</td><td>Pfam</td><td>PF02743</td><td>Cache domain</td><td>IPR033479</td><td>Double Cache domain 1</td><td>47</td><td>331</td><td>1.2E-16</td></tr>\n",
"<tr><th>9</th><td>sbw25__pflu0916</td><td>Pfam</td><td>PF00015</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>IPR004089</td><td>Methyl-accepting chemotaxis protein (MCP) sign...</td><td>495</td><td>678</td><td>2.6E-45</td></tr>\n",
"<tr><th>0</th><td>sbw25__pflu0917</td><td>Pfam</td><td>PF01435</td><td>Peptidase family M48</td><td>IPR001915</td><td>Peptidase M48</td><td>75</td><td>259</td><td>9.3e-35</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " Feature Analysis Accession \\\n", "2 sbw25__pflu0915 Pfam PF02586 \n", "3 sbw25__pflu0916 Pfam PF00672 \n", "6 sbw25__pflu0916 Pfam PF02743 \n", "9 sbw25__pflu0916 Pfam PF00015 \n", "0 sbw25__pflu0917 Pfam PF01435 \n", "\n", " Description Interpro Accession \\\n", "2 SOS response associated peptidase (SRAP) IPR003738 \n", "3 HAMP domain IPR003660 \n", "6 Cache domain IPR033479 \n", "9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089 \n", "0 Peptidase family M48 IPR001915 \n", "\n", " Interpro Description Amino Acid Start \\\n", "2 SOS response associated peptidase (SRAP) 1 \n", "3 HAMP domain 383 \n", "6 Double Cache domain 1 47 \n", "9 Methyl-accepting chemotaxis protein (MCP) sign... 495 \n", "0 Peptidase M48 75 \n", "\n", " Amino Acid Stop E-value \n", "2 192 1.6e-38 \n", "3 431 1.3E-8 \n", "6 331 1.2E-16 \n", "9 678 2.6E-45 \n", "0 259 9.3e-35 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(tmp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save to disk" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'sbw25.json'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scraper.to_json(results, outfile=\"sbw25.json\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read results from disk" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "loaded = scraper.from_json('sbw25.json')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'sbw25__pflu0915': {'Gene Feature Overview': 1\n", " Genomic location 1015020 - 1015640 (+ strand)\n", " Locus Tag PFLU0915\n", " Name None\n", " Replicon chromosome\n", " Strain Pseudomonas fluorescens SBW25,\n", " 'Cross-References': 1\n", " Entrez 7816630\n", " GI 229588458\n", " RefSeq YP_002870577.1,\n", " 'Product': 1\n", " Charge 
(pH 7) 2.14\n", " Coding Frame 1\n", " Evidence for Translation None\n", " Feature Type CDS\n", " Isoelectric Point (pI) 8.48\n", " Kyte-Doolittle Hydrophobicity Value -0.358\n", " Molecular Weight (kDa) 23.2\n", " Product\\tName hypothetical protein\n", " Synonyms None,\n", " 'Subcellular localization': Confidence \\\n", " Additional evidence for subcellular localization None \n", " Individual Mappings Localization Confidence PMID Unknown Class 3 ... \n", " Unknown Class 3 \n", " \n", " PMID \n", " Additional evidence for subcellular localization NaN \n", " Individual Mappings NaN \n", " Unknown 20472543.0 ,\n", " 'Pathogen Association Analysis': 1\n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG004285 (549 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Empty DataFrame\n", " Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]\n", " Index: [],\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession Description \\\n", " 0 Gene3D G3DSA:3.90.1680.10 None \n", " 1 SUPERFAMILY SSF143081 None \n", " 2 Pfam PF02586 SOS response associated peptidase (SRAP) \n", " \n", " Interpro Accession Interpro Description \\\n", " 0 IPR036590 SOS response associated peptidase-like \n", " 1 IPR036590 SOS response associated peptidase-like \n", " 2 IPR003738 SOS response associated peptidase (SRAP) \n", " \n", " Amino Acid Start Amino Acid Stop E-value \n", " 0 1 206 3.400000e-46 \n", " 1 2 205 1.090000e-47 \n", " 2 1 192 1.600000e-38 },\n", " 
'sbw25__pflu0916': {'Gene Feature Overview': 1\n", " Genomic location 1015719 - 1017857 (- strand)\n", " Locus Tag PFLU0916\n", " Name None\n", " Replicon chromosome\n", " Strain Pseudomonas fluorescens SBW25,\n", " 'Cross-References': 1\n", " Entrez 7816631\n", " GI 229588459\n", " INSDC CAY47182.1\n", " RefSeq YP_002870578.1\n", " UniParc UPI00019D9DE0\n", " UniProtKB Acc C3KBK5\n", " UniProtKB ID C3KBK5_PSEFS\n", " UniRef100 UniRef100_C3KBK5\n", " UniRef50 UniRef50_Q4K6C9\n", " UniRef90 UniRef90_C3KBK5,\n", " 'Product': 1\n", " Charge (pH 7) -20.37\n", " Coding Frame 1\n", " Evidence for Translation None\n", " Feature Type CDS\n", " Isoelectric Point (pI) 4.78\n", " Kyte-Doolittle Hydrophobicity Value -0.035\n", " Molecular Weight (kDa) 76.4\n", " Product\\tName putative methyl-accepting chemotaxis protein\n", " Synonyms None,\n", " 'Subcellular localization': Confidence \\\n", " Additional evidence for subcellular localization None \n", " Cytoplasmic Membrane Class 3 \n", " Individual Mappings Localization Confidence PMID Cytoplasmic Memb... 
\n", " \n", " PMID \n", " Additional evidence for subcellular localization NaN \n", " Cytoplasmic Membrane 20472543.0 \n", " Individual Mappings NaN ,\n", " 'Pathogen Association Analysis': 1\n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG002657 (1824 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Ontology Accession Term \\\n", " 0 Biological Process GO:0007165 signal transduction \n", " 1 Cellular Component GO:0016021 integral component of membrane \n", " 2 Cellular Component GO:0016020 membrane \n", " \n", " GO Evidence \\\n", " 0 ISM Inferred from Sequence Model Term mapped ... \n", " 1 ISM Inferred from Sequence Model Term mapped ... \n", " 2 ISM Inferred from Sequence Model Term mapped ... \n", " \n", " Evidence Ontology (ECO) Code Reference Comments \n", " 0 ECO:0000259 match to InterPro signature eviden... NaN NaN \n", " 1 ECO:0000259 match to InterPro signature eviden... NaN NaN \n", " 2 ECO:0000259 match to InterPro signature eviden... 
NaN NaN ,\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession \\\n", " 0 CDD cd06225 \n", " 1 Gene3D G3DSA:1.10.287.950 \n", " 10 ProSiteProfiles PS50111 \n", " 11 SMART SM00304 \n", " 12 CDD cd11386 \n", " 13 ProSiteProfiles PS50885 \n", " 14 SMART SM00283 \n", " 2 Gene3D G3DSA:3.30.450.20 \n", " 3 Pfam PF00672 \n", " 4 SUPERFAMILY SSF58104 \n", " 5 Gene3D G3DSA:3.30.450.20 \n", " 6 Pfam PF02743 \n", " 7 Coils Coil \n", " 8 Gene3D G3DSA:3.30.450.20 \n", " 9 Pfam PF00015 \n", " \n", " Description Interpro Accession \\\n", " 0 HAMP IPR003660 \n", " 1 None None \n", " 10 Bacterial chemotaxis sensory transducers domai... IPR004089 \n", " 11 None IPR003660 \n", " 12 MCP_signal None \n", " 13 HAMP domain profile. IPR003660 \n", " 14 None IPR004089 \n", " 2 None None \n", " 3 HAMP domain IPR003660 \n", " 4 None None \n", " 5 None None \n", " 6 Cache domain IPR033479 \n", " 7 None None \n", " 8 None None \n", " 9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089 \n", " \n", " Interpro Description Amino Acid Start \\\n", " 0 HAMP domain 383 \n", " 1 None 381 \n", " 10 Methyl-accepting chemotaxis protein (MCP) sign... 440 \n", " 11 HAMP domain 381 \n", " 12 None 477 \n", " 13 HAMP domain 381 \n", " 14 Methyl-accepting chemotaxis protein (MCP) sign... 450 \n", " 2 None 64 \n", " 3 HAMP domain 383 \n", " 4 None 403 \n", " 5 None 238 \n", " 6 Double Cache domain 1 47 \n", " 7 None 584 \n", " 8 None 73 \n", " 9 Methyl-accepting chemotaxis protein (MCP) sign... 
495 \n", " \n", " Amino Acid Stop E-value \n", " 0 431 8.02722E-7 \n", " 1 712 1.2E-78 \n", " 10 676 48.239 \n", " 11 435 3.0E-11 \n", " 12 672 3.08124E-57 \n", " 13 435 10.351 \n", " 14 711 8.7E-86 \n", " 2 72 2.7E-40 \n", " 3 431 1.3E-8 \n", " 4 712 8.89E-82 \n", " 5 341 2.7E-40 \n", " 6 331 1.2E-16 \n", " 7 604 - \n", " 8 237 2.7E-40 \n", " 9 678 2.6E-45 },\n", " 'sbw25__pflu0917': {'Gene Feature Overview': 1\n", " Genomic location 1018094 - 1018918 (+ strand)\n", " Locus Tag PFLU0917\n", " Name None\n", " Replicon chromosome\n", " Strain Pseudomonas fluorescens SBW25,\n", " 'Cross-References': 1\n", " Entrez 7816632\n", " GI 229588460\n", " RefSeq YP_002870579.1,\n", " 'Product': 1\n", " Charge (pH 7) 1.89\n", " Coding Frame 1\n", " Evidence for Translation None\n", " Feature Type CDS\n", " Isoelectric Point (pI) 8.19\n", " Kyte-Doolittle Hydrophobicity Value -0.234\n", " Molecular Weight (kDa) 29338.2\n", " Product\\tName putative exported peptidase\n", " Synonyms None,\n", " 'Subcellular localization': Confidence \\\n", " Additional evidence for subcellular localization None \n", " Individual Mappings Localization Confidence PMID Unknown Class 3 ... 
\n", " Unknown Class 3 \n", " \n", " PMID \n", " Additional evidence for subcellular localization NaN \n", " Individual Mappings NaN \n", " Unknown 20472543.0 ,\n", " 'Pathogen Association Analysis': 1\n", " Results Common Found in both pathogen and nonpathogeni...,\n", " 'Orthologs/Comparative Genomics': 1\n", " Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database\n", " Pseudomonas Ortholog Group POG004284 (550 members)\n", " Putative Inparalogs None Found,\n", " 'Interactions': 1\n", " STRING database Search for predicted protein-protein interacti...,\n", " 'References': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Gene Ontology': Ontology Accession Term \\\n", " 0 Molecular Function GO:0004222 metalloendopeptidase activity \n", " 1 Biological Process GO:0006508 proteolysis \n", " \n", " GO Evidence \\\n", " 0 ISM Inferred from Sequence Model Term mapped ... \n", " 1 ISM Inferred from Sequence Model Term mapped ... \n", " \n", " Evidence Ontology (ECO) Code Reference Comments \n", " 0 ECO:0000259 match to InterPro signature eviden... NaN NaN \n", " 1 ECO:0000259 match to InterPro signature eviden... NaN NaN ,\n", " 'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame\n", " Columns: []\n", " Index: [],\n", " 'Functional Predictions from Interpro': Analysis Accession \\\n", " 0 Pfam PF01435 \n", " 1 ProSiteProfiles PS51257 \n", " 2 CDD cd07331 \n", " \n", " Description Interpro Accession \\\n", " 0 Peptidase family M48 IPR001915 \n", " 1 Prokaryotic membrane lipoprotein lipid attachm... 
None \n", " 2 M48C_Oma1_like None \n", " \n", " Interpro Description Amino Acid Start Amino Acid Stop E-value \n", " 0 Peptidase M48 75 259 9.300000e-35 \n", " 1 None 1 21 6.000000e+00 \n", " 2 None 80 264 6.944380e-83 }}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "loaded" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example for a query with references" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780 .\n", "INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780&view=functions .\n" ] } ], "source": [ "results_pa = scraper.run_query(query=pdc_query(strain='UCBPP-PA14', feature='PA14_67210'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Display references with proper html links" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", "
<table><thead><tr><th></th><th>citation</th><th>pubmed_url</th></tr></thead>
<tbody><tr><th>0</th><td>Allsopp LP, Wood TE, Howard SA, Maggiorelli F, Nolan LM, et al. (2017). RsmA and AmrZ orchestrate the assembly of all three type VI secretion systems in Pseudomonas aeruginosa. Proc Natl Acad Sci U S A 114(29): 7707-7712.</td><td>link</td></tr></tbody></table>
" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results_pa[\"UCBPP-PA14__PA14_67210\"]['References'].style.format({'pubmed_url': lambda x: 'link'.format(x)})" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }