Example notebook: using the PseudomonasDotComScraper as a programmatic interface to the pseudomonas.com database
Load required Python modules
[2]:
# The scraper
from GenDBScraper.PseudomonasDotComScraper import PseudomonasDotComScraper as scraper
# The query object (derived from collections.namedtuple)
from GenDBScraper.PseudomonasDotComScraper import pdc_query
# Regular expressions
import re
# pandas DataFrame, the workhorse data structure
import pandas
Setting things up
[3]:
# We want to get data for three adjacent genes, pflu0915, pflu0916, pflu0917
queries = [pdc_query(strain='sbw25',feature=feat) for feat in ['pflu0915', 'pflu0916', 'pflu0917']]
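Because pdc_query is described above as derived from collections.namedtuple, each query behaves like a small immutable record with named fields. The sketch below uses a stand-in Query tuple (hypothetical, not the library's class) to illustrate the pattern, and shows how a key of the form strain__feature, as seen in the results dictionary, could be built from it. The helper result_key is an assumption for illustration only.

```python
from collections import namedtuple

# Stand-in for pdc_query (assumption: only the two fields used in this notebook).
Query = namedtuple("Query", ["strain", "feature"])

features = ["pflu0915", "pflu0916", "pflu0917"]
queries = [Query(strain="sbw25", feature=feat) for feat in features]


def result_key(query):
    """Join strain and feature with a double underscore (hypothetical helper)."""
    return "{}__{}".format(query.strain, query.feature)


keys = [result_key(q) for q in queries]
print(keys)
```

Fields are accessed by name (queries[0].strain), which keeps the list comprehension readable when more query fields are involved.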
[4]:
# Set up the scraper. Note: this rebinds the name `scraper` from the class to an instance.
scraper = scraper(query=queries)
[5]:
# Connect to the database
scraper.connect()
INFO: Good response from https://www.pseudomonas.com .
Retrieve the data
[6]:
results = scraper.run_query()
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.
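The DEBUG lines show the search URL opened for each query. As a rough illustration of how such a URL is composed from strain and feature, the sketch below rebuilds it with the standard library. The helper search_url is hypothetical; the parameter names are simply copied from the log output, and this is not claimed to be how the scraper builds its requests internally.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.pseudomonas.com/primarySequenceFeature/list"


def search_url(strain, feature):
    """Rebuild the search URL seen in the DEBUG log lines (hypothetical helper)."""
    # Parameter names and fixed values are copied from the logged URLs.
    params = {
        "c1": "name",
        "v1": feature,
        "e1": 1,
        "term1": strain,
        "assembly": "complete",
    }
    return BASE_URL + "?" + urlencode(params)


url = search_url("sbw25", "pflu0915")
print(url)
```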
Display the data
[7]:
results
[7]:
{'sbw25__pflu0915': {'Gene Feature Overview': 1
0
Strain Pseudomonas fluorescens SBW25
Locus Tag PFLU0915
Name NaN
Replicon chromosome
Genomic location 1015020 - 1015640 (+ strand),
'Cross-References': 1
0
RefSeq YP_002870577.1
GI 229588458
Entrez 7816630,
'Product': 1
0
Feature Type CDS
Coding Frame 1
Product\tName hypothetical protein
Synonyms NaN
Evidence for Translation NaN
Charge (pH 7) 2.14
Kyte-Doolittle Hydrophobicity Value -0.358
Molecular Weight (kDa) 23.2
Isoelectric Point (pI) 8.48,
'Subcellular localization': Confidence \
Localization
Unknown Class 3
Individual Mappings Localization Confidence PMID Unknown Class 3 ...
Additional evidence for subcellular localization NaN
PMID
Localization
Unknown 20472543.0
Individual Mappings NaN
Additional evidence for subcellular localization NaN ,
'Pathogen Association Analysis': 1
0
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
0
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG004285 (549 members)
Putative Inparalogs None Found,
'Interactions': 1
0
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Empty DataFrame
Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]
Index: [],
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession Description \
0 Gene3D G3DSA:3.90.1680.10 NaN
1 SUPERFAMILY SSF143081 NaN
2 Pfam PF02586 SOS response associated peptidase (SRAP)
Interpro Accession Interpro Description \
0 IPR036590 SOS response associated peptidase-like
1 IPR036590 SOS response associated peptidase-like
2 IPR003738 SOS response associated peptidase (SRAP)
Amino Acid Start Amino Acid Stop E-value
0 1 206 3.400000e-46
1 2 205 1.090000e-47
2 1 192 1.600000e-38 },
'sbw25__pflu0916': {'Gene Feature Overview': 1
0
Strain Pseudomonas fluorescens SBW25
Locus Tag PFLU0916
Name NaN
Replicon chromosome
Genomic location 1015719 - 1017857 (- strand),
'Cross-References': 1
0
RefSeq YP_002870578.1
GI 229588459
Entrez 7816631
INSDC CAY47182.1
UniParc UPI00019D9DE0
UniProtKB Acc C3KBK5
UniProtKB ID C3KBK5_PSEFS
UniRef100 UniRef100_C3KBK5
UniRef50 UniRef50_Q4K6C9
UniRef90 UniRef90_C3KBK5,
'Product': 1
0
Feature Type CDS
Coding Frame 1
Product\tName putative methyl-accepting chemotaxis protein
Synonyms NaN
Evidence for Translation NaN
Charge (pH 7) -20.37
Kyte-Doolittle Hydrophobicity Value -0.035
Molecular Weight (kDa) 76.4
Isoelectric Point (pI) 4.78,
'Subcellular localization': Confidence \
Localization
Cytoplasmic Membrane Class 3
Individual Mappings Localization Confidence PMID Cytoplasmic Memb...
Additional evidence for subcellular localization NaN
PMID
Localization
Cytoplasmic Membrane 20472543.0
Individual Mappings NaN
Additional evidence for subcellular localization NaN ,
'Pathogen Association Analysis': 1
0
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
0
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG002657 (1824 members)
Putative Inparalogs None Found,
'Interactions': 1
0
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Ontology Accession Term \
0 Biological Process GO:0007165 signal transduction
1 Cellular Component GO:0016021 integral component of membrane
2 Cellular Component GO:0016020 membrane
GO Evidence \
0 ISM Inferred from Sequence Model Term mapped ...
1 ISM Inferred from Sequence Model Term mapped ...
2 ISM Inferred from Sequence Model Term mapped ...
Evidence Ontology (ECO) Code Reference Comments
0 ECO:0000259 match to InterPro signature eviden... NaN NaN
1 ECO:0000259 match to InterPro signature eviden... NaN NaN
2 ECO:0000259 match to InterPro signature eviden... NaN NaN ,
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession \
0 CDD cd06225
1 Gene3D G3DSA:1.10.287.950
2 Gene3D G3DSA:3.30.450.20
3 Pfam PF00672
4 SUPERFAMILY SSF58104
5 Gene3D G3DSA:3.30.450.20
6 Pfam PF02743
7 Coils Coil
8 Gene3D G3DSA:3.30.450.20
9 Pfam PF00015
10 ProSiteProfiles PS50111
11 SMART SM00304
12 CDD cd11386
13 ProSiteProfiles PS50885
14 SMART SM00283
Description Interpro Accession \
0 HAMP IPR003660
1 NaN NaN
2 NaN NaN
3 HAMP domain IPR003660
4 NaN NaN
5 NaN NaN
6 Cache domain IPR033479
7 NaN NaN
8 NaN NaN
9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089
10 Bacterial chemotaxis sensory transducers domai... IPR004089
11 NaN IPR003660
12 MCP_signal NaN
13 HAMP domain profile. IPR003660
14 NaN IPR004089
Interpro Description Amino Acid Start \
0 HAMP domain 383
1 NaN 381
2 NaN 64
3 HAMP domain 383
4 NaN 403
5 NaN 238
6 Double Cache domain 1 47
7 NaN 584
8 NaN 73
9 Methyl-accepting chemotaxis protein (MCP) sign... 495
10 Methyl-accepting chemotaxis protein (MCP) sign... 440
11 HAMP domain 381
12 NaN 477
13 HAMP domain 381
14 Methyl-accepting chemotaxis protein (MCP) sign... 450
Amino Acid Stop E-value
0 431 8.02722E-7
1 712 1.2E-78
2 72 2.7E-40
3 431 1.3E-8
4 712 8.89E-82
5 341 2.7E-40
6 331 1.2E-16
7 604 -
8 237 2.7E-40
9 678 2.6E-45
10 676 48.239
11 435 3.0E-11
12 672 3.08124E-57
13 435 10.351
14 711 8.7E-86 },
'sbw25__pflu0917': {'Gene Feature Overview': 1
0
Strain Pseudomonas fluorescens SBW25
Locus Tag PFLU0917
Name NaN
Replicon chromosome
Genomic location 1018094 - 1018918 (+ strand),
'Cross-References': 1
0
RefSeq YP_002870579.1
GI 229588460
Entrez 7816632,
'Product': 1
0
Feature Type CDS
Coding Frame 1
Product\tName putative exported peptidase
Synonyms NaN
Evidence for Translation NaN
Charge (pH 7) 1.89
Kyte-Doolittle Hydrophobicity Value -0.234
Molecular Weight (kDa) 29338.2
Isoelectric Point (pI) 8.19,
'Subcellular localization': Confidence \
Localization
Unknown Class 3
Individual Mappings Localization Confidence PMID Unknown Class 3 ...
Additional evidence for subcellular localization NaN
PMID
Localization
Unknown 20472543.0
Individual Mappings NaN
Additional evidence for subcellular localization NaN ,
'Pathogen Association Analysis': 1
0
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
0
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG004284 (550 members)
Putative Inparalogs None Found,
'Interactions': 1
0
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Ontology Accession Term \
0 Molecular Function GO:0004222 metalloendopeptidase activity
1 Biological Process GO:0006508 proteolysis
GO Evidence \
0 ISM Inferred from Sequence Model Term mapped ...
1 ISM Inferred from Sequence Model Term mapped ...
Evidence Ontology (ECO) Code Reference Comments
0 ECO:0000259 match to InterPro signature eviden... NaN NaN
1 ECO:0000259 match to InterPro signature eviden... NaN NaN ,
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession \
0 Pfam PF01435
1 ProSiteProfiles PS51257
2 CDD cd07331
Description Interpro Accession \
0 Peptidase family M48 IPR001915
1 Prokaryotic membrane lipoprotein lipid attachm... NaN
2 M48C_Oma1_like NaN
Interpro Description Amino Acid Start Amino Acid Stop E-value
0 Peptidase M48 75 259 9.300000e-35
1 NaN 1 21 6.000000e+00
2 NaN 80 264 6.944380e-83 }}
The results object is a two-level nested dictionary:
results
|
+-- sbw25__pflu0915
|   |
|   +-- Gene Feature Overview
|   +-- Cross-References
|   +-- Orthologs/Comparative Genomics
|   .
|   .
|   .
+-- sbw25__pflu0916
|   |
|   +-- Gene Feature Overview
|   +-- Cross-References
|   +-- Orthologs/Comparative Genomics
|   .
|   .
|   .
+-- sbw25__pflu0917
    |
    +-- Gene Feature Overview
    +-- Cross-References
    +-- Orthologs/Comparative Genomics
    .
    .
    .
The entries at the lowest level (“Gene Feature Overview”, “Cross-References”,
etc.) are the data tables downloaded from pseudomonas.com. Each one is an
instance of the pandas.DataFrame class, a versatile data structure that
supports slicing, selection by value or range, and many other dataset
operations.
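To make the nesting and the DataFrame operations concrete, here is a minimal sketch that does not need a network connection. It builds a small stand-in dictionary (demo, a hypothetical name) with the same two-level layout and a shortened copy of one table from the output above, then slices it and selects rows by value and by range.

```python
import pandas

# Stand-in for one table, using values copied from the output above.
interpro = pandas.DataFrame({
    "Analysis": ["Gene3D", "SUPERFAMILY", "Pfam"],
    "Accession": ["G3DSA:3.90.1680.10", "SSF143081", "PF02586"],
    "Amino Acid Start": [1, 2, 1],
    "Amino Acid Stop": [206, 205, 192],
})

# Same two-level layout as results: query key -> table name -> DataFrame.
demo = {"sbw25__pflu0915": {"Functional Predictions from Interpro": interpro}}

table = demo["sbw25__pflu0915"]["Functional Predictions from Interpro"]

# Slicing: rows labelled 0 through 1 (inclusive with .loc), two columns.
head = table.loc[:1, ["Analysis", "Accession"]]

# Selection by value: only the Pfam rows.
pfam = table[table["Analysis"] == "Pfam"]

# Selection by range: hits that end after residue 200.
long_hits = table[table["Amino Acid Stop"] > 200]

print(head)
print(pfam)
print(long_hits)
```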
List all keys in the results dict
[8]:
list(results.keys())
[8]:
['sbw25__pflu0915', 'sbw25__pflu0916', 'sbw25__pflu0917']
Get the data for one queried gene
[9]:
pflu0915_data = results['sbw25__pflu0915']
[10]:
# List all keys in the first gene.
list(pflu0915_data)
[10]:
['Gene Feature Overview',
'Cross-References',
'Product',
'Subcellular localization',
'Pathogen Association Analysis',
'Orthologs/Comparative Genomics',
'Interactions',
'References',
'Gene Ontology',
'Functional Classifications Manually Assigned by PseudoCAP',
'Functional Predictions from Interpro']
Display one table
[11]:
# Display the functional predictions from Interpro.
display(pflu0915_data['Functional Predictions from Interpro'])
 | Analysis | Accession | Description | Interpro Accession | Interpro Description | Amino Acid Start | Amino Acid Stop | E-value |
---|---|---|---|---|---|---|---|---|
0 | Gene3D | G3DSA:3.90.1680.10 | NaN | IPR036590 | SOS response associated peptidase-like | 1 | 206 | 3.400000e-46 |
1 | SUPERFAMILY | SSF143081 | NaN | IPR036590 | SOS response associated peptidase-like | 2 | 205 | 1.090000e-47 |
2 | Pfam | PF02586 | SOS response associated peptidase (SRAP) | IPR003738 | SOS response associated peptidase (SRAP) | 1 | 192 | 1.600000e-38 |
Display a given table for all three genes
[12]:
# Display functional predictions from all three genes.
for f in results.keys():
    print("\n\n")
    print(f)
    display(results[f]['Functional Predictions from Interpro'])
sbw25__pflu0915
 | Analysis | Accession | Description | Interpro Accession | Interpro Description | Amino Acid Start | Amino Acid Stop | E-value |
---|---|---|---|---|---|---|---|---|
0 | Gene3D | G3DSA:3.90.1680.10 | NaN | IPR036590 | SOS response associated peptidase-like | 1 | 206 | 3.400000e-46 |
1 | SUPERFAMILY | SSF143081 | NaN | IPR036590 | SOS response associated peptidase-like | 2 | 205 | 1.090000e-47 |
2 | Pfam | PF02586 | SOS response associated peptidase (SRAP) | IPR003738 | SOS response associated peptidase (SRAP) | 1 | 192 | 1.600000e-38 |
sbw25__pflu0916
 | Analysis | Accession | Description | Interpro Accession | Interpro Description | Amino Acid Start | Amino Acid Stop | E-value |
---|---|---|---|---|---|---|---|---|
0 | CDD | cd06225 | HAMP | IPR003660 | HAMP domain | 383 | 431 | 8.02722E-7 |
1 | Gene3D | G3DSA:1.10.287.950 | NaN | NaN | NaN | 381 | 712 | 1.2E-78 |
2 | Gene3D | G3DSA:3.30.450.20 | NaN | NaN | NaN | 64 | 72 | 2.7E-40 |
3 | Pfam | PF00672 | HAMP domain | IPR003660 | HAMP domain | 383 | 431 | 1.3E-8 |
4 | SUPERFAMILY | SSF58104 | NaN | NaN | NaN | 403 | 712 | 8.89E-82 |
5 | Gene3D | G3DSA:3.30.450.20 | NaN | NaN | NaN | 238 | 341 | 2.7E-40 |
6 | Pfam | PF02743 | Cache domain | IPR033479 | Double Cache domain 1 | 47 | 331 | 1.2E-16 |
7 | Coils | Coil | NaN | NaN | NaN | 584 | 604 | - |
8 | Gene3D | G3DSA:3.30.450.20 | NaN | NaN | NaN | 73 | 237 | 2.7E-40 |
9 | Pfam | PF00015 | Methyl-accepting chemotaxis protein (MCP) sign... | IPR004089 | Methyl-accepting chemotaxis protein (MCP) sign... | 495 | 678 | 2.6E-45 |
10 | ProSiteProfiles | PS50111 | Bacterial chemotaxis sensory transducers domai... | IPR004089 | Methyl-accepting chemotaxis protein (MCP) sign... | 440 | 676 | 48.239 |
11 | SMART | SM00304 | NaN | IPR003660 | HAMP domain | 381 | 435 | 3.0E-11 |
12 | CDD | cd11386 | MCP_signal | NaN | NaN | 477 | 672 | 3.08124E-57 |
13 | ProSiteProfiles | PS50885 | HAMP domain profile. | IPR003660 | HAMP domain | 381 | 435 | 10.351 |
14 | SMART | SM00283 | NaN | IPR004089 | Methyl-accepting chemotaxis protein (MCP) sign... | 450 | 711 | 8.7E-86 |
sbw25__pflu0917
 | Analysis | Accession | Description | Interpro Accession | Interpro Description | Amino Acid Start | Amino Acid Stop | E-value |
---|---|---|---|---|---|---|---|---|
0 | Pfam | PF01435 | Peptidase family M48 | IPR001915 | Peptidase M48 | 75 | 259 | 9.300000e-35 |
1 | ProSiteProfiles | PS51257 | Prokaryotic membrane lipoprotein lipid attachm... | NaN | NaN | 1 | 21 | 6.000000e+00 |
2 | CDD | cd07331 | M48C_Oma1_like | NaN | NaN | 80 | 264 | 6.944380e-83 |
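In the sbw25__pflu0916 table the E-value entries are printed unformatted (for example 8.02722E-7) and one entry is just "-", which suggests that this column holds strings rather than floats. That is an inference from the printed output, not something the scraper documents. Under that assumption, a sketch of converting such a column before doing numeric comparisons:

```python
import pandas

# Values copied from the sbw25__pflu0916 table above; "-" marks a missing value.
evalues = pandas.Series(["8.02722E-7", "1.2E-78", "-", "48.239"], name="E-value")

# errors="coerce" turns anything that cannot be parsed (here "-") into NaN.
numeric = pandas.to_numeric(evalues, errors="coerce")

# Numeric comparison now works; NaN never satisfies the condition.
significant = numeric[numeric < 1e-5]

print(numeric)
print(significant)
```

Applied to a real table, the same call would be made on the 'E-value' column of the DataFrame, and the result assigned to a new column so the original strings stay available.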
Select all rows with a given value in one column
[13]:
# Display all Pfam analysis
[14]:
# Temporary list of data
tmp = []
# Iterate over all three results.
for q, r in results.items():
    # Take the functional predictions.
    f = r['Functional Predictions from Interpro']
    # Select only rows where Analysis is "Pfam"; copy so that the insert
    # below does not act on a view of the original table.
    pfam = f[f['Analysis'] == 'Pfam'].copy()
    # Add a column to denote the gene.
    pfam.insert(0, column="Feature", value=[q] * len(pfam))
    # Append to the temporary holder.
    tmp.append(pfam)
# Concatenate into one pandas DataFrame
tmp = pandas.concat(tmp)
[15]:
display(tmp)
 | Feature | Analysis | Accession | Description | Interpro Accession | Interpro Description | Amino Acid Start | Amino Acid Stop | E-value |
---|---|---|---|---|---|---|---|---|---|
2 | sbw25__pflu0915 | Pfam | PF02586 | SOS response associated peptidase (SRAP) | IPR003738 | SOS response associated peptidase (SRAP) | 1 | 192 | 1.6e-38 |
3 | sbw25__pflu0916 | Pfam | PF00672 | HAMP domain | IPR003660 | HAMP domain | 383 | 431 | 1.3E-8 |
6 | sbw25__pflu0916 | Pfam | PF02743 | Cache domain | IPR033479 | Double Cache domain 1 | 47 | 331 | 1.2E-16 |
9 | sbw25__pflu0916 | Pfam | PF00015 | Methyl-accepting chemotaxis protein (MCP) sign... | IPR004089 | Methyl-accepting chemotaxis protein (MCP) sign... | 495 | 678 | 2.6E-45 |
0 | sbw25__pflu0917 | Pfam | PF01435 | Peptidase family M48 | IPR001915 | Peptidase M48 | 75 | 259 | 9.3e-35 |
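The same combined table can also be built without an explicit loop, by passing a dictionary of tables to pandas.concat and letting the dictionary keys become an index level. The sketch below is an alternative under stated assumptions: tables is a hypothetical stand-in with two shortened tables whose values are copied from the output above, and the level names "Feature" and "row" are chosen here for illustration.

```python
import pandas

# Shortened stand-ins for two 'Functional Predictions from Interpro' tables.
tables = {
    "sbw25__pflu0915": pandas.DataFrame({
        "Analysis": ["Gene3D", "Pfam"],
        "Accession": ["G3DSA:3.90.1680.10", "PF02586"],
    }),
    "sbw25__pflu0917": pandas.DataFrame({
        "Analysis": ["Pfam", "CDD"],
        "Accession": ["PF01435", "cd07331"],
    }),
}

# The dictionary keys become the outer index level, named "Feature".
combined = pandas.concat(tables, names=["Feature", "row"])

# Keep the Pfam rows and move the gene key from the index into a column.
pfam = combined[combined["Analysis"] == "Pfam"].reset_index(level="Feature")

print(pfam)
```

With the real data, the dictionary would be built as {q: r['Functional Predictions from Interpro'] for q, r in results.items()}.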
Read results from disk
[17]:
loaded = scraper.from_json('sbw25.json')
[18]:
loaded
[18]:
{'sbw25__pflu0915': {'Gene Feature Overview': 1
Genomic location 1015020 - 1015640 (+ strand)
Locus Tag PFLU0915
Name None
Replicon chromosome
Strain Pseudomonas fluorescens SBW25,
'Cross-References': 1
Entrez 7816630
GI 229588458
RefSeq YP_002870577.1,
'Product': 1
Charge (pH 7) 2.14
Coding Frame 1
Evidence for Translation None
Feature Type CDS
Isoelectric Point (pI) 8.48
Kyte-Doolittle Hydrophobicity Value -0.358
Molecular Weight (kDa) 23.2
Product\tName hypothetical protein
Synonyms None,
'Subcellular localization': Confidence \
Additional evidence for subcellular localization None
Individual Mappings Localization Confidence PMID Unknown Class 3 ...
Unknown Class 3
PMID
Additional evidence for subcellular localization NaN
Individual Mappings NaN
Unknown 20472543.0 ,
'Pathogen Association Analysis': 1
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG004285 (549 members)
Putative Inparalogs None Found,
'Interactions': 1
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Empty DataFrame
Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]
Index: [],
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession Description \
0 Gene3D G3DSA:3.90.1680.10 None
1 SUPERFAMILY SSF143081 None
2 Pfam PF02586 SOS response associated peptidase (SRAP)
Interpro Accession Interpro Description \
0 IPR036590 SOS response associated peptidase-like
1 IPR036590 SOS response associated peptidase-like
2 IPR003738 SOS response associated peptidase (SRAP)
Amino Acid Start Amino Acid Stop E-value
0 1 206 3.400000e-46
1 2 205 1.090000e-47
2 1 192 1.600000e-38 },
'sbw25__pflu0916': {'Gene Feature Overview': 1
Genomic location 1015719 - 1017857 (- strand)
Locus Tag PFLU0916
Name None
Replicon chromosome
Strain Pseudomonas fluorescens SBW25,
'Cross-References': 1
Entrez 7816631
GI 229588459
INSDC CAY47182.1
RefSeq YP_002870578.1
UniParc UPI00019D9DE0
UniProtKB Acc C3KBK5
UniProtKB ID C3KBK5_PSEFS
UniRef100 UniRef100_C3KBK5
UniRef50 UniRef50_Q4K6C9
UniRef90 UniRef90_C3KBK5,
'Product': 1
Charge (pH 7) -20.37
Coding Frame 1
Evidence for Translation None
Feature Type CDS
Isoelectric Point (pI) 4.78
Kyte-Doolittle Hydrophobicity Value -0.035
Molecular Weight (kDa) 76.4
Product\tName putative methyl-accepting chemotaxis protein
Synonyms None,
'Subcellular localization': Confidence \
Additional evidence for subcellular localization None
Cytoplasmic Membrane Class 3
Individual Mappings Localization Confidence PMID Cytoplasmic Memb...
PMID
Additional evidence for subcellular localization NaN
Cytoplasmic Membrane 20472543.0
Individual Mappings NaN ,
'Pathogen Association Analysis': 1
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG002657 (1824 members)
Putative Inparalogs None Found,
'Interactions': 1
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Ontology Accession Term \
0 Biological Process GO:0007165 signal transduction
1 Cellular Component GO:0016021 integral component of membrane
2 Cellular Component GO:0016020 membrane
GO Evidence \
0 ISM Inferred from Sequence Model Term mapped ...
1 ISM Inferred from Sequence Model Term mapped ...
2 ISM Inferred from Sequence Model Term mapped ...
Evidence Ontology (ECO) Code Reference Comments
0 ECO:0000259 match to InterPro signature eviden... NaN NaN
1 ECO:0000259 match to InterPro signature eviden... NaN NaN
2 ECO:0000259 match to InterPro signature eviden... NaN NaN ,
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession \
0 CDD cd06225
1 Gene3D G3DSA:1.10.287.950
10 ProSiteProfiles PS50111
11 SMART SM00304
12 CDD cd11386
13 ProSiteProfiles PS50885
14 SMART SM00283
2 Gene3D G3DSA:3.30.450.20
3 Pfam PF00672
4 SUPERFAMILY SSF58104
5 Gene3D G3DSA:3.30.450.20
6 Pfam PF02743
7 Coils Coil
8 Gene3D G3DSA:3.30.450.20
9 Pfam PF00015
Description Interpro Accession \
0 HAMP IPR003660
1 None None
10 Bacterial chemotaxis sensory transducers domai... IPR004089
11 None IPR003660
12 MCP_signal None
13 HAMP domain profile. IPR003660
14 None IPR004089
2 None None
3 HAMP domain IPR003660
4 None None
5 None None
6 Cache domain IPR033479
7 None None
8 None None
9 Methyl-accepting chemotaxis protein (MCP) sign... IPR004089
Interpro Description Amino Acid Start \
0 HAMP domain 383
1 None 381
10 Methyl-accepting chemotaxis protein (MCP) sign... 440
11 HAMP domain 381
12 None 477
13 HAMP domain 381
14 Methyl-accepting chemotaxis protein (MCP) sign... 450
2 None 64
3 HAMP domain 383
4 None 403
5 None 238
6 Double Cache domain 1 47
7 None 584
8 None 73
9 Methyl-accepting chemotaxis protein (MCP) sign... 495
Amino Acid Stop E-value
0 431 8.02722E-7
1 712 1.2E-78
10 676 48.239
11 435 3.0E-11
12 672 3.08124E-57
13 435 10.351
14 711 8.7E-86
2 72 2.7E-40
3 431 1.3E-8
4 712 8.89E-82
5 341 2.7E-40
6 331 1.2E-16
7 604 -
8 237 2.7E-40
9 678 2.6E-45 },
'sbw25__pflu0917': {'Gene Feature Overview': 1
Genomic location 1018094 - 1018918 (+ strand)
Locus Tag PFLU0917
Name None
Replicon chromosome
Strain Pseudomonas fluorescens SBW25,
'Cross-References': 1
Entrez 7816632
GI 229588460
RefSeq YP_002870579.1,
'Product': 1
Charge (pH 7) 1.89
Coding Frame 1
Evidence for Translation None
Feature Type CDS
Isoelectric Point (pI) 8.19
Kyte-Doolittle Hydrophobicity Value -0.234
Molecular Weight (kDa) 29338.2
Product\tName putative exported peptidase
Synonyms None,
'Subcellular localization': Confidence \
Additional evidence for subcellular localization None
Individual Mappings Localization Confidence PMID Unknown Class 3 ...
Unknown Class 3
PMID
Additional evidence for subcellular localization NaN
Individual Mappings NaN
Unknown 20472543.0 ,
'Pathogen Association Analysis': 1
Results Common Found in both pathogen and nonpathogeni...,
'Orthologs/Comparative Genomics': 1
Pseudomonas Ortholog Database View orthologs at Pseudomonas Ortholog Database
Pseudomonas Ortholog Group POG004284 (550 members)
Putative Inparalogs None Found,
'Interactions': 1
STRING database Search for predicted protein-protein interacti...,
'References': Empty DataFrame
Columns: []
Index: [],
'Gene Ontology': Ontology Accession Term \
0 Molecular Function GO:0004222 metalloendopeptidase activity
1 Biological Process GO:0006508 proteolysis
GO Evidence \
0 ISM Inferred from Sequence Model Term mapped ...
1 ISM Inferred from Sequence Model Term mapped ...
Evidence Ontology (ECO) Code Reference Comments
0 ECO:0000259 match to InterPro signature eviden... NaN NaN
1 ECO:0000259 match to InterPro signature eviden... NaN NaN ,
'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
Columns: []
Index: [],
'Functional Predictions from Interpro': Analysis Accession \
0 Pfam PF01435
1 ProSiteProfiles PS51257
2 CDD cd07331
Description Interpro Accession \
0 Peptidase family M48 IPR001915
1 Prokaryotic membrane lipoprotein lipid attachm... None
2 M48C_Oma1_like None
Interpro Description Amino Acid Start Amino Acid Stop E-value
0 Peptidase M48 75 259 9.300000e-35
1 None 1 21 6.000000e+00
2 None 80 264 6.944380e-83 }}
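In the loaded sbw25__pflu0916 table the rows appear in the order 0, 1, 10, 11, ..., 2, 3, which is what sorting gives when the row labels are strings. Whether from_json really returns string labels is an assumption based only on that ordering. If so, the original row order could be restored along these lines (loaded_table is a hypothetical four-row stand-in using values from the output above):

```python
import pandas

# Stand-in for a loaded table whose row labels are strings in lexicographic order.
loaded_table = pandas.DataFrame(
    {"Analysis": ["CDD", "Gene3D", "ProSiteProfiles", "Gene3D"]},
    index=["0", "1", "10", "2"],
)

# Convert the labels to integers, then sort numerically.
restored = loaded_table.copy()
restored.index = restored.index.astype(int)
restored = restored.sort_index()

print(restored)
```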
Example of a query with references
[19]:
results_pa = scraper.run_query(query=pdc_query(strain='UCBPP-PA14', feature='PA14_67210'))
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780&view=functions .
Display references with proper HTML links
[23]:
results_pa["UCBPP-PA14__PA14_67210"]['References'].style.format({'pubmed_url': lambda x: '<a href="{0}">link</a>'.format(x)})
[23]:
 | citation | pubmed_url |
---|---|---|
0 | Allsopp LP, Wood TE, Howard SA, Maggiorelli F, Nolan LM, et al. (2017). RsmA and AmrZ orchestrate the assembly of all three type VI secretion systems in Pseudomonas aeruginosa. Proc Natl Acad Sci U S A 114(29): 7707-7712. | link |
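The formatter passed to style.format is just a function from a cell value to a string. A slightly more defensive variant quotes the href attribute and escapes characters such as & in the URL. The helper as_link below is hypothetical, and the URL is a placeholder for illustration, not one returned by the query.

```python
from html import escape


def as_link(url, text="link"):
    """Return an HTML anchor with a quoted, escaped href (hypothetical helper)."""
    return '<a href="{0}">{1}</a>'.format(escape(url, quote=True), escape(text))


# Placeholder URL for illustration only; not taken from the query results.
example = as_link("https://example.org/item?id=1&view=full")
print(example)
```

Such a function could be used in place of the lambda, for example as the value for the 'pubmed_url' key in the dictionary given to style.format.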