Example notebook showcasing the use of the PseudomonasDotComScraper as a programmatic interface to the pseudomonas.com database

Load the required Python modules

[2]:
# The scraper
from GenDBScraper.PseudomonasDotComScraper import PseudomonasDotComScraper as scraper

# The query object (derived from collections.namedtuple)
from GenDBScraper.PseudomonasDotComScraper import pdc_query

# Regular expressions
import re

# pandas DataFrame, the workhorse data structure
import pandas

Setting things up

[3]:
# We want to get data for three adjacent genes, pflu0915, pflu0916, pflu0917
queries = [pdc_query(strain='sbw25',feature=feat) for feat in ['pflu0915', 'pflu0916', 'pflu0917']]
[4]:
# Set up the scraper
scraper = scraper(query=queries)
[5]:
# Connect to the database
scraper.connect()
INFO: Good response from https://www.pseudomonas.com .

Retrieve the data

[6]:
results = scraper.run_query()
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0915&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459887&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0916&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459889&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=pflu0917&e1=1&term1=sbw25&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1459891&view=functions .
WARNING: No data found for 'Functional Classifications Manually Assigned by PseudoCAP'. Will return empty pandas.DataFrame.

Display the data

[7]:
results
[7]:
{'sbw25__pflu0915': {'Gene Feature Overview':                                               1
  0
  Strain            Pseudomonas fluorescens SBW25
  Locus Tag                              PFLU0915
  Name                                        NaN
  Replicon                             chromosome
  Genomic location  1015020  - 1015640 (+ strand),
  'Cross-References':                      1
  0
  RefSeq  YP_002870577.1
  GI           229588458
  Entrez         7816630,
  'Product':                                                         1
  0
  Feature Type                                          CDS
  Coding Frame                                            1
  Product\tName                        hypothetical protein
  Synonyms                                              NaN
  Evidence for Translation                              NaN
  Charge (pH 7)                                        2.14
  Kyte-Doolittle Hydrophobicity Value                -0.358
  Molecular Weight (kDa)                               23.2
  Isoelectric Point (pI)                               8.48,
  'Subcellular localization':                                                                                          Confidence  \
  Localization
  Unknown                                                                                     Class 3
  Individual Mappings                               Localization Confidence PMID  Unknown Class 3 ...
  Additional evidence for subcellular localization                                                NaN

                                                          PMID
  Localization
  Unknown                                           20472543.0
  Individual Mappings                                      NaN
  Additional evidence for subcellular localization         NaN  ,
  'Pathogen Association Analysis':                                                          1
  0
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  0
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                             POG004285 (549 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  0
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology': Empty DataFrame
  Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]
  Index: [],
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':       Analysis           Accession                               Description  \
  0       Gene3D  G3DSA:3.90.1680.10                                       NaN
  1  SUPERFAMILY           SSF143081                                       NaN
  2         Pfam             PF02586  SOS response associated peptidase (SRAP)

    Interpro Accession                      Interpro Description  \
  0          IPR036590    SOS response associated peptidase-like
  1          IPR036590    SOS response associated peptidase-like
  2          IPR003738  SOS response associated peptidase (SRAP)

     Amino Acid Start  Amino Acid Stop       E-value
  0                 1              206  3.400000e-46
  1                 2              205  1.090000e-47
  2                 1              192  1.600000e-38  },
 'sbw25__pflu0916': {'Gene Feature Overview':                                               1
  0
  Strain            Pseudomonas fluorescens SBW25
  Locus Tag                              PFLU0916
  Name                                        NaN
  Replicon                             chromosome
  Genomic location  1015719  - 1017857 (- strand),
  'Cross-References':                               1
  0
  RefSeq           YP_002870578.1
  GI                    229588459
  Entrez                  7816631
  INSDC                CAY47182.1
  UniParc           UPI00019D9DE0
  UniProtKB Acc            C3KBK5
  UniProtKB ID       C3KBK5_PSEFS
  UniRef100      UniRef100_C3KBK5
  UniRef50        UniRef50_Q4K6C9
  UniRef90        UniRef90_C3KBK5,
  'Product':                                                                                 1
  0
  Feature Type                                                                  CDS
  Coding Frame                                                                    1
  Product\tName                        putative methyl-accepting chemotaxis protein
  Synonyms                                                                      NaN
  Evidence for Translation                                                      NaN
  Charge (pH 7)                                                              -20.37
  Kyte-Doolittle Hydrophobicity Value                                        -0.035
  Molecular Weight (kDa)                                                       76.4
  Isoelectric Point (pI)                                                       4.78,
  'Subcellular localization':                                                                                          Confidence  \
  Localization
  Cytoplasmic Membrane                                                                        Class 3
  Individual Mappings                               Localization Confidence PMID  Cytoplasmic Memb...
  Additional evidence for subcellular localization                                                NaN

                                                          PMID
  Localization
  Cytoplasmic Membrane                              20472543.0
  Individual Mappings                                      NaN
  Additional evidence for subcellular localization         NaN  ,
  'Pathogen Association Analysis':                                                          1
  0
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  0
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                            POG002657 (1824 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  0
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology':              Ontology   Accession                            Term  \
  0  Biological Process  GO:0007165             signal transduction
  1  Cellular Component  GO:0016021  integral component of membrane
  2  Cellular Component  GO:0016020                        membrane

                                           GO Evidence  \
  0  ISM Inferred from Sequence Model  Term mapped ...
  1  ISM Inferred from Sequence Model  Term mapped ...
  2  ISM Inferred from Sequence Model  Term mapped ...

                          Evidence Ontology (ECO) Code  Reference  Comments
  0  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  1  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  2  ECO:0000259 match to InterPro signature eviden...        NaN       NaN  ,
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':            Analysis           Accession  \
  0               CDD             cd06225
  1            Gene3D  G3DSA:1.10.287.950
  2            Gene3D   G3DSA:3.30.450.20
  3              Pfam             PF00672
  4       SUPERFAMILY            SSF58104
  5            Gene3D   G3DSA:3.30.450.20
  6              Pfam             PF02743
  7             Coils                Coil
  8            Gene3D   G3DSA:3.30.450.20
  9              Pfam             PF00015
  10  ProSiteProfiles             PS50111
  11            SMART             SM00304
  12              CDD             cd11386
  13  ProSiteProfiles             PS50885
  14            SMART             SM00283

                                            Description Interpro Accession  \
  0                                                HAMP          IPR003660
  1                                                 NaN                NaN
  2                                                 NaN                NaN
  3                                         HAMP domain          IPR003660
  4                                                 NaN                NaN
  5                                                 NaN                NaN
  6                                        Cache domain          IPR033479
  7                                                 NaN                NaN
  8                                                 NaN                NaN
  9   Methyl-accepting chemotaxis protein (MCP) sign...          IPR004089
  10  Bacterial chemotaxis sensory transducers domai...          IPR004089
  11                                                NaN          IPR003660
  12                                         MCP_signal                NaN
  13                               HAMP domain profile.          IPR003660
  14                                                NaN          IPR004089

                                   Interpro Description  Amino Acid Start  \
  0                                         HAMP domain               383
  1                                                 NaN               381
  2                                                 NaN                64
  3                                         HAMP domain               383
  4                                                 NaN               403
  5                                                 NaN               238
  6                               Double Cache domain 1                47
  7                                                 NaN               584
  8                                                 NaN                73
  9   Methyl-accepting chemotaxis protein (MCP) sign...               495
  10  Methyl-accepting chemotaxis protein (MCP) sign...               440
  11                                        HAMP domain               381
  12                                                NaN               477
  13                                        HAMP domain               381
  14  Methyl-accepting chemotaxis protein (MCP) sign...               450

      Amino Acid Stop      E-value
  0               431   8.02722E-7
  1               712      1.2E-78
  2                72      2.7E-40
  3               431       1.3E-8
  4               712     8.89E-82
  5               341      2.7E-40
  6               331      1.2E-16
  7               604            -
  8               237      2.7E-40
  9               678      2.6E-45
  10              676       48.239
  11              435      3.0E-11
  12              672  3.08124E-57
  13              435       10.351
  14              711      8.7E-86  },
 'sbw25__pflu0917': {'Gene Feature Overview':                                               1
  0
  Strain            Pseudomonas fluorescens SBW25
  Locus Tag                              PFLU0917
  Name                                        NaN
  Replicon                             chromosome
  Genomic location  1018094  - 1018918 (+ strand),
  'Cross-References':                      1
  0
  RefSeq  YP_002870579.1
  GI           229588460
  Entrez         7816632,
  'Product':                                                                1
  0
  Feature Type                                                 CDS
  Coding Frame                                                   1
  Product\tName                        putative exported peptidase
  Synonyms                                                     NaN
  Evidence for Translation                                     NaN
  Charge (pH 7)                                               1.89
  Kyte-Doolittle Hydrophobicity Value                       -0.234
  Molecular Weight (kDa)                                   29338.2
  Isoelectric Point (pI)                                      8.19,
  'Subcellular localization':                                                                                          Confidence  \
  Localization
  Unknown                                                                                     Class 3
  Individual Mappings                               Localization Confidence PMID  Unknown Class 3 ...
  Additional evidence for subcellular localization                                                NaN

                                                          PMID
  Localization
  Unknown                                           20472543.0
  Individual Mappings                                      NaN
  Additional evidence for subcellular localization         NaN  ,
  'Pathogen Association Analysis':                                                          1
  0
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  0
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                             POG004284 (550 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  0
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology':              Ontology   Accession                           Term  \
  0  Molecular Function  GO:0004222  metalloendopeptidase activity
  1  Biological Process  GO:0006508                    proteolysis

                                           GO Evidence  \
  0  ISM Inferred from Sequence Model  Term mapped ...
  1  ISM Inferred from Sequence Model  Term mapped ...

                          Evidence Ontology (ECO) Code  Reference  Comments
  0  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  1  ECO:0000259 match to InterPro signature eviden...        NaN       NaN  ,
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':           Analysis Accession  \
  0             Pfam   PF01435
  1  ProSiteProfiles   PS51257
  2              CDD   cd07331

                                           Description Interpro Accession  \
  0                               Peptidase family M48          IPR001915
  1  Prokaryotic membrane lipoprotein lipid attachm...                NaN
  2                                     M48C_Oma1_like                NaN

    Interpro Description  Amino Acid Start  Amino Acid Stop       E-value
  0        Peptidase M48                75              259  9.300000e-35
  1                  NaN                 1               21  6.000000e+00
  2                  NaN                80              264  6.944380e-83  }}
The results object is a two-level nested dictionary:

results
|
+--sbw25__pflu0915
|  |
|  +--Gene Feature Overview
|  +--Cross-References
|  +--Orthologs/Comparative Genomics
|  .
|  .
|  .
+--sbw25__pflu0916
|  |
|  +--Gene Feature Overview
|  +--Cross-References
|  +--Orthologs/Comparative Genomics
|  .
|  .
|  .
+--sbw25__pflu0917
   |
   +--Gene Feature Overview
   +--Cross-References
   +--Orthologs/Comparative Genomics
   .
   .
   .

The lowest level of the hierarchy ("Gene Feature Overview",
"Cross-References", etc.) holds the data tables downloaded from
pseudomonas.com. Each table is an instance of the pandas.DataFrame
class, a versatile data structure that supports operations such as
slicing and selection based on values and ranges.
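
The operations mentioned above can be sketched on a small hand-built table. The DataFrame below is a stand-in that copies a few columns of the pflu0915 Interpro predictions shown in the output above; the variable names are chosen for illustration only.

```python
import pandas

# Stand-in for results['sbw25__pflu0915']['Functional Predictions from Interpro'],
# reduced to a few columns; values are copied from the output above.
interpro = pandas.DataFrame({
    "Analysis": ["Gene3D", "SUPERFAMILY", "Pfam"],
    "Accession": ["G3DSA:3.90.1680.10", "SSF143081", "PF02586"],
    "Amino Acid Start": [1, 2, 1],
    "Amino Acid Stop": [206, 205, 192],
    "E-value": [3.4e-46, 1.09e-47, 1.6e-38],
})

# Slicing: rows 0 to 1 (label-based, inclusive) and two columns.
first_two = interpro.loc[0:1, ["Analysis", "Accession"]]

# Selection based on a value: keep only the Pfam rows.
pfam_only = interpro[interpro["Analysis"] == "Pfam"]

# Selection based on a range: hits that end between residues 200 and 210.
long_hits = interpro[interpro["Amino Acid Stop"].between(200, 210)]

print(first_two)
print(pfam_only)
print(long_hits)
```

The same expressions should apply unchanged to any table taken from the results dictionary, provided the column names match.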

List all keys in the results dict

[8]:
[k for k in results.keys()]
[8]:
['sbw25__pflu0915', 'sbw25__pflu0916', 'sbw25__pflu0917']
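
The keys listed above appear to follow a strain__feature pattern. The sketch below reproduces that pattern with a stand-in namedtuple. `Query` and `result_key` are hypothetical names introduced for illustration, and the pattern is inferred from the output, not taken from the scraper's source.

```python
from collections import namedtuple

# Hypothetical stand-in for pdc_query; the real class may define more fields.
Query = namedtuple("Query", ["strain", "feature"])

queries = [Query(strain="sbw25", feature=feat)
           for feat in ["pflu0915", "pflu0916", "pflu0917"]]


def result_key(query):
    """Join strain and feature with a double underscore (assumed pattern)."""
    return "{}__{}".format(query.strain, query.feature)


keys = [result_key(q) for q in queries]
print(keys)
```

Building keys this way makes it possible to look up a gene's tables directly from the query that produced them, for example `results[result_key(queries[0])]`.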

Get the data for one queried gene

[9]:
pflu0915_data = results['sbw25__pflu0915']
[10]:
# List all keys in the first gene.
[k for k in pflu0915_data]
[10]:
['Gene Feature Overview',
 'Cross-References',
 'Product',
 'Subcellular localization',
 'Pathogen Association Analysis',
 'Orthologs/Comparative Genomics',
 'Interactions',
 'References',
 'Gene Ontology',
 'Functional Classifications Manually Assigned by PseudoCAP',
 'Functional Predictions from Interpro']
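
Some of these tables can be empty, as the warnings during the query indicate. A quick way to see which tables carry data is to check each DataFrame's `empty` attribute and `shape`. The dictionary below is a small stand-in for `pflu0915_data`, built by hand for illustration.

```python
import pandas

# Stand-in for the per-gene dictionary of tables (not scraped data).
gene_data = {
    "Cross-References": pandas.DataFrame(
        {1: ["YP_002870577.1", "229588458", "7816630"]},
        index=["RefSeq", "GI", "Entrez"],
    ),
    "References": pandas.DataFrame(),
    "Functional Classifications Manually Assigned by PseudoCAP": pandas.DataFrame(),
}

# Names of the tables that contain no rows.
empty_tables = [name for name, table in gene_data.items() if table.empty]

# (rows, columns) for every table.
shapes = {name: table.shape for name, table in gene_data.items()}

print(empty_tables)
print(shapes)
```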

Display one table

[11]:
# Display the functional predictions from Interpro.
display(pflu0915_data['Functional Predictions from Interpro'])
      Analysis           Accession                               Description  \
0       Gene3D  G3DSA:3.90.1680.10                                       NaN
1  SUPERFAMILY           SSF143081                                       NaN
2         Pfam             PF02586  SOS response associated peptidase (SRAP)

  Interpro Accession                      Interpro Description  \
0          IPR036590    SOS response associated peptidase-like
1          IPR036590    SOS response associated peptidase-like
2          IPR003738  SOS response associated peptidase (SRAP)

   Amino Acid Start  Amino Acid Stop       E-value
0                 1              206  3.400000e-46
1                 2              205  1.090000e-47
2                 1              192  1.600000e-38

Display a given table for all three genes

[12]:
# Display functional predictions from all three genes.
for f in results.keys():
    print("\n\n")
    print(f)
    display(results[f]['Functional Predictions from Interpro'])



sbw25__pflu0915
      Analysis           Accession                               Description  \
0       Gene3D  G3DSA:3.90.1680.10                                       NaN
1  SUPERFAMILY           SSF143081                                       NaN
2         Pfam             PF02586  SOS response associated peptidase (SRAP)

  Interpro Accession                      Interpro Description  \
0          IPR036590    SOS response associated peptidase-like
1          IPR036590    SOS response associated peptidase-like
2          IPR003738  SOS response associated peptidase (SRAP)

   Amino Acid Start  Amino Acid Stop       E-value
0                 1              206  3.400000e-46
1                 2              205  1.090000e-47
2                 1              192  1.600000e-38



sbw25__pflu0916
           Analysis           Accession  \
0               CDD             cd06225
1            Gene3D  G3DSA:1.10.287.950
2            Gene3D   G3DSA:3.30.450.20
3              Pfam             PF00672
4       SUPERFAMILY            SSF58104
5            Gene3D   G3DSA:3.30.450.20
6              Pfam             PF02743
7             Coils                Coil
8            Gene3D   G3DSA:3.30.450.20
9              Pfam             PF00015
10  ProSiteProfiles             PS50111
11            SMART             SM00304
12              CDD             cd11386
13  ProSiteProfiles             PS50885
14            SMART             SM00283

                                          Description Interpro Accession  \
0                                                HAMP          IPR003660
1                                                 NaN                NaN
2                                                 NaN                NaN
3                                         HAMP domain          IPR003660
4                                                 NaN                NaN
5                                                 NaN                NaN
6                                        Cache domain          IPR033479
7                                                 NaN                NaN
8                                                 NaN                NaN
9   Methyl-accepting chemotaxis protein (MCP) sign...          IPR004089
10  Bacterial chemotaxis sensory transducers domai...          IPR004089
11                                                NaN          IPR003660
12                                         MCP_signal                NaN
13                               HAMP domain profile.          IPR003660
14                                                NaN          IPR004089

                                 Interpro Description  Amino Acid Start  \
0                                         HAMP domain               383
1                                                 NaN               381
2                                                 NaN                64
3                                         HAMP domain               383
4                                                 NaN               403
5                                                 NaN               238
6                               Double Cache domain 1                47
7                                                 NaN               584
8                                                 NaN                73
9   Methyl-accepting chemotaxis protein (MCP) sign...               495
10  Methyl-accepting chemotaxis protein (MCP) sign...               440
11                                        HAMP domain               381
12                                                NaN               477
13                                        HAMP domain               381
14  Methyl-accepting chemotaxis protein (MCP) sign...               450

    Amino Acid Stop      E-value
0               431   8.02722E-7
1               712      1.2E-78
2                72      2.7E-40
3               431       1.3E-8
4               712     8.89E-82
5               341      2.7E-40
6               331      1.2E-16
7               604            -
8               237      2.7E-40
9               678      2.6E-45
10              676       48.239
11              435      3.0E-11
12              672  3.08124E-57
13              435       10.351
14              711      8.7E-86



sbw25__pflu0917
          Analysis Accession  \
0             Pfam   PF01435
1  ProSiteProfiles   PS51257
2              CDD   cd07331

                                         Description Interpro Accession  \
0                               Peptidase family M48          IPR001915
1  Prokaryotic membrane lipoprotein lipid attachm...                NaN
2                                     M48C_Oma1_like                NaN

  Interpro Description  Amino Acid Start  Amino Acid Stop       E-value
0        Peptidase M48                75              259  9.300000e-35
1                  NaN                 1               21  6.000000e+00
2                  NaN                80              264  6.944380e-83

Select all rows with a given value in one column

[13]:
# Display all Pfam analysis
[14]:
# Temporary list of data
tmp = []

# Iterate all three results.
for q,r in results.items():
    # Take the functional predictions
    f = r['Functional Predictions from Interpro']
    # Select only rows where Analysis is "Pfam". Copy the selection so that
    # the insert below works on an independent DataFrame.
    pfam = f[f['Analysis'] == 'Pfam'].copy()
    # Add a column to denote the gene (the scalar is broadcast to all rows).
    pfam.insert(0, column="Feature", value=q)

    # Append to the temporary holder.
    tmp.append(pfam)

# Concatenate into one pandas DataFrame
tmp = pandas.concat(tmp)
[15]:
display(tmp)
           Feature  Analysis  Accession  \
2  sbw25__pflu0915      Pfam    PF02586
3  sbw25__pflu0916      Pfam    PF00672
6  sbw25__pflu0916      Pfam    PF02743
9  sbw25__pflu0916      Pfam    PF00015
0  sbw25__pflu0917      Pfam    PF01435

                                          Description  Interpro Accession  \
2            SOS response associated peptidase (SRAP)           IPR003738
3                                         HAMP domain           IPR003660
6                                        Cache domain           IPR033479
9  Methyl-accepting chemotaxis protein (MCP) sign...           IPR004089
0                                Peptidase family M48           IPR001915

                                 Interpro Description  Amino Acid Start  \
2            SOS response associated peptidase (SRAP)                 1
3                                         HAMP domain               383
6                               Double Cache domain 1                47
9  Methyl-accepting chemotaxis protein (MCP) sign...               495
0                                       Peptidase M48                75

   Amino Acid Stop  E-value
2              192  1.6e-38
3              431   1.3E-8
6              331  1.2E-16
9              678  2.6E-45
0              259  9.3e-35
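
As an alternative to inserting the gene column by hand, `pandas.concat` can label each piece when it is given a dictionary. The sketch below uses two small stand-in tables in place of the scraped results; the level names "Feature" and "Row" are chosen here for illustration.

```python
import pandas

# Stand-in tables with only two columns; accessions copied from the output above.
tables = {
    "sbw25__pflu0915": pandas.DataFrame({
        "Analysis": ["Gene3D", "SUPERFAMILY", "Pfam"],
        "Accession": ["G3DSA:3.90.1680.10", "SSF143081", "PF02586"],
    }),
    "sbw25__pflu0917": pandas.DataFrame({
        "Analysis": ["Pfam", "ProSiteProfiles", "CDD"],
        "Accession": ["PF01435", "PS51257", "cd07331"],
    }),
}

# Keep only the Pfam rows of each table.
pfam_parts = {gene: t[t["Analysis"] == "Pfam"] for gene, t in tables.items()}

# The dictionary keys become the outer index level, which is then moved
# into a regular column named "Feature".
combined = pandas.concat(pfam_parts, names=["Feature", "Row"])
combined = combined.reset_index(level="Feature")

print(combined)
```

The original row labels are kept as the index, so each row can still be traced back to its position in the source table.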

Save to disk

[16]:
scraper.to_json(results, outfile="sbw25.json")
[16]:
'sbw25.json'
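
For readers curious what a JSON round trip does to a table, here is a minimal sketch for a single DataFrame using pandas' own `to_json`. It is not the scraper's implementation, which may serialize differently. One general point it illustrates is that missing values are written as JSON `null`.

```python
import json

import pandas

# Small stand-in table with one missing description.
table = pandas.DataFrame({
    "Analysis": ["Gene3D", "Pfam"],
    "Description": [None, "SOS response associated peptidase (SRAP)"],
    "Amino Acid Stop": [206, 192],
})

# The 'split' orientation stores columns, index and data separately,
# so row and column order survive the round trip.
text = table.to_json(orient="split")
payload = json.loads(text)

# Rebuild the DataFrame from the three parts.
restored = pandas.DataFrame(
    data=payload["data"],
    index=payload["index"],
    columns=payload["columns"],
)

print(text)
print(restored)
```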

Read results from disk

[17]:
loaded = scraper.from_json('sbw25.json')
[18]:
loaded
[18]:
{'sbw25__pflu0915': {'Gene Feature Overview':                                               1
  Genomic location  1015020  - 1015640 (+ strand)
  Locus Tag                              PFLU0915
  Name                                       None
  Replicon                             chromosome
  Strain            Pseudomonas fluorescens SBW25,
  'Cross-References':                      1
  Entrez         7816630
  GI           229588458
  RefSeq  YP_002870577.1,
  'Product':                                                         1
  Charge (pH 7)                                        2.14
  Coding Frame                                            1
  Evidence for Translation                             None
  Feature Type                                          CDS
  Isoelectric Point (pI)                               8.48
  Kyte-Doolittle Hydrophobicity Value                -0.358
  Molecular Weight (kDa)                               23.2
  Product\tName                        hypothetical protein
  Synonyms                                             None,
  'Subcellular localization':                                                                                          Confidence  \
  Additional evidence for subcellular localization                                               None
  Individual Mappings                               Localization Confidence PMID  Unknown Class 3 ...
  Unknown                                                                                     Class 3

                                                          PMID
  Additional evidence for subcellular localization         NaN
  Individual Mappings                                      NaN
  Unknown                                           20472543.0  ,
  'Pathogen Association Analysis':                                                          1
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                             POG004285 (549 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology': Empty DataFrame
  Columns: [Ontology, Accession, Term, GO Evidence, Evidence Ontology (ECO) Code, Reference, Comments]
  Index: [],
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':       Analysis           Accession                               Description  \
  0       Gene3D  G3DSA:3.90.1680.10                                      None
  1  SUPERFAMILY           SSF143081                                      None
  2         Pfam             PF02586  SOS response associated peptidase (SRAP)

    Interpro Accession                      Interpro Description  \
  0          IPR036590    SOS response associated peptidase-like
  1          IPR036590    SOS response associated peptidase-like
  2          IPR003738  SOS response associated peptidase (SRAP)

     Amino Acid Start  Amino Acid Stop       E-value
  0                 1              206  3.400000e-46
  1                 2              205  1.090000e-47
  2                 1              192  1.600000e-38  },
 'sbw25__pflu0916': {'Gene Feature Overview':                                               1
  Genomic location  1015719  - 1017857 (- strand)
  Locus Tag                              PFLU0916
  Name                                       None
  Replicon                             chromosome
  Strain            Pseudomonas fluorescens SBW25,
  'Cross-References':                               1
  Entrez                  7816631
  GI                    229588459
  INSDC                CAY47182.1
  RefSeq           YP_002870578.1
  UniParc           UPI00019D9DE0
  UniProtKB Acc            C3KBK5
  UniProtKB ID       C3KBK5_PSEFS
  UniRef100      UniRef100_C3KBK5
  UniRef50        UniRef50_Q4K6C9
  UniRef90        UniRef90_C3KBK5,
  'Product':                                                                                 1
  Charge (pH 7)                                                              -20.37
  Coding Frame                                                                    1
  Evidence for Translation                                                     None
  Feature Type                                                                  CDS
  Isoelectric Point (pI)                                                       4.78
  Kyte-Doolittle Hydrophobicity Value                                        -0.035
  Molecular Weight (kDa)                                                       76.4
  Product\tName                        putative methyl-accepting chemotaxis protein
  Synonyms                                                                     None,
  'Subcellular localization':                                                                                          Confidence  \
  Additional evidence for subcellular localization                                               None
  Cytoplasmic Membrane                                                                        Class 3
  Individual Mappings                               Localization Confidence PMID  Cytoplasmic Memb...

                                                          PMID
  Additional evidence for subcellular localization         NaN
  Cytoplasmic Membrane                              20472543.0
  Individual Mappings                                      NaN  ,
  'Pathogen Association Analysis':                                                          1
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                            POG002657 (1824 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology':              Ontology   Accession                            Term  \
  0  Biological Process  GO:0007165             signal transduction
  1  Cellular Component  GO:0016021  integral component of membrane
  2  Cellular Component  GO:0016020                        membrane

                                           GO Evidence  \
  0  ISM Inferred from Sequence Model  Term mapped ...
  1  ISM Inferred from Sequence Model  Term mapped ...
  2  ISM Inferred from Sequence Model  Term mapped ...

                          Evidence Ontology (ECO) Code  Reference  Comments
  0  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  1  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  2  ECO:0000259 match to InterPro signature eviden...        NaN       NaN  ,
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':            Analysis           Accession  \
  0               CDD             cd06225
  1            Gene3D  G3DSA:1.10.287.950
  10  ProSiteProfiles             PS50111
  11            SMART             SM00304
  12              CDD             cd11386
  13  ProSiteProfiles             PS50885
  14            SMART             SM00283
  2            Gene3D   G3DSA:3.30.450.20
  3              Pfam             PF00672
  4       SUPERFAMILY            SSF58104
  5            Gene3D   G3DSA:3.30.450.20
  6              Pfam             PF02743
  7             Coils                Coil
  8            Gene3D   G3DSA:3.30.450.20
  9              Pfam             PF00015

                                            Description Interpro Accession  \
  0                                                HAMP          IPR003660
  1                                                None               None
  10  Bacterial chemotaxis sensory transducers domai...          IPR004089
  11                                               None          IPR003660
  12                                         MCP_signal               None
  13                               HAMP domain profile.          IPR003660
  14                                               None          IPR004089
  2                                                None               None
  3                                         HAMP domain          IPR003660
  4                                                None               None
  5                                                None               None
  6                                        Cache domain          IPR033479
  7                                                None               None
  8                                                None               None
  9   Methyl-accepting chemotaxis protein (MCP) sign...          IPR004089

                                   Interpro Description  Amino Acid Start  \
  0                                         HAMP domain               383
  1                                                None               381
  10  Methyl-accepting chemotaxis protein (MCP) sign...               440
  11                                        HAMP domain               381
  12                                               None               477
  13                                        HAMP domain               381
  14  Methyl-accepting chemotaxis protein (MCP) sign...               450
  2                                                None                64
  3                                         HAMP domain               383
  4                                                None               403
  5                                                None               238
  6                               Double Cache domain 1                47
  7                                                None               584
  8                                                None                73
  9   Methyl-accepting chemotaxis protein (MCP) sign...               495

      Amino Acid Stop      E-value
  0               431   8.02722E-7
  1               712      1.2E-78
  10              676       48.239
  11              435      3.0E-11
  12              672  3.08124E-57
  13              435       10.351
  14              711      8.7E-86
  2                72      2.7E-40
  3               431       1.3E-8
  4               712     8.89E-82
  5               341      2.7E-40
  6               331      1.2E-16
  7               604            -
  8               237      2.7E-40
  9               678      2.6E-45  },
 'sbw25__pflu0917': {'Gene Feature Overview':                                               1
  Genomic location  1018094  - 1018918 (+ strand)
  Locus Tag                              PFLU0917
  Name                                       None
  Replicon                             chromosome
  Strain            Pseudomonas fluorescens SBW25,
  'Cross-References':                      1
  Entrez         7816632
  GI           229588460
  RefSeq  YP_002870579.1,
  'Product':                                                                1
  Charge (pH 7)                                               1.89
  Coding Frame                                                   1
  Evidence for Translation                                    None
  Feature Type                                                 CDS
  Isoelectric Point (pI)                                      8.19
  Kyte-Doolittle Hydrophobicity Value                       -0.234
  Molecular Weight (kDa)                                   29338.2
  Product\tName                        putative exported peptidase
  Synonyms                                                    None,
  'Subcellular localization':                                                                                          Confidence  \
  Additional evidence for subcellular localization                                               None
  Individual Mappings                               Localization Confidence PMID  Unknown Class 3 ...
  Unknown                                                                                     Class 3

                                                          PMID
  Additional evidence for subcellular localization         NaN
  Individual Mappings                                      NaN
  Unknown                                           20472543.0  ,
  'Pathogen Association Analysis':                                                          1
  Results  Common Found in both pathogen and nonpathogeni...,
  'Orthologs/Comparative Genomics':                                                                              1
  Pseudomonas Ortholog Database  View orthologs at Pseudomonas Ortholog Database
  Pseudomonas Ortholog Group                             POG004284 (550 members)
  Putative Inparalogs                                                 None Found,
  'Interactions':                                                                  1
  STRING database  Search for predicted protein-protein interacti...,
  'References': Empty DataFrame
  Columns: []
  Index: [],
  'Gene Ontology':              Ontology   Accession                           Term  \
  0  Molecular Function  GO:0004222  metalloendopeptidase activity
  1  Biological Process  GO:0006508                    proteolysis

                                           GO Evidence  \
  0  ISM Inferred from Sequence Model  Term mapped ...
  1  ISM Inferred from Sequence Model  Term mapped ...

                          Evidence Ontology (ECO) Code  Reference  Comments
  0  ECO:0000259 match to InterPro signature eviden...        NaN       NaN
  1  ECO:0000259 match to InterPro signature eviden...        NaN       NaN  ,
  'Functional Classifications Manually Assigned by PseudoCAP': Empty DataFrame
  Columns: []
  Index: [],
  'Functional Predictions from Interpro':           Analysis Accession  \
  0             Pfam   PF01435
  1  ProSiteProfiles   PS51257
  2              CDD   cd07331

                                           Description Interpro Accession  \
  0                               Peptidase family M48          IPR001915
  1  Prokaryotic membrane lipoprotein lipid attachm...               None
  2                                     M48C_Oma1_like               None

    Interpro Description  Amino Acid Start  Amino Acid Stop       E-value
  0        Peptidase M48                75              259  9.300000e-35
  1                 None                 1               21  6.000000e+00
  2                 None                80              264  6.944380e-83  }}
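
Tidying the Interpro table (sketch)

In the 'Functional Predictions from Interpro' table printed for pflu0916, the row labels appear in the order 0, 1, 10, 11, ..., 2, 3, which suggests they are strings sorted lexicographically, and the E-value column contains a '-' entry next to numeric strings. The following is a minimal sketch of one way to clean such a table, assuming string row labels and a column named 'E-value' as printed above. The helper name `tidy_interpro` and the small stand-in table are hypothetical and not part of GenDBScraper.

```python
import pandas

def tidy_interpro(table):
    """Return a copy sorted by numeric row label, with 'E-value' as float.

    Hypothetical helper for illustration; not part of GenDBScraper.
    """
    tidy = table.copy()
    # String labels sort '10' before '2'; cast to int before sorting.
    tidy.index = tidy.index.astype(int)
    tidy = tidy.sort_index()
    # Entries that are not numbers (such as '-') become NaN.
    tidy["E-value"] = pandas.to_numeric(tidy["E-value"], errors="coerce")
    return tidy

# Stand-in table mimicking a few rows of the printed output (not scraped data).
demo = pandas.DataFrame(
    {
        "Analysis": ["CDD", "ProSiteProfiles", "Gene3D", "Coils"],
        "E-value": ["8.02722E-7", "48.239", "2.7E-40", "-"],
    },
    index=["0", "10", "2", "7"],
)

tidy_demo = tidy_interpro(demo)
print(tidy_demo)
```

Assuming the dictionary printed above is bound to `results`, the same helper could be applied as `tidy_interpro(results['sbw25__pflu0916']['Functional Predictions from Interpro'])`.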

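Parsing the genomic location (sketch)

The 'Gene Feature Overview' table stores the genomic location as a single string, for example "1018094  - 1018918 (+ strand)" for pflu0917. A regular expression can split it into numeric coordinates and strand. This is a minimal sketch: the function name `parse_location` is hypothetical, and the pattern assumes that minus-strand entries follow the same layout with "(- strand)", which the output above does not confirm.

```python
import re

# Matches strings such as "1018094  - 1018918 (+ strand)".
# Assumption: minus-strand features are written as "(- strand)".
LOCATION_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s*\(([+-])\s*strand\)")

def parse_location(text):
    """Split a genomic location string into start, stop, strand and length."""
    match = LOCATION_RE.search(text)
    if match is None:
        raise ValueError(f"Unrecognized genomic location: {text!r}")
    start, stop = int(match.group(1)), int(match.group(2))
    return {
        "start": start,
        "stop": stop,
        "strand": match.group(3),
        # Coordinates are treated as inclusive on both ends.
        "length": stop - start + 1,
    }

location = parse_location("1018094  - 1018918 (+ strand)")
print(location)
```

Assuming the dictionary printed above is bound to `results`, the input string could be read with `results['sbw25__pflu0917']['Gene Feature Overview'].loc['Genomic location'].iloc[0]`, which avoids depending on the exact column label.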
Example of a query that returns references

[19]:
results_pa = scraper.run_query(query=pdc_query(strain='UCBPP-PA14', feature='PA14_67210'))
DEBUG: Will now open https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/primarySequenceFeature/list?c1=name&v1=PA14_67210&e1=1&term1=UCBPP-PA14&assembly=complete .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780 .
INFO: Good response from https://www.pseudomonas.com/feature/show?id=1661780&view=functions .