<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">IPR0220 - InterPepRank set</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-14222692-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.14222692</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.14222692">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">IPR0220 - InterPepRank set</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-14222692-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.14222692</IDNo>
      </titlStmt>
      <rspStmt />
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2021-04-26" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2021-04-26" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.14222692">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">Peptide-protein interactions between a smaller or disordered peptide stretch and a folded receptor make up a large part of all protein-protein interactions. A common approach for modelling such interactions is to exhaustively sample the conformational space by fast Fourier transform docking and then refine a top percentage of decoys. Methods capable of ranking the decoys for selection quickly enough for larger-scale studies commonly rely on first-principle energy terms such as electrostatics and van der Waals forces, or on pre-calculated statistical pairwise potentials.

InterPepRank is a machine-learning-based method for peptide-protein complex scoring and ranking. It encodes the structure of the complex as a graph, with physical pairwise interactions as edges and evolutionary and sequence features as nodes. The graph network is trained to predict the ligand root-mean-square deviation (LRMSD) of decoys by using edge-conditioned graph convolutions on a large set of peptide-protein complex decoys.

Here we present the complete dataset used to train InterPepRank. The set contains 679 receptor-peptide pairs; for each pair, 50 different peptide conformations are docked using 70,000 different rotations, giving 2.5 billion conformations in total. This is too large to be distributed as flat files. Instead, the dataset is distributed as a set of ft-files describing which rotations and translations to apply to the corresponding peptide ligands to generate decoy poses docked to the receptor structures. The apply_ftresult_improved.py script is provided to generate these structures.

In addition, the dataset contains a set of apo and holo models that were used to benchmark unbound docking.

All files and scripts are given as-is with no warranty.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor.</restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör.</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>