<codeBook xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="ddi:codebook:2_5 http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd" xmlns="ddi:codebook:2_5">
  <docDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">Modelling of Large Protein Complexes</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-19375172-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.19375172</IDNo>
      </titlStmt>
      <prodStmt>
        <producer xml:lang="en" abbr="SND">Swedish National Data Service</producer>
        <producer xml:lang="sv" abbr="SND">Svensk nationell datatjänst</producer>
      </prodStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.19375172">Landing page</holdings>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl xml:lang="sv"></titl>
        <parTitl xml:lang="en">Modelling of Large Protein Complexes</parTitl>
        <IDNo agency="SND">doi-10-17044-scilifelab-19375172-0</IDNo>
        <IDNo agency="DOI">https://doi.org/10.17044/SCILIFELAB.19375172</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Bryant, Patrick</AuthEnty>
        <AuthEnty xml:lang="en" affiliation="Science for Life Laboratory">Elofsson, Arne</AuthEnty>
      </rspStmt>
      <prodStmt />
      <distStmt>
        <distrbtr xml:lang="en" abbr="SND" URI="https://snd.se">Swedish National Data Service</distrbtr>
        <distrbtr xml:lang="sv" abbr="SND" URI="https://snd.se">Svensk nationell datatjänst</distrbtr>
        <distDate xml:lang="en" date="2022-03-22" />
      </distStmt>
      <verStmt>
        <version elementVersion="0" elementVersionDate="2022-03-22" />
      </verStmt>
      <holdings URI="https://doi.org/10.17044/SCILIFELAB.19375172">Landing page</holdings>
    </citation>
    <stdyInfo>
      <subject />
      <abstract xml:lang="en" contentType="abstract">AlphaFold and AlphaFold-multimer can predict the structure of single- and multiple-chain proteins with very high accuracy. However, predicting protein complexes with more than a handful of chains is still infeasible, as the accuracy rapidly decreases with the number of chains and the protein size is limited by GPU memory. Nevertheless, it might be possible to predict the structure of large complexes starting from predictions of subcomponents. Here, we take a graph traversal approach to assemble 175 protein complexes with 10-30 chains using predictions of subcomponents. We compute paths through a complex graph constructed of subcomponents using Monte Carlo Tree Search and assemble these in a stepwise fashion. Using subcomponents predicted from all possible trimeric interactions, 91 complexes (52%) are assembled to completion. We create a scoring function, mpDockQ, that can distinguish whether assemblies are complete and predict their accuracy. Selecting complete complexes with TM-score ≥0.9 at FPR 10% using mpDockQ results in 20 complete complexes with a median TM-score of 0.92. The complete assembly protocol, starting from the sequences, is freely available at: https://gitlab.com/patrickbryant1/molpc

This repository contains the MSAs and predicted subcomponents needed to reproduce the assembly for the "all-trimer" approach.</abstract>
      <sumDscr />
    </stdyInfo>
    <method>
      <dataColl />
    </method>
    <dataAccs>
      <useStmt>
        <restrctn xml:lang="en">Access to data through an external actor.</restrctn>
        <restrctn xml:lang="sv">Åtkomst till data via extern aktör.</restrctn>
      </useStmt>
    </dataAccs>
    <othrStdyMat />
  </stdyDscr>
</codeBook>