15.14 Mics SARS_Cov2 Protease data – Full trajectory: AMOEBA_15mics_sampling

This dataset contains a 15.14-microsecond AMOEBA simulation of the apo enzyme, started from the apo enzyme structure determined by X-ray crystallography (PDB entry 6LU7), with frames saved every 0.1 nanosecond.
The AMOEBA_15mics_sampling trajectory is divided into sampling iterations, where sampling iteration X corresponds to the (X-1)th adaptive sampling iteration. Iteration 1 (iteration1) is the first of the fourteen 10 ns trajectories that we used to start our adaptive sampling scheme.
For each iteration, we provide a protein-only trajectory file (protein.dcd) and an all-atom trajectory file (protein_water.arc).
We also provide the 6LU7 PDB file, the protease topology file and the protease+water topology file, both taken from Riken (doi: 10.17632/vpps4vhryg.2), and a de-biasing score file for the whole 15.14-microsecond trajectory.

For Tinker-HP users, we also provide .key and .xyz files.
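
As a quick sanity check when loading the data, the total length and the saving interval stated above fix the number of frames to expect. The sketch below is only arithmetic on those two numbers; the function names are illustrative, and it assumes that exactly one frame is written per 0.1 ns interval with no extra initial frame.

```python
# Sketch: relate frame counts to sampled time for a 0.1 ns saving interval.
# Integer picoseconds are used to avoid floating-point rounding.

FRAME_SPACING_PS = 100        # 0.1 ns between saved frames (from the text)
TOTAL_LENGTH_PS = 15_140_000  # 15.14 microseconds (from the text)


def expected_frame_count(total_ps=TOTAL_LENGTH_PS, spacing_ps=FRAME_SPACING_PS):
    """Number of frames if exactly one frame is saved per interval (assumption)."""
    return total_ps // spacing_ps


def sampled_time_ns(n_frames, spacing_ps=FRAME_SPACING_PS):
    """Cumulative sampled time, in nanoseconds, covered by n_frames frames."""
    return n_frames * spacing_ps / 1000


if __name__ == "__main__":
    print(expected_frame_count())   # frames expected over the full trajectory
    print(sampled_time_ns(1000))    # time covered by 1000 frames, in ns
```

Because the trajectory is a concatenation of adaptive sampling segments, a frame index maps to cumulative sampled time, not to a position along one continuous physical trajectory.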

Clusters: clustering_files

We provide all cluster structures and reduced cluster structures.

The clusters were obtained with the DBSCAN algorithm and are provided in DCD format. The reduced structure files were used for the volume computation. If a cluster contained more than 1000 structures, we took 1000 random structures from the cluster structure file; otherwise we took the biggest hundred.

We used the following settings:

  1. DESRES: 1 cluster
    • X=1 structures: 1000
  2. Riken: 3 clusters
    • X=1 structures: 299
    • X=2 structures: 1000
    • X=3 structures: 1000
  3. Tinker-HP: 5 clusters
    • X=1 structures: 1000
    • X=2 structures: 1000
    • X=3 structures: 599
    • X=4 structures: 1000
    • X=5 structures: 899

For each Tinker-HP cluster, we also provide a de-biasing score file for both the full cluster and the reduced cluster.
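
The random reduction of large clusters described above can be sketched as follows. This is an illustration, not the script used to build the provided files: the function name, the seed and the handling of small clusters are assumptions. In particular, clusters with 1000 structures or fewer are returned whole here, whereas the provided reduced files for such clusters were trimmed further (see the structure counts listed above).

```python
import random


def reduced_indices(n_structures, max_size=1000, seed=0):
    """Pick frame indices for a reduced cluster file (illustrative sketch).

    Clusters with more than max_size structures are randomly subsampled
    without replacement. Smaller clusters are returned unchanged, which is
    an assumption of this sketch and NOT the rule used for the provided files.
    """
    if n_structures > max_size:
        rng = random.Random(seed)  # fixed seed so the selection is reproducible
        return sorted(rng.sample(range(n_structures), max_size))
    return list(range(n_structures))


if __name__ == "__main__":
    picked = reduced_indices(5000)
    print(len(picked), picked[:5])
```

The returned indices are sorted so that the selected frames keep their original order when they are written to a new trajectory file.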

How to use the de-biasing score?

We provide de-biased_observable.py, a Python script that computes the de-biased average of an observable. It takes as arguments:

  1. name of the de-biasing score file, e.g. 'de-biasing_score' (column file)
  2. name of the computed observable file, e.g. 'observable' (column file)
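
For readers who want to see the idea without opening the script, here is a minimal sketch of a de-biased average. It is not the content of de-biased_observable.py. It assumes that the de-biasing score acts as a per-frame weight, so that the de-biased average is a weighted mean, and that a "column file" holds one numeric value per line; the function names are illustrative. The Methods section of the paper remains the reference for the exact definition.

```python
def read_column(path):
    """Read a text file with one numeric value per line (assumed layout)."""
    with open(path) as handle:
        return [float(line.split()[0]) for line in handle if line.strip()]


def debiased_average(scores, observable):
    """Weighted mean of the observable, using the scores as weights (assumption)."""
    if len(scores) != len(observable):
        raise ValueError("score and observable files must have the same length")
    total_weight = sum(scores)
    if total_weight == 0:
        raise ValueError("the scores sum to zero")
    return sum(w * x for w, x in zip(scores, observable)) / total_weight


if __name__ == "__main__":
    # Toy values standing in for the two column files.
    print(debiased_average([1.0, 1.0, 2.0], [1.0, 2.0, 4.0]))
```

With real data, the two lists would come from `read_column('de-biasing_score')` and `read_column('observable')`, and both files must list the frames in the same order.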

Additionally, we provide de-biased_histogram.py to compute the de-biased histogram and kernel density estimate of a given observable. It takes the same arguments and outputs a picture (PNG format).
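
The histogram part can be illustrated in the same spirit. The sketch below is not de-biased_histogram.py: it builds a weighted, normalised histogram in plain Python, again assuming that each score is a per-frame weight, and it leaves out the kernel density estimate and the PNG output. The function name and the default number of bins are illustrative.

```python
def debiased_histogram(scores, observable, n_bins=10):
    """Return (bin_edges, densities) of a score-weighted histogram (sketch)."""
    if len(scores) != len(observable) or not observable:
        raise ValueError("inputs must be non-empty and of equal length")
    low, high = min(observable), max(observable)
    width = (high - low) / n_bins or 1.0  # fall back to 1.0 for a constant observable
    weights_per_bin = [0.0] * n_bins
    for weight, value in zip(scores, observable):
        # Clamp the index so that value == high falls into the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        weights_per_bin[index] += weight
    total = sum(weights_per_bin)
    if total == 0:
        raise ValueError("the scores sum to zero")
    # Divide by total weight and bin width so the histogram integrates to 1.
    densities = [w / (total * width) for w in weights_per_bin]
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, densities


if __name__ == "__main__":
    edges, densities = debiased_histogram([1.0, 1.0, 2.0], [0.0, 1.0, 2.0], n_bins=2)
    print(edges, densities)
```

Setting every score to the same value recovers the ordinary, unweighted histogram, which is a convenient check when comparing raw and de-biased distributions.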
Finally, we added two Python example scripts: de-biased_observable_ex.py and de-biased_histogram_ex.py.

They should be run with a command such as:

python de-biased_observable_ex.py

Libraries needed: numpy, pandas, statsmodels, matplotlib

For further information about the de-biasing, please refer to the Methods section of the paper.

Citation

The use of any AMOEBA trajectory data in any reports or publications of results obtained with the trajectory data should be acknowledged by including a citation to:

Jaffrelot Inizan, Theo; Célerse, Frédéric; Adjoua, Olivier; El Ahdab, Dina; Jolly, Luc-Henri; Liu, Chengwen; et al. “High-Resolution Mining of SARS-CoV-2 Main Protease Conformational Space: Supercomputer-Driven Unsupervised Adaptive Sampling.” Chem. Sci., 2021, 12, 4889-4907. doi: 10.1039/D1SC00145K

We would like to acknowledge the exceptional work of D. E. Shaw Research and the RIKEN Center for Biosystems Dynamics Research, whose data we used. Please cite:

D. E. Shaw Research, “Molecular Dynamics Simulations Related to SARS-CoV-2,” D. E. Shaw Research Technical Data, 2020. http://www.deshawresearch.com/resources_sarscov2.html/

and

Komatsu, T. S.; Koyama, Y.; Okimoto, N.; Morimoto, G.; Ohno, Y.; Taiji, M. (2020), “COVID-19 related trajectory data of 10 microseconds all atom molecular dynamics simulation of SARS-CoV-2 dimeric main protease,” Mendeley Data, V2. doi: 10.17632/vpps4vhryg.2

License

The DESRES and Riken trajectory datasets are released under a Creative Commons Attribution 4.0 International Public License, a copy of which is contained in the file CC4_License.txt provided in http://www.deshawresearch.com/resources_sarscov2.html/

Viewing in VMD

This trajectory can be viewed with VMD version 1.8.7 or later (or any other tool capable of reading files in ARC and DCD format). The VMD software is available from the Theoretical and Computational Biophysics Group at the University of Illinois at Urbana-Champaign. The reference is:

Humphrey, W., Dalke, A. and Schulten, K., “VMD – Visual Molecular Dynamics,” J. Molec. Graphics, 1996, vol. 14, pp. 33-38.

To view the full protein trajectory, use the command:

$ vmd AMOEBA_15mics_sampling/6lu7_rec_GMX_conf.pdb AMOEBA_15mics_sampling/sampling_iteration*/protein.dcd

The download links for the files are listed in the following tables. Beware that the total size is over 1 TB.
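
To plan disk space before downloading, the size of the sampling files alone can be added up from the byte counts listed in the AMOEBA 15 MICROSECONDS SAMPLING table. The sketch below only repeats that arithmetic; the constant and function names are illustrative, and the cluster and miscellaneous files are not included.

```python
# Sizes in bytes, copied from the AMOEBA 15 MICROSECONDS SAMPLING table.
ARC_ITERATION1 = 9_447_725_000   # Iteration1 protein_water.arc
ARC_OTHER = 67_483_750_000       # each protein_water.arc of iterations 2 to 16
DCD_ITERATION1 = 157_427_476     # Iteration1 protein.dcd
DCD_OTHER = 1_124_480_276        # each protein.dcd of iterations 2 to 16
N_OTHER_ITERATIONS = 15          # iterations 2 to 16


def sampling_total_bytes():
    """Total size of all protein.dcd and protein_water.arc files."""
    arc_total = ARC_ITERATION1 + N_OTHER_ITERATIONS * ARC_OTHER
    dcd_total = DCD_ITERATION1 + N_OTHER_ITERATIONS * DCD_OTHER
    return arc_total + dcd_total


if __name__ == "__main__":
    total = sampling_total_bytes()
    print(f"{total:,} bytes = {total / 1e12:.2f} TB (decimal)")
```

Almost all of the volume comes from the all-atom protein_water.arc files; the protein-only DCD files add up to about 17 GB.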

Files to download
MISCELLANEOUS
| File | Type | Size (bytes) | Date |
|---|---|---|---|
| 6lu7_rec_GMX | ITP | 1,390,833 | 2020-09-23 16:00:28 |
| 6lu7_rec_GMX_conf | PDB | 758,547 | 2020-09-23 16:00:28 |
| full_de-biasing_score | TXT | 3,020,400 | 2020-09-23 16:00:28 |
| protein_water_topologic | TOP | 621 | 2020-09-23 16:00:28 |
| charge | Fortran | 1,477 | 2020-05-01 16:47:48 |
| de-biased_histogram | Python | 712 | 2020-09-23 17:51:27 |
| de-biased_histogram_ex.py | Python | 2,252 | 2020-09-23 17:51:27 |
| de-biased_observable.py | Python | 430 | 2020-09-23 17:51:28 |
| de-biased_observable_ex.py | Python | 554 | 2020-09-23 17:51:28 |
| activesite_volume.txt | TXT | 47,647 | 2020-09-23 17:51:28 |
| dimerizationsite_volume.txt | TXT | 5,394 | 2020-09-23 17:51:29 |
| g16-mar21.pdb | PDB | 145,318 | 2020-05-01 15:48:17 |

AMOEBA 15 MICROSECONDS SAMPLING
| File | Type | Size (bytes) | Date |
|---|---|---|---|
| Iteration1 protein.dcd | VMD | 157,427,476 | 2020-09-23 16:00:28 |
| Iteration1 protein_water.arc | Trajectories | 9,447,725,000 | 2020-09-23 16:00:28 |
| Iteration2 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:00:31 |
| Iteration2 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:23:56 |
| Iteration3 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:00:31 |
| Iteration3 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:00:28 |
| Iteration4 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:00:35 |
| Iteration4 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:00:30 |
| Iteration5 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:00:36 |
| Iteration5 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:00:30 |
| Iteration6 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:00:46 |
| Iteration6 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:02:14 |
| Iteration7 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:02:19 |
| Iteration7 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:02:18 |
| Iteration8 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:02:23 |
| Iteration8 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:02:26 |
| Iteration9 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:03:13 |
| Iteration9 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:04:34 |
| Iteration10 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:05:05 |
| Iteration10 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:08:02 |
| Iteration11 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 16:13:30 |
| Iteration11 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 16:15:13 |
| Iteration12 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 17:35:55 |
| Iteration12 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:38:08 |
| Iteration13 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 17:39:17 |
| Iteration13 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:39:46 |
| Iteration14 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 17:41:04 |
| Iteration14 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:41:59 |
| Iteration15 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 17:43:27 |
| Iteration15 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:44:07 |
| Iteration16 protein.dcd | VMD | 1,124,480,276 | 2020-09-23 17:44:11 |
| Iteration16 protein_water.arc | Trajectories | 67,483,750,000 | 2020-09-23 17:45:04 |

FULL STRUCTURE CLUSTERING FILES
| File | Type | Size (bytes) | Date |
|---|---|---|---|
| desres_cluster cluster1_pca4_deshaw.dcd | VMD | 1,319,127,764 | 2020-09-23 17:49:21 |
| riken_clusters cluster1_pca4_riken.dcd | VMD | 39,019,732 | 2020-09-23 17:49:13 |
| riken_clusters cluster2_pca4_riken.dcd | VMD | 339,368,340 | 2020-09-23 17:49:13 |
| riken_clusters cluster3_pca4_riken.dcd | VMD | 276,959,700 | 2020-09-23 17:49:14 |
| tinker-hp_clusters cluster1_pca4_tinker-hp.dcd | VMD | 1,319,577,556 | 2020-09-23 17:45:48 |
| tinker-hp_clusters cluster1_pca4_tinker-hp_de-biasing_score.txt | TXT | 217,796 | 2020-09-23 17:49:01 |
| tinker-hp_clusters cluster2_pca4_tinker-hp.dcd | VMD | 928,370,964 | 2020-09-23 17:47:21 |
| tinker-hp_clusters cluster2_pca4_tinker-hp_de-biasing_score.txt | TXT | 154,336 | 2020-09-23 17:49:05 |
| tinker-hp_clusters cluster3_pca4_tinker-hp.dcd | VMD | 69,942,932 | 2020-09-23 17:48:22 |
| tinker-hp_clusters cluster3_pca4_tinker-hp_de-biasing_score.txt | TXT | 10,941 | 2020-09-23 17:49:08 |
| tinker-hp_clusters cluster4_pca4_tinker-hp.dcd | VMD | 231,868,052 | 2020-09-23 17:48:32 |
| tinker-hp_clusters cluster4_pca4_tinker-hp_de-biasing_score.txt | TXT | 37,906 | 2020-09-23 17:49:12 |
| tinker-hp_clusters cluster5_pca4_tinker-hp.dcd | VMD | 107,725,460 | 2020-09-23 17:48:59 |
| tinker-hp_clusters cluster5_pca4_tinker-hp_de-biasing_score.txt | TXT | 16,053 | 2020-09-23 17:49:12 |

REDUCED STRUCTURE CLUSTERING FILES
| File | Type | Size (bytes) | Date |
|---|---|---|---|
| desres_clusters cluster1_pca4_deshaw_red.dcd | VMD | 112,448,276 | 2020-09-23 17:51:13 |
| riken_clusters cluster1_pca4_riken_red.dcd | VMD | 33,622,228 | 2020-09-23 17:50:50 |
| riken_clusters cluster2_pca4_riken_red.dcd | VMD | 112,448,276 | 2020-09-23 17:50:51 |
| riken_clusters cluster3_pca4_riken_red.dcd | VMD | 112,448,276 | 2020-09-23 17:51:12 |
| tinker-hp_clusters cluster1_pca4_tinker-hp_red.dcd | VMD | 112,448,276 | 2020-09-23 17:49:58 |
| tinker-hp_clusters cluster1_pca4_tinker-hp_red_de-biasing_score.txt | TXT | 18,523 | 2020-09-23 17:50:40 |
| tinker-hp_clusters cluster2_pca4_tinker-hp_red.dcd | VMD | 112,448,276 | 2020-09-23 17:50:11 |
| tinker-hp_clusters cluster2_pca4_tinker-hp_red_de-biasing_score.txt | TXT | 18,715 | 2020-09-23 17:50:43 |
| tinker-hp_clusters cluster3_pca4_tinker-hp_red.dcd | VMD | 67,356,628 | 2020-09-23 17:50:19 |
| tinker-hp_clusters cluster3_pca4_tinker-hp_red_de-biasing_score.txt | TXT | 10,487 | 2020-09-23 17:50:46 |
| tinker-hp_clusters cluster4_pca4_tinker-hp_red.dcd | VMD | 112,448,276 | 2020-09-23 17:50:25 |
| tinker-hp_clusters cluster4_pca4_tinker-hp_red_de-biasing_score.txt | TXT | 18,478 | 2020-09-23 17:50:47 |
| tinker-hp_clusters cluster5_pca4_tinker-hp_red.dcd | VMD | 101,091,028 | 2020-09-23 17:50:33 |
| tinker-hp_clusters cluster5_pca4_tinker-hp_red_de-biasing_score.txt | TXT | 14,989 | 2020-09-23 17:50:48 |