Conserved envelope protein of nCoV2 as the possible target to design polytope vaccine
Exploration of Immunology

Aim: The envelope protein of novel coronavirus 2 (nCoV2) has been reported to be highly conserved, whereas its spike (S) protein underwent several alterations in its amino acid sequence within one year (2020-2021). The aim is therefore to use the highly conserved envelope (E) structural protein of nCoV2 to design a polytope for the formulation of a vaccine against coronavirus disease 2019 (Covid-19). Methods: Epitope the of binding the alleles.


Introduction
The coronavirus disease 2019 (Covid-19) was an unanticipated pandemic that remained uncontrolled for about six months from the beginning of 2020 [1]. Its severity was reflected in the loss of many men, women and children of all ages. People experienced a first wave and, within a short time, a second wave of similar severity. To some extent, treatment regimens that included oxygen supplementation, quarantine, plasma therapy, and anti-inflammatory and anti-viral drugs saved more than half of the victims. In the long run, to avoid the recurrence of Covid-19, precautionary measures such as the production of prophylactic vaccines and mass immunization are advisable.
Several scientists across the world have proposed vaccine designs and formulations based on the S protein of nCoV2 [2][3][4]. A few biopharma companies have also made vaccines with the S protein as the main target antigen. However, viral escape due to mutations in the amino acid residues of the S protein has been observed [5][6][7], and hence the designed vaccines may not be entirely appropriate. As these mutations are driven by natural selection, an alternative strategy is needed to avoid recurrence. Hence, in the present article, the envelope (E) protein of nCoV2, a small and highly conserved structural protein, is used to design a polytope.
In brief, the envelope protein of coronaviruses is designated the E protein. It is the smallest of the structural proteins of coronaviruses. It plays a significant role in the viral life cycle, including assembly, envelope formation, pathogenesis and budding [8,9]. Acting as a viroporin, the E protein allows the flux of ions and enzymes needed for viral replication [8,9]. It has also been found to trigger pro-inflammatory pathways. Abdelmageed et al. [8] designed a multiepitope-based peptide vaccine using the E protein and suggested it as a promising prophylactic vaccine candidate covering the populations of China, Europe and East Asia. These authors also designed ten peptides from the E protein and evaluated the molecular docking of the ligand epitopes with HLA-A*02:01 (human leukocyte antigen). In another instance, Tilocca et al. [9] identified the major immunogenic domains of the E protein. Therefore, building on the importance of the E protein established in the literature, the present study uses the E protein of nCoV2 to explore its conservancy and to design a polytope to meet the ongoing challenges of the Covid-19 pandemic.

Multiple sequence alignment
Representative envelope protein sequences of Coronaviridae, each with ~75 residues, were retrieved from the GenBank database. A total of eight envelope protein sequences, including those of Middle East respiratory syndrome (MERS) coronavirus, severe acute respiratory syndrome coronavirus (SARS CoV) 1 and SARS CoV2, were selected. The representative sequences were chosen to assess the conservancy of envelope protein residues among them, together with their phylogenetic and evolutionary affinities and possible ancestry. The sequences were saved in FASTA format in a plain-text (Notepad) file. The phylogenetic software suite MEGA X, version 10.1.1 [10], was used to create the multiple sequence alignment with the 'MUSCLE' option. The aligned file was exported in MEGA format to the desktop.

Phylogeny
The multiple sequence alignment of the representative Coronaviridae envelope proteins, saved in MEGA format, was reopened in the MEGA X 10.1.1 suite, and the phylogeny was inferred by the maximum likelihood method with 100 bootstrap replicates. Alignment positions containing gaps or missing data were eliminated. The resulting phylogenetic tree was exported in Portable Network Graphics (PNG) format, displaying the bootstrap value at each node and the branch length of each operational taxonomic unit (OTU), which represents the possible recency of its emergence.
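The two alignment-handling steps described above, removing every column that contains a gap or missing symbol and then resampling columns for bootstrapping, can be sketched in a few lines. This is only an illustration of the general technique, not MEGA's implementation: the function names, the toy sequences and the choice of `-` and `?` as gap and missing symbols are assumptions made for the example.

```python
import random

def complete_deletion(alignment, missing="-?"):
    """Drop every column in which any sequence has a gap or missing symbol."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(s[i] not in missing for s in seqs)]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Yield pseudo-alignments built by sampling columns with replacement."""
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    for _ in range(n_replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(seq[i] for i in cols)
               for name, seq in alignment.items()}

# Hypothetical toy alignment (not the sequences used in the study).
toy = {
    "seqA": "MYSF-VSE",
    "seqB": "MYSFAVSE",
    "seqC": "M?SFAVSE",
}
cleaned = complete_deletion(toy)
replicates = list(bootstrap_replicates(cleaned, n_replicates=100))
print(cleaned)
print(len(replicates))
```

In a real analysis each replicate would be passed to a tree-building routine, and the bootstrap value at a node would be the share of replicate trees that recover that node.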

Design of polytope
The Immune Epitope Database (IEDB) online tool was employed to derive epitopes from the envelope protein sequence of SARS CoV2, namely gi|1826682072|gb|QIS30437.1| [11]. The 75-residue E protein sequence was submitted to the NetMHCpan (ver. 4.1) tool of the IEDB analysis resource with the option of the most frequently occurring major histocompatibility complex (MHC) class I alleles, which returned a very large number of epitope-allele combinations. Of these, the first five epitopes with a score of more than 75% were chosen for the present study. The five epitopes were joined by a linker of two amino acid residues, cysteine and serine (CS), to facilitate in vivo proteasome cleavage and to present them as a single unit, the polytope, for further physico-chemical analyses, secondary structure determination, homology modelling, validation and population coverage.
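The assembly step can be illustrated with a short sketch. The five peptides below are artificial placeholders, not the epitopes selected in the study, and the function name is hypothetical. The length check assumes that all five epitopes are 9-mers, which is consistent with the 53-residue polytope reported later in the article (5 x 9 residues plus 4 x 2 linker residues).

```python
LINKER = "CS"  # cysteine-serine linker described in the text

def build_polytope(epitopes, linker=LINKER):
    """Join epitopes into a single chain, placing the linker between neighbours."""
    if not epitopes:
        raise ValueError("at least one epitope is required")
    return linker.join(epitopes)

# Placeholder 9-mers for illustration only (hypothetical, not real epitopes).
placeholder_epitopes = ["A" * 9, "G" * 9, "L" * 9, "V" * 9, "K" * 9]

polytope = build_polytope(placeholder_epitopes)
print(polytope)
print(len(polytope))  # 5 * 9 epitope residues + 4 * 2 linker residues
```

Because the linker sits only between neighbouring epitopes, five epitopes need four linkers.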

Physico-chemical features of the polytope
The online tool ProtParam was used to evaluate the physico-chemical features of the derived polytope, namely the number of amino acid residues, molecular weight, isoelectric point (pI), number of charged residues, half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY).
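Two of these indices are simple enough to compute by hand, which helps in reading the ProtParam output. The sketch below assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names and the test sequence are illustrative, and the code is not ProtParam itself.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    mole_percent = {aa: 100.0 * sequence.count(aa) / n for aa in "AVIL"}
    return (mole_percent["A"]
            + 2.9 * mole_percent["V"]
            + 3.9 * (mole_percent["I"] + mole_percent["L"]))

# Hypothetical four-residue sequence, chosen so the arithmetic is easy to follow.
example = "AVIL"
print(gravy(example))            # mean of 1.8, 4.2, 4.5 and 3.8
print(aliphatic_index(example))  # 25 + 2.9 * 25 + 3.9 * 50
```

A positive GRAVY value indicates a more hydrophobic sequence and a negative value a more hydrophilic one.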

Secondary structure analysis and homology modeling validation
The online tools Phyre2 and PEP2D were employed for secondary structure analysis and homology modelling of the polytope of the SARS CoV2 envelope protein. The Ramachandran plot for the polytope was retrieved through https://saves.mbi.ucla.edu/results?job=715968&p=procheck. It displayed most of the polytope residues in the favoured region, within phi values of -40 to -90 and psi values of -50 to -10, indicating the predominance of alpha helix.
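The phi and psi window quoted above can be expressed as a simple check on backbone dihedral angles. The sketch below only counts how many residues fall inside that rectangular window; it is not the PROCHECK region definition, and the function names and angle values are hypothetical examples.

```python
PHI_RANGE = (-90.0, -40.0)  # degrees, window quoted in the text
PSI_RANGE = (-50.0, -10.0)  # degrees, window quoted in the text

def in_helical_window(phi, psi):
    """True if the (phi, psi) pair lies inside the quoted alpha-helical window."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])

def helical_fraction(angles):
    """Fraction of (phi, psi) pairs that fall inside the window."""
    angles = list(angles)
    if not angles:
        raise ValueError("at least one (phi, psi) pair is required")
    return sum(in_helical_window(phi, psi) for phi, psi in angles) / len(angles)

# Hypothetical dihedral angles in degrees (not taken from the polytope model).
example_angles = [(-57.0, -47.0), (-60.0, -45.0), (-120.0, 130.0), (-65.0, -40.0)]
print(helical_fraction(example_angles))
```

With real data the angle pairs would come from the modelled structure, for example as computed by a structure-analysis library.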

Prediction of discontinuous and linear epitopes within the designed polytope of envelope protein
Epitopes were predicted using the online tool IEDB ElliPro. Two sets of discontinuous and three sets of linear epitopes were obtained, and the score of each residue of the polytope was displayed. The poses showed in silico binding of these predicted epitopes with an antibody.

Results
The nCoV2 has five structural proteins. Of these, the S protein was shown to undergo several alterations in its amino acid sequence within one year (2020-2021) [6]. Even so, a few firms are focusing on vaccines that aim to neutralize epitopes of the S protein. In the long run, these vaccines may therefore not give the anticipated prophylactic protection to vaccinated subjects. Hence, the aim is to use the highly conserved envelope structural protein of nCoV2 to design a polytope for the formulation of a vaccine against Covid-19.
The eight E protein sequences chosen from Coronaviridae yielded a phylogram (Figure 1) with several significant features. There are three primary clusters and two secondary clusters in the phylogram. The two secondary clusters were rooted on primary cluster 1 (Figure 1), which comprises representatives of alpha coronavirus and Feline coronavirus, suggesting that these contributed to the evolutionary origin of the beta coronaviruses; the branch length of alpha coronavirus is 1.029, indicating that it is an ancient strain. Further, primary cluster 3 along with its secondary cluster (Figure 1) grouped the MERS coronaviruses together, as expected from their similar taxonomic affinities [7]. Interestingly, SARS CoV1 appeared as an offshoot with a branch length of 1.007 and paired with cluster 2, which represents SARS CoV2. Both these OTUs have evolved from East Asia. Moreover, cluster 2 (Figure 1) showed another significant observation: the vampire and human SARS CoV2 are paired together with the smallest branch lengths (0.817), showing the recency of their emergence and their close affinity.
The IEDB online tool was employed to predict MHC I binding affinities. The representative envelope protein sequence (gi|1826682072|gb|QIS30437.1|) [11], which showed affinities with the other two chosen SARS CoVs (Figure 1), was submitted to the NetMHCpan (ver. 4.1) tool with the frequently occurring MHC class I alleles. This yielded a very large number of epitopes, of which the first five, with scores in the range of 0.78-0.93 (Table 1), were chosen for their binding affinities with MHC I. These epitopes were located within residues 51-69 of the envelope protein sequence. They were linked with CS residues to allow possible in vivo proteolysis and to construct an in silico polytope. The physicochemical features of the in silico generated polytope are shown in Table 2. Its pI was in the alkaline range and its half-life was one hour. The polytope was predicted to be stable, with an instability index of 37.72 (ProtParam treats values below 40 as stable). The GRAVY value, which represents the hydrophobicity of the polytope, was within the acceptable range of -2.0 to +2.0, indicating that the designed polytope is hydrophilic and soluble in an aqueous environment. The homology model of the polytope shown in Figure 2 was retrieved from the online tool Phyre2. The predicted model consisted predominantly of alpha helix, with short coils and two turns at the positions of proline residues (Figure 3).
Figure 1. The evolutionary history of representative envelope proteins is inferred by using the Maximum Likelihood method and JTT matrix-based model [1]. The tree with the highest log likelihood (-677.16) is shown. The built-in discrete Gamma distribution is used to model evolutionary rate differences among sites. This analysis involved 8 amino acid sequences. All positions containing gaps and missing data are eliminated.
The majority of residues of the proposed polytope were placed in the most favourable region of the Ramachandran plot (RP), validating its alpha helical secondary structure (Figure 4). The predicted antigenicity of the polytope was 0.6456 by the VaxiJen tool. The AllerTOP tool indicated that the envisaged polytope is a non-allergen. The ToxinPred online tool predicted the designed polytope to be a non-toxin (Table 3). ToxinPred also indicated reasonable hydrophilicity values that would facilitate the interaction of the polytope in an aqueous environment. The discontinuous and linear epitopes of the envelope protein polytope were predicted using the IEDB ElliPro online tool (Table 4). Two sets of discontinuous and three sets of linear epitopes were predicted as mimotopes, which showed binding poses with an antibody (Figure 5). The MHC I alleles of the Indian Asian population were considered to validate the restriction of the envisaged epitopes, with reasonable affinity scores and percent coverage, as shown in Figure 6 and Table 5.

Discussion
In the 21st century we have witnessed major devastation due to nCoV2 [1,12]. Advances in recombinant techniques, bioinformatics tools and social media made it possible to recognize and appreciate the intricacies of Covid-19 within a short time of its emergence. The nCoV2 is one of the unique taxonomic groups among the beta coronaviruses in the family Coronaviridae, with a zoonotic origin [6,7]. It has become highly virulent because its RBD binds to the human ACE2 receptors present in the alveolar lung epithelial cells [12][13][14]. The evolutionary affinities built for nCoV2 based on the E protein amino acid sequence showed that it originated in East Asia (Wuhan) and from an animal source [6,7]. The phylogram of the present study (Figure 1), with its branch lengths, further indicated that nCoV2 and vampire CoV2, coupled with SARS CoV1, originated from alpha coronaviruses. Importantly, nCoV2 and vampire CoV2 paired as a single cluster in the phylogram (Figure 1) as closely related members of the family, which agrees with published reports that human nCoV2 is zoonotic in origin [7,12] and is recent, as revealed by its smallest branch length.
Multi-epitope vaccine designs based on the spike protein amino acid sequences of nCoV2 have come into vogue in several laboratories since the beginning of 2020 [15][16][17][18][19]. Further, vectored vaccines carrying the gene of the S protein and mRNA vaccines with transcripts of the S protein gene have been released into the market. These focus on only one of the prominent structural proteins of nCoV2 and ignore the fact that the S protein is subject to rapid mutation, as shown by the Global Initiative on Sharing All Influenza Data (GISAID) [6]. As a result, the best alternative to consider at this juncture is the conserved structural protein of nCoV2, the E protein. In the present study, the designed polytope of the E protein has 53 amino acids with both discontinuous and linear epitopes; it is a non-toxin and non-allergen with potential antigenicity and predicted MHC I binding, and its physico-chemical features support its solubility and stability, all of which make it well suited to the intended South Indian Asian population.
In an interesting study, the five structural proteins of nCoV2, including the E protein, were selected to design a 9 amino acid residue peptide as an epitope [6][7][8][9]. Tilocca et al. [9] employed immunoinformatics tools to deduce immunogenic domains in the E protein. Owing to the extensive mutations appearing in nCoV2, ten peptides of the E protein were proposed as a multiepitope-based vaccine [8]. The "probable antigen" value obtained for the polytope of the E protein in our study is 0.6456, whereas the value reported for the complete E protein was lower [8]; however, the same authors reported that the E protein ranked top in the prediction of "probable antigen" among the structural proteins M, S and N [17][18][19]. Most importantly, the Indian strains of SARS-CoV2 have been renamed by the WHO with the labels Delta and Kappa [20], indicating the need for broad-spectrum prophylaxis. Therefore, the author strongly advocates the E protein of nCoV2, which contains a highly conserved sequence among the members of the family Coronaviridae, for the preparation and formulation of a vaccine.
In conclusion, the envelope protein of SARS CoV2 displayed conservancy in its sequence and phylogenetic affinities with SARS CoV1 and vampire CoV2, as evidenced by their appearance in one secondary cluster of the generated phylogram. The designed polytope of the E protein was found to be non-allergenic, non-toxic and antigenic, with a favoured homology model, and showed solubility, stability, MHC I binding and anchoring of its discontinuous and linear epitopes to an antibody. Each predicted epitope, restricted to MHC I alleles frequently occurring in South Indian Asians, showed its potential as a possible vaccine candidate for formulation.