Characteristics of the Iron-responsive Element (IRE) Stems in the Untranslated Regions of Animal mRNAs

Background: Iron-responsive Elements (IREs) are hairpin structures located in the 5’ or 3’ untranslated region of some animal mRNAs. IREs have a highly conserved terminal loop and a UGC/C or C bulge five bases upstream of the terminal loop, which divides the hairpin stem into an upper stem and a lower stem. Objective: The objective of this study was to investigate the base-pair composition of the upper and lower stems of IREs to determine whether they are highly conserved among mRNAs from different genes. Methods: The mRNA sequences of six 5’IREs and five 3’IREs from several animal species were retrieved from the National Center for Biotechnology Information. The folding free energy of each IRE mRNA sequence was predicted using the RNAfold WebServer. Results: We found that the upper and lower stems of IREs are not highly conserved among the mRNAs of different genes. There are no statistically significant differences in the IRE structures or folding free energies between mammalian and non-mammalian species relative to either the ferritin heavy chain 5’IRE or ferroportin 5’IRE. There are no overall significant differences in the folding free energies between UGC/C-containing 5’IREs and C-bulge-containing 5’IREs, or between 5’IREs and 3’IREs. Conclusion: Further studies are needed to investigate whether the variations in IRE stem composition are responsible for fine-tuning the IRE/Iron-Regulatory Protein interactions among different mRNAs to maintain the balance of cellular iron metabolism, and to identify whether evolutionary processes drive the base-pair composition of the upper and lower stems of IREs toward any particular configuration.


INTRODUCTION
The interactions between Iron-Responsive Elements (IREs) and Iron-Regulatory Proteins (IRPs) are among the most extensively investigated post-transcriptional regulatory mechanisms, as cellular iron metabolism is regulated through these interactions. The 5'IRE-containing mRNAs encode proteins involved in iron storage (ferritin heavy chain, FTH; and ferritin light chain, FTL), iron export (ferroportin, FPN), and heme biosynthesis catalysis (erythroid aminolevulinate synthase 2, ALAS2), as well as enzymes of the tricarboxylic acid cycle (mitochondrial aconitase 2, ACO2), and transcription factors produced in response to hypoxia (hypoxia-inducible transcription factor 2α, HIF2α). The inhibition of IRP-IRE binding at high cellular iron concentrations destabilizes the 3'IRE-containing mRNAs, including those of the Transferrin Receptor (TFRC), Divalent Metal Transporter 1 (DMT1), and a cell cycle regulator (cell division cycle 14A, CDC14A), which facilitates the degradation of these mRNAs and inhibits further iron uptake [1, 2, 6, 9-13].
IREs are highly conserved hairpin stem-loop structures containing 26-30 nucleotides. The terminal loop sequence is 5'-CAGUGN-3' (N can be C, U, or A, but never G). There is a conserved UGC/C or C bulge five bases upstream of the terminal loop, which divides the hairpin stem into an upper stem and a lower stem [9,14]. In recent years, IRE-containing mRNAs, such as the mRNAs of amyloid precursor protein and α-synuclein, have been discovered by biochemical and computational approaches [15,16]. Some newly discovered IREs do not contain the conserved 5'-CAGUG-3' terminal loop sequence, and their C bulge is not located exactly five bases upstream of the terminal loop [15-18].
The IRP mainly binds to IREs at two separate sites: the AGU apical loop and the C bulge [19,20]. The upper stem of the IRE, between the terminal loop and the bulge, and the lower stem below the bulge, both play important roles in maintaining the orientation of the loop and the bulge for proper IRE-IRP binding. Few studies have investigated the stems of IREs, and the separate roles of the upper and lower stems have not been examined. We hypothesize that the base-pair composition of the upper and lower stems of IREs is not highly conserved among mRNAs from different genes (except those with the irregular terminal loop and bulge) and that the IREs with a UGC/C internal loop have a somewhat different stem composition than those with a single C bulge. To test this hypothesis, we investigated six 5'IREs and five 3'IREs from animal mRNAs and analyzed the number of Watson-Crick base-pairs, wobble base-pairs, and mismatched pairs in the upper and lower stems of each type of IRE.
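The pair tally described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function names (classify_pair, count_stem_pairs) and the toy strands are hypothetical, and the two strands of a stem are assumed to be written 5' to 3' and to have the same length.

```python
from collections import Counter

# Canonical Watson-Crick pairs and the GU wobble pair, stored without regard to order.
PAIR_TYPES = {
    frozenset("AU"): "AU",
    frozenset("GC"): "GC",
    frozenset("GU"): "GU",
}


def classify_pair(base5, base3):
    """Label one pair as 'AU', 'GC', 'GU' (wobble), or 'mismatch'."""
    return PAIR_TYPES.get(frozenset((base5, base3)), "mismatch")


def count_stem_pairs(strand5, strand3):
    """Count pair types in a stem whose two strands are both written 5'->3'.

    The first base of strand5 pairs with the last base of strand3, and so on.
    """
    if len(strand5) != len(strand3):
        raise ValueError("both strands of the stem must have the same length")
    counts = Counter(
        classify_pair(a, b) for a, b in zip(strand5.upper(), reversed(strand3.upper()))
    )
    return {label: counts.get(label, 0) for label in ("AU", "GU", "GC", "mismatch")}


# Toy five-base-pair stem invented for illustration (not taken from Tables 1-9).
print(count_stem_pairs("GCAUA", "UAUGU"))
```

For the toy stem, the pairs are G-U, C-G, A-U, U-A, and A-U, so the tally is three AU pairs, one GU wobble pair, and one GC pair.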

METHODS
Among the IREs in the mRNAs of nine genes investigated in this study, six are located in the 5'UTRs of mRNAs (FTH, FTL, ACO2, FPN, ALAS2, and HIF2α), and the other three are located in the 3'UTRs of mRNAs (CDC14A, DMT1, and TFRC). The mRNA sequences were retrieved from the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/nuccore/). Each mRNA investigated was found under the "Nucleotide" category, with the following filters activated: Species-Animals, Molecule types-mRNA, and Source databases-RefSeq. Of the various mRNAs that were retrieved, only those that contain experimentally supported base sequences were analyzed; sequences labeled as "Predicted mRNA" were not included.
From the mRNAs that were retrieved using these parameters, only the IRE sequences containing a conserved 5'-CAGUG-3' terminal loop and a conserved UGC/C or C bulge five bases upstream of the terminal loop were analyzed. For example, the 3'UTR of TFRC mRNA contains five IREs. However, only three of them have the exact 5'-CAGUG-3' loop sequence and a C bulge five bases upstream of the terminal loop; therefore, only these three 3'IRE sequences were analyzed.
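As a rough illustration of this selection criterion, the sketch below scans a sequence for the core motif: a C, five intervening bases, the 5'-CAGUG-3' loop, and a sixth loop base that is not G. It is a hedged example rather than the screening procedure used here: the pattern and the helper name find_ire_cores are assumptions, the toy sequence is invented, and the pattern checks neither the base pairing of the stems nor the difference between a UGC/C internal loop and a single C bulge.

```python
import re

# Assumed core motif: bulge C, five upper-stem bases, CAGUG loop, then N (A, C, or U).
# The lookahead lets overlapping candidates be reported as well.
IRE_CORE = re.compile(r"(?=(C[ACGU]{5}CAGUG[ACU]))")


def find_ire_cores(sequence):
    """Return (0-based start, matched core) for each candidate IRE core motif."""
    # Nucleotide records are usually written in the DNA alphabet, so map T to U.
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in IRE_CORE.finditer(rna)]


# Toy sequence invented for illustration (not taken from Tables 1-9).
print(find_ire_cores("AATCTTCAACAGTGCTTGGACGAA"))  # [(3, 'CUUCAACAGUGC')]
```

A hit from such a scan is only a candidate; whether the flanking bases actually form the upper and lower stems still has to be checked against the predicted secondary structure.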
For a complete comparison, we investigated IRE sequences in both mammals and non-mammalian animals for the mRNAs of nine genes. Some species have incomplete UTR information; therefore, either no IRE sequence or only partial sequences were retrieved. We did not include transcripts with absent or partial sequences in our analysis but included them in Tables 1-9 with a brief note.
The minimum folding free energy of each mRNA sequence in Tables 1-9 was predicted using the RNAfold WebServer (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi), which is a component of the ViennaRNA package developed by the Institute for Theoretical Chemistry at the University of Vienna.
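When many sequences are folded, it can help to pull the predicted structure and energy out of the text output programmatically. The sketch below assumes RNAfold-style output in which the dot-bracket structure is followed by the minimum free energy (kcal/mol) in parentheses; the helper name parse_rnafold and the example sequence, structure, and energy are hypothetical and are not results from this study.

```python
import re

# A structure line: dot-bracket string, whitespace, then the energy in parentheses.
STRUCTURE_LINE = re.compile(r"^([.()]+)\s+\(\s*(-?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return (dot_bracket_structure, mfe_in_kcal_per_mol) from RNAfold-style text."""
    for line in text.splitlines():
        match = STRUCTURE_LINE.match(line.strip())
        if match:
            return match.group(1), float(match.group(2))
    raise ValueError("no structure/energy line found")


# Hypothetical output, made up to show the assumed layout.
example_output = """GGGAAACCC
(((...))) ( -1.20)"""

structure, mfe = parse_rnafold(example_output)
print(structure, mfe)
```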
Statistical analyses were performed using Excel to compare the differences in IRE structure and folding free energy between mammalian and non-mammalian species for the transcripts of each of the nine genes studied. Seven analyses were run to compare the mammalian and non-mammalian species relative to 1) the number of AU pairs in the upper stems of their IREs, 2) the number of GU pairs in the upper stems of their IREs, 3) the number of GC pairs in the upper stems of their IREs, 4) the number of AU pairs in the lower stems of their IREs, 5) the number of GU pairs in the lower stems of their IREs, 6) the number of GC pairs in the lower stems of their IREs, and 7) the predicted folding free energies of the IRE transcripts of each gene in this study. An F-test was performed to determine whether the variances (s²) of the two groups differed significantly at the 95% confidence level, followed by a t-test ("two-sample equal variance" or "two-sample unequal variance," according to the results from the F-test) to determine whether the difference between mammalian and non-mammalian species was significant at the 95% confidence level for each of the seven comparisons listed above. If Excel returned a calculated p-value of less than 0.05, a significant difference was indicated.
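The quantities behind this two-step procedure can be sketched with the Python standard library. The example is illustrative only and is not the Excel workflow itself: the function name compare_groups and the toy energies are hypothetical, the sketch computes the test statistics and degrees of freedom but not the p-values (which would still have to be looked up from the F and t distributions), and it assumes each group has at least two values and a non-zero variance.

```python
from math import sqrt
from statistics import mean, variance


def compare_groups(a, b):
    """Return the F ratio plus equal- and unequal-variance t statistics for two samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (s^2)
    if va == 0 or vb == 0:
        raise ValueError("this sketch does not handle a group with zero variance")
    diff = mean(a) - mean(b)

    # F statistic: larger sample variance divided by the smaller one.
    f_ratio = max(va, vb) / min(va, vb)

    # "Two-sample equal variance" t statistic (pooled variance).
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t_equal = diff / sqrt(pooled * (1 / na + 1 / nb))

    # "Two-sample unequal variance" (Welch) t statistic and degrees of freedom.
    se2 = va / na + vb / nb
    t_unequal = diff / sqrt(se2)
    df_unequal = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))

    return {
        "f_ratio": f_ratio,
        "t_equal": t_equal,
        "df_equal": na + nb - 2,
        "t_unequal": t_unequal,
        "df_unequal": df_unequal,
    }


# Hypothetical folding free energies (kcal/mol), invented for illustration.
mammalian = [-8.1, -7.9, -8.4, -8.0]
non_mammalian = [-7.2, -8.8, -7.5]
print(compare_groups(mammalian, non_mammalian))
```

If the F-test finds no significant difference in variance, the equal-variance statistic applies; otherwise the unequal-variance statistic does.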
Statistical analyses were also performed to compare the folding free energies between 5'IREs and 3'IREs, and between UGC/C-containing IREs and C-bulge-containing IREs. Non-mammalian species could not be analyzed due to a lack of available IRE mRNA sequence data. The results are summarized in Table 10.

RESULTS AND DISCUSSION
Tables 1-9 provide the base-pair compositions of the upper stem (i.e., the five base-pair helix between the terminal loop and the bulge) and the lower stem (i.e., the five base-pair helix below the bulge) in the 5'IREs of FTH, FTL, ACO2, FPN, ALAS2, and HIF2α mRNAs, and in the 3'IREs of CDC14A, DMT1, and TFRC mRNAs, respectively. The five base-pair upper stems of IREs are shown in bold; the five base-pair lower stems are underlined. The predicted folding free energy of each mRNA sequence is also presented in Tables 1-9.
Regarding the upper stems, in the FTH 5'IRE (Table 1), five of the mammalian species studied (Homo sapiens, Pongo abelii, Macaca mulatta, Sus scrofa, and Canis lupus familiaris) contain three AU Watson-Crick pairs, one GU wobble pair, and one GC Watson-Crick pair in their upper stems. Three other mammalian species (Bos taurus, Mus musculus, and Ovis aries) contain four AU pairs and one GC pair in their upper stems. Four non-mammalian species (Danio rerio, Salmo salar, Xenopus laevis, and Xenopus tropicalis) also contain four AU pairs and one GC pair, whereas the other two non-mammalian species (Gallus gallus and Anas platyrhynchos) contain two AU pairs, one GU pair, one GC pair, and one mismatched GA pair in their upper stems. Statistical analyses indicate that there are no significant differences (p > 0.05) between mammalian and non-mammalian species regarding the number of AU, GU, or GC pairs in the upper stem of the FTH 5'IRE.
In the FTL 5'IRE (Table 2), nine out of ten mammalian species studied contain three AU pairs, one GU pair, and one GC pair in their upper stems, whereas Mus musculus has four AU pairs and one GC pair. For the FTL mRNAs, no non-mammalian species contains a complete IRE sequence, so a statistical comparison to mammalian species could not be conducted. In the ACO2 5'IRE (Table 3), all of the mammalian species studied contain four AU pairs and one GC pair in their upper stems, whereas the non-mammalian Gallus gallus contains three AU pairs, one GC pair, and one mismatched UC pair. Statistical analysis was not performed because data are available from only one non-mammalian species. In the FPN 5'IRE (Table 4), all of the species studied contain three AU and two GC pairs in their upper stems. There is no difference between mammalian and non-mammalian species. In the ALAS2 5'IRE (Table 5), all four mammalian species studied contain one AU, one GU, and three GC pairs in their upper stems, whereas the non-mammalian Danio rerio contains one AU, three GC, and one mismatched UU pair in its upper stem. Statistical analysis was not performed because data are available from only one non-mammalian species. In the HIF2α 5'IREs (Table 6), the 5'IRE structure was only identified in humans among the mammalian species, and contains one AU and four GC pairs in its upper stem. The non-mammalian Ictalurus punctatus contains one AU, one GU, and three GC pairs in its upper stem; Danio rerio contains two AU, one GU, and two GC pairs; the other two non-mammals, Xenopus laevis and Xenopus tropicalis, both contain two AU and three GC pairs in their upper stems. Statistical analysis was not performed because data are available from only one mammalian species.
In the CDC14A 3'IRE (Table 7), two of the mammalian species studied (Homo sapiens and Rattus norvegicus) contain four AU pairs and one GC pair in their upper stems, whereas Mus musculus contains three AU, one GC, and one mismatched AC pair. No non-mammalian species contains a complete IRE sequence in its CDC14A mRNA, so a statistical analysis was not conducted. In the DMT1 3'IRE (Table 8), three of the mammalian species studied (Homo sapiens, Mus musculus, and Macaca fascicularis) contain two AU, one GU, and two GC pairs in their upper stems. The Rattus norvegicus mRNA has two 3'IREs: one contains two AU, one GU, and two GC pairs in its upper stem; the other has three AU, one GU, and one GC pair in its upper stem. No non-mammalian species contains a complete IRE sequence in its DMT1 mRNA, so a statistical analysis was not conducted. Of the three 3'IREs in mammalian TFRC mRNAs studied (Table 9), most fall into one of two categories of base-pair composition in their upper stems: either two AU and three GC pairs, or one AU, one GU, and three GC pairs. Two exceptions are Rattus norvegicus, which has one of the three 3'IREs containing one AU pair, three GC pairs, and one mismatched AC pair, and Cavia porcellus, which has one of the three 3'IREs containing one AU, one GU, two GC, and one mismatched AC pair. No non-mammalian species contains a complete IRE sequence in its TFRC mRNA, so a statistical analysis was not conducted.
Regarding the lower stems, for the FTH 5'IRE (Table 1), seven mammalian and two non-mammalian species studied (Homo sapiens, Pongo abelii, Macaca mulatta, Sus scrofa, Bos taurus, Mus musculus, Ovis aries, Danio rerio, and Salmo salar) contain two AU, two GC, and one mismatched UC or AC pair in their lower stems. One mammalian (Canis lupus familiaris) and two non-mammalian species (Gallus gallus and Anas platyrhynchos) contain two AU and three GC pairs. The non-mammalian Xenopus laevis contains two AU, one GU, and two GC pairs in its lower stem. Statistical analyses indicate that there are no significant differences (p > 0.05) between mammalian and non-mammalian species regarding the number of AU, GU, or GC pairs in the lower stem of the FTH 5'IRE.
In the FTL 5'IRE (Table 2), all ten mammalian species studied contain one AU, one GU, one GC, and two mismatched (UC and CA/AA/GA) pairs in their lower stems. No non-mammalian species contains a complete IRE sequence in its FTL mRNA, so a statistical analysis was not conducted. In the ACO2 5'IRE (Table 3), all of the mammalian species studied contain two AU, one GU, one GC, and one mismatched CC pair in their lower stems, whereas the non-mammalian Gallus gallus contains one AU and four mismatched pairs in its lower stem. Statistical analysis was not performed because data are available from only one non-mammalian species. In the FPN 5'IRE (Table 4), all of the mammalian species studied contain four AU pairs and one GC pair in their lower stems. The non-mammalian Danio rerio contains three AU, one GU, and one GC pair; Gallus gallus contains three AU and two GC pairs in its lower stem. Statistical analyses indicate that there are no significant differences (p > 0.05) between mammalian and non-mammalian species regarding the number of AU, GU, or GC pairs in the lower stem of the FPN 5'IRE. In the ALAS2 5'IRE (Table 5), all of the mammalian species studied contain two AU, one GU, one GC, and one mismatched (CA or GA) pair in their lower stems, while the non-mammalian Danio rerio contains two AU, one GC, and two mismatched (AG and AA) pairs. Statistical analysis was not performed because data are available from only one non-mammalian species. In the HIF2α 5'IREs (Table 6), the 5'IRE structure was only identified in humans among the mammalian species and contains three AU, one GC, and one mismatched AC pair in its lower stem, matching that of three non-mammalian species (Danio rerio, Xenopus laevis, and Xenopus tropicalis). The non-mammalian Ictalurus punctatus contains two AU, one GU, one GC, and one mismatched AC pair in its lower stem. Statistical analysis was not performed because data are available from only one mammalian species.
In the CDC14A 3'IRE (Table 7), two of the mammalian species studied (Homo sapiens and Rattus norvegicus) contain four AU pairs and one mismatched UU pair in their lower stems, while Mus musculus has one AU, one GU, and three mismatched pairs. No non-mammalian species contains a complete IRE sequence in its CDC14A mRNA, so a statistical analysis was not conducted. In the DMT1 3'IRE (Table 8), three of the mammalian species studied (Homo sapiens, Mus musculus, and Macaca fascicularis) contain two AU, one GU, and two GC pairs in their lower stems. The Rattus norvegicus mRNA has two 3'IREs; one contains two AU, one GU, and two GC pairs in its lower stem; the other contains three GU and two mismatched pairs. No non-mammalian species has a complete IRE sequence in its DMT1 mRNA, so a statistical analysis was not conducted. Of the three 3'IREs in mammalian TFRC mRNAs (Table 9), there are two major categories of base-pair arrangements in their lower stems: 1) five AU pairs, or 2) three AU and two GU pairs. The exceptions are Mus musculus and Rattus norvegicus, both of which have one of their three 3'IREs that contains four AU and one GU pair in the lower stem. No non-mammalian species has a complete IRE sequence in its TFRC mRNA, so a statistical analysis was not conducted.
In canonical Watson-Crick RNA base pairing, A forms a base pair with U through two hydrogen bonds; G forms a base pair with C through three hydrogen bonds. Therefore, an RNA with high GC content is more thermodynamically stable than one with low GC content. The non-Watson-Crick wobble pair GU is a common element in the secondary structure of RNA, where G forms a pair with U through two hydrogen bonds. The thermodynamic stability of a GU pair is less than that of a GC pair but comparable to that of an AU pair [21,22].
Since the glycosidic bond angles (i.e., the angle between the base and C1' sugar atom) in a GU wobble pair are different from those in an AU or GC pair, an RNA stem containing a GU pair is conformationally softer because the backbone is more easily distorted or altered at the site of a GU pair.
Among the IREs studied from the transcripts of nine different genes, three 5'IREs (FTH, FTL, and ACO2) and one 3'IRE (CDC14A) have lower GC content (only one GC pair) in their upper stems; the FPN 5'IRE and DMT1 3'IRE have medium GC content (two GC pairs); the ALAS2 5'IRE and TFRC 3'IRE have higher GC content (three GC pairs) in their upper stems. The 5'IRE in the human HIF2α mRNA is a special case. It contains the highest GC content (four GC pairs in its upper stem) among the IREs investigated. The HIF2α mRNAs of non-mammalian species, however, have only two or three GC pairs in their 5'IRE upper stems. Overall, we found that the base-pair compositions of the upper stems of IREs are not highly conserved among mRNAs from the genes we investigated.
Data analysis of the IREs indicates that, in general, the lower stems contain fewer GC pairs than the upper stems and often have mismatched pairs. The only exception is the FTH 5'IRE, which contains two or three GC pairs in its lower stem, and only one GC pair in its upper stem (Table 1). The more tightly bound (i.e., stiffer) lower stem may be necessary for the existence of a UGC/C internal loop instead of the single C bulge. However, another UGC/C-containing 5'IRE, FTL, has only one GC pair but two mismatched (UC and CA/AA/GA) pairs in its lower stem. Further studies are needed to elucidate this seeming contradiction.
Statistical analyses indicate that there are no significant differences (p > 0.05) in the IRE structures between mammalian and non-mammalian species for the FTH 5'IRE or FPN 5'IRE. In addition, there are no significant differences (p > 0.05) in the folding free energies between mammalian and non-mammalian species for the FTH 5'IRE or FPN 5'IRE. Statistical analyses were not performed for the other IREs due to the lack of sequence data for the mRNAs of either the non-mammalian or mammalian species. Table 10 lists the results of comparing the folding free energies of mammalian IREs in the following groups: 1) the FTH 5'IRE and FTL 5'IRE both contain a UGC/C internal loop, but their folding free energies are significantly different; 2) the ACO2 5'IRE, FPN 5'IRE, ALAS2 5'IRE, and HIF2α 5'IRE all contain a C bulge, but their folding free energies are significantly different. Note that the HIF2α 5'IRE was not compared with other C-bulge-containing 5'IREs because data are only available for the HIF2α mRNA of one species; 3) between UGC/C-containing 5'IREs and C-bulge-containing 5'IREs, only the FTH 5'IRE and ALAS2 5'IRE comparison was not statistically significant. Nonetheless, the pooled analysis of the UGC/C-containing 5'IREs versus C-bulge-containing 5'IREs did not indicate a significant difference in their folding free energies.
As for the 3'IREs, 1) TFRC contains three 3'IREs, and pairwise comparisons of their folding free energies were statistically significant in all cases except between the first and second IREs; 2) there was no significant difference between the folding free energies of the CDC14A 3'IRE and DMT1 3'IRE; 3) the folding free energies of the CDC14A 3'IRE and DMT1 3'IRE were separately compared with those of each TFRC 3'IRE, and with the pooled TFRC 3'IREs, with varying results (Table 10). Overall, the folding free energies are not significantly different between the pooled 5'IREs and the pooled 3'IREs.

CONCLUSION
In summary, the base pairs within the upper and lower stems of IREs are not highly conserved among the mRNAs investigated for this study. Both AU-rich and GC-rich upper stems exist. The lower stems, in general, contain fewer GC pairs than the upper stems. One exception is the UGC/C-containing FTH 5'IRE, whose lower stem includes more GC content than its upper stem. No statistically significant differences were found in the IRE structures or the folding free energies when comparing either the FTH 5'IRE or FPN 5'IRE of mammalian versus non-mammalian species. In addition, there were no overall significant differences between the folding free energies of UGC/C-containing 5'IREs and C-bulge-containing 5'IREs, or between 5'IREs and 3'IREs. Future studies may focus on investigating whether the evolutionary characteristics of the IRE stems in animal mRNAs differentially fine-tune the IRE/IRP interactions among different mRNAs to maintain the balance of cellular iron metabolism and whether evolutionary processes drive the base-pair composition of the upper and lower stems of IREs toward any particular outcome (e.g., AU-rich, GC-rich, or a balanced composition).

HUMAN AND ANIMAL RIGHTS
No animals/humans were used for studies that are the basis of this research.

CONSENT FOR PUBLICATION
Not applicable.

AVAILABILITY OF DATA AND MATERIALS
All data generated and analyzed during this study are included in this published article.

FUNDING
This work is supported by the National Science Foundation under Awards No. EPS-1003907 and OIA-1458952.