Aire-dependent genes undergo Clp1-mediated 3’UTR shortening associated with higher transcript stability in the thymus


Clotilde Guyon,
114-145 minutes
https://elifesciences.org/articles/52985
Abstract

The ability of the immune system to avoid autoimmune disease relies on tolerization of thymocytes to self-antigens whose expression and presentation by thymic medullary epithelial cells (mTECs) is controlled predominantly by Aire at the transcriptional level and possibly regulated at other unrecognized levels. Aire-sensitive gene expression is influenced by several molecular factors, some of which belong to the 3’end processing complex, suggesting they might impact transcript stability and levels through an effect on 3’UTR shortening. We discovered that Aire-sensitive genes display a pronounced preference for short-3’UTR transcript isoforms in mTECs, a feature preceding Aire’s expression and correlated with the preferential selection of proximal polyA sites by the 3’end processing complex. Through an RNAi screen and generation of a lentigenic mouse, we found that one factor, Clp1, promotes 3’UTR shortening associated with higher transcript stability and expression of Aire-sensitive genes, revealing a post-transcriptional level of control of Aire-activated expression in mTECs.
Introduction



Immunological tolerance is a key feature of the immune system that protects against autoimmune disease by preventing immune reactions against self-constituents. Central tolerance is shaped in the thymus and relies on a unique property of a subset of medullary thymic epithelial cells (mTECs). This subset is composed of mTEChi that express high levels of MHC class II molecules and a huge diversity of self-antigens (Danan-Gotthold et al., 2016; Kyewski and Klein, 2006). Thus, developing T lymphocytes in the thymus are exposed to a broad spectrum of self-antigens displayed by mTEChi. Those lymphocytes that recognize their cognate antigens undergo either negative selection, thereby preventing the escape of potentially harmful autoreactive lymphocytes out of the thymus, or differentiate into thymic regulatory T cells beneficial for limiting autoreactivity (Cowan et al., 2013; Goodnow et al., 2005; Klein et al., 2014). Self-antigens expressed by mTEChi include a large number of tissue-restricted antigens (TRAs), so-named because they are normally restricted to one or a few peripheral tissues (Derbinski et al., 2001; Sansom et al., 2014). A large fraction of these TRAs in mTEChi are induced by a single transcriptional activator that is expressed almost exclusively in these cells - the autoimmune regulator Aire. Mice deficient for the Aire gene exhibit impaired TRA expression in mTEChi, whereas TRA expression remains normal in peripheral tissues of these mice. Consistent with inadequate development of central tolerance, Aire knockout (KO) mice develop autoantibodies directed at some of these TRAs, resulting in immune infiltrates in multiple tissues (Anderson et al., 2002). Correspondingly, loss-of-function mutations in the human AIRE gene result in a multi-organ autoimmune disorder known as autoimmune polyglandular syndrome type 1 (Nagamine et al., 1997; Peterson et al., 2004).

How the expression of thousands of Aire-sensitive self-antigen genes is controlled in mTEChi has been a subject of extensive investigation. Significant progress has been made, notably through the identification of a number of molecular factors that further activate the expression of prototypic Aire-sensitive genes in a model employing cell lines that express Aire ectopically by transfection with a constitutive Aire expression vector (Abramson et al., 2010; Giraud et al., 2014). These studies revealed a role for relaxation of chromatin in front of the elongating RNA polymerase (RNAP) II by the PRKDC-PARP1-SUPT16H complex (Abramson et al., 2010) and for an HNRNPL-associated release of the stalled RNAPII (Giraud et al., 2014). However, the effect of most of the identified factors on the full set of Aire-sensitive genes in mTEChi is unknown. It remains also uncertain whether these factors partake in a molecular mechanism directly orchestrated by Aire or in a basal transcriptional machinery that would control the expression of Aire-sensitive genes even before Aire is expressed in mTEChi. Lack of knowledge of the precise modus operandi of the identified factors potentially leaves major aspects of promiscuous mTEChi gene expression unknown.

Among the identified factors, seven of them, namely CLP1, DDX5, DDX17, PABPC1, PRKDC, SUPT16H and PARP1, have been reported to belong to the large multi-subunit 3’ end processing complex (de Vries et al., 2000; Shi et al., 2009) which controls pre-mRNA cleavage and polyadenylation at polyA sites (pAs) (Colgan and Manley, 1997). Hence, we asked whether any of these identified factors could influence Aire-sensitive gene expression partially or entirely by the way of an effect on pre-mRNA 3’ end maturation. Deep sequencing approaches have revealed that the vast majority of protein-coding genes in mammal genomes (70–79%) have multiple pAs mostly located in 3’UTRs (Derti et al., 2012; Hoque et al., 2013). These genes are subject to differential pA usage through alternative cleavage and polyadenylation directed by the 3’ end processing complex and are transcribed as isoforms with longer or shorter 3’UTRs depending on the pA usage (Tian and Manley, 2013). The 3’ end processing complex is composed of a core effector sub-complex comprising CLP1 (de Vries et al., 2000; Mandel et al., 2008) and a number of accessory proteins that include DDX5, DDX17, PABPC1, PRKDC, SUPT16H and PARP1 (Shi et al., 2009). Although the individual roles of accessory proteins on differential pA usage remain largely unknown, the core protein Clp1 has been reported to favor proximal pA selection in yeast based on depletion experiments (Holbein et al., 2011). Similarly, increasing levels of Clp1 bound to the 3’ end processing complex was also shown to favor proximal pA selection and shorter 3’UTR isoforms (Johnson et al., 2011). In contrast, a preference for distal pA selection was reported for higher Clp1 levels in a mouse myoblast cell line based on siRNA Clp1 loss-of-function experiments (Li et al., 2015).

Given the observations from prior work that many of the genes, other than Aire itself, that modulate the expression of Aire-induced genes, are members of the 3’ end processing complex, and that one such member of this complex, Clp1, has been reported to affect pA selection, we speculated that 3’UTR length and regulation might be involved in expression of TRAs in mTEChi. We therefore set out to investigate relationships between Aire sensitivity, 3’ end processing, and pA selection in mTEChi.
Results


Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi


To assess the proportion of long and short-3’UTR transcript isoforms in mTEChi, we selected the genes that harbor potential proximal alternative pAs in their annotated 3’UTRs according to the PolyA_DB 2 database which reports pAs identified from comparisons of cDNAs and ESTs from a very large panel of peripheral tissues (Lee et al., 2007; Figure 1—source data 1). For each gene, the relative expression of the long 3’UTR isoform versus all isoforms could be defined as the distal 3’UTR (d3’UTR) ratio, that is the expression of the region downstream of the proximal pA (d3’UTR) normalized to the upstream region in the last exon (Figure 1A). To determine whether the Aire-sensitive genes exhibit a biased proportion of long and short-3’UTR isoforms in mTEChi, we first performed RNA deep-sequencing (RNA-seq) of mTEChi sorted from WT and Aire-KO mice in order to identify the Aire-sensitive genes, that is those upregulated by Aire (Figure 1B and Figure 1—source data 2). We then compared the distribution of d3’UTR ratios in Aire-sensitive versus Aire-neutral genes in WT mTEChi. We found a significant shift towards smaller ratios, revealing a preference of Aire-sensitive genes for short-3’UTR isoforms in mTEChi (Figure 1C, Figure 1—figure supplement 1A and Figure 1—source data 3 and 4). The Aire-sensitive genes are expressed at much lower levels than the Aire-neutral genes. Hence, we sought to distinguish whether the observed smaller d3’UTR ratios were specifically associated with Aire-sensitive genes or not simply a co-correlate of low expression. We examined the d3’UTR ratios across expression levels of both Aire-sensitive and neutral genes. While the d3’UTR ratios vary dramatically across genes, at all expression levels, the loess-fitted curve of the Aire-sensitive genes is significantly lower than the one of the Aire-neutral genes, therefore revealing a preference of Aire-sensitive genes for smaller d3’UTR ratios in mTEChi that is independent of the levels of gene expression (Figure 1—figure supplement 2A). Since a much larger proportion of Aire-sensitive genes than Aire-neutral genes are known to be TRA genes, we further asked whether the preference of genes for short-3’UTRs was more aligned with Aire-sensitivity or being a TRA gene. To this end, we compared the d3’UTR ratios between the TRA and non-TRA genes as defined in reference (Sansom et al., 2014) in the subsets of Aire-sensitive and neutral genes. In these mTEChi, the short-3’UTR isoform preference was observed preferentially in Aire-sensitive genes regardless of whether or not they were TRA genes (Figure 1D).

Preference of Aire-sensitive genes for short-3’UTR transcript isoforms in mTEChi.

(A) Schematic of pA usage and 3’UTR isoform expression. 3’ ends of RNA isoforms of a hypothetical gene are shown. Usage of the proximal pA results in a reduced proportion of the long 3’UTR isoform, …

Figure 1—source data 1

d3’UTR annotation files in mice and humans for RNA-seq and microarray analyses.

A proximal pA was validated when its genomic location from the PolyA_DB 2 database differs from 20 bp at least to the genomic location of the UCSC annotated 3’UTR distal boundary or distal pA. In case of multiple proximal pAs, the most proximal one was considered. The two annotation files (GFF2_features_hg19_UTR_d.gtf and GFF2_features_mm9_UTR_d.gtf) are GTF files to be used with intersectBed and coverageBed for RNA-seq-based d3’UTR ratio calculation. The two annotation files (features_hg19_UTR_d.csv and features_mm9_UTR_d.csv) are to be used with our R-implementation of PLATA (Giraud et al., 2012) for individual probe-level microarray analyses and microarray-based d3’UTR ratio calculation.https://cdn.elifesciences.org/articles/52985/elife-52985-fig1-data1-v3.zip
Figure 1—source data 2

Annotation files in mice and humans for RNA-seq differential gene expression.

The two annotation files (UCSCmm9.gtf and UCSChg19.gtf) are GTF files to be used with intersectBed and coverageBed for RNA-seq differential gene expression analyses.https://cdn.elifesciences.org/articles/52985/elife-52985-fig1-data2-v3.zip
Figure 1—source data 3

d3’UTR ratio calculation in WT and Aire-KO mTEChi.

RNAseq_d3UTR_ratio_density.R: R-script used for d3’UTR ratio calculation and d3’UTR ratio density plot from RNA-seq data. The script needs the WT_AireKO_counts_UCSCmm9_DESEQ.csv file (DESeq differential expression file of WT vs Aire-KO mTEChi), the WT_AireKO_counts_UTRd.csv file (count of reads mapping the annotated features including d3’UTRs: GFF2_features_mm9_UTR_d.gtf provided in Figure 1—source data 1) and the number of reads of the WT and Aire-KO samples that map to the mm9 genome (114828554 for WT mTEChi and 179098324 for Aire-KO mTEChi).https://cdn.elifesciences.org/articles/52985/elife-52985-fig1-data3-v3.zip
Figure 1—source data 4

List of Aire-sensitive genes with proximal pAs in mTEChi.

na is for genes for which d3'UTR ratios could not be calculated due to the absence of reads mapping to regions upstream of proximal pAs in last exons.https://cdn.elifesciences.org/articles/52985/elife-52985-fig1-data4-v3.xlsx

To discriminate whether the preference for short-3’UTR isoforms in the Aire-sensitive genes was directly associated with the process of Aire’s induction of gene expression or rather was a feature of Aire-sensitive genes preserved in the absence of Aire, we analyzed the proportion of long 3’UTR isoforms for the Aire-sensitive genes in Aire-KO mTEChi. We note that by definition these genes are all expressed at lower levels in the absence of Aire but that most are still expressed at levels sufficient to determine 3’UTR isoform ratios. We observed a preference for the smaller ratios (Figure 1E, Figure 1—figure supplement 1B and Figure 1—source data 3 and 4) in Aire-sensitive versus neutral genes in the Aire-KO cells. For these KO mTEChi, we once again examined the d3’UTR ratios across all expression levels of both Aire-sensitive and neutral genes and again, like in WT mTEChi, found that the preference of Aire-sensitive genes for smaller d3’UTR ratios was independent of the levels of gene expression (Figure 1—figure supplement 2B). We further noted that the majority (~90%) of Aire-sensitive genes that exhibited small d3’UTR ratios (<0 .25="" a="" aire-ko="" also="" by="" characterized="" d3="" href="chrome-extension://ecabifbgmdmgdllomnfinbmaellmclnh/data/reader/index.html?id=3535&url=https%3A%2F%2Felifesciences.org%2Farticles%2F52985#fig1" in="" mtechi="" ratios="" small="" were="" wt="">Figure 1F and Figure 1—figure supplement 1C), Together, these observations showed that the short-3’UTR isoform preference of Aire-sensitive genes in mTEChi was specific to those genes responsive to Aire, whether or not Aire was actually present.

To determine whether the genes sensitive to Aire exhibit shorter 3’UTR isoforms in mTEChi than in their normal tissue of expression, we set out to identify the Aire-sensitive genes showing tissue-specific or selective expression and to calculate the d3’UTR ratios of the identified Aire-sensitive TRA genes in their tissue(s) of expression. To this end, we collected RNA-seq datasets corresponding to a variety of tissues (Shen et al., 2012; van den Berghe et al., 2013; Warren et al., 2013) and selected in each tissue the set of Aire-sensitive genes for which we identified a restricted expression in comparison to the other tissues by applying the SPM (Specificity Measurement) method (Pan et al., 2013; Figure 1—figure supplement 1D,E). We then calculated the d3’UTR ratios of these genes in each peripheral tissue and found variable d3’UTR ratios across tissues with the mTEChi result falling in the low end of this range (Figure 1G and Figure 1—figure supplement 1F). Taking the Aire-sensitive TRA genes individually, we compared their d3’UTR ratios in mTEChi versus their respective tissue of expression and found a dramatic bias towards higher ratios in peripheral tissues, confirming the preference of Aire-sensitive TRA genes for short-3’UTR transcript isoforms in mTEChi, as compared to the periphery (Figure 1H). In addition, we confirmed that small d3'UTR ratios of Aire-sensitive genes in mTEChi were not correlated with the sequencing depth of the mTEChi RNA-seq libraries nor with the length of the generated reads (Figure 1—figure supplement 3). These findings show that the Aire-sensitive TRA genes exhibit an increased proportion of short-3’UTR transcript isoforms in mTEChi versus a majority of peripheral tissues.
The 3’ end processing complex is preferentially located at proximal pAs of Aire-sensitive genes in AIRE-negative HEK293 cells


We sought to determine whether the preference for short-3’UTR isoforms of Aire-sensitive genes observed in Aire-KO mTEChi and conserved upon upregulation by Aire, is associated with a preferred proximal pA usage driven by the 3’ end processing complex. Current techniques dedicated to localize RNA-binding proteins on pre-mRNAs, for example ultraviolet crosslinking and immunoprecipitation (CLIP)-seq (König et al., 2012), need several millions of cells, precluding their use with primary Aire-KO or WT mTEChi for which only ~30,000 cells can be isolated per mouse. To circumvent this issue, we used the HEK293 cell line for these experiments. HEK293 cells are (i) negative for AIRE expression, (ii) responsive to the transactivation activity of transfected Aire (Abramson et al., 2010; Giraud et al., 2012), and (iii) have been profiled for the RNA binding of the 3’ end processing components by Martin et al. by PAR-CLIP experiments (Martin et al., 2012). Similarly to what we found in WT and Aire-KO mTEChi, we observed in Aire-transfected and Ctr-transfected HEK293 cells significant lower d3’UTR ratios for genes identified by Aire transfection to be Aire-sensitive versus Aire-neutral genes (Figure 2A and Figure 2—figure supplement 1A). It should be noted that many Aire-neutral genes featured moderately lower d3’UTR ratios in Aire-transfected HEK293 cells than Ctr-transfected cells, as a possible effect of Aire itself on an overall 3’UTR shortening in these cells, but this did not produce the large proportion of very small d3’UTR ratios (<0 .2="" 3="" a="" affinity="" aire-sensitive="" aire="" analyzing="" at="" been="" binding="" by="" cells.="" cells="" cleavage="" close="" complex="" core="" cstf2="" ctr-transfected="" distal="" end="" exhibit="" exhibited="" for="" genes="" has="" hek293="" highest="" href="chrome-extension://ecabifbgmdmgdllomnfinbmaellmclnh/data/reader/index.html?id=3535&url=https%3A%2F%2Felifesciences.org%2Farticles%2F52985#bib38" in="" localization="" maturing="" member="" of="" or="" pas="" pattern="" performed="" processing="" proximal="" reported="" sites="" that="" the="" their="" to="" transcripts="" was="" were="" within="">Martin et al., 2012). We first validated for Aire-neutral genes that lower d3’UTR ratios correlated with the preferential location of the 3’ end processing complex at proximal pAs (Figure 2—figure supplement 1B). Then we compared the location of the complex between the Aire-sensitive and neutral genes, and found a dramatic preference for proximal pAs versus distal pAs at Aire-sensitive genes (Figure 2B), showing that the 3’ end processing complex was already in place at proximal pAs of Aire-sensitive genes before Aire expression was enforced in HEK293 cells.

Increased binding of the 3’end processing complex at proximal pAs of Aire-sensitive genes in HEK293 cells.

(A) Densities in Ctr-transfected HEK293 cells (AIRE-) (Top) and Aire-transfected HEK293 cells (Bottom) of d3’UTR ratios from RNA-seq data of Aire-sensitive and neutral genes identified after Aire …

Figure 2—source data 1

Genomic location of pAs on hg19 extracted from the PolyA_DB 2 database for CLIP-seq analysis.

The four files (3UTRd+_const.xlsx, 3UTRd+_alt.xlsx, 3UTRd-_const.xlsx and 3UTRd-_alt.xlsx) were generated parsing the PolyA_DB 2 database for alternative (proximal) pAs and constitutive (distal) pAs. Alternative pAs with their genomic location are listed in two files according to the orientation of the genes (3UTRd+_alt.xlsx and 3UTRd-_alt.xlsx). Constitutive pAs with their genomic location are also listed in two different files (3UTRd+_const.xlsx and 3UTRd-_const.xlsx). These files are used to generate bed files of proximal and distal pA location at Aire-sensitive and neutral-genes for CLIP-seq analysis using the sitepro program (CEAS distribution).https://cdn.elifesciences.org/articles/52985/elife-52985-fig2-data1-v3.zip
CLP1 promotes 3’UTR shortening and higher expression at Aire-sensitive genes in HEK293 cells


Since the preference for short-3’UTR isoform expression of Aire-sensitive genes is associated with the increased binding of the 3’ end processing complex to proximal pAs in AIRE-negative HEK293 cells, we asked whether factors that belong to the large 3’ end processing complex (de Vries et al., 2000; Shi et al., 2009) could account for the short-3’UTR transcript isoform preference of Aire-sensitive genes in these cells and affect the expression of these genes. Among the members of the core and accessory subunits of the 3’ end processing complex, the cleavage factor CLP1 (core) and also DDX5, DDX17, PABPC1, PRKDC, SUPT16H and PARP1 (accessory) have been reported to control Aire-sensitive gene expression in an Aire-positive context (Giraud et al., 2014) with the possibility that the effect of some of these factors could result from their action on the basal expression of Aire-sensitive genes and therefore be observed in the absence of Aire. In addition to these candidate factors, we also tested the effect of HNRNPL, which although not part of the 3’ end processing complex, has been shown to regulate 3’ end processing of some human pre-mRNAs (Millevoi and Vagner, 2010) and control Aire transactivation function in mTEChi (Giraud et al., 2014).

First, to determine whether any of the candidate factors could be involved in 3’UTR shortening per se, we carried out short hairpin (sh)RNA-mediated interference in AIRE-negative HEK293 cells and generated expression profiles using Affymetrix HuGene ST1.0 microarrays (Figure 3—source data 1). These arrays typically include one or two short probes per exon, and for approximatively 35% of all genes with potential proximal pAs they also include at least two short probes in the d3’UTR region. In spite of the limited d3’UTR coverage and lower accuracy of hybridization measurements based on two short probes only, we found these arrays adequate to detect 3’UTR length variation at the genome-scale level. For each gene with a microarray-detectable d3'UTR region, the d3’UTR isoform ratio was calculated by dividing the measured d3’UTR expression by the whole-transcript expression based on all short probes mapping to the transcript (Figure 3—source data 1 and 2). Comparison of the percentages of genes exhibiting a significant increase or decrease of the d3’UTR ratios upon knockdown of a candidate gene versus the control LacZ gene, was used to evaluate the specific impact of the candidate on 3’UTR isoform expression (Figure 3—source data 3). For each candidate gene, we measured the knockdown efficiency of, typically, five different shRNAs and selected the three most effective (Supplementary file 1). With a threshold of at least two shRNAs per gene producing a significant excess of genes with increased d3’UTR ratios, we found that CLP1, DDX5, DDX17, PARP1 and HNRNPL contributed to 3’UTR shortening in HEK293 cells (Figure 3A and Figure 3—figure supplement 1). The absence of effect of PRKDC, SUPT16H and PABPC1 could indicate that these factors do not contribute to the activity of 3' end processing sub-complexes or that other factors, perhaps with redundant function, can compensate for their loss. As a control, we also performed knockdown of CPSF6, a core member of the 3’ end processing complex, that has been consistently reported to promote general 3’UTR lengthening (Li et al., 2015; Martin et al., 2012). As expected, knockdown of CPSF6 revealed a skewed distribution towards decreased d3’UTR ratios (Figure 3—figure supplement 2).

CLP1 controls the expression of Aire-sensitive genes with proximal pAs and their shortening in HEK293 cells.

(A) Individual probe-level analysis of microarray data from knockdown and control HEK293 cells. Vertical bars represent the proportion of genes with a significant increase or decrease of d3’UTR …

Figure 3—source data 1

Human Gene ST1.0 microarray probeset and individual probe expression extraction.

We use the aroma.affymetrix R-package and the commands listed in the aroma.affymetrix.commands.docx file for ‘probeset expression extraction’ and ‘individual probe expression extraction’. A mandatory folder organization associated with this package is needed for the analysis. The sample CEL files must be added in a dedicated folder. (HuGene-1_0 st-v1,r3.cdf) is the Chip Description File. (probeset.csv) is the list of probeset IDs corresponding to the genes on the array. (HuGeneST1features.csv) is the list of probeset IDs and their individual probe IDs.https://cdn.elifesciences.org/articles/52985/elife-52985-fig3-data1-v3.zip
Figure 3—source data 2

Microarray individual probe d3’UTR mapping and d3’UTR ratio calculation.

Microarray_d3UTR_hg19.R: R-script to perform individual probe d3’UTR mapping and d3’UTR ratio calculation. Dependent files: - annotation file including 3d’UTR features for microarray analysis (Figure 1—source data 1): features_hg19_UTR_d.csv - Individual probe location on hg19: HuGene-1_0 st-v1.hg19.probe.csv - Individual probe expression obtained from the comparison: CTR versus CLP1 sh2 KD samples (Figure 3—source data 1): ST1features_CTR_v_CLP1_SH2.csv Result file: global_HuGene_CTR_v_CLP1_SH2.csv In the result file, the feat_WTnorm and feat_KOnorm columns correspond, for the 3UTRd features, to the d3’UTR ratios in WT and KO samples, respectively.https://cdn.elifesciences.org/articles/52985/elife-52985-fig3-data2-v3.zip
Figure 3—source data 3

d3’UTR ratio imbalance obtained from microarray data.

Microarray_d3UTR_significance.R: R-script used to perform the analysis. (global_HuGene_CTR_v_CLP1_SH2_3dUTR.csv) corresponds to global_HuGene_CTR_v_CLP1_SH2.csv that was obtained from Figure 3—source data 2 and restricted to the d3’UTR features.https://cdn.elifesciences.org/articles/52985/elife-52985-fig3-data3-v3.zip

Then, we sought among the candidate factors that had an effect on 3’UTR shortening, those that also selectively impacted the basal (in absence of Aire) expression of the Aire-sensitive genes possessing proximal pAs by analyzing microarray whole-transcript expression in control (Ctr) and knockdown HEK293 cells. For each candidate factor, we computed the rank of differential expression of all genes in Ctr versus HEK293 cells knockdown for the tested candidate, and found that CLP1 was the only gene whose knockdown by two distinct shRNAs led to a significant reduction of the expression of Aire-sensitive genes (Figure 3B), focusing our subsequent study on CLP1. The effect of CLP1 knockdown on Aire-sensitive genes lacking annotated proximal pAs was much less than for the genes with proximal pAs, but a smaller effect on the annotated single-pA genes was observed for one of two CLP1 shRNAs (Figure 3C). This might simply be because some genes tagged as “proximal pA-“ in the incomplete PolyA_DB 2 database possessed undiscovered proximal pAs in mTEChi and were therefore responsive to the action of CLP1. Our findings regarding the differential response of pA+ and pA- genes to CLP1 loss-of-function suggested that the effect of CLP1 on Aire-sensitive gene expression in HEK293 cells was dependent on the potential of Aire-sensitive genes to switch from using a distal to a proximal pA site.

To assess the effect of CLP1 on 3’UTR shortening at Aire-sensitive genes, we performed RNA-seq experiments in Ctr and CLP1 knockdown HEK293 cells. Comparison of the 3’ end profiles revealed a statistically significant increase of the median d3’UTR ratios of Aire-sensitive genes for both CLP1 hit shRNAs (Figure 3D, Left), showing that CLP1 is able to promote 3’UTR shortening at Aire-sensitive genes in HEK293 cells. The d3’UTR ratios of Aire-sensitive genes after CLP1 knockdown did not reach those of Aire-neutral genes which could simply be due to remaining CLP1 in these cells following knockdown (measured knockdown as 75% and 72% for shRNA 2 and 3, respectively) or could indicate that additional non-CLP1-dependent factors also contribute to the difference in d3’UTR ratios of Aire-sensitive versus Aire-neutral genes. In contrast to Aire-sensitive genes, only small, statistically insignificant increases of d3’UTR ratios of Aire-neutral genes could be detected in CLP1 knockdown HEK293 cells, indicating that the effect of CLP1 on 3’UTR shortening preferentially affects the Aire-sensitive genes in mTEChi. The preferential 3’UTR shortening effect of CLP1 at Aire-sensitive genes was further supported by the observation of an enhanced proportion of genes subject to CLP1-mediated 3'UTR shortening among the Aire-sensitive versus neutral genes in HEK293 cells (Figure 3—figure supplement 3). This finding is consistent with the microarray results (Figure 3A) indicating that a small proportion of genes in the genome with microarray-detectable d3’UTR regions underwent 3’UTR length variation after CLP1 knockdown. Finally, using RNA-seq data, we confirmed the effect of CLP1 knockdown on the selective reduction of the expression of Aire-sensitive genes that were annotated to have alternative proximal pAs in HEK293 cells (Figure 3D, Right). Furthermore, we observed that the extent of the effect of CLP1 knockdown on the levels of Aire-sensitive gene expression in Aire-transfected HEK293 cells (Figure 3—figure supplement 4) was similar, although slightly stronger than the one we described above in AIRE-negative HEK293 cells.

Together these findings showed that CLP1 is able to promote 3’UTR shortening and increase expression of Aire-sensitive genes with proximal pAs, supporting a linked mechanism between CLP1-promoted 3’UTR shortening and gene expression enhancement at Aire-sensitive genes.
Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi


To test for the in vivo impact of Clp1 on 3’UTR length variation of the transcripts upregulated by Aire in mTEChi, we generated lentigenic Clp1 knockdown mice. Three shRNAs targeting the murine Clp1 with the highest knockdown efficiency (Supplementary file 1) were cloned as a multi-miR construct into a lentiviral vector, downstream of a doxycycline inducible promoter and upstream of the GFP as a marker of activity. Purified concentrated lentiviruses containing this construct and the lentiviral vector expressing the TetOn3G protein were used to microinfect fertilized oocytes, which were reimplanted into pseudopregnant females (Figure 4A). Of the 19 pups that we obtained with integration of both plasmids, two pups were expressing, after doxycycline treatment, the multi-miR in mTEChi and one pup was exhibiting a 60% reduction of Clp1 mRNA levels in GFP+ (Clp1 knockdown) versus GFP- (Ctr) mTEChi taken from the same mouse. These two cell populations, Clp1 knockdown and no-knockdown Ctr mTEChi, were separated by GFP signal by FACS and processed for RNA-seq (Figure 4—figure supplement 1). We found a significant increase of d3’UTR ratios of Aire-sensitive genes following Clp1 knockdown in mTEChi, as well as a less pronounced effect that does not reach significance for Aire-neutral genes regardless of whether they are TRA genes or not. This finding therefore supports that the effect of Clp1 on 3’UTR shortening preferentially affects Aire-sensitive genes in mTEChi (Figure 4B and Figure 4—source data 1). As in the HEK293 in vitro experiments, we observed that the d3’UTR ratios of the Aire-sensitive genes didn’t reach the ratios of the Aire-neutral genes after Clp1 knockdown, again due either to the incomplete 60% Clp1 knockdown or to the existence of other non-Clp1-dependent differences. The preferential 3’UTR shortening effect of Clp1 at Aire-sensitive genes was further supported by the observation of an enhanced proportion of genes subject to Clp1-mediated 3'UTR shortening among the Aire-sensitive versus neutral genes in mTEChi (Figure 4—figure supplement 2).

Clp1 controls the expression of Aire-upregulated genes with proximal pAs and their shortening in mTEChi.

(A) Schematic of the lentigenic knockdown strategy. shRNAs against Clp1 were transferred to a doxycycline-inducible expression system for microinfection of fertilized oocytes under the zona …

We then compared the levels of expression of the Aire-sensitive genes with potential proximal pAs in Ctr versus Clp1 knockdown mTEChi and found, in contrast to Aire-neutral genes, a significant reduction following Clp1 knockdown (Figure 4C, Left). The link between Clp1-mediated 3'UTR shortening and increased expression at Aire-sensitive genes is strengthened by the observation that the Aire-sensitive genes that undergo a higher than 2-fold d3’UTR ratio decrease in Ctr versus Clp1 knockdown lentigenic mTEChi, exhibit higher expression values (Figure 4—figure supplement 3). As observed in HEK293 cells, the effect of Clp1 knockdown was dampened for the Aire-sensitive genes lacking potential proximal pAs (Figure 4C, Right). We also found that the expression of the genes without proximal pAs was globally reduced in comparison to the genes with proximal pAs, supporting a linked mechanism between proximal pA usage and increased gene expression in mTEChi. Altogether these results showed that Clp1 promotes 3’UTR shortening of the transcripts upregulated by Aire in mTEChi and strongly suggested that it enhances their level of expression through proximal pA usage.

A role for Clp1 in mTEChi was further supported by its higher expression in mTEChi in comparison to a wide variety of tissues for which we collected RNA-seq datasets (Shen et al., 2012; van den Berghe et al., 2013; Warren et al., 2013; Figure 4D). As a comparison, none of the other candidate factors, Hnrnpl, Ddx5, Ddx17 and Paprp1 that contributed to 3’UTR shortening in HEK293 cells, were over-represented in mTEChi (Figure 4—figure supplement 4). Importantly, we also validated higher expression of Clp1 at the protein level in mTEChi versus their precursor cells, mTEClo (Gäbler et al., 2007; Hamazaki et al., 2007), cells from the whole thymus, and also the predominant thymus CD45+ leukocyte fraction (Figure 4—figure supplement 5). In addition, a role for Clp1 independent of Aire’s action on genes expression was supported by the observed similar levels of Clp1 expression in WT and Aire-KO mTEChi (Figure 4—figure supplement 6A) and the lack of evidence for Aire and CLP1 interaction (Figure 4—figure supplement 6B). Finally, we found a significant correlation between higher Clp1 expression and lower d3’UTR ratios across peripheral tissues for the Aire-sensitive TRA genes (Figure 4E, Top). In contrast, no such significant correlation was observed for Aire-neutral genes (Figure 4E, Bottom). These findings indicate that the effect of Clp1 on 3’UTR shortening at Aire-sensitive genes is independent of Aire’s action on gene expression, general across cell types, and conserved upon upregulation of gene expression by Aire in mTEChi.
Clp1-driven 3’UTR shortening of Aire-upregulated transcripts is associated with higher stability in mTEChi


To determine whether the effect of Clp1 on 3’UTR shortening of Aire-upregulated transcripts was associated with higher levels of these transcripts in mTEChi through increased stability, we assessed the stability of all transcripts in these cells. We used Actinomycin D (ActD) to inhibit new transcription (Sobell, 1985) and assessed the differences in transcript profiles between treated and untreated mTEChi. We harvested the cells at several timepoints for expression profiling by RNA-seq. Each RNA-seq dataset was normalized to total read counts and we calculated the expression fold-change of each gene in treated versus control samples, therefore reflecting differences in transcript levels in the absence of ongoing transcription. We selected among the Aire-sensitive genes two subsets: (i) those that underwent a higher than 2-fold d3’UTR ratio decrease in Ctr versus Clp1 knockdown lentigenic mTEChi and, (ii) genes with steady d3’UTR ratios (Figure 4—source data 1). Comparison of the two gene sets in ActD-treated versus control samples revealed that the Aire-sensitive genes subject to Clp1-driven 3’UTR shortening were initially comparable in their changes in transcript levels upon transcription inhibition to the changes for Aire-sensitive genes that showed no 3’UTR shortening but then showed increasing preservation of transcript levels at 3 hr and 6 or 12 hr, indicating a stabilization of this subset of 3’UTR-shortened transcripts (Figure 5; p=2.1×10−4, 1.3 × 10−12, and 5.6 × 10−10 at 3, 6 and 12 hr, respectively). Therefore, the Clp1-promoted 3’UTR shortening at Aire-sensitive genes in mTEChi is indeed associated with higher transcript stability.

Clp1-driven 3’UTR shortening of the Aire-upregulated transcripts show higher stability in mTEChi.

Relative expression of Aire-sensitive genes in ActD-treated (for indicated time durations) vs. control mTEChi depending on whether they undergo Clp1-mediated 3’UTR shortening as identified in Clp1 …

Figure 5—source data 1

Potential miRNA-specific target genes with conserved miRNA sites in their 3’UTRs or d3’UTRs, from the TargetScan 6.2 database.

In both files (miRNA_target_Refseq.gmt and miRNA_target_Refseq_d3UTR_filtered.csv), the first two columns show miRNA families and individual miRNA(s). (miRNA_target_Refseq.gmt) was used in GSEA to determine the miRNA-specific target gene sets showing significant enrichment in mTEChi or mTEClo. (miRNA_target_Refseq_d3UTR_filtered.csv) was used to determine the proportion of Aire-sensitive and neutral genes potentially targeted in their d3’UTRs by miRNAs. The file (Leading_edge_mTEChi_vs_lo_346_miRNAs.xlsx) summarizes the leading edge analysis of 346 miRNAs or miRNA families and lists the core targets that account for enrichment signals of gene sets significantly deviated (FDR < 0.05) in mTEChi versus mTEClo. Here all significant gene sets are reduced in mTEChi (enriched in mTEClo). Table values are rank metric scores. The last line of the table shows the number of miRNAs or miRNA families potentially recognizing specific leading edge targets.https://cdn.elifesciences.org/articles/52985/elife-52985-fig5-data1-v3.zip

Since short-3’UTR transcripts, in comparison to their longer counterparts, lack miRNA-binding sites specifically targeted by miRNAs known to destabilize transcripts and/or decrease translation, we assumed that the Aire-sensitive genes that underwent Clp1-promoted 3'UTR-shortening escape the post-transcriptional repression mediated by miRNAs in mTEChi. We assessed the level of miRNA-mediated post-transcriptional repression in mTEChi versus mTEClo precursor cells by assessing the differences in expression of sets of genes identified as potentially targeted by miRNAs, in mTEChi versus mTEClo. To this end, we performed a Gene Set Enrichment Analysis (GSEA) between the two mTEC populations for each set of genes potentially targeted by a specific miRNA or miRNA family (as defined by the TargetScan database, Figure 5—source data 1). We found 346 miRNA-specific target gene sets that had significantly reduced expression in mTEChi versus mTEClo (Figure 5—figure supplement 1, Figure 5—source data 1). In sharp contrast, we did not detect any of the miRNA-specific target gene sets that had reduced expression in mTEClo versus mTEChi. This suggests that miRNAs are an important contributor to transcript destabilization in mTEChi and that this effect is similar or less for the same miRNA target sets in mTEClo cells. To determine whether the Clp1-promoted 3'UTR shortening at Aire-sensitive genes can lead to escape from post-transcriptional repression mediated by the miRNAs associated with these 346 target-gene sets, we calculated, among the Aire-sensitive genes subject to Clp1-promoted shortening, the proportion of those that are potentially targeted by one of these miRNAs at a conserved site in the d3’UTR of their longer-transcript isoforms (Figure 5—source data 1). We found a large proportion of these cases (46%) had one or more such conserved miRNA target sites, providing a specific miRNA de-repression mechanism due to 3’UTR shortening for a large fraction of the Aire-sensitive genes. This analysis is only able to identify miRNAs that are highly conserved and that have disproportionate effect in mTEChi versus mTEClo, so it is expected to miss many of the cases of potential miRNA regulation that may be relieved by 3’UTR shortening in mTEChi.
Discussion



Aire-sensitive genes upregulated by Aire in mTEChi have been shown to be controlled also by a number of Aire’s allies or partners (Abramson et al., 2010; Giraud et al., 2014). Some of these factors have been reported to be part of the large 3’ end processing complex (de Vries et al., 2000; Shi et al., 2009), notably Clp1 which belongs to the core 3’ end processing sub-complex (de Vries et al., 2000; Mandel et al., 2008) and favors early cleavage and polyadenylation in some systems (Holbein et al., 2011; Johnson et al., 2011). In our present study, we demonstrated that Clp1 promotes 3’UTR shortening of the transcripts upregulated by Aire in mTEChi, and that these transcripts are associated with an enhanced stability, revealing a post-transcriptional mechanism whose escape leads to higher levels of expression of Aire-sensitive genes in mTEChi.

Comparison of RNA-seq expression profiles between Clp1 knockdown and Ctr mTEChi isolated from Clp1 lentigenic mice showed that Clp1 was able to promote 3’UTR shortening at Aire-sensitive genes, thereby contributing to the preference of short-3’UTR transcript isoforms of Aire-sensitive genes versus Aire-neutral genes in WT mTEChi. We found that this 3’UTR shortening was driven by Clp1 in mTEChi but also in Aire-non-expressing HEK293 cells, in which the level of Aire-sensitive gene expression is weak but still detectable by RNA-seq, thus showing that the effect of Clp1 on Aire-sensitive genes was not restricted to mTEChi nor dependent on Aire’s action. Although Clp1 is a main contributor to this described process, additional factors might also be involved.

Clp1 is a ubiquitous protein showing higher expression in mTEChi than in mTEClo, CD45+ thymic cells or thymic cells taken as a whole, but also showing higher gene expression in comparison with a large range of peripheral tissue cells. Interestingly, we found that the level of Clp1 expression was also significantly high in 14.5 embryonic liver cells, cells that have been reported to undergo sustained cellular activation leading to proliferation and maturation into hepatocytes (Kung et al., 2010). Similarities with the pattern of mTEC development (Gray et al., 2007; Irla et al., 2008) might point out cell activation as a potent inducer of Clp1 expression. Notably, we showed that higher levels of Clp1 expression were correlated with higher proportions of short-3’UTR transcript isoforms of tissue-restricted Aire-sensitive genes across peripheral tissues, suggesting the existence of a Clp1-promoted 3’UTR shortening mechanism occurring in a variety of cell types and, among those cell types we surveyed, most pronounced in mTEChi.

The observation that the levels of Clp1 expression and the distributions of the short-3’UTR transcript isoforms of Aire-sensitive genes were similar between Aire-KO and WT mTEChi, strongly suggests that the Clp1-driven 3’UTR shortening mechanism is already in place in mTEChi before Aire is activated. The independence of Aire’s action on gene expression and the effect of Clp1 on 3’UTR shortening is also consistent with the lack of direct interaction found between Aire and Clp1. Although these effects of Aire and Clp1 appear decoupled, it is also apparent as noted that sensitivity to these Aire and Clp1 effects have a higher-rate of co-occurrence in the same set of genes. Although we could detect and characterize the effect of Clp1 on the Aire-sensitive genes in mTEChi using annotated pAs from the PolyA_DB 2 database which mainly contains pAs identified from comparisons across peripheral tissues, the precise proportion of Aire-sensitive genes that are subject to differential pA usage and Clp1-driven shortening in mTEChi remains to be precisely defined. Current methods to capture mTEChi-specific pAs using next generation sequencing methods require very large numbers of cells relative to the number of mTEChi (~30,000) that can be isolated per mouse but emerging methods for accurate pA identification with lower input requirements could make this feasible from such low material quantity (Chen et al., 2017).

Clp1 is a member of the core 3’ end processing complex that we showed to be preferentially located at proximal pAs of Aire-sensitive genes in HEK293 cells, indicating that the effect of Clp1 on 3’UTR shortening could result from enhanced recruitment of the 3’ end processing complex to proximal pAs. Similar modes of action, involving members of the core 3’ end processing complex and resulting in transcripts with either shorter or longer 3’UTRs have been described for Clp1 in yeast (Johnson et al., 2011) and Cpsf6 in humans (Gruber et al., 2012; Martin et al., 2012), respectively. However, the basis for the preferential effect of Clp1 on the Aire-sensitive genes remains unknown, but note that it does appear to be conserved across cell types. One possibility is that the regulatory elements and associated basal transcriptional machinery at Aire-sensitive genes share conserved features that favor recruitment of Clp1.

3’UTRs have been described as potent sensors of the post-transcriptional repression mediated by miRNAs, resulting in mRNA destabilization and degradation (Bartel, 2009; Jonas and Izaurralde, 2015). In addition to miRNAs, RNA-binding proteins (RBPs) also contribute the regulation of transcript stability depending on the type of cis regulatory elements that they recognize on 3’UTRs, triggering either repression or activation signals (Spies et al., 2013). In mTEChi, we found increased stability of the Aire-sensitive transcripts that were subject to Clp1-promoted 3’UTR shortening. Thus, our findings supported an escape from the post-transcriptional repression of short-3’UTR transcripts in mTEChi, leading to enhanced stability and accumulation of these transcripts. Similar observations, resulting in increased transcript levels and higher protein translation, have notably been documented for genes subject to 3’UTR shortening whose long-3’UTR transcript isoforms were targeted by particular miRNAs or classes of RBPs (Chen and Shyu, 1995; Guo et al., 2010; Masamha et al., 2014; Mayr and Bartel, 2009; Vlasova et al., 2008).

The finding that the transcripts controlled by Aire and subjected to Clp1-promoted 3’UTR shortening in mTEChi escape the miRNA-mediated post-transcriptional repression that proves to be stronger in mTEChi than in mTEClo, is consistent with recent reports showing that maturation-dependent expression of miRNAs correlates with the expression of Aire in mTEChi (Ucar et al., 2013) and that Aire is able to control the expression of miRNAs in a murine mTEC line (Macedo et al., 2015) or in mTEChi in vivo through the comparison of WT and Aire null mutant thymi (Ucar et al., 2013). Interestingly, it has also been reported that, in contrast to Aire-independent TRA transcripts, some classical Aire-dependent TRAs exhibit profound refractoriness in reconstructed networking interactions with miRNAs (Macedo et al., 2015; Oliveira et al., 2016). An alteration in 3’UTRs of Aire-dependent TRA transcripts has been proposed to explain this observation (Passos et al., 2015). Consistent with this proposal, our study provides evidence that Aire-sensitive genes undergo 3’UTR shortening resulting in the escape of the generated shortened transcripts from the miRNA-mediated post-transcriptional repression in mTEChi.

In addition to impacting transcript stability through the escape of the post-transcriptional repression, short 3’UTRs have also been shown to shift the surface localization of membrane proteins and the functional cell compartment of other types of proteins, in favor of the endoplasmic reticulum (ER) (Berkovits and Mayr, 2015). Thus, one interesting hypothesis for future study is that Clp1-driven 3’UTR shortening at Aire-sensitive genes in mTEChi might not only impact the expression of Aire-dependent self-antigens but also favor their routing to the ER, from where they will be processed and presented, potentially enhancing their presentation to self-reactive T lymphocytes and, subsequently, central tolerance and protection against autoimmune manifestations.
Materials and methods


Reagent type
(species) or resourceDesignationSource or referenceIdentifiersAdditional
informationGenetic reagent (Mus-musculus) B6.129S2-Airetm1.1Doi/J Anderson et al., 2002 RRID:IMSR_JAX:004743 Provided by D. Mathis and C. Benoist (Harvard Medical School)
Cell line (Homo-sapiens) HEK293 cells Martin et al., 2012 Cell line obtained from M. Zavolan lab
Cell line (Mus-musculus) 1C6 mouse thymic epithelial cells Mizuochi et al., 1992
Transfected construct (mouse) pCMV-Aire-Flag plasmid Abramson et al., 2010
Antibody anti-mouse CD45-PerCPCy5.5
(Clone: 30-F11;
Rat IgG2b, k) Biolegend Cat#
103131
RRID:AB_893344 FACS (1:50)
Antibody anti-mouse Ly51-PE
(Clone: 6C3;
Rat IgG2a, k) Biolegend Cat#
108307
RRID:AB_313364 FACS (1:800)
Antibody anti-mouse
I-A/E-APC
(Clone: M5/114.15.2;
Rat IgG2b, k) eBioscience Cat#
17–5321
RRID:AB_469455 FACS (1:1200)
Antibody anti-Clp1
(Clone: EPR7181; Rabbit monoclonal) GeneTex Cat#
GTX63930 FACS (1:100)
Antibody anti-human CLP1 (Goat polyclonal) Santa Cruz Cat# sc-243005 IP and WB
Antibody Flag-tag M2 (Mouse monoclonal) Sigma Cat#
F1804
RRID:AB_262044 IP and WB
Antibody mouse CD45 microbeads Miltenyi Biotec Cat#
130-052-301
Recombinant DNA reagent PLKO.1-Puro shRNAs
(plasmids-lentiviruses) Broadinstitute (RNAi platform) shRNA details:
Supplementary file 1
Recombinant DNA reagent EF1-Tet3G
(plasmid-lentivirus) Vectalys
Recombinant DNA reagent pTRE3G-shClp1-GFP (plasmid-lentivirus) Vectalys Clp1 shRNAs:
Supplementary file 1
Sequence-based reagent qPCR primers this paper primer details:
Supplementary file 2
Sequence-based reagent oligo(dT)12–18 primers ThermoFisher Cat#
18418012
Commercial assay or kit Trans-IT-293 transfection reagent Mirus Cat#
MIR 2700
Commercial assay or kit Universal Magnetic Co-IP Kit Active Motif Cat#
54002
Commercial assay or kit Superscript II Reverse Transcriptase ThermoFisher Cat#
18064071
Chemical compound, drug collagenase D Roche Cat#
11088866001 (1 mg/mL final)
Chemical compound, drug DNase I Sigma Cat#
DN25 (1 mg/mL final)
Chemical compound, drug collagenase/dispase Roche Cat#
11097113001 (2 mg/mL final)
Chemical compound, drug actinomycin D Sigma Cat#
A9415 (1 µM)
Software, algorithm Bowtie Johns Hopkins University
(http://bowtie-bio.sourceforge.net/index.shtml) RRID:SCR_005476
Software, algorithm Samtools Samtools (http://samtools.sourceforge.net/) RRID:SCR_002105
Software, algorithm Bedtools University of Utah
(https://bedtools.readthedocs.io/en/latest/) RRID:SCR_006646
Software, algorithm DESeq EMBL
(http://bioconductor.org/packages/release/bioc/html/DESeq.html) RRID:SCR_000154
Software, algorithm RNAseq_d3UTR_ratio_density.R this paper provided: Figure 1—source data 3
Software, algorithm CEAS CEAS (http://ceas.cbi.pku.edu.cn/index.html) RRID:SCR_010946
Software, algorithm aroma.affymetrix R package UCSF (prev. UC Berkeley)
(https://www.aroma-project.org) RRID:SCR_010919 details:
Figure 3—source data 1
Software, algorithm Microarray_d3UTR_hg19.R this paper provided:
Figure 3—source data 2
Software, algorithm Microarray_d3UTR_significance.R this paper provided:
Figure 3—source data 3
Software, algorithm GSEA BroadInstitute
(http://www.broadinstitute.org/gsea/) RRID:SCR_003199

Mice

Aire-deficient mice on the C57BL/6 (B6) genetic background (Anderson et al., 2002) were kindly provided by D. Mathis and C. Benoist (Harvard Medical School, Boston, MA), and wild-type B6 mice were purchased from Charles River Laboratories. Mice were housed, bred and manipulated in specific-pathogen-free conditions at Cochin Institute according to the guidelines of the French Veterinary Department and under procedures approved by the Paris-Descartes Ethical Committee for Animal Experimentation (decision CEEA34.MG.021.11 or APAFIS #3683 N° 2015062411489297 for lentigenic mouse generation).
Cell cultureRequest a detailed protocol

Cell lines used include HEK293 from the lab of Dr Zavolan and the murine thymic epithelial 1C6 cell line (Mizuochi et al., 1992). These cells were maintained in the lab, negative for mycoplasma and cultured in DMEM high glucose medium complemented with 10% FBS, L-glutamate, sodium pyruvate 1 mM, non-essential amino acids and pen/strep antibiotics. HEK293 cells display a normal HEK293 morphology and the identity of 1C6 cells was confirmed by RNA-seq with high expression of typical thymic epithelial markers.
Isolation and analysis of medullary epithelial cellsRequest a detailed protocol

Thymi of 4-wk-old mice were dissected, trimmed of fat and connective tissue, chopped into small pieces and agitated to release thymocytes. Digestion with collagenase D (1 mg/mL final) (Roche) and DNase I (1 mg/mL final) (Sigma) was performed for 30 min at 37°C. The remaining fragments were then treated with a collagenase/dispase mixture (2 mg/mL final) (Roche) and DNase I (2 mg/mL final) at 37°C until a single-cell suspension was obtained. Cells were passed through 70 μm mesh and resuspended in staining buffer (PBS containing 1% FBS and 5 mM EDTA). For isolation of medullary epithelial cells, an additional step of thymocyte depletion was performed using magnetic CD45 MicroBeads (Miltenyi Biotec). The resuspended cells were incubated for 20 min at 4°C with the fluorophore-labeled antibodies CD45-PerCPCy5.5 (1:50) (Biolegend), Ly51-PE (1:800) (Biolegend), and I-A/E-APC (1:1,200) (eBioscience). Sorting of mTEChi/lo (CD45−Ly51−I-A/Ehigh/low) or, for lentigenic mice, of mTEChi +/- for GFP expression, was performed on a FACSAria III instrument (BD Bioscience). For Clp1 staining, cells labeled for membrane antigens were fixed in (3.7%) formaldehyde for 15 min, permeabilized in (0.5%) saponin for 15 min, and incubated with an antibody to Clp1 (1:100) (clone: EPR7181, GeneTex, GTX63930) and an Alexa Fluor 488-conjugated goat polyclonal antibody to rabbit (1:200) (TermoFisher, A11008). Cells were analyzed on an Accuri C6 instrument (BD Bioscience). All compensations were performed on single-color labeling of stromal cells and data analysis was done using the BD Accuri C6 Analysis software.
Actinomycin D treatmentRequest a detailed protocol

mTEChi were isolated and sorted (~4 x 105) from pooled thymi of B6 mice as described above, then treated with actinomycin D (1 µM) in MEM medium for 3, 6 and 12 hours at 37°C. The cells were then harvested and total RNA was isolated by TRIzol extraction (ThermoFisher).
Aire-transient transfectionsRequest a detailed protocol

HEK293 cells were seeded at a density of 600,000 per well in 6-well plates or at 3.5*106 per 10-cm2 culture dish. The next day, and depending on the dish format, HEK293 cells were transfected with either 2.5 or 8 µg of the pCMV-Aire-Flag plasmid (or control plasmid) using 7.5 or 32 µl of the TransIT-293 transfection reagent (Mirus). After 48 hr, cells cultured in 6-well plates were subjected to total RNA extraction for RNA-seq experiments, whereas those in the culture dish were subjected to protein extraction for coimmunoprecipitation.
CoimmunoprecipitationRequest a detailed protocol

Extraction of nuclear proteins and coimmunopreciptation were performed using the Universal Magnetic Co-IP Kit (Active Motif). Briefly, Aire-transfected HEK293 cells were lysed with hypotonic buffer and incubated on ice for 15 min. Cell lysates were centrifuged for 30 s at 14,000 x g, then the nuclei pellets were digested with an enzymatic shearing cocktail for 10 min at 37°C. After centrifugation of the nuclear lysates, the supernatants containing the nuclear proteins were first incubated with specific antibodies for 4 hr, then with Protein-G magnetic beads for 1 hr at 4°C with rotation. After four washes, bound proteins were eluted in laemmli/DTT buffer, separated by SDS/PAGE for 40 min at 200 V, transferred to PVDF membranes using the TurboTransfer System for 7 min at 25 V (BioRad) and blocked for 1 hr with TBS, 0.05% Tween, 3% milk. The western blot detection was done after incubation with primary (2 hr) and secondary antibodies (1 hr). Detection was performed by enhanced chemiluminiscence (ECL). Antibodies used for immunoprecipitation or revelation were: CLP1 (sc-243005, Santa Cruz), Flag-tag M2 (F1804, Sigma), goat IgG control (sc-2028, Santa Cruz), mouse IgG1 control (ab18443, Abcam), and horseradish peroxidase-conjugated anti-mouse IgG light chain specific (115-035-174, Jackson ImmunoResearch).
shRNA-mediated knockdownRequest a detailed protocol


Specific knockdown of CLP1, HNRNPL, DDX5, DDX17, PARP1, PRKDC, SUPT16H, PABPC1, CPSF6 and the control LacZ gene in HEK293 cells, or of Clp1 in the 1C6 mouse thymic epithelial cell line was performed by infection with lentivirus-expressing shRNAs. shRNAs were cloned into the lentivirus vector pLKO with an expression driven by the ubiquitously active U6 promoter, and were provided by the RNAi Consortium of the Broad Institute, as lentiviral particles at ~107 VP/mL. For each candidate, we tested an average of 5 specific shRNAs among those with the highest knockdown efficiency as measured by the RNAi platform of the Broad Institute.

HEK293 or 1C6 cells were seeded at a density of 250,000 or 650,000 per well in 6-well plates. The next day, cells were supplemented with 8 mg/ml polybrene and infected with 20 µL of shRNA-bearing lentiviruses. Each shRNA was tested in duplicate. The day after, the medium was changed to a fresh one containing 2 µg/ml puromycin. Cells were maintained in selective medium during 6 days and harvested for RNA extraction using TRIzol (ThermoFischer). Knockdown efficiency was analyzed by real-time PCR – carried out in technical triplicate – in comparing the level of expression of each candidate in the knockdown vs. control samples using the human or murine GAPDH gene for normalization with primers listed in Supplementary file 2. First-strand cDNA was synthesized using SuperScript II Reverse Transcriptase (ThermoFischer) and oligo(dT)12–18 primers. cDNA was used for subsequent PCR amplification using the 7300 Real-Time PCR system (Applied Biosystems) and SYBR Green Select Master Mix (ThermoFisher). Knockdown efficiency of each specific shRNA was summarized in Supplementary file 1. We used for subsequent analyses the extracted RNA corresponding to the three shRNAs yielding the highest reduction of their target mRNA (>60%).
Lentigenic mouse generationRequest a detailed protocol

Three shRNAs against Clp1 were cloned into a cluster of micro RNAs construct, the two most efficient shRNAs in two copies and the third in a single copy (Supplementary file 1). This construct was transferred to a lentiviral backbone, downstream of the doxycycline inducible promoter TRE3G and upstream of the ZsGreen protein. A second construct expressing the TetOn3G transactivator under the control of the EF1 promoter was generated. A single ultra-high purified and concentrated lentivector (2.2 × 109 TU/mL) containing both constructs was generated and purified by both Tangential Flow Filtration and Chromatography. Cloning and lentiviral production were performed by Vectalys (www.vectalys.com). Fertilized oocytes (B6) were microinjected under the zona pellucida with the lentivirus suspension as described (Giraud et al., 2014). A pool of 33 transduced oocytes were reimplanted into five pseudopregnant females. Newborns were selected for integration of both constructs by PCR with primers matching the ZsGreen or TetOn3G sequences (Supplementary file 2). At 3 weeks of age, mice were treated with doxycycline food pellets (2 g/kg) for two weeks and then sacrificed for mTEChi isolation. Mice expressing GFP in >10% of mTEChi were selected for GFP+ and GFP- mTEChi RNA-seq.
RNA-seq and d3’UTR ratiosRequest a detailed protocol


Total RNA was isolated by TRIzol extraction (ThermFisher). Only high-quality RNA (with RIN values around or over 8) was used to generate polyA-selected transcriptome libraries with the TruSeq RNA Library Prep Kit v2, following the manufacturer’s protocol (Illumina). Sequencing was performed using the Illumina HiSeq 1000 machine and was paired-end (2 × 100 bp) for mTEChi and Aire-KO mTEChi isolated from pooled thymi (deposited in the GEO database as GSE140683), and for mTEChi and mTEClo isolated from the same mice (deposited in the GEO GSE140815). Sequencing was single-end (50 bp) for transfected and knockdown HEK293 cells (deposited in the GEO database as GSE140738 and GSE140993), actinomycin D-treated mTEChi (as GSE140815), and mTEChi isolated from Clp1 lentigenic mice (as GSE140878). RNA libraries from thymic cells isolated from lentigenic mice or actinomycin D-treated mTEChi were constructed with the Smarter Ultra Low Input RNA kit (Clontech) combined to the Nextera library preparation kit (Illumina). Paired-end (100 bp) datasets were homogenized to single-end (50 bp) data by read-trimming and concatenation. Lower quality reads tagged by the Illumina’s CASAVA 1.8 pipeline were filtered out and mapped to the mouse or human reference genome (mm9 or hg19) using the Bowtie aligner (Langmead et al., 2009). For the multi-tissue comparison analysis, duplicate reads were removed with the Samtools rmdup function (1000 Genome Project Data Processing Subgroup et al., 2009). For read counting, we used the intersectBed and coverageBed programs of the BEDtool distribution (Quinlan and Hall, 2010) with the -f one option. It enabled the count of reads strictly contained in each exon of the Refseq genes whose annotation GTF file was obtained from the UCSC Table Browser, in choosing mm9 mouse (or hg19 human) genome/Genes and Gene Predictions/RefSeq Genes/refGene. GTF files for mice and humans are provided in Figure 1—source data 2. Differential fold-change expression between two datasets was computed using DESeq (Anders and Huber, 2010) and gene expression was quantified in each sample as reads per kilobase per million mapped reads (RPKM).

For d3’UTR ratio calculation, we counted the reads as above but with a GTF file that we generated as follows: First, from the UCSC Table Browser, we chose mm9 mouse (or hg19 human) genome/Genes and Gene Predictions/RefSeq Genes/refGene, and we obtained one file with coding exon features only and another file with 5' and 3'UTR features. We combined these two files into a single one and split the 3'UTR annotations into distal 3'UTR (d3'UTR) and proximal 3'UTR annotations at the sites of alternative pAs that we identified in parsing the PolyA_DB 2 database (http://exon.umdnj.edu/polya_db/v2/). A proximal pA was validated when its genomic location from the PolyA_DB 2 database differs from 20 bp at least to the genomic location of the UCSC annotated 3’UTR distal boundary or distal pA. In case of multiple proximal pAs, the most proximal one was considered. Then, we fused into a single feature the proximal 3'UTR annotations with the last coding exons (exon CDS) and formatted the file to fit a GTF format. d3’UTR annotation and GTF files for mice and humans are provided in Figure 1—source data 1. For each Refseq gene corresponding to an independent NM_Refseq ID, we divided the RPKM expression of the d3'UTR region by the expression of the upstream region of the last 3’ exon to generate the d3'UTR ratio used as a measure of pA usage. Note that using two contiguous regions in the last 3’ exons of the genes for d3’UTR ratio calculation, reduces the risk of a potential bias that could arise from a non-uniform RNA-seq coverage between the 3’ ends and their upstream regions. The R-script used for d3’UTR ratio calculation in WT and Aire-KO mTEChi is provided in Figure 1—source data 3.
Multi-tissue comparison analysisRequest a detailed protocol


First, an RNA-seq database of mouse tissues was assembled in collecting 22 RNA-seq datasets generated from polyA-selected RNA and Illumina sequencing. The raw read sequences were obtained from the GEO database: GSE36026 for bone marrow, bone marrow derived macrophage (BMDM), brown adipocytes tissue (BAT), cerebellum, cortex, heart, intestine, kidney, liver, lung, olfactory, placenta, spleen, testes, mouse embryonic fibroblast (MEF), mouse embryonic stem cells (mESC), E14.5 brain, E14.5 heart, E14.5 limb and E14.5 liver; GSM871703 for E14.5 telencephalon; GSM879225 for ventral tegmental area VTA. Reads of the VTA dataset, over 50 bp in length, were trimmed for homogeneous comparison with our RNA-seq data. We processed each collected dataset for gene expression profiling and d3’UTR ratio computing as above. To avoid center-to-center biases, we removed from the sequence assemblies the duplicate reads that could arise from PCR amplification errors during library construction. For multi-sample comparison analysis, our mTEChi datasets were also subjected to removal of duplicate reads.

Next, the tissue-specificity (one tissue of restricted expression) or selectivity (two-to-four tissues of restricted expression) of the Aire-sensitive genes was characterized by using the specificity measurement (SPM) and the contribution measurement (CTM) methods (Pan et al., 2013). For each gene, the SPM and the CTM values were dependent on its level of expression in each tissue. If no read was detected in a tissue, the latter was excluded from the comparison. For tissues of similar type, only the one showing the highest level of gene expression was included in the comparison. Cerebellum, cortex, E14.5 brain, E14.5 telencephalon and VTA referred to a group, as well as did E14.5 heart and heart, or E14.5 liver and liver. If the SPM value of a gene for a tissue is >0.9, then the gene is considered tissue-specific for this particular tissue. Otherwise, if the SPM values of a gene for two to four tissues were >0.3 and its CTM value for the corresponding tissues was >0.9, then the gene was considered tissue-selective for these tissues. If these conditions were not met, the gene was left unassigned.

Finally, for the analysis of 3’UTR length variation of Aire-sensitive PTA genes between mTEChi and their tissues of expression, peripheral d3’UTR ratios of tissue-specific genes were selected in their unique identified tissues of expression. For tissue-selective genes, the d3’UTR ratios for peripheral tissues were selected randomly among their tagged tissues of expression.
CLIP-seq analysisRequest a detailed protocol

The location of the 3’ end processing complex on the transcripts of Aire target genes was tracked by measuring the density of reads that map to these transcripts in PAR-CLIP sequencing data of RNAs bound to the endogenous CSTF2 from the GEO database (GSM917676). We processed these assemblies of RNA-mapped reads (as wig files) using the sitepro program (CEAS distribution) (Shin et al., 2009) to infer the read density at the vicinity of proximal and distal pAs of transcripts of Aire target genes in HEK293 cells. The genomic location of these proximal and distal pAs extracted from the PolyA_DB 2 database (Figure 2—source data 1) was loaded into sitepro as bed files of Aire-sensitive and neutral genes identified from RNA-seq differential expression of Aire versus control-transfected HEK293 cells.
Microarray gene expression profilingRequest a detailed protocol

Total RNA was prepared from harvested HEK293 cells knockdown for CLP1, HNRNPL, DDX5, DDX17, PARP1, PRKDC, SUPT16H, PABPC1, CPSF6 and the control LacZ gene using TRIzol (ThermoFisher). Single-stranded DNA in the sense orientation was synthesized from total RNA with random priming using the GeneChip WT Amplification kit (Affymetrix). The DNA was subsequently purified, fragmented, and terminally labeled using the GeneChip WT Terminal Labeling kit (Affymetrix) incorporating biotinylated ribonucleotides into the DNA. The labeled DNA was then hybridized to Human Gene ST1.0 microarrays (Affymetrix), washed, stained, and scanned. Raw probe-level data (.CEL files deposited in the GEO database as GSE141118) was normalized by the robust multiarray average (RMA) algorithm and summarized using the R-package aroma.affymetrix (www.aroma-project.org/) to obtain probeset expression values corresponding to the genes on the array, as described in (Figure 3—source data 1).
Individual probe-level microarray analysesRequest a detailed protocol

For each HEK293 sample of a microarray comparison, an expression file of individual probes of each gene on the array was generated using aroma.affymetrix, as described in (Figure 3—source data 1). Probes with expression values over the background in each sample, had their hg19 genomic location retrieved from the Affymetrix individual probe dataset (NetAffx Web site), and were mapped to the d3’UTR annotation file generated as described in the ‘RNA-seq and d3’UTR ratios’ section. The d3’UTRs ratios were then computed for the genes having at least two individual probes in their d3’UTR regions by dividing the specific d3’UTR expression by the whole-transcript expression. We performed the individual probe d3’UTR mapping and d3’UTR ratio calculation using an adaptation of our R-implementation of PLATA (Giraud et al., 2012), as described in Figure 3—source data 2. We then tested between two HEK293 samples the proportion of genes with a significant increase or decrease of d3’UTR ratios to the proportion of those whose variation is not significant (Figure 3—source data 3).
miRNA potential target gene identificationRequest a detailed protocol

We used the TargetScan 6.2 database (http://www.targetscan.org/vert_61/) to generate a list of miRNAs or miRNA families and their potential target genes harboring miRNA conserved sites in 3'UTRs (Friedman et al., 2009). We further selected from these potential target genes those for which the miRNA conserved sites are specifically located within the cleavable d3’UTRs. The miRNAs and their potential target genes with miRNA conserved sites within their 3’UTRs or d3’UTRs are provided in Figure 5—source data 1.
Gene set enrichment analysis (GSEA)Request a detailed protocol


GSEA was first used to test the effect of candidate gene knockdown on the expression of Aire-sensitive genes. The overlap between the transcripts impacted by Aire in HEK293 cells and those impacted by the knockdown of CLP1, HNRNPL, DDX5, DDX17, PARP1 was tested by the GSEA software (Subramanian et al., 2005) (Broad Institute). The Aire-sensitive genes were identified from RNA-seq data of control and Aire-transfected HEK293 cells. For this analysis, the top 30% of genes the most sensitive to Aire were considered.

GSEA was also used to test the potential involvement of miRNAs or miRNA families in the regulation of sets of genes in mTEChi versus mTEClo. To this end, we performed RNA-seq profiling of mTEChi and mTEClo that were isolated from the same pool of 4 thymi of B6 mice. We ran GSEA on the DESeq-normalized read counts of the mTEChi versus mTEClo comparison using the list of miRNAs and of their potential target genes harboring miRNA conserved sites in 3'UTRs as generated through parsing the TargetScan 6.2 database (above). A ‘classic’ scoring scheme was used for the GSEA analysis. Gene sets of 15 to 5000 genes were considered for the analysis. Null read count values were removed. To determine the genes that contribute the most to the observed miRNA-specific gene set reduction in mTEChi, we performed a leading edge analysis in including all miRNA-specific gene sets with FDR q-values <0 .05="" a="" href="https://elifesciences.org/articles/52985/figures#fig5sdata1">Figure 5—source data 1). We found that these leading edge genes were potentially targeted by a number of different miRNAs, whose median was 4. We selected for representation the leading edge genes targeted by more than four different miRNAs (>=5).
Statistical analysisRequest a detailed protocol

Determination of the statistical significance differences between two experimental groups was done by the non-parametric Wilcoxon test, unless specified. Since P-values are sensitive to sample size and tend to decrease as n increases, for each comparison, we matched the size of the control sample (when larger than the test sample) to the one of the test sample. For instance, for the Aire-sensitive versus neutral gene comparisons, we selected a number of Aire-neutral genes with the closest-to-1 expression foldchange, that matches the size of the tested Aire-sensitive sample.
References


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Decision letter



Acceptance summary:

In this manuscript, Giraud and colleagues address the detailed mechanism via which medullary thymic epithelial cells (mTECs) express a wide range of otherwise tissue restricted genes. This process is required to ensure that only a self-tolerant T cell repertoire leaves the thymus to populate the peripheral immune system, and is largely regulated by the Autoimmune Regulator, AIRE. The authors find that AIRE-sensitive genes preferentially exhibit short-3'UTR transcript isoforms in mTECs. They then demonstrate that CLP1, part of the large multi-subunit 3' end processing complex, promotes 3'UTR shortening associated with higher transcript stability and expression of AIRE-sensitive genes, revealing a previously uncovered post-transcriptional level of control of AIRE activated expression in mTECs.

Decision letter after peer review:

Thank you for submitting your article "Aire-dependent genes undergo Clp1-mediated 3'UTR shortening associated with higher transcript stability in the thymus" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Satyajit Rath as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Overall, the reviewers found this manuscript interesting and felt that it has potential to be a significant contribution to the field. However, they identified a number of substantive points that must be satisfactorily addressed before the manuscript can be considered for publication in eLife. In particular, the reviewers felt that the current evidence presented is not yet strong enough to support the author's proposed mechanism. Concerns were raised that the evidence is quite descriptive that these 3'UTR shortening events are associated with transcript stability. Additionally, concerns were raised about bioinformatics methods throughout the manuscript. Finally, some parts of the manuscript need rewriting to make the discussion of existing literature more comprehensive.

Essential revisions:

Taken together, the major points raised by the reviewers can be broken into three areas – required further biological insight; clarification/improvement of bioinformatics analyses; and textual changes, as follows:

Biological insight: A number of points were raised relating to the need for further evidence to support the authors' main conclusions.

Two reviewers raised issues relating to whether or not Clp1 3'UTR shortening is a mechanism related to Aire's function or may likewise affect Aire independent genes (See title of the paper). This important point should be fully addressed in a revised manuscript, including via answers to the specific points raised below:

– To support the claim that 3'UTR shortening is due to Clp1, the authors are asked to address in more detail whether the Aire-sensitive genes influenced by Aire-KO and Clp1-KD are mostly shared or with the similar pattern around pA sites. If not, why?

– The data presented all point towards use of proximal pA sites through the action of Clp1 being an independent layer of control of TRA expression (in the sense of being 'upstream' of Aire's function rather than being a consequence of Aire's action). mTECs are thought to express 'many' TRAs in an Aire independent manner (or do the authors feel otherwise?). Are these also affected by knockdown of Clp1? In sum, the question arises whether 'Aire dependent genes undergo Clp1-mediated 3'UTR shortening….' (as the title insinuates) or whether 'Weakly expressed genes undergo Clp1-mediated 3'UTR shortening…'

Two reviewers also asked for further data related to the role of Clp1 in mTECs. Again, these points should be addressed in a revised manuscript.

– Please provide additional data on how Clp1-driven 3'UTR shortening can lead to high transcript stability. Are any specific microRNAs or RNA-binding proteins involved during this process?

– Unfortunately, the authors only address the effect of Clp1 knockdown in the absence of Aire in HEK293 cells or in the presence of Aire in mTECs. It would be extremely interesting to relate this to the effect-size of Aire in HEK293 cells and in mTECs (e.g. how would Figure 3D, right, look like for Aire-expressing HEK293 cells in which Clp1 has been knocked-down). Alternatively (but certainly asking too much), it would have been very informative to see how Figure 4B, right, would look like for Aire-KO mTECs).

Bioinformatics: The reviewers raised a number of concerns and questions regarding bioinformatics methods and analyses, which should be fully addressed in a revised manuscript.

1) Please provide more detail on how the d3'UTR ratio, used to measure 3'UTR usage, was calculated. Did the method used consider non-uniform read bias? Whether d3'UTR ratio is based on transcript or genes? Also, please explain why d3'UTR ratio has a maximum value of 2 based on Figure 1C, E, Figure 2A etc?

2) Equal number Aire-neutral datasets were used as the control throughout the manuscript. The authors are asked to address why this small arbitrarily selected dataset can represent the global pattern of Aire-neutral genes. They should describe how the Aire-neutral genes were selected, and whether these are equally low in their basal expression levels (see below).

3) Aire-sensitive genes are generally expressed at very low levels as compared to most other genes (= Aire-insensitive/neutral genes) (see also Figure 1B). Please comment on whether or not in the case of 'low abundance' mRNAs, there might be a general technical bias towards detection of more 5' reads even among reads that are all upstream of the proximal pA site (which would result in an artificially low d3'UTR ratio). In this context, are the selected Aire neutral genes on average expressed at a similarly low level (see for instance Figure 4B, right: here, the median expression (RPKM) of Aire-neutral genes is substantially higher than that of Aire-sensitive genes; see also Figure 3D)?

4) Related to Figure 3A, the authors are asked to explain why some known APA factors did not change d3'UTR ratio.

5) For the multi-tissues analysis from various sources, did the authors consider batch effect and other technical bias? Specifically, related to Figure 1G, an important question is whether a gene that shows a short 3'UTR preference in mTECs also does so in the peripheral tissue. Is that what Figure 1G is supposed to address? It is very difficult to understand. Would it be feasible to plot for all the individual 574 Aire-sensitive TRAs from Figure 1D their d3'UTR ratio in mTEChi vs. their d3'UTR ratio in the respective peripheral tissue to ask in how far the distribution of datapoints deviates from a diagonal? Possibly, this might be a more intuitive way of showing whether or not the usage of a proximal 3' UTR differs between mTEChi and a respective peripheral tissue.

6) Please ensure that all of the bioinformatics methods and tools used are appropriately referenced, as this is currently not the case.

Textual changes:

Concerns were raised that the existing literature is only incompletely discussed, particularly in the Introduction. Please include further discussion of the current literature around post-transcriptional control mechanisms in mTECs involving changes in the mRNA 3´UTRs, and fully cite relevant previous references.

Also, subsection “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi and in some peripheral tissues”, last paragraph, first sentence: This somewhat contrived sentence may deserve attention.

https://doi.org/10.7554/eLife.52985.sa1
Author response




Essential revisions:

Taken together, the major points raised by the reviewers can be broken into three areas – required further biological insight; clarification/improvement of bioinformatics analyses; and textual changes, as follows:

Biological insight: A number of points were raised relating to the need for further evidence to support the authors' main conclusions.

Two reviewers raised issues relating to whether or not Clp1 3'UTR shortening is a mechanism related to Aire's function or may likewise affect Aire independent genes (See title of the paper). This important point should be fully addressed in a revised manuscript, including via answers to the specific points raised below:

– To support the claim that 3'UTR shortening is due to Clp1, the authors are asked to address in more detail whether the Aire-sensitive genes influenced by Aire-KO and Clp1-KD are mostly shared or with the similar pattern around pA sites. If not, why?

We thank the reviewers for highlighting this important question about the relationship between the Aire-sensitive genes and the pA effects of Clp1 KD.

* First, we would like to reiterate that in our paper, we conclude that Clp1 promotes/ contributes to 3'UTR shortening at Aire-sensitive genes but do not rule out contributions by other factors. Figure 3D Left and Figure 4B show that the levels of d3'UTR ratios of Aire-sensitive genes in Clp1 KD samples do not reach those of Aire-neutral genes in the Ctr samples. We noted in the text that the data suggested that Clp1 can account for most, but perhaps not all, of the 3’UTR shortening effect, such that other additional factors may also contribute to the increased proportion of short-3'UTRs of Aire-sensitive genes (Results sections “CLP1 promotes 3’UTR shortening and higher expression at Aire-sensitive genes in HEK293 cells” and “Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi”).

* Regarding the specificity of the pA effect of Clp1 at Aire-sensitive genes, we assessed the magnitude of the difference in 3’UTR shortening for the Aire-sensitive versus Aire-neutral genes upon Clp1 KD. We determined the proportion of genes that exceeded a high threshold of short-3’UTR transcript isoform enrichment upon Clp1 KD, namely d3’UTR ratios of < 0.5 in Ctr vs. Clp1 KD mTEChi. For the Aire-sensitive genes, 181 out of 528 genes (34%) exhibited this large shift to the shorter 3’UTR, whereas for the Air-neutral genes, a significant smaller proportion, 1206 out of 6076 genes (20%) exhibited this degree of shift to the shorter 3’UTR form (Chisq test: P=9.0*10-15). Please see new Figure 4—figure supplement 2 and Results section “Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi”.

We also observed an enhanced proportion of genes subject to CLP1-mediated 3'UTR shortening among Aire-sensitive versus Aire-neutral genes in HEK293 cells (for CLP1-sh2: 33% in Aire-sensitive genes versus 9% in Aire-neutral genes, Chisq test: P=9.7*10-23; for CLP1-sh3: 28% versus 10%, Chisq test: P=5.7*10-13). Please see new Figure 3—figure supplement 3 and Results section “Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi”.

Thus, these results strongly support a highly preferential, but not necessarily exclusive 3'UTR shortening effect of Clp1 at Aire-sensitive genes versus other genes. We therefore modified in the Discussion the sentences that may have implied that the effect of Clp1 was strictly specific to Aire-sensitive genes.

* We also now highlight a correlation between Clp1-mediated 3'UTR shortening and increased expression at Aire-sensitive genes. Namely, in WT mTEChi the Aire-sensitive genes responsive to the 3'UTR shortening effect of Clp1 exhibited higher levels of expression than the Aire-sensitive genes neutral to Clp1’s effect. Please see new Figure 4—figure supplement 3 and Results section “Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi”.


– The data presented all point towards use of proximal pA sites through the action of Clp1 being an independent layer of control of TRA expression (in the sense of being 'upstream' of Aire's function rather than being a consequence of Aire's action). mTECs are thought to express 'many' TRAs in an Aire independent manner (or do the authors feel otherwise?). Are these also affected by knockdown of Clp1? In sum, the question arises whether 'Aire dependent genes undergo Clp1-mediated 3'UTR shortening….' (as the title insinuates) or whether 'Weakly expressed genes undergo Clp1-mediated 3'UTR shortening…'

We thank the reviewers for calling our attention to this point. Indeed, since Aire-sensitive genes are mainly composed of TRA genes that exhibit similar low levels of expression than Aire-independent TRA genes, we acknowledge that it is important to determine whether the Clp1-mediated 3’UTR shortening was preferentially affecting the Aire-sensitive genes or the TRA genes.

To address this point, we selected, within the whole set of Aire-neutral genes, the genes that have been unambiguously identified as either TRA genes and non-TRA genes (Sansom et al., 2014) and investigated the effect of Clp1 KD on the d3'UTR ratios of these two gene sets. We found only small statistically insignificant increases of d3'UTR ratios for either the TRA or non-TRA Aire-neutral genes following Clp1 KD, indicating that the preference of genes for Clp1’s effect is more aligned with Aire-sensitivity than whether the gene is a TRA gene. Please see revised Figure 4B and Results section “Clp1 promotes 3’UTR shortening and higher levels of Aire-upregulated transcripts in mTEChi”.


Two reviewers also asked for further data related to the role of Clp1 in mTECs. Again, these points should be addressed in a revised manuscript.

– Please provide additional data on how Clp1-driven 3'UTR shortening can lead to high transcript stability. Are any specific microRNAs or RNA-binding proteins involved during this process?

Short-3’UTR transcripts, in comparison to their longer counterparts, lack miRNA-binding sites and AU-rich elements, which are specifically targeted by miRNAs and RNA-binding proteins (RBPs) respectively. While miRNA targeting is known to destabilize transcripts and decrease their translation, RBP targeting can either trigger repression or activation signals depending on the type of cis elements that they recognize. Hence, transcript stability is controlled by the net effects of miRNAs and RBPs. In our study, we observe a clear association of Clp1-driven 3'UTR shortening with higher transcript stability in mTEChi, strongly suggesting that the 3'UTR-shortened transcripts escape the post-transcriptional repression mediated by potent miRNAs and/or destabilizing RBPs in mTEChi.

mTEChi correspond to a stage of TEC development that directly arises from immature mTEClo. Taking mTEClo as a comparator, we assessed the level of miRNA-mediated post-transcriptional repression in mTEChi versus mTEClo precursor cells by assessing the differences in expression of sets of genes identified as potentially targeted by miRNAs, in mTEChi versus mTEClo. To this end, we performed a Gene Set Enrichment Analysis (GSEA) between the two mTEC populations for each set of genes potentially targeted by a specific miRNA or miRNA family (as defined by the TargetScan database, new Figure 5—source data 1). We found 346 miRNA-specific target gene sets that had significantly reduced expression in mTEChi versus mTEClo. In sharp contrast, we did not detect any of the miRNA-specific target gene sets that had reduced expression in mTEClo versus mTEChi. This suggests that miRNAs are an important contributor to transcript destabilization in mTEChi and that this effect is similar or less for the same miRNA target sets in mTEClo cells. A GSEA leading edge analysis of the 346 miRNA-specific gene sets revealed 11259 target genes that accounted for the gene-set reduction in mTEChi, including 5532 genes targeted by a least 5 different miRNAs. Please see new Figure 5—figure supplement 1, new Figure 5—source data 1, Results section “Clp1-driven 3’UTR shortening of Aire-upregulated transcripts is associated with higher stability in mTEChi” and Materials and methods sections “miRNA potential target gene identification” and “Gene set enrichment analysis (GSEA)”. We updated our GEO depository GSE140815 with RNA-seq data of mTEChi and mTEClo isolated from the same pool of thymi (Materials and methods section “RNA-seq and d3’UTR ratios”).

To determine whether the Clp1-promoted 3'UTR shortening at Aire-sensitive genes can lead to escape from post-transcriptional repression mediated by the miRNAs associated with these 346 target-gene sets, we calculated, among the Aire-sensitive genes subject to Clp1-promoted shortening, the proportion of those that are potentially targeted by one of these miRNAs at a conserved site in the d3’UTR of their longer-transcript isoforms (new Figure 5—source data 1). We found a large proportion of these cases (46%) had one or more such conserved miRNA target sites, providing a specific miRNA de-repression mechanism due to 3’UTR shortening for a large fraction of the Aire-sensitive genes. This analysis is only able to identify miRNAs that are highly conserved and that have disproportionate effect in mTEChi versus mTEClo, so it is expected to miss many of the cases of potential miRNA regulation that may be relieved by 3’UTR shortening in mTEChi. Please see new Figure 5—source data 1 and Results section “Clp1-driven 3’UTR shortening of Aire-upregulated transcripts is associated with higher stability in mTEChi”.

These results strongly suggest that the Aire-induced transcripts that undergo Clp1-driven 3’UTR shortening in mTEChi escape the post-transcriptional repression mediated by a variety of miRNA species, resulting in higher stability of these transcripts in comparison to the longer 3’UTR transcripts.


– Unfortunately, the authors only address the effect of Clp1 knockdown in the absence of Aire in HEK293 cells or in the presence of Aire in mTECs. It would be extremely interesting to relate this to the effect-size of Aire in HEK293 cells and in mTECs (e.g. how would Figure 3D, right, look like for Aire-expressing HEK293 cells in which Clp1 has been knocked-down). Alternatively (but certainly asking too much), it would have been very informative to see how Figure 4B, right, would look like for Aire-KO mTECs).

We thank the reviewers for highlighting this question. To address it, we generated new RNA-seq data, as described below, that we deposited in the GEO database as an update of our depository GSE140738.

We infected HEK293 cells with CLP1-sh2, sh3 or the Ctr shRNA, and for each, performed transfection with the Aire expression plasmid. After qPCR validation of CLP1 knockdown (>50% KD), we generated RNA-seq data for each of these samples and analyzed the level of expression of the Aire-sensitive genes. These experiments revealed a significant effect of CLP1 knockdown, for both CLP1 shRNAs, on the levels of Aire-sensitive gene expression in the Aire-transfected HEK293 cells, similar in magnitude, and apparently slightly greater than in AIRE-negative HEK293 cells. Please see new Figure 3—figure supplement 4 (in comparison to Figure 3D Right) and Results section “CLP1 promotes 3’UTR shortening and higher expression at Aire-sensitive genes in HEK293 cells”.


Bioinformatics: The reviewers raised a number of concerns and questions regarding bioinformatics methods and analyses, which should be fully addressed in a revised manuscript.

1) Please provide more detail on how the d3'UTR ratio, used to measure 3'UTR usage, was calculated. Did the method used consider non-uniform read bias? Whether d3'UTR ratio is based on transcript or genes? Also, please explain why d3'UTR ratio has a maximum value of 2 based on Figure 1C, E, Figure 2A etc?

* Non-uniform read coverage can be an obstacle to the comparison of read density between different parts of the transcripts, in particular between the 3' ends and the upstream regions. One source of 3’ biased increased read coverage is mRNA degradation associated with mRNA selection through polyA tail targeting. In order to reduce mRNA degradation as much as possible, we used TRIzol for all RNA extractions, notably TRIzol LS for sorted cells. Importantly, we sequenced only high-quality RNA (with RIN values around or over 8). Please see Materials and methods section “RNA-seq and d3’UTR ratios”.

Another source of 3' biased increased read coverage is the retro-transcription during the preparation of the RNA-seq library if oligo-dTs are used. In this project all RNA-seq libraries were generated using the Truseq (Illumina) or the Smarter (Clontech)/ Nextera (Illumina) chemistry, both including retro-transcription with random primers, therefore avoiding additional bias in 3' read coverage.

To lower the impact of non-uniform RNA-seq coverage between 3' ends and their upstream regions on the calculation of d3'UTR ratios, we normalized the read density of the cleavable d3’UTR region of each gene with a potential pA, by the read density of its constitutive contiguous upstream region in its last 3’ exon, so that the two considered regions used for d3’UTR ratio calculation are in close proximity within 3’ ends of the genes. Please see Materials and methods section “RNA-seq and d3’UTR ratios”.

* We provide more detail about how we calculated distal 3’UTR ratio here and in the revised text. After the mapping of the RNA-seq reads to the mm9 genome using Bowtie, we counted the reads that lie within the d3'UTR regions and within their contiguous upstream regions of the last 3’ exons using intersectBed, coverageBed and a GTF file with d3’UTR annotations that we generated as follows:

First, from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables), we chose mm9 mouse genome/ Genes and Gene Predictions/ RefSeq Genes/ refGene, and we obtained one file with coding exon features only and another file with 5' and 3'UTR features. We combined these two files into a single one and split the 3'UTR annotations into distal 3'UTR (d3'UTR) and proximal 3'UTR annotations at the sites of alternative pAs that we identified in parsing the PolyA_DB 2 database.

Then, we fused into a single feature the proximal 3'UTR annotation with the last coding exon (exon CDS) and formatted the file to fit a GTF format (revised Figure 1—source data 1). For each Refseq gene corresponding to an independent NM_Refseq ID, we divided the RPKM expression of the d3'UTR region by the expression of the upstream region of the last 3’ exon to generate the d3'UTR ratio used as a measure of pA usage. Please see revised Figure 1—source data 1 and Materials and methods section “RNA-seq and d3’UTR ratios”.

* To identify the genes differentially expressed between two conditions, we counted the reads that map to all exons of the genes, normalized the counts by using DESeq and computed the expression foldchange. We used, for read counting, a pre-formatted GTF file obtained from the UCSC Table Browser, in choosing mm9 mouse genome/ Genes and Gene Predictions/ RefSeq Genes/ refGene. Please see new Figure 1—source data 2 and Materials and methods section “RNA-seq and d3’UTR ratios”.

* d3'UTR ratios mostly range from 0 to 2 with a small number of genes with ratios reaching up to 10 – 30. These out of scale values possibly result from hampered sequencing of constitutive 3’ regions upstream of alternative pAs due to local high GC content, or from increased transcription within cleavable distal 3'UTR regions due to the presence of cryptic transcriptional start sites. For clearer representation in Figures 1C, E, Figure 2A, Figure 1—figure supplement 1A, B, we cropped out the negative density values – d3'UTR ratios are positive – and the density values above 2 corresponding to a minority of events. However, to provide a complete picture of the d3'UTR ratio distribution, as well as the most accurate relationship between d3’UTR ratios and levels of expression, we have generated scatterplots including all d3'UTR ratios. Please see new Figure 1—figure supplement 2 and Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.


2) Equal number Aire-neutral datasets were used as the control throughout the manuscript. The authors are asked to address why this small arbitrarily selected dataset can represent the global pattern of Aire-neutral genes. They should describe how the Aire-neutral genes were selected, and whether these are equally low in their basal expression levels (see below).

* P-values, including those obtained from the Wilcoxon test, are sensitive to sample size and tend to decrease as n increases. The Aire-neutral genes, with an expression foldchange between 0.5 and 2, correspond to thousands of genes, i.e., a much higher number than the Aire-sensitive genes (Figure 1B). For each comparison, to prevent a biased reduction of the p-values due to the very large size of the Aire-neutral sample, we matched its size to the one of the tested Aire-sensitive sample, in selecting the Aire-neutral genes with the closest-to-1 expression foldchange. For instance, in the WT mTEChi comparison of d3’UTR ratios (Figure 1C), we selected the Aire-neutral genes with an expression FC between 1/1.06 and 1.06 to match the size of the Aire-sensitive sample (n=947). In this selection, the genes "less neutral" to Aire's action, i.e., with expression FC away from 1 and closer to 0.5 or 2 were discarded. In addition, performing the d3’UTR ratio comparison using two different Aire-neutral 947 gene subsamples – one corresponding to expression FC between 1.06 and 1.3 and the other to expression FC between 1/1.18 and 1/1.06 – also led to high statistically significant d3’UTR ratio reduction of Aire-sensitive genes (P=8.3*10-12 and 8.1*10-9, respectively). Please see Materials and methods section “Statistical analysis”.

* In the Aire-sensitive/neutral gene comparisons, the Aire-neutral genes were not expression-matched to the Aire-sensitive genes. To fully address the reviewers’ concern about basal expression levels, we have taken into account the level of gene expression in performing a local regression (loess) of the d3'UTR ratios across expression levels of Aire-sensitive genes and the whole set of Aire-neutral genes in WT and Aire-KO mTEChi. While the d3’UTR ratios vary dramatically across genes, at all expression levels, the loess-fitted curve of the Aire-sensitive genes is significantly lower than the one of the Aire-neutral genes, therefore revealing, in WT and Aire-KO mTEChi, a preference of Aire-sensitive genes for smaller d3’UTR ratios that is independent of the levels of gene expression. Please see new Figure 1—figure supplement 2 and Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.


3) Aire-sensitive genes are generally expressed at very low levels as compared to most other genes (= Aire-insensitive/neutral genes) (see also Figure 1B). Please comment on whether or not in the case of 'low abundance' mRNAs, there might be a general technical bias towards detection of more 5' reads even among reads that are all upstream of the proximal pA site (which would result in an artificially low d3'UTR ratio). In this context, are the selected Aire neutral genes on average expressed at a similarly low level (see for instance Figure 4B, right: here, the median expression (RPKM) of Aire-neutral genes is substantially higher than that of Aire-sensitive genes; see also Figure 3D)?

In the Aire-sensitive/neutral gene comparisons, the Aire-neutral gene subsamples used as controls have the same average expression than the whole set of Aire-neutral genes. The Aire-sensitive genes are expressed at much lower levels than the Aire-neutral gene subsamples. The local regression of d3’UTR ratios against the levels of expression of Aire-neutral or Aire-sensitive genes (previous point and new Figure 1—figure supplement 2) showed that the low expressed genes exhibit, on average, higher d3’UTR ratios than the highly expressed genes, therefore showing that the smaller d3’UTR ratios associated with Aire-sensitive genes do not result from a biological or technical general bias due to their low expression.


4) Related to Figure 3A, the authors are asked to explain why some known APA factors did not change d3'UTR ratio.

The 3’ end processing complex is composed of ~35 core 3’ processing factors, including Clp1, that are organized into four specialized RNA binding sub-complexes, as well as ~50 accessory proteins that interact with the core factors and mediate crosstalk between 3’ processing and other cellular processes (Shi et al., 2009). The effect of a number of core factors on 3’UTR length variation has been evaluated in various cellular systems, showing that many of them have no effect on 3’UTR lengthening or shortening (Li et al., 2015). The absence of effect could indicate that these factors do not contribute to the activity of 3’ end processing sub-complexes or that other factors, perhaps with redundant function, can compensate for their loss. For instance, the lack of 3’UTR length variation following the knockdown of Cstf2, a member of the Cstf sub-complex, has been proposed to result from the redundant function of Cstf2T and Cstf2 (Martin et al., 2012, Li et al., 2015). Please see Results section “CLP1 promotes 3’UTR shortening and higher expression at Aire-sensitive genes in HEK293 cells”.


5) For the multi-tissues analysis from various sources, did the authors consider batch effect and other technical bias? Specifically, related to Figure 1G, an important question is whether a gene that shows a short 3'UTR preference in mTECs also does so in the peripheral tissue. Is that what Figure 1G is supposed to address? It is very difficult to understand. Would it be feasible to plot for all the individual 574 Aire-sensitive TRAs from Figure 1D their d3'UTR ratio in mTEChi vs. their d3'UTR ratio in the respective peripheral tissue to ask in how far the distribution of datapoints deviates from a diagonal? Possibly, this might be a more intuitive way of showing whether or not the usage of a proximal 3' UTR differs between mTEChi and a respective peripheral tissue.

* For the multi-tissue comparison, we included twenty tissue RNA-seq datasets generated by a single lab: that of Bing Ren at UCSD. These datasets were deposited in GEO under two different accession numbers: GSE29278 corresponding to a partial set and GSE36026 to the full set. We modified the Materials and methods section “Multi-tissue comparison analysis” to refer only to the GSE36026 accession number. These RNA-seq datasets were generated as part of the ENCODE project using standardized protocols in order to enable between-sample comparison. In addition to these twenty RNA-seq libraries, the two other libraries that we selected for the multi-tissue comparison, as well as the libraries corresponding to our mTEC samples, we all generated from polyA+ RNA using Illumina chemistry and sequenced on Illumina instruments.

A main artefact of the RNA-seq library preparation is biased duplicate reads that arise from reverse transcription, PCR or RNA fragmentation bias (Roberts et al., 2011; Sendler et al., 2011), and that introduce local distortion in the read coverage of the transcripts. These artefact duplicate reads are not readily distinguishable from the natural ones that arise from over-sequencing of highly expressed genes. Hence, duplicate reads, including the normal and artefact ones, are usually not removed from RNA-seq data, in which they contribute to the high dynamic range of gene expression values. Here, since our main interest focuses on the local read distribution within 3' ends between a series of samples that were not all generated by the same research center, we chose a conservative way to correct for the artefact reads, in removing all duplicate reads. Please see Materials and methods section “Multi-tissue comparison analysis”.

In addition, we have confirmed that small d3'UTR ratios of the Aire-sensitive genes in mTEChi were not correlated with the sequencing depth of the mTEChi RNA-seq libraries nor with the length of the generated reads (50bp). This result was obtained by generating successive subsamples of mapped reads (up to the lowest depth of the tissue RNA-seq libraries) by trimming the reads to 36bp (read length of Bing Ren's libraries), and by looking for an effect on d3’UTR ratios. We observed that the high sequencing depth and the 50bp read length have a very limited effect on d3'UTR ratios and that this very limited effect rather goes towards higher d3'UTR ratios. Please see new Figure 1—figure supplement 3 and Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.

* We followed the reviewers’ recommendation and simplified Figure 1G in a way that only the d3'UTR ratios of the Aire-sensitive TRA genes are shown. The preference for short-3'UTR transcript isoforms of these genes in mTEChi resulted from the comparison of the d3'UTR ratios across all tissues and the finding that the d3'UTR ratios of the Aire-sensitive TRA genes in mTEChi are amongst the smallest. Please see revised Figure 1G and Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.

We have generated a scatterplot of the d3'UTR ratios of 762 TRA genes (identified in the multi-tissue comparison) in mTEChi versus their respective tissue of expression and found a dramatic bias towards higher ratios in peripheral tissues, confirming the preference of Aire-sensitive TRA genes for short-3’UTR transcript isoforms in mTEChi, as compared to the periphery. Please see revised Figure 1H and the aforementioned section.

Roberts A, Trapnell C, Donaghey J, Rinn JL, Pachter L. 2011. Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biol 12: R22.

Sendler E, Johnson GD, Krawetz SA. 2011. Local and global factors affecting RNA sequencing analysis. Anal Biochem 419: 317–322.


6) Please ensure that all of the bioinformatics methods and tools used are appropriately referenced, as this is currently not the case.

* We have clarified how we generated the annotation files including the d3’UTR features delineated through proximal (alternative) pA genomic location obtained from the PolyA_DB 2 database. These annotation files used for RNA-seq and microarray analyses, as well as those used for RNA-seq differential expression, are provided in Figure 1—source data 1 and 2. We have also described more precisely how we carried out read count and differential expression and we provide the R-script that we used to calculate d3’UTR ratios in WT and Aire-KO mTEChi (new Figure 1—source data 3). References to the softwares that we used were added. Please see new Figure 1—source data 1, 2, 3, Materials and methods section “RNA-seq and d3’UTR ratios” and Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.

* We added the reference of the software used for CLIP-seq analysis and described more precisely how this analysis was carried out in using a CLIP-seq wig file and bed files containing proximal and distal pA location of Aire-sensitive or neutral genes. These bed files were generated using files corresponding to proximal and distal pA genomic location of genes extracted from the PolyA_DB 2 database (new Figure 2—source data 1). Please see new Figure 2—source data 1 and Materials and methods section “CLIP-seq analysis”.

* We have clarified how we performed microarray differential expression using the aroma.affymetrix R-package and how we extracted expression values of each individual probe of all genes on the array using the same R-package but with additional options listed in the new Figure 3—source data 1. We have also described how we mapped the individual probes to the d3’UTR features and calculated the d3’UTR ratios in providing our R-script and its dependent files (new Figure 3—source data 2). Finally, we provide the R-script that we used to assay the d3’UTR imbalance and its statistical significance (new Figure 3—source data 3). Please see new Figure 3—source data 1, 2, 3, Materials and methods section “Microarray gene expression profiling”, and “Individual probe-level microarray analyses”, and Results section “CLP1 promotes 3’UTR shortening and higher expression at Aire-sensitive genes in HEK293 cells”.

* We have also described how we used GSEA to test the effect of candidate gene knockdown on the expression of Aire-sensitive genes and the potential involvement of miRNAs or miRNA families in the regulation of sets of genes in mTEChi versus mTEClo. Details, as the list of potential target genes harboring miRNA conserved sites in 3'UTRs generated through parsing the TargetScan 6.2 database, were included (new Figure 5–source data 1). Please see new Figure 5—source data 1 and Materials and methods section “miRNA potential target gene identification” and “Gene set enrichment analysis (GSEA)”.


Textual changes:

Concerns were raised that the existing literature is only incompletely discussed, particularly in the Introduction. Please include further discussion of the current literature around post-transcriptional control mechanisms in mTECs involving changes in the mRNA 3´UTRs, and fully cite relevant previous references.

We thank the reviewers for highlighting the need to include the current literature on miRNAs and post-transcriptional regulation in mTECs. This notably echoes with the specific miRNA de-repression mechanism that we highlight in the revised manuscript. Please see the Discussion section.


Also, subsection “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi and in some peripheral tissues”, last paragraph, first sentence: This somewhat contrived sentence may deserve attention.

We edited the sentence to make it smoother and more meaningful. Please see Results section “Aire-sensitive genes show a preference for short-3’UTR transcript isoforms in mTEChi”.

https://doi.org/10.7554/eLife.52985.sa2
Article and author information


Author details



Clotilde GuyonInstitut Cochin, INSERM U1016, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
ContributionConceptualization, Investigation, Methodology
Contributed equally withNada Jmari
Competing interestsNo competing interests declared





Nada JmariInstitut Cochin, INSERM U1016, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
ContributionConceptualization, Investigation, Methodology
Contributed equally withClotilde Guyon
Competing interestsNo competing interests declared



Francine Padonou
Institut Cochin, INSERM U1016, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
Université de Nantes, Inserm, Centre de Recherche en Transplantation et Immunologie, UMR 1064, ITUN, F-44000, Nantes, France
ContributionFormal analysis, Investigation
Competing interestsNo competing interests declared



Yen-Chin LiInstitut Cochin, INSERM U1016, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
ContributionFormal analysis
Competing interestsNo competing interests declared





Olga UcarDivision of Developmental Immunology, German Cancer Research Center, Heidelberg, Germany
ContributionInvestigation
Competing interestsNo competing interests declared





Noriyuki FujikadoDivision of Immunology, Department of Microbiology and Immunobiology, Harvard Medical School, Boston, United States
Present addressLilly Biotechnology Center, Lilly Research Laboratories, Eli Lilly and Company, San Diego, United States
ContributionInvestigation
Competing interestsNo competing interests declared





Fanny CoulpierEcole Normale Supérieure, PSL Research University, CNRS, INSERM, Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Plateforme Génomique, Paris, France
ContributionInvestigation
Competing interestsNo competing interests declared





Christophe BlanchetInstitut Français de Bioinformatique, IFB-Core, CNRS UMS 3601, Evry, France
ContributionFormal analysis
Competing interestsNo competing interests declared





David E RootThe Broad Institute of MIT and Harvard, Cambridge, United States
ContributionValidation, Methodology
Competing interestsNo competing interests declared



Matthieu Giraud
Institut Cochin, INSERM U1016, Université Paris Descartes, Sorbonne Paris Cité, Paris, France
Université de Nantes, Inserm, Centre de Recherche en Transplantation et Immunologie, UMR 1064, ITUN, F-44000, Nantes, France
ContributionConceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Methodology, Project administration
For correspondencematthieu.giraud@inserm.fr
Competing interestsNo competing interests declared "This ORCID iD identifies the author of this article:" 0000-0002-1208-9677
Funding

Agence Nationale de la Recherche (Research grant 2011-CHEX-001-R12004KK)
European Commission (Career Integration Grant PCIG9-GA-2011-294212)
Agence Nationale de la Recherche (Investissements d'Avenir ANR-10-INBS-09)
Agence Nationale de la Recherche (Investissements d'Avenir ANR-11-INBS-0013)
Fondation pour la Recherche Médicale (Graduate Student Fellowship FDT20150532551)
Fondation pour la Recherche Médicale (Bioinformatics engineer grant ING20121226316)

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements

We thank Dr Sheena Pinto for her useful comments and suggestions, that have helped improve this paper. We thank Drs. D Mathis and C Benoist (Harvard Medical School) for Aire-KO (B6) mice. We thank the members of the ‘Homologous recombination’ and the ‘Genomic’IC’ core facilities (Cochin Institute) for lentigenic generation and RNA-seq data production. This work was supported by the Agence Nationale de la Recherche (ANR) 2011-CHEX-001-R12004KK (to MG), the European Commission CIG grant PCIG9-GA-2011–294212 (to MG) and by the ‘Investissements d'Avenir’ program managed by the ANR to the France Génomique national infrastructure ANR-10-INBS-09 (FC) and the French Institute of Bioinformatics ANR-11-INBS-0013 (CB). CG and Y-CL were supported by fellowships from the Fondation pour la Recherche Médicale FDT20150532551 and ING20121226316, respectively. CG, NJ and MG designed the study and wrote the manuscript; CG and NJ performed most of the experimental work; FP performed bioinformatics analyses and experiments, notably co-immunoprecipitations; FP, Y-CL and MG performed bioinformatics analyses; CB provided us with computing resources on the IFB national service infrastructure in bioinformatics and help with script optimization, OU and NF with RNA-seq and microarray datasets, and DER with lentivirus materials and editing of the manuscript.

Comments