Overview

The abundance and diversity of microRNAs give rise to a broad set of post-transcriptional gene regulatory effects that influence cellular metabolic equilibrium. Beyond sequence matching, structural accessibility plays a key role in allowing each microRNA to bind its target. MicroRNA secondary structure is nevertheless often overlooked, and its analysis is hindered by poor interoperability between environments, datasets, and tools. TM-SmiRs addresses this gap with an end-to-end Python package and a graphical web interface to which users can upload microRNA sequences. Radar plots and a PCA embedding, coupled with a composite structural accessibility score, provide visual intuition for rational target selection. By making the microRNA sequence-structure landscape visually accessible, TM-SmiRs helps reduce pre-screening costs and accelerate translational efforts, bringing microRNA diagnostics and therapeutics closer to clinical realisation.

TM-SmiRs is a pip-installable Python package that uses NUPACK to analyze the secondary structures and thermodynamic properties of individual RNA sequences. In addition to these thermodynamic properties, it:

computes sequence‐level similarity metrics against two reference RT-qPCR primer sequences,
creates radar plots for each uploaded microRNA,
projects the input microRNAs into a precomputed principal component analysis (PCA) space. A new ‘Composite score’ metric is superimposed on this space to help select microRNA sequences that are more linear (blue) or that have more secondary structure (red). All classified human microRNAs annotated in miRBase as of 08 June 2025 are included.
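To make the distinction between "more linear" and "higher secondary structure" concrete, the sketch below computes the fraction of base-paired positions in a dot-bracket string. It is only an illustration: the function name `paired_fraction` is hypothetical, the example structures are made up, and this is not the package's Composite score, whose definition is not given here.

```python
def paired_fraction(dot_bracket: str) -> float:
    """Return the fraction of positions that are base-paired.

    Accepts simple dot-bracket notation: '(' and ')' mark paired bases,
    '.' marks unpaired bases. Pseudoknot symbols are not handled.
    """
    if not dot_bracket:
        raise ValueError("structure string is empty")

    depth = 0
    for symbol in dot_bracket:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opening bracket")
        elif symbol != ".":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if depth != 0:
        raise ValueError("opening bracket without a matching closing bracket")

    paired = dot_bracket.count("(") + dot_bracket.count(")")
    return paired / len(dot_bracket)


# Made-up 22-nt structures: fully unpaired versus a small hairpin.
linear = paired_fraction("......................")
hairpin = paired_fraction("((((((....))))))......")
```

A value near 0 corresponds to a mostly linear sequence and a value near 1 to a heavily structured one. A real accessibility score would presumably also draw on the thermodynamic outputs, which this sketch ignores.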

Workflow

How does it work?

A user creates an inputSmall.xlsx file containing microRNA names (full miRNA column) and their sequences (sequence column). The uploaded file is compared against the precomputed reference database of microRNAs extracted from miRBase. If a match is found, the precomputed features are retrieved directly. If no match is found, the sequence properties (purine content (%), GC content (%), and reference overlap) and the thermodynamic properties are computed. The user-specified temperature (default 24 °C) influences the output energy values and the dot-bracket notation. All uploaded microRNAs are then projected into a precomputed PCA space whose parameters were fitted to the miRBase mature.fa dataset (downloaded June 2025). The computed results can be downloaded in Excel format.
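The lookup-or-compute step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function names, the feature keys (`gc_pct`, `purine_pct`), and matching by normalised sequence rather than by name are all assumptions, the reference entry is a made-up toy record, and the reference-overlap and NUPACK thermodynamic features are left out.

```python
def normalise(seq: str) -> str:
    """Upper-case the sequence and write it in the RNA alphabet (T -> U)."""
    cleaned = seq.strip().upper().replace("T", "U")
    if not cleaned:
        raise ValueError("sequence is empty")
    return cleaned


def gc_content(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = normalise(seq)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def purine_content(seq: str) -> float:
    """Percentage of purine bases (A and G)."""
    seq = normalise(seq)
    return 100.0 * sum(base in "AG" for base in seq) / len(seq)


def get_features(seq: str, reference: dict) -> dict:
    """Return precomputed features on a match, otherwise compute them."""
    key = normalise(seq)
    if key in reference:
        # Match found: reuse the stored features.
        return {**reference[key], "source": "reference"}
    # No match: compute the sequence-level properties from scratch.
    return {
        "gc_pct": gc_content(key),
        "purine_pct": purine_content(key),
        "source": "computed",
    }


# Toy reference table keyed by sequence (made-up entry, not miRBase data).
reference = {"GGCCAAUU": {"gc_pct": 50.0, "purine_pct": 50.0}}

hit = get_features("ggccaatt", reference)   # found after normalisation
miss = get_features("AAGGAAGC", reference)  # not in the table, so computed
```

Normalising case and the T/U alphabet before the lookup keeps a DNA-style input from missing a reference entry it should match.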

Code Example

How can it be run?

Once installed, the package can be run using the tm_smirs command. A simple workflow has three steps: (1) building a reference database using mirbase_analyzer, (2) preprocessing input files using preprocessing_input, and (3) generating interactive Plotly plots using interactive_plotter. Step 1 can be skipped unless you want to change the underlying precomputed PCA features or the microRNAs extracted from miRBase, because the package ships with a precomputed PCA space and reference database.

import tm_smirs as ts

# 0. Define desired parameters for NUPACK simulations
temp = 24  # Temperature in Celsius
material = "rna"  # fixed

# 0.1 Define additional parameters for the analysis
nc = 6  # Number of PCA components to use
n_plot = 6  # Number of PCA components to plot; should match nc
tp = 10  # Value passed as top_percentile (illustrative value, not a documented default)

# 0.2 Set the input file path
filename = "inputSmall"
user_defined_input = f"inputs/{filename}.xlsx"

# 1. Build the reference database + PCA (optional; see the note on step 1 above)
ts.mirbase_analyzer(
    sqlite_path="mirbase_mature_human.sqlite",
    out_folder="miRbase_referenceData", temp=temp, material=material,
    input_feat_merged="features_merged.xlsx", output_analysed="Analysed",
    n_components=nc, scaler="scaler.joblib", pca_in="pca.joblib", top_k=30,
)

# 2. Preprocess the input file
ts.preprocessing_input(
    small_df=user_defined_input, big_df="features_all.xlsx",
    output="results/Intermediates/", temp=temp, material=material,
    input_inter=f"{filename}_matched.xlsx", output_radar="results/RadarPlots",
    scalar_input="scaler.joblib", pca_input="pca.joblib", num_pcs=n_plot,
    output_pca="PCAplots_matched", mahalanobis_ctrl="mahalanobis_features_merged.xlsx",
    background_input="features_merged.xlsx", top_percentile=tp,
)

# 3. Create interactive Plotly files for exploring the PCA space
ts.interactive_plotter(
    input_inter=f"{filename}_matched.xlsx",
    scalar_input="scaler.joblib", pca_input="pca.joblib", temp=temp,
    top_percentile=tp, mahalanobis_ctrl="mahalanobis_features_merged.xlsx",
    background_input="features_merged.xlsx", output_pca="PCAplots_interactive",
    num_pcs=n_plot, output_comp="PCAplots_composite",
)

Results

What does the package produce?

Radar plots
Static PCA plots stored in SVG format, with target miRs superimposed
Interactive PCA plots colored by a computed CompositeScore summed value, with target miRs superimposed.
A final Excel file containing all computed features for each input microRNA, ready for download and further analysis.

Head over to User Upload to use the interface, and refer to Examples for sample applications.