About Us

PROJECT GOALS

Our project focuses on advancing Urdu back-translation text reuse detection by creating a specialized back-translated dataset. This dataset addresses the unique challenges of detecting text reuse in the Urdu language. The key objectives of our project are:

01
Developing a back-translated Urdu dataset
02
Leveraging established methodologies for authentic text reuse detection
03
Supporting advances in Urdu text reuse detection
04
Evaluating detection performance with standard evaluation metrics
05
Improving the sensitivity and accuracy of Urdu back-translation text reuse detection strategies
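A back-translated reuse example is produced by round-tripping an Urdu sentence through a pivot language such as English. The sketch below illustrates the idea; the tiny lookup tables are hypothetical stand-ins for a real machine-translation model, which the actual pipeline would use.

```python
# Toy sketch of back translation: Urdu -> English -> Urdu.
# The lookup tables below are hypothetical stand-ins for a real MT model;
# an actual pipeline would call a trained Urdu-English translation system.

UR_TO_EN = {"یہ ایک کتاب ہے": "this is a book"}

# A real MT model rarely reproduces the source verbatim, so the round
# trip yields a paraphrase-like variant of the original sentence.
EN_TO_UR = {"this is a book": "یہ کتاب ہے"}

def back_translate(urdu_sentence):
    """Round-trip a sentence through English to create a reuse candidate."""
    english = UR_TO_EN.get(urdu_sentence, urdu_sentence)
    return EN_TO_UR.get(english, english)

original = "یہ ایک کتاب ہے"
candidate = back_translate(original)
# The (original, candidate) pair becomes one labelled example of text reuse.
```

Each round-tripped pair gives the dataset a positive reuse example whose wording differs from the source in the way paraphrased reuse does.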

MEET PROJECT TEAM

DR. MUHAMMAD SHARJEEL

PROJECT SUPERVISOR

Assistant Professor at COMSATS University Islamabad, Lahore Campus, specializing in Natural Language Processing, Urdu NLP, and text reuse detection. His work focuses on advancing machine learning solutions for low-resource languages.

MALIK ASHAS

UMER AMIR

Team Lead

umeraamir45@gmail.com

USAMA TUFAIL

Project Documentation

Access our comprehensive project documentation and reports

Project Proposal

2024-09-15

Initial project proposal document outlining the scope and objectives

Download Document

FYP Presentation

2023-12-20

Final Year Project presentation.

Download Document

High Fidelity Testing Report

2024-02-10

Comprehensive testing documentation and results

Download Document

Final Report

2024-05-01

Complete project documentation and implementation details

Download Document

Open Source Research

Explore our complete codebase and experimental results on GitHub. We believe in open science and sharing our research with the community.

Python
ML Models
# Urdu Back Translation Text Reuse Detection
from transformers import AutoTokenizer, AutoModel
import torch

def analyze_text_similarity(text1, text2):
    # Load a pretrained multilingual model that covers Urdu
    model_name = "sentence-transformers/LaBSE"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    # Tokenize both texts as a single batch
    encoded = tokenizer([text1, text2], padding=True,
                        truncation=True, return_tensors="pt")

    # Embed the texts; LaBSE exposes sentence embeddings via pooler_output
    with torch.no_grad():
        embeddings = model(**encoded).pooler_output

    # Cosine similarity between the two sentence embeddings
    return torch.nn.functional.cosine_similarity(
        embeddings[0:1], embeddings[1:2]).item()
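Goal 04 above calls for evaluating detection performance with standard evaluation metrics. A minimal, self-contained sketch of the usual precision/recall/F1 computation over binary reuse labels (1 = reused), independent of any particular model:

```python
def prf1(gold, pred):
    """Precision, recall, and F1 for binary reuse labels (1 = reused)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In practice a similarity score from a model like the one above is thresholded into these binary predictions before scoring against the gold labels.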

OUR TECH STACK

Next.js

Frontend Framework

React

UI Library

Tailwind CSS

Styling Framework

Python

Core Processing

Machine Learning

ML Experiments

MongoDB

Database