PROJECT GOALS

Our project focuses on advancing text reuse detection for back-translated Urdu by creating a specialized back-translated dataset. This dataset addresses the unique challenges of detecting text reuse in the Urdu language. The key objectives of our project are:

Development of a Back-Translated Dataset

Leveraging Methodologies for Authentic Text Reuse Detection

Supporting Advancements in Urdu Text Reuse Detection

Evaluation of Detection Performance Using Evaluation Metrics

Higher Sensitivity and Accuracy for Urdu Back-Translation Text Reuse Detection Strategies
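
The evaluation objective above can be illustrated with a small sketch. It assumes reuse detection is framed as binary classification (1 = reused, 0 = not reused). The function name and the sample labels are hypothetical and are not taken from the project.

```python
def reuse_detection_metrics(gold, predicted):
    """Compute accuracy, precision, sensitivity (recall) and F1.

    Assumption: labels are 1 for "reused" and 0 for "not reused".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # Count the four confusion-matrix cells
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)

    total = len(gold)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Hypothetical labels for six text pairs (illustrative only)
gold_labels = [1, 1, 1, 0, 0, 0]
predicted_labels = [1, 1, 0, 0, 0, 1]
metrics = reuse_detection_metrics(gold_labels, predicted_labels)
```

Sensitivity here is the share of truly reused pairs that the detector finds, which is the quantity the last objective aims to raise alongside accuracy.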

MEET PROJECT TEAM

DR. MUHAMMAD SHARJEEL

PROJECT SUPERVISOR

Email: muhammadsharjeel@cuilahore.edu.pk

Assistant Professor at COMSATS University Islamabad, Lahore Campus, specializing in Natural Language Processing, Urdu NLP, and text reuse detection. Focused on advancing machine learning solutions for low-resource languages.

MALIK ASHAS

malikashas786@gmail.com

UMER AMIR

Team Lead

umeraamir45@gmail.com

USAMA TUFAIL

FA21-BSE-053@cuilahore.edu.pk

Project Documentation

Access our comprehensive project documentation and reports

Project Proposal

2024-09-15

Initial project proposal document outlining the scope and objectives

FYP Presentation

2023-12-20

Final Year Project presentation

High Fidelity Testing Report

2024-02-10

Comprehensive testing documentation and results

Final Report

2024-05-01

Complete project documentation and implementation details

Open Source Research

Explore our complete codebase and experimental results on GitHub. We believe in open science and sharing our research with the community.

Python

ML Models

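# Urdu Back Translation Text Reuse Detection
from transformers import AutoTokenizer, AutoModel
import torch

def analyze_text_similarity(text1, text2):
    # Load a pretrained multilingual sentence encoder that covers Urdu
    model_name = "sentence-transformers/LaBSE"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # Tokenize both texts as one padded batch
    encoded = tokenizer([text1, text2],
                        padding=True,
                        truncation=True,
                        return_tensors="pt")

    # Encode without tracking gradients (inference only)
    with torch.no_grad():
        output = model(**encoded)

    # Assumption: the original snippet is cut off after tokenization, so the
    # steps below are a sketch. The [CLS] token vector is used as a simplified
    # sentence embedding.
    cls_vectors = output.last_hidden_state[:, 0]
    embeddings = torch.nn.functional.normalize(cls_vectors, dim=1)

    # Cosine similarity of the two unit-length embeddings
    return (embeddings[0] @ embeddings[1]).item()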
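
Once two texts have been encoded as vectors, a reuse decision usually comes from comparing their cosine similarity against a threshold. The sketch below shows that step on its own in plain Python. The toy vectors, the helper names, and the 0.8 threshold are assumptions for illustration, not values from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_reused(a, b, threshold=0.8):
    """Flag a pair as reused when similarity reaches the (assumed) threshold."""
    return cosine_similarity(a, b) >= threshold


# Hypothetical embeddings standing in for encoder output
source_vec = [1.0, 2.0, 3.0]
back_translated_vec = [2.0, 4.0, 6.0]  # same direction as source_vec
unrelated_vec = [3.0, 0.0, -1.0]       # orthogonal to source_vec

flag_similar = is_reused(source_vec, back_translated_vec)
flag_unrelated = is_reused(source_vec, unrelated_vec)
```

In practice the threshold would be tuned on labeled pairs, since it trades sensitivity against false positives.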

OUR TECH STACK

Next.js

Frontend Framework

React

UI Library

Tailwind CSS

Styling Framework

Python

Core Processing

Machine Learning

ML Experiments

MongoDB

Database