Our project focuses on advancing back-translation text reuse detection for Urdu by creating a specialized back-translated dataset. This dataset addresses the unique challenges of detecting text reuse in the Urdu language. The key objectives of our project are:
Assistant Professor at COMSATS University Islamabad, Lahore Campus, specializing in natural language processing, Urdu NLP, and text reuse detection. Focused on advancing machine learning solutions for low-resource languages.
Access our comprehensive project documentation and reports
2024-09-15
Initial project proposal document outlining the scope and objectives
2024-02-10
Comprehensive testing documentation and results
Explore our complete codebase and experimental results on GitHub. We believe in open science and sharing our research with the community.
# Urdu Back Translation Text Reuse Detection
from transformers import AutoTokenizer, AutoModel
import torch

def analyze_text_similarity(text1, text2):
    # Load a pretrained multilingual model (LaBSE) that covers Urdu
    model_name = "sentence-transformers/LaBSE"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    # Tokenize both texts as one padded batch of PyTorch tensors
    encoded = tokenizer([text1, text2],
                        padding=True,
                        truncation=True,
                        return_tensors="pt")
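The snippet above stops after tokenization. The following is a minimal sketch of how the remaining steps might look, not the project's actual implementation. It assumes mean pooling over non-padding tokens followed by cosine similarity, which is one common choice; the helper names mean_pool, cosine_similarity, and text_similarity are hypothetical. The heavy imports sit inside text_similarity so that the two helpers can be used without torch or transformers installed.

```python
import math


def mean_pool(token_vectors, mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_similarity(text1, text2, model_name="sentence-transformers/LaBSE"):
    """Hypothetical completion: embed both texts and compare them.

    Assumption: mean pooling over the last hidden state. The project may
    use a different pooling strategy.
    """
    # Imported here so the helpers above work without these packages
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    encoded = tokenizer([text1, text2],
                        padding=True,
                        truncation=True,
                        return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        hidden = model(**encoded).last_hidden_state
    masks = encoded["attention_mask"].tolist()
    vectors = [mean_pool(h, m) for h, m in zip(hidden.tolist(), masks)]
    return cosine_similarity(vectors[0], vectors[1])
```

Calling text_similarity with an original Urdu sentence and its back-translated counterpart would return a score between -1 and 1. Any threshold for flagging reuse would need to be tuned on the dataset and is not specified here.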
Frontend Framework
UI Library
Styling Framework
Core Processing
ML Experiments
Database