The backend for an anime recommender system that combines multiple methods to provide a variety of recommendations to users based on different similarity metrics.
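A minimal sketch of the kind of multi-metric blending such a backend might perform; the cosine/Jaccard combination, the weights, and all names here are hypothetical rather than taken from the project:

```python
# Hypothetical sketch (not the repo's code): blending two similarity metrics
# for item-based recommendations.
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def jaccard_sim(tags_a: set, tags_b: set) -> float:
    return len(tags_a & tags_b) / max(len(tags_a | tags_b), 1)

def recommend(query_idx, embeddings, tag_sets, top_k=5, w_embed=0.7, w_tags=0.3):
    # Score every other item by a weighted mix of embedding and tag similarity.
    scores = []
    for i in range(len(embeddings)):
        if i == query_idx:
            continue
        score = (w_embed * cosine_sim(embeddings[query_idx], embeddings[i])
                 + w_tags * jaccard_sim(tag_sets[query_idx], tag_sets[i]))
        scores.append((i, score))
    return sorted(scores, key=lambda x: x[1], reverse=True)[:top_k]
```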
Match celebrity users with their respective tweets using Semantic Textual Similarity over 900+ celebrity users' 2.5 million+ scraped tweets, built with SBERT, Streamlit, Tweepy and FastAPI.
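A small illustrative sketch of semantic search over tweet embeddings with SBERT; the model name, example tweets, and query are placeholders, not the project's actual setup:

```python
# Encode tweets once, then retrieve the most semantically similar ones for a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint

tweets = ["Excited for the premiere tonight!", "New single drops Friday."]
tweet_embeddings = model.encode(tweets, convert_to_tensor=True)

query = "music release announcement"
query_embedding = model.encode(query, convert_to_tensor=True)

hits = util.semantic_search(query_embedding, tweet_embeddings, top_k=2)[0]
for hit in hits:
    print(tweets[hit["corpus_id"]], hit["score"])
```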
Python scripts and Java classes to make several open-source toolkits work with the CLEF (PubMed docs) and TREC Common Core datasets. The code aims to improve document classification by exploring pretrained embeddings such as BERT and PubMedBERT, and to investigate a semantic search approach based on the Sentence-BERT model for both datasets.
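As a hedged sketch of one ingredient mentioned above, this derives document vectors from a pretrained BERT-family model via mean pooling; the checkpoint name and sample texts are assumptions for illustration:

```python
# Mean-pooled BERT embeddings as document features (checkpoint is a placeholder).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state        # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)           # mean pooling

doc_vectors = embed(["Aspirin reduces fever.", "BERT encodes text."])
print(doc_vectors.shape)  # torch.Size([2, 768])
```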
⚕️🦠 Developed a document retrieval system to return titles of scientific papers containing the answer to a given user question based on the first version of the COVID-19 Open Research Dataset (CORD-19) ☣️🧬
This project develops a document retrieval system that returns titles of scientific papers containing the answer to a given user question. Two different sentence-embedding approaches have been implemented and compared.
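A minimal sketch of how two sentence-embedding models could be compared on this retrieval task; the model names, titles, and question are illustrative placeholders, and CORD-19 loading is omitted:

```python
# Rank paper titles against a question with two different embedding models.
from sentence_transformers import SentenceTransformer, util

titles = [
    "Clinical features of patients infected with 2019 novel coronavirus",
    "A survey of transfer learning methods",
]
question = "What are the symptoms of COVID-19?"

for model_name in ("all-MiniLM-L6-v2", "multi-qa-mpnet-base-dot-v1"):
    model = SentenceTransformer(model_name)
    title_emb = model.encode(titles, convert_to_tensor=True)
    query_emb = model.encode(question, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, title_emb)[0]
    best = int(scores.argmax())
    print(model_name, "->", titles[best], float(scores[best]))
```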
Scripts and utilities for modeling and identifying depression-related topics on Reddit, in Portuguese and English, using topic modeling techniques. The Latent Dirichlet Allocation (LDA), Contextualized Topic Model (CTM) and Embedded Topic Model (ETM) topic models were explored in this study.
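A short sketch of the LDA baseline only (CTM and ETM are omitted), using gensim on a toy tokenized corpus rather than the Reddit data:

```python
# Fit a small LDA model and print the top words per topic.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["sleep", "tired", "insomnia", "night"],
    ["therapy", "doctor", "medication", "help"],
    ["sad", "alone", "tired", "sleep"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               random_state=0, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```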
Built a news topic classification model using a pretrained Sentence-BERT model and traditional ML models. Sentence-BERT is used to generate sentence embeddings, and the SBERT embeddings are used to train various ML models. Achieved an accuracy of 94.3% with the SVM + SBERT combination.
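A hedged sketch of the pipeline described above: SBERT sentence embeddings fed to an SVM classifier; the encoder checkpoint and the tiny toy dataset are placeholders:

```python
# Encode headlines with SBERT, then train a linear SVM on the embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint

train_texts = ["Stocks rally after earnings report", "Team wins championship final"]
train_labels = ["business", "sports"]

X_train = encoder.encode(train_texts)
clf = SVC(kernel="linear").fit(X_train, train_labels)

print(clf.predict(encoder.encode(["Market closes at record high"])))
```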
Project files contain PyTorch implementations of Siamese BiLSTM models for the Semantic Text Similarity task on the SICK dataset using FastText embeddings. Also contains Siamese BiLSTM-Transformer Encoder and SBERT fine-tuning implementations on the STS tasks.
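A schematic Siamese BiLSTM for scoring sentence pairs, in the spirit of the models above; the dimensions, mean pooling, and the random inputs standing in for FastText vectors are simplifications, not the repository's exact architecture:

```python
# Twin BiLSTM encoders with shared weights; similarity is the cosine of the
# mean-pooled hidden states of the two sentences.
import torch
import torch.nn as nn

class SiameseBiLSTM(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)

    def encode(self, x):                      # x: (batch, seq_len, embed_dim)
        out, _ = self.encoder(x)              # (batch, seq_len, 2*hidden_dim)
        return out.mean(dim=1)                # mean-pool over time steps

    def forward(self, a, b):
        return torch.cosine_similarity(self.encode(a), self.encode(b))

model = SiameseBiLSTM()
s1 = torch.randn(4, 12, 300)   # stand-in for 300-d FastText token vectors
s2 = torch.randn(4, 10, 300)
print(model(s1, s2).shape)     # torch.Size([4]) -> one score per pair
```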
Initially, implement a document retrieval system with SBERT embeddings and evaluate it on the CORD-19 dataset. Afterwards, fine-tune a BERT model on the SQuAD v2 dataset and evaluate it on the question answering task.
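An illustrative sketch of the question answering step using a publicly available BERT checkpoint fine-tuned on SQuAD v2; the checkpoint name and example context are assumptions, not necessarily the model trained in this repo:

```python
# Extractive QA with a SQuAD v2 fine-tuned BERT checkpoint.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/bert-base-cased-squad2")

result = qa(
    question="What dataset was used for fine-tuning?",
    context="The model was fine-tuned on the SQuAD v2 dataset for question answering.",
)
print(result["answer"], result["score"])
```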
As specified in this issue: UKPLab/sentence-transformers#350 (comment)
The authors of SBERT recommend using CrossEncoders for sentence-pair classification.
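A minimal usage sketch of a CrossEncoder from sentence-transformers scoring sentence pairs; the STS-trained checkpoint shown is a public one chosen only for illustration:

```python
# Score sentence pairs jointly with a CrossEncoder (no separate embeddings).
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/stsb-roberta-base")
scores = model.predict([
    ("A man is eating food.", "A man is eating a meal."),
    ("A man is eating food.", "The sky is blue."),
])
print(scores)  # higher score = more similar pair
```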