Datasets

MCRLab provides access to the following datasets:

Football dataset

We are pleased to release our football-specific sentiment dataset, which we used to train the sentiment model for the football domain in our paper “Sentiment Identification in Football-Specific Tweets”. We collected football-specific tweets posted during two popular football events: the FIFA World Cup 2014 (FIFA 2014) and the UEFA Champions League 2016/2017 (CL 2016/2017). We used crowdsourcing to annotate the dataset, stipulating that each tweet be manually annotated by four annotators so that we could measure the agreement between them. In constructing the final dataset, we assigned tweets to sentiment categories based on the annotators’ agreement. If the annotators did not reach a majority agreement on the sentiment of a tweet, it was discarded as noisy. The football-specific dataset consists of 54,526 tweets in total. Of these, 30,065 tweets pertain to the FIFA World Cup 2014, while 24,461 tweets reference the CL 2016/2017.
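The majority-agreement filter described above can be sketched as follows. This is only an illustration, not the procedure used in the paper: it assumes that "majority" means strictly more than half of the votes (at least three of four annotators), and the tweet IDs, label names, and the `majority_label` helper are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Assumption: "majority" means strictly more than half of the votes,
    # i.e. at least 3 of 4 annotators. The paper may use another threshold.
    return label if count > len(votes) / 2 else None


# Hypothetical annotations: tweet ID -> labels from four annotators.
annotations = {
    "t1": ["positive", "positive", "positive", "neutral"],
    "t2": ["positive", "negative", "neutral", "negative"],
    "t3": ["negative", "negative", "negative", "negative"],
}

# Keep only tweets with a majority label; the rest are discarded as noisy.
final_dataset = {}
for tweet_id, votes in annotations.items():
    label = majority_label(votes)
    if label is not None:
        final_dataset[tweet_id] = label
```

Under this assumption, a two-versus-two split (or two votes against two scattered ones, as in `t2`) yields no label, and the tweet is dropped.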

NOTE: The dataset is divided into two files:

  1. FIFA2014-sentiment-dataset: contains the tweet IDs and labels for tweets related to the FIFA World Cup 2014 (FIFA_2014_sentiment_dataset).
  2. CL2016/17-sentiment-dataset: contains the tweet IDs and labels for tweets related to the UEFA Champions League 2016/2017 (CL_2016_17_Annotated_tweets).

To retrieve the tweets’ text and other social factors such as the number of likes, retweets, user, image, etc., you can use the Twitter APIs.

Please cite our paper if you use this dataset:

S. Aloufi and A. E. Saddik, “Sentiment Identification in Football-Specific Tweets,” in IEEE Access, vol. 6, pp. 78609-78621, 2018, doi: 10.1109/ACCESS.2018.2885117. https://ieeexplore.ieee.org/abstract/document/8561283
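Because the files contain only tweet IDs and labels, the tweet text has to be looked up by ID. The sketch below shows one way to group IDs into batches and build lookup URLs. It makes several assumptions: the endpoint and field names follow the Twitter API v2 tweet lookup, the batch size of 100 reflects that endpoint's per-request limit, and the example IDs are made up. Authentication and the HTTP requests themselves are omitted, and the current API documentation should be checked before relying on any of these values.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode

# Assumptions: Twitter API v2 tweet lookup endpoint and its 100-ID limit.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
BATCH_SIZE = 100
# Assumed field names for likes/retweets, author, and timestamp.
TWEET_FIELDS = "public_metrics,author_id,created_at"


def batched(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def lookup_urls(ids: Sequence[str]) -> List[str]:
    """Build one lookup URL per batch of tweet IDs (no request is sent)."""
    urls = []
    for batch in batched(ids, BATCH_SIZE):
        query = urlencode({"ids": ",".join(batch), "tweet.fields": TWEET_FIELDS})
        urls.append(f"{LOOKUP_URL}?{query}")
    return urls


# Hypothetical tweet IDs standing in for the ID column of a dataset file.
example_ids = [str(1000 + i) for i in range(250)]
urls = lookup_urls(example_ids)
```

Tweets that have been deleted or made private since collection will not be returned, so the retrieved set may be smaller than the label file.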

The SENS-IT ontology

The SENS-IT ontology aims to describe people’s surroundings in a more perceptible form by aggregating the corresponding sensory data into a textual format. Please download the ontology from here: https://mcrlab.net/wp-content/uploads/2018/06/SENS-IT.zip

Human Affective States Ontology (HASO):

The Human Affective States Ontology (HASO) has been developed in the OWL language. It provides knowledge and a common vocabulary regarding human affective states (emotion, mood, sentiment) in a machine-readable format. Nowadays, humans and computer applications often need to communicate and share knowledge. However, everyone expresses themselves in their own language, with different terms and meanings. Ontologies aim to unify these terms and meanings in order to enable effective communication between people and computers. They capture the domain knowledge and provide a shared understanding of the domain. The study of human emotion, mood, and sentiment is significant because these concepts have an impact on human behavior. Building an ontology for this domain allows us to then build a semantic application.

  1. HASO (download the ontology here: Proposed Ontology Human Affective States HASO) covers a wide range of human affective states and therefore many topics. Through modularization, we create modules that handle parts of the ontology.
  2. Ontology modularization (download here: Proposed Ontology Human Affective States HASO Modularization) aids scalability, reusability, and the validation process.

HASIO Question Answering system (HASIOQA)


HASIOQA is a system that aims to overcome the complexity and difficulty of writing SPARQL queries by providing a natural-language user interface.
The system receives as input a question expressed in English and converts it into a SPARQL query to retrieve the answer from the Human Affective States and their Influences Ontology (HASIO). We implemented HASIOQA using the Eclipse environment and the Jena Ontology API. Apache Jena is an open-source Semantic Web framework for Java that provides an API for extracting data from ontologies. First, we defined regular expressions to match the natural-language questions. Then we defined parametrized SPARQL queries that are run against HASIO through Jena, based on the user's natural-language question. Please download HASIOQA here: (HASIO_Information_Catalogue)
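The two steps above (regular-expression matching, then filling a parametrized SPARQL query) can be illustrated with a small sketch. It is written in Python for brevity and only builds the query string; the actual system is implemented in Java and executes its queries through Jena. The namespace, the `hasInfluence` property, the question patterns, and all function names are hypothetical and do not come from HASIO or HASIOQA.

```python
import re
from typing import Optional

# Hypothetical namespace; the real HASIO IRIs may differ.
PREFIX = "PREFIX hasio: <http://example.org/hasio#>"

# Each entry pairs a question pattern with a parametrized SPARQL template.
# Doubled braces are literal braces once str.format() has been applied.
QUESTION_PATTERNS = [
    (
        re.compile(
            r"^what (?:are|is) the (?:influences?|effects?) of (?P<state>[\w ]+?)\??$",
            re.IGNORECASE,
        ),
        PREFIX + "\nSELECT ?influence WHERE {{ hasio:{state} hasio:hasInfluence ?influence }}",
    ),
    (
        re.compile(
            r"^what kind of affective state is (?P<state>[\w ]+?)\??$",
            re.IGNORECASE,
        ),
        PREFIX + "\nSELECT ?type WHERE {{ hasio:{state} a ?type }}",
    ),
]


def to_local_name(text: str) -> str:
    """Turn free text such as 'deep sadness' into a local name like 'DeepSadness'."""
    return "".join(word.capitalize() for word in text.split())


def question_to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for the first matching pattern, or None."""
    cleaned = question.strip()
    for pattern, template in QUESTION_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return template.format(state=to_local_name(match.group("state")))
    return None


query = question_to_sparql("What are the influences of anger?")
```

Restricting the captured group to word characters and spaces keeps arbitrary user input out of the query text, which matters whenever a query is assembled from a template.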

A Dataset for Psychological Human Needs Detection from Social Networks

We are pleased to release the dataset related to the paper “A Dataset for Psychological Human Needs Detection from Social Networks”. The paper is available under the Early Access area on IEEE Xplore (Digital Object Identifier: 10.1109/ACCESS.2017.2706084).
The dataset itself can be downloaded from: https://mcrlab.net/need-dataset/

Sentiment Analysis on Multi-view Social Data:

We are pleased to release our new MVSA dataset, which includes more tweets and annotations. In the new dataset, each tweet is annotated by three annotators. We call this dataset MVSA-multiple. Please go to the following page to download the dataset: https://mcrlab.net/research/mvsa-sentiment-analysis-on-multi-view-social-data/

MUDVA dataset:

MUDVA: A Multi-Sensory Dataset for the Vehicular CPS Applications. Download the paper and access the full dataset at https://drive.google.com/open?id=0B5PcvDP2jMVSV3ZpTmFPdmdFUHc
The paper is:

  1. Kazi Masudul Alam, Mohammad Hariz, Seyed Vahid Hosseinioun, Mukesh Saini, and Abdulmotaleb El Saddik, “MUDVA: A Multi-Sensory Dataset for the Vehicular CPS Applications”, in Proceedings of the 2016 IEEE Workshop on Multimedia Signal Processing (MMSP 2016), 21-23 September 2016, Montreal, Canada