DANeS Lab, IIT Patna

Research Interest

The core emphasis of our group is on problems in Network and Data Science, along with Machine Learning and Artificial Intelligence techniques. We are currently working on the following topics: Exploring Importance of Trust in Socio-Technical Network, Modeling growth of online bipartite network, Computational Journalism, Name disambiguation in bibliographic domain, Recommendation System with Side Information, Information Retrieval from Online Social Network, Societal differences in Gender Based Violence, Traffic Network of Big Cities and Operations, Multimodal Cyberbullying and Cyberaggression Detection from social media, Analysis of Prescription Drug Abuse on Twitter, Financial Health Prediction of Online Merchandise Brands, etc.

Exploring Importance of Trust in Socio-Technical Network

With the advent of peer-to-peer networks and online social networks, user-generated content has grown enormously, which makes it urgent to define the trustworthiness of that content. Measuring the validity of news on social media, or selecting a peer over the network, has become a great challenge. The existence of trust in socio-technical networks is a well-addressed issue, and preferential attachment is known to be present in network evolution, not only in collaboration networks but also in other socio-technical networks. However, no such work identifies how network growth is influenced by the underlying trust dynamics in the network. We are trying to learn the trust dynamics behind network growth, as well as the factors governing information diffusion over the network, in order to control malicious user-generated content.

People

Modeling growth of online bipartite network

In the past few decades, since the commencement of Web 2.0, a large number of interactive web applications and services have been introduced. These services let users share posts, news, and links on online social media sites, organize and share bookmarks on folksonomy sites, and read, share and vote on posts in online discussion forums. As a result, web users and online objects have grown continuously, which makes modeling the rapid growth of web activity important. In our work we propose two distinct growth models for the online bipartite network, also known as the web-based user-object bipartite network. Briefly, the proposed models are: (1) each edge evolving from users and objects is attached using preferential attachment along with randomness; (2) a fraction of the edges evolving from users and objects is attached either preferentially or randomly. The proposed growth models are validated over several online networks, e.g., MovieLens, Wikipedia, StackOverflow, Digg, Amazon, and Twitter.
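The idea of attaching edges preferentially "along with randomness" can be illustrated with a small simulation. This is only a minimal sketch under our own assumptions, not the proposed models themselves: the function name `grow_bipartite`, the mixing probability `p_pref`, and the arrival probabilities `p_new_user` and `p_new_object` are all hypothetical, and repeated user-object edges are allowed for simplicity.

```python
import random
from collections import Counter


def grow_bipartite(steps, p_pref=0.8, p_new_user=0.3, p_new_object=0.1, seed=0):
    """Grow a user-object bipartite network by one edge per step.

    Assumed rule (illustrative only): each endpoint of a new edge is either a
    brand-new node or an existing node. An existing node is picked
    preferentially (proportional to its degree) with probability p_pref and
    uniformly at random otherwise.
    """
    rng = random.Random(seed)
    edges = [(0, 0)]                   # seed network: user 0 linked to object 0
    user_ends, object_ends = [0], [0]  # one entry per edge endpoint
    n_users, n_objects = 1, 1

    def pick(ends, n_nodes):
        if rng.random() < p_pref:
            # Sampling an endpoint list is proportional to node degree.
            return rng.choice(ends)
        return rng.randrange(n_nodes)  # uniform over existing nodes

    for _ in range(steps):
        if rng.random() < p_new_user:
            u, n_users = n_users, n_users + 1
        else:
            u = pick(user_ends, n_users)
        if rng.random() < p_new_object:
            o, n_objects = n_objects, n_objects + 1
        else:
            o = pick(object_ends, n_objects)
        edges.append((u, o))
        user_ends.append(u)
        object_ends.append(o)
    return edges, n_users, n_objects


edges, n_users, n_objects = grow_bipartite(5000)
object_degree = Counter(o for _, o in edges)
print(n_users, n_objects, object_degree.most_common(3))
```

Setting `p_pref` close to 1 concentrates edges on a few high-degree objects, while `p_pref = 0` gives purely random attachment; comparing the resulting degree distributions against a real network is one way such a model could be checked.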

People

Computational Journalism

The popularity of Twitter as a news media platform encourages a large fraction of news readers to post tweets on news articles so that they can be widely discussed by a large user base. Understanding and using these tweets can highlight user opinions towards a news event. However, extracting the specific tweets relevant to a news article, and then extracting relevant information from those tweets, remains a challenge. We are working on various applications, such as extraction of tweets relevant to a news article, summarization of user opinion towards the news article, understanding of user stance, use of user opinion to predict the popularity of a news article, and prediction of user reactions towards each other. We combine graph-based techniques with natural language processing and machine learning algorithms to develop these applications.

People

Name disambiguation in bibliographic domain

Author name disambiguation is one of the important problems in scientometrics. It affects contribution analysis of individual researchers, for example the h-index of an author. It is important in various contexts, such as citation analysis and link prediction between authors in collaboration networks. Resolving name ambiguity will also help in finding experts more reliably. The problem is a great challenge in many other areas as well, such as social networking, bibliometric analysis, and the forensic domain. In our research we use machine learning and network science concepts to resolve this type of ambiguity. Our approach includes clustering techniques, and we are planning to use some embedding-based techniques. Dataset: We are using the Arnetminer citation dataset and data crawled from the Web of Science collection.

People

Recommendation System with Side Information

Recommender systems recommend items more accurately by analyzing users’ potential interest in items of different brands. In conjunction with users’ rating similarity, users’ implicit feedback, such as clicking items, viewing item specifications, or watching videos, has proved helpful for learning user embeddings, which helps to predict users’ ratings better. Most researchers do not consider explicit feedback. We focus on the following issues: Explicit feedback can be used to validate the reliability of particular users and to learn about users’ characteristics. We are focusing on explicit feedback for better recommendation. We are trying to merge explicit and implicit feedback for our recommendation purposes. Dataset: We are using the Amazon.com online review dataset.

People

Information Retrieval from Online Social Network

Retrieving relevant information from user-generated social media comments based on specific requirements is an important research area. Retrieving the relevant comments has a great impact on making important decisions during critical situations. Our research has two major concerns. Retrieving Infrastructure Damage Tweets: We have built a real-time model to retrieve tweets related to damage and to find the impact of the damage on the affected locations. We use a split-query based approach followed by an improved pseudo relevance feedback approach to achieve this objective. Mining the Right Social Media based on the Requirement: Reddit and Twitter are the most talked-about and followed social media platforms for news aggregation. In this work, by extracting and comparing various important features of both platforms, we suggest the suitable platform for different requirements of followers/readers. Dataset: We are using Twitter and Reddit data.
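For readers unfamiliar with pseudo relevance feedback, the basic idea can be sketched as follows. This is a plain, textbook-style version under our own assumptions, not the improved variant or the split-query step used in our work: the helper names `score`, `rank` and `expand_query`, the overlap-based scoring, and the toy tweets are all hypothetical.

```python
from collections import Counter


def score(query_terms, doc_terms):
    """Number of distinct query terms that occur in the document."""
    return len(set(query_terms) & set(doc_terms))


def rank(query_terms, docs):
    """Document indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(docs)), key=lambda i: -score(query_terms, docs[i]))


def expand_query(query_terms, docs, top_k=2, n_new_terms=1):
    """Assume the top_k results are relevant and add their most frequent
    non-query terms to the query."""
    top = rank(query_terms, docs)[:top_k]
    counts = Counter(
        term
        for i in top
        for term in dict.fromkeys(docs[i])  # count each term once per document
        if term not in query_terms
    )
    new_terms = [term for term, _ in counts.most_common(n_new_terms)]
    return list(query_terms) + new_terms


# Toy, already-tokenized tweets (made up for illustration).
tweets = [
    "bridge collapsed after earthquake".split(),
    "earthquake collapsed the bridge road damaged".split(),
    "great football match today".split(),
    "school building collapsed near market".split(),
]
query = ["bridge", "earthquake"]
expanded = expand_query(query, tweets)
print(expanded)
print(rank(query, tweets), rank(expanded, tweets))
```

In this toy case the expanded query picks up a term shared by the top-ranked tweets, which lets the last tweet, which mentions neither original query term, move above the unrelated one. A practical system would weight terms (for example with TF-IDF) and filter stop words rather than count raw overlap.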

People

Societal differences in Gender Based Violence

Because of societal differences and biases, Gender Based Violence goes unreported even when there are traces of it. Even when it is reported, people, because of societal stigma, do not show uniform and justifiable reactions. These reactions are captured in their online content. Our research aims to capture the societal differences and highlight the spatial and societal blind spots using online social media content.

People

Traffic Network of Big Cities and Operations

We aim to develop an enhanced mathematical model for big-city traffic networks that makes many operations executable on such large networks and also improves the results of those operations at equivalent space-time complexity. The primary operations we have successfully executed are as follows: selection of major regions of mobility; prediction of short-term traffic, from hours to weeks; reduction of big networks into their parallel isolated components; feature transfer of networks into their components. Our present work has wide application in real-world settings, and we aim to further improve other aspects of traffic.

☛ This research work is efficiently guided by Dr. Joydeep Chandra and Prof. João Mendes-Moreira.

People

Multimodal Cyberbullying and Cyberaggression Detection from social media

As human beings utilize computing technologies to mediate multiple aspects of their lives, cyberbullying has grown into an important societal challenge. Cyberbullying may lead to deep psychiatric and emotional disorders for those affected. Hence, there is an urgent need to devise automated methods for cyberbullying detection and prevention. While recent cyberbullying detection efforts have defined sophisticated text processing methods, there are as yet few efforts that leverage visual data processing to detect cyberbullying automatically. After detecting cyberbullying, we identify the area from which it spreads. Dataset information: Twitter, Instagram.

People

Analysis of Prescription Drug Abuse on Twitter

The problem of prescription drug abuse has reached epidemic status. With a growing volume of content on Twitter that shares positive experiences of prescription drug abuse, it is necessary to investigate how social media contributes to the spread of such content. Such content negatively impacts the awareness drives being carried out to mitigate the issue. Hence we study the collective behavior of Twitter users who tweet positively about prescription drug abuse. We focus on the following issues: Characteristics of the network: We study the characteristics of the network of users, the spreading of content in the network through cascades, and the spreading of awareness about different drugs in the network. Role of influencers: We work towards identifying key users in the network who are responsible for eliciting responses related to drug abuse on Twitter, using an influence score. We use the influence information to predict future participants in the network. Dataset: Tweets, retweets and follower-followee information of users who tweeted positively about prescription drug abuse between 2012 and 2015.

People

Financial Health Prediction of Online Merchandise Brands

Conventional ways of summarizing users’ ratings or the sentiment of reviews on items of an online merchandise brand are not sufficient to evaluate the financial health of the brand. They overlook the social standing and influence of individual users. We have proposed a method that evaluates the helpfulness score and centrality score of users of a brand’s items based on their ratings and reviews collected from online merchandise sites. Our proposed method, named Social Promoter Score (SPS), combines the rating score, helpfulness score, centrality and reputation of all users of a brand. The SPS value acts as an indicator of the future financial health of a company. We are investigating different types of centrality score. Dataset: We are using the Amazon.com online review dataset.

People