[CUE_Interns-Recommendations-03] Similarity-based people recommendation algorithms for social networking sites
Original Publication Date: 2009-Nov-18
Included in the Prior Art Database: 2009-Nov-18
Disclosed are similarity-based people recommendation algorithms for social networking sites
A growing number of social networking sites are available today. As the number of users on these sites grows, it becomes difficult for an individual user to find other users they may know or would like to network with.
One possible way to help users find people on sites with a large population is to proactively recommend people to a given user. Some algorithms tend to recommend people a user already knows or is familiar with. Those algorithms help a user complete their existing social network online. Other algorithms recommend mainly unknown people based on similarity, for example, the same job, shared skills, or keyword matches. In a professional setting (like Beehive), those algorithms are important for helping a user reach out to new people who can help with their job and career.
One limitation is that the "familiarity"-based approach runs out of recommendations once a user has completed their social network. "Similarity" approaches, by contrast, can continually help expand a user's existing network with new contacts. However, similarity-based people recommenders are not yet common on social networking sites.
Facebook has recently added a "People you may know" page. It uses a simple "Friend of a Friend" algorithm, which recommends users based on the number of friends they have in common with the target user. This algorithm recommends both known and unknown people, but recommendations are limited to one hop away from the user. The Sonar-Fringe people recommender at IBM is similar to the Facebook recommender in that it recommends mostly "familiar" people. However, Sonar includes multiple data sources to compute familiarity.
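The "Friend of a Friend" idea described above can be illustrated with a minimal sketch. It is not Facebook's actual implementation; the function name friend_of_friend, the dictionary-of-sets graph layout, and the sample users are assumptions made only for illustration. Candidates are ranked by how many of the user's friends they are also friends with.

```python
from collections import Counter

def friend_of_friend(friends, user, top_n=10):
    """Rank people who are not yet friends of `user` by mutual-friend count.

    `friends` maps each user name to the set of that user's friends
    (a hypothetical in-memory layout, assumed for this sketch).
    """
    direct = friends.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themselves and people they already know.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Highest mutual-friend count first; ties broken by name for stable output.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical sample network.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(friend_of_friend(friends, "ann"))  # [('dan', 2), ('eve', 1)]
```

Every candidate comes from the friend list of one of the user's friends, which is why this style of recommender cannot reach beyond the immediate neighborhood of the existing network.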
A. The first algorithm works as follows
(1) Creating bag-of-words representation
First create a bag-of-words representation of each user, using all textual content that can be associated with the user. To improve the robustness of the word list and ensure the words are meaningful, use certain rule-based procedures to combine words and/or collapse the word list. Such procedures can include, but are not limited to:
1. Apply a stemmer, such as the Porter stemmer, to reduce each word to its root.
2. Combine words with short edit distances from each other.
3. Remove all words in a customized stop word list.
4. Remove all words whose word count is lower or higher than certain thresholds.
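The four procedures listed above could be combined roughly as in the following sketch. Everything named here is an assumption for illustration: crude_stem is a deliberately simple stand-in for a real Porter stemmer, STOP_WORDS is a tiny sample list, the default thresholds are arbitrary, and the rule that only words longer than three characters are merged is a guard added to avoid collapsing unrelated short words. The count thresholds are applied to the texts passed in; a real system would more likely apply them across the whole corpus.

```python
import re
from collections import Counter

# Assumed sample stop word list; a real one would be customized per site.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}

def crude_stem(word):
    """Placeholder for a Porter stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def bag_of_words(texts, min_count=1, max_count=1000, max_dist=1, stem=crude_stem):
    """Build a filtered bag of words from all texts associated with one user."""
    raw = [t for text in texts for t in re.findall(r"[a-z]+", text.lower())]
    # Step 3: drop stop words; step 1: stem what remains.
    counts = Counter(stem(t) for t in raw if t not in STOP_WORDS)
    # Step 2: fold each word into an already-kept word within max_dist edits.
    merged = Counter()
    for word, n in counts.most_common():
        target = next(
            (w for w in merged if len(w) > 3 and edit_distance(w, word) <= max_dist),
            word,
        )
        merged[target] += n
    # Step 4: keep only words whose count lies within the thresholds.
    return Counter({w: n for w, n in merged.items() if min_count <= n <= max_count})

profile_texts = ["Java programming and Java testing", "programing in Python"]
print(bag_of_words(profile_texts, min_count=2))
```

Passing the stemmer in as a parameter keeps the sketch free of external dependencies while leaving room to substitute a real Porter implementation.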
All remaining words associated with a user u are then used to create a word vector V_u = (v_u(w_1), ..., v_u(w_m)) to describe u, where m is the total number of distinct words used in all included texts after the a...