What’s this all about?
I wanted a simple recommender that suggests new, possibly interesting pages on the web. Its input should be my existing bookmark library. I currently save all my articles and bookmarks in Evernote, so it would be nice if the recommender could take the data directly from there, without converting, exporting, importing and so on.
I didn’t find any simple solution to that problem, so I decided to build one on my own. First I created a wrapper for the deliciousfeeds-API called deliciousfeeds4j. The second result is this post.
How does it work?
The idea is simple:
- find users who bookmarked the same pages as I did
- find the ones most similar to me, based on the bookmarks we share
- suggest all bookmarks of those top-n most similar users that I don’t know about yet
To make this all work I use the data from Delicious through their API.
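The three steps above can be sketched on in-memory data, without any API calls. This is only an illustration: `RecommenderSketch` and the `bookmarksByUser` map are made up for this example and stand in for whatever Delicious would return.

```scala
object RecommenderSketch {

  /**
   * @param myUrls          the URLs I have bookmarked
   * @param bookmarksByUser hypothetical stand-in for the Delicious data: user -> bookmarked URLs
   * @param topN            how many of the most similar users to consider
   * @return URLs bookmarked by the most similar users that are not in myUrls
   */
  def recommend(myUrls: Set[String],
                bookmarksByUser: Map[String, Set[String]],
                topN: Int): Set[String] = {
    // Steps 1 and 2: count the shared bookmarks per user, keep the topN users with the largest overlap
    val similarUsers: Seq[String] = bookmarksByUser.toSeq
      .map { case (user, urls) => (user, (urls intersect myUrls).size) }
      .filter { case (_, shared) => shared > 0 }
      .sortBy { case (user, shared) => (-shared, user) }
      .take(topN)
      .map { case (user, _) => user }

    // Step 3: collect everything those users bookmarked and drop what I already know
    val candidates: Set[String] = similarUsers.flatMap(user => bookmarksByUser(user)).toSet
    candidates -- myUrls
  }
}
```

With `myUrls = Set("a.com", "b.com")` and a user who bookmarked `a.com`, `b.com` and `c.com`, that user has the largest overlap and `c.com` becomes the suggestion.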
Show me the code!
Note: I have tried to keep this as simple as possible, so there are many things that could be done much better. This code is only meant to show the basic idea, nothing else!
The Starter object ties everything together. It loads the URLs from your Evernote account or, if you prefer, from a simple text file with one URL per line. Then the PageRecommender kicks in and finds some new URLs to look at. Finally everything is printed to the console.
object Starter extends App {

  //find the urls stored in your evernote-account (you have to set up some things for this to work!)
  val urls = EvernoteURLRetriever.findAllUrls()

  //Uncomment this if you want to load your urls from a text-file instead...
  //val urls = FileURLRetriever.readUrlsFromFile("res/test-urls.txt")

  //use the 20 best-matching users for the recommendations
  val pageRecommender: PageRecommender = new DeliciousUserBasedPageRecommender(20)

  //get recommendations...
  val recommendations = pageRecommender.recommend(urls)

  println("\n\nFound some new urls: ")
  recommendations.foreach(println)

  //Here some further processing can be done -> save to file, group by url-authority, etc.
}
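The last comment in the Starter mentions grouping by URL authority as a possible follow-up step. A minimal sketch of that, assuming `java.net.URI` is good enough for parsing; `UrlGrouping` and the `"unknown"` bucket are names chosen for this example only.

```scala
import java.net.URI
import scala.util.Try

object UrlGrouping {

  /** Groups URLs by their authority (host plus optional port); anything unparsable ends up under "unknown". */
  def groupByAuthority(urls: Iterable[String]): Map[String, List[String]] =
    urls.toList.groupBy { url =>
      // new URI(...) throws on malformed input and getAuthority may return null, so guard against both
      Try(new URI(url).getAuthority).toOption.flatMap(Option(_)).getOrElse("unknown")
    }
}
```

Applied to the recommendations, something like `UrlGrouping.groupByAuthority(recommendations).keys.size` would give the number of distinct domains.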
Here’s the code of the EvernoteURLRetriever.
import com.evernote.auth.{EvernoteService, EvernoteAuth}
import com.evernote.clients.ClientFactory
import com.evernote.edam.`type`.Note
import com.evernote.edam.notestore.NoteFilter
import scala.collection.JavaConversions._
import scala.collection.mutable.ListBuffer

object EvernoteURLRetriever {

  //This is fine for testing - no harm can be done since it's only the sandbox...
  private val developerToken = "YOUR_SANDBOX_DEVELOPER_TOKEN"
  private val evernoteServiceType = EvernoteService.SANDBOX

  //Uncomment this if you want to use your real account...
  //private val developerToken = "YOUR_PRODUCTION_DEVELOPER_TOKEN"
  //private val evernoteServiceType = EvernoteService.PRODUCTION

  /**
   * Finds all notes from all notebooks which have a source-url starting with 'http'.
   * @return found urls
   */
  def findAllUrls(): Traversable[String] = {

    //build the authentication
    val evernoteAuth = new EvernoteAuth(evernoteServiceType, developerToken)

    //get the note-store...
    val factory = new ClientFactory(evernoteAuth)
    val noteStore = factory.createNoteStoreClient()

    //build a filter for all notes with a source-url set...
    val filter = new NoteFilter()
    filter.setWords("sourceURL:http*")

    println("Evernote - searching for notes with source-url...")

    //fetch the results in pages of 50 notes...
    val totalFound = noteStore.findNotes(filter, 0, 0).getTotalNotes
    val notes = ListBuffer[Note]()
    while (notes.size < totalFound)
      notes ++= noteStore.findNotes(filter, notes.size, 50).getNotes

    //now get the urls
    val urls = notes.map(_.getAttributes.getSourceURL)
    println("Evernote - found %s urls..." format urls.size)
    urls
  }
}
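The file-based alternative mentioned in the Starter is not listed in this post. A minimal sketch of such a reader, assuming one URL per line in a UTF-8 file; the object name `FileURLRetrieverSketch` is made up for this example, and the version in the repository may well look different.

```scala
import scala.io.Source

object FileURLRetrieverSketch {

  /** Reads one URL per line, trimming whitespace and skipping everything that does not start with "http". */
  def readUrlsFromFile(path: String): List[String] = {
    val source = Source.fromFile(path, "UTF-8")
    try {
      // toList forces the lazy line iterator before the file is closed
      source.getLines().map(_.trim).filter(_.startsWith("http")).toList
    } finally {
      source.close()
    }
  }
}
```

The `startsWith("http")` check mirrors the `sourceURL:http*` filter used for Evernote, so both sources would hand the same kind of input to the recommender.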
Here’s the code of the PageRecommender trait.
trait PageRecommender {

  /**
   * Gets some urls and returns the recommended ones - based on the given data.
   *
   * @param urls - base data for recommendation
   * @return recommended urls
   */
  def recommend(urls: Traversable[String]): Traversable[String]
}
Here’s the code of the DeliciousUserBasedPageRecommender class. It does all the heavy work of finding some new and possibly interesting URLs.
import com.delicious.deliciousfeeds4J.beans.Bookmark
import com.delicious.deliciousfeeds4J.DeliciousFeeds
import com.google.common.collect.{Multisets, HashMultiset, Multiset}
import org.apache.commons.lang.StringUtils.isEmpty
import scala.collection.JavaConversions._
import scala.collection.mutable

class DeliciousUserBasedPageRecommender(val topNUsers: Int) extends PageRecommender {

  private val deliciousFeeds = new DeliciousFeeds
  deliciousFeeds.setExpandUrls(true)

  /**
   * Gets some urls and returns the recommended ones - based on the given data.
   *
   * @param urls - base data for recommendation
   * @return recommended urls
   */
  def recommend(urls: Traversable[String]): Traversable[String] = {

    //find all users who bookmarked the same urls, store them in a multiset to find the most similar ones
    val userMultiset: Multiset[String] = HashMultiset.create()
    for (url <- urls) {
      getBookmarksByUrl(url) match {
        case Some(bookmarks) => bookmarks.foreach(b => if (!isEmpty(b.getUser)) userMultiset.add(b.getUser))
        case None =>
      }
    }

    //elementSet holds every user once, while size on the multiset itself would count each occurrence
    println("Recommender - found %s similar users, taking the top %s...".format(userMultiset.elementSet().size, topNUsers))

    val recommendedUrls = new mutable.HashSet[String]

    //take the topN most similar users
    val similarUsers = take(topNUsers, userMultiset)

    println("Recommender - searching for other urls from those similar users...")

    //find all urls from the most similar users
    for (user <- similarUsers) {
      getBookmarksByUser(user) match {
        case Some(bookmarks) => bookmarks.foreach(recommendedUrls add _.getUrl)
        case None =>
      }
    }

    //remove the ones you already know
    urls.foreach(recommendedUrls.remove)

    println("Recommender - found %s recommended urls!" format recommendedUrls.size)
    recommendedUrls
  }

  //fetches up to 10 bookmarks for the given url; None if the call fails or returns null
  private def getBookmarksByUrl(url: String): Option[Traversable[Bookmark]] = try {
    val bookmarks = deliciousFeeds.findBookmarksByUrl(10, url)
    if (bookmarks != null) Some(bookmarks)
    else None
  } catch {
    case e: Exception =>
      e.printStackTrace()
      None
  }

  //fetches up to 100 bookmarks of the given user; None if the call fails or returns null
  private def getBookmarksByUser(user: String): Option[Traversable[Bookmark]] = try {
    val bookmarks = deliciousFeeds.findBookmarksByUser(100, user)
    if (bookmarks != null) Some(bookmarks)
    else None
  } catch {
    case e: Exception =>
      e.printStackTrace()
      None
  }

  //returns the `count` elements that occur most often in the multiset
  private def take[T](count: Int, multiset: Multiset[T]) = {
    val sortedMultiset = Multisets.copyHighestCountFirst(multiset).elementSet().toList
    sortedMultiset.take(count)
  }
}
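The two getBookmarksBy... methods repeat the same try/catch and null check. One way to factor that out is sketched below with `scala.util.Try`; `SafeCalls` and `safely` are hypothetical names, not part of the original code. Note that `Try` only catches non-fatal throwables, which is slightly different from catching `Exception`.

```scala
import scala.util.{Failure, Success, Try}

object SafeCalls {

  /** Evaluates `call`; returns None if it throws a non-fatal exception or yields null. */
  def safely[T](call: => T): Option[T] = Try(call) match {
    case Success(result) => Option(result) // Option(null) is None
    case Failure(e) =>
      // same behaviour as the original methods: log the problem and carry on
      e.printStackTrace()
      None
  }
}
```

A call such as `safely(deliciousFeeds.findBookmarksByUser(100, user))` could then take the place of each method body, assuming the result type still lines up with the `Option[Traversable[Bookmark]]` signature.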
If you want to use the EvernoteURLRetriever
Before you can start, get your developer token…
- …for testing: Get a sandbox developer token
- …for your real account: Get a production developer token
Then edit the EvernoteURLRetriever object. You’re done 🙂
Get the Code
The complete source code with setup instructions can be found on Github.
How to improve this?
As I said earlier this is only intended to show the basic idea. Many improvements are possible.
One that comes to mind is to use the recommendation engine from the Apache Mahout project. This should improve the quality of the recommendations.
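Even without Mahout, a small step in that direction would be to rank the suggestions instead of returning them as an unordered set. The sketch below is not Mahout and not part of the code above: it simply counts how many of the similar users bookmarked each unknown URL. `RankedRecommendations` and its parameters are assumptions made for this example.

```scala
object RankedRecommendations {

  /**
   * @param bookmarksOfSimilarUsers one set of URLs per similar user
   * @param knownUrls               URLs that are already in my library
   * @return unknown URLs with the number of similar users who bookmarked them, most popular first
   */
  def rank(bookmarksOfSimilarUsers: Seq[Set[String]], knownUrls: Set[String]): Seq[(String, Int)] =
    bookmarksOfSimilarUsers
      .flatten                                       // one entry per (user, url) pair
      .filterNot(knownUrls)                          // drop what I already have
      .groupBy(identity)
      .map { case (url, hits) => (url, hits.size) }  // url -> number of users
      .toSeq
      .sortBy { case (url, count) => (-count, url) } // highest count first, ties by url
}
```

A URL that several similar users saved independently is probably a better suggestion than one that only a single user saved, so the top of this list would be the place to start reading.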
What I did with that – some numbers
I used this with my Evernote account, and the EvernoteURLRetriever found about 1100 URLs. With this data as input I got about 850 new pages as suggestions, which pointed to about 550 different domains.
In the end there were many interesting sites I did not know about. Of course there is also a lot that is not interesting to me.