Today our professor asked us to find a list of researchers who would be suitable program committee (PC) members for the IEEE BigData conference.
IEEE BigData is a brand-new conference and does not have a “circle” yet. That makes finding PC members harder than it is for more established conferences. Our solution is to go through the PC lists of other major conferences such as WWW and KDD and pick out the researchers who work on big data.
After counting the PC members of the SIGIR conference, I knew I needed a tool to help me. SIGIR’11 has 223 researchers on its program committee! Think about repeating the following tasks 223 times:
I want to automate the following tasks:
There are two tasks that are hard:
1. Getting emails automatically. Some researchers simply don’t list an email address on their webpages. To find it, one has to open one of their research papers.
2. Deciding whether a researcher is related to big data. We could formulate this as a binary classification task, but that requires training data, which brings us back to human labeling. For now I define a set of keywords that indicate BigData:
let keywords =
and for each home page, I list the subset of keywords that occur in it. If no keyword appears on a researcher's home page, it is unlikely that the researcher works on something related to BigData. The keyword filter therefore serves as a helper for the human who makes the final decision.
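The keyword filter described above can be sketched roughly as follows. This is only an illustration: the keyword list here is a hypothetical placeholder (the actual list is not shown), and `sampleKeywords` and `matchedKeywords` are names made up for this sketch.

```fsharp
open System

// Hypothetical placeholder keywords; the real list used by the tool is not shown.
let sampleKeywords = [ "big data"; "mapreduce"; "hadoop"; "large-scale" ]

// Return the subset of keywords that occur in the page text (case-insensitive).
let matchedKeywords (keywords: string list) (pageText: string) =
    keywords
    |> List.filter (fun k -> pageText.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0)

// An empty result suggests the researcher is unlikely to work on big data.
let hits = matchedKeywords sampleKeywords "We study large-scale graph mining on Hadoop clusters."
printfn "%A" hits
```

A plain substring match like this is crude, but it is enough when a human still makes the final call.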
I wrote an F# program; the code can be found on BitBucket. Here is a screenshot of the running program:
When I click “WANT THIS”, the program appends a record to the log file.
There are two windows: one for the Google search and the other for the page of the first search result. I keep both because the first result is occasionally wrong, in which case I can click one of the other search results.
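Appending a record could look roughly like the sketch below. The column layout (name, URL, matched keywords) is an assumption made for illustration, and `csvField`, `toCsvLine`, and `appendRecord` are hypothetical helper names, not the ones in the actual program.

```fsharp
open System
open System.IO

// Quote every field and double any embedded quotes so commas inside a value are safe.
let csvField (s: string) =
    "\"" + s.Replace("\"", "\"\"") + "\""

// Hypothetical record layout: name, home page URL, matched keywords joined by ';'.
let toCsvLine (name: string) (url: string) (matched: string list) =
    [ name; url; String.concat ";" matched ]
    |> List.map csvField
    |> String.concat ","

// Append one line to the log; File.AppendAllText creates the file if it does not exist.
let appendRecord (logPath: string) (line: string) =
    File.AppendAllText(logPath, line + Environment.NewLine)

let exampleLine = toCsvLine "Jane Doe" "http://example.org/~jdoe" [ "hadoop"; "mapreduce" ]
```

Appending rather than rewriting the file means an interrupted labeling session loses nothing that was already saved.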
Some technical details:
1. WebBrowser in F# with only four lines!
let form = new Form(Visible=true, Text="Google Search Result")
let content = new WebBrowser()
2. Finding a relative path in an F# script (this makes the experiment easier to reproduce)
let folder = __SOURCE_DIRECTORY__ + @"\..\..\data\"
let logfile = folder + "log.csv"
let namelistfile = folder + "sigir10.txt"
3. Details in WebBrowser class
content.DocumentCompleted.Add(fun _ -> (* do something once the page is fully loaded *) ())
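As a side note on the relative-path trick in item 2, the same idea can be written with `Path.Combine` so that the separators are not hard-coded. This is a sketch of an alternative, not the code from the repository; `dataFolder`, `logFile`, and `nameListFile` are names chosen here, while the folder layout and file names mirror the snippet above.

```fsharp
open System.IO

// __SOURCE_DIRECTORY__ expands to the folder that contains the script file.
// GetFullPath resolves the ".." segments into a normal absolute path.
let dataFolder = Path.GetFullPath(Path.Combine(__SOURCE_DIRECTORY__, "..", "..", "data"))
let logFile = Path.Combine(dataFolder, "log.csv")
let nameListFile = Path.Combine(dataFolder, "sigir10.txt")

printfn "%s" logFile
```

Because the paths are anchored at the script's own location, the script finds its data no matter which working directory it is launched from.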