Is a search engine for a specific website a Hadoop use case?
I am putting together a plan to write a search engine for Reddit, partly
to learn and partly for personal use. I want to be able to search
comments, users, and subreddits.
On the data side, is Hadoop an appropriate technology for a Reddit
search engine, or would some sort of relational database be a better
fit?
My main concerns:
- Is there enough content to justify Hadoop? Statistics indicate that there
  are thousands of posts on Reddit a day and millions of users.
- Is Reddit's data structure well-defined enough that I could simply query a
  database?
- The input for the search would just be a string, so I am not sure what my
  database query would look like. I do, however, have an idea of which
  algorithms I might use if I went with Hadoop.
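
To make the last concern more concrete, here is a rough sketch of the kind of string lookup I have in mind, written as a tiny in-memory inverted index in plain Python. This is only an illustration under my own assumptions: the function names (tokenize, build_index, search) and the sample comments are made up, and nothing here uses Hadoop or reflects how Reddit actually stores its data.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first token's matches, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Made-up sample comments, keyed by a hypothetical comment id.
comments = {
    1: "Hadoop is great for batch processing",
    2: "A relational database is enough for small data",
    3: "Is Hadoop overkill for small data?",
}
index = build_index(comments)
print(search(index, "hadoop"))
print(search(index, "small data"))
```

As far as I understand, building such an index over a large corpus is the kind of batch job that could be expressed as a map step (emit token and id pairs) followed by a reduce step (collect ids per token), which is where I imagine Hadoop might come in.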
Any thoughts would be super helpful!