A poll conducted by KDnuggets recently asked a question that many people like me may be interested in: What programming/statistics languages did you use for analytics / data mining / data science work in 2013?
The results are shown below. I’m glad that I know all of the top 4 languages and use them pretty much every day. I’m also teaching myself Hadoop, which I believe is the future of data management.
How about you guys?
I’m a big fan of Twitter and I also like big data. It is a headache for me to find people who are good at big data to follow on Twitter, because there are way too many people there.
Fortunately, Big Data Republic solved this problem for me. They have run a poll to figure out who is the most influential in big data on Twitter. Here is the list and you can scroll down to see the entire list.
[iframe src=”http://groups.peerindex.com/bigdatarepublic/big-data-100/embed” width=”600″ height=”1180″ scrolling=”yes”]
We all know that computer programming is a core skill for a data scientist, and algorithms are the foundation of computer science. So I bet you have asked this question: what are the most important algorithms?
Dr. Christoph Koutschan from RICAM (Johann Radon Institute for Computational and Applied Mathematics) conducted a survey to answer this question. Although the results are not out yet, and it is really difficult to reach a consensus on such a big question, here I list all the candidates in his survey and hope you can find some that you are familiar with and use every day.
1. A* search algorithm
Graph search algorithm that finds a path from a given initial node to a given goal node. It employs a heuristic estimate that ranks each node by an estimate of the best route that goes through that node. It visits the nodes in order of this heuristic estimate. The A* algorithm is therefore an example of best-first search.
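To make the description concrete, here is a minimal Python sketch of A*. It is only an illustration, not code from the survey: the graph layout (a dict of node to `(neighbor, cost)` pairs), the function name `a_star`, and the example heuristic values are all my own assumptions. The heuristic has to never overestimate the remaining cost for the returned path to be the cheapest one.

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable.

    graph: dict mapping node -> list of (neighbor, step_cost)  (assumed layout)
    heuristic: function node -> estimated remaining cost to goal
    """
    # Heap entries are (estimated total cost, cost so far, node).
    open_heap = [(heuristic(start), 0, start)]
    best_cost = {start: 0}
    came_from = {}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            # Walk the parent links back to the start.
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return path[::-1], cost
        if cost > best_cost.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(neighbor, float("inf")):
                best_cost[neighbor] = new_cost
                came_from[neighbor] = node
                heapq.heappush(
                    open_heap,
                    (new_cost + heuristic(neighbor), new_cost, neighbor),
                )
    return None, float("inf")

# Hypothetical example graph and heuristic estimates.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("C", 2), ("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
estimates = {"A": 3, "B": 2, "C": 1, "D": 0}
path, cost = a_star(graph, "A", "D", estimates.get)
print(path, cost)
```

With a heuristic that always returns 0, the same code behaves like Dijkstra's algorithm.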
2. Beam Search
Beam search is a search algorithm that is an optimization of best-first search. Like best-first search, it uses a heuristic function to evaluate the promise of each node it examines. Beam search, however, only unfolds the first m most promising nodes at each depth, where m is a fixed number, the beam width.
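Here is a small Python sketch of the idea, under my own assumptions: the function name `beam_search`, the `successors` / `heuristic` / `is_goal` callbacks, and the toy number puzzle are all made up for illustration. Lower heuristic values mean "more promising". Because everything outside the beam is thrown away, beam search may miss the best solution, or any solution at all.

```python
def beam_search(start, successors, heuristic, is_goal, beam_width, max_depth=50):
    """Return a path (list of nodes) to a goal, or None if none was found."""
    beam = [[start]]  # each entry is the path taken to reach its last node
    for _ in range(max_depth):
        for path in beam:
            if is_goal(path[-1]):
                return path
        # Expand every node in the beam, keeping one path per distinct child.
        candidates = {}
        for path in beam:
            for child in successors(path[-1]):
                candidates.setdefault(child, path + [child])
        if not candidates:
            return None
        # Keep only the beam_width most promising nodes at this depth.
        ranked = sorted(candidates.values(), key=lambda p: heuristic(p[-1]))
        beam = ranked[:beam_width]
    return None

# Toy puzzle (hypothetical): reach 10 from 1 using "+1" or "*2" steps.
target = 10
result = beam_search(
    start=1,
    successors=lambda n: [n + 1, n * 2],
    heuristic=lambda n: abs(target - n),
    is_goal=lambda n: n == target,
    beam_width=2,
)
print(result)
```

A larger beam width explores more nodes per depth and gets closer to plain best-first search, at the price of more memory.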
3. Binary search
Technique for finding a particular value in a sorted linear array by ruling out half of the data at each step.
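A minimal Python sketch of binary search follows, assuming the input list is already sorted in ascending order. The function name `binary_search` and the choice to return -1 for a missing value are my own conventions for this example.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1   # target can only be in the upper half
        else:
            high = mid - 1  # target can only be in the lower half
    return -1

values = [1, 3, 5, 7, 9, 11]
print(binary_search(values, 7))
print(binary_search(values, 4))
```

For real work, Python's standard `bisect` module offers the same halving strategy through `bisect_left` and `bisect_right`.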
Mark Alen, a PhD student at Berkeley, summarized these fifteen rules for data scientists. I think we can all learn from these principles.
1- Do not lie with data and do not bullshit: Be honest and frank about empirical evidence. And most importantly, do not lie to yourself with data.
2- Build everlasting tools and share them with others: Spend a portion of your daily work building tools that make someone’s life easier. We are freaking humans, we are supposed to be tool builders!
3- Educate yourself continuously: you are a scientist for Buddha’s sake. Read hardcore math and stats from graduate level textbooks. Never settle for shitty explanations of a method that you receive from a coworker in the hallway. Learn fundamentals and you can do magic. Read recent papers, go to conferences, publish, and review papers. There is no shortcut for this.
Saw a joke about big data today, so funny:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.