Why R is the best coding language for data journalism

19 points by countrymile 6 months ago | 12 comments
  • Jimmc414 6 months ago
    • nickm12 6 months ago
      Maybe things have changed since I last programmed seriously in R nearly 10 years ago, but back then the language had serious design flaws that made it way too easy to write incorrect programs. Conflating scalar and vector types, weak typing, and various other design choices led to hard-to-fix bugs, if you even realized the answers were incorrect. Proponents of strong static typing might say the same sorts of things about Python, but Python is much better than R in this regard.

      I would hope that data journalism, like any other kind of journalism, would care about correctness.

      • rgavuliak 6 months ago
        It's a different world where a lot of the code is a one-off, building the right thing is more important than building it right (technically) and people don't really have a deep understanding (or curiosity) about the code.

        I don't think it's necessarily wrong, and I have seen this approach bring in a tremendous amount of value despite what engineering intuition says. I came from that world myself, then switched to Python and onboarded into the normal software dev world.

        • wodenokoto 6 months ago
          I don’t think R is conflating vectors and scalars. It straight up doesn’t let you access scalars.

          Hot take: if you are programming in R you are probably using the wrong tool. If you are analyzing, munging, modeling, visualizing data or fitting/training models, you are probably using the right tool.

          • tfehring 6 months ago
            If you parse e.g. a json file containing a scalar to an R object, that scalar will be represented as a length-1 vector. I get what you’re saying, but “it doesn’t let you access scalars” would suggest to me that it can’t parse or represent values that are defined elsewhere as scalars at all.
            • wodenokoto 6 months ago
              But the scalar is converted to a vector.

              And off the top of my head I have no idea how the round trip will end up. Will it spit out a length-1 vector in the json file or a scalar? And whichever you get, how would you do the opposite?
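              A toy Python model of the round trip discussed above, assuming R's jsonlite package as the parser (the thread doesn't name one). As far as I know, jsonlite's fromJSON reads both `1` and `[1]` as a length-1 vector, and its toJSON writes length-1 vectors as arrays unless you pass `auto_unbox = TRUE` or wrap individual values in `unbox()`. The functions `to_vector` and `from_vector` below are hypothetical stand-ins, not R code; they only show why the original shape can't be recovered once scalars and length-1 arrays collapse into the same value.

```python
import json


# Toy model (not R or jsonlite): every bare JSON scalar becomes a length-1
# list, mimicking how R stores a single number as a length-1 vector.
# Assumption: input contains only objects, flat arrays of scalars, and scalars.
def to_vector(value):
    if isinstance(value, dict):
        return {key: to_vector(item) for key, item in value.items()}
    if isinstance(value, list):
        return list(value)  # a flat array of scalars maps to one vector
    return [value]  # a bare scalar becomes a length-1 "vector"


# Serialize back. auto_unbox loosely mirrors the jsonlite toJSON argument
# of the same name; expects the structure produced by to_vector.
def from_vector(value, auto_unbox=False):
    if isinstance(value, dict):
        return {key: from_vector(item, auto_unbox) for key, item in value.items()}
    if auto_unbox and len(value) == 1:
        return value[0]  # every length-1 vector is written as a scalar
    return value  # otherwise every vector, even length-1, is written as an array


original = '{"n": 1, "xs": [1]}'
parsed = to_vector(json.loads(original))
print(parsed)
print(json.dumps(from_vector(parsed)))
print(json.dumps(from_vector(parsed, auto_unbox=True)))
```

              Neither output reproduces the input: the default writes `n` back as an array, and `auto_unbox` turns `xs` into a scalar. Getting the original back means choosing field by field, which is what wrapping specific values in `unbox()` does in jsonlite.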

            • wdkrnls 6 months ago
              Speak for yourself. Programming in R is amazingly expressive for prototyping. Its semantics are extremely lispy, yet it provides excellent support for fast numerics. It keeps me focusing on the problems I want to solve and provides me the tools to abstract them quickly to handle related problems. Meanwhile, python keeps forcing me to care about pesky implementation details which I don't care about on a first or even second or third pass. I really can't understand people who like python over R. Did you guys not read SICP?
              • wodenokoto 6 months ago
                > I really can't understand people who like python over R. Did you guys not read SICP?

                I didn't.

                I do wonder what you are programming in R that isn't loading, munging, visualizing, and fitting models.

                Do you have a github you can share? Or a SICP with examples in R?

            • kristofferg 6 months ago
              You are right, the language itself can be a bit messy. R’s advantage is the massive number of packages available. No matter what niche problem you are trying to solve, someone has probably made a package for it. And if not all the functionality is there, you can just add it.
              • wdkrnls 6 months ago
                Nope. R's advantage is that the language is extremely expressive and makes many things about its implementation extremely transparent to its users. The huge package count for a niche language is a direct result of that.
                • nauman_wah 6 months ago
                  200% agreed. I switched from Python to R for data science, machine learning, and data visualization and found that Python is far behind. Using R's tidymodels libraries I can build, test, and fit machine learning models easily and in a standardised way, whereas I am unable to generate models with Python's scikit-learn and similar libraries.
            • aplzr 6 months ago
              The article lists a few things that you can do with R, but fails to make good on its headline promise: explaining why R is the best language for data journalism.

              To me, and I think to many other people as well, the language most suited for anything data-related is Python, not R. I might be wrong, but if I am I won't know it after reading this article, because it doesn't compare R to other options on the table. R is only the best at something if it offers advantages over the other options, and to be honest I very much doubt that this is the case when comparing against Python.

              Anecdotally, a number of years ago at university I took a class titled "Statistical programming with R" because I had heard good things about it and was looking forward to a chance to learn a new tool. Unfortunately I learned pretty quickly that I had to fight R every step of the way to get it to do what I wanted. Everything seemed arcane, convoluted, and complicated. Went back to Python and never looked back. I don't doubt that one can do great things with R, but the effort needed to get there simply doesn't seem worth it to me when Python seems so much more accessible.

              Having said all that, I would be quite interested in a comparison of typical data science (or data journalism) tasks in both R and Python by someone who is good at both. After reading the headline, I had hoped the article would go in that direction. I was disappointed to see that it's essentially just a statement of opinion that isn't backed up in any meaningful way.