Ask HN: What's the point of automated skill assessment tests in the age of AI?

11 points by neverminder 1 year ago | 25 comments
Back in the day there were two choices: refuse to do it or jump through the hoops. Today nobody I know bothers with automated skill assessment tests any more. Folks just feed them into ChatGPT and are done with it.

Yesterday I applied to a role I found interesting, received a confirmation email, and seconds later a skill assessment test link. So it looks like nowadays humans no longer bother reading CVs; it's all left to the bots. The test involved a tech stack I was not familiar with, so I thought it would be a good opportunity for ChatGPT, and it handled it without even breaking a sweat. The thought occurred to me that today one side uses AI to create and evaluate these tests and the other side uses AI to complete them. Is this the new reality? What's even the point?

  • huevosabio 1 year ago
    Here is the solution to this:

    1. Use AI to do the full on-site interview loop. At least for SWEs it is heavily structured, so it's something a good wrapper around GPT-4 can handle. It can even have an avatar if need be.

    2. The AI-led "on-site" is done on your own time, but you must have a camera enabled.

    3. HR just goes over the footage and other signals to see if you have been using aids. In any case, the way you respond will be highly indicative of whether you used any.

    4. If you pass and no cheating is detected, you go straight to meeting the hiring manager. This interview is mostly behavioral, gauging whether you are a good match.

    The problem today is that interviewing is expensive for both parties. Companies in particular get drowned in applications, so they put up all sorts of hurdles and auto-rejections.

    But with AI, you could flip the process around. You could give _everyone_ the chance to do the full technical interview! Then you leave the human matching components for the best performing candidates.

    • viraptor 1 year ago
      Ideally, companies would evaluate what answers ChatGPT is capable of and/or give tests that leave enough room for creativity that they'd be hard to pass this way (e.g. ask people to show off their knowledge of some domain without explicit instructions). If you get old-style tests, the company just hasn't caught up to what's happening yet. Hopefully they'll realise after a few second-stage interviews and change it.
      • neverminder 1 year ago
        I'd say that if a company doesn't want to invest time in interviewing a candidate in person and instead uses automated tests, then the candidate has every right to respond in kind by using AI to beat those automated tests.
      • monkaiju 1 year ago
        Hopefully we eventually fall back to the more analogue approaches that have always been better anyway: actually talking with applicants, looking at things they've built that have existed for some time, talking to people in their networks, etc.
        • yodsanklai 1 year ago
          Companies will interview you on a whiteboard, like many already do. There's nothing new here. This is why we've always taken tests in class and not at home, where we could get friends or family to do them for us.
          • kragen 1 year ago
            an interesting question is how to filter out the folks who just feed it through chatgpt, because obviously the people who just lie to get out of any mild annoyance are the ones you least want as coworkers (though obviously people who can effectively use large language models will be incredibly valuable)

            i asked gpt-4 some programming questions today, and although it gave clear and convincing explanations in english, it made a lot of really dumb mistakes in writing the code. but this was in a mode where it couldn't call out to an external interpreter; maybe a lot of those will go away once it can test the code

            most programmers would have done worse, but is there a pattern to the dumb mistakes chatgpt makes that is clearly different from the dumb mistakes junior programmers make?

            like for example i gave it this

                #!/usr/bin/perl
                # See also cifs.py.  Edited GPT-4 output; see <https://bin.gy/clignitter>
                # Try watch -n .1 ./wing.pl.
                for(1..9999){$t=0|rand 3;($y,$x)=$t?($y/2,$x/2+($t-1?40:10)):(.3*$x-.4*$y+12,
                .4*$x+(.2+abs($$%99/99-.5))*$y);$a[$x][$y]=1}map{$r=$_*2;print map({substr
                " .':",2*$_->[$r]+$_->[$r+1],1}@a[0..79]),"\n"}(0..18);
            
            and asked it to translate it to lua, getting the following nonsense in response

                math.randomseed(os.time())
                local a = {}
                for i=1,9999 do
                    local t = math.floor(math.random() * 3)
                    local x,y
                    if t == 0 then
                        x,y = .3*a[1][2]-.4*a[2][1]+12, 
                .4*a[1][2]+(.2+math.abs(os.time()%99/99-.5))*a[2][1]
                    else
                        x,y = a[1][2]/2, a[2][1]/2+(t-1 and 40 or 10)
                    end
                    if not a[x] then a[x] = {} end
                    a[x][y] = true
                end
                local str = " .':"
                for r=0,18 do 
                    for c=0,79 do 
                        local val = (a[c*2] and (a[c*2][r*2] or 0) or 0) + (a[c*2+1] and 
                (a[c*2+1][r*2+1] or 0) or 0)
                        io.write(string.sub(str,val,val))
                    end 
                    print()
                end 
            
            of course, i've seen worse from interviewees, and understanding the original program is a high (and arguably counterproductive) bar, but perhaps the particular pattern of bonehead errors here could identify this as llm output? maybe the very fact of being so spectacular at understanding the original code is a red flag, at least combined with being so incompetent at writing the lua version?
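
            to make the pattern concrete, here are a couple of the bonehead errors visible in that output, checked against stock lua semantics (my annotations, not anything gpt produced):

                -- lua treats 0 as truthy, so (t-1 and 40 or 10) yields 40 when t == 1,
                -- where perl's $t-1?40:10 yields 10
                local t = 1
                print(t - 1 and 40 or 10)        --> 40
                -- lua strings are 1-based while perl's substr is 0-based, so
                -- string.sub(str, val, val) is off by one, and val == 0 selects nothing
                print(string.sub(" .':", 0, 0))  --> prints an empty string
                -- and a[x][y] = true stores a boolean where the perl stored 1, so the
                -- 2*top+bottom arithmetic in the printing loop blows up on a boolean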
            • RagnarD 1 year ago
              This underscores the problem with GPT-4 currently. It apparently can't do what any legitimate programmer can do: look at code and simulate its operations in their head. Of course programs need to be run on actual hardware with real runtime support to find bugs, but they first have to be written and confirmed correct as well as possible, mentally. As it stands now, these models are doing some kind of fuzzy pattern matching that's kinda-sorta plausible and often completely wrong on inspection. And amusingly (?), if you ask it to try again, it'll come up with quite different and also completely wrong code.

              To be really good at programming these systems need the equivalent of the "understanding" a human programmer has, not just some abstract fuzzy pattern matching.

              • kragen 1 year ago
                my impression is that it is a hell of a lot better than i am at simulating operations in its head, but not nearly as good as a regular interpreter; it's at least enormously faster at coming up with mostly correct explanations of tricky code
              • kragen 1 year ago
                on fully debugging the above i'm no longer sure most programmers would do worse; by my count i fixed 11 bugs in 21 lines of code. http://canonical.org/~kragen/sw/dev3/wing.lua has all the gruesome details
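
                for what it's worth, a rough sketch of what the state-update loop has to look like once those bugs are fixed (my own reconstruction from the perl above, not the actual wing.lua at that link; math.random stands in for perl's pid-based constant):

                    math.randomseed(os.time())
                    -- perl's (.2+abs($$%99/99-.5)) mixes in the pid; a random integer stands in here
                    local seed = 0.2 + math.abs((math.random(99) - 1) / 99 - 0.5)
                    local x, y = 0, 0  -- perl's undef $x/$y numify to 0
                    local a = {}
                    for _ = 1, 9999 do
                      local t = math.floor(math.random() * 3)
                      if t ~= 0 then
                        -- note the order: the perl assigns ($y,$x), not ($x,$y)
                        y, x = y / 2, x / 2 + (t == 2 and 40 or 10)
                      else
                        y, x = 0.3 * x - 0.4 * y + 12, 0.4 * x + seed * y
                      end
                      -- perl truncates array indices; floor is close enough for a sketch
                      local xi, yi = math.floor(x), math.floor(y)
                      a[xi] = a[xi] or {}
                      a[xi][yi] = 1  -- store 1, not true, so the printer can add cells
                    end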
                • Our_Benefactors 1 year ago
                  > because obviously the people who just lie to get out of any mild annoyance are the ones you least want as coworkers (though obviously people who can effectively use large language models will be incredibly valuable)

                  So people who can use them effectively, but not like that!

                  Do you not see the irony in this statement?

                  • hluska 1 year ago
                    Their verb was ‘lie’. Do you want a lying coworker?
                    • Our_Benefactors 1 year ago
                      Have you stopped beating your wife yet?

                      I don’t see using an LLM as a coding assistant as “lying” anyways.

                • from-nibly 1 year ago
                  My opinion: if it was this easy to make interviews completely pointless then they were completely pointless before AI.

                  Why would a random skill assessment be able to tell if you are going to be a good employee? Your value isn't tied to you knowing a bunch of random facts. We have the vastness of the Internet for that.

                  All AI has done is point out how useless our current interview process is.

                  • shinycode 1 year ago
                    Some companies now ask candidates to do a coding test with screen sharing. It's not a hard test, but it's interesting because you can see the reasoning and ask for more details if necessary. Live sessions mean the company is interested in humans, but it can burn a lot of resources before you find the right person.
                    • HPsquared 1 year ago
                      I guess it tests the ability to successfully apply ChatGPT. Which is a fairly relevant skill now, I guess?
                      • baal80spam 1 year ago
                        Especially for "prompt engineers" ;-)
                      • jsyang00 1 year ago
                        What's even the point of a lot of these jobs in "the Age of AI"?

                        I don't know. You don't know. Let's hope it all keeps going a little longer so some of us can keep a roof over our heads.