• tinwhiskers@kbin.social
    1 year ago

    Well, of course: putting it on the open internet very intentionally makes it available for everyone to see. If you don’t want everyone to see it, don’t put it on the open internet. The issue is what people do with it, not whether they can access it. Copyright forbids distributing copyrighted material. The entire point of it is that you can make your work available to be seen while keeping it protected from people copying it. However, there is no distribution or storage of copyrighted material with an LLM - there is no copy. I think OpenAI will be OK, but these things are never certain when the big lawyers are let loose.

    Distributing the training dataset, though, that could well be a problem.