martin-boundary - Slashdot User


Comment Re:So is "AI" supposed to solve again? (Score 1) 16

Crypto, like any currency, represents time savings too. For example, instead of you planting some wheat, harvesting the wheat, milling it into flour, milking a cow and making butter, chopping your own firewood etc. you simply get to go to the bakery and pay for a croissant. That single transaction is saving you a lot of time.

Comment Re:Ouroboros (Score 1) 53

Yes, the "intelligence" is in the datasets, the output is merely interpolated data in a higher dimensional function space. When the datasets are depleted or sections are carved out, then AI models will interpolate random slop to paper over the holes. Here's an example of the kind of things that can be expected, mutatis mutandis.

Comment Re:Shocking... (Score 1) 53

AI companies are copyright thieves. They copy and train on random documents from the web which they have no explicit right to access or copy. But most documents on the web, images and text, are not public domain. And a very large proportion of these media are not even legally published on the servers in the first place, but actually leaked illegally or copied and pasted illegally from one site to another.

It is usually not cost effective to go after small time copyright thieves (see RIAA), but AI companies are flush with cash, so the payout (or settlement) could be substantial.

That is the phase 1 battle.

In phase 2, the AI companies train and copy directly from the interactions with the users. This data is lower quality, and theoretically would belong to the AI companies to do with as they please. Except it can still be poisoned by users with a bit of cleverness: 1) if they deliberately create gibberish or word salad sessions, perhaps with the help of another AI, thereby reducing the value of the data, and 2) if they copy and paste copyright-restricted documents illegally into the session, forcing the AI companies to filter the data or be liable for copyright infringement if it is found out.

In phase 3, the AI companies pay for all the content that they consume for training etc. The content is explicitly licensed for only particular tasks and number of users or "seats", and correct usage is periodically verified. This is the norm in the business world already, so the AI companies' biggest costs will be content licensing, insurance and hardware/power consumption. There is no reason to suppose that licensing will not become the main component of the cost after insurance as advances occur.

Comment Re:Majority of Japanese Companies have plans? (Score 1) 56

It is well known that current AI language technology works "best" in English. Intuitively, you can understand that statement as an acknowledgement that training/copy materials are predominantly in English, or as an acknowledgement that language features in Japanese are less simple than the Latin conventional stream of words, or as a statement about market size, priorities and allocated resources, or as an observation about the cultural backgrounds of the LLM developers.

In any case, the current LLM craze is not as impactful in the rest of the world as it may seem to residents of, e.g., California.

Comment Re:Alternative solutions (Score 2) 38

They are trying to encode the contents of the Excel spreadsheet into a form that can be fed to their LLM as context, so that the LLM can complete it for you like a good chatbot.

The obvious table encoding (think CSV or similar) is too confusing for the AI, so they have written a bunch of tools that take a basic spreadsheet and annotate (explain) its structure using heuristic rules, so that the AI can pick up on the summarized structure as if a user had explained it. Then the LLM can try to complete it for the user.

They call the annotated spreadsheet a "compressed" version, because one of the things that they do is to remove empty cells which only confuse the AI and cause a lot of memory and computation to be wasted for noninformative features (prompt cowboys, there's your jailbreak!). They also render the result in JSON with copious hints for the AI.
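To make the "remove empty cells, render as JSON" idea concrete, here is a minimal sketch in Python. It is not their actual tool or output format: the function names `compress_sheet` and `column_name` and the JSON layout are made up for illustration, and the real thing adds structural hints that this leaves out.

```python
import json


def column_name(index):
    """Convert a 0-based column index to a spreadsheet letter (0 -> A, 26 -> AA)."""
    name = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def compress_sheet(rows):
    """Keep only non-empty cells, keyed by address (hypothetical layout)."""
    cells = {}
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row):
            # Empty strings and None count as empty; a numeric 0 is kept.
            if value not in ("", None):
                cells[f"{column_name(c)}{r}"] = value
    return {
        "n_rows": len(rows),
        "n_cols": max((len(row) for row in rows), default=0),
        "cells": cells,
    }


if __name__ == "__main__":
    sheet = [
        ["Name", "", "Total"],
        ["", "", ""],
        ["Alice", None, 3],
    ]
    print(json.dumps(compress_sheet(sheet), indent=2))
```

Nine cells go in and four come out, with the sheet dimensions kept so the layout can still be reconstructed.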

Current LLMs use a transformer architecture, which effectively looks at all pairs of tokens in the input to see whether they are related and should trigger a subnetwork to do something. If you have a large table with a lot of empty cells, then looking at all pairs of empty cells quickly leads to a lot of wasted effort. That's why they sat down with some programmers to create heuristics that can weed out such problem spots for the AI.
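A back-of-the-envelope version of that waste, as a rough sketch. It assumes one token per cell, which is a simplification (real tokenizers split cells into several tokens and add separators), and the table size and fill rate are invented numbers, not figures from the article.

```python
def attention_pairs(n_tokens):
    """Number of token pairs that full self-attention compares: n squared."""
    return n_tokens * n_tokens


def pairs_before_and_after(n_rows, n_cols, filled_cells):
    """Compare pair counts for the full grid vs. only the filled cells.

    Assumes one token per cell (an illustrative simplification).
    """
    full = attention_pairs(n_rows * n_cols)
    compressed = attention_pairs(filled_cells)
    return full, compressed


if __name__ == "__main__":
    # Hypothetical 100 x 50 sheet with 250 filled cells (5% full).
    full, compressed = pairs_before_and_after(100, 50, 250)
    print(full, compressed, full // compressed)
```

Because the cost grows with the square of the input length, cutting the cells to 5% cuts the pairs to 0.25%, a factor of 400 in this made-up case.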

Comment Re:Software legacies (Score 1) 21

Wayland was part of a concerted effort to make Linux look and act like Windows desktops, aka "the year of the Linux desktop". It was believed that once Linux offered all the options that Windows users had grown accustomed to, there would be a mass exodus from the commercial world to the OSS world. Wayland was designed to be particularly attractive to PC gamers, who used to complain about framerates and resolution issues and mouse and keyboard input standards and that their games didn't work in OpenGL, just DirectX etc. This target demographic was used as justification for leaving out anything that wasn't of concern to them.

Likewise, OpenOffice was built to attract the Microsoft Office crowd. And PulseAudio and systemd were built to make the Linux system more desktop-like, to cater to the expectations of people who are used to a Windows laptop.

Meanwhile the "desktop" of most people on Earth is now a phone in their pocket which runs a web browser, social billboard apps, dating apps and small games which don't require great performance. These phones run proprietary OSes, and do things completely differently from Linux/systemd. Microsoft too is leaving the desktop aside for a natural language interface. The serious computing is done, as it always was, on headless servers. A lot of it in Python, it seems.
