So how do developers teach AI to find not-safe-for-work (NSFW) content? It starts with acquiring vast labeled datasets: millions of example texts, both explicit and non-explicit, from which the AI learns patterns and classifications. As of 2022, the average dataset used to train NSFW AI consisted of 5 million images and 3 million text samples, giving models a high level of accuracy in detecting offensive content. Once the datasets are ingested, machine learning algorithms analyze each new piece of content as it arrives, improving detection over time.
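To make the supervised setup concrete, here is a minimal sketch in Python (using scikit-learn, which the article does not specify) of training a text classifier on labeled examples. The texts, labels, and model choice are all illustrative assumptions, not a description of any production pipeline.

```python
# Minimal sketch of supervised training on a labeled text dataset.
# The example texts and labels are hypothetical placeholders; a real
# system would ingest millions of labeled samples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["an innocuous product review", "explicit example text"]  # placeholder corpus
labels = [0, 1]                                                   # 0 = safe, 1 = NSFW

# Learn patterns from the labeled examples.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

# New content is scored as it arrives; results can feed back into retraining.
print(classifier.predict(["some incoming user post"]))
```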
For image analysis, developers rely on deep learning models such as Convolutional Neural Networks (CNNs). These networks pass the data through numerous layers, extracting progressively deeper features so the AI can distinguish acceptable content from inappropriate content. Google, for instance, trains its AI models with CNNs and natural-language-processing (NLP) tools to analyze roughly half a month's worth of video uploaded to YouTube every minute.
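As a hedged illustration of the CNN idea, the sketch below uses PyTorch as an assumed framework; the layer sizes, input resolution, and two-class output are placeholders chosen for brevity, not Google's or YouTube's actual architecture.

```python
# Minimal sketch of a CNN that passes image data through stacked
# convolutional layers to extract progressively deeper features,
# ending in a safe-vs-NSFW score. All dimensions are illustrative.
import torch
import torch.nn as nn

class NSFWClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 56 * 56, 2))

    def forward(self, x):
        return self.head(self.features(x))

model = NSFWClassifier()
batch = torch.randn(4, 3, 224, 224)  # four dummy 224x224 RGB images
print(model(batch).shape)            # torch.Size([4, 2]) -> safe vs. NSFW logits
```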
Finally, the AI is put to the test and refined through iteration. Developers typically improve the model's precision and recall by tweaking its parameters. A 2021 study showed that increasing the number of training epochs, in this case from 10 to 50 passes through the dataset, improved accuracy almost as much as reducing noise in the data by a factor of five. This refinement continues indefinitely, making the AI progressively more precise.
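The tune-and-measure loop might look something like the following sketch, which assumes scikit-learn, a synthetic stand-in dataset, and the 10-to-50 epoch range mentioned above; the specific model and numbers are illustrative only.

```python
# Sketch of the evaluate-and-refine loop: vary the number of training
# epochs and measure precision and recall after each run.
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for epochs in (10, 25, 50):  # sweep the epoch range cited above
    model = MLPClassifier(max_iter=epochs, random_state=0)
    model.fit(X_tr, y_tr)
    preds = model.predict(X_te)
    print(epochs,
          f"precision={precision_score(y_te, preds):.3f}",
          f"recall={recall_score(y_te, preds):.3f}")
```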
Adversarial learning lets developers deploy two AI models that learn from each other: one generates content for the NSFW detector to score, and the other learns to recognize that content ever more reliably. This step makes the AI system more error-resistant and increases its overall robustness. Facebook's AI systems apply adversarial learning across more than 100 million posts per day, ensuring that the models are fine-tuned for real-world situations.
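As a rough sketch of the two-model idea, the following PyTorch code pits a toy generator against a toy detector in a standard adversarial training loop; the network shapes, data, and step count are assumptions for illustration, not a description of Facebook's systems.

```python
# Minimal sketch of adversarial training: a generator proposes
# borderline content (here, random feature vectors) while a detector
# learns to flag it. All dimensions and step counts are illustrative.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 128))
detector  = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(detector.parameters(), lr=1e-3)

real = torch.randn(32, 128)  # stand-in for real NSFW feature vectors

for step in range(100):
    # Detector step: learn to separate real examples from generated ones.
    fake = generator(torch.randn(32, 16)).detach()
    d_loss = (loss_fn(detector(real), torch.ones(32, 1)) +
              loss_fn(detector(fake), torch.zeros(32, 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: learn to produce examples the detector mislabels.
    fake = generator(torch.randn(32, 16))
    g_loss = loss_fn(detector(fake), torch.ones(32, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```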
Training an NSFW AI is computationally expensive and resource-intensive, above all in monetary terms. Training a strong NSFW AI model can cost $500,000 or more, depending on the scale of the project. Developing these AI-based solutions carries a non-trivial cost, but once trained they can save enormous amounts on content moderation, leading platforms like Reddit to report a 30% reduction in human-moderation costs through AI content filtering.
More broadly, NSFW character AI is proving capable of adapting to the content-moderation needs of users on social platforms, much as Bill Gates predicted long ago that AI would change the way we do everything. Although teaching AI to grasp the nuance and context of different types of content is an ongoing mission, machine learning in general is advancing in ways that will, over time, make these systems more effective.
If you want to read more about how these systems operate and develop, check out nsfw character ai for an in-depth discussion.