Digital authenticity: how to spot AI-generated content

In today’s digital age, we often question whether we can trust images, videos, or texts. Tracing the source of information is becoming more and more difficult. Generative AI accelerates the creation of content at an incredible pace. Images and audio files that once required a skilled artist can now be generated by AI models in a matter of seconds. Models like OpenAI’s Sora can even produce high-quality videos!

This technology offers both opportunities and risks. On the one hand, it speeds up creative processes, but on the other hand, it can be misused for malicious purposes, such as phishing attacks or creating deceptively real deepfake videos. So how can we ensure that the content shared online is genuine?

Digital watermarks: invisible protection for content

Digital watermarks are one solution that helps verify the origin of images, videos, or audio files. These patterns are invisible to the human eye but can be detected by algorithms even after minor changes, like compressing or cropping an image, and are difficult to remove. They are primarily used to protect copyright.
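To make the idea more tangible, here is a minimal sketch of one classic approach, a spread-spectrum style watermark, on a toy "image" represented as a flat list of grayscale values. Everything in it is an assumption made for illustration: the function names, the key, the strength, and the threshold are hypothetical, and real watermarking schemes are considerably more sophisticated. A secret key seeds a pseudo-random pattern of +1/-1 values that is added faintly to the pixels; detection correlates the pixels with the same pattern.

```python
import random

def make_pattern(key: int, n: int) -> list[int]:
    """Derive a pseudo-random +1/-1 pattern from a secret key (hypothetical helper)."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(n)]

def embed(pixels: list[int], key: int, strength: int = 6) -> list[int]:
    """Add the faint keyed pattern to every pixel, clamped to the 0..255 range."""
    pattern = make_pattern(key, len(pixels))
    return [min(255, max(0, p + strength * w)) for p, w in zip(pixels, pattern)]

def detect_score(pixels: list[int], key: int) -> float:
    """Correlate the pixels with the keyed pattern; a high score suggests a watermark."""
    pattern = make_pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * w for p, w in zip(pixels, pattern)) / len(pixels)

# Toy data: random grayscale values stand in for a real image.
image_rng = random.Random(0)
original = [image_rng.randrange(256) for _ in range(50_000)]
marked = embed(original, key=1234)

# A "minor change": small random noise on top of the watermarked image.
noise_rng = random.Random(1)
noisy = [min(255, max(0, p + noise_rng.randint(-3, 3))) for p in marked]

THRESHOLD = 3.0  # assumed decision threshold, roughly half the embedding strength

for name, pixels in [("original", original), ("marked", marked), ("noisy", noisy)]:
    print(name, detect_score(pixels, key=1234) > THRESHOLD)
```

Without the key, the pattern looks like noise, which is part of what makes such a mark hard to find and remove. This toy version would not survive cropping or resizing; robust schemes typically embed the pattern in a transformed representation of the image instead.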

However, applying watermarks to text is considerably more difficult because text has less redundancy than the pixels of an image. A related method is to insert small but visible deliberate errors into the original content. Google Maps, for instance, uses this method with fictional streets: if these streets appear in a copy, it signals copyright infringement.

Digital signatures: security through cryptography

Digital signatures are based on asymmetric cryptography. This means that the content is signed with a private key that only the creator possesses. Others can verify the authenticity of the content using a public key. Even the smallest alteration to the content invalidates the signature, making it nearly impossible to forge. Digital signatures already ensure transparency in online communication, for example through the HTTPS protocol for secure browsing.
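The sign-and-verify principle can be sketched in a few lines. The following is a deliberately simplified, textbook-style RSA example using only the Python standard library. It is an illustration under stated assumptions, not a secure implementation: the key is far too small, the primes are well-known Mersenne primes, and there is no padding. The names `sign`, `verify`, and `digest` are hypothetical; in practice you would rely on a vetted cryptography library.

```python
import hashlib

# Toy key generation. NOT secure: tiny, publicly known primes and no padding.
P = 2**107 - 1                      # a known Mersenne prime
Q = 2**127 - 1                      # another known Mersenne prime
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the content and map it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this value."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone who knows the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

content = b"Photo taken by Alice"
signature = sign(content)

print(verify(content, signature))                  # True
print(verify(b"Photo taken by Mallory", signature))  # False: content was altered
```

The second check fails because the hash of the altered content no longer matches the value recovered from the signature, which is exactly the property described above.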

In a world where all digital content was protected by signatures, the origin and authenticity of any piece of media could be easily verified. For example, you could confirm who took a photo, when, and where. An initiative pushing this forward is the Coalition for Content Provenance and Authenticity (C2PA), which is developing technical standards to apply digital signatures to media and document their origin. Unlike watermarks, signatures are not permanently embedded into the content itself and can be removed without altering the material. In an ideal scenario, everyone would use digital signatures, and a missing signature would then raise doubts about the trustworthiness of the content.

GenAI detectors: AI vs. AI

GenAI detectors provide another way to recognize generated content. AI models are algorithms that leave behind certain patterns, such as specific wording or sentence structures. Other AI models can detect these. Tools like GPTZero can already identify with high accuracy whether a text originates from a generative AI model like ChatGPT or Gemini. While these detectors are not perfect yet, they provide an initial indication.

What does this mean for users?

Of all the options, digital signatures offer the strongest protection because they work across all types of content and are based on cryptographic methods. It will be interesting to see if projects like C2PA can establish trusted standards. Still, different measures may be needed depending on the purpose for which the trustworthiness of digital content has to be ensured.

In addition to technological solutions, critical thinking remains one of the best tools for navigating the information age. The amount of available information is constantly growing; it is therefore important to question and verify content critically and to be aware of the capabilities of generative AI models.

For a more comprehensive article, check out the CONTACT Research Blog.

Are data science platforms a good idea?

To paraphrase Karl Valentin: platforms are beautiful and take a lot of work off your hands. The idea of platforms for automatic data analysis comes at just the right time. In line with this, Gartner has now published a “Magic Quadrant for Data Science and Machine Learning Platforms”. The document itself sits behind a paywall, but some of the companies mentioned in the report offer access to it online if you enter your address.

Gartner particularly emphasizes that such a platform should provide everything you need from a single source, unlike various individual components that are not directly coordinated with each other.

Sounds good to me! However, data science is not an area where you can magically get ahead with a tool or even a platform. The development of solutions, for example for predictive maintenance of the machines a company offers, goes through various phases, with cleaning/wrangling and preprocessing accounting for most of the work. This is where ETL (Extract, Transform, Load) and visualization tools such as Tableau belong. And beyond the comfort zone of platforms that managers imagine, database queries and scripts for transformation and aggregation in Python or R are simply the means of choice. A look at data science online tutorials from top providers like Coursera underlines the importance of these, well, down-to-earth tools. “Statistical analysis, Python programming with NumPy, pandas, matplotlib, and Seaborn, Advanced statistical analysis, Tableau, machine learning with stats models and scikit-learn, deep learning with TensorFlow” is one of Udemy’s course programs.
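To give an impression of what such down-to-earth work looks like, here is a small sketch using only Python's built-in sqlite3 module. The table, the column names, and the sensor values are invented for illustration; the point is merely that a plain database query already covers cleaning (dropping missing and obviously invalid readings) and aggregation per machine.

```python
import sqlite3

# Hypothetical raw sensor readings: (machine, temperature in degrees Celsius).
# None marks a missing value, -999.0 an assumed error code from the sensor.
rows = [
    ("A", 70.0), ("A", None), ("A", 74.0),
    ("B", 65.0), ("B", -999.0), ("B", 67.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine TEXT, temperature REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# Cleaning and aggregation in a single query:
# drop missing and invalid values, then average per machine.
summary = conn.execute(
    """
    SELECT machine, COUNT(*) AS n, AVG(temperature) AS avg_temp
    FROM readings
    WHERE temperature IS NOT NULL AND temperature > -100
    GROUP BY machine
    ORDER BY machine
    """
).fetchall()
conn.close()

print(summary)  # [('A', 2, 72.0), ('B', 2, 66.0)]
```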

In addition, projects often get stuck at this preliminary stage or are cancelled. There are many reasons for this:

  • no analytical/statistical approach can be found
  • the original idea proves to be unfeasible
  • the data is not available in the quantity or quality you need
  • simple analyses and visualizations are enough and everything else would be “oversized”.

This is no big deal, as it only means that the automated use of machine learning and AI does not turn every data set into a data treasure. If, however, a productive benefit becomes apparent, you need to prepare for the production pipeline and for its time and resource constraints. Usually this means starting from scratch and reproducing everything, e.g. in TensorFlow for neural networks or in custom libraries.

The misunderstanding is that a) data science can be carried through to productive use seamlessly and b) a one-stop shop for data science (here, a “platform”) is needed that does everything in one go. That will never happen.

This is really good news, because it means that organizations can achieve their first goals without having to resort to large platforms. The reasonably careful selection of suitable tools (many of them open source) helps to achieve this.



The digital transformation is also transforming PLM

Companies in the automotive industry are dealing intensively with the digital transformation, and this is no longer just about Industry 4.0 and the intelligent networking of manufacturing, but about reshaping business processes and business models. Just how intensively became clear at this year's ProSTEP iViP Symposium in Stuttgart, which was attended by more than 660 visitors from 19 countries. A new record, and certainly not the last. Many presentations reflected the concern that disruptive technologies such as the Internet of Things (IoT) could lead to the established top dogs being displaced by new challengers. Tesla's success has startled the industry, and the name Nokia stands like the writing on the wall.