Building a Semantic Search: Insights from the start of our journey

Research in the field of Artificial Intelligence (AI) is challenging but full of potential – especially for a new team. When CONTACT Research was formed in 2022, AI was designated as one of four central research areas right from the start. Initially, we concentrated on smaller projects, including traditional data analysis. However, with the growing popularity of ChatGPT, we shifted our attention to Large Language Models (LLMs) and took the opportunity to work with cutting-edge tools and technologies in this promising field. But as a research team, we faced one critical question: where do we begin?

Here, we share some of our experiences, which may serve as guidance for others embarking on their own AI journey.

The beginning: Why similarity search became our starting point

From the outset, our goal was clear: we wanted more than just a research project; we aimed for a real use case that could ideally be integrated directly into our software. To get started quickly, we opted for small experiments and looked for a specific problem that we could solve step by step.

Our software stores vast amounts of data, from product information to project details, so powerful search capabilities make a decisive difference. Our existing search function did not recognize synonyms or natural-language queries and sometimes missed what users were really looking for. These limitations, together with valuable feedback, quickly led us to conclude that similarity search was an ideal starting point and should therefore be our first research topic: an LLM has the power to take our search functionality to a new level.

The right data makes the difference

Our vision was to make knowledge from various sources such as manuals, tutorials, and specifications easily accessible by asking a simple question. The first and most crucial step was to identify an appropriate data source: one large enough to provide meaningful results but not so extensive that resource constraints would impede progress. In addition, the dataset needed to be of high quality and easily available.

For the experiment, we chose the web-based documentation of our software. It contains no confidential information and is accessible to customers and partners. Initial experiments with this dataset quickly delivered promising results, so we intensified the development of a semantic search application.

What is semantic search?

In short, unlike classic keyword search, semantic search also recognizes related terms and expands queries to include contextually related results – even if these are phrased differently. How does this work? In the first step, semantic indexing, the LLM converts the content of the source texts into vectors (so-called embeddings) and stores them in a database. Search queries are transformed into vectors in the same way and then compared with the stored vectors using a “nearest neighbor” search. The results are returned as a list sorted by similarity, with links to the documentation.
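The two steps, indexing and nearest-neighbor querying, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our implementation: the `embed` function is a hypothetical stand-in for the LLM and only builds a normalized bag-of-words vector, so unlike a real embedding model it does not capture synonyms. The document links are invented for the example.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an LLM embedding model.

    Builds a unit-length bag-of-words vector. A real embedding model would
    return a dense vector in which related terms lie close together.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {word: c / norm for word, c in counts.items()}


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def build_index(docs: dict[str, str]) -> list[tuple[str, dict[str, float]]]:
    # Semantic indexing: embed every document once and store it with its link.
    return [(url, embed(text)) for url, text in docs.items()]


def search(index: list[tuple[str, dict[str, float]]], query: str,
           k: int = 3) -> list[tuple[str, float]]:
    # Embed the query the same way, then rank all stored vectors by similarity
    # (an exhaustive k-nearest-neighbor search).
    query_vec = embed(query)
    scored = [(url, cosine_similarity(query_vec, vec)) for url, vec in index]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


docs = {
    "docs/install": "how to install the server",
    "docs/search": "configure the search index",
    "docs/users": "manage user accounts",
}
index = build_index(docs)
print(search(index, "install server", k=1))
```

For larger collections, the exhaustive loop would typically be replaced by a vector database or an approximate nearest-neighbor index.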

Plan your infrastructure carefully!

Implementing our project required numerous technical and strategic decisions. For the pipeline that processes the data, LangChain best met our requirements. Hardware also poses challenges: laptops are insufficient for text volumes of this scale, so servers or cloud infrastructure are required. A well-structured database is another critical factor for successful implementation.
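A typical early stage of such a pipeline splits long documentation pages into overlapping chunks before they are embedded, so that each vector covers a manageable piece of text. LangChain provides its own text splitters for this step; the function below is a hypothetical, dependency-free sketch of the same idea, and the chunk size and overlap values are illustrative assumptions rather than our settings.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that context at a chunk
    boundary is not lost entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text to avoid a redundant tail.
        if start + chunk_size >= len(text):
            break
    return chunks


page = "abcdefghij"  # stands in for a documentation page
print(chunk_text(page, chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Real pipelines usually prefer splitting on paragraph or sentence boundaries over raw character offsets, but the overlap principle is the same.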

Success through teamwork: Focusing on data, scope, and vision

Success in AI projects depends on more than just technology; it is also about the team. Essential roles include Data Engineers, who bridge technical expertise and strategic goals; Data Scientists, who analyze large amounts of data; and AI Architects, who define the vision for AI usage and coordinate the team. While AI tools supported us with “simple” routine tasks and creative impulses, they could not replace constructive exchange and close collaboration within the team.

Gather feedback and improve

At the end of this first phase, we shared an internal beta version of the Semantic Search with our colleagues. This allowed us to gather valuable feedback in order to plan our next steps. The enthusiasm for further development is high, fueling our motivation to continue.

What’s next?

Our journey in AI research has only just begun, but we have already identified important milestones. Many exciting questions lie ahead: Which model will best suit our long-term needs? How do we make the results accessible to users?

Our team continues to grow – in expertise, members, and vision. Each milestone brings us closer to our goal: integrating the full potential of AI into our work.

For detailed insights into the founding of our AI team and into the Semantic Search, visit the CONTACT Research Blog.

Asset Administration Shell in practice

What is an Asset Administration Shell?

Industry 4.0 promises more efficient and sustainable manufacturing processes through digitalization. The foundation for this is a seamless, automatic exchange of information between systems and products. This is where the Asset Administration Shell (AAS) comes into play.

An Asset Administration Shell is a vendor-independent standard for describing digital twins. Essentially, it is the digital representation of an asset: either a physical product or a virtual object (e.g., documents or software).

The AAS defines the appearance of the asset in the digital world. It describes which information of a device is relevant for communication and how this information is presented. This means the AAS can provide all important data about the asset in a standardized and automated way.

Let us take a look at a practical application to understand the benefits of an AAS:

Use case: AAS as enabler for new services

As part of the ESCOM research project, CONTACT Software is collaborating with GMN Paul Müller Industrie GmbH & Co. KG to implement AAS-based component services. The family-run company manufactures motor spindles, which its customers install as components in metalworking machine tools and then resell.

Before the project began, GMN had already developed a new sensor technology. It provides deep insights into the behavior of a spindle and information on the overall operation of the spindle system. The company wants to use this data to offer new, product-related services:

  • Certified commissioning: Before GMN ships its spindles, the components are put through a defined test cycle on the company’s in-house test bench. GMN uses the data from this reference cycle to ensure that motor spindles are installed and commissioned correctly at the customer’s facility.
  • Predictive services: Using the IDEA-4S sensor microelectronics, customers will be able to continuously record and analyze operating data that provide insights into the availability and operation of the spindles. If necessary, the data can be shared with GMN, for example, for problem analysis. This shortens the time until the machine is back up and running. In the future, GMN will be able to offer smart predictive services such as predictive maintenance.
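The idea behind certified commissioning can be illustrated with a simple comparison: run the same test cycle at the customer’s site and check it point by point against the reference cycle from the test bench. The sketch below is a hypothetical illustration; the signal (vibration velocity in mm/s), the sample values, and the 10 % tolerance are assumptions, not GMN’s actual acceptance criteria.

```python
from dataclasses import dataclass


@dataclass
class CommissioningResult:
    passed: bool
    max_relative_deviation: float
    failed_samples: list[int]


def check_commissioning(reference: list[float], measured: list[float],
                        tolerance: float = 0.10) -> CommissioningResult:
    """Compare a commissioning run with the reference cycle sample by sample.

    Both runs are assumed to be sampled at the same points of the test cycle.
    A sample fails if its deviation from the reference value, relative to
    that reference value, exceeds `tolerance`.
    """
    if len(reference) != len(measured):
        raise ValueError("runs must have the same number of samples")
    deviations = []
    for ref, meas in zip(reference, measured):
        if ref == 0:
            raise ValueError("reference values must be non-zero")
        deviations.append(abs(meas - ref) / abs(ref))
    failed = [i for i, dev in enumerate(deviations) if dev > tolerance]
    return CommissioningResult(
        passed=not failed,
        max_relative_deviation=max(deviations, default=0.0),
        failed_samples=failed,
    )


# Hypothetical vibration velocities (mm/s) at four points of the test cycle.
reference_run = [1.0, 1.2, 1.5, 1.1]
customer_run = [1.05, 1.25, 1.9, 1.1]
result = check_commissioning(reference_run, customer_run)
print(result.passed, result.failed_samples)  # False [2]
```

A real acceptance test would likely compare aggregated features (such as RMS values per speed level) rather than raw samples, but the principle of checking against a certified reference stays the same.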

About GMN Paul Müller Industrie GmbH

GMN Paul Müller Industrie GmbH & Co. KG is a family-owned mechanical engineering company based in Nuremberg, Germany. It produces high-precision ball bearings, machine spindles, freewheel clutches, non-contact seals, and electric drives that are used in various industries. The company manufactures most of these components individually for its customers on site and sells its products via a global sales network.

How do we realize the new services?

To provide such services, companies must be able to access and analyze the sensor data of their machines. Furthermore, machines (or their components) must be enabled to communicate independently with other assets and systems on the shopfloor.

For both tasks, GMN uses CONTACT Elements for IoT. The modular software not only helps the company record, document, and evaluate the reference and usage data of its spindles but also includes functions that enable users to create, fill, and manage the AAS for an asset.

Background

During the implementation of services based on spindle operating data, GMN benefits from its cooperation with a customer. This customer installs the spindles in processing machines that GMN itself uses to manufacture its products. As a result, GMN can gather operating data in-house and use it to improve the next generation of spindles.

What role does the AAS play?

For the components to exchange information in a standardized form, an AAS must be created for the spindle at the item and serial number level. This is also done using CONTACT Elements for IoT. The new services are mapped in a so-called AAS metamodel, which serves as a “link” to the service offers.

AAS and submodels

The AAS of an Industry 4.0 component consists of one or more submodels that each contain a structured set of characteristics. These submodels are defined by the Industrial Digital Twin Association (IDTA), an initiative in which 113 organizations from research, industry and software (including CONTACT Software) collaborate to define AAS standards. A list of all currently published submodels is available at https://industrialdigitaltwin.org/en/content-hub/submodels.
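The nesting described here, a shell composed of submodels that each hold a structured set of characteristics, can be modeled with a few data classes. This is a deliberately simplified sketch, not the official AAS metamodel: it borrows the attribute names `id` and `idShort` from the AAS specification but keeps only plain property values, and all class names, identifiers, and values in the example are hypothetical. The IDTA specifications define many more element types and attributes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Property:
    # A single characteristic, e.g. the manufacturer's name.
    idShort: str
    value: str


@dataclass
class Submodel:
    # A structured set of characteristics, e.g. a digital nameplate.
    id: str
    idShort: str
    properties: list[Property] = field(default_factory=list)


@dataclass
class AssetAdministrationShell:
    # The digital representation of one asset, composed of submodels.
    id: str
    idShort: str
    submodels: list[Submodel] = field(default_factory=list)

    def submodel(self, id_short: str) -> Optional[Submodel]:
        # Return the submodel with the given short name, or None if absent.
        return next((sm for sm in self.submodels if sm.idShort == id_short), None)


nameplate = Submodel(
    id="urn:example:submodel:nameplate:0001",  # hypothetical identifier
    idShort="Nameplate",
    properties=[
        Property("ManufacturerName", "Example Manufacturer"),
        Property("SerialNumber", "0001"),
    ],
)
shell = AssetAdministrationShell(
    id="urn:example:aas:spindle:0001",  # hypothetical identifier
    idShort="Spindle0001",
    submodels=[nameplate],
)
print(json.dumps(asdict(shell), indent=2))
```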

In CONTACT Elements for IoT, GMN can populate the AAS submodels with little effort. The platform includes a widget developed as a prototype during the research project. It provides an overview of which submodels currently exist for the asset and which are available but not yet created. Through the frontend, users can jump directly to the REST node server and upload or download submodels (in AAS/JSON format).
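At its core, the overview the widget provides is a set comparison between the submodels that already exist for an asset and the templates that could be created. The sketch below is hypothetical and is not the widget’s code; the catalog simply reuses, as illustrative short names, the four submodels GMN focuses on in this project.

```python
def submodel_overview(existing: set[str], catalog: set[str]) -> dict[str, list[str]]:
    """Split a catalog of submodel templates into created and not-yet-created ones.

    `existing` holds the short names of submodels already attached to the
    asset's AAS; `catalog` holds the templates available for it (both inputs
    are hypothetical).
    """
    return {
        "created": sorted(existing & catalog),
        "available": sorted(catalog - existing),
        # Submodels on the asset that are not in the catalog, e.g. custom ones.
        "other": sorted(existing - catalog),
    }


catalog = {"TimeSeriesData", "DigitalNameplate", "ContactInformation", "CarbonFootprint"}
existing = {"DigitalNameplate", "TimeSeriesData"}
overview = submodel_overview(existing, catalog)
print(overview["available"])  # ['CarbonFootprint', 'ContactInformation']
```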

During the implementation of data-driven service offerings, GMN focuses on the following submodels:

  • Time Series Data (e.g., semantic information about time series data),
  • Digital Nameplate (e.g., the manufacturer’s name, product name, and product family),
  • Contact Information (standardized contact details related to an asset), and
  • Carbon Footprint (information about the carbon footprint of an asset).

Filling the submodels is simple, as the Time Series Data submodel demonstrates. During the reference run of a motor spindle on the in-house test bench, CONTACT Elements for IoT records the time series data and automatically transfers it to the AAS submodel of the spindle being tested. At the same time, the platform creates a document for the reference run. This allows GMN to track the validity of the reference run at any time and to make the document available to external stakeholders.
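The automatic transfer can be pictured as packaging the recorded samples into a submodel structure together with metadata that documents the run. The sketch below is a hypothetical simplification: the field names are illustrative and do not follow the IDTA Time Series Data template, and the asset identifier and sample values are invented.

```python
from datetime import datetime, timezone


def time_series_submodel(asset_id: str, run_name: str,
                         samples: list[tuple[float, float]]) -> dict:
    """Package a recorded test-bench run as a simplified, hypothetical submodel.

    `samples` holds (time in s, vibration velocity in mm/s) pairs.
    """
    if not samples:
        raise ValueError("a reference run needs at least one sample")
    times = [t for t, _ in samples]
    return {
        "idShort": "TimeSeriesData",
        "assetId": asset_id,
        "metadata": {
            # Documents the run so its validity can be traced later.
            "name": run_name,
            "recordedAt": datetime.now(timezone.utc).isoformat(),
            "sampleCount": len(samples),
            "duration_s": max(times) - min(times),
        },
        "records": [{"time_s": t, "vibrationVelocity_mm_s": v} for t, v in samples],
    }


run = [(0.0, 1.0), (0.5, 1.2), (1.0, 1.5)]
submodel = time_series_submodel("urn:example:aas:spindle:0001", "reference-run", run)
print(submodel["metadata"]["sampleCount"], submodel["metadata"]["duration_s"])  # 3 1.0
```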

New services on the horizon

Using Asset Administration Shells allows GMN to realize its service ideas. This currently concerns the commissioning service and automated quality assurance services.

By analyzing the spindle data, the company can identify outliers in the operating data and make suitable recommendations for action. For example, deviating vibration velocities can indicate that the spindle has been installed incorrectly in the machine or that time-varying processes are occurring. The analysis can also provide insights into anomalies in operating behavior.
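One simple way to flag such outliers is to derive the normal range of a signal from the reference run and mark operating samples that fall far outside it. The z-score rule below is a generic technique chosen for illustration, not GMN’s analysis; the threshold of three standard deviations and the vibration velocity values are assumptions.

```python
import statistics


def find_outliers(reference: list[float], operating: list[float],
                  threshold: float = 3.0) -> list[int]:
    """Return indices of operating samples whose z-score, computed against the
    reference run's mean and standard deviation, exceeds `threshold`."""
    if len(reference) < 2:
        raise ValueError("reference run needs at least two samples")
    mean = statistics.mean(reference)
    stdev = statistics.stdev(reference)
    if stdev == 0:
        raise ValueError("reference run has no variation")
    return [i for i, value in enumerate(operating)
            if abs(value - mean) / stdev > threshold]


# Hypothetical vibration velocities in mm/s.
reference_run = [1.00, 1.05, 0.95, 1.02, 0.98]
operating_data = [1.01, 0.99, 1.60, 1.03]
print(find_outliers(reference_run, operating_data))  # [2]
```

A fixed statistical threshold is the simplest option; distinguishing an installation fault from a legitimate time-varying process would require additional context such as speed and load.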

Dashboards in CONTACT Elements for IoT increase transparency. They provide GMN with all relevant information about the spindles on the test bench, from 3D models to status data. This overview is extremely valuable, particularly for quality management.

Figure: An AAS in our software CONTACT Elements for IoT.

Summary

Asset Administration Shells are vendor-independent standards for describing digital twins. They are among the most important levers for implementing new Industry 4.0 business models, as they enable communication between assets, systems, and organizations. The example of GMN demonstrates the practical benefits of the AAS. The company uses it to design new, product-related services based on information from the AAS of its products. GMN can successively improve these services by continuously analyzing operating data in CONTACT Elements for IoT.

How will the Data Act affect the industry?

Successful digital transformation requires access to data and its intelligent use. The EU has therefore defined a regulation that is intended to strengthen the European data market: the Data Act. Companies from traditional industries must adapt to it as soon as possible.

What is the Data Act?

The “Regulation on harmonised rules on fair access to and use of data” (Data Act) is an EU regulation that sets out rules on data access and use. It aims to create a fair, transparent framework for the exchange and use of data within the EU, thereby promoting innovation and increasing the competitiveness of European companies on the global data market.

The Data Act is a key component of the EU’s digital strategy. It was adopted by the Council of the EU on November 27, 2023, and entered into force on January 11, 2024. After a 20-month transition period, it becomes directly applicable throughout the EU on September 12, 2025.

What is the motivation?

Data is a key resource in the digital economy. However, due to a lack of guidelines, legal requirements, and standards, a large part of the data generated remains unused, especially in industry.

Furthermore, we are currently observing a strong imbalance in the market: data is mostly owned by a small group of large companies. This gives them a considerable competitive advantage over SMEs and start-ups, which is reflected, for example, in one-sided contracts regarding data access and use.

To counteract this, the EU has developed the Data Act. It aims to democratize the market and create a balanced, fair data ecosystem. To this end, the EU has defined a legal framework ensuring that users of connected products or connected services can promptly access the data these generate.

The objectives of the Data Act in a nutshell:

  • Clear rules for the use and exchange of data
  • Transparency and fairness within the data market
  • Protection of personal data
  • Secure data processing
  • Promotion of data-driven innovations
  • Increased competitiveness of EU companies

Who is affected by the Data Act?

The Data Act addresses companies, organizations, and individuals who

  • bring connected products to the market,
  • offer connected services,
  • as a data owner, share generated data with third parties,
  • receive data from data owners,
  • as a public institution, request data owners to share data, or
  • offer data processing services.

Participants in data spaces and vendors of applications that use smart contracts are also affected. Persons whose trade, business, or profession involves deploying smart contracts for others in the context of executing an agreement must also comply with the Data Act.

Which tasks result from the Data Act?

The Data Act imposes numerous new obligations on the industry. These include:

Making data accessible: Providers must ensure that users of connected devices or connected services have access to the data they generate.

Ensuring portability: The Data Act demands mechanisms that enable users to easily and securely transfer their data to third parties. This includes the development of standards and interfaces for data exchange.
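In practice, portability usually means offering an export in a structured, machine-readable format. The sketch below does not implement any format or interface prescribed by the Data Act; it is a hypothetical example that serializes a user’s readings from a connected product to JSON and CSV, and the record fields and product identifier are invented.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Reading:
    # One data point generated by a connected product (hypothetical schema).
    product_id: str
    timestamp: str  # ISO 8601
    metric: str
    value: float


def export_json(readings: list[Reading]) -> str:
    # Structured export that a third party can parse directly.
    return json.dumps([asdict(r) for r in readings], indent=2)


def export_csv(readings: list[Reading]) -> str:
    # Flat tabular export for tools that expect spreadsheets.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["product_id", "timestamp", "metric", "value"])
    writer.writeheader()
    for r in readings:
        writer.writerow(asdict(r))
    return buffer.getvalue()


readings = [
    Reading("machine-42", "2025-01-01T08:00:00Z", "spindle_speed_rpm", 12000.0),
    Reading("machine-42", "2025-01-01T08:01:00Z", "spindle_speed_rpm", 11950.0),
]
print(export_csv(readings))
```

Secure transfer (authentication, access control, encryption in transit) is deliberately left out of this sketch.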

Ensuring transparency and fairness: Companies must be transparent about what data they collect, how they use it, and who has access to it.

Ensuring data protection: The processing and disclosure of data must comply with applicable data protection laws (e.g., the GDPR).

Enabling cooperation with authorities: In many cases, it is necessary to pass on data to public institutions. This requires clear processes and responsibilities.

Data Act vs. Data Governance Act

The Data Act is not the only pillar of the European data strategy. It also includes the Data Governance Act (DGA), an existing regulation that defines processes and structures for the exchange of data between individuals, companies, and public institutions. In contrast, the Data Act focuses more on promoting the digital economy: it regulates who may use generated data and under what conditions.

What are the consequences of violating the Data Act?

Unfortunately, it is not yet possible to predict in detail how sanctions will be structured. Although the regulation applies directly in all member states, Germany has not yet passed the accompanying national legislation. It therefore remains to be seen what penalties will apply in Germany and which supervisory authorities will oversee implementation.

However, one thing is clear: violations of the Data Act will result in fines, similar to the GDPR. There is also a risk that companies will be sued for damages by other market players if they fail to meet the requirements. Furthermore, it is possible that products and services that do not comply with the Data Act may no longer be sold in the EU.

Does the Data Act only create new duties?

The EU regulation does not only entail obligations. It opens up many new opportunities for SMEs in particular. If data is available to all market players in interoperable formats, this facilitates the implementation of innovative, data-based services, such as predictive maintenance.

This is precisely what the democratization of the data market aims to achieve. It gives companies more control over the way they handle their data and creates rules that facilitate data transfer. Both data owners and users will benefit from this.

Processes that are complex and time-consuming today will be accelerated. The regulation, for example, provides clear rules for contract management: cloud or edge providers must contractually and technologically ensure that customers can transfer their data as easily as possible when they switch systems.

The industry will also benefit from increasing competition. For example, machine manufacturers who want to enable their products for the Internet of Things can currently only turn to a few providers for this purpose. The Data Act opens up this restricted circle, which should not only increase the quality of products and services but also lead to lower prices.

According to a representative survey by the digital association Bitkom, Germany’s economy is currently divided on the Data Act. 49 percent of the 603 companies surveyed across all economic sectors see the new EU regulation as an opportunity for their business. On the other hand, 40 percent of respondents consider the Data Act to be a risk.

What is the best approach for companies?

Companies dealing with the Data Act quickly come up against complex issues: How do they ensure that the data interfaces of their machines, systems, and products are accessible to third parties? What impact does the sharing of data have on their business model? What opportunities does this present (e.g., new services and offers)?

Many of these questions are currently still unclear, making it difficult to prepare for the EU regulation. However, it is advisable to put the topic on the strategic agenda and seek an exchange with associations and other companies. This dialog helps assess the impact of the Data Act on your business.

Summary

With the Data Act, the EU wants to equip the European data market for international competition. The regulation promotes a secure, efficient flow of data and creates a framework that facilitates data exchange and use. This results in new business obligations, but also fairer market conditions.

How the Data Act will be implemented in Germany remains to be seen. Manufacturing companies should nevertheless get to grips with the contents as soon as possible. It is a complex set of rules that influences topics ranging from technological infrastructure to processes and contract design. Companies affected must adequately prepare themselves.

Further information

Handling data is becoming increasingly important for a company’s success. A reliable security architecture is essential, especially for cloud users. In our guide “IT security for companies”, you can read about the requirements for this and the factors you should consider when selecting software providers.