Article · March 30, 2025 · 5m read

Vector Embeddings and Vocal Search. Analysis and innovation notes

This article shares analysis and innovation notes from the solution cycle of TOOT, an Open Exchange application.

The hypothesis

A button on a web page can capture the user's voice. IRIS integration can manipulate the recordings to extract semantic meaning, which IRIS vector search can then offer up for new types of AI solution.

Musical search was chosen as the fun semantic domain, building new skills and knowledge along the way.

Looking for simple patterns

The human voice, whether talking, whistling, or humming, has constraints that many historical low-bandwidth data encodings and recording media took advantage of.

How little information is needed to distinguish one musical sound from another when the input is a musical voice?

Consider the sequence of musical letters:

   A, B, C, E, A

Treating these like ASCII character values in programming:
* B is one unit higher than A
* C is one unit higher than B
* E is two units higher than C
* A is four units lower than E
Represented as a numerical progression of the differences between adjacent notes:

 +1, +1, +2, -4

One fewer information point is needed than the original count of notes.
Testing again after the first coffee of the morning, the humming became:

  B, C, D, F, B

Note that the numerical sequence still matches at this elevated pitch.
This demonstrates that a numeric progression of pitch differences is more flexible for matching user input than the actual notes are.
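
The idea can be sketched in a few lines of Python (an illustration only, not the TOOT source; the helper name is invented), using the ASCII codes of the note letters just as in the comparison above:

   def to_difference_sequence(notes):
       # Differences between consecutive note values (here: ASCII codes).
       values = [ord(n) for n in notes]
       return [b - a for a, b in zip(values, values[1:])]

   print(to_difference_sequence(["A", "B", "C", "E", "A"]))  # [1, 1, 2, -4]
   print(to_difference_sequence(["B", "C", "D", "F", "B"]))  # [1, 1, 2, -4]

Both hummings, though pitched differently, reduce to the same difference sequence.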

Note Duration

Whistled input has a lower resolution than traditional musical manuscripts can represent.
A decision was made to resolve only TWO musical note duration types:

  • A long note is one lasting 0.7 seconds or longer
  • A short note is anything shorter than 0.7 seconds

Note Gaps

The gaps have quite poor resolution, so only one information point is used.
A decision was made to define a note gap as:

a pause between two distinct notes of more than half a second.
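
Putting the duration and gap rules together: a minimal sketch in Python (the helper and its input shape are invented for illustration, not taken from TOOT) that turns timed notes into duration and gap tokens:

   # Each note is (pitch, duration_seconds, silence_before_seconds).
   def tokenize_timing(notes):
       tokens = []
       for pitch, duration, silence_before in notes:
           if silence_before > 0.5:
               tokens.append("GAP")      # pause of more than half a second
           kind = "LONG" if duration >= 0.7 else "SHORT"
           tokens.append((kind, pitch))
       return tokens

   print(tokenize_timing([(60, 0.9, 0.0), (62, 0.3, 0.1), (64, 0.4, 0.8)]))
   # [('LONG', 60), ('SHORT', 62), 'GAP', ('SHORT', 64)]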

Note Change Range

A decision was made to limit the maximum recordable transition between one note and the next to 45 pitch steps.
Any note change in user input that exceeds this limit is clamped to either +45 or -45, depending on whether the pitch was increasing or decreasing. Unlike with actual notes, this puts no penalty on the usefulness of a continuing tune-search sequence. For example, the change sequence:

   +1 0 +2 -4 +2 +1 -3

can be trained to be semantically close to the following change sequence, which differs only in its second value:

   +1 1 +2 -4 +2 +1 -3

Single channel Input

A voice or whistle is a simple instrument, producing only one note at a time.
However, this needs to be matched against music ordinarily composed of:
* Multiple instruments playing simultaneously
* Instruments that play chords ( several notes at a time )
Generally a whistle tends to follow ONE specific voice or instrument to represent a reference piece of music.
In physical recording scenarios, individual voices and instruments can be recorded as distinct tracks which are later "mixed" together.
Correspondingly, encoding/decoding utilities and data formats can also preserve and utilize "tracks" per voice / instrument.

It follows that hum / whistle input should search against the impression of each individual voice and instrument track.

Several "musical" encoding models / language formats were considered.
The simplest and mature option was utilizing MIDI format processing to satisfy determining reference encodings for whistle match and search.
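
With MIDI as the reference format, per-track note sequences can be reduced to the same difference representation. A hedged sketch (not the actual TOOT pipeline) using the open-source mido library, including the ±45 clamp described above:

   import mido  # pip install mido

   def track_difference_sequences(path):
       # One clamped pitch-difference sequence per MIDI track.
       sequences = []
       for track in mido.MidiFile(path).tracks:
           notes = [msg.note for msg in track
                    if msg.type == "note_on" and msg.velocity > 0]
           diffs = [max(-45, min(45, b - a)) for a, b in zip(notes, notes[1:])]
           if diffs:
               sequences.append((track.name, diffs))
       return sequences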

Vocabulary summary

Apart from the usual begin, end, and pad tokens, the vocabulary's information points are:

  • 45 long descending notes, each with a specific magnitude of change
  • 45 short descending notes, each with a specific magnitude of change
  • A repeat of the same note with short duration
  • A repeat of the same note with long duration
  • 45 long ascending notes, each with a specific magnitude of change
  • 45 short ascending notes, each with a specific magnitude of change
  • A gap between notes
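
Counted up, this gives a compact vocabulary. A sketch of how such a token list could be enumerated (the token names are invented for illustration):

   special = ["BEGIN", "END", "PAD"]
   vocab = special + ["GAP", "REPEAT_SHORT", "REPEAT_LONG"]
   for magnitude in range(1, 46):            # 1..45 pitch steps
       for direction in ("UP", "DOWN"):
           for length in ("LONG", "SHORT"):
               vocab.append(f"{length}_{direction}_{magnitude}")
   print(len(vocab))  # 186 tokens in total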

Synthetic data

IRIS globals are very fast and efficient at identifying combinations and their frequency across a large dataset.

The starting point for synthetic data was valid sequences.

These were modified cumulatively in different ways and scored by deviation (a sketch of one mutation follows the list):

  • SplitLongNote - One long note becomes two short notes, where the second becomes a repeat
  • JoinLongNote - Two short notes, where the second is a repeat, become a single long note
  • VaryOneNote - One note is varied by +2, +1, -1 or -2
  • DropSpace - Remove a gap between notes
  • AddSpace - Add a gap between notes
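
A sketch of one of these mutations, VaryOneNote, as it might look in Python (the function shape and the deviation bookkeeping are assumptions, not TOOT source):

   import random

   def vary_one_note(sequence, deviation):
       # Perturb one difference value by +2, +1, -1 or -2 and
       # increment the cumulative deviation score.
       mutated = list(sequence)
       i = random.randrange(len(mutated))
       mutated[i] += random.choice([-2, -1, 1, 2])
       return mutated, deviation + 1

   mutated, score = vary_one_note([1, 1, 2, -4], deviation=0)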

Next, these scores are effectively layered in a global for each result.

This means that where another area of a note-change sequence in a track stream was closer to a mutated value, the highest score was always picked.
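
In spirit, the layering works like the following sketch, with a plain dictionary standing in for the IRIS global (names invented):

   best = {}  # key: (tune id, sequence window) -> best score seen

   def layer_score(key, score):
       # Keep only the highest score recorded for this window.
       if score > best.get(key, float("-inf")):
           best[key] = score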

Dev Workflows

Search Workflow

DataLoad Workflow

Vector embeddings were generated for multiple instrument tracks (22,935 records) and for sample tunes (6,762 records).

Training Workflow

Two training experiments were run:

  • Unsupervised - maximum 110,000 records (7 hours of processing)
  • Similarity-score supervised - maximum 960,000 records (1.3 days of processing)

Further requirements to explore

Better similarity scoring

The current implementation iteration prunes by score too aggressively.

Review the cut-offs for including low-occurrence and high-occurrence sequences in the dataset.

Filtering background noise

In MIDI format, the volume of voices or instruments is important for separating the actual user content from background noise. Possibly this gives some opportunity to clean / filter a data source, for example by excluding tracks that would never be referenced by human input. Currently the solution excludes "drums" per instrument track, both by title and by analysis of some progression repetition.
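
A sketch of such an exclusion test (the article filters by track title and by repetition analysis; the channel check below additionally uses the General MIDI convention that channel 10, index 9, is reserved for percussion, an assumption about the data rather than TOOT code):

   def is_drum_track(track):
       # Exclude by title...
       if "drum" in (track.name or "").lower():
           return True
       # ...or by the General MIDI percussion channel (index 9).
       return any(msg.type == "note_on" and msg.channel == 9 for msg in track)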

Synthetic midi instruments

The current approach of matching voice to instruments was to try to side-step the mismatch in instrument characteristics, to see whether that was good enough.

A candidate experiment would be to add some characteristics back from user input while at the same time pivoting the synthetic training data to have more human characteristics.

MIDI encodes additional information with pitch bending to achieve a smoother progression between notes.

An avenue to explore would be extending and refining the way WAV-to-MIDI translation is done.

Finally

Hope this was an interesting diversion from the day-to-day, and that you enjoy the app or that it inspires some new ideas.

Article · March 30, 2025 · 5m read

AI-Powered System Management with IRIS Agent

Introduction

As AI-driven automation becomes an essential part of modern information systems, integrating AI capabilities into existing platforms should be seamless and efficient. The IRIS Agent project showcases how generative AI can work effortlessly with InterSystems IRIS, leveraging its powerful interoperability framework—without the need to learn Python or build separate AI workflows from scratch.
This project explores how ChatGPT and Anthropic Claude, two of the most advanced AI models, can interact with IRIS using Custom GPTs and the Model Context Protocol (MCP). Rather than creating isolated AI pipelines, IRIS Agent treats AI as an interoperability service, allowing organizations to enhance automation, decision-making, and data processing without disrupting their existing architecture.
By leveraging IRIS’s built-in interoperability tools, developers can seamlessly integrate AI models just as they would any other system component. This approach ensures stability, security, scalability, and auditing while enabling natural language interaction, real-time data retrieval, and automated system administration—all within the familiar IRIS environment.

Technologies Behind IRIS Agent

The IRIS Agent project leverages a powerful technology stack to ensure efficiency, scalability, and seamless integration:

  • InterSystems IRIS – A robust platform for application development and data integration, using ObjectScript for server-side code.
  • Custom GPT Bot – A tailor-made AI assistant designed to streamline system interactions, based on ChatGPT.
  • Claude AI Desktop – Facilitates communication with the server via the Model Context Protocol (MCP).
  • Node.js – Handles MCP server communications.
  • OpenAPI – Standardized API documentation.
  • Docker – Containerizes the application for simplified deployment and dependency management.

Look inside

This is what our project looks like under the IRIS hood:

The Production consists of the following elements:

  • LanguageModelIn: a service that receives an API request and redirects it to the responsible operation.
  • Meds: an operation that searches for medications on a third-party API.
  • Metrics: an operation that searches for OPS information such as logs, errors, and messages in the Iris system.
  • Namespaces: an operation that lists, searches, and modifies namespaces in the Iris system.
  • Users: an operation that lists, searches, creates, and deletes users in the Iris system.

The service's functioning can be seen most clearly in the message viewer and traces, which help us track operations and diagnose issues. For example, a request:

…and response:


One of the main points of this project is that the LanguageModelIn service automatically generates OpenAPI documentation for the business operations within the IRIS Interoperability production. This API allows us to easily connect Custom GPTs to ChatGPT and the Model Context Protocol (MCP) server to Claude AI Desktop.

 

Integration with ChatGPT 

After all the installation and setup processes, which you can find in our ReadMe, let's ask our IRIS Agent GPT from OpenAI what its current functionality is:

   

And create a new user…

   


Checking updates...

   

 

Integration with External Data Sources

One of the standout features of the IRIS Agent is its ability to seamlessly query not only data stored within the InterSystems IRIS database but also external data sources. For our example, the agent integrates with the OpenFDA API to provide real-time drug information. This allows users to search for medication details, safety reports, and regulatory compliance data directly within the system. 
In our project, the API provides the ability to search for drugs by name, so let's find out about drugs with the word "flu" in the name.

   

If you want to try it right now, follow the link and start a conversation with our demo IRIS Agent.

Integration with Claude AI Desktop

One of the key features of the IRIS Agent is its ability to interact with Claude AI Desktop, a local AI assistant developed by Anthropic. The integration is powered by the Model Context Protocol (MCP), which facilitates seamless communication between Claude and external applications. To enable this, the IRIS Agent utilizes a dedicated MCP Node.js server that acts as an intermediary, processing incoming queries and routing them between Claude and the IRIS system.
This setup allows users to interact with the IRIS Agent directly through Claude’s interface, issuing natural language commands to retrieve system data, manage configurations, and execute administrative tasks. The MCP server ensures smooth data exchange, maintaining efficiency and security while providing administrators with a powerful AI-driven assistant for system management.
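
For reference, Claude Desktop discovers MCP servers through its claude_desktop_config.json file. A minimal sketch of such an entry is below; the server name and script path are placeholders, so check the project's ReadMe for the actual values:

   {
     "mcpServers": {
       "iris-agent": {
         "command": "node",
         "args": ["/path/to/iris-agent-mcp-server.js"]
       }
     }
   }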

         

Conclusion

The IRIS Agent project showcases the potential of AI-driven automation in modern enterprise environments. By seamlessly integrating InterSystems IRIS, Claude AI, and Custom GPT, it simplifies system management, enhances efficiency, and opens the door to future innovations. Our project also demonstrates that, with InterSystems IRIS, you are already prepared for the AI revolution!


If you enjoyed getting to know IRIS Agent and its features and would like to support us, please take a moment to vote for our app here https://openexchange.intersystems.com/contest/40. Thank you!


The developers of the project:

 

Banksia Global is a leading technology consultancy that specializes in helping organizations harness the full potential of modern data management solutions. With a strong focus on the InterSystems IRIS platform, Banksia Global provides strategic guidance, system integration, and custom software development services to clients across various industries. Their expertise lies in delivering high-performance, scalable solutions that enable businesses to drive innovation, streamline operations, and unlock new opportunities. Known for its collaborative agile approach, Banksia Global works closely with clients to ensure that each solution is tailored to meet their specific needs, fostering long-term partnerships and sustainable growth.

Question · March 29, 2025

Management Portal user class

Experts,

Do you know which class contains the information about the portal users?

I need to collect some information that is defined in the user registration to handle in my application.

For example: Comment and Enabled.

Thanks in advance.

Regards, Davidson

Question · March 29, 2025

How to convert UTF8 special characters to the actual character

We are receiving the report in text format, and it has special characters like ' and - in the text. The source system is using the UTF-8 encoding format, hence the text is showing as ' � '. Is there a way to convert the UTF-8 characters to the actual characters in the DTL?

 

Thank you,

Article · March 28, 2025 · 4m read

UK&I Data Summit in Birmingham - freebies inside!

Hi Community!

I'm very excited to share with you some highlights from yesterday, from UK&I Data Summit in Birmingham! There are exciting announcements, FREE tickets to the Summit and much more, so read on!

First, a bit of background. The first day was dedicated to different workshops, including FHIR Server, Data Studio, and Vector Search. And to end the day on a high but light note, there was a short session about the Developer Ecosystem, which you can actually watch on the Developer Community right now! So check it out, our Head of Developer Relations, @Dean Andrews, was talking about the recent improvements, latest figures, and lots of other interesting stuff.

The second day of the Data Summit was packed with technical and general knowledge sessions. 

First, Colin Henderson welcomed everyone to the event and outlined the agenda for the day.

Then Simon Baxter talked about the influence of AI on the future of technology and the workforce, saying that 45% (or something) of the workforce will become redundant and be substituted by AI. Everyone in the hall started wondering whether he or she was one of those 45%...

During the last keynote, Scott Gnau talked about the current state of development and strategic initiatives of InterSystems.

If you want to learn more, register for the InterSystems READY 2025 (Global Summit 2025)! On this note, everyone who registered for the UK&I Data Summit is invited to the InterSystems READY for FREE! If you missed this announcement, write a comment below, and we will sort it out for you!

During one of the coffee breaks, we hooked one of the speakers up with our official Developer Ecosystem socks! What's more, even people who were sadly working in the office in Windsor that day and weren't attending the Summit got their moment of glory! So, let me present to you @Benjamin De Boe and @Anastasia Dyubaylo - km (or rather miles) apart but together in spirit 😂

By the way, you can order the new socks from the Global Masters if they are available in your country - go check!

In any case, no one could miss two pretty ladies manning (womanning?) the Developer Ecosystem table.

The tech stuff started in earnest after lunch.

@Benjamin De Boe and @Tom Woodfin started with the key capabilities and features of the InterSystems Data Platform, emphasising core capabilities, advanced features, and how to leverage the platform to drive growth and efficiencies.

Afterwards, @Benjamin De Boe continued to discuss Vector Search and RAG and how they improve query results, summarizing InterSystems' strategies regarding these technologies.

My favourite slide:

Because you absolutely have to have your priorities straight!

@Kinshuk Rakshith joined @Benjamin De Boe to talk about different implementations of the Vector Search.

The last tech session of the day was dedicated to Data Studio and was presented by @Saurav Gupta

During another coffee break, we chatted with some of our Developer Community members:

@Max Parry and @Irène Mykhailova 

@Tom Woodfin and @Dean Andrews 

@Laurel James (GJS), @Dean Andrews and @Olga Zavrazhnova 

@Irène Mykhailova and @Redmond O'Leary 

Of course, the Developer Ecosystems table was full of goodies and entertainment! For example, if you could throw a ball into the tube, you would get a choice of prizes: socks, luggage tag, shopper, cables, and travel mug.

 

And to finish the summit off, @Gokhan Uluderya invited everyone to InterSystems READY in Orlando and talked about the projects in the pipeline for InterSystems. Don't forget about the free tickets!

John Paladino praised the Developer Community, among other projects, especially the Developer Community AI, which managed to answer his tricky question based on non-obvious sources.

Brian McBride delivered a closing keynote, addressing the absolute necessity of a vision of the future for tech companies to survive the passage of time.

Ultimately, Colin Henderson drew the bottom line and closed the Summit, thanking everyone for their participation and the team who organized it for their hard work.

He also reminded attendees of the UK&I Data Summit that they can get a free ticket to InterSystems READY and a promo code for a free Certification! So if you missed either of these opportunities, please drop us a line in the comments, and we will sort it all out for you!

Hope you enjoyed my recap of the event, and we hope to see you at next year's UK&I Summit!

Bye for now!
