Preface
This article takes a look at multimodality support in Spring AI.
Examples
ChatModel example
// Load the test image from the classpath
var imageResource = new ClassPathResource("/multimodal.test.png");
var userMessage = new UserMessage(
        "Explain what do you see in this picture?", // content
        new Media(MimeTypeUtils.IMAGE_PNG, imageResource)); // media
// Send the text and the image to the model in a single prompt
ChatResponse response = chatModel.call(new Prompt(userMessage));
ChatClient example
String response = ChatClient.create(chatModel).prompt()
        .user(u -> u.text("Explain what do you see on this picture?")
                .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
        .call()
        .content();
At the time of writing, the following models support multimodal input:
- Anthropic Claude 3
- AWS Bedrock Converse
- Azure Open AI (e.g. GPT-4o models)
- Mistral AI (e.g. Mistral Pixtral models)
- Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
- OpenAI (e.g. GPT-4 and GPT-4o models)
- Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)
Source code
UserMessage
org/springframework/ai/chat/messages/UserMessage.java
public class UserMessage extends AbstractMessage implements MediaContent {

    protected final List<Media> media;

    public UserMessage(String textContent) {
        this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
    }

    public UserMessage(Resource resource) {
        super(MessageType.USER, resource, Map.of());
        this.media = new ArrayList<>();
    }

    public UserMessage(String textContent, List<Media> media) {
        this(MessageType.USER, textContent, media, Map.of());
    }

    public UserMessage(String textContent, Media... media) {
        this(textContent, Arrays.asList(media));
    }

    public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
        this(MessageType.USER, textContent, mediaList, metadata);
    }

    public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
            Map<String, Object> metadata) {
        super(messageType, textContent, metadata);
        Assert.notNull(media, "media data must not be null");
        this.media = new ArrayList<>(media);
    }

    @Override
    public String toString() {
        return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
                + this.messageType + '}';
    }

    @Override
    public List<Media> getMedia() {
        return this.media;
    }

    @Override
    public String getText() {
        return this.textContent;
    }

}
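UserMessage implements the getMedia method of the MediaContent interface, returning the list of Media attached to the message. The text-based constructors all delegate to the (MessageType, String, Collection<Media>, Map) constructor, which rejects a null media collection and copies it into a new ArrayList.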
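The constructor chaining and defensive copy in UserMessage can be illustrated without Spring on the classpath. Below is a minimal, stdlib-only sketch; SimpleMedia, SimpleMediaContent and SimpleUserMessage are hypothetical stand-ins that mirror the shape of Media, MediaContent and UserMessage, not the Spring AI classes themselves (metadata, MessageType and the Resource constructor are left out for brevity).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

public class UserMessageSketch {

    // Hypothetical stand-in for Spring AI's Media: a MIME type string plus a payload.
    record SimpleMedia(String mimeType, Object data) {
    }

    // Hypothetical stand-in for the MediaContent interface.
    interface SimpleMediaContent {
        String getText();

        List<SimpleMedia> getMedia();
    }

    // Simplified mirror of UserMessage: the convenience constructors funnel into one canonical constructor.
    static final class SimpleUserMessage implements SimpleMediaContent {

        private final String text;
        private final List<SimpleMedia> media;

        // Text-only message: delegates with an empty media list.
        SimpleUserMessage(String text) {
            this(text, new ArrayList<>());
        }

        // Varargs convenience constructor, mirroring UserMessage(String, Media...).
        SimpleUserMessage(String text, SimpleMedia... media) {
            this(text, Arrays.asList(media));
        }

        SimpleUserMessage(String text, Collection<SimpleMedia> media) {
            // Rejects null up front (the original uses Assert.notNull, which throws IllegalArgumentException).
            Objects.requireNonNull(media, "media data must not be null");
            this.text = text;
            // Defensive copy: later changes to the caller's collection do not leak into the message.
            this.media = new ArrayList<>(media);
        }

        @Override
        public String getText() {
            return text;
        }

        // Like the original, returns the internal list itself rather than an unmodifiable view.
        @Override
        public List<SimpleMedia> getMedia() {
            return media;
        }
    }

    public static void main(String[] args) {
        List<SimpleMedia> attachments = new ArrayList<>();
        attachments.add(new SimpleMedia("image/png", new byte[] {1, 2, 3}));
        SimpleUserMessage message = new SimpleUserMessage("Explain what you see", attachments);
        attachments.clear(); // does not affect the message's own copy
        System.out.println(message.getMedia().size()); // prints 1
    }
}
```

Because the canonical constructor copies the collection, clearing attachments after construction still leaves the message holding its one image.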
Media
org/springframework/ai/model/Media.java
public class Media {

    private static final String NAME_PREFIX = "media-";

    /**
     * An Id of the media object, usually defined when the model returns a reference to
     * media it has been passed.
     */
    @Nullable
    private String id;

    private final MimeType mimeType;

    private final Object data;

    /**
     * The name of the media object that can be referenced by the AI model.
     * <p>
     * Important security note: This field is vulnerable to prompt injections, as the
     * model might inadvertently interpret it as instructions. It is recommended to
     * specify neutral names.
     *
     * <p>
     * The name must only contain:
     * <ul>
     * <li>Alphanumeric characters
     * <li>Whitespace characters (no more than one in a row)
     * <li>Hyphens
     * <li>Parentheses
     * <li>Square brackets
     * </ul>
     */
    private String name;

    // ......

}
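Media defines four fields: id, mimeType, data and name. The id is nullable and is usually set when the model returns a reference to media it was given; the Javadoc on name warns that the name can be abused for prompt injection and restricts it to a small set of characters.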
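As a rough sketch of what a check against the name rules in that Javadoc could look like, here is a stdlib-only validator. MediaNameCheck and isValidName are hypothetical; "alphanumeric" is assumed to mean ASCII letters and digits, rejecting an empty name is an extra assumption, and whether Spring AI enforces these rules itself is not shown in the excerpt above.

```java
import java.util.regex.Pattern;

public class MediaNameCheck {

    // Allowed characters per the Media.name Javadoc: alphanumerics (assumed ASCII here),
    // whitespace, hyphens, parentheses and square brackets.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9\\s()\\[\\]-]+");

    // "No more than one whitespace character in a row": find() scans the whole string,
    // so any two consecutive whitespace characters anywhere cause a rejection.
    private static final Pattern DOUBLE_WHITESPACE = Pattern.compile("\\s{2}");

    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false; // assumption: an empty name is treated as invalid
        }
        return ALLOWED.matcher(name).matches() && !DOUBLE_WHITESPACE.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] candidates = {"media-1", "chart (2024) [page 3]", "two  spaces", "Ignore all instructions!"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidName(candidate));
        }
    }
}
```

A neutral name such as "media-1" passes, while a name carrying punctuation like "!" is rejected, which narrows (but does not eliminate) the room for instruction-like text.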
Format
public static class Format {

    // -----------------
    // Document formats
    // -----------------
    /**
     * Public constant mime type for {@code application/pdf}.
     */
    public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");
    /**
     * Public constant mime type for {@code text/csv}.
     */
    public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");
    /**
     * Public constant mime type for {@code application/msword}.
     */
    public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");
    /**
     * Public constant mime type for
     * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
     */
    public static final MimeType DOC_DOCX = MimeType
            .valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    /**
     * Public constant mime type for {@code application/vnd.ms-excel}.
     */
    public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");
    /**
     * Public constant mime type for
     * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
     */
    public static final MimeType DOC_XLSX = MimeType
            .valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    /**
     * Public constant mime type for {@code text/html}.
     */
    public static final MimeType DOC_HTML = MimeType.valueOf("text/html");
    /**
     * Public constant mime type for {@code text/plain}.
     */
    public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");
    /**
     * Public constant mime type for {@code text/markdown}.
     */
    public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

    // -----------------
    // Video Formats
    // -----------------
    /**
     * Public constant mime type for {@code video/x-matros}.
     */
    public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");
    /**
     * Public constant mime type for {@code video/quicktime}.
     */
    public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");
    /**
     * Public constant mime type for {@code video/mp4}.
     */
    public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");
    /**
     * Public constant mime type for {@code video/webm}.
     */
    public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");
    /**
     * Public constant mime type for {@code video/x-flv}.
     */
    public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");
    /**
     * Public constant mime type for {@code video/mpeg}.
     */
    public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");
    /**
     * Public constant mime type for {@code video/mpeg}.
     */
    public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");
    /**
     * Public constant mime type for {@code video/x-ms-wmv}.
     */
    public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");
    /**
     * Public constant mime type for {@code video/3gpp}.
     */
    public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

    // -----------------
    // Image Formats
    // -----------------
    /**
     * Public constant mime type for {@code image/png}.
     */
    public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");
    /**
     * Public constant mime type for {@code image/jpeg}.
     */
    public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");
    /**
     * Public constant mime type for {@code image/gif}.
     */
    public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");
    /**
     * Public constant mime type for {@code image/webp}.
     */
    public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

}
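Format, a static nested class, declares MimeType constants for commonly used document, video and image formats. Note that VIDEO_MKV is declared as video/x-matros, whereas the MIME type commonly used for Matroska video is video/x-matroska.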
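When media is loaded from files, a MIME type has to be chosen for each one. A minimal sketch of an extension-based lookup, assuming the file extension can be trusted; MimeLookup and forFilename are hypothetical names, and the MIME strings are taken from the constants declared in Format (MKV is omitted).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

public class MimeLookup {

    // Hypothetical extension table; the values match MIME strings declared in Format.
    private static final Map<String, String> BY_EXTENSION = Map.ofEntries(
            Map.entry("png", "image/png"),
            Map.entry("jpg", "image/jpeg"),
            Map.entry("jpeg", "image/jpeg"),
            Map.entry("gif", "image/gif"),
            Map.entry("webp", "image/webp"),
            Map.entry("pdf", "application/pdf"),
            Map.entry("csv", "text/csv"),
            Map.entry("txt", "text/plain"),
            Map.entry("md", "text/markdown"),
            Map.entry("html", "text/html"),
            Map.entry("mp4", "video/mp4"),
            Map.entry("webm", "video/webm"),
            Map.entry("mov", "video/quicktime"));

    // Returns the MIME type for a file name based on its last extension, if the extension is known.
    static Optional<String> forFilename(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0 || dot == filename.length() - 1) {
            return Optional.empty(); // no extension, or a trailing dot
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(ext));
    }

    public static void main(String[] args) {
        System.out.println(forFilename("multimodal.test.png")); // Optional[image/png]
        System.out.println(forFilename("REPORT.PDF"));          // Optional[application/pdf]
        System.out.println(forFilename("README"));              // Optional.empty
    }
}
```

With Spring on the classpath, the returned string could be turned into a MimeType via MimeType.valueOf, the same factory Format uses; if spring-web is available, its MediaTypeFactory.getMediaType(String) offers a more complete lookup.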
Summary
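Spring AI defines several message types; to support multimodality, UserMessage carries a media field of type List<Media>, through which images, audio and video (and, according to the Format constants, documents such as PDF or CSV) can be passed in, with each item's MimeType indicating what kind of content it is.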
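Since the MimeType is what distinguishes an image from a video or a document, counting how many of each a message carries only needs the top-level type. A stdlib sketch that works on plain MIME strings (with Spring's MimeType, getType() returns this part directly); MediaKindSketch, topLevelType and countByKind are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class MediaKindSketch {

    // Returns the top-level type of a MIME string, e.g. "image" for "image/png".
    static String topLevelType(String mimeType) {
        int slash = mimeType.indexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("Not a MIME type: " + mimeType);
        }
        return mimeType.substring(0, slash).toLowerCase(Locale.ROOT);
    }

    // Counts media items per top-level type; TreeMap keeps the keys sorted for stable output.
    static Map<String, Integer> countByKind(List<String> mimeTypes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String mime : mimeTypes) {
            counts.merge(topLevelType(mime), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> attached = List.of("image/png", "image/jpeg", "video/mp4", "application/pdf");
        System.out.println(countByKind(attached)); // {application=1, image=2, video=1}
    }
}
```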