This article takes a look at Multimodality support in Spring AI.

Examples

chatModel example

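// Load the image from the classpath
var imageResource = new ClassPathResource("/multimodal.test.png");

// imageResource and userMessage are local variables, so they are referenced without "this."
var userMessage = new UserMessage(
    "Explain what do you see in this picture?", // content
    new Media(MimeTypeUtils.IMAGE_PNG, imageResource)); // media

// chatModel is assumed to be an already configured ChatModel instance
ChatResponse response = chatModel.call(new Prompt(userMessage));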

chatClient example

String response = ChatClient.create(chatModel).prompt()
        .user(u -> u.text("Explain what do you see on this picture?")
                    .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/multimodal.test.png")))
        .call()
        .content();

The following models currently support multimodality:

  • Anthropic Claude 3
  • AWS Bedrock Converse
  • Azure Open AI (e.g. GPT-4o models)
  • Mistral AI (e.g. Mistral Pixtral models)
  • Ollama (e.g. LLaVA, BakLLaVA, Llama3.2 models)
  • OpenAI (e.g. GPT-4 and GPT-4o models)
  • Vertex AI Gemini (e.g. gemini-1.5-pro-001, gemini-1.5-flash-001 models)

Source code

UserMessage

org/springframework/ai/chat/messages/UserMessage.java

public class UserMessage extends AbstractMessage implements MediaContent {

    protected final List<Media> media;

    public UserMessage(String textContent) {
        this(MessageType.USER, textContent, new ArrayList<>(), Map.of());
    }

    public UserMessage(Resource resource) {
        super(MessageType.USER, resource, Map.of());
        this.media = new ArrayList<>();
    }

    public UserMessage(String textContent, List<Media> media) {
        this(MessageType.USER, textContent, media, Map.of());
    }

    public UserMessage(String textContent, Media... media) {
        this(textContent, Arrays.asList(media));
    }

    public UserMessage(String textContent, Collection<Media> mediaList, Map<String, Object> metadata) {
        this(MessageType.USER, textContent, mediaList, metadata);
    }

    public UserMessage(MessageType messageType, String textContent, Collection<Media> media,
            Map<String, Object> metadata) {
        super(messageType, textContent, metadata);
        Assert.notNull(media, "media data must not be null");
        this.media = new ArrayList<>(media);
    }

    @Override
    public String toString() {
        return "UserMessage{" + "content='" + getText() + '\'' + ", properties=" + this.metadata + ", messageType="
                + this.messageType + '}';
    }

    @Override
    public List<Media> getMedia() {
        return this.media;
    }

    @Override
    public String getText() {
        return this.textContent;
    }

}
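UserMessage implements the getMedia method of MediaContent. Every constructor ends up in the four-argument one, which rejects a null media collection and copies it into a new ArrayList.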
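
To make the effect of the copying constructor concrete, here is a minimal sketch that uses only the JDK. SketchMessage is a hypothetical stand-in and not a Spring AI class; it keeps plain strings instead of Media objects and only mirrors the null check and the new ArrayList<>(media) copy seen in the quoted source.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for UserMessage; media items are plain strings here.
final class SketchMessage {

    private final String text;

    private final List<String> media;

    SketchMessage(String text, Collection<String> media) {
        // Mirrors Assert.notNull(media, ...) from the quoted constructor
        if (media == null) {
            throw new IllegalArgumentException("media data must not be null");
        }
        this.text = text;
        // Defensive copy: later changes to the caller's collection are not visible here
        this.media = new ArrayList<>(media);
    }

    String getText() {
        return this.text;
    }

    // Like the quoted getMedia(), this returns the internal list itself
    List<String> getMedia() {
        return this.media;
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("cat.png"));
        SketchMessage message = new SketchMessage("Describe the image", source);
        source.add("dog.png"); // does not affect the message
        System.out.println(message.getMedia().size()); // prints 1
    }
}
```

Because getMedia returns the internal list rather than a copy, a caller can still add to the message's media through the returned list; only changes made to the original collection after construction are isolated.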

Media

org/springframework/ai/model/Media.java

public class Media {

    private static final String NAME_PREFIX = "media-";

    /**
     * An Id of the media object, usually defined when the model returns a reference to
     * media it has been passed.
     */
    @Nullable
    private String id;

    private final MimeType mimeType;

    private final Object data;

    /**
     * The name of the media object that can be referenced by the AI model.
     * <p>
     * Important security note: This field is vulnerable to prompt injections, as the
     * model might inadvertently interpret it as instructions. It is recommended to
     * specify neutral names.
     *
     * <p>
     * The name must only contain:
     * <ul>
     * <li>Alphanumeric characters
     * <li>Whitespace characters (no more than one in a row)
     * <li>Hyphens
     * <li>Parentheses
     * <li>Square brackets
     * </ul>
     */
    private String name;

    //......
}    
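Media defines four properties: id, mimeType, data, and name. The Javadoc on name also warns that the field is exposed to prompt injection and restricts which characters it may contain.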
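
The naming rules from the Javadoc can be checked before a name is ever handed to Media. The following is a rough sketch using only the JDK; MediaNameCheck and isValidName are hypothetical names rather than Spring AI API, and treating "alphanumeric" as ASCII letters and digits is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spring AI: checks a candidate media name
// against the character rules listed in the Media Javadoc.
final class MediaNameCheck {

    // Allowed: ASCII alphanumerics (assumption), whitespace, hyphens, parentheses, square brackets
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9\\s\\-()\\[\\]]+");

    // Two or more whitespace characters in a row are not allowed
    private static final Pattern REPEATED_WHITESPACE = Pattern.compile("\\s{2,}");

    static boolean isValidName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        return ALLOWED.matcher(name).matches() && !REPEATED_WHITESPACE.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("media-1"));                // prints true
        System.out.println(isValidName("report (v2) [final]"));    // prints true
        System.out.println(isValidName("ignore all rules!"));      // prints false
    }
}
```

A check like this only enforces the character set. Keeping the name neutral, as the Javadoc recommends, is still up to the caller.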

Format

    public static class Format {

        // -----------------
        // Document formats
        // -----------------
        /**
         * Public constant mime type for {@code application/pdf}.
         */
        public static final MimeType DOC_PDF = MimeType.valueOf("application/pdf");

        /**
         * Public constant mime type for {@code text/csv}.
         */
        public static final MimeType DOC_CSV = MimeType.valueOf("text/csv");

        /**
         * Public constant mime type for {@code application/msword}.
         */
        public static final MimeType DOC_DOC = MimeType.valueOf("application/msword");

        /**
         * Public constant mime type for
         * {@code application/vnd.openxmlformats-officedocument.wordprocessingml.document}.
         */
        public static final MimeType DOC_DOCX = MimeType
            .valueOf("application/vnd.openxmlformats-officedocument.wordprocessingml.document");

        /**
         * Public constant mime type for {@code application/vnd.ms-excel}.
         */
        public static final MimeType DOC_XLS = MimeType.valueOf("application/vnd.ms-excel");

        /**
         * Public constant mime type for
         * {@code application/vnd.openxmlformats-officedocument.spreadsheetml.sheet}.
         */
        public static final MimeType DOC_XLSX = MimeType
            .valueOf("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

        /**
         * Public constant mime type for {@code text/html}.
         */
        public static final MimeType DOC_HTML = MimeType.valueOf("text/html");

        /**
         * Public constant mime type for {@code text/plain}.
         */
        public static final MimeType DOC_TXT = MimeType.valueOf("text/plain");

        /**
         * Public constant mime type for {@code text/markdown}.
         */
        public static final MimeType DOC_MD = MimeType.valueOf("text/markdown");

        // -----------------
        // Video Formats
        // -----------------
        /**
         * Public constant mime type for {@code video/x-matros}.
         */
        public static final MimeType VIDEO_MKV = MimeType.valueOf("video/x-matros");

        /**
         * Public constant mime type for {@code video/quicktime}.
         */
        public static final MimeType VIDEO_MOV = MimeType.valueOf("video/quicktime");

        /**
         * Public constant mime type for {@code video/mp4}.
         */
        public static final MimeType VIDEO_MP4 = MimeType.valueOf("video/mp4");

        /**
         * Public constant mime type for {@code video/webm}.
         */
        public static final MimeType VIDEO_WEBM = MimeType.valueOf("video/webm");

        /**
         * Public constant mime type for {@code video/x-flv}.
         */
        public static final MimeType VIDEO_FLV = MimeType.valueOf("video/x-flv");

        /**
         * Public constant mime type for {@code video/mpeg}.
         */
        public static final MimeType VIDEO_MPEG = MimeType.valueOf("video/mpeg");

        /**
         * Public constant mime type for {@code video/mpeg}.
         */
        public static final MimeType VIDEO_MPG = MimeType.valueOf("video/mpeg");

        /**
         * Public constant mime type for {@code video/x-ms-wmv}.
         */
        public static final MimeType VIDEO_WMV = MimeType.valueOf("video/x-ms-wmv");

        /**
         * Public constant mime type for {@code video/3gpp}.
         */
        public static final MimeType VIDEO_THREE_GP = MimeType.valueOf("video/3gpp");

        // -----------------
        // Image Formats
        // -----------------
        /**
         * Public constant mime type for {@code image/png}.
         */
        public static final MimeType IMAGE_PNG = MimeType.valueOf("image/png");

        /**
         * Public constant mime type for {@code image/jpeg}.
         */
        public static final MimeType IMAGE_JPEG = MimeType.valueOf("image/jpeg");

        /**
         * Public constant mime type for {@code image/gif}.
         */
        public static final MimeType IMAGE_GIF = MimeType.valueOf("image/gif");

        /**
         * Public constant mime type for {@code image/webp}.
         */
        public static final MimeType IMAGE_WEBP = MimeType.valueOf("image/webp");

    }
Format, a static nested class of Media, defines MimeType constants for common document, video, and image formats.
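
When the media comes from a file, the MIME type usually has to be derived from the file name first. Here is a small JDK-only sketch of such a lookup; MimeTypeGuess and forFileName are hypothetical names, the extension table is an illustrative subset, and the MIME strings are copied from the Format constants above. The resulting string could then be turned into a MimeType with MimeType.valueOf, as Format itself does.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Spring AI: maps a file extension to one of
// the MIME type strings declared in Media.Format.
final class MimeTypeGuess {

    private static final Map<String, String> BY_EXTENSION = Map.ofEntries(
            Map.entry("pdf", "application/pdf"),
            Map.entry("csv", "text/csv"),
            Map.entry("txt", "text/plain"),
            Map.entry("md", "text/markdown"),
            Map.entry("html", "text/html"),
            Map.entry("mp4", "video/mp4"),
            Map.entry("mov", "video/quicktime"),
            Map.entry("webm", "video/webm"),
            Map.entry("png", "image/png"),
            Map.entry("jpg", "image/jpeg"),
            Map.entry("jpeg", "image/jpeg"),
            Map.entry("gif", "image/gif"),
            Map.entry("webp", "image/webp"));

    static Optional<String> forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension, or the name ends with a dot
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(BY_EXTENSION.get(extension));
    }

    public static void main(String[] args) {
        System.out.println(forFileName("multimodal.test.png").orElse("unknown")); // prints image/png
        System.out.println(forFileName("notes").orElse("unknown"));               // prints unknown
    }
}
```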

Summary

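Spring AI's message types are designed with multimodality in mind. UserMessage has a media property of type List<Media> through which images, audio, and video can be passed to the model, and the MimeType of each Media specifies which kind of content it carries.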

codecraft