This article is machine translated

Beyond the "sinking" of AI toward local deployment, the biggest recent change in the AI track is a technical breakthrough in multimodal video generation: it has evolved from pure text-to-video into full-chain integrated generation that combines text, images, and audio.

A few breakthrough cases to give everyone a sense of it:

1) ByteDance's open-source EX-4D framework: turns a single-view video into free-viewpoint 4D content almost instantly, with a reported user approval rate of 70.7%. In other words, given an ordinary video, AI can automatically generate views from any angle, something that previously required a professional 3D modeling team.

2) Baidu's "Drawing Imagination" platform: generates a 10-second video from a single image and claims "movie-level" quality. Whether that is marketing exaggeration remains to be seen until the Pro version update in August.

3) Google DeepMind Veo: generates 4K video and ambient sound in sync. The key technical highlight is the "sync" itself. Previously, video and audio came from two separate systems spliced together; achieving true semantic-level matching means overcoming significant challenges, such as aligning walking movements with footstep sounds in complex scenes.

4) Douyin ContentV: 8 billion parameters, generates 1080p video in 2.3 seconds, at a cost of 3.67 yuan per 5 seconds. Honestly, the cost control is acceptable, but generation quality is still unsatisfactory in complex scenes.

Haotian | CryptoInsight
@tmel0211
07-02
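Watching the AI industry lately, I've noticed a change that keeps "sinking" further: out of the old mainstream consensus of competing on concentrated compute and "big" models, a branch has emerged that leans toward small local models and edge computing. You can see this in Apple Intelligence reaching 500 million devices, in Microsoft launching Mu, a 330-million-parameter small model built for Windows 11, and in Google DeepMind's robots operating "offline," among other examples.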
From Twitter