SFO Museum Vector Embeddings

This work should still be considered experimental and is subject to change without notice. These files are not yet updated on any kind of automated schedule. Once they are, a machine-readable "index" file will be provided.

Description

This is a minimalist landing page to document vector embeddings releases produced by SFO Museum. These files are provided "as-is" for the purposes of testing and helping to understand what the "shape" of shared vector embeddings within the cultural heritage community might look like. Embeddings have been exported as Apache Parquet files.

For background please consult:

Records

Each row in a Parquet file contains a single vector embedding representing a record. Records have the following definition:

| Field name | Type | Parquet type | Description |
| --- | --- | --- | --- |
| provider | string | dict,zstd | The name (or context) of the provider responsible for depiction_id. |
| depiction_id | string | dict,zstd | The unique identifier for the depiction for which embeddings have been generated. |
| subject_id | string | dict,zstd | The unique identifier associated with the record that depiction_id depicts. |
| model | string | dict,zstd | The label for the model used to generate embeddings for depiction_id. |
| embeddings | []float32 | plain,list | The vector embeddings generated for depiction_id using model. |
| created | int64 | | The Unix timestamp when the vector embeddings were generated. |
| attributes | map[string]string | | An arbitrary map of key-value properties associated with the embeddings. Record attributes are encouraged, but not required, to include the "OEmbeddings" fields (described below). |
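As a rough illustration of the record definition above, here is a minimal Python sketch of a row held in memory. The names `Record`, `record_from_row` and `group_by_model` are hypothetical and not part of any SFO Museum tooling. The sketch assumes each row has already been read from a Parquet file into a plain dictionary keyed by column name, and that the `attributes` map arrives either as a dictionary or as a list of key-value pairs, depending on the Parquet reader used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """One row of an embeddings file, mirroring the field definitions above."""
    provider: str
    depiction_id: str
    subject_id: str
    model: str
    embeddings: List[float]
    created: int  # Unix timestamp
    attributes: Dict[str, str] = field(default_factory=dict)


def record_from_row(row: dict) -> Record:
    """Build a Record from a dict keyed by column name (assumed input shape)."""
    return Record(
        provider=row["provider"],
        depiction_id=row["depiction_id"],
        subject_id=row["subject_id"],
        model=row["model"],
        embeddings=[float(v) for v in row["embeddings"]],
        created=int(row["created"]),
        # Accepts a dict, a list of (key, value) pairs, or a missing/None value.
        attributes=dict(row.get("attributes") or {}),
    )


def group_by_model(records):
    """Group records by their model label.

    Vectors produced by different models generally live in different
    embedding spaces, so comparisons are only meaningful within a group.
    """
    groups = {}
    for record in records:
        groups.setdefault(record.model, []).append(record)
    return groups
```

Grouping by `model` matters for files that list more than one included model, since a single file may then contain several vectors for the same depiction_id.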

OEmbeddings

Record attributes are a free-form set of key-value pairs. They are encouraged, but not required, to include the minimum set of common "OEmbeddings" properties which are meant to ensure the least amount of metadata necessary to allow suitable attribution and provenance for object records. These properties are defined in the OEmbeddings JSON schema. Here is that definition in table form:

| Name | Type | Required | Notes |
| --- | --- | --- | --- |
| type | string | yes | Either “image” or “text”. |
| preview | string | yes | The preview content for the vector embeddings. If type is “text” then this is expected to be a string. If type is “image” then this is expected to be a string conforming to the JSON Schema “uri” type. |
| depiction_url | uri | no | A web page (or resource) for the depiction used to create the vector embeddings. |
| subject_url | uri | yes | A web page (or resource) for the subject of the depiction used to create the vector embeddings. |
| subject_title | string | yes | The title of the subject of the depiction. This may be an empty string. |
| subject_creditline | string | yes | The creditline or attribution for the subject of the depiction. This may be an empty string. |
| provider_name | string | yes | The name of the provider (holder) of the subject being depicted. |
| provider_url | uri | yes | The primary web page for the provider (holder) of the subject being depicted. |
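The table above can be turned into a simple check on a record's attributes. The following is a minimal sketch rather than a replacement for validating against the OEmbeddings JSON schema itself. The function names are hypothetical, and the URI test (a non-empty scheme) is only a rough approximation of the JSON Schema "uri" type.

```python
from urllib.parse import urlparse

# Property names taken from the table above.
REQUIRED = (
    "type",
    "preview",
    "subject_url",
    "subject_title",
    "subject_creditline",
    "provider_name",
    "provider_url",
)
URI_FIELDS = ("depiction_url", "subject_url", "provider_url")


def looks_like_uri(value: str) -> bool:
    """Rough approximation of a URI check: the value must have a scheme."""
    return bool(urlparse(value).scheme)


def check_oembeddings(attributes: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in REQUIRED:
        if name not in attributes:
            problems.append(f"missing required property: {name}")

    kind = attributes.get("type")
    if kind is not None and kind not in ("image", "text"):
        problems.append(f"type must be 'image' or 'text', got {kind!r}")

    # For image records the preview is expected to be a URI as well.
    uri_fields = list(URI_FIELDS)
    if kind == "image":
        uri_fields.append("preview")
    for name in uri_fields:
        value = attributes.get(name)
        if value is not None and not looks_like_uri(value):
            problems.append(f"{name} does not look like a URI: {value!r}")
    return problems
```

Because the OEmbeddings properties are encouraged rather than required, a non-empty result is better treated as a warning about attribution metadata than as a reason to reject a record.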

Vector Embeddings

Most of these files were created using the tools provided by the sfomuseum/go-embeddings-harvest package.

SFO Museum

Collection

Name Type Description Dimensions Size Signatures (x.509)
sfomuseum-collection-1152-siglip2-naflex-20260423.parquet image Object images from the SFO Museum Aviation Collection. Included models: google/siglip2-so400m-patch16-naflex. 1152 374MB sfomuseum-collection-1152-siglip2-naflex-20260423-signatures.parquet (public key) experimental
sfomuseum-collection-1152-siglip2-patch14-20260519.parquet image Object images from the SFO Museum Aviation Collection. Included models: google/siglip2-so400m-patch14-384. 1152 374MB sfomuseum-collection-1152-siglip2-patch14-20260519-signatures.parquet (public key) experimental
sfomuseum-collection-512-mobileclip-20260608.parquet image Object images from the SFO Museum Aviation Collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 526MB sfomuseum-collection-512-mobileclip-20260608-signatures.parquet (public key) experimental

Exhibitions

Name Type Description Dimensions Size Signatures (x.509)
sfomuseum-exhibitions-1152-siglip2-naflex-20260424.parquet image Object images from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch16-naflex. 1152 17MB sfomuseum-exhibitions-1152-siglip2-naflex-20260424-signatures.parquet (public key) experimental
sfomuseum-exhibitions-1152-siglip2-patch14-20260518.parquet image Object images from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch14-384. 1152 17MB sfomuseum-exhibitions-1152-siglip2-patch14-20260518-signatures.parquet (public key) experimental
sfomuseum-exhibitions-512-mobileclip-20260414.parquet image Object images from SFO Museum exhibitions. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 25MB sfomuseum-exhibitions-512-mobileclip-20260414-signatures.parquet (public key) experimental

Instagram

Name Type Description Dimensions Size Signatures (x.509)
sfomuseum-instagram-1152-siglip2-naflex-20260424.parquet image Images and photographs from the SFO Museum Instagram account. Included models: google/siglip2-so400m-patch16-naflex. 1152 24MB sfomuseum-instagram-1152-siglip2-naflex-20260424-signatures.parquet (public key) experimental
sfomuseum-instagram-1152-siglip2-patch14-20260511.parquet image Images and photographs from the SFO Museum Instagram account. Included models: google/siglip2-so400m-patch14-384. 1152 24MB sfomuseum-instagram-1152-siglip2-patch14-20260511-signatures.parquet (public key) experimental
sfomuseum-instagram-512-mobileclip-20260414.parquet image Images and photographs from the SFO Museum Instagram account. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 47MB sfomuseum-instagram-512-mobileclip-20260414-signatures.parquet (public key) experimental

Installation photos

Name Type Description Dimensions Size Signatures (x.509)
sfomuseum-media-1152-siglip2-naflex-20260518.parquet image Installation photos from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch16-naflex. 1152 25MB sfomuseum-media-1152-siglip2-naflex-20260518-signatures.parquet (public key) experimental
sfomuseum-media-1152-siglip2-patch14-20260519.parquet image Installation photos from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch14-384. 1152 25MB sfomuseum-media-1152-siglip2-patch14-20260519-signatures.parquet (public key) experimental
sfomuseum-media-512-mobileclip-20260414.parquet image Installation photos from SFO Museum exhibitions. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 35MB sfomuseum-media-512-mobileclip-20260414-signatures.parquet (public key) experimental

National Gallery of Art (NGA)

Name Type Description Dimensions Size
nga-opendata-1152-siglip2-naflex-20260502.parquet image Object images from the National Gallery of Art collection. Included models: google/siglip2-so400m-patch16-naflex. 1152 652MB
nga-opendata-1152-siglip2-patch14-naflex-20260522.parquet image Object images from the National Gallery of Art collection. Included models: google/siglip2-so400m-patch14-384. 1152 659MB
nga-opendata-512-mobileclip-20260413.parquet image Object images from the National Gallery of Art collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 903MB

Museum of Modern Art (MoMA)

Name Type Description Dimensions Size
moma-collection-1152-siglip2-naflex-20260606.parquet image Object images from the Museum of Modern Art collection. Included models: google/siglip2-so400m-patch16-naflex. 1152 450MB
moma-collection-1152-siglip2-patch14-20260523.parquet image Object images from the Museum of Modern Art collection. Included models: google/siglip2-so400m-patch14-384. 1152 444MB
moma-collection-512-mobileclip-20260423.parquet image Object images from the Museum of Modern Art collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 656MB

Smithsonian

National Museum of Asian Art

Name Type Description Dimensions Size
si-fsg-openaccess-1152-siglip2-naflex-20260607.parquet image Object images from the Smithsonian's National Museum of Asian Art. Included models: google/siglip2-so400m-patch16-naflex. 1152 38MB
si-fsg-openaccess-1152-siglip2-patch14-20260608.parquet image Object images from the Smithsonian's National Museum of Asian Art. Included models: google/siglip2-so400m-patch14-384. 1152 38MB
si-fsg-openaccess-512-mobileclip-20260608.parquet image Object images from the Smithsonian's National Museum of Asian Art. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 55MB

National Air and Space Museum

Name Type Description Dimensions Size
si-nasm-openaccess-1152-siglip2-naflex-20260608.parquet image Object images from the National Air and Space Museum. Included models: google/siglip2-so400m-patch16-naflex. 1152 32MB
si-nasm-openaccess-1152-siglip2-patch14-20260607.parquet image Object images from the National Air and Space Museum. Included models: google/siglip2-so400m-patch14-384. 1152 32MB
si-nasm-openaccess-512-mobileclip-20260608.parquet image Object images from the National Air and Space Museum. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 47MB

National Museum of American History

Name Type Description Dimensions Size
si-nmah-openaccess-1152-siglip2-naflex-20260606.parquet image Object images from the National Museum of American History. Included models: google/siglip2-so400m-patch16-naflex. 1152 95MB
si-nmah-openaccess-1152-siglip2-patch14-20260607.parquet image Object images from the National Museum of American History. Included models: google/siglip2-so400m-patch14-384. 1152 95MB
si-nmah-openaccess-512-mobileclip-20260608.parquet image Object images from the National Museum of American History. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 160MB

Smithsonian American Art Museum

Name Type Description Dimensions Size
si-saam-openaccess-1152-siglip2-naflex-20260607.parquet image Object images from the Smithsonian American Art Museum. Included models: google/siglip2-so400m-patch16-naflex. 1152 60MB
si-saam-openaccess-1152-siglip2-patch14-20260807.parquet image Object images from the Smithsonian American Art Museum. Included models: google/siglip2-so400m-patch14-384. 1152 60MB
si-saam-openaccess-512-mobileclip-20260608.parquet image Object images from the Smithsonian American Art Museum. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 88MB

Flickr

Name Type Description Dimensions Size
20260410-flickr-49487266@N07-72157710813888403.parquet image Photos from the Flickr Commons San Diego Air and Space Museum California's Aviation History photoset. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 4MB
20260410-flickr-group-95693046@N00.parquet image Open or Creative Commons licensed photos from the Airports SFO Flickr group pool. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. 512 29MB